possible change in uchime*_denovo results between v2.22 and v2.29 #591
On closer inspection, it seems I introduced the bug in commit c5f1645 on 13 Feb 2023. The code diverged into separate branches at this point, so the dates are not always informative. Commit cb43bc7 from 9 February and earlier commits seem okay. The bug appears to be related to the experimental chimera detection in long sequences that was introduced at this point. It should have been independent of the other chimera detection algorithms, but unfortunately it seems to have disturbed them.
There seem to be two reasons why results from version 2.22.1 and earlier differ from those of later versions. The first reason is that the size of the smoothing window was increased from 32 to 64 nucleotides, starting in commit 22ffa0e. This was an unintended change. It didn't actually affect the test example that @colinbrislawn provided, but it has changed the results in other cases. The second reason is that chimera parent candidates that had no "winning" positions at all in the smoothed window may have been evaluated in version 2.22.1 and earlier. This was an error that was actually corrected in version 2.23.1 and later. Here is an alignment of the specific case that showed different behavior in the different versions (using the
Here, sequence Q is the query sequence, B is the best scoring candidate, and A is another candidate. There is only a single position where A is more similar to Q than B: position 89. After smoothing out matches within a window of 32 (or 64) nucleotides, A has no winning positions at all. A therefore shouldn't really have been considered. In most cases I think sequences like sequence A would have obtained a very low score and would have been eliminated anyway. Here the score is 0.2232 which is below the default limit of 0.28 in I think the right thing to do is to adjust the window size back to 32 as it originally was, but otherwise keep it as it is now. I'll do some testing to see the effect of the changes. Preliminary tests with |
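To make the idea of "winning" positions in a smoothed window more concrete, here is a minimal Python sketch. It is not the vsearch implementation: the function names, the trailing-window definition, and the rule that ties count for nobody are all assumptions made for illustration. It only shows how a candidate that is better than the best candidate at a single position can end up with no winning positions once matches are summed over a window.

```python
def window_match_counts(query, candidate, window):
    """Count matches to the query in a trailing window ending at each position.

    Hypothetical helper: the window covers positions i-window+1 .. i.
    """
    matches = [int(q == c) for q, c in zip(query, candidate)]
    counts = []
    running = 0
    for i, m in enumerate(matches):
        running += m
        if i >= window:
            running -= matches[i - window]  # drop the position leaving the window
        counts.append(running)
    return counts


def winning_positions(query, candidates, window=32):
    """Return, per candidate, the number of positions where it alone has the
    highest smoothed match count. Ties count for nobody (an assumption)."""
    smoothed = {
        name: window_match_counts(query, seq, window)
        for name, seq in candidates.items()
    }
    wins = {name: 0 for name in candidates}
    for i in range(len(query)):
        best = max(counts[i] for counts in smoothed.values())
        leaders = [name for name, counts in smoothed.items() if counts[i] == best]
        if len(leaders) == 1:
            wins[leaders[0]] += 1
    return wins
```

With `window=1` (no smoothing), a candidate that beats the best candidate at one position gets one win. With `window=32` or `window=64`, that single position is outvoted whenever the other candidate has more matches nearby, so the weaker candidate ends up with zero wins.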
Thank you for your prompt investigation, Torbjørn! Related, can you recommend a toy alignment we should use for testing in Qiime2? In that other thread I was looking for a child and two parents that produced different results for |
Hi @colinbrislawn, Here is a set of toy sequences that should give different results with
I started with sequence P of 160 bp. It was then duplicated into Q, A, and B. In A I introduced a few substitutions in the second half of the sequence, while in B I introduced a few substitutions in the first half of the sequence. Q is identical to P except for 1 substitution in each half of the sequence, not in the same positions as the substitutions in A or B. P and Q have an abundance of 1, while A and B have an abundance of 5. P and Q could be chimeras with A and B as parents. Using |
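The construction described above can be sketched in Python as follows. This is only an illustration of the recipe, not the actual toy sequences from this thread: the random seed, the substitution positions, and the function names are hypothetical, and "a few substitutions" is assumed to mean three. Abundances are written as `;size=N` annotations in the FASTA headers.

```python
import random


def substitute(seq, positions, rng):
    """Return seq with a different nucleotide at each of the given positions."""
    bases = list(seq)
    for pos in positions:
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def build_toy_set(length=160, seed=1):
    """Build P, Q, A and B following the recipe in the comment above.

    All positions below are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    p = "".join(rng.choice("ACGT") for _ in range(length))
    a = substitute(p, [100, 120, 140], rng)  # substitutions in the second half
    b = substitute(p, [20, 40, 60], rng)     # substitutions in the first half
    q = substitute(p, [30, 130], rng)        # one substitution in each half
    # Each value is (sequence, abundance).
    return {"P": (p, 1), "Q": (q, 1), "A": (a, 5), "B": (b, 5)}


def to_fasta(records):
    """Format the records as FASTA with ;size= abundance annotations."""
    return "".join(
        f">{name};size={size}\n{seq}\n" for name, (seq, size) in records.items()
    )
```

Calling `to_fasta(build_toy_set())` gives a four-record FASTA string that could be written to a file and used as input for chimera detection tests.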
Here are the results of Command:
Output:
Results in
Results in
Here are the results of Command:
Output:
Results in
Results in
Here are the results of Command:
Output:
Results in
The file |
I have tested the Here are the results:
As can be seen from the table, the bug seems to increase the number of chimera predictions by a very small amount (especially for window size 32). These may be false positives, but they have passed the score and divergence tests. The longer window size had a larger effect and has reduced the number of positive predictions somewhat. It might have increased the number of false negatives. The real number of chimeras in this dataset is unknown, as far as I know.
I also tested Here are the numbers for
And here are the numbers for
In both cases, it seemed like the bug had a larger effect on the number of predicted chimeras than for |
Version 2.29.4 has been released with a fix for the window size.
Tests developed by @colinbrislawn (see qiime2/q2-vsearch#100) show different chimera detection results when using the --uchime*_denovo commands, with vsearch v2.22.1. With vsearch 2.29.3, the results are different. This should be investigated.