
New module: Kraken2/Bracken on Unaligned Sequences for Contamination Detection #1351

Closed
wants to merge 242 commits into from


egreenberg7 (Contributor) commented Aug 7, 2024

Closes #271. This contribution adds Kraken2/Bracken as an optional quality-control step to the rnaseq pipeline for the HISAT2 and STAR/Salmon aligners. Contamination is a widely reported issue in RNA-sequencing data, and metagenomics tools can be used to address it. Kraken2 is particularly good at detecting low levels of pathogens, which makes it well suited to this task. This PR adds the option of providing a Kraken2 database to perform taxonomic classification of unaligned reads.

Note: If the --bracken-precision parameter is set to anything other than 'S', the current MultiQC version does not work properly. This will not be an issue in future versions of MultiQC (see this MultiQC bug fix).

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated. [I do not think this is needed for my PR.]
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

MatthiasZepper (Member) left a comment


LGTM (Let's get this merged!), provided that you first rebase onto the latest dev (or merge in those changes).

@maxulysse should ideally help with the subway map, though.

egreenberg7 (Contributor Author) commented Sep 19, 2024

The subway map has been updated, though not the animated one. I had an issue with the merge, so all the 3.15 commits now unintentionally show up here. I'm not sure whether we can fix that here or whether I should open a new, clean PR from my fork; I'm guessing a new, clean PR may be easiest. Otherwise, the linting failure seems to be caused by the HISAT2 patch that saves the unaligned reads so that contaminant screening can process them downstream.

Shaun-Regenbaum (Contributor)

Congrats @egreenberg7 :) Truly awesome work!

egreenberg7 (Contributor Author)

Because of the merge issues, I'll open a new PR so that the version changes are clearer. I'll also copy some of the comments over for the record.

maxulysse (Member)

No need to copy over the comments; just mentioning this PR will be enough.

7 participants