General cleanup #447

Merged
merged 9 commits into from
Sep 30, 2024

nschcolnicov
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/smrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Sep 26, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 8e54dbc

+| ✅ 231 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

  • nextflow_config - Config default ignored: params.fastp_known_mirna_adapters

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-09-30 14:48:31

@nschcolnicov nschcolnicov marked this pull request as ready for review September 27, 2024 20:55
Contributor

This file was removed in latest dev

Comment on lines 148 to 149
ch_three_prime_adapter = Channel.value(params.three_prime_adapter)
ch_phred_offset = Channel.value(params.phred_offset)
Contributor

These should be in prepare_genome or pipeline initialisation

- bowtie_index = ch_bowtie_index // channel: [genome.1.ebwt, genome.2.ebwt, genome.3.ebwt, genome.4.ebwt, genome.rev.1.ebwt, genome.rev.2.ebwt]
+ bowtie_index = ch_bowtie_index // channel: [ val(meta), path(index) ]
Contributor

If the user supplies the index as a .tar.gz, we need to make sure it has this structure. I think it currently has the previous structure (genome.ebwt, etc.).

Contributor Author

@nschcolnicov nschcolnicov Sep 30, 2024

This was updated in a previous PR; it now has a meta, but I had forgotten to update the channel comment.

Contributor

Cool. The channel can simply be [id:"bowtie_index", [workspace/etc/bowtie_index]]. It does not need the paths to the individual ebwt files, as the modules can handle the whole directory (and mirdeep cannot handle the index without the directory).
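
A minimal sketch of the directory-based shape suggested here. It assumes a hypothetical `params.bowtie_index_dir` parameter that points at a folder of `.ebwt` files; the pipeline's real parameter handling may differ.

```nextflow
// Sketch only: params.bowtie_index_dir is a hypothetical parameter name,
// supplied on the command line as --bowtie_index_dir <dir>.
workflow {
    // Emit the index as one [ meta, directory ] tuple instead of listing every .ebwt file.
    ch_bowtie_index = Channel
        .fromPath(params.bowtie_index_dir, type: 'dir', checkIfExists: true)
        .map { index_dir -> [ [ id: 'bowtie_index' ], index_dir ] }
        .first() // convert to a value channel so it can be reused for every sample

    // Print the tuple to confirm the channel shape.
    ch_bowtie_index.view()
}
```

Passing the directory keeps the channel comment short (`[ val(meta), path(index_dir) ]`) and leaves it to each module to locate the files it needs inside the folder.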

Contributor

Perhaps the module name and the file name (table_merge and datatable_merge) could match for consistency. The module should also have a main.nf file.

Contributor

Same applies to edger_qc. Should be main.nf.

modules.json Outdated
Contributor

pigz_uncompress is no longer used and can be removed

modules.json Outdated
Contributor

untarfiles should be replaced by untar

Contributor Author

This is not straightforward, let's do it in a separate PR: https://github.com/orgs/nf-core/projects/74/views/7?pane=issue&itemId=81583371

@@ -53,9 +53,9 @@ workflow NFCORE_SMRNASEQ {
ch_reference_hairpin // channel: [ val(meta), path(fasta) ]
ch_mirna_gtf // channel: [ path(GTF) ]
Contributor

Would it be useful for ch_mirna_gtf to have a meta as well?

Contributor Author

We don't need it, but will add it for consistency
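
One way to attach a meta map to the GTF channel, as a sketch. It assumes the GTF path comes from `params.mirna_gtf` and uses the file's base name as the `id`; both choices are illustrative rather than taken from the pipeline.

```nextflow
// Sketch only: the source of the GTF path and the choice of id are assumptions.
workflow {
    ch_mirna_gtf = Channel
        .fromPath(params.mirna_gtf, checkIfExists: true)
        .map { gtf -> [ [ id: gtf.baseName ], gtf ] } // channel: [ val(meta), path(gtf) ]
        .first() // value channel, reusable across samples

    // Print the tuple to confirm the channel shape.
    ch_mirna_gtf.view()
}
```

With this shape the channel matches the `[ val(meta), path(...) ]` convention already used for `ch_reference_hairpin`.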

@atrigila
Contributor

Is this PR addressing #400 or a subset of #400?

@nschcolnicov
Contributor Author

Is this PR addressing #400 or a subset of #400?

I'm not sure I understand this question, can you elaborate?

@atrigila
Contributor

atrigila commented Sep 30, 2024

Are we addressing the whole scope of #400 or some sub-tasks here (in this PR)?

  • Address the TODOs left in the pipeline.
  • Check variables, add "ch_" to channel and "val_" to value variables
  • Make sure that variable names remain consistent across workflows.
  • Assess which emitted files from subworkflows are actually being used.
  • Replace any DSL1 or Groovy functions which can be replaced with Nextflow DSL2 functions.
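
As an illustration of the `ch_`/`val_` naming item in the list above, a small sketch using the two parameters discussed earlier in this review. The `val_` variable names are hypothetical.

```nextflow
// Sketch only: shows the prefix convention, not the pipeline's actual code.
workflow {
    // Plain values read from params get a val_ prefix.
    val_three_prime_adapter = params.three_prime_adapter
    val_phred_offset        = params.phred_offset

    // Channels built from those values get a ch_ prefix.
    ch_three_prime_adapter = Channel.value(val_three_prime_adapter)
    ch_phred_offset        = Channel.value(val_phred_offset)
}
```

The prefix makes it clear at the call site whether a variable can be passed to a process input directly or first needs channel operators applied.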

@nschcolnicov
Contributor Author

Are we addressing the whole scope of #400 or some sub-tasks here (in this PR)?

  • Address the TODOs left in the pipeline.
  • Check variables, add "ch_" to channel and "val_" to value variables
  • Make sure that variable names remain consistent across workflows.
  • Assess which emitted files from subworkflows are actually being used.
  • Replace any DSL1 or Groovy functions which can be replaced with Nextflow DSL2 functions.
  • Reviewing parameters, modules, and subworkflows to check there are no unused items.

Ah, I see. Yes, we are addressing all of the items in the issue. I will also add: reviewing parameters, modules, and subworkflows to check that there are no unused items.

Contributor

@atrigila atrigila left a comment

I've also run nf-core lint to check TODOs left in the pipeline and found a few issues:

╭───────────────────────╮
│ LINT RESULTS SUMMARY  │
├───────────────────────┤
│ [✔] 460 Tests Passed  │
│ [?]   1 Test Ignored  │
│ [!]  26 Test Warnings │
│ [✗]   1 Test Failed   │
╰───────────────────────╯

@@ -7,7 +7,7 @@ include { BOWTIE_ALIGN as BOWTIE_MAP_GENOME } from '../../modules/nf-core/bowtie

workflow GENOME_QUANT {
take:
- ch_bowtie_index // channel: [genome.1.ebwt, genome.2.ebwt, genome.3.ebwt, genome.4.ebwt, genome.rev.1.ebwt, genome.rev.2.ebwt]
+ ch_bowtie_index // channel: [ val(meta), [ path(genome.1.ebwt), path(genome.2.ebwt), path(genome.3.ebwt), path(genome.4.ebwt), path(genome.rev.1.ebwt), path(genome.rev.2.ebwt) ] ]
Contributor

Sorry if I wasn't clear in my previous comment, but for this to work in mirdeep2, ch_bowtie_index should have the structure:

channel: [ val(meta), [ path(one_directory) ] ]

and not:

channel: [ val(meta), [ path(genome.1.ebwt), path(genome.2.ebwt), path(genome.3.ebwt), path(genome.4.ebwt), path(genome.rev.1.ebwt), path(genome.rev.2.ebwt) ] ]

Please message me if we need to review this together.

Contributor Author

This is a bit trickier to update, so I created a new ticket to address it: #451

@@ -55,7 +55,7 @@ workflow PREPARE_GENOME {
ch_bowtie_index = UNTAR_BOWTIE_INDEX.out.files
ch_versions = ch_versions.mix(UNTAR_BOWTIE_INDEX.out.versions)
} else {
- ch_bowtie_index = Channel.fromPath("${val_bowtie_index}**ebwt", checkIfExists: true).map{it -> [ [id:it.baseName], it ] }.collect()
+ ch_bowtie_index = Channel.fromPath("${val_bowtie_index}**ebwt", checkIfExists: true).map{it -> [ [id:'bowtie_index'], it ] }.collect()
Contributor

Mirdeep2 is a bit tricky about how the index should be named, because of this line:

-p ${index}/${meta2.id} \\

So if the path to the index in ch_bowtie_index is something like:
/work/bowtie_index
and the files inside that directory are:
/work/bowtie_index/genome.1.ebwt
/work/bowtie_index/genome.2.ebwt
/work/bowtie_index/genome.3.ebwt

then the meta id should be: genome

This renaming is done automatically when indexing the fasta in the else statement:

            // Index FASTA with nf-core Bowtie1
            INDEX_GENOME ( CLEAN_FASTA.out.output )
            ch_versions      = ch_versions.mix(INDEX_GENOME.out.versions)
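
For a user-supplied index directory, the matching id could be derived from the file names rather than hard-coded. The following is only a sketch: `indexPrefix` and `params.bowtie_index_dir` are hypothetical names, and it assumes a standard Bowtie1 index whose forward files are named `<prefix>.1.ebwt`.

```nextflow
// Sketch only: indexPrefix and params.bowtie_index_dir are hypothetical names.
def indexPrefix(index_dir) {
    // A glob passed to file() returns a list; skip the <prefix>.rev.1.ebwt mirror file.
    def first_file = file("${index_dir}/*.1.ebwt").find { f -> !f.name.endsWith('.rev.1.ebwt') }
    if (!first_file) {
        error("No <prefix>.1.ebwt file found in ${index_dir}")
    }
    // Strip the suffix so that genome.1.ebwt becomes genome.
    return first_file.name - '.1.ebwt'
}

workflow {
    // channel: [ val(meta), path(index_dir) ], with meta.id equal to the index prefix
    ch_bowtie_index = Channel
        .fromPath(params.bowtie_index_dir, type: 'dir', checkIfExists: true)
        .map { index_dir -> [ [ id: indexPrefix(index_dir) ], index_dir ] }

    // Print the tuple to confirm the channel shape.
    ch_bowtie_index.view()
}
```

With the id set this way, an expression of the form `${index}/${meta2.id}` would resolve to the prefix shared by the `.ebwt` files in the directory.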

Contributor Author

Also to be addressed in #451

}

emit:
fasta = ch_fasta // channel: [ val(meta), path(fasta) ]
has_fasta = bool_has_fasta // boolean
- bowtie_index = ch_bowtie_index // channel: [genome.1.ebwt, genome.2.ebwt, genome.3.ebwt, genome.4.ebwt, genome.rev.1.ebwt, genome.rev.2.ebwt]
+ bowtie_index = ch_bowtie_index // channel: [ val(meta), [ path(genome.1.ebwt), path(genome.2.ebwt), path(genome.3.ebwt), path(genome.4.ebwt), path(genome.rev.1.ebwt), path(genome.rev.2.ebwt) ] ]
Contributor

Same comment: because of how mirdeep2 is structured, this should have the structure channel: [ val(meta), [ path(directory_index) ] ]

Contributor Author

- ch_bowtie_index // channel: [ val(meta), index ]
+ ch_bowtie_index // channel: [ val(meta), [ path(genome.1.ebwt), path(genome.2.ebwt), path(genome.3.ebwt), path(genome.4.ebwt), path(genome.rev.1.ebwt), path(genome.rev.2.ebwt) ] ]
Contributor

This should not be changed in an nf-core subworkflow, as it will break linting.

Contributor Author

You are right, let's take care of it in #451

@apeltzer apeltzer merged commit f1768c5 into nf-core:dev Sep 30, 2024
6 checks passed

3 participants