MIRTOP_STATS IndexError #477

anastasiaprime · 2024-10-15T09:39:29Z

Description of the bug

Hello!

I'm trying to process my small RNA-seq data using only R1 reads, and I always get the same error. What could be causing it? (Log attached: nextflow.log)

I'm using smrnaseq v2.4.0 with Nextflow 24.04.4. I also tried the dev version but got the same error.
Command line:

```
nextflow run nf-core/smrnaseq \
    -profile docker \
    --input samplesheet_1.csv \
    --outdir Results_R1_test \
    --fasta /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
    --mirgenedb true \
    --mirgenedb_species Hsa \
    --mirgenedb_mature /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/miRNA/hsa.fas \
    --mirgenedb_hairpin /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/miRNA/hsa-pre.fas \
    --mirgenedb_gff /mnt/cephfs8_rw/oncology/refseqs/Homo_sapiens/miRNA/hsa.gff \
    --mirtrace_species hsa \
    -c config
```
The config file passed with `-c config` only sets resources (`max_cpus`, `max_memory`).
Error:
```
ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (770902000404_S5)'

Caused by:
Process NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (770902000404_S5) terminated with an error exit status (1)

Command executed:

mirtop \
    stats \
    \
    --out stats \
    770902000404_S5_mirtop.gff

cat <<-END_VERSIONS > versions.yml
"NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS":
mirtop: $(echo $(mirtop --version 2>&1) | sed 's/^.*mirtop //')
END_VERSIONS

Command exit status:
1

Command output:
['stats', '--out', 'stats', '770902000404_S5_mirtop.gff']
Command error:
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
/opt/conda/lib/python3.12/site-packages/Bio/pairwise2.py:278: BiopythonDeprecationWarning: Bio.pairwise2 has been deprecated, and we intend to remove it in a
future release of Biopython. As an alternative, please consider using Bio.Align.PairwiseAligner as a replacement, and contact the Biopython developers if you
still need the Bio.pairwise2 module.
  warnings.warn(
10/15/2024 09:29:02 INFO Run stats.
10/15/2024 09:29:02 INFO Reading: 770902000404_S5_mirtop.gff
Traceback (most recent call last):
  File "/opt/conda/bin/mirtop", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/command_line.py", line 34, in main
['stats', '--out', 'stats', '770902000404_S5_mirtop.gff']
    stats(kwargs["args"])
  File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 38, in stats
    out.append(_calc_stats(fn))
               ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 82, in _calc_stats
    df = _summary(lines)
         ^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 130, in _summary
    df_sum = _add_missing(df_sum)
             ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/mirtop/gff/stats.py", line 110, in _add_missing
    df2 = pd.DataFrame({'category': category, 'sample': df['sample'].iat[0], 'counts': 0}, index=[0])
                                                        ~~~~~~~~~~~~~~~~^^^
  File "/opt/conda/lib/python3.12/site-packages/pandas/core/indexing.py", line 2527, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.12/site-packages/pandas/core/series.py", line 1234, in _get_value
    return self._values[label]
           ~~~~~~~~~~~~^^^^^^^
IndexError: index 0 is out of bounds for axis 0 with size 0
Work dir:
/mnt/cephfs8_rw/oncology/miRNA/220118_VH00195_67_AAAHV32M5_fastq4/work/39/2e514730d2d4ad416afc2023156668

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting

-- Check '.nextflow.log' file for details
```
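The last frames of the traceback show where it breaks: `_add_missing` reads the first value of the `sample` column with `df['sample'].iat[0]`, and `.iat[0]` raises `IndexError` when the column has no rows, which suggests the summary built from `770902000404_S5_mirtop.gff` was empty. The sketch below reproduces that failure mode on a toy DataFrame and shows one possible guard; the DataFrame and both helper functions are hypothetical illustrations, not mirtop's code.

```python
import pandas as pd

# Toy summary table: same column names as in the traceback, but zero rows,
# mimicking a GFF that produced no records.
df = pd.DataFrame({"category": [], "sample": [], "counts": []})


def first_sample_unguarded(table: pd.DataFrame):
    # Same access pattern as the failing line: positional lookup of row 0.
    return table["sample"].iat[0]


def first_sample_guarded(table: pd.DataFrame):
    # Hypothetical guard: return None instead of indexing an empty column.
    if table.empty:
        return None
    return table["sample"].iat[0]


try:
    first_sample_unguarded(df)
    unguarded_failed = False
except IndexError as err:
    # Raised because position 0 does not exist in a zero-length column.
    unguarded_failed = True
    print(f"unguarded lookup failed: {err}")

print(first_sample_guarded(df))  # None
```

Whether an empty input should be skipped, reported, or treated as an error is a design choice for the tool; the guard only shows where the check would need to go.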

Command used and terminal output

No response

Relevant files

No response

System information

No response


atrigila · 2024-10-15T12:52:12Z

Hi! Thank you for reporting this bug. I think I know where the issue is and I am working on a solution. Just to confirm, could you please run the same command but without the --mirtrace_species hsa flag? Thank you!

anastasiaprime · 2024-10-16T10:33:32Z

Hi @atrigila! I ran the command as you asked, and the pipeline completed successfully, but miRTrace and mirtop didn't run.

atrigila · 2024-10-16T15:43:12Z

The mirtop step requires both `--mirtrace_species` and the corresponding miRTrace GFF file. If `--mirtrace_species` is provided but the GFF file is not, the pipeline still attempts to run mirtop, which then fails: mirtop needs a valid GFF annotation for the specified species to process its input, and without one it cannot produce usable output.
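Put differently, running mirtop has two preconditions, and a missing GFF should lead to a skipped step with a warning rather than a downstream crash. A minimal Python sketch of that check, assuming a hypothetical `should_run_mirtop` helper and a hypothetical `mirna_gff` argument standing in for the species GFF; the actual fix in the pipeline is Nextflow code and may be structured differently:

```python
import warnings
from typing import Optional


def should_run_mirtop(mirtrace_species: Optional[str], mirna_gff: Optional[str]) -> bool:
    """Decide whether the mirtop step can run (hypothetical helper).

    mirtrace_species: species code such as "hsa", or None if not set.
    mirna_gff: path to the species GFF, or None if not supplied.
    """
    if not mirtrace_species:
        # No species selected: mirtop is not requested.
        return False
    if not mirna_gff:
        # Species given without its GFF: warn and skip instead of failing later.
        warnings.warn(
            f"mirtrace_species={mirtrace_species!r} is set but no GFF was supplied; "
            "skipping mirtop."
        )
        return False
    return True


print(should_run_mirtop("hsa", "hsa.gff3"))  # True ("hsa.gff3" is a hypothetical file name)
print(should_run_mirtop("hsa", None))        # False, and emits a UserWarning
```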

mirtop was also unavailable for runs using MirGeneDB in previous versions of the pipeline (e.g. v2.3.1), where it required `--mirtrace_species` to be set, for example:

smrnaseq/subworkflows/local/mirna_quant.nf (lines 98 to 105 in 5901bea):

```nextflow
if (params.mirtrace_species){
    MIRTOP_QUANT ( BOWTIE_MAP_SEQCLUSTER.out.bam.collect{it[1]}, FORMAT_HAIRPIN.out.formatted_fasta.collect{it[1]}, gtf )
    ch_mirtop_logs = MIRTOP_QUANT.out.logs
    ch_versions = ch_versions.mix(MIRTOP_QUANT.out.versions)
    TABLE_MERGE ( MIRTOP_QUANT.out.mirtop_table )
    ch_versions = ch_versions.mix(TABLE_MERGE.out.versions)
}
```

The same behavior can be reproduced with older versions of the pipeline (commit f8fd872034e214fe922118275cdfdf6e498a7f5c) by running `nextflow run smrnaseq -profile test_mirgenedb,docker --outdir test_mirgenedb_old --mirtrace_species hsa`.

I will update the documentation to state clearly that mirtop only supports miRTrace inputs, and add warnings to the code. I have also contacted the mirtop developers to ask whether there is a workaround for MirGeneDB inputs. I'll post any updates in this issue.

atrigila · 2024-11-12T17:50:04Z

@anastasiaprime the issue should be solved now, but let us know if you have any further questions. Note that when using MirGeneDB inputs, mirtop is hard-coded to use the pre sequences, which come from the hairpin FASTA, rather than the pri sequences, which come from the mature FASTA. Users must provide pre files from the start to keep the FASTA and GFF files consistent, because the coordinates in the GFF file refer to the pre sequences. This also ensures that the names in the BAM file match those in the GFF.
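Because the reads are aligned against the hairpin (pre) FASTA, the reference names in the BAM file are typically the first whitespace-delimited token of each FASTA header, so one way to catch a pre/pri mix-up early is to compare the FASTA IDs with the precursor names in the GFF. A minimal sketch, assuming the precursor name is stored in a `Name=` attribute in column 9 and that precursor rows can be selected by their type in column 3; the helper names, the toy records and the `pre_miRNA` type are hypothetical, so check them against your own files:

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """IDs of FASTA records: first whitespace-delimited token after '>'."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gff_names(lines: Iterable[str], feature_type: str) -> Set[str]:
    """Name= values (column 9) of GFF rows whose type (column 3) matches."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # skip malformed rows and other feature types
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition("=")
            if key == "Name" and value:
                names.add(value)
    return names


def names_missing_from_fasta(fasta_lines: Iterable[str], gff_lines: Iterable[str],
                             feature_type: str) -> Set[str]:
    """GFF precursor names with no FASTA record, hence no BAM reference."""
    return gff_names(gff_lines, feature_type) - fasta_ids(fasta_lines)


# Hypothetical toy inputs.
fasta = [">pre-A toy record", "ACGU", ">pre-B", "ACGU"]
gff = [
    "##gff-version 3",
    "chr1\t.\tpre_miRNA\t100\t180\t.\t+\t.\tID=p1;Name=pre-A",
    "chr1\t.\tmiRNA\t110\t131\t.\t+\t.\tID=m1;Name=mat-A",
    "chr2\t.\tpre_miRNA\t50\t130\t.\t-\t.\tID=p3;Name=pre-C",
]
print(names_missing_from_fasta(fasta, gff, "pre_miRNA"))  # {'pre-C'}
```

With real files, pass open file objects instead of lists; a non-empty result means some GFF precursors have no matching FASTA record.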

anastasiaprime added the `bug` (Something isn't working) label Oct 15, 2024

nschcolnicov added this to smrnaseq Nov 8, 2024

nschcolnicov assigned nschcolnicov and atrigila Nov 8, 2024

nschcolnicov mentioned this issue Nov 11, 2024: Issue 477 #481 (merged)

nschcolnicov closed this as completed by moving to Done in smrnaseq Nov 11, 2024

nschcolnicov mentioned this issue Nov 12, 2024: Add docs mirtop #482 (merged)
