Error in DGEList(counts = data, genes = rownames(data)) : non-numeric values found in counts #282

mdozmorov · 2023-09-09T00:34:15Z

Description of the bug

This issue is identical to #218, which was closed but not resolved. After debugging, I found that the data object supplied to DGEList(counts=data,genes=rownames(data)) has 0 rows and 0 columns:

smrnaseq/bin/edgeR_miRBase.r

Line 55 in 18d6c84

dataDGE<-DGEList(counts=data,genes=rownames(data))

The preceding code filters out rows and columns that have only zeros, but no code checks the data dimensions afterward.

This occurred when I analyzed two samples to check whether the pipeline runs. I'm new to Nextflow and don't know why my samples produce this result, or how the output of edgeR_miRBase.r, or the lack of it, will affect the downstream steps.
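One way to check whether the inputs themselves are empty is to count, for each idxstats file passed to edgeR_miRBase.r, how many references have at least one mapped read. The sketch below is not part of the pipeline. It assumes the standard samtools idxstats layout (tab-separated: reference name, length, mapped reads, unmapped reads) and that the script builds its count matrix from the mapped-read column. The function names count_nonzero_refs and check_idxstats are hypothetical.

```shell
#!/usr/bin/env bash

# Count references with at least one mapped read in one idxstats file.
# Assumed layout (samtools idxstats): name <TAB> length <TAB> mapped <TAB> unmapped.
# The trailing "*" line (reads with no reference) is ignored.
count_nonzero_refs() {
  awk -F'\t' '$1 != "*" && $3 > 0 { n++ } END { print n + 0 }' "$1"
}

# Print "file <TAB> count" for every file given.
# If every file reports 0, filtering all-zero rows and columns
# would leave nothing to pass to DGEList.
check_idxstats() {
  local f
  for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(count_nonzero_refs "$f")"
  done
}
```

Run from the failed task's work directory, for example: check_idxstats sample01_mature.sorted.idxstats sample09_mature.sorted.idxstats. A count of 0 for all files would suggest that no reads mapped to the supplied mature/hairpin references, which would point at the inputs rather than at edgeR itself.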

Command used and terminal output

DIRIN=/Users/bluedot/data/WorkData/2023-08.miRNA-seq
INPUT=${DIRIN}/samplesheet_full.csv
DIROUT=${DIRIN}/OUT_test
GENOME=/Users/bluedot/data/ExtData/UCSC/hg38/hg38.fa
BWAINDEX=/Users/bluedot/data/ExtData/UCSC/hg38/hg38.fa
CHROMSIZE=/Users/bluedot/data/ExtData/UCSC/genometable.hg38.txt
MIRBASE_GFF=/Users/bluedot/data/ExtData/UCSC/hg38/hsa.gff3
MIRBASE_MATURE=/Users/bluedot/data/ExtData/UCSC/hg38/mature.fa
MIRBASE_HAIRPIN=/Users/bluedot/data/ExtData/UCSC/hg38/hairpin.fa

nextflow run nf-core/smrnaseq --input ${INPUT} \
  --outdir ${DIROUT} \
  -profile 'singularity' \
  --genome GRCh38 \
  --mirtrace_species 'hsa' \
  --protocol 'illumina' \
  --mirna_gtf ${MIRBASE_GFF} \
  --mature ${MIRBASE_MATURE} \
  --hairpin ${MIRBASE_HAIRPIN} \
  -resume
====

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:EDGER_QC'

Caused by:
  Process `NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:EDGER_QC` terminated with an error exit status (1)

Command executed:

  edgeR_miRBase.r sample01_mature.sorted.idxstats sample09_mature.sorted.idxstats sample01_mature_hairpin.sorted.idxstats sample09_mature_hairpin.sorted.idxstats

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:SMRNASEQ:MIRNA_QUANT:EDGER_QC":
      r-base: $(echo $(R --version 2>&1) | sed 's/^.*R version //; s/ .*$//')
      limma: $(Rscript -e "library(limma); cat(as.character(packageVersion('limma')))")
      edgeR: $(Rscript -e "library(edgeR); cat(as.character(packageVersion('edgeR')))")
      data.table: $(Rscript -e "library(data.table); cat(as.character(packageVersion('data.table')))")
      gplots: $(Rscript -e "library(gplots); cat(as.character(packageVersion('gplots')))")
      methods: $(Rscript -e "library(methods); cat(as.character(packageVersion('methods')))")
      statmod: $(Rscript -e "library(statmod); cat(as.character(packageVersion('statmod')))")
  $hairpin
  [1] "sample01_mature_hairpin.sorted.idxstats"
  [2] "sample09_mature_hairpin.sorted.idxstats"

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred

  Attaching package: ‘gplots’

  The following object is masked from ‘package:stats’:

  $mature
  [1] "sample01_mature.sorted.idxstats"
  [2] "sample09_mature.sorted.idxstats"

  $hairpin
  [1] "sample01_mature_hairpin.sorted.idxstats"
  [2] "sample09_mature_hairpin.sorted.idxstats"

  Error in DGEList(counts = data, genes = rownames(data)) :
    non-numeric values found in counts
  Execution halted

Relevant files

No response

System information

Nextflow version: 23.04.1.5866
Hardware: HPC
Executor: local
Container engine: singularity
OS: CentOS Linux 7
Version of nf-core/smrnaseq: 2.2.2

mdozmorov · 2023-10-12T01:21:35Z

The following helped me to complete the pipeline.

nextflow run nf-core/smrnaseq --input ${INPUT} \
  --outdir ${DIROUT} \
  -profile 'singularity' \
  --genome GRCh38 \
  --mirtrace_species 'hsa' \
  --protocol 'illumina' \
  --skip_mirdeep \
  -resume \
  -r fix-mirtop-gff

christopher-mohr · 2024-02-26T10:33:41Z

Hi @mdozmorov, can you confirm that this issue no longer occurs in 2.3.0?

atrigila · 2024-08-23T15:31:54Z

I tried to reproduce this error in the latest dev version, using two samples as in the original post, but I couldn't. The pipeline completed successfully.

Code:

DIRIN=/workspace/
INPUT=${DIRIN}/smrnaseq/assets/samplesheet.csv
DIROUT=${DIRIN}/test_issue_282
# MIRBASE_GFF: This file is downloaded automatically if using iGenomes
# MIRBASE_MATURE: This file is downloaded automatically from miRBase if not provided
# MIRBASE_HAIRPIN: This file is downloaded automatically from miRBase if not provided

nextflow run nf-core/smrnaseq \
  --input ${INPUT} \
  --outdir ${DIROUT} \
  -profile illumina,singularity \
  --genome GRCh38 \
  --mirtrace_species 'hsa' \
  -r dev

Samplesheet:

sample,fastq_1
Clone1_N1,s3://ngi-igenomes/test-data/smrnaseq/C1-N1-R1_S4_L001_R1_001.fastq.gz
Clone9_N1,s3://ngi-igenomes/test-data/smrnaseq/C9-N1-R1_S7_L001_R1_001.fastq.gz

mdozmorov added the bug Something isn't working label Sep 9, 2023

apeltzer added this to smrnaseq Aug 8, 2024

apeltzer added this to the 2.4.0 milestone Aug 20, 2024

atrigila self-assigned this Aug 23, 2024

apeltzer closed this as completed Aug 26, 2024

github-project-automation bot moved this from On Hold to Done in smrnaseq Aug 26, 2024
