Error printing whole nanocomp report #76

NikoLichi · 2024-03-21T14:27:46Z

Hi Wouter,

I used NanoComp to compare different runs on technical samples. It runs fine until it writes the full report. Please see the errors below.
This may be related to issue #41.
I will try the create_feather script and see how it goes...

Thanks for any other help,
Niko


2024-03-20 19:02:08,621 Writing html report.
2024-03-20 19:02:08,794 Error tokenizing data. C error: Expected 2 fields in line 21, saw 53
Traceback (most recent call last):
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/NanoComp.py", line 85, in main
    make_report(plots, settings["path"], stats_df=stats_df)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/NanoComp.py", line 410, in make_report
    html_content.append(utils.stats2html(path + "NanoStats.txt"))
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/utils.py", line 31, in stats2html
    df = pd.read_csv(statsf, sep=":", header=None, names=["feature", "value"])
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 21, saw 53

Traceback (most recent call last):
  File "/home/itg/niko/miniconda3/envs/nanopack/bin/NanoComp", line 10, in <module>
    sys.exit(main())
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/NanoComp.py", line 85, in main
    make_report(plots, settings["path"], stats_df=stats_df)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/NanoComp.py", line 410, in make_report
    html_content.append(utils.stats2html(path + "NanoStats.txt"))
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/utils.py", line 31, in stats2html
    df = pd.read_csv(statsf, sep=":", header=None, names=["feature", "value"])
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 21, saw 53


NikoLichi · 2024-03-22T14:41:12Z

Hi Wouter,

I gave it a try again with the feather files and have exactly the same errors.
All the other files are produced but not the final compiled file.

How can this be solved?

This is the code I am using after feather:

NanoComp -t 32 --verbose -f pdf --feather $FILEIN -p NovoVsGeneC -o NovoVsGenC/qualCont/nanopack --names $PREFIX

Thanks and all the best,
Niko

wdecoster · 2024-03-22T19:20:32Z

Hi Niko,

That is quite remarkable. Could you share the NanoStats.txt file?

Thanks,
Wouter

NikoLichi · 2024-03-25T09:19:03Z

Hi Wouter,

If you mean the .log file from the NanoComp run, it is attached below. Otherwise, please let me know which file you are referring to.

All the best,
Niko
NovoVsGeneC_5don5TimPYFNanoComp_20240322_1426.log

wdecoster · 2024-03-26T13:24:57Z

Was there no NanoStats.txt file? That should also be generated in /NovoVsGenC/qualCont/nanopack/

NikoLichi · 2024-03-26T15:06:39Z

I completely missed the file across all the other files, sorry.
Here it is.

NovoVsGeneC_5don5TimPYFNanoStats.txt

wdecoster · 2024-03-28T08:21:08Z

Aha now I see! I see the read identifiers on line 21 are like "141:329|2a491583-c20d-48e9-8ccc-49afe630be59". Is this a duplex run?

NikoLichi · 2024-03-28T08:35:47Z

Oh... interesting
No duplex run. This is cDNAseq from an RNA isolation protocol.

This is the output after trimming and determining read orientation with Pychopper. It adds those identifiers (e.g., "141:329|") before the actual read identifier.

But... NanoComp is able to process all the metrics with those headers in the separate files (HTML and PDF); only the combined global output (HTML) fails.
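
To illustrate the identifier layout described above, here is a minimal sketch that separates the Pychopper-style prefix from the underlying read identifier. It assumes the prefix always has the form `<number>:<number>|`, which is inferred only from the single example quoted in this thread; the helper name `strip_pychopper_prefix` is hypothetical and not part of NanoComp or Pychopper.

```python
import re

# Assumption: the prefix looks like "141:329|" (digits, colon, digits, pipe),
# based only on the one example identifier quoted in this thread.
PYCHOPPER_PREFIX = re.compile(r"^\d+:\d+\|")


def strip_pychopper_prefix(read_id: str) -> str:
    """Return read_id without a leading '<number>:<number>|' prefix, if present."""
    return PYCHOPPER_PREFIX.sub("", read_id, count=1)


prefixed = "141:329|2a491583-c20d-48e9-8ccc-49afe630be59"
bare = strip_pychopper_prefix(prefixed)
print(bare)
```

The stripped identifier no longer contains a colon, which is the character that matters for the report step discussed in this issue.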

wdecoster · 2024-03-28T08:37:55Z

Yes, most of the time it doesn't care about that colon, but the colon is used as the field separator when generating the HTML report from the NanoStats file. Let me see if I can come up with an easy fix.
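
A rough sketch of why the extra colons matter: the traceback shows the stats file being read with `sep=":"` into two columns, so any line containing more than one colon yields more fields than expected. The two sample lines below are made up for illustration (they are not taken from the attached NanoStats.txt), and splitting on the first colon only is just one possible tolerant approach, not necessarily the fix adopted in NanoComp.

```python
def split_on_every_colon(line: str) -> list[str]:
    """Mimic a ':'-separated tokenizer: every colon starts a new field."""
    return line.rstrip("\n").split(":")


def split_on_first_colon(line: str) -> tuple[str, str]:
    """Tolerant alternative: only the first colon separates feature from value."""
    feature, _, value = line.rstrip("\n").partition(":")
    return feature.strip(), value.strip()


# Hypothetical stats lines, for illustration only.
plain_line = "Mean read length: 1234.5"
prefixed_line = "1: 9876 (12.3; 141:329|2a491583-c20d-48e9-8ccc-49afe630be59)"

# Number of fields a ':'-separated parser would see on each line.
print(len(split_on_every_colon(plain_line)))
print(len(split_on_every_colon(prefixed_line)))

# Splitting on the first colon keeps the read identifier intact in the value.
print(split_on_first_colon(prefixed_line))
```

With more reads or more datasets on one line, the number of colons grows accordingly, which is consistent with the "saw 53" count in the error message.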

wdecoster · 2024-03-28T10:22:40Z

As a workaround, could you see if running with --tsv_stats fixes things?

NikoLichi · 2024-03-28T15:39:00Z

Yes! It worked! Thanks!

I ran the command for two data sets already, and it works fine!
I'll keep this command trick in mind when using pychopper sequences.

All the best,
Niko

wdecoster · 2024-03-28T20:02:12Z

Thanks for the feedback!

NikoLichi closed this as completed Mar 28, 2024

wdecoster mentioned this issue Jul 2, 2024

Report not generated: error tokenizing data wdecoster/NanoPlot#373

Closed
