
Error printing whole nanocomp report #76

Closed
NikoLichi opened this issue Mar 21, 2024 · 11 comments


@NikoLichi

Hi Wouter,

I used NanoComp to compare different runs on technical samples. It runs fine until it writes the full report; please see the errors below.
This may be related to issue #41.
I will try the create_feather script and see how it goes.

Thanks for any other help,
Niko


2024-03-20 19:02:08,621 Writing html report.
2024-03-20 19:02:08,794 Error tokenizing data. C error: Expected 2 fields in line 21, saw 53
Traceback (most recent call last):
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/NanoComp.py", line 85, in main
    make_report(plots, settings["path"], stats_df=stats_df)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/NanoComp.py", line 410, in make_report
    html_content.append(utils.stats2html(path + "NanoStats.txt"))
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/nanocomp/utils.py", line 31, in stats2html
    df = pd.read_csv(statsf, sep=":", header=None, names=["feature", "value"])
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/itg/niko/miniconda3/envs/nanopack/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 21, saw 53

@NikoLichi
Author

NikoLichi commented Mar 22, 2024

Hi Wouter,

I tried again with the feather files and got exactly the same errors.
All the other output files are produced, but not the final compiled report.

How can this be solved?

This is the command I am using with the feather files:

NanoComp -t 32 --verbose -f pdf --feather $FILEIN -p NovoVsGeneC -o NovoVsGenC/qualCont/nanopack --names $PREFIX

Thanks and all the best,
Niko

@wdecoster
Owner

Hi Niko,

That is quite remarkable. Could you share the NanoStats.txt file?

Thanks,
Wouter

@NikoLichi
Author

NikoLichi commented Mar 25, 2024

Hi Wouter,

If you mean the .log file from the NanoComp run, it is attached below. Otherwise, please let me know which file you are referring to.

All the best,
Niko
NovoVsGeneC_5don5TimPYFNanoComp_20240322_1426.log

@wdecoster
Owner

Was there no NanoStats.txt file? It should also be generated in /NovoVsGenC/qualCont/nanopack/.

@NikoLichi
Author

Sorry, I completely missed that file among all the others.
Here it is.

NovoVsGeneC_5don5TimPYFNanoStats.txt

@wdecoster
Owner

Aha, now I see! The read identifiers on line 21 look like "141:329|2a491583-c20d-48e9-8ccc-49afe630be59". Is this a duplex run?

@NikoLichi
Author

Oh... interesting
It is not a duplex run. This is cDNA-seq from an RNA isolation protocol.

This is the output after trimming and orienting the reads with Pychopper, which adds those identifiers (e.g., "141:329|") before the actual read identifier.

Still, NanoComp processes all the metrics with those headers and writes the separate files (HTML and PDF); only the combined global report (HTML) fails.
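For illustration, a minimal Python sketch of the identifier layout described above. It assumes every prefixed ID looks like the single example in this thread (`<int>:<int>|<original read id>`); the regex and the helper name `split_prefixed_id` are hypothetical and not part of Pychopper or NanoComp. The point is simply that each prefixed ID carries an extra colon.

```python
import re

# Assumed layout, inferred only from the example in this thread:
# "<int>:<int>|<original read id>"
PREFIXED_ID = re.compile(r"^(\d+:\d+)\|(.+)$")


def split_prefixed_id(read_id):
    """Return (prefix, original_id); prefix is None if the ID has no such prefix."""
    match = PREFIXED_ID.match(read_id)
    if match is None:
        return None, read_id
    return match.group(1), match.group(2)


read_id = "141:329|2a491583-c20d-48e9-8ccc-49afe630be59"
prefix, original = split_prefixed_id(read_id)
print(prefix)              # 141:329
print(original)            # 2a491583-c20d-48e9-8ccc-49afe630be59
print(read_id.count(":"))  # 1, the colon that later trips up the stats parser
```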

@wdecoster
Owner

Yes, most of the time it doesn't care about that ":" character, but the colon is used as the separator when generating the HTML report from the NanoStats file. Let me see if I can come up with an easy fix.
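A minimal reproduction, assuming a simplified, hypothetical stand-in for NanoStats.txt (the real layout may differ). The `pd.read_csv(..., sep=":", header=None, names=["feature", "value"])` call is the one from `utils.stats2html` in the traceback, so any line containing more than one colon yields more than two fields and pandas raises `ParserError`. The helper `parse_stats_first_colon` is a hypothetical fix that splits each line on its first colon only; it is not necessarily how NanoComp resolves this.

```python
import io

import pandas as pd

# Hypothetical, simplified stand-in for NanoStats.txt; the real layout may differ.
SAMPLE_STATS = (
    "Number of reads:1000\n"
    "Mean read length:850.2\n"
    "Longest read:141:329|2a491583-c20d-48e9-8ccc-49afe630be59\n"
)


def parse_like_stats2html(text):
    """Same read_csv call as utils.stats2html in the traceback."""
    return pd.read_csv(io.StringIO(text), sep=":", header=None, names=["feature", "value"])


def parse_stats_first_colon(text):
    """Hypothetical fix: split each line on its first ':' only."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        feature, _, value = line.partition(":")
        rows.append((feature.strip(), value.strip()))
    return pd.DataFrame(rows, columns=["feature", "value"])


try:
    parse_like_stats2html(SAMPLE_STATS)
except pd.errors.ParserError as err:
    # The third line has two colons, so it splits into 3 fields instead of 2
    print(err)

print(parse_stats_first_colon(SAMPLE_STATS))
```

Splitting on the first colon keeps feature names intact and leaves any colons in the value (such as prefixed read IDs) untouched, under the assumption that feature names themselves never contain a colon.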

@wdecoster
Owner

As a workaround, could you see if running with --tsv_stats fixes things?
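A rough sketch of why a tab-delimited stats table sidesteps the problem, assuming the `--tsv_stats` output is tab-separated and read with a tab separator. The row and column names below (`Metrics`, `run1`, `run2`, `longest_read`) are hypothetical; the exact layout NanoComp writes and how it renders it are not shown here. With `\t` as the delimiter, colons inside read identifiers are just ordinary characters.

```python
import io

import pandas as pd

# Hypothetical tab-separated stats; the real --tsv_stats layout may differ.
SAMPLE_TSV = (
    "Metrics\trun1\trun2\n"
    "number_of_reads\t1000\t1200\n"
    "longest_read\t141:329|2a491583-c20d-48e9-8ccc-49afe630be59\t7:88|0f1e2d3c\n"
)

# Only tabs delimit fields, so the colons in the read IDs are left alone
stats = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t", index_col=0)
print(stats.loc["longest_read", "run1"])
```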

@NikoLichi
Author

Yes! It worked! Thanks!

I have already run the command on two data sets, and it works fine!
I'll keep this option in mind when working with Pychopper-processed reads.

All the best,
Niko

@wdecoster
Owner

Thanks for the feedback!
