Got negative values on weighted histogram plot #73

Closed
najohink opened this issue Sep 22, 2023 · 8 comments
Comments

@najohink

Hello,

I am using NanoComp v1.23.1 and got a weird plot after filtering my input fastq files (see attached image).
Screenshot 2023-09-22 163016

When I ran the same command on unfiltered input FASTQ files, I got normal plots. But after filtering my FASTQ files to keep only 1-27 kb reads, I now get negative values in the weighted plots. Is this "normal"?

Can you also explain the difference between weighted and normalized?

best,
S

@najohink
Author

I forgot to add the photo of the unfiltered fastq output plot:

Screenshot 2023-09-22 164308

@wdecoster
Owner

I am very confused and will need to think about this.

@najohink
Author

najohink commented Sep 26, 2023

I filtered my dataset with FiltLong before running NanoComp and getting the weird result.

In the meantime, I figured out how to do what I wanted by running this:

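import pickle
import numpy
import matplotlib.pyplot as plt

# Load the data that NanoComp pickled for the filtered barcode
with open('barcode03_1-27kb_NanoComp-data.pickle', 'rb') as f:
    df3 = pickle.load(f)

# Number of reads per 500 bp bin
bins = numpy.arange(0, 30000, 500)
h3 = numpy.histogram(df3['lengths'], bins=bins)
plt.bar(h3[1][:-1], height=h3[0], width=450)

# Approximate bases per bin: bin midpoint times read count
xdata3 = (h3[1][:-1] + h3[1][1:]) / 2
ydata3 = xdata3 * h3[0]
plt.bar(xdata3, ydata3, width=450)

# Fraction of all bases in bins whose midpoint is above 25 kb
ydata3[xdata3 > 25000].sum() / ydata3.sum()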

I wanted to know what percentage of the total bases comes from my full-length sequence. So I wanted to divide the bases in the ~26 kb reads by the total number of bases, while also keeping the weird long reads out of the dataset, hence the filtering with FiltLong.
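The bin-midpoint calculation above only approximates the bases per bin. A minimal sketch of the same fraction computed directly from the read lengths is shown here; the `lengths` values and the 25 kb `threshold` are made up for illustration and would be replaced by the `lengths` column from the pickle:

```python
import numpy as np

# Hypothetical read lengths in bp (illustration only, not real data)
lengths = np.array([1200, 3000, 26000, 26100, 25900, 800])

# Assumed cutoff for "full length" reads
threshold = 25000

# Fraction of all bases that sit in reads longer than the threshold
fraction = lengths[lengths > threshold].sum() / lengths.sum()
print(f"{fraction:.3f}")
```

This avoids any dependence on the bin width, since each read contributes its exact length.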

@wdecoster
Owner

Do the non-weighted plots look normal? I will explain later what those terms mean when I'm at the computer...

@najohink
Author

Yes, the others look normal. Only the two weighted plots have negative values.

@wdecoster
Owner

So normalized means that the values for every dataset in the plot add up to 1, so that datasets with significant differences in yield can still be compared on read length. Without normalization, just the number of reads is used.
And weighted means that instead of the number of reads per bin, the number of bases per bin is used (as is also the case in the MinKNOW interface). As such, a read of 25000 bases in the 24000-26000 bin increases the count on the y-axis by 25000 rather than by just 1.
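To make the two options concrete, here is a minimal sketch with NumPy. This is not NanoComp's actual plotting code: the read lengths and bin edges are invented for illustration, and normalization is done here simply by dividing by the total, which is an assumption about how "adds up to 1" is achieved.

```python
import numpy as np

# Hypothetical read lengths in bp (illustration only)
lengths = np.array([1000, 1500, 25000, 25000, 5000])
# Hypothetical bin edges; the last bin is 24000-26000
bins = np.array([0, 2000, 10000, 24000, 26000])

# Plain histogram: number of reads per bin
counts, _ = np.histogram(lengths, bins=bins)

# Weighted histogram: number of bases per bin
# (each read contributes its own length instead of 1)
bases, _ = np.histogram(lengths, bins=bins, weights=lengths)

# Normalized versions: every dataset sums to 1
counts_norm = counts / counts.sum()
bases_norm = bases / bases.sum()

print(counts)  # reads per bin
print(bases)   # bases per bin
```

In this sketch the two 25000 bp reads add 2 to the last bin of the plain histogram but 50000 to the same bin of the weighted one. Because read lengths are non-negative, a weighted histogram built this way cannot contain negative bar heights.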

@wdecoster
Owner

Do you think it would be possible to share the data that caused this?

@wdecoster
Owner

So I haven't been able to replicate this. Please let me know if someone runs into a similar issue.
