TypeError: sequence item 0: expected str instance, numpy.int64 found #3

Open
Bio-finder opened this issue Apr 19, 2022 · 5 comments
Bio-finder commented Apr 19, 2022

Good morning,
I ran into a problem when using your tool with the example data. After installing it under Python 3.7, I get the following error when I run it:

 pHierCC -p YERwgMLST.cgMLSTv1.profile.gz -o YERwgMLST.cgMLSTv1.HierCC

2022-04-19 10:39:51,454 | Loaded in allelic profiles with dimension: 4371 and 1554. The first column is assumed to be type id.
2022-04-19 10:39:51,455 | Start HierCC assignments
2022-04-19 10:39:51,566 | Calculate distance matrix
2022-04-19 10:40:09,869 | Start Single linkage clustering
2022-04-19 10:40:10,883 | Attach genomes onto the tree.

Traceback (most recent call last):
  File "/home/bebergk/venvs/camel_3.0_p3.7/bin/pHierCC", line 8, in <module>
    sys.exit(phierCC())
  File "/home/bebergk/venvs/camel_3.0_p3.7/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/bebergk/venvs/camel_3.0_p3.7/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/bebergk/venvs/camel_3.0_p3.7/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bebergk/venvs/camel_3.0_p3.7/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/bebergk/venvs/camel_3.0_p3.7/lib/python3.7/site-packages/pHierCC/pHierCC.py", line 135, in phierCC
    fout.write('\t'.join([n] + [str(rr) for rr in r[1:]]) + '\n')
TypeError: sequence item 0: expected str instance, numpy.int64 found

Could you help me work out what is going wrong?

Also, just in case: my final objective is to reproduce the HC numbers from Enterobase by running your tool on the file "profiles.list.gz". Do you think that is possible, and am I going about it the right way?

Best regards,

@Bio-finder

Changing [n] to [str(n)] on line 135 fixed the issue.
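
For anyone hitting the same traceback, here is a minimal sketch of why the error occurs and why wrapping n in str() fixes it. The values of n and r below are made up for illustration; only the join expression is taken from the traceback above.

```python
import numpy as np

# Hypothetical stand-ins for the values pHierCC has at that point:
# n is a type id read from the profile matrix, r is one row of cluster ids.
n = np.int64(7)
r = np.array([7, 1, 1, 2])

# str.join accepts only str items, so a numpy.int64 in position 0 fails.
error_message = None
try:
    '\t'.join([n] + [str(rr) for rr in r[1:]])
except TypeError as exc:
    error_message = str(exc)

# Converting n explicitly gives a list made up of strings only.
fixed_line = '\t'.join([str(n)] + [str(rr) for rr in r[1:]]) + '\n'

print(error_message)
print(repr(fixed_line))
```

The rest of the list was already converted with str(rr), so only the first item needed the change.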

In addition, I spotted a more serious bug that caused the sequence type names to get mixed up during clustering, so I am posting the solution here as well: on line 36, change names = mat.T[0] to names = mat.T[0].copy(). Otherwise names stays a view of mat, and your index gets scrambled when mat is reordered later in the code.
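
A small sketch of the aliasing problem, assuming mat is a 2-D NumPy integer array with the type id in the first column and that the reordering writes into mat in place. The array contents and the way the reordering is done here are illustrative assumptions, not code taken from pHierCC. NumPy arrays provide .copy() for taking an independent copy.

```python
import numpy as np

# Hypothetical profile matrix: first column is the type id.
mat = np.array([[3, 10, 11],
                [1, 20, 21],
                [2, 30, 31]])

names_view = mat.T[0]          # a view: shares memory with mat
names_copy = mat.T[0].copy()   # an independent copy of the ids

# Assumed in-place reordering of the rows (here: sorted by type id).
order = np.argsort(mat[:, 0])
mat[:] = mat[order]

print(np.shares_memory(names_view, mat))  # the view still points into mat
print(names_view)   # follows the reordered matrix
print(names_copy)   # keeps the original order of the ids
```

If the rows were instead reordered by rebinding the name (mat = mat[order]), the view would keep pointing at the old array, so whether the bug shows up depends on how the reordering is written.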

a-damC commented Mar 28, 2023

Thanks @Bio-finder
I manually changed this in the version I downloaded via conda.
This seems to have worked for me.

eam12 commented Aug 14, 2023

@a-damC What version of pHierCC did you download via Conda? The most up-to-date version currently available via Conda (v.1.24) seems to have only 126 lines. I think the version you and @Bio-finder were editing was 1.26 or 1.27?

a-damC commented Aug 14, 2023

@eam12 I can't remember, but I have since started using ReporTree for clustering. It's currently well maintained and is an all-encompassing set of code that includes single linkage clustering. https://github.com/insapathogenomics/ReporTree

eam12 commented Aug 15, 2023

@a-damC Many thanks for the recommendation! ReporTree looks like a really nice alternative.
