Classifier and CPU consumption #9

Open
fescudie opened this issue Mar 4, 2015 · 6 comments

@fescudie

fescudie commented Mar 4, 2015

Hi,

When I use the RDP Classifier with my own databank (a very large 16S databank), its CPU usage is unacceptable: up to 2360% (see below).
This phenomenon doesn't appear with the default databank, and it is much weaker with the databank built by the RDP 'Example command to train classifier'.
How can I reduce the CPU consumption / number of threads of the RDP Classifier?

Command with my databank:

java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t path/to/my_bank.properties -o result.rdp sub.fasta

Consumption:

top - 09:51:00 up 56 days, 22:36,  0 users,  load average: 15.10, 23.87, 20.76
Tasks: 840 total,  11 running, 829 sleeping,   0 stopped,   0 zombie
Cpu(s): 81.2%us,  0.2%sy,  0.0%ni, 18.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 84939736k used, 179498964k free,   174172k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64703676k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 65765 fescudie  20   0 18.4g 6.4g  10m S 2360.5  2.6   4:59.91 java                                                                                                                                         
 65850 fescudie  20   0 13684 1776  880 R  0.7  0.0   0:00.05 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.15 bash 

Consumption, per-thread view:

top - 10:33:10 up 56 days, 23:18,  0 users,  load average: 14.83, 10.51, 10.28
Tasks: 1305 total,  11 running, 1294 sleeping,   0 stopped,   0 zombie
Cpu(s): 41.4%us,  2.5%sy,  0.0%ni, 56.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 83889500k used, 180549200k free,   174876k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64773160k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 66871 fescudie  20   0 18.4g 5.3g   9m R 70.3  2.1   0:16.90 java                                                                                                                                           
 66876 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.20 java                                                                                                                                           
 66889 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.31 java                                                                                                                                           
 66891 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.22 java                                                                                                                                           
 66897 fescudie  20   0 18.4g 5.3g   9m S 29.7  2.1   0:02.27 java                                                                                                                                           
 66878 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.05 java                                                                                                                                           
 66879 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.01 java                                                                                                                                           
 66881 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.07 java                                                                                                                                           
 66882 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.13 java                                                                                                                                           
 66884 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:01.99 java                                                                                                                                           
 66886 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.19 java                                                                                                                                           
 66890 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.12 java                                                                                                                                           
 66892 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.16 java                                                                                                                                           
 66893 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.29 java                                                                                                                                           
 66894 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:01.68 java                                                                                                                                           
 66895 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.04 java                                                                                                                                           
 66896 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.27 java                                                                                                                                           
 66898 fescudie  20   0 18.4g 5.3g   9m S 29.4  2.1   0:02.11 java                                                                                                                                           
 66875 fescudie  20   0 18.4g 5.3g   9m S 29.1  2.1   0:02.22 java                                                                                                                                           
 66877 fescudie  20   0 18.4g 5.3g   9m S 29.1  2.1   0:02.26 java                                                                                                                                           
 66899 fescudie  20   0 18.4g 5.3g   9m S 29.1  2.1   0:02.26 java                                                                                                                                           
 66885 fescudie  20   0 18.4g 5.3g   9m S 28.7  2.1   0:02.13 java                                                                                                                                           
 66880 fescudie  20   0 18.4g 5.3g   9m S 28.4  2.1   0:02.19 java                                                                                                                                           
 66874 fescudie  20   0 18.4g 5.3g   9m S 28.1  2.1   0:02.01 java                                                                                                                                           
 66872 fescudie  20   0 18.4g 5.3g   9m S 26.8  2.1   0:01.99 java                                                                                                                                           
 66873 fescudie  20   0 18.4g 5.3g   9m S 26.1  2.1   0:02.00 java                                                                                                                                           
 66883 fescudie  20   0 18.4g 5.3g   9m S 24.1  2.1   0:02.03 java                                                                                                                                           
 66888 fescudie  20   0 18.4g 5.3g   9m S 22.1  2.1   0:01.62 java                                                                                                                                           
 66887 fescudie  20   0 18.4g 5.3g   9m S 21.8  2.1   0:01.92 java                                                                                                                                           
 66912 fescudie  20   0 14080 2168  884 R  1.0  0.0   0:00.11 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.44 bash                                                                                                                                           
 66870 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66900 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66901 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66902 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66903 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66904 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.10 java                                                                                                                                           
 66905 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.09 java                                                                                                                                           
 66906 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java                                                                                                                                           
 66907 fescudie  20   0 18.4g 5.3g   9m S  0.0  2.1   0:00.00 java  

Command with RDP default databank:

java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -o result.rdp sub.fasta

Consumption:

top - 09:53:41 up 56 days, 22:39,  0 users,  load average: 9.96, 17.82, 18.93
Tasks: 840 total,  10 running, 830 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.0%us,  0.0%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 78978564k used, 185460136k free,   174216k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64768832k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 65863 fescudie  20   0 18.4g 703m  10m S 100.1  0.3   1:18.87 java                                                                                                                                          
 65917 fescudie  20   0 13684 1784  880 R  0.3  0.0   0:00.36 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.16 bash      

Command with 'Example command to train classifier':

java -Xmx1g -jar path/to/classifier.jar train -o mytrained -s path/to/RDPTools/classifier/samplefiles/new_trainset.fasta -t path/to/RDPTools/classifier/samplefiles/new_trainset_db_taxid.txt
cp path/to/RDPTools/classifier/samplefiles/rRNAClassifier.properties mytrained
java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t mytrained/rRNAClassifier.properties -o result.rdp sub.fasta

Consumption:

top - 10:23:54 up 56 days, 23:09,  0 users,  load average: 9.19, 8.95, 10.32
Tasks: 840 total,  10 running, 830 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.5%us,  0.1%sy,  0.0%ni, 74.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 78953232k used, 185485468k free,   174720k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64773140k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 66617 fescudie  20   0 18.4g 590m  10m S 120.6  0.2   0:29.30 java                                                                                                                                          
 66655 fescudie  20   0 13684 1784  884 R  0.7  0.0   0:00.13 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.30 bash

Thanks in advance.
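To find out what the busy threads in the per-thread listing above actually are, one option is to match their IDs against a JVM thread dump. The sketch below assumes a HotSpot JDK with `jstack` on the PATH; `jstack` prints each thread's native ID as a hexadecimal `nid=0x...`, while `top -H` shows it in decimal. The helper name `tid_to_nid` and the `<jvm_pid>` placeholder are illustrative only.

```shell
#!/bin/sh
# Convert a decimal thread ID (PID column of `top -H`) into the
# hexadecimal "nid=0x..." form that jstack prints for each thread.
tid_to_nid() {
    printf 'nid=0x%x\n' "$1"
}

# Thread 66871 is the busiest entry in the per-thread listing above.
tid_to_nid 66871

# With the classifier still running, look the thread up in a dump
# (replace <jvm_pid> with the process ID of the java process):
#   jstack <jvm_pid> | grep "$(tid_to_nid 66871)"
```

If the matching lines turn out to be garbage-collector threads, that would point at the JVM rather than at the Classifier's own code; the thread itself does not establish which it is.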

@wangqion
Contributor

wangqion commented Apr 9, 2015

The CPU usage is likely proportional to the number of terminal taxa in the training set. There are about 2000 genera (terminal taxa) in the default 16S taxonomy, and with that training set you can request less than 1 GB of memory. Do you know how many terminal taxa are in your training set?

@fescudie
Author

fescudie commented Apr 9, 2015

My taxonomy contains 82561 terminal taxa.
The memory cannot be reduced because the taxonomy is very large: when I reduced the memory, the classifier returned an out-of-memory error.
The threads seem to be opened only while the classifier loads the taxonomy, not during classification. Why are these threads necessary? Is it not possible to load the taxonomy with only one thread?
For now, I work around the problem with taskset, which makes all the threads run on the same CPU. But this is not the best solution.
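The taskset workaround described above could look roughly like the following sketch. The classifier command is the one from the first post; the core list in `CORES` and the function name `pinned_cmd` are placeholders for illustration, not values taken from the thread. The function prints the command for review instead of running it.

```shell
#!/bin/sh
# Sketch: restrict the classifier JVM and all of its threads to a fixed
# set of cores with taskset (util-linux).
# "0" pins everything to core 0, as described in the comment above;
# `taskset -c` also accepts lists and ranges such as 0,2 or 0-3.
CORES="0"

pinned_cmd() {
    # Print the full command line; drop the leading `echo` to run it.
    echo taskset -c "$CORES" java -Xmx15g -jar path/to/classifier.jar classify \
        -c 0.8 -t path/to/my_bank.properties -o result.rdp sub.fasta
}

pinned_cmd
```

Pinning limits where the threads may run but not how many are created, so the per-thread listing would still show the same threads, only sharing the allowed cores.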

@wangqion
Contributor

We haven't used the Classifier on such a large number of terminal taxa. The
largest one we have is the Fungal ITS UNITE training set, containing 20,221
species. The Classifier uses a single thread, but Java garbage collection may
use more threads if memory is short. I am wondering if you could try using 30
or 40 GB of memory, just to see whether the number of threads can be reduced.


Qiong
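If the extra threads are indeed garbage-collection threads, as suggested above (the thread does not confirm this), one thing to try is capping them with standard HotSpot options instead of raising the heap. The sketch below reuses the classify command from the first post; the variable `GC_OPTS` and the function `classify_cmd` are illustrative names, and whether these flags remove the CPU spike for this databank is an assumption that would need testing.

```shell
#!/bin/sh
# Sketch: limit HotSpot garbage-collector threads for the classifier run.
# -XX:ParallelGCThreads=1 caps the parallel GC worker threads at one;
# -XX:+UseSerialGC would select the single-threaded collector instead.
GC_OPTS="-XX:ParallelGCThreads=1"

classify_cmd() {
    # Print the full command line; drop the leading `echo` to run it.
    # $GC_OPTS is left unquoted on purpose so several flags can be passed.
    echo java -Xmx15g $GC_OPTS -jar path/to/classifier.jar classify \
        -c 0.8 -t path/to/my_bank.properties -o result.rdp sub.fasta
}

classify_cmd
```

Comparing the `top` output with and without the flag would show whether garbage collection accounts for the extra threads.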

@fescudie
Author

With 100 GB I have the same problem.
I will continue to use taskset.

@wangqion
Contributor

This is interesting. I am wondering if you would like to share your
training files with me for further debugging.

Qiong


@fescudie
Author

You can get the training files at this URL: http://genoweb.toulouse.inra.fr/~fescudie/.
But this phenomenon is already present, to a lesser degree, with the example training dataset in RDPTools (120% CPU):

java -Xmx1g -jar path/to/classifier.jar train -o mytrained -s path/to/RDPTools/classifier/samplefiles/new_trainset.fasta -t path/to/RDPTools/classifier/samplefiles/new_trainset_db_taxid.txt
cp path/to/RDPTools/classifier/samplefiles/rRNAClassifier.properties mytrained
java -Xmx15g -jar path/to/classifier.jar classify -c 0.8 -t mytrained/rRNAClassifier.properties -o result.rdp sub.fasta

Consumption:

top - 10:23:54 up 56 days, 23:09,  0 users,  load average: 9.19, 8.95, 10.32
Tasks: 840 total,  10 running, 830 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.5%us,  0.1%sy,  0.0%ni, 74.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  264438700k total, 78953232k used, 185485468k free,   174720k buffers
Swap: 16777208k total,    36100k used, 16741108k free, 64773140k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                        
 66617 fescudie  20   0 18.4g 590m  10m S 120.6  0.2   0:29.30 java                                                                                                                                          
 66655 fescudie  20   0 13684 1784  884 R  0.7  0.0   0:00.13 top                                                                                                                                            
 65432 fescudie  20   0  104m 1948 1408 S  0.0  0.0   0:00.30 bash
