Merged tool #29

Landbarsch · 2024-11-21T13:33:30Z

Merged the subsampling and multiprocessing aspects into one script.


.DS_Store

vxdetector/testlog

sjanssen2 · 2024-11-21T15:40:51Z

vxdetector/VXdetector.py

+def sample_fastq(file_path, sample_size, sampled_indices=None):
+    '''Get random parts from the FASTQ file based on shared indices for paired-end reads.'''
+    sampled_reads = []  # List to hold sampled reads
+    open_func = gzip.open if file_path.endswith('.gz') else open
+
+    # Count total reads in the file by iterating line by line
+    with open_func(file_path, 'rt') as f:
+        total_reads = sum(1 for _ in f) // 4
+
+    # Adjust sample_size if it's greater than total_reads
+    if sample_size > total_reads:
+        sample_size = total_reads
+
+    if sampled_indices is None:
+        sampled_indices = sorted(random.sample(range(total_reads), sample_size))
+
+    with open_func(file_path, 'rt') as f:
+        read_idx = 0
+        read_buffer = []
+        for line in f:
+            read_buffer.append(line)
+            if len(read_buffer) == 4:  # Once we have a full read
+                if read_idx in sampled_indices:
+                    sampled_reads.extend(read_buffer)  # Add the read if it matches the sampled index
+                read_buffer = []  # Clear the buffer for the next read
+                read_idx += 1  # Move to the next read
+
+                # Stop if we've collected enough reads
+                if len(sampled_reads) >= sample_size * 4:
+                    break
+
+    return sampled_reads, sampled_indices


You should first merge my PR #28 to get a much faster version of this function, plus documentation!
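For reference, one inexpensive speed-up of the hunk above is sketched below. It is not the implementation from PR #28, whose contents are not shown in this thread. The hunk tests `read_idx in sampled_indices` against a sorted list, which costs time proportional to the sample size for every read; converting the indices to a set makes each lookup constant time. The function name `sample_fastq_sketch` and the `seed` parameter are assumptions added for illustration only.

```python
import gzip
import random


def sample_fastq_sketch(file_path, sample_size, sampled_indices=None, seed=None):
    """Two-pass FASTQ subsampling with constant-time index lookups.

    Illustrative only: the name and the `seed` parameter are not part of
    the PR. Pass the returned indices to the mate file to keep pairs in sync.
    """
    open_func = gzip.open if file_path.endswith('.gz') else open

    # First pass: count reads (a FASTQ record is 4 lines).
    with open_func(file_path, 'rt') as f:
        total_reads = sum(1 for _ in f) // 4

    if sampled_indices is None:
        sample_size = min(sample_size, total_reads)
        rng = random.Random(seed)
        sampled_indices = sorted(rng.sample(range(total_reads), sample_size))

    wanted = set(sampled_indices)  # set lookup instead of scanning a list
    sampled_reads = []

    # Second pass: keep only the records whose index was drawn.
    with open_func(file_path, 'rt') as f:
        record = []
        read_idx = 0
        for line in f:
            record.append(line)
            if len(record) == 4:
                if read_idx in wanted:
                    sampled_reads.extend(record)
                    # Stop early once every requested read was collected.
                    if len(sampled_reads) == 4 * len(wanted):
                        break
                record = []
                read_idx += 1

    return sampled_reads, sampled_indices
```

For paired-end data, the indices returned for the forward file would be passed as `sampled_indices` when sampling the reverse file, so both files yield the same read positions.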

sjanssen2 · 2024-11-21T15:42:23Z

vxdetector/VXdetector.py

+    open_func = gzip.open if output_path.endswith('.gz') else open  # Use gzip if output is .gz
+
+    with open_func(output_path, 'wt') as f:  # Write in text mode ('wt')
+        f.writelines(sampled_reads)


 def do_statistic(result):


It's hard for me to see what these changes are good for. Is this "just" refactoring the code, or does it also change the functionality? You might want to make sure the GitHub Actions pass, as they should include some unit tests.
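One way to show that a change like this is a pure refactoring is a small round-trip unit test. The sketch below is only an illustration: `write_reads` is a hypothetical stand-in that wraps the four added lines from the hunk above, since the real function name and signature are not visible in this diff.

```python
import gzip
import os
import tempfile
import unittest


def write_reads(sampled_reads, output_path):
    """Hypothetical wrapper around the writer lines shown in the diff."""
    open_func = gzip.open if output_path.endswith('.gz') else open
    with open_func(output_path, 'wt') as f:
        f.writelines(sampled_reads)


class WriteReadsRoundTrip(unittest.TestCase):
    def test_plain_and_gzip_round_trip(self):
        reads = ['@r1\n', 'ACGT\n', '+\n', 'IIII\n']
        with tempfile.TemporaryDirectory() as tmp:
            for name in ('out.fastq', 'out.fastq.gz'):
                path = os.path.join(tmp, name)
                write_reads(reads, path)
                # Read back with the matching opener and compare line by line.
                opener = gzip.open if name.endswith('.gz') else open
                with opener(path, 'rt') as f:
                    self.assertEqual(f.readlines(), reads)
```

If a test of this kind passes both before and after the change, that is reasonable evidence the output files are unaffected by the refactoring.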


Anna-Rehm · 2024-12-12T15:19:46Z

Please remove the .DS_Store files and add them to .gitignore so we don't track them anymore. Additionally, do we need to track the testlog? According to the GitHub Actions output, a module ("imp") seems to be missing. In #30 Stefan mentioned that nosetest might be substituted with pynose. Maybe that would solve our problem?
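A note on the missing module: `imp` was removed from the Python standard library in Python 3.12, so any test runner that still imports it will fail on newer interpreters. Whether that is the exact cause here is an assumption, since the CI log is not shown in this thread. A minimal check to run in the CI environment could look like this (the helper name `has_module` is made up for illustration):

```python
import importlib.util
import sys


def has_module(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == '__main__':
    # Prints the interpreter version and whether `imp` is still importable.
    print(sys.version_info[:2], 'imp available:', has_module('imp'))
```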


Anna-Rehm · 2024-12-13T13:31:10Z

It seems like pynose can't be installed using conda. You will most likely have to change parts of the test.yml

algorithm_vxdetector/.github/workflows/github_tests.yml

Line 18 in 2b2a01a

- name: Install dependencies

and add the installation of the package via pip, removing the respective package from the environment.yml at the same time.


sjanssen2 · 2025-02-05T08:44:15Z

Hi @Landbarsch, I can see that you are working hard to fix the tests. It does not seem to be easy. Perhaps that is because you are trying to handle too many changes at once.

For example, I see changes in the functions workflow and do_statistic. But are they directly related to each other? Perhaps you could first open a new branch in which you only try to incorporate the changes to do_statistic. If that works, those changes could be merged into this branch, and we would have a clearer pull request.

Jeremy Meiss and others added 9 commits August 5, 2024 10:35

add subsampling

d38ae75

add subsampling

4684232

add subsampling

d3e1caf

"add subsampling"

c3266ff

add subsampling

68876c5

update

8d8438e

updated sample size, now working

7f08f11

added comments to the script, removed the debug code, made an extra file for the sampler version

5611569

Merged Multiprocessing Aspect with Subsampling for a complete Tool

b851734

Landbarsch requested review from sjanssen2 and Anna-Rehm November 21, 2024 13:33

sjanssen2 requested changes Nov 21, 2024


Landbarsch added 4 commits November 26, 2024 15:31

Added Docstring to the missing Methods. Added the sample_fastq method from the memory_efficient_ sampling branch. Sampled indices is no longer used in sample_fastq, yet needed for the workflow to function.

765987b

Formatting

7f0908c

more formatting

dd79888

formatted for flake8

3cf6e0f

Landbarsch added 6 commits December 12, 2024 16:29

Delete .DS_Store

3849a7d

Update .gitignore

41f3d6a

ignore DS_Store

Delete vxdetector/testlog

ceaa08e

Update environment.yml

844c824

Update environment.yml

8867fa8

Update environment.yml

2b2a01a

Landbarsch added 6 commits January 18, 2025 22:37

Update environment.yml

37f8601

Update environment.yml

85b5d96

Update VXdetector.py

9f15b02

Update test_VXdetector.py

015bbf5

Update test_VXdetector.py

528ab57

Merge branch 'master' of github.com:jlab/algorithm_vxdetector into Merged_Tool

95ac674

Landbarsch added 5 commits January 31, 2025 16:06

Merge branch 'Merged_Tool' of github.com:jlab/algorithm_vxdetector into Merged_Tool

b79162a

cleanup

d2833b7

TestChanges

7d60e26

added biom to env

10398e9

-format

4ab20f7

Landbarsch marked this pull request as ready for review January 31, 2025 15:41

Landbarsch added 9 commits February 3, 2025 13:48

Changed test to include new params

0edb289

Formatting for Lint

bf931b6

Changed the interact_bowtie2 Script

b910cef

formatting of interact_bowtie2

99018bc

fixes for testing

cc4fadc

reduced sample size for the test

b2743c3

changed a parameter

545c8c2

Changes to test_interact_bowtie

e2d5f2b

undo changes

7f6cfc8
