Limit to subpeak regions for motif scanning (and output them). #8

Closed
j-andrews7 opened this issue May 1, 2023 · 0 comments

Pretty simple with pyfaidx, something like:

import pyfaidx

def extract_sequences_from_bed(fasta_path, bed_path, output_path):
    # Open the FASTA file (pyfaidx builds a .fai index if one is missing)
    fasta = pyfaidx.Fasta(fasta_path)

    # Open the BED input and the FASTA output together
    with open(bed_path, 'r') as bed_file, open(output_path, 'w') as output_file:
        for line in bed_file:
            # Skip blank lines and BED header/comment lines
            if not line.strip() or line.startswith(('#', 'track', 'browser')):
                continue
            fields = line.rstrip('\n').split('\t')
            chrom = fields[0]
            start = int(fields[1])
            end = int(fields[2])
            # BED coordinates are 0-based and half-open, which matches Python slicing
            sequence = fasta[chrom][start:end]
            # Write a FASTA record named after the region
            output_file.write(f'>{chrom}:{start}-{end}\n')
            output_file.write(str(sequence) + '\n')

    # Release the underlying file handle
    fasta.close()

Actually, we probably want two output files: one in BED format, the other in FASTA format.
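
A rough sketch of the two-file version, not a final implementation. The function name `write_subpeak_outputs`, its parameters, and the `chrom:start-end` record naming are all hypothetical. It assumes the subpeak regions are already available as `(chrom, start, end)` tuples in 0-based, half-open coordinates, and that `genome` is anything that supports `genome[chrom][start:end]`, such as a `pyfaidx.Fasta` object.

```python
def write_subpeak_outputs(genome, regions, bed_path, fasta_path):
    """Write regions to a BED file and their sequences to a FASTA file.

    genome:  any object indexable as genome[chrom][start:end], e.g. a
             pyfaidx.Fasta or a plain dict of strings (assumption).
    regions: iterable of (chrom, start, end), 0-based and half-open.
    Returns the number of regions written.
    """
    n_written = 0
    with open(bed_path, "w") as bed_out, open(fasta_path, "w") as fasta_out:
        for chrom, start, end in regions:
            # str() works for both pyfaidx Sequence objects and plain strings
            sequence = str(genome[chrom][start:end])
            # One BED line and one FASTA record per region, in the same order
            bed_out.write(f"{chrom}\t{start}\t{end}\n")
            fasta_out.write(f">{chrom}:{start}-{end}\n{sequence}\n")
            n_written += 1
    return n_written
```

With pyfaidx this could be called as `write_subpeak_outputs(pyfaidx.Fasta("genome.fa"), regions, "subpeaks.bed", "subpeaks.fa")`, where the file names are placeholders. Writing both files in one pass keeps the BED lines and FASTA records in matching order.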
