Skip to content

Latest commit

 

History

History
1778 lines (1612 loc) · 54.2 KB

README.md

File metadata and controls

1778 lines (1612 loc) · 54.2 KB

Gene and Primer Sequence Analysis for SARS-CoV-2, EGFR(Non Small Lung Cancer Cell), Influenza DNAs

Visit Company website by clicking on the Logo

Gene and Primer Sequence Analysis for SARS-CoV-2, EGFR(Non Small Lung Cancer Cell), Influenza DNAs

This Notebook is Created by Owais Ahmad for SARS-CoV-2 Gene Sequence Analysis.

This is for Medical Research Purpose only.

First time setup - Generating credentials

  1. Sign in or create a new IDT account here.
  2. Go to your user name drop down menu at the top right of the page, and select My account.
  3. Click the API access link.
  4. Click the Request new API key button.
  5. Append those 4 secret in Kaggle Add-ons
  6. Comment down for any help.

How can I check my Oligo primers to ensure there are no significant primer design issues?

  • The difference between melting temperatures (Tm) of the primers should be less than 5°C.
  • The GC content should be between 35-80% or equivalent to the product being amplified.
  • The Delta G value of any self-dimers, hairpins, and heterodimers should be weaker (more positive) than -9.0 kcal/mole. Positive numbers indicate that the actual secondary structure shown will not form at all.
  • Avoid 3' complementarity between the two primers to prevent primer dimers. The IDT OligoAnalyzer APIs can be used to assess these different criteria for a proposed oligo.

Basic Library Installations

!pip install -q openpyxl
from __future__ import print_function
from base64 import b64encode
import json
from urllib import request, parse
import pandas as pd
import requests
import pandas as pd
from tqdm.notebook import tqdm
from functools import reduce
from kaggle_secrets import UserSecretsClient

Secret Token Fetch from IDT-DNA. Note this will expire in 10 Mins and if you have a very long set of sequence to perform analysis then call below function again to get new Secret token

def get_bearer_token(client_id, client_secret, idt_username, idt_password):


    authorization_string = b64encode(bytes(client_id + ":" + client_secret, "utf-8")).decode()
    request_headers = { "Content-Type" : "application/x-www-form-urlencoded",
                        "Authorization" : "Basic " + authorization_string }
                    
    data_dict = {   "grant_type" : "password", "scope" : "test","username" : idt_username,"password" : idt_password }
    request_data = parse.urlencode(data_dict).encode()

    post_request = request.Request("https://www.idtdna.com/Identityserver/connect/token", data = request_data, headers = request_headers,method = "POST")

    response = request.urlopen(post_request)

    body = response.read().decode()
    
    if (response.status != 200):
        raise RuntimeError("Request failed with error code:" + str(response.status) + "\nBody:\n" + body)
    
    body_dict = json.loads(body)
    return body_dict["access_token"]
    
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("IDT_password_here")
secret_value_1 = user_secrets.get_secret("IDT_username_here")
secret_value_2 = user_secrets.get_secret("client_id_here")
secret_value_3 = user_secrets.get_secret("client_secret_here")
    
client_id = str(secret_value_2)
client_secret = str(secret_value_3)
idt_username = str(secret_value_1)
idt_password = str(secret_value_0)


token = get_bearer_token(client_id, client_secret, idt_username, idt_password)
print("Secret Token Fetched (This will expire in 10 minutes): ",token)
Secret Token Fetched (This will expire in 10 minutes):  c7adab9d36e0f8a43664dc398cc78322

Here SARS-CoV-2 Gene Sequences are fetched from csv file which is stored on GitHub. This we get from blasting on NCBI

url="https://github.com/Owaiskhan9654/Gene-Sequence-Primer-/blob/main/NEB%20Primer%20Sequence.xlsx?raw=true"
# Sequence fetch for Analysis

response = requests.get(url)

dest = 'GENE Primer Sequence.xlsx'

with open(dest, 'wb') as file:
    file.write(response.content)

GENE_df = pd.read_excel("GENE Primer Sequence.xlsx", sheet_name=2,header=1).dropna()
GENE_df.reset_index(drop=True,inplace=True)
GENE_df.columns
Index(['Primer name', 'Sequence', 'Synthesis scale'], dtype='object')
GENE_df.to_csv('SARS-CoV-2_Primer_Sequences.csv',index=False)
GENE_df
Primer name Sequence Synthesis scale
0 Gene-E1-F3 TGAGTACGAACTTATGTACTCAT 10 nm
1 Gene-E1-B3 TTCAGATTTTTAACACGAGAGT 10 nm
2 Gene-E1-FIP ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG 10 nm
3 Gene-E1-BIP TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT 10 nm
4 Gene-E1-LF CGCTATTAACTATTAACG 10 nm
5 Gene-E1-LB GCGCTTCGATTGTGTGCGT 10 nm
6 N2-F3 ACCAGGAACTAATCAGACAAG 10 nm
7 N2-B3 GACTTGATCTTTGAAATTTGGATCT 10 nm
8 N2-FIP TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC 10 nm
9 N2-BIP CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA 10 nm
10 N2-LF GGGGGCAAATTGTGCAATTTG 10 nm
11 N2-LB CTTCGGGAACGTGGTTGACC 10 nm
12 ACTB-F3 AGTACCCCATCGAGCACG 10 nm
13 ACTB-B3 AGCCTGGATAGCAACGTACA 10 nm
14 ACTB-FIP GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA 10 nm
15 ACTB-BIP CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC 10 nm
16 ACTB-LF TGTGGTGCCAGATTTTCTCCA 10 nm
17 ACTB-LB CGAGAAGATGACCCAGATCATGT 10 nm

image.png

Sequence_dict = {}

for i in GENE_df.index:
    Sequence_dict[GENE_df['Sequence'][i]] = GENE_df['Primer name'][i]

Sequence_dict

Primer_dict = {}

for i in GENE_df.index:
    Primer_dict[GENE_df['Primer name'][i]] = GENE_df['Sequence'][i]

Primer_dict
{'Gene-E1-F3': 'TGAGTACGAACTTATGTACTCAT',
 'Gene-E1-B3': 'TTCAGATTTTTAACACGAGAGT',
 'Gene-E1-FIP': 'ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG',
 'Gene-E1-BIP': 'TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT',
 'Gene-E1-LF': 'CGCTATTAACTATTAACG',
 'Gene-E1-LB': 'GCGCTTCGATTGTGTGCGT',
 'N2-F3': 'ACCAGGAACTAATCAGACAAG',
 'N2-B3': 'GACTTGATCTTTGAAATTTGGATCT',
 'N2-FIP': 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC',
 'N2-BIP': 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA',
 'N2-LF': 'GGGGGCAAATTGTGCAATTTG',
 'N2-LB': 'CTTCGGGAACGTGGTTGACC',
 'ACTB-F3': 'AGTACCCCATCGAGCACG',
 'ACTB-B3': 'AGCCTGGATAGCAACGTACA',
 'ACTB-FIP': 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA',
 'ACTB-BIP': 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC',
 'ACTB-LF': 'TGTGGTGCCAGATTTTCTCCA',
 'ACTB-LB': 'CGAGAAGATGACCCAGATCATGT'}
%%time

headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'Authorization': 'Bearer '+token,
}

Primer_Name = []
Sequence = []
Complement = []
length = []
GCContent = []
MeltTemp = []
NmoleOD = []
OligoConc_list = []
MgConc_list = []
NaConc_list = []
dNTPsConc_list = []
NucleotideType_list=[]
count = 0
for i in tqdm(GENE_df.index):
    Primer_Name1=GENE_df.iloc[i]['Primer name']
#     print(Primer_Name1)
    oligo_Conc_dict={'F3':0.2,'B3':0.2,'Fip':1.6,'Bip':1.6,'Lf':0.4,'Bf':0.4} 
    dntp_Conc_dict={'F3':1.4,'B3':1.4,'Fip':1.4,'Bip':1.4,'Lf':1.4,'Bf':1.4} 
    mg_Conc_dict={'F3':8,'B3':8,'Fip':8,'Bip':8,'Lf':8,'Bf':8} 
    Na_Conc_dict={'F3':50,'B3':50,'Fip':50,'Bip':50,'Lf':50,'Bf':50}
    if 'F3' in Primer_Name1:
        oligo_Conc,dntp_Conc,mg_Conc,Na_Conc=oligo_Conc_dict['F3'],dntp_Conc_dict['F3'],mg_Conc_dict['F3'],Na_Conc_dict['F3']
    elif 'B3' in Primer_Name1:
        oligo_Conc,dntp_Conc,mg_Conc,Na_Conc=oligo_Conc_dict['B3'],dntp_Conc_dict['B3'],mg_Conc_dict['B3'],Na_Conc_dict['B3']
    elif 'FIP' in Primer_Name1:
        oligo_Conc,dntp_Conc,mg_Conc,Na_Conc=oligo_Conc_dict['Fip'],dntp_Conc_dict['Fip'],mg_Conc_dict['Fip'],Na_Conc_dict['Fip']
    elif 'BIP' in Primer_Name1:
        oligo_Conc,dntp_Conc,mg_Conc,Na_Conc=oligo_Conc_dict['Bip'],dntp_Conc_dict['Bip'],mg_Conc_dict['Bip'],Na_Conc_dict['Bip']
    elif 'LF' in Primer_Name1:
        oligo_Conc,dntp_Conc,mg_Conc,Na_Conc=oligo_Conc_dict['Lf'],dntp_Conc_dict['Lf'],mg_Conc_dict['Lf'],Na_Conc_dict['Lf']
    elif 'BF' in Primer_Name1:
        oligo_Conc,dntp_Conc,mg_Conc,Na_Conc=oligo_Conc_dict['Bf'],dntp_Conc_dict['Bf'],mg_Conc_dict['Bf'],Na_Conc_dict['Bf']
    
    #print(oligo_Conc)
    NucleotideType="DNA"
    data = '{  "Sequence": "' + GENE_df.iloc[i]['Sequence'] + '",  "NaConc": '+  str(Na_Conc)+\
    ',  "MgConc": '+  str(mg_Conc)+',   "dNTPsConc": '+  str(dntp_Conc)+',  "OligoConc": '+  str(oligo_Conc)+\
    ',   "NucleotideType": "DNA" }'
    response = requests.post(
        'https://www.idtdna.com/Restapi/v1/OligoAnalyzer/Analyze',
        headers=headers,
        data=data)
    json_data = json.loads(response.text)
    Primer_Name.append(Sequence_dict[json_data['Sequence'].replace(" ", '')])
    Sequence.append(json_data['Sequence'])
    Complement.append(json_data['Complement'])
    length.append(json_data['Length'])
    GCContent.append(json_data['GCContent'])
    MeltTemp.append(json_data['MeltTemp'])
    NmoleOD.append(json_data['NmoleOD'])
    OligoConc_list.append(json_data['OligoConc'])
    MgConc_list.append(mg_Conc)
    NaConc_list.append(Na_Conc)
    dNTPsConc_list.append(dntp_Conc)
    NucleotideType_list.append(NucleotideType)


df1=pd.DataFrame({"Primer Name":Primer_Name, "Sequence":Sequence,"Complement":Complement,"OligoConc":OligoConc_list,"Na+ Conc":NaConc_list,\
                  "Mg++ Conc":MgConc_list,"dNTPs Conc":dNTPsConc_list,"Nucleotide Type":NucleotideType_list,"length":length,"GCContent":GCContent,"MeltTemp":MeltTemp,"NmoleOD":NmoleOD,})
CPU times: user 419 ms, sys: 25.9 ms, total: 445 ms
Wall time: 18.2 s

In this below DataFrame you can check for Length of Formation, Its GC content, Melting Temperature, and NmoleOD

df1
Primer Name Sequence Complement OligoConc Na+ Conc Mg++ Conc dNTPs Conc Nucleotide Type length GCContent MeltTemp NmoleOD
0 Gene-E1-F3 TGA GTA CGA ACT TAT GTA CTC AT ATG AGT ACA TAA GTT CGT ACT CA 0.2 50 8 1.4 DNA 23 34.8 61.0 4.40
1 Gene-E1-B3 TTC AGA TTT TTA ACA CGA GAG T ACT CTC GTG TTA AAA ATC TGA A 0.2 50 8 1.4 DNA 22 31.8 60.4 4.57
2 Gene-E1-FIP ACC ACG AAA GCA AGA AAA AGA AGT TCG TTT CGG AA... CTG TCT CTT CCG AAA CGA ACT TCT TTT TCT TGC TT... 1.6 50 8 1.4 DNA 42 42.9 75.8 2.26
3 Gene-E1-BIP TTG CTA GTT ACA CTA GCC ATC CTT AGG TTT TAC AA... ACG TGA GTC TTG TAA AAC CTA AGG ATG GCT AGT GT... 1.6 50 8 1.4 DNA 44 40.9 75.4 2.41
4 Gene-E1-LF CGC TAT TAA CTA TTA ACG CGT TAA TAG TTA ATA GCG 0.4 50 8 1.4 DNA 18 33.3 53.4 5.69
5 Gene-E1-LB GCG CTT CGA TTG TGT GCG T ACG CAC ACA ATC GAA GCG C 0.4 50 8 1.4 DNA 19 57.9 68.6 5.86
6 N2-F3 ACC AGG AAC TAA TCA GAC AAG CTT GTC TGA TTA GTT CCT GGT 0.2 50 8 1.4 DNA 21 42.9 61.2 4.52
7 N2-B3 GAC TTG ATC TTT GAA ATT TGG ATC T AGA TCC AAA TTT CAA AGA TCA AGT C 0.2 50 8 1.4 DNA 25 32.0 62.2 4.20
8 N2-FIP TTC CGA AGA ACG CTG AAG CGG AAC TGA TTA CAA AC... GGC CAA TGT TTG TAA TCA GTT CCG CTT CAG CGT TC... 1.6 50 8 1.4 DNA 42 47.6 77.6 2.43
9 N2-BIP CGC ATT GGC ATG GAA GTC ACA ATT TGA TGG CAC CT... TAC ACA GGT GCC ATC AAA TTG TGA CTT CCA TGC CA... 1.6 50 8 1.4 DNA 40 47.5 77.5 2.59
10 N2-LF GGG GGC AAA TTG TGC AAT TTG CAA ATT GCA CAA TTT GCC CCC 0.4 50 8 1.4 DNA 21 47.6 66.1 4.85
11 N2-LB CTT CGG GAA CGT GGT TGA CC GGT CAA CCA CGT TCC CGA AG 0.4 50 8 1.4 DNA 20 60.0 67.3 5.37
12 ACTB-F3 AGT ACC CCA TCG AGC ACG CGT GCT CGA TGG GGT ACT 0.2 50 8 1.4 DNA 18 61.1 64.8 5.73
13 ACTB-B3 AGC CTG GAT AGC AAC GTA CA TGT ACG TTG CTA TCC AGG CT 0.2 50 8 1.4 DNA 20 50.0 64.9 4.93
14 ACTB-FIP GAG CCA CAC GCA GCT CAT TGT ATC ACC AAC TGG GA... TGT CGT CCC AGT TGG TGA TAC AAT GAG CTG CGT GT... 1.6 50 8 1.4 DNA 40 55.0 79.0 2.59
15 ACTB-BIP CTG AAC CCC AAG GCC AAC CGG CTG GGG TGT TGA AG... GAC CTT CAA CAC CCC AGC CGG TTG GCC TTG GGG TT... 1.6 50 8 1.4 DNA 38 63.2 80.9 2.79
16 ACTB-LF TGT GGT GCC AGA TTT TCT CCA TGG AGA AAA TCT GGC ACC ACA 0.4 50 8 1.4 DNA 21 47.6 66.9 5.19
17 ACTB-LB CGA GAA GAT GAC CCA GAT CAT GT ACA TGA TCT GGG TCA TCT TCT CG 0.4 50 8 1.4 DNA 23 47.8 66.1 4.28
!mkdir "Output Data GENE Analysis"
df1.to_csv('Output Data GENE Analysis/GENE_Analysis.csv', index=False)
%%time

Primer_Name = []
Sequence = []
Thermo = []
DeltaS = []
DeltaG = []
DeltaH = []

count = 0

for i in tqdm(list(GENE_df.Sequence)):
    data = '{  "Sequence": "' + i + '",  "NaConc": 50,  "FoldingTemp": 37,\
    "MgConc": 8, "NucleotideType": "DNA" }'
    response = requests.post(
        'https://www.idtdna.com/Restapi/v1/OligoAnalyzer/Hairpin',
        headers=headers,
        data=data)
    json_data = json.loads(response.text)
    Primer_Name.append(Sequence_dict[json_data[0]['sequence']])
    Sequence.append(json_data[0]['sequence'])
    Thermo.append(json_data[0]['thermo'])
    DeltaS.append(json_data[0]['deltaS'])
    DeltaG.append(json_data[0]['deltaG'])
    DeltaH.append(json_data[0]['deltaH'])

df2 = pd.DataFrame({
    "Primer Name": Primer_Name,
    "Sequence": Sequence,
    "Thermo": Thermo,
    "DeltaG": DeltaG,
    "DeltaS": DeltaS,
    "DeltaH": DeltaH,
})
CPU times: user 412 ms, sys: 36.9 ms, total: 449 ms
Wall time: 17.6 s

If the highest hairpin Tm is at or above your annealing temperature, that hairpin is likely to impede hybridization

df2
Primer Name Sequence Thermo DeltaG DeltaS DeltaH
0 Gene-E1-F3 TGAGTACGAACTTATGTACTCAT 45.9 -3.59 -172.08 -54.9
1 Gene-E1-B3 TTCAGATTTTTAACACGAGAGT 30.9 -0.23 -38.81 -11.8
2 Gene-E1-FIP ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG 42.6 -1.77 -100.40 -31.7
3 Gene-E1-BIP TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT 49.1 -3.46 -143.67 -46.3
4 Gene-E1-LF CGCTATTAACTATTAACG 15.7 0.67 -72.02 -20.8
5 Gene-E1-LB GCGCTTCGATTGTGTGCGT 39.4 -1.62 -112.64 -35.2
6 N2-F3 ACCAGGAACTAATCAGACAAG 40.8 -0.39 -24.52 -7.7
7 N2-B3 GACTTGATCTTTGAAATTTGGATCT 29.8 -0.63 -130.72 -39.6
8 N2-FIP TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC 41.5 -3.16 -191.67 -60.3
9 N2-BIP CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA 34.1 -1.08 -118.45 -36.4
10 N2-LF GGGGGCAAATTGTGCAATTTG 43.1 -2.88 -159.70 -50.5
11 N2-LB CTTCGGGAACGTGGTTGACC 44.2 -1.46 -75.94 -24.1
12 ACTB-F3 AGTACCCCATCGAGCACG 26.0 -0.07 -68.86 -20.6
13 ACTB-B3 AGCCTGGATAGCAACGTACA 23.6 0.10 -73.46 -21.8
14 ACTB-FIP GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA 41.4 -2.79 -170.07 -53.5
15 ACTB-BIP CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC 63.8 -6.65 -171.22 -57.7
16 ACTB-LF TGTGGTGCCAGATTTTCTCCA 17.2 0.67 -85.77 -24.9
17 ACTB-LB CGAGAAGATGACCCAGATCATGT 34.4 -0.82 -86.48 -26.6
df2.to_csv('Output Data GENE Analysis/GENE_HairPins.csv', index=False)
%%time

Primer_Name = []
Sequence_Bonds = []
Sequence_Sequences = []
Sequence_DeltaG = []
Sequence_BasePairs = []
Sequence_Dimer = []
Sequence_SequencePair = []
count = 1
for i in tqdm(list(GENE_df.Sequence)):
    #     print(str(i))
    params = {
        'primary': str(i),
    }
    temp = 1
    response = requests.post(
        'https://www.idtdna.com/Restapi/v1/OligoAnalyzer/SelfDimer',
        params=params,
        headers=headers)
    json_data_Sequence = json.loads(response.text)
    Primer_Name.append(Sequence_dict[i])
    Sequence_Sequences.append(i)
    Sequence_DeltaG.append(json_data_Sequence[0]['DeltaG'])
    Sequence_BasePairs.append(json_data_Sequence[0]['BasePairs'])
    Sequence_Dimer.append(json_data_Sequence[0]['Dimer'])
    Sequence_Bonds.append(json_data_Sequence[0]['Bonds'])
    Sequence_SequencePair.append(temp)
    temp = temp + 1
    Sequence_Sequences.append(i)
    Primer_Name.append(Sequence_dict[i])
    Sequence_DeltaG.append(json_data_Sequence[1]['DeltaG'])
    Sequence_BasePairs.append(json_data_Sequence[1]['BasePairs'])
    Sequence_Dimer.append(json_data_Sequence[1]['Dimer'])
    Sequence_Bonds.append(json_data_Sequence[1]['Bonds'])
    Sequence_SequencePair.append(temp)
    temp = 1



df3=pd.DataFrame({"Primer Name":Primer_Name,'Sequence Pair Number':Sequence_SequencePair,'Sequence':Sequence_Sequences,'DeltaG':Sequence_DeltaG,\
                  'BasePairs':Sequence_BasePairs,'Dimer':Sequence_Dimer,'Bonds':Sequence_Bonds,})
df3
CPU times: user 403 ms, sys: 28.4 ms, total: 431 ms
Wall time: 19.1 s
Primer Name Sequence Pair Number Sequence DeltaG BasePairs Dimer Bonds
0 Gene-E1-F3 1 TGAGTACGAACTTATGTACTCAT -8.77 7 None [2, 2, 2, 2, 2, 2, 2, 0, 0, 1, 0, 0, 1, 0, 0, ...
1 Gene-E1-F3 2 TGAGTACGAACTTATGTACTCAT -3.65 4 None [1, 0, 0, 2, 2, 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, ...
2 Gene-E1-B3 1 TTCAGATTTTTAACACGAGAGT -4.85 4 None [0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 2, 2, 2, 0, 1, ...
3 Gene-E1-B3 2 TTCAGATTTTTAACACGAGAGT -3.61 2 None [0, 0, 0, 1, 0, 2, 2, 0, 1, 0, 0, 0, 0, 0, 0, ...
4 Gene-E1-FIP 1 ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG -9.08 5 None [0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 1, 1, 0, 0, ...
5 Gene-E1-FIP 2 ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG -8.47 5 None [0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, ...
6 Gene-E1-BIP 1 TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT -8.64 6 None [1, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, ...
7 Gene-E1-BIP 2 TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT -6.30 4 None [2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
8 Gene-E1-LF 1 CGCTATTAACTATTAACG -4.85 4 None [0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, ...
9 Gene-E1-LF 2 CGCTATTAACTATTAACG -4.85 4 None [0, 0, 2, 2, 2, 2, 0, 0, 0, 1, 1, 1, 1, 0, 0, ...
10 Gene-E1-LB 1 GCGCTTCGATTGTGTGCGT -9.89 4 None [2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
11 Gene-E1-LB 2 GCGCTTCGATTGTGTGCGT -6.76 4 None [0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, ...
12 N2-F3 1 ACCAGGAACTAATCAGACAAG -3.07 2 None [0, 2, 2, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
13 N2-F3 2 ACCAGGAACTAATCAGACAAG -1.60 2 None [1, 0, 0, 2, 2, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, ...
14 N2-B3 1 GACTTGATCTTTGAAATTTGGATCT -9.25 6 None [0, 0, 0, 1, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 1, ...
15 N2-B3 2 GACTTGATCTTTGAAATTTGGATCT -4.62 4 None [0, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, ...
16 N2-FIP 1 TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC -10.20 5 None [2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, ...
17 N2-FIP 2 TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC -9.28 4 None [2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
18 N2-BIP 1 CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA -5.38 4 None [0, 0, 0, 1, 0, 1, 0, 0, 2, 2, 2, 2, 0, 0, 1, ...
19 N2-BIP 2 CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA -5.37 4 None [1, 0, 0, 2, 2, 2, 2, 0, 0, 1, 0, 0, 0, 0, 0, ...
20 N2-LF 1 GGGGGCAAATTGTGCAATTTG -11.22 7 None [2, 2, 2, 2, 2, 2, 2, 0, 0, 1, 1, 1, 1, 1, 1, ...
21 N2-LF 2 GGGGGCAAATTGTGCAATTTG -7.05 4 None [0, 1, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 1, 0, 0, ...
22 N2-LB 1 CTTCGGGAACGTGGTTGACC -6.30 4 None [0, 0, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, ...
23 N2-LB 2 CTTCGGGAACGTGGTTGACC -4.41 3 None [2, 2, 2, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, ...
24 ACTB-F3 1 AGTACCCCATCGAGCACG -6.76 4 None [1, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, ...
25 ACTB-F3 2 AGTACCCCATCGAGCACG -3.65 4 None [0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
26 ACTB-B3 1 AGCCTGGATAGCAACGTACA -6.30 4 None [0, 0, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, ...
27 ACTB-B3 2 AGCCTGGATAGCAACGTACA -3.65 4 None [0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
28 ACTB-FIP 1 GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA -6.34 4 None [1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2, ...
29 ACTB-FIP 2 GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA -6.31 4 None [2, 2, 2, 2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, ...
30 ACTB-BIP 1 CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC -12.50 6 None [1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 0, 0, 0, 0, 0, ...
31 ACTB-BIP 2 CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC -9.75 4 None [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, ...
32 ACTB-LF 1 TGTGGTGCCAGATTTTCTCCA -5.02 3 None [1, 0, 2, 2, 2, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, ...
33 ACTB-LF 2 TGTGGTGCCAGATTTTCTCCA -5.02 3 None [2, 2, 2, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, ...
34 ACTB-LB 1 CGAGAAGATGACCCAGATCATGT -5.38 4 None [0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
35 ACTB-LB 2 CGAGAAGATGACCCAGATCATGT -5.00 4 None [1, 0, 2, 2, 2, 2, 0, 1, 0, 0, 1, 0, 1, 1, 1, ...
df3.to_csv('Output Data GENE Analysis/GENE_SelfDimers.csv', index=False)
GENE_df
Primer name Sequence Synthesis scale
0 Gene-E1-F3 TGAGTACGAACTTATGTACTCAT 10 nm
1 Gene-E1-B3 TTCAGATTTTTAACACGAGAGT 10 nm
2 Gene-E1-FIP ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG 10 nm
3 Gene-E1-BIP TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT 10 nm
4 Gene-E1-LF CGCTATTAACTATTAACG 10 nm
5 Gene-E1-LB GCGCTTCGATTGTGTGCGT 10 nm
6 N2-F3 ACCAGGAACTAATCAGACAAG 10 nm
7 N2-B3 GACTTGATCTTTGAAATTTGGATCT 10 nm
8 N2-FIP TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC 10 nm
9 N2-BIP CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA 10 nm
10 N2-LF GGGGGCAAATTGTGCAATTTG 10 nm
11 N2-LB CTTCGGGAACGTGGTTGACC 10 nm
12 ACTB-F3 AGTACCCCATCGAGCACG 10 nm
13 ACTB-B3 AGCCTGGATAGCAACGTACA 10 nm
14 ACTB-FIP GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA 10 nm
15 ACTB-BIP CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC 10 nm
16 ACTB-LF TGTGGTGCCAGATTTTCTCCA 10 nm
17 ACTB-LB CGAGAAGATGACCCAGATCATGT 10 nm
def SequencePairs(arr, n):
    a=[]
    for i in range(n):
        for j in range(n):
            a.append((arr[i],arr[j]))    
    return a
 
list_GENE_SEQUENCE=GENE_df.Sequence
n = len(list_GENE_SEQUENCE)
 
SequencePairs_list = SequencePairs(list_GENE_SEQUENCE, n)


for i in SequencePairs_list:
    if i[0]==i[1]:
        SequencePairs_list.remove(i)
        
        
for i in SequencePairs_list:
    if (i[0],i[1]) in SequencePairs_list and (i[1],i[0]) in SequencePairs_list:
        SequencePairs_list.remove((i[1],i[0]))
print('All the possible Primer Dimer possible Sets are \n')        
print(SequencePairs_list)
All the possible Primer Dimer possible Sets are 

[('TGAGTACGAACTTATGTACTCAT', 'TTCAGATTTTTAACACGAGAGT'), ('TGAGTACGAACTTATGTACTCAT', 'ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG'), ('TGAGTACGAACTTATGTACTCAT', 'TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT'), ('TGAGTACGAACTTATGTACTCAT', 'CGCTATTAACTATTAACG'), ('TGAGTACGAACTTATGTACTCAT', 'GCGCTTCGATTGTGTGCGT'), ('TGAGTACGAACTTATGTACTCAT', 'ACCAGGAACTAATCAGACAAG'), ('TGAGTACGAACTTATGTACTCAT', 'GACTTGATCTTTGAAATTTGGATCT'), ('TGAGTACGAACTTATGTACTCAT', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('TGAGTACGAACTTATGTACTCAT', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('TGAGTACGAACTTATGTACTCAT', 'GGGGGCAAATTGTGCAATTTG'), ('TGAGTACGAACTTATGTACTCAT', 'CTTCGGGAACGTGGTTGACC'), ('TGAGTACGAACTTATGTACTCAT', 'AGTACCCCATCGAGCACG'), ('TGAGTACGAACTTATGTACTCAT', 'AGCCTGGATAGCAACGTACA'), ('TGAGTACGAACTTATGTACTCAT', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('TGAGTACGAACTTATGTACTCAT', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('TGAGTACGAACTTATGTACTCAT', 'TGTGGTGCCAGATTTTCTCCA'), ('TGAGTACGAACTTATGTACTCAT', 'CGAGAAGATGACCCAGATCATGT'), ('TTCAGATTTTTAACACGAGAGT', 'ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG'), ('TTCAGATTTTTAACACGAGAGT', 'TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT'), ('TTCAGATTTTTAACACGAGAGT', 'CGCTATTAACTATTAACG'), ('TTCAGATTTTTAACACGAGAGT', 'GCGCTTCGATTGTGTGCGT'), ('TTCAGATTTTTAACACGAGAGT', 'ACCAGGAACTAATCAGACAAG'), ('TTCAGATTTTTAACACGAGAGT', 'GACTTGATCTTTGAAATTTGGATCT'), ('TTCAGATTTTTAACACGAGAGT', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('TTCAGATTTTTAACACGAGAGT', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('TTCAGATTTTTAACACGAGAGT', 'GGGGGCAAATTGTGCAATTTG'), ('TTCAGATTTTTAACACGAGAGT', 'CTTCGGGAACGTGGTTGACC'), ('TTCAGATTTTTAACACGAGAGT', 'AGTACCCCATCGAGCACG'), ('TTCAGATTTTTAACACGAGAGT', 'AGCCTGGATAGCAACGTACA'), ('TTCAGATTTTTAACACGAGAGT', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('TTCAGATTTTTAACACGAGAGT', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('TTCAGATTTTTAACACGAGAGT', 'TGTGGTGCCAGATTTTCTCCA'), ('TTCAGATTTTTAACACGAGAGT', 'CGAGAAGATGACCCAGATCATGT'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'CGCTATTAACTATTAACG'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'GCGCTTCGATTGTGTGCGT'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'ACCAGGAACTAATCAGACAAG'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'GACTTGATCTTTGAAATTTGGATCT'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'GGGGGCAAATTGTGCAATTTG'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'CTTCGGGAACGTGGTTGACC'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'AGTACCCCATCGAGCACG'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'AGCCTGGATAGCAACGTACA'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'TGTGGTGCCAGATTTTCTCCA'), ('ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG', 'CGAGAAGATGACCCAGATCATGT'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'CGCTATTAACTATTAACG'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'GCGCTTCGATTGTGTGCGT'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'ACCAGGAACTAATCAGACAAG'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'GACTTGATCTTTGAAATTTGGATCT'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'GGGGGCAAATTGTGCAATTTG'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'CTTCGGGAACGTGGTTGACC'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'AGTACCCCATCGAGCACG'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'AGCCTGGATAGCAACGTACA'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'TGTGGTGCCAGATTTTCTCCA'), ('TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT', 'CGAGAAGATGACCCAGATCATGT'), ('CGCTATTAACTATTAACG', 'GCGCTTCGATTGTGTGCGT'), ('CGCTATTAACTATTAACG', 'ACCAGGAACTAATCAGACAAG'), ('CGCTATTAACTATTAACG', 'GACTTGATCTTTGAAATTTGGATCT'), ('CGCTATTAACTATTAACG', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('CGCTATTAACTATTAACG', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('CGCTATTAACTATTAACG', 'GGGGGCAAATTGTGCAATTTG'), ('CGCTATTAACTATTAACG', 'CTTCGGGAACGTGGTTGACC'), ('CGCTATTAACTATTAACG', 'AGTACCCCATCGAGCACG'), ('CGCTATTAACTATTAACG', 'AGCCTGGATAGCAACGTACA'), ('CGCTATTAACTATTAACG', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('CGCTATTAACTATTAACG', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('CGCTATTAACTATTAACG', 'TGTGGTGCCAGATTTTCTCCA'), ('CGCTATTAACTATTAACG', 'CGAGAAGATGACCCAGATCATGT'), ('GCGCTTCGATTGTGTGCGT', 'ACCAGGAACTAATCAGACAAG'), ('GCGCTTCGATTGTGTGCGT', 'GACTTGATCTTTGAAATTTGGATCT'), ('GCGCTTCGATTGTGTGCGT', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('GCGCTTCGATTGTGTGCGT', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('GCGCTTCGATTGTGTGCGT', 'GGGGGCAAATTGTGCAATTTG'), ('GCGCTTCGATTGTGTGCGT', 'CTTCGGGAACGTGGTTGACC'), ('GCGCTTCGATTGTGTGCGT', 'AGTACCCCATCGAGCACG'), ('GCGCTTCGATTGTGTGCGT', 'AGCCTGGATAGCAACGTACA'), ('GCGCTTCGATTGTGTGCGT', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('GCGCTTCGATTGTGTGCGT', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('GCGCTTCGATTGTGTGCGT', 'TGTGGTGCCAGATTTTCTCCA'), ('GCGCTTCGATTGTGTGCGT', 'CGAGAAGATGACCCAGATCATGT'), ('ACCAGGAACTAATCAGACAAG', 'GACTTGATCTTTGAAATTTGGATCT'), ('ACCAGGAACTAATCAGACAAG', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('ACCAGGAACTAATCAGACAAG', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('ACCAGGAACTAATCAGACAAG', 'GGGGGCAAATTGTGCAATTTG'), ('ACCAGGAACTAATCAGACAAG', 'CTTCGGGAACGTGGTTGACC'), ('ACCAGGAACTAATCAGACAAG', 'AGTACCCCATCGAGCACG'), ('ACCAGGAACTAATCAGACAAG', 'AGCCTGGATAGCAACGTACA'), ('ACCAGGAACTAATCAGACAAG', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('ACCAGGAACTAATCAGACAAG', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('ACCAGGAACTAATCAGACAAG', 'TGTGGTGCCAGATTTTCTCCA'), ('ACCAGGAACTAATCAGACAAG', 'CGAGAAGATGACCCAGATCATGT'), ('GACTTGATCTTTGAAATTTGGATCT', 'TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC'), ('GACTTGATCTTTGAAATTTGGATCT', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('GACTTGATCTTTGAAATTTGGATCT', 'GGGGGCAAATTGTGCAATTTG'), ('GACTTGATCTTTGAAATTTGGATCT', 'CTTCGGGAACGTGGTTGACC'), ('GACTTGATCTTTGAAATTTGGATCT', 'AGTACCCCATCGAGCACG'), ('GACTTGATCTTTGAAATTTGGATCT', 'AGCCTGGATAGCAACGTACA'), ('GACTTGATCTTTGAAATTTGGATCT', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('GACTTGATCTTTGAAATTTGGATCT', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('GACTTGATCTTTGAAATTTGGATCT', 'TGTGGTGCCAGATTTTCTCCA'), ('GACTTGATCTTTGAAATTTGGATCT', 'CGAGAAGATGACCCAGATCATGT'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'GGGGGCAAATTGTGCAATTTG'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'CTTCGGGAACGTGGTTGACC'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'AGTACCCCATCGAGCACG'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'AGCCTGGATAGCAACGTACA'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'TGTGGTGCCAGATTTTCTCCA'), ('TTCCGAAGAACGCTGAAGCGGAACTGATTACAAACATTGGCC', 'CGAGAAGATGACCCAGATCATGT'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'GGGGGCAAATTGTGCAATTTG'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'CTTCGGGAACGTGGTTGACC'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'AGTACCCCATCGAGCACG'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'AGCCTGGATAGCAACGTACA'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'TGTGGTGCCAGATTTTCTCCA'), ('CGCATTGGCATGGAAGTCACAATTTGATGGCACCTGTGTA', 'CGAGAAGATGACCCAGATCATGT'), ('GGGGGCAAATTGTGCAATTTG', 'CTTCGGGAACGTGGTTGACC'), ('GGGGGCAAATTGTGCAATTTG', 'AGTACCCCATCGAGCACG'), ('GGGGGCAAATTGTGCAATTTG', 'AGCCTGGATAGCAACGTACA'), ('GGGGGCAAATTGTGCAATTTG', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('GGGGGCAAATTGTGCAATTTG', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('GGGGGCAAATTGTGCAATTTG', 'TGTGGTGCCAGATTTTCTCCA'), ('GGGGGCAAATTGTGCAATTTG', 'CGAGAAGATGACCCAGATCATGT'), ('CTTCGGGAACGTGGTTGACC', 'AGTACCCCATCGAGCACG'), ('CTTCGGGAACGTGGTTGACC', 'AGCCTGGATAGCAACGTACA'), ('CTTCGGGAACGTGGTTGACC', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('CTTCGGGAACGTGGTTGACC', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('CTTCGGGAACGTGGTTGACC', 'TGTGGTGCCAGATTTTCTCCA'), ('CTTCGGGAACGTGGTTGACC', 'CGAGAAGATGACCCAGATCATGT'), ('AGTACCCCATCGAGCACG', 'AGCCTGGATAGCAACGTACA'), ('AGTACCCCATCGAGCACG', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('AGTACCCCATCGAGCACG', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('AGTACCCCATCGAGCACG', 'TGTGGTGCCAGATTTTCTCCA'), ('AGTACCCCATCGAGCACG', 'CGAGAAGATGACCCAGATCATGT'), ('AGCCTGGATAGCAACGTACA', 'GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA'), ('AGCCTGGATAGCAACGTACA', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('AGCCTGGATAGCAACGTACA', 'TGTGGTGCCAGATTTTCTCCA'), ('AGCCTGGATAGCAACGTACA', 'CGAGAAGATGACCCAGATCATGT'), ('GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA', 'CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC'), ('GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA', 'TGTGGTGCCAGATTTTCTCCA'), ('GAGCCACACGCAGCTCATTGTATCACCAACTGGGACGACA', 'CGAGAAGATGACCCAGATCATGT'), ('CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC', 'TGTGGTGCCAGATTTTCTCCA'), ('CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC', 'CGAGAAGATGACCCAGATCATGT'), ('TGTGGTGCCAGATTTTCTCCA', 'CGAGAAGATGACCCAGATCATGT')]
%%time

Sequence_Name1 = []
Sequence_Name2 = []
Sequence_Bonds = []
Sequence_Sequences1 = []
Sequence_Sequences2 = []
Sequence_DeltaG = []
Sequence_BasePairs = []
Sequence_Dimer = []
Sequence_SequencePair = []
count = 1
for i in tqdm(SequencePairs_list):
    temp=0
    params = {
        'primary': i[0],
        'secondary': i[1],}
    response = requests.post('https://www.idtdna.com/Restapi/v1/OligoAnalyzer/HeteroDimer', params=params, headers=headers)

    json_data_Sequence = json.loads(response.text)
    
    Sequence_Name1.append(Sequence_dict[i[0]])
    Sequence_Name2.append(Sequence_dict[i[1]])
    Sequence_Sequences1.append(i[0])
    Sequence_Sequences2.append(i[1])
    Sequence_DeltaG.append(json_data_Sequence[0]['DeltaG'])
    Sequence_BasePairs.append(json_data_Sequence[0]['BasePairs'])
    Sequence_Dimer.append(json_data_Sequence[0]['Dimer'])
    Sequence_Bonds.append(json_data_Sequence[0]['Bonds'])
    Sequence_SequencePair.append(temp)
    temp = temp + 1
    Sequence_Name1.append(Sequence_dict[i[0]])
    Sequence_Name2.append(Sequence_dict[i[1]])
    Sequence_Sequences1.append(i[0])
    Sequence_Sequences2.append(i[1])
    Sequence_DeltaG.append(json_data_Sequence[1]['DeltaG'])
    Sequence_BasePairs.append(json_data_Sequence[1]['BasePairs'])
    Sequence_Dimer.append(json_data_Sequence[1]['Dimer'])
    Sequence_Bonds.append(json_data_Sequence[1]['Bonds'])
    Sequence_SequencePair.append(temp)
    temp = 1
CPU times: user 3.21 s, sys: 225 ms, total: 3.44 s
Wall time: 2min 42s
  • The Delta G value of any heterodimers should be weaker (more positive) than -9.0 kcal/mole.
  • Positive numbers indicate that the actual secondary structure shown will not form at all.
df4=pd.DataFrame({'Primary Sequence name':Sequence_Name1,'Secondary Sequence name':Sequence_Name2,\
                  'Sequence Pair Number':Sequence_SequencePair,'Primary Sequence':Sequence_Sequences1,\
                  'Secondary Sequence':Sequence_Sequences2,'DeltaG':Sequence_DeltaG,\
                  'BasePairs':Sequence_BasePairs,'Dimer':Sequence_Dimer,'Bonds':Sequence_Bonds,})
df4
Primary Sequence name Secondary Sequence name Sequence Pair Number Primary Sequence Secondary Sequence DeltaG BasePairs Dimer Bonds
0 Gene-E1-F3 Gene-E1-B3 0 TGAGTACGAACTTATGTACTCAT TTCAGATTTTTAACACGAGAGT -4.52 4 None [2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
1 Gene-E1-F3 Gene-E1-B3 1 TGAGTACGAACTTATGTACTCAT TTCAGATTTTTAACACGAGAGT -3.61 2 None [0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 1, 0, 1, 0, 0, ...
2 Gene-E1-F3 Gene-E1-FIP 0 TGAGTACGAACTTATGTACTCAT ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG -13.36 8 None [0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, ...
3 Gene-E1-F3 Gene-E1-FIP 1 TGAGTACGAACTTATGTACTCAT ACCACGAAAGCAAGAAAAAGAAGTTCGTTTCGGAAGAGACAG -7.13 4 None [0, 0, 0, 0, 1, 0, 2, 2, 2, 2, 0, 0, 0, 1, 0, ...
4 Gene-E1-F3 Gene-E1-BIP 0 TGAGTACGAACTTATGTACTCAT TTGCTAGTTACACTAGCCATCCTTAGGTTTTACAAGACTCACGT -6.47 5 None [2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, ...
... ... ... ... ... ... ... ... ... ...
301 ACTB-BIP ACTB-LF 1 CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC TGTGGTGCCAGATTTTCTCCA -6.21 3 None [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 2, 0, ...
302 ACTB-BIP ACTB-LB 0 CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC CGAGAAGATGACCCAGATCATGT -9.69 5 None [0, 0, 1, 0, 0, 0, 0, 2, 2, 2, 2, 2, 0, 0, 0, ...
303 ACTB-BIP ACTB-LB 1 CTGAACCCCAAGGCCAACCGGCTGGGGTGTTGAAGGTC CGAGAAGATGACCCAGATCATGT -7.48 4 None [1, 0, 0, 0, 1, 0, 0, 0, 0, 2, 2, 2, 2, 0, 0, ...
304 ACTB-LF ACTB-LB 0 TGTGGTGCCAGATTTTCTCCA CGAGAAGATGACCCAGATCATGT -6.69 5 None [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 2, ...
305 ACTB-LF ACTB-LB 1 TGTGGTGCCAGATTTTCTCCA CGAGAAGATGACCCAGATCATGT -5.02 3 None [1, 0, 2, 2, 2, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, ...

306 rows × 9 columns

df4.to_csv('Output Data GENE Analysis/GENE_Hetro_Dimers.csv', index=False)

References

This Notebook is Created by Owais Ahmad for SARS-CoV-2 Gene Sequence Analysis.

This is for Medical Research Purpose only

Feel free to comment if you have any queries:)