User Preference

aPhyloGeo Configuration

aPhyloGeo can be embedded in other applications and applied to other datasets by supplying a YAML configuration file. This file gathers the parameters that control an analysis in one place.

Configuration File Example

file_name: './datasets/example/geo.csv'
specimen: 'id'
names: ['id', 'ALLSKY_SFC_SW_DWN', 'T2M', 'PRECTOTCORR', 'QV2M', 'WS10M']
bootstrap_threshold: 0
dist_threshold: 60
window_size: 200
step_size: 100
bootstrap_amount: 100
data_names: ['ALLSKY_SFC_SW_DWN_newick', 'T2M_newick', 'QV2M_newick', 'PRECTOTCORR_newick', 'WS10M_newick']
reference_gene_dir: './datasets/example'
reference_gene_file: 'sequences.fasta'
makeDebugFiles: True
alignment_method: '1' # 1:pairwiseAligner, 2:MUSCLE, 3:CLUSTALW, 4:MAFFT
distance_method: '1' # 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy)
fit_method: '1' # 1:Wider Fit by elongating with Gap (starAlignment), 2:Narrow-fit prevent elongation with gap when possible
tree_type: '1' # 1: BioPython consensus tree, 2: FastTree application
rate_similarity: 90
method_similarity: '1' # 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity
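
A minimal sketch, assuming the YAML file has already been parsed into a Python dictionary (for instance with PyYAML's `yaml.safe_load`), of how the menu-style options could be checked before a run. The helper `check_choices` and the table `ALLOWED_CHOICES` are hypothetical and not part of aPhyloGeo; the allowed values are copied from the comments in the example above and from the option list on this page.

```python
# Hypothetical validation helper; not part of the aPhyloGeo API.
# Allowed values mirror the comments in the example configuration.
ALLOWED_CHOICES = {
    "alignment_method": {"1", "2", "3", "4"},
    # '0' (all distances) appears in the option list, not in the YAML comment.
    "distance_method": {"0", "1", "2", "3"},
    "fit_method": {"1", "2"},
    "tree_type": {"1", "2"},
    "method_similarity": {"1", "2", "3", "4", "5", "6", "7", "8"},
}


def check_choices(params):
    """Return a list of readable problems; the list is empty if none are found."""
    problems = []
    for key, allowed in ALLOWED_CHOICES.items():
        if key not in params:
            problems.append(f"missing key: {key}")
        elif str(params[key]) not in allowed:
            problems.append(
                f"{key}={params[key]!r} is not one of {sorted(allowed)}"
            )
    return problems


# Values taken from the example configuration.
example = {
    "alignment_method": "1",
    "distance_method": "1",
    "fit_method": "1",
    "tree_type": "1",
    "method_similarity": "1",
}

print(check_choices(example))
print(check_choices(dict(example, tree_type="3")))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid option at once.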

There are 11 main options accessible to the user in the YAML configuration file:

Bootstrap Threshold: Threshold on the number of replicates generated for each sub-MSA (that is, for each position of the sliding window)
Distance Threshold: Distance threshold between the genetic tree and the climatic tree for each sub-MSA (each position of the sliding window)
Window Length: Size of the sliding window
Step: Distance by which the sliding window advances at each iteration
Distance Choice: Distance used to compare the trees
- '0' for all distances (options '1', '2', and '3')
- '1' for Least Square (LS) distance (version 1.0)
- '2' for Robinson and Foulds (RF) distance (normalized by $2n-6$, where $n$ is the number of leaves in each tree)
- '3' for Euclidean distance
Distance Threshold: LS distance threshold at which the results are most significant
Alignment Method: Algorithm selection for sequence alignment
- '1' for pairwiseAligner
- '2' for MUSCLE
- '3' for CLUSTALW
- '4' for MAFFT
Fit Method: How gaps are used to elongate sequences during alignment
- '1' for Wider Fit, which elongates with gaps (starAlignment)
- '2' for Narrow Fit, which prevents elongation with gaps when possible
Tree Inference Method: Choice of tree inference method
- '1' for BioPython consensus tree
- '2' for FastTree application
Rate Similarity: Similarity rate between sequences, used to filter out sub-MSAs with a high degree of similarity
Method Similarity: Choice of similarity method
- '1' for Hamming distance
- '2' for Levenshtein distance
- '3' for Damerau-Levenshtein distance
- '4' for Jaro similarity
- '5' for Jaro-Winkler similarity
- '6' for Smith–Waterman similarity
- '7' for Jaccard similarity
- '8' for Sørensen-Dice similarity
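
To illustrate how Window Length and Step interact, here is a minimal sketch that lists the column ranges a sliding window would cover along an alignment. The function `window_positions` is hypothetical and not aPhyloGeo's implementation; in particular, dropping a trailing partial window is an assumption, and the software may treat the end of the alignment differently.

```python
def window_positions(alignment_length, window_size, step_size):
    """Yield half-open (start, end) column ranges for each window position.

    Assumption: a final window that would run past the end of the
    alignment is skipped rather than truncated.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    start = 0
    while start + window_size <= alignment_length:
        yield (start, start + window_size)
        start += step_size


# window_size and step_size come from the example configuration;
# the alignment length of 500 columns is made up for illustration.
for start, end in window_positions(500, window_size=200, step_size=100):
    print(start, end)
```

With a step smaller than the window size, consecutive sub-MSAs overlap; here each window shares half of its columns with the next one.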

Please email us at: [email protected] for any questions or feedback.
