
User Preference

Nadia Tahiri, PhD edited this page Nov 28, 2023 · 8 revisions

Editing User Preferences in aPhyloGeo

aPhyloGeo lets users customize its behavior through a YAML configuration file, in which each setting is a single key-value parameter. Below is an example configuration file, followed by an explanation of each parameter:

Configuration File Example

file_name: './datasets/example/geo.csv'
specimen: 'id'
names: ['id', 'ALLSKY_SFC_SW_DWN', 'T2M', 'PRECTOTCORR', 'QV2M', 'WS10M']
bootstrap_threshold: 0
dist_threshold: 60
window_size: 200
step_size: 100
bootstrap_amount: 100
data_names: ['ALLSKY_SFC_SW_DWN_newick', 'T2M_newick', 'QV2M_newick', 'PRECTOTCORR_newick', 'WS10M_newick']
reference_gene_dir: './datasets/example'
reference_gene_file: 'sequences.fasta'
makeDebugFiles: True
alignment_method: '1' # Options: 1: pairwiseAligner, 2: MUSCLE, 3: CLUSTALW, 4: MAFFT
distance_method: '1' # Options: 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy)
fit_method: '1' # Options: 1: Wider Fit by elongating with Gap (starAlignment), 2: Narrow-fit prevent elongation with gap when possible
tree_type: '1' # Options: 1: BioPython consensus tree, 2: FastTree application
rate_similarity: 90
method_similarity: '1' # Options: 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity
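
The numeric codes used by the method parameters above are easier to read once they are mapped back to their labels. The following is a minimal Python sketch, assuming the configuration has already been loaded into a dictionary (for instance by a YAML parser). The names OPTION_LABELS and describe_options are hypothetical and are not part of aPhyloGeo; the labels are copied from the comments in the example file.

```python
# Labels copied from the comments in the example configuration file.
# OPTION_LABELS and describe_options are illustrative names, not aPhyloGeo API.
OPTION_LABELS = {
    "alignment_method": {
        "1": "pairwiseAligner", "2": "MUSCLE", "3": "CLUSTALW", "4": "MAFFT",
    },
    "distance_method": {
        "1": "Least-Square distance",
        "2": "Robinson-Foulds distance",
        "3": "Euclidean distance (DendroPy)",
    },
    "fit_method": {
        "1": "Wider fit by elongating with gap (starAlignment)",
        "2": "Narrow fit, prevent elongation with gap when possible",
    },
    "tree_type": {
        "1": "BioPython consensus tree", "2": "FastTree application",
    },
    "method_similarity": {
        "1": "Hamming distance", "2": "Levenshtein distance",
        "3": "Damerau-Levenshtein distance", "4": "Jaro similarity",
        "5": "Jaro-Winkler similarity", "6": "Smith-Waterman similarity",
        "7": "Jaccard similarity", "8": "Sørensen-Dice similarity",
    },
}


def describe_options(params):
    """Map each coded option in params to its label.

    Raises ValueError if a code is not one of the documented options.
    """
    described = {}
    for key, labels in OPTION_LABELS.items():
        code = str(params[key])  # codes are quoted strings in the example
        if code not in labels:
            valid = ", ".join(sorted(labels))
            raise ValueError(f"{key}={code!r} is not one of: {valid}")
        described[key] = labels[code]
    return described


# Coded values taken from the example configuration file.
example_params = {
    "alignment_method": "1",
    "distance_method": "1",
    "fit_method": "1",
    "tree_type": "1",
    "method_similarity": "1",
}

described = describe_options(example_params)
for key, label in described.items():
    print(f"{key}: {label}")
```

Checking the codes before a run catches a mistyped option early, instead of partway through an analysis.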

User Preferences Options:

  1. File Name: Path to the input data file (./datasets/example/geo.csv in the example).
  2. Specimen: Identifier for the specimens in the dataset (id in the example).
  3. Names: List of column names in the dataset (['id', 'ALLSKY_SFC_SW_DWN', 'T2M', 'PRECTOTCORR', 'QV2M', 'WS10M'] in the example).
  4. Bootstrap Threshold: Threshold on the bootstrap replicates generated for each sub-MSA (0 in the example).
  5. Distance Threshold: Threshold on the distance between the genetic tree and the climatic tree for each sub-MSA (60 in the example).
  6. Window Size: Size of the sliding window (200 in the example).
  7. Step Size: Number of positions the sliding window advances at each step (100 in the example).
  8. Bootstrap Amount: Number of bootstrap replicates to generate (100 in the example).
  9. Data Names: List of Newick file names, one for each dataset.
  10. Reference Gene Directory: Directory containing reference gene data (./datasets/example in the example).
  11. Reference Gene File: File containing reference gene sequences (sequences.fasta in the example).
  12. Make Debug Files: Option to generate debug files (True or False).
  13. Alignment Method: Algorithm used for sequence alignment (Options: 1: pairwiseAligner, 2: MUSCLE, 3: CLUSTALW, 4: MAFFT; the example uses 1).
  14. Distance Method: Distance measure to use (Options: 1: Least-Square distance, 2: Robinson-Foulds distance, 3: Euclidean distance (DendroPy); the example uses 1).
  15. Fit Method: How gaps are used to elongate sequences (Options: 1: Wider fit, elongating with gaps (starAlignment), 2: Narrow fit, preventing elongation with gaps when possible; the example uses 1).
  16. Tree Inference Method: Method used to infer trees, set with tree_type (Options: 1: BioPython consensus tree, 2: FastTree application; the example uses 1).
  17. Rate Similarity: Similarity rate between sequences, used to remove sub-MSAs whose sequences are highly similar (90 in the example).
  18. Method Similarity: Method used to measure similarity (Options: 1: Hamming distance, 2: Levenshtein distance, 3: Damerau-Levenshtein distance, 4: Jaro similarity, 5: Jaro-Winkler similarity, 6: Smith–Waterman similarity, 7: Jaccard similarity, 8: Sørensen-Dice similarity; the example uses 1).
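
To illustrate how Window Size and Step Size interact, here is a minimal Python sketch that lists the column ranges a sliding window would cover. It is an illustration under stated assumptions, not aPhyloGeo's own implementation: the function name sliding_windows and the alignment length of 500 are hypothetical, and the sketch assumes that only full-length windows are kept (how aPhyloGeo treats a trailing partial window is not stated here).

```python
def sliding_windows(alignment_length, window_size, step_size):
    """Yield (start, end) half-open column ranges for each full window.

    Assumption: a trailing window shorter than window_size is dropped.
    """
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    start = 0
    while start + window_size <= alignment_length:
        yield (start, start + window_size)
        start += step_size


# window_size and step_size from the example configuration;
# the alignment length of 500 is a made-up value for illustration.
windows = list(sliding_windows(500, window_size=200, step_size=100))
print(windows)
# → [(0, 200), (100, 300), (200, 400), (300, 500)]
```

With a step size smaller than the window size, as in the example configuration, consecutive windows overlap (here by 100 positions), so each region of the alignment contributes to more than one sub-MSA.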