Dependencies
-The function requires the following R packages: -
-RSQLite
: To connect to the SQLite database. -
+
The function requires the following R packages:
+
+-
+
RSQLite
: To connect to the SQLite database.
+-
haven
: To read .xpt
file, if
-use_xpt_file = TRUE
.
+use_xpt_file = TRUE
.
+
This implementation ensures flexibility in handling different input
types and configurations while maintaining a consistent structure for
the output.
+
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index b338bc2..ba16db4 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -25,7 +25,7 @@ articles:
LB_score_calculation_documentation: LB_score_calculation_documentation.html
MI_score_calculation_documentation: MI_score_calculation_documentation.html
vignettes: vignettes.html
-last_built: 2025-01-02T01:47Z
+last_built: 2025-01-02T02:57Z
urls:
reference: https://aminuldu07.github.io/SENDQSAR/reference
article: https://aminuldu07.github.io/SENDQSAR/articles
diff --git a/docs/search.json b/docs/search.json
index fe58bee..128b6ab 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -1 +1 @@
-[{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"overview","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Overview","title":"get_compile_data","text":"get_compile_data versatile function designed filter toxicokinetic (TK) recovery animals study database. function takes various inputs specify study ID database path, includes options handling SQLite .xpt file formats, particularly dealing data generated SENDsanitizer package. primary aim clean compile study data, ensuring results focused target set animals.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"function-parameters","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Function Parameters","title":"get_compile_data","text":"studyid (Mandatory, Character): study ID number, uniquely identifies study within database. path_db (Mandatory, Character): path database file. path SQLite database directory containing .xpt files. fake_study (Optional, Boolean): Indicates study data generated using SENDsanitizer package. Defaults FALSE. use_xpt_file (Optional, Boolean): Specifies whether use .xpt file format dealing data generated SENDsanitizer package. 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"return-value","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Return Value","title":"get_compile_data","text":"function returns cleaned compiled data frame filtered study data, ready analysis.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"key-features","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Key Features","title":"get_compile_data","text":"Data Fetching: Establishes connection database (reads .xpt files) retrieves necessary domain data dm (Demographics) ts (Trial Summary). Data Processing: Performs data transformations, including renaming columns, updating values, selecting relevant columns. Converts “Control” groups “vehicle” retains animals “vehicle” “HD” (high-dose) groups. Removes recovery animals using ds (Disposition) domain. Excludes toxicokinetic (TK) animals rat studies analyzing pp (Pharmacokinetics) pooldef domains. 
Species Detection Handling: Extracts species information ts domain customize filtering logic different species.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"example-usage","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Example Usage","title":"get_compile_data","text":"#```{r} # Example call get_compile_data # Note: Replace ‘path//database.db’ actual path database #get_compile_data(studyid = ‘1234123’, path_db = ‘path//database.db’)","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"purpose","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Purpose","title":"Documentation for get_auc_curve_with_rf_model","text":"function get_auc_curve_with_rf_model designed train Random Forest model using provided dataset, optionally SQLite database. computes visualizes ROC curve along AUC (Area Curve) metric. function offers various options handling data preprocessing, including hyperparameter tuning, imputation, undersampling, outputs model performance via ROC curve.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"input-parameters","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Input Parameters","title":"Documentation for get_auc_curve_with_rf_model","text":"function accepts following parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"output","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Output","title":"Documentation for get_auc_curve_with_rf_model","text":"function return explicit values. However, generates following outputs: AUC Value: AUC ROC curve printed console. ROC Curve Plot: ROC curve displayed, showing model’s performance computed AUC value. 
Performance Metrics: performance metrics (e.g., True Positive Rate, False Positive Rate) computed returned directly.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"key-steps","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Key Steps","title":"Documentation for get_auc_curve_with_rf_model","text":"Data provided, function fetches data either SQLite database generates synthetic data (fake_study TRUE). use_xpt_file TRUE, fetches data specified XPT files. function performs data preprocessing, including imputation (Impute TRUE), rounding (Round TRUE), undersampling (Undersample TRUE). harmonizes liver scores prepares data machine learning. function prepares data Random Forest (RF) modeling, tuning hyperparameters hyperparameter_tuning enabled. Random Forest model trained using prepared data, predictions generated. model’s performance evaluated computing AUC (Area Curve) plotting ROC curve. AUC printed console, ROC curve displayed calculated AUC value. specified, function applies error correction method (error_correction_method) performs hyperparameter tuning optimize model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_bw_score` Function","text":"get_bw_score function designed normalize body-weight (BW) subject (animal) termed ‘USUBJID’ SEND database, using Z-scoring method. 
Z-scoring basic method Z-scored continuous data like body weight, clinical pathology, lab test results transforming value many standard deviations control mean.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"z-score-calculation","dir":"Articles","previous_headings":"","what":"Z-Score Calculation","title":"Documentation for `get_bw_score` Function","text":"Z-score normalization performed shown Equation : Z_{s,i} = \\frac{x_{s,i} - \\mu_{s,c}}{\\sigma_{s,c}} : - x_{s,i} observed endpoint value individual i study s, - \\mu_{s,c} mean value observed endpoint control group c study s, - \\sigma_{s,c} standard deviation observed endpoint control group c study s, - s study identifier, - i refers individual animal study, - c refers control-treated group animals within study.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"return-value","dir":"Articles","previous_headings":"","what":"Return Value","title":"Documentation for `get_bw_score` Function","text":"data.frame containing calculated BW Z-scores. structure output depends provided parameters: return_individual_scores = TRUE: Returns averaged Z-scores domain per studyid. return_zscore_by_USUBJID = TRUE: Returns Z-score animal/subject USUBJID domain per studyid. Otherwise, summarized BW Z-score specified studyid.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"implementation-details","dir":"Articles","previous_headings":"","what":"Implementation Details","title":"Documentation for `get_bw_score` Function","text":"get_bw_score function follows systematic approach calculate BW Z-score given study ID. 
key steps involved:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"database-connection","dir":"Articles","previous_headings":"Implementation Details","what":"Database Connection","title":"Documentation for `get_bw_score` Function","text":"function establishes connection specified SQLite database using RSQLite package processes .xpt files depending value use_xpt_file parameter: use_xpt_file = TRUE: Data loaded .xpt files located folder specified path_db. use_xpt_file = FALSE: Data extracted SQLite database file located path_db.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"data-retrieval","dir":"Articles","previous_headings":"Implementation Details","what":"Data Retrieval","title":"Documentation for `get_bw_score` Function","text":"function retrieves necessary data related specified studyid. data retrieval process depends whether master_compiledata provided NULL: master_compiledata = NULL, master_compiledata provided, function extracts data following SEND domains: BW (Body Weight) : Provide Body Weight measurements individual level. DM (Demographics): Supplies animal-level demographic details. DS (Disposition): Identifies recovery animals using DSDECOD column. PC (Pharmacokinetics): Provide USUBJID TK animals rats mice study. 
TX (Treatment): Provide dose levels information “vehicle” “HD.” master_compiledata Provided, master_compiledata value provided, function retrieve following domains: BW (Body Weight) : Provide Body Weight measurements individual level.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"baseline-weight-adjustment","dir":"Articles","previous_headings":"Implementation Details","what":"Baseline Weight Adjustment","title":"Documentation for `get_bw_score` Function","text":"weight animal normalized subtracting baseline weight recorded first day dosing.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"z-score-normalization","dir":"Articles","previous_headings":"Implementation Details","what":"Z-Score Normalization","title":"Documentation for `get_bw_score` Function","text":"adjusted weights normalized using Z-score equation described “Z-Score Calculation” section .","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"handling-optional-parameters","dir":"Articles","previous_headings":"Implementation Details","what":"Handling Optional Parameters","title":"Documentation for `get_bw_score` Function","text":"return_individual_scores = TRUE, Returns averaged Z-scores domain per studyid. 
return_zscore_by_USUBJID = TRUE, Returns Z-score animal/subject unique subject identifiers USUBJID.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"fake-study-handling","dir":"Articles","previous_headings":"Implementation Details","what":"Fake Study Handling","title":"Documentation for `get_bw_score` Function","text":"fake_study = TRUE, special handling applied data sets generated SENDsanitizer package account structure.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"output-generation","dir":"Articles","previous_headings":"Implementation Details","what":"Output Generation","title":"Documentation for `get_bw_score` Function","text":"data frame containing requested scores returned. may include summarized scores, individual scores, Z-scores USUBJID, based parameters provided.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"dependencies","dir":"Articles","previous_headings":"Implementation Details","what":"Dependencies","title":"Documentation for `get_bw_score` Function","text":"function requires following R packages: - RSQLite: connect SQLite database. - haven : read .xpt file, use_xpt_file = TRUE. 
implementation ensures flexibility handling different input types configurations maintaining consistent structure output.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_bw_score` Function","text":"","code":"# Example 1: Basic usage get_bw_score(studyid = '1234123', path_db = 'path/to/database.db') # Example 2: Include individual scores get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_individual_scores = TRUE) # Example 3: Include z-scores by USUBJID get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_zscore_by_USUBJID = TRUE)"},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"description","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Description","title":"Function Documentation: get_col_harmonized_scores_df","text":"function takes data frame containing liver score data, harmonizes column names, handles missing values, performs optional rounding specific score columns. aims standardize clean data analysis : - Replacing spaces, commas, slashes column names dots. - Handling missing values replacing zero. - Harmonizing columns similar meanings (synonyms). - Removing unwanted columns. - Optionally rounding columns related liver scores histology scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"parameters","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Parameters","title":"Function Documentation: get_col_harmonized_scores_df","text":"liver_score_data_frame (data.frame): data frame containing liver score data column names may need harmonization. 
Round (logical, default = FALSE): TRUE, function round values certain columns based specific rules.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"details","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Details","title":"Function Documentation: get_col_harmonized_scores_df","text":"Spaces, commas, slashes column names replaced dots. Missing values (NA) replaced zeros. Columns similar meanings (synonyms) identified harmonized replacing values higher value . Specific columns ‘STUDYID’, ‘UNREMARKABLE’, ‘THIKENING’, ‘POSITIVE’ excluded harmonization. Liver-related columns (avg_, liver) floored nearest integer capped 5. Histology-related columns ceiled nearest integer. Columns reordered based sum values (excluding first column). Columns higher sums moved left, ensuring “important” columns appear first. Columns related specific endpoints (e.g., ‘INFILTRATE’, ‘UNREMARKABLE’, ‘THIKENING’, ‘POSITIVE’) removed final data frame.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"return-value","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Return Value","title":"Function Documentation: get_col_harmonized_scores_df","text":"data frame harmonized columns, optional rounding applied, columns ordered based sum values.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"example-usage","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Example Usage","title":"Function Documentation: get_col_harmonized_scores_df","text":"``r # Sample liver score data frame liver_scores <- data.frame( STUDYID = c(1, 2, 3), INFILTRATE = c(0, 1, 0), avg_Liver = c(3.5, 4.2, 2.1), POSITIVE = c(0, 0, 1),Thickening` = c(0, 0, 1), Liver_to_BW_zscore = c(3, 2, 4) 
)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"call-the-function-with-round-true","dir":"Articles","previous_headings":"","what":"Call the function with Round = TRUE","title":"Function Documentation: get_col_harmonized_scores_df","text":"result <- get_col_harmonized_scores_df(liver_score_data_frame = liver_scores, Round = TRUE)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for 'get_compile_data' Function","text":"get_compile_data function retrieves cleans study data DM (Demographics) domain applying multiple filtering steps compiles remaining data cleaned format. First, removes recovery animals filtering DM data using information DS (Disposition) domain. Additionally, study involves rats mice, function filters toxicokinetic animals excluding USUBJIDs present Pharmacokinetic (PC) domain. steps ensure data set excludes recovery animals toxicokinetic (TK) animals, focusing on the target population relevant study’s primary analysis. function supports data retrieval SQLite databases .xpt files.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"return-value","dir":"Articles","previous_headings":"","what":"Return Value","title":"Documentation for 'get_compile_data' Function","text":"Returns cleaned data.frame following columns: STUDYID USUBJID Species SEX ARMCD SETCD cleaned data now ready used analysis.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"implementation-details","dir":"Articles","previous_headings":"","what":"Implementation Details","title":"Documentation for 'get_compile_data' Function","text":"get_compile_data function leverages following steps calculate compile_data data frame given 
study:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"database-connection","dir":"Articles","previous_headings":"Implementation Details","what":"Database Connection","title":"Documentation for 'get_compile_data' Function","text":"-function connects SQLite database reads .xpt files specified path_db.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"data-fetching","dir":"Articles","previous_headings":"Implementation Details","what":"Data Fetching","title":"Documentation for 'get_compile_data' Function","text":"function retrieves data following SEND domains based input parameters: DM (Demographics): Provides animal-level information. DS (Disposition): Identifies recovery animals using DSDECOD column. PC (Pharmacokinetics): Excludes TK animals rats mice based USUBJID. TX (Treatment): Determines dose levels “vehicle” “HD.”","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"filtering-steps","dir":"Articles","previous_headings":"Implementation Details","what":"Filtering Steps","title":"Documentation for 'get_compile_data' Function","text":"Filtering Recovery Animals Recovery animals excluded filtering DM data based DSDECOD values DS domain. Filtering Toxicokinetic (TK) Animals studies involving rats mice, function removes animals whose USUBJID appears PC domain. 
Dose Selection function identifies retains animals assigned either “vehicle” group “high-dose” (HD) group applying dose-ranking logic TX domain, “Control” groups reclassified “vehicle.”","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"examples-usage","dir":"Articles","previous_headings":"","what":"Examples Usage","title":"Documentation for 'get_compile_data' Function","text":"","code":"# Example usage with SQLite database df <- get_compile_data( studyid = \"1234123\", path_db = \"path/to/database.db\" ) # Example usage with .xpt files df <- get_compile_data( studyid = \"1234123\", path_db = \"path/to/files\", fake_study = TRUE, use_xpt_file = TRUE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"required-libraries","dir":"Articles","previous_headings":"","what":"Required Libraries","title":"Documentation for 'get_compile_data' Function","text":"function requires following R packages: DBI RSQLite data.table dplyr haven tidyr stringr ##Notes function assumes standard SEND domains column names. non-standard data, adjustments may needed. Check database .xpt files ensure compatibility function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"see-also","dir":"Articles","previous_headings":"","what":"See Also","title":"Documentation for 'get_compile_data' Function","text":"DBI RSQLite data.table SENDsanitizer","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"function get_Data_formatted_for_ml_and_best.m designed retrieve preprocess data machine learning (ML) models given SQLite database XPT file. 
performs several tasks fetching study IDs, retrieving study metadata, calculating liver toxicity scores, tuning hyperparameters ML models. final output list containing processed data ready machine learning best model.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"function returns list following elements: Data: data frame containing preprocessed data ready machine learning. best.m: best machine learning model hyperparameter tuning, applicable.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"use_xpt_file TRUE, retrieves study IDs directories within specified path. use_xpt_file FALSE fake_study TRUE, function connects SQLite database retrieves study IDs ‘dm’ table. fake_study FALSE, fetches repeat-dose parallel study IDs database. studyid_metadata provided, generates metadata selecting unique study IDs assigning random “Target_Organ” values (either “Liver” “not_Liver”). function calculates liver toxicity scores using get_liver_om_lb_mi_tox_score_list function. calculated liver toxicity scores harmonized using get_col_harmonized_scores_df function, optionally rounding based Round parameter. function prepares data machine learning performs hyperparameter tuning (hyperparameter_tuning TRUE) using get_ml_data_and_tuned_hyperparameters function. 
final output consists processed data best machine learning model (best.m).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"```r result <- get_Data_formatted_for_ml_and_best.m( path_db = “path//database.db”, rat_studies = TRUE, reps = 5, holdback = 0.2, error_correction_method = “Flip” )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"access-the-processed-data-and-the-best-model","dir":"Articles","previous_headings":"","what":"Access the processed data and the best model","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"processed_data <- result$Data best_model <- result$best.m","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_histogram_barplot` function","text":"get_histogram_barplot function designed generate bar plot displaying liver-related scores, based data either provided directly fetched SQLite database. 
calculates mean values specific findings, compares liver-related non-liver-related groups, produces either plot processed data frame depending function’s parameters.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Documentation for `get_histogram_barplot` function","text":"function accepts following parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_histogram_barplot` function","text":"generateBarPlot = TRUE: function returns ggplot2 bar plot object displaying average scores liver-related findings versus non-liver-related findings. generateBarPlot = FALSE: function returns data.frame (plotData) containing calculated values finding, columns finding, liver status (LIVER), mean values (Value).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_histogram_barplot` function","text":"data provided, function attempts fetch data SQLite database use fake study dataset. fetches study data dm domain database fake_study = FALSE. study IDs extracted, filtered liver-related studies, used subsequent score calculations. get_liver_om_lb_mi_tox_score_list function calculates liver scores provided study IDs. resulting data harmonized using get_col_harmonized_scores_df ensure consistency output data frame. generateBarPlot = TRUE, function iterates findings computes average liver-related score (Liver status) finding. generates ggplot2 bar plot findings x-axis, average values y-axis, distinct colors representing liver vs. non-liver status. function checks whether Data parameter valid data frame. 
, error thrown.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_histogram_barplot` function","text":"```r # Example fake study data, generating bar plot get_histogram_barplot(generateBarPlot = TRUE, fake_study = TRUE)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"example-with-real-study-data-without-generating-a-plot","dir":"Articles","previous_headings":"","what":"Example with real study data, without generating a plot","title":"Documentation for `get_histogram_barplot` function","text":"data <- get_histogram_barplot(generateBarPlot = FALSE, fake_study = FALSE, path_db = “path//db”)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"get_imp_features_from_rf_model_with_cv function performs cross-validation test repetitions random forest model, calculates feature importance using Gini importance, returns top n important features. primarily used evaluating feature importance classification tasks utilizing Random Forest optional -sampling custom test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"function accepts following parameters: Data: data frame containing training data (typically rows samples columns features). first column assumed target variable. Undersample: logical value (TRUE FALSE) indicating whether apply -sampling balance classes training data. Default FALSE. 
best.m: numeric value representing number variables considered split Random Forest model (function determine ). Default NULL. testReps: numeric value indicating number test repetitions (must least 2). Type: numeric value indicating type importance calculated. 1 Mean Decrease Accuracy 2 Mean Decrease Gini. nTopImportance: numeric value indicating number top important features return based importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"function returns list containing: gini_scores: matrix Gini importance scores feature across different cross-validation iterations. matrix rows representing features columns representing test iterations.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"Initialize Metrics: function starts defining several empty vectors track performance metrics like Sensitivity, Specificity, PPV, NPV, others, initialized used current version. Prepare Data: function prepares data renaming columns input Data consistency initializing new data frame (rfTestData) store prediction results across iterations. Cross-Validation Setup: function sets cross-validation loop test repetitions. repetition, selects random subset data test uses rest training. Optionally, -sampling can applied balance dataset. Model Training: Random Forest model trained training data iteration using randomForest package. uses specified value best.m control number variables considered split. Calculate Gini Importance: training model, Gini importance scores calculated feature using randomForest::importance function. 
Gini scores aggregated across test repetitions. Aggregate Sort Importance Scores: completing cross-validation iterations, mean Gini importance scores feature calculated sorted decreasing order. Plot Feature Importance: dotchart generated visualize top nTopImportance features based importance scores. Return Results: function returns list containing Gini importance scores across iterations. ```r # Example call function result <- get_imp_features_from_rf_model_with_cv( Data = scores_df, Undersample = FALSE, best.m = 3, testReps = 5, Type = 2, nTopImportance = 10 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_lb_score` Function","text":"laboratory test (LB) data, Z-scores also calculated six key enzymes commonly found blood serum indicative liver function: Bilirubin, Albumin (ALB), Alanine Aminotransferase (ALT), Alkaline Phosphatase (ALP), Aspartate Aminotransferase (AST), Gamma-Glutamyl Transferase (GGT). enzymes serve important biomarkers detecting liver damage dysfunction. get_lb_score function computes liver biomarker z-scores clinical studies, utilizing data database .xpt file. 
processes lab data (lb domain) calculates z-scores several liver biomarkers (e.g., ALT, AST, ALP, GGT, BILI, ALB) based study data, performing several transformations filtering operations prepare data.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"function-definition","dir":"Articles","previous_headings":"","what":"Function Definition","title":"Documentation for `get_lb_score` Function","text":"","code":"get_lb_score <- function(studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE) { # Function body goes here }"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"parameters","dir":"Articles","previous_headings":"","what":"Parameters","title":"Documentation for `get_lb_score` Function","text":"studyid (character): study ID filter data . Default NULL. path_db (character): file path database (SQLite .xpt file). fake_study (logical): flag indicate study fake . Default FALSE. use_xpt_file (logical): Whether use .xpt file data extraction. Default FALSE. master_compiledata (data.frame): compile data frame includes participant information. NULL, function call get_compile_data. return_individual_scores (logical): Whether return individual z-scores biomarker. Default FALSE. return_zscore_by_USUBJID (logical): TRUE, return z-scores USUBJID (unique subject ID). Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"workflow","dir":"Articles","previous_headings":"","what":"Workflow","title":"Documentation for `get_lb_score` Function","text":"function first fetches data either SQLite database .xpt file depending value use_xpt_file. lab data (lb domain) fetched specified studyid. Various filtering operations applied based biomarker study conditions. LBSPEC field populated necessary (e.g., “WHOLE BLOOD”, “SERUM”, “URINE”). 
liver biomarker (ALT, AST, ALP, GGT, BILI, ALB), function computes z-score using formula: $z = \\frac{\\text{LBSTRESN} - \\text{mean}_{\\text{vehicle}}}{\\text{sd}_{\\text{vehicle}}}$ z-scores averaged STUDYID classified discrete categories (0, 1, 2, 3) based predefined thresholds.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"merging-results","dir":"Articles","previous_headings":"","what":"Merging Results","title":"Documentation for `get_lb_score` Function","text":"individual z-scores biomarker merged single data frame. resulting data frame can returned: - USUBJID (unique subject ID), return_zscore_by_USUBJID TRUE. - study (STUDYID), z-scores averaged across subjects study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_lb_score` Function","text":"","code":"# Example 1: Run the function with a given study ID and database path result <- get_lb_score(studyid = \"12345\", path_db = \"path_to_database\") # Example 2: Use the function with .xpt file instead of SQLite database result_xpt <- get_lb_score(studyid = \"12345\", path_db = \"path_to_xpt_file\", use_xpt_file = TRUE) # Example 3: Return individual biomarker z-scores individual_scores <- get_lb_score(studyid = \"12345\", path_db = \"path_to_database\", return_individual_scores = TRUE) # Example 4: Return z-scores by subject (USUBJID) subject_zscores <- get_lb_score(studyid = \"12345\", path_db = \"path_to_database\", return_zscore_by_USUBJID = TRUE)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Documentation for 'get_livertobw_score' Function","text":"get_livertobw_score function designed calculate liver--body-weight (Liver:BW) scores 
corresponding z-scores study data. function supports data retrieval SQLite databases .xpt files provides options return individual scores, USUBJID-specific z-scores, averaged scores study. weight animal end dosing period normalized subtracting baseline weight measured first day dosing. Following , liver weight body weight ratio calculated animal. liver--body weight ratios normalized using Z-scores, comparisons made respective control group study","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"inputs","dir":"Articles","previous_headings":"Function Parameters","what":"Inputs","title":"Documentation for 'get_livertobw_score' Function","text":"Identifier study interest. NULL, studies database considered. Path SQLite database .xpt files. Indicator handling fake test study data. TRUE, reads data .xpt files. Otherwise, fetches data SQLite database. Precompiled dataset study information. provided, fetched using get_compile_data(). Precomputed body weight z-scores. provided, calculated using get_bw_score(). TRUE, returns individual z-scores averaged study. TRUE, returns z-scores grouped USUBJID.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"outputs","dir":"Articles","previous_headings":"Function Parameters","what":"Outputs","title":"Documentation for 'get_livertobw_score' Function","text":"Liver:BW z-scores grouped study (return_individual_scores = TRUE). Z-scores USUBJID (return_zscore_by_USUBJID = TRUE). Averaged z-scores study (default).","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"data-preparation","dir":"Articles","previous_headings":"Workflow","what":"1. Data Preparation","title":"Documentation for 'get_livertobw_score' Function","text":"Connects SQLite database using DBI use_xpt_file = FALSE. Retrieves data specified studyid using helper function fetch_domain_data(). 
master_compiledata provided, retrieved using get_compile_data(). bwzscore_BW provided, calculated using get_bw_score().","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"data-extraction","dir":"Articles","previous_headings":"Workflow","what":"2. Data Extraction","title":"Documentation for 'get_livertobw_score' Function","text":"Filters liver-specific data OM domain. Removes test recovery animals based master_compiledata.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"liver-to-body-weight-calculations","dir":"Articles","previous_headings":"Workflow","what":"3. Liver-to-Body-Weight Calculations","title":"Documentation for 'get_livertobw_score' Function","text":"Computes liver weight--body-weight ratio (liverToBW). Calculates z-scores liverToBW using vehicle arm statistics (mean SD). Converts z-scores absolute values.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"score-computation","dir":"Articles","previous_headings":"Workflow","what":"4. Score Computation","title":"Documentation for 'get_livertobw_score' Function","text":"Validates return_individual_scores return_zscore_by_USUBJID TRUE. Individual study-level scores (return_individual_scores = TRUE). USUBJID-specific z-scores (return_zscore_by_USUBJID = TRUE). Default: Average z-scores study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"output","dir":"Articles","previous_headings":"Workflow","what":"5. 
Output","title":"Documentation for 'get_livertobw_score' Function","text":"Returns data frame based selected output option.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"example-1-default-averaged-scores","dir":"Articles","previous_headings":"Examples","what":"Example 1: Default Averaged Scores","title":"Documentation for 'get_livertobw_score' Function","text":"```r path <- “path_to_database” study_id <- “STUDY123” result <- get_livertobw_score(studyid = study_id, path_db = path) head(result)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"function-overview","dir":"Articles","previous_headings":"","what":"Function Overview","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"get_liver_om_lb_mi_tox_score_list function calculates series liver organ toxicity scores, body weight z-scores, relevant metrics set studies XPT files. outputs results based user preferences individual scores, z-scores USUBJID, averaged scores multiple studies. 
function also manages data flow several steps, including fetching processing data, calculating scores, managing error handling.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"function-signature","dir":"Articles","previous_headings":"Function Overview","what":"Function Signature","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"","code":"get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = FALSE, path_db, fake_study = FALSE, use_xpt_file = FALSE, output_individual_scores = FALSE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"function-overview-1","dir":"Articles","previous_headings":"","what":"Function Overview","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"get_liver_om_lb_mi_tox_score_list R function designed process liver toxicity scores one studies. function calculates several scores related liver toxicity body weight, including: Body Weight Z-Score (BWZSCORE_avg) Liver Organ Body Weight Z-Score (liverToBW_avg) LB Score (LB_score_avg) MI Score (MI_score_avg) function can output individual scores, z-scores USUBJID, averaged scores. also includes error handling capture record issues processing.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"arguments","dir":"Articles","previous_headings":"","what":"Arguments","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"studyid_or_studyids (Character vector single study ID): character vector containing one study IDs process. multiple studies provided, function processes study sequentially. path_db (Character): Path database directory containing data files. fake_study (Logical, default: FALSE): boolean flag indicating study data simulated (TRUE) real (FALSE). 
use_xpt_file (Logical, default: FALSE): boolean flag indicating whether use XPT file study data. Default FALSE. output_individual_scores (Logical, default: FALSE): boolean flag indicating whether individual scores returned. Default FALSE. output_zscore_by_USUBJID (Logical, default: FALSE): boolean flag indicating whether output z-scores USUBJID. Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"details","dir":"Articles","previous_headings":"","what":"Details","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"function iterates study ID XPT folder processes data calculate various toxicity scores. Key calculation blocks include: Fetching Master Compile Data: function calls get_compile_data retrieve primary data study. Body Weight Z-Score Calculation: Using get_bw_score function, body weight z-scores calculated either individually averaged. Liver Organ Body Weight Z-Score Calculation: Using get_livertobw_score function, liver toxicity scores related body weight calculated. LB Score Calculation: get_lb_score function used calculate LB scores. MI Score Calculation: get_mi_score function used MI score calculation.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"key-calculation-blocks","dir":"Articles","previous_headings":"","what":"Key Calculation Blocks","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"Fetching Master Compile Data: block calls get_compile_data function retrieve primary data study. data essential subsequent calculations. Body Weight Z-Score Calculation: body weight z-scores calculated using get_bw_score function, result either returned individual scores averaged scores. Liver Organ Body Weight Z-Score Calculation: liver organ--body weight z-scores calculated using get_livertobw_score function. 
LB Score Calculation: get_lb_score function called calculate LB score study. MI Score Calculation: function calculates MI score using get_mi_score function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"error-handling","dir":"Articles","previous_headings":"","what":"Error Handling","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"calculation block wrapped tryCatch statement handle errors encountered execution. block fails, study ID added error list, function continues processing next study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"return-value","dir":"Articles","previous_headings":"","what":"Return Value","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"function returns different outputs based flags passed: output_individual_scores = TRUE: function returns combined data frame individual scores study. output_zscore_by_USUBJID = TRUE: function returns data frame z-scores USUBJID study. neither flag set, function returns data frame averaged scores study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"example-1-get-averaged-scores-for-a-single-study","dir":"Articles","previous_headings":"","what":"Example 1: Get Averaged Scores for a Single Study","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"example, call get_liver_om_lb_mi_tox_score_list function retrieve averaged scores single study. 
studyid_or_studyids argument set single study ID, path_db argument points location database.","code":"#result <- get_liver_om_lb_mi_tox_score_list( #studyid_or_studyids = \"Study_001\", #path_db = \"path/to/database\" #)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Function Documentation for `get_mi_score`","text":"Z-scores Microscopic Findings (histopathological findings) derived based incidence (frequency) severity liver-related lesions. Initially, score calculated purely severity findings, adjusted based incidence rate providing accurate reflection overall histopathological impact liver. get_mi_score function processes medical information (MI) data clinical study databases. calculates MI scores, manages severity levels, processes data according specified parameters.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"parameters-explanation","dir":"Articles","previous_headings":"Purpose","what":"Parameters Explanation","title":"Function Documentation for `get_mi_score`","text":"get_mi_score function accepts following parameters: ID study fetch data. NULL, fetch data studies database. path SQLite database folder containing XPT files. required access data. flag indicate whether process fake study dataset. TRUE, function may mock data retrieval. flag determine .xpt files used. TRUE, function read XPT files provided path. dataframe contains compiled study data. NULL, function fetch data database. TRUE, function return individual MI scores participant. Default FALSE. TRUE, function return Z-scores USUBJID. 
Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"key-steps-in-the-function","dir":"Articles","previous_headings":"Purpose","what":"Key Steps in the Function","title":"Function Documentation for `get_mi_score`","text":"function follows several key steps process MI data: use_xpt_file FALSE, function connects SQLite database fetch required domains (mi dm). Otherwise, reads XPT files specified directory. MI domain filtered include relevant records, liver-related issues. Severity levels (MISEV) standardized missing values replaced. Severity levels MISEV mapped numerical values (e.g., “MILD” becomes 2, “SEVERE” becomes 5). function merges data mi domain compiled study data, ensuring valid participants (marked “recovery” “tk”) included. MI scores calculated based cleaned merged data. return_individual_scores set TRUE, individual scores returned. final data frame containing MI scores generated cleaned . function returns either compiled MI score data , optionally, Z-scores individual participant scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"example-usage","dir":"Articles","previous_headings":"Purpose","what":"Example Usage","title":"Function Documentation for `get_mi_score`","text":"example use get_mi_score function:","code":"# Example 1: Basic usage with default parameters # mi_scores <- get_mi_score( # studyid = \"12345\", # path_db = \"/path/to/database\" # ) # # # Example 2: Using XPT files instead of a database # mi_scores_xpt <- get_mi_score( # path_db = \"/path/to/xpt/files\", # use_xpt_file = TRUE # ) # # # Example 3: Return individual scores # mi_individual_scores <- get_mi_score( # studyid = \"12345\", # path_db = \"/path/to/database\", # return_individual_scores = TRUE # ) # # # Example 4: Return Z-scores for each participant # mi_zscores <- get_mi_score( # studyid = \"12345\", # path_db = \"/path/to/database\", # return_zscore_by_USUBJID = TRUE # 
)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"conclusion","dir":"Articles","previous_headings":"Purpose","what":"Conclusion","title":"Function Documentation for `get_mi_score`","text":"get_mi_score function versatile tool processing analyzing MI data clinical study databases. setting various parameters, users can tailor output meet specific needs, : Calculating MI scores based severity levels. Returning individual scores aggregated MI scores. Returning Z-scores participant. Handling data either SQLite databases XPT files. function’s flexibility makes powerful resource researchers data analysts working clinical study data.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"overview","dir":"Articles","previous_headings":"","what":"Overview","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"get_ml_data_and_tuned_hyperparameters function processes prepares machine learning data modeling, various optional preprocessing steps missing value imputation, undersampling, hyperparameter tuning. 
also supports error correction via specific methods like “Flip” “Prune”.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"function-definition","dir":"Articles","previous_headings":"","what":"Function Definition","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"","code":"get_ml_data_and_tuned_hyperparameters <- function(Data, studyid_metadata, Impute = FALSE, Round = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method = NULL) { # Function implementation } result <- get_ml_data_and_tuned_hyperparameters(Data = scores_df, studyid_metadata = metadata_df, Impute = TRUE, Round = TRUE, reps = 10, holdback = 0.25, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = \"Flip\") # Access the final data and best mtry hyperparameter rfData <- result$rfData best_mtry <- result$best.m )"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"parameters","dir":"Articles","previous_headings":"","what":"Parameters","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"Data (data frame): Input data containing scores. typically data frame named scores_df. studyid_metadata (data frame): data frame containing metadata, typically including STUDYID column, used joining Data. Impute (logical): TRUE, missing values dataset imputed using random forest imputation. Round (logical): TRUE, specific columns rounded according rules described function. reps (numeric): number repetitions cross-validation. value 0 skips repetition. holdback (numeric): fraction data hold back testing. value 1 means leave-one-cross-validation. Undersample (logical): TRUE, training data undersampled balance target classes. hyperparameter_tuning (logical): TRUE, hyperparameter tuning performed using cross-validation. 
error_correction_method (character): Specifies error correction method use. Can one \"Flip\", \"Prune\", \"None\". Defaults NULL, means correction.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"returns","dir":"Articles","previous_headings":"","what":"Returns","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"rfData: final prepared data preprocessing, imputation, error correction methods. best.m: best mtry hyperparameter random forest model (determined tuning default).","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"data-merging","dir":"Articles","previous_headings":"Function Workflow","what":"Data Merging","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"function first joins metadata (studyid_metadata) input data (Data) based STUDYID column.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"target-variable-encoding","dir":"Articles","previous_headings":"Function Workflow","what":"Target Variable Encoding","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"'Liver' encoded 1. 'not_Liver' encoded 0. 
encoding facilitates modeling process.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"missing-value-imputation","dir":"Articles","previous_headings":"Function Workflow","what":"Missing Value Imputation","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"Impute TRUE, missing values imputed using randomForest::rfImpute function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"rounding-of-specific-columns","dir":"Articles","previous_headings":"Function Workflow","what":"Rounding of Specific Columns","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"Columns related averages liver-related data rounded using floor(). columns (e.g., \"MI\" columns) rounded using ceiling().","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"data-splitting","dir":"Articles","previous_headings":"Function Workflow","what":"Data Splitting","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"fraction data (holdback) held back testing. repetition (reps), data split . training set optionally undersampled balance target classes.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"hyperparameter-tuning","dir":"Articles","previous_headings":"Function Workflow","what":"Hyperparameter Tuning","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"function performs hyperparameter tuning random forest model using cross-validation trainControl caret package. 
mtry parameter tuned, controls number variables randomly sampled candidates split.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"model-training","dir":"Articles","previous_headings":"Function Workflow","what":"Model Training","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"random forest model trained prepared data using randomForest package. best.m hyperparameter selected based tuning set default value.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"error-correction","dir":"Articles","previous_headings":"Function Workflow","what":"Error Correction","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"\"Flip\": Flips target class certain conditions met. \"Prune\": Removes instances misclassified. \"None\": error correction applied.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"final-data-return","dir":"Articles","previous_headings":"Function Workflow","what":"Final Data Return","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"processed data (rfData) best mtry hyperparameter (best.m) returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Documentation for get_prediction_plot Function","text":"get_prediction_plot function performs model building prediction dataset using random forest model. iterates multiple test repetitions, trains model training data, makes predictions test data. 
function generates histogram visualize distribution predictions outcome variable (LIVER).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Documentation for get_prediction_plot Function","text":"function accepts following input parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for get_prediction_plot Function","text":"function returns histogram plot visualizing predicted probabilities LIVER variable across test repetitions. plot shows distribution predictions (probabilities) classes (LIVER = “Y” “N”).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for get_prediction_plot Function","text":"Data NULL, function fetches formats data using get_Data_formatted_for_ml_and_best.m function. dataset divided training testing sets repetition (testReps). Undersample enabled, undersampling applied balance dataset. random forest model trained using training set repetition. model makes predictions test set. predicted probabilities stored repetition. predictions averaged across repetitions, histogram created visualize distribution predicted probabilities LIVER variable. 
histogram displayed using ggplot2, showing predicted probabilities LIVER outcome (coded “Y” “N”).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for get_prediction_plot Function","text":"```r # Example function call get_prediction_plot( path_db = “path_to_db”, rat_studies = FALSE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = “Flip”, testReps = 5 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"get_repeat_dose_parallel_studyids function designed retrieve study IDs database correspond parallel-design studies involving repeat-dose toxicity. filters studies based specified design whether species involved rats.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"function accepts following parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"function returns vector study IDs meet specified criteria. 
returned vector contains following: Study IDs: list study IDs match parallel design repeat-dose toxicity criteria (rat species, specified).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"Database Existence Check: function first checks database file exists provided path. , error raised. Database Connection: database connection established using sendigR package. connection database initialized using sendigR::initEnvironment(). Retrieve Parallel Study IDs: function uses sendigR::getStudiesSDESIGN() retrieve study IDs associated parallel design. Retrieve Repeat-Dose Studies: SQL query executed via sendigR::genericQuery() fetch study IDs associated repeat-dose toxicity. query looks studies specific TSPARMCD values related repeat-dose toxicity. Intersect Parallel Repeat-Dose Studies: study IDs obtained parallel design repeat-dose toxicity studies intersected identify common study IDs. Optionally Filter Rat Studies: rat_studies = TRUE, function retrieves study IDs involve rats species. done querying SPECIES field database filtering based presence “RAT”. 
Return Study IDs: final result vector study IDs meet filter conditions, including parallel design, repeat-dose toxicity, optionally, rat species.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"```r # Example without filtering rat studies study_ids <- get_repeat_dose_parallel_studyids(path_db = “path//database.sqlite”)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"example-with-filtering-for-rat-studies","dir":"Articles","previous_headings":"","what":"Example with filtering for rat studies","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"study_ids_rats <- get_repeat_dose_parallel_studyids(path_db = “path//database.sqlite”, rat_studies = TRUE)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"get_reprtree_from_rf_model Function Documentation","text":"get_reprtree_from_rf_model function designed train Random Forest model provided dataset generate representation tree (ReprTree) trained model. function supports various configurations data preprocessing, model hyperparameters, sampling strategies, including random undersampling. 
Additionally, allows error correction hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"get_reprtree_from_rf_model Function Documentation","text":"following table describes input parameters get_reprtree_from_rf_model function:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"get_reprtree_from_rf_model Function Documentation","text":"function generates representation tree (ReprTree) trained Random Forest model visualizes first tree (k=5) model. plot first tree Random Forest displayed. representation tree object generated explicitly returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"get_reprtree_from_rf_model Function Documentation","text":"Data parameter NULL, function calls get_Data_formatted_for_ml_and_best.m prepare data modeling. Data split training testing sets (70% training 30% testing). undersampling enabled (Undersample = TRUE), positive negative samples balanced training set undersampling majority class. Random Forest model trained using randomForest function. target variable Target_Organ, model uses best hyperparameter (best.m) determined beforehand. number trees forest set 500, proximity calculations enabled. ReprTree generated using reprtree::ReprTree function, creates representation trained Random Forest model. first tree (k=5) plotted using reprtree::plot.getTree. 
first tree Random Forest model visualized using reprtree::plot.getTree function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"get_reprtree_from_rf_model Function Documentation","text":"```r get_reprtree_from_rf_model( Data = my_data, path_db = “path//database”, rat_studies = TRUE, studyid_metadata = my_metadata, fake_study = FALSE, use_xpt_file = TRUE, Round = TRUE, Impute = TRUE, reps = 5, holdback = 0.3, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = “Flip” )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"get_reprtree_from_rf_model Function Documentation","text":"get_reprtree_from_rf_model function designed train Random Forest model provided dataset generate representation tree (ReprTree) trained model. function supports various configurations data preprocessing, model hyperparameters, sampling strategies, including random undersampling. Additionally, allows error correction hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"get_reprtree_from_rf_model Function Documentation","text":"following table describes input parameters get_reprtree_from_rf_model function:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"get_reprtree_from_rf_model Function Documentation","text":"function generates representation tree (ReprTree) trained Random Forest model visualizes first tree (k=5) model. plot first tree Random Forest displayed. 
representation tree object generated explicitly returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"get_reprtree_from_rf_model Function Documentation","text":"Data parameter NULL, function calls get_Data_formatted_for_ml_and_best.m prepare data modeling. Data split training testing sets (70% training 30% testing). undersampling enabled (Undersample = TRUE), positive negative samples balanced training set undersampling majority class. Random Forest model trained using randomForest function. target variable Target_Organ, model uses best hyperparameter (best.m) determined beforehand. number trees forest set 500, proximity calculations enabled. ReprTree generated using reprtree::ReprTree function, creates representation trained Random Forest model. first tree (k=5) plotted using reprtree::plot.getTree. first tree Random Forest model visualized using reprtree::plot.getTree function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"get_reprtree_from_rf_model Function Documentation","text":"```r get_reprtree_from_rf_model( Data = my_data, path_db = “path//database”, rat_studies = TRUE, studyid_metadata = my_metadata, fake_study = FALSE, use_xpt_file = TRUE, Round = TRUE, Impute = TRUE, reps = 5, holdback = 0.3, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = “Flip” )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"get_rf_input_param_list_output_cv_imp function prepares necessary data training evaluating Random Forest (RF) model 
cross-validation variable importance scores. handles various configurations, imputation, hyperparameter tuning, inclusion rat studies. function interacts either XPT file SQLite database extract harmonize study data, followed model training evaluation.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"function returns Random Forest model trained cross-validation (CV) includes list variable importance scores. Specifically, returns result get_rf_model_with_cv function, includes trained model, cross-validation results, feature importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"use_xpt_file TRUE, function loads data XPT file. fake_study TRUE, fetches data SQLite database filters based rat_studies. neither condition met, retrieves study IDs database using get_repeat_dose_parallel_studyids. function calls get_liver_om_lb_mi_tox_score_list calculate liver scores studies, harmonized using get_col_harmonized_scores_df. function prepares data Random Forest model training calling get_ml_data_and_tuned_hyperparameters. step involves imputation, optional hyperparameter tuning, data balancing. function calls get_rf_model_with_cv train evaluate Random Forest model cross-validation. model’s performance evaluated across multiple repetitions (testReps), option include top importance features. specified, function applies error correction method (either “Flip”, “Prune”, “None”). 
function returns trained Random Forest model along cross-validation results feature importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"```r result <- get_rf_input_param_list_output_cv_imp( path_db = “path//database”, rat_studies = TRUE, studyid_metadata = metadata_df, fake_study = FALSE, use_xpt_file = FALSE, Round = TRUE, Impute = TRUE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = “Flip”, best.m = NULL, testReps = 5, indeterminateUpper = 0.9, indeterminateLower = 0.1, Type = “classification”, nTopImportance = 10 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Documentation: get_rf_model_with_cv","text":"get_rf_model_with_cv function implements random forest-based modeling pipeline cross-validation assess model performance. includes optional undersampling handling imbalanced data provides detailed metrics evaluating model accuracy.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"function-overview","dir":"Articles","previous_headings":"","what":"Function Overview","title":"Documentation: get_rf_model_with_cv","text":"","code":"get_rf_model_with_cv <- function(Data, Undersample = FALSE, best.m = NULL, # any numeric value or call function to get it testReps, # testReps must be at least 2; Type) { ... }"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation: get_rf_model_with_cv","text":"function: Builds random forest model using randomForest package. 
Performs cross-validation evaluate model metrics. Optionally applies undersampling balance datasets. Returns aggregated performance metrics.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"outputs","dir":"Articles","previous_headings":"","what":"Outputs","title":"Documentation: get_rf_model_with_cv","text":"function returns list containing: performance_metrics: Aggregated performance metrics including sensitivity, specificity, accuracy. raw_results: Raw data sensitivity, specificity, accuracy cross-validation fold.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"data-preparation","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Data Preparation","title":"Documentation: get_rf_model_with_cv","text":"Splits data training testing subsets based specified testReps. Optionally applies undersampling balance training set.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"model-training","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Model Training","title":"Documentation: get_rf_model_with_cv","text":"Trains random forest model using randomForest package.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"prediction-and-metrics-calculation","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Prediction and Metrics Calculation","title":"Documentation: get_rf_model_with_cv","text":"Predicts probabilities test set. Computes metrics (sensitivity, specificity, accuracy, etc.) 
using caret package.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"performance-summary","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Performance Summary","title":"Documentation: get_rf_model_with_cv","text":"Aggregates performance metrics across cross-validation folds.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation: get_rf_model_with_cv","text":"","code":"# Load necessary libraries library(randomForest) library(caret) # Example dataset data(iris) Data <- iris[, -5] Data$Target_Organ <- ifelse(iris$Species == \"setosa\", 1, 0) # Run the function results <- get_rf_model_with_cv(Data = Data, Undersample = TRUE, best.m = 2, testReps = 5, Type = 2) # Print results print(results$performance_metrics)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Documentation: get_rf_model_with_cv","text":"get_rf_model_with_cv function powerful tool evaluating random forest models cross-validation, especially datasets class imbalance. Adjust parameters Undersample best.m optimize performance specific dataset.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"get_rf_model_output_cv_imp function designed perform cross-validation Random Forest model, track performance metrics (sensitivity, specificity, accuracy), handle indeterminate predictions, compute feature importance based either Gini Accuracy.
function outputs performance summaries feature importance rankings specified number test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function takes several input parameters control model’s training process, validation, feature importance calculations. table describing parameter:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function returns list containing following elements: performance_metrics: vector aggregated performance metrics (e.g., sensitivity, specificity, accuracy, etc.). feature_importance: matrix containing importance top nTopImportance features, ordered importance score. raw_results: list containing raw results debugging analysis, including sensitivity, specificity, accuracy, Gini scores across test repetitions.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"data-preparation","dir":"Articles","previous_headings":"Key Steps","what":"1. Data Preparation","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"input data prepared creating copy scores_df called rfTestData, initialized NA values hold predictions test repetition. column names simplified numeric identifiers.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"cross-validation","dir":"Articles","previous_headings":"Key Steps","what":"2. 
Cross-Validation","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function iterates testReps repetitions perform cross-validation: dataset split training testing sets iteration. Undersample set TRUE, training set undersampled balance class distribution. Random Forest model trained training data. Predictions made test data stored rfTestData.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"handling-indeterminate-predictions","dir":"Articles","previous_headings":"Key Steps","what":"3. Handling Indeterminate Predictions","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"repetition, predictions probabilities indeterminateUpper indeterminateLower thresholds considered indeterminate. predictions replaced NA, proportion indeterminate predictions tracked.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"performance-metrics","dir":"Articles","previous_headings":"Key Steps","what":"4. Performance Metrics","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"test repetition, function computes confusion matrix using caret package extracts various performance metrics, including: Sensitivity Specificity Positive Predictive Value (PPV) Negative Predictive Value (NPV) Prevalence Accuracy metrics stored aggregated across test repetitions provide overall performance summary.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"feature-importance","dir":"Articles","previous_headings":"Key Steps","what":"5. Feature Importance","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"feature importance computed using randomForest::importance() function. 
importance scores aggregated repetitions, top nTopImportance features identified returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"return-results","dir":"Articles","previous_headings":"Key Steps","what":"6. Return Results","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function returns list containing: Aggregated performance metrics Top nTopImportance features ranked importance score Raw results analysis (e.g., confusion matrix outputs)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"```r # Example usage function result <- get_rf_model_output_cv_imp( scores_df = your_data, Undersample = FALSE, best.m = 3, testReps = 5, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1, nTopImportance = 10 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"view-performance-metrics","dir":"Articles","previous_headings":"","what":"View performance metrics","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"print(result$performance_metrics)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"view-top-features-by-importance","dir":"Articles","previous_headings":"","what":"View top features by importance","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"print(result$feature_importance)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Random Forest Model with Cross-validation and 
Exclusion","text":"get_zone_exclusioned_rf_model_with_cv function implements Random Forest classification model cross-validation. provides tools evaluating model’s performance, including sensitivity, specificity, accuracy, metrics. function allows users handle indeterminate predictions includes option undersampling data, can particularly useful dealing imbalanced datasets. document explains use function, describes inputs, outputs, key steps involved model training evaluation process.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Random Forest Model with Cross-validation and Exclusion","text":"main goal function train Random Forest model evaluate using cross-validation. function: Performs cross-validation across specified number repetitions (testReps). Allows undersampling dataset address class imbalance required. Handles indeterminate predictions setting NA. Tracks performance metrics like sensitivity, specificity, positive predictive value (PPV), accuracy repetition. Provides aggregated summary performance metrics across repetitions. Additionally, function provides option adjust feature importance calculation, either using Gini index Mean Decrease Accuracy.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"parameters","dir":"Articles","previous_headings":"","what":"Parameters","title":"Random Forest Model with Cross-validation and Exclusion","text":"function accepts following parameters: Data (Data): data frame containing features target variable (Target_Organ) train model . Undersample (Undersample): boolean parameter indicates whether perform undersampling data balance class distribution. set TRUE, function undersample negative class match number positive class instances. 
Best Model Parameter (best.m): numeric value indicating best number variables (mtry) use split Random Forest model. value can provided manually determined optimization. Test Repetitions (testReps): number times repeat cross-validation process. value must least 2, function relies multiple test sets assess model performance. Indeterminate Prediction Thresholds (indeterminateUpper, indeterminateLower): parameters define upper lower bounds predicting “indeterminate” values. model’s predicted probability falls thresholds, prediction considered indeterminate set NA. Feature Importance Type (Type): integer indicating type feature importance use Random Forest model. Typically, either 1 “Mean Decrease Accuracy” 2 “Mean Decrease Gini”.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"model-workflow","dir":"Articles","previous_headings":"","what":"Model Workflow","title":"Random Forest Model with Cross-validation and Exclusion","text":"input data frame (Data) processed ensure formatted correctly model training. column names simplified numeric identifiers easier manipulation. dataset split training set test set, iteration using different random samples. Random Forest model trained training set, predictions made test set. Undersample set TRUE, function balances dataset undersampling negative class. positive class left unchanged, negative class reduced match size positive class. training model, predictions made test data. predicted probabilities stored later used calculate performance metrics. Indeterminate predictions identified based upper lower thresholds (indeterminateUpper indeterminateLower). predictions marked NA included performance calculations. Sensitivity: proportion true positives correctly identified model. Specificity: proportion true negatives correctly identified model. Accuracy: overall accuracy model predicting classes. PPV (Positive Predictive Value): proportion positive predictions correct.
NPV (Negative Predictive Value): proportion negative predictions correct. Prevalence: proportion positive cases dataset. metrics computed using caret package’s confusion matrix function. completing test repetitions, function calculates mean performance metric across repetitions provide aggregated performance summary. results include individual metrics repetition overall performance summary.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"outputs","dir":"Articles","previous_headings":"","what":"Outputs","title":"Random Forest Model with Cross-validation and Exclusion","text":"function returns list two components: performance_metrics: vector containing aggregated performance metrics (mean sensitivity, specificity, accuracy, etc.) calculated across test repetitions. raw_results: list containing raw performance metrics repetition, including: sensitivity: vector sensitivity values test repetition. specificity: vector specificity values test repetition. accuracy: vector accuracy values test repetition. 
outputs can used evaluate model’s performance analyze results.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Random Forest Model with Cross-validation and Exclusion","text":"example use function:","code":"# Example dataset (replace with actual data) Data <- your_data_frame # Run the model with cross-validation and undersampling results <- get_zone_exclusioned_rf_model_with_cv(Data = Data, Undersample = TRUE, best.m = 5, testReps = 10, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1) # View the aggregated performance metrics print(results$performance_metrics) # Access raw results for further analysis print(results$raw_results)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Introduction.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Introduction to SENDQSAR","text":"Standard Exchange Nonclinical Data (SEND), developed Clinical Data Interchange Standards Consortium (CDISC), offers structured electronic format organize exchange nonclinical study data among sponsor companies, contract research organizations (CROs), health authorities. Test results, examinations, observations subjects nonclinical study represented series SEND domains. domain defined collection logically related observations common topic. Typically, domain represented single dataset. - [Domain vs MI documentation still progress ## Need edited SEND study (identified IND &STUDYID), normalized toxicity score values calculated hepatotoxicity study endpoints, scores ranging 0 5. included animal body weight, liver weight, liver function test results (e.g., serum enzyme levels ALB, ALT, AST, etc.). Z-scores used standardize values relative control groups, ensuring comparability across different studies. 
Additionally, histopathological findings adjusted incidence severity incorporated ML model. details scoring system described elsewhere [citation cross-study article], toxicity scores based variety critical parameters, enabling robust assessment liver toxicity. short, initially, weight animal end dosing period normalized subtracting baseline weight measured first day dosing. Following , liver weight body weight ratio calculated animal. liver--body weight ratios normalized using Z-scores, comparisons made respective control group study. allowed standardized comparisons across different studies, reducing variability due differences animal size baseline conditions. laboratory test (LB) data, Z-scores also calculated six key enzymes commonly found blood serum indicative liver function: Bilirubin, Albumin (ALB), Alanine Aminotransferase (ALT), Alkaline Phosphatase (ALP), Aspartate Aminotransferase (AST), Gamma-Glutamyl Transferase (GGT). enzymes serve important biomarkers detecting liver damage dysfunction. addition biochemical data, Z-scores Microscopic Findings (histopathological findings) derived based incidence (frequency) severity liver-related lesions. Initially, score calculated purely severity findings, adjusted based incidence rate providing accurate reflection overall histopathological impact liver. body weight (BW), organ mass (OM), laboratory test (LB) domains, absolute value Z-scores used assign toxicity scores. scoring system follows: Z-scores 1 scored 0 (toxicity signal), Z-scores 1 2 scored 1 (weak signal), Z-scores 2 3 scored 2 (moderate signal), Z-scores 3 scored 3 (strong signal). binning system effectively rounds absolute value Z-scores cases, simplifying categorization toxicity signals. incorporating standardized scores across body weight, organ mass, laboratory data, histopathology findings, comprehensive quantifiable framework assessing hepatotoxicity developed. 
framework facilitates application machine learning models predict liver toxicity toxicology studies, enhancing reproducibility interpretability toxicological risk assessments. ** Need clarify reasons 0-5 MI rests 0-3. weight animal end dosing period normalized subtracting baseline weight measured first day dosing.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Introduction.html","id":"required-libraries","dir":"Articles","previous_headings":"Introduction","what":"Required Libraries","title":"Introduction to SENDQSAR","text":"function requires following R packages: DBI RSQLite data.table dplyr haven tidyr stringr ##Notes function assumes standard SEND domains column names. non-standard data, adjustments may needed. Check database .xpt files ensure compatibility function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Introduction.html","id":"see-also","dir":"Articles","previous_headings":"Introduction","what":"See Also","title":"Introduction to SENDQSAR","text":"DBI RSQLite data.table SENDsanitizer","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Md Aminul Islam Prodhan. Author, maintainer. Kevin Snyder. Author.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Prodhan M, Snyder K (2025). SENDQSAR: Building Quantitative Structure-Activity Relationship model leveraging SEND Database. 
R package version 0.0.0.9000, https://github.com/aminuldu07/SENDQSAR, https://aminuldu07.github.io/SENDQSAR/.","code":"@Manual{, title = {SENDQSAR: Building a Quantitative Structure-Activity Relationship model leveraging SEND Database}, author = {Md Aminul Islam Prodhan and Kevin Snyder}, year = {2025}, note = {R package version 0.0.0.9000, https://github.com/aminuldu07/SENDQSAR}, url = {https://aminuldu07.github.io/SENDQSAR/}, }"},{"path":[]},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"about","dir":"","previous_headings":"","what":"About","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"package facilitates developing Quantitative Structure-Activity Relationship (QSAR) models using SEND database. streamlines data acquisition, preprocessing, descriptor calculation, model evaluation, enabling researchers efficiently explore molecular descriptors create robust predictive models.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"features","dir":"","previous_headings":"","what":"Features","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"Automated Data Processing: Simplifies data acquisition preprocessing steps. Comprehensive Analysis: Provides z-score calculations various parameters body weight, liver--body weight ratio, laboratory tests. Machine Learning Integration: Supports classification modeling, hyperparameter tuning, performance evaluation. 
Visualization Tools: Includes histograms, bar plots, AUC curves better data interpretation.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"data-acquisition-and-processing","dir":"","previous_headings":"Functions Overview","what":"Data Acquisition and Processing","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_compile_data - Fetches data database specified database path structured data frame analysis. get_bw_score - Calculates body weight (BW) z-scores animal. get_livertobw_zscore - Computes liver--body weight z-scores. get_lb_score - Calculates z-scores laboratory test (LB) results. get_mi_score - Computes z-scores microscopic findings (MI). get_liver_om_lb_mi_tox_score_list - Combines z-scores LB, MI, liver--BW single data frame. get_col_harmonized_scores_df - Harmonizes column names across studies.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"machine-learning-preparation-and-modeling","dir":"","previous_headings":"Functions Overview","what":"Machine Learning Preparation and Modeling","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_ml_data_and_tuned_hyperparameters - Prepares data tunes hyperparameters machine learning. get_rf_model_with_cv - Builds random forest model cross-validation outputs performance metrics. get_zone_exclusioned_rf_model_with_cv - Introduces indeterminate zone improved classification accuracy. get_imp_features_from_rf_model_with_cv - Computes feature importance model interpretation. 
get_auc_curve_with_rf_model - Generates AUC curves evaluate model performance.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"visualization-and-reporting","dir":"","previous_headings":"Functions Overview","what":"Visualization and Reporting","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_histogram_barplot - Creates bar plots target variable classes. get_reprtree_from_rf_model - Builds representative decision trees interpretability. get_prediction_plot - Visualizes prediction probabilities histograms.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"automated-pipelines","dir":"","previous_headings":"Functions Overview","what":"Automated Pipelines","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_Data_formatted_for_ml_and_best.m - Formats data machine learning pipelines. get_rf_input_param_list_output_cv_imp - Automates preprocessing, modeling, evaluation one step. get_zone_exclusioned_rf_model_cv_imp - Similar function, excludes uncertain predictions based thresholds.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"workflow","dir":"","previous_headings":"","what":"Workflow","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"Input Database Path: Provide database path containing nonclinical study results STUDYID. Preprocessing: Use functions 1-8 clean, harmonize, prepare data. Model Building: Employ machine learning functions (9-18) training, validation, evaluation. 
Visualization: Generate plots performance metrics better interpretation.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"dependencies","dir":"","previous_headings":"","what":"Dependencies","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"randomForest ROCR ggplot2 reprtree","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"# Install from GitHub devtools::install_github(\"aminuldu07/SENDQSAR\")"},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-1-basic-data-compilation","dir":"","previous_headings":"Examples","what":"Example 1: Basic Data Compilation","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"library(SENDQSAR) data <- get_compile_data(\"/path/to/database\")"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-2-z-score-calculation","dir":"","previous_headings":"Examples","what":"Example 2: Z-Score Calculation","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"bw_scores <- get_bw_score(data) liver_scores <- get_livertobw_zscore(data)"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-3-machine-learning-model","dir":"","previous_headings":"Examples","what":"Example 3: Machine Learning Model","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"model <- get_rf_model_with_cv(Data = data, testReps = 10, Type = 1) print(model$performance_metrics)"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-4-visualization","dir":"","previous_headings":"Examples","what":"Example 4: Visualization","title":"Building a
Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"get_histogram_barplot(data, target_col=\"target_variable\")"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"contribution","dir":"","previous_headings":"","what":"Contribution","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"Contributions welcome! Feel free submit issues pull requests via GitHub.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"license","dir":"","previous_headings":"","what":"License","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"project licensed MIT License - see LICENSE file details.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"contact","dir":"","previous_headings":"","what":"Contact","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"information, visit project GitHub Page contact email@example.com.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"function trains Random Forest model, computes ROC curve, calculates AUC (Area Curve). 
allows various preprocessing options, imputation, rounding, undersampling, hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"","code":"get_auc_curve_with_rf_model( Data = NULL, path_db = NULL, rat_studies = FALSE, studyid_metadata, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, best.m = NULL, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method, output_individual_scores = TRUE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"Data data frame containing training data. NULL, data fetched database. path_db string representing path SQLite database used fetch data Data NULL. rat_studies Logical; whether filter rat studies. Defaults FALSE. studyid_metadata data frame containing metadata associated study IDs. fake_study Logical; whether use fake study IDs data simulation. Defaults FALSE. use_xpt_file Logical; whether use XPT file input data. Defaults FALSE. Round Logical; whether round numerical values. Defaults FALSE. Impute Logical; whether perform imputation missing values. Defaults FALSE. best.m 'mtry' hyperparameter Random Forest. NULL, determined function. reps numeric value indicating number repetitions cross-validation. Defaults numeric value. holdback Numeric; either 1 fraction value (e.g., 0.75) holdback cross-validation. Undersample Logical; whether perform undersampling. Defaults FALSE. hyperparameter_tuning Logical; whether perform hyperparameter tuning. Defaults FALSE. 
error_correction_method Character; one \"Flip\", \"Prune\", \"None\", specifying method error correction. output_individual_scores Logical; whether output individual scores. Defaults TRUE. output_zscore_by_USUBJID Logical; whether output z-scores subject ID. Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"function return explicit value. generates: AUC (Area Curve) printed console. ROC curve plot calculated AUC value. Various performance metrics (e.g., True Positive Rate, False Positive Rate), displayed plot.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"function prepares data training Random Forest model first fetching data SQLite database generating synthetic data (fake_study TRUE). processes data using various options imputation, rounding, undersampling. model trained using Random Forest algorithm, performance evaluated via ROC curve AUC metric. function also allows hyperparameter tuning error correction. 
training model, predictions made, AUC calculated visualized ROC curve plot.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"","code":"# Example 1: Using real data from the database get_auc_curve_with_rf_model(Data = NULL, path_db = \"path/to/database.db\", rat_studies = TRUE, reps = 10, holdback = 0.75, error_correction_method = \"Prune\") #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path! # Example 2: Using synthetic data with fake study IDs get_auc_curve_with_rf_model(Data = NULL, fake_study = TRUE, reps = 5, holdback = 0.8, error_correction_method = \"Flip\") #> Error in .local(drv, ...): length(dbname) == 1 is not TRUE"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"get_bw_score function calculates Body Weight (BW) Z-score specified studyid using data provided database .xpt file. 
supports optional parameters customize analysis offers flexibility return individual Z-score USUBJID (unique subject identifier).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"","code":"get_bw_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"studyid Mandatory, character studyid BW Z-score calculated. Required use_xpt_file = FALSE. use_xpt_file = TRUE, studyid ignored, .xpt files specified folder (path_db) analyzed. path_db Mandatory, character path SQLite database file folder containing .xpt files (use_xpt_file = TRUE). fake_study Optional, Boolean Indicates whether study generated SENDsanitizer package. Default FALSE. use_xpt_file Mandatory, Boolean TRUE, function processes .xpt files folder specified path_db. FALSE, uses SQLite database file path_db requires valid studyid. Default FALSE. master_compiledata Optional, character master_compiledata provided (.e., NULL), function automatically call get_compile_data function calculate . return_individual_scores Optional, Boolean TRUE, function returns individual scores domain averaging scores subjects/animals (USUBJID) study. Default FALSE. return_zscore_by_USUBJID Optional, Boolean TRUE, function returns Z-scores animal/subject USUBJID. 
Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"data.frame containing calculated BW Z-scores. structure output depends provided parameters: return_individual_scores = TRUE: Returns averaged Z-scores domain per studyid. return_zscore_by_USUBJID = TRUE: Returns Z-score animal/subject USUBJID domain per studyid. Otherwise, summarized BW score specified studyid.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example 1: Basic usage get_bw_score(studyid = '1234123', path_db = 'path/to/database.db') # Example 2: Include individual scores get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_individual_scores = TRUE) # Example 3: Include z-scores by USUBJID get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_zscore_by_USUBJID = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":null,"dir":"Reference","previous_headings":"","what":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"function harmonizes liver score data cleaning column names, replacing missing values zeros, optionally rounding specific columns. 
function also identifies harmonizes synonyms, removes unnecessary columns, reorders data based column sums.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"","code":"get_col_harmonized_scores_df(liver_score_data_frame, Round = FALSE)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"liver_score_data_frame data frame containing liver score data. data frame column names may require harmonization. Round logical value indicating whether data rounded. TRUE, certain liver-related columns floored capped, histology-related columns ceiled. Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"data frame harmonized liver scores, optional rounding, columns reordered based sums.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"function performs following operations: Harmonizes column names replacing spaces, commas, slashes dots. Replaces missing values (NA) zero. Identifies harmonizes synonym columns, replacing values higher value synonyms. Removes specific unwanted columns 'INFILTRATE', 'UNREMARKABLE', 'THIKENING', 'POSITIVE'. Optionally rounds liver score columns flooring capping 5, histology-related columns ceiling. 
Reorders columns based sum values.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage result <- get_col_harmonized_scores_df(liver_score_data_frame = liver_scores, Round = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"function retrieves compiles data given study ID either SQLite database XPT file.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"","code":"get_compile_data( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"studyid Character. Study ID number. Defaults NULL. NULL, available studies may retrieved (behavior depends database structure). path_db Character. Path SQLite database file. Mandatory. fake_study Logical. Whether study data generated SENDsanitizer package. Defaults FALSE. use_xpt_file Logical. Whether retrieve study data XPT file format instead database. 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"data frame containing compiled study data. structure returned data frame depends database XPT file contents.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"","code":"if (FALSE) { # \\dontrun{ # Retrieve data for a specific study ID from the database get_compile_data(studyid = '1234123', path_db = 'path/to/database.db') # Retrieve data from an XPT file get_compile_data(path_db = 'path/to/file.xpt', use_xpt_file = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"function processes data given SQLite database XPT file, calculates liver toxicity scores, prepares data machine learning models. 
can also tune hyperparameters apply error correction methods.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"","code":"get_Data_formatted_for_ml_and_best.m( path_db, rat_studies = FALSE, studyid_metadata = NULL, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"path_db character string representing path SQLite database XPT file. rat_studies logical flag filter rat studies (default FALSE). studyid_metadata data frame containing metadata study IDs. NULL, metadata generated (default NULL). fake_study logical flag use fake study data (default FALSE). use_xpt_file logical flag indicate whether use XPT file instead SQLite database (default FALSE). Round logical flag round liver toxicity scores (default FALSE). Impute logical flag impute missing values dataset (default FALSE). reps integer specifying number repetitions cross-validation. holdback numeric value indicating fraction data hold back validation. Undersample logical flag undersample majority class (default FALSE). hyperparameter_tuning logical flag perform hyperparameter tuning (default FALSE). error_correction_method character string specifying error correction method. 
Must one 'Flip', 'Prune', 'None'.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"list containing: Data data frame containing preprocessed data ready machine learning. best.m best machine learning model hyperparameter tuning, applicable.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"function performs several key steps: Retrieves study IDs SQLite database XPT file. Generates uses provided study metadata, including random assignment \"Target_Organ\" values (either \"Liver\" \"not_Liver\"). Calculates liver toxicity scores using get_liver_om_lb_mi_tox_score_list function. Harmonizes calculated scores using get_col_harmonized_scores_df function. Prepares data machine learning tunes hyperparameters (enabled) using get_ml_data_and_tuned_hyperparameters function. 
Returns processed data best model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"","code":"if (FALSE) { # \\dontrun{ result <- get_Data_formatted_for_ml_and_best.m( path_db = \"path/to/database.db\", rat_studies = TRUE, reps = 5, holdback = 0.2, error_correction_method = \"Flip\" ) # Access the processed data and the best model processed_data <- result$Data best_model <- result$best.m } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"function generates bar plot comparing liver-related findings non-liver-related findings, returns processed data analysis. 
function can fetch data SQLite database, provided XPT file, simulate data fake_study set TRUE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"","code":"get_histogram_barplot( Data = NULL, generateBarPlot = FALSE, path_db = FALSE, rat_studies = FALSE, studyid_metadata, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, output_individual_scores = TRUE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"Data data frame containing liver-related scores. NULL, function attempt generate fetch data database file. generateBarPlot logical flag (default = FALSE). TRUE, generates bar plot. FALSE, returns processed data. path_db character string representing path SQLite database. Required use_xpt_file FALSE fake_study FALSE. rat_studies logical flag (default = FALSE) filter rat studies fetching data database. studyid_metadata data frame containing metadata associated study IDs. Required fake_study FALSE real data fetched. fake_study logical flag (default = FALSE). TRUE, function simulates study data instead fetching database. use_xpt_file logical flag (default = FALSE). TRUE, function use XPT file fetch data, instead relying database. Round logical flag (default = FALSE). Whether round liver scores. output_individual_scores logical flag (default = TRUE). Whether output individual scores aggregated scores. output_zscore_by_USUBJID logical flag (default = FALSE). 
Whether output z-scores USUBJID (unique subject identifier).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"generateBarPlot = TRUE, ggplot2 bar plot object returned displaying average scores liver-related findings versus non-liver-related findings. generateBarPlot = FALSE, data frame (plotData) containing calculated values finding, liver status (LIVER), mean values (Value) returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"data provided, function attempts fetch data SQLite database simulate data based fake_study flag. function also supports use XPT files allows customization study filtering rat_studies studyid_metadata parameters. 
generating plot, function compares liver-related findings findings, displaying average scores finding bar plot.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"","code":"# Example 1: Generate a bar plot with fake study data get_histogram_barplot(generateBarPlot = TRUE, fake_study = TRUE) #> Error in path.expand(path): invalid 'path' argument # Example 2: Get processed data without generating a plot data <- get_histogram_barplot(generateBarPlot = FALSE, fake_study = FALSE, path_db = \"path/to/db\") #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path!"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"function performs cross-validation test repetitions random forest model, calculates feature importance using Gini importance, returns top n important features.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"","code":"get_imp_features_from_rf_model_with_cv( Data = NULL, Undersample = FALSE, best.m = NULL, testReps, Type, nTopImportance 
)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"Data data frame containing training data (rows samples, columns features). first column assumed target variable. Undersample logical value indicating whether apply -sampling balance classes training data. Default FALSE. best.m numeric value representing number variables consider split Random Forest model (function determine ). Default NULL. testReps numeric value indicating number test repetitions (must least 2). Type numeric value indicating type importance calculated. 1 Mean Decrease Accuracy 2 Mean Decrease Gini. nTopImportance numeric value indicating number top important features return based importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"list containing: gini_scores matrix Gini importance scores feature across different cross-validation iterations. matrix rows representing features columns representing test iterations.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"function trains Random Forest model using cross-validation specified repetitions calculates feature importance using Gini importance scores. function also supports optional -sampling balance class distribution training set. 
function performs following steps: Initializes performance metric trackers. Prepares input data cross-validation. Performs cross-validation, repetition involves training model subset data testing remaining data. Optionally applies -sampling training data. Trains Random Forest model fold calculates Gini importance scores. Aggregates sorts Gini importance scores identify top features. Plots importance top features.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"","code":"# Example of calling the function result <- get_imp_features_from_rf_model_with_cv( Data = scores_df, Undersample = FALSE, best.m = 3, testReps = 5, Type = 2, nTopImportance = 10 ) #> Error: object 'scores_df' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Get LB Score for a Given Study ID — get_lb_score","title":"Get LB Score for a Given Study ID — get_lb_score","text":"function computes LB score given study ID using data stored specified database. 
offers various optional parameters customize output, whether return individual scores Z-scores USUBJID.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get LB Score for a Given Study ID — get_lb_score","text":"","code":"get_lb_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get LB Score for a Given Study ID — get_lb_score","text":"studyid Mandatory, character study ID number LB score calculated. path_db Mandatory, character path database containing necessary data calculation. fake_study Optional, boolean Indicates whether study generated SENDsanitizer package. Defaults FALSE. use_xpt_file Mandatory, character Specifies path XPT (SAS transport) file used study. master_compiledata Mandatory, character path compiled master dataset used calculate LB score. return_individual_scores Optional, boolean TRUE, function return individual scores subject. Defaults FALSE. return_zscore_by_USUBJID Optional, boolean TRUE, function return Z-scores USUBJID. 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get LB Score for a Given Study ID — get_lb_score","text":"numeric calculated LB score based provided data parameters.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get LB Score for a Given Study ID — get_lb_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage of the function get_lb_score(studyid='1234123', path_db='path/to/database.db') } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"function computes liver--body-weight (Liver:BW) ratios corresponding z-scores study data. supports retrieving data SQLite databases .xpt files provides flexible options output formats.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"","code":"get_livertobw_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, bwzscore_BW = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"studyid Optional, character. Study ID calculations performed. NULL, data studies database used. 
path_db Mandatory, character. Path SQLite database directory containing .xpt files. fake_study Optional, logical. Indicates whether study fake/test study generated SENDsanitizer package. Default FALSE. use_xpt_file Optional, logical. Specifies whether use .xpt files instead SQLite database. Default FALSE. master_compiledata Optional, data.frame. Precompiled dataset study information. NULL, function fetches data using get_compile_data. bwzscore_BW Optional, data.frame. Precomputed body weight z-scores. NULL, calculated using get_bw_score. return_individual_scores Optional, logical. TRUE, returns individual z-scores averaged study. Default FALSE. return_zscore_by_USUBJID Optional, logical. TRUE, returns z-scores grouped USUBJID. Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"data frame containing liver--body-weight z-scores: Averaged study (default). Individual scores averaged study (return_individual_scores = TRUE). 
Z-scores grouped USUBJID (return_zscore_by_USUBJID = TRUE).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example 1: Default averaged scores result <- get_livertobw_score( studyid = '1234123', path_db = 'path/to/database.db' ) head(result) # Example 2: Individual scores by study result <- get_livertobw_score( studyid = '1234123', path_db = 'path/to/database.db', return_individual_scores = TRUE ) head(result) # Example 3: Z-scores by USUBJID result <- get_livertobw_score( studyid = '1234123', path_db = 'path/to/database.db', return_zscore_by_USUBJID = TRUE ) head(result) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":null,"dir":"Reference","previous_headings":"","what":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"function processes liver organ toxicity scores, body weight z-scores, related metrics set studies XPT files. 
can output individual scores, z-scores USUBJID, averaged scores multiple studies, handles errors processing steps.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"","code":"get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = FALSE, path_db, fake_study = FALSE, use_xpt_file = FALSE, output_individual_scores = FALSE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"studyid_or_studyids character vector single study ID process. multiple studies provided, function processes study sequentially. (Mandatory) path_db character string specifying path database directory containing data files. (Mandatory) fake_study boolean flag indicating study data simulated (TRUE) real (FALSE). Default FALSE. (Optional) use_xpt_file boolean flag indicating whether use XPT file study data. Default FALSE. (Mandatory) output_individual_scores boolean flag indicating whether individual scores returned (TRUE) averaged scores (FALSE). Default FALSE. (Optional) output_zscore_by_USUBJID boolean flag indicating whether output z-scores USUBJID (TRUE) averaged scores (FALSE). Default FALSE. (Optional) multiple_xpt_folder character string specifying path folder containing multiple XPT files. (Optional)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"data frame containing calculated scores study. 
type result depends flags passed: output_individual_scores TRUE, data frame individual scores study returned. output_zscore_by_USUBJID TRUE, data frame z-scores USUBJID study returned. neither flag set, function returns data frame averaged scores study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"","code":"if (FALSE) { # \\dontrun{ # Get averaged scores for a single study result <- get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = \"Study_001\", path_db = \"path/to/database\" ) # Get individual scores for multiple studies result_individual_scores <- get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = c(\"Study_001\", \"Study_002\"), path_db = \"path/to/database\", output_individual_scores = TRUE ) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Get MI score for a given studyid — get_mi_score","title":"Get MI score for a given studyid — get_mi_score","text":"function calculates MI score given study using provided study ID database. allows flexibility terms returning individual scores, Z-scores, . 
function compatible SENDsanitizer-generated datasets standard clinical study databases.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get MI score for a given studyid — get_mi_score","text":"","code":"get_mi_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get MI score for a given studyid — get_mi_score","text":"studyid Mandatory, character study ID number clinical study. path_db Mandatory, character file path database contains study data. fake_study Optional, logical TRUE, function assumes study data generated SENDsanitizer package. Default FALSE. use_xpt_file Mandatory, logical TRUE, indicates XPT file used instead database analysis. master_compiledata Mandatory, character path master compile data, often used supplement compile data multiple sources. return_individual_scores Optional, logical TRUE, function returns individual MI scores participant. Default FALSE. return_zscore_by_USUBJID Optional, logical TRUE, function returns Z-scores USUBJID (subject identifier). Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get MI score for a given studyid — get_mi_score","text":"numeric vector data frame containing MI scores. 
format depends specified parameters, individual scores aggregated scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get MI score for a given studyid — get_mi_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage of get_mi_score get_mi_score(studyid = '1234123', path_db = 'path/to/database.db') } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"get_ml_data_and_tuned_hyperparameters function processes input data metadata prepare data random forest analysis. includes steps data preprocessing, optional imputation, rounding, error correction, hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"","code":"get_ml_data_and_tuned_hyperparameters( Data, studyid_metadata, Impute = FALSE, Round = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method = NULL )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"Data data.frame. Input data frame containing scores, typically named scores_df. studyid_metadata data.frame. Metadata containing STUDYID values, used joining Data. Impute logical. 
Indicates whether impute missing values dataset using random forest imputation. Default FALSE. Round logical. Specifies whether round specific numerical columns according predefined rules. Default FALSE. reps integer. Number repetitions cross-validation. value 0 skips repetition. holdback numeric. Fraction data hold back testing. value 1 performs leave-one-cross-validation. Undersample logical. Indicates whether undersample training data balance target classes. Default FALSE. hyperparameter_tuning logical. Specifies whether perform hyperparameter tuning random forest model. Default FALSE. error_correction_method character. Specifies method error correction. Can \"Flip\", \"Prune\", NULL. Default NULL.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"list containing: rfData final processed data preprocessing error correction. 
best.m best mtry hyperparameter determined random forest model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"","code":"# Example usage: Data <- scores_df #> Error: object 'scores_df' not found studyid_metadata <- read.csv(\"path/to/study_metadata.csv\") #> Warning: cannot open file 'path/to/study_metadata.csv': No such file or directory #> Error in file(file, \"rt\"): cannot open the connection result <- get_ml_data_and_tuned_hyperparameters( Data = Data, studyid_metadata = studyid_metadata, Impute = TRUE, Round = TRUE, reps = 10, holdback = 0.75, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = \"Flip\" ) #> Error: object 'Data' not found rfData <- result$rfData #> Error: object 'result' not found best_mtry <- result$best.m #> Error: object 'result' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"function performs model building prediction using random forest algorithm. iterates multiple test repetitions, training model training data predicting test data. 
predictions made, histogram plot generated visualize distribution predicted probabilities outcome variable (LIVER).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"","code":"get_prediction_plot( Data = NULL, path_db, rat_studies = FALSE, studyid_metadata = NULL, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method, testReps )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"Data data frame containing dataset use training testing. NULL, function attempt fetch format data database using get_Data_formatted_for_ml_and_best.m function. path_db string indicating path database contains dataset. rat_studies logical flag indicating whether use rat studies data. Defaults FALSE. studyid_metadata data frame containing metadata related study IDs. Defaults NULL. fake_study logical flag indicating whether use fake study data. Defaults FALSE. use_xpt_file logical flag indicating whether use XPT file. Defaults FALSE. Round logical flag indicating whether round predictions. Defaults FALSE. Impute logical flag indicating whether impute missing values. Defaults FALSE. reps integer specifying number repetitions cross-validation. holdback numeric value indicating proportion data hold back testing cross-validation. Undersample logical flag indicating whether perform undersampling dataset balance classes. Defaults FALSE. hyperparameter_tuning logical flag indicating whether perform hyperparameter tuning. Defaults FALSE. 
error_correction_method string specifying error correction method used. Possible values \"Flip\", \"Prune\", \"None\". testReps integer specifying number test repetitions model evaluation.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"ggplot object representing histogram predicted probabilities LIVER variable across test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"function works follows: Data NULL, function fetches data best model configuration calling get_Data_formatted_for_ml_and_best.m function. dataset divided training test sets repetition (testReps). Undersample enabled, undersampling applied balance dataset. random forest model trained training data predictions made test data. 
predictions averaged test repetitions histogram plotted visualize distribution predicted probabilities LIVER.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"","code":"# Example function call get_prediction_plot( path_db = \"path_to_db\", rat_studies = FALSE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = \"Flip\", testReps = 5 ) #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path!"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"function retrieves study IDs database correspond parallel-design studies involving repeat-dose toxicity. optionally filters studies rat species.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"","code":"get_repeat_dose_parallel_studyids(path_db, rat_studies = FALSE)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"path_db character string representing file path SQLite database. required parameter. rat_studies logical flag indicating whether filter studies rats . 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"vector study IDs meet specified criteria. includes: Study IDs match parallel design repeat-dose toxicity criteria. Optionally, study IDs match rat species rat_studies = TRUE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"","code":"if (FALSE) { # \\dontrun{ # Example without filtering for rat studies study_ids <- get_repeat_dose_parallel_studyids(path_db = \"path/to/database.sqlite\") # Example with filtering for rat studies study_ids_rats <- get_repeat_dose_parallel_studyids(path_db = \"path/to/database.sqlite\", rat_studies = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"function trains Random Forest model provided dataset generates representation tree (ReprTree) trained model. supports various preprocessing configurations, model hyperparameters, sampling strategies, including random undersampling. 
function also allows error correction hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"","code":"get_reprtree_from_rf_model( Data = NULL, path_db, rat_studies = FALSE, studyid_metadata = NULL, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"Data data frame containing dataset train Random Forest model. NULL, data fetched using get_Data_formatted_for_ml_and_best.m function. path_db character string representing path database used fetching processing data. rat_studies logical flag indicating whether rat studies used (default: FALSE). studyid_metadata data frame containing metadata related study IDs (default: NULL). fake_study logical flag indicating whether use fake study data (default: FALSE). use_xpt_file logical flag indicating whether use XPT file format data input (default: FALSE). Round logical flag indicating whether round data processing (default: FALSE). Impute logical flag indicating whether impute missing values data (default: FALSE). reps integer specifying number repetitions perform cross-validation resampling. holdback numeric value representing fraction data hold back testing. Undersample logical flag indicating whether undersampling applied balance dataset (default: FALSE). hyperparameter_tuning logical flag indicating whether hyperparameter tuning performed (default: FALSE). 
error_correction_method character string specifying method error correction. Must one 'Flip', 'Prune', 'None'.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"plot first tree Random Forest model displayed. function return ReprTree object explicitly, generated used plotting.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"function performs following steps: Data Preparation: Data NULL, fetched using get_Data_formatted_for_ml_and_best.m function. Data split training (70%) testing (30%) sets. Undersample TRUE, training data balanced using undersampling. Model Training: Random Forest model trained using randomForest::randomForest function. target variable Target_Organ, model uses best hyperparameter (best.m). number trees set 500. ReprTree Generation: reprtree::ReprTree function used generate representation tree trained Random Forest model. 
Visualization: first tree Random Forest model plotted using reprtree::plot.getTree function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"","code":"get_reprtree_from_rf_model( Data = my_data, path_db = \"path/to/database\", rat_studies = TRUE, studyid_metadata = my_metadata, fake_study = FALSE, use_xpt_file = TRUE, Round = TRUE, Impute = TRUE, reps = 5, holdback = 0.3, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = \"Flip\" ) #> Error: object 'my_data' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":null,"dir":"Reference","previous_headings":"","what":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"function prepares data training Random Forest (RF) model cross-validation, handles imputation, hyperparameter tuning, evaluates model's performance. 
supports real fake study data, options rat studies, error correction, feature importance selection.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"","code":"get_rf_input_param_list_output_cv_imp( path_db, rat_studies = FALSE, studyid_metadata, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method, best.m = NULL, testReps, indeterminateUpper, indeterminateLower, Type, nTopImportance )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"path_db character string specifying path SQLite database directory containing XPT file. rat_studies logical value indicating whether filter rat studies. Default FALSE. studyid_metadata data frame containing metadata studies. fake_study logical value indicating whether use fake study data. Default FALSE. use_xpt_file logical value indicating whether use XPT file data. Default FALSE. Round logical value indicating whether round liver scores. Default FALSE. Impute logical value indicating whether impute missing values. Default FALSE. reps integer specifying number repetitions model evaluation. holdback numeric value specifying proportion data hold back validation. Undersample logical value indicating whether undersample data balance classes. Default FALSE. hyperparameter_tuning logical value indicating whether tune Random Forest model's hyperparameters. Default FALSE. 
error_correction_method character string specifying error correction method. Options 'Flip', 'Prune', 'None'. best.m numeric value specifying number trees Random Forest model. NULL, function determines automatically. testReps integer specifying number test repetitions model evaluation. indeterminateUpper numeric value upper threshold indeterminate predictions. indeterminateLower numeric value lower threshold indeterminate predictions. Type character string specifying type Random Forest model use. Options include 'classification' 'regression'. nTopImportance integer specifying number top important features consider model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"list containing trained Random Forest model, cross-validation results, feature importance scores. list returned get_rf_model_with_cv function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"function performs following steps: Fetches study data based specified parameters. Calculates liver scores harmonizes data. Prepares data machine learning, including imputation optional hyperparameter tuning. Trains evaluates Random Forest model cross-validation. 
Applies error correction (specified) selects important features.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"","code":"# Example usage of the function result <- get_rf_input_param_list_output_cv_imp( path_db = \"path/to/database\", rat_studies = TRUE, studyid_metadata = metadata_df, fake_study = FALSE, use_xpt_file = FALSE, Round = TRUE, Impute = TRUE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = \"Flip\", best.m = NULL, testReps = 5, indeterminateUpper = 0.9, indeterminateLower = 0.1, Type = \"classification\", nTopImportance = 10 ) #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path!"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":null,"dir":"Reference","previous_headings":"","what":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"function performs cross-validation Random Forest model, tracks performance metrics (sensitivity, specificity, accuracy), handles indeterminate predictions, computes feature importance based either Gini Accuracy. 
function returns performance summaries feature importance rankings specified number test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"","code":"get_rf_model_output_cv_imp( scores_df = NULL, Undersample = FALSE, best.m = NULL, testReps, indeterminateUpper, indeterminateLower, Type, nTopImportance )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"scores_df data frame containing features target variable training testing model. Undersample logical flag indicating whether apply undersampling training data. Defaults FALSE. best.m numeric value representing number features sample Random Forest model, NULL calculate automatically. testReps integer specifying number repetitions cross-validation. Must least 2. indeterminateUpper numeric threshold predictions considered indeterminate. indeterminateLower numeric threshold predictions considered indeterminate. Type integer specifying type importance compute. 1 MeanDecreaseAccuracy, 2 MeanDecreaseGini. nTopImportance integer specifying number top features display based importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"list following elements: performance_metrics vector aggregated performance metrics (e.g., sensitivity, specificity, accuracy, etc.). 
feature_importance matrix containing importance top nTopImportance features, ordered importance score. raw_results list containing raw results debugging analysis, including sensitivity, specificity, accuracy, Gini scores across test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"function splits input data training testing sets based specified number test repetitions (testReps). iteration, trains Random Forest model makes predictions test data. Indeterminate predictions handled marking NA. function tracks performance metrics sensitivity, specificity, accuracy, computes top nTopImportance features based either Mean Decrease Accuracy Mean Decrease Gini.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"","code":"# Example usage of the function result <- get_rf_model_output_cv_imp( scores_df = your_data, Undersample = FALSE, best.m = 3, testReps = 5, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1, nTopImportance = 10 ) #> Error: object 'your_data' not found # View performance metrics print(result$performance_metrics) #> Error: object 'result' not found # View top features by importance print(result$feature_importance) #> Error: object 'result' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Random Forest with Cross-Validation — get_rf_model_with_cv","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"function 
builds random forest model using randomForest package, evaluates cross-validation, computes performance metrics sensitivity, specificity, accuracy. optionally applies undersampling handle class imbalance supports custom settings number predictors sampled split.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"","code":"get_rf_model_with_cv(Data, Undersample = FALSE, best.m = NULL, testReps, Type)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"Data Mandatory, data frame input dataset, must include column named Target_Organ response variable. Undersample Optional, logical TRUE, balances dataset undersampling majority class. Default FALSE. best.m Optional, numeric NULL Specifies number predictors sampled split. NULL, default value randomForest used. testReps Mandatory, integer number cross-validation repetitions. Must least 2. Type Mandatory, numeric Specifies importance metric type: 1 Mean Decrease Accuracy 2 Gini.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"list following elements: performance_metrics: vector aggregated performance metrics, including sensitivity, specificity, accuracy. 
raw_results: list containing raw sensitivity, specificity, accuracy values cross-validation fold.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"function splits input data training testing subsets based specified testReps cross-validation folds. undersampling enabled, function balances training set reduce class imbalance. random forest model trained training set, predictions evaluated test set. results aggregated provide summary performance metrics.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"","code":"# Load necessary libraries library(randomForest) #> randomForest 4.7-1.2 #> Type rfNews() to see new features/changes/bug fixes. library(caret) #> Loading required package: ggplot2 #> #> Attaching package: 'ggplot2' #> The following object is masked from 'package:randomForest': #> #> margin #> Loading required package: lattice # Example dataset data(iris) iris$Target_Organ <- ifelse(iris$Species == \"setosa\", 1, 0) iris <- iris[, -5] # Remove Species column # Run the function results <- get_rf_model_with_cv(Data = iris, Undersample = TRUE, best.m = 2, testReps = 5, Type = 2) #> Warning: The response has five or fewer unique values. Are you sure you want to do regression? 
#> Error in randomForest.default(m, y, ...): data (x) has 0 rows # Print results print(results$performance_metrics) #> Error: object 'results' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"function implements Random Forest classification model cross-validation allows undersampling, handling indeterminate predictions, calculating various model performance metrics sensitivity, specificity, accuracy. tracks proportion indeterminate predictions provides aggregated performance summary across multiple test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"","code":"get_zone_exclusioned_rf_model_with_cv( Data = NULL, Undersample = FALSE, best.m = NULL, testReps, indeterminateUpper, indeterminateLower, Type )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"Data data frame containing features target variable Target_Organ train Random Forest model . Undersample logical value indicating whether perform undersampling balance classes training data. Defaults FALSE. best.m numeric value representing best number variables (mytry) use split Random Forest model. can manually set determined optimization. 
testReps integer specifying number test repetitions. must least 2, function relies multiple test sets assess model performance. indeterminateUpper numeric value indicating upper bound predicted probability consider prediction indeterminate. Predictions probabilities within range marked indeterminate. indeterminateLower numeric value indicating lower bound predicted probability consider prediction indeterminate. Predictions probabilities within range marked indeterminate. Type integer indicating type feature importance use Random Forest model. Typically, 1 \"Mean Decrease Accuracy\" 2 \"Mean Decrease Gini\".","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"list containing two components: performance_metrics vector aggregated performance metrics, including sensitivity, specificity, accuracy, others, calculated across test repetitions. raw_results list containing raw performance metrics repetition, including sensitivity, specificity, accuracy.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage Data <- your_data_frame # Replace with actual dataset results <- get_zone_exclusioned_rf_model_with_cv(Data = Data, Undersample = TRUE, best.m = 5, testReps = 10, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1) # View the aggregated performance metrics print(results$performance_metrics) # Access raw results for further analysis print(results$raw_results) } # }"}]
+[{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"overview","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Overview","title":"get_compile_data","text":"get_compile_data versatile function designed filter toxicokinetic (TK) recovery animals study database. function takes various inputs specify study ID database path, includes options handling SQLite .xpt file formats, particularly dealing data generated SENDsanitizer package. primary aim clean compile study data, ensuring results focused target set animals.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"function-parameters","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Function Parameters","title":"get_compile_data","text":"studyid (Mandatory, Character): study ID number, uniquely identifies study within database. path_db (Mandatory, Character): path database file. path SQLite database directory containing .xpt files. fake_study (Optional, Boolean): Indicates study data generated using SENDsanitizer package. Defaults FALSE. use_xpt_file (Optional, Boolean): Specifies whether use .xpt file format dealing data generated SENDsanitizer package. 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"return-value","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Return Value","title":"get_compile_data","text":"function returns cleaned compiled data frame filtered study data, ready analysis.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"key-features","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Key Features","title":"get_compile_data","text":"Data Fetching: Establishes connection database (reads .xpt files) retrieves necessary domain data dm (Demographics) ts (Trial Summary). Data Processing: Performs data transformations, including renaming columns, updating values, selecting relevant columns. Converts “Control” groups “vehicle” retains animals “vehicle” “HD” (high-dose) groups. Removes recovery animals using ds (Disposition) domain. Excludes toxicokinetic (TK) animals rat studies analyzing pp (Pharmacokinetics) pooldef domains. 
Species Detection Handling: Extracts species information ts domain customize filtering logic different species.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Compile_Data_calculation_documentation.html","id":"example-usage","dir":"Articles","previous_headings":"Function: get_compile_data","what":"Example Usage","title":"get_compile_data","text":"#```{r} # Example call get_compile_data # Note: Replace ‘path//database.db’ actual path database #get_compile_data(studyid = ‘1234123’, path_db = ‘path//database.db’)","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"purpose","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Purpose","title":"Documentation for get_auc_curve_with_rf_model","text":"function get_auc_curve_with_rf_model designed train Random Forest model using provided dataset, optionally SQLite database. computes visualizes ROC curve along AUC (Area Curve) metric. function offers various options handling data preprocessing, including hyperparameter tuning, imputation, undersampling, outputs model performance via ROC curve.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"input-parameters","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Input Parameters","title":"Documentation for get_auc_curve_with_rf_model","text":"function accepts following parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"output","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Output","title":"Documentation for get_auc_curve_with_rf_model","text":"function return explicit values. However, generates following outputs: AUC Value: AUC ROC curve printed console. ROC Curve Plot: ROC curve displayed, showing model’s performance computed AUC value. 
Performance Metrics: performance metrics (e.g., True Positive Rate, False Positive Rate) computed returned directly.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_auc_curve_with_rf_model.html","id":"key-steps","dir":"Articles","previous_headings":"Function: get_auc_curve_with_rf_model","what":"Key Steps","title":"Documentation for get_auc_curve_with_rf_model","text":"Data provided, function fetches data either SQLite database generates synthetic data (fake_study TRUE). use_xpt_file TRUE, fetches data specified XPT files. function performs data preprocessing, including imputation (Impute TRUE), rounding (Round TRUE), undersampling (Undersample TRUE). harmonizes liver scores prepares data machine learning. function prepares data Random Forest (RF) modeling, tuning hyperparameters hyperparameter_tuning enabled. Random Forest model trained using prepared data, predictions generated. model’s performance evaluated computing AUC (Area Curve) plotting ROC curve. AUC printed console, ROC curve displayed calculated AUC value. specified, function applies error correction method (error_correction_method) performs hyperparameter tuning optimize model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_bw_score` Function","text":"get_bw_score function designed normalize body-weight (BW) subject (animal) termed ‘USUBJID’ SEND database, using Z-scoring method. 
Z-scoring basic method Z-scored continuous data like body weight, clinical pathology, lab test results transforming value many standard deviations control mean.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"z-score-calculation","dir":"Articles","previous_headings":"","what":"Z-Score Calculation","title":"Documentation for `get_bw_score` Function","text":"Z-score normalization performed shown Equation : Z_{s,i} = \\frac{x_{s,i} - \\mu_{s,c}}{\\sigma_{s,c}} : - x_{s,i} observed endpoint value individual i study s, - \\mu_{s,c} mean value observed endpoint control group c study s, - \\sigma_{s,c} standard deviation observed endpoint control group c study s, - s study identifier, - i refers individual animal study, - c refers control-treated group animals within study.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"return-value","dir":"Articles","previous_headings":"","what":"Return Value","title":"Documentation for `get_bw_score` Function","text":"data.frame containing calculated BW Z-scores. structure output depends provided parameters: return_individual_scores = TRUE: Returns averaged Z-scores domain per studyid. return_zscore_by_USUBJID = TRUE: Returns Z-score animal/subject USUBJID domain per studyid. Otherwise, summarized BW Z-score specified studyid.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"implementation-details","dir":"Articles","previous_headings":"","what":"Implementation Details","title":"Documentation for `get_bw_score` Function","text":"get_bw_score function follows systematic approach calculate BW Z-score given study ID. 
key steps involved:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"database-connection","dir":"Articles","previous_headings":"Implementation Details","what":"Database Connection","title":"Documentation for `get_bw_score` Function","text":"function establishes connection specified SQLite database using RSQLite package processes .xpt files depending value use_xpt_file parameter: use_xpt_file = TRUE: Data loaded .xpt files located folder specified path_db. use_xpt_file = FALSE: Data extracted SQLite database file located path_db.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"data-retrieval","dir":"Articles","previous_headings":"Implementation Details","what":"Data Retrieval","title":"Documentation for `get_bw_score` Function","text":"function retrieves necessary data related specified studyid. data retrieval process depends whether master_compiledata provided NULL: master_compiledata = NULL, master_compiledata provided, function extracts data following SEND domains: BW (Body Weight) : Provide Body Weight measurements individual level. DM (Demographics): Supplies animal-level demographic details. DS (Disposition): Identifies recovery animals using DSDECOD column. PC (Pharmacokinetics): Provide USUBJID TK animals rats mice study. 
TX (Treatment): Provide dose levels information “vehicle” “HD.” master_compiledata Provided, master_compiledata value provided, function retrieve following domains: BW (Body Weight) : Provide Body Weight measurements individual level.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"baseline-weight-adjustment","dir":"Articles","previous_headings":"Implementation Details","what":"Baseline Weight Adjustment","title":"Documentation for `get_bw_score` Function","text":"weight animal normalized subtracting baseline weight recorded first day dosing.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"z-score-normalization","dir":"Articles","previous_headings":"Implementation Details","what":"Z-Score Normalization","title":"Documentation for `get_bw_score` Function","text":"adjusted weights normalized using Z-score equation described “Z-Score Calculation” section .","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"handling-optional-parameters","dir":"Articles","previous_headings":"Implementation Details","what":"Handling Optional Parameters","title":"Documentation for `get_bw_score` Function","text":"return_individual_scores = TRUE, Returns averaged Z-scores domain per studyid. 
return_zscore_by_USUBJID = TRUE: Returns Z-score animal/subject unique subject identifier USUBJID.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"fake-study-handling","dir":"Articles","previous_headings":"Implementation Details","what":"Fake Study Handling","title":"Documentation for `get_bw_score` Function","text":"fake_study = TRUE, special handling applied data sets generated SENDsanitizer package account structure.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"output-generation","dir":"Articles","previous_headings":"Implementation Details","what":"Output Generation","title":"Documentation for `get_bw_score` Function","text":"data frame containing requested scores returned. may include summarized scores, individual scores, Z-scores USUBJID, based parameters provided.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"dependencies","dir":"Articles","previous_headings":"Implementation Details","what":"Dependencies","title":"Documentation for `get_bw_score` Function","text":"function requires following R packages: RSQLite: connect SQLite database. haven: read .xpt file, use_xpt_file = TRUE. 
implementation ensures flexibility handling different input types configurations maintaining consistent structure output.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_bw_score.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_bw_score` Function","text":"","code":"# Example 1: Basic usage get_bw_score(studyid = '1234123', path_db = 'path/to/database.db') # Example 2: Include individual scores get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_individual_scores = TRUE) # Example 3: Include z-scores by USUBJID get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_zscore_by_USUBJID = TRUE)"},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"description","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Description","title":"Function Documentation: get_col_harmonized_scores_df","text":"function takes data frame containing liver score data, harmonizes column names, handles missing values, performs optional rounding specific score columns. aims standardize clean data analysis : - Replacing spaces, commas, slashes column names dots. - Handling missing values replacing zero. - Harmonizing columns similar meanings (synonyms). - Removing unwanted columns. - Optionally rounding columns related liver scores histology scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"parameters","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Parameters","title":"Function Documentation: get_col_harmonized_scores_df","text":"liver_score_data_frame (data.frame): data frame containing liver score data column names may need harmonization. 
Round (logical, default = FALSE): TRUE, function round values certain columns based specific rules.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"details","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Details","title":"Function Documentation: get_col_harmonized_scores_df","text":"Spaces, commas, slashes column names replaced dots. Missing values (NA) replaced zeros. Columns similar meanings (synonyms) identified harmonized replacing values higher value . Specific columns ‘STUDYID’, ‘UNREMARKABLE’, ‘THIKENING’, ‘POSITIVE’ excluded harmonization. Liver-related columns (avg_, liver) floored nearest integer capped 5. Histology-related columns ceiled nearest integer. Columns reordered based sum values (excluding first column). Columns higher sums moved left, ensuring “important” columns appear first. Columns related specific endpoints (e.g., ‘INFILTRATE’, ‘UNREMARKABLE’, ‘THIKENING’, ‘POSITIVE’) removed final data frame.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"return-value","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Return Value","title":"Function Documentation: get_col_harmonized_scores_df","text":"data frame harmonized columns, optional rounding applied, columns ordered based sum values.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"example-usage","dir":"Articles","previous_headings":"Function: get_col_harmonized_scores_df","what":"Example Usage","title":"Function Documentation: get_col_harmonized_scores_df","text":"``r # Sample liver score data frame liver_scores <- data.frame( STUDYID = c(1, 2, 3), INFILTRATE = c(0, 1, 0), avg_Liver = c(3.5, 4.2, 2.1), POSITIVE = c(0, 0, 1),Thickening` = c(0, 0, 1), Liver_to_BW_zscore = c(3, 2, 4) 
)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_col_harmonized_scores_df.html","id":"call-the-function-with-round-true","dir":"Articles","previous_headings":"","what":"Call the function with Round = TRUE","title":"Function Documentation: get_col_harmonized_scores_df","text":"result <- get_col_harmonized_scores_df(liver_score_data_frame = liver_scores, Round = TRUE)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for 'get_compile_data' Function","text":"get_compile_data function retrieves cleans study data DM (Demographics) domain applying multiple filtering steps compiles remaining data cleaned format. First, removes recovery animals filtering DM data using information DS (Disposition) domain. Additionally, study involves rats mice, function filters toxicokinetic animals excluding USUBJIDs present Pharmacokinetic (PC) domain. steps ensure data set excludes recovery animals toxicokinetic (TK) animals, focusing target population relevant study’s primary analysis. function supports data retrieval SQLite databases .xpt files.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"return-value","dir":"Articles","previous_headings":"","what":"Return Value","title":"Documentation for 'get_compile_data' Function","text":"Returns cleaned data.frame following columns: STUDYID USUBJID Species SEX ARMCD SETCD cleaned data now ready used analysis.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"implementation-details","dir":"Articles","previous_headings":"","what":"Implementation Details","title":"Documentation for 'get_compile_data' Function","text":"get_compile_data function leverages following steps calculate compile_data data frame given 
study:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"database-connection","dir":"Articles","previous_headings":"Implementation Details","what":"Database Connection","title":"Documentation for 'get_compile_data' Function","text":"function connects SQLite database reads .xpt files specified path_db.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"data-fetching","dir":"Articles","previous_headings":"Implementation Details","what":"Data Fetching","title":"Documentation for 'get_compile_data' Function","text":"function retrieves data following SEND domains based input parameters: DM (Demographics): Provides animal-level information. DS (Disposition): Identifies recovery animals using DSDECOD column. PC (Pharmacokinetics): Excludes TK animals rats mice based USUBJID. TX (Treatment): Determines dose levels “vehicle” “HD.”","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"filtering-steps","dir":"Articles","previous_headings":"Implementation Details","what":"Filtering Steps","title":"Documentation for 'get_compile_data' Function","text":"Filtering Recovery Animals Recovery animals excluded filtering DM data based DSDECOD values DS domain. Filtering Toxicokinetic (TK) Animals studies involving rats mice, function removes animals whose USUBJID appears PC domain. 
Dose Selection function identifies retains animals assigned either “vehicle” group “high-dose” (HD) group applying dose-ranking logic TX domain, “Control” groups reclassified “vehicle.”","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"examples-usage","dir":"Articles","previous_headings":"","what":"Examples Usage","title":"Documentation for 'get_compile_data' Function","text":"","code":"# Example usage with SQLite database df <- get_compile_data( studyid = \"1234123\", path_db = \"path/to/database.db\" ) # Example usage with .xpt files df <- get_compile_data( studyid = \"1234123\", path_db = \"path/to/files\", fake_study = TRUE, use_xpt_file = TRUE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"required-libraries","dir":"Articles","previous_headings":"","what":"Required Libraries","title":"Documentation for 'get_compile_data' Function","text":"function requires following R packages: DBI RSQLite data.table dplyr haven tidyr stringr. Notes: function assumes standard SEND domains column names. non-standard data, adjustments may needed. Check database .xpt files ensure compatibility function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_compile_data.html","id":"see-also","dir":"Articles","previous_headings":"","what":"See Also","title":"Documentation for 'get_compile_data' Function","text":"DBI RSQLite data.table SENDsanitizer","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"function get_Data_formatted_for_ml_and_best.m designed retrieve preprocess data machine learning (ML) models given SQLite database XPT file. 
performs several tasks fetching study IDs, retrieving study metadata, calculating liver toxicity scores, tuning hyperparameters ML models. final output list containing processed data ready machine learning best model.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"function returns list following elements: Data: data frame containing preprocessed data ready machine learning. best.m: best machine learning model hyperparameter tuning, applicable.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"use_xpt_file TRUE, retrieves study IDs directories within specified path. use_xpt_file FALSE fake_study TRUE, function connects SQLite database retrieves study IDs ‘dm’ table. fake_study FALSE, fetches repeat-dose parallel study IDs database. studyid_metadata provided, generates metadata selecting unique study IDs assigning random “Target_Organ” values (either “Liver” “not_Liver”). function calculates liver toxicity scores using get_liver_om_lb_mi_tox_score_list function. calculated liver toxicity scores harmonized using get_col_harmonized_scores_df function, optionally rounding based Round parameter. function prepares data machine learning performs hyperparameter tuning (hyperparameter_tuning TRUE) using get_ml_data_and_tuned_hyperparameters function. 
final output consists processed data best machine learning model (best.m).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"```r result <- get_Data_formatted_for_ml_and_best.m( path_db = “path//database.db”, rat_studies = TRUE, reps = 5, holdback = 0.2, error_correction_method = “Flip” )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_Data_formatted_for_ml_and_best.m.html","id":"access-the-processed-data-and-the-best-model","dir":"Articles","previous_headings":"","what":"Access the processed data and the best model","title":"Documentation for `get_Data_formatted_for_ml_and_best.m` Function","text":"processed_data <- result$Data best_model <- result$best.m","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_histogram_barplot` function","text":"get_histogram_barplot function designed generate bar plot displaying liver-related scores, based data either provided directly fetched SQLite database. 
calculates mean values specific findings, compares liver-related non-liver-related groups, produces either plot processed data frame depending function’s parameters.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Documentation for `get_histogram_barplot` function","text":"function accepts following parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_histogram_barplot` function","text":"generateBarPlot = TRUE: function returns ggplot2 bar plot object displaying average scores liver-related findings versus non-liver-related findings. generateBarPlot = FALSE: function returns data.frame (plotData) containing calculated values finding, columns finding, liver status (LIVER), mean values (Value).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_histogram_barplot` function","text":"data provided, function attempts fetch data SQLite database use fake study dataset. fetches study data dm domain database fake_study = FALSE. study IDs extracted, filtered liver-related studies, used subsequent score calculations. get_liver_om_lb_mi_tox_score_list function calculates liver scores provided study IDs. resulting data harmonized using get_col_harmonized_scores_df ensure consistency output data frame. generateBarPlot = TRUE, function iterates findings computes average liver-related score (Liver status) finding. generates ggplot2 bar plot findings x-axis, average values y-axis, distinct colors representing liver vs. non-liver status. function checks whether Data parameter valid data frame. 
, error thrown.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_histogram_barplot` function","text":"```r # Example fake study data, generating bar plot get_histogram_barplot(generateBarPlot = TRUE, fake_study = TRUE)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_histogram_barplot.html","id":"example-with-real-study-data-without-generating-a-plot","dir":"Articles","previous_headings":"","what":"Example with real study data, without generating a plot","title":"Documentation for `get_histogram_barplot` function","text":"data <- get_histogram_barplot(generateBarPlot = FALSE, fake_study = FALSE, path_db = “path//db”)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"get_imp_features_from_rf_model_with_cv function performs cross-validation test repetitions random forest model, calculates feature importance using Gini importance, returns top n important features. primarily used evaluating feature importance classification tasks utilizing Random Forest optional -sampling custom test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"function accepts following parameters: Data: data frame containing training data (typically rows samples columns features). first column assumed target variable. Undersample: logical value (TRUE FALSE) indicating whether apply -sampling balance classes training data. Default FALSE. 
best.m: numeric value representing number variables considered split Random Forest model (function determine ). Default NULL. testReps: numeric value indicating number test repetitions (must least 2). Type: numeric value indicating type importance calculated. 1 Mean Decrease Accuracy 2 Mean Decrease Gini. nTopImportance: numeric value indicating number top important features return based importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"function returns list containing: gini_scores: matrix Gini importance scores feature across different cross-validation iterations. matrix rows representing features columns representing test iterations.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_imp_features_from_rf_model_with_cv.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Function Documentation: get_imp_features_from_rf_model_with_cv","text":"Initialize Metrics: function starts defining several empty vectors track performance metrics like Sensitivity, Specificity, PPV, NPV, others, initialized used current version. Prepare Data: function prepares data renaming columns input Data consistency initializing new data frame (rfTestData) store prediction results across iterations. Cross-Validation Setup: function sets cross-validation loop test repetitions. repetition, selects random subset data test uses rest training. Optionally, -sampling can applied balance dataset. Model Training: Random Forest model trained training data iteration using randomForest package. uses specified value best.m control number variables considered split. Calculate Gini Importance: training model, Gini importance scores calculated feature using randomForest::importance function. 
Gini scores aggregated across test repetitions. Aggregate Sort Importance Scores: completing cross-validation iterations, mean Gini importance scores feature calculated sorted decreasing order. Plot Feature Importance: dotchart generated visualize top nTopImportance features based importance scores. Return Results: function returns list containing Gini importance scores across iterations. ```r # Example call function result <- get_imp_features_from_rf_model_with_cv( Data = scores_df, Undersample = FALSE, best.m = 3, testReps = 5, Type = 2, nTopImportance = 10 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_lb_score` Function","text":"laboratory test (LB) data, Z-scores also calculated six key enzymes commonly found blood serum indicative liver function: Bilirubin, Albumin (ALB), Alanine Aminotransferase (ALT), Alkaline Phosphatase (ALP), Aspartate Aminotransferase (AST), Gamma-Glutamyl Transferase (GGT). enzymes serve important biomarkers detecting liver damage dysfunction. get_lb_score function computes liver biomarker z-scores clinical studies, utilizing data database .xpt file. 
processes lab data (lb domain) calculates z-scores several liver biomarkers (e.g., ALT, AST, ALP, GGT, BILI, ALB) based study data, performing several transformations filtering operations prepare data.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"function-definition","dir":"Articles","previous_headings":"","what":"Function Definition","title":"Documentation for `get_lb_score` Function","text":"","code":"get_lb_score <- function(studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE) { # Function body goes here }"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"parameters","dir":"Articles","previous_headings":"","what":"Parameters","title":"Documentation for `get_lb_score` Function","text":"studyid (character): study ID filter data . Default NULL. path_db (character): file path database (SQLite .xpt file). fake_study (logical): flag indicate study fake . Default FALSE. use_xpt_file (logical): Whether use .xpt file data extraction. Default FALSE. master_compiledata (data.frame): compile data frame includes participant information. NULL, function call get_compile_data. return_individual_scores (logical): Whether return individual z-scores biomarker. Default FALSE. return_zscore_by_USUBJID (logical): TRUE, return z-scores USUBJID (unique subject ID). Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"workflow","dir":"Articles","previous_headings":"","what":"Workflow","title":"Documentation for `get_lb_score` Function","text":"function first fetches data either SQLite database .xpt file depending value use_xpt_file. lab data (lb domain) fetched specified studyid. Various filtering operations applied based biomarker study conditions. LBSPEC field populated necessary (e.g., “WHOLE BLOOD”, “SERUM”, “URINE”). 
liver biomarker (ALT, AST, ALP, GGT, BILI, ALB), function computes z-score using formula: z = \\frac{\\text{LBSTRESN} - \\text{mean}_{\\text{vehicle}}}{\\text{sd}_{\\text{vehicle}}} z-scores averaged STUDYID classified discrete categories (0, 1, 2, 3) based predefined thresholds.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"merging-results","dir":"Articles","previous_headings":"","what":"Merging Results","title":"Documentation for `get_lb_score` Function","text":"individual z-scores biomarker merged single data frame. resulting data frame can returned: - USUBJID (unique subject ID), return_zscore_by_USUBJID TRUE. - study (STUDYID), z-scores averaged across subjects study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_lb_score.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_lb_score` Function","text":"","code":"# Example 1: Run the function with a given study ID and database path result <- get_lb_score(studyid = \"12345\", path_db = \"path_to_database\") # Example 2: Use the function with .xpt file instead of SQLite database result_xpt <- get_lb_score(studyid = \"12345\", path_db = \"path_to_xpt_file\", use_xpt_file = TRUE) # Example 3: Return individual biomarker z-scores individual_scores <- get_lb_score(studyid = \"12345\", path_db = \"path_to_database\", return_individual_scores = TRUE) # Example 4: Return z-scores by subject (USUBJID) subject_zscores <- get_lb_score(studyid = \"12345\", path_db = \"path_to_database\", return_zscore_by_USUBJID = TRUE)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Documentation for 'get_livertobw_score' Function","text":"get_livertobw_score function designed calculate liver--body-weight (Liver:BW) scores 
corresponding z-scores study data. function supports data retrieval SQLite databases .xpt files provides options return individual scores, USUBJID-specific z-scores, averaged scores study. weight animal end dosing period normalized subtracting baseline weight measured first day dosing. Following , liver weight body weight ratio calculated animal. liver--body weight ratios normalized using Z-scores, comparisons made respective control group study","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"inputs","dir":"Articles","previous_headings":"Function Parameters","what":"Inputs","title":"Documentation for 'get_livertobw_score' Function","text":"Identifier study interest. NULL, studies database considered. Path SQLite database .xpt files. Indicator handling fake test study data. TRUE, reads data .xpt files. Otherwise, fetches data SQLite database. Precompiled dataset study information. provided, fetched using get_compile_data(). Precomputed body weight z-scores. provided, calculated using get_bw_score(). TRUE, returns individual z-scores averaged study. TRUE, returns z-scores grouped USUBJID.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"outputs","dir":"Articles","previous_headings":"Function Parameters","what":"Outputs","title":"Documentation for 'get_livertobw_score' Function","text":"Liver:BW z-scores grouped study (return_individual_scores = TRUE). Z-scores USUBJID (return_zscore_by_USUBJID = TRUE). Averaged z-scores study (default).","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"data-preparation","dir":"Articles","previous_headings":"Workflow","what":"1. Data Preparation","title":"Documentation for 'get_livertobw_score' Function","text":"Connects SQLite database using DBI use_xpt_file = FALSE. Retrieves data specified studyid using helper function fetch_domain_data(). 
master_compiledata provided, retrieved using get_compile_data(). bwzscore_BW provided, calculated using get_bw_score().","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"data-extraction","dir":"Articles","previous_headings":"Workflow","what":"2. Data Extraction","title":"Documentation for 'get_livertobw_score' Function","text":"Filters liver-specific data OM domain. Removes test recovery animals based master_compiledata.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"liver-to-body-weight-calculations","dir":"Articles","previous_headings":"Workflow","what":"3. Liver-to-Body-Weight Calculations","title":"Documentation for 'get_livertobw_score' Function","text":"Computes liver weight--body-weight ratio (liverToBW). Calculates z-scores liverToBW using vehicle arm statistics (mean SD). Converts z-scores absolute values.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"score-computation","dir":"Articles","previous_headings":"Workflow","what":"4. Score Computation","title":"Documentation for 'get_livertobw_score' Function","text":"Validates return_individual_scores return_zscore_by_USUBJID TRUE. Individual study-level scores (return_individual_scores = TRUE). USUBJID-specific z-scores (return_zscore_by_USUBJID = TRUE). Default: Average z-scores study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"output","dir":"Articles","previous_headings":"Workflow","what":"5. 
Output","title":"Documentation for 'get_livertobw_score' Function","text":"Returns data frame based selected output option.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_livertobw_score.html","id":"example-1-default-averaged-scores","dir":"Articles","previous_headings":"Examples","what":"Example 1: Default Averaged Scores","title":"Documentation for 'get_livertobw_score' Function","text":"```r path <- “path_to_database” study_id <- “STUDY123” result <- get_livertobw_score(studyid = study_id, path_db = path) head(result)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"function-overview","dir":"Articles","previous_headings":"","what":"Function Overview","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"get_liver_om_lb_mi_tox_score_list function calculates series liver organ toxicity scores, body weight z-scores, relevant metrics set studies XPT files. outputs results based user preferences individual scores, z-scores USUBJID, averaged scores multiple studies. 
function also manages data flow several steps, including fetching processing data, calculating scores, managing error handling.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"function-signature","dir":"Articles","previous_headings":"Function Overview","what":"Function Signature","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"","code":"get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = FALSE, path_db, fake_study = FALSE, use_xpt_file = FALSE, output_individual_scores = FALSE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"function-overview-1","dir":"Articles","previous_headings":"","what":"Function Overview","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"get_liver_om_lb_mi_tox_score_list R function designed process liver toxicity scores one studies. function calculates several scores related liver toxicity body weight, including: Body Weight Z-Score (BWZSCORE_avg) Liver Organ Body Weight Z-Score (liverToBW_avg) LB Score (LB_score_avg) MI Score (MI_score_avg) function can output individual scores, z-scores USUBJID, averaged scores. also includes error handling capture record issues processing.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"arguments","dir":"Articles","previous_headings":"","what":"Arguments","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"studyid_or_studyids (Character vector single study ID): character vector containing one study IDs process. multiple studies provided, function processes study sequentially. path_db (Character): Path database directory containing data files. fake_study (Logical, default: FALSE): boolean flag indicating study data simulated (TRUE) real (FALSE). 
use_xpt_file (Logical, default: FALSE): boolean flag indicating whether use XPT file study data. Default FALSE. output_individual_scores (Logical, default: FALSE): boolean flag indicating whether individual scores returned. Default FALSE. output_zscore_by_USUBJID (Logical, default: FALSE): boolean flag indicating whether output z-scores USUBJID. Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"details","dir":"Articles","previous_headings":"","what":"Details","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"function iterates study ID XPT folder processes data calculate various toxicity scores. Key calculation blocks include: Fetching Master Compile Data: function calls get_compile_data retrieve primary data study. Body Weight Z-Score Calculation: Using get_bw_score function, body weight z-scores calculated either individually averaged. Liver Organ Body Weight Z-Score Calculation: Using get_livertobw_score function, liver toxicity scores related body weight calculated. LB Score Calculation: get_lb_score function used calculate LB scores. MI Score Calculation: get_mi_score function used MI score calculation.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"key-calculation-blocks","dir":"Articles","previous_headings":"","what":"Key Calculation Blocks","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"Fetching Master Compile Data: block calls get_compile_data function retrieve primary data study. data essential subsequent calculations. Body Weight Z-Score Calculation: body weight z-scores calculated using get_bw_score function, result either returned individual scores averaged scores. Liver Organ Body Weight Z-Score Calculation: liver organ--body weight z-scores calculated using get_livertobw_score function. 
LB Score Calculation: get_lb_score function called calculate LB score study. MI Score Calculation: function calculates MI score using get_mi_score function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"error-handling","dir":"Articles","previous_headings":"","what":"Error Handling","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"calculation block wrapped tryCatch statement handle errors encountered execution. block fails, study ID added error list, function continues processing next study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"return-value","dir":"Articles","previous_headings":"","what":"Return Value","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"function returns different outputs based flags passed: output_individual_scores = TRUE: function returns combined data frame individual scores study. output_zscore_by_USUBJID = TRUE: function returns data frame z-scores USUBJID study. neither flag set, function returns data frame averaged scores study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_liver_om_lb_mi_tox_score_list.html","id":"example-1-get-averaged-scores-for-a-single-study","dir":"Articles","previous_headings":"","what":"Example 1: Get Averaged Scores for a Single Study","title":"Function Documentation: `get_liver_om_lb_mi_tox_score_list`","text":"example, call get_liver_om_lb_mi_tox_score_list function retrieve averaged scores single study. 
studyid_or_studyids argument set single study ID, path_db argument points location database.","code":"#result <- get_liver_om_lb_mi_tox_score_list( #studyid_or_studyids = \"Study_001\", #path_db = \"path/to/database\" #)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Function Documentation for `get_mi_score`","text":"Z-scores Microscopic Findings (histopathological findings) derived based incidence (frequency) severity liver-related lesions. Initially, score calculated purely severity findings, adjusted based incidence rate providing accurate reflection overall histopathological impact liver. get_mi_score function processes microscopic findings (MI) data clinical study databases. calculates MI scores, manages severity levels, processes data according specified parameters.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"parameters-explanation","dir":"Articles","previous_headings":"Purpose","what":"Parameters Explanation","title":"Function Documentation for `get_mi_score`","text":"get_mi_score function accepts following parameters: ID study fetch data. NULL, fetch data studies database. path SQLite database folder containing XPT files. required access data. flag indicate whether process fake study dataset. TRUE, function may mock data retrieval. flag determine .xpt files used. TRUE, function read XPT files provided path. dataframe contains compiled study data. NULL, function fetch data database. TRUE, function return individual MI scores participant. Default FALSE. TRUE, function return Z-scores USUBJID. 
Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"key-steps-in-the-function","dir":"Articles","previous_headings":"Purpose","what":"Key Steps in the Function","title":"Function Documentation for `get_mi_score`","text":"function follows several key steps process MI data: use_xpt_file FALSE, function connects SQLite database fetch required domains (mi dm). Otherwise, reads XPT files specified directory. MI domain filtered include relevant records, liver-related issues. Severity levels (MISEV) standardized missing values replaced. Severity levels MISEV mapped numerical values (e.g., “MILD” becomes 2, “SEVERE” becomes 5). function merges data mi domain compiled study data, ensuring valid participants (marked “recovery” “tk”) included. MI scores calculated based cleaned merged data. return_individual_scores set TRUE, individual scores returned. final data frame containing MI scores generated cleaned . function returns either compiled MI score data , optionally, Z-scores individual participant scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"example-usage","dir":"Articles","previous_headings":"Purpose","what":"Example Usage","title":"Function Documentation for `get_mi_score`","text":"example use get_mi_score function:","code":"# Example 1: Basic usage with default parameters # mi_scores <- get_mi_score( # studyid = \"12345\", # path_db = \"/path/to/database\" # ) # # # Example 2: Using XPT files instead of a database # mi_scores_xpt <- get_mi_score( # path_db = \"/path/to/xpt/files\", # use_xpt_file = TRUE # ) # # # Example 3: Return individual scores # mi_individual_scores <- get_mi_score( # studyid = \"12345\", # path_db = \"/path/to/database\", # return_individual_scores = TRUE # ) # # # Example 4: Return Z-scores for each participant # mi_zscores <- get_mi_score( # studyid = \"12345\", # path_db = \"/path/to/database\", # return_zscore_by_USUBJID = TRUE # 
)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_mi_score.html","id":"conclusion","dir":"Articles","previous_headings":"Purpose","what":"Conclusion","title":"Function Documentation for `get_mi_score`","text":"get_mi_score function versatile tool processing analyzing MI data clinical study databases. setting various parameters, users can tailor output meet specific needs, : Calculating MI scores based severity levels. Returning individual scores aggregated MI scores. Returning Z-scores participant. Handling data either SQLite databases XPT files. function’s flexibility makes powerful resource researchers data analysts working clinical study data.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"overview","dir":"Articles","previous_headings":"","what":"Overview","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"get_ml_data_and_tuned_hyperparameters function processes prepares machine learning data modeling, various optional preprocessing steps missing value imputation, undersampling, hyperparameter tuning. 
also supports error correction via specific methods like “Flip” “Prune”.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"function-definition","dir":"Articles","previous_headings":"","what":"Function Definition","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"","code":"get_ml_data_and_tuned_hyperparameters <- function(Data, studyid_metadata, Impute = FALSE, Round = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method = NULL) { # Function implementation } result <- get_ml_data_and_tuned_hyperparameters(Data = scores_df, studyid_metadata = metadata_df, Impute = TRUE, Round = TRUE, reps = 10, holdback = 0.25, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = \"Flip\") # Access the final data and best mtry hyperparameter rfData <- result$rfData best_mtry <- result$best.m"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"parameters","dir":"Articles","previous_headings":"","what":"Parameters","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"Data (data frame): Input data containing scores. typically data frame named scores_df. studyid_metadata (data frame): data frame containing metadata, typically including STUDYID column, used joining Data. Impute (logical): TRUE, missing values dataset imputed using random forest imputation. Round (logical): TRUE, specific columns rounded according rules described function. reps (numeric): number repetitions cross-validation. value 0 skips repetition. holdback (numeric): fraction data hold back testing. value 1 means leave-one-cross-validation. Undersample (logical): TRUE, training data undersampled balance target classes. hyperparameter_tuning (logical): TRUE, hyperparameter tuning performed using cross-validation. 
error_correction_method (character): Specifies error correction method use. Can one \"Flip\", \"Prune\", \"None\". Defaults NULL, means correction.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"returns","dir":"Articles","previous_headings":"","what":"Returns","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"rfData: final prepared data preprocessing, imputation, error correction methods. best.m: best mtry hyperparameter random forest model (determined tuning default).","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"data-merging","dir":"Articles","previous_headings":"Function Workflow","what":"Data Merging","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"function first joins metadata (studyid_metadata) input data (Data) based STUDYID column.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"target-variable-encoding","dir":"Articles","previous_headings":"Function Workflow","what":"Target Variable Encoding","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"'Liver' encoded 1. 'not_Liver' encoded 0. 
encoding facilitates modeling process.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"missing-value-imputation","dir":"Articles","previous_headings":"Function Workflow","what":"Missing Value Imputation","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"Impute TRUE, missing values imputed using randomForest::rfImpute function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"rounding-of-specific-columns","dir":"Articles","previous_headings":"Function Workflow","what":"Rounding of Specific Columns","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"Columns related averages liver-related data rounded using floor(). columns (e.g., \"MI\" columns) rounded using ceiling().","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"data-splitting","dir":"Articles","previous_headings":"Function Workflow","what":"Data Splitting","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"fraction data (holdback) held back testing. repetition (reps), data split . training set optionally undersampled balance target classes.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"hyperparameter-tuning","dir":"Articles","previous_headings":"Function Workflow","what":"Hyperparameter Tuning","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"function performs hyperparameter tuning random forest model using cross-validation trainControl caret package. 
mtry parameter tuned, controls number variables randomly sampled candidates split.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"model-training","dir":"Articles","previous_headings":"Function Workflow","what":"Model Training","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"random forest model trained prepared data using randomForest package. best.m hyperparameter selected based tuning set default value.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"error-correction","dir":"Articles","previous_headings":"Function Workflow","what":"Error Correction","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"\"Flip\": Flips target class certain conditions met. \"Prune\": Removes instances misclassified. \"None\": error correction applied.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_ml_data_and_tuned_hyperparameters.html","id":"final-data-return","dir":"Articles","previous_headings":"Function Workflow","what":"Final Data Return","title":"Documentation for `get_ml_data_and_tuned_hyperparameters` Function","text":"processed data (rfData) best mtry hyperparameter (best.m) returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Documentation for get_prediction_plot Function","text":"get_prediction_plot function performs model building prediction dataset using random forest model. iterates multiple test repetitions, trains model training data, makes predictions test data. 
function generates histogram visualize distribution predictions outcome variable (LIVER).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Documentation for get_prediction_plot Function","text":"function accepts following input parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for get_prediction_plot Function","text":"function returns histogram plot visualizing predicted probabilities LIVER variable across test repetitions. plot shows distribution predictions (probabilities) classes (LIVER = “Y” “N”).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for get_prediction_plot Function","text":"Data NULL, function fetches formats data using get_Data_formatted_for_ml_and_best.m function. dataset divided training testing sets repetition (testReps). Undersample enabled, undersampling applied balance dataset. random forest model trained using training set repetition. model makes predictions test set. predicted probabilities stored repetition. predictions averaged across repetitions, histogram created visualize distribution predicted probabilities LIVER variable. 
histogram displayed using ggplot2, showing predicted probabilities LIVER outcome (coded “Y” “N”).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_prediction_plot.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for get_prediction_plot Function","text":"```r # Example function call get_prediction_plot( path_db = “path_to_db”, rat_studies = FALSE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = “Flip”, testReps = 5 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"get_repeat_dose_parallel_studyids function designed retrieve study IDs database correspond parallel-design studies involving repeat-dose toxicity. filters studies based specified design whether species involved rats.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"function accepts following parameters:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"function returns vector study IDs meet specified criteria. 
returned vector contains following: Study IDs: list study IDs match parallel design repeat-dose toxicity criteria (rat species, specified).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"Database Existence Check: function first checks database file exists provided path. , error raised. Database Connection: database connection established using sendigR package. connection database initialized using sendigR::initEnvironment(). Retrieve Parallel Study IDs: function uses sendigR::getStudiesSDESIGN() retrieve study IDs associated parallel design. Retrieve Repeat-Dose Studies: SQL query executed via sendigR::genericQuery() fetch study IDs associated repeat-dose toxicity. query looks studies specific TSPARMCD values related repeat-dose toxicity. Intersect Parallel Repeat-Dose Studies: study IDs obtained parallel design repeat-dose toxicity studies intersected identify common study IDs. Optionally Filter Rat Studies: rat_studies = TRUE, function retrieves study IDs involve rats species. done querying SPECIES field database filtering based presence “RAT”. 
Return Study IDs: final result vector study IDs meet filter conditions, including parallel design, repeat-dose toxicity, optionally, rat species.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"```r # Example without filtering rat studies study_ids <- get_repeat_dose_parallel_studyids(path_db = “path//database.sqlite”)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_repeat_dose_parallel_studyids.html","id":"example-with-filtering-for-rat-studies","dir":"Articles","previous_headings":"","what":"Example with filtering for rat studies","title":"Documentation for `get_repeat_dose_parallel_studyids` Function","text":"study_ids_rats <- get_repeat_dose_parallel_studyids(path_db = “path//database.sqlite”, rat_studies = TRUE)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"get_reprtree_from_rf_model Function Documentation","text":"get_reprtree_from_rf_model function designed train Random Forest model provided dataset generate representation tree (ReprTree) trained model. function supports various configurations data preprocessing, model hyperparameters, sampling strategies, including random undersampling. 
Additionally, allows error correction hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"get_reprtree_from_rf_model Function Documentation","text":"following table describes input parameters get_reprtree_from_rf_model function:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"get_reprtree_from_rf_model Function Documentation","text":"function generates representation tree (ReprTree) trained Random Forest model visualizes first tree (k=5) model. plot first tree Random Forest displayed. representation tree object generated explicitly returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"get_reprtree_from_rf_model Function Documentation","text":"Data parameter NULL, function calls get_Data_formatted_for_ml_and_best.m prepare data modeling. Data split training testing sets (70% training 30% testing). undersampling enabled (Undersample = TRUE), positive negative samples balanced training set undersampling majority class. Random Forest model trained using randomForest function. target variable Target_Organ, model uses best hyperparameter (best.m) determined beforehand. number trees forest set 500, proximity calculations enabled. ReprTree generated using reprtree::ReprTree function, creates representation trained Random Forest model. first tree (k=5) plotted using reprtree::plot.getTree. 
first tree Random Forest model visualized using reprtree::plot.getTree function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model .html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"get_reprtree_from_rf_model Function Documentation","text":"```r get_reprtree_from_rf_model( Data = my_data, path_db = “path//database”, rat_studies = TRUE, studyid_metadata = my_metadata, fake_study = FALSE, use_xpt_file = TRUE, Round = TRUE, Impute = TRUE, reps = 5, holdback = 0.3, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = “Flip” )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"get_reprtree_from_rf_model Function Documentation","text":"get_reprtree_from_rf_model function designed train Random Forest model provided dataset generate representation tree (ReprTree) trained model. function supports various configurations data preprocessing, model hyperparameters, sampling strategies, including random undersampling. Additionally, allows error correction hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"get_reprtree_from_rf_model Function Documentation","text":"following table describes input parameters get_reprtree_from_rf_model function:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"get_reprtree_from_rf_model Function Documentation","text":"function generates representation tree (ReprTree) trained Random Forest model visualizes first tree (k=5) model. plot first tree Random Forest displayed. 
representation tree object generated explicitly returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"get_reprtree_from_rf_model Function Documentation","text":"Data parameter NULL, function calls get_Data_formatted_for_ml_and_best.m prepare data modeling. Data split training testing sets (70% training 30% testing). undersampling enabled (Undersample = TRUE), positive negative samples balanced training set undersampling majority class. Random Forest model trained using randomForest function. target variable Target_Organ, model uses best hyperparameter (best.m) determined beforehand. number trees forest set 500, proximity calculations enabled. ReprTree generated using reprtree::ReprTree function, creates representation trained Random Forest model. first tree (k=5) plotted using reprtree::plot.getTree. first tree Random Forest model visualized using reprtree::plot.getTree function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_reprtree_from_rf_model.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"get_reprtree_from_rf_model Function Documentation","text":"```r get_reprtree_from_rf_model( Data = my_data, path_db = “path//database”, rat_studies = TRUE, studyid_metadata = my_metadata, fake_study = FALSE, use_xpt_file = TRUE, Round = TRUE, Impute = TRUE, reps = 5, holdback = 0.3, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = “Flip” )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"get_rf_input_param_list_output_cv_imp function prepares necessary data training evaluating Random Forest (RF) model 
cross-validation variable importance scores. handles various configurations, imputation, hyperparameter tuning, inclusion rat studies. function interacts either XPT file SQLite database extract harmonize study data, followed model training evaluation.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"function returns Random Forest model trained cross-validation (CV) includes list variable importance scores. Specifically, returns result get_rf_model_with_cv function, includes trained model, cross-validation results, feature importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"key-steps","dir":"Articles","previous_headings":"","what":"Key Steps","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"use_xpt_file TRUE, function loads data XPT file. fake_study TRUE, fetches data SQLite database filters based rat_studies. neither condition met, retrieves study IDs database using get_repeat_dose_parallel_studyids. function calls get_liver_om_lb_mi_tox_score_list calculate liver scores studies, harmonized using get_col_harmonized_scores_df. function prepares data Random Forest model training calling get_ml_data_and_tuned_hyperparameters. step involves imputation, optional hyperparameter tuning, data balancing. function calls get_rf_model_with_cv train evaluate Random Forest model cross-validation. model’s performance evaluated across multiple repetitions (testReps), option include top importance features. specified, function applies error correction method (either “Flip”, “Prune”, “None”). 
function returns trained Random Forest model along cross-validation results feature importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_input_param_list_output_cv_imp.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation for `get_rf_input_param_list_output_cv_imp` Function","text":"```r result <- get_rf_input_param_list_output_cv_imp( path_db = “path//database”, rat_studies = TRUE, studyid_metadata = metadata_df, fake_study = FALSE, use_xpt_file = FALSE, Round = TRUE, Impute = TRUE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = “Flip”, best.m = NULL, testReps = 5, indeterminateUpper = 0.9, indeterminateLower = 0.1, Type = “classification”, nTopImportance = 10 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Documentation: get_rf_model_with_cv","text":"get_rf_model_with_cv function implements random forest-based modeling pipeline cross-validation assess model performance. includes optional undersampling handling imbalanced data provides detailed metrics evaluating model accuracy.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"function-overview","dir":"Articles","previous_headings":"","what":"Function Overview","title":"Documentation: get_rf_model_with_cv","text":"","code":"get_rf_model_with_cv <- function(Data, Undersample = FALSE, best.m = NULL, # any numeric value or call function to get it testReps, # testReps must be at least 2; Type) { ... }"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"purpose","dir":"Articles","previous_headings":"","what":"Purpose","title":"Documentation: get_rf_model_with_cv","text":"function: Builds random forest model using randomForest package. 
Performs cross-validation evaluate model metrics. Optionally applies undersampling balance datasets. Returns aggregated performance metrics.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"outputs","dir":"Articles","previous_headings":"","what":"Outputs","title":"Documentation: get_rf_model_with_cv","text":"function returns list containing: performance_metrics: Aggregated performance metrics including sensitivity, specificity, accuracy. raw_results: Raw data sensitivity, specificity, accuracy cross-validation fold.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"data-preparation","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Data Preparation","title":"Documentation: get_rf_model_with_cv","text":"Splits data training testing subsets based specified testReps. Optionally applies undersampling balance training set.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"model-training","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Model Training","title":"Documentation: get_rf_model_with_cv","text":"Trains random forest model using randomForest package.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"prediction-and-metrics-calculation","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Prediction and Metrics Calculation","title":"Documentation: get_rf_model_with_cv","text":"Predicts probabilities test set. Computes metrics (sensitivity, specificity, accuracy, etc.) 
using caret package.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"performance-summary","dir":"Articles","previous_headings":"Cross-Validation Workflow","what":"Performance Summary","title":"Documentation: get_rf_model_with_cv","text":"Aggregates performance metrics across cross-validation folds.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Documentation: get_rf_model_with_cv","text":"","code":"# Load necessary libraries library(randomForest) library(caret) # Example dataset data(iris) Data <- iris[, -5] Data$Target_Organ <- ifelse(iris$Species == \"setosa\", 1, 0) # Run the function results <- get_rf_model_with_cv(Data = Data, Undersample = TRUE, best.m = 2, testReps = 5, Type = 2) # Print results print(results$performance_metrics)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_rf_model_with_cv.html","id":"conclusion","dir":"Articles","previous_headings":"","what":"Conclusion","title":"Documentation: get_rf_model_with_cv","text":"get_rf_model_with_cv function powerful tool evaluating random forest models cross-validation, especially datasets class imbalance. Adjust parameters Undersample best.m optimize performance specific dataset.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"get_rf_model_output_cv_imp function designed perform cross-validation Random Forest model, track performance metrics (sensitivity, specificity, accuracy), handle indeterminate predictions, compute feature importance based either Gini Accuracy. 
function outputs performance summaries feature importance rankings specified number test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"input-parameters","dir":"Articles","previous_headings":"","what":"Input Parameters","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function takes several input parameters control model’s training process, validation, feature importance calculations. table describing parameter:","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"output","dir":"Articles","previous_headings":"","what":"Output","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function returns list containing following elements: performance_metrics: vector aggregated performance metrics (e.g., sensitivity, specificity, accuracy, etc.). feature_importance: matrix containing importance top nTopImportance features, ordered importance score. raw_results: list containing raw results debugging analysis, including sensitivity, specificity, accuracy, Gini scores across test repetitions.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"data-preparation","dir":"Articles","previous_headings":"Key Steps","what":"1. Data Preparation","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"input data prepared creating copy scores_df called rfTestData, initialized NA values hold predictions test repetition. column names simplified numeric identifiers.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"cross-validation","dir":"Articles","previous_headings":"Key Steps","what":"2. 
Cross-Validation","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function iterates testReps repetitions perform cross-validation: dataset split training testing sets iteration. Undersample set TRUE, training set undersampled balance class distribution. Random Forest model trained training data. Predictions made test data stored rfTestData.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"handling-indeterminate-predictions","dir":"Articles","previous_headings":"Key Steps","what":"3. Handling Indeterminate Predictions","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"repetition, predictions probabilities indeterminateUpper indeterminateLower thresholds considered indeterminate. predictions replaced NA, proportion indeterminate predictions tracked.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"performance-metrics","dir":"Articles","previous_headings":"Key Steps","what":"4. Performance Metrics","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"test repetition, function computes confusion matrix using caret package extracts various performance metrics, including: Sensitivity Specificity Positive Predictive Value (PPV) Negative Predictive Value (NPV) Prevalence Accuracy metrics stored aggregated across test repetitions provide overall performance summary.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"feature-importance","dir":"Articles","previous_headings":"Key Steps","what":"5. Feature Importance","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"feature importance computed using randomForest::importance() function. 
importance scores aggregated repetitions, top nTopImportance features identified returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"return-results","dir":"Articles","previous_headings":"Key Steps","what":"6. Return Results","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"function returns list containing: Aggregated performance metrics Top nTopImportance features ranked importance score Raw results analysis (e.g., confusion matrix outputs)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"```r # Example usage function result <- get_rf_model_output_cv_imp( scores_df = your_data, Undersample = FALSE, best.m = 3, testReps = 5, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1, nTopImportance = 10 )","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"view-performance-metrics","dir":"Articles","previous_headings":"","what":"View performance metrics","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"print(result$performance_metrics)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_cv_imp.html","id":"view-top-features-by-importance","dir":"Articles","previous_headings":"","what":"View top features by importance","title":"Random Forest Model with Cross-Validation and Feature Importance","text":"print(result$feature_importance)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Random Forest Model with Cross-validation and 
Exclusion","text":"get_zone_exclusioned_rf_model_with_cv function implements Random Forest classification model cross-validation. provides tools evaluating model’s performance, including sensitivity, specificity, accuracy, metrics. function allows users handle indeterminate predictions includes option undersampling data, can particularly useful dealing imbalanced datasets. document explains use function, describes inputs, outputs, key steps involved model training evaluation process.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"function-purpose","dir":"Articles","previous_headings":"","what":"Function Purpose","title":"Random Forest Model with Cross-validation and Exclusion","text":"main goal function train Random Forest model evaluate using cross-validation. function: Performs cross-validation across specified number repetitions (testReps). Allows undersampling dataset address class imbalance required. Handles indeterminate predictions setting NA. Tracks performance metrics like sensitivity, specificity, positive predictive value (PPV), accuracy repetition. Provides aggregated summary performance metrics across repetitions. Additionally, function provides option adjust feature importance calculation, either using Gini index Mean Decrease Accuracy.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"parameters","dir":"Articles","previous_headings":"","what":"Parameters","title":"Random Forest Model with Cross-validation and Exclusion","text":"function accepts following parameters: Data (Data): data frame containing features target variable (Target_Organ) train model . Undersample (Undersample): boolean parameter indicates whether perform undersampling data balance class distribution. set TRUE, function undersample negative class match number positive class instances. 
Best Model Parameter (best.m): numeric value indicating best number variables (mtry) use split Random Forest model. value can provided manually determined optimization. Test Repetitions (testReps): number times repeat cross-validation process. value must least 2, function relies multiple test sets assess model performance. Indeterminate Prediction Thresholds (indeterminateUpper, indeterminateLower): parameters define upper lower bounds predicting “indeterminate” values. model’s predicted probability falls thresholds, prediction considered indeterminate set NA. Feature Importance Type (Type): integer indicating type feature importance use Random Forest model. Typically, either 1 “Mean Decrease Accuracy” 2 “Mean Decrease Gini”.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"model-workflow","dir":"Articles","previous_headings":"","what":"Model Workflow","title":"Random Forest Model with Cross-validation and Exclusion","text":"input data frame (Data) processed ensure formatted correctly model training. column names simplified numeric identifiers easier manipulation. dataset split training set test set, iteration using different random samples. Random Forest model trained training set, predictions made test set. Undersample set TRUE, function balances dataset undersampling negative class. positive class left unchanged, negative class reduced match size positive class. training model, predictions made test data. predicted probabilities stored later used calculate performance metrics. Indeterminate predictions identified based upper lower thresholds (indeterminateUpper indeterminateLower). predictions marked NA included performance calculations. Sensitivity: proportion true positives correctly identified model. Specificity: proportion true negatives correctly identified model. Accuracy: overall accuracy model predicting classes. PPV (Positive Predictive Value): proportion positive predictions correct. 
NPV (Negative Predictive Value): proportion negative predictions correct. Prevalence: proportion positive cases dataset. metrics computed using caret package’s confusion matrix function. completing test repetitions, function calculates mean performance metric across repetitions provide aggregated performance summary. results include individual metrics repetition overall performance summary.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"outputs","dir":"Articles","previous_headings":"","what":"Outputs","title":"Random Forest Model with Cross-validation and Exclusion","text":"function returns list two components: performance_metrics: vector containing aggregated performance metrics (mean sensitivity, specificity, accuracy, etc.) calculated across test repetitions. raw_results: list containing raw performance metrics repetition, including: sensitivity: vector sensitivity values test repetition. specificity: vector specificity values test repetition. accuracy: vector accuracy values test repetition. 
outputs can used evaluate model’s performance analyze results.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/get_zone_exclusioned_rf_model_with_cv.html","id":"example-usage","dir":"Articles","previous_headings":"","what":"Example Usage","title":"Random Forest Model with Cross-validation and Exclusion","text":"example use function:","code":"# Example dataset (replace with actual data) Data <- your_data_frame # Run the model with cross-validation and undersampling results <- get_zone_exclusioned_rf_model_with_cv(Data = Data, Undersample = TRUE, best.m = 5, testReps = 10, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1) # View the aggregated performance metrics print(results$performance_metrics) # Access raw results for further analysis print(results$raw_results)"},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Introduction.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"Introduction to SENDQSAR","text":"Standard Exchange Nonclinical Data (SEND), developed Clinical Data Interchange Standards Consortium (CDISC), offers structured electronic format organize exchange nonclinical study data among sponsor companies, contract research organizations (CROs), health authorities. Test results, examinations, observations subjects nonclinical study represented series SEND domains. domain defined collection logically related observations common topic. Typically, domain represented single dataset. - [Domain vs MI documentation still progress ## Need edited SEND study (identified IND &STUDYID), normalized toxicity score values calculated hepatotoxicity study endpoints, scores ranging 0 5. included animal body weight, liver weight, liver function test results (e.g., serum enzyme levels ALB, ALT, AST, etc.). Z-scores used standardize values relative control groups, ensuring comparability across different studies. 
Additionally, histopathological findings adjusted incidence severity incorporated ML model. details scoring system described elsewhere [citation cross-study article], toxicity scores based variety critical parameters, enabling robust assessment liver toxicity. short, initially, weight animal end dosing period normalized subtracting baseline weight measured first day dosing. Following , liver weight body weight ratio calculated animal. liver--body weight ratios normalized using Z-scores, comparisons made respective control group study. allowed standardized comparisons across different studies, reducing variability due differences animal size baseline conditions. laboratory test (LB) data, Z-scores also calculated six key enzymes commonly found blood serum indicative liver function: Bilirubin, Albumin (ALB), Alanine Aminotransferase (ALT), Alkaline Phosphatase (ALP), Aspartate Aminotransferase (AST), Gamma-Glutamyl Transferase (GGT). enzymes serve important biomarkers detecting liver damage dysfunction. addition biochemical data, Z-scores Microscopic Findings (histopathological findings) derived based incidence (frequency) severity liver-related lesions. Initially, score calculated purely severity findings, adjusted based incidence rate providing accurate reflection overall histopathological impact liver. body weight (BW), organ mass (OM), laboratory test (LB) domains, absolute value Z-scores used assign toxicity scores. scoring system follows: Z-scores 1 scored 0 (toxicity signal), Z-scores 1 2 scored 1 (weak signal), Z-scores 2 3 scored 2 (moderate signal), Z-scores 3 scored 3 (strong signal). binning system effectively rounds absolute value Z-scores cases, simplifying categorization toxicity signals. incorporating standardized scores across body weight, organ mass, laboratory data, histopathology findings, comprehensive quantifiable framework assessing hepatotoxicity developed. 
framework facilitates application machine learning models predict liver toxicity toxicology studies, enhancing reproducibility interpretability toxicological risk assessments. ** Need clarify reasons 0-5 MI rests 0-3. weight animal end dosing period normalized subtracting baseline weight measured first day dosing.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Introduction.html","id":"required-libraries","dir":"Articles","previous_headings":"Introduction","what":"Required Libraries","title":"Introduction to SENDQSAR","text":"function requires following R packages: DBI RSQLite data.table dplyr haven tidyr stringr ##Notes function assumes standard SEND domains column names. non-standard data, adjustments may needed. Check database .xpt files ensure compatibility function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/articles/Introduction.html","id":"see-also","dir":"Articles","previous_headings":"Introduction","what":"See Also","title":"Introduction to SENDQSAR","text":"DBI RSQLite data.table SENDsanitizer","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Md Aminul Islam Prodhan. Author, maintainer. Kevin Snyder. Author.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Prodhan M, Snyder K (2025). SENDQSAR: Building Quantitative Structure-Activity Relationship model leveraging SEND Database. 
R package version 0.0.0.9000, https://github.com/aminuldu07/SENDQSAR, https://aminuldu07.github.io/SENDQSAR/.","code":"@Manual{, title = {SENDQSAR: Building a Quantitative Structure-Activity Relationship model leveraging SEND Database}, author = {Md Aminul Islam Prodhan and Kevin Snyder}, year = {2025}, note = {R package version 0.0.0.9000, https://github.com/aminuldu07/SENDQSAR}, url = {https://aminuldu07.github.io/SENDQSAR/}, }"},{"path":[]},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"about","dir":"","previous_headings":"","what":"About","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"package facilitates developing Quantitative Structure-Activity Relationship (QSAR) models using SEND database. streamlines data acquisition, preprocessing, descriptor calculation, model evaluation, enabling researchers efficiently explore molecular descriptors create robust predictive models.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"features","dir":"","previous_headings":"","what":"Features","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"Automated Data Processing: Simplifies data acquisition preprocessing steps. Comprehensive Analysis: Provides z-score calculations various parameters body weight, liver--body weight ratio, laboratory tests. Machine Learning Integration: Supports classification modeling, hyperparameter tuning, performance evaluation. 
Visualization Tools: Includes histograms, bar plots, AUC curves better data interpretation.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"data-acquisition-and-processing","dir":"","previous_headings":"Functions Overview","what":"Data Acquisition and Processing","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_compile_data - Fetches data database specified database path structured data frame analysis. get_bw_score - Calculates body weight (BW) z-scores animal. get_livertobw_zscore - Computes liver--body weight z-scores. get_lb_score - Calculates z-scores laboratory test (LB) results. get_mi_score - Computes z-scores microscopic findings (MI). get_liver_om_lb_mi_tox_score_list - Combines z-scores LB, MI, liver--BW single data frame. get_col_harmonized_scores_df - Harmonizes column names across studies.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"machine-learning-preparation-and-modeling","dir":"","previous_headings":"Functions Overview","what":"Machine Learning Preparation and Modeling","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_ml_data_and_tuned_hyperparameters - Prepares data tunes hyperparameters machine learning. get_rf_model_with_cv - Builds random forest model cross-validation outputs performance metrics. get_zone_exclusioned_rf_model_with_cv - Introduces indeterminate zone improved classification accuracy. get_imp_features_from_rf_model_with_cv - Computes feature importance model interpretation. 
get_auc_curve_with_rf_model - Generates AUC curves evaluate model performance.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"visualization-and-reporting","dir":"","previous_headings":"Functions Overview","what":"Visualization and Reporting","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_histogram_barplot - Creates bar plots target variable classes. get_reprtree_from_rf_model - Builds representative decision trees interpretability. get_prediction_plot - Visualizes prediction probabilities histograms.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"automated-pipelines","dir":"","previous_headings":"Functions Overview","what":"Automated Pipelines","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"get_Data_formatted_for_ml_and_best.m - Formats data machine learning pipelines. get_rf_input_param_list_output_cv_imp - Automates preprocessing, modeling, evaluation one step. get_zone_exclusioned_rf_model_cv_imp - Similar function, excludes uncertain predictions based thresholds.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"workflow","dir":"","previous_headings":"","what":"Workflow","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"Input Database Path: Provide database path containing nonclinical study results STUDYID. Preprocessing: Use functions 1-8 clean, harmonize, prepare data. Model Building: Employ machine learning functions (9-18) training, validation, evaluation. 
Visualization: Generate plots performance metrics better interpretation.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"dependencies","dir":"","previous_headings":"","what":"Dependencies","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"randomForest ROCR ggplot2 reprtree","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"# Install from GitHub devtools::install_github(\"aminuldu07/SENDQSAR\")"},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-1-basic-data-compilation","dir":"","previous_headings":"Examples","what":"Example 1: Basic Data Compilation","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"library(SENDQSAR) data <- get_compile_data(\"/path/to/database\")"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-2-z-score-calculation","dir":"","previous_headings":"Examples","what":"Example 2: Z-Score Calculation","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"bw_scores <- get_bw_score(data) liver_scores <- get_livertobw_zscore(data)"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-3-machine-learning-model","dir":"","previous_headings":"Examples","what":"Example 3: Machine Learning Model","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"model <- get_rf_model_with_cv(data, n_repeats=10) print(model$confusion_matrix)"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"example-4-visualization","dir":"","previous_headings":"Examples","what":"Example 4: Visualization","title":"Building a 
Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"","code":"get_histogram_barplot(data, target_col=\"target_variable\")"},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"contribution","dir":"","previous_headings":"","what":"Contribution","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"Contributions welcome! Feel free submit issues pull requests via GitHub.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"license","dir":"","previous_headings":"","what":"License","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"project licensed MIT License - see LICENSE file details.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/index.html","id":"contact","dir":"","previous_headings":"","what":"Contact","title":"Building a Quantitative Structure-Activity Relationship model leveraging SEND Database","text":"information, visit project GitHub Page contact email@example.com.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"function trains Random Forest model, computes ROC curve, calculates AUC (Area Curve). 
allows various preprocessing options, imputation, rounding, undersampling, hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"","code":"get_auc_curve_with_rf_model( Data = NULL, path_db = NULL, rat_studies = FALSE, studyid_metadata, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, best.m = NULL, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method, output_individual_scores = TRUE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"Data data frame containing training data. NULL, data fetched database. path_db string representing path SQLite database used fetch data Data NULL. rat_studies Logical; whether filter rat studies. Defaults FALSE. studyid_metadata data frame containing metadata associated study IDs. fake_study Logical; whether use fake study IDs data simulation. Defaults FALSE. use_xpt_file Logical; whether use XPT file input data. Defaults FALSE. Round Logical; whether round numerical values. Defaults FALSE. Impute Logical; whether perform imputation missing values. Defaults FALSE. best.m 'mtry' hyperparameter Random Forest. NULL, determined function. reps numeric value indicating number repetitions cross-validation. Defaults numeric value. holdback Numeric; either 1 fraction value (e.g., 0.75) holdback cross-validation. Undersample Logical; whether perform undersampling. Defaults FALSE. hyperparameter_tuning Logical; whether perform hyperparameter tuning. Defaults FALSE. 
error_correction_method Character; one \"Flip\", \"Prune\", \"None\", specifying method error correction. output_individual_scores Logical; whether output individual scores. Defaults TRUE. output_zscore_by_USUBJID Logical; whether output z-scores subject ID. Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"function return explicit value. generates: AUC (Area Curve) printed console. ROC curve plot calculated AUC value. Various performance metrics (e.g., True Positive Rate, False Positive Rate), displayed plot.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"function prepares data training Random Forest model first fetching data SQLite database generating synthetic data (fake_study TRUE). processes data using various options imputation, rounding, undersampling. model trained using Random Forest algorithm, performance evaluated via ROC curve AUC metric. function also allows hyperparameter tuning error correction. 
training model, predictions made, AUC calculated visualized ROC curve plot.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_auc_curve_with_rf_model.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute and Plot AUC Curve with Random Forest Model — get_auc_curve_with_rf_model","text":"","code":"# Example 1: Using real data from the database get_auc_curve_with_rf_model(Data = NULL, path_db = \"path/to/database.db\", rat_studies = TRUE, reps = 10, holdback = 0.75, error_correction_method = \"Prune\") #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path! # Example 2: Using synthetic data with fake study IDs get_auc_curve_with_rf_model(Data = NULL, fake_study = TRUE, reps = 5, holdback = 0.8, error_correction_method = \"Flip\") #> Error in .local(drv, ...): length(dbname) == 1 is not TRUE"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"get_bw_score function calculates Body Weight (BW) Z-score specified studyid using data provided database .xpt file. 
supports optional parameters customize analysis offers flexibility return individual Z-score USUBJID (unique subject identifier).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"","code":"get_bw_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"studyid Mandatory, character studyid BW Z-score calculated. Required use_xpt_file = FALSE. use_xpt_file = TRUE, studyid ignored, .xpt files specified folder (path_db) analyzed. path_db Mandatory, character path SQLite database file folder containing .xpt files (use_xpt_file = TRUE). fake_study Optional, Boolean Indicates whether study generated SENDsanitizer package. Default FALSE. use_xpt_file Mandatory, Boolean TRUE, function processes .xpt files folder specified path_db. FALSE, uses SQLite database file path_db requires valid studyid. Default FALSE. master_compiledata Optional, character master_compiledata provided (.e., NULL), function automatically call get_compile_data function calculate . return_individual_scores Optional, Boolean TRUE, function returns individual scores domain averaging scores subjects/animals (USUBJID) study. Default FALSE. return_zscore_by_USUBJID Optional, Boolean TRUE, function returns Z-scores animal/subject USUBJID. 
Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"data.frame containing calculated BW Z-scores. structure output depends provided parameters: return_individual_scores = TRUE: Returns averaged Z-scores domain per studyid. return_zscore_by_USUBJID = TRUE: Returns Z-score animal/subject USUBJID domain per studyid. Otherwise, summarized BW score specified studyid.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_bw_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate Body Weight Z-score for a Given STUDYID — get_bw_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example 1: Basic usage get_bw_score(studyid = '1234123', path_db = 'path/to/database.db') # Example 2: Include individual scores get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_individual_scores = TRUE) # Example 3: Include z-scores by USUBJID get_bw_score(studyid = '1234123', path_db = 'path/to/database.db', return_zscore_by_USUBJID = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":null,"dir":"Reference","previous_headings":"","what":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"function harmonizes liver score data cleaning column names, replacing missing values zeros, optionally rounding specific columns. 
function also identifies harmonizes synonyms, removes unnecessary columns, reorders data based column sums.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"","code":"get_col_harmonized_scores_df(liver_score_data_frame, Round = FALSE)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"liver_score_data_frame data frame containing liver score data. data frame column names may require harmonization. Round logical value indicating whether data rounded. TRUE, certain liver-related columns floored capped, histology-related columns ceiled. Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"data frame harmonized liver scores, optional rounding, columns reordered based sums.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"function performs following operations: Harmonizes column names replacing spaces, commas, slashes dots. Replaces missing values (NA) zero. Identifies harmonizes synonym columns, replacing values higher value synonyms. Removes specific unwanted columns 'INFILTRATE', 'UNREMARKABLE', 'THIKENING', 'POSITIVE'. Optionally rounds liver score columns flooring capping 5, histology-related columns ceiling. 
Reorders columns based sum values.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_col_harmonized_scores_df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"get_col_harmonized_scores_df — get_col_harmonized_scores_df","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage result <- get_col_harmonized_scores_df(liver_score_data_frame = liver_scores, Round = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"function retrieves compiles data given study ID either SQLite database XPT file.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"","code":"get_compile_data( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"studyid Character. Study ID number. Defaults NULL. NULL, available studies may retrieved (behavior depends database structure). path_db Character. Path SQLite database file. Mandatory. fake_study Logical. Whether study data generated SENDsanitizer package. Defaults FALSE. use_xpt_file Logical. Whether retrieve study data XPT file format instead database. 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"data frame containing compiled study data. structure returned data frame depends database XPT file contents.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_compile_data.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve Compiled Data from SQLite Database or XPT File — get_compile_data","text":"","code":"if (FALSE) { # \\dontrun{ # Retrieve data for a specific study ID from the database get_compile_data(studyid = '1234123', path_db = 'path/to/database.db') # Retrieve data from an XPT file get_compile_data(path_db = 'path/to/file.xpt', use_xpt_file = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"function processes data given SQLite database XPT file, calculates liver toxicity scores, prepares data machine learning models. 
can also tune hyperparameters apply error correction methods.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"","code":"get_Data_formatted_for_ml_and_best.m( path_db, rat_studies = FALSE, studyid_metadata = NULL, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"path_db character string representing path SQLite database XPT file. rat_studies logical flag filter rat studies (default FALSE). studyid_metadata data frame containing metadata study IDs. NULL, metadata generated (default NULL). fake_study logical flag use fake study data (default FALSE). use_xpt_file logical flag indicate whether use XPT file instead SQLite database (default FALSE). Round logical flag round liver toxicity scores (default FALSE). Impute logical flag impute missing values dataset (default FALSE). reps integer specifying number repetitions cross-validation. holdback numeric value indicating fraction data hold back validation. Undersample logical flag undersample majority class (default FALSE). hyperparameter_tuning logical flag perform hyperparameter tuning (default FALSE). error_correction_method character string specifying error correction method. 
Must one 'Flip', 'Prune', 'None'.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"list containing: Data data frame containing preprocessed data ready machine learning. best.m best machine learning model hyperparameter tuning, applicable.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"function performs several key steps: Retrieves study IDs SQLite database XPT file. Generates uses provided study metadata, including random assignment \"Target_Organ\" values (either \"Liver\" \"not_Liver\"). Calculates liver toxicity scores using get_liver_om_lb_mi_tox_score_list function. Harmonizes calculated scores using get_col_harmonized_scores_df function. Prepares data machine learning tunes hyperparameters (enabled) using get_ml_data_and_tuned_hyperparameters function. 
Returns processed data best model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_Data_formatted_for_ml_and_best.m.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieve and Preprocess Data for Machine Learning Models — get_Data_formatted_for_ml_and_best.m","text":"","code":"if (FALSE) { # \\dontrun{ result <- get_Data_formatted_for_ml_and_best.m( path_db = \"path/to/database.db\", rat_studies = TRUE, reps = 5, holdback = 0.2, error_correction_method = \"Flip\" ) # Access the processed data and the best model processed_data <- result$Data best_model <- result$best.m } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"function generates bar plot comparing liver-related findings non-liver-related findings, returns processed data analysis. 
function can fetch data SQLite database, provided XPT file, simulate data fake_study set TRUE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"","code":"get_histogram_barplot( Data = NULL, generateBarPlot = FALSE, path_db = FALSE, rat_studies = FALSE, studyid_metadata, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, output_individual_scores = TRUE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"Data data frame containing liver-related scores. NULL, function attempt generate fetch data database file. generateBarPlot logical flag (default = FALSE). TRUE, generates bar plot. FALSE, returns processed data. path_db character string representing path SQLite database. Required use_xpt_file FALSE fake_study FALSE. rat_studies logical flag (default = FALSE) filter rat studies fetching data database. studyid_metadata data frame containing metadata associated study IDs. Required fake_study FALSE real data fetched. fake_study logical flag (default = FALSE). TRUE, function simulates study data instead fetching database. use_xpt_file logical flag (default = FALSE). TRUE, function use XPT file fetch data, instead relying database. Round logical flag (default = FALSE). Whether round liver scores. output_individual_scores logical flag (default = TRUE). Whether output individual scores aggregated scores. output_zscore_by_USUBJID logical flag (default = FALSE). 
Whether output z-scores USUBJID (unique subject identifier).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"generateBarPlot = TRUE, ggplot2 bar plot object returned displaying average scores liver-related findings versus non-liver-related findings. generateBarPlot = FALSE, data frame (plotData) containing calculated values finding, liver status (LIVER), mean values (Value) returned.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"data provided, function attempts fetch data SQLite database simulate data based fake_study flag. function also supports use XPT files allows customization study filtering rat_studies studyid_metadata parameters. 
generating plot, function compares liver-related findings findings, displaying average scores finding bar plot.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_histogram_barplot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate Histogram or Bar Plot for Liver-Related Scores — get_histogram_barplot","text":"","code":"# Example 1: Generate a bar plot with fake study data get_histogram_barplot(generateBarPlot = TRUE, fake_study = TRUE) #> Error in path.expand(path): invalid 'path' argument # Example 2: Get processed data without generating a plot data <- get_histogram_barplot(generateBarPlot = FALSE, fake_study = FALSE, path_db = \"path/to/db\") #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path!"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"function performs cross-validation test repetitions random forest model, calculates feature importance using Gini importance, returns top n important features.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"","code":"get_imp_features_from_rf_model_with_cv( Data = NULL, Undersample = FALSE, best.m = NULL, testReps, Type, nTopImportance 
)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"Data data frame containing training data (rows samples, columns features). first column assumed target variable. Undersample logical value indicating whether apply -sampling balance classes training data. Default FALSE. best.m numeric value representing number variables consider split Random Forest model (function determine ). Default NULL. testReps numeric value indicating number test repetitions (must least 2). Type numeric value indicating type importance calculated. 1 Mean Decrease Accuracy 2 Mean Decrease Gini. nTopImportance numeric value indicating number top important features return based importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"list containing: gini_scores matrix Gini importance scores feature across different cross-validation iterations. matrix rows representing features columns representing test iterations.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"function trains Random Forest model using cross-validation specified repetitions calculates feature importance using Gini importance scores. function also supports optional -sampling balance class distribution training set. 
function performs following steps: Initializes performance metric trackers. Prepares input data cross-validation. Performs cross-validation, repetition involves training model subset data testing remaining data. Optionally applies -sampling training data. Trains Random Forest model fold calculates Gini importance scores. Aggregates sorts Gini importance scores identify top features. Plots importance top features.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_imp_features_from_rf_model_with_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Important Features from Random Forest Model with Cross-Validation — get_imp_features_from_rf_model_with_cv","text":"","code":"# Example of calling the function result <- get_imp_features_from_rf_model_with_cv( Data = scores_df, Undersample = FALSE, best.m = 3, testReps = 5, Type = 2, nTopImportance = 10 ) #> Error: object 'scores_df' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Get LB Score for a Given Study ID — get_lb_score","title":"Get LB Score for a Given Study ID — get_lb_score","text":"function computes LB score given study ID using data stored specified database. 
offers various optional parameters customize output, whether return individual scores Z-scores USUBJID.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get LB Score for a Given Study ID — get_lb_score","text":"","code":"get_lb_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get LB Score for a Given Study ID — get_lb_score","text":"studyid Mandatory, character study ID number LB score calculated. path_db Mandatory, character path database containing necessary data calculation. fake_study Optional, boolean Indicates whether study generated SENDsanitizer package. Defaults FALSE. use_xpt_file Mandatory, character Specifies path XPT (SAS transport) file used study. master_compiledata Mandatory, character path compiled master dataset used calculate LB score. return_individual_scores Optional, boolean TRUE, function return individual scores subject. Defaults FALSE. return_zscore_by_USUBJID Optional, boolean TRUE, function return Z-scores USUBJID. 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get LB Score for a Given Study ID — get_lb_score","text":"numeric calculated LB score based provided data parameters.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_lb_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get LB Score for a Given Study ID — get_lb_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage of the function get_lb_score(studyid='1234123', path_db='path/to/database.db') } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"function computes liver-to-body-weight (Liver:BW) ratios corresponding z-scores study data. supports retrieving data SQLite databases .xpt files provides flexible options output formats.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"","code":"get_livertobw_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, bwzscore_BW = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"studyid Optional, character. Study ID calculations performed. NULL, data studies database used. 
path_db Mandatory, character. Path SQLite database directory containing .xpt files. fake_study Optional, logical. Indicates whether study fake/test study generated SENDsanitizer package. Default FALSE. use_xpt_file Optional, logical. Specifies whether use .xpt files instead SQLite database. Default FALSE. master_compiledata Optional, data.frame. Precompiled dataset study information. NULL, function fetches data using get_compile_data. bwzscore_BW Optional, data.frame. Precomputed body weight z-scores. NULL, calculated using get_bw_score. return_individual_scores Optional, logical. TRUE, returns individual z-scores averaged study. Default FALSE. return_zscore_by_USUBJID Optional, logical. TRUE, returns z-scores grouped USUBJID. Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"data frame containing liver-to-body-weight z-scores: Averaged study (default). Individual scores averaged study (return_individual_scores = TRUE). 
Z-scores grouped USUBJID (return_zscore_by_USUBJID = TRUE).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_livertobw_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculate Liver-to-Body-Weight Scores and Z-Scores — get_livertobw_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example 1: Default averaged scores result <- get_livertobw_score( studyid = '1234123', path_db = 'path/to/database.db' ) head(result) # Example 2: Individual scores by study result <- get_livertobw_score( studyid = '1234123', path_db = 'path/to/database.db', return_individual_scores = TRUE ) head(result) # Example 3: Z-scores by USUBJID result <- get_livertobw_score( studyid = '1234123', path_db = 'path/to/database.db', return_zscore_by_USUBJID = TRUE ) head(result) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":null,"dir":"Reference","previous_headings":"","what":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"function processes liver organ toxicity scores, body weight z-scores, related metrics set studies XPT files. 
can output individual scores, z-scores USUBJID, averaged scores multiple studies, handles errors processing steps.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"","code":"get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = FALSE, path_db, fake_study = FALSE, use_xpt_file = FALSE, output_individual_scores = FALSE, output_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"studyid_or_studyids character vector single study ID process. multiple studies provided, function processes study sequentially. (Mandatory) path_db character string specifying path database directory containing data files. (Mandatory) fake_study boolean flag indicating study data simulated (TRUE) real (FALSE). Default FALSE. (Optional) use_xpt_file boolean flag indicating whether use XPT file study data. Default FALSE. (Mandatory) output_individual_scores boolean flag indicating whether individual scores returned (TRUE) averaged scores (FALSE). Default FALSE. (Optional) output_zscore_by_USUBJID boolean flag indicating whether output z-scores USUBJID (TRUE) averaged scores (FALSE). Default FALSE. (Optional) multiple_xpt_folder character string specifying path folder containing multiple XPT files. (Optional)","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"data frame containing calculated scores study. 
type result depends flags passed: output_individual_scores TRUE, data frame individual scores study returned. output_zscore_by_USUBJID TRUE, data frame z-scores USUBJID study returned. neither flag set, function returns data frame averaged scores study.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_liver_om_lb_mi_tox_score_list.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"get_liver_om_lb_mi_tox_score_list — get_liver_om_lb_mi_tox_score_list","text":"","code":"if (FALSE) { # \\dontrun{ # Get averaged scores for a single study result <- get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = \"Study_001\", path_db = \"path/to/database\" ) # Get individual scores for multiple studies result_individual_scores <- get_liver_om_lb_mi_tox_score_list( studyid_or_studyids = c(\"Study_001\", \"Study_002\"), path_db = \"path/to/database\", output_individual_scores = TRUE ) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":null,"dir":"Reference","previous_headings":"","what":"Get MI score for a given studyid — get_mi_score","title":"Get MI score for a given studyid — get_mi_score","text":"function calculates MI score given study using provided study ID database. allows flexibility terms returning individual scores, Z-scores, . 
function compatible SENDsanitizer-generated datasets standard clinical study databases.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get MI score for a given studyid — get_mi_score","text":"","code":"get_mi_score( studyid = NULL, path_db, fake_study = FALSE, use_xpt_file = FALSE, master_compiledata = NULL, return_individual_scores = FALSE, return_zscore_by_USUBJID = FALSE )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get MI score for a given studyid — get_mi_score","text":"studyid Mandatory, character study ID number clinical study. path_db Mandatory, character file path database contains study data. fake_study Optional, logical TRUE, function assumes study data generated SENDsanitizer package. Default FALSE. use_xpt_file Mandatory, logical TRUE, indicates XPT file used instead database analysis. master_compiledata Mandatory, character path master compile data, often used supplement compile data multiple sources. return_individual_scores Optional, logical TRUE, function returns individual MI scores participant. Default FALSE. return_zscore_by_USUBJID Optional, logical TRUE, function returns Z-scores USUBJID (subject identifier). Default FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get MI score for a given studyid — get_mi_score","text":"numeric vector data frame containing MI scores. 
format depends specified parameters, individual scores aggregated scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_mi_score.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get MI score for a given studyid — get_mi_score","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage of get_mi_score get_mi_score(studyid = '1234123', path_db = 'path/to/database.db') } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"get_ml_data_and_tuned_hyperparameters function processes input data metadata prepare data random forest analysis. includes steps data preprocessing, optional imputation, rounding, error correction, hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"","code":"get_ml_data_and_tuned_hyperparameters( Data, studyid_metadata, Impute = FALSE, Round = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method = NULL )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"Data data.frame. Input data frame containing scores, typically named scores_df. studyid_metadata data.frame. Metadata containing STUDYID values, used joining Data. Impute logical. 
Indicates whether impute missing values dataset using random forest imputation. Default FALSE. Round logical. Specifies whether round specific numerical columns according predefined rules. Default FALSE. reps integer. Number repetitions cross-validation. value 0 skips repetition. holdback numeric. Fraction data hold back testing. value 1 performs leave-one-out cross-validation. Undersample logical. Indicates whether undersample training data balance target classes. Default FALSE. hyperparameter_tuning logical. Specifies whether perform hyperparameter tuning random forest model. Default FALSE. error_correction_method character. Specifies method error correction. Can \"Flip\", \"Prune\", NULL. Default NULL.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"list containing: rfData final processed data preprocessing error correction. 
best.m best mtry hyperparameter determined random forest model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_ml_data_and_tuned_hyperparameters.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Random Forest Data and Tuned Hyperparameters — get_ml_data_and_tuned_hyperparameters","text":"","code":"# Example usage: Data <- scores_df #> Error: object 'scores_df' not found studyid_metadata <- read.csv(\"path/to/study_metadata.csv\") #> Warning: cannot open file 'path/to/study_metadata.csv': No such file or directory #> Error in file(file, \"rt\"): cannot open the connection result <- get_ml_data_and_tuned_hyperparameters( Data = Data, studyid_metadata = studyid_metadata, Impute = TRUE, Round = TRUE, reps = 10, holdback = 0.75, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = \"Flip\" ) #> Error: object 'Data' not found rfData <- result$rfData #> Error: object 'result' not found best_mtry <- result$best.m #> Error: object 'result' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"function performs model building prediction using random forest algorithm. iterates multiple test repetitions, training model training data predicting test data. 
predictions made, histogram plot generated visualize distribution predicted probabilities outcome variable (LIVER).","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"","code":"get_prediction_plot( Data = NULL, path_db, rat_studies = FALSE, studyid_metadata = NULL, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method, testReps )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"Data data frame containing dataset use training testing. NULL, function attempt fetch format data database using get_Data_formatted_for_ml_and_best.m function. path_db string indicating path database contains dataset. rat_studies logical flag indicating whether use rat studies data. Defaults FALSE. studyid_metadata data frame containing metadata related study IDs. Defaults NULL. fake_study logical flag indicating whether use fake study data. Defaults FALSE. use_xpt_file logical flag indicating whether use XPT file. Defaults FALSE. Round logical flag indicating whether round predictions. Defaults FALSE. Impute logical flag indicating whether impute missing values. Defaults FALSE. reps integer specifying number repetitions cross-validation. holdback numeric value indicating proportion data hold back testing cross-validation. Undersample logical flag indicating whether perform undersampling dataset balance classes. Defaults FALSE. hyperparameter_tuning logical flag indicating whether perform hyperparameter tuning. Defaults FALSE. 
error_correction_method string specifying error correction method used. Possible values \"Flip\", \"Prune\", \"None\". testReps integer specifying number test repetitions model evaluation.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"ggplot object representing histogram predicted probabilities LIVER variable across test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"function works follows: Data NULL, function fetches data best model configuration calling get_Data_formatted_for_ml_and_best.m function. dataset divided training test sets repetition (testReps). Undersample enabled, undersampling applied balance dataset. random forest model trained training data predictions made test data. 
predictions averaged test repetitions histogram plotted visualize distribution predicted probabilities LIVER.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_prediction_plot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generate Prediction Plot for Random Forest Model — get_prediction_plot","text":"","code":"# Example function call get_prediction_plot( path_db = \"path_to_db\", rat_studies = FALSE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = \"Flip\", testReps = 5 ) #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path!"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"function retrieves study IDs database correspond parallel-design studies involving repeat-dose toxicity. optionally filters studies rat species.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"","code":"get_repeat_dose_parallel_studyids(path_db, rat_studies = FALSE)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"path_db character string representing file path SQLite database. required parameter. rat_studies logical flag indicating whether filter studies rats . 
Defaults FALSE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"vector study IDs meet specified criteria. includes: Study IDs match parallel design repeat-dose toxicity criteria. Optionally, study IDs match rat species rat_studies = TRUE.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_repeat_dose_parallel_studyids.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Repeat Dose Parallel Study IDs — get_repeat_dose_parallel_studyids","text":"","code":"if (FALSE) { # \\dontrun{ # Example without filtering for rat studies study_ids <- get_repeat_dose_parallel_studyids(path_db = \"path/to/database.sqlite\") # Example with filtering for rat studies study_ids_rats <- get_repeat_dose_parallel_studyids(path_db = \"path/to/database.sqlite\", rat_studies = TRUE) } # }"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"function trains Random Forest model provided dataset generates representation tree (ReprTree) trained model. supports various preprocessing configurations, model hyperparameters, sampling strategies, including random undersampling. 
function also allows error correction hyperparameter tuning.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"","code":"get_reprtree_from_rf_model( Data = NULL, path_db, rat_studies = FALSE, studyid_metadata = NULL, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"Data data frame containing dataset train Random Forest model. NULL, data fetched using get_Data_formatted_for_ml_and_best.m function. path_db character string representing path database used fetching processing data. rat_studies logical flag indicating whether rat studies used (default: FALSE). studyid_metadata data frame containing metadata related study IDs (default: NULL). fake_study logical flag indicating whether use fake study data (default: FALSE). use_xpt_file logical flag indicating whether use XPT file format data input (default: FALSE). Round logical flag indicating whether round data processing (default: FALSE). Impute logical flag indicating whether impute missing values data (default: FALSE). reps integer specifying number repetitions perform cross-validation resampling. holdback numeric value representing fraction data hold back testing. Undersample logical flag indicating whether undersampling applied balance dataset (default: FALSE). hyperparameter_tuning logical flag indicating whether hyperparameter tuning performed (default: FALSE). 
error_correction_method character string specifying method error correction. Must one 'Flip', 'Prune', 'None'.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"plot first tree Random Forest model displayed. function return ReprTree object explicitly, generated used plotting.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"function performs following steps: Data Preparation: Data NULL, fetched using get_Data_formatted_for_ml_and_best.m function. Data split training (70%) testing (30%) sets. Undersample TRUE, training data balanced using undersampling. Model Training: Random Forest model trained using randomForest::randomForest function. target variable Target_Organ, model uses best hyperparameter (best.m). number trees set 500. ReprTree Generation: reprtree::ReprTree function used generate representation tree trained Random Forest model. 
Visualization: first tree Random Forest model plotted using reprtree::plot.getTree function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_reprtree_from_rf_model.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Representation Tree from Random Forest Model — get_reprtree_from_rf_model","text":"","code":"get_reprtree_from_rf_model( Data = my_data, path_db = \"path/to/database\", rat_studies = TRUE, studyid_metadata = my_metadata, fake_study = FALSE, use_xpt_file = TRUE, Round = TRUE, Impute = TRUE, reps = 5, holdback = 0.3, Undersample = TRUE, hyperparameter_tuning = FALSE, error_correction_method = \"Flip\" ) #> Error: object 'my_data' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":null,"dir":"Reference","previous_headings":"","what":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"function prepares data training Random Forest (RF) model cross-validation, handles imputation, hyperparameter tuning, evaluates model's performance. 
supports real fake study data, options rat studies, error correction, feature importance selection.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"","code":"get_rf_input_param_list_output_cv_imp( path_db, rat_studies = FALSE, studyid_metadata, fake_study = FALSE, use_xpt_file = FALSE, Round = FALSE, Impute = FALSE, reps, holdback, Undersample = FALSE, hyperparameter_tuning = FALSE, error_correction_method, best.m = NULL, testReps, indeterminateUpper, indeterminateLower, Type, nTopImportance )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"path_db character string specifying path SQLite database directory containing XPT file. rat_studies logical value indicating whether filter rat studies. Default FALSE. studyid_metadata data frame containing metadata studies. fake_study logical value indicating whether use fake study data. Default FALSE. use_xpt_file logical value indicating whether use XPT file data. Default FALSE. Round logical value indicating whether round liver scores. Default FALSE. Impute logical value indicating whether impute missing values. Default FALSE. reps integer specifying number repetitions model evaluation. holdback numeric value specifying proportion data hold back validation. Undersample logical value indicating whether undersample data balance classes. Default FALSE. hyperparameter_tuning logical value indicating whether tune Random Forest model's hyperparameters. Default FALSE. 
error_correction_method character string specifying error correction method. Options 'Flip', 'Prune', 'None'. best.m numeric value specifying number trees Random Forest model. NULL, function determines automatically. testReps integer specifying number test repetitions model evaluation. indeterminateUpper numeric value upper threshold indeterminate predictions. indeterminateLower numeric value lower threshold indeterminate predictions. Type character string specifying type Random Forest model use. Options include 'classification' 'regression'. nTopImportance integer specifying number top important features consider model.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"list containing trained Random Forest model, cross-validation results, feature importance scores. list returned get_rf_model_with_cv function.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"function performs following steps: Fetches study data based specified parameters. Calculates liver scores harmonizes data. Prepares data machine learning, including imputation optional hyperparameter tuning. Trains evaluates Random Forest model cross-validation. 
Applies error correction (specified) selects important features.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_input_param_list_output_cv_imp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Prepare and Evaluate Random Forest Model with Cross-Validation and Feature Importance — get_rf_input_param_list_output_cv_imp","text":"","code":"# Example usage of the function result <- get_rf_input_param_list_output_cv_imp( path_db = \"path/to/database\", rat_studies = TRUE, studyid_metadata = metadata_df, fake_study = FALSE, use_xpt_file = FALSE, Round = TRUE, Impute = TRUE, reps = 10, holdback = 0.2, Undersample = TRUE, hyperparameter_tuning = TRUE, error_correction_method = \"Flip\", best.m = NULL, testReps = 5, indeterminateUpper = 0.9, indeterminateLower = 0.1, Type = \"classification\", nTopImportance = 10 ) #> Error in get_repeat_dose_parallel_studyids(path_db = path_db, rat_studies = rat_studies): Database file not found at the specified path!"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":null,"dir":"Reference","previous_headings":"","what":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"function performs cross-validation Random Forest model, tracks performance metrics (sensitivity, specificity, accuracy), handles indeterminate predictions, computes feature importance based either Gini Accuracy. 
function returns performance summaries feature importance rankings specified number test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"","code":"get_rf_model_output_cv_imp( scores_df = NULL, Undersample = FALSE, best.m = NULL, testReps, indeterminateUpper, indeterminateLower, Type, nTopImportance )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"scores_df data frame containing features target variable training testing model. Undersample logical flag indicating whether apply undersampling training data. Defaults FALSE. best.m numeric value representing number features sample Random Forest model, NULL calculate automatically. testReps integer specifying number repetitions cross-validation. Must least 2. indeterminateUpper numeric threshold predictions considered indeterminate. indeterminateLower numeric threshold predictions considered indeterminate. Type integer specifying type importance compute. 1 MeanDecreaseAccuracy, 2 MeanDecreaseGini. nTopImportance integer specifying number top features display based importance scores.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"list following elements: performance_metrics vector aggregated performance metrics (e.g., sensitivity, specificity, accuracy, etc.). 
feature_importance matrix containing importance top nTopImportance features, ordered importance score. raw_results list containing raw results debugging analysis, including sensitivity, specificity, accuracy, Gini scores across test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"function splits input data training testing sets based specified number test repetitions (testReps). iteration, trains Random Forest model makes predictions test data. Indeterminate predictions handled marking NA. function tracks performance metrics sensitivity, specificity, accuracy, computes top nTopImportance features based either Mean Decrease Accuracy Mean Decrease Gini.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_output_cv_imp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Perform Cross-Validation with Random Forest and Feature Importance Calculation — get_rf_model_output_cv_imp","text":"","code":"# Example usage of the function result <- get_rf_model_output_cv_imp( scores_df = your_data, Undersample = FALSE, best.m = 3, testReps = 5, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1, nTopImportance = 10 ) #> Error: object 'your_data' not found # View performance metrics print(result$performance_metrics) #> Error: object 'result' not found # View top features by importance print(result$feature_importance) #> Error: object 'result' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Random Forest with Cross-Validation — get_rf_model_with_cv","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"function 
builds random forest model using randomForest package, evaluates cross-validation, computes performance metrics sensitivity, specificity, accuracy. optionally applies undersampling handle class imbalance supports custom settings number predictors sampled split.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"","code":"get_rf_model_with_cv(Data, Undersample = FALSE, best.m = NULL, testReps, Type)"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"Data Mandatory, data frame input dataset, must include column named Target_Organ response variable. Undersample Optional, logical TRUE, balances dataset undersampling majority class. Default FALSE. best.m Optional, numeric NULL Specifies number predictors sampled split. NULL, default value randomForest used. testReps Mandatory, integer number cross-validation repetitions. Must least 2. Type Mandatory, numeric Specifies importance metric type: 1 Mean Decrease Accuracy 2 Gini.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"list following elements: performance_metrics: vector aggregated performance metrics, including sensitivity, specificity, accuracy. 
raw_results: list containing raw sensitivity, specificity, accuracy values cross-validation fold.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"function splits input data training testing subsets based specified testReps cross-validation folds. undersampling enabled, function balances training set reduce class imbalance. random forest model trained training set, predictions evaluated test set. results aggregated provide summary performance metrics.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_rf_model_with_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Random Forest with Cross-Validation — get_rf_model_with_cv","text":"","code":"# Load necessary libraries library(randomForest) #> randomForest 4.7-1.2 #> Type rfNews() to see new features/changes/bug fixes. library(caret) #> Loading required package: ggplot2 #> #> Attaching package: 'ggplot2' #> The following object is masked from 'package:randomForest': #> #> margin #> Loading required package: lattice # Example dataset data(iris) iris$Target_Organ <- ifelse(iris$Species == \"setosa\", 1, 0) iris <- iris[, -5] # Remove Species column # Run the function results <- get_rf_model_with_cv(Data = iris, Undersample = TRUE, best.m = 2, testReps = 5, Type = 2) #> Warning: The response has five or fewer unique values. Are you sure you want to do regression? 
#> Error in randomForest.default(m, y, ...): data (x) has 0 rows # Print results print(results$performance_metrics) #> Error: object 'results' not found"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":null,"dir":"Reference","previous_headings":"","what":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"function implements Random Forest classification model cross-validation allows undersampling, handling indeterminate predictions, calculating various model performance metrics sensitivity, specificity, accuracy. tracks proportion indeterminate predictions provides aggregated performance summary across multiple test repetitions.","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"","code":"get_zone_exclusioned_rf_model_with_cv( Data = NULL, Undersample = FALSE, best.m = NULL, testReps, indeterminateUpper, indeterminateLower, Type )"},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"Data data frame containing features target variable Target_Organ train Random Forest model. Undersample logical value indicating whether perform undersampling balance classes training data. Defaults FALSE. best.m numeric value representing best number variables (mtry) use split Random Forest model. can manually set determined optimization. 
testReps integer specifying number test repetitions. must least 2, function relies multiple test sets assess model performance. indeterminateUpper numeric value indicating upper bound predicted probability consider prediction indeterminate. Predictions probabilities within range marked indeterminate. indeterminateLower numeric value indicating lower bound predicted probability consider prediction indeterminate. Predictions probabilities within range marked indeterminate. Type integer indicating type feature importance use Random Forest model. Typically, 1 \"Mean Decrease Accuracy\" 2 \"Mean Decrease Gini\".","code":""},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"list containing two components: performance_metrics vector aggregated performance metrics, including sensitivity, specificity, accuracy, others, calculated across test repetitions. raw_results list containing raw performance metrics repetition, including sensitivity, specificity, accuracy.","code":""},{"path":[]},{"path":"https://aminuldu07.github.io/SENDQSAR/reference/get_zone_exclusioned_rf_model_with_cv.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Random Forest Model with Cross-validation and Exclusion — get_zone_exclusioned_rf_model_with_cv","text":"","code":"if (FALSE) { # \\dontrun{ # Example usage Data <- your_data_frame # Replace with actual dataset results <- get_zone_exclusioned_rf_model_with_cv(Data = Data, Undersample = TRUE, best.m = 5, testReps = 10, indeterminateUpper = 0.8, indeterminateLower = 0.2, Type = 1) # View the aggregated performance metrics print(results$performance_metrics) # Access raw results for further analysis print(results$raw_results) } # }"}]
diff --git a/vignettes/get_bw_score.Rmd b/vignettes/get_bw_score.Rmd
index 543a9f2..f4327f4 100644
--- a/vignettes/get_bw_score.Rmd
+++ b/vignettes/get_bw_score.Rmd
@@ -96,13 +96,15 @@ The adjusted weights are further normalized using the Z-score equation described
### Dependencies
The function requires the following R packages:
+
- `RSQLite`: To connect to the SQLite database.
- `haven`: To read `.xpt` files, if `use_xpt_file = TRUE`.
----
+***
This implementation ensures flexibility in handling different input types and configurations while maintaining a consistent structure for the output.
+***
## Example Usage
```r