Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 136841a, commit 5c6429a
Showing 3 changed files with 321 additions and 93 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,55 +1,112 @@ | ||
|
||
--- | ||
title: "SENDQSAR" | ||
output: github_document | ||
--- | ||
|
||
<!-- README.md is generated from README.Rmd. Please edit that file --> | ||
# SENDQSAR: QSAR Modeling with SEND Database | ||
|
||
```{r, include = FALSE} | ||
knitr::opts_chunk$set( | ||
collapse = TRUE, | ||
comment = "#>", | ||
fig.path = "man/figures/README-", | ||
out.width = "100%" | ||
) | ||
``` | ||
## About | ||
|
||
# SENDQSAR | ||
This package facilitates developing Quantitative Structure-Activity Relationship (QSAR) models using the SEND database. It streamlines data acquisition, preprocessing, descriptor calculation, and model evaluation, enabling researchers to efficiently explore molecular descriptors and create robust predictive models. | ||
|
||
<!-- badges: start --> | ||
<!-- badges: end --> | ||
## Features | ||
|
||
The goal of SENDQSAR is to ... | ||
- **Automated Data Processing**: Simplifies data acquisition and preprocessing steps. | ||
- **Comprehensive Analysis**: Provides z-score calculations for various parameters such as body weight, liver-to-body weight ratio, and laboratory tests. | ||
- **Machine Learning Integration**: Supports classification modeling, hyperparameter tuning, and performance evaluation. | ||
- **Visualization Tools**: Includes histograms, bar plots, and AUC curves for better data interpretation. | ||
|
||
## Installation | ||
## Functions Overview | ||
|
||
### Data Acquisition and Processing | ||
|
||
- `get_compile_data` - Fetches data from the database specified by the database path into a structured data frame for analysis. | ||
- `get_bw_score` - Calculates body weight (BW) z-scores for each animal. | ||
- `get_livertobw_zscore` - Computes liver-to-body weight z-scores. | ||
- `get_lb_score` - Calculates z-scores for laboratory test (LB) results. | ||
- `get_mi_score` - Computes z-scores for microscopic findings (MI). | ||
- `get_liver_om_lb_mi_tox_score_list` - Combines z-scores of LB, MI, and liver-to-BW into a single data frame. | ||
- `get_col_harmonized_scores_df` - Harmonizes column names across studies. | ||
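
As a rough illustration of what a z-score step such as `get_bw_score` might involve, the sketch below standardizes each animal's body weight against the control group's mean and standard deviation in base R. The column names (`USUBJID`, `ARM`, `BW`), the toy values, and the helper `zscore_vs_control` are hypothetical and not taken from SENDQSAR; the package's actual calculation may differ.

```R
# Toy body-weight data; column names and values are illustrative only.
bw <- data.frame(
  USUBJID = c("A1", "A2", "A3", "A4", "B1", "B2"),
  ARM     = c("Control", "Control", "Control", "Control", "HighDose", "HighDose"),
  BW      = c(300, 310, 290, 300, 270, 330),
  stringsAsFactors = FALSE
)

# Hypothetical helper: z-score of every animal relative to the control group.
zscore_vs_control <- function(df, value_col = "BW", arm_col = "ARM",
                              control_label = "Control") {
  control   <- df[[value_col]][df[[arm_col]] == control_label]
  ctrl_mean <- mean(control)
  ctrl_sd   <- sd(control)  # sample standard deviation (n - 1 denominator)
  df$zscore <- (df[[value_col]] - ctrl_mean) / ctrl_sd
  df
}

bw_z <- zscore_vs_control(bw)
print(bw_z)
```

With this definition the control animals have a mean z-score of 0 and a standard deviation of 1, so a treated animal's score reads as "how many control standard deviations away from the control mean".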
|
||
### Machine Learning Preparation and Modeling | ||
|
||
- `get_ml_data_and_tuned_hyperparameters` - Prepares data and tunes hyperparameters for machine learning. | ||
- `get_rf_model_with_cv` - Builds a random forest model with cross-validation and outputs performance metrics. | ||
- `get_zone_exclusioned_rf_model_with_cv` - Builds a cross-validated random forest model that sets aside predictions falling in an indeterminate zone. | |
- `get_imp_features_from_rf_model_with_cv` - Computes feature importance for model interpretation. | ||
- `get_auc_curve_with_rf_model` - Generates AUC curves to evaluate model performance. | ||
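
To make the AUC metric mentioned above concrete, here is a minimal base-R sketch that computes the area under the ROC curve from predicted probabilities and 0/1 labels using the rank-sum (Mann-Whitney) identity. The helper `auc_from_scores` and the toy data are assumptions for illustration; this is not how `get_auc_curve_with_rf_model` is implemented, and the package lists `ROCR` as a dependency for this kind of work.

```R
# Hypothetical helper: AUC from scores and 0/1 labels via the rank-sum identity.
# Tied scores receive average ranks, which counts each tie as half a win.
auc_from_scores <- function(scores, labels) {
  stopifnot(length(scores) == length(labels), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy predictions: one positive (0.35) is ranked below one negative (0.6).
scores <- c(0.9, 0.8, 0.35, 0.6, 0.2, 0.1)
labels <- c(1,   1,   1,    0,   0,   0)

auc <- auc_from_scores(scores, labels)
print(auc)  # 8 of the 9 positive/negative pairs are ordered correctly: 8/9
```

The value equals the fraction of positive/negative pairs in which the positive case receives the higher score, which is why a perfectly separating model scores 1 and a random one scores about 0.5.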
|
||
### Visualization and Reporting | ||
|
||
- `get_histogram_barplot` - Creates bar plots for target variable classes. | ||
- `get_reprtree_from_rf_model` - Builds representative decision trees for interpretability. | ||
- `get_prediction_plot` - Visualizes prediction probabilities with histograms. | ||
|
||
### Automated Pipelines | ||
|
||
- `get_Data_formatted_for_ml_and_best.m` - Formats data for machine learning pipelines. | ||
- `get_rf_input_param_list_output_cv_imp` - Automates preprocessing, modeling, and evaluation in one step. | ||
- `get_zone_exclusioned_rf_model_cv_imp` - Similar to the above function, but excludes uncertain predictions based on thresholds. | ||
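
The idea of excluding uncertain predictions based on thresholds can be sketched in base R as follows. The helper `classify_with_zone`, the 0.4 and 0.6 cut-offs, and the toy probabilities are hypothetical; the thresholds and argument names used by the SENDQSAR functions are not shown in this README and may differ.

```R
# Hypothetical helper: assign class labels, leaving an indeterminate zone unclassified.
classify_with_zone <- function(prob, lower = 0.4, upper = 0.6) {
  stopifnot(lower <= upper)
  out <- rep(NA_integer_, length(prob))  # NA marks the indeterminate zone
  out[prob >= upper] <- 1L
  out[prob <= lower] <- 0L
  out
}

# Toy predicted probabilities and true classes (illustrative values only).
prob   <- c(0.95, 0.55, 0.45, 0.10, 0.70, 0.30)
actual <- c(1L,   0L,   1L,   0L,   1L,   1L)

pred <- classify_with_zone(prob)
kept <- !is.na(pred)
accuracy_kept <- mean(pred[kept] == actual[kept])

cat("Classified:", sum(kept), "of", length(prob), "\n")
cat("Accuracy on classified cases:", accuracy_kept, "\n")
```

Any accuracy reported this way applies only to the cases that were classified, so it is worth reporting the number of excluded cases alongside it.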
|
||
## Workflow | ||
|
||
You can install the development version of SENDQSAR from [GitHub](https://github.com/) with: | ||
1. **Input Database Path**: Provide the database path containing nonclinical study results for each STUDYID. | ||
2. **Preprocessing**: Use functions 1-8 to clean, harmonize, and prepare data. | ||
3. **Model Building**: Employ machine learning functions (9-18) for training, validation, and evaluation. | ||
4. **Visualization**: Generate plots and performance metrics for better interpretation. | ||
|
||
``` r | ||
# install.packages("pak") | ||
pak::pak("aminuldu07/SENDQSAR") | ||
## Dependencies | ||
|
||
- `randomForest` | ||
- `ROCR` | ||
- `ggplot2` | ||
- `reprtree` | ||
|
||
## Installation | ||
|
||
```R | ||
# Install from GitHub | ||
devtools::install_github("aminuldu07/SENDQSAR") | ||
``` | ||
|
||
## Example | ||
## Examples | ||
|
||
This is a basic example which shows you how to solve a common problem: | ||
### Example 1: Basic Data Compilation | ||
|
||
```{r example} | ||
```R | ||
library(SENDQSAR) | ||
## basic example code | ||
data <- get_compile_data("/path/to/database") | ||
``` | ||
|
||
What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so: | ||
### Example 2: Z-Score Calculation | ||
|
||
```{r cars} | ||
summary(cars) | ||
```R | ||
bw_scores <- get_bw_score(data) | ||
liver_scores <- get_livertobw_zscore(data) | ||
``` | ||
|
||
You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this. | ||
### Example 3: Machine Learning Model | ||
|
||
```R | ||
model <- get_rf_model_with_cv(data, n_repeats=10) | ||
print(model$confusion_matrix) | ||
``` | ||
|
||
You can also embed plots, for example: | ||
### Example 4: Visualization | ||
|
||
```{r pressure, echo = FALSE} | ||
plot(pressure) | ||
```R | ||
get_histogram_barplot(data, target_col="target_variable") | ||
``` | ||
|
||
In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN. | ||
## Contribution | ||
|
||
Contributions are welcome! Feel free to submit issues or pull requests via GitHub. | ||
|
||
## License | ||
|
||
This project is licensed under the MIT License - see the LICENSE file for details. | ||
|
||
## Contact | ||
|
||
For more information, visit the project GitHub Page or contact [email protected]. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -601,41 +601,125 @@ | |
|
||
<body> | ||
|
||
<!-- README.md is generated from README.Rmd. Please edit that file --> | ||
|
||
<h1 id="sendqsar">SENDQSAR</h1> | ||
<!-- badges: start --> | ||
|
||
<!-- badges: end --> | ||
|
||
<p>The goal of SENDQSAR is to …</p> | ||
<h1 id="sendqsar-qsar-modeling-with-send-database">SENDQSAR: QSAR | ||
Modeling with SEND Database</h1> | ||
<h2 id="about">About</h2> | ||
<p>This package facilitates developing Quantitative Structure-Activity | ||
Relationship (QSAR) models using the SEND database. It streamlines data | ||
acquisition, preprocessing, descriptor calculation, and model | ||
evaluation, enabling researchers to efficiently explore molecular | ||
descriptors and create robust predictive models.</p> | ||
<h2 id="features">Features</h2> | ||
<ul> | ||
<li><strong>Automated Data Processing</strong>: Simplifies data | ||
acquisition and preprocessing steps.</li> | ||
<li><strong>Comprehensive Analysis</strong>: Provides z-score | ||
calculations for various parameters such as body weight, liver-to-body | ||
weight ratio, and laboratory tests.</li> | ||
<li><strong>Machine Learning Integration</strong>: Supports | ||
classification modeling, hyperparameter tuning, and performance | ||
evaluation.</li> | ||
<li><strong>Visualization Tools</strong>: Includes histograms, bar | ||
plots, and AUC curves for better data interpretation.</li> | ||
</ul> | ||
<h2 id="functions-overview">Functions Overview</h2> | ||
<h3 id="data-acquisition-and-processing">Data Acquisition and | ||
Processing</h3> | ||
<ul> | ||
<li><code>get_compile_data</code> - Fetches data from the database | ||
specified by the database path into a structured data frame for | ||
analysis.</li> | ||
<li><code>get_bw_score</code> - Calculates body weight (BW) z-scores for | ||
each animal.</li> | ||
<li><code>get_livertobw_zscore</code> - Computes liver-to-body weight | ||
z-scores.</li> | ||
<li><code>get_lb_score</code> - Calculates z-scores for laboratory test | ||
(LB) results.</li> | ||
<li><code>get_mi_score</code> - Computes z-scores for microscopic | ||
findings (MI).</li> | ||
<li><code>get_liver_om_lb_mi_tox_score_list</code> - Combines z-scores | ||
of LB, MI, and liver-to-BW into a single data frame.</li> | ||
<li><code>get_col_harmonized_scores_df</code> - Harmonizes column names | ||
across studies.</li> | ||
</ul> | ||
<h3 id="machine-learning-preparation-and-modeling">Machine Learning | ||
Preparation and Modeling</h3> | ||
<ul> | ||
<li><code>get_ml_data_and_tuned_hyperparameters</code> - Prepares data | ||
and tunes hyperparameters for machine learning.</li> | ||
<li><code>get_rf_model_with_cv</code> - Builds a random forest model | ||
with cross-validation and outputs performance metrics.</li> | ||
<li><code>get_zone_exclusioned_rf_model_with_cv</code> - Builds a | |
cross-validated random forest model that sets aside predictions falling in an indeterminate zone.</li> | |
<li><code>get_imp_features_from_rf_model_with_cv</code> - Computes | ||
feature importance for model interpretation.</li> | ||
<li><code>get_auc_curve_with_rf_model</code> - Generates AUC curves to | ||
evaluate model performance.</li> | ||
</ul> | ||
<h3 id="visualization-and-reporting">Visualization and Reporting</h3> | ||
<ul> | ||
<li><code>get_histogram_barplot</code> - Creates bar plots for target | ||
variable classes.</li> | ||
<li><code>get_reprtree_from_rf_model</code> - Builds representative | ||
decision trees for interpretability.</li> | ||
<li><code>get_prediction_plot</code> - Visualizes prediction | ||
probabilities with histograms.</li> | ||
</ul> | ||
<h3 id="automated-pipelines">Automated Pipelines</h3> | ||
<ul> | ||
<li><code>get_Data_formatted_for_ml_and_best.m</code> - Formats data for | ||
machine learning pipelines.</li> | ||
<li><code>get_rf_input_param_list_output_cv_imp</code> - Automates | ||
preprocessing, modeling, and evaluation in one step.</li> | ||
<li><code>get_zone_exclusioned_rf_model_cv_imp</code> - Similar to the | ||
above function, but excludes uncertain predictions based on | ||
thresholds.</li> | ||
</ul> | ||
<h2 id="workflow">Workflow</h2> | ||
<ol style="list-style-type: decimal"> | ||
<li><strong>Input Database Path</strong>: Provide the database path | ||
containing nonclinical study results for each STUDYID.</li> | ||
<li><strong>Preprocessing</strong>: Use functions 1-8 to clean, | ||
harmonize, and prepare data.</li> | ||
<li><strong>Model Building</strong>: Employ machine learning functions | ||
(9-18) for training, validation, and evaluation.</li> | ||
<li><strong>Visualization</strong>: Generate plots and performance | ||
metrics for better interpretation.</li> | ||
</ol> | ||
<h2 id="dependencies">Dependencies</h2> | ||
<ul> | ||
<li><code>randomForest</code></li> | ||
<li><code>ROCR</code></li> | ||
<li><code>ggplot2</code></li> | ||
<li><code>reprtree</code></li> | ||
</ul> | ||
<h2 id="installation">Installation</h2> | ||
<p>You can install the development version of SENDQSAR from <a href="https://github.com/">GitHub</a> with:</p> | ||
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="co"># install.packages("pak")</span></span> | ||
<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>pak<span class="sc">::</span><span class="fu">pak</span>(<span class="st">"aminuldu07/SENDQSAR"</span>)</span></code></pre></div> | ||
<h2 id="example">Example</h2> | ||
<p>This is a basic example which shows you how to solve a common | ||
problem:</p> | ||
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="co"># Install from GitHub</span></span> | ||
<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>devtools<span class="sc">::</span><span class="fu">install_github</span>(<span class="st">"aminuldu07/SENDQSAR"</span>)</span></code></pre></div> | ||
<h2 id="examples">Examples</h2> | ||
<h3 id="example-1-basic-data-compilation">Example 1: Basic Data | ||
Compilation</h3> | ||
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a><span class="fu">library</span>(SENDQSAR)</span> | ||
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a><span class="do">## basic example code</span></span></code></pre></div> | ||
<p>What is special about using <code>README.Rmd</code> instead of just | ||
<code>README.md</code>? You can include R chunks like so:</p> | ||
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a><span class="fu">summary</span>(cars)</span> | ||
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a><span class="co">#> speed dist </span></span> | ||
<span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a><span class="co">#> Min. : 4.0 Min. : 2.00 </span></span> | ||
<span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a><span class="co">#> 1st Qu.:12.0 1st Qu.: 26.00 </span></span> | ||
<span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a><span class="co">#> Median :15.0 Median : 36.00 </span></span> | ||
<span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a><span class="co">#> Mean :15.4 Mean : 42.98 </span></span> | ||
<span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a><span class="co">#> 3rd Qu.:19.0 3rd Qu.: 56.00 </span></span> | ||
<span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a><span class="co">#> Max. :25.0 Max. :120.00</span></span></code></pre></div> | ||
<p>You’ll still need to render <code>README.Rmd</code> regularly, to | ||
keep <code>README.md</code> up-to-date. | ||
<code>devtools::build_readme()</code> is handy for this.</p> | ||
<p>You can also embed plots, for example:</p> | ||
<img role="img" src="" width="100%" /> | ||
|
||
<p>In that case, don’t forget to commit and push the resulting figure | ||
files, so they display on GitHub and CRAN.</p> | ||
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a>data <span class="ot"><-</span> <span class="fu">get_compile_data</span>(<span class="st">"/path/to/database"</span>)</span></code></pre></div> | ||
<h3 id="example-2-z-score-calculation">Example 2: Z-Score | ||
Calculation</h3> | ||
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a>bw_scores <span class="ot"><-</span> <span class="fu">get_bw_score</span>(data)</span> | ||
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a>liver_scores <span class="ot"><-</span> <span class="fu">get_livertobw_zscore</span>(data)</span></code></pre></div> | ||
<h3 id="example-3-machine-learning-model">Example 3: Machine Learning | ||
Model</h3> | ||
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a>model <span class="ot"><-</span> <span class="fu">get_rf_model_with_cv</span>(data, <span class="at">n_repeats=</span><span class="dv">10</span>)</span> | ||
<span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a><span class="fu">print</span>(model<span class="sc">$</span>confusion_matrix)</span></code></pre></div> | ||
<h3 id="example-4-visualization">Example 4: Visualization</h3> | ||
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a><span class="fu">get_histogram_barplot</span>(data, <span class="at">target_col=</span><span class="st">"target_variable"</span>)</span></code></pre></div> | ||
<h2 id="contribution">Contribution</h2> | ||
<p>Contributions are welcome! Feel free to submit issues or pull | ||
requests via GitHub.</p> | ||
<h2 id="license">License</h2> | ||
<p>This project is licensed under the MIT License - see the LICENSE file | ||
for details.</p> | ||
<h2 id="contact">Contact</h2> | ||
<p>For more information, visit the project GitHub Page or contact <a href="mailto:[email protected]">[email protected]</a>.</p> | ||
|
||
</body> | ||
</html> |