rmd file updated
aminuldu07 committed Jan 1, 2025
1 parent 136841a commit 5c6429a
Showing 3 changed files with 321 additions and 93 deletions.
117 changes: 87 additions & 30 deletions README.Rmd
@@ -1,55 +1,112 @@

---
title: "SENDQSAR"
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->
# SENDQSAR: QSAR Modeling with SEND Database

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
## About

# SENDQSAR
This package facilitates developing Quantitative Structure-Activity Relationship (QSAR) models using the SEND database. It streamlines data acquisition, preprocessing, descriptor calculation, and model evaluation, enabling researchers to efficiently explore molecular descriptors and create robust predictive models.

<!-- badges: start -->
<!-- badges: end -->
## Features

The goal of SENDQSAR is to ...
- **Automated Data Processing**: Simplifies data acquisition and preprocessing steps.
- **Comprehensive Analysis**: Provides z-score calculations for various parameters such as body weight, liver-to-body weight ratio, and laboratory tests.
- **Machine Learning Integration**: Supports classification modeling, hyperparameter tuning, and performance evaluation.
- **Visualization Tools**: Includes histograms, bar plots, and AUC curves for better data interpretation.

## Installation
## Functions Overview

### Data Acquisition and Processing

- `get_compile_data` - Fetches data from the database specified by the database path into a structured data frame for analysis.
- `get_bw_score` - Calculates body weight (BW) z-scores for each animal.
- `get_livertobw_zscore` - Computes liver-to-body weight z-scores.
- `get_lb_score` - Calculates z-scores for laboratory test (LB) results.
- `get_mi_score` - Computes z-scores for microscopic findings (MI).
- `get_liver_om_lb_mi_tox_score_list` - Combines z-scores of LB, MI, and liver-to-BW into a single data frame.
- `get_col_harmonized_scores_df` - Harmonizes column names across studies.
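
The z-score functions above are described only at a high level. As a rough illustration of the idea, the sketch below standardizes body weights against a control group using base R. The toy data, the column names, and the helper `zscore_vs_control` are hypothetical, and scoring against the control group is an assumption; the package's own calculation may differ.

```r
# Hypothetical toy data: terminal body weights for six animals.
bw <- data.frame(
  USUBJID  = paste0("A", 1:6),
  group    = c("control", "control", "control", "treated", "treated", "treated"),
  BWSTRESN = c(300, 310, 290, 250, 260, 330)
)

# Hypothetical helper (not part of SENDQSAR): z-score of every value
# relative to the mean and standard deviation of the control animals.
zscore_vs_control <- function(values, is_control) {
  ctrl_mean <- mean(values[is_control])
  ctrl_sd   <- sd(values[is_control])
  (values - ctrl_mean) / ctrl_sd
}

bw$bw_zscore <- zscore_vs_control(bw$BWSTRESN, bw$group == "control")
print(bw)
```

A large absolute z-score flags an animal whose value lies far from the control distribution, which is the kind of signal the scoring functions summarize per study.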

### Machine Learning Preparation and Modeling

- `get_ml_data_and_tuned_hyperparameters` - Prepares data and tunes hyperparameters for machine learning.
- `get_rf_model_with_cv` - Builds a random forest model with cross-validation and outputs performance metrics.
- `get_zone_exclusioned_rf_model_with_cv` - Introduces an indeterminate zone for improved classification accuracy.
- `get_imp_features_from_rf_model_with_cv` - Computes feature importance for model interpretation.
- `get_auc_curve_with_rf_model` - Generates AUC curves to evaluate model performance.
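
To make the "indeterminate zone" idea concrete, here is a minimal base R sketch, assuming the zone is defined by a lower and an upper probability threshold and that predictions falling between them are left unclassified. The helper `classify_with_zone`, the threshold values, and the toy probabilities are all hypothetical and are not taken from the package.

```r
# Hypothetical predicted probabilities and true classes (1 = positive).
probs  <- c(0.05, 0.20, 0.45, 0.55, 0.80, 0.95)
actual <- c(0, 0, 1, 0, 1, 1)

# Hypothetical helper (not part of SENDQSAR): assign a class only when the
# probability is outside the open interval (lower, upper); otherwise NA.
classify_with_zone <- function(p, lower = 0.4, upper = 0.6) {
  stopifnot(lower <= upper)
  ifelse(p >= upper, 1L, ifelse(p <= lower, 0L, NA_integer_))
}

pred        <- classify_with_zone(probs)
determinate <- !is.na(pred)

# Accuracy with a plain 0.5 cutoff versus accuracy on determinate cases only.
accuracy_all  <- mean((probs >= 0.5) == (actual == 1))
accuracy_zone <- mean(pred[determinate] == actual[determinate])
coverage      <- mean(determinate)
```

Reporting `coverage` alongside the zone-based accuracy matters: excluding uncertain predictions can raise accuracy only at the cost of classifying fewer samples.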

### Visualization and Reporting

- `get_histogram_barplot` - Creates bar plots for target variable classes.
- `get_reprtree_from_rf_model` - Builds representative decision trees for interpretability.
- `get_prediction_plot` - Visualizes prediction probabilities with histograms.

### Automated Pipelines

- `get_Data_formatted_for_ml_and_best.m` - Formats data for machine learning pipelines.
- `get_rf_input_param_list_output_cv_imp` - Automates preprocessing, modeling, and evaluation in one step.
- `get_zone_exclusioned_rf_model_cv_imp` - Similar to the above function, but excludes uncertain predictions based on thresholds.

## Workflow

You can install the development version of SENDQSAR from [GitHub](https://github.com/) with:
1. **Input Database Path**: Provide the database path containing nonclinical study results for each STUDYID.
2. **Preprocessing**: Use functions 1-8, counted in the order listed under Functions Overview, to clean, harmonize, and prepare data.
3. **Model Building**: Use functions 9-18, counted the same way, for training, validation, and evaluation.
4. **Visualization**: Generate plots and performance metrics for better interpretation.
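
One performance metric commonly reported in the last step is the area under the ROC curve (AUC). The sketch below computes it in base R using the rank-sum formulation. The helper `auc_from_scores` and the toy data are hypothetical; this is not the package's implementation, which may instead rely on the `ROCR` dependency.

```r
# Hypothetical helper (not part of SENDQSAR): AUC as the probability that a
# randomly chosen positive scores higher than a randomly chosen negative.
auc_from_scores <- function(scores, labels) {
  pos   <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  stopifnot(n_pos > 0, n_neg > 0)
  ranks <- rank(scores)  # average ranks, so tied scores count as half
  (sum(ranks[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical predicted probabilities and true classes (1 = positive).
scores <- c(0.05, 0.20, 0.45, 0.55, 0.80, 0.95)
labels <- c(0, 0, 1, 0, 1, 1)

auc <- auc_from_scores(scores, labels)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation of the two classes.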

``` r
# install.packages("pak")
pak::pak("aminuldu07/SENDQSAR")
## Dependencies

- `randomForest`
- `ROCR`
- `ggplot2`
- `reprtree`

## Installation

```R
# Install from GitHub
devtools::install_github("aminuldu07/SENDQSAR")
```

## Example
## Examples

This is a basic example which shows you how to solve a common problem:
### Example 1: Basic Data Compilation

```{r example}
```R
library(SENDQSAR)
## basic example code
data <- get_compile_data("/path/to/database")
```

What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
### Example 2: Z-Score Calculation

```{r cars}
summary(cars)
```R
bw_scores <- get_bw_score(data)
liver_scores <- get_livertobw_zscore(data)
```

You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this.
### Example 3: Machine Learning Model

```R
model <- get_rf_model_with_cv(data, n_repeats=10)
print(model$confusion_matrix)
```

You can also embed plots, for example:
### Example 4: Visualization

```{r pressure, echo = FALSE}
plot(pressure)
```R
get_histogram_barplot(data, target_col="target_variable")
```

In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN.
## Contribution

Contributions are welcome! Feel free to submit issues or pull requests via GitHub.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Contact

For more information, visit the project GitHub Page or contact [email protected].
148 changes: 116 additions & 32 deletions README.html
@@ -601,41 +601,125 @@

<body>

<!-- README.md is generated from README.Rmd. Please edit that file -->

<h1 id="sendqsar">SENDQSAR</h1>
<!-- badges: start -->

<!-- badges: end -->

<p>The goal of SENDQSAR is to …</p>
<h1 id="sendqsar-qsar-modeling-with-send-database">SENDQSAR: QSAR
Modeling with SEND Database</h1>
<h2 id="about">About</h2>
<p>This package facilitates developing Quantitative Structure-Activity
Relationship (QSAR) models using the SEND database. It streamlines data
acquisition, preprocessing, descriptor calculation, and model
evaluation, enabling researchers to efficiently explore molecular
descriptors and create robust predictive models.</p>
<h2 id="features">Features</h2>
<ul>
<li><strong>Automated Data Processing</strong>: Simplifies data
acquisition and preprocessing steps.</li>
<li><strong>Comprehensive Analysis</strong>: Provides z-score
calculations for various parameters such as body weight, liver-to-body
weight ratio, and laboratory tests.</li>
<li><strong>Machine Learning Integration</strong>: Supports
classification modeling, hyperparameter tuning, and performance
evaluation.</li>
<li><strong>Visualization Tools</strong>: Includes histograms, bar
plots, and AUC curves for better data interpretation.</li>
</ul>
<h2 id="functions-overview">Functions Overview</h2>
<h3 id="data-acquisition-and-processing">Data Acquisition and
Processing</h3>
<ul>
<li><code>get_compile_data</code> - Fetches data from the database
specified by the database path into a structured data frame for
analysis.</li>
<li><code>get_bw_score</code> - Calculates body weight (BW) z-scores for
each animal.</li>
<li><code>get_livertobw_zscore</code> - Computes liver-to-body weight
z-scores.</li>
<li><code>get_lb_score</code> - Calculates z-scores for laboratory test
(LB) results.</li>
<li><code>get_mi_score</code> - Computes z-scores for microscopic
findings (MI).</li>
<li><code>get_liver_om_lb_mi_tox_score_list</code> - Combines z-scores
of LB, MI, and liver-to-BW into a single data frame.</li>
<li><code>get_col_harmonized_scores_df</code> - Harmonizes column names
across studies.</li>
</ul>
<h3 id="machine-learning-preparation-and-modeling">Machine Learning
Preparation and Modeling</h3>
<ul>
<li><code>get_ml_data_and_tuned_hyperparameters</code> - Prepares data
and tunes hyperparameters for machine learning.</li>
<li><code>get_rf_model_with_cv</code> - Builds a random forest model
with cross-validation and outputs performance metrics.</li>
<li><code>get_zone_exclusioned_rf_model_with_cv</code> - Introduces an
indeterminate zone for improved classification accuracy.</li>
<li><code>get_imp_features_from_rf_model_with_cv</code> - Computes
feature importance for model interpretation.</li>
<li><code>get_auc_curve_with_rf_model</code> - Generates AUC curves to
evaluate model performance.</li>
</ul>
<h3 id="visualization-and-reporting">Visualization and Reporting</h3>
<ul>
<li><code>get_histogram_barplot</code> - Creates bar plots for target
variable classes.</li>
<li><code>get_reprtree_from_rf_model</code> - Builds representative
decision trees for interpretability.</li>
<li><code>get_prediction_plot</code> - Visualizes prediction
probabilities with histograms.</li>
</ul>
<h3 id="automated-pipelines">Automated Pipelines</h3>
<ul>
<li><code>get_Data_formatted_for_ml_and_best.m</code> - Formats data for
machine learning pipelines.</li>
<li><code>get_rf_input_param_list_output_cv_imp</code> - Automates
preprocessing, modeling, and evaluation in one step.</li>
<li><code>get_zone_exclusioned_rf_model_cv_imp</code> - Similar to the
above function, but excludes uncertain predictions based on
thresholds.</li>
</ul>
<h2 id="workflow">Workflow</h2>
<ol style="list-style-type: decimal">
<li><strong>Input Database Path</strong>: Provide the database path
containing nonclinical study results for each STUDYID.</li>
<li><strong>Preprocessing</strong>: Use functions 1-8, counted in the
order listed under Functions Overview, to clean, harmonize, and prepare
data.</li>
<li><strong>Model Building</strong>: Use functions 9-18, counted the
same way, for training, validation, and evaluation.</li>
<li><strong>Visualization</strong>: Generate plots and performance
metrics for better interpretation.</li>
</ol>
<h2 id="dependencies">Dependencies</h2>
<ul>
<li><code>randomForest</code></li>
<li><code>ROCR</code></li>
<li><code>ggplot2</code></li>
<li><code>reprtree</code></li>
</ul>
<h2 id="installation">Installation</h2>
<p>You can install the development version of SENDQSAR from <a href="https://github.com/">GitHub</a> with:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="co"># install.packages(&quot;pak&quot;)</span></span>
<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>pak<span class="sc">::</span><span class="fu">pak</span>(<span class="st">&quot;aminuldu07/SENDQSAR&quot;</span>)</span></code></pre></div>
<h2 id="example">Example</h2>
<p>This is a basic example which shows you how to solve a common
problem:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="co"># Install from GitHub</span></span>
<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>devtools<span class="sc">::</span><span class="fu">install_github</span>(<span class="st">&quot;aminuldu07/SENDQSAR&quot;</span>)</span></code></pre></div>
<h2 id="examples">Examples</h2>
<h3 id="example-1-basic-data-compilation">Example 1: Basic Data
Compilation</h3>
<div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a><span class="fu">library</span>(SENDQSAR)</span>
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a><span class="do">## basic example code</span></span></code></pre></div>
<p>What is special about using <code>README.Rmd</code> instead of just
<code>README.md</code>? You can include R chunks like so:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a><span class="fu">summary</span>(cars)</span>
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a><span class="co">#&gt; speed dist </span></span>
<span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a><span class="co">#&gt; Min. : 4.0 Min. : 2.00 </span></span>
<span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a><span class="co">#&gt; 1st Qu.:12.0 1st Qu.: 26.00 </span></span>
<span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a><span class="co">#&gt; Median :15.0 Median : 36.00 </span></span>
<span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a><span class="co">#&gt; Mean :15.4 Mean : 42.98 </span></span>
<span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a><span class="co">#&gt; 3rd Qu.:19.0 3rd Qu.: 56.00 </span></span>
<span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a><span class="co">#&gt; Max. :25.0 Max. :120.00</span></span></code></pre></div>
<p>You’ll still need to render <code>README.Rmd</code> regularly, to
keep <code>README.md</code> up-to-date.
<code>devtools::build_readme()</code> is handy for this.</p>
<p>You can also embed plots, for example:</p>
<img role="img" src="" width="100%" />

<p>In that case, don’t forget to commit and push the resulting figure
files, so they display on GitHub and CRAN.</p>
<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a>data <span class="ot">&lt;-</span> <span class="fu">get_compile_data</span>(<span class="st">&quot;/path/to/database&quot;</span>)</span></code></pre></div>
<h3 id="example-2-z-score-calculation">Example 2: Z-Score
Calculation</h3>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a>bw_scores <span class="ot">&lt;-</span> <span class="fu">get_bw_score</span>(data)</span>
<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a>liver_scores <span class="ot">&lt;-</span> <span class="fu">get_livertobw_zscore</span>(data)</span></code></pre></div>
<h3 id="example-3-machine-learning-model">Example 3: Machine Learning
Model</h3>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a>model <span class="ot">&lt;-</span> <span class="fu">get_rf_model_with_cv</span>(data, <span class="at">n_repeats=</span><span class="dv">10</span>)</span>
<span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a><span class="fu">print</span>(model<span class="sc">$</span>confusion_matrix)</span></code></pre></div>
<h3 id="example-4-visualization">Example 4: Visualization</h3>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a><span class="fu">get_histogram_barplot</span>(data, <span class="at">target_col=</span><span class="st">&quot;target_variable&quot;</span>)</span></code></pre></div>
<h2 id="contribution">Contribution</h2>
<p>Contributions are welcome! Feel free to submit issues or pull
requests via GitHub.</p>
<h2 id="license">License</h2>
<p>This project is licensed under the MIT License - see the LICENSE file
for details.</p>
<h2 id="contact">Contact</h2>
<p>For more information, visit the project GitHub Page or contact <a href="mailto:[email protected]">[email protected]</a>.</p>

</body>
</html>