From 5c6429a38a58fecec732b254a05602f0dda632ae Mon Sep 17 00:00:00 2001
From: Md Aminul Islam Prodhan <aminuldu07@gmail.com>
Date: Tue, 31 Dec 2024 19:42:09 -0600
Subject: [PATCH] rmd file updated

---
 README.Rmd  | 117 ++++++++++++++++++++++++++++++-----------
 README.html | 148 ++++++++++++++++++++++++++++++++++++++++-----------
 README.md   | 149 +++++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 321 insertions(+), 93 deletions(-)
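
The README added below says the package computes z-scores for body weight, liver-to-body-weight ratio, and laboratory tests, but it does not show the calculation. As a rough illustration only, a z-score against a control group could be sketched in base R as follows. The column names (`STUDYID`, `ARM`, `BW`), the toy values, and the choice of vehicle-control animals as the reference group are all assumptions made for this sketch, not the package's implementation.

```r
# Hypothetical example data: terminal body weights (grams) per animal.
# Column names and values are illustrative assumptions, not SENDQSAR output.
bw <- data.frame(
  STUDYID = "S1",
  ARM     = c("vehicle", "vehicle", "vehicle", "treated", "treated"),
  BW      = c(300, 310, 290, 270, 305)
)

# Reference distribution: vehicle-control animals only (assumed convention).
ctrl_mean <- mean(bw$BW[bw$ARM == "vehicle"])
ctrl_sd   <- sd(bw$BW[bw$ARM == "vehicle"])

# z-score of every animal relative to the control mean and standard deviation.
bw$BW_zscore <- (bw$BW - ctrl_mean) / ctrl_sd

print(bw)
```

A z-score of 0 means the animal matches the control mean; large negative values flag animals that weigh well below the controls.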

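The README also describes an "indeterminate zone" in which uncertain predictions are excluded based on thresholds. The idea can be sketched in base R as below; the probabilities, labels, and the 0.4/0.6 thresholds are hypothetical values chosen for this illustration, and the code does not reproduce `get_zone_exclusioned_rf_model_with_cv`.

```r
# Hypothetical predicted probabilities of the positive class and true labels.
prob  <- c(0.05, 0.30, 0.48, 0.55, 0.80, 0.95)
truth <- c(0, 0, 1, 0, 1, 1)

# Illustrative zone boundaries (assumed values).
lower <- 0.4
upper <- 0.6

# Assign a class only outside the zone; NA marks an indeterminate prediction.
pred <- ifelse(prob >= upper, 1, ifelse(prob <= lower, 0, NA))
decided <- !is.na(pred)

# Accuracy with a plain 0.5 cutoff versus accuracy on the decided cases only.
accuracy_all     <- mean((prob >= 0.5) == (truth == 1))
accuracy_decided <- mean(pred[decided] == truth[decided])

cat("decided:", sum(decided), "of", length(prob), "\n")
cat("accuracy (all, 0.5 cutoff):", accuracy_all, "\n")
cat("accuracy (decided only):", accuracy_decided, "\n")
```

The trade-off is coverage: accuracy is reported on fewer animals or studies, so the number of excluded predictions should be reported alongside it.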
diff --git a/README.Rmd b/README.Rmd
index 176191c..fdd1fd3 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -1,55 +1,112 @@
+
 ---
+title: "SENDQSAR"
 output: github_document
 ---
 
-<!-- README.md is generated from README.Rmd. Please edit that file -->
+# SENDQSAR: QSAR Modeling with SEND Database
 
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>",
-  fig.path = "man/figures/README-",
-  out.width = "100%"
-)
-```
+## About
 
-# SENDQSAR
+This package facilitates developing Quantitative Structure-Activity Relationship (QSAR) models using the SEND database. It streamlines data acquisition, preprocessing, descriptor calculation, and model evaluation, enabling researchers to efficiently explore molecular descriptors and create robust predictive models.
 
-<!-- badges: start -->
-<!-- badges: end -->
+## Features
 
-The goal of SENDQSAR is to ...
+- **Automated Data Processing**: Simplifies data acquisition and preprocessing steps.
+- **Comprehensive Analysis**: Provides z-score calculations for various parameters such as body weight, liver-to-body weight ratio, and laboratory tests.
+- **Machine Learning Integration**: Supports classification modeling, hyperparameter tuning, and performance evaluation.
+- **Visualization Tools**: Includes histograms, bar plots, and AUC curves for better data interpretation.
 
-## Installation
+## Functions Overview
+
+### Data Acquisition and Processing
+
+- `get_compile_data` - Reads data from the database at the given path and compiles it into a structured data frame for analysis.
+- `get_bw_score` - Calculates body weight (BW) z-scores for each animal.
+- `get_livertobw_zscore` - Computes liver-to-body weight z-scores.
+- `get_lb_score` - Calculates z-scores for laboratory test (LB) results.
+- `get_mi_score` - Computes z-scores for microscopic findings (MI).
+- `get_liver_om_lb_mi_tox_score_list` - Combines z-scores of LB, MI, and liver-to-BW into a single data frame.
+- `get_col_harmonized_scores_df` - Harmonizes column names across studies.
+
+### Machine Learning Preparation and Modeling
+
+- `get_ml_data_and_tuned_hyperparameters` - Prepares data and tunes hyperparameters for machine learning.
+- `get_rf_model_with_cv` - Builds a random forest model with cross-validation and outputs performance metrics.
+- `get_zone_exclusioned_rf_model_with_cv` - Adds an indeterminate zone: uncertain predictions inside it are excluded, which can improve classification accuracy.
+- `get_imp_features_from_rf_model_with_cv` - Computes feature importance for model interpretation.
+- `get_auc_curve_with_rf_model` - Generates AUC curves to evaluate model performance.
+
+### Visualization and Reporting
+
+- `get_histogram_barplot` - Creates bar plots for target variable classes.
+- `get_reprtree_from_rf_model` - Builds representative decision trees for interpretability.
+- `get_prediction_plot` - Visualizes prediction probabilities with histograms.
+
+### Automated Pipelines
+
+- `get_Data_formatted_for_ml_and_best.m` - Formats data for machine learning pipelines.
+- `get_rf_input_param_list_output_cv_imp` - Automates preprocessing, modeling, and evaluation in one step.
+- `get_zone_exclusioned_rf_model_cv_imp` - Works like `get_rf_input_param_list_output_cv_imp`, but excludes uncertain predictions based on thresholds.
+
+## Workflow
 
-You can install the development version of SENDQSAR from [GitHub](https://github.com/) with:
+1. **Input Database Path**: Provide the path to the database containing nonclinical study results for each STUDYID.
+2. **Preprocessing**: Use functions 1-8, counted in the order listed under Functions Overview, to clean, harmonize, and prepare data.
+3. **Model Building**: Employ functions 9-18 from the same list for training, validation, and evaluation.
+4. **Visualization**: Generate plots and performance metrics for better interpretation.
 
-``` r
-# install.packages("pak")
-pak::pak("aminuldu07/SENDQSAR")
+## Dependencies
+
+- `randomForest`
+- `ROCR`
+- `ggplot2`
+- `reprtree`
+
+## Installation
+
+```R
+# Install from GitHub (requires the devtools package)
+devtools::install_github("aminuldu07/SENDQSAR")
 ```
 
-## Example
+## Examples
 
-This is a basic example which shows you how to solve a common problem:
+### Example 1: Basic Data Compilation
 
-```{r example}
+```R
 library(SENDQSAR)
-## basic example code
+data <- get_compile_data("/path/to/database")
 ```
 
-What is special about using `README.Rmd` instead of just `README.md`? You can include R chunks like so:
+### Example 2: Z-Score Calculation
 
-```{r cars}
-summary(cars)
+```R
+bw_scores <- get_bw_score(data)
+liver_scores <- get_livertobw_zscore(data)
 ```
 
-You'll still need to render `README.Rmd` regularly, to keep `README.md` up-to-date. `devtools::build_readme()` is handy for this.
+### Example 3: Machine Learning Model
+
+```R
+model <- get_rf_model_with_cv(data, n_repeats=10)
+print(model$confusion_matrix)
+```
 
-You can also embed plots, for example:
+### Example 4: Visualization
 
-```{r pressure, echo = FALSE}
-plot(pressure)
+```R
+get_histogram_barplot(data, target_col="target_variable")
 ```
 
-In that case, don't forget to commit and push the resulting figure files, so they display on GitHub and CRAN.
+## Contribution
+
+Contributions are welcome! Feel free to submit issues or pull requests via GitHub.
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+## Contact
+
+For more information, visit the project GitHub page or contact email@example.com.
diff --git a/README.html b/README.html
index 7a5ee86..7cd6f6e 100644
--- a/README.html
+++ b/README.html
@@ -601,41 +601,125 @@
 
 <body>
 
-<!-- README.md is generated from README.Rmd. Please edit that file -->
-
 <h1 id="sendqsar">SENDQSAR</h1>
-<!-- badges: start -->
-
-<!-- badges: end -->
-
-<p>The goal of SENDQSAR is to …</p>
+<h1 id="sendqsar-qsar-modeling-with-send-database">SENDQSAR: QSAR
+Modeling with SEND Database</h1>
+<h2 id="about">About</h2>
+<p>This package facilitates developing Quantitative Structure-Activity
+Relationship (QSAR) models using the SEND database. It streamlines data
+acquisition, preprocessing, descriptor calculation, and model
+evaluation, enabling researchers to efficiently explore molecular
+descriptors and create robust predictive models.</p>
+<h2 id="features">Features</h2>
+<ul>
+<li><strong>Automated Data Processing</strong>: Simplifies data
+acquisition and preprocessing steps.</li>
+<li><strong>Comprehensive Analysis</strong>: Provides z-score
+calculations for various parameters such as body weight, liver-to-body
+weight ratio, and laboratory tests.</li>
+<li><strong>Machine Learning Integration</strong>: Supports
+classification modeling, hyperparameter tuning, and performance
+evaluation.</li>
+<li><strong>Visualization Tools</strong>: Includes histograms, bar
+plots, and AUC curves for better data interpretation.</li>
+</ul>
+<h2 id="functions-overview">Functions Overview</h2>
+<h3 id="data-acquisition-and-processing">Data Acquisition and
+Processing</h3>
+<ul>
+<li><code>get_compile_data</code> - Reads data from the database at
+the given path and compiles it into a structured data frame for
+analysis.</li>
+<li><code>get_bw_score</code> - Calculates body weight (BW) z-scores for
+each animal.</li>
+<li><code>get_livertobw_zscore</code> - Computes liver-to-body weight
+z-scores.</li>
+<li><code>get_lb_score</code> - Calculates z-scores for laboratory test
+(LB) results.</li>
+<li><code>get_mi_score</code> - Computes z-scores for microscopic
+findings (MI).</li>
+<li><code>get_liver_om_lb_mi_tox_score_list</code> - Combines z-scores
+of LB, MI, and liver-to-BW into a single data frame.</li>
+<li><code>get_col_harmonized_scores_df</code> - Harmonizes column names
+across studies.</li>
+</ul>
+<h3 id="machine-learning-preparation-and-modeling">Machine Learning
+Preparation and Modeling</h3>
+<ul>
+<li><code>get_ml_data_and_tuned_hyperparameters</code> - Prepares data
+and tunes hyperparameters for machine learning.</li>
+<li><code>get_rf_model_with_cv</code> - Builds a random forest model
+with cross-validation and outputs performance metrics.</li>
+<li><code>get_zone_exclusioned_rf_model_with_cv</code> - Adds an indeterminate zone:
+uncertain predictions inside it are excluded, which can improve classification accuracy.</li>
+<li><code>get_imp_features_from_rf_model_with_cv</code> - Computes
+feature importance for model interpretation.</li>
+<li><code>get_auc_curve_with_rf_model</code> - Generates AUC curves to
+evaluate model performance.</li>
+</ul>
+<h3 id="visualization-and-reporting">Visualization and Reporting</h3>
+<ul>
+<li><code>get_histogram_barplot</code> - Creates bar plots for target
+variable classes.</li>
+<li><code>get_reprtree_from_rf_model</code> - Builds representative
+decision trees for interpretability.</li>
+<li><code>get_prediction_plot</code> - Visualizes prediction
+probabilities with histograms.</li>
+</ul>
+<h3 id="automated-pipelines">Automated Pipelines</h3>
+<ul>
+<li><code>get_Data_formatted_for_ml_and_best.m</code> - Formats data for
+machine learning pipelines.</li>
+<li><code>get_rf_input_param_list_output_cv_imp</code> - Automates
+preprocessing, modeling, and evaluation in one step.</li>
+<li><code>get_zone_exclusioned_rf_model_cv_imp</code> - Works like
+<code>get_rf_input_param_list_output_cv_imp</code>, but excludes
+uncertain predictions based on thresholds.</li>
+</ul>
+<h2 id="workflow">Workflow</h2>
+<ol style="list-style-type: decimal">
+<li><strong>Input Database Path</strong>: Provide the path to the
+database containing nonclinical study results for each STUDYID.</li>
+<li><strong>Preprocessing</strong>: Use functions 1-8, counted in the
+order listed under Functions Overview, to clean, harmonize, and prepare data.</li>
+<li><strong>Model Building</strong>: Employ functions 9-18 from the same
+list for training, validation, and evaluation.</li>
+<li><strong>Visualization</strong>: Generate plots and performance
+metrics for better interpretation.</li>
+</ol>
+<h2 id="dependencies">Dependencies</h2>
+<ul>
+<li><code>randomForest</code></li>
+<li><code>ROCR</code></li>
+<li><code>ggplot2</code></li>
+<li><code>reprtree</code></li>
+</ul>
 <h2 id="installation">Installation</h2>
-<p>You can install the development version of SENDQSAR from <a href="https://github.com/">GitHub</a> with:</p>
-<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="co"># install.packages(&quot;pak&quot;)</span></span>
-<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>pak<span class="sc">::</span><span class="fu">pak</span>(<span class="st">&quot;aminuldu07/SENDQSAR&quot;</span>)</span></code></pre></div>
-<h2 id="example">Example</h2>
-<p>This is a basic example which shows you how to solve a common
-problem:</p>
+<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" tabindex="-1"></a><span class="co"># Install from GitHub (requires the devtools package)</span></span>
+<span id="cb1-2"><a href="#cb1-2" tabindex="-1"></a>devtools<span class="sc">::</span><span class="fu">install_github</span>(<span class="st">&quot;aminuldu07/SENDQSAR&quot;</span>)</span></code></pre></div>
+<h2 id="examples">Examples</h2>
+<h3 id="example-1-basic-data-compilation">Example 1: Basic Data
+Compilation</h3>
 <div class="sourceCode" id="cb2"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb2-1"><a href="#cb2-1" tabindex="-1"></a><span class="fu">library</span>(SENDQSAR)</span>
-<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a><span class="do">## basic example code</span></span></code></pre></div>
-<p>What is special about using <code>README.Rmd</code> instead of just
-<code>README.md</code>? You can include R chunks like so:</p>
-<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a><span class="fu">summary</span>(cars)</span>
-<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a><span class="co">#&gt;      speed           dist       </span></span>
-<span id="cb3-3"><a href="#cb3-3" tabindex="-1"></a><span class="co">#&gt;  Min.   : 4.0   Min.   :  2.00  </span></span>
-<span id="cb3-4"><a href="#cb3-4" tabindex="-1"></a><span class="co">#&gt;  1st Qu.:12.0   1st Qu.: 26.00  </span></span>
-<span id="cb3-5"><a href="#cb3-5" tabindex="-1"></a><span class="co">#&gt;  Median :15.0   Median : 36.00  </span></span>
-<span id="cb3-6"><a href="#cb3-6" tabindex="-1"></a><span class="co">#&gt;  Mean   :15.4   Mean   : 42.98  </span></span>
-<span id="cb3-7"><a href="#cb3-7" tabindex="-1"></a><span class="co">#&gt;  3rd Qu.:19.0   3rd Qu.: 56.00  </span></span>
-<span id="cb3-8"><a href="#cb3-8" tabindex="-1"></a><span class="co">#&gt;  Max.   :25.0   Max.   :120.00</span></span></code></pre></div>
-<p>You’ll still need to render <code>README.Rmd</code> regularly, to
-keep <code>README.md</code> up-to-date.
-<code>devtools::build_readme()</code> is handy for this.</p>
-<p>You can also embed plots, for example:</p>
-<img role="img" src="data:image/png;base64,[image data omitted]" width="100%" />
-
-<p>In that case, don’t forget to commit and push the resulting figure
-files, so they display on GitHub and CRAN.</p>
+<span id="cb2-2"><a href="#cb2-2" tabindex="-1"></a>data <span class="ot">&lt;-</span> <span class="fu">get_compile_data</span>(<span class="st">&quot;/path/to/database&quot;</span>)</span></code></pre></div>
+<h3 id="example-2-z-score-calculation">Example 2: Z-Score
+Calculation</h3>
+<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" tabindex="-1"></a>bw_scores <span class="ot">&lt;-</span> <span class="fu">get_bw_score</span>(data)</span>
+<span id="cb3-2"><a href="#cb3-2" tabindex="-1"></a>liver_scores <span class="ot">&lt;-</span> <span class="fu">get_livertobw_zscore</span>(data)</span></code></pre></div>
+<h3 id="example-3-machine-learning-model">Example 3: Machine Learning
+Model</h3>
+<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb4-1"><a href="#cb4-1" tabindex="-1"></a>model <span class="ot">&lt;-</span> <span class="fu">get_rf_model_with_cv</span>(data, <span class="at">n_repeats=</span><span class="dv">10</span>)</span>
+<span id="cb4-2"><a href="#cb4-2" tabindex="-1"></a><span class="fu">print</span>(model<span class="sc">$</span>confusion_matrix)</span></code></pre></div>
+<h3 id="example-4-visualization">Example 4: Visualization</h3>
+<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" tabindex="-1"></a><span class="fu">get_histogram_barplot</span>(data, <span class="at">target_col=</span><span class="st">&quot;target_variable&quot;</span>)</span></code></pre></div>
+<h2 id="contribution">Contribution</h2>
+<p>Contributions are welcome! Feel free to submit issues or pull
+requests via GitHub.</p>
+<h2 id="license">License</h2>
+<p>This project is licensed under the MIT License - see the LICENSE file
+for details.</p>
+<h2 id="contact">Contact</h2>
+<p>For more information, visit the project GitHub page or contact <a href="mailto:email@example.com">email@example.com</a>.</p>
 
 </body>
 </html>
diff --git a/README.md b/README.md
index 36f0f00..ffc7433 100644
--- a/README.md
+++ b/README.md
@@ -1,52 +1,139 @@
+SENDQSAR
+================
+
+# SENDQSAR: QSAR Modeling with SEND Database
+
+## About
+
+This package facilitates developing Quantitative Structure-Activity
+Relationship (QSAR) models using the SEND database. It streamlines data
+acquisition, preprocessing, descriptor calculation, and model
+evaluation, enabling researchers to efficiently explore molecular
+descriptors and create robust predictive models.
+
+## Features
+
+- **Automated Data Processing**: Simplifies data acquisition and
+  preprocessing steps.
+- **Comprehensive Analysis**: Provides z-score calculations for various
+  parameters such as body weight, liver-to-body weight ratio, and
+  laboratory tests.
+- **Machine Learning Integration**: Supports classification modeling,
+  hyperparameter tuning, and performance evaluation.
+- **Visualization Tools**: Includes histograms, bar plots, and AUC
+  curves for better data interpretation.
+
+## Functions Overview
+
+### Data Acquisition and Processing
+
+- `get_compile_data` - Reads data from the database at the given path
+  and compiles it into a structured data frame for analysis.
+- `get_bw_score` - Calculates body weight (BW) z-scores for each animal.
+- `get_livertobw_zscore` - Computes liver-to-body weight z-scores.
+- `get_lb_score` - Calculates z-scores for laboratory test (LB) results.
+- `get_mi_score` - Computes z-scores for microscopic findings (MI).
+- `get_liver_om_lb_mi_tox_score_list` - Combines z-scores of LB, MI, and
+  liver-to-BW into a single data frame.
+- `get_col_harmonized_scores_df` - Harmonizes column names across
+  studies.
+
+### Machine Learning Preparation and Modeling
+
+- `get_ml_data_and_tuned_hyperparameters` - Prepares data and tunes
+  hyperparameters for machine learning.
+- `get_rf_model_with_cv` - Builds a random forest model with
+  cross-validation and outputs performance metrics.
+- `get_zone_exclusioned_rf_model_with_cv` - Adds an indeterminate zone: uncertain
+  predictions inside it are excluded, which can improve classification accuracy.
+- `get_imp_features_from_rf_model_with_cv` - Computes feature importance
+  for model interpretation.
+- `get_auc_curve_with_rf_model` - Generates AUC curves to evaluate model
+  performance.
+
+### Visualization and Reporting
+
+- `get_histogram_barplot` - Creates bar plots for target variable
+  classes.
+- `get_reprtree_from_rf_model` - Builds representative decision trees
+  for interpretability.
+- `get_prediction_plot` - Visualizes prediction probabilities with
+  histograms.
+
+### Automated Pipelines
+
+- `get_Data_formatted_for_ml_and_best.m` - Formats data for machine
+  learning pipelines.
+- `get_rf_input_param_list_output_cv_imp` - Automates preprocessing,
+  modeling, and evaluation in one step.
+- `get_zone_exclusioned_rf_model_cv_imp` - Works like
+  `get_rf_input_param_list_output_cv_imp`, but excludes uncertain predictions based on thresholds.
+
+## Workflow
+
+1.  **Input Database Path**: Provide the path to the database containing
+    nonclinical study results for each STUDYID.
+2.  **Preprocessing**: Use functions 1-8, counted in the order listed
+    under Functions Overview, to clean, harmonize, and prepare data.
+3.  **Model Building**: Employ functions 9-18 from the same list for
+    training, validation, and evaluation.
+4.  **Visualization**: Generate plots and performance metrics for better
+    interpretation.
+
+## Dependencies
+
+- `randomForest`
+- `ROCR`
+- `ggplot2`
+- `reprtree`
 
-<!-- README.md is generated from README.Rmd. Please edit that file -->
+## Installation
 
-# SENDQSAR
+``` r
+# Install from GitHub (requires the devtools package)
+devtools::install_github("aminuldu07/SENDQSAR")
+```
 
-<!-- badges: start -->
-<!-- badges: end -->
+## Examples
 
-The goal of SENDQSAR is to …
+### Example 1: Basic Data Compilation
 
-## Installation
+``` r
+library(SENDQSAR)
+data <- get_compile_data("/path/to/database")
+```
 
-You can install the development version of SENDQSAR from
-[GitHub](https://github.com/) with:
+### Example 2: Z-Score Calculation
 
 ``` r
-# install.packages("pak")
-pak::pak("aminuldu07/SENDQSAR")
+bw_scores <- get_bw_score(data)
+liver_scores <- get_livertobw_zscore(data)
 ```
 
-## Example
-
-This is a basic example which shows you how to solve a common problem:
+### Example 3: Machine Learning Model
 
 ``` r
-library(SENDQSAR)
-## basic example code
+model <- get_rf_model_with_cv(data, n_repeats=10)
+print(model$confusion_matrix)
 ```
 
-What is special about using `README.Rmd` instead of just `README.md`?
-You can include R chunks like so:
+### Example 4: Visualization
 
 ``` r
-summary(cars)
-#>      speed           dist       
-#>  Min.   : 4.0   Min.   :  2.00  
-#>  1st Qu.:12.0   1st Qu.: 26.00  
-#>  Median :15.0   Median : 36.00  
-#>  Mean   :15.4   Mean   : 42.98  
-#>  3rd Qu.:19.0   3rd Qu.: 56.00  
-#>  Max.   :25.0   Max.   :120.00
+get_histogram_barplot(data, target_col="target_variable")
 ```
 
-You’ll still need to render `README.Rmd` regularly, to keep `README.md`
-up-to-date. `devtools::build_readme()` is handy for this.
+## Contribution
+
+Contributions are welcome! Feel free to submit issues or pull requests
+via GitHub.
+
+## License
 
-You can also embed plots, for example:
+This project is licensed under the MIT License - see the LICENSE file
+for details.
 
-<img src="man/figures/README-pressure-1.png" width="100%" />
+## Contact
 
-In that case, don’t forget to commit and push the resulting figure
-files, so they display on GitHub and CRAN.
+For more information, visit the project GitHub page or contact
+<email@example.com>.