Course Project for the Getting and Cleaning Data course @ Coursera
When the script is sourced, it checks whether the required packages are installed and tries to install any that are missing. It then displays instructions on how to run the script:

```r
source('./run_analysis.R')
run_analysis()
```
The script checks if there is a directory called `UCI HAR Dataset` in the working directory. If it finds one, this directory is assumed to contain the unzipped files from the Samsung data. If the directory is not found, the script checks for a file called `dataset.zip`, which is assumed to contain the Samsung data and will be unzipped and used. If this file is not found, the script downloads the data.
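The data-locating logic described above could be sketched roughly as follows; the function and variable names are illustrative, and the download URL is a placeholder, not the actual address used by the script:

```r
# Sketch of the data-locating logic (names are assumptions, not the real script).
locate_data <- function() {
  data_dir <- "UCI HAR Dataset"
  zip_file <- "dataset.zip"
  data_url <- "<dataset URL>"   # placeholder; the real script defines its own

  if (dir.exists(data_dir)) {
    return(data_dir)            # unzipped data already present
  }
  if (!file.exists(zip_file)) {
    download.file(data_url, destfile = zip_file, mode = "wb")
  }
  unzip(zip_file)               # creates the UCI HAR Dataset directory
  data_dir
}
```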
- When sourced, the script checks for the needed packages and tries to install missing ones
- Calling `run_analysis()` starts the processing
- The script checks if all the needed data is present
  - The script checks for several key files in the `UCI HAR Dataset` subdirectory
  - If they are not found, the script checks for a file called `dataset.zip` in the working directory
  - If this file is not there, it will be downloaded
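The package check performed on sourcing might look like this minimal sketch; the helper name and the package list are assumptions for illustration only:

```r
# Install any missing packages, then load them (names are illustrative).
ensure_packages <- function(pkgs) {
  for (pkg in pkgs) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    library(pkg, character.only = TRUE)
  }
}

# Example call (package names assumed):
# ensure_packages(c("data.table"))
```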
- The actual processing starts for training and test data
  - The feature labels are loaded from `features.txt`
  - The features containing std or mean values are selected using `grepl`
  - The activity labels are loaded from `activity_labels.txt`
  - The data is loaded
  - The feature vector is filtered using the selected features
  - The feature vector is labeled according to the selected features
  - Activity and subject data are added to the feature vector
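The `grepl`-based selection of mean and std features can be sketched like this; the sample feature names below are invented in the style of `features.txt`, not taken from it:

```r
# Invented sample of feature names in the style of features.txt
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-max()-X", "angle(X,gravityMean)")

# Keep features whose names contain "mean()" or "std()"
selected <- grepl("mean\\(\\)|std\\(\\)", features)
features[selected]   # "tBodyAcc-mean()-X" "tBodyAcc-std()-X"
```

Note that matching the literal `mean()`/`std()` (with escaped parentheses) avoids also picking up names like `angle(X,gravityMean)`.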
- The processed training and test datasets are merged using `rbind` and converted to a `data.table`
- The mean is calculated for each feature per activity and subject
- The column names are cleaned and reapplied to the tidy dataset
- Both raw and tidy datasets are written to disk
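The merge-and-summarise step could look like this sketch; the toy data and column name below are invented stand-ins, while the real columns come from `features.txt`:

```r
library(data.table)

# Toy stand-ins for the processed training and test sets (invented values)
train <- data.frame(subject = c(1, 1), activity = c("WALKING", "WALKING"),
                    tBodyAcc.mean.X = c(0.2, 0.4))
test  <- data.frame(subject = 2, activity = "SITTING",
                    tBodyAcc.mean.X = 0.6)

# Merge with rbind and convert to a data.table
merged <- as.data.table(rbind(train, test))

# Mean of each feature per activity and subject
tidy <- merged[, lapply(.SD, mean), by = .(activity, subject)]
```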