The run_analysis.R script contains all the code needed to process the source data. It assumes that https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip has been downloaded and unzipped, and that the contents of the resulting "UCI HAR Dataset" folder have been copied into the same folder as run_analysis.R. To generate the results.txt output, run run_analysis.R with that folder as the working directory, so that the script and the dataset files sit side by side.
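The setup described above can be scripted. The sketch below is a hedged example, not part of run_analysis.R: the helper name prepare_inputs, the local zip name UCI_HAR_Dataset.zip and the list of required files are assumptions based on the standard layout of the UCI HAR archive. It downloads and unzips only when the expected files are missing, then copies the folder contents next to the script.

```r
# Files the analysis is assumed to read (standard UCI HAR layout).
har_files <- c("features.txt", "activity_labels.txt",
               file.path("test",  c("X_test.txt",  "y_test.txt",  "subject_test.txt")),
               file.path("train", c("X_train.txt", "y_train.txt", "subject_train.txt")))

# Hypothetical helper: make the "UCI HAR Dataset" contents available in `dir`.
prepare_inputs <- function(dir = ".",
                           url = "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip") {
  if (!all(file.exists(file.path(dir, har_files)))) {
    zipfile <- file.path(dir, "UCI_HAR_Dataset.zip")   # assumed local file name
    if (!file.exists(zipfile)) download.file(url, zipfile, mode = "wb")
    unzip(zipfile, exdir = dir)                        # creates "UCI HAR Dataset/"
    src <- file.path(dir, "UCI HAR Dataset")
    # Copy files and the test/ and train/ subfolders next to run_analysis.R
    file.copy(list.files(src, full.names = TRUE), dir, recursive = TRUE)
  }
  invisible(all(file.exists(file.path(dir, har_files))))
}

# Typical use, from the folder holding run_analysis.R:
# prepare_inputs("."); source("run_analysis.R")
```

Keeping the download conditional means repeated runs do not fetch the archive again once the files are in place.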
This analysis assumes that we are interested only in the variables whose names contain mean() or std(), since the project asks for the mean and standard deviation of each measurement.
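A minimal sketch of that selection rule in base R, assuming the variable names come from the second column of features.txt. The sample names below are illustrative only. Matching the literal "-mean()" and "-std()" also excludes meanFreq() and the angle(...) variables; that is one reasonable reading of the rule, not a confirmed detail of run_analysis.R.

```r
# Illustrative feature names in the UCI HAR naming style
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
              "fBodyAcc-meanFreq()-X", "tBodyAccMag-mean()",
              "angle(tBodyAccMean,gravity)")

# Keep names containing "-mean()" or "-std()"; the escaped parentheses
# prevent "meanFreq()" from matching.
keep <- grepl("-(mean|std)\\(\\)", features)
features[keep]
# "tBodyAcc-mean()-X" "tBodyAcc-std()-X" "tBodyAccMag-mean()"
```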
The data is processed as follows:
- Load source data.
- Build a list of columns we're interested in by finding all columns that contain std() or mean().
- Rename the data columns from V1, V2, ... to the actual variable names.
- Create an index for each row of the data so that it can be merged later.
- Reshape the data so that there is one column that contains the variable name and another with the value, to facilitate later calculations on the data.
- Merge the data with the subject ids and the activity names.
- Combine the test and train data.
- Remove unneeded columns.
- Calculate the average values.
- Write the results to results.txt.
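The steps above can be sketched in base R as follows. This is not the code in run_analysis.R: the function names read_split, tidy_split and summarise_har are hypothetical, the tables are assumed to have the V1, V2, ... columns that read.table() produces for the UCI HAR files, and the averages are assumed to be taken per subject, activity and variable. The toy data at the end stands in for the real files.

```r
# Hypothetical loader: read_split(".", "test") reads test/X_test.txt,
# test/y_test.txt and test/subject_test.txt.
read_split <- function(dir, split) {
  f <- function(prefix) read.table(file.path(dir, split, paste0(prefix, "_", split, ".txt")))
  list(X = f("X"), y = f("y"), subject = f("subject"))
}

# Select, rename, index, reshape and merge one split (test or train).
tidy_split <- function(s, features, activity_labels) {
  vars <- as.character(features$V2)
  keep <- grepl("-(mean|std)\\(\\)", vars)       # columns with mean() or std()
  X <- s$X[, keep, drop = FALSE]
  names(X) <- vars[keep]                           # V1, V2, ... -> variable names
  n <- nrow(X)
  # Long form: one row per (observation, variable); row_id is the row index
  long <- data.frame(row_id   = rep(seq_len(n), times = ncol(X)),
                     variable = rep(names(X), each = n),
                     value    = unlist(X, use.names = FALSE),
                     stringsAsFactors = FALSE)
  ids <- data.frame(row_id = seq_len(n), subject = s$subject$V1,
                    activity_id = s$y$V1)
  long <- merge(long, ids, by = "row_id")          # attach subject ids
  labels <- data.frame(activity_id = activity_labels$V1,
                       activity = as.character(activity_labels$V2),
                       stringsAsFactors = FALSE)
  merge(long, labels, by = "activity_id")          # attach activity names
}

# Combine test and train, drop index columns, average, write out.
summarise_har <- function(test, train, features, activity_labels, out = "results.txt") {
  combined <- rbind(tidy_split(test, features, activity_labels),
                    tidy_split(train, features, activity_labels))
  combined <- combined[, c("subject", "activity", "variable", "value")]
  result <- aggregate(value ~ subject + activity + variable, data = combined, FUN = mean)
  write.table(result, out, row.names = FALSE)
  result
}

# Toy inputs shaped like the real files (values are made up)
features <- data.frame(V1 = 1:4, V2 = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                        "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X"))
activity_labels <- data.frame(V1 = 1:2, V2 = c("WALKING", "SITTING"))
test  <- list(X = data.frame(V1 = c(1, 3), V2 = c(10, 20), V3 = c(9, 9), V4 = c(0, 0)),
              y = data.frame(V1 = c(1, 1)), subject = data.frame(V1 = c(2, 2)))
train <- list(X = data.frame(V1 = c(5, 7), V2 = c(1, 1), V3 = c(9, 9), V4 = c(0, 0)),
              y = data.frame(V1 = c(1, 2)), subject = data.frame(V1 = c(1, 1)))
res <- summarise_har(test, train, features, activity_labels,
                     out = tempfile(fileext = ".txt"))
```

With the real data, the toy inputs would be replaced by something like summarise_har(read_split(".", "test"), read_split(".", "train"), read.table("features.txt"), read.table("activity_labels.txt")). Reshaping to long form first means the final average is a single aggregate() call, whatever the number of selected variables.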