Add implementation of standard deviation on data #1
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,17 +0,0 @@ | ||
| # Introduction | ||
|
|
||
| This is a template software project repository used by the [Intermediate Research Software Development Skills In Python](https://github.com/carpentries-incubator/python-intermediate-development) course. | ||
|
|
||
| ## Purpose | ||
|
|
||
| This repository is intended to be used as a code template which is copied by learners on the [Intermediate Research Software Development Skills In Python](https://github.com/carpentries-incubator/python-intermediate-development) course. | ||
| This can be done using the `Use this template` button towards the top right of this repo's GitHub page. | ||
|
|
||
| This software project is not finished, is currently failing to run and contains some code style issues. It is used as a starting point for the course - issues will be fixed and code will be added in a number of places during the course by learners in their own copies of the repository, as course topics are introduced. | ||
|
|
||
| ## Tests | ||
|
|
||
| Several tests have been implemented already, some of which are currently failing. | ||
| These failing tests set out the requirements for the additional code to be implemented during the workshop. | ||
|
|
||
| The tests should be run using `pytest`, which will be introduced during the workshop. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| Metadata-Version: 2.1 | ||
| Name: inflammation | ||
| Version: 1.0.0 | ||
| Summary: | ||
| License: MIT | ||
| Author: Yoyo Yiu | ||
| Author-email: yoyo.yiu@ukaea.uk | ||
| Requires-Python: >=3.10,<4.0 | ||
| Classifier: License :: OSI Approved :: MIT License | ||
| Classifier: Programming Language :: Python :: 3 | ||
| Classifier: Programming Language :: Python :: 3.10 | ||
| Classifier: Programming Language :: Python :: 3.11 | ||
| Classifier: Programming Language :: Python :: 3.12 | ||
| Requires-Dist: matplotlib (>=3.9.0,<4.0.0) | ||
| Requires-Dist: numpy (>=2.0.0,<3.0.0) | ||
| Description-Content-Type: text/markdown | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| inflammation/README.md,sha256=__27PY24jF0qIk22pnAuDj5M1lP_x_VpwbTpvaQj3SA,964 | ||
| inflammation/__init__.py,sha256=sB3D60ULiwjsYC5d4NjvSkmcAb-SI3igDBBQRFaXjCU,71 | ||
| inflammation/compute_data.py,sha256=bCq5h-dqcfaHi_coxo4OOL3iDnL-v9pn9wT_ZQp5a4I,984 | ||
| inflammation/models.py,sha256=SGaFY8OIzFqir-RT6JLQ_H20YRSiWB9-OTnFCR9sOT0,1652 | ||
| inflammation/views.py,sha256=D7B8J9JjkA4UN1_2yHURslhQGzs7kVkUJOd0YJQXoBY,649 | ||
| inflammation-1.0.0.dist-info/METADATA,sha256=U3VNEND10cG_GNFH8Sov5fU8VsWm8CDAB1_n2lbm1CE,532 | ||
| inflammation-1.0.0.dist-info/WHEEL,sha256=sP946D7jFCHeNz5Iq4fL4Lu-PrWrFsgfLXbbkciIZwg,88 | ||
| inflammation-1.0.0.dist-info/RECORD,, |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| Wheel-Version: 1.0 | ||
| Generator: poetry-core 1.9.0 | ||
| Root-Is-Purelib: true | ||
| Tag: py3-none-any |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| # Inflam | ||
| Inflam is a data management system written in Python that manages trial data used in clinical inflammation studies. | ||
|
|
||
| ## Main features | ||
| Here are some key features of Inflam: | ||
|
|
||
| - Provide basic statistical analyses over clinical trial data | ||
| - Ability to work on trial data in Comma-Separated Value (CSV) format | ||
| - Generate plots of trial data | ||
| - Analytical functions and views can be easily extended based on its Model-View-Controller architecture | ||
|
|
||
| ## Prerequisites | ||
| Inflam requires the following Python packages: | ||
|
|
||
| - [NumPy](https://www.numpy.org/) - makes use of NumPy's statistical functions | ||
| - [Matplotlib](https://matplotlib.org/stable/index.html) - uses Matplotlib to generate statistical plots | ||
|
|
||
| The following optional packages are required to run Inflam's unit tests: | ||
|
|
||
| - [pytest](https://docs.pytest.org/en/stable/) - Inflam's unit tests are written using pytest | ||
| - [pytest-cov](https://pypi.org/project/pytest-cov/) - Adds test coverage stats to unit testing |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| """Package containing the bulk of code for the patient data system.""" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| """Module containing mechanism for calculating standard deviation between datasets. | ||
| """ | ||
|
|
||
| import glob | ||
| import os | ||
| import numpy as np | ||
|
|
||
| from inflammation import models, views | ||
|
|
||
|
|
||
| def analyse_data(data_dir): | ||
|     """Calculate the standard deviation by day between datasets. | ||
| | ||
|     Gets all the inflammation CSV files within a directory, works out the mean | ||
|     inflammation value for each day across all datasets, then graphs the | ||
|     standard deviation of these means. | ||
|     """ | ||
|     data_file_paths = glob.glob(os.path.join(data_dir, 'inflammation*.csv')) | ||
|     if len(data_file_paths) == 0: | ||
|         raise ValueError(f"No inflammation CSV files found in path {data_dir}") | ||
|     data = map(models.load_csv, data_file_paths) | ||
| | ||
|     means_by_day = map(models.daily_mean, data) | ||
|     means_by_day_matrix = np.stack(list(means_by_day)) | ||
| | ||
|     daily_standard_deviation = np.std(means_by_day_matrix, axis=0) | ||
| | ||
|     graph_data = { | ||
|         'standard deviation by day': daily_standard_deviation, | ||
|     } | ||
|     views.visualize(graph_data) |
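
The heart of `analyse_data` is the stack-then-reduce step: one row of daily means per dataset, then a standard deviation down each day column. A minimal sketch of that step, using small in-memory arrays in place of the loaded CSV files (the helper name `std_of_daily_means` and the sample values are illustrative assumptions, not part of this pull request):

```python
import numpy as np


def std_of_daily_means(datasets):
    """Return the per-day standard deviation of the daily means of several datasets.

    Each dataset is a 2D array (rows = patients, columns = days); all datasets
    are assumed to cover the same number of days.
    """
    # One row of daily means per dataset -> shape (n_datasets, n_days)
    means_by_day_matrix = np.stack([np.mean(d, axis=0) for d in datasets])
    # Population standard deviation (NumPy's default, ddof=0) for each day
    return np.std(means_by_day_matrix, axis=0)


# Two made-up datasets, each with two patients and three days
first = np.array([[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]])   # daily means: 1, 2, 3
second = np.array([[2.0, 2.0, 2.0], [4.0, 6.0, 8.0]])  # daily means: 3, 4, 5
result = std_of_daily_means([first, second])
print(result)
```

Note that `np.stack` requires every dataset to have the same number of days; files of differing widths would raise a `ValueError` at that step.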
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| """Module containing models representing patients and their data. | ||
|
|
||
| The Model layer is responsible for the 'business logic' part of the software. | ||
|
|
||
| Patients' data is held in an inflammation table (2D array) where each row contains | ||
| inflammation data for a single patient taken over a number of days | ||
| and each column represents a single day across all patients. | ||
| """ | ||
|
|
||
| import json | ||
| import numpy as np | ||
|
|
||
|
|
||
| def load_csv(filename): | ||
|     """Load a NumPy array from a CSV file. | ||
| | ||
|     :param filename: Filename of CSV to load | ||
|     """ | ||
|     return np.loadtxt(fname=filename, delimiter=',') | ||
|
|
||
| def load_json(filename): | ||
|     """Load a list of NumPy arrays from a JSON document. | ||
| | ||
|     Expected format: | ||
|     [ | ||
|         { | ||
|             "observations": [0, 1] | ||
|         }, | ||
|         { | ||
|             "observations": [0, 2] | ||
|         } | ||
|     ] | ||
| | ||
|     :param filename: Filename of JSON document to load | ||
|     """ | ||
|     with open(filename, 'r', encoding='utf-8') as file: | ||
|         data_as_json = json.load(file) | ||
|     return [np.array(entry['observations']) for entry in data_as_json] | ||
|
|
||
|
|
||
|
|
||
| def daily_mean(data): | ||
|     """Calculate the daily mean of a 2d inflammation data array.""" | ||
|     return np.mean(data, axis=0) | ||
| | ||
| | ||
| def daily_max(data): | ||
|     """Calculate the daily max of a 2d inflammation data array.""" | ||
|     return np.max(data, axis=0) | ||
| | ||
| | ||
| def daily_min(data): | ||
|     """Calculate the daily min of a 2d inflammation data array.""" | ||
|     return np.min(data, axis=0) | ||
|
|
||
|
|
||
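| def s_dev(data): | ||
|     """Compute and return the standard deviation for each day (column) of the data.""" | ||
|     mean_by_day = np.mean(data, axis=0) | ||
|     squared_devs = [] | ||
|     for entry in data: | ||
|         squared_devs.append((entry - mean_by_day) * (entry - mean_by_day)) | ||
| | ||
|     # The mean of the squared deviations is the variance; | ||
|     # the standard deviation is its square root. | ||
|     variance = sum(squared_devs) / len(data) | ||
|     return {'standard deviation': np.sqrt(variance)} |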
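
For reference, the mean of the squared deviations from the mean is the variance, and the standard deviation is the square root of that quantity. A minimal vectorised sketch of the calculation, cross-checked against `np.std` (the name `s_dev_sketch` and the sample array are assumptions made for illustration, not code from this pull request):

```python
import numpy as np


def s_dev_sketch(data):
    """Per-day population standard deviation of a 2D array (rows = patients)."""
    data = np.asarray(data, dtype=float)
    mean_by_day = np.mean(data, axis=0)
    # Mean of the squared deviations from the daily mean is the variance ...
    variance = np.sum((data - mean_by_day) ** 2, axis=0) / len(data)
    # ... and its square root is the standard deviation
    return {'standard deviation': np.sqrt(variance)}


sample = np.array([[1.0, 2.0],
                   [3.0, 6.0]])
manual = s_dev_sketch(sample)['standard deviation']
builtin = np.std(sample, axis=0)  # NumPy's built-in equivalent (ddof=0)
print(manual, builtin)
```

Both results should agree; if they differ by a square, the square root has been left out.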
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| """Module containing code for plotting inflammation data.""" | ||
|
|
||
| from matplotlib import pyplot as plt | ||
| import numpy as np | ||
|
|
||
|
|
||
| def visualize(data_dict): | ||
|     """Display plots of basic statistical properties of the inflammation data. | ||
| | ||
|     :param data_dict: Dictionary of name -> data to plot | ||
|     """ | ||
|     # TODO(lesson-design) Extend to allow saving figure to file | ||
| | ||
|     num_plots = len(data_dict) | ||
|     fig = plt.figure(figsize=((3 * num_plots) + 1, 3.0)) | ||
| | ||
|     for i, (name, data) in enumerate(data_dict.items()): | ||
|         axes = fig.add_subplot(1, num_plots, i + 1) | ||
| | ||
|         axes.set_ylabel(name) | ||
|         axes.plot(data) | ||
| | ||
|     fig.tight_layout() | ||
| | ||
|     plt.show() |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,7 +27,8 @@ def main(args): | |
|     for filename in InFiles: | ||
|         inflammation_data = models.load_csv(filename) | ||
| | ||
| -        view_data = {'average': models.daily_mean(inflammation_data), 'max': models.daily_max(inflammation_data), 'min': models.daily_min(inflammation_data)} | ||
| +        view_data = {'average': models.daily_mean(inflammation_data), 'max': models.daily_max(inflammation_data), 'min': models.daily_min(inflammation_data), **(models.s_dev(inflammation_data))} | ||
|
|
||
|
|
||
|
Hi, the code looks good for deployment; you can merge it into the main repository.
Owner / Author: Good. Will go ahead.
||
|         views.visualize(view_data) | ||
|
|
||
|
|
||
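
The changed line merges the one-entry dictionary returned by `models.s_dev` into `view_data` with `**` unpacking. A standalone sketch of that pattern, using the standard library's `statistics` module and a hypothetical `summary_stub` helper in place of the project's functions (both the helper and the sample values are assumptions for illustration):

```python
import statistics


def summary_stub(values):
    """Hypothetical stand-in that returns a one-entry dict, as models.s_dev does."""
    return {'standard deviation': statistics.pstdev(values)}


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# ** unpacks the returned dict's key/value pairs into the enclosing literal,
# so the new entry is appended after the existing keys.
view_data = {
    'average': statistics.mean(values),
    'max': max(values),
    'min': min(values),
    **summary_stub(values),
}
print(list(view_data))
```

One consequence of this design is that a key returned by the unpacked dictionary silently overwrites an earlier key of the same name, so the returned key needs to stay distinct from `'average'`, `'max'` and `'min'`.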
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -54,3 +54,13 @@ def daily_min(data): | |
|     """Calculate the daily min of a 2d inflammation data array.""" | ||
|     return np.min(data, axis=0) | ||
|
|
||
|
Looks good.
||
|
|
||
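| def s_dev(data): | ||
|     """Compute and return the standard deviation for each day (column) of the data.""" | ||
|     mean_by_day = np.mean(data, axis=0) | ||
|     squared_devs = [] | ||
|     for entry in data: | ||
|         squared_devs.append((entry - mean_by_day) * (entry - mean_by_day)) | ||
| | ||
|     # The mean of the squared deviations is the variance; | ||
|     # the standard deviation is its square root. | ||
|     variance = sum(squared_devs) / len(data) | ||
|     return {'standard deviation': np.sqrt(variance)} | ||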
Standard deviation for one patient with multiple observations
Great help!
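
The review comment above mentions a standard deviation for one patient across multiple observations, whereas a reduction along `axis=0` on the patient-by-day table gives one value per day across patients. It is not certain from the thread which reading is intended, so the sketch below simply contrasts the two with `np.std` on a made-up array:

```python
import numpy as np

# Rows are patients, columns are days (made-up values)
data = np.array([[1.0, 3.0, 5.0],
                 [2.0, 2.0, 2.0]])

# axis=0: one value per day, computed across patients
per_day = np.std(data, axis=0)

# axis=1: one value per patient, computed across that patient's observations
per_patient = np.std(data, axis=1)

print(per_day.shape, per_patient.shape)
```

With this table, `per_day` has three entries (one per day) and `per_patient` has two (one per patient); the second patient's constant readings give a per-patient standard deviation of zero.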