Commit 78a700a

Merge pull request #16 from fgaim/2025-upgrade
Add real-time auto-refresh of dashboard, enhancements, and upgrade dependencies.
2 parents 09ab0a2 + a54b678 commit 78a700a

16 files changed (+594, -315 lines)

.circleci/config.yml

Lines changed: 8 additions & 13 deletions
@@ -7,8 +7,8 @@ jobs:
 build:
 docker:
 # specify the version you desire here
-# use `-browsers` prefix for selenium tests, e.g. `3.6.1-browsers`
-- image: circleci/python:3.6.1
+# use `-browsers` prefix for selenium tests, e.g. `3.9-browsers`
+- image: cimg/python:3.9
 
 # Specify service dependencies here if necessary
 # CircleCI maintains a library of pre-built images
@@ -18,14 +18,10 @@ jobs:
 working_directory: ~/repo
 
 steps:
-- checkout
-
-# Download and cache dependencies
-- restore_cache:
-keys:
-- v1-dependencies-{{ checksum "requirements.txt" }}
-# fallback to using the latest cache if no exact match is found
-- v1-dependencies-
+- run:
+name: Clone repo via HTTPS
+command: |
+git clone https://github.com/fgaim/gpuview.git ~/repo
 
 - run:
 name: install dependencies
@@ -38,15 +34,14 @@ jobs:
 paths:
 - ./venv
 key: v1-dependencies-{{ checksum "requirements.txt" }}
-
+
 - run:
 name: run tests
 command: |
 . venv/bin/activate
-flake8 --exclude=venv* --statistics
+flake8 --exclude=venv* --max-line-length=120 --statistics
 pytest -v --cov=gpuview
 
 - store_artifacts:
 path: test-reports
 destination: test-reports
-
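The CI step now runs flake8 with `--max-line-length=120`. As a rough local preview of what that limit flags, here is a minimal sketch approximating pycodestyle's E501 (line too long) check. It is a simplification under stated assumptions, not flake8: it ignores `# noqa` comments, the long-URL exemption, and every other rule, and the helper names `long_lines` and `check_tree` are hypothetical.

```python
"""Simplified approximation of the E501 check enforced by
`flake8 --max-line-length=120` (hypothetical helpers, not flake8 itself)."""
from pathlib import Path
from typing import Iterator, Tuple

MAX_LINE_LENGTH = 120  # limit taken from the CI command


def long_lines(source: str, limit: int = MAX_LINE_LENGTH) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, length) for each line longer than `limit`.

    Trailing whitespace is stripped before measuring, so trailing spaces
    alone never push a line over the limit.
    """
    for number, line in enumerate(source.splitlines(), start=1):
        length = len(line.rstrip())
        if length > limit:
            yield number, length


def check_tree(root: str = ".", skip_prefix: str = "venv", limit: int = MAX_LINE_LENGTH) -> int:
    """Print long lines in *.py files under `root` and return how many were found.

    Top-level directories whose names start with `skip_prefix` are skipped,
    loosely mirroring `--exclude=venv*` (flake8's real matching is more general).
    """
    base = Path(root)
    problems = 0
    for path in sorted(base.rglob("*.py")):
        rel = path.relative_to(base)
        if len(rel.parts) > 1 and rel.parts[0].startswith(skip_prefix):
            continue  # file lives inside an excluded top-level directory
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, length in long_lines(text, limit):
            print(f"{rel}:{number}: line too long ({length} > {limit})")
            problems += 1
    return problems
```

Calling `check_tree()` from the repository root gives a quick count before pushing; flake8 itself remains the authority in CI.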

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ tmp/
 gpuhosts.db
 gpuview.log
 pypi.sh
+.DS_Store

README.md

Lines changed: 65 additions & 58 deletions
@@ -1,92 +1,106 @@
-gpuview
+# gpuview
+
 =======
 
 [![LICENSE](https://img.shields.io/github/license/fgaim/gpuview.svg)](https://github.com/fgaim/gpuview/blob/master/LICENSE)
 ![GitHub issues](https://img.shields.io/github/issues/fgaim/gpuview.svg)
+[![Python Versions](https://img.shields.io/pypi/pyversions/gpuview.svg)](https://pypi.org/project/gpuview/)
 [![PyPI](https://img.shields.io/pypi/v/gpuview.svg)](https://pypi.org/project/gpuview/)
 [![CircleCI](https://circleci.com/gh/fgaim/gpuview.svg?style=shield)](https://circleci.com/gh/fgaim/gpuview)
 
-
-GPU is an expensive resource, and deep learning practitioners have to monitor the
-health and usage of their GPUs, such as the temperature, memory, utilization, and the users.
-This can be done with tools like `nvidia-smi` and `gpustat` from the terminal or command-line.
-Often times, however, it is not convenient to `ssh` into servers to just check the GPU status.
-`gpuview` is meant to mitigate this by running a lightweight web dashboard on top of
+GPU is an expensive resource, and deep learning practitioners have to monitor the health and usage of their GPUs, such as the temperature, memory, utilization, and the users. This can be done with tools like `nvidia-smi` and `gpustat` from the terminal or command-line. Often times, however, it is not convenient to `ssh` into servers to just check the GPU status. `gpuview` is meant to mitigate this by running a lightweight web dashboard on top of
 [`gpustat`][repo_gpustat].
 
-With `gpuview` one can monitor GPUs on the go, though a web browser. Moreover, **multiple GPU servers**
-can be registered into one `gpuview` dashboard and all stats are aggregated and accessible from one place.
-
+With `gpuview` one can monitor GPUs on the go, through a web browser. Moreover, **multiple GPU servers** can be registered into one `gpuview` dashboard and all stats are aggregated and accessible from one place.
 
-Thumbnail view of GPUs across multiple servers.
+The dashboard features **live auto-refresh** (every 3 seconds) and includes interactive tooltips, temperature-based color coding, and pause/resume controls for real-time GPU monitoring.
 
-![Screenshot of gpuview](https://github.com/fgaim/gpuview/blob/master/imgs/dash-1.png)
+Dashboard view of nine GPUs across multiple servers:
 
+![Screenshot of gpuview](imgs/dash-1.png)
 
-Setup
------
+## Setup
 
-Python is required,`gpuview` has been tested with both 2.7 and 3 versions.
+Python 3.9 or higher is required.
 
 Install from [PyPI][pypi_gpuview]:
 
-```
-$ pip install gpuview
+```sh
+pip install gpuview
 ```
 
 [or] Install directly from repo:
 
-```
-$ pip install git+https://github.com/fgaim/gpuview.git@master
+```sh
+pip install git+https://github.com/fgaim/gpuview.git@master
 ```
 
-> `gpuview` installs the latest version of `gpustat` from `pypi`, therefore, its commands are available
+> `gpuview` installs the latest version of `gpustat` from `pypi`, therefore, its commands are available
 from the terminal.
 
-
-
-Usage
------
+## Usage
 
 `gpuview` can be used in two modes as a temporary process or as a background service.
 
 ### Run gpuview
+
 Once `gpuview` is installed, it can be started as follows:
+
+```sh
+gpuview run --safe-zone
 ```
-$ gpuview run --safe-zone
+
+This will start the dashboard at `http://0.0.0.0:9988`.
+
+By default, `gpuview` runs at `0.0.0.0` and port `9988`, but these can be changed using `--host` and `--port`. The `safe-zone` option means report all details including usernames, but it can be turned off for security reasons.
+
+For testing and development purposes, you can run gpuview with synthetic data:
+
+```sh
+gpuview run --demo
 ```
-This will start the dasboard at `http://0.0.0.0:9988`.
 
+This displays fake GPU statistics and is useful when developing on systems without NVIDIA GPUs or when showcasing the dashboard.
+
+## API Endpoints
 
-By default, `gpuview` runs at `0.0.0.0` and port `9988`, but these can be changed using `--host` and `--port`. The `safe-zone` option means report all detials including usernames, but it can be turned off for security reasons.
+gpuview provides REST API endpoints for programmatic access:
+
+* `GET /api/gpustat/self` - Returns GPU statistics for the main host
+* `GET /api/gpustat/all` - Returns aggregated GPU statistics for all registered hosts
+
+**Legacy endpoints:**
+
+* `GET /gpustat` - Returns GPU statistics for the local host (backward compatibility)
 
 ### Run as a Service
+
 To permanently run `gpuview` it needs to be deployed as a background service.
 This will require a `sudo` privilege authentication.
 The following command needs to be executed only once:
 
-```
-$ gpuview service [--safe-zone] [--exlude-self]
+```sh
+gpuview service [--safe-zone] [--exlude-self]
 ```
 
 If successful, the `gpuview` service is run immediately and will also autostart at boot time. It can be controlled using `supervisorctl start|stop|restart gpuview`.
 
-
 ### Runtime options
 
 There a few important options in `gpuview`, use `-h` to see them all.
 
-```
-$ gpuview -h
+```sh
+gpuview -h
 ```
 
 * `run` : Start `gpuview` dashboard server
 * `--host` : URL or IP address of host (default: 0.0.0.0)
 * `--port` : Port number to listen to (default: 9988)
 * `--safe-zone` : Safe to report all details, eg. usernames
 * `--exclude-self` : Don't report to others but to self-dashboard
+* `--demo` : Run with fake data for testing purposes
 * `-d`, `--debug` : Run server in debug mode (for developers)
-* `add` : Add a GPU host to dashboard
+* `add` : Add a GPU host to the dashboard
 * `--url` : URL of host [IP:Port], eg. X.X.X.X:9988
 * `--name` : Optional readable name for the host, eg. Node101
 * `remove` : Remove a registered host from dashboard
@@ -100,56 +114,49 @@ $ gpuview -h
 * `-v`, `--version` : Print versions of `gpuview` and `gpustat`
 * `-h`, `--help` : Print help for command-line options
 
-
 ### Monitoring multiple hosts
 
 To aggregate the stats of multiple machines, they can be registered to one dashboard using their address and the port number running `gpustat`.
 
 Register a host to monitor as follows:
-```
-$ gpuview add --url <ip:port> --name <name>
+
+```sh
+gpuview add --url <ip:port> --name <name>
 ```
 
 Remove a registered host as follows:
-```
-$ gpuview remove --url <ip:port> --name <name>
-```
 
-Display all registered hosts as follows:
-```
-$ gpuview hosts
+```sh
+gpuview remove --url <ip:port> --name <name>
 ```
 
-> Note: the `gpuview` service needs to run in all hosts that will be monitored.
+Display all registered hosts/nodes as follows:
 
-> Tip: `gpuview` can be setup on a none GPU machine, such as laptops, to monitor remote GPU servers.
+```sh
+gpuview hosts
+```
 
+The `gpuview` service needs to run in all hosts that will be monitored.
 
-etc
-----
+> Tip: `gpuview` can be setup on a none GPU machine, such as laptops, to monitor remote GPU servers.
 
-Helpful tips related to the underlying performance are available at the [`gpustat`][repo_gpustat] repo.
+## etc
 
+Helpful tips related to the underlying performance are available at the [`gpustat`][repo_gpustat] repo.
 
 For the sake of simplicity, `gpuview` does not have a user authentication in place. As a security measure,
-it does not report sensitive details such as user names by default. This can be changed if the service is
-running in a trusted network, using the `--safe-zone` option to report all details.
-
+it does not report sensitive details such as user names by default. This can be changed if the service is
+running in a trusted network, using the `--safe-zone` option to report all details.
 
 The `--exclude-self` option of the run command can be used to prevent other dashboards from getting stats of the current machine. This way the stats are shown only on the host's own dashboard.
 
-
 Detailed view of GPUs across multiple servers.
 
-![Screenshot of gpuview](https://github.com/fgaim/gpuview/blob/master/imgs/dash-2.png)
-
-
-License
---------
-
-[MIT License](LICENSE)
+![Screenshot of gpuview](imgs/dash-2.png)
 
+## License
 
+`gpuview` is licensed under the [MIT License](LICENSE), which is a permissive open-source license that allows you to freely use, modify, and distribute this software.
 
 [repo_gpustat]: https://github.com/wookayin/gpustat
 [pypi_gpuview]: https://pypi.python.org/pypi/gpuview
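The updated README documents three GET endpoints (`/api/gpustat/self`, `/api/gpustat/all`, and the legacy `/gpustat`) on the default port 9988. A minimal standard-library client might look like the sketch below. It assumes the endpoints return JSON, since the response schema is not part of this diff, and the 3-second polling interval simply mirrors the dashboard's auto-refresh; `endpoint_url`, `fetch_json`, and `poll` are hypothetical helper names, not part of gpuview.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterator, Optional

DEFAULT_PORT = 9988  # gpuview's default port per the README

# Endpoint paths listed in the README; "legacy" is kept for older hosts.
ENDPOINTS = {
    "self": "/api/gpustat/self",
    "all": "/api/gpustat/all",
    "legacy": "/gpustat",
}


def endpoint_url(host: str, which: str = "all", port: int = DEFAULT_PORT) -> str:
    """Build the URL for one of the documented endpoints."""
    if which not in ENDPOINTS:
        raise ValueError(f"unknown endpoint {which!r}; expected one of {sorted(ENDPOINTS)}")
    return f"http://{host}:{port}{ENDPOINTS[which]}"


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    """GET `url` and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(url: str, interval: float = 3.0, count: Optional[int] = None,
         fetch: Callable[[str], Any] = fetch_json) -> Iterator[Any]:
    """Yield a fresh snapshot every `interval` seconds, like the dashboard's auto-refresh.

    Stops after `count` snapshots, or runs forever when `count` is None.
    """
    taken = 0
    while count is None or taken < count:
        yield fetch(url)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)  # no sleep after the final snapshot
```

For example, `for snapshot in poll(endpoint_url("192.0.2.10"), count=2): print(snapshot)` would fetch two snapshots from a host at that placeholder address. `0.0.0.0` is the server's bind address, so a client should use the machine's actual IP or hostname instead.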

gpuview/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -2,10 +2,11 @@
 The gpuview module.
 """
 
-version_info = (0, 4, 0)
-__version__ = '.'.join(str(c) for c in version_info)
+version_info = (1, 0, 0)
+__version__ = ".".join(str(c) for c in version_info)
 
 
 __all__ = (
-'version_info', '__version__',
+"version_info",
+"__version__",
 )

gpuview/__main__.py

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 """
-
+Entry point for the gpuview application.
 """
 
-
-if __name__ == '__main__':
+if __name__ == "__main__":
 from .app import main
+
 main()
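With `__main__.py` in place, `python -m gpuview` runs the package as a program: the interpreter executes `gpuview/__main__.py` with `__name__` set to `"__main__"` and `__package__` set to `"gpuview"`, so the relative import `from .app import main` resolves against the package. The sketch below reproduces that shape with a throwaway package; `demo_pkg` and its `main()` are hypothetical stand-ins for gpuview and its app module.

```python
"""Demonstrate the `python -m <package>` entry-point pattern with a
throwaway package (`demo_pkg` is hypothetical)."""
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path


def run_demo() -> str:
    """Build demo_pkg in a temp dir, run it with `-m`, and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # app.py plays the role of gpuview/app.py and exposes main().
        (pkg / "app.py").write_text(textwrap.dedent("""\
            def main():
                print("main() called")
        """))
        # __main__.py has the same shape as gpuview/__main__.py.
        (pkg / "__main__.py").write_text(textwrap.dedent('''\
            """Entry point for the demo_pkg application."""

            if __name__ == "__main__":
                from .app import main

                main()
        '''))
        # `-m` puts the working directory on sys.path, so demo_pkg is importable.
        result = subprocess.run(
            [sys.executable, "-m", "demo_pkg"],
            cwd=tmp, capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
```

Running the file directly (`python demo_pkg/__main__.py`) would fail instead, because there is no parent package for the relative import to resolve against. Keeping the import inside the `if` block also means that merely importing the `__main__` module does not load the app.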
