This repository was archived by the owner on Jul 21, 2022. It is now read-only.

CT-525 added profile.py for local file system profiling #203

Open
lawschlosser wants to merge 3 commits into master from feature/CT-525/filesystem-profiler

Conversation

@lawschlosser lawschlosser commented Jun 1, 2018

Usage

Profile files from a specific job

./profile.py job 00007

Profile files from a specific directory

./profile.py dirs ~

or multiple directories

./profile.py dirs ~ /tmp /opt

Write verbose results to csv

./profile.py job 00007 --csv_path /tmp/profile_00007.csv

Use multiple threads

./profile.py job 00007 --threads 4
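
For readers wondering what a `--threads` option like this typically does, here is a minimal sketch of fanning a per-file operation out over a thread pool. It is not taken from profile.py: the names `stat_size` and `profile_paths` are hypothetical, and the real script may organise its workers differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def stat_size(path):
    """Example per-file worker: stat the file and return its size in bytes."""
    return os.stat(path).st_size


def profile_paths(paths, worker=stat_size, threads=1):
    """Run worker(path) for every path on a pool of `threads` threads.

    Returns a dict mapping each path to its worker result. pool.map
    yields results in input order, so zip pairs them up correctly.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(paths, pool.map(worker, paths)))
```

With `threads=1` the files are processed one at a time; raising the count lets several reads be in flight at once, which is the scenario the option is meant to exercise.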

Example Output

Summary console output

---------- SUMMARY -----------
DESCRIPTION    FILE COUNT                 SIZE         STAT TIME            WARMUP TIME          READ TIME            MD5 TIME             XXHASH TIME        
Averaged Time  1494618 files (5 skipped)  34536        000.000004182297247  000.000133777334901  000.000019915541863  000.000077454557157  000.000020146680994
Summed Time    1494618 files (5 skipped)  51618658052  006.250936746597290  199.946012735366821  029.766127347946167  115.764975309371948  030.111592054367065
Test Time      389.234282017                                                                                                                                  
------------------------------
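
The summary above reports, per operation, a summed time and that sum divided by the file count. A rough sketch of how such per-file timings could be collected and aggregated follows. It assumes Python 3 and chunked reads of 65536 bytes (the documented `--read_size` default); the helper names `timed`, `read_file`, `md5_file`, `profile_file` and `summarize` are illustrative, not the script's own.

```python
import hashlib
import io
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def read_file(path, read_size=65536):
    """Read the whole file in read_size chunks and return the byte count."""
    size = 0
    with io.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(read_size), b""):
            size += len(chunk)
    return size


def md5_file(path, read_size=65536):
    """Return the hex md5 digest of the file, hashed chunk by chunk."""
    digest = hashlib.md5()
    with io.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(read_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def profile_file(path, read_size=65536):
    """Collect the size and per-operation timings for a single file."""
    size, read_time = timed(read_file, path, read_size)
    md5_hex, md5_time = timed(md5_file, path, read_size)
    return {"path": path, "size": size, "read_time": read_time,
            "md5": md5_hex, "md5_time": md5_time}


def summarize(rows, key):
    """Return (summed, averaged) values of one metric across all rows."""
    total = sum(row[key] for row in rows)
    return total, (total / len(rows) if rows else 0.0)
```

Each row returned by `profile_file` corresponds to one line of the verbose CSV, and `summarize(rows, "read_time")` gives the two figures shown in the "Summed Time" and "Averaged Time" rows for that column.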

Verbose CSV output

[image: screenshot of the verbose CSV output]

Option flags

Profile the performance of your file system by processing files from the
provided Conductor Job ID (jid)

positional arguments:
  jid                   The jid (job id) for a job whose files to target

optional arguments:
  -h, --help            show this help message and exit
  --stat {False,True}   Perform a stat on each file (default: True)
  --read {False,True}   Read the entire contents of each file (default: True)
  --md5 {False,True}    Generate an md5 hash for each file (default: True)
  --xxhash {False,True}
                        Generate an xxhash hash for each file (default: True)
  --warmup {False,True}
                        If True, will "warm up" each file before performing
                        any further operations on it. This essentially loads
                        the file into any OS/disk cache so that subsequent
                        reads to that file will yield consistent performance
                        between tests. (default: True)
  --threads THREADS     The number of threads to use so that parallel reads
                        can be tested. Note that parallelizing reads may result
                        in a faster overall test, but may also lead to slower
                        per-file reads on average (default: 1)
  --csv_path CSV_PATH   A csv filepath to write the results to. Results
                        contain metrics for each file (default: None)
  --log_level {CRITICAL,ERROR,WARNING,INFO,DEBUG,NOTSET}
                        The logging level to display (default: INFO)
  --read_size READ_SIZE
                        The number of bytes to read at a time (when reading a
                        file) (default: 65536)
  --skip_failures {False,True}
                        If True, will skip any failures upon file reads.
                        Otherwise an exception will halt testing (default:
                        False)
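
For context, help output with `{False,True}` choices and `(default: ...)` suffixes can be produced by argparse roughly as sketched below. This is an assumption about how such a parser might be declared, not the PR's actual code: `str_to_bool` and `build_parser` are hypothetical names, and only a subset of the options is shown.

```python
import argparse


def str_to_bool(value):
    """Convert the literal strings "True"/"False" into booleans."""
    if value == "True":
        return True
    if value == "False":
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser whose help resembles the option listing above."""
    parser = argparse.ArgumentParser(
        description="Profile the performance of your file system",
        # Appends "(default: ...)" to every option's help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        "jid", help="The jid (job id) for a job whose files to target")
    toggles = [
        ("stat", "Perform a stat on each file"),
        ("read", "Read the entire contents of each file"),
        ("md5", "Generate an md5 hash for each file"),
    ]
    for name, text in toggles:
        # choices are checked after type conversion, so booleans work here
        # and the help shows the option as {False,True}.
        parser.add_argument("--" + name, type=str_to_bool,
                            choices=[False, True], default=True, help=text)
    parser.add_argument("--threads", type=int, default=1,
                        help="The number of threads to use")
    parser.add_argument("--read_size", type=int, default=65536,
                        help="The number of bytes to read at a time")
    parser.add_argument("--csv_path", default=None,
                        help="A csv filepath to write the results to")
    return parser
```

For example, `build_parser().parse_args(["00007", "--md5", "False"])` yields a namespace in which `md5` is the boolean `False` and the other toggles keep their `True` defaults.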

Contributor

@flebel flebel left a comment

LGTM with a few suggestions.

tests/profile.py Outdated
Contributor

Did you mean --threads?

tests/profile.py Outdated
Contributor

The option flags should be prefixed by --, not -.

Contributor

on/off should be True/False.

tests/profile.py Outdated
Contributor

This should be --csv_filepath /path/to/output_file.csv.

tests/profile.py Outdated
Contributor

Parentheses aren't needed here.

use_api_key=True
)

jobs = json.loads(r_body).get("data")
Contributor

client.make_request returns a requests object, so you should be able to parse its payload as JSON with r_body.get_json().get('data').

Contributor

This comment also applies to the other use of json.loads in this script.

Contributor Author

client.make_request does not return a response object (sadly), only text (response.text).
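
To illustrate the distinction discussed in this thread: when a helper hands back only the response body as text, `json.loads` is the way to reach the `data` key. Had a `requests` response object been available, its `.json()` method would do the same parsing. The function name `extract_data` and the sample payload below are hypothetical.

```python
import json


def extract_data(response_text):
    """Parse a JSON response body given as text and return its "data" value.

    Returns None when the top-level object has no "data" key.
    """
    return json.loads(response_text).get("data")


# Hypothetical body, shaped like a response carrying a list of jobs.
sample_body = '{"data": [{"jid": "00007"}]}'
jobs = extract_data(sample_body)
```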

tests/profile.py Outdated
Contributor

How does argparse behave when accessing an attribute for an argument that doesn't exist?

Since the --xxhash flag isn't enabled if XXHASH is False, I'm wondering if an exception would be raised if this code was run on a system that's missing the xxhash module.

Contributor Author

If you expand that statement to use multiple lines, you can more easily see why it's not problematic, e.g.,

if XXHASH:
    hash_xx = args.xxhash
else:
    hash_xx = None

In other words, the args.xxhash value will not be accessed if xxhash is not installed.
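
To make the exchange above concrete, here is a self-contained sketch of the pattern. On the reviewer's question: an `argparse.Namespace` raises `AttributeError` when an attribute that was never registered is read, which is why the access has to be guarded. The `XXHASH` flag name comes from the thread; `build_parser`, `resolve_hash_xx` and the `xxhash_available` parameter are hypothetical, added so the behaviour can be exercised whether or not the module is installed.

```python
import argparse

try:
    import xxhash  # noqa: F401  (optional third-party module)
    XXHASH = True
except ImportError:
    XXHASH = False


def build_parser(xxhash_available=XXHASH):
    """Register --xxhash only when the xxhash module can be imported."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--md5", default="True")
    if xxhash_available:
        parser.add_argument("--xxhash", default="True")
    return parser


def resolve_hash_xx(args, xxhash_available=XXHASH):
    """Return args.xxhash, or None when the flag was never registered.

    A conditional expression evaluates only the selected branch, so
    args.xxhash is not touched when xxhash_available is False.
    """
    return args.xxhash if xxhash_available else None
```

Calling `resolve_hash_xx` on a namespace built without the flag returns `None`, whereas reading `args.xxhash` directly on that namespace would raise.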

tests/profile.py Outdated
Contributor

Can't we just use the default values defined for every argument? It feels redundant to redefine a second set of defaults here.

Contributor Author

This function was designed to stand on its own, and therefore provide its own default values. The fact that our shell/argparse logic happens to call it is irrelevant and should not dictate whether this function should define its own default arguments IMO. For example, if I wanted to import/reuse this function elsewhere, I would want this function to provide useful default arguments.

tests/profile.py Outdated
Contributor

Can you leave a comment explaining what this is for?

Contributor Author

yep, good call.

tests/profile.py Outdated
Contributor

files is already initialized above.

@lawschlosser lawschlosser force-pushed the feature/CT-525/filesystem-profiler branch from ccbd2d3 to 32382c2 Compare June 1, 2018 23:45
@lawschlosser lawschlosser force-pushed the feature/CT-525/filesystem-profiler branch 3 times, most recently from 70e65a3 to eeb8e43 Compare June 28, 2018 03:36
- also added device benchmarking option
- also changed all file-reading operations to use io.open
@lawschlosser lawschlosser force-pushed the feature/CT-525/filesystem-profiler branch from eeb8e43 to de1b6bd Compare June 28, 2018 19:28
- improve general cross-platform support
- now flushes cache once per test run (not per file)
@lawschlosser lawschlosser force-pushed the feature/CT-525/filesystem-profiler branch from 35ebf58 to c9a4213 Compare June 1, 2020 08:05