Pinto

A command line utility for managing and running jobs in complex Python environments.

Background

Most ongoing research in the ML4GW organization leverages Poetry for managing Python virtual environments in the context of a Python monorepo. In particular, Poetry makes managing a shared set of libraries between jobs within a project simple and straightforward.

However, several tools in the Python gravitational wave analysis ecosystem cannot be installed via pip (in particular the library GWpy uses to read and write .gwf files and the library it uses for reading archival data from the NDS2 server). This complicates environment management: some projects use Poetry to install local libraries and their own code into Conda virtual environments, while others don't require Conda at all and can install everything they need into Poetry virtual environments.

Enter: pinto

Pinto attempts to simplify this picture by installing a single tool in the base Conda environment which can dynamically detect whether a project requires Conda, create the appropriate virtual environment, and install all necessary libraries into it.

pinto -p /path/to/my/project build

It can then be used to run jobs inside of that virtual environment.

pinto -p /path/to/my/project run my-command --arg1

If you're currently in the project's directory, you can drop the -p/--project flag altogether for any pinto command, e.g.

pinto build
pinto run my-command --arg1

Structuring a project with Pinto

To leverage Pinto in a project, all you need is the pyproject.toml file required by Poetry which specifies your project's dependencies. If just this file is present, pinto will treat your project as a "vanilla" Poetry project and manage all of its dependencies inside a Poetry virtual environment.

But what if I need Conda?

Indicating to Pinto that your project requires Conda is as simple as including a poetry.toml file in your project directory with the lines

[virtualenvs]
create = false

Alternatively, from your project directory you can run

poetry config virtualenvs.create false --local
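The check this setting enables can be sketched in a few lines of Python. This is only an illustration of the idea, not pinto's actual implementation: the function name requires_conda is hypothetical, and the sketch assumes a missing poetry.toml or a missing create key means the project is a plain Poetry project.

```python
from pathlib import Path

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport for older Pythons


def requires_conda(project_dir):
    """Hypothetical helper: True if poetry.toml sets virtualenvs.create = false."""
    config_path = Path(project_dir) / "poetry.toml"
    if not config_path.is_file():
        # Assumption: no poetry.toml means a "vanilla" Poetry project
        return False
    with open(config_path, "rb") as f:
        config = tomllib.load(f)
    # Assumption: an absent key counts as create = true
    return config.get("virtualenvs", {}).get("create", True) is False
```

Calling requires_conda on a directory containing the poetry.toml shown above returns True; on a directory with only a pyproject.toml it returns False.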

When building your project, pinto will first look for an entry that looks like

[tool.pinto]
base_env = "/path/to/environment.yaml"

in your project's pyproject.toml. If this entry doesn't exist, pinto looks for a file named either environment.yaml or environment.yml, starting in your project's directory and ascending the directory tree to the root, and uses the first file it finds. This way, you can keep a base environment.yaml at the root of a monorepo on top of which all your projects build, while leaving each project the option of overriding this base environment with its own environment.yaml.
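The upward search described here might look roughly like the following. It is a minimal sketch rather than pinto's own code: find_environment_file is a hypothetical name, the sketch covers only the fallback search (not the [tool.pinto] base_env override), and checking environment.yaml before environment.yml within a directory is an assumption, since the order is not stated.

```python
from pathlib import Path

# Assumption: .yaml is checked before .yml within the same directory
ENV_FILENAMES = ("environment.yaml", "environment.yml")


def find_environment_file(project_dir):
    """Hypothetical helper: return the nearest environment file, or None."""
    start = Path(project_dir).resolve()
    # Check the project directory first, then each ancestor up to the root
    for directory in (start, *start.parents):
        for name in ENV_FILENAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None
```

Because the project directory is checked before any parent, a project-level environment.yaml naturally takes precedence over one at the monorepo root.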

In fact, if the name listed in the environment.yaml discovered by pinto ends with -base, pinto will automatically name your project's virtual environment <prefix>-<project-name>. For example, if the name of your project (as given in the pyproject.toml) is nn-trainer, and the environment.yaml at the root of your monorepo looks like

name: myproject-base
dependencies:
    - ...

then pinto will name your project's virtual environment myproject-nn-trainer.
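The naming rule can be expressed as a short function. This is a sketch under stated assumptions, not pinto's implementation: project_env_name is a hypothetical name, and returning the environment name unchanged when it does not end in -base is an assumption, since that case is not described here.

```python
def project_env_name(base_env_name, project_name):
    """Hypothetical helper: derive a project environment name from a base name."""
    suffix = "-base"
    if base_env_name.endswith(suffix):
        # Strip "-base" and append the project name: <prefix>-<project-name>
        prefix = base_env_name[: -len(suffix)]
        return f"{prefix}-{project_name}"
    # Assumption: names without the "-base" suffix are used as-is
    return base_env_name
```

With the example above, project_env_name("myproject-base", "nn-trainer") evaluates to "myproject-nn-trainer".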

To see more examples of project structures, consult the examples folder.

Installation

Environment setup

Pinto requires local installations of both Conda and Poetry. First, make sure Conda is installed in your environment (see the Conda installation instructions; I particularly recommend Miniconda for a bare install, since most of your work will be in virtual environments anyway). Then install Poetry into your base Conda environment via pip rather than with the Poetry installer

(base) ~$ python -m pip install "poetry>1.2.0"

Install

You can install pinto in your base Conda environment with pip, either by pointing at this GitHub repo

(base) ~$ python -m pip install git+https://github.com/ML4GW/pinto.git

or by cloning this repo and pip installing it locally

(base) ~$ git clone https://github.com/ML4GW/pinto.git
(base) ~$ python -m pip install ./pinto

Development Installation

The best way to install pinto for development is to clone this repo and perform an editable installation with the development dependencies included:

(base) ~$ git clone https://github.com/ML4GW/pinto.git
(base) ~$ python -m pip install -e "./pinto[dev]"
