One of the cornerstones of bioinformatics research is the implementation of analysis pipelines and algorithms. Once the software has been published and released, it often remains difficult to maintain, extend, or fix bugs in it and, finally, to make any updates easily accessible to the research community. A researcher's life can be made much easier with the help of a versioning tool such as git, proper documentation, and the use of standardized and automated build and packaging tools right from the beginning of the software development process.
In this workshop I will focus on software development with C/C++ and/or Python. I will briefly introduce approaches to software documentation for both programming languages, which can even be merged easily to produce appealing HTML and PDF documentation (keywords: doxygen, breathe, sphinxdoc). Moreover, I will introduce some build and packaging strategies (autotools/automake and Python's setuptools) that help a lot to check and prepare all relevant prerequisites for your software to be used on a user's computer. Bug fixing, feature additions, packaging, and releasing your software can then be highly automated using CI strategies. For that, I will briefly introduce Github workflows, which serve as a very convenient tool set. Previous knowledge of (some of) the tools and programming languages mentioned above is helpful but not required.
Let's start with a new software project where we want to track the development process using the revision control system git. This will help us to easily test out new features without harming any already working code, to share the development among multiple developers, to undo changes we might have mistakenly introduced, and many more things.
As a first step, let's create a new directory that will contain all the files we want to keep track of and change into this directory:
$ mkdir my_git_project
$ cd my_git_project
Now, it is time to initialize the directory:
$ git init
Now, for the sake of documentation, let us add a README file to our new git repository that we track from the very beginning:
$ touch README.md
$ git add README.md
$ git commit -m "Initial commit with README.md"
We will use this file in the process of software development to write
down some essential things about what the software does, how a user
can install it, how it can be used, etc.
Any addition to this file will be recognized by git and, in most cases,
the changes will become part of the next git commit.
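For illustration, a typical edit cycle for the README could look like this (just a sketch of the usual commands, output omitted):
$ git status
$ git diff README.md
$ git add README.md
$ git commit -m "Describe the purpose of the software in README.md"
git status lists README.md as modified, git diff shows the changes, and git add stages them for the next commit.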
Before we start to actually write any code for our software, we create
some additional directories to keep our project clean. First, let
us put any source code into a src directory. For the documentation
we create a doc directory and finally, let us also create a test
directory where we will hopefully put in some tests to check the
functionality of our software:
$ mkdir src
$ mkdir doc
$ mkdir test
Next, we will start with a very minimalistic program written in C
that consists of a single function main() that simply prints the
string "Hello World" to the terminal. Note that in C the main
function is special as it is the entry point for any executable
program we build with our code.
Now, open a text editor, create the hello.c file in the src/
directory with the following content:
#include <stdlib.h>
#include <stdio.h>
int
main(int argc,
char *argv[])
{
printf("Hello World\n");
return EXIT_SUCCESS;
}

We can now change into the src/ directory to compile and run our
simple program:
$ cd src/
$ gcc -o hello hello.c
Here, we use gcc to create (-o, for output) a binary executable with
the name hello that is compiled from the source code file hello.c.
Another popular C compiler is clang (used by default on macOS).
The one-line invocation of gcc above is shorthand for what are actually three
steps. The compiler pre-processes the code, then compiles it, and
finally links everything into the single executable file hello. Larger
software projects usually require more option parameters to be passed to
the preprocessor, compiler and linker, and typically split the processes
into two stages, compilation and linking. Thus, the one-liner above
can also be expressed by two invocations like:
$ gcc -c hello.c
$ gcc -o hello hello.o
where the hello.o file is an object file created by the compilation
stage of gcc.
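Just as a rough sketch of what such invocations might look like in a larger project (the extra flags are common examples and not something our toy program actually needs):
$ gcc -Wall -O2 -Iinclude -c hello.c
$ gcc -o hello hello.o -lm
Here, -Wall enables most compiler warnings, -O2 turns on optimizations, -Iinclude adds a header search path for the preprocessor, and -lm asks the linker to link against the math library.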
Running our little program should produce something like
$ ./hello
Hello World
Let us now separate the printf() code into another function that
we can possibly re-use later on. At the same time we make the function
more general by allowing for arbitrary strings to be printed. Just for
fun we always print the current time along with the string. So the code
in hello.c may now look like this:
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
void
print_time_string(const char *string);
int
main(int argc,
char *argv[])
{
if (argc > 1)
print_time_string(argv[1]);
else
print_time_string("Hello World");
return EXIT_SUCCESS;
}
void
print_time_string(const char *string)
{
if (string) {
time_t result = time(NULL);
if (result != (time_t)(-1))
printf("%s%s\n",
asctime(gmtime(&result)),
string);
else
printf("%s\n", string);
}
}

Don't be afraid of the complicated-looking code that retrieves the current
time and date. What matters here is that we created a function
print_time_string() that takes as its argument a pointer to a constant
sequence of characters, the string we want to print. We first declared that
function just before main() such that its interface is known as soon as
main() is compiled. Then we defined the actual behavior of it at the bottom
of our file.
You might have also noted that the main() function changed slightly.
We now check whether additional parameters are given to the program and
treat the first parameter as an input string we want to print out. Thus,
compiling and running our program now looks like:
$ gcc hello.c -o hello
$ ./hello
Fri Sep 26 11:13:54 2025
Hello World
$ ./hello "Greetings Earthlings!"
Fri Sep 26 11:18:14 2025
Greetings Earthlings!
Now, it is time to separate our little function even further by putting it
into its own object file, apart from the one that contains main(). So,
create a new text file time_print.c and move the print_time_string()
function there (do not forget the include statements):
#include <stdio.h>
#include <time.h>
void
print_time_string(char *string)
{
if (string) {
time_t result = time(NULL);
if (result != (time_t)(-1))
printf("%s%s\n",
asctime(gmtime(&result)),
string);
else
printf("%s\n", string);
}
}

Also, let us already move the declaration of the print_time_string()
function into a header file, so that it can be included into any other
piece of source code we might want to create in the future. The header
file time_print.h should look like:
#ifndef MY_TIME_PRINT_H
#define MY_TIME_PRINT_H
void
print_time_string(char *string);
#endif

Since hello.c now has no idea what our print function looks like, we
need to include the newly created header file and hello.c becomes
#include <stdlib.h>
#include <stdio.h>
#include <time_print.h>
int
main(int argc,
char *argv[])
{
if (argc > 1)
print_time_string(argv[1]);
else
print_time_string("Hello World");
return EXIT_SUCCESS;
}

When we compile each of the two source code files now, we produce two
object files hello.o and time_print.o:
$ gcc -c time_print.c
$ gcc -I. -c hello.c
$ ls
hello* hello.c hello.o time_print.c time_print.h time_print.o
that need to be linked together to make up our program:
$ gcc -o hello time_print.o hello.o
Note here the additional argument -I. when we compile hello.c. This
instructs the pre-processor to also search for header files in the current
(.) directory.
Done! Now we have made our little print_time_string() function re-usable
in any other C code or programs. They only require the inclusion
of the header file time_print.h and linking against the time_print.o
object.
If we want to re-use our function in the future, or provide it to other users to include in their own projects, we should better document our code! This means adding meaningful descriptions of the purpose and usage of our function, as well as a description of its arguments and return values.
For the programming languages C and C++ the tool doxygen
is quite popular and very handy. It requires the documentation in the form
of special comments in the code and offers a variety of commands to
nicely format and structure an API reference manual. To use it, let us
first create a basic configuration file in our doc/ directory:
$ cd ../doc
$ doxygen -g
Now open the configuration file Doxyfile, locate the options
INPUT, FILE_PATTERNS, and GENERATE_LATEX and change them as follows:
INPUT = ../src
FILE_PATTERNS = *.h
GENERATE_LATEX = NO
This will instruct doxygen to search for documentation in our src/
directory and to only consider header files with the ending .h.
You may of course also adjust other options to better suit your project, like
PROJECT_NAME, PROJECT_NUMBER, etc. Have a look into the
doxygen configuration docs to
find out what more this tool could do for you.
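For example, setting the project meta data might look like this (the values are just suggestions for our workshop project):
PROJECT_NAME   = "Sustainable Software Workshop"
PROJECT_NUMBER = 1.0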
Note here that we deactivated LaTeX code generation, which would enable us to produce nice PDF reference manuals. But we will come to that part later and for now simply rely on the default HTML output.
We can now run doxygen to create our HTML reference manual like this:
$ doxygen Doxyfile
This will create an html directory within our doc directory that
contains the entire documentation, and we can already open the
html/index.html file with our favorite web browser. Of course, since
we didn't add any comments so far, the reference manual is still empty.
So, let us change that by documenting our print_time_string() function.
For that, we add a comment block into our header file just before the
function declaration such that it now looks like:
#ifndef MY_TIME_PRINT_H
#define MY_TIME_PRINT_H
/**
* @file time_print.h
* @brief This is our example header file with documentation
*/
/**
* @brief Print the current time and a string
*
* On invocation, this function retrieves the current
* time and prints it to @c stdout along with a user-defined
* string on the next line.
*
* @param string The user-defined string to print
*/
void
print_time_string(char *string);
#endif

Running doxygen again now produces the description for our print_time_string()
function.
As you might have already guessed, invoking all the compilation, linking,
and code-generation command line calls is very tedious. Instead we want to
automate everything somehow. For that purpose, one typically would use the
make program and a corresponding Makefile. Still, the latter requires
all specific details on how to exactly build everything. But we are lazy
and don't want to write up all the command line calls ourselves, but rely on
some further automation. There are a few handy tools available for this
task; the most common might be autoconf and automake, also known as
GNU autotools and
cmake.
In this workshop we will focus on GNU autotools despite the criticism
among developers of whether one should use it or not. I personally like it
due to its great flexibility and transparency (see also the
Autotools Mythbuster which is very helpful).
This allows for easy adaptation of the build process to very specific tasks.
But I won't debate the choice between GNU autotools and cmake :)
In our project root directory my_git_project/ create a file configure.ac
with the content:
AC_INIT([Sustainable Software], [1.0],
[Ronny Lorenz <ronny@bioinf.uni-leipzig.de>],
[workshop])
AM_INIT_AUTOMAKE([-Wall foreign 1.11 tar-ustar])
AC_PROG_CC
AC_PROG_INSTALL
##
## Check for the doxygen executable and prepare a Makefile.am
## condition variable to turn-on/off the build process of the
## documentation
##
AC_PATH_PROG(DOXYGEN, [doxygen], [no])
AC_SUBST([DOXYGEN])
if test x"${DOXYGEN}" = x"no" ; then
AC_MSG_WARN([doxygen not found! You may want to install doxygen to generate the API documentation.])
fi
AM_CONDITIONAL(WITH_DOXYGEN, test "x$DOXYGEN" != "xno")
AC_CONFIG_FILES([Makefile src/Makefile doc/Makefile])
AC_OUTPUT
This is the main configuration (autoconf) of our project. First, we
call AC_INIT with some meta information, such as our project name,
the version number, our email contact, and the name of the project's
distribution archive file we will be able to automatically create later.
Then we initialize automake with some additional parameters to enable
all warnings (-Wall), allow for missing files a GNU autotools project
would usually require (foreign), set the minimum automake version
(1.11), and enable arbitrarily long file names in the distribution
archive. This is followed by two checks for the compiler and install
programs.
Then comes a larger part that tries to determine the absolute path
to the doxygen executable to set up the build process depending on
whether or not doxygen is installed.
Finally, the AC_CONFIG_FILES macro is called with a list of files
that we want to automatically generate. In our case, these are the
Makefiles in our project's root and within the src/ and doc/ directories.
What is still missing are some instructions for automake to tell it
the basic requirements to create Makefiles that do what we want. This
is what Makefile.am files are for, and we will now create three of
them. The first one in our project's root (i.e. my_git_project/Makefile.am)
can be as simple as:
SUBDIRS = \
src \
doc
simply telling automake that there is nothing to do in the root
directory itself, but that there are subdirectories src and doc that need
to be scanned further.
The second file is for our source code and executable program. Create
the file src/Makefile.am with the following content:
bin_PROGRAMS = hello
hello_SOURCES = \
hello.c \
time_print.c \
time_print.h
to specify that we want a binary program with the name hello which
itself depends on the source code files hello.c, time_print.c, and
time_print.h. That's all we need at this point.
Finally, the documentation part is more involved since we only want to
create the documentation if the user (or we ourselves)
has the doxygen program installed. Here is the code that goes into
doc/Makefile.am:
##--------------------------------------------------##
## Tell autoconf/automake to include the necessary ##
## files in the distribution archive as well as in ##
## the installation routine ##
##--------------------------------------------------##
html_DATA = $(REFERENCE_MANUAL_FILES_HTML)
EXTRA_DIST = \
Doxyfile \
doxygen-html \
html
##--------------------------------------------------##
## prepare variables in case HTML reference manual ##
## is going to be installed ##
##--------------------------------------------------##
if WITH_DOXYGEN
REFERENCE_MANUAL_FILES_HTML = html/*
##--------------------------------------------------##
## In case the HTML manual should be created, here ##
## is the rule how to do so ##
##--------------------------------------------------##
$(REFERENCE_MANUAL_FILES_HTML): doxygen-html
doxygen-html: $(pkginclude_HEADERS) Doxyfile
@DOXYGEN@ Doxyfile >>doxygen.log 2>&1; \
touch doxygen-html
endif WITH_DOXYGEN
Take it on faith for now; I might find the time later to describe what it actually does.
Phew, that has been a lot but we are done with this part and can
enjoy the art of automated build processes working on our project.
Just start with invoking the autotools toolchain by calling
$ autoreconf -i && ./configure
The first of the two calls parses all our configuration files and
creates a ./configure script, a Makefile.in for each of our
Makefile.am files, and several more files we do not want to spend
too much time on right now. The second command runs the ./configure
script that will check for all pre-requisites in our build process
to create the Makefiles. If everything went well, we can now call
$ make
and everything is built automagically. You can remove all files
that are built by make with the
$ make clean
or
$ make maintainer-clean
command. Beware that the latter cleans up all files that the
autotools toolchain has built automatically!
To install our program we can use
$ make install
and to distribute our code to other people, we can easily create
a tar.gz archive:
$ make dist
The resulting archive workshop-1.0.tar.gz contains everything
necessary such that any end-user only has to call the ./configure
script and run make/make install to install our software. Note
that if you prefer to distribute your software as a ZIP file, you
can also run make dist-zip.
We now turn to a much simpler, widely used programming language that
most of you might have already worked with: Python.
Again, instead of simply creating a Python script that performs a
specific task, let us create an actual Python project that we can
easily distribute. This project could either be a module, which
would be a single Python file containing our code, or a Python
package, which consists of several code files. We choose the latter
just for fun and name our project workshop.
Before we start, we create another subdirectory where we want to
store our Python code, just to keep our directory tree clean. In the
following we use src/python, but the Python code directory could
have been placed anywhere else:
$ mkdir -p src/python/workshop
$ touch src/python/workshop/__init__.py
$ touch src/python/workshop/hello.py
Note that Python projects are placed in a directory with the project
name, and this directory usually contains an __init__.py file which
may be empty in the beginning. We also created an empty file hello.py
where we will place our main function. Now, open hello.py in
your favorite editor and add the following content:
def greetings():
"""
Our main function that simply prints 'Hello World!'
"""
print("Hello World!")
# If this file is executed by the Python interpreter, the greetings() function
# is our entry point
if __name__ == '__main__':
    greetings()

We can now run the script with the Python interpreter like so:
$ python src/python/workshop/hello.py
Hello World!
Alternatively, we can run our code as a module (python -m) instead:
$ PYTHONPATH=src/python/ python -m workshop.hello
Hello World!
which will essentially do the same. That's all cool, but let us rewrite
the code to make it more re-usable, like we did before in the C example.
To do so, we create a new module time_print in our workshop package
by simply adding the file src/python/workshop/time_print.py with the
content:
import time
def print_time_string(string):
"""
Print the current time and a string
On invocation, this function retrieves the current local
time and prints it to `stdout` along with a user-defined
string on the next line.
Parameters
----------
string : str
The user-defined string to print
"""
if string:
print(time.asctime(time.localtime()))
        print(string)

Note that we already added some documentation for our function
print_time_string(). We've added a docstring, a special documentation
string surrounded by triple double quotes, and we adhere to the
numpydoc standard
to automatically generate documentation for our package later.
At the same time, we change the file src/python/workshop/hello.py
to
from . import time_print as tp
def greetings():
"""
Our main function that simply prints 'Hello World!'
"""
tp.print_time_string("Hello World!")
# If this file is executed by the Python interpreter, the greetings() function
# is our entry point
if __name__ == '__main__':
    greetings()

Let us run the module again to observe its new output that now also contains the current local time:
$ PYTHONPATH=src/python/ python -m workshop.hello
Sat Sep 27 11:17:32 2025
Hello World!
Let us now focus on how to package our Python module so that we can easily distribute it to other users. For that, there are several build backend tools available. We won't have enough time to look into the pros and cons of each of the different tools, but will use setuptools for now. The basic configuration file(s) are standardized nowadays, so you can switch to another build tool relatively easily later on if you decide that there are better alternatives.
We start here with the main declarative configuration file in our project
root (my_git_project/), pyproject.toml, which should have the following content:
[build-system]
requires = ["setuptools >= 77.0.3"]
build-backend = "setuptools.build_meta"
[project]
name = "sustainable_software_workshop"
version = "1.0.0"
authors = [
{ name="Ronny Lorenz", email="ronny@bioinf.uni-leipzig.de" },
]
description = "An example packge for the sustainable software development workshop"
readme = "README.md"
requires-python = ">=3.9"
classifiers = [
"Programming Language :: Python :: 3",
"Operating System :: OS Independent",
]
license = "MIT"
license-files = ["LICENSE.txt"]
keywords = ["time", "print", "workshop", "dubice"]
[project.urls]
Homepage = "https://github.com/ViennaRNA/sd-workshop-dubice2025"
Issues = "https://github.com/ViennaRNA/sd-workshop-dubice2025/issues"
[tool.setuptools.packages.find]
where = ["src/python"] # ["."] by default
[project.scripts]
workshop-hello = "workshop.hello:greetings"Note, that we specified a license file here, which didn't exist so far. I've chosen MIT license for this workshop, you might find other licenses to suite your project better. See, e.g. choosealicense.com for help in choosing the right license.
This is all we need for now to create distributable Python packages in the form
of source packages (sustainable_software_workshop-1.0.0.tar.gz) and binary
Python wheels (sustainable_software_workshop-1.0.0-py3-none-any.whl) that
can both be easily installed via pip (or even be uploaded to PyPI).
Simply run
$ python -m build
to build the packages. Both of them should now be available in the newly created
directory dist/.
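Note that the build frontend itself is a separate package that might need to be installed first, and the freshly built wheel can be installed with pip for a quick test of the console script we declared in pyproject.toml (file names as above; adjust them if your version differs):
$ python -m pip install build
$ python -m pip install dist/sustainable_software_workshop-1.0.0-py3-none-any.whl
$ workshop-hello
This should print the current time followed by "Hello World!".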
Next, I will show you how to (mostly) automatically create nice-looking
documentation for your project. I've chosen Sphinx
here, since it seems to be the most popular documentation tool for Python, and I
actually like it for creating documentation for my C projects as well, but
more on that later.
To bootstrap our use of sphinx, we can use the sphinx-quickstart tool:
$ sphinx-quickstart --no-makefile doc/
Answer the questions the quickstart tool will ask you and it automatically
sets up a basic structure for your documentation. I've chosen to split
the build and source directories. In my opinion, this keeps our
directory tree structure cleaner. Here, I've added the --no-makefile
option, since we already have a Makefile in our doc/ directory, which has
been automatically generated by autoconf. We will later include the
corresponding command line calls for sphinx-build into our autotools
toolchain.
Before we build our Python documentation, we need to include a Sphinx extension
named autodoc.
This will allow us to parse the docstrings in our Python module to
automatically create the API documentation. Additionally, let us use two more
extensions, napoleon
and myst_parser. The former can be
used to correctly parse the numpy-style docstring we added to our
print_time_string() function. The latter is for parsing Markdown files and
serving them to sphinx.
To include the extensions, open doc/source/conf.py and add the following
to the extensions list:
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'myst_parser'
]

We also add some configuration for napoleon in this file to activate
numpy-style docstrings:
napoleon_google_docstring = False
napoleon_numpy_docstring = True

Now, open the doc/source/index.rst file and change it to
.. Sustainable Software Development - Workshop documentation master file, created by
sphinx-quickstart on Sat Sep 27 14:43:42 2025.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Sustainable Software Development - Workshop documentation
=========================================================
This is the documentation part for the sustainable software development
workshop held in Dubice on the occasion of the
`21st Herbstseminar 2025 <https://herbstseminar.bioinf.uni-leipzig.de>`__
.. toctree::
   :caption: Contents:
   :maxdepth: 1

   ReadMe <readme>
   python_api

and add two more files to the doc/source/ directory, one that uses myst_parser
to include our README.md file and a second where we call automodule from the
autodoc extension to place the Python API description. The contents of both
files are:
doc/source/readme.rst:
.. include:: ../../README.md
   :parser: myst_parser.sphinx_

and doc/source/python_api.rst:
Python API description
======================
Below, you'll find a detailed description of the *workshop.time_print* API
.. automodule:: workshop.time_print
:members:
   :undoc-members:

We can now finally build the documentation by changing into our doc/
directory and calling sphinx-build as follows:
$ sphinx-build -M html source/ build/
This will create a directory build/html/ that contains the HTML documentation.
Open the index.html file to have a look at what we've included so far.
We now integrate the sphinx-build process for the Python documentation
with autotools. This can be achieved in the same way we added the doxygen
documentation support. First, let us extend our configure.ac script with
the following code:
AC_PATH_PROG(SPHINXBUILD, [sphinx-build], [no])
AC_SUBST([SPHINXBUILD])
if test x"${SPHINXBUILD}" = x"no" ; then
AC_MSG_WARN([sphinx-build not found! You may want to install sphinx to generate the Python API documentation.])
fi
AM_CONDITIONAL(WITH_SPHINXBUILD, test "x$SPHINXBUILD" != "xno")
that we place directly after the configuration parts for doxygen and before
AC_CONFIG_FILES. What is left is to change doc/Makefile.am, where we first
add the sphinx sources and sphinx-generated files we want to create and
distribute:
html_DATA = \
$(REFERENCE_MANUAL_FILES_HTML) \
$(PYTHON_REFERENCE_MANUAL_FILES_HTML)
EXTRA_DIST = \
Doxyfile \
doxygen-html \
sphinxbuild-html \
html \
source \
build

and then put the conditional instructions for how to build the documentation
with sphinx at the end of the file:
if WITH_SPHINXBUILD
PYTHON_REFERENCE_MANUAL_FILES_HTML = build/html/*
##--------------------------------------------------##
## In case the HTML manual should be created, here ##
## is the rule how to do so ##
##--------------------------------------------------##
$(PYTHON_REFERENCE_MANUAL_FILES_HTML): sphinxbuild-html
sphinxbuild-html:
(rm -f sphinxbuild-html && \
@SPHINXBUILD@ -M html source build >sphinx.log 2>&1 && \
touch sphinxbuild-html) || cat sphinx.log
endif WITH_SPHINXBUILD

From now on, the Python documentation will be included in our
make process and will also be part of the distribution archive.
What is still missing, though, is to include the src/python
subdirectory into the distribution archive, so let us quickly
add the corresponding instruction to src/Makefile.am:
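A minimal way to do that (assuming the Python sources live in src/python/ as created above) is an additional EXTRA_DIST entry in src/Makefile.am, for example:
EXTRA_DIST = \
	python
With that, the entire src/python/ subtree ends up in the tarball created by make dist.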
At this point, we have two documentations, one for the
C code and another for our Python package, both built by
separate tools. Moreover, sphinx is way more flexible in
terms of documentation and adapting the final output to whatever
requirements we have. The doxygen output is much more static
and doesn't allow for too much interference in the HTML layout.
Luckily, there exists a bridge between the two of them:
breathe.
It serves as an extension to sphinx that is able to parse XML
output generated by doxygen. It then adds special directives
that we can use in the ReStructuredText (.rst) files we use
for sphinx. Below, I will show how we can get this bridge to
work and in the end have only a single documentation built
by sphinx.
What we need to do now, is to change the output of doxygen
from HTML to XML. We do that by adapting the doc/Doxyfile
configuration file as follows:
GENERATE_HTML = NO
...
GENERATE_XML = YES
and by changing our doc/Makefile.am to
##--------------------------------------------------##
## Tell autoconf/automake to include the necessary ##
## files in the distribution archive as well as in ##
## the installation routine ##
##--------------------------------------------------##
html_DATA =
EXTRA_DIST = \
Doxyfile \
doxygen-xml \
xml \
sphinxbuild-html \
ext \
source \
build
##--------------------------------------------------##
## add directive to build doxygen XML ##
##--------------------------------------------------##
if WITH_DOXYGEN
doxygen-xml: $(pkginclude_HEADERS) Doxyfile
@DOXYGEN@ Doxyfile >>doxygen.log 2>&1; \
touch doxygen-xml
endif WITH_DOXYGEN
Whenever doxygen is invoked now, it will store its XML output
in the doc/xml directory. We will add the correct html_DATA
files again later as soon as we correctly set up the breathe
extension, since we will only rely on HTML generated by sphinx
from now on.
If you are working on a Python-only project, you are probably
installing any Python dependencies via pip
and, therefore you would only add breathe to your dependency
list. Here, however, we download its sources directly from the
official breathe github repository.
In particular, we will clone the latest tag (release) into
a doc/ext/ subdirectory to keep our directory tree clean
$ git clone --depth 1 --branch v5.0.0a5 https://github.com/breathe-doc/breathe.git doc/ext/breathe
and we remove all non-essential files of the cloned repository such that it doesn't interfere with our own repository.
$ rm -rf doc/ext/breathe/.git*
Note that this is not the best way to include foreign git repositories in your own. You might want to consider git submodules or, even better, git subtree instead! But we don't have enough time to go into the details of that within the time frame of our workshop.
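Just to give you an idea, a roughly equivalent git subtree invocation (only a sketch, untested here) could look like:
$ git subtree add --prefix doc/ext/breathe \
      https://github.com/breathe-doc/breathe.git v5.0.0a5 --squash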
Now, let us add the breathe source to the files we track with
git:
$ git add doc/ext/breathe
Also note that, since we directly cloned the source of breathe,
we first need to set it up to get it to work. In particular, we
need to create the parser script breathe/_parser.py:
$ cd doc/ext/breathe/ && make parser && cd ../../../
that we also need to add to the tracked files:
$ git add doc/ext/breathe/breathe/_parser.py
The last steps should not be necessary if you installed breathe
via pip or any other package manager of your operating system.
The next step is to make sphinx aware of our extension by
adding the respective entries to doc/source/conf.py, i.e.
extending the path to the breathe sources and including
the extension in the extensions list
sys.path.insert(0, os.path.abspath('../ext/breathe'))
...
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'myst_parser',
'breathe'
]

Next, we need to tell breathe where to find the XML output
of doxygen by setting the breathe_projects dictionary and the
breathe_default_project variable in doc/source/conf.py:
breathe_projects = {"workshop": "../xml"}
breathe_default_project = "workshop"Now we are all set to include our doxygen documentation into
the sphinx ReStructuredText files.
Let us now create a new page api in our sphinx documentation to
list the C API by calling the appropriate
breathe directives.
To keep our example simple, we only use the doxygenfunction directive
to directly address the one function we documented in our C project.
In reality, you would more likely use the
doxygen grouping feature to group
functions, definitions, and the like in your C API and then use the doxygengroup
directive to automatically insert all member functions and symbols of the
respective group.
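As a small, hypothetical sketch of that approach (the group name time_printing is made up here for illustration), the header would wrap our function in a group:
/**
 * @defgroup time_printing Time-stamped printing
 * @brief Functions that print a string together with the current time
 * @{
 */
void
print_time_string(char *string);
/**
 * @}
 */
and the .rst file would then pull in all members of that group:
.. doxygengroup:: time_printing
   :members: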
Open the doc/source/index.rst file and change the table of contents (toctree)
to
.. toctree::
   :caption: Contents:
   :maxdepth: 1

   ReadMe <readme>
   api
   python_api

Next, create a new file doc/source/api.rst with the content
C API description
=================
Below, you'll find a detailed description of the *workshop* `C` API
.. doxygenfunction:: print_time_string
Now, we will change doc/Makefile.am again to tell automake that
our sphinx-build requires the doxygen XML output to run properly.
In addition, we will add the sphinx output as default html_DATA,
so our doc/Makefile.am should look like the following now:
##--------------------------------------------------##
## Tell autoconf/automake to include the necessary ##
## files in the distribution archive as well as in ##
## the installation routine ##
##--------------------------------------------------##
html_DATA = $(REFERENCE_MANUAL_FILES_HTML)
EXTRA_DIST = \
Doxyfile \
doxygen-xml \
xml \
sphinxbuild-html \
ext \
source \
build
##--------------------------------------------------##
## add directive to build doxygen XML ##
##--------------------------------------------------##
if WITH_DOXYGEN
doxygen-xml: $(pkginclude_HEADERS) Doxyfile
@DOXYGEN@ Doxyfile >>doxygen.log 2>&1; \
touch doxygen-xml
endif WITH_DOXYGEN
if WITH_SPHINXBUILD
REFERENCE_MANUAL_FILES_HTML = build/html/*
##--------------------------------------------------##
## In case the HTML manual should be created, here ##
## is the rule how to do so ##
##--------------------------------------------------##
$(REFERENCE_MANUAL_FILES_HTML): sphinxbuild-html
sphinxbuild-html: doxygen-xml
(rm -f sphinxbuild-html && \
@SPHINXBUILD@ -M html source build >sphinx.log 2>&1 && \
touch sphinxbuild-html) || cat sphinx.log
endif WITH_SPHINXBUILD

Done! Now remove the files doc/sphinxbuild-html and
doc/doxygen-xml if they exist, and run make to build our
merged documentation:
$ rm -f doc/sphinxbuild-html doc/doxygen-xml
$ make
Open doc/build/html/index.html with your favorite web browser
and sit back and enjoy! The documentation now has an additional
page C API description that displays the documentation for our
print_time_string() C function.
It is time to remove all the doxygen remnants we created
earlier, to keep our directory clean:
$ rm -rf doc/html doc/doxygen-html
As soon as you want to make your software public, you may consider
using an accessible git repository, e.g. served at github
or gitlab. Both of them offer a range of
automated workflows that can make the life of a software developer
much easier. Here, I will focus on using github and its
github actions and workflows.
In essence, workflows contain actions that are triggered upon certain
interactions with the repository, e.g. whenever you push a new commit
(to a specific branch) or you add a certain tag like a version release
number. The actions then can do many things, such as automatically build
and test your software or create a distribution archive.
In the following, we assume that you already created a repository
at github and that you've added it as a remote origin to your
local repository. Instructions on how to do that are shown by
github once you have created your project.
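In most cases this boils down to something like the following (with placeholders instead of a real account and repository name, and main possibly being master depending on your default branch):
$ git remote add origin git@github.com:<your-user>/<your-repository>.git
$ git push -u origin main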
Let us begin with a fairly simple workflow that runs our entire
autotools tool chain and builds our software and documentation
as soon as we push to our repository.
For that, we create a .github/workflows/ subdirectory
$ mkdir -p .github/workflows
with the file build-project.yml that contains
name: Run autotools toolchain
run-name: ${{ github.actor }} is running the autotools toolchain
on: [push]
jobs:
build_software:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install prerequisites
run: |
sudo apt-get update
sudo apt-get -y install \
build-essential \
autoconf \
automake \
doxygen \
python3-sphinx \
python3-myst-parser
- name: Autotools setup
run: autoreconf -i
- name: Configure
run: ./configure
- name: Build
run: make

Add this file to the ones we track in our repository and push everything to your github repo:
$ git add .github/workflows/build-project.yml
$ git commit -a -m "Start using github workflows"
$ git push
Now, go to your github repository and open the Actions tab. It will now
show our workflow running and you can inspect the results.
Github offers a Releases page
where official releases of your software are available. Releases themselves
can be manually created through the github web interface. However, it is
much more convenient to create a release automatically as soon as your
software reaches a certain point. Such an automation is relatively simple:
all you need to do is git tag a certain commit as a release, e.g. by
adding a tag that consists of a version number, and add a corresponding
workflow that automates the creation of the release in your github repo.
Let us begin with some prerequisites first, though!
A release is usually accompanied by a release note that tells your users
what this new release actually encompasses. If we want to automatically
add a release note to the Releases page on github, we have to store that
note somewhere in our repository. The most convenient way to do so is to
add a CHANGELOG.md file to our repository that will list all changes of
the software from one release to the next. So, let us create one, add it
to the autotools toolchain such that it will be part of the distribution
tarballs, and fill in our first release note.
Create a new file CHANGELOG.md with the content:
# Changelog
Below, you'll find a list of notable changes for each version of the
Sustainable Software Development *workshop*
## Version 1.0.x
### [Unreleased](https://github.com/ViennaRNA/sd-workshop-dubice2025/compare/v1.0.0...HEAD)
### [Version 1.0.0](https://github.com/ViennaRNA/sd-workshop-dubice2025/compare/bfb904c...v1.0.0)
#### Software
* Add the `hello` program
* Add a python version of the `hello` program
#### Documentation
* Add documentation for both APIs, `C` and `Python`, that is built with `doxygen`, `breathe`, and `sphinx`

and add the following to Makefile.am:
EXTRA_DIST = \
CHANGELOG.md
Now add the CHANGELOG.md to the files we track with git
$ git add CHANGELOG.md
Every time we make a new release of our software, we will now add a new entry
to our CHANGELOG.md file and list all the changes that make up the new release!
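For instance, a hypothetical follow-up release could add a section like this right above the existing 1.0.0 entry (version number, compare URL, and change items are just placeholders):
### [Version 1.0.1](https://github.com/ViennaRNA/sd-workshop-dubice2025/compare/v1.0.0...v1.0.1)
#### Software
* Fix a bug in the `hello` program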
Next, let us create the workflow that builds a distribution archive and then
automatically creates a new release at our github repository. First, we create
a reusable workflow
that is dedicated to building the distribution archives. Create a new file
.github/workflows/make-dist.yml with the content
name: Make distribution archives
on:
workflow_dispatch:
inputs:
config-flags:
description: 'Configure flags to prepare the source directory'
default: ''
required: false
type: string
zip:
description: 'Additionally create ZIP archive next to default GZIP'
required: false
default: false
type: boolean
artifact-name:
description: 'Name of the artifact'
required: false
default: 'distribution-archives'
type: string
workflow_call:
inputs:
config-flags:
description: 'Configure flags to prepare the source directory'
default: ''
required: false
type: string
zip:
description: 'Additionally create ZIP archive next to default GZIP'
required: false
default: false
type: boolean
artifact-name:
description: 'Name of the artifact'
required: false
default: 'distribution-archives'
type: string
outputs:
version_number:
description: "The Version number of this build"
value: ${{ jobs.make_dist.outputs.version_number }}
jobs:
make_dist:
runs-on: ubuntu-latest
# Map the job outputs to step outputs
outputs:
version_number: ${{ steps.tarball.outputs.version_number }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install prerequisites
run: |
sudo apt-get update
sudo apt-get -y install \
build-essential \
autoconf \
automake \
doxygen \
python3-sphinx \
python3-myst-parser
- name: Autotools setup
run: autoreconf -i
- name: Configure
run: ./configure ${{ inputs.config-flags }}
- name: Make tarball
id: tarball
run: |
make dist-gzip
version_number=$(ls workshop-*.tar.gz)
version_number="${version_number#workshop-}"
version_number="${version_number%.tar.gz}"
echo "version_number=${version_number}" >> "$GITHUB_OUTPUT"
- name: Make ZIP
if: ${{ inputs.zip }}
run: make dist-zip
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: ${{ inputs.artifact-name }}
path: |
workshop-*.tar.gz
workshop-*.zip
retention-days: 3

Note the similarity to our previous workflow; at least all the setup is the
same. What is new here is that we allow for input to the workflow and that
it also produces output and some artifacts that we can retrieve later.
Next, we create the actual release workflow that will be triggered only
if we push a tag that follows the pattern v*.*.*, e.g. v1.0.0. This
workflow will then run our make-dist workflow to build the distribution
archives and, once this is done, it will extract the latest release note
from our CHANGELOG.md file using an adapted version of the
Extract Release Notes Action
to then create the github release using the
Release Action.
Create the workflow file .github/workflows/release.yml and add the
following content:
name: Version release
on:
push:
tags:
- 'v*.*.*'
jobs:
create-dist-archives:
uses: ./.github/workflows/make-dist.yml
with:
zip: true
artifact-name: 'dist-archives'
create-release:
needs: create-dist-archives
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- uses: actions/checkout@v4
- name: Extract release notes
id: extract-release-notes
uses: RaumZeit/extract-release-notes@a991ec1541871118630638fe002862a023870cff
with:
header_level: 3
version_prefix: "Version"
- name: Download source archives
uses: actions/download-artifact@v4
with:
name: dist-archives
- name: Make release
uses: ncipollo/release-action@v1
with:
artifacts: "workshop-*.tar.gz,workshop-*.zip"
body: ${{ steps.extract-release-notes.outputs.release_notes }}
name: "Sustainable Software Development Workshop ${{ needs.create-dist-archives.outputs.version_number }}"That's all we need. Add the files to be tracked by our repository, commit, and finaly push everything to github.
$ git add .github/workflows/make-dist.yml .github/workflows/release.yml
$ git commit -a -m "Add CHANGELOG.md and release workflow"
$ git push
Now, let us test the release workflow by adding a new tag v1.0.0a to create
an alpha release of our software
$ git tag -a v1.0.0a -m "This is the 1.0.0alpha release of our workshop"
$ git push --tags
Yeehaa! We made our first automated release.
Earlier, we created Python packages for our little example software,
both as a source code package and as a binary Python wheel.
This is all fine for our example and we could in principle add the
instructions to build the package either to our autotools toolchain
or to a github workflow. However, for packages that are not only based
on Python code, this introduces a problem in terms of portability.
Once you have outsourced parts of your code to a much faster low-level
programming language such as C or C++, the compiled Python wheel
depends on the actual Python version and your operating system and
architecture. To make your software available to a larger audience
of users with diverse computer architectures, you would either rely
on them to build your software themselves, or you would need to provide
wheels for a large list of architecture/operating system/Python version
combinations.
The good news is that, for the latter, there is a nice way to automate it using cibuildwheel. It integrates with workflows on various continuous integration (CI) servers, among them also Github Actions. Setting it up is fairly easy, but since our project is a pure Python project without any platform dependency, the following workflow is only for demonstration purposes, showing how you could in principle automate that:
.github/workflows/python-wheels.yml:
name: Build Python Packages
on:
workflow_dispatch:
release:
types: [published]
jobs:
build_sdist:
name: Build sdist
runs-on: ubuntu-latest
outputs:
sdist_file: ${{ steps.create_sdist.outputs.file }}
steps:
- uses: actions/checkout@v4
- id: create_sdist
name: build Python source distribution
run: |
python -m pip install --upgrade build
python -m build --sdist
echo "file=$(cd dist && ls sustainable_software_workshop-*.tar.gz)" >> "$GITHUB_OUTPUT"
- name: Archive production artifacts
uses: actions/upload-artifact@v4
with:
name: sdist
path: dist/
build_wheels:
name: Build Linux wheels for ${{ matrix.pyver }} on ${{ matrix.os }}
needs: build_sdist
runs-on: ${{ matrix.os }}
strategy:
# Ensure that a wheel builder finishes even if another fails
fail-fast: false
matrix:
os : [ubuntu-latest, ubuntu-24.04-arm, windows-latest, windows-11-arm, macos-15-intel, macos-14]
pyver: [cp311, cp312, cp313]
steps:
- uses: actions/download-artifact@v4
with:
name: sdist
- name: Set sdist environment variable
run: |
echo "SDIST_FILE=${{needs.build_sdist.outputs.sdist_file}}" >> "$GITHUB_ENV"
- name: Build wheels
uses: pypa/cibuildwheel@v3.2.0
with:
package-dir: "$SDIST_FILE"
output-dir: dist
env:
CIBW_BUILD_VERBOSITY: 3
CIBW_BUILD: ${{matrix.pyver}}-*
CIBW_ENVIRONMENT: SDIST_FILE=$SDIST_FILE
- uses: actions/upload-artifact@v4
with:
name: wheels-${{ strategy.job-index }}
path: ./dist/*.whl
compression-level: 0 # no compression
retention-days: 3

Instead, to complete our workshop at this point, we simply add the
build process for our Python package to our .github/workflows/release.yml:
name: Version release
on:
push:
tags:
- 'v*.*.*'
jobs:
create-dist-archives:
uses: ./.github/workflows/make-dist.yml
with:
zip: true
artifact-name: 'dist-archives'
create-python-dist:
uses: ./.github/workflows/python-wheels.yml
with:
artifact-name: 'py-packages'
create-release:
needs: [create-dist-archives, create-python-dist]
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- uses: actions/checkout@v4
- name: Extract release notes
id: extract-release-notes
uses: RaumZeit/extract-release-notes@a991ec1541871118630638fe002862a023870cff
with:
header_level: 3
version_prefix: "Version"
- name: Download source archives
uses: actions/download-artifact@v4
with:
merge-multiple: true
- name: Make release
uses: ncipollo/release-action@v1
with:
artifacts: "workshop-*.tar.gz,workshop-*.zip,sustainable_software_workshop-*.whl,sustainable_software_workshop-*.tar.gz"
body: ${{ steps.extract-release-notes.outputs.release_notes }}
name: "Sustainable Software Development Workshop ${{ needs.create-dist-archives.outputs.version_number }}"and we change our .github/workflows/python-wheels.yml to
name: Build Python Packages
on:
workflow_dispatch:
inputs:
artifact-name:
description: 'Name of the artifact'
required: false
default: 'python-packages'
type: string
workflow_call:
inputs:
artifact-name:
description: 'Name of the artifact'
required: false
default: 'python-packages'
type: string
jobs:
build_sdist:
name: Build sdist
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- id: create_sdist
name: build Python source distribution
run: |
python -m pip install --upgrade build
python -m build
- name: Archive production artifacts
uses: actions/upload-artifact@v4
with:
name: ${{ inputs.artifact-name }}
path: dist/
compression-level: 0 # no compression
retention-days: 3

We will now be able to manually run the Python build process through
the github actions web interface (workflow_dispatch) and the workflow
may be called by our release workflow as well.
Test everything now by creating a new tag that follows a version string and push it to github:
$ git tag -a v1.0.0 -m "This is version 1.0.0"
$ git push --tags
and have a look at the Actions tab in your github repository to
see how the build and release process is triggered.
This is where we will finish our workshop due to time constraints.
There would have been so many more things to tell, especially
how to use .gitignore files and additional instructions for
the automake rules for our documentation. Maybe next time.
If you are interested in how most of the things I've presented
here work together in a real-life project, check out the sources
of the ViennaRNA Package.
There we build C programs and a C library, the corresponding
Python and Perl 5 bindings, create the release, create Python
wheels, push them to PyPI,
create the documentation and push it to ReadTheDocs,
etc.
Have fun and be lazy!