
Sustainable Software Development - Workshop

One of the cornerstones of bioinformatics research is certainly the implementation of analysis pipelines and algorithms. Once the software has been published and released, it often remains difficult to maintain, extend, or fix bugs in it, and, finally, to make any updates easily accessible to the research community. A researcher's life can be made much easier with the help of a version control tool such as git, proper documentation, and the use of standardized and automated build and packaging tools right from the beginning of the software development process.

In this workshop I will focus on software development with C/C++ and/or Python. I will briefly introduce approaches to software documentation for both programming languages, which can even be merged easily to produce appealing HTML and PDF documentation (keywords: doxygen, breathe, sphinxdoc). Moreover, I will introduce some build and packaging strategies (autotools/automake and Python's setuptools) that help a lot to check and prepare all relevant prerequisites for your software to be used on a user's computer. Bug fixing, feature additions, packaging, and releasing your software can then be highly automated using CI strategies. For that, I will briefly introduce Github workflows, which serve as a very convenient tool set. Previous knowledge of (some of) the tools and programming languages mentioned above is helpful but not required.

Use git right from the beginning

Let's start with a new software project where we want to track the development process using the revision control system git. This will help us to easily test out new features without harming already working code, to share development among multiple developers, to undo changes we might have introduced by mistake, and much more.

As a first step, let's create a new directory that will contain all the files we want to keep track of and change into this directory:

$ mkdir my_git_project
$ cd my_git_project

Now, it is time to initialize the git repository in this directory:

$ git init

Now, for the sake of documentation, let us add a README file to our new git repository that we track from the very beginning:

$ touch README.md
$ git add README.md
$ git commit -m "Initial commit with README.md"

We will use this file during the software development process to write down some essential things about what the software does, how a user can install it, how it can be used, etc. Any addition to this file will be recognized by git, and in most cases the changes will become part of the next git commit.
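For example, after editing README.md, a typical edit-and-commit cycle might look like this (commands shown for illustration):

$ git status
$ git diff README.md
$ git add README.md
$ git commit -m "Describe the installation procedure in README.md"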

Start writing code for the software project

A clean directory layout

Before we actually start to write any code for our software, we create some additional directories to keep our project clean. First, let us put any source code into a src directory. For the documentation we create a doc directory, and finally, let us also create a test directory where we will hopefully put some tests to check the functionality of our software:

$ mkdir src
$ mkdir doc
$ mkdir test

Hello World in C

A single function

Next, we will start with a very minimalistic program written in C that consists of a single function main() that simply prints the string "Hello World" to the terminal. Note that in C the main function is special, as it is the entry point of any executable program we build from our code.

Now, open a text editor and create the hello.c file in the src/ directory with the following content:

#include <stdlib.h>
#include <stdio.h>

int
main(int argc,
     char *argv[])
{
  printf("Hello World\n");

  return EXIT_SUCCESS;
}

We can now change into the src/ directory to compile and run our simple program:

$ cd src/
$ gcc -o hello hello.c

Here, we use gcc to create (output, -o) a binary executable with the name hello that is compiled from the source code file hello.c. Another popular C compiler is clang (used by default on macOS).

The one-line invocation of gcc above is a shorthand for what are actually three processes. The compiler pre-processes the code, then compiles it, and finally links everything into the single executable file hello. Larger software projects usually require more option parameters to be passed to the preprocessor, compiler, and linker, and typically split the process into two stages, compilation and linking. Thus, the one-liner above can also be expressed by two invocations like:

$ gcc -c hello.c
$ gcc -o hello hello.o

where the hello.o file is an object file created by the compilation stage of gcc.

Running our little program should produce something like

$ ./hello
Hello World

Re-usable functions

Let us now separate the printf() code into another function that we can possibly re-use later on. At the same time, we make the function more general by allowing arbitrary strings to be printed. Just for fun, we always print the current time along with the string. So the code in hello.c may now look like this:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>


void
print_time_string(const char *string);


int
main(int argc,
     char *argv[])
{
  if (argc > 1)
    print_time_string(argv[1]);
  else
    print_time_string("Hello World");

  return EXIT_SUCCESS;
}


void
print_time_string(const char *string)
{
  if (string) {
    time_t result = time(NULL);
    if (result != (time_t)(-1))
      printf("%s%s\n",
             asctime(gmtime(&result)),
             string);
    else
      printf("%s\n", string);
  }
}

Don't be afraid of the complicated-looking code that retrieves the current time and date. What matters here is that we created a function print_time_string() that takes as an argument a pointer to a constant sequence of characters, the string we want to print. We first declared that function just before main() such that its interface is known as soon as main() is compiled. Then we defined its actual behavior at the bottom of our file.

You might also have noticed that the main() function changed slightly. We now check whether additional parameters are given to the program and treat the first parameter as the input string we want to print. Thus, compiling and running our program now looks like:

$ gcc hello.c -o hello
$ ./hello
Fri Sep 26 11:13:54 2025
Hello World
$ ./hello "Greetings Earthlings!"
Fri Sep 26 11:18:14 2025
Greetings Earthlings!

Objects and header files

Now, it is time to separate our little function even further by putting it into its own object file, apart from the one that contains main(). So, create a new text file time_print.c and move the print_time_string() function there (do not forget the include statements):

#include <stdio.h>
#include <time.h>


void
print_time_string(char *string)
{
  if (string) {
    time_t result = time(NULL);
    if (result != (time_t)(-1))
      printf("%s%s\n",
             asctime(gmtime(&result)),
             string);
    else
      printf("%s\n", string);
  }
}

Also, let us already move the declaration of the print_time_string() function into a header file, so that it can be included in any other piece of source code we might create in the future. The header file time_print.h should look like:

#ifndef   MY_TIME_PRINT_H
#define   MY_TIME_PRINT_H

void
print_time_string(char *string);

#endif

Since hello.c now has no idea what our print function looks like, we need to include the newly created header file, and hello.c becomes

#include <stdlib.h>
#include <stdio.h>
#include <time_print.h>


int
main(int argc,
     char *argv[])
{
  if (argc > 1)
    print_time_string(argv[1]);
  else
    print_time_string("Hello World");

  return EXIT_SUCCESS;
}

When we compile each of the two source code files now, we produce two object files hello.o and time_print.o:

$ gcc -c time_print.c
$ gcc -I. -c hello.c
$ ls
hello*  hello.c  hello.o  time_print.c  time_print.h  time_print.o

that need to be linked together to make up our program:

$ gcc -o hello time_print.o hello.o

Note here the additional argument -I. when we compile hello.c. This instructs the pre-processor to also search for header files in the current (.) directory.

Done, now we have made our little print_time_string() function re-usable for any other C code parts or programs. They only require the inclusion of the header file time_print.h and linking against the time_print.o object.
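As a small illustration (the file greet.c and its contents are hypothetical, not part of the workshop sources), another program could re-use the function like this:

/* greet.c -- hypothetical example that re-uses print_time_string() */
#include <stdlib.h>
#include <time_print.h>

int
main(void)
{
  /* print the current time followed by a custom greeting */
  print_time_string("Good morning, workshop!");

  return EXIT_SUCCESS;
}

which would be compiled and linked against the existing object file like so:

$ gcc -I. -c greet.c
$ gcc -o greet greet.o time_print.o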

Function documentation with doxygen

If we want to re-use our function in the future, or provide it to other users so they can include it in their own projects, we had better document our code! This means adding meaningful descriptions of the purpose and usage of our function, as well as a description of its arguments and return values.

For the programming languages C and C++, the tool doxygen is quite popular and very handy. It expects the documentation in the form of special comments in the code and offers a variety of commands to nicely format and structure an API reference manual. To use it, let us first create a basic configuration file in our doc/ directory:

$ cd ../doc
$ doxygen -g

Now open the configuration file Doxyfile, locate the options INPUT, FILE_PATTERNS, and GENERATE_LATEX and change them as follows:

INPUT                  = ../src
FILE_PATTERNS          = *.h
GENERATE_LATEX         = NO

This will instruct doxygen to search for documentation in our src/ directory and to only consider header files ending in .h. You may of course also adjust other options to better suit your project, like PROJECT_NAME, PROJECT_NUMBER, etc. Have a look into the doxygen configuration docs to find out what more this tool could do for you.
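For instance, a few more settings one might adapt could look like this (the values here are purely illustrative):

PROJECT_NAME           = "Sustainable Software Workshop"
PROJECT_NUMBER         = 1.0
PROJECT_BRIEF          = "Example code for the sustainable software development workshop"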

Note here that we deactivated LaTeX code generation, which would enable us to produce nice PDF reference manuals. We will come to that part later and for now simply rely on the default HTML output.

We can now run doxygen to create our HTML reference manual like this:

$ doxygen Doxyfile

This will create an html directory within our doc directory that contains the entire documentation, and we can already open the html/index.html file with our favorite web browser. Of course, since we didn't add any comments so far, the reference manual is still empty.

So, let us change that by documenting our print_time_string() function. For that, we add a comment block into our header file just before the function declaration such that it now looks like:

#ifndef   MY_TIME_PRINT_H
#define   MY_TIME_PRINT_H

/**
 *  @file   time_print.h
 *  @brief  This is our example header file with documentation
 */


/**
 *  @brief    Print the current time and a string
 *
 *  On invocation, this function retrieves the current local
 *  time and prints it to @c stdout along with a user-defined
 *  string on the next line.
 *
 *  @param string  The user-defined string to print
 */
void
print_time_string(char *string);

#endif

Running doxygen again now produces the description for our print_time_string() function.

Automating everything

As you might have already guessed, invoking all the compilation, linking, and code-generation command line calls by hand is very tedious. Instead, we want to automate everything somehow. For that purpose, one would typically use the make program and a corresponding Makefile. Still, the latter requires all the specific details on how exactly to build everything. But we are lazy and don't want to write up all the command line calls ourselves, so we rely on some further automation. There are a few handy tools available for this task; the most common might be autoconf and automake (also known as GNU autotools) and cmake.
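For comparison, a minimal hand-written Makefile for our two-file C program could look roughly like the sketch below (shown only to illustrate the kind of rules make needs; recipe lines must start with a tab character):

# hypothetical hand-written src/Makefile
CC     = gcc
CFLAGS = -I.

hello: hello.o time_print.o
	$(CC) -o hello hello.o time_print.o

hello.o: hello.c time_print.h
	$(CC) $(CFLAGS) -c hello.c

time_print.o: time_print.c time_print.h
	$(CC) $(CFLAGS) -c time_print.c

clean:
	rm -f hello hello.o time_print.o

Maintaining such files by hand quickly becomes tedious for larger projects, which is exactly the part the tools below automate for us.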

In this workshop we will focus on GNU autotools, despite the criticism among developers about whether one should use it or not. I personally like it due to its great flexibility and transparency (see also the Autotools Mythbuster, which is very helpful). This allows for easy adaptation of the build process to very specific tasks. But I won't debate the choice between GNU autotools and cmake :)

In our project root directory my_git_project/ create a file configure.ac with the content:

AC_INIT([Sustainable Software], [1.0],
        [Ronny Lorenz <ronny@bioinf.uni-leipzig.de>],
        [workshop])

AM_INIT_AUTOMAKE([-Wall foreign 1.11 tar-ustar])
AC_PROG_CC
AC_PROG_INSTALL

##
## Check for the doxygen executable and prepare a Makefile.am
## condition variable to turn-on/off the build process of the
## documentation
##
AC_PATH_PROG(DOXYGEN, [doxygen], [no])
AC_SUBST([DOXYGEN])

if test x"${DOXYGEN}" = x"no" ; then
    AC_MSG_WARN([doxygen not found! You may want to install doxygen to generate the API documentation.])
fi

AM_CONDITIONAL(WITH_DOXYGEN, test "x$DOXYGEN" != "xno")

AC_CONFIG_FILES([Makefile src/Makefile doc/Makefile])
AC_OUTPUT

This is the main (autoconf) configuration of our project. First, we call AC_INIT with some meta information, such as our project name, the version number, our email contact, and the name of the project's distribution archive file that we will be able to create automatically later. Then we initialize automake with some additional parameters to enable all warnings (-Wall), allow for missing files that a GNU autotools project would usually require (foreign), set the minimum automake version (1.11), and enable arbitrarily long file names in the distribution archive (tar-ustar). This is followed by two checks for the compiler and the install program.

Then comes a larger part that tries to determine the absolute path to the doxygen executable to set up the build process depending on whether or not doxygen is installed.

Finally, the AC_CONFIG_FILES macro is called with a list of files that we want to generate automatically. In our case, these are the Makefiles in our project's root and within the src/ and doc/ directories.

What is still missing are some instructions for automake to tell it the basic requirements to create Makefiles that do what we want. This is what Makefile.am files are for, and we will now create three of them. The first one in our project's root (i.e. my_git_project/Makefile.am) can be as simple as:

SUBDIRS = \
    src \
    doc

simply telling automake that there is nothing to do in the root directory itself, but that there are subdirectories src and doc that need to be scanned further.

The second file is for our source code and executable program. Create the file src/Makefile.am with the following content:

bin_PROGRAMS = hello

hello_SOURCES = \
    hello.c \
    time_print.c \
    time_print.h

to specify that we want a binary program with the name hello, which itself depends on the source code files hello.c, time_print.c, and time_print.h. That's all we need at this point.

Finally, the documentation part is more involved, since we only want to create the documentation if the user (or we ourselves) have the doxygen program installed. Here is the code that goes into doc/Makefile.am:

##--------------------------------------------------##
## Tell autoconf/automake to include the necessary  ##
## files in the distribution archive as well as in  ##
## the installation routine                         ##
##--------------------------------------------------##
html_DATA = $(REFERENCE_MANUAL_FILES_HTML)

EXTRA_DIST =  \
    Doxyfile \
    doxygen-html \
    html

##--------------------------------------------------##
## prepare variables in case HTML reference manual  ##
## is going to be installed                         ##
##--------------------------------------------------##
if WITH_DOXYGEN

REFERENCE_MANUAL_FILES_HTML = html/*

##--------------------------------------------------##
## In case the HTML manual should be created, here  ##
## is the rule how to do so                         ##
##--------------------------------------------------##
$(REFERENCE_MANUAL_FILES_HTML): doxygen-html


doxygen-html: $(pkginclude_HEADERS) Doxyfile
	@DOXYGEN@ Doxyfile >>doxygen.log 2>&1; \
  touch doxygen-html

endif WITH_DOXYGEN

Take it for granted for now; I might find the time later to describe what it actually does.

Phew, that has been a lot, but we are done with this part and can enjoy the art of automated build processes working on our project. Just start by invoking the autotools toolchain:

$ autoreconf -i && ./configure

The first of the two calls parses all our configuration files and creates a ./configure script, a Makefile.in for each of our Makefile.am files, and several more files we do not want to spend too much time on right now. The second command runs the ./configure script that checks for all prerequisites of our build process and creates the Makefiles. If everything went well, we can now call

$ make

and everything is built automagically. You can remove all files that are built by make with the

$ make clean

or

$ make maintainer-clean

command. Beware that the latter cleans up all files that the autotools toolchain has built automatically!

To install our program we can use

$ make install

and to distribute our code to other people, we can easily create a tar.gz archive:

$ make dist

The resulting archive workshop-1.0.tar.gz contains everything necessary such that any end-user only has to call the ./configure script and run make/make install to install our software. Note that if you prefer to distribute your software as a ZIP file, you can also run make dist-zip.
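From the end-user's perspective, unpacking and installing from that archive could then look roughly like this (the --prefix option is just one common choice for installing into the home directory):

$ tar -xzf workshop-1.0.tar.gz
$ cd workshop-1.0
$ ./configure --prefix=$HOME/.local
$ make
$ make install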

Hello World in Python

We now turn to a much simpler, widely used programming language that most of you might have already worked with: Python.

Again, instead of simply creating a Python script that performs a specific task, let us create an actual Python project that we can easily distribute. This project could either be a module, which would be a single Python file containing our code, or a Python package, which consists of several code files. We choose the latter just for fun and name our project workshop.

Before we start, we create another subdirectory where we want to store our Python code, just to keep our directory tree clean. In the following we use src/python, but the Python code directory could have been placed anywhere else:

$ mkdir -p src/python/workshop
$ touch src/python/workshop/__init__.py
$ touch src/python/workshop/hello.py

Note that the Python package is placed in a directory with the package name, and this directory usually contains an __init__.py file, which may be empty in the beginning. We also created an empty file hello.py where we will place our main function. Now, open hello.py in your favorite editor and add the following content:

def greetings():
    """
    Our main function that simply prints 'Hello World!'
    """
    print("Hello World!")


# If this file is executed by the Python interpreter, the main() function
# is our entry point
if __name__ == '__main__':
    greetings()

We can now run the script with the Python interpreter like so:

$ python src/python/workshop/hello.py 
Hello World!

Alternatively, we can run our code as a module (python -m) instead:

$ PYTHONPATH=src/python/ python -m workshop.hello
Hello World!

which essentially does the same thing. That's all cool, but let us rewrite the code to make it more re-usable, like we did before in the C example.

Re-usable functions

To do so, we create a new module time_print in our workshop package by simply adding the file src/python/workshop/time_print.py with the content:

import time


def print_time_string(string):
    """
    Print the current time and a string

    On invocation, this function retrieves the current local
    time and prints it to `stdout` along with a user-defined
    string on the next line.

    Parameters
    ----------
    string : str
        The user-defined string to print

    """
    if string:
        print(time.asctime(time.localtime()))
        print(string)

Note that we already added some documentation for our function print_time_string(). We've added a docstring, a special documentation string surrounded by triple double quotes, and we adhere to the numpydoc standard so that we can automatically generate documentation for our package later.
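As a quick sanity check that the docstring is picked up, you can inspect it with Python's built-in help system (the invocation below is just for illustration):

$ PYTHONPATH=src/python python -c "from workshop import time_print; help(time_print.print_time_string)"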

At the same time, we change the file src/python/workshop/hello.py to

from . import time_print as tp


def greetings():
    """
    Our main function that simply prints 'Hello World!'
    """
    tp.print_time_string("Hello World!")



# If this file is executed by the Python interpreter, the main() function
# is our entry point
if __name__ == '__main__':
    greetings()

Let us run the module again to observe its new output that now also contains the current local time:

$ PYTHONPATH=src/python/ python -m workshop.hello
Sat Sep 27 11:17:32 2025
Hello World!

Python packaging

Let us now focus on how to package our Python module such that we can easily distribute it to other users. For that, several build backend tools are available. We won't have enough time to look into the pros and cons of each of the different tools, so we use setuptools for now. The basic configuration file(s) are standardized nowadays, so you can switch the actual build tool relatively easily later on if you decide that there are better alternatives.

We start with the main declarative configuration file in our project root, pyproject.toml, which should have the following content:

[build-system]
requires = ["setuptools >= 77.0.3"]
build-backend = "setuptools.build_meta"


[project]
name = "sustainable_software_workshop"
version = "1.0.0"
authors = [
  { name="Ronny Lorenz", email="ronny@bioinf.uni-leipzig.de" },
]
description = "An example packge for the sustainable software development workshop"
readme = "README.md"
requires-python = ">=3.9"
classifiers = [
    "Programming Language :: Python :: 3",
    "Operating System :: OS Independent",
]
license = "MIT"
license-files = ["LICENSE.txt"]
keywords = ["time", "print", "workshop", "dubice"]


[project.urls]
Homepage = "https://github.com/ViennaRNA/sd-workshop-dubice2025"
Issues = "https://github.com/ViennaRNA/sd-workshop-dubice2025/issues"


[tool.setuptools.packages.find]
where = ["src/python"]  # ["."] by default


[project.scripts]
workshop-hello = "workshop.hello:greetings"

Note that we specified a license file here, which didn't exist so far. I've chosen the MIT license for this workshop; you might find other licenses that suit your project better. See, e.g., choosealicense.com for help in choosing the right license.

This is all we need for now to create distributable Python packages in the form of source packages (sustainable_software_workshop-1.0.0.tar.gz) and binary Python wheels (sustainable_software_workshop-1.0.0-py3-none-any.whl), both of which can be easily installed via pip (or even be uploaded to PyPI). Simply run

$ python -m build

to build the packages. Both of them should now be available in the newly created directory dist/.
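To try the result locally, the wheel can be installed with pip, after which the console script we declared under [project.scripts] should become available (file name as produced by the version above):

$ pip install dist/sustainable_software_workshop-1.0.0-py3-none-any.whl
$ workshop-hello

The script should print the current time followed by "Hello World!", just like running the module directly.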

Automatic module documentation with sphinx

Next, I will show you how to (mostly) automatically create a nice-looking documentation for your project. I've chosen Sphinx here, since it seems to be the most popular documentation tool for Python, and I actually like it also for creating documentation for my C projects, but more on that later.

To bootstrap our use of sphinx, we can use the sphinx-quickstart tool:

$ sphinx-quickstart --no-makefile doc/

Answer the questions the quickstart tool asks you and it automatically sets up a basic structure for your documentation. I've chosen to split the build and source directories; in my opinion, this makes for a cleaner directory tree. Here, I've added the --no-makefile option, since we already have a Makefile in our doc/ directory, which is generated by the autotools toolchain. We will later include the corresponding command line calls for sphinx-build into our autotools toolchain.

Before we build our Python documentation, we need to include a Sphinx extension named autodoc. This allows us to parse the docstrings in our Python module to automatically create the API documentation. Additionally, let us use two more extensions, napoleon and myst_parser. The former correctly parses the numpy-style docstrings we added to our print_time_string() function. The latter parses Markdown files and serves them to sphinx.

To include the extensions, open doc/source/conf.py and add the following to the extensions list:

extensions = [
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'myst_parser'
]

We also add some configuration for napoleon in this file to activate numpy-style docstrings:

napoleon_google_docstring = False
napoleon_numpy_docstring = True

Now, open the doc/source/index.rst file and change it to

.. Sustainable Software Development - Workshop documentation master file, created by
   sphinx-quickstart on Sat Sep 27 14:43:42 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Sustainable Software Development - Workshop documentation
=========================================================

This is the documentation part for the sustainable software development
workshop held in Dubice on the occasion of the
`21st Herbstseminar 2025 <https://herbstseminar.bioinf.uni-leipzig.de>`__


.. toctree::
   :caption: Contents:
   :maxdepth: 1

   ReadMe <readme>
   python_api

and add two more files to the doc/source/ directory: one that uses myst_parser to include our README.md file, and a second where we call automodule from the autodoc extension to place the Python API description. The contents of the two files are:

doc/source/readme.rst:

.. include:: ../../README.md
   :parser: myst_parser.sphinx_

and doc/source/python_api.rst:

Python API description
======================

Below, you'll find a detailed description of the *workshop.time_print* API

.. automodule:: workshop.time_print
   :members:
   :undoc-members:

We can now finally build the documentation by changing into our doc/ directory and calling sphinx-build as follows:

$ sphinx-build -M html source/ build/

This will create a directory build/html/ that contains the HTML documentation. Open the index.html file to have a look at what we've included so far.

Running sphinx with automake

We now integrate the sphinx-build process for the Python documentation with autotools. This can be achieved in the same way we added the doxygen documentation support. First, let us extend our configure.ac script with the following code:

AC_PATH_PROG(SPHINXBUILD, [sphinx-build], [no])
AC_SUBST([SPHINXBUILD])

if test x"${SPHINXBUILD}" = x"no" ; then
    AC_MSG_WARN([sphinx-build not found! You may want to install sphinx to generate the Python API documentation.])
fi

AM_CONDITIONAL(WITH_SPHINXBUILD, test "x$SPHINXBUILD" != "xno")

that we place directly after the configuration parts for doxygen and before AC_CONFIG_FILES. What is left is to change doc/Makefile.am, where we first add the sphinx sources and sphinx-generated files we want to create and distribute:

html_DATA = \
    $(REFERENCE_MANUAL_FILES_HTML) \
    $(PYTHON_REFERENCE_MANUAL_FILES_HTML)


EXTRA_DIST =  \
    Doxyfile \
    doxygen-html \
    sphinxbuild-html \
    html \
    source \
    build

and then put the conditional instructions how to build the documentation with sphinx at the end of the file:

if WITH_SPHINXBUILD

PYTHON_REFERENCE_MANUAL_FILES_HTML = build/html/*

##--------------------------------------------------##
## In case the HTML manual should be created, here  ##
## is the rule how to do so                         ##
##--------------------------------------------------##
$(PYTHON_REFERENCE_MANUAL_FILES_HTML): sphinxbuild-html


sphinxbuild-html:
	(rm -f sphinxbuild-html && \
    @SPHINXBUILD@ -M html source build >sphinx.log 2>&1 && \
    touch sphinxbuild-html) || cat sphinx.log

endif WITH_SPHINXBUILD

From now on, the Python documentation will be included in our make process and will also be part of the distribution archive. What is still missing, though, is to include the src/python subdirectory in the distribution archive, so let us quickly add the corresponding instruction to src/Makefile.am.
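A minimal sketch of that instruction (assuming all Python sources stay below src/python/) would be to append an EXTRA_DIST variable to src/Makefile.am:

EXTRA_DIST = \
    python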

Bridging the gap between doxygen and sphinx

At this point, we have two documentations, one for the C code and another for our Python package, both built by separate tools. Moreover, sphinx is way more flexible in terms of documentation and adapting the final output to whatever requirements we have. The doxygen output is much more static and doesn't allow for much interference in the HTML layout. Luckily, there exists a bridge between the two: breathe.

It serves as an extension to sphinx that is able to parse the XML output generated by doxygen. It adds special directives that we can use in the ReStructuredText (.rst) files we write for sphinx. Below, I will show how we can get this bridge to work and, in the end, have only a single documentation built by sphinx.

What we need to do now is to change the output of doxygen from HTML to XML. We do that by adapting the doc/Doxyfile configuration file as follows:

GENERATE_HTML          = NO

...

GENERATE_XML           = YES

and by changing our doc/Makefile.am to

##--------------------------------------------------##
## Tell autoconf/automake to include the necessary  ##
## files in the distribution archive as well as in  ##
## the installation routine                         ##
##--------------------------------------------------##
html_DATA = 


EXTRA_DIST =  \
    Doxyfile \
    doxygen-xml \
    xml \
    sphinxbuild-html \
    ext \
    source \
    build


##--------------------------------------------------##
## add directive to build doxygen XML               ##
##--------------------------------------------------##
if WITH_DOXYGEN

doxygen-xml: $(pkginclude_HEADERS) Doxyfile
	@DOXYGEN@ Doxyfile >>doxygen.log 2>&1; \
  touch doxygen-xml

endif WITH_DOXYGEN

Whenever doxygen is invoked now, it will store its XML output in the doc/xml directory. We will add the correct html_DATA files again later, as soon as we have correctly set up the breathe extension, since from now on we only rely on HTML generated by sphinx.

Preparation to use the breathe extension

If you are working on a Python-only project, you are probably installing any Python dependencies via pip and would therefore only add breathe to your dependency list. Here, however, we download its sources directly from the official breathe github repository. In particular, we clone the latest tag (release) into a doc/ext/ subdirectory to keep our directory tree clean

$ git clone --depth 1 --branch v5.0.0a5 https://github.com/breathe-doc/breathe.git doc/ext/breathe

and we remove all non-essential files of the cloned repository such that it doesn't interfere with our own repository.

$ rm -rf doc/ext/breathe/.git*

Note that this is not the best way to include foreign git repositories into your own. You might want to consider git submodules or, even better, git subtree instead! But we don't have enough time to go into details for that within the time frame of our workshop.
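For completeness, adding breathe as a git submodule instead would look roughly like this (shown only as a sketch of the alternative; we stick to the plain clone in this workshop):

$ git submodule add https://github.com/breathe-doc/breathe.git doc/ext/breathe
$ git commit -m "Add breathe as a git submodule"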

Now, let us add the breathe source to the files we track with git:

$ git add doc/ext/breathe

Also note that, since we directly cloned the source of breathe, we first need to set it up to get it to work. In particular, we need to create the parser module breathe/_parser.py:

$ cd doc/ext/breathe/ && make parser && cd ../../../

that we also need to add to the tracked files:

$ git add doc/ext/breathe/breathe/_parser.py

These last steps should not be necessary if you installed breathe via pip or any other package manager of your operating system.

The next step is to make sphinx aware of our extension by adding the respective entries to doc/source/conf.py, i.e. extending the Python module search path with the location of the breathe sources and including the extension in the extensions list

import os
import sys

sys.path.insert(0, os.path.abspath('../ext/breathe'))

...

extensions = [
  'sphinx.ext.autodoc',
  'sphinx.ext.napoleon',
  'myst_parser',
  'breathe'
]

Next, we need to tell breathe where to find the XML output of doxygen by setting the breathe_projects dictionary and the breathe_default_project variable in doc/source/conf.py:

breathe_projects = {"workshop": "../xml"}
breathe_default_project = "workshop"

Now we are all set to include our doxygen documentation into the sphinx ReStructuredText files.

Adding the doxygen documentation to sphinx

Let us now create a new page api in our sphinx documentation that lists the C API by calling the appropriate breathe directives. To keep our example simple, we only use the doxygenfunction directive to directly address the one function we documented in our C project. In reality, you would more likely use the doxygen grouping feature to group functions, definitions, and the like in your C API and then use the doxygengroup directive to automatically insert all member functions and symbols of the respective group.
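To illustrate the grouping approach (the group name time_utils below is hypothetical), the header file would wrap the declarations in a doxygen group:

/**
 *  @defgroup  time_utils  Time printing utilities
 *  @brief     Functions that print a string together with the current time
 *  @{
 */

void
print_time_string(char *string);

/**
 *  @}
 */

and a directive like

.. doxygengroup:: time_utils

in the corresponding .rst file would then pull in the whole group at once.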

Open the doc/source/index.rst file and change the table of contents (toctree) to

.. toctree::
   :caption: Contents:
   :maxdepth: 1

   ReadMe <readme>
   api
   python_api

Next, create a new file doc/source/api.rst with the content

C API description
=================

Below, you'll find a detailed description of the *workshop* `C` API

.. doxygenfunction:: print_time_string

Now, we will change doc/Makefile.am again to tell automake that our sphinx-build requires the doxygen XML output to run properly. In addition, we will add the sphinx output as default html_DATA, so our doc/Makefile.am should now look like the following:

##--------------------------------------------------##
## Tell autoconf/automake to include the necessary  ##
## files in the distribution archive as well as in  ##
## the installation routine                         ##
##--------------------------------------------------##
html_DATA = $(REFERENCE_MANUAL_FILES_HTML)


EXTRA_DIST =  \
    Doxyfile \
    doxygen-xml \
    xml \
    sphinxbuild-html \
    ext \
    source \
    build


##--------------------------------------------------##
## add directive to build doxygen XML               ##
##--------------------------------------------------##
if WITH_DOXYGEN

doxygen-xml: $(pkginclude_HEADERS) Doxyfile
	@DOXYGEN@ Doxyfile >>doxygen.log 2>&1; \
  touch doxygen-xml

endif WITH_DOXYGEN


if WITH_SPHINXBUILD

REFERENCE_MANUAL_FILES_HTML = build/html/*

##--------------------------------------------------##
## In case the HTML manual should be created, here  ##
## is the rule how to do so                         ##
##--------------------------------------------------##
$(REFERENCE_MANUAL_FILES_HTML): sphinxbuild-html


sphinxbuild-html: doxygen-xml
	(rm -f sphinxbuild-html && \
    @SPHINXBUILD@ -M html source build >sphinx.log 2>&1 && \
    touch sphinxbuild-html) || cat sphinx.log

endif WITH_SPHINXBUILD

Done! Now remove the files doc/sphinxbuild-html and doc/doxygen-xml if they exist, and run make to build our merged documentation:

$ rm -f doc/sphinxbuild-html doc/doxygen-xml
$ make

Open doc/build/html/index.html with your favorite web browser, sit back, and enjoy! The documentation now has an additional page, C API description, that displays the documentation for our print_time_string() C function.

It is time to remove all the doxygen remnants we created earlier to keep our directory clean:

$ rm -rf doc/html doc/doxygen-html

Using github workflows

As soon as you want to make your software public, you may consider using a publicly accessible git repository, e.g. hosted at github or gitlab. Both of them offer a range of automated workflows that can make the life of a software developer much easier. Here, I will focus on using github and its github actions and workflows. In essence, workflows contain actions that are triggered upon certain interactions with the repository, e.g. whenever you push a new commit (to a specific branch) or you add a certain tag like a version release number. The actions can then do many things, such as automatically build and test your software or create a distribution archive.
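For illustration, a trigger section that restricts a workflow to pushes on a main branch and to version tags might look like this (branch name and tag pattern are just examples):

on:
  push:
    branches:
      - main
    tags:
      - 'v*.*.*'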

In the following, we assume that you have already created a repository at github and added it as a remote origin to your local repository. Instructions on how to do that are provided by github once you create your project.

Build our software with a github workflow

Let us begin with a fairly simple workflow that runs our entire autotools tool chain and builds our software and documentation as soon as we push to our repository.

For that, we create a .github/workflows/ subdirectory

$ mkdir -p .github/workflows

with the file build-project.yml that contains

name: Run autotools toolchain
run-name: ${{ github.actor }} is running the autotools toolchain
on: [push]

jobs:
  build_software:
    runs-on: ubuntu-latest

    steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Install prerequisites
      run:  |
        sudo apt-get update
        sudo apt-get -y install \
          build-essential \
          autoconf \
          automake \
          doxygen \
          python3-sphinx \
          python3-myst-parser
    - name: Autotools setup
      run:  autoreconf -i
    - name: Configure
      run: ./configure
    - name: Build
      run:  make

Add this file to the ones we track in our repository and push everything to your github repo:

$ git add .github/workflows/build-project.yml
$ git commit -a -m "Start using github workflows"
$ git push

Now, go to your github repository and open the Actions tab. It will now show our workflow running and you can inspect the results.

Creating a release workflow

Github offers a Releases page where official releases of your software are made available. Releases can be created manually through the github web interface. However, it is much more convenient to create a release automatically as soon as your software reaches a certain point. Such an automation is relatively simple: all you need to do is git tag a certain commit as a release, e.g. by adding a tag that consists of a version number, and add a corresponding workflow that automates the creation of the release in your github repo.

Let us begin with some prerequisites first, though!

A release is usually accompanied by a release note that tells your users what this new release actually encompasses. If we want to automatically add a release note to the Releases page on github, we have to store that note somewhere in our repository. The most convenient way to do so is to add a CHANGELOG.md file to our repository that lists all changes of the software from one release to the next. So, let us create one, add it to the autotools toolchain such that it will be part of the distribution archives, and fill in our first release note.

Create a new file CHANGELOG.md with the content:

# Changelog

Below, you'll find a list of notable changes for each version of the
Sustainable Software Development *workshop*

## Version 1.0.x


### [Unreleased](https://github.com/ViennaRNA/sd-workshop-dubice2025/compare/v1.0.0...HEAD)


### [Version 1.0.0](https://github.com/ViennaRNA/sd-workshop-dubice2025/compare/bfb904c...v1.0.0)

#### Software
  * Add the `hello` program
  * Add a python version of the `hello` program


#### Documentation
  * Add documentation for both APIs, `C` and `Python`, that is built with `doxygen`, `breathe`, and `sphinx`

and add the following to Makefile.am:

EXTRA_DIST = \
    CHANGELOG.md

Now add the CHANGELOG.md to the files we track with git

$ git add CHANGELOG.md

Every time we make a new release of our software, we will add a new entry to our CHANGELOG.md file and list all the changes that make up the new release!

Next, let us create the workflow that builds a distribution archive and then automatically creates a new release at our github repository. First, we create a reusable workflow that is dedicated to build the distribution archive. Create a new file .github/workflows/make-dist.yml with the content

name: Make distribution archives

on:
  workflow_dispatch:
    inputs:
      config-flags:
        description: 'Configure flags to prepare the source directory'
        default: ''
        required: false
        type: string
      zip:
        description: 'Additionally create ZIP archive next to default GZIP'
        required: false
        default: false
        type: boolean
      artifact-name:
        description: 'Name of the artifact'
        required: false
        default: 'distribution-archives'
        type: string
  workflow_call:
    inputs:
      config-flags:
        description: 'Configure flags to prepare the source directory'
        default: ''
        required: false
        type: string
      zip:
        description: 'Additionally create ZIP archive next to default GZIP'
        required: false
        default: false
        type: boolean
      artifact-name:
        description: 'Name of the artifact'
        required: false
        default: 'distribution-archives'
        type: string
    outputs:
      version_number:
        description: "The Version number of this build"
        value: ${{ jobs.make_dist.outputs.version_number }}

jobs:
  make_dist:
    runs-on: ubuntu-latest
    # Map the job outputs to step outputs
    outputs:
      version_number: ${{ steps.tarball.outputs.version_number }}

    steps:
    - name: Checkout
      uses: actions/checkout@v4
    - name: Install prerequisites
      run:  |
        sudo apt-get update
        sudo apt-get -y install \
          build-essential \
          autoconf \
          automake \
          doxygen \
          python3-sphinx \
          python3-myst-parser
    - name: Autotools setup
      run:  autoreconf -i
    - name: Configure
      run: ./configure ${{ inputs.config-flags }}
    - name: Make tarball
      id: tarball
      run:  |
        make dist-gzip
        version_number=$(ls workshop-*.tar.gz)
        version_number="${version_number#workshop-}"
        version_number="${version_number%.tar.gz}"
        echo "version_number=${version_number}" >> "$GITHUB_OUTPUT"
    - name: Make ZIP
      if: ${{ inputs.zip }}
      run:  make dist-zip
    - name: Upload artifacts
      uses: actions/upload-artifact@v4
      with:
        name: ${{ inputs.artifact-name }}
        path: |
          workshop-*.tar.gz
          workshop-*.zip
        retention-days: 3

Note the similarity to our previous workflow; at least all the setup is the same. What is new here is that we allow for input to the workflow, and that it also produces output and some artifacts that we can retrieve later.

Next, we create the actual release workflow that will be triggered only if we push a tag that follows the pattern v*.*.*, e.g. v1.0.0. This workflow will then run our make-dist workflow to build the distribution archives and, once this is done, extract the latest release note from our CHANGELOG.md file using an adapted version of the Extract Release Notes Action, to then create the github release using the Release Action. Create the workflow file .github/workflows/release.yml and add the following content:

name: Version release

on:
  push:
    tags:
      - 'v*.*.*'

jobs:
  create-dist-archives:
    uses: ./.github/workflows/make-dist.yml
    with:
      zip: true
      artifact-name: 'dist-archives'

  create-release:
    needs: create-dist-archives
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
    - uses: actions/checkout@v4
    - name: Extract release notes
      id: extract-release-notes
      uses: RaumZeit/extract-release-notes@a991ec1541871118630638fe002862a023870cff
      with:
        header_level: 3
        version_prefix: "Version"
    - name: Download source archives
      uses: actions/download-artifact@v4
      with:
        name: dist-archives
    - name: Make release
      uses: ncipollo/release-action@v1
      with:
        artifacts: "workshop-*.tar.gz,workshop-*.zip"
        body: ${{ steps.extract-release-notes.outputs.release_notes }}
        name: "Sustainable Software Development Workshop ${{ needs.create-dist-archives.outputs.version_number }}"

That's all we need. Add the files to be tracked by our repository, commit, and finally push everything to github.

$ git add .github/workflows/make-dist.yml .github/workflows/release.yml
$ git commit -a -m "Add CHANGELOG.md and release workflow"
$ git push

Now, let us test the release workflow by adding a new tag v1.0.0a to create an alpha release of our software

$ git tag -a v1.0.0a -m "This is the 1.0.0alpha release of our workshop"
$ git push --tags

Yeehaa! We made our first automated release.

Automating the build process for Python wheels

Earlier, we created Python packages for our little example software, both as a source package and as a binary Python wheel. This is all fine for our example, and we could in principle add the instructions to build the package either to our autotools toolchain or to a github workflow. However, for packages that are not only based on Python code, this introduces a problem in terms of portability. Once you outsource parts of your code to a much faster low-level programming language such as C or C++, the compiled Python wheel depends on the actual Python version, your operating system, and the architecture. To make your software available to a larger audience of users with diverse computer architectures, you would either rely on them to build your software themselves, or you would need to provide wheels for a long list of architecture/operating system/Python version combinations.

The good news is that, for the latter, there is a nice way to automate this using cibuildwheel. It integrates with workflows on various continuous integration (CI) servers, among them Github Actions. Setting it up is fairly easy, but since our project is a pure Python project without any platform dependency, the following workflow is only for demonstration purposes to show how, in principle, you could automate that:

.github/workflows/python-wheels.yml:

name: Build Python Packages

on:
  workflow_dispatch:
  release:
    types: [published]

jobs:
  build_sdist:
    name: Build sdist
    runs-on: ubuntu-latest

    outputs:
      sdist_file: ${{ steps.create_sdist.outputs.file }}

    steps:
      - uses: actions/checkout@v4
      - id: create_sdist
        name: build Python source distribution
        run:  |
          python -m pip install --upgrade build
          python -m build --sdist
          echo "file=$(cd dist && ls sustainable_software_workshop-*.tar.gz)" >> "$GITHUB_OUTPUT"
      - name: Archive production artifacts
        uses: actions/upload-artifact@v4
        with:
          name: sdist
          path: dist/

  build_wheels:
    name: Build Linux wheels for ${{ matrix.pyver }} on ${{ matrix.os }}
    needs: build_sdist
    runs-on: ${{ matrix.os }}

    strategy:
      # Ensure that a wheel builder finishes even if another fails
      fail-fast: false
      matrix:
        os : [ubuntu-latest, ubuntu-24.04-arm, windows-latest, windows-11-arm, macos-15-intel, macos-14]
        pyver: [cp311, cp312, cp313]

    steps:
      - uses: actions/download-artifact@v4
        with:
          name: sdist
      - name: Set sdist environment variable
        run:  |
          echo "SDIST_FILE=${{needs.build_sdist.outputs.sdist_file}}" >> "$GITHUB_ENV"
      - name: Build wheels
        uses: pypa/cibuildwheel@v3.2.0
        with:
          package-dir: "$SDIST_FILE"
          output-dir: dist
        env:
          CIBW_BUILD_VERBOSITY: 3
          CIBW_BUILD: ${{matrix.pyver}}-*
          CIBW_ENVIRONMENT: SDIST_FILE=$SDIST_FILE
      - uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ strategy.job-index }}
          path: ./dist/*.whl
          compression-level: 0 # no compression
          retention-days: 3

Instead, to complete our workshop at this point, we simply add the build process for our Python package to our .github/workflows/release.yml:

name: Version release

on:
  push:
    tags:
      - 'v*.*.*'

jobs:
  create-dist-archives:
    uses: ./.github/workflows/make-dist.yml
    with:
      zip: true
      artifact-name: 'dist-archives'

  create-python-dist:
    uses: ./.github/workflows/python-wheels.yml
    with:
      artifact-name: 'py-packages'
      
  create-release:
    needs: [create-dist-archives, create-python-dist]
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
    - uses: actions/checkout@v4
    - name: Extract release notes
      id: extract-release-notes
      uses: RaumZeit/extract-release-notes@a991ec1541871118630638fe002862a023870cff
      with:
        header_level: 3
        version_prefix: "Version"
    - name: Download source archives
      uses: actions/download-artifact@v4
      with:
        merge-multiple: true
    - name: Make release
      uses: ncipollo/release-action@v1
      with:
        artifacts: "workshop-*.tar.gz,workshop-*.zip,sustainable_software_workshop-*.whl,sustainable_software_workshop-*.tar.gz"
        body: ${{ steps.extract-release-notes.outputs.release_notes }}
        name: "Sustainable Software Development Workshop ${{ needs.create-dist-archives.outputs.version_number }}"

and we change our .github/workflows/python-wheels.yml to

name: Build Python Packages

on:
  workflow_dispatch:
    inputs:
      artifact-name:
        description: 'Name of the artifact'
        required: false
        default: 'python-packages'
        type: string
  workflow_call:
    inputs:
      artifact-name:
        description: 'Name of the artifact'
        required: false
        default: 'python-packages'
        type: string

jobs:
  build_sdist:
    name: Build sdist
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - id: create_sdist
        name: build Python source distribution
        run:  |
          python -m pip install --upgrade build
          python -m build
      - name: Archive production artifacts
        uses: actions/upload-artifact@v4
        with:
          name: ${{ inputs.artifact-name }}
          path: dist/
          compression-level: 0 # no compression
          retention-days: 3

We will now be able to manually run the Python build process through the github actions web interface (workflow_dispatch), and the workflow may also be called by our release workflow.
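For instance, if you have the GitHub CLI installed, such a manual run can also be triggered from the terminal (the workflow file name must match the one in your repository):

$ gh workflow run python-wheels.yml
$ gh run list --workflow=python-wheels.yml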

Test everything now by creating a new tag that follows the version pattern and pushing it to github:

$ git tag -a v1.0.0 -m "This is version 1.0.0"
$ git push --tags

and have a look at the Actions tab in your github repository to see how the build and release process is triggered.

The end

This is where we will finish our workshop due to time constraints. There would have been so many more things to tell, especially how to use .gitignore files and additional instructions for the automake rules for our documentation. Maybe next time.

If you are interested in how most of the things I've presented here work together in a real-life project, check out the sources of the ViennaRNA Package. There we build C programs and a C library and the corresponding Python and Perl 5 bindings, create releases, create Python wheels and push them to PyPI, and create the documentation and push it to ReadTheDocs, etc.

Have fun and be lazy!