[note by JA: the texcollab script was written in 2015 by graduate student
Barry Moore II (user barrymoo in the examples). I created this fork after Barry graduated. Much of the text that follows is Barry's, but I made some changes over the years and applied
updates and fixes to the code.]
texcollab is a shell script that wraps around git and rsync and is
primarily intended for collaborative work on complex LaTeX documents. The original
developer referred to this as his
"Advisor-Student/s Mergeless" model.
texcollab is intended to be used in a bash terminal session on a UNIX-type system (Linux, the Linux subsystem
in Windows, or the command line interface on a Mac). Prerequisites are git,
openssh, rsync, and to work on the resulting documents you'll obviously also need
a working LaTeX installation (we use texlive). For the comparison of tex files modified by different users, the meld tool is highly recommended (http://meldmerge.org/) and assumed to be available on the machine where you run texcollab. There is also 'Beyond Compare' (https://www.scootersoftware.com/). Finally, the people collaborating on a project need to have
accounts on a machine that is remotely accessible by ssh, and the accounts should belong to the same user group. In our case, the repositories are typically stored at the local computing
center or on a shared computer in the laboratory.
Because texcollab wraps git (and rsync) commands, most of the
output you see is from git. This is sometimes useful, especially when there are errors.
At some point, instead of just emailing each other updates of tex files and figures
for a joint manuscript, we started to use git for version control of LaTeX projects
and also as a way to preserve its history. Problems
became immediately obvious.
An automated git merge works fine for well-structured source code, but LaTeX is not quite like
source code. For example, different people use editors with different indentation preferences or end-of-line characters, which for us occasionally caused an automated git merge to produce an incomprehensible mess.
Also, a research manuscript tends to come with a large set of figures, usually in some
binary format or with embedded figures in a compressed bitmap format, or with accompanying
Office documents and such. Those files are rarely suitable for version control, and if a set
of PDF figures, say, gets recreated repeatedly, the git revisions tree would quickly
grow very large (preserving each individual version of each file) with no apparent benefit to the workflow.
Therefore, we came up with a workflow in which files are exchanged via a remote repository, accessible
by all participants in the project, such that some files are under git version control while
other types of files (binary files, in particular) are exchanged via the repository using rsync. The texcollab script combines everything in a single command-line interface.
texcollab also hides much of the git and rsync syntax and can therefore be used by someone who
is unfamiliar with either. However, it is helpful to have a general idea of how version control works, and
we use the related terms 'commit', 'pull', 'push', 'branch', etc. texcollab makes no particular use of the distributed development framework
offered by git; rather, we use it in an old-fashioned 'hub and spokes' model that someone (LT) indirectly
referred to as 'ugly and stupid'.
texcollab is not foolproof, of course, and sometimes it happens
that someone needs to fix things manually with git.
The model consists of the
branch main used by the senior author and a branch for each co-author ('student'). The examples that follow
use the branches main for the advisor and barrymoo for a student or postdoc co-author. The student is
not supposed to commit to main and the advisor is not supposed to commit
to the student branch. Neither the student nor the advisor will ever use an automated git merge.
texcollab makes a distinction between version-controlled files (primarily tex files) vs. other data that should not be under version control. texcollab provides commands to list version-controlled files that are different in the two branches, and an option to compare those files with the meld tool (or similar software) and merge the differences in meld. This way, there is always someone looking at the pieces that will get merged, and the aforementioned 'incomprehensible mess' is typically avoided. meld also has an option to merge all changes from a file in one branch to the same file
in another branch, which is sometimes useful when there are a lot of edits by someone who can be trusted.
An occasional 'manual' intervention with git may become necessary if there are many complex updates
in a repository, for example, when there are repeated file name changes followed by commits in one branch vs. another. In this case, an experienced git user may have to work on this. Worst case: re-create a new repository from the
files in one of the branches. However,
this should not be necessary.
├── .texcollab
├── citations.bib
├── data/
├── esub/
├── figures/
├── main.tex
├── plots/
├── schemes/
├── share/
├── spreadsheets/
└── supporting-information.tex
Files/Directories that are tracked:
- citations.bib: The citations in bibtex format (not necessary, we use an internal citation git repository), name can be changed
- main.tex: The main tex files, name can be changed
- plots/: A directory where users can create and modify plotting files. Read note below!
- supporting-information.tex: The supporting information (not necessary), do not change name (for texcollab compile to work)
Files/Directories that are NOT tracked (set via .gitignore):
- .texcollab: The texcollab configuration file
- data/: Contains raw output files from programs
- esub/: We use scripts which generate electronic submissions to online journals in this directory
- figures/: Contains image files used for publication
- schemes/: See section 2.5.5 of ACS Author Guide, we choose to separate these images from figures/ but you can choose what's best for you.
- share/: Contains binary files generated from specific programs, for example ChemDraw or MarvinSketch or an orbital plotting program
- spreadsheets/: Contains spreadsheet files, for example from Excel or Gnumeric
Important notes about using plots/. Please be very careful with this
directory! texcollab status is your friend. We expect people to have the
following types of files in this directory:
- *.dat: small parsed data files, likely generated from the data/ directory, to generate images.
- *.plt: gnuplot files to generate *.{eps,pdf}
- *.py: python scripts to generate *.{eps,pdf}
- *.tex: panels to combine *.{eps,pdf} files or bitmaps *.{png,jpg} in share/ into other {pdf} files via single-page LaTeX documents, to be included in the main file via \includegraphics. Typically, pdf figure files generated in plots/ get symlinked to figures/
Some users like to use Inkscape to combine images, but *.svg files are ignored by default and should be in 'extras'. Put
the *.svg files into share/ if you want to include them in the repository.
We assume you have ssh access to a private server to store the git remote
repository. Additionally, you should set up password-less login to that server.
Finally, we suggest using .ssh/config to ease the process. Googling "ssh
config" yields the article "Simplify Your Life With an SSH Config File",
which should be enough to get you started.
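As a rough illustration, an entry in .ssh/config could look like the output of the sketch below. The alias `texhost`, the host name, the user name, and the key file are all placeholders and not something texcollab prescribes. The snippet writes to a scratch file rather than to your real .ssh/config so that you can inspect the result first.

```shell
#!/usr/bin/env bash
# Sketch: generate an ssh config stanza for the repository server.
# All values below are placeholders (assumptions); adjust them to your setup.
ssh_alias="texhost"                   # hypothetical shortcut name
ssh_hostname="example.somewhere.com"  # remote machine holding the repositories
ssh_user="barrymoo"                   # your account on that machine
ssh_key="$HOME/.ssh/id_ed25519"       # private key used for password-less login

# Write to a scratch file first; append it to ~/.ssh/config once it looks right.
scratch_dir="$(mktemp -d)"
scratch_config="$scratch_dir/config"
cat > "$scratch_config" <<EOF
Host $ssh_alias
    HostName $ssh_hostname
    User $ssh_user
    IdentityFile $ssh_key
EOF

cat "$scratch_config"
```

With such an entry in place, `ssh texhost` connects without typing the full host name, and the alias can serve as the "shortcut in .ssh/config" mentioned for TEXCOLLAB_REMOTE_DOMAIN.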
It is easy. We assume that the repository is created by one of the co-authors of the manuscript (student).
First, you need to set up some configuration variables inside .texcollab (see
the one in this repo as an example). We use this tool for publications
which means a public github isn't a great idea, although a private repo with shared access may work. We have storage on a
remote machine with ssh access. The domain is in TEXCOLLAB_REMOTE_DOMAIN
(could be example.somewhere.com, or a shortcut in .ssh/config). On the
remote machine exists a directory where "in-progress" publications are stored,
something like /$SOME_PATH/shared/latex/barrymoo/$PROJECT_NAME, which is set
as TEXCOLLAB_REMOTE_DIR (obviously all but the $PROJECT_NAME should already
exist on the remote machine). The project name should be unique and will be
stored on the remote machine as $PROJECT_NAME.git (a standard convention for
git remote repos). Next, you should set your advisor and your user name for
TEXCOLLAB_ADVISOR/TEXCOLLAB_STUDENT, respectively. The
TEXCOLLAB_CURRENT_USER can be set to $TEXCOLLAB_ADVISOR or
$TEXCOLLAB_STUDENT (note the $) and the TEXCOLLAB_EDITOR is used for the
view command.
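A filled-in .texcollab might look roughly like the sketch below. It assumes the file consists of plain bash variable assignments; every value is a placeholder, and the template in this repo remains the authoritative reference for the variable set and format.

```shell
# Sketch of a .texcollab file for the student barrymoo.
# Assumption: the file is read by bash as ordinary variable assignments.
# All values are placeholders.
TEXCOLLAB_REMOTE_DOMAIN="example.somewhere.com"   # or a shortcut from .ssh/config
TEXCOLLAB_REMOTE_DIR="/some/path/shared/latex/barrymoo/my-paper"   # hypothetical path, no .git ending
TEXCOLLAB_ADVISOR="advisorname"                   # hypothetical advisor user name
TEXCOLLAB_STUDENT="barrymoo"                      # student user name
TEXCOLLAB_CURRENT_USER="$TEXCOLLAB_STUDENT"       # note the $: refers to the line above
TEXCOLLAB_EDITOR="vim"                            # editor used by the view command
TEXCOLLAB_MERGE_TOOL="meld"                       # tool opened by the compare command
```

The advisor's copy of the file would differ only in TEXCOLLAB_CURRENT_USER, which would be set to $TEXCOLLAB_ADVISOR, plus any personal editor or merge tool preferences.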
Now you're ready to initialize the directory! Place your *.tex file, *.bib
files (if necessary), and extras (figures, spreadsheets, share, data,
or schemes) into the corresponding directories (Always use
supporting-information.tex for supporting information). Note: the extras
directories are used to keep backups of various things for the publication
(required for most funding agencies now) and these are ignored by texcollab
because they tend to be binary files. Finally, run texcollab init and, if you
have extras, also run texcollab extras push, and let the git magic
ensue :) Note that texcollab compile exists, and you should probably make
sure the document compiles properly before sending it to your advisor (they hate it when it
doesn't compile!).
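The preparation steps can be sketched as follows. The project location is a placeholder (a scratch directory is used here), the tex and bib files are created empty only for illustration, and the texcollab commands are left as comments because they need a filled-in .texcollab and a reachable remote.

```shell
#!/usr/bin/env bash
# Sketch: lay out a new project directory before running texcollab init.
# The project path is a placeholder; a scratch directory is used here.
project_dir="$(mktemp -d)/my-paper"     # hypothetical location
mkdir -p "$project_dir"
cd "$project_dir" || exit 1

# Directories from the layout described in this README.
mkdir -p data esub figures plots schemes share spreadsheets

# Tracked text files (empty here; in practice you copy your own files in).
touch main.tex supporting-information.tex citations.bib

ls -A

# With .texcollab in place, the commands named above would follow:
#   texcollab init           # create the repository
#   texcollab extras push    # only if you have extras
#   texcollab compile        # check that the document builds
```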
Note: The texcollab script uses the git option --initial-branch to set the advisor
branch to main. Per this web page, this is an option available in git as of version 2.28.0. If your system uses an older version of git, please modify the texcollab script
to initialize the repo with the alternative commands provided on that web page, or replace
main with the default master and do not use the --initial-branch option.
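To find out which case applies to you, the installed git version can be compared against 2.28.0. The sketch below is one way to do this; `version_ge` is a helper made up for this example, and it relies on `sort -V` (version sort), which may be missing on very old systems.

```shell
#!/usr/bin/env bash
# Sketch: check whether the installed git supports --initial-branch (>= 2.28.0).
# version_ge is a hypothetical helper; it relies on `sort -V` (version sort).
version_ge() {
    # True if version $1 is greater than or equal to version $2.
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v git >/dev/null 2>&1; then
    # `git --version` prints e.g. "git version 2.39.2"; keep the third field.
    git_version="$(git --version | awk '{print $3}')"
    if version_ge "$git_version" "2.28.0"; then
        echo "git $git_version: --initial-branch is available"
    else
        echo "git $git_version: too old for --initial-branch"
    fi
fi
```

On older versions, one common approach is to run `git init` and then `git symbolic-ref HEAD refs/heads/main` before the first commit. Whether this matches the alternative commands on the page referenced above is not confirmed here.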
This is also easy!
Remember, first, that the advisor always uses the main branch. The
student will send him/her the .texcollab file. The advisor will first create an empty
directory where they want to create the local copy and run texcollab clone $TEXCOLLAB_REMOTE_DOMAIN:$TEXCOLLAB_REMOTE_DIR (note lack of .git ending), replacing
the variable names with the relevant strings from the .texcollab file. The clone
command must be run inside that new empty directory.
Once the clone is complete, the advisor moves the .texcollab file to the
new local git repository, changes TEXCOLLAB_CURRENT_USER to
$TEXCOLLAB_ADVISOR, and modifies other environment settings as they choose. If there are
'extras', run texcollab extras pull.
Finally, the advisor needs to run texcollab branch $TEXCOLLAB_STUDENT (again, replace the TEXCOLLAB_STUDENT variable with the student's username manually) to make the student branch
visible (git doesn't do this automatically), then switch back to the main branch
with texcollab branch main and compile the tex files.
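Put together, the advisor's first-time setup could look like the sequence below. It is written as a dry run that only prints the commands; the domain, path, and local directory are placeholders that would be copied by hand from the student's .texcollab file.

```shell
#!/usr/bin/env bash
# Sketch: the advisor's first-time setup, printed as a dry run.
# Values are placeholders copied by hand from the student's .texcollab file.
remote_domain="example.somewhere.com"                     # TEXCOLLAB_REMOTE_DOMAIN
remote_dir="/some/path/shared/latex/barrymoo/my-paper"    # TEXCOLLAB_REMOTE_DIR (no .git)
student="barrymoo"                                        # TEXCOLLAB_STUDENT
local_dir="$HOME/publications/my-paper"                   # hypothetical local copy

# run only prints the command; define it as  run() { "$@"; }  to execute for real.
run() { echo "+ $*"; }

run mkdir -p "$local_dir"
run cd "$local_dir"
run texcollab clone "$remote_domain:$remote_dir"
# Manual step: move .texcollab into the new local repository and set
# TEXCOLLAB_CURRENT_USER to $TEXCOLLAB_ADVISOR.
run texcollab extras pull            # only if there are extras
run texcollab branch "$student"      # make the student branch visible
run texcollab branch main            # switch back to the advisor branch
run texcollab compile
```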
Again, this is easy.
Both the student and advisor can make changes in their branches as they see fit, commit, push,
pull, extras push/pull, etc. When the advisor or the student is ready to
merge changes, they run texcollab compare <other branch> main.tex, which compares main.tex of
the branch you are working on with main.tex in the other branch (student branch barrymoo, for example). This will open up the
TEXCOLLAB_MERGE_TOOL (meld or alternatives) and
then one can pick and choose the changes. Finally, commit and then push
and the other collaborator is ready to pull. We also added a texcollab log
functionality, which means you can use the config key to compare with previous
commits too (see -h/--help).
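One review round from the advisor's side might then look like the dry run below. The `texcollab pull` and `texcollab push` forms are assumed from the command names used in this README, and any further arguments (a commit message, for instance) should be checked with -h/--help.

```shell
#!/usr/bin/env bash
# Sketch: one review round on branch main, printed as a dry run.
other_branch="barrymoo"   # the student branch from the examples
file="main.tex"

run() { echo "+ $*"; }     # prints instead of executing

run texcollab pull                              # assumed form: update the local copy
run texcollab compare "$other_branch" "$file"   # opens the merge tool on both versions
run texcollab status                            # check what would be committed
run texcollab commit
run texcollab push
```

The student would run the same sequence with `other_branch="main"`.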
Before committing any changes with texcollab commit, always run texcollab status to see which files
are known to git and have changed, or which files are not yet known to git but not ignored via the settings in .gitignore. New/unknown files will all be committed to the revisions tree when you run texcollab commit. This includes, unfortunately, any temporary files created while a file is open in an editor, or
Word files and such that are not placed in one of the 'extras' directories (see above). Therefore,
check the output of texcollab status carefully before you commit.
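Because texcollab wraps git, plain git can also be used for a quick double check. The sketch below filters the output of `git status --porcelain`, which marks untracked paths with a leading `?? `, down to the files git does not know yet. `list_untracked` is a helper made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: list files git does not know yet, using plain git rather than texcollab.
# `git status --porcelain` marks untracked paths with a leading "?? ".
list_untracked() {
    # Reads porcelain status lines on stdin, prints only the untracked paths.
    sed -n 's/^?? //p'
}

# Inside a repository you would run:
#   git status --porcelain | list_untracked
# Demonstration on canned input (one modified file, two untracked files):
printf '%s\n' ' M main.tex' '?? main.tex~' '?? notes.docx' | list_untracked
```

Anything in that list that is an editor backup or an Office file should be deleted, moved to an extras directory, or added to .gitignore before committing.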
WARNING: The texcollab extras push/pull commands use rsync, as mentioned, and the options are set such that
files will be deleted or replaced without any warnings. In our experience, therefore, it is safest
if only one of the authors pushes and updates extras, and the other authors only pull extras. Of course,
there may be exceptions to this rule, but they should be communicated among the authors prior to
changing, temporarily or permanently, who pulls and who pushes extras.
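To get a feeling for what such a transfer can do, rsync offers a dry-run mode. The sketch below uses two scratch directories; the options `-a` and `--delete` are assumptions chosen to illustrate the risk and are not necessarily the ones texcollab passes to rsync.

```shell
#!/usr/bin/env bash
# Sketch: preview what an rsync with --delete would change, using two scratch
# directories. The options used here are assumptions for illustration only.
src="$(mktemp -d)"   # stands in for the side being pushed from
dst="$(mktemp -d)"   # stands in for the side being overwritten
echo "new figure" > "$src/fig1.pdf"
echo "only here"  > "$dst/old-notes.txt"

if command -v rsync >/dev/null 2>&1; then
    # --dry-run reports the changes without touching any file.
    preview="$(rsync -a --delete --dry-run --itemize-changes "$src/" "$dst/")"
    echo "$preview"
fi
```

The preview lists old-notes.txt as a deletion because it exists only on the receiving side. A real run without --dry-run would remove it.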
Pass the collaborator the .texcollab file, have them change the
TEXCOLLAB_STUDENT variable, clone the repository as the advisor does, and run
texcollab add-collab $TEXCOLLAB_STUDENT (the $TEXCOLLAB_STUDENT is
obviously the new collaborator). The collaborator can now edit/commit/push/pull
as normal. AND, more importantly (arguably), the advisor can pull and run
texcollab branch $NEW_STUDENT_BRANCH to merge changes from the new
collaborator. Presumably, after something like this happens, other co-authors will
be notified that they need to merge changes from the advisor's branch.
In the -h/--help string, you may see that view and compare work with
revisions, but what is a revision? A revision is basically a previous commit's
version of the file of interest. The best way to get this information is to run
texcollab log, for example:
commit 45f158b34cef9141aaeacebc09f37ff800071132
Author: Jon Doe
Date: Tue Jun 2 11:54:07 2015
Initial Commit
The revision is 45f158b34cef9141aaeacebc09f37ff800071132. Copy and paste this
string into a compare or view command to use the revision.
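If the log is long, the revision strings can be filtered out of it. The sketch below assumes that `texcollab log` prints git's default log format, as in the sample output above; `list_revisions` is a helper made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: pull the revision strings out of log output.
# Assumes the log uses git's default format ("commit <hash>" header lines).
list_revisions() {
    # Reads log text on stdin and prints the hash from each commit header line.
    awk '/^commit [0-9a-f]+/ { print $2 }'
}

# In a repository you would run:
#   texcollab log | list_revisions
# Demonstration on the sample output:
printf '%s\n' \
    'commit 45f158b34cef9141aaeacebc09f37ff800071132' \
    'Author: Jon Doe' \
    'Date: Tue Jun 2 11:54:07 2015' \
    'Initial Commit' | list_revisions
```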
This is a work in progress, but our group is publishing in a variety of scientific journals using this tool. We have even used it for a book (https://ja01.chem.buffalo.edu/in-focus-mo-ebook/in-focus-mo-ebook.html). Email us if you have suggestions or come across a bug.
- Important Design Principle: Nothing that isn't tracked by texcollab should be edited inside a texcollab directory. What does that mean? Your content in extras should be edited somewhere else. For example, if you have the following directories:
  - ~/projects/$PROJECT_NAME
  - ~/publications/$PROJECT_NAME

  edit extras in projects and copy to publications. As mentioned, files in extras are at risk of being replaced or deleted. Keeping separate copies of the files in extras will help to mitigate this risk.
- ALWAYS, ALWAYS, ALWAYS run texcollab status before commit/push, this prevents you from committing, potentially huge, files you did not intend to. Such files should either be listed in .gitignore or stored in one of the extras directories.
- Autocompletion is available for some of the commands, e.g., after texcollab compare autocompletion is offered for the branch names in the repository. To enable this functionality, source the file texcollab-autocomplete.sh from this repository in your .bashrc.
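For the autocompletion, the lines added to .bashrc could look like the sketch below. The checkout location is a placeholder; point it at wherever you keep your copy of the texcollab repository.

```shell
# Sketch for ~/.bashrc: load texcollab's completion if the file is present.
# The checkout location is a placeholder (assumption).
texcollab_repo="$HOME/src/texcollab"
if [ -f "$texcollab_repo/texcollab-autocomplete.sh" ]; then
    source "$texcollab_repo/texcollab-autocomplete.sh"
fi
```

The existence check keeps new terminal sessions free of error messages if the repository is moved or removed later.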
This section exists in case you don't have sudo access on the remote server.
If the standard .texcollab template doesn't work for you, you may consider adding these
settings:
- TEXCOLLAB_REMOTE_RSYNC: Allows you to use non-standard remote rsync
- TEXCOLLAB_GROUP: Allows you to change group ownership on remote