This template contains some settings and examples in Python, Java, and Scala.
To create a new job, use the provided createTool.sh, a script that creates an empty job
from a suitable template located in the templates folder. The script is called as follows:
./createTool.sh <language> <toolname>
where
- <language> is scala, java, or python, and
- <toolname> is the name of the tool (by convention starting with an upper-case letter).
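For example, to create a Python tool named WordCount (a hypothetical name):
./createTool.sh python WordCount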
If you have Java or Scala jobs, you can compile them with the following command:
mvn package
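After packaging, Maven places the resulting jar in the target/ directory. As a rough sketch of how such a job could then be submitted to Spark (the class and jar names below are assumptions; see the cluster documentation referenced further down for the actual procedure):
$SPARK_HOME/bin/spark-submit --class nl.utwente.bigdat.WordCount target/WordCount-1.0.jar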
Before you run a job on the cluster, please test the code locally. To do this, you have to write test cases. All tests assume that the environment variable SPARK_HOME points to the top-level directory of your Spark installation; Spark can be downloaded here.
If you created a tool with createTool.sh, your test cases should be stored in the file
src/test/scala/nl/utwente/bigdat/<ToolName>Test.scala. This file already contains a
commented-out example test case.
Test cases for Python jobs are stored in the same directory as the
job itself: src/main/python. The tests require the package
pytest, which can be installed with the command
pip install pytest
To execute the tests, you first have to add the Python adaptor for Spark and the py4j library to your Python path:
export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-*-src.zip
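You can verify that the path is set up correctly by importing pyspark (the py4j version in the zip name varies per Spark release, which is why the wildcard is used above):
python -c "import pyspark"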
And to actually execute the tests:
py.test src/main/python/<ToolName>Test.py
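A Python test file might look like the following minimal sketch. The tool name WordCount, the word-count logic under test, and the fixture setup are illustrative assumptions, not part of the template:

# Hypothetical src/main/python/WordCountTest.py
import pytest
from pyspark import SparkContext

@pytest.fixture(scope="module")
def sc():
    # One local SparkContext shared by all tests in this module;
    # requires the PYTHONPATH setup shown above.
    context = SparkContext("local[2]", "WordCountTest")
    yield context
    context.stop()

def test_word_count(sc):
    lines = sc.parallelize(["a b", "a b b"])
    counts = dict(lines.flatMap(lambda line: line.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda x, y: x + y)
                       .collect())
    assert counts == {"a": 2, "b": 3}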
The documentation of how to access the cluster can be found here.
The documentation of available datasets on the cluster can be found here.
To create a self-contained HTML version of the documentation, use the following commands (requires pandoc):
pandoc -T "Cluster Access" --toc -f markdown_github access.md -N -t html -o access.html
encodeImages.py access.html