This template contains some settings and examples in Python, Java, and Scala.
To create a new job, use the provided createTool.sh, a script that creates an empty job
from a suitable template located in the templates folder. The script is called as follows:
./createTool.sh <language> <toolname>
where
- <language> is scala, java, or python, and
- <toolname> is the name of the tool (by convention starting with an upper-case letter).
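For example, to create a Python tool named WordCount (a hypothetical name):
./createTool.sh python WordCount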
If you have Java or Scala jobs, you can compile them with the following command:
mvn package
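After packaging, Maven places the resulting jar in the target/ directory. As a rough sketch of how such a job could then be submitted to Spark (the class and jar names below are assumptions; see the cluster documentation referenced further down for the actual procedure):
$SPARK_HOME/bin/spark-submit --class nl.utwente.bigdat.WordCount target/WordCount-1.0.jar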
Before you run a job on the cluster, please test the code locally. To do this, you have to write test cases. All tests assume that the environment variable SPARK_HOME points to the top-level directory of your Spark installation; Spark can be downloaded here.
If you created a tool with createTool.sh, your test cases should be stored in the file
src/test/scala/nl/utwente/bigdat/<ToolName>Test.scala. This file already contains a
commented-out example test case.
Test cases for Python jobs are stored in the same directory as the
job itself: src/main/python. The tests require the package
pytest, which can be installed with the command
pip install pytest
To execute the tests, you first have to add the Python adaptor for Spark and the py4j library to your Python path:
export PYTHONPATH=$PYTHONPATH:$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-*-src.zip
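You can verify that the path is set up correctly by importing pyspark (the py4j version in the zip name varies per Spark release, which is why the wildcard is used above):
python -c "import pyspark"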
And to actually execute the tests:
py.test src/main/python/<ToolName>Test.py
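A Python test file might look like the following minimal sketch. The tool name WordCount, the word-count logic under test, and the fixture setup are illustrative assumptions, not part of the template:

# Hypothetical src/main/python/WordCountTest.py
import pytest
from pyspark import SparkContext

@pytest.fixture(scope="module")
def sc():
    # One local SparkContext shared by all tests in this module;
    # requires the PYTHONPATH setup shown above.
    context = SparkContext("local[2]", "WordCountTest")
    yield context
    context.stop()

def test_word_count(sc):
    lines = sc.parallelize(["a b", "a b b"])
    counts = dict(lines.flatMap(lambda line: line.split())
                       .map(lambda word: (word, 1))
                       .reduceByKey(lambda x, y: x + y)
                       .collect())
    assert counts == {"a": 2, "b": 3}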
The documentation of how to access the cluster can be found here.
The documentation of available datasets on the cluster can be found here.
To create a self-contained HTML version of the documentation, use the following commands (requires pandoc):
pandoc -T "Cluster Access" --toc -f markdown_github access.md -N -t html -o access.html
encodeImages.py access.html