Quickstart Scala
Lemkit comes in two packages, lemkit-train and lemkit-model. lemkit-train can be used to create and save linear classification models, which can then be read using lemkit-model. Command-line applications handle both training and prediction, and the same operations are available as library functions.
lemkit-train depends on lemkit-model. See the documentation for that package for an introduction to feature observations and examples.
com.peoplepattern.classify.ClassifyApp implements lktrain and is a good source of example code.
Imagine you have a data file called data.train containing data instances in the format described in Input Format. Then use the following code. Note that in order for this to work, the vw executable that runs Vowpal Wabbit needs to be installed somewhere in your PATH.
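Because training shells out to vw, it can help to confirm the executable is visible before running the code. The following is a minimal sketch and not part of lemkit: the object VwCheck and its methods are hypothetical names, and it only looks for a file literally named vw (on Windows the executable would likely carry an extension).

```scala
import java.io.File

// Hypothetical helper, not part of lemkit.
object VwCheck {
  /** Returns the first executable file called `name` in the directories of `path`, if any. */
  def findOnPath(name: String, path: String): Option[File] =
    path.split(File.pathSeparator).toSeq
      .filter(_.nonEmpty)                  // skip empty PATH entries
      .map(dir => new File(dir, name))
      .find(f => f.isFile && f.canExecute)

  /** True if an executable named "vw" is found on the current PATH. */
  def vwInstalled: Boolean =
    findOnPath("vw", sys.env.getOrElse("PATH", "")).isDefined
}
```

Evaluating VwCheck.vwInstalled in the REPL before training gives a quick yes/no answer.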
import com.peoplepattern.classify._
import com.peoplepattern.classify.data._
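// Hashing options and Vowpal Wabbit training options, both left at their defaults
val hashOptions = HashingOptions()
val options = VowpalClassifierOptions(hashOptions)
// Read the training instances, train the classifier, and save it as a binary model
val trainData = ClassifierSource.readDataFile("data.train").toSeq
val classifier = VowpalClassifier.train(trainData, options)
LinearClassifier.writeBinaryModel(classifier, "model.binary")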
You can set options for hashing using HashingOptions, and options for training with Vowpal Wabbit using VowpalClassifierOptions. For example, to turn on feature hashing using a maximum of 1000 features, use:
val hashOptions = HashingOptions(hashtrick = Some(1000))
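To make the idea of feature hashing concrete, here is a minimal, generic sketch of the hashing trick. It is not lemkit's implementation: the object HashTrickSketch, its method names, and the choice of MurmurHash3 are assumptions made for illustration. Each feature name is hashed into one of a fixed number of buckets, and features that collide have their values summed.

```scala
import scala.util.hashing.MurmurHash3

// Illustrative only; lemkit's actual hashing scheme may differ.
object HashTrickSketch {
  /** Map a feature name to a bucket index in [0, numBuckets). */
  def bucket(feature: String, numBuckets: Int): Int =
    // floorMod keeps the index non-negative even when the hash is negative
    java.lang.Math.floorMod(MurmurHash3.stringHash(feature), numBuckets)

  /** Collapse named features into at most numBuckets hashed features, summing collisions. */
  def hashFeatures(features: Seq[(String, Double)], numBuckets: Int): Map[Int, Double] =
    features
      .groupBy { case (name, _) => bucket(name, numBuckets) }
      .map { case (idx, group) => idx -> group.map(_._2).sum }
}
```

With numBuckets = 1000, the model never sees more than 1000 distinct features no matter how many distinct feature names occur in the data, at the cost of occasional collisions.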
You can test this code in a Scala REPL (read-eval-print loop) by running sbt console and then pasting the code in.
Assuming you have already trained a model using lemkit-train, you can use it for prediction as follows. See also com.peoplepattern.classify.PredictApp, which implements bin/lkpredict and is a good source of example code.
Imagine you have saved out a binary model to a file called model.binary, and have a data file called data.predict containing data instances in the format described in Input Format. Then use code like this:
import com.peoplepattern.classify._
import com.peoplepattern.classify.data._
// Load the saved model and the instances to predict on
val classifier = LinearClassifier.readBinaryModel("model.binary")
val predictData = ClassifierSource.readDataFile("data.predict").toSeq
// Applying the classifier to an instance's features yields the predicted label
val predictions = predictData.map(i => classifier(i.features))
for ((prediction, inst) <- predictions zip predictData) {
  println(s"Predicted label: ${prediction}, correct label: ${inst.label}")
}
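If the data file carries correct labels, the same pairs can be summarized as an accuracy figure. This is a small sketch that is not part of lemkit: AccuracySketch is a hypothetical name, and it assumes labels can be compared as strings.

```scala
// Hypothetical helper, not part of lemkit; assumes labels are plain strings.
object AccuracySketch {
  /** Fraction of (predicted, correct) pairs whose labels match; 0.0 for empty input. */
  def accuracy(pairs: Seq[(String, String)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.count { case (predicted, correct) => predicted == correct }.toDouble / pairs.size
}
```

The pairs would be built by zipping the predictions with the labels of the corresponding instances, converting both to strings first if lemkit uses a different label type.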
If you want the scores for each possible label, use classifier.scores in place of just classifier (which calls the apply function).
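The relationship between scores and apply in a linear classifier can be sketched with a toy model. This is not lemkit's LinearClassifier: ToyLinearModel, its weight layout, and the return types shown are assumptions for illustration, so check the library source for the real signatures. Each label gets the dot product of its weight vector with the features, and apply returns the label with the highest score.

```scala
// Illustrative toy model; lemkit's own types and signatures may differ.
object LinearScoresSketch {
  /** One sparse weight vector (feature name -> weight) per label. */
  final case class ToyLinearModel(weights: Map[String, Map[String, Double]]) {
    /** Score of every label: dot product of its weights with the feature values. */
    def scores(features: Map[String, Double]): Map[String, Double] =
      weights.map { case (label, w) =>
        label -> features.toSeq.map { case (f, v) => w.getOrElse(f, 0.0) * v }.sum
      }

    /** Predicted label: the one with the highest score. */
    def apply(features: Map[String, Double]): String =
      scores(features).maxBy(_._2)._1
  }
}
```

Looking at the full score map is useful when you need a ranking of labels or a margin between the top two, not just the single best label.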
You can test this code in a Scala REPL (read-eval-print loop) by running sbt console and then pasting the code in.
In lemkit-train, the command-line app lktrain can be used to train and write out classification models. It can also do prediction, using either a model trained in the same run or a previously written-out model, whether that model was trained using Scala or Python. The following command line does the equivalent of the library code above for lemkit-train:
lktrain -m vowpal -f binary -t data.train -w model.binary
Full documentation is available in the User Guide.
lkpredict is used to do prediction on linear classifier models trained using lemkit-train. The following command line does the equivalent of the library code above for lemkit-model:
lkpredict -m vowpal -f binary -p data.predict
Full documentation is available in the User Guide.