
Releases: mimno/Mallet

Release 2.1.0 - Java 17, reduced dependencies

08 Jan 16:46


What's New

This is the first stable release since 2.0.8 (2016), incorporating years of improvements and modernization.

Highlights

  • Java 17 required - Updated from Java 8 to Java 17
  • Reduced dependencies - Streamlined external library requirements
  • Maven structure - Standard Maven project layout
  • JShell support - Replaced BeanShell with JShell for scripting
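These notes do not say how Mallet wires JShell in. The sketch below shows only the standard `jdk.jshell` API, which has shipped with the JDK since Java 9: it evaluates a single Java snippet and returns its value. The helper name `JShellEvalSketch` is hypothetical and is not part of Mallet. Passing `"local"` to `executionEngine` is a choice made for this sketch so that snippets run inside the current JVM instead of a separate one.

```java
import java.util.List;
import jdk.jshell.JShell;
import jdk.jshell.Snippet;
import jdk.jshell.SnippetEvent;

/** Hypothetical helper (not Mallet's code): evaluate one snippet with the JDK's JShell API. */
final class JShellEvalSketch {
    private JShellEvalSketch() {}

    /** Evaluates a single snippet and returns the value of the last event, or null if none. */
    static String eval(String source) {
        // "local" runs snippets in this JVM rather than spawning a remote one.
        try (JShell shell = JShell.builder().executionEngine("local").build()) {
            List<SnippetEvent> events = shell.eval(source);
            for (SnippetEvent event : events) {
                if (event.status() == Snippet.Status.REJECTED) {
                    throw new IllegalArgumentException("Rejected snippet: " + source);
                }
                if (event.exception() != null) {
                    throw new IllegalStateException("Snippet threw an exception", event.exception());
                }
            }
            return events.isEmpty() ? null : events.get(events.size() - 1).value();
        }
    }
}
```

For example, `JShellEvalSketch.eval("1 + 2")` returns the string `"3"`.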

Installation

Maven:

<dependency>
  <groupId>cc.mallet</groupId>
  <artifactId>mallet</artifactId>
  <version>2.1.0</version>
</dependency>

Gradle:

implementation 'cc.mallet:mallet:2.1.0'

See the full changelog: v2.0.8...v2.1.0

202108

11 Aug 13:57


This is a serialization-breaking release due to the switch to HPPC, which affects feature alphabets.

Added

  • Nonnegative Matrix Factorization
  • Word embeddings (word2vec clone)
  • PagedInstanceList supports iteration correctly
  • lebiathan added stratified sampling of InstanceList
  • This file!
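The notes do not describe how the new stratified sampling of `InstanceList` is invoked. As a rough illustration of the technique, the sketch below groups items by label and draws the same fraction from each group, so every label keeps its share of the sample. The `Item` record and the `StratifiedSampler.sample` method are hypothetical stand-ins and are not Mallet's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch (not Mallet's InstanceList API): stratified sampling by label. */
final class StratifiedSampler {
    private StratifiedSampler() {}

    /** Minimal stand-in for a labeled instance. */
    record Item(String id, String label) {}

    /** Draws round(fraction * groupSize) items from each label group, shuffled with a fixed seed. */
    static List<Item> sample(List<Item> items, double fraction, long seed) {
        if (fraction < 0 || fraction > 1) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        // Group items by label, keeping first-seen label order.
        Map<String, List<Item>> byLabel = new LinkedHashMap<>();
        for (Item item : items) {
            byLabel.computeIfAbsent(item.label(), k -> new ArrayList<>()).add(item);
        }
        Random random = new Random(seed);
        List<Item> result = new ArrayList<>();
        for (List<Item> group : byLabel.values()) {
            List<Item> shuffled = new ArrayList<>(group);
            Collections.shuffle(shuffled, random);
            int take = (int) Math.round(fraction * group.size());
            result.addAll(shuffled.subList(0, take));
        }
        return result;
    }
}
```

With 10 items labeled `A`, 20 labeled `B`, and `fraction = 0.5`, the sample contains 5 `A` items and 10 `B` items.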

Changed

  • All merging and propagation of sampling statistics for topic modeling is now multi-threaded (if num-threads is more than 1), leading to a 5-10% speed boost.
  • The primitive collections library (used, for example, to map String to int) has been changed from GNU Trove to Carrot Search HPPC. This change removes all GNU dependencies.
  • The license has been changed from CPL to the Apache License.
  • Serialized objects now use a VMID as their unique identifier. (Breaks serialization!)
  • Many small fixes suggested by ErrorProne.
  • Unneeded imports removed.
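A feature alphabet is a two-way mapping between feature strings and dense integer indices, and it is the structure the HPPC switch and the VMID change affect. The sketch below shows that idea under stated assumptions. The class `FeatureAlphabetSketch` and its methods are hypothetical and are not Mallet's `Alphabet` API. The sketch also uses `java.util.HashMap` with boxed `Integer` values, which is exactly the overhead a primitive-collections library such as HPPC avoids. The `java.rmi.dgc.VMID` field illustrates a per-object identifier that is unique across JVMs.

```java
import java.io.Serializable;
import java.rmi.dgc.VMID;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a feature alphabet: String <-> dense int index.
 * Uses java.util.HashMap (boxed Integer values) where a primitive map would avoid boxing.
 */
final class FeatureAlphabetSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> entries = new ArrayList<>();
    // Generated once per alphabet; VMIDs are unique across JVMs.
    private final VMID instanceId = new VMID();

    /** Returns the index of entry, adding it if addIfAbsent; otherwise -1 for unknown entries. */
    int lookupIndex(String entry, boolean addIfAbsent) {
        Integer index = indexOf.get(entry);
        if (index != null) {
            return index;
        }
        if (!addIfAbsent) {
            return -1;
        }
        int next = entries.size(); // indices are assigned densely from 0
        indexOf.put(entry, next);
        entries.add(entry);
        return next;
    }

    String lookupEntry(int index) {
        return entries.get(index);
    }

    int size() {
        return entries.size();
    }

    VMID getInstanceId() {
        return instanceId;
    }
}
```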

Removed

  • The Matrix2 class has been removed.
  • GRMM has been moved to a separate package.

Fixed

  • Te Rutherford fixed a bug where non-String instance IDs were being cast as Strings.
  • The import functions (Csv2Vectors, Text2Vectors) have a case-sensitivity flag, but it was not being passed to the stopword remover.
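The change that fixed this is not shown in these notes. To make the intended behavior concrete, the sketch below shows a stopword filter that is given the same case-sensitivity setting as the importer and applies it both to the stoplist and to each token. `StopwordFilterSketch` is a hypothetical class and is not Mallet's stopword pipe.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch (not Mallet's pipe): a stopword filter that honors a case-sensitivity flag. */
final class StopwordFilterSketch {
    private final boolean caseSensitive;
    private final Set<String> stopwords = new HashSet<>();

    StopwordFilterSketch(Collection<String> words, boolean caseSensitive) {
        this.caseSensitive = caseSensitive; // set before normalize() is used below
        for (String word : words) {
            stopwords.add(normalize(word));
        }
    }

    /** Lowercases only when matching is case-insensitive. */
    private String normalize(String token) {
        return caseSensitive ? token : token.toLowerCase(Locale.ROOT);
    }

    /** Returns the tokens that are not stopwords, in their original form and order. */
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(normalize(token))) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With the stoplist `["the"]`, a case-sensitive filter keeps `The` but drops `the`, while a case-insensitive filter drops both.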

2.1 alpha

13 Jun 14:54


This is a serialization-breaking release due to the switch to HPPC, which affects feature alphabets.

Added

  • Nonnegative Matrix Factorization
  • Word embeddings (word2vec clone)
  • PagedInstanceList supports iteration correctly
  • lebiathan added stratified sampling of InstanceList
  • This file!
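The notes do not say which algorithm Mallet's Nonnegative Matrix Factorization uses. For reference, the sketch below uses the classic Lee and Seung multiplicative updates for the squared Frobenius loss, which factor a nonnegative matrix V ≈ WH while keeping W and H nonnegative. This is a swapped-in textbook technique and may differ from Mallet's implementation. The class `NmfSketch` is hypothetical.

```java
import java.util.Random;

/**
 * Sketch of NMF (V ~ W H, all entries >= 0) with Lee and Seung's multiplicative updates
 * for the squared Frobenius loss. Not necessarily the method Mallet's NMF uses.
 */
final class NmfSketch {
    private static final double EPS = 1e-12; // keeps denominators away from zero

    final double[][] w; // n x k basis
    final double[][] h; // k x m coefficients

    NmfSketch(double[][] v, int k, int iterations, long seed) {
        Random random = new Random(seed);
        w = randomMatrix(v.length, k, random);
        h = randomMatrix(k, v[0].length, random);
        for (int it = 0; it < iterations; it++) {
            // H <- H .* (W^T V) ./ (W^T W H)
            double[][] wt = transpose(w);
            scaleInPlace(h, multiply(wt, v), multiply(multiply(wt, w), h));
            // W <- W .* (V H^T) ./ (W H H^T)
            double[][] ht = transpose(h);
            scaleInPlace(w, multiply(v, ht), multiply(w, multiply(h, ht)));
        }
    }

    /** Frobenius norm of V - W H. */
    double reconstructionError(double[][] v) {
        double[][] wh = multiply(w, h);
        double sum = 0;
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < v[0].length; j++) {
                double d = v[i][j] - wh[i][j];
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }

    private static void scaleInPlace(double[][] x, double[][] num, double[][] den) {
        for (int i = 0; i < x.length; i++) {
            for (int j = 0; j < x[0].length; j++) {
                x[i][j] *= num[i][j] / (den[i][j] + EPS);
            }
        }
    }

    private static double[][] randomMatrix(int rows, int cols, Random random) {
        double[][] x = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                x[i][j] = 0.1 + random.nextDouble(); // strictly positive start
            }
        }
        return x;
    }

    private static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[0].length; j++) {
                t[j][i] = a[i][j];
            }
        }
        return t;
    }

    private static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int p = 0; p < b.length; p++) {
                for (int j = 0; j < b[0].length; j++) {
                    c[i][j] += a[i][p] * b[p][j];
                }
            }
        }
        return c;
    }
}
```

When k = 1, each update reduces to the exact least-squares step for one factor given the other. A rank-1 input such as `{{1, 2}, {2, 4}, {3, 6}}` is therefore reconstructed almost exactly after a single iteration.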

Changed

  • All merging and propagation of sampling statistics for topic modeling is now multi-threaded (if num-threads is more than 1), leading to a 5-10% speed boost.
  • The primitive collections library (used, for example, to map String to int) has been changed from GNU Trove to Carrot Search HPPC. This change removes all GNU dependencies.
  • The license has been changed from CPL to the Apache License.
  • Serialized objects now use a VMID as their unique identifier. (Breaks serialization!)
  • Many small fixes suggested by ErrorProne.
  • Unneeded imports removed.
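The notes do not describe how the multi-threaded merge of sampling statistics is structured. One common approach, shown here only as a hypothetical sketch, is to split the topic range into disjoint slices so that each worker thread sums the per-thread counts for its own slice without locking. `ParallelCountMerge` and its parameters are illustrative names, not Mallet's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: merge per-thread topic counts, one disjoint topic slice per thread. */
final class ParallelCountMerge {
    private ParallelCountMerge() {}

    static int[] merge(List<int[]> perThreadCounts, int numTopics, int numThreads)
            throws InterruptedException, ExecutionException {
        int[] global = new int[numTopics];
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            int chunk = (numTopics + numThreads - 1) / numThreads; // ceiling division
            for (int t = 0; t < numThreads; t++) {
                int start = t * chunk;
                int end = Math.min(numTopics, start + chunk);
                if (start >= end) {
                    break;
                }
                // Each task writes only global[start..end), so no two tasks touch the same cell.
                futures.add(pool.submit(() -> {
                    for (int[] local : perThreadCounts) {
                        for (int topic = start; topic < end; topic++) {
                            global[topic] += local[topic];
                        }
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // waits for completion and rethrows task failures
            }
        } finally {
            pool.shutdown();
        }
        return global;
    }
}
```

Partitioning by topic, rather than having every thread add into a shared array, avoids both locks and lost updates. `Future.get()` also guarantees that the workers' writes are visible to the caller.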

Removed

  • The Matrix2 class has been removed.
  • GRMM has been moved to a separate package.

Fixed

  • Te Rutherford fixed a bug where non-String instance IDs were being cast as Strings.
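The fix itself is not shown in these notes. The general failure mode is that when an instance's name or ID is typed as `Object`, a cast such as `(String) name` throws `ClassCastException` whenever the value is, for example, a `java.net.URI` or an `Integer`. A minimal sketch of the safer pattern converts the value instead of casting it. The helper `InstanceIdSketch.idOf` is hypothetical and is not Mallet's API.

```java
import java.util.Objects;

/** Hypothetical helper (not Mallet's API): derive a String ID from an Object-typed instance name. */
final class InstanceIdSketch {
    private InstanceIdSketch() {}

    /**
     * Returns the name's string form without casting, so URI, Integer, and other name
     * types all work; returns fallback when the name is null.
     */
    static String idOf(Object name, String fallback) {
        return Objects.toString(name, fallback);
    }
}
```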

2.0.8 release candidate 3

11 Nov 19:16


Pre-release
v2.0.8RC3

Moving Windows .bat to new topic trainer

v2.0.8RC2

19 Jun 19:57


Pre-release
Cleaned up, added comments

2.0.8 release candidate 1

10 Dec 19:50


Pre-release
Merge pull request #13 from drevicko/fix-instancelist-serialisation

Set alphabets from data or pipe when reading serialised instance list