Releases: mimno/Mallet
Release 2.1.0 - Java 17, reduced dependencies
What's New
This is the first stable release since 2.0.8 (2016), incorporating years of improvements and modernization.
Highlights
- Java 17 required - Updated from Java 8 to Java 17
- Reduced dependencies - Streamlined external library requirements
- Maven structure - Standard Maven project layout
- JShell support - Replaced BeanShell with JShell for scripting
Installation
Maven:
<dependency>
  <groupId>cc.mallet</groupId>
  <artifactId>mallet</artifactId>
  <version>2.1.0</version>
</dependency>
Gradle:
implementation 'cc.mallet:mallet:2.1.0'
See the full changelog: v2.0.8...v2.1.0
202108
This is a serialization-breaking release due to the switch to HPPC, which affects feature alphabets.
Added
- Nonnegative Matrix Factorization
- Word embeddings (word2vec clone)
- PagedInstanceList supports iteration correctly
- lebiathan added stratified sampling of InstanceList
- This file!
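To illustrate what stratified sampling of a labeled list means, here is a minimal self-contained sketch in plain Java. It does not use MALLET's InstanceList API, and it is not the contributed implementation; the `Labeled` record, the `sample` method, and the per-label rounding rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class StratifiedSampleSketch {

    /** Hypothetical stand-in for an instance that carries a target label. */
    record Labeled(String id, String label) {}

    /**
     * Draws roughly `fraction` of the items from each label group, so the
     * sample keeps approximately the label proportions of the full list.
     */
    static List<Labeled> sample(List<Labeled> items, double fraction, Random rng) {
        // Group items by label, preserving first-seen label order.
        Map<String, List<Labeled>> byLabel = new LinkedHashMap<>();
        for (Labeled item : items) {
            byLabel.computeIfAbsent(item.label(), k -> new ArrayList<>()).add(item);
        }

        List<Labeled> result = new ArrayList<>();
        for (List<Labeled> group : byLabel.values()) {
            // Shuffle a copy of the stratum, then keep its first n items.
            List<Labeled> shuffled = new ArrayList<>(group);
            Collections.shuffle(shuffled, rng);
            int n = (int) Math.round(fraction * shuffled.size());
            result.addAll(shuffled.subList(0, n));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Labeled> items = new ArrayList<>();
        for (int i = 0; i < 80; i++) items.add(new Labeled("a" + i, "spam"));
        for (int i = 0; i < 20; i++) items.add(new Labeled("b" + i, "ham"));

        List<Labeled> sampled = sample(items, 0.25, new Random(42));
        long spam = sampled.stream().filter(x -> x.label().equals("spam")).count();
        System.out.println(sampled.size() + " sampled, " + spam + " spam");
    }
}
```

With 80 "spam" and 20 "ham" items and a fraction of 0.25, each stratum contributes a quarter of its members (20 and 5), so the 80/20 label split survives in the sample regardless of the random seed. A plain uniform sample would only match that split on average.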
Changed
- All merging and propagation of sampling statistics for topic modeling is now multi-threaded (if num-threads is more than 1), leading to a 5-10% speed boost.
- The primitive collections library (used, for example, to map String to int) has been changed from GNU Trove to Carrot Search Labs HPPC. This change removes all GNU dependencies.
- The license has been changed from CPL to Apache.
- Serialized objects now use a VMID as their unique identifier. (Breaks serialization!)
- Many small fixes suggested by ErrorProne.
- Unneeded imports removed.
Removed
- The Matrix2 class has been removed.
- GRMM has been moved to a separate package.
Fixed
- Te Rutherford fixed a bug where non-String instance IDs were being cast as Strings.
- The import functions (Csv2Vectors, Text2Vectors) have a case-sensitive flag, but this was not being passed to the stopword remover.
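The case-sensitivity fix above concerns a flag that has to reach every stage that compares tokens. The sketch below shows, in plain Java, why a stopword filter needs that flag. It does not use MALLET's pipes; `removeStopwords`, its `caseSensitive` parameter, and the lowercase stoplist are hypothetical names and assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordCaseSketch {

    /**
     * Removes stopwords from a token list. The stoplist is assumed to be
     * lowercase. When caseSensitive is false, each token is lowercased
     * before the lookup; the kept tokens retain their original casing.
     */
    static List<String> removeStopwords(List<String> tokens,
                                        Set<String> stoplist,
                                        boolean caseSensitive) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = caseSensitive ? token : token.toLowerCase(Locale.ROOT);
            if (!stoplist.contains(key)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stoplist = Set.of("the", "of");
        List<String> tokens = List.of("The", "History", "of", "the", "US");

        // Case-sensitive lookup: "The" does not match "the" and is kept.
        System.out.println(removeStopwords(tokens, stoplist, true));
        // Case-insensitive lookup: "The" is treated as a stopword too.
        System.out.println(removeStopwords(tokens, stoplist, false));
    }
}
```

If the importer and the filter disagree about the flag, capitalized stopwords such as "The" either slip through or are removed unexpectedly, which is the kind of mismatch the fix addresses.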
2.1 alpha
This is a serialization-breaking release due to the switch to HPPC, which affects feature alphabets.
Added
- Nonnegative Matrix Factorization
- Word embeddings (word2vec clone)
- PagedInstanceList supports iteration correctly
- lebiathan added stratified sampling of InstanceList
- This file!
Changed
- All merging and propagation of sampling statistics for topic modeling is now multi-threaded (if num-threads is more than 1), leading to a 5-10% speed boost.
- The primitive collections library (used, for example, to map String to int) has been changed from GNU Trove to Carrot Search Labs HPPC. This change removes all GNU dependencies.
- The license has been changed from CPL to Apache.
- Serialized objects now use a VMID as their unique identifier. (Breaks serialization!)
- Many small fixes suggested by ErrorProne.
- Unneeded imports removed.
Removed
- The Matrix2 class has been removed.
- GRMM has been moved to a separate package.
Fixed
- Te Rutherford fixed a bug where non-String instance IDs were being cast as Strings.
2.0.8 release candidate 3
v2.0.8RC3: moving Windows .bat to new topic trainer
v2.0.8RC2
2.0.8 release candidate 1
Merge pull request #13 from drevicko/fix-instancelist-serialisation
Set alphabets from data or pipe when reading serialised instance list