A Python implementation of the PROCLUS algorithm: PROjected CLUStering for high-dimensional data.
PROCLUS is a subspace clustering algorithm that finds clusters in different subspaces of high-dimensional data. It automatically identifies relevant dimensions for each cluster, making it particularly effective for datasets where different clusters exist in different feature subspaces.
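To illustrate what "relevant dimensions for each cluster" means, here is a minimal sketch, not the code in this repository. It assumes a simplified criterion: a dimension is relevant to a cluster when the cluster's points lie close to the medoid along it. The full algorithm in the paper standardizes these per-dimension averages and allocates dimensions across all medoids jointly. The function name `relevant_dimensions` and the synthetic data are hypothetical.

```python
import numpy as np


def relevant_dimensions(points, medoid, l):
    """Return the l dimensions with the smallest average distance to the medoid.

    Simplified illustration only: PROCLUS itself standardizes these
    averages per medoid and picks dimensions across all medoids jointly.
    """
    # Average absolute distance to the medoid, computed separately per dimension.
    avg_dist = np.abs(points - medoid).mean(axis=0)
    # Smaller average distance means the cluster is tighter in that dimension.
    chosen = np.sort(np.argsort(avg_dist)[:l])
    return chosen, avg_dist


# Synthetic cluster in 5 dimensions: tight in dimensions 1 and 3,
# uniformly spread (uninformative) in the others.
rng = np.random.default_rng(0)
cluster = rng.uniform(0.0, 100.0, size=(200, 5))
cluster[:, 1] = rng.normal(20.0, 1.0, size=200)
cluster[:, 3] = rng.normal(70.0, 1.0, size=200)

medoid = cluster[0]  # any cluster member serves as the medoid for this demo
dims, avg_dist = relevant_dimensions(cluster, medoid, l=2)
print("relevant dimensions:", dims)
print("average distance per dimension:", np.round(avg_dist, 2))
```

In this toy setup the two tight dimensions (1 and 3) are selected, because the average distance to the medoid along them is far smaller than along the uniformly spread ones.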
- NumPy
- SciPy
- Matplotlib (for running the examples)
- Cython (optional, for building the Adjusted Rand Index evaluator)
Clone the repository:
git clone https://github.com/Alessi0X/pyproclus-fork.git
cd pyproclus-fork
(Optional) Build the Cython extension for faster evaluation of the Adjusted Rand Index:
python setup.py build_ext --inplace
or
make
Check the example files (example01.py, example02.py, example03.py) for usage demonstrations.
Basic usage:
import proclus as prc
import arffreader as ar
# Load data
X, supervision = ar.readarff("data/simple.arff")
# Run PROCLUS
medoids, dimensions, assignments = prc.proclus(X, k=3, l=2, seed=902884)
# Evaluate clustering
accuracy = prc.computeBasicAccuracy(assignments, supervision)

Included scripts:
- example01.py - Simple 2D clustering demonstration
- example02.py - High-dimensional data clustering
- example03.py - Subspace stream clustering with custom parameters
- correlation.py - Analysis of correlation between objective function and Adjusted Rand Index
The Adjusted Rand Index evaluation measure is implemented in Cython for efficiency. Pre-generated C code is included in the distribution. This component is optional and only required for computing clustering quality metrics.
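For readers who skip the Cython build, the Adjusted Rand Index can also be computed in plain NumPy. The sketch below uses the standard contingency-table formula and is not the evaluator shipped with this project; the function name `adjusted_rand_index` is hypothetical, and it treats every distinct label value as an ordinary cluster.

```python
import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat label assignments.

    Pure NumPy sketch of the standard formula; every distinct label
    value is treated as its own cluster.
    """
    labels_a = np.asarray(labels_a).ravel()
    labels_b = np.asarray(labels_b).ravel()
    if labels_a.shape != labels_b.shape:
        raise ValueError("label arrays must have the same length")
    n = labels_a.size
    if n < 2:
        return 1.0  # no pairs to compare

    # Map arbitrary label values to 0..k-1 and build the contingency table.
    _, a_idx = np.unique(labels_a, return_inverse=True)
    _, b_idx = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)

    def pairs(x):
        # Number of unordered pairs that can be drawn from x items.
        return x * (x - 1) / 2.0

    index = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(n)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (index - expected) / (max_index - expected)


# Identical partitions under different label names score 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that cut across each other score below zero.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

A pure-Python version like this is slower than a compiled one on large datasets, which is the trade-off the optional Cython extension addresses.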
Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, and Jong Soo Park. 1999. Fast algorithms for projected clustering. In Proceedings of the 1999 ACM SIGMOD international conference on Management of data (SIGMOD '99). ACM, New York, NY, USA, 61-72. DOI=10.1145/304182.304188
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
This is a fork with Python 3 compatibility updates. Original implementation by Cassio M. M. Pereira.
Original repository: https://github.com/cmmp/pyproclus