This package provides a set of algorithms for data clustering.

Install it with:

    Pkg.add("Clustering")

Currently working algorithms:
- K-means
- Affinity Propagation
To be available:
- K-medoids
- DP-means
- ISODATA
Interfaces:

    # perform K-means (centers are updated in place)
    result = kmeans!(x, centers, opts)

    # perform K-means based on a given set of initial centers
    result = kmeans(x, init_centers, opts)

    # perform K-means to get K centers
    result = kmeans(x, k, opts)
    result = kmeans(x, k)
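The second form expects a matrix of initial centers with one center per column. As a minimal sketch of one way to build such a matrix, one could pick k distinct samples at random. The code is written in current Julia 1.x syntax, and `sample_init_centers` is a hypothetical helper, not part of this package.

```julia
using Random

# Hypothetical helper (not part of the package): pick k distinct columns of x
# (one sample per column) to serve as initial centers.
function sample_init_centers(rng::AbstractRNG, x::AbstractMatrix, k::Integer)
    n = size(x, 2)
    1 <= k <= n || throw(ArgumentError("k must be between 1 and the number of samples"))
    idx = randperm(rng, n)[1:k]   # k distinct sample indices
    return x[:, idx], idx         # copies of the chosen columns, plus their indices
end

rng = MersenneTwister(42)
x = rand(rng, 3, 20)              # 20 samples of dimension 3
init_centers, idx = sample_init_centers(rng, x, 4)
```

The resulting matrix could be passed as init_centers to kmeans, or as centers to kmeans!, which overwrites it in place.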
All these methods return an instance of KmeansResult, which is defined as

    type KmeansResult{T<:FloatingPoint}
        centers::Matrix{T}         # cluster centers (d x k)
        assignments::Vector{Int}   # assignments (n)
        costs::Vector{T}           # costs of the resultant assignments (n)
        counts::Vector{Int}        # number of samples assigned to each cluster (k)
        cweights::Vector{T}        # cluster weights (k)
        total_cost::Float64        # total cost (i.e. objective)
        iterations::Int            # number of elapsed iterations
        converged::Bool            # whether the procedure converged
    end

Options:
Note: options are specified using keyword arguments.

| name | description | default value |
|---|---|---|
| max_iters | maximum number of iterations | 100 |
| tol | tolerable change in the objective at convergence | 1.0e-6 |
| weights | sample weights (a vector or nothing) | nothing |
| display | verbosity (:none, :final, or :iter) | :iter |
Example:

    x = rand(100, 10000)   # a set of 10000 samples (each of dimension 100)
    k = 50                 # the number of clusters
    result = kmeans(x, k; max_iters=50, display=:iter)
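To make the result fields and the max_iters and tol options more concrete, here is a minimal sketch of a plain Lloyd iteration in current Julia 1.x syntax. It is an illustration only: `lloyd_sketch`, `sqdist`, and `assign!` are hypothetical names, the stopping rule is an assumption about how tol is used, and sample weights (and so cweights) are left out. It is not the package's implementation.

```julia
# Squared Euclidean distance between column j of x and column c of centers.
sqdist(x, centers, j, c) = sum(abs2, view(x, :, j) .- view(centers, :, c))

# Assign every sample to its nearest center; returns the total cost.
function assign!(assignments, costs, x, centers)
    for j in axes(x, 2)
        dists = [sqdist(x, centers, j, c) for c in axes(centers, 2)]
        costs[j], assignments[j] = findmin(dists)
    end
    return sum(costs)
end

# Illustrative Lloyd iteration (hypothetical, unweighted).
function lloyd_sketch(x::AbstractMatrix{<:Real}, init_centers::AbstractMatrix{<:Real};
                      max_iters::Int=100, tol::Float64=1.0e-6)
    d, n = size(x)
    k = size(init_centers, 2)
    centers = Matrix{Float64}(init_centers)
    assignments = zeros(Int, n)
    costs = zeros(n)
    total_cost = assign!(assignments, costs, x, centers)
    iterations = 0
    converged = false
    while iterations < max_iters && !converged
        iterations += 1
        # Update step: each center becomes the mean of its assigned samples.
        sums = zeros(d, k)
        cnt = zeros(Int, k)
        for j in 1:n
            c = assignments[j]
            cnt[c] += 1
            sums[:, c] .+= view(x, :, j)
        end
        for c in 1:k
            if cnt[c] > 0   # leave empty clusters where they are
                centers[:, c] .= view(sums, :, c) ./ cnt[c]
            end
        end
        # Assignment step, then stop once the objective changes by at most tol.
        new_total = assign!(assignments, costs, x, centers)
        converged = abs(total_cost - new_total) <= tol
        total_cost = new_total
    end
    counts = [count(==(c), assignments) for c in 1:k]
    return (centers=centers, assignments=assignments, costs=costs, counts=counts,
            total_cost=total_cost, iterations=iterations, converged=converged)
end

# Four 2-d samples forming two obvious groups; samples 1 and 3 seed the centers.
x = [0.0 0.0 10.0 10.0; 0.0 1.0 10.0 11.0]
res = lloyd_sketch(x, x[:, [1, 3]])
```

In this sketch total_cost is the sum of costs, and counts is derived from assignments, which mirrors how the corresponding KmeansResult fields are described above.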
Affinity Propagation is an algorithm that uses loopy belief propagation to perform MAP inference and identify a set of exemplars. Unlike K-means, the exemplars are chosen from the original samples. When the algorithm returns, every sample is assigned to one of the exemplars.
The input of the algorithm is a similarity matrix S.
Unlike kmeans, you don't need to (and cannot) specify the number of
clusters. But the diagonal values of S will affect how many
clusters you will get at the end. Specifically, S[i,j] could be
interpreted as the tendency of assigning point i to point j
(when j is an exemplar). So

- S need NOT be symmetric.
- S[i,i] represents the willingness of point i to be assigned to itself, that is, to become an exemplar. Larger diagonal values of S therefore generally mean more clusters. For example, if S[i,i] == max(S) for all i, then every point will be its own exemplar.

Usually, setting the diagonal of S to the median of the remaining entries leads to reasonable results.
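The median rule above can be sketched as follows, in current Julia 1.x syntax. The choice of negative squared Euclidean distance as the similarity is an assumption made for illustration (the algorithm accepts any similarity matrix), and `similarity_with_median_diagonal` is a hypothetical helper, not part of this package.

```julia
using Statistics

# Hypothetical helper: similarity = negative squared Euclidean distance between
# columns of x (an illustrative choice), with every diagonal entry set to the
# median of the off-diagonal entries.
function similarity_with_median_diagonal(x::AbstractMatrix{<:Real})
    n = size(x, 2)
    S = [-float(sum(abs2, view(x, :, i) .- view(x, :, j))) for i in 1:n, j in 1:n]
    offdiag = [S[i, j] for i in 1:n for j in 1:n if i != j]
    pref = median(offdiag)
    for i in 1:n
        S[i, i] = pref
    end
    return S
end

x = [0.0 1.0 3.0]                       # three 1-d samples: 0, 1 and 3
S = similarity_with_median_diagonal(x)
```

Raising the diagonal above the median would be expected to produce more exemplars, and lowering it fewer, in line with the discussion of S[i,i] above.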
Interfaces:

    result = affinity_propagation(S, opts)

where the following options can be specified using keyword arguments:

    max_iter::Integer = 500,       # max number of iterations
    n_stop_check::Integer = 10,    # stop if exemplars have not changed for this number of iterations
    damp::FloatingPoint = 0.5,     # damping factor for message updating, 0 means no damping
    display::Symbol = :iter        # whether progress is shown

The return value is a struct that looks like this:

    type AffinityPropagationResult
        exemplar_index::Vector{Int}   # index for exemplars (centers)
        assignments::Vector{Int}      # assignments for each point
        iterations::Int               # number of iterations executed
        converged::Bool               # converged or not
    end
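The damp and n_stop_check options can be read as follows. This is a sketch in current Julia 1.x syntax of one common convention, chosen because it agrees with "0 means no damping"; whether the package uses exactly this form is an assumption, and both function names are hypothetical.

```julia
# Damped message update: damp = 0 keeps only the new message (no damping),
# while values closer to 1 keep more of the old message.
damped_update(old, new, damp) = damp .* old .+ (1 - damp) .* new

# Hypothetical stop check: count how many of the most recent consecutive
# iterations left the exemplar set unchanged. A caller would stop once this
# count reaches n_stop_check.
function unchanged_streak(history::Vector{Vector{Int}})
    streak = 0
    for t in length(history):-1:2
        history[t] == history[t-1] || break
        streak += 1
    end
    return streak
end

half = damped_update([1.0, 2.0], [3.0, 4.0], 0.5)
none = damped_update([1.0, 2.0], [3.0, 4.0], 0.0)
streak = unchanged_streak([[1], [1, 2], [1, 2], [1, 2]])
```

Here each entry of history stands for the exemplar indices found after one iteration.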