Spectral Clustering

cluster.cluster provides an interface for k-Means Clustering and Spectral Clustering.

class SpectralClustering(k: int, max_iter: int = 1000, rep: int = 10, seed: Optional[int] = None, eps: float = 1e-10, norm: Optional[int] = None, sparse: bool = True, graph: str = 'kNN', graph_param: Union[int, float] = 5, normalized: bool = True, vectorize_similarity_matrix: bool = True)

Spectral Clustering object

Parameters
  • k – the number of clusters to create

  • max_iter – maximum number of iterations to perform for convergence of clusters in k-Means iteration

  • rep – number of times to repeat the k-Means clustering algorithm

  • seed – seed to use

  • eps – stopping criterion

  • norm – the norm to use

  • sparse – whether to use a sparse representation of the graph (using scipy.sparse)

  • graph – one of ‘kNN’, ‘mkNN’, ‘eps’, ‘fc’ - ‘kNN’: k-Nearest-Neighbor graph - ‘mkNN’: mutual k-Nearest-Neighbor graph - ‘eps’: epsilon connected graph - ‘fc’: fully connected graph

  • graph_param – the parameters for the specified graph (i.e.: number of neighbors, epsilon)

  • normalized – whether to use normalized spectral clustering

  • vectorize_similarity_matrix – whether to vectorize the matrix or filling the matrix iteratively

cluster(points: numpy.array)numpy.array

cluster the points provided

Parameters

points – the dataset to be clustered in form (N x d) where N is the number of points to be clustered

Returns

None

get_L()numpy.array

get graph Laplacian L

Returns

graph Laplacian L

Return type

np.array

get_S()numpy.array

get similarity matrix S

Returns

similarity matrix S

Return type

np.array

get_graph()cluster.graph.Graph.Graph

get graph

Returns

graph

Return type

Graph

get_labels()numpy.array

get labels

Returns

labels

Return type

np.array

get_points()numpy.array

get clustered points

Returns

points

Return type

np.array

Example

import numpy as np
from scipy.stats import multivariate_normal
from cluster.cluster import SpectralClustering
from cluster.utils import visualize_clusters_2d, visualize_graph_2d
# matplotlib only for visualization
import matplotlib.pyplot as plt

# create the dataset
np.random.seed(42)

N1 = 25
N2 = 75
r1 = 1 / 4
r2 = 3 / 4

x1 = 2 * np.pi * np.random.rand(N1)
x2 = 2 * np.pi * np.random.rand(N2)
r1 = r1 + np.random.rand(N1) * 0.1
r2 = r2 + np.random.rand(N2) * 0.1

points = np.vstack([np.concatenate([r1 * np.cos(x1), r2 * np.cos(x2)]),
                    np.concatenate([r1 * np.sin(x1), r2 * np.sin(x2)])]).T

# cluster the dataset
spectral_clustering = SpectralClustering(k=2, graph_param=5)
spectral_clustering.cluster(points=points)
labels = spectral_clustering.get_labels()

# visualize the dataset in 2D
fig, ax = plt.subplots()
ax = visualize_clusters_2d(points, labels, ax, 'SpectralClustering with k = 2')
plt.show()

# visualize the graph
fig, ax = plt.subplots()
visualize_graph_2d(points, spectral_clustering.get_graph(), ax, 'kNN graph with 5 neighbors')
plt.show()

(Source code, png, hires.png, pdf)

../_images/SpectralClustering-1_00_00.png

(png, hires.png, pdf)

../_images/SpectralClustering-1_01_00.png