@kanaries/ml

Clusters

API reference for Clusters

Clustering Algorithms Comparison

Compare different clustering algorithms on classic datasets. The datasets shown here are commonly used to demonstrate the strengths and weaknesses of different clustering approaches.
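As an illustration of such a dataset, here is a minimal sketch that generates two noise-free concentric rings, a shape centroid-based methods typically cannot separate. The helper name makeRings and its parameters are hypothetical; it is not the library's makeCircles and makes no claim about that function's signature.

```typescript
// Hypothetical helper for illustration only; not part of @kanaries/ml.
// Returns points evenly spaced on two concentric circles plus ground-truth labels.
function makeRings(
    nInner: number,
    nOuter: number,
    rInner: number,
    rOuter: number
): { X: number[][]; y: number[] } {
    // Place n points at equal angular steps on a circle of radius r.
    const ring = (n: number, r: number): number[][] =>
        Array.from({ length: n }, (_, i) => {
            const angle = (2 * Math.PI * i) / n;
            return [r * Math.cos(angle), r * Math.sin(angle)];
        });
    const X = ring(nInner, rInner).concat(ring(nOuter, rOuter));
    // Label 0 for the inner ring, 1 for the outer ring.
    const y = new Array<number>(nInner).fill(0).concat(new Array<number>(nOuter).fill(1));
    return { X, y };
}

const rings = makeRings(20, 20, 1, 5);
```

Both rings share the same centroid (the origin), which is why the algorithm notes distinguish centroid-based from density-based approaches.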

Algorithm Comparison Notes

K-Means

Assumes roughly spherical clusters of similar size, so it works well on compact, well-separated blobs and struggles with non-convex shapes such as rings or crescents. The number of clusters must be chosen in advance.

DBSCAN

Handles non-convex shapes well and labels outliers as noise instead of forcing them into a cluster. Requires tuning the eps (neighborhood radius) and minSamples (minimum neighbors for a core point) parameters.

OPTICS

Extension of DBSCAN that works with varying densities. Creates a reachability plot for cluster extraction.

Mean Shift

Finds clusters by iteratively shifting points towards modes (density peaks) of the data distribution. The number of clusters is not specified up front; it follows from the data and the chosen bandwidth.

HDBSCAN

Hierarchical extension of DBSCAN that works well with varying densities and hierarchical cluster structures.

Clusters.KMeans

constructor(n_clusters: number = 2, opt_ratio: number = 0.05, initCenters?: number[][], max_iter: number = 30)

props name     type          default value
n_clusters     number        2
opt_ratio      number        0.05
initCenters    number[][]    undefined
max_iter       number        30

const X = [
    [0, 0],
    [0.5, 0],
    [0.5, 1],
    [1, 1],
];
const sampleWeights = [3, 1, 1, 3];
const initCenters = [[0, 0], [1, 1]];

const kmeans = new KMeans(2, 0.05, initCenters);

const result = kmeans.fitPredict(X, sampleWeights);
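To make the role of sampleWeights concrete, the following is a minimal sketch of one weighted Lloyd iteration on the same data. It assumes weights enter the centroid update as a weighted mean, which is the usual formulation; whether KMeans in this library does exactly this is an assumption, and weightedLloydStep is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper for illustration only; not part of @kanaries/ml.
function squaredDistance(a: number[], b: number[]): number {
    let sum = 0;
    for (let i = 0; i < a.length; i++) {
        const d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

function weightedLloydStep(
    X: number[][],
    weights: number[],
    centers: number[][]
): { labels: number[]; centers: number[][] } {
    // Assignment step: each sample goes to its nearest current center.
    const labels = X.map((x) => {
        let best = 0;
        for (let k = 1; k < centers.length; k++) {
            if (squaredDistance(x, centers[k]) < squaredDistance(x, centers[best])) best = k;
        }
        return best;
    });
    // Update step: each center becomes the weighted mean of its samples.
    const dim = X[0].length;
    const sums = centers.map(() => new Array<number>(dim).fill(0));
    const totals = centers.map(() => 0);
    X.forEach((x, i) => {
        const k = labels[i];
        totals[k] += weights[i];
        for (let j = 0; j < dim; j++) sums[k][j] += weights[i] * x[j];
    });
    // An empty cluster keeps its previous center.
    const updated = sums.map((s, k) =>
        totals[k] > 0 ? s.map((v) => v / totals[k]) : centers[k].slice()
    );
    return { labels, centers: updated };
}

const samples = [[0, 0], [0.5, 0], [0.5, 1], [1, 1]];
const step = weightedLloydStep(samples, [3, 1, 1, 3], [[0, 0], [1, 1]]);
// step.labels  -> [0, 0, 1, 1]
// step.centers -> [[0.125, 0], [0.875, 1]]
```

With uniform weights the centers would land at x = 0.25 and x = 0.75; the heavier end points pull them towards 0 and 1.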

Clusters.DBScan

constructor(eps: number = 0.5, minSamples: number = 5, distanceType: Distance.IDistanceType = 'euclidiean')

fitPredict(samplesX: number[][]): number[] returns cluster labels for samples. Noise points are marked as -1.

const X = makeCircles(20, 20, 1, 5);
const dbscan = new DBScan(0.6, 3);
const labels = dbscan.fitPredict(X);
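How eps and minSamples interact can be sketched with a compact, brute-force version of the textbook DBSCAN procedure. This is not the library's implementation; dbscanSketch is a hypothetical name, the distance is fixed to Euclidean, and a point is assumed to count as its own neighbor.

```typescript
// Illustrative sketch only; not the @kanaries/ml implementation.
function euclideanDistance(a: number[], b: number[]): number {
    let sum = 0;
    for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.sqrt(sum);
}

function dbscanSketch(X: number[][], eps: number, minSamples: number): number[] {
    const UNVISITED = -2;
    const NOISE = -1;
    const labels = new Array<number>(X.length).fill(UNVISITED);
    // All indices within eps of point i (including i itself).
    const neighbors = (i: number): number[] =>
        X.map((_, j) => j).filter((j) => euclideanDistance(X[i], X[j]) <= eps);

    let cluster = 0;
    for (let i = 0; i < X.length; i++) {
        if (labels[i] !== UNVISITED) continue;
        const seeds = neighbors(i);
        if (seeds.length < minSamples) {
            labels[i] = NOISE; // may still be claimed later as a border point
            continue;
        }
        labels[i] = cluster; // i is a core point and starts a new cluster
        const queue = seeds.filter((j) => j !== i);
        while (queue.length > 0) {
            const j = queue.shift() as number;
            if (labels[j] === NOISE) labels[j] = cluster; // border point
            if (labels[j] !== UNVISITED) continue;
            labels[j] = cluster;
            const next = neighbors(j);
            if (next.length >= minSamples) queue.push(...next); // expand from core points only
        }
        cluster++;
    }
    return labels;
}

const points = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 0]];
const dbLabels = dbscanSketch(points, 0.3, 3);
// dbLabels -> [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at [10, 0] has no neighbors within eps, so it keeps the noise label -1. Raising eps merges clusters, while raising minSamples turns sparse regions into noise.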

Clusters.HDBScan

constructor(
    min_cluster_size: number = 5,
    min_samples: number | null = null,
    cluster_selection_epsilon: number = 0.5,
    metric: Distance.IDistanceType = 'euclidiean'
)

fitPredict(samplesX: number[][]): number[] returns cluster labels. Noise points are marked as -1.

This is a simplified implementation that internally calls DBSCAN using cluster_selection_epsilon as the eps parameter.

const hdb = new HDBScan(5, null, 0.6);
const labels = hdb.fitPredict(X);

Clusters.MeanShift

constructor(
    bandwidth: number = 1,
    max_iter: number = 300,
    distanceType: Distance.IDistanceType = 'euclidiean'
)

Methods:

  • fitPredict(samplesX: number[][]): number[]
  • getCentroids(): number[][]

const ms = new MeanShift(2);
const labels = ms.fitPredict(X);
const centers = ms.getCentroids();
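The effect of bandwidth and max_iter can be illustrated with a minimal sketch of the mean shift update for a single starting point. It assumes a flat kernel (every sample within bandwidth counts equally) and Euclidean distance; the kernel used by MeanShift itself is not stated here, and meanShiftMode is a hypothetical helper.

```typescript
// Illustrative sketch only; not the @kanaries/ml implementation.
function pointDistance(a: number[], b: number[]): number {
    let sum = 0;
    for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return Math.sqrt(sum);
}

function meanShiftMode(
    X: number[][],
    start: number[],
    bandwidth: number,
    maxIter: number = 300,
    tol: number = 1e-6
): number[] {
    let current = start.slice();
    for (let iter = 0; iter < maxIter; iter++) {
        // Flat kernel: collect all samples within `bandwidth` of the current position.
        const inWindow = X.filter((x) => pointDistance(x, current) <= bandwidth);
        if (inWindow.length === 0) break;
        // Move to the mean of the samples in the window.
        const mean = current.map(
            (_, j) => inWindow.reduce((s, x) => s + x[j], 0) / inWindow.length
        );
        const moved = pointDistance(mean, current);
        current = mean;
        if (moved < tol) break; // converged to a mode
    }
    return current;
}

const data = [[0, 0], [0, 2], [2, 0], [2, 2], [10, 10]];
const modeA = meanShiftMode(data, [0, 0], 3);   // -> [1, 1]
const modeB = meanShiftMode(data, [10, 10], 3); // -> [10, 10]
```

Points that converge to the same mode belong to the same cluster. A larger bandwidth widens the window and tends to merge modes, which yields fewer clusters.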

Clusters.OPTICS

interface OPTICSOptions {
    min_samples?: number;
    max_eps?: number;
    metric?: Distance.IDistanceType;
    p?: number;
    eps?: number;
}

constructor(options: OPTICSOptions = {})

fitPredict(samplesX: number[][]): number[] returns cluster labels. Noise points are marked as -1.

const optics = new OPTICS({ eps: 0.5, min_samples: 5 });
const labels = optics.fitPredict(X);

Clusters.kmeansPlusPlus

kmeansPlusPlus(
    X: number[][],
    n_clusters: number,
    sampleWeight?: number[],
    randomState: () => number = Math.random
): { centers: number[][]; indices: number[] }

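This utility initializes cluster centers using the k-means++ seeding strategy. It returns the chosen centers together with indices, their positions in X. Pass a custom randomState to make the selection reproducible.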

const { centers } = kmeansPlusPlus(X, 3);
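For intuition, here is a minimal sketch of plain k-means++ seeding: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. It omits sample weights and any refinements the library may apply; kmeansPlusPlusSketch is a hypothetical name and only mirrors the shape of the return value documented above.

```typescript
// Illustrative sketch only; not the @kanaries/ml implementation.
function kmeansPlusPlusSketch(
    X: number[][],
    nClusters: number,
    random: () => number = Math.random
): { centers: number[][]; indices: number[] } {
    const sqDist = (a: number[], b: number[]): number =>
        a.reduce((s, v, i) => s + (v - b[i]) * (v - b[i]), 0);

    // First center: uniform draw over the samples.
    const first = Math.min(X.length - 1, Math.floor(random() * X.length));
    const indices: number[] = [first];
    // closest[i] is the squared distance from X[i] to its nearest chosen center.
    const closest = X.map((x) => sqDist(x, X[first]));

    while (indices.length < nClusters) {
        const total = closest.reduce((s, d) => s + d, 0);
        // Draw a sample with probability proportional to closest[i].
        let threshold = random() * total;
        let pick = closest.length - 1;
        for (let i = 0; i < closest.length; i++) {
            threshold -= closest[i];
            if (threshold < 0) {
                pick = i;
                break;
            }
        }
        indices.push(pick);
        X.forEach((x, i) => {
            closest[i] = Math.min(closest[i], sqDist(x, X[pick]));
        });
    }
    return { centers: indices.map((i) => X[i].slice()), indices };
}

// A fixed sequence in place of Math.random makes the draw reproducible.
const draws = [0, 0.5];
let drawCount = 0;
const fixedRandom = (): number => draws[drawCount++ % draws.length];

const seeded = kmeansPlusPlusSketch([[0, 0], [0, 1], [10, 10], [10, 11]], 2, fixedRandom);
// seeded.indices -> [0, 3]
// seeded.centers -> [[0, 0], [10, 11]]
```

Because distant points carry most of the sampling mass, the second center almost always lands in the far group, which is the property that makes this seeding more reliable than uniform random initialization.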