Module: cluster

Fuzzy clustering subpackage, containing the fuzzy c-means clustering algorithm. This can be either supervised or unsupervised, depending on whether the init kwarg is used (if guesses are provided, it is supervised).

skfuzzy.cluster.cmeans(data, c, m, error, ...) Fuzzy c-means clustering algorithm [R24].
skfuzzy.cluster.cmeans_predict(test_data, ...) Prediction of new data given a trained fuzzy c-means framework [R26].

cmeans

skfuzzy.cluster.cmeans(data, c, m, error, maxiter, init=None, seed=None)

Fuzzy c-means clustering algorithm [R24].

Parameters:

data : 2d array, size (S, N)

Data to be clustered. N is the number of data sets (samples); S is the number of features within each sample vector.

c : int

Desired number of clusters or classes.

m : float

Fuzzifier exponent: at each iteration the membership matrix u_old is raised elementwise to this power (um = u_old ** m). Must be greater than 1.

error : float

Stopping criterion; stop early if the norm of (u[p] - u[p-1]) < error.

maxiter : int

Maximum number of iterations allowed.

init : 2d array, size (c, N)

Initial fuzzy c-partitioned matrix. If None (the default), the algorithm is randomly initialized.

seed : int

If provided, sets the random seed used to generate init. No effect if init is provided. Mainly for debug/testing purposes.
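To make the expected shapes concrete, here is a minimal NumPy sketch (not the library's code) of laying out data as (S, N) and building a random fuzzy c-partitioned matrix of the kind init expects. The helper random_partition and the toy values are hypothetical, and the sketch uses numpy.random.default_rng, so seeded values will not match whatever generator skfuzzy uses internally.

```python
import numpy as np

# Hypothetical toy data: S = 2 features (rows) by N = 6 samples (columns).
data = np.array([[0.0, 0.2, 0.1, 5.0, 5.2, 4.9],
                 [0.0, 0.1, 0.3, 5.0, 4.8, 5.1]])
S, N = data.shape
c = 2  # desired number of clusters


def random_partition(c, n, seed=None):
    """Hypothetical helper: random (c, n) fuzzy partition whose columns sum to 1."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, n))
    return u / u.sum(axis=0, keepdims=True)


init = random_partition(c, N, seed=42)
print(init.shape)  # (2, 6): one row per cluster, one column per sample
assert np.allclose(init.sum(axis=0), 1.0)  # each sample's memberships sum to 1
```

Each column of init holds one sample's membership in each of the c clusters, which is why init shares its second dimension with data.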

Returns:

cntr : 2d array, size (c, S)

Cluster centers: one row for each of the c requested clusters, giving that center's coordinate along each of the S features.

u : 2d array, (c, N)

Final fuzzy c-partitioned matrix.

u0 : 2d array, (c, N)

Initial guess at fuzzy c-partitioned matrix (either the provided init or the random guess used if init was not provided).

d : 2d array, (c, N)

Final Euclidean distance matrix.

jm : 1d array, length P

Objective function history.

p : int

Number of iterations run.

fpc : float

Final fuzzy partition coefficient.
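The fpc value summarizes how crisp the final partition is. The sketch below assumes the usual definition FPC = trace(u u^T) / N for a (c, N) membership matrix, which ranges from 1/c (every sample shared equally) to 1 (a hard partition). The helper name and membership values are hypothetical, and taking the argmax of each column is shown only as one common way to harden u into labels, not as library behavior.

```python
import numpy as np


def fuzzy_partition_coefficient(u):
    """FPC for a (c, N) membership matrix: trace(u @ u.T) / N."""
    return np.trace(u @ u.T) / u.shape[1]


# Hypothetical final memberships: c = 2 clusters, N = 3 samples.
u = np.array([[1.0, 0.5, 0.2],
              [0.0, 0.5, 0.8]])

fpc = fuzzy_partition_coefficient(u)  # (1 + 0.25 + 0.04 + 0 + 0.25 + 0.64) / 3 = 2.18 / 3
labels = u.argmax(axis=0)             # most likely cluster per sample; ties go to the lower index
print(round(fpc, 4), labels)          # 0.7267 [0 0 1]
```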

Notes

The algorithm implemented is from Ross [R24].
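For orientation, a minimal NumPy sketch of the standard fuzzy c-means iteration follows. It is not the skfuzzy source and is not transcribed from Ross; fcm_sketch is a hypothetical name, initialization uses numpy.random.default_rng, and distances are plain Euclidean. It takes data as (S, N) and returns cntr as (c, S) and u, u0 and d as (c, N). Each pass computes membership-weighted centers, the distance matrix, the objective value, and the membership update u proportional to d ** (-2 / (m - 1)), stopping once the norm of the change in u drops below error.

```python
import numpy as np


def fcm_sketch(data, c, m, error, maxiter, init=None, seed=None):
    """Hypothetical fuzzy c-means sketch. data: (S, N); requires m > 1, maxiter >= 1."""
    S, N = data.shape
    if init is None:
        init = np.random.default_rng(seed).random((c, N))
    u = init / init.sum(axis=0, keepdims=True)  # normalize columns to sum to 1
    u0 = u.copy()
    eps = np.finfo(float).eps
    jm = []
    for p in range(1, maxiter + 1):
        u_old = np.fmax(u, eps)                 # avoid exact zeros
        um = u_old ** m                         # fuzzified memberships, (c, N)
        # Centers are membership-weighted means of the samples, (c, S).
        cntr = (um @ data.T) / um.sum(axis=1, keepdims=True)
        # Euclidean distance from each center to each sample, (c, N).
        d = np.linalg.norm(cntr[:, :, None] - data[None, :, :], axis=1)
        d = np.fmax(d, eps)                     # avoid division by zero below
        jm.append(float((um * d ** 2).sum()))   # objective value this iteration
        # Membership update: u[i, k] proportional to d[i, k] ** (-2 / (m - 1)).
        u = d ** (-2.0 / (m - 1))
        u /= u.sum(axis=0, keepdims=True)
        if np.linalg.norm(u - u_old) < error:   # stopping criterion
            break
    fpc = np.trace(u @ u.T) / N                 # fuzzy partition coefficient
    return cntr, u, u0, d, np.array(jm), p, fpc


data = np.array([[0.0, 0.2, 0.1, 5.0, 5.2, 4.9],
                 [0.0, 0.1, 0.3, 5.0, 4.8, 5.1]])
cntr, u, u0, d, jm, p, fpc = fcm_sketch(data, c=2, m=2.0, error=1e-5, maxiter=300, seed=0)
print(p, round(fpc, 3))
print(u.argmax(axis=0))  # hardened label per sample
```

On this toy data the two groups of three points should land in different clusters, with centers close to each group's mean; the iteration count depends on the random start.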

Fuzzy C-Means has a known problem with high-dimensional datasets, in which most cluster centers are pulled toward the overall center of gravity of the data. If you are clustering very high-dimensional data and encounter this issue, another clustering method may be required. For more information and the theory behind this, see Winkler et al. [R25].

References

[R24] Ross, Timothy J. Fuzzy Logic With Engineering Applications, 3rd ed. Wiley. 2010. ISBN 978-0-470-74376-8, pp. 352-353, eqs. 10.28-10.35.
[R25] Winkler, R., Klawonn, F., & Kruse, R. Fuzzy c-means in high dimensional spaces. 2012. Contemporary Theory and Pragmatic Approaches in Fuzzy Computing Utilization, 1.

cmeans_predict

skfuzzy.cluster.cmeans_predict(test_data, cntr_trained, m, error, maxiter, init=None, seed=None)

Prediction of new data given a trained fuzzy c-means framework [R26].

Parameters:

test_data : 2d array, size (S, N)

New, independent data set to be predicted based on the c-means model trained by cmeans. N is the number of data sets (samples); S is the number of features within each sample vector.

cntr_trained : 2d array, size (c, S)

Locations of the trained cluster centers from a prior cmeans run (its cntr output).

m : float

Fuzzifier exponent: at each iteration the membership matrix u_old is raised elementwise to this power (um = u_old ** m). Must be greater than 1.

error : float

Stopping criterion; stop early if the norm of (u[p] - u[p-1]) < error.

maxiter : int

Maximum number of iterations allowed.

init : 2d array, size (c, N)

Initial fuzzy c-partitioned matrix. If None (the default), the algorithm is randomly initialized.

seed : int

If provided, sets the random seed used to generate init. No effect if init is provided. Mainly for debug/testing purposes.

Returns:

u : 2d array, (c, N)

Final fuzzy c-partitioned matrix.

u0 : 2d array, (c, N)

Initial guess at fuzzy c-partitioned matrix (either the provided init or the random guess used if init was not provided).

d : 2d array, (c, N)

Final Euclidean distance matrix.

jm : 1d array, length P

Objective function history.

p : int

Number of iterations run.

fpc : float

Final fuzzy partition coefficient.

Notes

Ross [R26] did not include a prediction algorithm to go along with fuzzy c-means. This prediction algorithm works by repeating the clustering with the centers held fixed at cntr_trained, which efficiently finds the fuzzy membership of every new point.
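Because the centers never move during prediction, each membership update depends only on the distances from the new samples to cntr_trained. The sketch below assumes the same Euclidean distance and standard membership formula as the training loop; predict_sketch is a hypothetical name, not part of skfuzzy. Since the update no longer depends on the previous memberships once the centers are fixed, this sketch applies it once instead of iterating with error and maxiter.

```python
import numpy as np


def predict_sketch(test_data, cntr_trained, m):
    """Hypothetical helper: memberships (c, N) of test_data (S, N) in fixed centers (c, S)."""
    eps = np.finfo(float).eps
    # Euclidean distance from each trained center to each new sample, (c, N).
    d = np.linalg.norm(cntr_trained[:, :, None] - test_data[None, :, :], axis=1)
    d = np.fmax(d, eps)
    # Standard fuzzy c-means membership update with the centers held fixed.
    u = d ** (-2.0 / (m - 1))
    return u / u.sum(axis=0, keepdims=True)


centers = np.array([[0.0, 0.0],     # cluster 0 at (0, 0)
                    [10.0, 0.0]])   # cluster 1 at (10, 0)
new_points = np.array([[1.0, 9.0, 5.0],   # x coordinates of 3 new samples
                       [0.0, 0.0, 0.0]])  # y coordinates
u = predict_sketch(new_points, centers, m=2.0)
print(u.round(3))
```

With m = 2 the point at x = 1 (distances 1 and 9) gets membership 81/82, about 0.988, in cluster 0, and the midpoint x = 5 is split evenly between the two clusters.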

References

[R26] Ross, Timothy J. Fuzzy Logic With Engineering Applications, 3rd ed. Wiley. 2010. ISBN 978-0-470-74376-8, pp. 352-353, eqs. 10.28-10.35.