Time Series Clustering - Mixture Models for Clustering


Published: April 15, 2018

The steps for using a mixture model for clustering are:

Fit the mixture model
Compute $P(z_i=k|x_i,\theta)$

$P(z_i=k|x_i,\theta)$ represents the posterior probability that point i belongs to cluster k.

$r_{ik}=P(z_i=k|x_i,\theta)\propto P(z_i=k|\theta)\,P(x_i|z_i=k,\theta)$
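As a minimal sketch of the formula above, the snippet below computes $r_{ik}$ by hand for a one-dimensional mixture of two Gaussians. The parameter values and the three observations are made up for illustration, and `gaussian_pdf` is a hypothetical helper, not part of any library.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Hypothetical parameters theta of a two-component 1-D mixture (illustration only).
weights = np.array([0.6, 0.4])      # prior P(z_i = k | theta)
means = np.array([0.0, 4.0])
variances = np.array([1.0, 1.0])

x = np.array([-0.5, 2.0, 4.2])      # three toy observations

# Prior times likelihood, shape (n_points, n_clusters); broadcasting pairs
# every observation with every component.
unnormalized = weights * gaussian_pdf(x[:, None], means, variances)

# Normalize each row so that the responsibilities of a point sum to 1.
r = unnormalized / unnormalized.sum(axis=1, keepdims=True)
print(np.round(r, 3))
```

The point at 2.0 is equally far from both means, so its likelihoods cancel and its responsibilities reduce to the prior weights; the other two points are assigned almost entirely to the nearer component.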

This procedure is called soft clustering, and $r_{ik}$ is known as the responsibility of cluster k for point i.

The uncertainty in the cluster assignment of point i can be measured by $1-\max_k r_{ik}$. Assuming this is small, it may be reasonable to compute a hard clustering using the MAP estimate, given by

$z_i^*=\arg\max_k r_{ik}=\arg\max_k \left[\log p(x_i|z_i=k,\theta)+\log p(z_i=k|\theta)\right]$

We call this procedure hard clustering.
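The MAP assignment can be sketched with scikit-learn as follows. The data are synthetic (two well-separated blobs), and the names `X_toy`, `gmm_toy`, `log_joint` and `z_map` are chosen for this illustration only. The log joint is rebuilt from the fitted weights, means and covariances using SciPy, and its argmax is compared with the labels returned by `predict`.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data from two well-separated blobs (illustration only).
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                   rng.normal(6.0, 1.0, size=(100, 2))])

gmm_toy = GaussianMixture(n_components=2, covariance_type='full',
                          random_state=0).fit(X_toy)

# log p(x_i | z_i = k, theta) + log p(z_i = k | theta), shape (n_points, n_clusters).
log_joint = np.column_stack([
    multivariate_normal.logpdf(X_toy, mean=gmm_toy.means_[k],
                               cov=gmm_toy.covariances_[k])
    + np.log(gmm_toy.weights_[k])
    for k in range(gmm_toy.n_components)
])

z_map = log_joint.argmax(axis=1)        # hard (MAP) assignment z_i^*
r = gmm_toy.predict_proba(X_toy)        # soft assignment r_ik
uncertainty = 1.0 - r.max(axis=1)       # small when the assignment is clear-cut

print(np.array_equal(z_map, gmm_toy.predict(X_toy)))
print(uncertainty.max())
```

Working in log space avoids underflow when the likelihoods are tiny, which is why the argmax is usually taken over the sum of logs rather than over the product.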

import os
import pandas as pd
# Jupyter/IPython magic; omit this line when running as a plain Python script
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from sklearn.mixture import GaussianMixture

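# Local data directory from the original notebook; change it to wherever the CSV lives
path = '/Users/avinashbarnwal/Desktop/Machine Learning/Daily'
os.chdir(path)
# Load the data, using the first column as the row index
X = pd.read_csv("simola_natural_yeast_imputed.csv", index_col=0)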

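One of the most important parameters is covariance_type. With “full”, each component gets its own unconstrained covariance matrix, which is the most flexible option and also the most expensive to fit.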
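To make the effect of this parameter concrete, here is a small sketch that fits the four covariance types supported by scikit-learn on synthetic data and prints the shape of the fitted `covariances_` array along with the BIC. The data and the choice of 3 components are assumptions made for the illustration; they are unrelated to the yeast data set.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: 300 points with 4 features (illustration only).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(300, 4))

shapes = {}
for cov_type in ['full', 'tied', 'diag', 'spherical']:
    model = GaussianMixture(n_components=3, covariance_type=cov_type,
                            random_state=0).fit(X_toy)
    # The array shape shows how many covariance parameters each type estimates.
    shapes[cov_type] = model.covariances_.shape
    print(cov_type, shapes[cov_type], round(model.bic(X_toy), 1))
```

“full” stores one matrix per component, “tied” shares a single matrix across components, “diag” keeps one variance per feature per component, and “spherical” keeps a single variance per component. With many features and few observations, the more restricted types can be the more practical choice.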

n_components = np.arange(1, 21)
models = [GaussianMixture(n, covariance_type='full', random_state=0).fit(X)
          for n in n_components]

plt.plot(n_components, [m.bic(X) for m in models], label='BIC')
plt.plot(n_components, [m.aic(X) for m in models], label='AIC')
plt.legend(loc='best')
plt.xlabel('n_components')

Figure: BIC and AIC as a function of n_components.
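A common heuristic is to read the number of components off the minimum of the BIC curve. The sketch below does this programmatically on synthetic data with three well-separated blobs; the data, the candidate range of 1 to 6 and the variable names are assumptions for the illustration, and the post does not state how its own choice of components was made.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data with three well-separated blobs (illustration only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X_toy = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])

n_candidates = np.arange(1, 7)
bic_scores = np.array([
    GaussianMixture(n, covariance_type='full', random_state=0).fit(X_toy).bic(X_toy)
    for n in n_candidates
])

# Lower BIC is better; pick the candidate with the smallest score.
best_n = int(n_candidates[bic_scores.argmin()])
print(best_n)
```

AIC can be substituted by calling `aic` instead of `bic`; it penalizes extra parameters less heavily and therefore tends to favor larger models.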

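# Refit with 16 components; random_state is fixed here (an addition) so the result is reproducible
gmm = GaussianMixture(n_components=16, covariance_type='full', random_state=0).fit(X)
# Plot the mean vector of each component, one curve per cluster
for i in range(16):
    plt.plot(gmm.means_[i], label=i)
plt.legend(loc='best')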

Figure: mean vector of each of the 16 fitted components.


Avinash Barnwal
