Adaptively constrained K-means algorithm
In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used for unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Because of background noise in single-particle images and uncertainty in molecular orientations, the traditional K-means clustering algorithm may assign images to the wrong classes and produce classes whose sizes vary widely. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method that builds upon the traditional K-means algorithm. By introducing an adaptive constraint term into the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. The adaptively constrained K-means (ACK-means) algorithm currently exists as an experimental Python implementation. ACK-means is expected to be introduced into ROME 2.0 as a Python-independent, parallel computing module in the near future.
Y. Xu, J. Wu, C.C. Yin, Y. Mao. Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm. PLoS ONE 11, e0167765 (2016). doi: 10.1371/journal.pone.0167765. arXiv: 1609.02213 [q-bio.QM].
Click here: Python source code.
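As a rough illustration of the idea described in the abstract, and explicitly not the authors' released code, the sketch below shows how a size-dependent term added to the K-means assignment cost can discourage very uneven class sizes. Everything specific here is an assumption made for illustration: the function name `size_penalized_kmeans`, the penalty form `lam * (current class size)`, and the use of a fixed weight `lam` where the published method uses an adaptive constraint. The exact objective function is given in the paper cited above.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def size_penalized_kmeans(data, init_centers, lam=0.0, n_iter=50):
    """K-means with a class-size penalty in the assignment step.

    Hypothetical sketch: each point is assigned to the class that minimizes
    sq_dist(point, center) + lam * (current size of that class).
    With lam = 0 this reduces to ordinary K-means (Lloyd's algorithm).
    The fixed weight lam is an assumption; it is not the adaptive
    constraint term of the published ACK-means algorithm.
    """
    centers = [list(c) for c in init_centers]
    k = len(centers)
    dim = len(data[0])
    labels = [-1] * len(data)   # -1 marks "not yet assigned"
    sizes = [0] * k             # running class sizes

    for _ in range(n_iter):
        changed = False
        for i, p in enumerate(data):
            old = labels[i]
            if old >= 0:
                sizes[old] -= 1  # remove the point before re-scoring it
            # Distance term plus a penalty that grows with class size.
            costs = [sq_dist(p, centers[j]) + lam * sizes[j] for j in range(k)]
            new = costs.index(min(costs))
            sizes[new] += 1
            labels[i] = new
            changed = changed or new != old

        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty class keeps its previous center
                centers[j] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
        if not changed:
            break

    return labels, centers, sizes


# Toy 1-D data: five nearby points and one distant outlier.
data = [[0.0], [1.0], [2.0], [3.0], [4.0], [10.0]]
init = [[2.0], [10.0]]
_, _, plain_sizes = size_penalized_kmeans(data, init, lam=0.0)
_, _, balanced_sizes = size_penalized_kmeans(data, init, lam=20.0)
print(plain_sizes, balanced_sizes)
```

On this toy input the unpenalized run leaves the outlier in a class of its own, while the penalized run splits the data into two classes of equal size. The weight `lam` trades geometric fidelity against balance, so a value that is too large forces equal sizes even when the data do not support them; choosing that trade-off automatically is the role of the adaptive constraint in the published method.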