7.2 kmeans quick start

20210322 Below we share some examples showing various commands supported by the kmeans package. The sample training dataset, iris.csv, can be downloaded directly from GitHub:

wget https://raw.githubusercontent.com/gjwgit/kmeans/master/iris.csv

The simplest usage is a cluster analysis of a CSV file in which all variables are numeric columns. The argument 3 asks for three clusters:

ml train kmeans 3 --input iris.csv
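For readers new to k-means, here is a minimal Python sketch of what a training step like this computes. It assumes the classic Lloyd's algorithm with Euclidean distance and is not the kmeans package's own code. The function names and the deterministic farthest-point initialisation are illustrative choices; the initialisation is used here only so results are reproducible. Each iteration assigns every observation to its nearest centroid, then moves each centroid to the mean of its members, and stops once no centroid moves.

```python
import math


def nearest(row, centroids):
    """Index of the centroid closest to row (Euclidean; ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: math.dist(row, centroids[j]))


def kmeans(rows, k, max_iter=100):
    """Cluster numeric rows into k groups with Lloyd's algorithm.

    Returns (centroids, labels). An illustrative sketch only.
    """
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    # Farthest-point initialisation (hypothetical choice): start from the first
    # row, then repeatedly add the row farthest from all centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        labels = [nearest(r, centroids) for r in rows]
        # Update step: move each centroid to the mean of its member rows.
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    # Final assignment so the labels always match the returned centroids.
    labels = [nearest(r, centroids) for r in rows]
    return centroids, labels
```

For example, calling kmeans(rows, 3) on the numeric rows of iris.csv would return three centroids plus a cluster index for every row.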

The data can be piped to the train command:

cat iris.csv | ml train kmeans 3
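Supporting both --input and a pipe usually means reading from a named file when one is given and from standard input otherwise. The Python sketch below illustrates that convention. It assumes the CSV has a header row followed by purely numeric records. The names read_numeric_csv and parse_numeric_csv are hypothetical and not part of the package.

```python
import csv
import sys


def parse_numeric_csv(lines):
    """Parse an iterable of CSV lines into (header, rows of floats).

    Assumes the first line is a header and every other field is numeric.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty CSV input")
    rows = [[float(field) for field in record] for record in reader if record]
    return header, rows


def read_numeric_csv(path=None):
    """Read from path if given (like --input), else from standard input (like a pipe)."""
    if path is None:
        return parse_numeric_csv(sys.stdin)
    with open(path, newline="") as f:
        return parse_numeric_csv(f)
```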

The model can be saved to a file using --output:

ml train kmeans 3 --input iris.csv --output model.csv

Or we can redirect the output:

cat iris.csv | ml train kmeans 3 > model.csv
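The layout of model.csv is not described here. Purely as an assumption for illustration, the Python sketch below treats a k-means model as its centroids, written as a header row of variable names followed by one row per cluster. It mirrors the two routes shown above: given a path, the text goes to that file, as with --output; without one, it goes to standard output, where the shell's > can redirect it. The names model_to_csv, save_model and read_model are hypothetical.

```python
import csv
import io
import sys


def model_to_csv(centroids, header):
    """Render centroids as CSV text: a header row, then one row per cluster."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(centroids)
    return buf.getvalue()


def save_model(centroids, header, path=None):
    """Write the model to path (like --output) or to standard output (for >)."""
    text = model_to_csv(centroids, header)
    if path is None:
        sys.stdout.write(text)
    else:
        with open(path, "w", newline="") as f:
            f.write(text)


def read_model(lines):
    """Parse CSV lines produced by model_to_csv back into (header, centroids)."""
    reader = csv.reader(lines)
    header = next(reader)
    centroids = [[float(x) for x in rec] for rec in reader if rec]
    return header, centroids
```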

The model can then be used by the predict command to assign each observation in a CSV file to a cluster (effectively, to its nearest centroid):

ml predict kmeans model.csv --input iris.csv
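Assigning an observation to its nearest centroid takes only a few lines. This Python sketch assumes Euclidean distance, the usual choice for k-means, and numbers clusters from 0 in the order the centroids are listed. The package's actual distance measure and label numbering may differ, and predict here is a hypothetical name.

```python
import math


def predict(rows, centroids):
    """Return, for each row, the index of its nearest centroid (Euclidean distance).

    Ties are broken in favour of the lower centroid index.
    """
    return [
        min(range(len(centroids)), key=lambda j: math.dist(row, centroids[j]))
        for row in rows
    ]
```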

The model output from the train command can also be piped to the predict command to report the cluster membership for each observation in the training dataset:

cat iris.csv | ml train kmeans 3 | ml predict kmeans iris.csv

A video can be constructed and displayed to show the iterations of the training algorithm over the data:

ml train kmeans 3 -i iris.csv --view

To save the video to a file, use the -m (or --movie) option, naming the file in which to save the MP4 video:

ml train kmeans 3 -i iris.csv -m iris.mp4


Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0