8.4 apriori itemsets

UNDER DEVELOPMENT 20220106

ml itemsets apriori [options] [datafile]
     -i <name>     --id=<name>          The id column name.
     -s <0-1>      --support=<0-1>      Minimum support threshold.

The input is a two-column CSV file: one column holds the basket id and the other holds an item in that basket. If no datafile is named on the command line, the data is read from stdin.

id,item
u1234567,comp1234
u1234567,comp2345
u1234567,comp3456
u1234567,comp4567
u1234568,comp1234
u1234568,comp4567
...
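
To make the input layout concrete, the following is a minimal Python sketch of how such a file could be grouped into baskets. It is not the code used by ml; the function name read_baskets and its id_column parameter are hypothetical, and the item column is assumed to be whichever column is not the id column.

```python
import csv
import io
from collections import defaultdict


def read_baskets(stream, id_column="id"):
    """Group items by basket id from a two-column CSV stream.

    Hypothetical helper for illustration; not part of ml itself.
    """
    reader = csv.DictReader(stream)
    fieldnames = reader.fieldnames or []
    # Assumption: the item column is the single column that is not the id column.
    item_columns = [name for name in fieldnames if name != id_column]
    if id_column not in fieldnames or len(item_columns) != 1:
        raise ValueError(f"expected columns '{id_column}' and one item column")
    item_column = item_columns[0]

    baskets = defaultdict(set)
    for row in reader:
        baskets[row[id_column]].add(row[item_column])
    return dict(baskets)


# The sample rows shown above, read from an in-memory stream instead of a file.
sample = io.StringIO(
    "id,item\n"
    "u1234567,comp1234\n"
    "u1234567,comp2345\n"
    "u1234567,comp3456\n"
    "u1234567,comp4567\n"
    "u1234568,comp1234\n"
    "u1234568,comp4567\n"
)
baskets = read_baskets(sample)
```

Each basket id maps to the set of items recorded against it, which is the shape the later counting steps need.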

Output to stdout is one row per item set (a combination of items that occur together in a basket), with its frequency and support:

$ ml itemsets apriori mcomp.csv
pattern,freq,support
comp1234:comp4567,145,0.75
comp2345,123,0.45
...
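
The freq and support columns can be illustrated with a small Python sketch. It assumes the usual definitions, which the text above does not spell out: freq is the number of baskets containing every item of the set, and support is freq divided by the total number of baskets. The function name itemset_support is hypothetical, the two-decimal formatting is a guess, and the brute-force enumeration is only suitable for tiny examples.

```python
from collections import Counter
from itertools import combinations


def itemset_support(baskets):
    """Return {itemset: (freq, support)} for every item set seen in a basket.

    Assumed definitions: freq = baskets containing the set,
    support = freq / number of baskets.
    """
    counts = Counter()
    for items in baskets.values():
        ordered = sorted(items)
        # Enumerate every non-empty subset of this basket (exponential; demo only).
        for size in range(1, len(ordered) + 1):
            for combo in combinations(ordered, size):
                counts[combo] += 1
    total = len(baskets)
    return {combo: (freq, freq / total) for combo, freq in counts.items()}


# Two baskets taken from the sample input rows.
baskets = {
    "u1234567": {"comp1234", "comp2345", "comp3456", "comp4567"},
    "u1234568": {"comp1234", "comp4567"},
}
rows = itemset_support(baskets)

# Print in the same shape as the output above, joining items with ':'.
print("pattern,freq,support")
for combo, (freq, support) in sorted(rows.items()):
    print(f"{':'.join(combo)},{freq},{support:.2f}")
```

On these two baskets the pair comp1234:comp4567 appears in both, so its support is 1.0, while comp2345 appears in one of the two, giving 0.5.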

The output can be limited to item sets whose support is at least a specified value. The default minimum support is 10% (0.1).

$ ml itemsets apriori --support=0.5 mcomp.csv
pattern,freq,support
comp1234:comp4567,145,0.75
...
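
The role of the support threshold can be sketched with a textbook level-wise Apriori search in Python. This is an illustration under stated assumptions, not the implementation behind ml: the function name apriori and the min_support parameter are hypothetical, and support is again taken to be the fraction of baskets containing the item set.

```python
from itertools import combinations


def apriori(baskets, min_support=0.1):
    """Return {frozenset(items): support} for item sets meeting min_support.

    Textbook level-wise search; a sketch, not the code used by ml.
    """
    transactions = [frozenset(items) for items in baskets.values()]
    total = len(transactions)
    if total == 0:
        return {}

    def support_of(candidate):
        # Fraction of baskets that contain every item in the candidate.
        return sum(candidate <= t for t in transactions) / total

    # Level 1: single items that meet the threshold.
    current = {}
    for item in {item for t in transactions for item in t}:
        candidate = frozenset([item])
        support = support_of(candidate)
        if support >= min_support:
            current[candidate] = support
    frequent = dict(current)

    size = 2
    while current:
        previous = set(current)
        # Join step: combine frequent sets from the previous level.
        candidates = {a | b for a in previous for b in previous if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, size - 1))
        }
        current = {}
        for candidate in candidates:
            support = support_of(candidate)
            if support >= min_support:
                current[candidate] = support
        frequent.update(current)
        size += 1
    return frequent


baskets = {
    "u1234567": {"comp1234", "comp2345", "comp3456", "comp4567"},
    "u1234568": {"comp1234", "comp4567"},
}
result = apriori(baskets, min_support=0.6)
```

Raising the threshold removes low-support item sets early, so larger sets that contain them are never counted. That pruning is what makes Apriori cheaper than enumerating every combination.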

A column named id is expected. If the id column has a different name, specify it with --id:

$ ml itemsets apriori --id=ID mcomp.csv
pattern,freq,support
comp1234:comp4567,145,0.75
...


Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0