The authors of this study (open access) used the results and datasets from 18 published projects and biobanks to build a database of bacterial metataxonomic data from fermented table olives. The collated database contained 442 samples of 16S rRNA bacterial profiles.
They then compared three tree-based Machine Learning algorithms, namely Classification and Regression Tree, Random Forest (RF), and Extreme Gradient Boosting, to classify the origin or production process of the olives. They report that Machine Learning techniques can effectively classify bacterial profiles based on olive processing type, cultivar, country of origin, and isolation matrix. The Random Forest model achieved the highest accuracy, reaching 97% in the best cases, with a kappa coefficient above 0.8 for most categories.
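For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how accuracy and Cohen's kappa can be computed from true and predicted class labels. It is not the authors' code: the function name `accuracy_and_kappa` and the toy labels are hypothetical and chosen only for illustration. Kappa corrects the observed agreement for the agreement expected by chance, which is why it is usually lower than raw accuracy.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(y_true)

    # Observed agreement: fraction of samples classified correctly (accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: for each class, the product of its share among the
    # true labels and its share among the predicted labels, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Only one class appears in both lists, so kappa is undefined.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy, made-up labels (NOT data from the study): two processing types,
# five samples each, with one misclassification per class.
y_true = ["natural"] * 5 + ["Spanish-style"] * 5
y_pred = ["natural"] * 4 + ["Spanish-style"] + ["Spanish-style"] * 4 + ["natural"]

acc, kappa = accuracy_and_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")  # accuracy=0.80, kappa=0.60
```

In this toy case, 80% accuracy corresponds to a kappa of only 0.6 because half of the agreement would be expected by chance with two balanced classes. A kappa above 0.8, as reported for most categories, therefore indicates agreement well beyond chance.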
They conclude that this approach has potential applications in the table olive sector and in other food products, where the industrial application of ML techniques to metataxonomic data could enhance traceability, authenticity, and quality control.
Photo by Melina Kiefer on Unsplash