Chemometrics as an efficient tool for food authentication: Golden pillars for building reliable models

This paper (open access) reviews the use of chemometrics and machine learning for building food authenticity classification databases and highlights best practice. It concentrates on one-class classification models (“Is this sample X or is this sample NOT X?”) and on a complication that is common in authenticity applications: the adulterants whose presence should make a sample classify as “NOT X” may be present in very low proportions.
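
To make the one-class question concrete, here is a minimal sketch of a SIMCA-inspired class model in Python with NumPy. It is an illustration under stated assumptions, not the paper's method: the model is fitted on target-class samples only, combines a score distance with an orthogonal (residual) distance, and uses an empirical 95th-percentile acceptance limit instead of the theoretical (e.g. chi-squared-based) limits used by published SIMCA variants. The class `PCAOneClassModel`, its `accepts` method and the synthetic data are all hypothetical.

```python
import numpy as np


class PCAOneClassModel:
    """Hypothetical SIMCA-inspired one-class model, trained on target-class samples only."""

    def __init__(self, n_components: int = 2, quantile: float = 0.95):
        self.n_components = n_components  # must be smaller than the number of variables
        self.quantile = quantile          # share of training samples inside the acceptance limit

    def _distances(self, X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        Xc = (X - self.mean_) / self.scale_          # autoscale with training statistics
        T = Xc @ self.P_                             # scores in the PC subspace
        E = Xc - T @ self.P_.T                       # residuals outside the PC subspace
        h = np.sum((T / self.t_sd_) ** 2, axis=1)    # score distance (Hotelling-type)
        q = np.sum(E ** 2, axis=1)                   # orthogonal (Q) distance
        return h, q

    def fit(self, X_target: np.ndarray) -> "PCAOneClassModel":
        self.mean_ = X_target.mean(axis=0)
        self.scale_ = X_target.std(axis=0, ddof=1)
        Xc = (X_target - self.mean_) / self.scale_
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.P_ = Vt[: self.n_components].T          # loadings, shape (n_vars, n_components)
        self.t_sd_ = (Xc @ self.P_).std(axis=0, ddof=1)
        h, q = self._distances(X_target)
        self.h0_, self.q0_ = h.mean(), q.mean()      # normalise so both distances weigh equally
        d = h / self.h0_ + q / self.q0_
        self.limit_ = np.quantile(d, self.quantile)  # empirical acceptance limit
        return self

    def accepts(self, X: np.ndarray) -> np.ndarray:
        """True = 'is X' (accepted), False = 'NOT X' (rejected)."""
        h, q = self._distances(X)
        return h / self.h0_ + q / self.q0_ <= self.limit_


# Synthetic demo data (hypothetical): a 2-factor "authentic" class with 10 variables.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 10))


def simulate(n: int, offset: float = 0.0) -> np.ndarray:
    Z = rng.normal(size=(n, 2))
    return Z @ W + 0.1 * rng.normal(size=(n, 10)) + offset


model = PCAOneClassModel(n_components=2).fit(simulate(200))
sensitivity = model.accepts(simulate(500)).mean()                 # genuine samples accepted
specificity = (~model.accepts(simulate(500, offset=3.0))).mean()  # shifted samples rejected
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Because the limit is placed at the 95th percentile of the training distances, roughly 5% of genuine samples are expected to be rejected by design, so this kind of model should not be expected to reach 100% sensitivity. The grossly shifted "NOT X" samples here are easy to reject; an adulterant present at a very low proportion would sit much closer to the target class and be far harder to detect.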

The paper describes a generic, structured approach to building a classification database. It concludes with 10 “golden rules”:

1. Before choosing a data analysis method, it is important to understand what question needs to be answered.

2. Authentication models should be developed using a one-class classification (OCC) approach. Discriminant methods, such as PLS-DA, may be used only in exceptional cases where an exhaustive list of classes is available.

3. The OCC model must be developed using a representative training set collected from the target class, a further representative set for model optimization and, finally, a test set for model validation. The key to success is good sampling.

4. Proper data preprocessing is essential, because all data sets contain both useful and unwanted information.

5. The quality of a model is assessed using two main figures of merit (FoM): sensitivity, the rate of true acceptance of samples from the target class, and specificity, the rate of true rejection of samples not belonging to the target class.

6. In probabilistic models such as SIMCA, 100% sensitivity cannot be expected. The variability of the figures of merit caused by the limited number of samples should also be taken into account.

7. There are two ways to optimize an OCC model: rigorous, which is based only on the target data, and compliant, which also uses data belonging to other classes.

8. In either case, OCC optimization is performed by comparing the results obtained on the training and test data, and the optimal complexity (e.g., the number of PCs) is chosen where the FoM converge.

9. When choosing an alternative set from a non-target class for compliant optimization, the chosen class must be as similar as possible to the target class and must reflect the real authentication problem; otherwise, the specificity estimate may be over-optimistic or unrealistic.

10. Once the model is properly optimized, its actual performance must be verified on the test set, which must be completely independent of the training and optimization phases. Only the FoM calculated on the test set should be used to define the model's prediction ability.

(image taken from the paper)
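
Rules 5, 6 and 10 can be illustrated with a short, standard-library Python sketch that computes sensitivity and specificity from test-set decisions and attaches an approximate 95% Wilson score interval to each, as one way of expressing the variability caused by a limited number of samples. The interval method is an assumption chosen for illustration (the paper may quantify this variability differently), and `figures_of_merit` and `wilson_interval` are hypothetical helper names.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def figures_of_merit(accepted_target: list[bool], accepted_nontarget: list[bool]) -> dict:
    """Sensitivity and specificity of a one-class model, computed on the test set only."""
    tp = sum(accepted_target)                    # target samples correctly accepted
    tn = sum(not a for a in accepted_nontarget)  # non-target samples correctly rejected
    n_t, n_n = len(accepted_target), len(accepted_nontarget)
    return {
        "sensitivity": tp / n_t,
        "sensitivity_ci": wilson_interval(tp, n_t),
        "specificity": tn / n_n,
        "specificity_ci": wilson_interval(tn, n_n),
    }


# Hypothetical test-set decisions: True = accepted as "X"
fom = figures_of_merit([True] * 38 + [False] * 2, [False] * 29 + [True])
print(fom)
```

With 40 target and 30 non-target test samples, a sensitivity of 0.95 comes with an interval of roughly 0.83 to 0.99, which shows how strongly a small test set limits the precision of the reported FoM.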
