Evaluation metrics
Firenze uses several evaluation metrics for scoring models. An evaluation metric measures the performance of a machine learning model, making it possible to compare models and select the best one for a given task. Classification models are scored with the MCC score, and regression models with the R² score.
It is important to note that the choice of evaluation metric depends on the specific problem and the characteristics of the data. For example, accuracy is a good metric for balanced datasets, but it might not be the best choice for imbalanced datasets; the Matthews Correlation Coefficient is more useful in that case. Additionally, looking at multiple metrics together gives more insight into model performance than any single metric alone.
The evaluation page of a trained model also lists the other metrics.
Accuracy
Accuracy measures how often the model predicts the correct class or outcome. It ranges from 0 to 1, where 1 means every prediction is correct and 0 means every prediction is wrong. A higher Accuracy indicates a better fit to the data.
This metric is used for Tabular Classification, Text Classification and Relationship Extraction.
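As a minimal sketch, accuracy can be computed in plain Python; the labels below are hypothetical and only for illustration, not Firenze output:

```python
# Accuracy: fraction of predictions that match the true labels.
# Hypothetical example labels, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 4 of 5 correct -> 0.8
```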
Macro average
The Macro average is an average of a per-class performance metric (such as precision or recall) across all classes, giving each class equal weight regardless of its size. It ranges from 0 to 1, where 1 indicates perfect performance on every class. Because every class counts equally, a low Macro average exposes poor performance on minority classes even when overall accuracy is high.
This metric is used for Tabular Classification, Text Classification and Relationship Extraction.
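A minimal sketch of macro averaging, using recall as the per-class metric; the labels are hypothetical, chosen so the minority class is missed entirely:

```python
# Macro-averaged recall: compute recall per class, then take the
# unweighted mean, so every class counts equally.
# Hypothetical labels: class 1 is a minority class the model misses.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]

recalls = []
for c in sorted(set(y_true)):
    preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
    recalls.append(sum(p == c for p in preds_for_c) / len(preds_for_c))

macro_recall = sum(recalls) / len(recalls)
print(macro_recall)  # (1.0 + 0.0) / 2 = 0.5
```

Note how the single missed minority sample halves the score, even though 4 of 5 predictions are correct.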
Matthews Correlation Coefficient
The MCC score measures the correlation between predicted and actual classifications. It ranges from -1 to 1, where 1 indicates perfect agreement between predictions and labels, 0 indicates performance no better than random guessing, and -1 indicates total disagreement. A higher MCC score indicates a better fit to the data.
This metric is used for Tabular Classification, Text Classification and Relationship Extraction.
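For the binary case, MCC can be computed directly from the confusion-matrix counts; the labels below are hypothetical, for illustration only:

```python
import math

# Binary MCC from confusion-matrix counts:
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
# Hypothetical example labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 2
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)
print(mcc)  # (2*2 - 1*1) / 9 = 1/3
```

Because MCC uses all four confusion-matrix cells, it stays low when any cell is disproportionate, which is what makes it robust on imbalanced data.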
Weighted average
The Weighted average is an average of a per-class performance metric across all classes, weighting each class by its support (the number of true samples in that class), so larger classes contribute more. It ranges from 0 to 1, where 1 indicates perfect performance. A higher Weighted average score indicates a better fit to the data.
This metric is used for Tabular Classification, Text Classification and Relationship Extraction.
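A minimal sketch of support-weighted averaging, again using recall as the per-class metric and hypothetical labels:

```python
# Support-weighted recall: per-class recall weighted by the number of
# true samples in each class, so larger classes dominate the average.
# Hypothetical example labels, for illustration only.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]

weighted_recall = 0.0
for c in sorted(set(y_true)):
    preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
    recall_c = sum(p == c for p in preds_for_c) / len(preds_for_c)
    weighted_recall += recall_c * len(preds_for_c) / len(y_true)

print(weighted_recall)  # 1.0 * 4/5 + 0.0 * 1/5 = 0.8
```

With the same labels, a macro average would give 0.5, because it treats the missed minority class as half the score rather than one fifth.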
R²
R² is a measure of how well a regression model fits the data. It ranges from -infinity to 1, where 1 indicates that the model explains all of the variance in the data and 0 indicates that it performs no better than always predicting the mean. The R² score is calculated by comparing the variance of the model's residuals to the variance of the actual values. A higher R² score indicates a better fit to the data.
This metric is used for Tabular Regression.
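A minimal sketch of the calculation; the regression targets below are hypothetical, for illustration only:

```python
# R² = 1 - SS_res / SS_tot: residual variance compared against the
# variance of the actual values around their mean.
# Hypothetical regression targets, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mean_true = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

r2 = 1 - ss_res / ss_tot
print(r2)  # 1 - 0.5 / 20 = 0.975
```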
Mean Absolute Error
The Mean Absolute Error (MAE) is the average of the absolute differences between predictions and actual values, expressed in the same units as the target. A lower MAE indicates a better fit to the data.
This metric is used for Tabular Regression.
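A minimal sketch, using hypothetical regression targets:

```python
# MAE: mean of absolute prediction errors, in the target's own units.
# Hypothetical regression targets, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(mae)  # (0.5 + 0.0 + 0.5 + 0.0) / 4 = 0.25
```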
Mean Squared Error
The Mean Squared Error (MSE) is the average of the squared differences between predictions and actual values. Because the errors are squared, large errors are penalized much more heavily than small ones. A lower MSE indicates a better fit to the data.
This metric is used for Tabular Regression.
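A minimal sketch, using hypothetical regression targets:

```python
# MSE: mean of squared prediction errors; squaring penalizes large
# errors more heavily than small ones.
# Hypothetical regression targets, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # (0.25 + 0.0 + 0.25 + 0.0) / 4 = 0.125
```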
Root Mean Squared Error
The Root Mean Squared Error (RMSE) is the square root of the MSE, which brings the error back to the same units as the target while still penalizing large errors more heavily. A lower RMSE indicates a better fit to the data.
This metric is used for Tabular Regression.
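A minimal sketch, using hypothetical regression targets:

```python
import math

# RMSE: square root of the MSE, bringing the error back to the
# target's own units.
# Hypothetical regression targets, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)
print(rmse)  # sqrt(0.125), about 0.354
```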