Prediction Text Quality Monitoring

After training a model, you will want to use it to predict on new data. Good results depend not only on quality training data but also on quality prediction data. If the two differ substantially, the model may struggle to produce quality predictions. Inspecting the prediction data on its own is not sufficient, since what really matters is its similarity to the training data. To make this check easier and automatic, we introduced Prediction Text Quality Monitoring.

To determine the text quality of the prediction data, several text metrics are calculated for both the training data and the prediction data. Only these metrics are kept; no actual data content is saved. The prediction data is compared to the training data on the following text statistics:

Language
Text length
Alphabetical percentage
Numerical percentage
Symbol percentage
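
As an illustration of what such statistics look like, the following is a minimal Python sketch, not the product's actual implementation. The names TextStats, text_stats and summarize are hypothetical, and the choice to exclude whitespace from the percentages is an assumption made for this example. Language detection is left out because it requires a separate language identification model. The sketch also shows why no data content needs to be saved: only the numbers are kept.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class TextStats:
    """Hypothetical container for the per-text statistics."""
    length: int               # number of characters, whitespace included
    alphabetical_pct: float   # share of letters among non-whitespace characters
    numerical_pct: float      # share of digits among non-whitespace characters
    symbol_pct: float         # share of everything else (punctuation, emoji, ...)


def text_stats(text: str) -> TextStats:
    # Assumption: whitespace is ignored when computing the percentages.
    chars = [c for c in text if not c.isspace()]
    total = len(chars)
    if total == 0:
        return TextStats(len(text), 0.0, 0.0, 0.0)
    alpha = sum(c.isalpha() for c in chars)
    digits = sum(c.isdigit() for c in chars)
    symbols = total - alpha - digits
    return TextStats(
        length=len(text),
        alphabetical_pct=100.0 * alpha / total,
        numerical_pct=100.0 * digits / total,
        symbol_pct=100.0 * symbols / total,
    )


def summarize(texts: list[str]) -> dict[str, float]:
    """Average each statistic over a data set; the texts themselves are not kept."""
    if not texts:
        raise ValueError("texts must not be empty")
    stats = [text_stats(t) for t in texts]
    return {
        "length": fmean(s.length for s in stats),
        "alphabetical_pct": fmean(s.alphabetical_pct for s in stats),
        "numerical_pct": fmean(s.numerical_pct for s in stats),
        "symbol_pct": fmean(s.symbol_pct for s in stats),
    }
```

Calling summarize once on the training texts and once on the prediction texts gives two small summaries that can be compared per statistic. How the product turns such a comparison into a text quality score is not specified here.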

Prediction Text Quality Monitoring is enabled by default. To turn it off, untick the Calculate text quality checkbox on the Prediction tab of a model before submitting a data prediction.

The results of Prediction Text Quality Monitoring can be found on the Monitoring tab of a model, which shows the text quality scores for the selected time range. To inspect the individual text statistics, press the Show breakdown button for a specific label or for the overall time range. This shows one gauge per text statistic.