Data content requirements
Firenze supports several techniques for training models, and each technique requires a different amount of data. More data samples generally lead to better model performance. Data quality also directly affects how well a model performs, so it is important to balance quantity and quality.
Tabular classification, text classification, image classification and relationship extraction
Training a model: a minimum of 25 samples per class is required; a minimum of 100 samples per class is advised.
Evaluating a model: a minimum of 10 samples per class is required; a minimum of 50 samples per class is advised.
In addition, if the most frequent class occurs 20% more often than the least frequent class, a warning is shown to indicate that the data is unbalanced.
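The per-class limits and the balance check can be verified on a dataset before uploading it. The following is a minimal sketch, not Firenze's own validation code: the function name check_class_counts and the LIMITS table are hypothetical, and the balance rule assumes that "20% more" means the largest class count exceeds 1.2 times the smallest class count.

```python
from collections import Counter

# Thresholds taken from the requirements stated above.
LIMITS = {
    "train": {"required": 25, "advised": 100},
    "evaluate": {"required": 10, "advised": 50},
}

# Assumption: "20% more" is read as most > 1.2 * least.
IMBALANCE_RATIO = 1.2


def check_class_counts(labels, purpose="train"):
    """Return (errors, warnings) for a list of class labels.

    Hypothetical helper for illustration; not part of Firenze.
    """
    limits = LIMITS[purpose]
    counts = Counter(labels)
    errors, warnings = [], []

    # Compare each class count against the required and advised minimums.
    for cls, n in sorted(counts.items(), key=lambda item: str(item[0])):
        if n < limits["required"]:
            errors.append(
                f"{cls}: {n} samples, at least {limits['required']} required"
            )
        elif n < limits["advised"]:
            warnings.append(
                f"{cls}: {n} samples, at least {limits['advised']} advised"
            )

    # Flag unbalanced data based on the largest and smallest class.
    if counts:
        most = max(counts.values())
        least = min(counts.values())
        if most > IMBALANCE_RATIO * least:
            warnings.append(
                f"data is unbalanced: largest class has {most} samples, "
                f"smallest has {least}"
            )

    return errors, warnings


if __name__ == "__main__":
    labels = ["cat"] * 120 + ["dog"] * 30 + ["bird"] * 20
    errors, warnings = check_class_counts(labels, purpose="train")
    for message in errors:
        print("ERROR:", message)
    for message in warnings:
        print("WARNING:", message)
```

In this example, "bird" falls below the required minimum for training, "dog" falls below the advised minimum, and the dataset as a whole triggers the balance warning.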
Tabular regression
Training a model: a minimum of 100 rows is required; a minimum of 250 rows is advised.
Evaluating a model: a minimum of 10 rows is required; a minimum of 50 rows is advised.
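The row limits for tabular regression can be checked the same way before a dataset is submitted. This is an illustrative sketch only: the function name check_regression_rows and the returned status strings are hypothetical and do not reflect how Firenze reports these conditions.

```python
# Thresholds taken from the tabular regression requirements stated above.
ROW_LIMITS = {
    "train": {"required": 100, "advised": 250},
    "evaluate": {"required": 10, "advised": 50},
}


def check_regression_rows(n_rows, purpose="train"):
    """Classify a row count as "error", "warning" or "ok".

    Hypothetical helper for illustration; not part of Firenze.
    """
    limits = ROW_LIMITS[purpose]
    if n_rows < limits["required"]:
        return "error"    # below the required minimum
    if n_rows < limits["advised"]:
        return "warning"  # usable, but below the advised minimum
    return "ok"


if __name__ == "__main__":
    for rows in (80, 150, 300):
        print(rows, "rows for training:", check_regression_rows(rows, "train"))
```

A dataset of 150 rows, for example, is enough to train a regression model but stays below the advised 250 rows.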