Upload data
Data uploading
Selecting the data upload button opens a side panel where information about the dataset can be entered. The required fields are:
- Title: the name of the dataset. If a file is uploaded first, this field is filled in automatically with the name of the file.
- Language(s): the languages that are present in the dataset.
- Technique: the technique for which the dataset will be used.
- License(s): the licenses that are associated with the dataset.
- Description: a description of the content and purpose of the dataset. A detailed description is recommended.
- File: the file that contains the dataset. Firenze supports CSV, TSV, XLSX, and JSONL files. Depending on the technique, columns for prediction and training may also need to be selected.
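The required fields and supported formats listed above can be checked locally before opening the upload panel. The following is a minimal sketch in Python, not part of Firenze itself: the field keys, the function names, and the assumption that the auto-filled title drops the file extension are all hypothetical.

```python
from pathlib import Path

# File formats this guide lists as supported by Firenze.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".xlsx", ".jsonl"}

# Required fields of the upload panel (these key names are hypothetical).
REQUIRED_FIELDS = ("title", "languages", "technique", "licenses", "description", "file")


def default_title(file_path):
    """Suggest a title from the file name (assumes the extension is dropped)."""
    return Path(file_path).stem


def check_upload(metadata):
    """Return a list of problems; an empty list means nothing is missing."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if not metadata.get(name)
    ]
    file_path = metadata.get("file")
    if file_path:
        suffix = Path(file_path).suffix.lower()
        if suffix not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported file format: {suffix}")
    return problems
```

Calling check_upload with a dictionary that holds all six fields and a file name ending in .csv, .tsv, .xlsx, or .jsonl returns an empty list. Any other result names the fields or the file format that still need attention.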
Column selection
If a technique requires certain columns to be selected from the dataset, a select columns button will appear below the file upload section. This button is disabled until a dataset is added. Clicking the select columns button opens a dialog that shows a table with the first ten entries of the data file. The dropdown menus below the table are used to select the column that will be used for prediction and the columns that will be used for training. If a dataset is unannotated, only the columns for training have to be selected.
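To get a feel for what the dialog shows, the same preview can be reproduced outside Firenze for CSV or TSV data. The sketch below uses only the Python standard library. The function names are hypothetical, and the rule that at least one training column must be chosen is an assumption rather than documented Firenze behaviour.

```python
import csv
import io
from itertools import islice


def preview_rows(text, delimiter=",", limit=10):
    """Return the column names and the first `limit` data rows of delimited text."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(islice(reader, limit))
    return list(reader.fieldnames or []), rows


def check_selection(columns, training, prediction=None):
    """Return a list of problems with a column selection (empty if none).

    `prediction` stays None for an unannotated dataset, where only
    training columns are chosen.
    """
    problems = []
    if not training:
        # Assumption: a selection without training columns is not useful.
        problems.append("select at least one training column")
    chosen = list(training) + ([prediction] if prediction is not None else [])
    for name in chosen:
        if name not in columns:
            problems.append(f"unknown column: {name}")
    return problems
```

For a TSV file, pass delimiter="\t" to preview_rows. An annotated dataset would be checked with both a prediction column and training columns, an unannotated one with training columns only.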
The exclamation mark icons in the column names flag columns that contain information that might be biased (red exclamation mark) or insignificant (orange exclamation mark). If such a column is selected for training or prediction, it will also be shown in the overview table on the bottom right. The user should then carefully consider whether to include these columns for training or prediction.
Once all necessary columns have been selected, the Confirm button can be selected. The selected prediction and training columns are then shown in the overview below the file upload section in the data upload panel. Once all fields have been filled in, the Upload data button can be selected and the data will be uploaded.
If the data is not labelled: continue to Annotate data.
If the data is already labelled: continue to Create a project.