
Uploading data to a NER Project

Accessing the upload screen

Once a Project has been created and opened, Upload can be accessed using the sidebar.

[Screenshot: sidebar upload button]

Currently supported data formats

  • Plain text
  • JSONL
  • IOB-style data
    • IOB(1)
    • IOB2
    • IOBES
    • CONLL2003
    • BILOU
  • TSV

Uploading data

To upload data, either click the "drag and drop files here" section or drag and drop files onto it.

[Screenshot: upload UI drop zone]

When uploading records in plain text, the content can also be added directly as a record: paste it into the section highlighted below.

[Screenshot: upload UI plain text input]

note

Duplicate file uploads are checked to prevent inconsistencies.

Form fields

Data format

The format of the uploaded data. See the list of currently supported data formats above.

Upload name

The name of the current upload.

note

An upload name is generated automatically if one is not provided.

Tags

Tags that can be used to identify the upload

tip

Tagging each set of uploaded records appropriately helps identify which data sources caused a change in results when comparing training runs.

Is the data in Acharya format?

Select this option if the data being uploaded is in the default Acharya JSONL format (see the default JSON map configuration below).

Text (txt)

This is a plain text upload. All uploaded content is treated as record data that is not associated with a particular data format.

IOB style data

The IOB, IOB2, IOBES, CONLL2003 and BILOU formats are supported.
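
To illustrate what IOB-style tagging encodes, the following Python sketch converts one IOB2-tagged sentence into character spans of the form (start, end, label), with an exclusive end index. The helper `iob2_to_spans` is hypothetical and not part of Acharya, and joining tokens with a single space is an assumption made for this example only; it does not describe how Acharya parses uploads.

```python
def iob2_to_spans(tokens, tags):
    """Convert IOB2 tags to (start, end, label) character spans.

    Hypothetical helper: offsets refer to the tokens joined by single
    spaces (an assumption), and the end index is exclusive.
    """
    spans = []
    offset = 0
    current = None  # [start, end, label] of the entity being built
    for token, tag in zip(tokens, tags):
        start, end = offset, offset + len(token)
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            if current:
                spans.append(tuple(current))
            current = [start, end, tag[2:]]
        elif tag.startswith("I-") and current and current[2] == tag[2:]:
            current[1] = end  # extend the open entity
        else:
            # "O" (or an I- tag with no matching open entity) closes it.
            if current:
                spans.append(tuple(current))
            current = None
        offset = end + 1  # skip the joining space
    if current:
        spans.append(tuple(current))
    return " ".join(tokens), spans


print(iob2_to_spans(["Welcome", "to", "Acharya"], ["O", "O", "B-Name"]))
# ('Welcome to Acharya', [(11, 18, 'Name')])
```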

JSONL

JSON Lines is supported, where each line contains one JSON record. A JSON map describes the records using the following keys.
The value of each JSON Key is the name of the corresponding property in the record (fields marked * are required).

JSON Key       Type                         Description
Data *         string                       The actual training data
EntityLabels   [][number, number, string]   List of entity labels, each with start index, end index and label
Key            string                       The record key
Completed      number                       Record status: pending = 0, train = 1, test = 2
Prev           string                       The previous record's key
Next           string                       The next record's key

note

For EntityLabels the end index is exclusive

For example, consider the following JSONL records:

{"details":"Welcome to Acharya","entities":[[11,18,"Name"]]}
{"details":"Acharya is a data centric MLOps tool","entities":[[0,7,"Name"],[26,31,"Operation"]]}

The corresponding JSON map is:

{
  "Data": "details",
  "EntityLabels": "entities"
}
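
As an illustration of how the JSON map and the exclusive end index fit together, the following Python sketch looks up the mapped properties of one record and slices out each labelled span. The helper `extract_spans` is hypothetical and not part of Acharya; it only mirrors the convention described above.

```python
import json

# JSON map from the example above: field name -> property name in the record.
json_map = {"Data": "details", "EntityLabels": "entities"}


def extract_spans(line, json_map):
    """Return (label, text) pairs for one JSONL line (hypothetical helper)."""
    record = json.loads(line)
    text = record[json_map["Data"]]
    spans = []
    for start, end, label in record.get(json_map["EntityLabels"], []):
        # The end index is exclusive, so a plain Python slice matches it.
        spans.append((label, text[start:end]))
    return spans


line = (
    '{"details":"Acharya is a data centric MLOps tool",'
    '"entities":[[0,7,"Name"],[26,31,"Operation"]]}'
)
print(extract_spans(line, json_map))
# [('Name', 'Acharya'), ('Operation', 'MLOps')]
```
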
info

Fields that are not provided in the JSON map fall back to their default values.

Default JSON map configuration
{
  "Data": "data",
  "EntityLabels": "meta_data",
  "Key": "key",
  "Completed": "completed",
  "Prev": "prev",
  "Next": "next"
}
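
The fallback to default values can be pictured as a dictionary merge. The Python sketch below assumes that a partial JSON map simply overrides the matching entries of the default configuration; `effective_json_map` is a hypothetical name used for illustration, not an Acharya function.

```python
# Default JSON map configuration as listed above.
DEFAULT_JSON_MAP = {
    "Data": "data",
    "EntityLabels": "meta_data",
    "Key": "key",
    "Completed": "completed",
    "Prev": "prev",
    "Next": "next",
}


def effective_json_map(user_map):
    """Fill every field missing from user_map with its default value."""
    # Later entries win, so user-provided values override the defaults.
    return {**DEFAULT_JSON_MAP, **user_map}


merged = effective_json_map({"Data": "details", "EntityLabels": "entities"})
print(merged["Data"], merged["EntityLabels"], merged["Key"])
# details entities key
```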

Mark all the records in this upload

There are three options:

  • As Pending
  • For Test/Evaluation
  • For Training

As Pending

As Pending marks all records in the uploaded data as pending (i.e., awaiting action). Records marked as pending are not part of any training or evaluation.

For Test/Evaluation

For Test/Evaluation marks all records in the uploaded data for testing or evaluation only; they are not used for training.

For Training

For Training marks all records in the uploaded data for use in training.

tip

It is recommended to test your files before uploading them.
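
One way to test a JSONL file before upload is sketched below in Python. It checks that each line parses as a JSON object, that the required data field is a string, that entity label offsets fit inside the text, and that the completed value is 0, 1 or 2. The function `check_jsonl` and its checks are assumptions made for illustration, using the default JSON map property names; they are not Acharya's own validation.

```python
import json

VALID_COMPLETED = {0, 1, 2}  # pending, train, test


def check_jsonl(lines, data_key="data", labels_key="meta_data",
                completed_key="completed"):
    """Return a list of (line_number, problem); empty means nothing was found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "record is not a JSON object"))
            continue
        text = record.get(data_key)
        if not isinstance(text, str):
            problems.append((number, f"missing or non-string '{data_key}'"))
            continue
        labels = record.get(labels_key, [])
        if not isinstance(labels, list):
            problems.append((number, f"'{labels_key}' is not a list"))
            labels = []
        for label in labels:
            # Each label is [start, end, name] with an exclusive end index.
            ok = (isinstance(label, list) and len(label) == 3
                  and isinstance(label[0], int) and isinstance(label[1], int)
                  and 0 <= label[0] < label[1] <= len(text))
            if not ok:
                problems.append((number, f"bad entity label: {label}"))
        if completed_key in record and record[completed_key] not in VALID_COMPLETED:
            problems.append((number, f"'{completed_key}' must be 0, 1 or 2"))
    return problems


sample = [
    '{"data": "Welcome to Acharya", "meta_data": [[11, 18, "Name"]], "completed": 1}',
    '{"data": "Welcome to Acharya", "meta_data": [[10, 20, "Name"]]}',
    '{"text": "missing data key"}',
]
for number, problem in check_jsonl(sample):
    print(f"line {number}: {problem}")
```

To check a file on disk, pass the open file object instead of a list, for example `check_jsonl(open("records.jsonl", encoding="utf-8"))`, where the file name is a placeholder.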