A Data-Centric MLOps suite for Named Entity Recognition

Rapidly identify and fix labelling errors in your dataset. Import and export datasets in multiple formats, train a model and use it to aid the annotation process. Set up an MLOps pipeline to experiment with different algorithms on the same data and increase their accuracy and performance in a data-centric way.

Acharya Community Edition is completely free and needs no installation. It does not “call home”: there is no telemetry or tracking, and no data or results are sent to us. Download and run the binary suitable for your system.

Latest release: v0.3.4-alpha

How Acharya helps

Data Scientists

Gain insights into your training and test data and the distribution of annotated entities, and decide how to curate your data for better accuracy


Analyze your data annotations, identify missing, incorrect & multi-classifications, and improve your dataset quality

  • Update training and test data based on training results, and experiment with selected records for training.
  • Edit data content, upload custom data and export annotated data in multiple data formats.
  • Improve your dataset with insights from training: drill down into record-level and entity-level accuracy for each algorithm trained, then modify and curate your data accordingly.

Annotators

Speed up and improve your annotation process with Acharya's helpful features, all from a single window


Features for annotators include

  • Annotation suggestions based on previous annotations
  • Show previous classifications of a word across different records
  • Suggest annotations based on training runs, even if the training is done on partially annotated data
  • Ability to configure third-party custom dictionaries (e.g. a medical terminology dictionary) to help with similar words.

Developers

Experiment with multiple algorithms using different libraries, regardless of whether they run on GPU or CPU, on-prem or in the cloud.


Algorithms are easy to scale and replicate via configuration files, and they run in Docker containers

  • Connect your model's Git repo and train it. Training straight from a development branch makes experimenting easy during the dev cycle.
  • Train a custom word-embedding for the domain specific data
  • Cache outputs of training and word-embeddings resulting in both time and cost savings
  • Train as per your convenience - locally, on-prem, CPU or GPU or on a remote Docker system
  • Write code independent of data format. Acharya converts the training data to the supported format you specify before the data is sent for training.
  • Compare two different training runs for code as well as data changes, resulting in faster diagnosis of drop in accuracy

Project Managers

Create multiple projects and track their progress independently.

Easily experiment with a new model/algorithm by training it in Acharya and comparing its performance with the other models in your project.

  • Train multiple algorithms and compare them
  • Track progress of data annotation
  • Train a custom word-embedding for a domain-specific application
  • Analyze over-fitting and under-fitting per entity based on training output, and make informed decisions based on the data
  • Tag and upload data from production, then annotate that data to get the actual accuracy of the model in production
  • Compare the model in production with a model freshly developed and trained
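As a rough illustration of what a per-entity over-fitting check can look like, the sketch below computes an F1 score per entity label on training and test data and reports the gap between them. It is not Acharya's implementation: the function names `per_entity_f1` and `fit_gaps`, the `(record_id, start, end, label)` span format and the sample spans are all assumptions made for this example.

```python
def per_entity_f1(gold, predicted):
    """Exact-match F1 per label.

    gold, predicted: sets of (record_id, start, end, label) tuples
    (a hypothetical span format chosen for this sketch).
    """
    labels = {span[3] for span in gold | predicted}
    scores = {}
    for label in labels:
        gold_spans = {s for s in gold if s[3] == label}
        pred_spans = {s for s in predicted if s[3] == label}
        true_pos = len(gold_spans & pred_spans)
        precision = true_pos / len(pred_spans) if pred_spans else 0.0
        recall = true_pos / len(gold_spans) if gold_spans else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def fit_gaps(train_scores, test_scores):
    """Train F1 minus test F1 per label; a large positive gap hints at over-fitting."""
    return {
        label: train_scores[label] - test_scores.get(label, 0.0)
        for label in train_scores
    }


# Made-up spans: the model reproduces the training annotations exactly
# but mislabels the LOC span in the test record as PER.
gold_train = {(1, 0, 5, "PER"), (1, 10, 15, "LOC")}
pred_train = {(1, 0, 5, "PER"), (1, 10, 15, "LOC")}
gold_test = {(2, 0, 4, "PER"), (2, 8, 12, "LOC")}
pred_test = {(2, 0, 4, "PER"), (2, 8, 12, "PER")}

gaps = fit_gaps(
    per_entity_f1(gold_train, pred_train),
    per_entity_f1(gold_test, pred_test),
)
```

A label with a low score on both training and test data would instead point towards under-fitting or too few annotated examples for that entity.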

Features

Data-Centric dashboard

Identify and fix data & data-classification issues, and perform drill-down analytics on the dataset. Gain insights about data classified across multiple categories/classes, missed classifications and anomalies in classifications.
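One of the checks described above, finding text that has been classified under more than one class, can be sketched in a few lines. This is only an illustration under assumed conventions, not Acharya's code: the record layout (`text` plus `entities` as `(start, end, label)` offsets), the function name `find_multi_classified` and the sample records are hypothetical.

```python
from collections import defaultdict


def find_multi_classified(records):
    """Return annotated surface forms that carry more than one label.

    records: list of dicts with "text" and "entities", where each entity
    is a (start, end, label) character span (an assumed layout).
    """
    labels_by_form = defaultdict(set)
    for record in records:
        text = record["text"]
        for start, end, label in record["entities"]:
            # Case-fold so "Paris" and "paris" are treated as the same form.
            labels_by_form[text[start:end].lower()].add(label)
    return {
        form: sorted(labels)
        for form, labels in labels_by_form.items()
        if len(labels) > 1
    }


# Made-up records: "Paris" is tagged LOC in one record and PER in another.
records = [
    {
        "text": "Acme opened an office in Paris.",
        "entities": [(0, 4, "ORG"), (25, 30, "LOC")],
    },
    {
        "text": "Paris is lovely in spring.",
        "entities": [(0, 5, "PER")],
    },
]

conflicts = find_multi_classified(records)
```

A flagged form is not always an error, since some words legitimately belong to several classes, so results like these are best treated as candidates for review.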

Advanced Workbench

Workbench provides useful features including annotating text, entity renaming across records, editing content in-place, tag suggestions, auto-labeling suggestions, previous classifications and an ability to add a custom dictionary.

In-built data versioning

Acharya versions your data by default, which helps make training reproducible. Acharya can also compare a training run with a later version of the same model, with a model that uses a different algorithm and even with a model deployed in production.

Train, Test, Compare, Repeat

Train and compare models built with different algorithms on the same dataset. Acharya versions every training run and helps you compare versions of both training and data. Comparing models and data at the record level significantly increases your productivity.

Auto labeling suggestions

Quickly train a few-shot classification model and get labeling suggestions based on the trained algorithms. This assists annotators, boosts their performance and speeds up annotation of new data.

Support for multiple data formats

Import data in CoNLL-2003, IOB (IOB1/2, BILOU, IOBES), JSONL and txt formats, using the UI, API, command line or Singer Taps. Export annotated data to multiple data formats via the command line.
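To show how the tagging schemes listed above relate to each other, here is a minimal sketch that converts a sentence's IOB2 tags to BILOU. It illustrates the schemes only and is not Acharya's converter; the function name `iob2_to_bilou` is hypothetical, and the input is assumed to be well-formed IOB2 (every entity starts with a `B-` tag).

```python
def iob2_to_bilou(tags):
    """Convert one sentence's IOB2 tags to BILOU.

    B = begin, I = inside, L = last, O = outside, U = unit (single token).
    Assumes well-formed IOB2 input.
    """
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bilou.append(("B-" if continues else "U-") + label)
        else:  # prefix == "I"
            bilou.append(("I-" if continues else "L-") + label)
    return bilou


iob2 = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-ORG", "I-ORG", "I-ORG"]
converted = iob2_to_bilou(iob2)
```

In this sample, the single-token `B-LOC` becomes `U-LOC` and the final token of each multi-token entity becomes an `L-` tag.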

Product Tour

Community & Support

For questions, best practices and brainstorming, join us on our Discord.

If Acharya is providing value to your MLOps project, star us on GitHub. Need a feature or a customized report?