
5 posts tagged with "Acharya"


· 3 min read
Vimal Menon

Continuing the tour of data-centric features in Acharya that we began in Part 1, in this post we will explore more features that can increase your efficiency as an annotator and potentially improve your ML models.

Let's say you have trained your configured algorithm on your project dataset three times. Please refer to the Acharya documentation for details on how to configure an algorithm and start a training run on your project dataset.

Training details

In the screenshot above, the left-side pane shows the three trainings. Trn3 is selected, which expands its details in the right-side pane. You can see the algorithm, the time it took, its status, and its score.

Training algorithm details

As shown in the screenshot above, you can view the details of the training run by expanding the highlighted arrow (red square box). In the Algo Details tab, you can see the Precision, Recall and F1 score. Do note the Git commit ID: for an MLOps tool, we believe Git support is a key requirement, and like engineering code, ML code should be version controlled in a mature MLOps implementation.
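As a refresher on how those three numbers relate, here is a minimal sketch (not Acharya's actual implementation) of computing precision, recall, and F1 from true-positive, false-positive, and false-negative counts:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    # Precision: of all predicted entities, how many were correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of all gold entities, how many the model found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(p, r, f1)
```

A high precision with low recall means the model misses entities; the reverse means it over-predicts. F1 balances the two, which is why it is shown as the headline score.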

Switching to the Reports tab, you can drill down further into the entity-level performance of Trn3.

Training Entity scores

You can also see the Per Record scores by expanding the 'Per Record scores' section.

Training Per record scores

This helps you identify how well the trained model performs on the individual records marked for evaluation. In this view, you should also peruse columns like Precision, Recall, False Negative, True Positive, etc. to find annotation errors and fix those annotations in the evaluation records. This can have a direct impact on the score: we have seen incorrectly annotated data being skipped by the model, and when we fixed an annotation that this view highlighted, the model classified the validation data appropriately, increasing its score. Likewise, you can go back to see all the trainings, choose two runs to compare their details, and even see the best-performing training run to identify a potential production candidate. This comparison shows you a diff of the algorithm code as well as the data (please note: in Acharya Community Edition, this comparison has to be executed using the command-line interface).

Reviewing all the Training Runs in Acharya

The ease with which training experiments can be executed is a key feature of Acharya. Combined with the other data-centric features discussed in Part 1, the insights into how the trained model performed on individual evaluation records make it much simpler to tune the model by tweaking the data. This improves the efficiency of your NER model development process.

· 4 min read
Vimal Menon

A key approach within data-centric AI is to use the available data to gain a better understanding of what is shaping the model, and then let the model guide you towards anomalies in the data, which you then use to reconfigure your datasets, or to include or remove certain data and see the impact on model behavior.

So you need a mechanism that lets you not only understand the data better, but also quickly see the impact that changes to the data have.

As soon as you upload your data, the Acharya Dashboard provides immediate feedback in the form of the following reports. These are based purely on your data, without running your models yet.

Use entity distribution to determine your entity bias

Entity distribution shows how many annotations belong to each entity. If the number of annotations for a particular entity is noticeably lower or higher than for other entities, you can tell that the dataset is biased against or towards that entity. This view therefore gives an insight into what kind of new data should be sourced into the project to balance the entity distribution.

Entity distribution in Acharya
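Conceptually, this report is a count of annotations per entity label. A minimal sketch, using a hypothetical list of (word, entity) pairs rather than Acharya's actual data format:

```python
from collections import Counter

# Hypothetical annotations: (word, entity label) pairs.
annotations = [
    ("NEW YORK", "LOC"), ("ACME", "ORG"), ("NEW YORK", "LOC"),
    ("Vimal", "PER"), ("ACME", "ORG"), ("London", "LOC"),
]

# Count how many annotations belong to each entity.
entity_distribution = Counter(label for _, label in annotations)
print(entity_distribution)  # Counter({'LOC': 3, 'ORG': 2, 'PER': 1})
```

A heavily skewed counter here is the signal to source more examples of the under-represented entities.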

Classifications

This table lists the words classified against each entity. It helps you identify words from your text corpus that are classified as entities. If a word is classified as more than one entity, it is important to verify such annotations and confirm their validity. The table also shows the count of occurrences of each annotated word in the dataset. In the screenshot below, the word NEW YORK is classified as LOC in 102 records and as ORG in 41 records. Clicking on NEW YORK expands the list and lets you navigate to the specific record, where you can review the text to confirm whether the classification is valid.

Entity Classifications view in Acharya
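The underlying idea can be sketched as a per-word tally of entity labels, flagging any word that carries more than one label. The data below mirrors the NEW YORK example above but is otherwise made up, and this is not Acharya's actual implementation:

```python
from collections import Counter, defaultdict

# Hypothetical (word, entity) annotations.
annotations = (
    [("NEW YORK", "LOC")] * 102 + [("NEW YORK", "ORG")] * 41 + [("ACME", "ORG")] * 7
)

# Tally entity labels per annotated word.
counts: dict[str, Counter] = defaultdict(Counter)
for word, label in annotations:
    counts[word][label] += 1

# Words annotated with more than one entity deserve a manual review.
ambiguous = {word: dict(c) for word, c in counts.items() if len(c) > 1}
print(ambiguous)  # {'NEW YORK': {'LOC': 102, 'ORG': 41}}
```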

Missed Classifications

While annotating, you might have missed classifying a word, or classified it wrongly. You might also have marked a word once but missed it at other locations simply because of time constraints. It is to identify such misses that we built Missed Classifications: a very helpful view that displays words that have been classified once in the dataset but missed at other locations.

Missed Entity Classifications view in Acharya

Sorting on Missed Records lets you find annotations that the annotator might have missed, and sorting on Classification Counts lets you find annotations that might have been wrongly annotated.
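The check behind this view amounts to: for every word annotated at least once anywhere, find records whose text contains that word without an annotation. A minimal sketch over a hypothetical record structure (not Acharya's):

```python
# Hypothetical records: raw text plus the set of words annotated in that record.
records = [
    {"text": "NEW YORK is busy", "annotated": {"NEW YORK"}},
    {"text": "I flew to NEW YORK", "annotated": set()},   # NEW YORK missed here
    {"text": "ACME hired ten people", "annotated": {"ACME"}},
]

# Every word annotated at least once anywhere in the dataset.
known = set().union(*(r["annotated"] for r in records))

# (word, record index) pairs where a known word appears unannotated.
missed = [
    (word, i)
    for i, r in enumerate(records)
    for word in known
    if word in r["text"] and word not in r["annotated"]
]
print(missed)  # [('NEW YORK', 1)]
```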

Anomalies

Anomalies highlight annotations where Acharya suspects a mistake. As seen in the screenshot below, the annotator has included unnecessary symbols in the word. Such mistakes often creep in when classifying multiple records or when using a classification service.

Like with the other views, Acharya makes it easy to jump to the record in question and correct the annotation in the underlying data.

Anomalies view in Acharya
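One simple heuristic for this kind of anomaly is flagging annotated spans that begin or end with a symbol. The sketch below illustrates the idea with a regex; Acharya's actual checks may differ:

```python
import re

# Hypothetical annotated spans.
spans = ["NEW YORK", "(ACME", "London,", "Vimal Menon"]

# Flag spans that start or end with a non-word character (stray punctuation).
anomalies = [s for s in spans if re.search(r"^\W|\W$", s)]
print(anomalies)  # ['(ACME', 'London,']
```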

Unclassified words

This is another feature we felt is important as you review your data. This simple table helps identify unclassified words in the dataset. Instead of browsing the entire dataset, you can sort these words by their length and number of occurrences.

Unclassified Words view in Acharya
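In essence, the table counts every word that carries no annotation and ranks it. A minimal sketch over a hypothetical token stream, ranking by occurrence count and then word length (not Acharya's implementation):

```python
from collections import Counter

# Hypothetical token stream and the set of words already annotated.
tokens = "the quick fox visited NEW YORK and the quick fox returned".split()
annotated = {"NEW", "YORK"}

# Count occurrences of every unannotated word.
unclassified = Counter(t for t in tokens if t not in annotated)

# Rank by occurrence count, then by word length, both descending.
ranked = sorted(unclassified.items(), key=lambda kv: (kv[1], len(kv[0])), reverse=True)
print(ranked)
```

Frequent, long unclassified words near the top of such a ranking are often entity candidates the annotator has not got to yet.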

We will continue our exploration of the data-centric features and reports in Acharya in Part 2. In the meantime, feel free to let us know what your favorite features are, and share your experience using Acharya. The link to Acharya is in the bio.

· One min read
Vimal Menon

I am excited to announce the alpha version of Acharya, a data-centric MLOps tool for your named entity recognition projects. Download Acharya from the home page.

Please reach out to me if you feel Acharya would be helpful in your NLP/NER projects.

A big shoutout to Nithin Stephen and Saurabh Korgaonkar for their immense hard work and dedication.

· 2 min read
Vimal Menon

To a life-long C and now Go programmer like me, CSS was a black box. Once I became a founder/developer of my own product, that had to change and I had to learn CSS. I generally dedicate my Sundays to learning and practicing it.

I was elated by Neeraj Chopra's gold medal in the javelin throw, a first for India in athletics. To honor that, I tried to build a pure CSS animation of a javelin throw. Then a brainwave hit me: what if I added this around the athlete's name as another style for tagging entities in an NER project?

The result is:

Javelin launch animation

This was achieved using only CSS animations; there is no JavaScript at all. The major challenge was to plot a bezier curve mimicking a javelin throw. Once I had the bezier curve, I had to translate it into pixels and percentages. The math for the curve was done on paper, and the translations in pixels were hardcoded into each @keyframes percentage (see https://developer.mozilla.org/en-US/docs/Web/CSS/@keyframes). The javelin was added to the ::before of the tag and rotated to various angles as the animation progressed, again using keyframes. The celebration animation is a static background SVG that grows and shrinks.
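As a rough illustration of the technique only: the real keyframe values were hand-computed from the bezier curve, so the selector name, glyph, and numbers below are all made up.

```css
/* Hypothetical sketch: a javelin glyph on the tag's ::before, moved along
   hardcoded keyframe positions approximating an arc. Not the real values. */
.ner-tag::before {
  content: "➤";
  position: absolute;
  animation: javelin-arc 2s ease-in-out forwards;
}

@keyframes javelin-arc {
  0%   { transform: translate(0, 0) rotate(-40deg); }
  50%  { transform: translate(60px, -30px) rotate(0deg); }
  100% { transform: translate(120px, 10px) rotate(35deg); }
}
```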

Although I started it as a fun task, it eventually became a bit difficult to crack. I am happy with the results and in the process learnt a lot about CSS animations 😀

Here is a closer look.

Closer look at Javelin launch animation

· 2 min read
Vimal Menon

I have spent the past two decades developing software professionally, solving some of the most intricate engineering problems across a variety of business domains. When I started re-discovering ML a few years ago to address some use cases around text and language processing (my prior experience with ML was during my engineering days nearly 20 years ago), I found that the tooling around ML, and more specifically around integrating NLP workflows into an agile engineering team's workflow, needed a significant rejig. Even though the tooling has been getting better, I still feel we have not reached the level of ease with which an engineering team can introduce rigor into their DevOps cycle. The typical developer in an agile team faces quite an uphill task if they need to do NLP, more so if they need to integrate NLP notions into their dev/test/deploy loop. The sheer number of frameworks, libraries and tools, and then the plumbing and interfacing required to get them to a usable state, is slow and re-invented every single time by every team.

Our intent behind Astutic AI is to further the state of the art in developer-focused tooling around ML, starting with some of these challenges in NLP/NER. Over the course of the next few blog posts, I will discuss these in more depth. Thank you for reading, and feel free to drop me a note.