🏋🏽 Introducing Argilla Trainer

April 17, 2023

Natalia Elvira Astoreca, Tom Aarsen, David Berenstein

We are thrilled to introduce another exciting new feature included in Argilla v1.6.0: Argilla Trainer, a wrapper that simplifies training workflows using your Argilla datasets. It currently supports training for Text Classification and Token Classification tasks with popular NLP libraries such as spaCy, SetFit and transformers. Look out for support for more libraries and tasks in upcoming releases!

The Argilla Trainer takes care of all the data transformations needed to train models using Argilla datasets and offers a set of default configurations so that you can move directly from annotations to training. What’s more, you can access ready-made code directly from the Argilla UI! 🚀
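To give a feel for the kind of data transformation involved, here is a minimal, library-free sketch. It is not Argilla's implementation: the function name `prepare_text_classification_data`, the record shape (`text` / `annotation` keys), the default `train_size` and the sample verses are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape used only in this sketch: {"text": ..., "annotation": ...}
Record = Dict[str, Optional[str]]


def prepare_text_classification_data(
    records: List[Record], train_size: float = 0.8, seed: int = 42
) -> Tuple[List[dict], List[dict], Dict[str, int]]:
    """Turn annotated records into integer-labelled train/test rows."""
    # Keep only the records that carry an annotation.
    annotated = [r for r in records if r.get("annotation") is not None]
    # Map label names to integer ids; sorting keeps the mapping reproducible.
    label_names = sorted({r["annotation"] for r in annotated})
    label2id = {label: i for i, label in enumerate(label_names)}
    rows = [{"text": r["text"], "label": label2id[r["annotation"]]} for r in annotated]
    # Shuffle with a fixed seed, then split into train and test parts.
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_size)
    return rows[:cut], rows[cut:], label2id


# Made-up example records (not taken from any real dataset).
records = [
    {"text": "the morning sings in gold", "annotation": "positive"},
    {"text": "and sorrow fills the hall", "annotation": "negative"},
    {"text": "sweet is the summer rain", "annotation": "positive"},
    {"text": "a shadow on the stair", "annotation": None},  # not annotated yet
    {"text": "my heart is heavy still", "annotation": "negative"},
]

train_rows, test_rows, label2id = prepare_text_classification_data(records)
print(label2id)                         # {'negative': 0, 'positive': 1}
print(len(train_rows), len(test_rows))  # 3 1
```

The point of the wrapper is that you do not have to write this kind of glue code yourself for each framework.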

Here is an example of how to use Argilla Trainer to train a text classification model with SetFit. First, we log our dataset in Argilla:

    import argilla as rg
    from argilla.training import ArgillaTrainer
    from datasets import load_dataset

    # change these variables to connect to your Argilla instance
    rg.init(
        api_url='YOUR_ARGILLA_URL',
        api_key='YOUR_ARGILLA_API_KEY'
    )

    # log the dataset
    rg_dataset = rg.DatasetForTextClassification.from_datasets(
        dataset=load_dataset("poem_sentiment", split="train"),
        text="verse_text",
        annotation="label",
    )
    rg.log(rg_dataset, "train_poem_sentiment")

Now you can open Argilla and start annotating your dataset. Luckily, the poem_sentiment dataset is already annotated, so we can jump straight to training! 😃

To make this step even easier, admin users can copy ready-made code snippets directly from the Argilla UI. (If you haven’t heard about the new user roles in Argilla, check this out!) Just open your dataset, click the </> Train button and select your preferred framework. There you will find a code snippet with all the variables tailored to that dataset and the selected framework.
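For orientation, a snippet of this kind might look roughly like the sketch below. It is not copied from the UI, and the exact signatures may differ between Argilla versions. The wrapper function `train_poem_sentiment_classifier`, the `train_size` and `num_iterations` values and the output directory name are assumptions made for illustration. It also assumes that `rg.init(...)` has already been called and that the dataset was logged as `train_poem_sentiment`.

```python
def train_poem_sentiment_classifier(output_dir: str = "poem_sentiment_setfit"):
    """Train a SetFit classifier on the logged dataset (illustrative sketch)."""
    # Imported lazily so this file can be loaded without argilla installed.
    from argilla.training import ArgillaTrainer

    trainer = ArgillaTrainer(
        name="train_poem_sentiment",  # dataset name used when logging
        framework="setfit",
        train_size=0.8,               # assumed train/evaluation split
    )
    # Assumed SetFit setting; tune it for your own data.
    trainer.update_config(num_iterations=10)
    # Output location passed positionally, since the keyword name is an
    # assumption that may vary across versions.
    trainer.train(output_dir)
    return trainer


# Usage (requires a running Argilla instance and a prior rg.init call):
# trainer = train_poem_sentiment_classifier()
```

In practice, prefer the snippet generated by the UI, since it is filled in with the values that match your dataset.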

If you paste the first code snippet and run it, your model will start training. When it’s done, you should see something similar to this:

    ***** Running training *****
      Num examples = 14260
      Num epochs = 1
      Total optimization steps = 892
      Total train batch size = 16
    Iteration: 100%|██████████| 892/892 [16:37<00:00,  1.12s/it]
    Epoch: 100%|██████████| 1/1 [16:37<00:00, 997.27s/it]
    Applying column mapping to evaluation dataset
    ***** Running evaluation *****
    Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 1.45MB/s]
    [04/10/23 11:04:10] INFO     INFO:ArgillaTransformersTrainer:{'accuracy': 0.7653631284916201}
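As a quick sanity check, the numbers in this log are consistent with each other. The sketch below recomputes the step count and the time per step from the reported values. The variable names are ours, and the only assumption is that one optimization step consumes one full batch.

```python
import math

# Values copied from the training log.
num_examples = 14260
batch_size = 16
reported_steps = 892
elapsed_seconds = 16 * 60 + 37  # the progress bar shows 16:37

# One epoch needs enough steps to cover every example once.
steps_per_epoch = math.ceil(num_examples / batch_size)
# Average wall-clock time spent on each optimization step.
seconds_per_step = elapsed_seconds / reported_steps

print(steps_per_epoch)             # 892
print(round(seconds_per_step, 2))  # 1.12
```

If your own run reports a very different time per step, the hardware (CPU versus GPU) is the first thing to check.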

Et voilà! Your model is ready to start predicting! 🔮

    test = trainer.predict("How beautiful can thy be, oh ArgillaTrainer.")
    test.prediction
    # output:
    # [('positive', 1.0)]
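The prediction shown above is a list of (label, score) pairs. Assuming that shape holds in general, a small helper such as the hypothetical `top_label` below picks the most likely label. The second call uses made-up scores purely for illustration.

```python
from typing import List, Tuple


def top_label(prediction: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    if not prediction:
        raise ValueError("prediction is empty")
    return max(prediction, key=lambda pair: pair[1])


# The output shown in the post.
label, score = top_label([("positive", 1.0)])
print(label, score)  # positive 1.0

# Made-up scores, to show the helper with several candidate labels.
best = top_label([("negative", 0.1), ("positive", 0.7), ("mixed", 0.2)])
print(best)  # ('positive', 0.7)
```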

If you want to learn more about training models using Argilla datasets, check out our docs.