Introducing Argilla Trainer
April 17, 2023
We are thrilled to introduce another exciting new feature included in Argilla v1.6.0: Argilla Trainer, a wrapper to facilitate training workflows using your Argilla datasets. It currently supports training for Text Classification and Token Classification tasks with popular NLP libraries like spacy, setfit and transformers. Watch out for support for more libraries and tasks in upcoming releases!
The Argilla Trainer takes care of all the data transformations needed to train models using Argilla datasets and offers a set of default configurations so that you can move directly from annotations to training. What's more, you can access ready-made code directly from the Argilla UI!
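To give a feel for the kind of data transformation involved, here is a minimal pure-Python sketch that maps string labels to integer ids and splits annotated records into train and test sets. It is not Argilla's actual implementation: the `prepare_split` helper, its `train_size` and `seed` parameters, and the sample records are all hypothetical and only illustrate the general idea.

```python
import random


def prepare_split(records, train_size=0.8, seed=42):
    """Turn (text, label) annotations into train/test rows with integer labels.

    Hypothetical helper for illustration only; not part of Argilla.
    """
    # Build a stable label -> id mapping from the annotated labels.
    labels = sorted({label for _, label in records})
    label2id = {label: idx for idx, label in enumerate(labels)}

    # Build new rows, so the caller's list is left untouched by the shuffle.
    rows = [{"text": text, "label": label2id[label]} for text, label in records]
    random.Random(seed).shuffle(rows)

    # Everything before the cut is training data, the rest is held out.
    cut = int(len(rows) * train_size)
    return rows[:cut], rows[cut:], label2id


# Made-up annotated records standing in for an Argilla dataset.
records = [
    ("a bright and happy morning", "positive"),
    ("the road runs past the mill", "neutral"),
    ("a cold and bitter farewell", "negative"),
    ("the clock struck nine", "neutral"),
    ("sorrow fills the empty hall", "negative"),
]

train, test, label2id = prepare_split(records)
print(len(train), len(test), label2id)
```

With a wrapper like Argilla Trainer you do not have to write this kind of glue code yourself for each framework.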
Here is an example of how to use Argilla Trainer to train a text classification model with SetFit. First, we log our dataset in Argilla:
    import argilla as rg
    from argilla.training import ArgillaTrainer
    from datasets import load_dataset

    # change these variables to connect to your Argilla instance
    rg.init(
        api_url='YOUR_ARGILLA_URL',
        api_key='YOUR_ARGILLA_API_KEY'
    )

    # log the dataset
    rg_dataset = rg.DatasetForTextClassification.from_datasets(
        dataset=load_dataset("poem_sentiment", split="train"),
        text="verse_text",
        annotation="label",
    )
    rg.log(rg_dataset, "train_poem_sentiment")
Now you can open Argilla and start annotating your dataset. Luckily, the poem_sentiment dataset is already annotated, so we can jump straight to training!
To make this step even easier, admin users can access code snippets ready to copy-paste directly from the Argilla UI. (If you haven't heard about the new user roles in Argilla, check this out!) You just need to open your dataset, click the </> Train button and select your preferred framework. There you will find a code snippet with all the variables tailored to that dataset and the selected framework.
If you paste the first code snippet and run it, your model will start training. When it's done, you should see something similar to this:
    ***** Running training *****
      Num examples = 14260
      Num epochs = 1
      Total optimization steps = 892
      Total train batch size = 16
    Iteration: 100%|██████████| 892/892 [16:37<00:00, 1.12s/it]
    Epoch: 100%|██████████| 1/1 [16:37<00:00, 997.27s/it]
    Applying column mapping to evaluation dataset
    ***** Running evaluation *****
    Downloading builder script: 100%|██████████| 4.20k/4.20k [00:00<00:00, 1.45MB/s]
    [04/10/23 11:04:10] INFO     INFO:ArgillaTransformersTrainer:{'accuracy': 0.7653631284916201}
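As a quick sanity check on that log, the numbers are consistent with each other. The sketch below assumes the usual convention that the number of optimization steps is the number of examples divided by the batch size, rounded up, times the number of epochs (no gradient accumulation). That convention is an assumption on our side, not something the log states, but it reproduces the 892 steps shown. The variable names are ours.

```python
import math

# Values copied from the training log.
num_examples = 14260
batch_size = 16
num_epochs = 1
seconds_per_step = 1.12  # rounded rate shown by the progress bar

# Assumed convention: one optimization step per batch, last batch may be partial.
steps = math.ceil(num_examples / batch_size) * num_epochs

# Rough duration estimate from the rounded per-step rate.
approx_seconds = steps * seconds_per_step
minutes, seconds = divmod(round(approx_seconds), 60)

print(steps)                        # 892, as in the log
print(f"{minutes}:{seconds:02d}")   # 16:39, close to the logged 16:37
```

The small gap between the estimate and the logged 16:37 comes from the per-step rate being rounded to two decimals in the progress bar.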
Et voilà! Your model is ready to start predicting!
    # `trainer` is the trainer object created by the training snippet you ran above
    test = trainer.predict("How beautiful can thy be, oh ArgillaTrainer.")
    test.prediction
    # output:
    # [('positive', 1.0)]
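The prediction above is a list of (label, score) pairs. If you only need the winning label, a small helper like the one below is enough. This is a sketch that assumes the list-of-tuples shape shown in the output; `top_label` and the three-label example are hypothetical and not part of Argilla.

```python
def top_label(prediction):
    """Return the (label, score) pair with the highest score.

    Assumes `prediction` is a non-empty list of (label, score) tuples.
    """
    return max(prediction, key=lambda pair: pair[1])


# The prediction shown in the output above.
label, score = top_label([("positive", 1.0)])
print(label, score)

# A made-up prediction with scores spread over several labels.
spread = [("negative", 0.1), ("positive", 0.7), ("neutral", 0.2)]
print(top_label(spread))
```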
If you want to learn more about training models using Argilla datasets, check out our docs.