This repo is the official implementation of Incubating Text Classifiers Following User Instruction with Nothing but LLM. It allows users to obtain a personalized classifier with nothing but an instruction as input. The incubation is based on a llama-2-7b model fine-tuned on Hugging Face metadata together with self-diversification.
You can use the script `incubate.sh` to incubate your own classifiers:
```bash
python incubate.py --n_epoch 16 \
    --batch_size 4 \
    --device 1 \
    --n_sample 16 \
    --max_new_tokens 64 \
    --instruction "Build a classifier that can categorize text messages by 'about food' and 'about movie'." \
    --incubator "KomeijiForce/Incubator-llama-2-7b" \
    --classifier "roberta-base" \
    --save_path "roberta-base-incubated"
```
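
For intuition, the incubation pipeline boils down to two steps: the incubator LLM synthesizes labeled examples for the labels named in the instruction, and a small classifier (here `roberta-base`) is then fine-tuned on that synthetic data. The sketch below illustrates only the first step with standard `transformers` generation; the actual prompt template, output parsing (`parse_samples` is a hypothetical placeholder), and the self-diversification step are handled inside `incubate.py` and will differ in detail.

```python
# Minimal sketch of the data-synthesis step, assuming a standard causal-LM
# interface; the real prompt template and parsing live inside incubate.py.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

instruction = ("Build a classifier that can categorize text messages "
               "by 'about food' and 'about movie'.")

tokenizer = AutoTokenizer.from_pretrained("KomeijiForce/Incubator-llama-2-7b")
incubator = AutoModelForCausalLM.from_pretrained(
    "KomeijiForce/Incubator-llama-2-7b",
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; loads on GPU if available
)

# Hypothetical prompt: incubate.py defines the actual template around the instruction.
inputs = tokenizer(instruction, return_tensors="pt").to(incubator.device)
outputs = incubator.generate(
    **inputs,
    max_new_tokens=64,        # corresponds to --max_new_tokens
    do_sample=True,
    num_return_sequences=16,  # corresponds to --n_sample
)
generations = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# parse_samples() is a hypothetical helper that would turn the generations
# into (text, label) pairs used to fine-tune the classifier:
# texts, labels = parse_samples(generations)
```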
Running the default incubation script, you should see output like the following on the default test cases:
```
Input: I love 'Spiderman 2'!
Predicted Label: about movie

Input: I ate a delicious pudding!
Predicted Label: about food
```
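
Once incubation finishes, the classifier written to `--save_path` can be used like any other fine-tuned sequence-classification model. The snippet below is a minimal usage sketch, assuming `incubate.py` saves both the weights and the tokenizer to that directory in the standard Hugging Face format (an assumption, not something spelled out in this README):

```python
from transformers import pipeline

# Load the incubated classifier from the --save_path directory.
classifier = pipeline("text-classification", model="roberta-base-incubated")

for text in ["I love 'Spiderman 2'!", "I ate a delicious pudding!"]:
    pred = classifier(text)[0]
    # The label strings depend on the id2label mapping written during incubation.
    print(text, "->", pred["label"], f"({pred['score']:.2f})")
```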