Llama Guard 3 identifies code as 'violent crime' #63

visagansanthanam-unisys · 2024-10-23T12:27:58Z

I am trying to use Llama Guard 3 for a use case, and I see that the model classifies any source code as unsafe (violent crime). Is this expected behavior?

EricMichaelSmith · 2024-10-29T14:33:57Z

Hi @visagansanthanam-unisys, can you give us other examples of this? No, this is not expected behavior.

visagansanthanam-unisys · 2024-11-01T09:27:03Z

@EricMichaelSmith here are some more examples

However, the 8B model (llamaguard3:latest) seems to be working fine.

kplawiak · 2024-11-01T18:50:14Z

Hi @visagansanthanam-unisys, the two models (Llama Guard 3 1B and 8B) differ in their training data and underlying base models. Specifically, the 1B model was not trained on the code interpreter abuse category, which can limit its reliability on code input.
For more information on the training process and model limitations, please refer to the Llama Guard 3 1B model card (https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/1B/MODEL_CARD.md) and the Llama Guard 3 8B model card (https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/8B/MODEL_CARD.md).
Additionally, we recommend checking out the Llama Guard documentation (https://www.llama.com/docs/model-cards-and-prompt-formats/llama-guard-3/) for more examples (e.g., how to format the input before passing it to the model).
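For anyone comparing the 1B and 8B verdicts on the same code snippet, here is a minimal sketch of how the raw model output could be decoded into category names. It assumes the output format described in the model cards: a first line of `safe` or `unsafe`, followed (when unsafe) by a line of comma-separated hazard codes such as `S1`. The `parse_verdict` helper and the `Verdict` type are hypothetical names for illustration and are not part of any Llama Guard library; the category table follows the 8B model card, and S14 is the category the 1B model was not trained on.

```python
from dataclasses import dataclass, field

# Hazard codes as listed in the Llama Guard 3 8B model card.
# Per the comment above, the 1B model was not trained on S14.
CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Defamation",
    "S6": "Specialized Advice",
    "S7": "Privacy",
    "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons",
    "S10": "Hate",
    "S11": "Suicide & Self-Harm",
    "S12": "Sexual Content",
    "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


@dataclass
class Verdict:
    """Hypothetical container for a decoded Llama Guard response."""
    safe: bool
    codes: list = field(default_factory=list)
    names: list = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Decode raw model text such as 'unsafe\\nS1' (assumed format)."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty model output")
    label = lines[0].lower()
    if label == "safe":
        return Verdict(safe=True)
    if label != "unsafe":
        raise ValueError(f"unexpected verdict line: {lines[0]!r}")
    # The second line, when present, holds comma-separated hazard codes.
    codes = []
    if len(lines) > 1:
        codes = [c.strip() for c in lines[1].split(",") if c.strip()]
    names = [CATEGORIES.get(c, "unknown") for c in codes]
    return Verdict(safe=False, codes=codes, names=names)


if __name__ == "__main__":
    # The kind of response reported in this issue for plain source code.
    print(parse_verdict("unsafe\nS1"))
```

Running the same snippet through both models and comparing the decoded verdicts makes it easier to attach concrete, reproducible examples to a report like this one.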
