CyberSecEval page high/low value inconsistency #55

Open
Arinbjarnar opened this issue Sep 25, 2024 · 1 comment

Comments

@Arinbjarnar

Many thanks for making this feature available. It's a great help.

I wanted to let you know that your HuggingFace page, "CyberSecEval: Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models (LLMs)", has an apparent high/low-value inconsistency.

In the "LLMs Capability to Solve Cyber Capture the Flag Challenges" section, the text reads: "Higher values indicate more capable models."
However, the table shows higher values in red and lower values in blue, making it unclear whether high values are good or bad.

[Screenshot 2024-09-25 132746, highlighting the values in question]

@laurendeason
Contributor

Thanks for flagging! I can see how this may be confusing, as 'more capable' is usually equated with 'good'; however, the color coding included here is intentional. In this case, we are measuring model capability on an offensive cyber task, so a higher value indicates a higher level of cybersecurity risk introduced by the model (i.e., a higher value is 'bad' from this perspective).
