I wanted to let you know that your HuggingFace page, CyberSecEval: Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models (LLMs), has an apparent high/low-value inconsistency.
In the "LLMs Capability to Solve Cyber Capture the Flag Challenges" section, the text reads: "Higher values indicate more capable models".
However, the table shows higher values in red and lower values in blue, which makes it unclear whether high values are good or bad.
Thanks for flagging! I can see how this may be confusing, as 'more capable' is usually equated with 'good'; however, the color coding here is intentional. In this case we are measuring model capability on an offensive cyber task, so a higher value indicates a higher level of cybersecurity risk introduced by the model (i.e., a higher value is 'bad' from this perspective).
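For readers who want to see this convention in code, here is a minimal, hypothetical sketch of shading a capability table so that higher values appear red and lower values blue. It is not the leaderboard's actual implementation; the model names and scores are invented, and it assumes pandas with matplotlib installed.

```python
import pandas as pd

# Purely illustrative data; these are NOT CyberSecEval results.
scores = pd.DataFrame(
    {"CTF success rate": [0.12, 0.34, 0.07]},
    index=["model-a", "model-b", "model-c"],
)

# With a diverging colormap such as "coolwarm", low values render blue and
# high values render red, so a redder cell marks a more capable (and thus
# riskier) model on this offensive-cyber task. Requires matplotlib.
styled = scores.style.background_gradient(cmap="coolwarm")

# Write the shaded table out as HTML for inspection.
with open("ctf_capability_table.html", "w") as f:
    f.write(styled.to_html())
```

The point of the reversed intuition is carried entirely by the colormap choice: the numbers still mean "fraction of CTF challenges solved", but red signals risk rather than achievement.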
Many thanks for making this feature available. It's a great help.