Helpful or Harmful? Evaluating LLM-Assisted Vulnerability Patching via a Human Study

Software vulnerability remediation is a cognitively demanding task that requires specialized security expertise, which general developers often lack. Meanwhile, tools assisted by Large Language Models (LLMs) show potential in vulnerability detection, localization, and repair.

[Hypothesis:] While LLM assistance is hypothesized to accelerate patching, it also risks introducing hallucinations or insecure code, making superficial repairs more likely: patches that pass the standard functionality checks but fail security validation. [Objective:] We aim to conduct an empirical experiment with human participants that compares LLM-assisted vulnerability patching against manual debugging in real-world scenarios.

[Method:] We plan to conduct a controlled experiment using a balanced crossover design. To this end, we have developed a WebApp for code execution and integrated hidden Ghost Tests that verify patch integrity beyond the visible functional requirements.

The experiment comprises training and evaluation scenarios. We will evaluate remediation speed, remediation efficacy on both standard functionality tests and security tests, and participant perception.
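The distinction between visible functional tests and hidden Ghost Tests can be illustrated with a minimal Python sketch. It is not the study's actual WebApp or test suite: the path-traversal scenario, the base directory `/srv/files`, and all names (`PatchOutcome`, `evaluate_patch`, `naive_patch`, `strict_patch`) are hypothetical and chosen only for illustration. The sketch labels a patch "superficial" when it passes the visible tests but fails the hidden security tests.

```python
import posixpath
from dataclasses import dataclass

BASE_DIR = "/srv/files"  # hypothetical base directory, not from the study


@dataclass(frozen=True)
class PatchOutcome:
    functional_pass: bool  # result of the visible functional tests
    security_pass: bool    # result of the hidden security (ghost) tests

    @property
    def label(self) -> str:
        if self.functional_pass and self.security_pass:
            return "secure"
        if self.functional_pass:
            return "superficial"  # looks fixed, but is still vulnerable
        return "broken"


def evaluate_patch(patched, visible_tests, ghost_tests) -> PatchOutcome:
    """Run both test groups against a participant's patched function."""
    return PatchOutcome(
        functional_pass=all(test(patched) for test in visible_tests),
        security_pass=all(test(patched) for test in ghost_tests),
    )


def naive_patch(name: str) -> str:
    # Hypothetical superficial repair: joins paths without validation.
    return posixpath.join(BASE_DIR, name)


def strict_patch(name: str) -> str:
    # Hypothetical secure repair: rejects paths that escape BASE_DIR.
    path = posixpath.normpath(posixpath.join(BASE_DIR, name))
    if not path.startswith(BASE_DIR + "/"):
        raise ValueError("path escapes base directory")
    return path


def visible_test(func) -> bool:
    # Functional requirement shown to the participant.
    return func("a.txt") == "/srv/files/a.txt"


def ghost_test(func) -> bool:
    # Hidden security requirement: traversal attempts must be rejected.
    try:
        func("../../etc/passwd")
    except ValueError:
        return True
    return False


naive_result = evaluate_patch(naive_patch, [visible_test], [ghost_test])
strict_result = evaluate_patch(strict_patch, [visible_test], [ghost_test])
```

Under these assumptions, `naive_result.label` is "superficial" and `strict_result.label` is "secure", which mirrors the failure mode named in the hypothesis.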

[Pilot Study:] A pilot experiment with a small sample of participants has been conducted, providing insights that inform the subsequent study.
