In the context of the growing role of artificial intelligence (AI) in various fields, the question of its impact on the job market in cybersecurity is becoming increasingly relevant. On one hand, automation and intelligent systems can significantly speed up and streamline processes related to data and IT infrastructure protection. On the other hand, there is concern that AI could replace humans in some functions, leading to a reduction in available jobs. From my observations, it will be quite the opposite (at least for security professionals)!
AI Capabilities in Cybersecurity
AI has the ability to analyze vast amounts of data in a short time, which is crucial in cybersecurity where quick response to threats is often decisive. Machine learning algorithms can detect previously unknown threats based on behavior patterns and historical data analysis. This allows for faster and more precise identification and response to security incidents. Automating threat analysis and response processes may reduce jobs for lower-level security analysts who handle routine monitoring and alarm response. However, new specialties such as machine learning engineering in cybersecurity, managing intelligent defense systems, and advanced threat analysis using AI are emerging. Professionals will need to not only understand the workings of modern AI tools but also develop and adapt them to specific security needs.
Code Generation Automation = More Security Bugs + New Vulnerability Classes
Errors Introduced by Automatic Tools
Never before has application development been so easy, accessible to non-technical people, and fast. This leads to a “boom” of new applications. This is accompanied by a surge in introducing AI elements into existing software. However, automation, especially in the form of code generators or development frameworks, can lead to unintended errors and security vulnerabilities. These tools, operating based on established patterns and algorithms, may not consider specific security conditions or nuances understood and applied by experienced developers. For example, automatically generated code may not be adequately protected against attacks such as SQL Injection or Cross-site Scripting (XSS) if the tool is not properly configured to address these threats.
XSS in Zoomin Zdocs
An example of such an error, where AI is being hastily implemented, is a simple “Reflected Cross-site Scripting” found by me in the Zoomin Zdocs application. Introducing an AI assistant to handle documentation for an application seems like a great idea, where we can learn how to use the software by asking questions. However, it must be remembered that AI responses can be somewhat unpredictable and cause security issues. In the Zoomin application, it was enough to ask the assistant about –
<img src=1 href=1 onerror="javascript:alert(document.domain)"></img>?
for it to reply that it could not find an answer to our question, quoting it:
I'm sorry, but I couldn't find a definitive answer to: <img src=1 href=1 onerror="javascript:alert(document.domain)"></img>?. Please provide more context or clarify your query.
The entire request looked as follows:
POST /api/aisearch/stream HTTP/1.1 |
Server response:
HTTP/2 200 OK |
As you can see, the input data was not properly validated and sanitized, which resulted in the execution of the injected JavaScript code:
Example pseudocode with incorrect response display:
function displayResponse(userInput):
# Load user input
query = userInput
# Display response without input sanitation
# This is incorrect as it allows script execution in userInput
print("Answer to your question: " + query)
Example pseudocode with correct response display:
function sanitize(input):
# Remove or encode special HTML characters in the input, e.g., <, >, ", ' and others
return input.replace("<", "<").replace(">", ">").replace("\"", """).replace("'", "'")
function displayResponse(userInput):
# Load user input
query = userInput
# Sanitize input
safeQuery = sanitize(query)
# Safely display response
# Sanitization prevents execution of malicious code
print("Answer to your question: " + safeQuery)
The vulnerability (along with a series of others) was reported and patched by the manufacturer.
Error Replication
One of the main drawbacks of automation is the replication of the same code in many places, which can lead to widespread dissemination of errors or vulnerabilities. When a bug is found in a code-generating component (and AI can be trained on buggy code), every application or system using that code is potentially at risk. This phenomenon scales the security problem, making it harder to manage and fix.
Challenges in Auditing and Code Review
Automatically generated code is often complex or generated in a way that is not intuitive to “human” programmers. This can hinder manual code reviews, which are crucial in identifying subtle logical errors or security vulnerabilities. Lack of transparency and understanding of generated code can lead to overlooking significant issues during security testing.
New Classes of Vulnerabilities
Automation can introduce new classes of vulnerabilities that would be less likely with manual code creation. For example, dependencies between automatically generated modules may not be fully understood or controlled by developers, opening the door for dependency and logic-related attacks. New vulnerability classes such as “prompt injection” are also emerging.
OWASP Top 10 for Large Language Model Applications
A year ago, OWASP began working to identify and issue recommendations to protect against the greatest threats from using LLMA – OWASP Top 10 for LLM. The list is as follows:
LLM01: Prompt Injection
Manipulating a large language model (LLM) through appropriate input causes unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate data from external sources.
LLM02: Unsafe Output Handling
Vulnerability occurs when LLM output is accepted without verification, which can expose backend systems. Abuse can lead to serious consequences such as XSS, CSRF, SSRF, privilege escalation, or remote code execution.
LLM03: Training Data Poisoning
Occurs when LLM training data is manipulated, introducing vulnerabilities or errors affecting security, effectiveness, or ethical behavior. Sources include Common Crawl, WebText, OpenWebText, and books.
LLM04: Model Delegation
Attackers cause resource-intensive operations on LLM, leading to service degradation or high costs. Vulnerability is exacerbated by LLM’s resource intensity and unpredictability of user input.
LLM05: Supply Chain Vulnerabilities
The LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using external datasets, pre-trained models, and plugins can increase the number of vulnerabilities.
LLM06: Sensitive Information Disclosure
LLM may inadvertently disclose sensitive data in its responses, leading to unauthorized data access, privacy breaches, and security violations. Implementing data sanitation and strict usage policies is key to preventing this.
LLM07: Insecure Plugin Design
LLM plugins may have unsecured inputs and insufficient access control. Lack of application control facilitates exploitation and can lead to consequences such as remote code execution.
LLM08: Excessive Autonomy
LLM-based systems can take actions leading to unintended results. The problem arises from excessive functionality, too much privilege, or autonomy given to LLM-based systems.
LLM09: Over-Reliance
Systems or individuals overly dependent on LLM without oversight may be exposed to misinformation, communication errors, legal issues, and security errors due to incorrect or inappropriate content generated by LLM.
LLM10: Model Theft
Concerns unauthorized access, copying, or theft of proprietary LLM models. Impacts include economic losses, loss of competitive advantage, and potential access to confidential information.
What to Do? How to Live?
OWASP, along with the Top 10 for LLM, issued several recommendations for a secure approach to implementing LLM in your organization – LLM AI Cybersecurity & Governance Checklist. Those related to programming are presented below:
- Model threats to LLM components and define architecture trust boundaries.
- Data security: check how data is classified and protected due to its sensitivity, including personal and corporate data. (How are user permissions managed and what protective mechanisms are implemented?)
- Access control: implement access controls according to the principle of least privilege and apply layered defenses.
- Training process security: require rigorous control of training data, processes, models, and algorithms management.
- Input and output security: assess input validation methods and how output is filtered, sanitized, and validated.
- Monitoring and response: map workflows, monitor, and respond to understand automation, ensure logging, and auditing. Confirm the security of audit records.
- Include application testing, source code review, vulnerability assessment, and penetration testing in the product development process.