An Introduction to AI Model Security
In this article
This article was originally published in 2020 and has been updated to reflect current trends, innovations, and implications in AI security.
The use of artificial intelligence (AI) in the business world has exploded. Artificial intelligence has and will continue to transform how we work, learn and play. And the impressive rise of generative AI, with models that can generate synthetic text, images, video and more, presents entirely new security risks that must be understood and mitigated.
Application and system leaders focused on innovation and AI are now much more accountable for improving worker and customer experience, developing new and existing worker skills and building organizational competency. The strength of the data science community, the growth of robust open source platforms like TensorFlow and the widespread adoption of the programming language Python have all helped to spread AI technology within reach of a broad engineering market.
Data science is no longer restricted to a small set of PhD mathematicians and computer scientists. At the same time, the scope and range of AI has blossomed. Advances in high performance computing, the prevalence of big data platforms and relatively inexpensive cloud resources have made it possible to address problems that we couldn't dream of tackling ten years ago.
AI has certainly caught the eyes and ears of leading CEOs and boards of directors. AI has the potential to revolutionize many enterprises, from a long-term business strategy through enhanced business processes to value back to their customers and shareholders. With that said, and based on our experience, few CEOs sufficiently understand what AI is and where and how they can gain maximum business advantage from it. Many CEOs and boards of directors will ask their CIOs or CSOs to explain the importance of AI and the risk.
CIOs should avoid technical detail and explain AI in terms of its potential for helping their business win, and CSOs need to the risk to the business if compromised. CIOs and CSOs can help the CEO and the board decide about AI investments by being very prescriptive in the discussion, rather than leaving it open-ended.
As the use of AI has spread, inevitably a new problem has arisen: AI model security. Attacks against AI models have proliferated in the "wild" as well as in the research community. Many of these attacks rely on what we would consider to be standard vulnerabilities, issues like poor access control to servers hosting data, exploitable bugs and logic flaws in software applying a particular AI model, and lack of sufficient logging and monitoring of AI model activity.
Where AI model security becomes interesting, though, is in the discovery and development of new cyber attacks derived from the nature of the mathematics of AI itself—attacks that allow an adversary to fool the model, skew the model by carefully poisoning the input data or use carefully crafted queries to steal sensitive personal data used to train the model and sometimes even the model parameters.
The WWT security and AI teams are working together to build a comprehensive program for AI security that addresses multiple areas of concern, including:
- evaluation of the security of AI environments (training, development, production);
- vulnerability assessment of specific AI models and applications; and
- a roadmap for implementing strong security throughout the AI model lifecycle.
WWT is also cooperating with top partners like the Intel Security Group to be able to offer the strongest possible security solutions at all levels of the development stack.
An important aspect of the AI revolution in the corporate world is that many large organizations staff their own internal teams to build and manage their AI solutions. The AI development pipeline is a complex and expensive undertaking that requires a great deal of organizational maturity and careful planning to achieve the desired outcomes. The pipeline spans several critical areas and requires continual effective collaboration between teams.
- Innovation and Problem Definition: The data science team needs to work with experts throughout the organization to identify problems that can be addressed with AI. Such problems might be internal—of interest only to the organization—as well as external, as part of a service supplied to customers.
- Data Engineering: The data engineering and data science teams cooperate closely to understand what and how much data is required for the effort, as well as the mechanics of obtaining, storing and processing the data, both for training the model and for applying the model in production.
- Development Environment: The data science team requires a robust and consistent methodology for managing critical resources in particular software frameworks, data channels and computing resources.
It is critical to protect AI models throughout the entire model development life cycle: acquisition of training data, data engineering, model building, model training, deployment, storage, modification, consumption of production data and model output.
The primary goals of AI model security are:
- Integrity: Prevent attackers from degrading AI models and AI model functionality.
- Availability: Stop attackers from interfering with normal operation of AI models.
- Privacy: Protect the confidentiality of sensitive data used to build the model as well as the proprietary information in the model itself.
In practical terms, security analysis views the AI development cycle as three largely independent sections: the training environment, where often terabytes or even petabytes of data are stored in a data lake with efficient access for building the AI models; the development environment itself, encompassing a software platform like JupyterLab, source code control system and collaboration tools; and the production environment, where gigabytes and terabytes of data are continually streamed to be processed by the model in real time. The production environment earns particular focus because this is where we usually find public-facing access, a.k.a. the attack surface.
New technology leads to new cyber-attacks against that technology. First and foremost, there are simply more targets—more production AI models to attack. New approaches to using and deploying AI models have led to increased opportunities for the adversary. The explosive emergence of generative AI enterprise systems, SaaS instances, and co-pilots has grown the AI attack surface exponentially, providing bad actors with many more avenues.
Offerings of AI as a service or making an AI model accessible via an API managed by a public web server gives a bad actor the opportunity to mount attacks against the model directly, and often anonymously. And (of course) with AI, it's all about the data. The training environment is vulnerable because the need for terabytes or even petabytes of training data makes it nearly impossible to secure the data or vet the data source. AI models in the production environment often operate on data from outside the organization, often from the public internet, giving the adversary more opportunity to poison or otherwise subvert the model.
Along with greater opportunity, the adversary also has greater motivation. Money is always a popular motivation—as businesses increasingly depend on AI for solving their hardest problems, there are more ways for the adversary to profit from fooling, skewing or stealing AI models. Another area of concern is the growing use of AI models for security analytics, leading more sophisticated offensive cyber teams (think nation-state) to develop attacks on AI models as part of their standard repertoire of techniques.
Like any other enterprise application, AI models are first and foremost vulnerable to traditional cyber attacks. At all stages of the AI development lifecycle, critical components are potential targets—data, source code and model files can be accessed and stolen or subverted by a successful hacker. Strong, traditional cybersecurity hygiene must be achieved to prevent the adversary from easily compromising AI models at their very core.
All network and server infrastructure used for AI development should be regarded as critical infrastructure and afforded the highest possible protection (we will discuss methods for hardening critical resources in the AI environment in a later article).
AI models are also vulnerable to a range of attacks tailored to the underlying mathematics of AI. These attacks fall into three main categories: evasion, poisoning and stealing.
This means fooling the model by changing input, typically in a production environment (i.e. when the model is applied to real-time data as an inference engine). An example in the cyber security realm was the bypass attack against Cylance Protect, where the AV program could be made to misclassify malware simply by appending a few strings from a popular computer game to the malicious executable.
There are also numerous examples of changing model output with small changes in the physical world. One particularly disturbing example is causing the Tesla autopilot AI to misread a stop sign by applying a few sticky notes.
Another general approach to model evasion is called adversarial perturbation and, as the name implies, refers to inducing an incorrect output from the model by making a very small change to the digital representation of the targeted input. Adversarial perturbation is most often used against models based on neural nets, especially instances of deep learning (e.g. image classifiers).
There are countless examples published on the web. In the following example from Microsoft, an AI model has (correctly) classified the first image as a cat with 88 percent certainty and (incorrectly) classified the second image as guacamole with 99 percent certainty. The difference between the two images to humans is imperceptible, but to the AI model analyzing the digital file, the difference is quite significant.
The mathematics underlying this effect appear to be quite complicated and the research community is still struggling to understand the causes. Several techniques have been developed to create robust models resistant to adversarial perturbation, but with sometimes unsatisfactory results: some of the techniques have been easily subverted by new attacks, while others are too complicated to implement efficiently or have unacceptable impact on model performance.
Control the data, control the model. Attackers can be quite creative in devising ways to skew the model by corrupting the data used to train the model. This can entail changing data, adding malicious data or both. The targeted dataset can involve the training data (usually in an enterprise data lake) or the streaming real-time data that the model consumes and uses as part of an update cycle.
Access to the training data provides the adversary with almost total control over the final model. Protecting the stored data in the development environment is critical, of course, as part of the standard enterprise security architecture. Protecting, vetting and just getting visibility into the sources of the training data is also critical, but can be quite challenging. We will discuss possible approaches to addressing these challenges in a follow-up article.
Poisoning the model using input in a production provides less influence to the attacker, but access is obviously much easier—gaining access is often as easy as clicking on a link. If the model doesn't continuously update itself using real-time data, this attack vector is part of an evasion attack. If the model does update with the input, the adversary has the opportunity to skew the model (though this is typically a case of "tail wagging dog" given the relatively small amount of data and complexity of the mechanism under attack).
A good example of this class of attacks is the assault on the Gmail spam filter, an attempt to retrain the filter to accept certain types of spam as non-spam. The basic idea was to use new "burner" accounts to send and receive large amounts of spam, and then manually (or with bots) tell Gmail that the received spam was actually not spam.
Also called extraction, stealing refers to any method of inferring data from an AI model that shouldn't be inferred. The basic approach is very simple and usually hard to detect: the attacker supplies input to the model, records the corresponding output and repeats—when a sufficient number of records have been obtained, the attacker can perform offline analysis to obtain the desired information. Some basic stealing attacks:
- Membership: AI models performing classification or any task where it computes a likelihood score are potentially vulnerable to membership inference. The idea is that if a model is trained on data containing a given fact (such as the SSN 123-45-6789, for example), then that "fact" that the model observed directly in training will receive a higher score than a derived fact. Researchers have demonstrated versions of this attack on image classifiers, successfully recreating the face of a training subject with multiple queries.
- Model Stealing: Perhaps more accurately called "model reproduction," in its simplest form this can be accomplished by querying the model with a large number of valid inputs and using the corresponding output to train a new model to be functionally equivalent. For a particular model, the number of inputs required to "steal" it may be prohibitively large, but the theory is straightforward. Further, there is no clear way to defend or mitigate against these attacks beyond monitoring inputs and recognizing excessive numbers of queries ("spoofing").
- Model Reprogramming: Usually effective against more complex models (e.g. models using multiple layers of neural nets), this is a clever idea that aims to get an existing AI model to provide unintended functionality at little cost to the attacker. One class of use cases is in generating "deep fakes." For example, an adversarial model might be able to tune its parameters for generating realistic human faces by submitting candidates to facial recognition software; if a candidate is sufficiently close to human, it should resemble somebody in the targeted model, which gets reflected in the classification score.
In addition to the above traditional AI cyber attacks (evasion, poisoning, and stealing), generative models introduce their own unique threat vectors such as prompt injection, where an attacker can use a chat prompt to trick a Large Language Model into either releasing data it shouldn't or lowering its guard rails to allow for malicious activity. This is the direct form of prompt injection, this attack can also occur indirectly. For example, the attack can be embedded in an external web page that a user has the LLM ingesting for auto-summarization – upon accessing the web page the LLM could execute the attack with no direct access from the attacker to a prompt. Generative AI also might introduce vulnerabilities by way of insecure plug-ins, compromised open-source models, or insecure output handling.
As discussed above, the first step in protecting your AI is following good security practices in securing the full range of infrastructure that AI development requires. Moreover, given the growing importance of AI to the organization as well as the increased investment in building AI models, we believe that this actually qualifies as critical infrastructure and should receive the highest level of protection. A robust development platform dramatically improves security by enabling better access controls, logging, secure storage, and forensic capabilities in case of compromise.
Another important aspect of AI is that at times it is positioned and accessed much like other applications developed and deployed by the enterprise. Hence applying traditional application security safeguards is important and should follow standard practice for the organization's info security team. Similarly, monitoring, analytics and alerting are important security controls throughout the model development lifecycle, and relevant AI activity should be integrated with the enterprise systems.
Beyond general security concerns, protecting an AI model requires a deep understanding of AI technology and usually the model itself. Adversarial patching, for example, is virtually a field in itself, generating large numbers of papers and talks at academic conferences on AI. To be effective, monitoring real-time input to AI models requires an understanding of what the attacks are and how to recognize them automatically in a next-generation SIEM. Regression testing is perhaps the best mitigation for model poisoning, but what does this look like for an AI model?
For generative AI systems, access controls and monitoring of model usage are especially important to prevent misuse. Usage policies, watermarking of generated content, and controls on model distribution can help mitigate risks.
At WWT we take an integrated approach to AI security, rather than focusing on point solutions. This helps us align business goals and objectives to technical solutions, providing more effective outcomes and solutions that further the development of an enterprise AI security model architecture. Our goal is to streamline the design, implementation, management and evolution of AI architecture to establish security awareness, optimize defense capabilities, improve threat response, mitigate breaches and close compliance gaps for all of our customers.
Learn more about how we can integrate and deploy AI securely to help reduce vulnerabilities. Request our AI Model Security Workshop to better understand your organization's current AI model security posture.