How do we harden a generative AI application? 

The release of ChatGPT to the public in November of 2020, along with the APIs that allowed developers and businesses to integrate it into their applications and services, has created tremendous demand for AI-driven capabilities. 


Why? 

Interacting with a large language model via a chat interface gives the user the impression that they are communicating with a very high form of intelligence, almost sentient. You can “talk” to the bot using unstructured natural language, and the responses appear thoughtful and authoritative. Even though we know these models use machine learning algorithms to predict the best response to our questions, it still feels like we are talking to a person. 

The challenge for organizations is that business processes require more functional capability than answering general questions, but the potential is clear.  


What’s different with AI? 

Automating a business process using an LLM requires training the model in the specific content associated with the target process. This involves ingesting documents, images, and other content that may be confidential or business-sensitive. This creates a potential vulnerability that could be exploited and expose that data to the wrong people. Many examples show how a user can trick an LLM into disgorging content that it was not supposed to.  

Once the model is trained, you must build an application around it that captures user input, passes some of it to the LLM, and processes the response. The steps required to harden an application are well known, including secure data transmission, access controls, input validation, error handling and logging, patch management, testing, and awareness.  

However, generative AI, the additional challenge of managing the interaction with an LLM that can produce an almost infinite range of responses, requires a whole new level of diligence. Developers and their risk partners must understand the risks and mitigation strategies when formatting prompt inputs and managing LLM responses.  

LLMs work based on prompts, a combination of structured data, such as flags that tell the model how to process the text, and unstructured data, such as the user’s question and the style or tone of the response.  

The application must convert the user’s input into a functional prompt to produce the best answer. This process introduces many risks that impact the interaction’s relevance, accuracy, and cost. While most LLMs are based on the same core principles, the range of available parameters in their APIs can vary widely. These parameters must be understood and tested to ensure that the final application is resilient enough to withstand the range of possible interaction scenarios when it is deployed into production and exposed to its target user population. 

In subsequent posts, I will explore the risks and mitigation strategies on the response side. 

Insights & News

Find out GEG can do for you.