How to Monitor AI with AI
Artificial intelligence can’t be trusted to get everything right, so you need humans in the loop. You also need AI monitoring AI for speed and scale.
AI gaffes are sobering, whether it’s a hallucination or a dubious decision. This is one reason humans must be kept in the loop. However, artificial intelligence can operate at a speed and scale that is physically impossible for humans while surfacing edge-case exceptions that warrant human review and oversight. This type of partnership helps ensure that AI is doing its job correctly.
“A human can’t read and evaluate things 24/7 and take actions in milliseconds. That’s why the Turing test doesn’t apply anymore because now we’re talking about the same capability as a human but at a 100,000X improvement in scale, speed and accuracy because it’s retrieving much more information,” says Mohamed Elgendy, CEO and co-founder of AI/ML testing platform Kolena. “Large language models are being used to evaluate models before they’re deployed and as guardrails after it’s deployed.”
For example, a business might want a simple guardrail that prevents a chatbot from mentioning a competitor, or more complex guardrails around violence, hallucinations, and jailbreaking. In fintech applications, models are prevented from giving financial advice because doing so is illegal.
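To make the idea concrete, here is a minimal, hypothetical sketch of a keyword-based guardrail in Python. The competitor names and financial-advice cues are invented for illustration; production guardrails typically rely on classifier models or LLM-based evaluators rather than simple string matching.

```python
# Minimal guardrail sketch: screen a chatbot reply before it reaches the user.
# Competitor names, keyword cues, and the fallback message are placeholders.
from typing import Optional

COMPETITORS = {"acme corp", "globex"}  # hypothetical competitor names
FINANCIAL_ADVICE_CUES = {"you should invest", "buy this stock", "guaranteed return"}

def violates_guardrails(reply: str) -> Optional[str]:
    """Return the reason a reply should be blocked, or None if it passes."""
    text = reply.lower()
    if any(name in text for name in COMPETITORS):
        return "mentions a competitor"
    if any(cue in text for cue in FINANCIAL_ADVICE_CUES):
        return "gives financial advice"
    return None

def guarded_reply(reply: str) -> str:
    reason = violates_guardrails(reply)
    if reason:
        # Log for human review and return a safe fallback instead of the raw reply.
        print(f"Blocked response ({reason}); routing to human review.")
        return "Sorry, I can't help with that request."
    return reply

print(guarded_reply("You should invest everything in this stock."))
```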
“The idea of using AI to monitor and regulate other AI systems is a crucial development in ensuring these systems are both effective and ethical,” says Cache Merrill, founder of software development company Zibtek, in an email interview. “Currently, techniques like machine learning models that predict other models' behaviors (meta-models) are employed to monitor AI. These systems analyze patterns and outputs of operational AI to detect anomalies, biases or potential failures before they become critical.”
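A rough sketch of what such a meta-model might look like in practice: an anomaly detector trained on summary statistics of the operational model’s recent behavior, flagging outputs that deviate from learned patterns for human review. The features, library choice (scikit-learn’s IsolationForest), and thresholds below are assumptions for illustration only.

```python
# Sketch of a "meta-model" monitor: an anomaly detector trained on the
# operational model's prediction confidences and input statistics.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Baseline behavior: (prediction confidence, input length) pairs from normal traffic.
baseline = np.column_stack([rng.beta(8, 2, 1000), rng.normal(200, 40, 1000)])
monitor = IsolationForest(contamination=0.01, random_state=0).fit(baseline)

# New traffic: the monitor flags outputs that deviate from learned patterns.
new_batch = np.array([[0.81, 210], [0.05, 2000]])   # second row looks anomalous
flags = monitor.predict(new_batch)                  # -1 = anomaly, 1 = normal
for row, flag in zip(new_batch, flags):
    if flag == -1:
        print(f"Escalate for human review: {row}")
```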
The benefits of AI monitoring AI include a level of scalability humans can’t achieve; greater consistency, since AI does not require rest; and depth of analysis, surfacing patterns and correlations that human analysts might overlook.
Think of Monitoring in Terms of Testing
AI is expressed through software, and like any other kind of software, it must be tested to ensure it is doing what it was designed to do. For example, more applications are calling large language model (LLM) APIs, but someone needs to measure that AI consumption. Using AI that works with structured temporal data, it’s possible to achieve more accurate forecasting.
“You’re trying to constantly monitor the consumption introduced by AI, trying to utilize the structured time series data to understand the patterns within it to do forecasts, understand anomalies or understand a change point relationship in between them,” says Devavrat Shah, CEO and co-founder of Ikigai Labs, an enterprise generative AI platform for structured time series data, and an MIT data science and statistics professor.
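As a hedged illustration of the kind of consumption monitoring Shah describes, the snippet below tracks daily token usage as a time series and flags days that deviate sharply from recent behavior using a rolling z-score. The data, window size, and threshold are made up; real systems would layer in proper forecasting and change-point detection.

```python
# Sketch: monitor daily LLM API token consumption and flag anomalous days.
import numpy as np
import pandas as pd

usage = pd.Series(
    np.r_[np.random.default_rng(1).normal(1_000_000, 50_000, 28), 2_400_000],
    index=pd.date_range("2024-01-01", periods=29, freq="D"),
    name="tokens",
)

rolling = usage.rolling(window=7)
# Compare each day to the mean/std of the preceding week.
zscore = (usage - rolling.mean().shift(1)) / rolling.std().shift(1)
anomalies = usage[zscore.abs() > 3]   # spikes far outside recent behavior
print(anomalies)                      # the 2.4M-token day gets flagged
```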
AI is also needed for hypothesis testing, he says.
For example, society has certain norms that are expected to be upheld, and AI is already being used for regulatory compliance. However, AI could also help define which norms are valid and which are not. The user would present the AI with a norm, and the AI could tell whether that norm can be converted into a hypothesis test. If it can, it would then be possible to verify the norm continuously against data. The AI could also tell the user what sort of data it needs.
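One way to picture this: a norm such as “the model’s approval rate should not differ across two applicant groups” can be converted into a statistical hypothesis test and re-run whenever new decisions accumulate. The sketch below uses a chi-square test from SciPy; the group labels, counts, and significance level are hypothetical.

```python
# Sketch: turning a norm into a hypothesis test that can be re-run on fresh data.
from scipy.stats import chi2_contingency

def norm_holds(approved_a, total_a, approved_b, total_b, alpha=0.05):
    """Two-sided test of equal approval rates; False means the norm looks violated."""
    table = [
        [approved_a, total_a - approved_a],
        [approved_b, total_b - approved_b],
    ]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value >= alpha  # fail to reject => no evidence the norm is violated

# Re-run continuously as new decisions accumulate.
print(norm_holds(approved_a=480, total_a=1000, approved_b=350, total_b=1000))
```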
“Speaking as an academic, what we need to do is define the notion of a hypothesis test and then create a test and create proof from that,” says Shah. “The AI and I should be able to both define the regulations and manage the regulation AI, whether it’s auditing the norms or litigation of the norms.”
Zulfikar Ramzan, chief scientist and EVP of product and development at identity protection company Aura, says transparency isn’t enough, especially for complex systems, because they are too complicated for human understanding.
“There’s a lot of great explainable AI research, but when it comes to the more advanced algorithms, we’re nowhere close to where we need to be for production environments,” says Ramzan.
Challenges with AI Oversight by AI
According to Zibtek’s Merrill, there are three challenges with using AI to monitor AI. The first is the complexity of self-regulation. Designing AI that effectively monitors other AI involves complex recursive training, which can be challenging to implement and maintain.
The second is overreliance on technology. There’s a risk of becoming overly dependent on technological solutions, potentially neglecting the human judgment crucial in overseeing and contextualizing AI decisions.
The third is ethical and privacy concerns. Using AI to monitor other AI systems raises significant privacy and surveillance concerns, particularly about who controls these systems and how they’re used.
Another challenge is whether it’s possible to understand what’s actually happening, which is why explainable AI is important.
“You don’t need a Ph.D. in machine learning to understand simple algorithms like decision trees and random forests. [However,] when you’re dealing with truly unconstrained environments where you are interacting with user data that can come from anywhere or be anything, being able to catch every one of those use cases reliably can be problematic,” says Aura’s Ramzan.
The first place he thinks organizations should start is using AI to monitor data, since the accuracy of inferences depends on it.
“You need to look at your data upfront because if you don’t have that, it doesn’t matter what happens downstream,” says Ramzan. “The second thing is around feature engineering and knowing what to look for in the data. That’s where having domain expertise is important. Then you can start looking programmatically for those types of instances. Then and only then does a classifier matter.”
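In that spirit, a simple upstream data monitor might validate an incoming batch and compare a feature’s distribution against a training-time reference before any classifier runs. The sketch below uses a two-sample Kolmogorov–Smirnov test; the feature values and thresholds are invented for illustration.

```python
# Sketch: check the data first, per Ramzan's ordering, before trusting any model.
import numpy as np
from scipy.stats import ks_2samp

def check_batch(reference: np.ndarray, incoming: np.ndarray, drift_alpha=0.01):
    """Return a list of data-quality issues found in the incoming batch."""
    issues = []
    if np.isnan(incoming).any():
        issues.append("missing values in incoming batch")
    statistic, p_value = ks_2samp(reference, incoming)
    if p_value < drift_alpha:
        issues.append(f"distribution drift detected (KS={statistic:.2f})")
    return issues

rng = np.random.default_rng(2)
reference = rng.normal(50, 10, 5000)     # feature values seen at training time
incoming = rng.normal(65, 10, 500)       # shifted distribution in production
print(check_batch(reference, incoming))  # reports drift before any model runs
```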
The Future of AI Supervision
OpenAI has talked openly about its desire and effort to build artificial general intelligence (AGI), which is a harder problem to solve than today’s artificial narrow intelligence (ANI), which does one specific thing well. Kolena is currently working on AGI in partnership with a government agency, and despite long-term forecasts that have extended into 2050, Elgendy expects AGI to debut in 2025.
“When it comes to AGI, you want it to unlearn some stuff because learning information from the internet has caused hallucinations and the need to make decisions,” says Elgendy. “The monitoring piece will be two things: verifying that the AI is doing what it is intended to do and enabling a human to understand the details of how every action led to the final output when it got confused.”
Zibtek’s Merrill sees the rise of increasingly autonomous AI systems capable of self-correction and more nuanced, ethics-based decision-making.
“Advancements in explainable AI will likely play a significant role in making AI monitoring more transparent and understandable, which is vital for gaining public trust and regulatory compliance,” says Merrill. “As we refine these technologies, the goal will be to create a balanced ecosystem where AI enhances human capabilities without replacing the critical oversight that only humans can provide.”
Aura’s Ramzan is doubling down on explainable AI.
“That’s an area where there’s a lot of active research now. Also, with the legislation and compliance regimes becoming more prominent, that’s going to drive home the point that we need better explainability,” says Ramzan. “I can tell you, it’s really scary to deploy a system for the first time and not know how it’s going to perform or how to validate it. So, a lot of effort has to be put into the validation step in the first place.”