AI Red Teaming & Securing Enterprise AI with Leonard Tang of Haize Labs

May 16, 2025

View Show Notes and Transcript

As AI systems become more integrated into enterprise operations, understanding how to test their security effectively is paramount.

In this episode, we're joined by Leonard Tang, Co-founder and CEO of Haize Labs, to explore how AI red teaming is changing.

Leonard discusses the fundamental shifts in red teaming methodologies brought about by AI, common vulnerabilities he's observing in enterprise AI applications, and the emerging risks associated with multimodal AI (like voice and image processing systems). We delve into the intricacies of achieving precise output control for crafting sophisticated AI exploits, the challenges enterprises face in ensuring AI safety and reliability, and practical mitigation strategies they can implement.Leonard shares his perspective on the future of AI red teaming, including the critical skills cybersecurity professionals will need to develop, the potential for fingerprinting AI models, and the ongoing discussion around protocols like MCP.

Question asked:
‍00:00 Intro: The Evolving Threat Landscape of AI Red Teaming
‍01:45 Meet Leonard Tang: CEO of Haize Labs & AI Red Teaming Visionary
‍05:58 AI Red Teaming vs. Traditional Security: Key Enterprise Differences
‍06:59 Haize Labs Insight: Beyond Red Teaming to AI Quality Assurance (QA)
‍08:42 Real-World AI Red Teaming: Chatbots, Voice Agents & Customer-Facing Apps
‍10:23 CRITICAL AI RISK: Unpacking Multimodal Vulnerabilities (Voice & Image Exploits)
‍12:20 Scary AI Exploit Example: Voice Injections via Background Noise!
‍15:30: AI Vulnerabilities Today: Echoes of Early XSS Exploits? (Analogy)
‍20:18 Expert AI Hacking: How to Precisely Control AI Output for Exploits
‍21:21 The AI Fingerprinting Challenge: Identifying Chained & Multiple Models
‍25:45 The Elusive Target: Reality & Difficulty of Fingerprinting LLMs Accurately
‍29:22 Top Enterprise AI Security Concerns: Protecting Reputation, Brand & Policy Adherence
‍34:13 Enterprise AI Toolkit: Frontier Labs Models vs. Open Source & Custom Builds?
‍34:57 Future of LLMs: Specialized Models, Cost Reduction & "Hot Swap" AI-as-a-Service
‍37:43 Model Connector Protocol (MCP): Enterprise Ready or Still Too Early for AI?
‍44:42 AI Security Best Practices: Effective Mitigation with Precise Input/Output Classifiers
‍49:25 Next-Gen AI Red Teaming Skills: Beyond Prompts to Discrete Optimization Algorithms

oftentimes when you're building complex a fusions, you're going to be performing a certain amount of tool calling or function calling or offloading some amount of work to some existing piece of software, right? And I think the core of red teaming in AI comes down to how precisely can you elicit a very specific output on the other side, right? If you wanted to know how red teaming has changed with AI, this is the episode. We spoke to Lennon Chang who is from Hazel Lab that is a red teaming company. We spoke about some of the common things we seeing across enterprises they are reging against and what are some of the common patterns they've seen across enterprise AI applications. We also spoke about some of the tactical things you could be doing to do better security for your AI systems before you set them for a red team. Overall the conversations revolved around the fact that red teaming for AI systems what does it look like for enterprise specifically companies like entropic and open AI and stuff you talk about this as well cuz they should go through a process and what are the main concerns for enterprises that are getting their AI system red teamed all that a lot more this conversation with Leonard and if you are someone who is in red team or know your red team people who should listen to this episode definitely share this episode with them so they get a hang of what red teaming is going to be about in the future with all the AI applications and how it is different to doing a red team of pen testing across web applications versus actual AI applications. Before I press play on the episode, if you are watching or listening our AI subscript podcast episode for a second or third time and we're enjoying it, I would really appreciate if you could take a few seconds to drop us a review or a rating on audio platforms like Apple or Spotify or if you're watching this on YouTube or LinkedIn, subscribe and follow over there to show us some support. It only takes a few second and we really appreciate it. All right, enjoy this episode with Leonard and I'll talk to you soon. Hello, welcome to another episode of AI CyberC podcast. Today we have Leonard. Hey man, thanks for coming on the show. Thank you for having me. Very excited to be here. Great to have you as well. Would you mind sharing a bit about yourself and what you do and how you got stuck into AI, man? Yeah, for sure. So, long story short, I was a failed or recovering academic. So, I spent all of my undergrad thinking about AI research in safety and evaluation and robustness. In concrete, what this meant was I spent a lot of time doing abst attacks and machine learning. I spent a lot of time studying the unfaithfulness of reasoning and language models. I spent a lot of time thinking about ways in which they were really brittle in ways that humans would never be. And so I was geared up primarily to be an academic was going to go start a PhD in the fall. But yeah, one thing led to another. I was doing a lot of red teaming work on LLMs and we ended up starting this company Haze Labs which I am the co-founder and CEO of back at this point in January 2024. uh we ended up working with a lot of the great frontier labs like OpenAI and Anthropic and A21 and others on red teaming their language models before they were released into the wild and that's where we got our start as a company and we've built this really great technology mandate around how do we rigorously test and service all the bugs and vulnerabilities in AI systems before they get released to the world. By the way, I'm going to plug your mullet. I love the whole mullet thing you hear that you definitely make of you. So people who would watch this episode should definitely check out his malik. Yeah, that's part of the hay style. There's Hali Sakur team has his haircut at this point. Oh, really? Long mullet in the back. That's right. So you call yourself a failed academic, but you got it sucked into red teaming LLMs. I guess maybe to set some context, obviously you mentioned companies like OpenAI and cloud and everything as well. I was going to say, how would you describe red teaming? Is your focus primarily for LLM providers or are you doing it for or are you looking into this for the general public as well because I guess I imagine their use cases are very different to what say I don't know any enterprise out there would be so we started off as a company focusing solely on the LLM providers but soon we realized that one there's only so many LM providers you can work with but two there's actually like a much richer set of problems around domain specific use case specific application specific risks and so these days we're very much targeted to testing AI at the application layer. I think this is a wide open green field space to move into given that nobody well nobody actually really knows what it means to be risk-f free quote unquote in their specific domain. And a lot of our work these days is of course helping test test with respect to some sort of risk mandate but also just helping the customer come to some conclusion around what their safety posture should be. What are the behaviors they should test for? what is like the always set of rules and never set of rules that their AI applications should follow. But yeah, long and short of it is we're focused very much on AI applications these days. And and actually if you remember Oshish in our last episode with Joseph, we talked about Haze and some of the things that it was doing and so if you can reference that if you remember. Yeah, actually for context as well because last episode we were talking to Joseph Hataka Reszo and he's been obviously doing bug bounty for the AI application side of red teaming and I think what to what you're saying as well and I'm sure people would go back to the episode but it really worthwhile clarifying uh when people think about red teaming in general they have hey I am on a mission I have a goal I go after with everything that I have essentially that's how people describe red teaming would that be any different to how you describe it for AI applications or is it application is what you're focusing on it just happens to have AI capability ultimate like philosophy of renting are you asking are we like adversary emulation or what yeah I think we're maybe just to clarify the difference as well because I think so the way we had the conversation last week was more around hey most enterprise are building applications uh which are obviously have been there for a long time as legacy application modern API applications now they have AI capability and a lot of the conversation we had was focusing more on oh these applications still have application security vulnerabilities like your SQL injection of the world and as a bug bounty hunter he is focusing on identifying what these are but obviously with the fact that now this has a chatbot attached so what's the thing with prompt injection that I can do over there now that was that context and how he can spread the bug bounty world how do you describe that same in the red team world we're very much interested in quoteunquote traditional security problems and we're very much interested in quoteunquote AI security problems which are prompt injection system prompt leakage, jailbreaks, this sort of thing. But I would say where haze really shines and where we focus most of our time on is assuring the quality of the response of the model, the quality of the response of the AI application, right? So it's mostly based around what is the text or like what is the actual artifact that is generated from the application and that's what we spend a lot of time trying to measure and probe for and test for and we are quote unquote called a red teaming company but I would say the more apt way to describe pay fundamentally is as a like QA/functional testing company I think that's a a better description of what we do and you know a lot of people use us for teaming and that's fine our broader mission is a little bit wider in scope can you expand that a bit more And what would be an example of something like and I guess is that what you're seeing with customers do you want functional testing do you have an example in mind so it helps clarify the difference I'll frame this in two categories right so when we go and talk to customers certain sets of customers have the ability have already articulated quote unquote like an AI code of conduct right so that again that's a set of behaviors that they always want their AI to abide by or never follow right and if they have that code of conduct articulated then they can just run haze against that code of conduct, right? So maybe one of the things in the code of conduct is to never mention another competitor, right? If it's some sort of customer support chatbot, maybe one of the rules is to never mention a competitor, right? And so they'll run the haze engine to try and produce prompts that when sent to your a application will indeed actually mention produce a mention of a competitor, right? So that's one way customers end up using haze in a red teaming sense. However, from a more functional testing perspective, there are also a broad swath of customers that fall into a separate category, which is they actually don't have the ability to articulate what it is they care about yet, right? So, they are domain experts. They are subject matter experts and they can tell you if a response looks good or looks bad, but they're not at the point yet where they can generalize all of their taste and preferences into a universal set of rules that their AI system should follow. It's much more fuzzy. It's much more post talk. It's much more reactionary in terms of how they define quality. And in that sense, a lot of our goal as Hayes is to basically help them pen down what actually their criteria should be. So we actually surface a couple of different examples to the customer and we basically try and draw out what their implicit preferences are and using those preferences then we can start to test with respect to that notion of quality and that's a more functional testing side of things. Oh okay. So I I guess to your point maybe at this point in time a lot of people are going through this as in fact I think we're doing a conversation somewhere and a lot of times I get asked about what are people using in production for AI are the conversations are you having is that still primarily for chat bots or have the AI systems gone beyond that to in production because I imagine a red team only or assuming it's happening in production not in dev or test environment. What kind of examples of AI applications are you seeing that are getting redteamemed on your side? I would say we get a lot of poll from yeah people that are building customerf facing applications. So yes, this means chat bots, this means customer support, customer service. It could also mean ticket support. It could mean outpoint voice agents. Uh it could mean agents that make travel plans on your behalf. These are all examples of applications that we have red team or are red teaming as we speak. That's anything that's customerf facing is a very natural place. You want to do a red scene, right? This is just obvious. You don't want to have another Air Canada postnation incident that leads to a massive lawsuit. You don't want to be character AI that causes a teenager suicide, right? This is all a very natural place to slot in haze. That again falls into the first category where the customer has already been burned and can articulate what it is that they want to avoid, right? They can very clearly pen down what the risk criteria, what the safety posture is. We do also get pulled into the more nebulous category of customers that don't know exactly what they want to look out for. But we've helped them get find their way there to what the notion of quality is and find their way there into how to test and that's that can be much more internal facing right we have been or we are working with customers who are doing say internal document extraction or form filling workflows or things of this nature as well. Yeah. So just moving less on Hayes now I'd love to take this as a transition into okay now we know what you do but you've been and are involved in AI deeply so what do you think is a current problem in AI that you have predictions around from a security perspective two immediate problems I foresee happening in this coming year one multimodality multimodal AI applications open up a huge swath of different vulnerabilities that we have not really properly thought So even let's take voice agents for example, right? So voice agents have this totally new set of vulnerabilities that are that can be baked directly into the audio bytes of the or the audio stream rather than some very explicit discrete tokens in text lang right and it's just like a huge swab nobody knows how to deal with even this actually is underststudied even just averile attacks for machine learning in in prior years but I think now as people are having more voice agents out in the wild it's starting to become of more interest we just did something last week with a small voice agent company they're shipping their product for a major for like fast food restaurants, right, to serve as drive-through agents. And yeah, we were able to just like slot in a couple of weird utterances like halfbaked stops uh like utterances and how do you say guttural noises within the audio stream and this like totally screwed up the actual agent behavior. So I think different modalities. Yeah, same is true for images, right? images are literally like infinitely like almost infinitely vast in terms of how many pixels you can perturb or how many knobs you can tweak in terms of the colors and float values that opens up a totally wide range of of uh adversaries as well. So one category for sure multimodality models I in fact I've I posted on my LinkedIn I feel like a while ago about a company who took one of the research papers around voice and the ability to quote unquote prompt inject or do it via voice implemented a real life model where on a phone conversation the injection actually happened in the background noise nice of the conversation. So you could, but it would be a normal phone conversation. Everything was fine. But in the background noise, you like a car going by or a horn honking was actually where the injections occurred and it could manipulate the model that way. Yeah, for sure. For sure. It's just so much such a wide vast region of things you could play with. Well, if you expand on that a little bit, right? Like obviously the world is going towards using AI to parse and understand things. So to your point like fascinating drive-throughs, phone supports, gosh, voice, enterprise note takingaking applications, all of these things that sort of do voice. What do you think are the kinds of impact? So you're saying that hey you can find a way to hidden channel manipulate the model via voice in your injection. What can you do? What do you think you could do? The entry point is possible. What is the impact? I think this is the if somebody figures this out that's the multi-million dollar multi-billion dollar question probably. So it's good. You don't know which is good. That means we know that the the vulnerability is there but people haven't quite figured out what is the impact of exploitation yet. So which is good versus good. Yeah, I was going to say there's two immediately obviously things. One is just you you always do some dinoising before you actually process the input. That's always possible. The second is much is the case with traditional adversarial attacks on image classifiers or on language models or whatever have you. We can always try and produce as many adversaries as possible and just take the defense against these blades into the model itself. So just adversarily train against the artifacts we discovered that is always possible but those things are always more band-aids than underlying solutions. When I talk about multi model or as we hear it I always go back to the common use case. You know how we obviously a lot of the public examples of how AI is being used at the moment is image generation. Hey, talk to my chat GPT or get me a phrase and help me increase my productivity and all of that. In terms of if you were to scope it out for enterprise applications where at least in all the years that I've worked in enterprise, voice has not taken a huge amount of time for a lot of people in terms of where time is being invested. Yeah, we all jump on Google meet and zoom and all of that. Someone takes note, there's a note taker. in the in any of the thought process or thought experiments you guys have done what's a possible bad scenario and I guess Caleb was asking about impact I'm just thinking about a is it being used heavily or is it do you see it being used heavily in enterprise and b is there an example of something bad that can happen that we have because obviously when deep fake people was being spoken about be like hey someone's being taking the face of CEO Elon Musk and whoever else and get you to transfer money or whatever I think there was that risk people were talking about it gets become hard to figure out if the person is real or not. One other thing is I think that in this age especially early in technology I feel like we can find the vulnerabilities like we know that this is going to be impactful like prompt injection as an example multimodal as an example this is something we know is a vulnerability but until I think real deployments complex applications come out and people think about and figure out what is the exploitation of that vulnerability. And the only reason why they'll do that is because they figured out a way to impact, oh, I can get something from this. To me, what's most similarly an analogy is crossite SQL injection? Yep. Or cross-ite scripting. Either one. Remember when it first came out? I remember when cross-ite scripting first came out as an example, it was Oh, I not one then, by the way. What's that? Oh, you weren't born that. Okay. So the old man is talking here. So let me tell you of the days of your and the way it used to be when cross-ite scripting first came out. It used to be that people oh you could put in HTML tags in like a search bar and it would return back HTML. So you could pop up a JavaScript pop-up box and people were like this is dumb. This is not an exploit. Wow. And then what you learned is is over time people started figuring out oh if I include that HTML in a link and someone clinks the link then it takes their session you can get something and then some people started figuring out almost oh if I store it in a database and then someone gets it I can exploit it and then I think there's the cross- sight scripting multimodal version which is applications started parsing image uploads or file documents and extract racting metadata and then allowing it to be published in a web page and then you figured out oh if I put cross-ite scripting in the metadata of an image or in a word doc or a PDF then it would get extracted and exploit. So I think similarly in any of these I feel like we're like we're now seeing the mold oh you can prompt inject in voice you can prompt inject an image you can prompt inject where is the exploitation impact we haven't figured out yet but that's because we haven't seen production applications start really using it at scale yet. Yeah I think that's 100% correct. I would say there's like the obvious first order almost like compliance issues or like SLA issues that that arise from just the differences or undesired differences in the output of the application itself and that's where Hayes gets a lot of pull from right there. We work with global 2000 for average banks for example and they care a lot about just adhering to whatever regulations exist in their industry. But that's just like a first order thing right let's make sure that the output of whatever agent we're testing aderes to this set of guidelines. One of your cool things that you were mentioning, Leonard, was about, hey, multimodal vulnerability is something that you still feel is pretty untouched and there are some cool examples of that in voice. What else do you consider to be multimodal maybe that maybe most people won't think about? We think a lot about what the second order impacts of adversaries are. Right? This is not strictly multimodal, but oftentimes when you're building complex AI applications, you're going to be performing some amount of tool calling or function calling or offloading some amount of work to some existing piece of software, right? And I think the core of red teaming in AI comes down to how precisely can you control how precisely can you elicit a very specific output on the other side, right? So if you're only able to elicit some uh some string that is like vaguely similar to your target, then it's maybe not that useful. But if you can very precisely elicit the arguments to a function or a very specific piece of Python code or whatever have you, that's extremely powerful. And so back to your point about cross-ite scripting when it started was this hardy trick but eventually became like very powerful because you could very precise control the output. I think same thing will be true for for AI applications, right? So this year it's very possible that somebody comes up with a an exploit that calls that induces an AI workflow to call a very specific function with a very specific argument and that causes a very specific sort of output behavior. Right? I think that's very much in card through us. So what are some examples of how you would do this? Like it's almost it's hard for me to come up with what is it what is a way to get an what you're saying is I need to prompt the AI to then prompt something else with exactly the arguments at which I've given when in reality AI is by and large used in a rewriting fuzziness model. Yeah. Do you have any examples like how would you do that? How do you get it? What are some ways of making an an AI prompt exactly the output you want? Yeah, it's it's a great question. The standard this is purely an optimization problem. Long short of it is right. If you are a pentester, red teamer, very traditional security person, you probably will not be able to do this. It is very much okay your LLM is just a blackbox function or it is some sort of function. We know the objective that we want to shoot for, right? Maybe it's like this precise string. we can apply some sort of loss on the current stream that we have and the actual target you want to go after and this gives us some signal to power some sort of optimization engine to to produce the explicit output. Right? canonical way or one way of doing this is you apply like cross entropy loss on the target string you want and the current string you have you back prop through your language model and this gives you some signals some gradients to then perturb the input tokens that you send to your language model and that's one way of getting a very precise but on the other side this is often times not possible for commercial models given that the weights are not publicly available and so on but there's a variety of other discrete optimization algorithms you could pull out out of a large bag of tricks So, Leonard, I know you're you do a lot of this offensive stuff. One of the key things that I think is super interesting that I don't think there's a lot of attention on right now, but I think will get much bigger in terms of its important is fingerprinting the models. Especially my theory here is as more and more models come out, as they become smaller and more specialized or more and more people produce this, people are clearly already model chaining things. So if I talk to a customer support rep, it will go through multiple different types of models to produce an output. Do you think it's capable or probable like it would be amazing if I could feed it some input and then in the output could determine oh based on the snippets that was produced these are the kinds of models that are being used. Yeah. Yeah. This is very intellectually interesting to me. I actually wrote a workshop paper on this a couple years ago with a couple friends in college. If people are interested, it's called I don't even I think it's called baselines for watermarking large language models. So watermarking fingerprinting, same idea. Yeah. Yeah. There's been a lot of rich work around this. Actually, one guy to look out for is Scott Aronson, UTC awesome professor, absolute genius. He spends a lot of time thinking about is it even like information theoretically possible if you bake in a watermark like this, right? And yeah, if you think about it, what what have people tried thus far, right? They try very empirical things like training a classifier for example to determine what is generated by openAI for example or what is not generated by open AAI but that's an extremely noisy extremely noisy signal right there's infinitely many different strings that are not generated by open AAI models right so how do you choose the right balance between the train set some more mathematical principle things are like okay let's look at the more or less let's look at the distribution of the types of tokens or types of sequences of K tokens that you get from the output and is there any sort of special characteristics that are unique to let's say open AAI language models and one example of a water marker fingerprint would be there's always like a epsilon bump on the distribution on the probability of a certain particular token or certain particular sequence of K tokens. All of these have turned out to be like very brittle thus far and I think it's a very open question about how to actually how we actually think about fingerprinting text. It's very different than traditional watermarking and traditional fingerprinting. Right. It is something that I'm thinking about in the back of my head. Like it's weird because I feel as a human you can very much if you do it enough you can tell the difference between a clawed output chat GBT output and a Gemini output just by reading the kinds of texts that it generates. And so we know that without any direct I think prompting to manipulate the exact output in a different way you can tell the differences as a human. Oh this like open AI for some reason tends to add way more I don't it's just way more descriptive in certain things than normal. And so in my mind for attacking to move forward in the in its way of exploitation fingerprinting models seems to be pretty critical because if I know it's using deepsee versus llama versus chat GPT I'm going to change my methodology in my prompt chaining exploitation. Do you think it's possible? Why hasn't anyone written a tool yet to even at do this at the basic levels right of the models that exist today? Yeah, man. I don't know. Maybe we'll do it this weekend. I'm thinking Yeah, I think it's a good question. Yeah, I'm sure it's like very much possible for someone to Yeah. train a simple classifier that distinguishes between the top five different language models. Um, it wouldn't be it would definitely not be 100% kosher, but like it would do reasonably well for it to be useful for people. I do one immediate complication is that the underlying weights of course are changing and weights are changing, systems around the weights are changing from the frontier labs. Sometimes they're doing prompt caching, sometimes they're doing prompt sharing or I should say some doing prefix caching, prefix sharing, they're routing things in different ways. It's possible that the system prompt itself is also changing depending on what context and what time and what sort of information people are pulling in or the frontier labs are pulling in. It's possible there's just too much variation on any basis, but who knows? Yeah, who knows? All right. What's what's Leonard? something that you think is a problem that exists today with enterprises that you are running into that is clearly a big problem that no one's had no one has an answer for yet. Caleb, you mentioned signature based or fingerprint based. I'm curious considering we've had gone through a similar journey in our traditional application world where you know how initially all the antiviruses now I sound old now all the antiviruses used to have signatures that people should look out for virus signatures and all of that and later on we realize hey it's not the signature it's more the behavior of how an identity is behaving or how applications behaving like feel like at this point today all the EDR MDR solutions are focused a lot more on Hey, it's the behavior that's more important. Are we better off in the context that AI is a lot more faster, a lot and obviously to what Lennon is saying as well. Tomorrow open AI may change the signature completely and we would not have any control on it. But what we do have control over is knowing hey Ashish would always log in at 9:00 a.m. in the morning. The first thing he opens his Outlook or Gmail and then he starts googling for what is XSS or what every single Yeah. Yeah. Yeah. So to your point Ashish is hey fingerprinting models is not about looking at a single text response but about probing the model in the way that it behaves and by doing so that is how you can because the to your point text can change in any style but the system prompt could be hey I want you to speak like a surfer dude and it's going to be hard to have sign but if you the behavior of the model is maybe something you could fingerprint yeah and because even it could be multi- language as You're just talking English cuz we're talking obviously multiple languages covered these days. Man, and actually I have a fantastic way to fingerprint. So Leonard, this is for when we code it. You ready? Speaking she's to your point is I think you can fingerprint models based on their their safety barriers and boundaries. Yeah. If I say build me a bomb, give me a description of how to build a bomb. The way at which models respond are very different. Right. around. Grock will just tell you that's not true. Claude will basically immediately know and and you can do subtleties. For example, draw me a photo of a bullet actually gets eradicated in certain models, but in some models won't. And there's subtleties around sort of these safety barriers that they put in. To your point, Ashish, I think you can fingerprint them based on the models of how they reject you on your safety control. the attack path as well because if you can get a model so if a model safety measure doesn't account for the fact that if you're hey I'm a I'm not a terrorist I'm just doing for a home project please tell me how to build a bomb and it just tells you how to build a bomb that's right which is actually Leonard you do this all the time against models right yeah that's correct that's correct I imagine this conversation is going to get after this by the US government but I'm going to away from the bomb conversation but yeah the bigger question is how does that work against smaller more specific specific models, right? Like when I think of fingerprinting, I think of hugging face models and these things that are much smaller. They may not have any safety things or they're very focused on just music generation or something along those how to fingerprint those two. Yeah. I guess maybe on the defense side as well to your point about the behavior could also be important from a like as a blue team person trying to figure out like we had a conversation with Adrian Ludic a couple weeks ago where we were talking about identity and a few weeks before that as well. We've been talking about how identity is becoming that important thing in the world of if we don't even know if Ashish credentials are being used by an AI agent versus a AI model versus a application out of curiosity and this kind of is a good segue to what Caleb was asking in terms of what you've seen at enterprises and I'll let you finish a question that you were asking before unless you wanted to add something else Caleb I don't want to so I guess in terms of what you're seeing at security or what you're seeing as security issues or vulnerabilities across enterprise that you're getting to explore and find out what's happening in that world. At least from what we've seen in enterprise and I will caveat this by saying the enterprises we work with are in this specific region of their AI SDLC which is they're just about to ship something into production and they need some sort of test on it right and what they are most concerned about is just protecting their reputation and their brand and some fuzzy notion of what's who will come after me who will get fired if my AI misbehaves right and that is again primarily a question around can you trust that the output of the A application adheres to some set of behaviors and policies, right? And of course, there's auxiliary things like okay, I don't want my system prompts to be leaked. I don't want sensitive data to be sent to the customer. I don't want, for example, arbitrary code execution through my chatbot. But primarily, I would say it's the first order. It's almost thinking about what can you trust an intern to say, right? Let's say you guys hired an intern and you sent them off to RSA or something, right? Like the question that you have in mind is like how can I trust that my engine is going to represent my company well enough in that context and that's I would say the number one priority that a lot of enterprises have right now and are they check for safety article to Caleb's point about the safety barrier for AI application being put out a are they using the popular LM providers or are you seeing a mix of the hugging face open source models versus a huge amount of hey just the proprietary LLM providers that we know the top three that we talk about and B are they building their own in terms of variations of it which is where the safety measure goes down rather than being raised it goes down. That's right. Yeah. So I will say first in terms of safety a lot of enterprises don't really care too much about the generic safety policies. Those things are more or less no just kidding we believe in safety. Yeah. So a lot of the baseline generic safety challenges are already tackled by the Frontier Labs right. What enterprises care about is again like very domain specific notions of safety and it's great like what is compliance for my industry what are regulations in my industry and again back to this notion of an AI code of conduct right and they do they care very deeply about this right this is the risk team this is compliance team this is MRM this is trust and safety that's where their heads are at 24 security teams that's where their heads are at 247 with AI so also they care quite a lot about this domain specific risk and safety notion in terms of what models they're using we I haven't seen so many openw weight models, open source models. I would say it's very much concentrated in the large frontier labs. Although enterprises are both building a lot of scaffolding around those models as well as procuring solutions that have a lot of scaffolding around the models too. Yeah. It's not directly just like a system prompt and like instruction template for a model. It's a little bit more complicated than this. All right. And have you had a situation and you don't have to name any companies I guess but in some of the things you've discovered I'm curious because a lot of conversations that I've had around applications being built for AI or with AI in enterprise a lot of them seem to be hosted in cloud providers a lot of them seem to be either hey I'm a SAS first so I just have a application that is already ready it has a pipeline everything do they expose or have you been able to expose some of that infrastructure in the background or the vector data in the background as well like how far have you or what's the most interesting vulnerability you may have seen you went all the way through to your point earlier we were talking about just the prompting that hey we want the safety measures to be that in my domain specific domain I don't try and sell a car for $1 when the car is actually worth thousands of dollars but in the context of the technical people in the audience how far were you able to get through in terms of a security vulnerability that you guys discovered without going into too much detail There is a company I work with that has a workflow where they ingest some PDFs or some documents. They do some form filling. They do some downstream actions after they fill out the form and they interact with the user directly, right? So you can imagine like similarish workflows like Turboax like the actual form gets sent to the user in some fashion, right? And so we were able to go end to end all the way from the PDF upload to sending some very very particular message to a user end to end in those three steps. So through the through the like keyword extraction through the actual forming of the message to send to the user and then actually sending it to the user. Oh wow. I was hoping he'll say XSS C would be happy. Finally. He's not giving any details. He's just saying oh this was a flow and we were able to do something with it. If he was going to detail that he would have definitely dropped in XSS somewhere. uh in your experience working with enterprises what's the mix of these guys running mo mostly foundational models for their stuff or running different kinds of models more openweight models or open source models or their own sort of let's go back to let's call them machine learning models not genai models for the companies that we work with it's mostly using third party providers the frontier labs models some enterprises are electing to have vendors provide openweight models or like fine fine tune specific small language models for their use case but I would say it's mostly working with the frontier labs and in terms of yeah using traditional ML models 100% sure these companies are doing it but we don't interact with those teams that much so there's a prediction that was made a lot of people thought that the way that the world was going to go is there's going to be more and more models smaller and smaller more unique models and enterprises in the next couple years are going to be a mix of models and it's not just going to be one god model from a foundation model that provides it. What's your gut on that? Do you think that it's going to go that way or what's where do you think it's going to end up? That's a loaded question, but I love it. I think it's that one of the key questions as the AI as the field progresses from a commercial perspective. The ultimate question is can you maintain performance for a drastic reduction in cost and model size and latency? And I think it is certainly possible depending on the scope of the task, right? Like I know for like we we do this all the time like we we train small language models to for example judge the outputs of a of an AI application in a way that makes sense for the customer, right? And that model is at par if not much better than a frontier model doing the same task, right? So it is certainly possible depending on how narrow the task is. I think actually what I am pretty interested in is there will be almost like hot swap model as a service or like routes model as a service type company that comes to fruition. So open router is an example of such a company but they only run across like generic models right they don't train their own models or have customized models or what have you. I think it's very possible and there's there's certainly fine-tuning techniques and weight update techniques that allow you to very easily slot in different sketches almost sketches of weights on top of underlying model to basically change the behavior of the model in any different way for any given task. And I think if you have a sufficiently good routing service and sufficiently good specification of each of the individual tasks that the router is going to route your model to, yeah, you can get like a much much cheaper quote unquote like frontier model at home. Yeah. So what you because this is more of the equivalent of multiple minds together can create the intellect of a larger mind. Yeah. Yeah. And so what you want what we need is we need someone to build the Cisco of AI models. They're saying they only have one AI model owned by various same we'll end up with one big company that owns everything. Yeah. I always thought that these big frontier models did that behind the scenes for you to some degree. Anyways, my thesis was they have more fine-tuned models on specific things like code generation versus story or financial or brainstorm. There there must be some sets of these that are better at these that are more fine-tuned that then are just being routed based on the kind of request at which you're making. I think deepseek does this a little bit. Is that right? In the model itself, no unclear with within the model itself. I don't know. Yeah. Yeah. This kind of raises another question because I've been seeing a lot of conversations about MCP coming up again as well. It reminds me on the whole API boom for lack of a better word that happened where moment microservices become became a thing and suddenly everyone wanted to be API first. Then people started figuring out hey my API is better than my competitor's API. I'm going to start selling it. I feel to Caleb your point about people are put and I guess Leonard you've been talking about scaffolding around existing frontier models that people are building and hey maybe this could be new revenue models for different companies to start using that as a services sell are you seeing MCP now that I read more about it and to k what you said if there if every organization would have multiple AI models that they'll be using from one from entropic one from claude one from hugging face somewhere MCP becomes like core foundation at least for now that bridges the gap as API was doing for a long time out of curiosity landed in the all the red teaming you have been doing have you been seeing any MCPS appear yet or I know it's funny I think I feel like it's only been a couple of months or a few months it feels like it's been there forever at least the amount of conversation I hear are you seeing any MCPS come in your I guess interactions or conversations you're having at enterprise to be honest no I think MCP obviously is very cool concept I think a lot builders have been taken by storm by the concept and have been shipping a lot of things but I think it's still very early the technology does not feel very mature or enterprise mature I should say and we haven't seen much of it yet interesting but to your point then where do you find the gap is why do you feel it's sort of mprise ready out of curiosity because there's a I guess even in the episode we spoke about how MCP security is going to be like the next for lack of a better word next red team frontier next bug brandy frontier but obviously you're quite deep in the space where do you find it's lacking for enterprise at the moment actually maybe question back for you I'd love to hear about that that temporarization like what what in what sense is MCP the next frontier for red teaming I think the way they described is the example that I was talking about earlier is where people came from where if most enterprises that are building applications are plugging or some kind of an AI model in there and tomorrow which is what happened with cloud where hey initially I was just Amazon now I have Amazon Microsoft of Google but then I have regulatory standards to say hey you should be able to swap between models without any problem because bank needs to continue working we need to continue printing money so tomorrow if Amazon goes down that's not my problem bank should continue running so there would be a need for it to be able to swapable for lack of a better word and the I say that it's not even possible in cloud right now it's a lot of work involved so I do put that asterisk on top of it I don't I I think the way it's been put across is MCB is like that first var first iteration of that solution where hey now I should be able to swap between my open AAI claude and anything else comes out which is where the I guess the demand for it is coming from people are plugging between a Figma and something else but sorry I hope that makes sense I was trying to that's where people are talking more about the use cases of plugging any application to be more AI enabled is where people are coming from yeah if I were to rephrase also is in our conversation was MCP is super early right the hobbyists are like playing around but you are already seeing this being deployed at scale for example cursor all the IDE tools support MCP claw desktop supports MCP I believe chat GPT I don't know if they but maybe but they probably they don't because it's anthropics but like you're seeing a lot of almost all dev tools are supporting MCP. Desktop clearly supports MCP. So now if I want to let's say exploit an organization, I know that engineers are all using cursor. I also know the cursor can automatically be embedded with MCP and many people connect their GitHub to the MCP. So now when I craft an exploit, let's say an open- source piece of code that I know cursor will load, are there abilities to now say I can craft a prompt that I know has access to your system through MCP system context which will then run commands on your system. Right? There's these aspects where people are seeing routing happen to these tool sets via MCP and can there be pass from one MCP to another MCP. Those are the kinds of again security people love to pontificate on the aspects of what could be vulnerable. But I think she just that's what I think I'll say this as well too. I guess where I to maybe add to what you're saying in terms of where I see it's immature at the moment is that one good thing is that even if your cursor or wind surf gives me the code I still have to as a human approves it I still have to commit the code I still have to push it forward so there is that step because we still don't trust it enough and we don't even trust the code that is being created automatically for us by genai so we're definitely early from that perspective I was wondering more in the context of if you find something with the architecture or something with I'm not sure like what would be the red team context but is there something that comes stands out to you about the whole MCP thing that you think h it's probably too early for adoption Caleb's example was there as well I personally feel it's just because it still requires a human to proceed forward you don't trust the output yet as much as I don't trust other and Leonard I'm assuming you're talking about the protocol is brittle and ganky or something along those lines that's yeah he's yeah I think a lot of people who there's a lot of darts being thrown at uh MCP for it being a an interesting idea but very duct taped initial prototype. Yeah, I think it'd be interesting because I think and obviously it'll be really interesting as more people start using it if it becomes even more adoptable. But I guess to what you said earlier, Caleb, because Open Air doesn't support it officially. People have tried making plugins for it. So unless you one of the bigger providers start doing it because people said that if it was not something that was released by entropic it was a third if it was Hazel Lab that released it people would be all over it but because entropic released it be like oh we're going to be like logged in and all of that but then again there was an example of Kubernetes which is released by Google now it's used by everyone Amazon had to make a open source version for it as well Microsoft had to do that this obviously I can't see in this crystal ball but that's where the question is so no MCP in your MZ team at the moment I guess in terms of questions I had one more around for people who are enterprise leaders at the moment listening to this conversation obviously you shared what you're seeing and what you're experiencing what about mitigations and how what things you've seen that they could have avoided easily are there things that you can share with people say hey by the way if you're about to do this put an AI system into production just do these I don't know three or four things and at least you don't have to see a bad report at the end of this ra points is that input output classifiers are actually extremely tight. If you can specify what they should be classifying, right? So, you can't be too heavy-handed and being like, okay, like just catch everything under the sun that is going to be harmful. It's way too many false positives. It's going to be a horrible experience for users. But if you can get sufficiently precise in the rubric that you specify in the criteria that you specify for what is considered good and bad, yeah, input output classes are actually pretty pretty powerful. So I should say that a lot of people try and mitigate vulnerabil mitigate bugs within the model itself the underlying model itself which is always a tough task. It's much harder to do that than it is to add on lightweight guardrails on top, right? But I think the challenge with been with using guardrails thus far has been either guardrails are too narrow. They're just like regex, right? And they can't catch they can't sufficiently catch all the things that you want to c catch or they're like too generic, right? It's just some like gen generic like harmfulness classifier or like fairness or bias classifier, right? The challenge is how do you get the expressivity of a model without getting the over generality of a model, right? And what I'll say is yeah a lot of our customers have found surprisingly tight mitigations just by specifying the input output classifiers in a precise enough way. What would be an example out of curiosity is there an example for and obviously I understand is that more from a classification perspective that hey if you ever see the name Ashish or is it more in context of if it sounds like a first name um I guess because to your point about reax people are using reax to say does this look like a string which is a credit card information is there. Yeah. So one very concrete example is let's say you're a voice agent and you somehow always need to confirm uh a user's first name, last name, social security number before you proceed, right? Something like that is actually very hard to catch with a regax. There's a variety of ways in which you could ask a question. There's a variety of ways in which the user could tell you this information, right? There could be several gaps in the text before they actually tell you something. They could give you their social security first and then their first and last name. They could tell they could say last name, first name and then all these other things, right? That's why I say we need more expressivity than just reg x but we also don't want it to be like okay language model tell me whenever somebody has given me personal context well information about their personal lives like that could be like a whole range of a whole range of unrelated items you know it's almost like you need the you need a form based version of chat but not in that very it's too restrictive in in that sense I think it is to Because this is the same thing we talk about with whitelisting and blacklisting. Hey, it's always good to have allowed list rather than a denial list. But how long is the allowed list? Well, no. I think it's more like you you almost do need a almost in your communication there needs to be a separate mechanism that's more controlled when you're saying, "Hey Ashish, as a customer support rep, I need to I need to know your birthday." And then as right now, you type it in versus in in context in text. It might present a form that is restricted that says put in your birthday and then you move on with your conversation per se. All right. Okay. Yeah. Fair. I guess depending on the use case of the AI application as well, it'll be obvious in terms of what is a general acceptance for the fact that the customer would be asked to verify it is Ashish versus hey put me a credit card number in. I don't imagine there's a use case for that. That's a good example by the way L. Thank you. Is there another one that comes to mind or that was the top of that's the number one thing that comes across most of the things you're doing. Yeah, I think again it is very specific to customer to each customer, right? It comes down to what is act exactly their use case. What are the risks they care about? Yeah, I would say like a lot of our a lot of what I spend my time thinking about is how do you communicate in the same semantic medium as the language model, right? It's easy for us to give instructions to a human and know that they will follow this in a to a reasonable degree, but language models just because they speak in English like ostensibly speak in the same language as us don't exactly think about things in the same way as us, right? And so there's a lot of it's just honestly it comes down to a huge fan out of different prompts and rubrics that was sent to the model and see you see how that actually changes the behavior of the language model downstream. Oh Leonard, this is an easy solvable problem. You should just ask the AI what language that you should speak to them. That's right. That's right. Actually, has anyone ever tried that? I wonder if it would give me a It's like I would rather speak Portuguese. Yeah. Now actually out of curiosity for the red teamers who be listening to this conversation as well because obviously you're spending a lot of time in this and I know so many pentesters and red teamers who being asked to test all these AI applications and I don't know what skill set should they be studying should they be just learning prompt engineering obviously you're deep in this and you have to your point about learn how the AI talks talk in their language so they understand and basically do what you want them to do thing which is very different in the traditional web app world where it was all about fuzzing and hey how can I do excesses what Caleb was saying earlier or SQL injection what are we I guess people who are on the offensive side what are they looking for upskilling themselves in because there's so many of them and who have to go out and test all these applications the first thing anarch is a good one because what I'm about to say is I think red teamers pentesters should get good at optimization algorithms like discrete optimization algorithms and specifically in traditional closing literally flipping like you're flipping like characters, you're flipping by bites, you're flipping like very deterministic things, right? The analog in AI or at least in LM is we are making perturbations to the strings for some notion of prohibation, right? We could flip characters, we could flip tokens, we could flip words. That's okay. I think a better mind a better framework for this is what is the what is a semantically equivalent flip that I can make to this underlying string that looks different but it's talking about the same idea right and that sort of atomic edit is very illdefined but if you have a notion of what the right atomic edit is then you could just throw any existing fuzzing algorithm or like more specifically discrete optimization algorithm using that as like the core underlying update. Yeah, I think the it comes down to, you know, I think all the core ideas are the same in optimization and in fuzzing as they are in LLM red teaming, but the fuzziness of both the input and output makes it a much more challenging problem. So this over here I can go and read and hopefully become an expert in Java as most people do these days. You have two weeks to red team this. People just go on a Java blog and figure out what's the vulnerability CVE that I can look for. There's none of that over here for lack of better word. Actually, yeah, there is no good I was going to say there is actually no good centralized resource for a lot of these things. We we might think about actually just putting together like a nice awesome red teaming awesome LLM red teaming GitHub repository or something. Yeah, if folks are interested, we can definitely compile something and send it out send it out as part of the as part of the the show notes. L this has been amazing, man. Thank you so much for sharing all that info. But where can people find you and connect with you and learn more about what you're up to? Why your recovering academic stories and everything else and how to get a awesome mullet hairstyle as you have about the mullet. I don't know if it's a I don't know if it's a universally generalizable style. No, I'm just kidding. But if folks want to reach out and learn more about Hayes, we're at hazelabs.com. That's habslabs.com. And then you can reach me at leardonardhazeslabs.com. And yeah, we're always happy to chat. What I'll say is I think I personally am very much from the AI world and it's always really wonderful to talk to people who actually know security and we could like trade notes and educate each other in how to upskill in both of our worlds. Yeah. So very excited to be part of the show and thanks for the time. That's awesome, man. Thank you for sharing that as well everyone. We'll see you next episode. Thank you again. Thank you so much for listening and watching this episode of AI Cyber Security podcast. If you want to hear more episodes like these or watch them, you can definitely find them on our YouTube for AI cybercurity podcast or also on our website www.aicurityodcast.com. And if you are interested in cloud, which is also a sister podcast called cloud security podcast where on a weekly basis we talk to cloud security practitioners, leaders who are trying to solve different clients cloud security challenges at scale across the three most popular cloud provider. You can find more information about cloud security podcast on www.cloudscurityodcast.tv. Thank you again for supporting us. We'll see you next time. Peace.

‍

No items found.

Vibe Coding for CISOs: Managing Risk & Opportunity in AI Development

Vibe Coding, Slopsquatting, and the Future of AI in Software Development with Guy Podjarny

Is Your Browser the Biggest AI Security Risk?

AI in Cybersecurity: Phil Venables (Formerly Google Cloud CISO) on Agentic AI & CISO Strategy