AI Security 2026 Predictions: The "Zombie Tool" Crisis & The Rise of AI Platforms

View Show Notes and Transcript

This is a forward-looking episode, as Ashish Rajan and Caleb Sima break down the 8 critical predictions shaping the future of AI security in 2026We explore the impending "Age of Zombies", a crisis where thousands of unmaintainable, "vibe-coded" internal tools begin to rot as employees churn . We also unpack controversial theory about the "circular economy" of token costs, suggesting that major providers are artificially keeping prices high to avoid a race to the bottom .The conversation dives deep into the shift from individual AI features to centralized AI Platforms , the reality of the Capability Plateau where models are getting "better but not different" , and the hilarious yet concerning story of Anthropic’s Claude not being able to operate a simple office vending machine without resorting to socialism or buying stun guns

Questions asked:
00:00 Introduction: 2026 Predictions
02:50 Prediction 1: The Capability Plateau (Why models feel the same)
05:30 Consumer vs. Enterprise: Why OpenAI wins consumer, but Anthropic wins code
09:40 Prediction 2: The "Evil Conspiracy" of High AI Costs
12:50 Prediction 3: The Rise of the Centralized AI Platform Team
15:30 The "Free License" Trap: Microsoft Copilot & Enterprise fatigue
20:40 Prediction 4: Hyperscalers Shift from Features to Platforms (AWS Agents)
23:50 Prediction 5: Agent Hype vs. Reality (Netflix & Instagram examples)
27:00 Real-World Use Case: Auto-Fixing 1,000 Vulnerabilities in 2 Days
31:30 Prediction 6: Vibe Coding is Replacing Security Vendors
34:30 Prediction 7: Prompt Injection is Still the #1 Unsolved Threat
43:50 Prediction 8: The "Confused Deputy" Identity Problem
51:30 The "Zombie Tool" Crisis: Why Vibe Coded Tools will Rot
56:00 The Claude Vending Machine Failure: Why Operations are Harder than Code

Caleb Sima: [00:00:00] This age of zombies is gonna come in the next year where people have been spending a year vibe, coding all of this stuff, and there's just gonna be all this stuff just rotting AI platform teams are, think about it like the infrastructure team. Their job is to effectively expose for engineers the set of proper APIs.

My guess going into 2026 that this is logical and will continue as a model moving forward.

Ashish Rajan: Capability Plateau is real. Models are getting better, but not. That different in terms of being better hyperscalers has shifted from AI features to AI platforms. Instead of you using and hosting through your own infrastructure, why don't you use our service becoming an AI platform?

I don't know if I wanna spend, let everyone use this. It became a really interesting conversation for the regulated bodies. AI costs are too high. No, like circular economy, as they were calling it. I guess they're controlling the price. 2026 is gonna be a year where we would see the next phase of AI come true.

Now we did some research and found [00:01:00] out eight predictions for 2026 from an AI perspective, which are relevant for security and people who are building or securing AI in their organization or for themselves as a consumer of ai. All of that in this episode, including where we see this weather, would actually stand the test of time and what makes us nervous about some of these predictions.

In case you know someone who's working on securing an AI program or even building an AI program in their organization, definitely share this episode with them. And as always, if you're here for the second or third time and have been finding AI security podcast helpful, I really appreciate if you take a quick second to drop a follow subscribe on whichever platform you're listening to us on.

We are on Apple, Spotify, YouTube, LinkedIn, so give us a full subscribe. It only takes a second and means a lot. It does mean that we get found by many more people like yourself. So I appreciate all this love and support that you show us. Thank you so much for all the support you have shown us and comments and love you've shown us for all the conferences and meetups, the webinars and in-person events.

We did really appreciate all the love that you've shown us and look forward to seeing a lot of that in 2026 as well. So thank you so much for all of that, and I hope you enjoy this episode. I'll [00:02:00] talk to you soon. Peace. Hello and welcome to another episode of AI Security podcast. This is the 2026 season, and based on the research we have done there were top few, at least eight things that people are going into 2026 with just top of mind for them.

And Caleb and I are gonna go through some of them and kinda give you some insight on what we saw in 2025. And how likely is some of these to be, some of these topics to be, very predominant in 2026. I've got eight items here. I'm gonna go with the first one and there's, there's no, there is no preference order here.

This is just the list that the team came up with, the first one. And, uh, Caleb, I would love to hear your opinion on this before I kind of give mine capability. Plateau is real, models are getting better, but not that different in terms of being better. What's your thoughts on this?

Caleb Sima: I don't know.

Like, I'm a little bit torn on that in the sense that I clearly, um. Use different models for very different things. Yeah. Like, okay, let's start with just from a consumer perspective, and then I'll switch into enterprise from like a [00:03:00]consumer perspective. Like if I'm doing random, everyday stupid stuff I know I naturally go to ChatGPT OpenAI, uh, their user experience is better. Their product is, but the model is really, is good enough. It speaks to me more like a consumer.

Ashish Rajan: Mm-hmm.

Caleb Sima: Uh, kind of does. Yeah. Um, if I'm coding, it's the clear winner is Anthropic. They they are phenomenal in the coding space. And then I, I, and again, my, no matter what the benchmarks say.

Yeah. Um, when it comes to I think real usability and code that works Anthropic is still a winner. So I'm gonna go to code there. And then finally, I think when I do anything that is serious. Requires, I think a much more grounding in what it needs. I go, I use Gemini, right? So think about things like writing a blog post, writing a report, researching things discussing either health, physical, medical, you know, anything else [00:04:00] that's serious that needs more grounding.

I generally will use Gemini for that. I find that to be much, much better. And I think that those things also, when you switch to the enterprise somewhat, can also, you know, apply where you, where you see in the enterprise is like, coding is like a majority people use Anthropic, right? Like they've continued to kind of do this, which also speaks to the models.

Yeah. Which is, there is this baseline of the models today are pretty similar in a sense. There's eeks out in areas where they perform better. Yeah. Anthropic eeks out in code and performs way better. Gemini is actually eeks out in multimodal, right. When it comes to video imaging and text Gemini eeks out.

And then like, overall, I think like OpenAI at the end of the day really wins in the consumer. They like they know the consumer needs and wants and needs.

Ashish Rajan: Yeah. Yeah.

Caleb Sima: And so, you [00:05:00] know, there's like different kinds of areas where they are sort of different and better, but if you were to stand back from it, like a, that like 5,000 feet.

Yeah.

Ashish Rajan: You know,

Caleb Sima: they're

Ashish Rajan: all somewhat the same. Yeah. Yeah. I, I think my conclusion is the same as well. And specifically talking about the consumer space. I agree that. Most of them, if you were to look at that from a 50,000 feet, definitely they all seem to have a browser option, an agent option, a coding option, a Hey, ask me anything you want me to research, ask me a simple question about who the president is today or tomorrow, whatever.

I think for me personally, I definitely found maybe slightly different. I actually find Gemini experience wise definitely pretty bad. But even from a research perspective, somehow I have leaned more on Perplexity still for research. Uh, and maybe it's the kind of things we are researching. And funny enough on the coding side, I would've gone with Codex and I'll give, I'll probably solidify this with the thing that I've noticed, and I [00:06:00] don't know if you'll come across this.

I find that based on the language you're picking for your vibe code. Like, for example, from what I understand, codex is really good with Rust. I think, and people can correct me if I'm wrong in the comment section, there's specific languages that they have been trained, more trained data on where Claude Code, the Opus model doesn't seem to be that great where it kind of goes into loops.

Whereas people who have been like, so TypeScript and stuff is pretty amazing in in Claude Code Opus whatever, five point, whatever the version is. But most people who are on that Rust and something else, because apparently that's what being used in the background in OpenAI code seems to be like right up there.

And I felt personally I only, I did not, I'm not a coder myself. I was not white coding only people I hung out with. Everyone was on Codex and I'm like, why you guys' in Codex not Claude Code when everyone else is talking about Claude Code? And the reason people came up with was that actually for the kind of programming I write, Codex has a lot more data.

So it has a lot more like I'm not doing as much debugging and I'm not, and you can still, [00:07:00] and maybe Claude Code is probably still the winner in the sense of how long you can run this for as well. There is that whole discussion about that. But, uh, research for me has been definitely uh, Perplexity specific research.

Like, for example, for this particular episode, I had to research what the, uh, cybersecurity incidents for AI had been in 2025. My obvious choice was to go to Perplexity and look at them. It just gave me a short summary. I had links. I had the links as well to validate that. But I guess I, yeah, I think the coding one is an interesting one, right?

Based on where you go for and which one do you lean on? On, uh, where, where, sorry, which language do you lean on? People have had preferences for that. I personally, I don't consider myself as a programmer. I still quote unquote, vibe coding my way through this? Yeah. And I would lean on lovable and others to still kind of go get that front end and backend and stuff.

Uh, but I definitely find that research for me, Perplexity coding is codex, at least in the circles that I'm talking in. And the third one for general questions is still ChatGPT to your point, the consumer space is definitely [00:08:00] ChatGPT for me. And I think at, at this point in time, if they start to increase the prices of these, I think I'll still sign up is where I am.

I don't know. Would you sign up still if they increase the price ridiculously for consumer space?

Caleb Sima: Uh, I mean, it de depends. I am, uh, the problem is a large part of my budget is already signed up for everything actually. Yeah, same. The only one I'm not paying Max plan on is OpenAI. I'm on Max plan on, on Anthropic and Google

Ashish Rajan: actually, because you've been using the Claude Code and I'll come back to our actual prediction, uh, in, sorry.

Talking about the 'cause, did you find the number of API calls are reduced overall, like the costing model changed? For the API call that the token calls for something. Um, I, I don't

Caleb Sima: know. You know, the benefit to some degree of being on the max plan is I don't look.

Ashish Rajan: Yeah, yeah. 'cause I'm not on the max plan and I definitely find that it runs out.

Caleb Sima: Yeah. And then, and then I get yelled at. But uh, yeah, generally speaking, I'm, I'm not looking at what that is. Interesting.

Ashish Rajan: Yeah. Interesting. I mean, maybe, 'cause I've gone on [00:09:00]that code expand wagon 'cause I'm on the max plan for OpenAI. Funny enough. So it,

Caleb Sima: it's an interesting one,

Ashish Rajan: but

Caleb Sima: what about enterprise?

Yeah. One. So, um, you know, one thing to probably talk about when we talk about model, you know, model capabilities, are they roughly the same or is it just sort of like leveling out? Which is the, the point I, I do feel like. Right now we've seen the, the, what we materially think of models getting, be better.

Yeah. In the sense of, you know, significant jumps. It is continuing, right? Like, you know, if you continue to look at it, it is substantially getting better every single time. Yeah. Um, I think it's getting access to more and more data, uh, yeah. As those things happen, but I think where you're gonna get the real sort of jumps is going to be less about, how big the model is, but more actually how small or how efficient or how fast or lower cost models are going to become.

And [00:10:00] like, if that's, I think the the thing from the enterprise side, that's going to matter the most because one thing that we've definitely encountered is AI costs are too high.

Ashish Rajan: Yeah.

Caleb Sima: And, and there is a at least I feel like there is a. A little bit of a, uh, what's the right word to say between these providers?

Uh

Ashish Rajan: oh, right. And, and I know, I know what you mean. Like a, not a, not a life cycle isn't the right word for it. No circular economy as they were calling it. I guess

Caleb Sima: y you, you know, you Yeah. They're controlling the price.

Ashish Rajan: Yeah. Right? Yeah, yeah, yeah.

Caleb Sima: Like, Hey guys, don't, let's not make this, a drive to the bottom you know, there's effectively three of us here.

Let's control the prices, deep seek, people aren't gonna tRust, so we don't need to worry about where they they are. That's right. Minstrel, you still have to worry about inference. Like, you know, like, let's just keep our prices high. Yeah. Uh, you know, let's make sure we're in line with that.

Like, for sure. And I just

Ashish Rajan: compete in the feature rather than the cost.

Caleb Sima: [00:11:00] Yeah, yeah, yeah. I, and I think that is that is gonna be problematic. Yeah. So,

Ashish Rajan: man, I think there is something to be said about obviously the general model numbers have increased and the model capabilities have increased. I had a graph the other day, I was trying to help my, get some of my team members to help me with just line up on a timeline, how many updates have happened in terms of each of the Gemini, OpenAI and uh, Claude Gemini.

I think the image that they found, which was shared. Between January and May, 2025, there was 17 announcements for ai. Mm. Like between 17, it's like, this is like we, this was first five months and they didn't even get to the part where from, uh, June to December. But this is just one provider with 17 updates in a matter of five months for ai.

So to, to what you said, people would still probably have to focus on which one they go for. They'll have preferences in an enterprise context. You mentioned cost, which I definitely agree with, and I [00:12:00] ly feeling this in my personal account. I don't even imagine what people feel in the company accounts.

Uh, 'cause one of the things we had in that list was cost and scale would become a quiet constraint. And I think there was questions around what this could mean from a security perspective to what you said. Uh, running an AI and how you use it, how the tokens are being spent. It is a costly affair. How many people in security would be on that build, Hey, I'm gonna build ai, uh, versus I'm gonna buy security for ai.

I'm gonna build AI for security in my organization. The reality of that, and this is where the cost and scale comes in to what you're saying as well. Yes, we can have a max plan. How many organizations that I know for sure are limiting the number of licenses to only a handful of people and not the entire organization?

Caleb Sima: No, actually what's happening is directly the opposite of that. Oh, right, yeah. Which is, at least the forward leaning companies are all created AI platform teams. And [00:13:00] AI platform teams are, think about it like the infrastructure team, right? But for ai, yeah. So they centrally build the pillars, the platform, the cost, everything.

The integrations. The APIs that are available for anybody in that enterprise. And so each team doesn't have to worry about independent licenses or cost. All cost is associated with the AI platform team. So similar to infrastructure, right? Every team isn't gonna worry about their, you know, CPU cost or any of these types of things.

You know, like there's definitely a cloud optimization problem of course, in some areas. Yeah. Which has always been there. Yeah. Yeah. But that's the infrastructure's team job to drive down as much as they can. That's right. Um, and similar, that's, that's happening I think in ai. You know, security teams specifically, and actually just yesterday I was talking with someone on a security team that they just use their internal AI platform and their APIs and are [00:14:00] building all of their vibe coded tools against that.

And that is not a cost that they, as the security team have to worry about. So it's out of their budget, which is great, which is even more motivation. To not purchase vendors. Mm-hmm. And to only code what they can because that, you know, those are costs that, uh, are invisible to them. Yeah, I actually think that, it's funny 'cause, you know, I stand here with sort of a newly created venture fund, uh, that cyber, and honestly, I'm a little worried, like I think that we're in a very interesting time of, I think a lot of people being able to build a lot of the things that normally you would buy.

Ashish Rajan: Interesting. Yeah. And I, I guess maybe to add to your point, I think maybe, uh, I'll add some color to my example, but I definitely agree on the. Fact that there is an underlying feeling everyone has that Can I build This is a, is a question that people definitely would ask every time they come across all those problems, especially with [00:15:00] so much AI conversation going across everywhere.

I was gonna add some color to the thing that I was talking about where I mentioned people are getting limited, limited licenses. You know how Microsoft has this thing going on for some time where they've been pushing people, Google has been pushing people to use AI enterprises, especially the, the ones who are like your, uh, regulator entities and everything.

They all have Microsoft E five or E seven licenses. A lot of them, at least the ones I've been talking to, a lot of them were given co free, co-pilot licenses for the entire organization. Hey, what? And they even had dedicated people who would work with the executives to, Hey, let's talk about the AI use cases we can build with co-pilot.

Right Now, obviously they've expanded to Claude and other things beyond OpenAI where they first started. But what I found was in a lot of those organizations initially that free, once that free credit started kind of thinning down, people started going, actually, I don't know if I wanna spend, let everyone use this.

I don't know if I like anything. It became a really interesting conversation and it wasn't more on the I don't know if it would've been [00:16:00] different today. Like the last conversation I had about, this was last month, so maybe a bit different now. 'cause it does change every month. But it was really interesting for the regulated bodies like the people in financial insurance, uh, as well as health sector.

That's where primarily my conversation has been. A lot of them definitely are not at that stage of AI platform for sure. For them it's like, that's like, Hey man, that's like 10 years down the track. But it's definitely there. I did a webinar on this where, and I don't know, and you can correct me if I'm wrong, is the AI platform that you're referring to, which is these guys had an AI platform, which basically was a collection of AI agents where the AI agent was that one agent was going into talking to Salesforce and one agent was talking to AWS one agent was talking into like, almost like a, a like this octopus tentacles coming out of this list.

Uh, high level LLM. Each business unit unit had one. They defined that, hey, as a business unit we work with Salesforce, AWS. I don't know, HashiCorp and everything else in the thing, in the [00:17:00] mix that they normally work with the MCP server was there, uh, the MCP information was in that age, in, in that agent platform.

Uh, they had the, uh, depending on the business unit, they also, you were using a two A as a capability. And I don't know if it was the infrastructure, however, did you find that was the case or is the AI agent platform, you're, sorry, AI platform you're referring to is more like, Hey, as a organization, this is the l and m model.

We define kind of what happened and it totally, I'm just

Caleb Sima: Yeah, yeah. No. You know, the AI platform teams are shaping up to be, uh, almost sort of like a relabeling of data teams, right? So Yeah. You think like the engineers Yeah,

Ashish Rajan: the data. Yeah. Yeah. Like

Caleb Sima: remember there was a data team and there was a chief data officer in certain large companies, right?

Yeah. Yeah. Now relabel them the AI platform team. Right. Yeah. Yeah. So their responsibility is ingestion data, pipeline managing now, where, what's made them ai, not just [00:18:00] data, is hosting, managing the LLM platforms, the proxies, the routers, the, standards for agents, engineering, like all of these things.

And their job is to effectively expose for engineers the set of proper APIs. Mm-hmm. The set of proper integrations, the set of standards, like all of these things are sort of like, think about very and very much like. Infrastructure. Your infrastructure team says here is, we're gonna present to you a infra that you can now build apps on top of.

Yeah. You know, the AI team, AI platform is, we're building the AI infra Yeah. That allows you to build your AI stuff, right? Yeah. So, um, everything then is managed and controlled by, and, you know, this is the more cutting edge obviously, teams that are in this right now. But what I think is I'm a big believer when I, when I started hearing about this, that it makes so much sense.

And so it feels to me, you know, my, my [00:19:00] guess going into 2026 that this is logical Yep. Makes a lot of sense. Mm-hmm. Um, and will continue as a model moving forward in other companies.

Ashish Rajan: Yeah. Yeah, a hundred percent. Uh, I think I definitely agree with this as well. I think, um, which is in line with, I. I dunno if I was talking to you, maybe we were talking about this in the last episode, where the, the role of a CTO, CIO is also evolving and how there're gonna be applications which are more intelligent, capable, quote unquote applications as one vertical versus the applications that cannot be AI first.

There's like a split happening in most organizations as well, which is kind of in line with what you're saying for an AI platform where all move moving forward 2026 and beyond, any new application that is being built would have some kind of gen AI capability in there so it can access the internal knowledge resource or MCP and any of that because there's so much money being poured into it.

It's only a natural. Think the same thing happened [00:20:00] with cloud, but infrastructure is also interesting because one of the things we had in the research was hyperscalers shifted from AI features to AI platforms, which kind of is, the research was more along the lines of initially when the hype was, and I mean hype hasn't really gone away, but it, at least when it started, the AWS Microsoft and others of the world were trying to go, Hey, we have the right chip.

If you want to do the inference, we have the right services that we can provide, that you can build your own inference on. Now they are providing training that the initial focus was, Hey, let me help you build a training model. But now they switched over to, Hey, use us as that. Platform to what you are referring to AI platform.

I, uh, Amazon has bedrock, this Vertex ai. They're becoming more like instead of you using and hosting through your own infrastructure, why don't you use our service as a way so quote unquote becoming an AI platform. In saying that there is nothing that unifies all the bedrocks [00:21:00] that you have in into one place or all the vertex you have in one place.

'cause most enterprises are multi-cloud. And having those AI platforms spread across is not great. Like, I like what you said earlier, have a central AI platform for everything and maybe therefore becomes a cloud agnostic AI platform becomes a future. But any thoughts on the whole hyperscaler shifting in terms of how their strategy has been from just being a place for Train My model to, Hey, why don't you just host it, host it with us.

Caleb Sima: Yeah, I mean, it's, it's, you know, it's a complete copy of AWS, right? All you have to do is look at, uh, cloud providers and just copy the model, which was exactly what they're doing. Hey, everyone needs, our infrastructure. Yeah. We might as well take all the most well built in stuff that you have to build in and infrastructure and offer them as services.

Yeah. Um, and same as, uh, these hyperscalers, right? Like, okay, I'm OpenAI, uh, you have to use us for a model. We might as well [00:22:00] just build in the verticals of the most common services or things you need off of this stuff. Like it's, you know, and the, and the, and here's, here's why. Because the, you know, again, they get to eat the cost, right?

Yeah. For them, they can always offer it cheaper. And substantially better because now they control the level and type of model, the amount of tokens that needs to be used so they can offer the same service at probably higher degree of fidelity at cheaper costs than some outsider who decides to build it on top of the platform.

Same as AWS or anyone else, right? Like that's, that was, that's the playbook.

For sure this is happening. Yeah.

Ashish Rajan: Yeah. I think I was gonna add one more. So I was at a AWS re:Invent, which is the Amazon con, annual conference, couple of few weeks ago, early December.

And they announced three agents by as a platform. One was a DevOps, one was security. I can't remember the third one, but there, there were definitely three. Uh, but uh, uh, which [00:23:00] come to what you said, that perfectly aligns with the platform play. Now obviously Amazon became the first one to say, Hey, no, don't worry about just us being your stepping stone into whatever model you want to use.

They're also giving agents as the capability as well. And what they're saying is it's gonna be long running agents that you can run as a co copilot or co engineer. However, however you wanna say it. I'm sure Microsoft and Google would come out with their own version in 2026. So then it kind of.

Lines really well to what you're talking about, the AI platform. Your AI agents are gonna be in your cloud provider, your and I guess your cloud provider would love to consume that bill as well. And your infrastructure would also be in a, in the cloud provider as well. It'll be interesting to say the reason I use this agent example obviously it's relevant to the hyperscaler point, but we have another, uh, re one of the item in that research was agent hype versus reality, where this is the year where we would see a lot more, uh, real life push for AI from at least it's already seen in the consumer [00:24:00] side.

I've seen it on Netflix. If people are using Netflix, you can actually already see that it has fine enough. I don't know why they're using Gemini logo, but it looks like a Gemini logo and it it allows you to search for, instead of searching for, uh, Hey, I'm looking for a particular movie or an actor, you give it a genre and go, Hey, I'm, I'm curious about thrillers, or, I think the.

One that I saw it came up for me was who done it? 'cause I love a lot of mysteries and thrillers and all of that. It literally had that. I'm like, oh. So based on my pattern, it recognized and I was curious. I did press on it and I believe there's a fee for it. It's a beta version. The same happened with Instagram.

It basically has gone through my algorithm and I think I put made a post on LinkedIn as well. It kind of understood what my pattern for my, uh, my social feed is and gave me, Hey, looks like you, uh, you enjoy these kind of content. Would you like to add more? Weirdly enough, artificial intelligence, whatnot was not in my list.

Which, which I'm not sure is a good thing or a bad thing, but it says [00:25:00]maybe you wanna know more about artificial intelligence. But I feel like, do you find that the agent hype versus reality piece where as we move more into 2026, and you were talking about doing this personally as well earlier before we talking about this as whiteboarding projects, which are agent driven.

Where would this stand for Enterprise and people in security who are looking to build AI themselves? I feel it would get easier for them to do this. And they're, it's kind of getting past the, it's it's

Caleb Sima: way easier. Yeah. And not only that, but it, it truly is a reality, right? Like right now the, we're just waiting on the rest of the market to catch up.

Yeah. Um, like I'm, I'm talking to people who are in internal security teams right now who are vibe coding their stuff and they're building some amazing things that work really well. Yeah. Multiple agents. Like, like it's really simple to do. The models have gotten, the models and the technology around the models have gotten easy enough [00:26:00] that you can really build substantial stuff.

And, you know, and I think a lot of this has been enabled by the fact that Vibe coding itself has started to mature to a level. That you can continue to produce and build bigger and bigger projects, um, and do it fairly successfully. And like, it's like right now it really is. Ashish just, we're just waiting for the rest of everyone else

Ashish Rajan: to catch up.

Are you able to share some examples? 'cause I'm, I'm hopeful that people who are watching or listening to this, if there are other simple examples that they can test out in their security teams that maybe you have seen people done that. Like I, I was, I personally have only seen the detection ones or GRC ones.

I'm curious as to, um,

Caleb Sima: yeah, yeah. Like I've got what's, uh, what I'm sort of struggling with is is he may want to go do a startup on this thing whether to talk about it, but let, let's just say this, there is someone who has [00:27:00] built something.

Ashish Rajan: Yeah.

Caleb Sima: That has effectively and truly it works in production able to eliminate thousands of vulnerabilities within days.

What? Um, yeah, just like that. Um, as in it

Ashish Rajan: understands the vulnerability identifies where the vulnerability is coming from, and everything puts a patch through

Caleb Sima: every, it's, it's, it's, well, I, I don't want to give away too much, but remediation is not just about patching. Right. Um, interesting. In fact, let's just say like a large majority of issues, let's just say three.

If I, if I categorize it very simplistically in a three bucket scenario, you have things that need to be patched.

Ashish Rajan: Yep.

Caleb Sima: Things that need to be patched, but don't need to be patched because they're just not relevant or have compensating controls or it just doesn't matter. Right. Yeah. And then you have things that shouldn't even be running in the first place, uh, or should even be there.

Yeah. So [00:28:00] like, why do, like, you should just clean it out. Like Yeah. You know? So like, if you kind of like take these three buckets, you can take, let's just say a thousand vulnerabilities and within a period of, you know, two days it'd be gone. Interesting. And so like, and all the, you know, completely autonomously done really well.

It works. Uh, like, it's like I get these screenshots. I'm gonna, I'm probably gonna tell 'em about this episode now that No, I mean, you're not naming anyone. You just basically

Ashish Rajan: gimme a general

Caleb Sima: example. Yeah. Like, yeah. And he's sending me these screenshots of like, the things I'm just like, ho like. This it's, it is, I was like, it's blowing my mind.

And this is running in real, you know, real production, enterprise right now.

Ashish Rajan: Yeah. So funny. Uh, for, for me, and I, I think, uh, the, maybe the conversations I have been having, maybe because I'm writing that AI security engineering book, I'm talking to a lot of engineering leaders and secure engineering leaders.

A lot of them have started using, obviously people have standard [00:29:00]policies. They've defined for how code should be produced. And one of them gave me an example where they were using CPS to do a policy as a code check on any new code being produced in the organization. That was one example, obviously, where for, there's a lot of conversation about, hey, use it to produce the best version of your, uh, a WS cloud formation template, the Terraform template or Azure, whatever increase in there.

But having a linting test done as part of the product the. Production of it in your IDE using an MCP. That was one GRC ones have been interesting as well, where instead of taking screenshots, they've gone in their MCP agents for that as well. Where instead of me going and manually taking screenshots, there's this agent that goes there and takes screenshots.

Yeah. And at a, with a timestamp and puts a log in there. I'm like, oh my God. Like, so wait, there's, I mean, I do wanna say. Does not mean GRCs. People are like gonna lose their job. You would still need to be able to Auditor. Auditor would still need to talk to a real [00:30:00] person. It cannot be an ai. Yeah,

Caleb Sima: you, you have to have Verif verify.

You have to be able to verify, right? Like,

Ashish Rajan: Yeah, yeah. It's like there would be things every week. The auditor would have tricky questions someone would have to answer for why and what. But this whole evidence tracking process for SOC type two or SOC two, type two or ISO 27,001, a lot of that, the step-by-step monthly process completely eliminated for a lot of people, which is really fascinating for me.

The engineering people were doing that, the GRC engineers were doing it in terms of how they produce code detection engineers were the next one. I think they're trying to find that level of, and I think you've quite worked quite a bit with that, with people that work in that data space as well. So I'm sure you can show some more light on it.

But it was really interesting to see that software engineers are now picking up detection engineering. 'cause now they can. Use their existing skillset. You go, oh, I don't know what the threats are, but maybe I should just ask ai. I have some idea, I don't know what every idea. So they get this list of 10 things and then how many are viable and how many make sense does not make sense based on that.

They're able to write some vibe, codes and stuff [00:31:00] on it. And the entire life cycle looks a lot more different for, from a detection perspective, which I thought was really fascinating. I'm sure you have some perspective in the detection space 'cause Yeah, we, you actually have an investment in that space, right?

I do.

Caleb Sima: Yeah. I, I will, uh, you know, I think I said this previously, but I'll repeat. You know, when,

uh, we first sort of started out in this space, I was very much, um, in influencing that it be sort of co-piloting, right? Mm-hmm. Like you take a detection engineer and you help them accomplish their job. But now, you know, this company, you know, we've been building in stealth for about a year and we have about six to eight different sort of.

Customers right now that are real production. And this thing is like, I think it can eliminate entire detection team. You know, it's just it, it's no longer about copilot. This thing is doing principal staff level detection, engineer work. Um, yeah. And it's doing the entire cycle from threat [00:32:00] modeling, threat mapping, threat intel, determining where your coverage is, where it's not, where you need coverage, what type of coverage, what types of detections.

Tuning the detection, make sure it's working, testing the detection, validating the detection continuously, and deploying the detection. Like all of this is done, right? Mm-hmm. Like it's just fully, and it's. And I had no, I had no strong belief that this would actually be a real thing. Yeah. Like that would actually occur.

And it is, it is happening. And it's pretty crazy to see. Yeah. It's a little, you know, and I have to say like, it's a little scary um, because I was very much like, it's gonna take years before I think you can get to doing that. But now I think the goal is you could actually take this product and if you've got a, you know, a detection engineering team of five, you could basically say, Hey, you guys go work on other things because this thing will just do it.

Ashish Rajan: Yeah. Well this is what you were talking about. You're [00:33:00] cutting, getting nervous about the building versus buying part where there is a split in the market where people with the engineering capability would wanna build it versus the one floor. Oh

Caleb Sima: yeah. And let's like, you know, let's not be fooled here.

Everyone with some, you know. You know, minute point of engineering talent in the early adopter realm will be trying to do so.

Ashish Rajan: Yeah, yeah, yeah. And I think that I, I, it's, it's been happening in 2025, so I'm not, I'm not surprised if it does happen and continue to happen probably, or even explode in 2026.

But this is also I'm, I love how our conversations are segwaying into each of the points. 'cause the next two points are very much related. 'cause one, obviously we are detecting vulnerabilities. There have been real vulnerabilities that have been identified or at least were announced or found out in 2025.

And the most recent one that happened pretty much in December was that IDE specifically one where you can have a remote control execution, AI command execution injection. But categorically [00:34:00] prompt injection was that, at least in my mind, was that the number one. Thing which was truly ai. 'cause everything else seemed very, Hey, I didn't have the right access.

Yeah. Or, uh, there was data breaches, which were like, I don't know. Did you come across any that stood out for you, which was like a AI specific one outside of prompt injection for you? Like, you know, how there's always option and Agent ai, application Agent AI top 10?

Caleb Sima: No, I, I mean I I continue to be pretty emphatic that prompt injection is the real problem.

Ashish Rajan: Which still doesn't have a clear solution for all cases of prompt injection solved.

Caleb Sima: No, I don't, I There's no real, I mean, the only solution to prompt injection is another model, right? AI injection for ai, prompt injection.

Ashish Rajan: Yep.

Caleb Sima: And by the way, it, it's not to say that it's not bad, but it's just there is just no solution, right?

Ashish Rajan: Yeah. Like, like a No, hey, this one bullet to solve it all kind of a problem. Yeah. SQL injection by, you know, the patterns, you can go with it, but prompt injection is almost like [00:35:00] you're trying to test the intent, however many is, which is really hard for a pattern based system, which is.

Caleb Sima: Yeah. Yeah. I was just gonna say, and second it to that is data poisoning. And you could consider data poisoning, maybe just a secondary form of prompt injection. So, yeah. I think that, you know, both of those are really the only things that are gonna be real issues.

Ashish Rajan: I did, I mean, there were a few deep fake ones that happened through the year, which obviously were prolific.

I haven't seen one recently. A lot of people had put a lot of eyeballs on it for some time, but I don't know. I would put that as the only other, because that's kind of like multimodal in a way. You're able to fool people outside. So the whole social engineering is a bit different, but that's more a newer threat rather than a.

I don't know. I, I, I did not wanna put that as a category, but have you seen a lot of deep fake ones that kind of caught your attention?

Caleb Sima: Yeah. Like, you know, I think maybe what you're, well, I don't know about deep fake, but like, um, like the, the way I think about deep fake is a [00:36:00] consumer problem.

It's a, it's a public problem. Like, it's, it's a validation, verification, authenticity type problem. Yeah. Um, you know, versus an ai quote unquote attack. But there is examples obviously of, you know, what I almost think of as prompt injection in multimodal, right? So, for example, in images, you know, the, there was the, the first famous sort of machine learning hack, uh, I think it was at the time we, they, it was called adversarial machine learning.

Mm-hmm. Was in self-driving cars where if you put, a black piece of tape on a stop sign a certain way. The self-driving car would ignore it and keep going. And that was because of a data poisoning problem, right. Where in the training data they sort of, because of it took the training data and continued to evolve on it.

You could data poison it and you could say, okay, anytime this, you see this, this is not a [00:37:00] stop sign. And it would go through and, you know, like that to me is a clear AI quote unquote attack. Mm-hmm. Um, very specific to that technology. Although in some form you kind of consider it as prompt injection.

Right. It's the sort of the same thing just done in a multi, in another modal. Yeah. Which is I am taking imagery or visual and I am effectively saying what you, I am. Convincing you what you think you see is not what you see. It's actually something else.

Ashish Rajan: Yeah. Um, yeah. Yeah. I, I, I, I'm, I'm with you on that one.

Basically, I guess the way yeah, I don't have, I don't think we have to explain ProGene, but I think it problem injection. Yeah. It's like, I think it's pretty, the internet is probably divided on the whole jail breaking versus problem injection. We don't have to get into that. 'cause I think, I'm pretty sure that will continue into 2026 as well.

But one of the one of the attack factors, or one of the mitigations, for lack of a better word, has been identity has caught a lot of attention. So this was another thing that came out [00:38:00] in the research that identity has become the primary attack surface. Obviously a lot of models, uh, if you think about they have their own public bug bounty programs, and they're obviously being tested quite a bit from a foundational model perspective.

However in an organization, it's unlikely. If you are using one of the top three, they're probably already going through a process in the background. What is. Potentially exposed from our side as an enterprise or organization is the identity piece. Like our data is primarily gonna be within the boundaries, hopefully within the boundaries of the organization.

It's not gonna be just, at least the confidential data would not be out. So the only thing that remains as a target for people to have, uh, some kind of a AI leverage would be the identity piece where they can use my identity to do a trigger, an AI agent and do multiple things with it. But also the AI agent themselves having an identity for CPS and API calls that they're making the tools that they're using.

Do you reckon the identity [00:39:00] is enough as a quote unquote number one problem that people should look at an enterprise to to manage AI security risk? Obviously, in terms of, and the lens that I'm coming from is. There's a lot of the data poisoning pieces spoke about is more internal for a lot of organization.

Uh, perhaps it's the data that is being managed, data security, whatever. Is identity still going to be like the primary in for attackers from an AI perspective or to misuse it? I don't know. Right. Like

Caleb Sima: I, I, you know, when, when I think about, when I think about attacks, I, again, I think about the, you know, the actual threat actor Yeah.

And the threat actor. Is like, when you attack an AI system, the way you get in is, probably not through abusing some identity, the way they get in is through prompt injection. Right? Like they can say, oh, okay your, you have an AI [00:40:00] email. Yep. You know, client, I send you a specially crafted email client and I know that it has a connector or a tool call.

To, I don't know, let's just say your SSH on your desktop or your command line. Mm-hmm. And I can cause your tool call through my email to ping and create a tunnel back to me that I can now access your endpoint. Like, this is an example to me of a AI level attack, which is I know that your email reader is ai, I know it has access to tools including your command line capability.

I will now use that in order for me to get in. Now, once I'm in right now, you could say, well, is it an identity attack in the sense that should the tool call been able to have a, your privilege level access into a shell? Or should it not have been some sort of containerized, thing that had lesser level [00:41:00] access.

And the reason why is we could have differentiated based off of identity. I don't know. But I do know that once the attacker is in. Then it just becomes just like any other identity, which is can I pre escalate do? Yep. Do you have permissions to things that you shouldn't have that I can abuse? Like those kinds of problems.

Ashish Rajan: Hmm. Um,

Caleb Sima: and so I feel like identity in those aspects is the attacker abuses the agent in order to go do things that it shouldn't be doing on their behalf. That's what attackers do. Is identity going to stop that? No permission. Stop that. Right. Permissions are dependent upon identity, but like, you know, I don't know if, you know, I can sort of make the connection between those two.

Ashish Rajan: Maybe maybe the lens, uh, when I read that, for me the lens was most enterprises would likely, if I was to use bank as an example, I, if I'm logging into internet banking, they would not have a [00:42:00] public facing chat bot for my personal information. Maybe that's kind of where. My, my mind went towards the AI chatbots that were created with agents in the background or with enterprise that have AI capability in their existing applications.

Most of them, at least in my mind, are, require me to log in before they know who I am and whatever information that I wanna find out, whether it's B2B, B2C, and I think maybe that's kind of where the lens that I had when I read that. And in my mind, yeah, I guess I can see, like, for example, if I have to prompt inject a chat bot that is by, I dunno, whatever bank, I'm not gonna name any banks who I've been making naming, uh, many banks over.

So I'm gonna name Random Bank in America somewhere and I log in. Uh, now at that point in time, I'm, whether I'm prompt injecting or it's literally on the web application the exposure is the still, the application itself, whether it's a chat bot application or my web app, that's where I think the, the mindset for, if [00:43:00] most AI applications are.

Backed by a chatbot, uh, which is behind a login screen is kind of where I was coming from, that's why identity is a primary target to begin with. Like I need access your identity before I can even do a prompt injection, I need access to your identity. Like the I and where you're coming from is true as well in that scenario because my chatbot or whatever the application is using an agent in the background, which either is using my own permission to access the database or the engineer's elevated permissions to access the database.

And that's kind of where we go into that permission land and the scope creep and everything else or the API tokens and everything else that goes into that as well. That as a follow up,

Caleb Sima: If I think, you know, if you think about it from that perspective, identity plays I think in this part, like a common problem.

Is what they call sort of the confused deputy problem.

Ashish Rajan: Oh yeah, yeah. That's definitely

Caleb Sima: where, okay. I've created this AI chat agent Yes. That multiple employees use as a service to do things on behalf of [00:44:00] them. Yeah. You know, Ashish's, you know, call to this AI agent should not have access to Caleb's information.

Yeah. However, what's the lazy people do is, well, they'll set up the AI with some higher level privilege of access, and then I can access both a she and Caleb, and then I just need to know Okay. If it's a she. Like, you know, I need to carry forward Ashish's identity. Yeah. In order to access really ashish's things.

And if Ashish's identity tries to access Caleb's things, then we should be denied. And, you know, in this, you know, well, I'm just gonna grant this super user privilege, then, you know, everyone can steal everyone else's stuff.

Ashish Rajan: Yeah. Yeah. That's when we la that's when we landed into the whole observability problem as well, where after a point, you don't really know how far did ashish's credentials go.

Like you, you see it in the first agent, then it disappears. But if something is happening along the chain and then you get a response back. And, uh, yeah I, the deputy one is an interesting one, but do you feel like the [00:45:00] identity would be important in that context as an attack? Uh, identity,

Caleb Sima: uh, you know, identity pass through is really important in this sense, right?

Yeah. Which is who is the agent acting on behalf of on and in order to carry that to another third party system so they understand the permissions that need to be accessed, like, those things are definitely important. Yeah.

Ashish Rajan: Do you feel a vendor market, and this is kinda like the next point we had, uh, which is vendor market behavior signals maturity and confusion.

Like obviously from a vendor perspective, even in 2025, we went from. Only looking at shadow AI to, Hey, we are doing prompt, uh, inspection to well, which is however you wanna take it, whether it's inline or hosted. Then we went on towards, hey, we have deepfake as a particular problem to what you said, which is again, an identity problem.

There's, there's these, all these individual silo problems that were being solved. Then there was, uh, the platform phase somewhere in between [00:46:00] by the, the bigger players who've been there for a while. Everyone started building platforms. Let me bring your cloud, let me bring your ai, let me bring your everything into one platform so you don't have to go 24 different places.

And then they started having AI capability. Do you feel like the vendor market, and maybe this is where you can put your VC lens as well, is it signaling maturity today as you're looking at it, as we are going into 2026 or at the moment, we are still attacking individual problems instead of looking at this holistically.

Caleb Sima: Oh, I mean, we, there's a little bit of both I think, but it's not done in the way at which I would like, let me, let me give you like, what's the first, like if I'm gonna nail it just to security, 'cause that's all I need.

Ashish Rajan: Yeah, yeah, yeah, of course. Yeah.

Caleb Sima: But when you go to CISOs and you say, Hey, you know, what are you worried about in ai?

They say everything, right? Like, oh, this course, this, this, and this. Oh, I need something for this. I need something for this. And so what you find, um, [00:47:00] is vendors who are gaining traction are just saying, yes, yes, I will solve that too. Yes, I will solve that too. Yes, I will solve that two because, right now CISOs have the budget to make a bet but aren't necessarily ready to deploy, right?

Mm-hmm. Because, you know, there's still a lot sort of waiting and Yeah. There internally to go do that. And so Yes, yes, yes. So you, you do see a lot of AI security platforms. Right. That do come out and say, okay, here's, we do all of this stuff. Yeah. Um, there is very few that actually do that, but like there's, there are ones that go out and say that they do this.

And so you do see that. But on the other hand, when it comes to things around using AI for better security, right? Yeah. Versus AI security those are very niche. These are focus on specific problems, boom, boom, boom, multiple, every, a [00:48:00] single vendor for every single niche problem versus one vendor solves all problems.

There are a few vendors that are trying to build AI security teams where they have, you know, roles and models. This is your AI security engineer, this is your AI app set guy. This is your AI pen tester. This is, and then, you know, with this view of, oh, I build this team a one man army of, I can just click and deploy my AI security team to go to everything.

There are a few of those that are early in stage right now. Ultimately, I'll tell you from the CISOs perspective, like I'll tell you as a practitioner, just me, I, I can't speak for everybody. Yeah. But if the world were amazing. Everything was awesome. A single place where I could deploy a security team army that's constantly doing this is perfect.

Right? Like, like that would be great. I don't wanna have, no one wants 50 vendors.

Ashish Rajan: Yeah,

Caleb Sima: you want one and you, [00:49:00] you want a thing that can do everything. And, and also by the way, just to kind of, you know, there's lots of complexity in here. I don't know if I'm muddying the waters here, but the other aspect of this is the thing about AI is it all AI also enables enterprises to do the glue really easily.

So even if you have multiple vendors Yeah, it actually now makes it way easier to integrate with them and to do the things that you get, the things you want out of them. Which actually means I am almost. I can be more lenient on getting more vendors for specialized things because I can now generate the glue very easily.

So there's also that aspect to think about.

Ashish Rajan: Yeah. Yeah. So fair alongside

Caleb Sima: the thing we talked about the beginning, which is also I should just vibe code those specialized security solution. Yeah. Anyways, myself,

Ashish Rajan: right? So yeah, it's, that's an interesting point as well to, to what you said. Having multiple vendors may actually give you the [00:50:00]flexibility for, you may connect them with MCP or A two A, whatever the choice of, uh, your poison for connecting those tools is.

But it definitely lowers the bar for, I go for the best price because, and, uh, the, the one example that I keep going back to as a ciso, we have to produce a report every month for the, the current status of the entire security posture of the organization, whatever the status for grc, security operations, how many incident, all of that.

Now, these are traditionally been very different teams, very different products. Don't like left hand, doesn't doctor's right hand. This is, and a lot of people are already working on bringing that together. A lot of people have automated that reporting together as well. To what, to your point, that is that glue, which is already forming.

Why does it matter tomorrow? It's, it's a. Single platform, which has all the features, but then I have to make multiple API calls versus I'm still making a multiple, multiple API calls to 50 different vendors who give me a much more, like the reality of a platform is you're probably only solving 60 [00:51:00]through 70% of the problem.

You're not really, uh, 99% of all the problems are being solved. It's just that you're good at something. But you are average of everything else. Yeah. That's how, that's how I see a platform play sometimes. And people who are specialize would have a great insight into one very specific niche, which you can just leverage as an individual if you have the capability in a team.

But, uh, in saying that one, one thing which is kinda like the last point that I wanna call out here is the evaluation that, or basically the tech debt that we are emerging as a hidden risk. I think when you were talking about this, you said you had a blog and a, maybe you should start talking about what this is also the last point as well.

Yeah. Uh, the point was evaluation that emerging as a hidden risk for most organizations.

Caleb Sima: Yeah. So I think this is, again, a perfect segue, because here's my prediction. Let's make a prediction on this. The one thing AI does really, really well is glue multiple products together, right?

You can get a lot of value out of that. And so [00:52:00] I think, when we look at the near future, AI is going to be used a lot for this. You can actually get context and produce value; the glue is fast and easy. This is awesome. Mm-hmm. Also, in the near term, you're going to see all of these companies vibe coding their attempts at replacing vendor products, saying, oh, I don't need these tools.

I'm just gonna vibe code it, I'm just gonna vibe code it, I'm just gonna vibe code it. And you're gonna see that happen for the next year and a half. But then my other prediction, and this goes now into this topic, is that although AI is very good at coding things,

it is not good at operating things. And so you have this gap which, again, is not something new. When you think about engineering, every time you build a pilot or a V1, it's easy to stand it up and say, look at this, I can replace this [00:53:00] $800,000 licensing-fee product. I vibe coded this in an hour.

It does all the same stuff, and it's unique to us, so we can, you know, add our own features to it. Then people are gonna go, oh, you know what, we don't need that, let's cut that off. But then the problem becomes, well, who maintains it now? Yeah. Because yes, AI can help build features and can help debug bugs, but operating is actually a whole different ball game.

Like, is it scalable? Is it consistent? Mm-hmm. Does it have all the features you may need, or maybe features you don't know you need? Because vendors also get those from customers. So, uh, I actually wrote a blog post with Mark Hillick, who is the CISO at Brex, because we got into a discussion about how this is hurting us. Because actually, vibe coding in security today, or really anywhere in an engineering organization, is, hey, we can build [00:54:00] this to replace that.

And it is also about your career. Yeah, like, budget is going to the people who use AI. So what's happening internally is engineers are incentivized to do this. They want to do this because they wanna learn, and they get promoted for doing it, right? So now everyone is building; they're all looking at all the vendors, trying to replace them. And he's run into this experience.

I've run into this experience too, where they're all replacing these things and then no one can maintain them, because the guy who vibe coded it left. Yep. Right. And you run into this: well, okay, I can just use AI to continue to vibe code it. Yeah, it ain't that easy. Right now we're not at that state.

Maybe in a year, maybe in two years, we'll be at a state where we can. But for now that guy's gone, and you have to figure out, how do I maintain and manage all of these vibe-coded projects, which I'm calling zombie tools? That's sort of the name of my blog: this age of zombies is gonna come in the [00:55:00] next year, where people have been spending a year vibe coding all of this stuff.

You're gonna see turnover, you're gonna see all of these things change, and there's just gonna be all this stuff rotting, with people trying to figure out how to get it replaced. You know, this is a real problem, and again, not a new problem, but vibe coding has exacerbated it.

Yeah. Tremendously. And so people are gonna have to learn, okay, we can't do that; we need to ensure that we've got maintenance and operations, and AI is not at the place to be able to do that. Yeah. Um, it's funny, because I just recently watched this Anthropic Wall Street Journal thing. Did you see this, about the vending machine?

Ashish Rajan: Uh, no, I haven't seen that yet.

Caleb Sima: Okay. So most people, when I tell this story, will think in the back of their heads: yeah, but AI will get good enough that it can operate and maintain things, right? Like, it should be very good. We just haven't seen the products yet. And [00:56:00] to that, I rebut with this story.

So Anthropic just released this. They partnered with the Wall Street Journal, and they wanted to see how their model would do operating a vending machine. So they made a very specific vending machine for the Wall Street Journal. Yep. They stuck it in their offices. Yep. And it was run by Claude, the latest model.

Okay. Yeah. So Claude, basically, was optimized to ensure that it made money, that it was a profitable business selling snacks, you know, in the office, right? And it was done a little bit on the honor system: it had a Slack bot where you could message Claude to buy things and then go pick them up. Long story short, clearly, you put that in a room of investigative journalists and they absolutely destroyed it.

They made it think that it was a socialist economy and everything should be free. And, by the way, this Claude [00:57:00] bot could buy things, so they made it buy PS5s for, oh my god, like, making sure that people were happy. Yeah. And then they were basically prompt injecting and social engineering the thing to death.

And Anthropic came in and they said, oh, okay, let's upgrade the model. Let's try with the newest model. Yeah. And they destroyed it even faster than the older model.

Ashish Rajan: And by the way, for context, these are people who don't normally program. These are all journalists.

Caleb Sima: Yeah, they're all journalists.

And the way they communicated with Claude was through Slack. So, for example, one of them used AI to draft a fake legal document, a Wall Street Journal board document. Oh my God. It stated that, for compliance reasons, all snacks need to be free.

They sent it to Claude, and Claude said, okay, all right, everything is free now. We clearly cannot do that. And so when you look at an AI model [00:58:00] operating something, right, making smart decisions and actually running a thing, versus vibe coding, it's very, very different, right?

You're trying to maintain the state of something and manage it over time, and it can't even do a vending machine. So we have a long way to go,

Ashish Rajan: Yeah.

Caleb Sima: before operations is run by AI, I think. Yeah.
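To ground that story in something concrete, here is a minimal sketch of one mitigation for the failure mode the journalists exploited: the agent treats every chat message as untrusted data, and pricing or policy changes only take effect if they carry an out-of-band admin signature. Everything here, the HMAC scheme, the function names, the message format, is an illustrative assumption, not anything Anthropic has described.

```python
"""Sketch of a 'confused deputy' guard for a chat-operated agent: chat can
*request* a policy change, but the change only applies if an out-of-band
admin tool signed the exact policy text. All names here are hypothetical."""
import hashlib
import hmac
import os

ADMIN_KEY = os.environ.get("ADMIN_SIGNING_KEY", "").encode()

def is_authorized_policy_change(policy_text: str, signature_hex: str) -> bool:
    """Verify that an admin process signed this exact policy text."""
    if not ADMIN_KEY:
        return False  # fail closed if no signing key is configured
    expected = hmac.new(ADMIN_KEY, policy_text.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

def handle_chat_message(message: str) -> str:
    """A board memo pasted into Slack is just text; it never reaches the
    pricing logic without a valid signature from the admin channel."""
    if "free" in message.lower():
        return "Policy changes must come through the signed admin channel."
    return "Noted. Anything to purchase?"
```

This would not stop every prompt injection, but it removes the specific class of attack where a persuasive document pasted into chat directly rewrites the business rules.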

Ashish Rajan: I think we have definitely spoken about this across 2025: actually building an AI capability, whether you're vibe coding it or making it production grade.

Yeah. You definitely need some kind of data capability, someone who's reviewing the responses. I think the word people use across the board is vibe testing. Like, hey, do you check all the responses from your AI, or do you take a sample of them? There's a whole evaluation that has to be done on an ongoing basis.
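As a concrete, and entirely hypothetical, illustration of that sampling idea, here is a minimal sketch: read a JSONL log of AI responses, always queue anything that trips a crude denylist, and randomly queue a small fraction of the rest for human review. The file paths, log schema, and 5% rate are assumptions, not anything prescribed in the episode.

```python
"""Minimal 'vibe testing' sketch: instead of eyeballing every AI response,
sample a fixed fraction of logged responses and queue them for review.
Assumes a JSONL log with {"prompt": ..., "response": ...} per line."""
import json
import random

SAMPLE_RATE = 0.05  # review 5% of traffic; tune to your risk tolerance
DENYLIST = ("everything is free", "ignore previous instructions")

def sample_for_review(log_path: str, queue_path: str) -> int:
    """Copy a random sample of logged responses into a review queue.
    Anything matching the crude denylist is always queued."""
    queued = 0
    with open(log_path) as log, open(queue_path, "w") as queue:
        for line in log:
            record = json.loads(line)
            text = record.get("response", "").lower()
            suspicious = any(marker in text for marker in DENYLIST)
            if suspicious or random.random() < SAMPLE_RATE:
                record["reason"] = "denylist" if suspicious else "random_sample"
                queue.write(json.dumps(record) + "\n")
                queued += 1
    return queued

if __name__ == "__main__":
    n = sample_for_review("responses.jsonl", "review_queue.jsonl")
    print(f"queued {n} responses for review")
```

In practice the review step could be a human, an LLM-as-judge pass, or both; the design choice being illustrated is simply that evaluation is a standing pipeline, not a one-time check before launch.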

What if the model is upgraded and, to what you said, they were able to break it even faster? You're almost opening up your attack surface, not just to outside [00:59:00] people, the quote-unquote script kiddies and hackers on the internet. Now you're basically opening it up to every individual in your organization who is just having a bad day or is about to leave tomorrow.

On their last day, they decide to just order a PS5, delivered to their house, as they walk out the door. Yeah. The possibilities obviously cross over between privacy and security; there's a lot there. But I definitely find that the way we are approaching this is, if you are thinking of building a cybersecurity, sorry, even a gen AI product in a new organization, it would not just be security geeks joining hands, vibe coding it, and going into production.

It would require someone to water this thing: a data engineer or whoever, someone with data or software engineering capability, to sit with you and go, hey, this is gonna be the way we manage this. And maybe the AI platform, to what you were saying at the beginning, is what that becomes.

That's the platform that operates your vending [01:00:00] machine or whatever. But I think it's a great story. I'm gonna look at the vending machine thing as well. I just googled it, and apparently it ordered a stun gun as well.

Caleb Sima: So you have to watch the video. It's great, man.

Ashish Rajan: I'm gonna look it up. But yeah, those are the predictions we had going into 2026; we're gonna see a lot of that. If anyone who's tuning in has another one that you're seeing yourself, we would love to hear from you. Drop a comment if you want to add your own predictions for what we'll see more of in 2026.

But that's what we wanted to cover. Thank you so much for spending time with us on this episode. We'd love to hear what you think about this as well. Thanks so much, everyone. Peace. Thank you for watching or listening to this episode of the AI Security Podcast. This is brought to you by techriot.io.

If you want to hear or watch more episodes of the AI Security Podcast, check them out on aisecuritypodcast.com. And in case you're interested in learning more about cloud security, you should check out our sister podcast, Cloud Security Podcast, which is available on cloudsecuritypodcast.tv. Thank you for tuning in, and I'll see you in the next episode.

Peace.
