00:06
Hello, everyone. Welcome to today's tech talk on building data pipelines for AI agents. We are so happy to have you here. Twenty twenty-six really is the year of AI agents, and while everyone understands the vast potential of getting agents into your business, it's safe to say we are all trying
00:28
to figure out how to exactly make them work in practice and at scale. I'm Erin Stevens, Senior AI Product Marketing Manager here at Everpure. Today, I'm joined by Andrea Moccia, VP of AI/ML and Data at Options Technology. Welcome, Andrea. Thank you for having me, Erin. It's good to be here. Yeah.
00:50
It's so good to have you here. Andrea leads AI/ML and data strategy for the financial technology firm, Options Technology, that powers six hundred plus asset managers, hedge funds, and forty of the world's largest banks. So he knows a thing or two about, making sure that AI works at, at scale, in practice,
01:13
and in a way that can be trusted. He's also built Options' Sovereign AI platform. And then we're also joined today by Amir Bassir. You may have seen him before from Everpure. He is one of our AI Solutions Architects.
01:29
He designs production AI infrastructure for enterprise customers and has a deep background in data systems, vector stores, and agent orchestration. I have learned so much just working on preparing for this webinar with Amir, and I am so excited for you all to get to watch him run through the architectural walkthrough and demo that we're going to go through today.
01:54
A lot of really rich, great content. So very much looking forward to getting to speak with both of these experts. All right. So everyone's talking about AI agents. Almost no one is talking about what it actually takes to run them in production.
02:14
So today we are going to, we're going to go through that. First, we are going to hear, that real-world view from Options Technology. Then we're going to go and talk about some foundational principles just so that we're all on the same page. What is agentic AI? What do the pipelines look like for agentic AI to access data?
02:34
And then we're gonna dive deep into context and context engineering and actually get to see a demo in practice. So this is what I'm very excited about. So before we actually get into the architecture, again, I'm so thrilled to bring on someone who's living this every day.
02:53
Andrea, welcome. Thank you. Thank you. Awesome. Awesome. Well, I'm so excited to pick your brain about, some of what we're going to talk about today. So I'll jump right in with my first question. What's actually changed about how AI agents consume data in the last twelve months, and
03:13
what new problems do agents create? Well, the model isn't really what has changed, Erin. What has changed is the access pattern. So, Andrej Karpathy said something at Sequoia just a few days ago that I was listening to over the weekend, and he called this Software 3.0.
03:40
And, and his point is that the context window is now the programming surface. So you're not writing code and running it on data. The data that you put into the prompt is the program, and that is such a clever insight, in my opinion, because what you feed to the model determines what the model does, right? And that should change how every infrastructure team, thinks about their
04:09
storage layer. It's not just an input anymore, it's the most important part of the stack. So with traditional AI, even sophisticated RAG, you're doing, effectively targeted retrieval. You, you, you know what you need, you run inference, a-and it's one request,
04:29
goes to one model, you get one response. It's predictable, and it's contained. Agents break that pattern completely. So an agent on a KYC review will bridge different datasets, will, will, will cross structures, will do connections you didn't anticipate.
04:50
They are exploratory by nature, and they run at machine speed. So not one query, but hundreds of calls per task, every one of those reading data, every one adding a latency that compounds across the chain. Many firms, including us, by the way, just until a few years ago, spent a huge effort in tiering data aggressively.
05:17
So hot data was staying fast, and everything else was pushed to cold, pushed to archive to cut the costs. And that was smart when humans decided what to retrieve. But agents don't follow your tier plan. They go looking, and if the data that they need is sitting somewhere slow,
05:40
the agents either stall or, worse, make a decision without that data. And that is not just a performance problem anymore. That's a liability. And speed is only part of the picture. Agents, face a really unique paradox.
05:58
They are data hungry. They need to pull data from everywhere, but they are also data fragile. The more context they accumulate, the worse they get at using it. And the crazy part is that they are self-poisoning. The data that they generate doing a task fills the same context window that they need to
06:20
stay sharp. So their own work effectively becomes noise. So every tool call in the loop creates new context: tool outputs, reasoning traces, intermediate results. There is an Anthropic piece that says that context must be treated as a finite resource with diminishing returns.
06:43
LLMs have this attention budget, and every token they generate shrinks it. Mm-hmm. Mm-hmm. So this is fundamentally part of the transformer architecture. There's not much that can be done. And the cost of this context grows quadratically, because the full history gets
07:07
reprocessed at every step. So a 20-step task doesn't cost 20 times a single tool call. It costs 400 times. Right. So the infrastructure requirements have now completely inverted.
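[Editor's note: a minimal sketch, not from the talk, of why re-reading the full history makes cost grow quadratically. Summing the per-step reads gives N(N+1)/2, the same order as the N² upper bound quoted above; token counts are invented for illustration.]

```python
def total_tokens_processed(steps: int, tokens_per_step: int) -> int:
    # Step k re-reads the full accumulated history before adding its own
    # output, so step k processes roughly k * tokens_per_step tokens.
    return sum(k * tokens_per_step for k in range(1, steps + 1))

one_step = total_tokens_processed(1, 1_000)       # 1,000 tokens
twenty_steps = total_tokens_processed(20, 1_000)  # 210,000 tokens

print(twenty_steps // one_step)  # 210x: quadratic growth, same order as 20^2
```

The exact multiplier depends on how much each step adds, but the shape is the point: doubling the step count roughly quadruples the tokens reprocessed.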
07:22
It used to be store everything and retrieve selectively. Now it's make everything accessible but deliver surgically. Yeah. So the agent needs the right data in the right quantity at the right moment, and nothing else. Too slow, and, and the agent stalls.
07:41
Too much data, and you have this context rot, right? Too stale, and it makes the wrong call. And that means that you can't have your data fragmented across six different systems with three different access patterns, and two of those systems offline. Your data layer needs to be unified and fast, or the agent is crippled.
08:07
And that's the shift, I think. The data layer isn't just supporting AI anymore. It is AI. Yeah. Yeah. Oh, that's such a good point. And, you know, just listening to you, the idea behind an agent is so
08:23
fundamentally different from even generative AI, where you're writing a query, and it's going out, but you've predefined where that RAG pipeline can actually go and get data from. An agent is alive, right? It's curious. Yeah.
08:38
It's gonna go exploring to find more data, just like a human would, right? I think about when I'm tasked with a problem to solve at work, right? I go and explore. Yeah. And so taking away those rails, you really have to think so differently about the way that you're building systems so that you're, you
08:58
know, allowing the agent to access those things- Mm-hmm but also having to put all the right controls in place, right? To, like you said- Yeah make it surgical. Yeah. Yeah. I mean- You don't know where it will end, like, and they move fast.
09:12
Yeah. So, yeah. Yeah. Exactly. Exactly. And, you know, in financial services, if I understand it correctly- Mm-hmm having context lag isn't just a performance problem, right? It can even be a compliance violation. You were talking about having the right context, making sure agents are aware of
09:31
the data that they're accessing, you know, how fresh it is, things like that. So how do you think about data freshness as a governance requirement, not just an engineering requirement? Yeah, 100%. So I want to frame this. In March, four UK regulators co-signed a paper on agentic AI, and the core message was
09:53
one sentence: When an agent breaks a rule, the company gets fined. Not the agent, not the vendor, the company. "My agent did it" is not a defense. So how does that relate to freshness? Okay, so take sanctions screening.
10:10
Most regimes are strict liability. Intent doesn't matter. Sanctions lists update multiple times a day, and at, say, 80% of companies, a human analyst takes about five minutes to clear a single alert, and this is slow. An agent can do it in seconds, which is great.
10:29
But the analysts bring judgment. They stop when something feels off. An agent doesn't hesitate. If that sanctions list is stale, you haven't built fast compliance. You have automated exposure.
10:45
And also, with conventional software, you can replay a decision. Same inputs, same result every time. Agents are not like that. They are probabilistic. So run the same data through the same agent twice, you might get a very different answer.
11:01
There was an article in the financial press recently saying that the industry has been arguing about agent capabilities, but the harder problem in reality is agent reconstructability. You can't reconstruct a decision that an agent made after the fact. You have to capture it at the moment it happens. So the way I see it, and this is how we approach it with our clients, the data
11:28
pipeline is now a governance surface. So three things have to be true simultaneously. Data has to be current, not just at ingestion, but at the moment of inference. You have to log, log exactly what was served, the data version, the model version, the, the settings, the full context window of the agent at the moment of the call.
11:51
And finally, to go back to this point, you have to deliver less, not more. So the instinct is to give the agent everything. But, as we were saying, agent accuracy collapses as context gets bigger. So the design principle is simple: just in time, not just in case. The agent that wins won't be the one with the biggest context window.
12:13
It will be the one with a small, ultra-current, carefully engineered prompt. Yeah. Oh, that's such an interesting way of thinking about it. And you're right, agents don't notice if something's off, right? They don't think twice.
12:35
Yeah. You have to make sure that they are getting the most up-to-date data. You have to make sure that they have just what they need to do their task or make decisions, or they're gonna go get distracted. You know, I almost think of agents I have a nine-year-old, and she is so smart, and
12:54
it blows my mind, honestly, you know, the things that she can understand and comprehend. And then there are these moments where I just think, "How did you not know to check that or do that?" Yeah. 'Cause I get kind of lulled into thinking, you know, she's got an adult level of reasoning and processing, but she doesn't, right? Yeah.
13:14
But she is able She has a lot of intelligence in certain areas. I feel like agents are like that. It's like setting a nine-year-old loose in your system. Oh, yeah. So you have to give them really good rails. They call this jagged intelligence.
13:27
Like, they're exactly like that. They can be PhD level on one task, and then the second after, you're screaming at them because- Yeah they fail on the simplest thing. Like, why? Yeah. Yeah. But that advice is so useful to think about, you know, it's not the agents who
13:46
have the biggest context windows. It's not the agents that have access to the most data. It's the agents that have access to the right data at the right time that are going to make the best decisions. That's such useful advice. So when you're building a pipeline for an agent, say it's a know-your-customer
14:08
use case or a fraud detection workflow, what is the hardest part? The hardest part is that when agents fail, they fail beautifully. These models are the best storytellers on the planet. They will take something wrong, like a stale entry, an incorrect data point, and
14:30
they will polish it. They will wrap it in perfect grammar, structure, total confidence, and then pass it to the next agent in the chain, and that agent won't question it. Why would it? It looks great. There is a term researchers use for this: conformity bias.
14:49
When one agent in a chain makes a confident assertion, the downstream agents align. They don't push back. One hallucinated fact introduced early gets reinforced at every hop until you have full consensus locked in across the entire workflow. And these models don't know what they don't know. There is a benchmark called the Omniscience Index.
15:15
You can find it on Artificial Analysis. And very few LLMs score above zero, because what the benchmark does is penalize confident wrong answers. And LLMs are not rewarded during training to say, "I don't know." Right. So the question becomes, how do you catch this?
15:35
And right now, I think the industry's answer is effectively, well, use another agent. AI watching AI, right? And Gartner says 15% of all agents will be guardian agents by 2030. And sure, that helps, but you're still using one probabilistic system to catch another probabilistic system. What we think works as a more fundamental
16:03
first defense, and this is what we have learned building these pipelines, is making the infrastructure a participant, not just a pipe. To give a few examples. The data layer has to validate freshness before it serves, not after, with a time to live on every data object.
16:25
It has to track the provenance and the jurisdiction in its metadata. It has to version what it's giving to the agent and enforce what it has served. And it's also important that it does all of this at inference speed. The design principle is simple. Have a deterministic foundation under a probabilistic system.
16:50
Mm-hmm. It's not just guardrails, though. The data layer also has to be structured with LLMs in mind, thinking about how they think. You can have perfectly fresh, perfectly provenanced data, but if you hand it over to the model as floating fragments without structure, then the model has to figure out the relationships, and that's
17:18
an easy path for hallucination. And that's the reason we're investing heavily in pipelines to build knowledge graphs. So not replacing vector retrieval, but layering structure on top of it. We- Yeah. Okay. With flat RAG, it's extremely hard to look at an embedding and,
17:42
it's hard to explain why a specific chunk and not another was relevant, right? Uh- Yeah. With a knowledge graph, the agent is effectively traversing explicit relationships. So from one entity to another, from one fact to its source. The retrieval path itself is the explanation.
18:05
The model at the top is still a probabilistic model, of course, that doesn't change. But the path to the data is no longer a black box. And it's also worth noting that, in many of our use cases, the actual power of these models is vastly improved.
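[Editor's note: a toy illustration, not from the talk, of why graph retrieval is auditable. Entity and list names are invented; the point is that the returned hops are a human-readable trail, unlike a similarity score over an embedding.]

```python
# Tiny knowledge graph as adjacency lists of (relation, target) edges.
graph = {
    "AcmeCorp": [("subsidiary_of", "GlobalHoldings")],
    "GlobalHoldings": [("listed_on", "Sanctions-List-2026-01")],
    "Sanctions-List-2026-01": [("published_by", "Regulator")],
}

def traverse(start: str, max_hops: int = 5) -> list[tuple[str, str, str]]:
    """Follow explicit edges from an entity; every hop in the returned
    path names the relationship that justified the next retrieval."""
    path, node = [], start
    for _ in range(max_hops):
        edges = graph.get(node, [])
        if not edges:
            break
        relation, target = edges[0]  # single-path toy; real graphs branch
        path.append((node, relation, target))
        node = target
    return path

# traverse("AcmeCorp") yields entity -> fact -> source, hop by hop:
# the retrieval path itself is the explanation.
```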
18:25
So- Yeah yes, that's where we're spending a lot of our effort. Yeah, and I've actually been hearing this conversation in a few other pockets as well, and so I wanna make sure that everyone listening really understands the significance of this. I mean, one, agents can hallucinate as well, right?
18:45
So we have to address the same issue when we move to agents. And arguably, it's even more important: if generative AI hallucinates, you still have a human sort of checking that. But if an agent has the power to go make a decision off of a hallucination, well, then you're in real trouble.
19:01
So bringing those guardians in. And then also, the second thing you're talking about, layering a knowledge graph into your system. It's like giving the system an uber brain, right? So the system understands, you know, the agents have a map or a pathway, right,
19:20
to understand what different data means and how it relates to one another. And I think this concept of a knowledge graph, I mean, I know it's not new, but I think it's something that folks are going to hear a lot more in the mainstream conversation as we talk more about making sure that agents have access to the right data. Andrea, this has been so fascinating.
19:45
I feel like I could talk to you for hours, and that our audience would just love to hear more from you. I know that we already have you on our website in different places, and so- Yeah folks can see you there, and I know we have you speaking at Accelerate, I believe. And so, you know, it's gonna be great to have you there, and so good
20:05
to, to have you here, giving us your expertise. I just I can't thank you enough. Thank you so much for, for taking the time and sharing. Thank you very much for having me, Erin. It was a pleasure. Awesome.
20:17
Thanks, Andrea. All right. So we'll bring Amir back. Amir, wasn't that fantastic? It was awesome, yeah. That was some really good insight from the production floor. Yeah.
20:31
Yeah, absolutely. I love speaking to Andrea. Just a wonderful, wonderful fountain of knowledge. So, yeah, some really fantastic insights there. All right. So let's get into it. First, we're going to take just a couple minutes to talk a little bit about some of
20:49
the foundational principles of agentic AI. I promise I won't take too long before I pass it over to you, Amir, so we can get into the heart of it. But just really quickly, for everyone here, you know, AI is changing so rapidly. Four years ago was when the first OpenAI GPT model was released to the
21:12
public, and just look at how much has happened since then. We're moving at such a rapid pace. We went from predictive AI, which was all about classifying historical data, batch jobs, stable data sets. Periodic refresh was fine here, right? Then we went to generative AI, which really shifted that focus to creation using more text,
21:33
images, and code. And at this point, this is where high-throughput data storage became important, to make sure that you could train your systems quickly enough. Agentic AI now changes the paradigm again, right? Agents plan and act across many steps, so you start seeing that things like real-time data access,
21:56
persistent memory, and low-latency retrieval are now the baseline to ensure that those agents can get access to the data they need when they need it. And, you know, as Andrea put it, we have these agents that aren't just consuming data, right? They're depending on it to make these decisions.
22:24
And they need just the right amount of data at just the right time to be able to do that. So let's talk a little bit about what makes up an agentic system. Again, we're gonna dive into the data layer, but I want you to understand exactly what makes this up.
22:43
The large language model, or the LLM, is really the reasoning engine. It receives the data and information, the context, and the plans, and then it produces the output. The tools and APIs are what the agent can actually do with that output, whether that's querying a database, calling an external service, or searching the web.
23:07
So it's both what it can do to actually go and get the information it needs, but also what it can do with the output as well. And then memory is short term and long term, and typically lives in a vector or a relational database. And orchestration manages multi-agent workflows.
23:28
So, kind of like Andrea was talking about, once you have multiple agents in a system working together, that orchestration becomes really important. And then the data layer is the foundation that everything else really depends on. There are elements in so many of these places that depend on that data being optimized and working right.
23:51
Without it, none of the rest works. So with that, Amir, I am going to pass it to you to take us a little bit deeper. Awesome. Yeah, thanks, Erin. And I love how you pointed out the baseline requirement shifts, the whole idea around real-time data access, persistent memory, low latency.
24:12
They're basic requirements to get into this space, and it's an exciting time to be in this space. So if we look a little bit at this architecture map, right? What we're really looking at here is a full map of all of the sources, right? The transformation that's required, the vector stores, the consumption, the
24:33
feedback loops that really come into play to make a full system ready to go. What's really important to note are the three primary domains that exist. So we have our storage, or capacity, tier. We also have speed, right? Just being able to do ingestion, do transformation, being able to have the
24:58
data available, and then the freshness itself, right? Or the state of the data set itself, as well as the memory that it's ultimately being translated into. Each one of these pieces will show up as we walk through this entire pipeline, so it's really just a reference, right?
25:20
For all of us, as a mental map. So if we then look at the day-one operations, right? And what you actually need to consider when you're building out a production system, there are four main items that really come into the architectural design, right? So latency is probably the first, and the one
25:48
that we talk a lot about, which really covers, you know, the speed of being able to retrieve that data, put it into context, and be able to get into the reasoning and follow the breadcrumbs. But then we also look at the freshness, you know, and the scale, right? And observability is how you actually know when something breaks in production or how
26:17
you know something has been effectively completed. If we look at going from a POC to a production deployment, you know, a POC with one agent very quickly becomes 10 agents, or really 10 pieces that you have to now address in production. So you really need to have an understanding of how this is going to scale.
26:42
And obviously, as things get more complex, the freshness question comes into play. So what do we do if, you know, a state or a portion of this produces a wrong decision? And as Andrea showed, right? In regulated environments, this becomes a compliance issue.
27:01
You can't point at the agent. It is, you know, the company's problem, and that just really needs to be addressed. So it's not really just purely about performance, but really about completeness. And as we go into the next section of this, right?
27:20
This agentic data pipeline typically has multiple pieces, right? From the query all the way down to the response. And, you know, if we look at the brain or the core of this, we're looking at an agent in this case where a query comes in, and this
27:42
agent has to establish its context. So it's gonna do a lookup in CRM. It's gonna maybe look through some troubleshooting docs, right? That could maybe exist in a vector DB as embeddings. Or it could potentially orchestrate calls through an MCP server that is gonna
28:00
maybe be looking at information in some other system. And before the final response, all of these responses basically have to go through a judge, or most likely should go through a judge, to see if this is acceptable, right? Before it actually goes into the actual response generation.
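[Editor's note: a minimal sketch of the judge step described above. `judge` is a placeholder; a real system would call a second model or a rule engine, and the acceptance criteria here are invented for illustration.]

```python
def judge(candidate: str, sources: list[str]) -> bool:
    """Placeholder validator: accept only a non-empty answer that cites
    at least one source. Real systems would use a model or rule set."""
    return bool(candidate.strip()) and len(sources) > 0

def finalize(candidate: str, sources: list[str]) -> str:
    # Every candidate passes through the judge before response generation;
    # failures are routed for escalation instead of being returned as-is.
    if not judge(candidate, sources):
        return "ESCALATE: candidate failed validation"
    return candidate
```

The point is architectural: the judge sits between retrieval and response generation, so an unsupported answer never reaches the user directly.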
28:24
So, each data source here is going to have a different performance profile, a different connectivity, right? And they will not all behave the same way. And really, these pipelines ultimately become only as fast as their slowest link, or their ability to move data from one stage into the next. Yeah. Yeah, absolutely.
28:48
Amir, I wonder if you could, I love this slide, by the way. It just makes so clear exactly what you need to be considering when you're building that data pipeline for agentic AI. For folks who are listening that have built a RAG pipeline before, how is this agentic data pipeline fundamentally different?
29:11
That's a great question. You know, I think the way to look at it is that RAG is a fixed pipeline, right? You had a query, and you would be retrieving that information. But agentic becomes more like a researcher, right?
29:27
You need to understand the information, understand the context, understand if it's potentially outdated or if it's conflicting with one of your previous steps. So what you really have to do with agentic is find a path to reconcile the previous step and really understand if this is giving you the correct answer to the breadcrumb. Mm-hmm.
29:53
Yeah. I mean, that relates so much to what Andrea was talking about with probabilistic versus deterministic, right? It sounds like the RAG pipeline is still a little bit more deterministic. You know, you can kind of define the path, and with agentic AI, it's more probabilistic, so you can set some guardrails, but there's gonna be a little
30:15
bit more exploration that you have to consider and allow for. More moving pieces. Absolutely. Yeah. Yeah. Exactly. Exactly. Okay, yeah. So if we look at what has been happening in this space, it's been primarily focused on, you know, prompt development, or at least this is
30:41
what we were doing previously, right? And prompt engineering is really crafting one good input to get a really quality output out of it. What we're primarily focusing on now is context engineering, right? An ability to have a well-defined architecture with these decision points and
31:03
being able to understand how dynamic the data is, how fresh it is, and how precise it actually is in terms of giving us the answers that we're looking for. As Anthropic put it, right, context is a critical but finite resource for our agents. And what that really means is that, you know, we have to treat it as that
31:29
finite resource and be able to really understand how to apply it in the best way. If we have bad context management policies, it really compounds, and it falls apart fairly quickly. Users lose faith or don't see the value, and then we have not really advanced the
31:53
mission forward at all. So how do we manage context, right? And what are the dimensions associated with this? Well, you know, freshness, or the time dimension, is extremely important, right?
32:08
Is the data fresh enough to actually act on? We also have things like location and state: basically, what was the previous answer to this, or what was the memory, right, contained around this? Are there cultural or domain differences, right? Is there a frame of reference that the user or the agent might be facing that
32:33
we want to address? And governance is also extremely important, right? Like, what rules does the organization care about, and what needs to be put into place to implement them, whether there are safety or data or other things, right, that we need to address.
32:56
So it's really important to have this checklist, kind of run through it as you design your workload, and understand, you know, the parameters that you would be looking at around each one of these pieces. Yeah. This is such a good checklist to have. And, you know, hopefully folks are screenshotting it if they wanna keep
33:24
it for later. Of these six dimensions, which one would you say teams sometimes skip or overlook that they should make sure they're really paying attention to? Well, I would say probably, you know, the domain or the identity piece, right?
33:47
Does the user have the right access to this information? A lot of teams focus on knowledge and history, which is great, right? Those are extremely important. Identity helps pave the way very clearly for, you know, governance and the ability to manage the context properly.
34:12
Mm-hmm. And, you know, I think about a lot of folks that we talk to who maybe have built an agent as kind of an experiment or a POC, and they probably didn't really have to worry about that piece as much, right? You're using, you know, dummy data or a sample set of data. That's right. Yeah.
34:29
But then you go to move it to production, and that governance piece becomes, you know, the blocker potentially, right? And so we always talk about making sure you're building on systems that are going to help you with that governance piece, so that it doesn't become a blocker or a problem. And because it's so
34:47
critically important, right? We need to be able to- That's right trust agents, and part of that is trusting that they're only accessing what they should be. That's right. Yeah. I mean, there's no better way to kill a project than for that governance to go off the rails. And, yeah, absolutely spot on.
35:09
So yeah, if we move on and look at traditional pipelines, right, or traditional ETL, and what is changing with context engineering, what we're really trying to do is understand that ETL is still relevant. However, we're now doing it in a more context-aware way, right? So as we're evaluating how
35:42
data is going to go from, you know, this access to the answer portion, what we're really trying to focus on is: do we have the right freshness? Do we have the format awareness? Do we have lineage, right? Do we have the semantic relevance that actually matters for this agent?
36:04
And really be able to position the workload, right, to move away from just moving data around, and instead really understand the state of it, the context of it, and actually apply it in the correct state. And really, if you look at it from our perspective, right?
36:29
This is really where we come into play for these types of workloads. So Erin, if you can advance the slide, right? And what we have done is we have built this unified platform, right? And that's really what it's all about.
36:48
So I'll let you say a few words on it- Yeah and go from there. Yeah, for sure. So yeah, I mean, this is really where Everpure comes in. You know, the unified data storage platform that you might already be relying on for your database workloads, for your virtualization workloads.
37:08
Well, guess what? We've been out ahead developing for what's coming next with AI, and so on this platform that you are already invested in, or, you know, already working with, you can also support the needs of AI. You know, you get real-time, dependable performance, you get that production-grade resiliency that you're so used to, and the flexibility
37:37
for your data infrastructure to grow and flex with you as your scale needs change, as the requirements and demands of AI change, because this time next year we're going to have a completely different conversation about what's needed, but rest assured our engineers are going to be building for that as well. And then of course that data governance and context awareness built in.
37:59
And you know, this is a really interesting space to watch, especially given what Amir was talking about on that last slide: the paradigm shift away from ETL, where you take the data, transform it, and then put it somewhere else. Instead of that paradigm, AI really needs to be able to access the data where it is. And so things like having a knowledge graph that grants it that brain, having a semantic
38:26
layer, all of those things start to become really important. So watch this space, watch us, listen for those things, and definitely more to come on that. So, if we look at context engineering, right?
38:49
We really have four different patterns we can implement here, and they drive a balance between accuracy, latency, and cost; the question is which choice should be applied to which subset of the problem. So just for a second, imagine you've walked into a library and up to a reference
39:14
desk, and you need some questions answered. What we're going to do today is focus on demonstrating what these patterns actually look like, how they behave, and what that means for getting your answers. So if you look at that reference desk as an AI agent, right?
39:40
What you're really looking for is something fast, accurate, and cost-effective. With RAG, what you typically have is that librarian at the reference desk retrieving information based on the card catalog they already have, right?
40:02
They'll find, say, the three most relevant books, pull them off the shelf, and basically say, "Here is your answer, here is your reference to the information you're looking for." With some of these other patterns, like CAG, you're looking not just for the snippets coming out of that library; instead, the
40:28
librarian would have access to the full encyclopedia at their desk at all times; they have that information ready and are able to retrieve and use it to give you the answer within the full context of the entire document. If we look at CRAG, for example, what you're really doing in this
40:54
scenario is giving the agent a kind of cheat sheet that already exists on the desk, with information available there from the previous conversation. So before you're even able to ask your question, the librarian,
41:18
sitting at the reference desk is able to give you that answer quickly, because the information is stored and already exists. So when we also look at the KV cache, which has been around for some time, what we're really looking at is these long conversations, the ability to take all the interaction back and forth, maybe
41:46
fine-tune that lookup, and keep a shorthand note saved in memory, or potentially even offloaded onto storage, of what was asked before, maybe a previous question asked by one of several people interacting with the librarian, so that they can retrieve that information, pull it up, and
42:18
answer the same question again. So all of these are valid architectures, and they can be used in conjunction to craft the architecture that will actually give you those answers. And Amir, on the bottom two, CRAG and the KV cache: is that caching
42:45
happening per user or per agent? Or is what other users or agents are querying now available to others? Yeah. So that comes back to that important question, right?
43:01
Around identity and roles and so forth. So it would most likely be caching across your entire user space. But where the lookup comes in and how you orchestrate those pieces is extremely important. Okay. So how that information is retrieved would
43:21
depend very much on the context of permissions, on what users have access to bring back into the decode state. Yep. Okay, that makes sense. So let's look at a demo of this. If we look at a little bit
43:43
of the setup here: what we have done is preload all of this Wikipedia information, and what we're going to do is load the context of, in this case, Vietnam into a RAG environment. So we're basically going to set up an embedding, right?
44:05
And we're going to create a wiki docs collection of information. This gives us a base of all the content that exists. We're going to ask some basic questions, and what we have here is we're pulling up, once again, that same country. We're keeping the number of documents retrieved at about three, right?
44:31
And we are storing this information inside of that database. So if we do this kind of lookup, what we can actually see for RAG, as well as CRAG, as well as the KV cache, is a set of metrics that have been captured on this content. So if you look at each one of these cards, what we are looking at is time to
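The setup described here might look roughly like the following sketch: embed documents into a collection, then retrieve the top three for a question. The term-frequency "embedding" stands in for a real embedding model, and all names (`wiki_docs`, `add_document`, `retrieve`) are illustrative, not the demo's actual code:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a term-frequency vector. A real setup would call an
    # embedding model and store dense vectors in a vector database.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

wiki_docs = []  # the "collection" holding the preloaded content

def add_document(text: str):
    wiki_docs.append((text, embed(text)))

def retrieve(question: str, k: int = 3):
    # k=3 mirrors the demo's "about three documents retrieved"
    q = embed(question)
    ranked = sorted(wiki_docs, key=lambda d: cosine(q, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved chunks would then be placed into the prompt before generation, which is the "access to answer" step the metrics below measure.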
44:59
first token, total time, throughput, and in and out tokens. The basic question in this case is about population and geography. So if you look at RAG, the first card, what you really have is an efficiency play: we're looking at somewhere around six hundred milliseconds total time, right?
45:19
It's a balanced throughput, and it keeps the input fairly small, at about one thousand tokens. With the caching, we basically bypass the heavy lift entirely, and you're able to get that answer in forty-four milliseconds, getting the response back for those common queries
45:41
or queries that were executed previously. If we look at the KV cache, the primary piece here is really throughput: even though the total time is higher, we're looking at throughput of a little over a hundred tokens per second. And this is what makes an AI workload really feel instant, right?
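The numbers on each card relate to one another in a simple way: once the first token arrives, throughput is just output tokens divided by the remaining decode time. A small sketch (units assumed: milliseconds for times, tokens per second for throughput; the function name is illustrative):

```python
def generation_throughput(out_tokens: int, total_time_ms: float,
                          ttft_ms: float) -> float:
    """Tokens per second during the decode phase, i.e. the part of the
    request after the time-to-first-token (TTFT) has elapsed."""
    decode_ms = total_time_ms - ttft_ms
    return out_tokens / (decode_ms / 1000.0)

# e.g. 100 output tokens, 1100 ms total, 100 ms TTFT -> 100 tokens/sec,
# which is the kind of decode rate that makes a workload feel instant
```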
46:04
So as you're typing and asking those questions, you're getting answers, because we're reusing KV cache states to avoid some of that redundant math. So if we resume and also look at CAG: we're going to load additional documents into GPU memory, and if we pause here and look at what we're actually seeing,
46:35
CAG has now taken all of this information and loaded it into GPU memory. We pay a penalty for that load phase up front. However, as long as we maintain that memory,
47:01
we really have the book open, and now we're able to do repeated queries. So if you had, for example, a policy document that you wanted all agents to have available and be able to consume, you can now retrieve this information extremely quickly, without that initial load phase.
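The CAG trade-off just described can be illustrated with a toy: pay a one-time load (the prefill into GPU memory in the real system), then serve repeated queries from the held state. The class and method names are hypothetical, and the in-memory string stands in for actual KV states:

```python
class CAGContext:
    """Toy model of cache-augmented generation: load once, query many."""

    def __init__(self):
        self._loaded = None  # stands in for KV states held in GPU memory

    def load(self, documents):
        # The one-time penalty: in a real deployment this is the prefill
        # pass that writes the documents' attention states onto the GPU.
        self._loaded = " ".join(documents).lower()

    def query(self, phrase: str) -> bool:
        # With the "book already open", answering skips the load phase.
        if self._loaded is None:
            raise RuntimeError("load the documents first (the one-time cost)")
        return phrase.lower() in self._loaded
```

The economics question raised later follows directly from this shape: the cached state is only worth the GPU memory it occupies if the queries keep coming.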
47:23
But I think the important piece out of all of these items is that there's no universal best. All of these are choices, and as we understand, architecture is full of choices. How we actually consume these things really depends on our access patterns and what accuracy we're looking for.
47:44
What is our latency budget? Can we keep content in memory? Because if we can, and we're renting those GPUs, does that mean we're going to pay for the full time that memory is loaded into a GPU? There are decisions to be made, and depending on what type of deployment you're
48:03
actually looking at, that's all going to come down to budget numbers. But really the main message is that great AI solutions use all of these pieces in combination. We can use CRAG to handle common questions that occur in, say, a help chatbot, right?
48:24
Where there's a lot of similarity in the questions. We can use RAG for specific, niche details. And then CAG really comes down to situations where you have very complex policy or data-heavy documents that you need to interrogate properly.
48:49
And obviously the KV cache is the piece that helps make all of this feel fluid. Our ability to use this information in a fluid motion, to give the user a really good experience, becomes extremely important.
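The "use them in combination" point can be sketched as a small router that sends each request to the pattern that fits its profile. The rules below are illustrative defaults drawn from the examples above, not prescriptive thresholds:

```python
def choose_pattern(seen_before: bool, needs_full_document: bool) -> str:
    """Route a request to one of the three retrieval patterns.
    KV-cache reuse sits underneath all three, keeping the decode
    phase fluid regardless of which pattern handles the request."""
    if seen_before:
        return "CRAG"  # common, repeated questions: serve the cached answer
    if needs_full_document:
        return "CAG"   # complex, data-heavy policy docs held in GPU memory
    return "RAG"       # niche details: retrieve a few relevant chunks
```

A real router would classify the incoming query (e.g. by cache lookup and intent detection) rather than take booleans, but the priority order is the design point: cheapest valid path first.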
49:10
Yeah. Amir, I just love this demo. I love getting the chance to understand these four different methods. I also think it's so important for folks who are running the data storage infrastructure to understand these concepts, because these are the tasks that the AI/ML ops developers are
49:32
putting on the system, right? That's right. And fundamentally, you need different things from your storage system to make this happen than, say, when you're building a traditional application. That's right, like a web application, for example.
49:48
That's right. Yeah. It's so helpful, I think, to understand all these components, why an AI ops person might be building something the way they are, and exactly what they're going to need from that data storage infrastructure. So I absolutely love this.
50:05
Yeah. And that's an extremely important piece, right? Data storage infrastructure really comes into play for all of them: hosting those vector databases, and the throughput to get the documents into GPU memory.
50:23
Or the KV cache and the accelerators, for example, that we provide, which really allow you to optimize that piece and give you a way to understand, interpret, and manage the economics of this, so you can actually serve all this information in an efficient manner. Yeah.
50:45
Yeah, absolutely. And in plain speak: the data infrastructure matters for ensuring that your agents are accurate and performant. Even that governance piece comes in, right? All those pieces we talked about.
51:03
Ultimately, at the end of the day, infrastructure is critical to making sure you can trust your agents and that your agents can actually perform as well. That's right. Yeah. Awesome. Okay, this has been a fantastic webinar. I learned even more going through this, Amir, so thank you so much.
51:23
We have a few key takeaways for the audience here, and then we might have time for maybe a question; I know this one's run a little long. I just want to leave you all with this: agentic AI really has fundamentally changed the data requirements again.
51:40
And so you need to think about those requirements and what they put on your data infrastructure in order to support your business. We have moved from prompt engineering to context engineering. Context is king now. And the architectural decisions you make about how to get the right context matter.
52:02
But it's not one-size-fits-all, right? There are multiple valid patterns depending on the access patterns and the constraints that you have. And again, at the end of the day, your data storage infrastructure really is the foundation that's going to ensure that you have agents that perform and that you can
52:18
trust as well. Amir, anything you'd add as a final take? Yeah. Architecture is a choice, right? Whatever you choose, and whatever your access patterns, storage is going to be the foundational element.
52:38
We understand that AI runs on data. We have heard Andrea talk about it, and you probably know it in your own space. But we strongly believe that data runs best on Everpure. Mm-hmm. That's right.
52:51
That's right. All right, let me check out the questions. I know we've had some being answered in the chat as well. So let me see. I think we have time for one, so let's pick a good one. For teams already deep into RAG,
53:12
what's the shortest path to context engineering maturity? Yeah, that's a really good question. You know, context engineering maturity really comes down to understanding the pieces we talked about earlier, the dimensions required for a complete, mature context.
53:46
So understand which of those dimensions you're short on or not currently addressing within your RAG environment, and let's have a conversation to figure out how we can bridge that gap. Maybe there's a way to simply reuse your existing pipeline with a few additions so that it covers all the necessary dimensions.
54:17
That would be my recommendation. Yeah, absolutely, that's great advice. And you said "let's talk." I think that's a great takeaway here as well. Amir is one of our solutions architects.
54:32
He's an expert in AI, and we have a whole team of them. If you are struggling to get started on infrastructure to support an AI deployment you're working on, definitely reach out to your contact here at Everpure and let them know you want to set up some time.
54:50
Amir is happy to talk with you. Yes, that's right. Awesome. Well, thank you so much, Amir, for joining and sharing your expertise. Thank you. It was amazing to have you and Andrea both here today.
55:04
I know I learned a lot, and I'm sure the audience did as well. I can't wait to do more of these. Thank you. Thank you. Yeah, it was a pleasure. Awesome.
55:13
So before I leave, make sure you check out Pure//Accelerate, happening at Resorts World in Las Vegas, June 16th through 18th. I'll be there; you can come say hi, and we can talk about AI and anything else you want. Make sure to register and get ready to join us.
55:35
And of course, if you can't make it in person to Accelerate, you can always come talk to us in the community. Lots of amazing conversations are happening there, and Amir and I can reach out to you that way as well. Thank you so much for joining us, and we'll see you next time.