00:08
Okay, welcome again. Hi, my name is Stefan Zimmermann. I'm a consulting field solutions architect here at Everpure. And today I wanna give you not a product pitch, but a technical walkthrough of a published joint architecture that we have together with Commvault, so Everpure and Commvault.
00:28
We will go layer by layer, explain what each one does and why it exists, and then walk through how the layers work together in a real recovery sequence. I want to answer all the questions at the end, but if they come to your mind, please drop them in the chat or in the Q&A window, right, as we go, and I will answer them, if possible, all at the end.
00:56
Now, a quick outline for our time together. I wanna start with the business case, right? Why this problem matters, and why most organizations are solving the wrong part of it. Then we walk through the four-layer architecture from the inside out. So we start with production and then work towards the most isolated layer.
01:16
We finish with the recovery workflow and how the layers then actually interact when something goes wrong, and then Q&A. Again, put your questions into the Q&A, into the chat, and I'm gonna bring them up at the end and answer them. Let's get into the topic, right?
01:38
One of the first things I need to mention here, and probably one of the most important distinctions in this session, is that backup is not resilience. The industry uses these terms pretty much as synonyms, but they are not the same thing. Backup answers one question, which is: Do I have a copy of that data? But resilience answers a different question: Can I bring this data
02:09
back into production when it matters most, so in an incident, right, and at the speed that the business needs? The industry average for ransomware recovery is about twenty-one days. This is not like, oh, bad luck, it just takes that time, right? It's a result of architectures which are designed around backup and backup speed and
02:37
backup windows, but not for recovery. A backup that needs three weeks to restore is, from an operations point of view, equivalent to not having a backup at all. So what I wanna put in focus from this point onwards is that the only metric we wanna care about is recovery time and recovery speed.
03:01
If we look at how attacks actually work, right? Most people imagine that ransomware is an immediate explosion. Files are suddenly encrypted. The ransom note appears. But this is not how it works.
03:17
Attackers spend weeks or months inside your environment first, right? They map the infrastructure. They find the backup systems. They understand how your monitoring and alerting works, and they identify all your failover paths. And what they do is they start disabling or
03:35
corrupting these first. Encryption is then only the very, very last step in all of this. If they have full control, they hit the button. By the time the ransom note appears, the attacker has been in your environment long enough to probably understand it better than most of your IT team, and this is why we need
03:58
an architecture that can protect your recovery capability specifically and not just the data. And if we put a number on it, right, this is super expensive if you get that wrong. The average cost for unplanned downtime is around fourteen thousand dollars per minute. Yes, that varies a lot by industry, right, and also by the size of your
04:30
company, obviously. But it is a material thing. The key point is that the twenty-one-day recovery window is not a technical problem. It is basically the attacker's leverage, right? That window.
04:45
Because it is so large, that's the reason why extortion works at all. If you can shorten your recovery window to hours, then the attacker loses their negotiation position entirely. That's the business case, right, for everything that follows. And servers you can replace, even though these days it might be more difficult
05:11
than it was before, right, and more costly. But your data is still the heart of everything, right? You cannot replace your data. If it's gone, it's gone. At no cost, for no amount of money, can you get your data back.
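To put the downtime figure in perspective, here is a minimal back-of-the-envelope sketch in Python. It simply multiplies the average per-minute cost quoted in the talk by the length of the outage. The linear extrapolation and the function name are illustrative assumptions, not part of the published architecture; real costs vary by industry and company size, as noted above.

```python
# Illustrative arithmetic only: the per-minute figure is the average quoted in
# the talk, and scaling it linearly over the whole outage is an assumption.
COST_PER_MINUTE_USD = 14_000


def downtime_cost(hours, cost_per_minute=COST_PER_MINUTE_USD):
    """Return the downtime cost in USD for an outage lasting `hours`."""
    return hours * 60 * cost_per_minute


three_weeks = downtime_cost(21 * 24)  # the 21-day industry-average recovery
three_hours = downtime_cost(3)        # a recovery measured in hours

print(f"21 days: ${three_weeks:,.0f}")
print(f"3 hours: ${three_hours:,.0f}")
```

Under these assumptions the gap between a three-week and a three-hour recovery is more than two orders of magnitude, which is the leverage the attacker is counting on.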
05:29
And attackers do understand this better than most organizations do. So here we come in with our published joint architecture from Everpure and Commvault. We have a white paper on this, which is also referenced in the resources later. We have four layers, and each of these layers has a specific job. They are not redundant, right?
05:52
They cover different attack surfaces and different recovery scenarios. We will walk through them starting from layer four, which is the closest to production. That's the first line of defense, basically. And then we work outward to the vault, which is the most isolated and thus also the hardest one for the attacker to reach.
06:19
There are three main principles that run through every layer. The first one is zero trust, right? No layer trusts another by default, and there are no implicit standing network connections between them. Second, we need immutability and indelibility, meaning we cannot delete or modify data, not even with a compromised admin account.
06:47
And third one is parallelism. And that's actually on two levels, right? First one is we will do multiple processes in parallel. We will do forensics and recovery validation at the same time and not one after the other. And we're also using an underlying storage architecture that is built for massively
07:09
parallel IO, meaning we can run restores at full speed across every blade simultaneously. And that's not just a design principle, right? It's the reason the performance numbers we will talk about are possible. So layer four, we start here.
07:31
That's our production layer. It's not just a fallback, it's the fastest recovery path for the workloads that absolutely cannot wait. We use Commvault IntelliSnap, which quiesces the application, triggers a hardware-level snapshot directly on the FlashArrays, and then manages all of this through the same
07:54
Commvault policy framework that you already use for your backups. If you do a recovery, it goes directly back to production. So from the snapshot, which is already on your production array, there is no media agent in the data path, right? There's no separate copy phase.
08:14
That makes it possible to have an RTO of seconds to minutes. And the reason why all of this works is safe mode. Every single snapshot is protected at the array level. So even if the Commvault server is fully compromised, even if the attacker has the admin credentials to your flash array, the attacker cannot delete these recovery points,
08:39
and you can use them for your restores. Any attempt to tamper with that protection requires a formal multi-party approval process, and we will look into how that works exactly. So this combination of IntelliSnap and safe mode, right? Operationally, that's much simpler than it sounds.
09:03
You use the same Commvault policies that manage your traditional backups to also manage IntelliSnap schedules, replication and retention. So there's no separate tool to learn, right? It's an easy process, all from within your Commvault. And safe mode is just a configuration on the FlashArray.
09:24
It's not a separate product, and especially it's not an add-on license. It's all licensed within the Purity that you have on the FlashArray anyway. Enabling and configuring safe mode is a completely self-service operation, right? You won't need any support for normal restores or for adding protection. You just restore from your snapshots completely independently.
09:48
Where you do require Everpure support, however, is any attempt to weaken or disable the protection settings themselves when safe mode is enabled. And that process is called the multi-party out-of-band approval process. This one requires Everpure support plus at least two predesignated safe mode approvers from the customer. Each of these is verified with a personal PIN,
10:17
and that separation is then an architectural one, right? The approval workflow is physically outside of your production. And so even with a compromised admin account, you cannot socially engineer around this process. Another thing is an operational note.
10:37
Safe mode works like a ratchet, right? So it locks only in one direction, and you can always increase your protection via self-service. You can do more snapshots, longer retention, a longer eradication timer. You don't need any Everpure support for that.
11:00
If, however, you really wanna reduce your protection in any direction, so you wanna do fewer snapshots, shorter retention, or a shorter eradication timer, or even disable the schedule altogether, you require this full multi-party out-of-band approval process that we talked about. The good thing is this ratchet also covers replication settings.
11:24
So increasing replication, again, like increasing frequency or retention, is self-service. You just decide, I want to have more protection. But if you wanna disable replication or reduce the frequency or the retention, right, that equally requires Everpure support and your two safe mode approvers.
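The ratchet behaviour described above can be sketched as a small decision function. This is a hedged illustration in Python, not the array's actual implementation or API: the class, field names and function names are all hypothetical, and the only rules encoded are the ones stated in the talk (increases are self-service; any reduction needs vendor support plus at least two verified approvers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionSettings:
    """Hypothetical model of the settings the ratchet covers."""
    snapshots_per_day: int
    retention_days: int
    eradication_timer_days: int
    replication_enabled: bool


def weakened_fields(current, proposed):
    """List every setting that the proposed change would reduce."""
    weakened = []
    for name in ("snapshots_per_day", "retention_days", "eradication_timer_days"):
        if getattr(proposed, name) < getattr(current, name):
            weakened.append(name)
    if current.replication_enabled and not proposed.replication_enabled:
        weakened.append("replication_enabled")
    return weakened


def change_allowed(current, proposed, verified_approvers=0, vendor_support_engaged=False):
    """Self-service if nothing is weakened; otherwise require out-of-band approval."""
    if not weakened_fields(current, proposed):
        return True
    return vendor_support_engaged and verified_approvers >= 2


current = ProtectionSettings(24, 14, 7, True)
stronger = ProtectionSettings(48, 30, 14, True)
weaker = ProtectionSettings(24, 3, 7, False)

print(change_allowed(current, stronger))
print(change_allowed(current, weaker))
print(change_allowed(current, weaker, verified_approvers=2, vendor_support_engaged=True))
```

The point of the sketch is the asymmetry: a compromised admin account can only ever move the settings in the direction that helps the defender.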
11:47
The replication piece therefore basically is a protected additional fault domain, right? Snapshots can be replicated to a second FlashArray at a different location, and you cannot have that replication silently disabled by any attacker. So what I mentioned before about how attackers work, right? Even if they understand your failover paths, they cannot just disable your replication without
12:12
going through that process, which they can't go through. So very, very important for this to work, right? You need to activate safe mode beforehand because it only protects snapshots going forward after activation. So retrofitting it like, "Oh, we have an attack in progress.
12:33
Let's just enable safe mode directly to protect everything," won't work, and that's not an option. So we have layer four really for our tier one workloads, which we need in minutes, right? But what about the rest of the environment? We still have lots of file servers, secondary databases, application tiers, and that is
12:57
layer three. So FlashBlade is here our ideal Commvault backup repository, either via NFS or, even better, with S3 object storage. There are two specific Commvault optimizations that make a big difference here too. First, we have the storage accelerator. Commvault clients can directly write to FlashBlade object storage.
13:20
They bypass media agents entirely, so you again remove that bottleneck from your data path. Of course, that means you need some processing power on all of the clients. But if you have a lot of clients, right, everyone takes a little bit of this load. And the second is, if we go back to restores, we have range reads, which means that instead of
13:44
streaming full backup files, Commvault can request only the specific byte ranges it needs for a restore. If you combine that and have a reasonable setup with hundred G networking, you get up to two hundred and seventy terabytes per hour from a FlashBlade S repository. That's the high number, of course, and you can get even higher if you wanna use four
14:08
hundred G networking, which is usually not the case for backup use cases. And you can start very low also, right? Or not very low, but a lot lower if you just have like a single chassis and maybe just a forty G network. Then usually networking will be your bottleneck.
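The range-read idea mentioned above maps onto the standard HTTP Range header that S3-style object GETs accept. The following Python sketch only builds the header values and counts the bytes that are not transferred; it does not talk to any storage system, and it is not a description of how Commvault implements range reads internally. The object size and extents are hypothetical, and the extents are assumed not to overlap.

```python
def range_header(offset, length):
    """Build an HTTP Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the `- 1`.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def bytes_saved(object_size, extents):
    """Bytes not transferred when only the given (offset, length) extents are read."""
    return object_size - sum(length for _, length in extents)


# Hypothetical example: a 1 GiB backup object where a restore needs two extents.
extents = [(0, 4096), (524_288_000, 1_048_576)]
for offset, length in extents:
    print(range_header(offset, length))
print(bytes_saved(1024**3, extents), "bytes not read")
```

In this made-up case the restore touches roughly one megabyte of a one-gibibyte object, which is where the saving over streaming the full backup file comes from.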
14:29
But this speed would be the difference, right, between a three-week and a three-hour restore. So a few words on why the hardware matters here, because that is often overlooked. So FlashBlade is not a repurposed disk system with just a little bit of flash cache as a tier on top, right? It's a purpose-built scale-out all-flash platform, and it's built for parallel IO
14:58
across every blade and every component within the FlashBlade. So that is exactly what gives you the restore and backup speeds, and these scale linearly as you add capacity to the FlashBlade. There is no spinning-disk seek latency, no tape rehydration delay. And on top of that, by now you even get deduplication at flash speed.
15:26
Everpure's DeepReduce technology on FlashBlade E and on FlashBlade S provides post-process, intelligent, similarity-based deduplication to reduce the capacity that you need to store your backups. As it's a post-process, right, it has no impact on write speeds. It can affect read speeds, though, on highly deduplicated datasets.
15:56
But even so, FlashBlade's scale-out all-flash parallel architecture gives you so much speed that the impact of this is really reduced to a minimum. And if you have two hundred terabytes per hour line speeds, right, then we're talking about something. But you will surely meet a network bottleneck more often, and earlier, than you meet the performance bottleneck of the FlashBlades.
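To make the network-bottleneck point concrete, here is a small Python sketch that converts link speed into terabytes per hour and takes the slower of storage and network as the effective restore rate. It uses theoretical line rate in decimal units and ignores protocol overhead; the function names and the example dataset size are assumptions for illustration.

```python
def link_tb_per_hour(gbit_per_s):
    """Theoretical line rate of a network link in decimal terabytes per hour."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return bytes_per_s * 3600 / 1e12


def restore_hours(dataset_tb, storage_tb_per_hour, network_gbit_per_s):
    """Restore time when the slower of storage and network sets the pace."""
    effective = min(storage_tb_per_hour, link_tb_per_hour(network_gbit_per_s))
    return dataset_tb / effective


# Hypothetical 500 TB restore against a repository that can deliver 270 TB/h.
for gbit in (40, 100, 600):
    print(f"{gbit:>3} Gbit/s aggregate: {link_tb_per_hour(gbit):6.1f} TB/h line rate, "
          f"{restore_hours(500, 270, gbit):5.1f} h to restore")
```

By this arithmetic, which is mine and not a figure from the talk, a single 100 Gbit/s link tops out near 45 TB per hour, so a rate in the region of 270 TB per hour would correspond to roughly six such links in aggregate. A single 40 Gbit/s link limits the restore to about 18 TB per hour regardless of how fast the storage is.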
16:23
So you're not sacrificing your RTO, but you still get storage efficiency. We also have a real-world reference for that speed. We have AVNI, which is a large contact center operator, and they recovered five hundred servers in three hours using our architecture, and their previous expectation was three weeks for doing that.
16:49
So while layers three and four give you lots of speed, layer two gives you especially one thing, and this is confidence. Before anything goes back to production, you need to know that it's actually clean, that you're not reintroducing the malware along with the data. We're using the Clean Recovery Zone.
17:10
This is an isolated recovery environment, and it's completely logically separated from both production and the vault. There is no shared network path, right, or shared management plane between them. In the full architecture, the IRE, right, the isolated recovery environment, I'm using that word a lot, so I will use the abbreviation IRE.
17:34
So the IRE data stores run on a FlashArray, right? And that's obviously safe mode protected. Then the recovery CommServe will orchestrate all of this. And this dedicated recovery CommServe runs in a secure management zone. It's pre-staged, always ready.
17:54
It's air-gapped from production. And this is where our EverPure hardware story basically continues from layer four to layer two because we still have flash speed recovery inside the IRE and not just in production. There might be cases where you cannot commit to a dedicated on-prem IRE, right?
18:15
Hardware is expensive. You need the space and stuff like that. If it comes to that, Commvault Clean Room Recovery is an alternative for that. This provisions an on-demand cloud-isolated tenant, so you don't need this pre-staged hardware.
18:33
This is also useful for continuous recovery testing, right, and for smaller environments where, yeah, you just cannot justify a permanent IRE, so an isolated recovery environment, a doubled environment, and so on. But if you really want to have this production-scale fast recovery, right, with the performance numbers we talked about, then you definitely need to be on-prem, right?
18:59
Because also a restore back from the cloud would be super, super expensive if you're not restoring to the cloud anyways. But let's understand what actually happens in this isolated recovery environment. It serves three purposes, and it serves all three simultaneously. First one is data cleaning and data validation.
19:28
You use Commvault Threat Scan to identify clean recovery points, and it validates that applications can start cleanly from it. The output of this is called the clean point. Second, you can have forensic support. Usually, forensics will work in your production environment, right?
19:48
They're investigating and understanding the attack. Probably there are even government agents coming in and, yeah, taking hardware, whatever. But the IRE gives them a safe and isolated space where they can work with a copy of the data without risking reinfecting the restore. And third, you have the IRE for disaster recovery.
20:16
You can run your first critical core services directly inside the IRE while your production is still down or being cleaned. So you start your restore already while you're still cleaning. The IRE is, though, not just a staging area, right? It can carry these live workloads with the power you gave the IRE, right?
20:41
So with the number of servers, and with the amount and the speed of the storage. Again, speedy storage comes in very handy at this point. Then if your production is really cleaned, validated, and ready, right, and you hopefully have fixed that security hole the attackers came in through, right? At that point, you can have your most critical servers running in the IRE
21:11
failing back to production. And again, this is where our layer four comes in. So at snapshot speed, you have a near-instant cutover instead of a longer bulk restore operation. As said already, right, before anything goes back, you wanna have Commvault Threat Scan run
21:37
against your backups inside the isolated recovery environment. It checks for known malware indicators across all the backup generations and flags infected copies. That works across all versions, not just the latest one. And that is quite critical, right?
21:55
Because think about it: if your attacker had a dwell time of possibly three months, your most recent ninety days of backups could all be contaminated. Threat Scan helps you to identify which generation is actually a clean backup, because infected copies are flagged, right? So you have full visibility of what you can safely restore and what not.
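The selection logic behind "which generation is actually clean" can be sketched in a few lines of Python. This is an illustration of the idea only, not Threat Scan's algorithm: the `infected` flag stands in for a scan verdict, and the conservative rule used here (take the newest unflagged generation that is older than the earliest flagged one) is my assumption about how one might choose a restore point.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupGeneration:
    taken_on: date
    infected: bool  # hypothetical flag standing in for a scan verdict


def latest_clean_point(generations):
    """Return the newest unflagged generation older than the first flagged one.

    Returns None if no generation qualifies.
    """
    flagged_dates = [g.taken_on for g in generations if g.infected]
    cutoff = min(flagged_dates) if flagged_dates else None
    candidates = [
        g for g in generations
        if not g.infected and (cutoff is None or g.taken_on < cutoff)
    ]
    return max(candidates, key=lambda g: g.taken_on, default=None)


# Hypothetical backup chain: infection first shows up in the March generation.
chain = [
    BackupGeneration(date(2025, 1, 1), infected=False),
    BackupGeneration(date(2025, 2, 1), infected=False),
    BackupGeneration(date(2025, 3, 1), infected=True),
    BackupGeneration(date(2025, 4, 1), infected=True),
]
print(latest_clean_point(chain))
```

With a long dwell time the answer can be several generations back, which is why scanning only the latest backup is not enough.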
22:20
The output, as said, is a clean point, and that's a verified malware-free recovery point that you can safely restore from. This also generates a full audit trail, so you can hand that directly to regulators as evidence for a validated controlled recovery process. Threat Scan and Clean Point are standard Commvault Cloud features, so there's no
22:43
special license tier required for that. The licensing consideration that you do have to take into account is on the IRE side. So if you wanna use Commvault Clean Room Recovery for the cloud-isolated tenant, right, from the last slide, that would require the Commvault Cloud Cyber Recovery license tier. So the outermost layer, layer one.
23:10
This one is the one that survives, right, even if everything else is compromised. That's our vault. The storage layer here is a FlashBlade, maybe FlashBlade S, if you really want to have that speed, and that's a good one. And we use S3 object lock. We enforce WORM immutability at the bucket level, and we use safe mode protection again.
23:36
Once data is written into the vault, right, it cannot be deleted, it cannot be overwritten until the retention period expires, and that means not by an admin, not by ransomware, not by anyone with production or admin credentials. One note on the physical separation. So basically, a logical separation with proper network controls, everything like that, that
24:01
defeats ransomware already. But a second physical site adds disaster resilience, and for most compliance frameworks like DORA or NIS2, this is also required, right? They mandate physically separated backup copies, at least for regulated entities. So if you fall under DORA or NIS2 regulation, you need to have that second site
24:23
anyway. How network isolation and access control are enforced, we'll look into. So we have object lock and safe mode. I mentioned both of these, right? And these two come together and constantly get confused. So let me be precise.
24:42
Object lock on FlashBlade enforces immutability at the S3 protocol layer. So that's a WORM policy on the bucket. It comes in two modes. One of them is governance mode. This basically allows the admin side, with the right permissions, to override this
25:03
under controlled conditions. This is not what you want, right? You want compliance mode. Compliance mode removes this option entirely, so nobody can delete the data before the retention period expires, period.
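The difference between the two modes comes down to one branch in the delete decision. The Python sketch below models that decision as described above; it is not FlashBlade or S3 code. The mode names follow the S3 Object Lock convention, while the function name and the `has_bypass_permission` parameter are illustrative assumptions.

```python
from datetime import datetime, timezone
from enum import Enum


class LockMode(Enum):
    GOVERNANCE = "GOVERNANCE"
    COMPLIANCE = "COMPLIANCE"


def delete_permitted(mode, retain_until, now, has_bypass_permission=False):
    """Decide whether a locked object version may be deleted at time `now`."""
    if now >= retain_until:
        return True  # retention has expired, the lock no longer applies
    if mode is LockMode.GOVERNANCE:
        return has_bypass_permission  # privileged override is possible
    return False  # COMPLIANCE: nobody can delete before expiry


retain_until = datetime(2026, 1, 1, tzinfo=timezone.utc)
during_retention = datetime(2025, 6, 1, tzinfo=timezone.utc)

print(delete_permitted(LockMode.GOVERNANCE, retain_until, during_retention, has_bypass_permission=True))
print(delete_permitted(LockMode.COMPLIANCE, retain_until, during_retention, has_bypass_permission=True))
```

In this model a stolen privileged credential is enough to defeat governance mode but changes nothing in compliance mode, which is the reason for the recommendation above.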
25:20
Safe mode is a completely different mechanism. It operates below the admin plane at the array level, and that works both on FlashArray and on FlashBlade. So even a fully authenticated array administrator, right, with the highest permissions that you can get, cannot delete safe mode protected data without the time-delayed multi-party process requiring
25:41
Everpure support, right? And this is not a software policy, right? You cannot just disable it with the right credentials. It's enforced below the admin plane. On FlashArray, we already saw that in layer four, right?
25:56
It protects your production snapshots. And on FlashBlade, it adds a second layer of protection on top of object lock for the backup data in your vault. These are not alternatives, right? Object lock is protocol level, and safe mode is on the array level.
26:11
So you want both because they defeat different attack vectors. And of course, the vault only works if it's actually unreachable during an attack, right? So zero trust is here really not just a marketing term. It means that you have no persistent network connection from the production to the vault
26:36
under normal operations. Production never initiates a connection to the vault, right? The vault pulls data from production. We, we need that data, right? We wanna have a copy of our current production data, of course.
26:47
But you do that on a schedule, and you manage that from a secure isolated management zone. And this network path should only exist in one direction, and you time box it. So if production is really compromised, the attacker has no path towards the vault, right, because none exists from the production side. Access to the vault itself requires, again, multiple independent controls, right?
27:18
You need MFA, so multi-factor authentication. You have multi-person authorization for critical operations. You have role-based access control, RBAC. And of course, you should have a PAM solution, right? So that all your privileged sessions and every access to the vault
27:39
is logged and auditable, and you can also refer to these logs. Of course, all kinds of unusual behavior, right? If you have unusual replication behavior towards the vault, like large-scale deletes or attempts at large-scale deletes, unexpected replication volume changes, something like
28:00
that, that should trigger alerting, right, before it really becomes an incident. Now, let's look at what happens during a ransomware incident, because this is where a lot of other vendors oversimplify. Step one, right, what you need to do is isolate. You need to isolate that attack and contain the blast radius as much as possible.
28:28
You need your production intact for forensics, right? And this compromised environment is evidence, so you cannot immediately start restoring over it. Step two is then running two work streams in parallel. Your security team investigates the production, right?
28:48
And your IT operations team pulls backup data into the isolated recovery environment. It scans it. It identifies the clean recovery point and validates that applications start correctly. And the good thing here is these two work streams do not block each other, right? They run in parallel.
29:06
This buys you a lot of time, basically across the whole workload recovery time window. And step three is the cutover. So once forensics signs off, you restore. The production is free, right?
29:21
You have your tier one systems that you run probably in the IRE, and there is a near-instant active DR direction flip from the FlashArray, which runs in the isolated recovery environment, back to your production FlashArray. And for all the secondary workloads, you can restore them from layer three, right, at up to some hundred terabytes per hour.
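The benefit of running forensics and recovery validation side by side, as in the three steps above, can be expressed as simple timeline arithmetic. The Python sketch below compares a one-after-the-other sequence with the parallel one, where the cutover waits for whichever stream finishes last. The durations are hypothetical inputs chosen only to show the shape of the calculation.

```python
def sequential_hours(forensics, validation, cutover):
    """Total time if validation only starts after forensics has finished."""
    return forensics + validation + cutover


def parallel_hours(forensics, validation, cutover):
    """Total time if both streams run side by side and cutover waits for both."""
    return max(forensics, validation) + cutover


# Hypothetical durations in hours, for illustration only.
forensics, validation, cutover = 48, 12, 1

print("sequential:", sequential_hours(forensics, validation, cutover), "h")
print("parallel:  ", parallel_hours(forensics, validation, cutover), "h")
```

In this simplified model the shorter stream disappears from the critical path entirely; it does not account for workloads that already run inside the IRE while forensics is still in progress.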
29:49
To be very honest here, right, a serious ransomware incident still takes hours to days to fully recover from in a good case, right? The architecture changes your confidence, it changes your prioritization, it changes the audit trail. It will not change the fundamental reality that a large scale recovery takes time.
30:12
But if you have the possibility to restore in seconds, minutes, hours, right, this reduces your overall workload recovery time dramatically. We designed this architecture not around compliance, but around recovery. But if you do recovery right, right, all of the compliance follows naturally.
30:41
We have DORA and NIS2 as the big regulations here, and they are asking the same fundamental questions. Can you actually recover, and can you prove it? You have immutable copies, you have an isolated environment, you have tested recovery processes.
30:58
And these are not checkbox items, right? They are in the architecture. And the compliance evidence is then basically just a byproduct of doing the right thing operationally anyway. This gives you the confidence, right, that your architecture and processes are actually
31:15
working, and not just on paper, but in the real world. And I wanna emphasize the testing angle on this one, right? This is not just for the auditor. Modern regulations require regular testing. DORA, for example, requires this at least yearly.
31:33
But the real value is that regular testing lets you sleep better at night, right? And it means you're generally better prepared for when a real incident happens. There are two layers of testing here. So, one is automated restore and validation tests. With an isolated recovery environment, you can test these automatically and
31:59
continuously against the IRE without touching anything in production, right? And with the all-flash storage underneath, a test would not take days, it just takes some hours. So you can do that very often, very repeatably. But what you also wanna test is the full incident response process. And so that's people, communication chains, escalation paths, decision-making under
32:29
pressure, all of these things, worst case scenarios. And you need to exercise this as a drill periodically. Not continuously, right, because that's a lot of effort, a lot of manpower to put into it. It's not automatable, but do it at least deliberately. So if you test more often, right, you catch gaps earlier, and when something real
32:54
happens, you're not running that playbook for the first time. So compliance gives you a framework for all of this, but our architecture gives you confidence that this framework is really backed by something real. There are five decisions that should happen before an incident and not during one. First one, your vault location, right?
33:22
Do you put that in the same data center with just logical isolation, which is, as said, sufficient to defeat ransomware, or do you need a second physical site, right? You might require that because of compliance, and you definitely need it, right, if you wanna protect against site level failures. So you should know what you need, right?
33:43
There's no best answer for that, but just think about it and decide based on your requirements. Second, testing frequency. So the DORA minimum is annual; quarterly is probably a practical operational cadence that can give you much more audit evidence on the one side, but as
34:07
said, also the confidence that all your processes work and you're prepared. That goes for the big tests, right? Including processes, communication metrics, et cetera. But you want your actual restore tests, right, or restore plans, so the real data restores, to be run automatically and continuously, right?
34:29
So you don't have much operational overhead here. It just needs to run and pop up, "Hey, something didn't work here," which you can then directly fix, so it works when you need the plan later on. Third one, and this is not a priority list, right? But third one, your tier one workload list.
34:48
So if you do not have one today, that's the first thing you need to build. You need to understand which systems need a sub-hour RTO. And these need that layer four treatment, right? And that's tier one databases, payment systems, factory systems which are required for production, this kind of stuff, everything that keeps your company running.
35:19
Fourth, you should check your Commvault license, especially if you're looking into Clean Room Recovery, which would require the Cloud Cyber Recovery tier licensing. So if you're planning to rely on that, you should factor in the upgrade of the license if you don't have it yet. And fifth, very important, safe mode.
35:43
I cannot emphasize this enough, right? This is just a configuration. It's not a separate product. Nothing you need to buy. Safe mode needs to be activated beforehand.
35:54
You cannot enable it during an active attack. Then it's worthless. If you're an Everpure customer already, activate it now, right? You can start with, like, very low protection because you can increase protection in self-service anytime.
36:11
And only when it's activated will it give you that safety net, right, that you want in case of an accident or incident. And if you're not yet an Everpure customer, right, please reach out to your Everpure partner or your Everpure representative and get access to our powerful, yet simple technology. That brings us to the end of the
36:39
structured part. Thank you for staying with me through this architecture. If there's one thing for you to take away from this, right, it's that recovery speed, which is the primary thing we wanted to talk about, right, is an architectural decision that you make today, and it's not a problem that you
37:04
can solve when you are hit by ransomware and that ransom note appears. The white paper behind all of this architecture, right, I linked it directly on the slide, and I think Ariana will put it into the resources here also. It's a published joint reference, right, from Everpure and Commvault, and it's definitely worth a read if you wanna go deeper into the topic.
37:29
And I saw there were some questions popping up during the session. Let's go through them. Okay. Quick question on Threat Scan. Does it only check for known malware signatures or does it also catch things like
37:53
zero-day ransomware? Gonna answer that live. A very, very good question. And no, it doesn't only check signatures. So Threat Scan goes beyond signature scanning. Threat Scan itself runs multiple detection engines in parallel. So of course, you have signature-based antivirus scanning, but you
38:13
also have behavioral heuristic scanning, entropy detection, right, and encrypted file changes, these things. And one very important one is YARA rule matching. So it's not just, okay, do I recognize this file, right? But it also looks at, okay, does this file look like ransomware, or does this look like
38:33
ransomware behavior, even if I don't know or haven't seen this ransomware yet. But the bigger point here is really that Threat Scan is a completely separate scanning engine from your live endpoint antivirus scanning, right? It does not replace it. So it adds a second independent layer that runs against your backup data, and because it reads from your cold backup data, right,
39:05
you have some advantages. The first one is that the malware is never loaded into system memory and never executed. That means the scanner is not really at risk of being compromised at the time of scanning. And the second thing is, because you're scanning on
39:26
the backup data, not on live data, you can do a more thorough, more intense scan than with your live antivirus, because it has no impact on your running workloads, and you basically don't care about how long it takes to scan a backup copy.
39:48
So that's not literally true, right? But there is a difference: on a live system, a decision needs to be taken in milliseconds, because you don't want to be blocked by an antivirus for seconds, while you can take seconds for a scan and a decision on a backup copy, which is a factor of, let's say, some hundreds, right?
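To make the entropy detection engine mentioned in this answer a bit more concrete, here is a minimal sketch of the general idea, not Threat Scan's actual implementation. Encrypted data looks statistically random, so its Shannon entropy approaches 8 bits per byte. The function names and the 7.5 threshold are assumptions chosen for illustration, and note that compressed files also score high, which is one reason a real scanner combines several engines.

```python
import math
import os
from collections import Counter

# Hypothetical cut-off for illustration only; real products tune this.
SUSPICIOUS_BITS_PER_BYTE = 7.5


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = SUSPICIOUS_BITS_PER_BYTE) -> bool:
    """Flag data whose byte distribution is close to uniformly random."""
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    plain = b"quarterly report: revenue, costs, forecast\n" * 200
    random_like = os.urandom(8192)  # stands in for an encrypted file
    print("plain text flagged: ", looks_encrypted(plain))
    print("random data flagged:", looks_encrypted(random_like))
```

Because this runs against a cold backup copy, a check like this could afford to read every file in full rather than sampling, which is the kind of thoroughness the answer is pointing at.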
40:14
So the backup scan takes longer, but it is not really taking long. Sorry, that's a very long answer. Good, very good question. I like it. Also, there's something that's really powerful in the post-incident scenario.
40:32
Say your forensics team is analyzing your production environment and finds indicators of compromise that your live antivirus missed, that nobody saw because they had been in your environment for weeks without anyone noticing. We need to think about this case, right?
40:54
If they find something like that later on, they can write custom YARA rules based on these indicators. You take these YARA rules, import them into a Threat Scan plan, and then you can run them across your entire backup chain. That will help you identify exactly the clean point where this indicator of compromise
41:21
was not yet in the backup. And there's more: timing works in your favor, because the scan on your restore will usually run a little later than the one on production. So again, there's a chance of updated signatures and AV vendor updates by then.
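The clean-point search described above can be sketched as a walk through the backup chain in chronological order. This is a minimal illustration under stated assumptions, not Commvault's API: `BackupCopy`, `last_clean_point`, and the byte marker `EVIL_LOADER_V2` are hypothetical, and a simple substring check stands in for a real YARA rule match.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class BackupCopy:
    taken_at: datetime
    content: bytes  # stand-in for the data a real scanner would read


def last_clean_point(
    chain: Sequence[BackupCopy],
    is_compromised: Callable[[BackupCopy], bool],
) -> Optional[BackupCopy]:
    """Return the newest copy taken before the first copy that matches.

    Returns the newest copy overall if nothing matches, and None if the
    oldest copy already matches or the chain is empty.
    """
    clean: Optional[BackupCopy] = None
    for copy in sorted(chain, key=lambda c: c.taken_at):
        if is_compromised(copy):
            break  # everything from here on is suspect
        clean = copy
    return clean


# Hypothetical indicator of compromise found by the forensics team.
IOC_MARKER = b"EVIL_LOADER_V2"


def matches_ioc(copy: BackupCopy) -> bool:
    """Stand-in for running a custom YARA rule against one backup copy."""
    return IOC_MARKER in copy.content


if __name__ == "__main__":
    chain = [
        BackupCopy(datetime(2025, 3, 1), b"app data"),
        BackupCopy(datetime(2025, 3, 2), b"app data, more rows"),
        BackupCopy(datetime(2025, 3, 3), b"app data " + IOC_MARKER),
        BackupCopy(datetime(2025, 3, 4), b"app data " + IOC_MARKER),
    ]
    clean = last_clean_point(chain, matches_ioc)
    print("restore from:", clean.taken_at.date() if clean else "no clean copy")
```

The sketch stops at the first hit on purpose: once the indicator shows up, later copies are treated as suspect even if an individual scan comes back clean.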
41:43
But, yeah, caveat: no detection is one hundred percent. There are a lot of things that help you, and the combination of all of these things will definitely help you improve your scanning. Oh my God, sorry, that was a long answer. Okay, that hopefully also answers the question about whether Commvault uses signature-based scanning to
42:09
scan the backups. So I hope this one is answered, too. And then, okay, yeah, I talked a lot about the RTO of snapshots. And what about RPO? How often can you realistically run snapshots without impacting production? Also a very good question.
42:35
So of course, RTO, I mean, I keep saying restore speed is the most important thing, so that gets all the attention, but RPO is equally important. That's the recovery point objective: how much data can you lose in an incident? And snapshots are really cool here because they can run as often as every few
43:00
minutes or so, and you can orchestrate them with Commvault IntelliSnap policies. Snapshots are super space efficient; you only store the changed blocks. And a practical cadence for your... yeah, it depends a little bit on change rate and how long you wanna keep them, right?
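The trade-off between cadence, retention, and change rate can be sketched with some simple arithmetic. This is an illustration under stated assumptions, not a sizing tool: `SnapshotPlan` and all the numbers below are hypothetical, and the space figure is an upper bound that ignores blocks overwritten more than once as well as any data reduction on the array.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotPlan:
    interval_minutes: int       # how often a snapshot is taken
    retention_hours: int        # how long snapshots are kept
    change_gib_per_hour: float  # assumed rate of changed blocks

    @property
    def worst_case_rpo_minutes(self) -> int:
        """Data written just after a snapshot is at risk until the next one."""
        return self.interval_minutes

    @property
    def snapshots_retained(self) -> int:
        """Number of snapshots alive at any time for this retention window."""
        return self.retention_hours * 60 // self.interval_minutes

    @property
    def retained_change_gib(self) -> float:
        """Upper bound on changed data held across the retention window."""
        return self.change_gib_per_hour * self.retention_hours


if __name__ == "__main__":
    for interval in (15, 60):
        plan = SnapshotPlan(interval, retention_hours=72, change_gib_per_hour=20.0)
        print(
            f"every {interval} min: RPO <= {plan.worst_case_rpo_minutes} min, "
            f"{plan.snapshots_retained} snapshots, "
            f"<= {plan.retained_change_gib:.0f} GiB of changed blocks"
        )
```

In this simplified model a tighter cadence improves the RPO and raises the snapshot count, while the retained space is driven by change rate and retention rather than by how often you snapshot.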
43:22
But typically it would be something like 15 to 60 minutes for your tier one workloads, and that's realistic and manageable. Hope that answers this one, too. Next one: What happens if my main CommServe is taken out in the attack? How do I orchestrate the recovery workflow?
43:48
Yes, also a great question. This is also often overlooked in planning. You would use your DR CommServe. I mentioned that before: in your secure isolated management zone, you have a disaster recovery CommServe, right?
44:08
That's completely outside of production. And you have a configuration backup stored and protected in the vault, so you can restore cleanly even in the worst-case scenario that all your production is gone. Again, going back to what I wanted to emphasize before: testing. It is really super important to test deliberately, not just your data restore, not
44:34
just, "Can I restore this VM? Does it just boot up? Does the application start?" That's fine. That needs to be done, definitely. But you need to test the whole recovery scenario. Yeah. What happens if I lost access to my building?
44:50
What happens if I lost access to all my passwords? If I cannot do any of that, how can I rebuild my recovery environment from scratch? And that's really where pen and paper comes back into play. So you need to have your disaster recovery plan printed out in a current version, right?
45:15
Most important: passwords, IPs, whatever credentials you need have to be somewhere on paper, in a safe, hopefully. In a vault. In a real-world vault. This is definitely nothing that you wanna test for the first time when you have a real incident; you need to test it before.
45:38
Do you propose replicating safe mode snaps into an IRE FlashArray to aid fastest recovery? Yes, absolutely. If you wanna have that fastest recovery, the IRE recovery is the fastest way. You replicate the snapshots ahead of time into your IRE. In the IRE, you have the possibility with Threat Scan to, again, identify the
46:01
proper snapshot, the proper backups, the point in time which is clean, and then you can directly start these tier one workloads in your IRE from the snapshot. So that's the fastest way to do it. Obviously, it comes down to seconds, right? You can just start it again.
46:20
But if it is compromised, you need to do some cleaning first. So there's a little bit of a caveat to that. While you can still restore the data in seconds, your whole restore process will probably still take some hours for your tier one applications, to make sure that this is a clean point, that you cleaned it properly, that you identified what the attack was, and these
46:42
kind of things. Okay, so those were all the questions. If you have more, just type them in. Otherwise, for anyone who wants to continue this conversation after today, please feel free to reach out to me directly.
47:00
My email is on the slide. I'm generally happy to go deeper on any of this one-on-one. We can discuss any specific architecture question or sizing discussion, or just a follow-up question on something that didn't make it into the presentation today.
47:22
So again, thank you very much for your time, and I wish you the very best for the rest of your day and week.