
TEC37 E06: AIOps – A Technical Deep Dive

27:50

We conclude the AIOps three-part series with a technical dive into the AIOps space. Our experts discuss our partnerships and the technical complexities we deal with in today's world.

Robb Boyd:
Welcome to the TEC37 podcast, your source for technology, education, and collaboration from World Wide Technology, and for today, AppDynamics. Together, WWT and AppDynamics enable you to deliver the application experience your customers demand today. My name is Robb Boyd, and on tap today is part three of our APM and AIOps series. The first two episodes can be found at wwt.com. You may find it helpful to have watched them before today's show, but based on feedback so far, many of you already have and are eager for today's final act, because we're going deep. We're going specific.

Robb Boyd:
So I want to provide a little context based on what we have learned so far. AppDynamics is well known for application performance monitoring, or APM, and it can play a big part in a successful AIOps deployment. However, as the team is going to cover next, there are multiple solutions at work, and they are but a small subset of the hundreds of companies offering some kind of AIOps expertise.

Robb Boyd:
So today's use case examples should not be considered exhaustive. In fact, I would add that what you're about to see exemplifies the work that WWT is doing and has done for years, which is to qualify, build, and test the best multi-vendor options for solving complex customer issues.

Robb Boyd:
Back with us again, we've got Tanner Bechtel, Global Director for AIOps and APM with World Wide Technology; Arsalan Lari, Lead Technical Architect with WWT; and Ben Haddox, Global Partner AIOps Lead with AppDynamics.

Robb Boyd:
And so Ben, I'll start with you. You and Arsalan have been leading an actual AIOps build out within the World Wide Technology Advanced Technology Center, the ATC, and I believe you've got some details to share with us based on what you've learned and what you guys have been building. Is that correct?

Ben Haddox:
That is correct. In fact, when you look at the AIOps landscape, there are a lot of vendors and a lot of companies doing AIOps, but the approach we're taking with WWT is to get out of the siloed approach and look at AIOps as enterprise-wide, full-stack coverage. To do that, you need multiple vendors communicating and working together, and that's where WWT comes in, because they're able to coordinate all of this and pull in the resources that are needed, especially through the ATC, to make it work.

Robb Boyd:
Perfect. Well, I understand you've got a flow. I'm not sure of the right way to refer to this document, but could you set it up and give us an idea of what we're about to get into?

Ben Haddox:
Yeah, so we have a reference architecture that we put together with WWT that takes all of these technologies and combines them to get full-stack, enterprise-wide AIOps coverage. What you're looking at here is security, infrastructure, and application: the three core components that support a company's business.

Ben Haddox:
In the traditional siloed approach, you have different tools that monitor each of these in isolation. What we did, using AppDynamics, Moogsoft for event correlation, and ServiceNow, was put everything into an application-centric view of the entire stack. We're going to walk through some use cases on how this all works together, but essentially you're taking an application-centric metrics view of the world with AppD and an application-centric events view of the world with Moogsoft, and combining the two, along with other tools in your environment, to get full AIOps coverage.

Robb Boyd:
Okay. So this diagram goes from siloed all the way to application-centric. The idea here is to consolidate the many points of view coming from different directions, which can sometimes be overwhelming. A lot of this represents data that's probably already on the network, already coming from the various things customers have. And you're going to show us how you've started to consolidate things in the lab so that you can get one dashboard that gives you a better setup, perhaps to take better actions, with better visibility resulting in better automation opportunities and such, I would imagine.

Ben Haddox:
Correct. Correct. The whole idea is that you can collect data, but when you correlate that data and tie it to a common reference, a common language of an application or an end user, that's when you can really start to make meaningful decisions. Tanner covered this, I think, in our last podcast: it's not about replacing people and it's not about taking jobs away. It's about collecting all that data and making it meaningful, so that people can make better decisions and spend their time iterating or developing new ways to do business instead of just trying to maintain what they have.

Robb Boyd:
Arsalan, you've got our first use case to walk us through. I wonder if you could set this up?

Arsalan Lari:
Absolutely. So here we have the first WWT AIOps architecture overview. In the last podcast we talked about the siloed approach, which is the current industry standard in just about any organization, right? We have networking, security, infrastructure, and app owners all talking about different things. They're only focused on what they own, and that's it.

Arsalan Lari:
So as you can see in this overview, the security team is just worrying about the security tools. The infrastructure team is mainly focused on its own hardware and infrastructure tools. Same with the application owners: they only care about what's going on in the APM world.

Arsalan Lari:
So in the new approach, the modern approach we're talking about, it's AIOps, right? It's everything integrated together. A lot of companies already have these tools. The only problem is they're not integrated. It's not a single pane of glass, per se, and it's not correlated, right? So you're getting a separate story from each tool. So let's do a breakdown of it.

Arsalan Lari:
So here we have event correlation to gather all the events happening across all the tools. We have a CMDB such as ServiceNow, which covers your ticketing and your change management. And then there's application performance monitoring, your APM for all your front-end and back-end applications, all talking to security, networking, resource management, security monitoring, and infrastructure monitoring.

Arsalan Lari:
So it's one cohesive story of the user's journey. Since we're talking about application-centric, it follows the user's journey from top to bottom. So let's talk about the first use case. Imagine you're investing in stocks, right? You're in a bank portal and you see a lot of problems. In this case, we have an application installed in AppD, and we see a lot of red errors.

Arsalan Lari:
And if you look at it, the flow map is not looking too good right now: load is spiking, errors are spiking as well, and response time is also going up, which is not good, right? As you can see, we can go into the network dashboard, where you see specific network layers causing problems, so we can look into what's going on.

Arsalan Lari:
So now, if you go back to the overview to see where we are: we're in AppDynamics, and we've got network issues. We have a network integration between AppDynamics and ExtraHop, right? So the data is flowing that way. In ExtraHop you'll see a bunch of networking errors right now: all the specific DNS errors we're seeing, plus errors correlated from AppD as well.

Arsalan Lari:
So if you look at the overview again, you'll see ExtraHop is also integrated with Moogsoft, so data goes to Moogsoft as well. In Moogsoft, all the alerts are bundled together into situations, with several alerts embedded in each one. So you're not seeing a hundred different alerts sitting in an email. It's all correlated together, so you can easily manage everything.
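To make the idea of a "situation" more concrete, here is a minimal Python sketch of time-window alert correlation. It is not Moogsoft's algorithm or API; the `Alert` and `Situation` types, the shared `service` key, and the 300-second window are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alert:
    source: str       # tool that raised it, e.g. "AppDynamics" or "ExtraHop"
    service: str      # assumed shared key: the business service affected
    timestamp: float  # seconds
    message: str


@dataclass
class Situation:
    service: str
    alerts: List[Alert] = field(default_factory=list)

    @property
    def last_seen(self) -> float:
        return self.alerts[-1].timestamp


def correlate(alerts: List[Alert], window_s: float = 300.0) -> List[Situation]:
    """Bundle alerts for the same service into one Situation as long as each
    new alert arrives within window_s seconds of the previous one."""
    situations: List[Situation] = []
    open_by_service: Dict[str, Situation] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = open_by_service.get(alert.service)
        if current is not None and alert.timestamp - current.last_seen <= window_s:
            current.alerts.append(alert)  # same incident, keep bundling
        else:
            current = Situation(service=alert.service, alerts=[alert])
            situations.append(current)    # gap too long or new service
            open_by_service[alert.service] = current
    return situations
```

With AppD error spikes and ExtraHop DNS errors on the same service a few seconds apart, the operator sees one situation holding several alerts instead of separate notifications. Real correlation engines use richer signals (topology, text similarity, learned patterns); a fixed window keeps the sketch simple.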

Arsalan Lari:
So the next step would be to create a ticket in ServiceNow. Moogsoft is already integrated with ServiceNow, so we can create the ticket from there, and at that point we can start troubleshooting what's going on, right?

Arsalan Lari:
The next phase is automation, which is a big piece of AIOps. Take the network issues we saw in the application earlier. What we need to do is automate the response: we reallocate the application to another network segment, so that while the technicians keep investigating the problem, the application stays in a healthy state and the users are not impacted.
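The remediation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not WWT's actual automation: the `Segment` and `Ticket` types, the `healthy` flag, and the `remediate_network_fault` function are hypothetical, and a real implementation would call an orchestration or SDN API instead of editing a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    name: str
    healthy: bool  # hypothetical health flag, e.g. derived from ExtraHop/AppD signals


@dataclass
class Ticket:
    id: str
    summary: str
    status: str = "open"


def remediate_network_fault(
    app: str,
    placement: Dict[str, str],
    segments: List[Segment],
    tickets: List[Ticket],
) -> Optional[str]:
    """If the app's current segment is unhealthy, record a ticket for the fault
    and move the app to the first healthy segment. Returns the new segment name,
    or None when no move was made."""
    by_name = {s.name: s for s in segments}
    current = placement[app]
    if by_name[current].healthy:
        return None  # nothing to remediate
    # The ticket stays open either way, so technicians still investigate the
    # faulty segment after users have been moved off it.
    tickets.append(
        Ticket(id=f"INC{len(tickets) + 1:04d}", summary=f"{app}: network fault on {current}")
    )
    target = next((s.name for s in segments if s.healthy and s.name != current), None)
    if target is None:
        return None  # no healthy segment available: escalate to people
    placement[app] = target
    return target
```

The design choice worth noting is that moving the workload and opening the ticket are separate outcomes: users are restored immediately, while the open ticket keeps the root cause from being forgotten.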

Arsalan Lari:
So if you're investing in stocks at that point, and you've got a certain number you're trying to hit, you won't be impacted. As you can see, the network is healthy again, the technicians keep troubleshooting their problem, and the user is not impacted.

Robb Boyd:
Tanner, I'd like you to weigh in. You oversee the group where Arsalan and others have been doing this work. How well does this represent what you've built out and what customers are asking for, in terms of the value being provided? How would you describe what we just saw?

Tanner Bechtel:
Sure. That's a good question. We talk about the tech, the build, what's out there, and how it integrates, and we tend to look at things through the lens of technology interoperability. But right at the tail end of what Arsalan was saying, I thought: time. What do we give organizations back? If you had to boil it down to one thing, it's time.

Robb Boyd:
Yeah.

Tanner Bechtel:
It's time, and I'm talking about all kinds of time. The amount of time from the moment a problem happens and the panic sets in to the moment when somebody's portfolio is square and their trade has gone through. Or it's time because I might be somebody who's in charge of monitoring at the corporate level. And what does that entail? It entails everything, from hardware routers to the actual end-user experience.

Tanner Bechtel:
How in the world do I get all that together and make sense of it? The faster I can do that, the more time I have to make high-value decisions that affect the customer or the business. So AIOps, for us, is a constant push and pull, and we have to see it in two ways. Today it's very technical, when we're talking about reference architecture and independent tools, but we always have to keep in mind that we're taking an idea to an outcome. That's a World Wide phrase, a slogan, if you will.

Tanner Bechtel:
Our goal is really to take all these tools and integrate them in a way that accomplishes a business objective. And I'd argue that AIOps is maybe one of the first things that has ever looked at the way the business runs from the point of view of the customer. And I'm not talking about the customer segment or the hundreds of thousands or millions of customers. I'm talking about a single customer. It's the first time we've taken the full power of the data center and said, "I want to make Arsalan's experience on this site perfect. I want to make Robb's experience perfect."

Tanner Bechtel:
And to do that, I need to be aware of everything. I need to be able to automate and act on everything in real time. And I need all of these numbers to give me some contextual sense. And that is really where we're going. I hope through both of these use cases, you see the end user is the center of this conversation.

Robb Boyd:
Yeah. I think that's another reason AppD is here too. So, Ben, historically I always think of AppD in terms of application performance management, because the application is, as I think we said in episode one, the front door.

Ben Haddox:
Right.

Robb Boyd:
Especially these days, when we're all virtual and not going through physical doors as much anymore, this is how everybody is going to see your company. So the focus is there. And what Arsalan started showing us, and what Tanner is alluding to here, is that there's a lot more data coming out of all these different places, which would be overwhelming and probably isn't even being paid attention to.

Robb Boyd:
So Ben, I'm curious, before you launch into your next use case: if someone hasn't been doing what you're describing here, what's the normal reaction to an application slowdown? How does a company react? Assuming they notice it, and notice it in time, is it a bunch of people from multiple vendors thrown at it, all weighing in with what they think the problem is, in big meetings? What has your experience been?

Ben Haddox:
Yeah. It's funny. My background is in a services organization, where we provided services out to customers. Traditionally, in that siloed environment, if you go back to the use case Arsalan was talking about, the minute something started slowing down an application and there was an impact, you would pull everybody into a war room. So you'd have 40, 50, 60 people sitting in there trying to figure out what was wrong.

Ben Haddox:
And the number one goal was to get the application back up and running. There's a lot of pressure, because while the application is down, you're losing business and revenue. It's costing money and affecting the bottom line. So everybody's in this room. It's high pressure. Everybody's trying to figure it out. There's a lot of finger-pointing: "Well, whose fault is it? What's going on?"

Ben Haddox:
But with an AIOps approach and all this data correlated, as Arsalan talked about, if you start to have an issue in, say, the network segments, we can automatically move the application to a segment that's not affected, restore the application to a working environment within minutes, and then give technicians the time to say, "Hey, we know it's in the network. Go take a look at this."

Ben Haddox:
Now there's not all that pressure. You don't have 60 people tied up in a room. There's no pressure to get the application back up and running to restore business, as Tanner was talking about, because ultimately we're looking at the business, and the business has already been restored. It hasn't even been affected at this point. And we've already identified the problem and have the technicians working on it without them being under the gun: "Oh, we've got to get this up because we're losing money, and someone's tapping me on the shoulder constantly."

Tanner Bechtel:
And can I add to that, Robb?

Robb Boyd:
Of course, of course.

Tanner Bechtel:
If I might. On the whole purpose of AIOps, in my opinion: monitoring and management, the way we've traditionally looked at them, is why APM is at the core of this, and AppD is one of our core tenets here. One, we can't change what we can't measure; if we can see it, we can change it. And two, our goal is not to find a problem. Right now, everything is measured in MTTI and MTTR: mean time to identify and mean time to recover.
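For readers who want the arithmetic behind MTTI and MTTR, here is a minimal Python sketch. It assumes both metrics are measured from the moment the problem began; organizations differ on the start point (onset, detection, or ticket creation), and the `Incident` record is a hypothetical structure, not any tool's schema.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Incident:
    started: float     # when the problem began (minutes)
    identified: float  # when the cause was identified
    recovered: float   # when service was restored


def mtti(incidents: List[Incident]) -> float:
    """Mean time to identify: average of (identified - started)."""
    return mean(i.identified - i.started for i in incidents)


def mttr(incidents: List[Incident]) -> float:
    """Mean time to recover: average of (recovered - started)."""
    return mean(i.recovered - i.started for i in incidents)
```

For example, two incidents identified after 30 and 10 minutes and recovered after 90 and 30 minutes give an MTTI of 20 minutes and an MTTR of 60 minutes. Automated remediation of the kind described in this episode mainly shrinks the recovery term.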

Tanner Bechtel:
I want to get to the point where we start to understand and see things clearly in a way that allows us to prevent them from happening overall. Pattern recognition, automation, using machine learning to start to look at situations and understand what their chronology looks like so that we can do what Ben's talking about.

Tanner Bechtel:
Move to a new segment, change hardware, reposition an app, or restore it somewhere else, before we ever have those problems. That's really the promise of AIOps, right? Not to be better at cleaning up broken glass, but to keep the glass from hitting the ground to begin with. That's really where we're aiming with all of this.
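One simple way to picture "acting before the glass hits the ground" is trend projection: fit a line to recent samples of a metric and act if it is projected to cross a threshold soon. The sketch below uses ordinary least squares as a deliberately simple stand-in for the machine-learning pattern recognition Tanner mentions; the function names, the threshold, and the lead time are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def time_to_breach(samples: List[Tuple[float, float]], threshold: float) -> Optional[float]:
    """Fit y = intercept + slope * t by least squares and return how long after
    the last sample the trend line reaches threshold (0 if already past).
    Returns None when there is too little data or the trend is not rising."""
    n = len(samples)
    if n < 2:
        return None
    ts = [t for t, _ in samples]
    ys = [y for _, y in samples]
    t_mean, y_mean = sum(ts) / n, sum(ys) / n
    var_t = sum((t - t_mean) ** 2 for t in ts)
    if var_t == 0:
        return None
    slope = sum((t - t_mean) * (y - y_mean) for t, y in samples) / var_t
    if slope <= 0:
        return None  # flat or improving: no breach projected
    intercept = y_mean - slope * t_mean
    t_cross = (threshold - intercept) / slope
    return max(0.0, t_cross - ts[-1])


def should_act_early(samples: List[Tuple[float, float]], threshold: float,
                     lead_time: float) -> bool:
    """Trigger remediation if the projected breach is within lead_time."""
    eta = time_to_breach(samples, threshold)
    return eta is not None and eta <= lead_time
```

With CPU readings of 50, 55, 60, and 65 percent at minutes 0 through 3, the fitted line reaches 90 percent at minute 8, five minutes after the last sample, so a 10-minute lead time would trigger action while a 2-minute lead time would not.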

Robb Boyd:
Because it is going to happen; it's all about what our reaction is, and whether the customers even notice. Ideally they wouldn't. And I'm always reminded of mean time to innocence, the other one I've always heard. I have this sign that's been behind me, at least for this series, that Carl, a network engineer friend of mine, made on his 3D printer. It basically says, "It's not the network."

Robb Boyd:
Because that seemed to be the driving motivation behind using any of these tools for network engineers: just to say, "Hey, hey, hey. Everything's fine on my end." There are so many potential bumps in the road that you need to be aware of, and it's this kind of thing that helps us get a clear picture. But I love what you're saying, Ben.

Robb Boyd:
And I want to see this use case, because you mentioned not just being aware of the issue, but automatically routing around it and opening a ticket. It's not a matter of ignoring the issue because things are working fine, which is how we might have treated things historically: as long as it was working, we thought we were done. Instead it's saying, "Nope, this has got to be worked on," but at least it's not affecting customers while you do it.

Ben Haddox:
Right.

Robb Boyd:
Yeah.

Ben Haddox:
In fact, in this next use case we get into what Tanner was talking about, preventing the glass from breaking in the first place, by automating some mundane tasks that not only keep the business up and running but also free up resources to improve the business instead of just putting out fires.

Robb Boyd:
Got it.

Ben Haddox:
So starting with our architecture here, we'll again start over in the APM. We have an application, a financial application again, but this one is a little different. Let's say we ran a marketing campaign, and for the next couple of months we're giving free trades to everybody who signs up.

Ben Haddox:
So there are no commissions on the trades; we're just giving free trades. And we got inundated with a lot of traffic from this marketing campaign that we were not expecting.

Robb Boyd:
A big surge. Okay.

Ben Haddox:
Yep. So you're going to see the application really start to suffer because of this: lots of reds and yellows. Coming back, we've got AppDynamics seeing this. Again, we're sending an alert over to Moogsoft, but we're also connected with CWOM, Cisco Workload Optimization Manager. What CWOM does is give us a look at everything going on in our infrastructure, all the resources we have.

Ben Haddox:
Because AppD and CWOM are talking to each other, we can drill down and look at the infrastructure that specifically supports this application. Right away, we can see CPU congestion that's causing this slowdown because of the added load.

Ben Haddox:
Right now we have it set up to show us pending actions. So if we're still a little gun-shy about complete automation, we can have a person come in, click on this, and say, "Yeah, go ahead and move things around to restore this." Or, as soon as CWOM sees this happening, we can have it move these containers automatically, without anyone having to approve it. It can do it on its own.
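The choice between "pending actions" and full automation can be pictured as an approval gate in front of the remediation step. This minimal Python sketch is not CWOM's API or decision logic; the per-host CPU readings, the 85 percent threshold, the `Action` type, and the `approve` callback are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    description: str
    executed: bool = False


def plan_actions(cpu_by_host: Dict[str, float], threshold: float = 0.85) -> List[Action]:
    """Propose moving containers off any host whose CPU use is at or above threshold."""
    return [
        Action(f"move containers off {host} (CPU {cpu:.0%})")
        for host, cpu in sorted(cpu_by_host.items())
        if cpu >= threshold
    ]


def apply_actions(actions: List[Action], auto_execute: bool,
                  approve: Callable[[Action], bool]) -> List[Action]:
    """Execute actions automatically, or only after a person approves them.
    Returns the actions still pending."""
    pending: List[Action] = []
    for action in actions:
        if auto_execute or approve(action):
            action.executed = True  # stand-in for calling a real orchestration API
        else:
            pending.append(action)
    return pending
```

Flipping `auto_execute` from `False` to `True` is the "gun-shy" versus fully automated choice Ben describes: the detection and the proposed action stay the same, and only who pulls the trigger changes.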

Ben Haddox:
What happens is we say, "Take that action," and the application is restored. The whole time, we're opening these tickets in Moogsoft, and Moogsoft is opening the ticket in ServiceNow. As the application is restored, we're sending those alerts back to Moogsoft. Moogsoft sees that everything has been restored, takes the data from CWOM, adds it to the ServiceNow ticket, and closes it out.
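The closed loop Ben describes (open a ticket when a situation starts, attach what the automation did, close it on recovery) can be sketched as a small state machine. This is a hypothetical illustration, not the Moogsoft or ServiceNow integration API; `ChangeTicket`, `TicketBridge`, the number format, and the "New"/"Closed" state strings are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChangeTicket:
    number: str
    situation_id: str
    state: str = "New"
    work_notes: List[str] = field(default_factory=list)


class TicketBridge:
    """Hypothetical bridge: one ticket per situation, closed on recovery."""

    def __init__(self) -> None:
        self.tickets: Dict[str, ChangeTicket] = {}

    def on_situation_opened(self, situation_id: str) -> ChangeTicket:
        # Idempotent: repeated alerts for the same situation reuse its ticket.
        if situation_id not in self.tickets:
            number = f"CHG{len(self.tickets) + 1:07d}"
            self.tickets[situation_id] = ChangeTicket(number, situation_id)
        return self.tickets[situation_id]

    def on_recovery(self, situation_id: str, actions_taken: List[str]) -> ChangeTicket:
        ticket = self.tickets[situation_id]
        ticket.work_notes.extend(actions_taken)  # audit trail of what automation did
        ticket.state = "Closed"
        return ticket
```

Keeping the ticket keyed by situation is what produces the audit record without involving a technician: every automated action lands in the ticket's notes before it is closed.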

Ben Haddox:
So now we've had a complete change: we've resolved a problem, and we have an audit record of it in ServiceNow without ever having to bug a technician about it. The business is up and running, and we don't have to pull resources off other important projects we're working on.

Robb Boyd:
Wow. Okay. Are you going to circle back on this part? Okay.

Ben Haddox:
Yeah. So that's how all of this works together.

Robb Boyd:
Okay.

Ben Haddox:
That's the power of taking an application-centric view with metrics, adding an event-centric view with the application and end user in mind, and putting that together with everything else in your environment to get full AIOps. Honestly, those are the use cases we brought today, but the beauty of the architecture we've developed with WWT is that WWT clients can come to the ATC or contact WWT and say, "Hey, how does our use case fit on top of this?" And it's plug and play: give us the use case, we'll put it in here, and we'll show you how it all works together.

Robb Boyd:
Yeah. So you guys have been on this journey and I think now the ATC has been around a long time. And in fact, the first time I ever visited the ATC, it was because of a multi-vendor situation. I worked for Cisco at the time and we were telling a story around business disaster recovery.

Robb Boyd:
But it was only at World Wide Technology, which Cisco depended on heavily, and I know other vendors do the same, that we could truly see everything running in a multi-vendor environment. So I imagine there's a lot going on behind the scenes where, Arsalan, you've probably gone through something and said, "Well, this looked promising, but it didn't quite do what we wanted," at least for the situations we're showing here.

Robb Boyd:
But I imagine this is just scratching the surface, because you're not saying these are the only vendors you work with. You have a suite of capabilities, and that's the whole point of the ATC: to customize for each individual customer and answer the problems they have. Right?

Arsalan Lari:
Oh, absolutely. The main thing is that this is ongoing. We continually research OEM products and test them against our use cases. The one thing I do want to add is that we have another demo that highlights integrations between Moogsoft and AppDynamics that are further enhanced and more in depth than what we went through. So feel free to go to wwt.com and take a look on the platform. The demos are there, and we'd be happy to walk you through any questions you have.

Robb Boyd:
Yeah. I'm glad you brought that up, because I watched that demo as well. It's always impossible in any of these episodes, or shows if you will, to cover everything; even though we've done a three-part series, we'll never finish. I know you guys aren't finished. You've come a long way in the 11 months or so since, I remember, Tanner, you told me it had struck you at Cisco Live that everybody was talking a good individual siloed game.

Robb Boyd:
And yet that was actually the problem: no individual silo could speak to all of AIOps. Someone needed to bring it together, and you thought, "Why not us?" And so you started building on this. So where do you feel you are, in terms of customers being able to come in?

Robb Boyd:
Also, I want to bring up, speaking to our audience, you guys don't have to go to St. Louis and physically go to the ATC, although I think it's worth it if you've got a chance to. Obviously, this is World Wide Technology, and these guys are available, all this stuff's online, and the platform, as they call it, at wwt.com will give you access, of course, to the videos and the resources, but also a lot of interaction, self-driving resources, and things like this that you can go into.

Robb Boyd:
You guys are very education-focused, Tanner. And so, I don't know, how do you feel? As we wind up episode three of our three-part series on AIOps, do you feel like we're telling the complete story, as of now? I imagine you're going to have more to say in the future.

Tanner Bechtel:
Sure. As a technologist, as a software engineer, my whole life, where are we? We're always just starting, right? We're always just beginning. We're so much further ahead than we were a month ago or a year ago. But AIOps is a journey.

Tanner Bechtel:
I think one of the things that led us to where we are, to answer that in context, is that I couldn't figure out where we were. It's like traveling down a path with no map. I knew we were getting somewhere, but I had no idea where we were going. What's our destination?

Tanner Bechtel:
And then I realized that I don't know that anybody really set the destination. They just said, "Let's go. Let's go off on a hike." So what we've done is try to map it out. We know what we're trying to get to, we know where we are, and we are so much further ahead than we used to be. The great advantage of that, and the great advantage of World Wide, which I think our strategic partners would tell you, is that we have the ability to be very objective.

Tanner Bechtel:
Arsalan and I have gone through lots of vendors we love, lots of partners we love, and said, "This is awesome. But also, so is this." We've got to make sure we're opening the door for both vendors, or else saying that these two don't fit: as much as we love the people and the partnership, it doesn't fit our customer.

Tanner Bechtel:
And so, being the voice, the advocate, the ombudsman, of our customer for 30 years this year, by the way, it's 30 years we've been doing this, it gives us a very, very unique position that lends me a lot of faith in what we're building.

Tanner Bechtel:
So we can't do it on our own. We absolutely need the tools and the innovation of partners like AppD, like Moogsoft, tools like CWOM and a myriad of different tools that are integrated into this. But what we do have is a very unique understanding of what the business at large, what our customers at large, seek to find and what they find valuable.

Tanner Bechtel:
So being able to look at AIOps through that lens, instead of just the autonomous, AI, software-based perspective, which I think is the industry standard, matters. That's what our partners make: software. The advantage comes from combining the two together, World Wide and our partners, and we always talk about the better-together story, right?

Tanner Bechtel:
This is the ultimate better-together story, in my opinion: we know the customer, we know the objective, we know the architecture. When you bring that together with incredibly innovative machine learning and AI-based tool sets that let us see the entire data center through the lens of one person and make that experience great, man, that's the promise, right?

Tanner Bechtel:
When we all started writing software, the promise was that you could turn this behemoth IT infrastructure to serve one individual. And we are now at the point of being able to tie the real numbers of a person's experience to the real numbers of the entire enterprise and data center. And that, to me... well, you asked me where we're at.

Robb Boyd:
Yeah.

Tanner Bechtel:
We're there, man. We're there, and that's really fascinating for me.

Robb Boyd:
Well, that's excellent. Tanner, I want to thank you. Also want to thank you, Arsalan. I know you've been hard at work and normally you're squirreled away in the lab, or at least remotely connected to the lab, building these things out, testing, breaking things, and fixing them all so that you can make that path easier for the next customer that is reaching out to you guys for help with where to get started and how to take it, which is of course what we encourage everybody to do.

Robb Boyd:
I also want to thank AppDynamics, of course, not just Ben, who's a very good representative here, but also for sponsoring this episode. A big thank you for helping make this possible. A very important key component, and thank you for allowing us to tell the truth behind the story, which is, it's not just about AppDynamics, it's multiple vendors coming together and I think that was well-represented today.

Robb Boyd:
So guys, thank you so much. I appreciate your time.

Tanner Bechtel:
Thanks, Robb.

Robb Boyd:
And guys, thank you so much also for watching. Let me make sure I'm getting on the right camera. There we go. Guys, thank you so much for watching TEC37. It's the technology podcast for technology education and collaboration from World Wide Technology. My name is Robb Boyd. This has been part three of our three-part series on AIOps. AIOps. That's hard to say without messing up.

Robb Boyd:
So be sure and go back and check out the rest of the series, if you missed those already, and then subscribe. Please keep watching and let us know what you want to hear more of. Interact with us online or otherwise, and we look forward to seeing you on the next one. Y'all take care.