How Do We Monitor Applications?
One of the leading questions I hear when creating a strategy for Application Performance Monitoring (APM) is, "What do we monitor?" The traditional process for root cause analysis involves log crawling, uncorrelated hardware metrics, tribal knowledge and disruptive war rooms. The simple questions outlined below can alleviate the anxiety of taking the first steps toward true APM.
Where do we start?
The hardest part of doing anything is the first step. At the outset we know the least about what we want, and we can only learn more by doing, then assessing what to keep and what to drop moving forward. Luckily, at WWT we have experience building monitoring strategies from scratch and have found similarities across the solutions.
The next section outlines the one main question that, once consensus is reached, answers the majority of the downstream questions.
What makes your application successful?
Step one: get everyone into a room. CEO, developers, testers, project managers, application users: everyone. Brainstorm the high-level behaviors the application facilitates and refine the boundaries of their definitions until you have a handful of agreed-upon milestones that the application represents. A good rule of thumb is to think about how the application supports the business and to stay out of the weeds of conditional paths.
There are always measurable metrics that explain how the application is performing. If we can identify what truly makes an application successful (what justifies keeping it funded), then we can create heuristics from objective measurements to verify success, establish a baseline and track it over time. This approach allows us to focus on the top echelon of behavior, the layer that everything else supports.
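As a rough illustration of baselining a success metric and tracking it over time, the sketch below summarizes past values and flags a new value that strays too far from them. The metric (daily on-time delivery rate), the sample numbers and the three-standard-deviation tolerance are all assumptions made for the example, not figures from a real engagement.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize historical values of one success metric."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def deviates(baseline, value, tolerance=3.0):
    """Return True when value is more than `tolerance` standard
    deviations away from the baseline mean."""
    if baseline["stdev"] == 0:
        return value != baseline["mean"]
    return abs(value - baseline["mean"]) / baseline["stdev"] > tolerance


# Hypothetical daily on-time delivery rates (percent) for the past week.
history = [96.0, 97.0, 95.0, 96.0, 98.0, 97.0, 96.0]
baseline = build_baseline(history)

normal_day = deviates(baseline, 96.5)  # close to the baseline
bad_day = deviates(baseline, 88.0)     # far below the baseline
```

In practice the stakeholders would choose the metric and the tolerance, and an APM tool would usually maintain the baseline for you. The point is only that "success" becomes something you can compare against.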
Take, for example, a business-to-business (B2B) shipping company. This business most likely uses software to automate the majority of its inventory, scheduling and shipping processes. We can define the success of these processes based on specific goals, such as timely delivery, upselling, uptime, downtime costs, maintenance costs and many more, but it is up to the stakeholders to define them, as the needs of each company are different. Normally, high-level leadership has an idea of what success looks like, and we can use that as a starting place.
The next task is to achieve consensus around success measures so we all look at the same source of truth. Is the total number of orders the best metric to determine success? Do we track abandonment rates when upselling?
When we can all agree on what makes an application successful, we can bring in the engineers and begin to find the technical invocations that map to the agreed-upon heuristics. By framing APM with the above questions, we are able to focus the strategy on measurements that are meaningful to the business.
What makes your application healthy?
This is what most people refer to when thinking of APM, as application health represents the physical measurements of behavior and interactions with other systems. Even so, explicitly defining what it means to be a healthy application takes forethought and care.
A good analogy is to think of an application like a building: it has roads that carry external traffic to and from it, and entrances and exits for internal traffic. Walking up to the door can represent an HTTP 200 (success) response for hitting the page, but can the user “walk in” and experience the intended behavior? Each milestone has different thresholds and measurements that, when aggregated, present a picture of health for the application.
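To make the idea of aggregating per-milestone thresholds concrete, here is a minimal sketch. The milestone names, values and thresholds are hypothetical, and the simple "fraction of healthy milestones" score is just one possible way to roll them up.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    name: str
    value: float        # latest measurement
    threshold: float    # worst acceptable value
    higher_is_better: bool = True

    def healthy(self) -> bool:
        if self.higher_is_better:
            return self.value >= self.threshold
        return self.value <= self.threshold


def health_summary(milestones):
    """Aggregate milestone checks into one picture of health."""
    if not milestones:
        raise ValueError("at least one milestone is required")
    failing = [m.name for m in milestones if not m.healthy()]
    score = 1 - len(failing) / len(milestones)
    return {"score": score, "failing": failing}


# Hypothetical milestones: reaching the door, then walking in.
milestones = [
    Milestone("page_load_success_rate", value=99.5, threshold=99.0),
    Milestone("login_p95_ms", value=850, threshold=500, higher_is_better=False),
    Milestone("checkout_success_rate", value=97.0, threshold=95.0),
]
summary = health_summary(milestones)
```

Here the page loads fine (the user reached the door), but the slow login shows up in `summary["failing"]`, which is the "can the user walk in?" half of the analogy.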
Our goal is to monitor an application from the top down, the opposite of the traditional bottom-up strategy. Instead of monitoring hardware metrics explicitly, we identify well-known critical entry points, time these behaviors as they enter and exit specific runtimes (code), then correlate them to foundational hardware metrics. Modern computing allows us to defer ownership of the hardware layer, which puts more focus on the running code than on memory and CPU consumption. By measuring from the user's perspective, the picture of health is more reflective of real-world behavior.
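The timing of entry points described above can be sketched by hand as follows. An APM agent normally performs this instrumentation automatically; this is not AppDynamics' API, and the `entry_point` decorator, the `timings` store and the `schedule_shipment` function are hypothetical names used only for illustration.

```python
import time
from collections import defaultdict
from functools import wraps

# Entry point name -> list of observed durations in seconds.
timings = defaultdict(list)


def entry_point(name):
    """Record how long each call to a critical entry point takes."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()       # behavior enters the code
            try:
                return func(*args, **kwargs)
            finally:
                # Behavior exits the code, even if it raised an error.
                timings[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@entry_point("schedule_shipment")
def schedule_shipment(order_id):
    time.sleep(0.01)  # stand-in for real scheduling work
    return f"scheduled:{order_id}"


result = schedule_shipment("A-100")
```

The recorded durations can then be lined up against hardware metrics from the same time window, which is the correlation step mentioned above.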
Once the critical business behavior is defined in AppDynamics, the next step is to begin visualizing this flow in order to quickly identify issues.
What does the result look like?
It is difficult to create a standard for all businesses. However, if you follow the steps above, you will have a list of critical behaviors that can be monitored by AppDynamics. In the end, how these metrics are displayed is up to your imagination!