AppDynamics Essentials: Anomaly Detection

The articles in this series build on one another and assume some familiarity with AppDynamics concepts. If you haven't read the earlier articles, you may want to go back and do that first.

AppDynamics is an Application Performance Management (APM) platform that provides real-time monitoring of applications to detect anomalies, monitor application environment performance, and collect and analyze metrics. Anomaly Detection is a module that adds monitoring resources to the already robust AppDynamics suite of tools.

What is Anomaly Detection, and why should I care?

Anomaly Detection uses the power of AppDynamics and its machine learning to enable automated root cause analysis (RCA). It discovers the normal ranges of key entity and Business Transaction metrics, alerts you when those metrics deviate significantly, and provides the suspected causes. You can then investigate through snapshots, review the deviating metrics, or compare a metric's value against its expected range. This helps to reduce mean time to resolution (MTTR) and find the root cause of application performance problems.
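To make "normal range" a little more concrete, here is a minimal sketch of the general idea: learn a band from recent history and flag values that fall outside it. This is not how AppDynamics implements its models, which are not described in this article. The mean plus or minus k standard deviations rule, the function names and the sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def normal_range(history, k=3.0):
    """Return a (low, high) band of mean +/- k standard deviations.

    Illustrative only: a simple statistical band, not the AppDynamics model.
    """
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma

def is_anomalous(value, history, k=3.0):
    """True when value falls outside the band learned from history."""
    low, high = normal_range(history, k)
    return value < low or value > high

# Hypothetical Errors Per Minute samples for one Business Transaction.
history = [2, 3, 2, 4, 3, 2, 3, 3]

print(is_anomalous(3, history))   # a typical value
print(is_anomalous(25, history))  # a sudden burst of errors
```

A hand-written Health Rule is closer to a fixed threshold. The point of learning the band from history is that each metric gets its own definition of normal.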

Anomaly Detection feels a lot like Health Rules in that both alert you to application problems, but Anomaly Detection uses the aforementioned machine learning to notify you when metrics deviate from their normal range. This allows Anomaly Detection to identify a wider array of application problems than hand-written Health Rules.

In addition, Anomaly Detection requires no extra configuration. With Health Rules, outside of the defaults, you largely have to configure everything yourself, including the Policies and Actions that interact with them.

Rethink SaaS

Anomaly Detection is available only in the AppDynamics SaaS environment, and it isn't the only feature exclusive to SaaS. The SaaS implementation also offers easy scalability, continuous software updates, better metric retention periods and a slew of other benefits.

Is it plugged in?

Anomaly Detection has to be enabled before you can use it. Let's walk through it.

Start at the Home screen after logging into the AppDynamics Controller.

From the top menu, select the Alert & Respond option.

Once we're in the Alert & Respond menu, we see a number of options, including the Health Rules, Policies and Actions that were defined in a previous article. Select the Anomaly Detection option.

We're now at the Anomaly Detection menu. We're going to use the TeaStore application for this article.

Select the drop-down next to the Anomaly Detection title.
Select Applications.
Select the TeaStore application.

Now that we have our application selected, let's turn on Anomaly Detection. Click the switch to "ON."

Wait for it…

No, seriously, wait for it. This is not a trick: you have to wait 24 hours from the moment you turn on Anomaly Detection before it and the Automated Root Cause Analysis features become available. After those 24 hours, the machine learning models have finished training and Anomaly Detection reports on Business Transaction (BT) behavior. Reviewing the list of BTs in the Anomaly Detection window shows the status of every BT that AppDynamics has discovered (including custom configuration) for this application.

The Not Available status means that the model training hasn't finished and the Business Transaction is not yet visible to Anomaly Detection.
The Ready status means that the model training is complete and the BT is healthy.

There are other available BT statuses that we don't see here but may happen during the 24-hour model training period:

In Training: Model training is in progress for the Business Transaction.
Warning: Model training has completed, but the Business Transaction has experienced Warning level anomalies during the training period.
Critical: Model training has completed, but the Business Transaction has experienced Critical level anomalies during the training period.
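If you ever script against these statuses, it can help to write them down as a small data structure. The sketch below simply encodes the five statuses described above. The class and helper names are hypothetical and are not part of any AppDynamics API.

```python
from enum import Enum

class BTStatus(Enum):
    """Anomaly Detection statuses for a Business Transaction, as listed above."""
    NOT_AVAILABLE = "Not Available"  # training not finished, BT not yet visible
    IN_TRAINING = "In Training"      # training in progress
    READY = "Ready"                  # training complete, BT healthy
    WARNING = "Warning"              # training complete, Warning level anomalies seen
    CRITICAL = "Critical"            # training complete, Critical level anomalies seen

# Statuses that imply model training has completed.
TRAINED = {BTStatus.READY, BTStatus.WARNING, BTStatus.CRITICAL}

def training_complete(status):
    """True once the model for this BT has finished training."""
    return status in TRAINED

def needs_attention(status):
    """True when anomalies were seen for this BT."""
    return status in {BTStatus.WARNING, BTStatus.CRITICAL}

print(training_complete(BTStatus.IN_TRAINING))
print(needs_attention(BTStatus("Critical")))
```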

Model training continues to take place as long as Anomaly Detection is enabled. If for some reason the Business Transaction is interrupted for a full 24 hours, Anomaly Detection will continue to function using model training from the previous seven days. In addition, if a particular BT has very low calls per minute, Anomaly Detection will be unable to perform model training due to the extremely small sample size.

Set it… and forget it.

Anomaly Detection requires no additional configuration beyond this unless you want to limit the number of alerts you receive. By default, it provides Warning and Critical alerts for all Business Transactions in your application whenever Errors Per Minute (EPM) or Average Response Time (ART) deviate significantly from their normal ranges. Once the aforementioned model training is complete, Anomalies start showing up in the Events lists, and you can also address them from the Anomalies tab.
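Since Anomalies surface as events, you may also want to pull them programmatically. The sketch below only builds a query URL and makes no network call. It assumes the Controller's application events REST endpoint and its query parameters as we recall them from the Controller REST API documentation. The event type names, the host name and the function name are assumptions for illustration, so verify them against the documentation for your Controller version.

```python
from urllib.parse import quote, urlencode

def anomaly_events_url(controller, application, minutes=60):
    """Build a URL that asks the Controller for recent anomaly events.

    Assumptions to verify for your Controller version: the endpoint path,
    the parameter names and the anomaly event type names.
    """
    params = {
        "time-range-type": "BEFORE_NOW",
        "duration-in-mins": minutes,
        "event-types": "ANOMALY_OPEN_WARNING,ANOMALY_OPEN_CRITICAL",  # assumed names
        "severities": "INFO,WARN,ERROR",
        "output": "JSON",
    }
    base = controller.rstrip("/")
    app = quote(application, safe="")
    return f"{base}/controller/rest/applications/{app}/events?{urlencode(params)}"

# Hypothetical SaaS Controller host; TeaStore is the application used in this article.
url = anomaly_events_url("https://example.saas.appdynamics.com", "TeaStore")
print(url)
```

You would still need to authenticate the request with whatever credentials your Controller expects, which is left out of this sketch.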

Let's get familiar

Anomaly Detection looks and feels like other AppDynamics modules, making it very easy to discover application problems and drill down to isolate them.

Since we're already at Anomaly Detection within the Alert & Respond menu, select the Anomalies tab.
AppDynamics has discovered an anomaly with the /tools.decartes.teastore.webui/profile Business Transaction. Anomalies use the same critical [!] symbol as Health Rules.
The status is open, meaning that the problem still exists and hasn't been resolved according to AppDynamics.
AppDynamics knows that the EPM metric for this particular BT is deviating from normal, giving us more information to solve the problem.

If we double-click the /tools.decartes.teastore.webui/profile Business Transaction link, a new window opens with a call graph and a sidebar, providing a lot more information to dissect this Anomaly. Let's take a look at what's available in the sidebar.

As we previously discovered, the status is open, meaning that the problem still exists. More importantly, AppDynamics shows the time the anomaly started, along with a time graph that lets us see how long this metric has been deviating.
Although we already have a dashboard highlighting the affected BT, it only creates a graph with the teastore-webui tier. Clicking the BT link in this section opens a new dashboard with the flow map of the Business Transaction, which provides further information. We also see that the metric deviation this Anomaly is reporting as critical is EPM.
This is where it gets awesome. The Top Suspected Causes section shows what AppDynamics believes is the root cause of this Anomaly.
This is a list of all the snapshots that were taken during the Anomaly window. We can double-click a snapshot from the list and drill down to determine the root cause of the error within the Business Transaction. This will provide all the information we need to solve our problem, but first we want to see more information on the metric trend that caused the Anomaly. Click "More Details" in Top Suspected Causes (3).

This is suspect

A window will pop up and replace the tiers flow map of this Anomaly with a Suspected Cause graph.

We should have already guessed this from our previous anomaly-related menus, but there is a problem with the /tools.decartes.teastore.webui/profile BT, located in the teastore-webui tier.
The Top Deviating Metrics section gives us a graph of this BT's metric trend from the moment it started to deviate. As you'll see, this tells us that the BT is deviating, but not necessarily why. We've already gathered that much from the previous windows.
The Suspected Cause Metrics section gives us a similar graph of the Business Transaction as it deviates, but for the metric that is the true suspected cause. In this case, as we've already discovered, EPM on the teastore-webui tier is causing the problem.
You can see what "normal" looks like from the graph and when it starts to deviate (5). This helps us understand how Anomaly Detection works.
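Reading the graph this way amounts to finding the point where the metric leaves its expected band and stays out. A rough sketch of that idea follows. The band limits, the sample values and the function name are hypothetical, and this is not the AppDynamics algorithm.

```python
def deviation_onset(samples, low, high):
    """Return the index where the most recent run outside [low, high] began.

    Returns None if the latest sample is back inside the band.
    """
    onset = None
    for i, value in enumerate(samples):
        if low <= value <= high:
            onset = None      # back to normal, forget any earlier run
        elif onset is None:
            onset = i         # first sample of a new run outside the band
    return onset

# Hypothetical per-minute EPM samples and expected band.
epm = [2, 3, 2, 4, 3, 18, 22, 25]
start = deviation_onset(epm, low=0, high=5)
print(start)             # index where the deviation began
print(len(epm) - start)  # minutes the metric has been deviating
```

A brief spike that returns to normal yields None here, which loosely mirrors the difference between an anomaly that is still open and one that has already resolved.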

Snap back

Revisit the sidebar on the left and double-click the top (latest) snapshot to drill down into it. A new transaction window will open. Clearly, a large number of errors are occurring within the teastore-webui tier.

Pay close attention to the Potential Issues in the Summary sidebar, as most of the work of discovering the root cause is done for you by AppDynamics. Click the top Potential Issues link to open a small information window highlighting the potential problem.

We can review the information presented here or drill down into the call graph to find the root cause. Click Drill down Into Call Graph.

A new window opens into the Snapshot Overview. We'll see details on the error such as the timestamp, execution time, node and tier. In the right pane, we'll see all of the details of the error. We can now share this with our development team so they can fix it in a future release.

That's it! You can see how Anomaly Detection can provide another way to find problems in your application and reduce downtime. We can then further use the information from this Anomaly to build or modify Health Rules, Policies and Actions to catch and report these errors as they occur.

Begin at the end

Since you now know how easy it is to configure Anomaly Detection and see its benefits, let's dive deeper into what you can accomplish with AppDynamics by scheduling a demo.