Hunting for anomalies with time-series analysis

A time series is a sequence of numerical data points ordered in time, such as a frequency of counts or occurrences against a column of a particular dataset. Time-series analysis is used to identify meaningful statistics in historical data, which can later be used to baseline expected behavior.

In this blog post, we are going to apply time-series analysis to Azure Active Directory Sign-In logs to baseline a couple of identity-related use-cases. We will go through a few examples and highlight important things that are worth looking at.

When to use Time-Series analysis?

Analyzing time-series data gives a good overview of statistics over a period of time. It is useful when you want to examine statistics that you suspect may have a higher frequency of counts than expected. There are many possible examples, but to understand the technique we will cover a few specific use-cases. One simple example is baselining the result types of sign-in operations.

Time-Series analysis with KQL functions and operators

There are a couple of KQL functions and operators that will be used throughout this blog post, so understanding them at a high level will help a lot.

make-series: This operator allows us to create a series of specified aggregated values along a specified axis. It is used to aggregate data points into a series, grouped by a specified column of the data source.
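As a minimal sketch of the operator (reusing the SigninLogs schema from this post; the 7-day window is an illustrative assumption), the following builds one count-per-hour series per identity:

```kql
// One count-per-hour series per Identity over the last 7 days.
// make-series produces parallel arrays per row (TimeGenerated, HourlyCount),
// with missing hours filled in so every series has the same length.
SigninLogs
| make-series HourlyCount = count()
    on TimeGenerated from startofday(ago(7d)) to now() step 1h
    by Identity
```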

series_decompose_anomalies: This function takes an expression containing a series (a dynamic numerical array) as input, and extracts anomalous points with scores. It will return the following series:

ad_flag: A ternary flag per data point: +1 marks an anomalous spike, -1 an anomalous dip, and 0 a normal value.

baseline: The predicted value (e.g. Bob generated 5 result types in an hour, but the prediction was 2).

ad_score: The anomaly score, which expresses how far the observed value is from the calculated baseline value.
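A minimal usage sketch (the same call the queries below make): the second argument is the anomaly-score cutoff, -1 auto-detects the seasonality, and 'linefit' models the trend component:

```kql
// Decompose each per-identity series and emit the three output arrays.
TimeSeriesData
| extend (anomalies, score, baseline) =
    series_decompose_anomalies(PerHourCount, 1.5, -1, 'linefit')
```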

mv-expand: This operator expands a row into multiple rows by splitting a column that contains an array or property bag.
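For example, the parallel arrays produced by make-series can be expanded back into one row per hour, with each element cast to a scalar type so that ordinary row-level filters apply afterwards:

```kql
// Expand the arrays into one row per (Identity, hour) pair.
TimeSeriesData
| mv-expand TimeGenerated to typeof(datetime), PerHourCount to typeof(double)
| where PerHourCount > 0
```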

make_set / make_list: These aggregation functions return a dynamic array of the values of Expr in the group; make_set keeps only the distinct values, and is the variant used in the queries below. If the input to the summarize operator is not sorted, the order of elements in the resulting array is undefined.
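As a small sketch of the aggregation (mirroring the summarize used later in this post), this collects the distinct result codes and IP addresses seen per identity per hour:

```kql
// make_set returns the distinct values per group as a dynamic array.
SigninLogs
| summarize ResultTypes = make_set(ResultType), IpList = make_set(IPAddress)
    by Identity, bin(TimeGenerated, 1h)
```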

Time-Series analysis, but practical

Step 1:

The first query that we will run is preparing the time-series data for each identity. As discussed earlier, we are going to look for result type operations in Azure Active Directory Sign-in logs.

A few important notes: we are going to look at data from the last 14 days, with a threshold set to the value 5.

Query:

let starttime = 14d;
let endtime = 1d;
let timeframe = 1h;
let TotalEventsThreshold = 5;
let resultCodes = dynamic([53000, 53003]); // Access has been denied by CA - Device is not compliant
let TimeSeriesData = 
SigninLogs
| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
| make-series PerHourCount=count() on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by Identity;
TimeSeriesData

Results:

In the following results, we will see the 'PerHourCount' and 'TimeGenerated' columns. Together they show the frequency of counts (per hour) for every 'ResultType' occurrence associated with an identity. As you might have noticed, there are some 0 values as well; those have been filled in by the make-series operator for the hours where values are missing.
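The fill value is controlled by the optional default clause of make-series; a variant of the Step 1 query makes this explicit (for count(), 0 is already the default, so this only spells out the behavior):

```kql
// 'default = 0' makes the fill value for hours with no events explicit.
SigninLogs
| make-series PerHourCount = count() default = 0
    on TimeGenerated from startofday(ago(14d)) to startofday(ago(1d)) step 1h
    by Identity
```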

Step 2:

The second step is to use the series_decompose_anomalies() function to detect trends in our dataset, so we can flag anomalies based on the provided arguments, which are documented here:

Source: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/series-decompose-anomalies-function

Query:

let starttime = 14d;
let endtime = 1d;
let timeframe = 1h;
let TotalEventsThreshold = 5;
let resultCodes = dynamic([53000, 53003]); // Access has been denied by CA - Device is not compliant
let TimeSeriesData = 
SigninLogs
| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
| make-series PerHourCount=count() on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by Identity;
let TimeSeriesAlerts=TimeSeriesData
| extend (anomalies, score, baseline) = series_decompose_anomalies(PerHourCount, 1.5, -1, 'linefit')
| mv-expand PerHourCount to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double), score to typeof(double), baseline to typeof(long)
| where anomalies > 0
| project Identity, TimeGenerated, PerHourCount, baseline, anomalies, score
| where PerHourCount > TotalEventsThreshold;
TimeSeriesAlerts

Results:

We will now see three new columns: 'baseline', 'anomalies', and 'score'. It might not be immediately clear what we are seeing here, so let's keep it short but detailed. The 'PerHourCount' column shows the frequency of counts for every ResultType associated with an identity. This means, for example, that a user Bob generated 16 ResultTypes at 12:00 PM, and those ResultTypes can all differ from each other; for example, Bob generated a couple of 53000, 53003, and 0 results. Once we count all of them together, it equals 16.

In the first row, we can see the value '16' in the 'PerHourCount' column and the value '4' in the 'baseline' column. This means that a user generated 16 ResultTypes in one hour, in this case at 12:00 PM, while the 'baseline' column shows that the prediction was 4 ResultTypes instead of 16.

Since the prediction was 4 but the outcome was 16, the results report one anomaly, which is displayed in the 'anomalies' column. We will explain the 'score' column later in this blog post.

Step 3:

The last query adds additional information to our current query to gather some context. It might help to determine whether something was malicious or not. We are going to filter on Result Types 53000 and 53003.

let starttime = 14d;
let endtime = 1d;
let timeframe = 1h;
let TotalEventsThreshold = 5;
let resultCodes = dynamic([53000, 53003]); // Access has been denied by CA - Device is not compliant
let TimeSeriesData = 
SigninLogs
| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
| make-series PerHourCount=count() on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by Identity;
let TimeSeriesAlerts=TimeSeriesData
| extend (anomalies, score, baseline) = series_decompose_anomalies(PerHourCount, 1.5, -1, 'linefit')
| mv-expand PerHourCount to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double), score to typeof(double), baseline to typeof(long)
| where anomalies > 0
| project Identity, TimeGenerated, PerHourCount, baseline, anomalies, score
| where PerHourCount > TotalEventsThreshold;
TimeSeriesAlerts
| join (
SigninLogs
| where ResultType in (resultCodes)
| summarize ResultTypeCount=count(), ResultTypes=make_set(ResultType), Location=make_set(Location), Applist=make_set(AppDisplayName), IpList=make_set(IPAddress) by Identity, bin(TimeGenerated, 1h)
) on Identity, TimeGenerated | extend AnomalyTimeattheHour = TimeGenerated
| project AnomalyTimeattheHour, Identity, ResultTypeCount, PerHourCount, ResultTypes, Applist, IpList, baseline, anomalies , score, Location

Results:

In the following results, we can see which applications have been accessed in a specific hour, including the associated IP addresses and locations that belong to an identity.

There is also one new column named 'ResultTypeCount'. This column tells us that at 12:00 PM there was 1 'ResultType' with the code 53000, while 'PerHourCount' shows a value of 19, meaning there were 19 Result Types in total. To avoid any confusion: there is 1 'ResultType' with the code 53000, and 18 other Result Types. The purpose of these two count columns is mainly to provide additional context.

Baseline:

Ideally, the 'baseline' column should not contain a lot of zeros, because that means you may not have consistent time-series data and may have a lot of missing values for the period. Another reason could be that the event has only been seen a few times. We included this in the blog post to make you aware of this case, so if you encounter it, you know where the problem may lie.
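One possible way to reduce this effect is to drop sparse series before decomposition, so identities with only a handful of events in the whole window never reach series_decompose_anomalies. The cutoff of 24 total events below is an illustrative assumption, not a recommendation:

```kql
// Keep only identities with more than 24 events across the whole window.
TimeSeriesData
| where array_sum(PerHourCount) > 24
```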

Step 4: Modifying our final query

This will be the final example: we are going to replace Result Types 53000 and 53003 with two other Result Types. Both of the new result types indicate that a user did not pass an MFA challenge. You can recognize them as 50074 and 50076.

Query:

let starttime = 14d;
let endtime = 1d;
let timeframe = 1h;
let TotalEventsThreshold = 5;
let resultCodes = dynamic([50074, 50076]); // User did not pass the MFA challenge - Interactive & Non-Interactive
let TimeSeriesData = 
SigninLogs
| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
| make-series PerHourCount=count() on TimeGenerated from startofday(ago(starttime)) to startofday(ago(endtime)) step timeframe by Identity;
let TimeSeriesAlerts=TimeSeriesData
| extend (anomalies, score, baseline) = series_decompose_anomalies(PerHourCount, 1.5, -1, 'linefit')
| mv-expand PerHourCount to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double), score to typeof(double), baseline to typeof(long)
| where anomalies > 0
| project Identity, TimeGenerated, PerHourCount, baseline, anomalies, score
| where PerHourCount > TotalEventsThreshold;
TimeSeriesAlerts
| join (
SigninLogs
| where ResultType in (resultCodes)
| summarize ResultTypeCount=count(), ResultTypes=make_set(ResultType), Location=make_set(Location), Applist=make_set(AppDisplayName), IpList=make_set(IPAddress) by Identity, bin(TimeGenerated, 1h)
) on Identity, TimeGenerated | extend AnomalyTimeattheHour = TimeGenerated
| project AnomalyTimeattheHour, Identity, ResultTypeCount, PerHourCount, ResultTypes, Applist, IpList, baseline, anomalies , score, Location

Results:

There are 703 records returned in the results. When you now take a look at the 'baseline' column, it shows different expected values, because this event has shown up frequently, at least every hour.

What to do with the score?

When applying time-series analysis in your hunting engagements, you may want to prioritize findings. The 'score' column can help here, because it tells how far the observed value is from the baseline value.

Generally, the higher the score, the more likely it is an anomaly. But again, that does not necessarily mean it actually is one; everything requires verification and context.
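A simple way to apply this during triage is to sort the flagged rows from the earlier TimeSeriesAlerts query so the hours furthest from their baseline surface first:

```kql
// Review the highest-scoring anomalies first.
TimeSeriesAlerts
| sort by score desc
| take 20
```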

On 2/12/2021 at 8:00 AM, there were 19 Result Types, which we can see in the 'PerHourCount' column. The 'baseline' column predicted that there would be only 9 Result Types, and the row was given a score of 52 (a float value). Usually when the score is higher than 10, it might be considered an anomaly, but why is that?

Now let's verify the associated identity that has a score of 52. We will look at the frequency of counts for every Result Type: how many Result Types did this user generate in an hour on this specific date?

If we now run the following query:

SigninLogs
| where TimeGenerated >= datetime(2021-02-12)
| where Identity == 'Test Account'
| summarize OperationCount = count() by ResultType, bin(TimeGenerated, 1h)
| sort by TimeGenerated desc
| render timechart

We visualize the data and notice that there is a spike in the results at 8:00 AM, because the counts there are much higher than at the other timestamps.

This is probably the reason why our previous query assigned it a score of 52.

Summary:

We have applied time-series analysis to Azure Active Directory Sign-In logs to baseline the result types of sign-in operations for each identity and discover anomalies. Keep in mind that this is just an example to demonstrate time-series analysis.

Everything was covered step by step to make it practical for all readers. All the important column names have also been explained briefly to make the results easier to interpret.

Time-series analysis helps to baseline the frequency of counts or occurrences against a column to find patterns in your dataset. It will flag an anomaly once it has discovered a spike. When we talk about a 'spike' in a dataset, we mean that the frequency of counts was much higher than predicted.

Acknowledgement:

I would like to thank Ashwin Patil for helping me out; he provided useful feedback to make the queries much better. He also answered a lot of my questions, so I'm grateful for all his support. Without him, I wouldn't have been able to understand this topic.
