Introduction to XQL: Custom Datasets for Threat Hunting
Introduction
In the past few blogs, we've learned how to work with XQL using data natively available to us from the XDR Agent and NGFW datasets. Now we'll take on a new challenge: working with data that isn't native to the platform. I was recently working with a colleague who was considering deploying Proxmox to a local site to run a specific virtual machine. During this proof of concept, the engineer became concerned that one of the local staff members was trying to log in to the management console overnight. Proxmox runs a version of Debian Linux, which is supported by the XDR Agent, but for now the engineer isn't comfortable installing any agents on the host hypervisor. Instead, we'll leverage the Broker VM and the syslog capabilities of Proxmox to forward the logs into a custom dataset for analysis.
Forming our hypothesis
Our objectives are the following:
- Deploy and configure the Broker VM.
- Configure syslog forwarding on our hypervisor to forward to our Broker VM.
- Parse the syslog message for the actionable alert.
- Create a correlation rule to alert the SOC to failed logins outside of business hours.
- Verify that an incident is created as expected.
We'll start by trying to log in to Proxmox using a known bad username and then searching the System logs for the log format.
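On the Proxmox host itself, something like the following will surface the relevant lines (a sketch; pvedaemon is the Proxmox API daemon that handles console logins, and the exact search string is an assumption until we see the real log):

journalctl -u pvedaemon --since today | grep "authentication failure"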
The log line contains several strings that we'll want to separate into different columns in our dataset to capture the most relevant fields for our use case (a reconstructed sample appears after this list):
- Timestamp - May 20 09:47:07
- Hostname - ms01-a
- Process - pvedaemon
- Process ID - [1179]:
- Remote Host - rhost=::ffff:10.0.0.164
- User - user=baduserlogin@pam
- Message - msg=no such user ('baduserlogin@pam')
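Stitching those fields back together, the raw message looks roughly like this (reconstructed from the fields above; the <85> syslog priority value and the exact "authentication failure;" phrasing between the PID and rhost are illustrative):

<85>May 20 09:47:07 ms01-a pvedaemon[1179]: authentication failure; rhost=::ffff:10.0.0.164 user=baduserlogin@pam msg=no such user ('baduserlogin@pam')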
Since we'll be collecting these logs over syslog into the Cortex Data Lake, we'll need to deploy the Broker VM.
What is the Broker VM?
The Broker VM is a hardened virtual machine appliance that acts as a bridge between your internal network resources and Cortex XDR/XSIAM. Its primary purpose is to facilitate a secure connection for various applets without exposing endpoints directly to the internet. The appliance can be deployed on multiple hypervisors and cloud platforms, including VMware ESXi, Hyper-V, KVM, Nutanix, AWS, Azure, GCP, and Alibaba Cloud.
The Broker VM also supports high-availability clustering in both active/passive and active/active modes. Palo Alto Networks keeps the appliance updated, and it can be configured entirely from the XDR/XSIAM console.
The Broker VM is modular and supports different applets to retrieve data and upload it securely to the Cortex Data Lake, such as:
- Syslog Collector – Ingests syslog messages from external systems. Supports Active/Active.
- NetFlow Collector – Collects and parses NetFlow data for network traffic analysis. Supports Active/Active.
- Windows Event Collector (WEC) – Gathers Windows event logs. Supports Active/Active.
- Kafka Collector – Pulls event streams from Apache Kafka. Supports Active/Passive.
- CSV Collector – Monitors shared Windows directories for CSV files and uploads them to Cortex XDR. Supports Active/Passive.
- FTP Collector – Gathers data via FTP transfers. Supports Active/Passive.
- Files and Folders Collector – Collects files and logs from specified directories. Supports Active/Passive.
- DB Collector – Extracts logs from databases. Supports Active/Passive.
- Local Agent Settings – Acts as an agent proxy and optionally caches installer and content update packages for endpoints. Supports Active/Active.
- Network Mapper – Performs discovery of devices on a network using active scanning. Does not support HA.
The Broker VM can be deployed with as little as a 4-core CPU, 8 GB RAM, 512 GB disk (thin provisioned), and a minimum of 10 Mbps of bandwidth (with an optimal outgoing bandwidth around 25% of incoming data), but these requirements may change based on the applets configured.
Deploying the Broker VM
The Broker VM image can be downloaded from the Cortex XDR/XSIAM console under Settings > Configurations > Data Broker > Broker VMs. From the Add Broker drop-down, select the appropriate virtual machine type for your hypervisor.
You can find the deployment documentation for your hypervisor here: Broker VM Installations
After the Broker VM has been deployed, we can log in to the initial console and use the simplified menu to configure a static IP address.
Once a static IP address has been configured, we can log in to the WebUI at https://[IP-Address]:4443 with the default password !nitialPassw0rd and then change the password to one of our own choosing for the appliance.
Lastly, we just need to register the Broker VM to our Cortex XDR/XSIAM tenant using the token from the Add Broker > Generate Token dropdown.
Configuring the Broker VM
After the Broker VM has been registered to our tenant, we can click the Add link to add the syslog applet to our Broker VM.
The Syslog applet defaults to collecting on UDP/514, but we can add additional protocols and ports to listen on to fit our needs. The Broker VM will attempt to automatically identify the vendor and product for the most popular applications.
Our Proxmox system engineer has informed us they are forwarding all syslog messages to our Broker VM over UDP/5514.
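For reference, on a Debian-based host like Proxmox, that forwarding is typically a one-line rsyslog change. A minimal sketch, assuming rsyslog is in use and a hypothetical Broker VM address of 10.0.0.50:

# /etc/rsyslog.d/99-broker.conf (hypothetical file name)
# A single @ forwards over UDP; @@ would forward over TCP.
*.* @10.0.0.50:5514

Restart rsyslog afterward (systemctl restart rsyslog) for the change to take effect.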
We'll manually define the vendor and product for our two Proxmox hosts so their syslog data is stored in the proper dataset.
With the logs flowing into our new dataset, we'll need to configure parsing rules to structure our syslog data.
What are parsing rules?
Parsing rules in Cortex XDR/XSIAM are a key part of customizing data ingestion into custom datasets, including data arriving via the Broker VM. These rules enable preprocessing, filtering, enrichment, and routing of logs before they are ingested into a dataset.
Parsing rules are divided into two sections: default rules provided by Marketplace integrations and user rules that we can define and augment.
A parsing rule can be composed of four key sections (a structural sketch follows this list):
- Collect – Performs filtering and enrichment on the Broker VM itself before ingestion, which reduces outbound network traffic and Pro Per GB license usage. Optional. If using the Collect section, be sure the Broker VM meets the higher hardware minimum of an 8-core CPU, 8 GB RAM, and a 512 GB disk; this supports a capacity of 10K events/second per core.
- Const – Defines reusable constants, such as regex strings, referenced via $CONST_NAME. Optional.
- Rule – Defines named reusable logic blocks that can be called in other sections. Optional.
- Ingest – Specifies how the incoming data is parsed and sent to a dataset. Required.
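Before writing ours, here's a minimal structural sketch of a rules file. The Ingest header parameters mirror what we'll configure shortly; the Const body and the filter line are illustrative placeholders:

[CONST]
PVE_HEADER = "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}"; // illustrative reusable regex, referenced elsewhere as $PVE_HEADER

[INGEST:vendor="proxmox", product="proxmox", target_dataset="proxmox_proxmox_raw", no_hit=keep]
filter _raw_log ~= $PVE_HEADER; // placeholder: keep only events matching the syslog header; real parsing logic replaces this line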
Writing the parsing rule
Parsing rules use an XQL-like structure. The main differences are that every rule must end in a semicolon and that the available stages are limited in scope. While building parsing rules, my personal preference is to use the XQL Query Builder to filter and structure the logs using the alter stage.
Our XQL query uses RegEx to parse the authentication failure logs; each alter stage carries a comment explaining what its RegEx extracts.
dataset = proxmox_proxmox_raw
| alter timestamp = arrayindex(regextract(_raw_log, "^<\d+>([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})"), 0), //Extracts the syslog timestamp (month, day, and time)
hostname = arrayindex(regextract(_raw_log, "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+([^\s]+)"), 0), //Extracts the hostname following the timestamp
process = arrayindex(regextract(_raw_log, "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+[^\s]+\s+([a-zA-Z0-9_-]+)\[\d+\]"), 0), //Extracts the process name before the PID in square brackets
pid = to_integer(arrayindex(regextract(_raw_log, "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+[^\s]+\s+[a-zA-Z0-9_-]+\[(\d+)\]"), 0)), //Extracts the process ID from within square brackets
rhost = arrayindex(regextract(_raw_log, "rhost=(?:::ffff:)?(\d+\.\d+\.\d+\.\d+)"), 0), //Extracts the remote host IP address (IPv4), without optional IPv6 mapping prefix
user = arrayindex(regextract(_raw_log, "user=([^ ]+)"), 0), //Extracts the username associated with the authentication attempt
msg = arrayindex(regextract(_raw_log, "msg=(.+)"), 0) //Extracts the remaining message content following msg=
With the XQL Query refined, we can verify our results in the table below.
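Reconstructed from our sample event, the parsed row comes out as follows (note that rhost is a clean IPv4 address because our RegEx strips the optional ::ffff: IPv6-mapped prefix):

timestamp | hostname | process | pid | rhost | user | msg
May 20 09:47:07 | ms01-a | pvedaemon | 1179 | 10.0.0.164 | baduserlogin@pam | no such user ('baduserlogin@pam')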
With our XQL query complete, we'll navigate to Settings > Dataset Management > Parsing Rules and translate it into the User-Defined Rules. We'll use the Ingest section to target the vendor and product we configured on the Broker VM, storing these logs in the proxmox_proxmox_raw dataset. We'll use the filter and alter stages to populate the columns we've defined, and set no_hit = drop to discard any other logs that do not match our filter.
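Putting it all together, the User-Defined Rule might look like the following sketch. The Ingest header matches the vendor, product, and dataset above; the "authentication failure" filter string is an assumption based on our sample log line:

[INGEST:vendor="proxmox", product="proxmox", target_dataset="proxmox_proxmox_raw", no_hit=drop]
filter _raw_log contains "authentication failure" // assumed match string; keeps only login-failure events
| alter timestamp = arrayindex(regextract(_raw_log, "^<\d+>([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})"), 0),
  hostname = arrayindex(regextract(_raw_log, "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+([^\s]+)"), 0),
  process = arrayindex(regextract(_raw_log, "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+[^\s]+\s+([a-zA-Z0-9_-]+)\[\d+\]"), 0),
  pid = to_integer(arrayindex(regextract(_raw_log, "^<\d+>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+[^\s]+\s+[a-zA-Z0-9_-]+\[(\d+)\]"), 0)),
  rhost = arrayindex(regextract(_raw_log, "rhost=(?:::ffff:)?(\d+\.\d+\.\d+\.\d+)"), 0),
  user = arrayindex(regextract(_raw_log, "user=([^ ]+)"), 0),
  msg = arrayindex(regextract(_raw_log, "msg=(.+)"), 0);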
Building the correlation rule
In our last step, we'll compose our correlation rule using the concepts from Introduction to XQL: Writing Your First Correlation Rule. According to our requirements, we want to create an alert only when no such user or failed appears in the msg column at any time on weekends, or Monday through Friday outside of normal business hours.
We'll use the alter stage to extract the day of the week and the hour from the _time column; because _time is stored in UTC, we'll compare that hour against our business hours converted from our local time zone into UTC.
dataset = proxmox_proxmox_raw
| filter (
(msg contains "no such user")
or (msg contains "failed") )
| alter event_hour = extract_time(_time, "HOUR"),
event_day = extract_time(_time, "DAYOFWEEK")
| filter (
(event_day in (6, 7)) // Saturday (6) or Sunday (7)
or (event_day in (1,2,3,4,5) and (event_hour < 13 or event_hour >= 22)) ) // M–F but outside 8-5 CDT using UTC
| fields _time, timestamp, hostname, process, pid, rhost, user, msg, event_hour, event_day
Once we've verified we have our desired result, we're ready to save this query as a correlation rule.
We'll configure a rule name for our new correlation rule and set the schedule to run every 10 minutes.
We'll suppress alerts on the rhost column, since this is the source IP of the endpoint attempting to log in.
Next, we'll generate the alert with some of the parsed fields included so the SOC analyst can easily understand why the alert occurred.
Lastly, we can verify that our Incident was created using our correlation rule and new custom dataset.
Conclusion
Custom datasets and the Broker VM are game changers for anyone using Cortex XDR and XSIAM. They let you pull in logs from just about anywhere you can imagine: legacy firewalls, hypervisors, weird one-off systems. And they give you the flexibility to build detections that match how your environment works. The Broker VM handles the heavy lifting behind the scenes, securely getting that data into the Cortex Data Lake. If you're looking to level up your threat hunting, or to finally bring visibility to those forgotten applications on your network, this is the way to do it.
Stay tuned for the last part of this series, where we'll optimize XQL queries and parsing rules to improve performance, from search to ingestion, in Cortex XDR and XSIAM.