🐺cwolves
cwolves manages the implementation of the LogSlash log reduction method.
LogSlash is an intelligent log reduction method that automatically generates log-merging strategies to reduce the data size of various log formats. These strategies can be configured to increase or decrease compression, down to lossless reduction in which the original logs can be reproduced exactly. Let's break that down:
"LogSlash is an Intelligent Log Reduction Method"
This tool takes any set, or stream, of logs and applies transformations to reduce them into log summary lines. These summary lines are interpretable; they can provide value on their own. However, they can also be unpacked in your SIEM to reproduce the original data.
"Automatically Generates Log-Merging Strategies"
This part describes how we take a set of logs and converge them into a set of log summary lines. First, our algorithm analyzes a sample of the logs and identifies the data type of each column. We are not talking about data types in the classical sense (e.g., string, integer); we mean the semantic meaning of the data. For example, a string holding an IP address should be treated differently from a string holding an error message, and an epoch time value should be treated differently from a TTL value. Once the algorithm understands the data it is dealing with, it selects an appropriate merging strategy for each column. There are only a few types of merging strategies:
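Before looking at the individual strategies, here is a minimal sketch of the type-detection step described above. It is an illustration only, not LogSlash's actual algorithm: the type names, the 10-digit epoch heuristic, and the `DEFAULT_STRATEGY` mapping are all assumptions made for this example.

```python
import ipaddress
from collections import Counter

# Hypothetical mapping from semantic type to a default merging strategy.
DEFAULT_STRATEGY = {
    "ip_address": "group_by",
    "epoch_time": "range",
    "small_integer": "stats",
    "text": "array",
}

def semantic_type(value):
    """Guess the semantic type of one raw field value (kept as a string)."""
    try:
        ipaddress.ip_address(value)
        return "ip_address"
    except ValueError:
        pass
    if value.isdigit():
        # Assumed heuristic: 10-digit integers look like epoch seconds.
        return "epoch_time" if len(value) == 10 else "small_integer"
    return "text"

def column_type(sample):
    """Pick the most common guess across a sample of one column."""
    guesses = Counter(semantic_type(v) for v in sample)
    return guesses.most_common(1)[0][0]

sample_columns = {
    "IP Address": ["192.168.45.32", "192.168.45.32", "127.0.0.1"],
    "Timestamp": ["1705011061", "1705011112", "1705012112"],
    "TTL": ["31", "20", "63"],
}
strategies = {
    name: DEFAULT_STRATEGY[column_type(values)]
    for name, values in sample_columns.items()
}
print(strategies)
# {'IP Address': 'group_by', 'Timestamp': 'range', 'TTL': 'stats'}
```

A heuristic this crude would misclassify many real fields (a port number would be treated like a TTL, for instance), which is one reason the generated strategies are meant to be reviewed and tuned.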
Group By
The Group By merging strategy is essential. It identifies the columns that log summary lines converge on: when multiple logs have identical values in their Group By column(s), they converge to one line. Here is an example where the Group By merging strategy is applied to an IP Address field (ignore the other fields for now):
IP Address | Src Port | Dst Port |
---|---|---|
192.168.45.32 | 63543 | 443 |
192.168.45.32 | 61542 | 80 |
127.0.0.1 | 62322 | 443 |

After merging:

IP Address | Src Port | Dst Port |
---|---|---|
192.168.45.32 | [63543, 61542] | [443, 80] |
127.0.0.1 | [62322] | [443] |
As you can see, the logs were reduced from three to two lines by converging logs with identical IP Address values to a single log line.
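The convergence step can be sketched in a few lines of Python. This is a simplified illustration, assuming each log is a dictionary; the field names (`ip`, `src_port`, `dst_port`) and the `group_logs` helper are hypothetical and not part of LogSlash.

```python
from collections import defaultdict

logs = [
    {"ip": "192.168.45.32", "src_port": 63543, "dst_port": 443},
    {"ip": "192.168.45.32", "src_port": 61542, "dst_port": 80},
    {"ip": "127.0.0.1", "src_port": 62322, "dst_port": 443},
]

def group_logs(rows, group_by):
    """Bucket rows by the tuple of their Group By column values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)
    return dict(groups)

groups = group_logs(logs, group_by=["ip"])
print(len(logs), "->", len(groups))  # 3 -> 2
```

Each bucket becomes one log summary line; the remaining strategies decide how the non-key columns inside a bucket are merged.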
Array
The Array merging strategy simply records all values and stores them as a list in the log summary line. We can see this merging strategy in action using the example above, where the IP Address column follows the Group By merging strategy, while the Src Port and Dst Port columns follow the Array merging strategy:
IP Address | Src Port | Dst Port |
---|---|---|
192.168.45.32 | 63543 | 80 |
192.168.45.32 | 61542 | 443 |
127.0.0.1 | 62322 | 443 |

After merging:

IP Address | Src Port | Dst Port |
---|---|---|
192.168.45.32 | [63543, 61542] | [80, 443] |
127.0.0.1 | [62322] | [443] |
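As a rough sketch, the Array strategy amounts to collecting a column's values into a list. The helper name `merge_array` and the field names are assumptions for this example, as is the choice to keep values in arrival order.

```python
def merge_array(values):
    """Array strategy: keep every value as a list (arrival order assumed)."""
    return list(values)

# The two rows that share the Group By value 192.168.45.32.
rows_for_ip = [
    {"src_port": 63543, "dst_port": 80},
    {"src_port": 61542, "dst_port": 443},
]
summary = {
    col: merge_array(row[col] for row in rows_for_ip)
    for col in ("src_port", "dst_port")
}
print(summary)
# {'src_port': [63543, 61542], 'dst_port': [80, 443]}
```

Because nothing is discarded and positions line up across columns, the i-th element of each list belongs to the same original log.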
Range
The Range merging strategy records only the minimum and maximum values seen, stored as a two-element array. In this example, we add a Timestamp column and merge it using the Range merging strategy:
IP Address | Src Port | Dst Port | Timestamp |
---|---|---|---|
192.168.45.32 | 63543 | 80 | 1705011061 |
192.168.45.32 | 61542 | 443 | 1705011112 |
127.0.0.1 | 62322 | 443 | 1705012112 |

After merging:

IP Address | Src Port | Dst Port | Timestamp |
---|---|---|---|
192.168.45.32 | [63543, 61542] | [80, 443] | [1705011061, 1705011112] |
127.0.0.1 | [62322] | [443] | [1705012112, 1705012112] |
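A minimal sketch of the Range strategy, using a hypothetical `merge_range` helper, shows why a single log still produces a two-element array:

```python
def merge_range(values):
    """Range strategy: keep only [minimum, maximum] of the values seen."""
    values = list(values)
    return [min(values), max(values)]

print(merge_range([1705011061, 1705011112]))  # [1705011061, 1705011112]
print(merge_range([1705012112]))              # [1705012112, 1705012112]
```

Everything between the two bounds is dropped, so this strategy is lossy as soon as a group holds more than two distinct values.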
Stats
The Stats merging strategy provides statistical metrics on numerical values. In this example, we'll configure the Stats merging strategy on the TTL column:
IP Address | Src Port | Dst Port | Timestamp | TTL |
---|---|---|---|---|
192.168.45.32 | 63543 | 80 | 1705011061 | 31 |
192.168.45.32 | 61542 | 443 | 1705011112 | 20 |
127.0.0.1 | 62322 | 443 | 1705012112 | 63 |

After merging:

IP Address | Src Port | Dst Port | Timestamp | TTL |
---|---|---|---|---|
192.168.45.32 | [63543, 61542] | [80, 443] | [1705011061, 1705011112] | {avg: 25.5, min: 20, max: 31} |
127.0.0.1 | [62322] | [443] | [1705012112, 1705012112] | {avg: 63, min: 63, max: 63} |
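The TTL figures in the table can be reproduced with a short sketch. The `merge_stats` helper is hypothetical, and it computes only the three metrics shown in the example (avg, min, max).

```python
def merge_stats(values):
    """Stats strategy: summarize numeric values as avg, min, and max."""
    values = list(values)
    return {
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }

print(merge_stats([31, 20]))  # {'avg': 25.5, 'min': 20, 'max': 31}
```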
Count
The Count merging strategy simply creates an object with the count of each unique value. Let's apply this strategy to the Src and Dst Port fields:
IP Address | Src Port | Dst Port | Timestamp | TTL |
---|---|---|---|---|
192.168.45.32 | 63543 | 80 | 1705011061 | 31 |
192.168.45.32 | 61542 | 443 | 1705011112 | 20 |
127.0.0.1 | 62322 | 443 | 1705012112 | 63 |

After merging:

IP Address | Src Port | Dst Port | Timestamp | TTL |
---|---|---|---|---|
192.168.45.32 | {63543: 1, 61542: 1} | {80: 1, 443: 1} | [1705011061, 1705011112] | {avg: 25.5, min: 20, max: 31} |
127.0.0.1 | {62322: 1} | {443: 1} | [1705012112, 1705012112] | {avg: 63, min: 63, max: 63} |
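In Python, the Count strategy maps naturally onto `collections.Counter`. The sketch below uses a hypothetical `merge_count` helper; the second call uses made-up values to show how repeated values collapse.

```python
from collections import Counter

def merge_count(values):
    """Count strategy: map each unique value to how often it occurred."""
    return dict(Counter(values))

print(merge_count([80, 443]))       # {80: 1, 443: 1}
print(merge_count([443, 443, 80]))  # {443: 2, 80: 1}
```

Unlike Array, this keeps how often each value occurred but not which original log it came from.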
To summarize, when LogSlash processes a log format it has not seen before, it generates merging strategies based on the semantic data types and uses them to produce log summary lines. These log summary lines can be interpreted on their own, but they can also be unpacked to their original form within your SIEM or SOAR using plugins provided by cwolves. The merging strategies can then be configured to tune compression to your specific use case. While our algorithm provides these merging strategies out of the box, most often you'll want to customize how each field is interpreted. To maximize compression, make more use of the Count, Stats, and Range strategies, which reduce data to metrics. To decrease compression (up to lossless reduction), use only the Group By and Array strategies. This ensures there is no data loss and the original log data can be reproduced identically in your SIEM.
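To make the lossless versus lossy trade-off concrete, here is a small end-to-end sketch. It is not the cwolves implementation or its SIEM plugins: the configuration format (a column-to-strategy dictionary), the field names, and the `summarize` and `unpack` functions are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical strategy table; "group_by" is handled separately below.
STRATEGIES = {
    "array": list,
    "range": lambda v: [min(v), max(v)],
    "stats": lambda v: {"avg": sum(v) / len(v), "min": min(v), "max": max(v)},
    "count": lambda v: dict(Counter(v)),
}

def summarize(rows, config):
    """Merge rows into summary lines; config maps column -> strategy name."""
    group_cols = [c for c, s in config.items() if s == "group_by"]
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups.setdefault(key, []).append(row)
    summaries = []
    for key, members in groups.items():
        line = dict(zip(group_cols, key))
        for col, strategy in config.items():
            if strategy != "group_by":
                line[col] = STRATEGIES[strategy]([m[col] for m in members])
        summaries.append(line)
    return summaries

def unpack(summaries, config):
    """Rebuild rows; only valid when every column is group_by or array."""
    array_cols = [c for c, s in config.items() if s == "array"]
    rows = []
    for line in summaries:
        count = len(line[array_cols[0]]) if array_cols else 1
        for i in range(count):
            rows.append(
                {c: line[c][i] if c in array_cols else line[c] for c in config}
            )
    return rows

logs = [
    {"ip": "192.168.45.32", "src_port": 63543, "ttl": 31},
    {"ip": "192.168.45.32", "src_port": 61542, "ttl": 20},
    {"ip": "127.0.0.1", "src_port": 62322, "ttl": 63},
]
lossless = {"ip": "group_by", "src_port": "array", "ttl": "array"}
lossy = {"ip": "group_by", "src_port": "count", "ttl": "stats"}

roundtrip = unpack(summarize(logs, lossless), lossless)
print(roundtrip == logs)  # True
print(summarize(logs, lossy)[0])
# {'ip': '192.168.45.32', 'src_port': {63543: 1, 61542: 1}, 'ttl': {'avg': 25.5, 'min': 20, 'max': 31}}
```

Note that this sketch returns rows grouped by key, so logs from different groups that were interleaved in the input would come back reordered unless an ordering field such as a timestamp is also kept as an Array column.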
A Note on Reporting Requirements
While you should always be careful with lossy reduction of your log data, many people are misinformed about the true nature of their industry's regulatory requirements. To oversimplify, most of the time the requirement is that enough information is retained to show that a specific event happened. This is not the same as capturing all of the information about that event. We're happy to discuss your reporting requirements and make a recommendation based on our many years of experience in the industry.
Another Note, On Correlated Backups
We believe correlated backups should be the gold standard in logging. Logs that enter your SIEM should have just enough information to trigger alerts and provide crucial data for analysis, while the original data should be stored off-platform in cheap storage solutions. By assigning IDs to log summary lines that correlate with IDs assigned to the original data, you can always get more information when needed. If you're interested in this approach, we'd love to discuss it with you. Please reach out to support@cwolves.com!