🐺cwolves

cwolves manages the implementation of the LogSlash log reduction method.

LogSlash is an intelligent log reduction method that automatically generates log-merging strategies to reduce the data size of various log formats. These strategies can be configured to increase or decrease compression, down to a lossless reduction from which the original logs can be reproduced exactly. Let's break that down:

"LogSlash is an Intelligent Log Reduction Method"

This tool takes any set, or stream, of logs and applies transformations to reduce them into log summary lines. These summary lines are interpretable; they can provide value on their own. However, they can also be unpacked in your SIEM to reproduce the original data.

"Automatically Generates Log-Merging Strategies"

This part of the process explains how we take a set of logs and converge them into a set of log summary lines. First, our algorithm analyzes a sample of the logs and identifies the data type of each column. We are not talking about data types in the classical sense (e.g., string, integer); we mean the semantic meaning of the data. For example, a string containing an IP address should be treated differently from a string containing an error message, and an epoch time value should be treated differently from a TTL value. Once the algorithm understands the data it's dealing with, it selects an appropriate merging strategy with which to converge the data. There are only a few types of merging strategies, each described below.
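To make the type-detection step more concrete, here is a minimal Python sketch of how a column's semantic type might be guessed from a sample and mapped to a merging strategy. Everything in it is an assumption made for illustration: the function names, the type labels, the thresholds, and the type-to-strategy mapping are hypothetical and do not describe the actual LogSlash algorithm.

```python
import ipaddress

# Hypothetical mapping from semantic type to a default merging strategy.
# The real LogSlash defaults are not documented here.
DEFAULT_STRATEGY = {
    "ip_address": "group_by",
    "epoch_time": "range",
    "ttl": "stats",
    "port": "array",
    "text": "array",
}


def is_ip(value):
    """Return True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def infer_semantic_type(name, samples):
    """Guess a column's semantic type from its name and sample values."""
    values = [str(v) for v in samples]
    if values and all(is_ip(v) for v in values):
        return "ip_address"
    if values and all(v.isdigit() for v in values):
        numbers = [int(v) for v in values]
        # Illustrative threshold: ten-digit integers look like epoch seconds.
        if all(1_000_000_000 <= n < 10_000_000_000 for n in numbers):
            return "epoch_time"
        if "ttl" in name.lower() and all(0 <= n <= 255 for n in numbers):
            return "ttl"
        if "port" in name.lower() and all(0 <= n <= 65535 for n in numbers):
            return "port"
    return "text"


def choose_strategy(name, samples):
    """Pick the default merging strategy for a column."""
    return DEFAULT_STRATEGY[infer_semantic_type(name, samples)]


columns = {
    "IP Address": ["192.168.45.32", "127.0.0.1"],
    "Timestamp": [1705011061, 1705011112],
    "TTL": [31, 20, 63],
    "Message": ["connection reset by peer"],
}
for name, samples in columns.items():
    print(name, "->", choose_strategy(name, samples))
```

The point of the sketch is that two columns with the same classical type (both Timestamp and TTL are integers) can end up with different strategies because they mean different things.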

Group By

The Group By merging strategy is essential. It defines the columns that log summary lines converge on: when multiple logs have identical values in their Group By column(s), they converge to one line. Here is an example where the Group By merging strategy is applied to an IP Address field (ignore the other fields for now):

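| IP Address | Src Port | Dst Port |
| --- | --- | --- |
| 192.168.45.32 | 63543 | 443 |
| 192.168.45.32 | 61542 | 80 |
| 127.0.0.1 | 62322 | 443 |

| IP Address | Src Port | Dst Port |
| --- | --- | --- |
| 192.168.45.32 | [63543, 61542] | [443, 80] |
| 127.0.0.1 | [62322] | [443] |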

As you can see, the logs were reduced from three to two lines by converging logs with identical IP Address values to a single log line.
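As a rough illustration of this convergence step, the following Python sketch buckets the three example logs by their Group By column. The function name `group_logs` and the dictionary keys are made up for this example; it is a sketch of the idea, not the cwolves implementation.

```python
from collections import defaultdict

logs = [
    {"ip": "192.168.45.32", "src_port": 63543, "dst_port": 443},
    {"ip": "192.168.45.32", "src_port": 61542, "dst_port": 80},
    {"ip": "127.0.0.1", "src_port": 62322, "dst_port": 443},
]


def group_logs(rows, group_by):
    """Bucket rows by the tuple of their Group By column values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by)
        groups[key].append(row)
    return dict(groups)


groups = group_logs(logs, ["ip"])
print(len(logs), "->", len(groups))  # 3 -> 2
```

Each bucket becomes one log summary line; how the remaining columns inside a bucket are combined is decided by the other merging strategies.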

Array

The Array merging strategy records every value and stores them as a list in the log summary line. We can see this merging strategy in action in a variation of the example above, where the IP Address column follows the Group By merging strategy, while the Src Port and Dst Port columns follow the Array merging strategy:

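| IP Address | Src Port | Dst Port |
| --- | --- | --- |
| 192.168.45.32 | 63543 | 80 |
| 192.168.45.32 | 61542 | 443 |
| 127.0.0.1 | 62322 | 443 |

| IP Address | Src Port | Dst Port |
| --- | --- | --- |
| 192.168.45.32 | [63543, 61542] | [80, 443] |
| 127.0.0.1 | [62322] | [443] |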
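The Array strategy is small enough to sketch in a few lines of Python. The helper name `merge_array` and the field names are assumptions for illustration only.

```python
def merge_array(values):
    """Array strategy: keep every value, in arrival order."""
    return list(values)


# The two logs that share the Group By value 192.168.45.32.
group = [
    {"ip": "192.168.45.32", "src_port": 63543, "dst_port": 80},
    {"ip": "192.168.45.32", "src_port": 61542, "dst_port": 443},
]

summary = {
    "ip": group[0]["ip"],
    "src_port": merge_array(row["src_port"] for row in group),
    "dst_port": merge_array(row["dst_port"] for row in group),
}
print(summary)
# {'ip': '192.168.45.32', 'src_port': [63543, 61542], 'dst_port': [80, 443]}
```

Because the arrays keep arrival order, position 0 in `src_port` and position 0 in `dst_port` both come from the same original log.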

Range

The Range merging strategy simply records an array of the minimum and maximum values seen. In this example, we add a Timestamp column and merge it using the Range merging strategy:

| IP Address | Src Port | Dst Port | Timestamp |
| --- | --- | --- | --- |
| 192.168.45.32 | 63543 | 80 | 1705011061 |
| 192.168.45.32 | 61542 | 443 | 1705011112 |
| 127.0.0.1 | 62322 | 443 | 1705012112 |

| IP Address | Src Port | Dst Port | Timestamp |
| --- | --- | --- | --- |
| 192.168.45.32 | [63543, 61542] | [80, 443] | [1705011061, 1705011112] |
| 127.0.0.1 | [62322] | [443] | [1705012112, 1705012112] |
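A minimal Python sketch of the Range strategy, applied to the Timestamp values of each group in the example. The helper name `merge_range` is hypothetical.

```python
def merge_range(values):
    """Range strategy: keep only the smallest and largest value seen."""
    values = list(values)
    return [min(values), max(values)]


# Timestamp values per Group By key, taken from the example table.
timestamps_by_ip = {
    "192.168.45.32": [1705011061, 1705011112],
    "127.0.0.1": [1705012112],
}

ranges = {ip: merge_range(ts) for ip, ts in timestamps_by_ip.items()}
print(ranges["192.168.45.32"])  # [1705011061, 1705011112]
print(ranges["127.0.0.1"])      # [1705012112, 1705012112]
```

With three or more timestamps in a group, the values between the minimum and maximum would be discarded, which is why Range is a lossy strategy.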

Stats

The Stats merging strategy provides statistical metrics on numerical values. In this example, we'll configure the Stats merging strategy on the TTL column:

| IP Address | Src Port | Dst Port | Timestamp | TTL |
| --- | --- | --- | --- | --- |
| 192.168.45.32 | 63543 | 80 | 1705011061 | 31 |
| 192.168.45.32 | 61542 | 443 | 1705011112 | 20 |
| 127.0.0.1 | 62322 | 443 | 1705012112 | 63 |

| IP Address | Src Port | Dst Port | Timestamp | TTL |
| --- | --- | --- | --- | --- |
| 192.168.45.32 | [63543, 61542] | [80, 443] | [1705011061, 1705011112] | {avg: 25.5, min: 20, max: 31} |
| 127.0.0.1 | [62322] | [443] | [1705012112, 1705012112] | {avg: 63, min: 63, max: 63} |
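The TTL figures in the example can be reproduced with a short Python sketch. The helper name `merge_stats` is an assumption, and the set of metrics (average, minimum, maximum) is simply the one shown in the table; the real strategy may compute others.

```python
from statistics import mean


def merge_stats(values):
    """Stats strategy: replace numeric values with summary metrics."""
    values = list(values)
    return {"avg": mean(values), "min": min(values), "max": max(values)}


print(merge_stats([31, 20]))  # {'avg': 25.5, 'min': 20, 'max': 31}
print(merge_stats([63]))      # {'avg': 63, 'min': 63, 'max': 63}
```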

Count

The Count merging strategy simply creates an object with the count of each unique value. Let's apply this strategy to the Src and Dst Port fields:

| IP Address | Src Port | Dst Port | Timestamp | TTL |
| --- | --- | --- | --- | --- |
| 192.168.45.32 | 63543 | 80 | 1705011061 | 31 |
| 192.168.45.32 | 61542 | 443 | 1705011112 | 20 |
| 127.0.0.1 | 62322 | 443 | 1705012112 | 63 |

| IP Address | Src Port | Dst Port | Timestamp | TTL |
| --- | --- | --- | --- | --- |
| 192.168.45.32 | {63543: 1, 61542: 1} | {80: 1, 443: 1} | [1705011061, 1705011112] | {avg: 25.5, min: 20, max: 31} |
| 127.0.0.1 | {62322: 1} | {443: 1} | [1705012112, 1705012112] | {avg: 63, min: 63, max: 63} |
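The Count strategy maps naturally onto Python's `collections.Counter`. The helper name `merge_count` is hypothetical; the sketch only shows the shape of the result.

```python
from collections import Counter


def merge_count(values):
    """Count strategy: map each unique value to how often it occurred."""
    return dict(Counter(values))


print(merge_count([63543, 61542]))  # {63543: 1, 61542: 1}
print(merge_count([80, 443]))       # {80: 1, 443: 1}
print(merge_count([443, 443, 80]))  # {443: 2, 80: 1}
```

The last call shows where Count starts to pay off: repeated values collapse into a single key, at the cost of losing which log each value came from.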

To summarize, when LogSlash processes a kind of log it has never seen before, it generates merging strategies based on the semantic data types and uses them to produce log summary lines. These log summary lines can be interpreted on their own, but they can also be unpacked to their original form within your SIEM or SOAR using plugins provided by cwolves.

These merging strategies can then be configured to tune compression to your specific use case. While our algorithm provides merging strategies out of the box, most often you'll want to customize how each field is interpreted. To maximize compression, increase the use of the Count, Stats, and Range strategies, which reduce data to metrics. To decrease compression (down to lossless reduction), use only the Group By and Array strategies. This ensures that no data is lost and that the original log data can be identically reproduced in your SIEM.
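The lossless case can be checked with a small round trip. The Python sketch below summarizes the example logs using only Group By and Array, then unpacks the summary lines again. The function names `summarize` and `unpack` and the field names are assumptions for illustration; the actual unpacking is done by the cwolves plugins, whose interface is not shown here.

```python
from collections import defaultdict

logs = [
    {"ip": "192.168.45.32", "src_port": 63543, "dst_port": 80, "ttl": 31},
    {"ip": "192.168.45.32", "src_port": 61542, "dst_port": 443, "ttl": 20},
    {"ip": "127.0.0.1", "src_port": 62322, "dst_port": 443, "ttl": 63},
]


def summarize(rows, group_by, array_cols):
    """Converge rows on the Group By columns; keep other columns as arrays."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[col] for col in group_by)].append(row)
    summaries = []
    for key, members in groups.items():
        line = dict(zip(group_by, key))
        for col in array_cols:
            line[col] = [member[col] for member in members]
        summaries.append(line)
    return summaries


def unpack(summaries, group_by, array_cols):
    """Rebuild one row per array position in each summary line."""
    rows = []
    for line in summaries:
        # Arrays are index-aligned: position i in each array is one original log.
        for values in zip(*(line[col] for col in array_cols)):
            row = {col: line[col] for col in group_by}
            row.update(zip(array_cols, values))
            rows.append(row)
    return rows


group_by = ["ip"]
array_cols = ["src_port", "dst_port", "ttl"]
summaries = summarize(logs, group_by, array_cols)
restored = unpack(summaries, group_by, array_cols)
print(len(summaries))    # 2
print(restored == logs)  # True
```

In this sketch, rows within a group come back in their original order, but rows from different groups are not interleaved the way they arrived. Swapping any Array column for Count, Stats, or Range would make the round trip impossible, since those strategies discard per-log values.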

A Note on Reporting Requirements

While you should always be careful with lossy reduction of your log data, many people are misinformed about the true nature of their industry's regulatory requirements. To oversimplify, the requirement is usually that enough information is retained to show that a specific event happened. This is not the same as capturing all of the information about that event. We're happy to discuss your reporting requirements and make a recommendation based on our many years of experience in the industry.

Another Note, On Correlated Backups

We believe correlated backups should be the gold standard in logging. Logs that enter your SIEM should have just enough information to trigger alerts and provide crucial data for analysis, while the original data should be stored off-platform in cheap storage solutions. By assigning IDs to log summary lines that correlate with IDs assigned to the original data, you can always get more information when needed. If you're interested in this approach, we'd love to discuss it with you. Please reach out to support@cwolves.com!
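One simple way to picture the ID correlation is to stamp a summary line and the original logs behind it with the same identifier. The Python sketch below does this with a random UUID. The function name `correlate` and the `summary_id` field are hypothetical, and sharing one ID per summary line is just one possible scheme, not necessarily the one cwolves uses.

```python
import uuid


def correlate(summary_line, original_logs):
    """Tag a summary line and its source logs with a shared ID."""
    summary_id = str(uuid.uuid4())
    tagged_summary = {**summary_line, "summary_id": summary_id}
    archived = [{**log, "summary_id": summary_id} for log in original_logs]
    return tagged_summary, archived


originals = [
    {"ip": "192.168.45.32", "src_port": 63543, "dst_port": 80},
    {"ip": "192.168.45.32", "src_port": 61542, "dst_port": 443},
]
summary = {"ip": "192.168.45.32", "dst_port": {80: 1, 443: 1}}

# tagged_summary goes to the SIEM; archived goes to cheap off-platform storage.
tagged_summary, archived = correlate(summary, originals)
```

An analyst who needs more detail than the summary line provides can then look up `summary_id` in the archive to retrieve the full original records.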
