Ingesting PCAP Files with Zeek and Splunk

Whenever I want to analyze a relatively small packet capture (PCAP), I load it in Wireshark and get the job done. However, this process does not scale and becomes a problem with large pcap files. Even if you split, let's say, a 25-terabyte PCAP into smaller chunks, can you imagine how long the analysis would take using Wireshark alone?

This blog post will look at a simple data pipeline for reading and ingesting pcap files into a Security Information and Event Management (SIEM) platform using Zeek and Splunk.

Getting Started

I will assume you have the following software installed and configured for your environment. We will cover some configuration details, but this article assumes some prior familiarity with these tools:

– Zeek
– Splunk Universal Forwarder (UF)
– Splunk Indexer (SIEM)
– Tcpdump / Editcap

The examples will use a sample PCAP from Malware Traffic Analysis containing Log4j attacks against a web server and other scanning traffic.

Splunk UF Configuration

Make sure you have the Technical Add-on (TA) for Zeek built by Corelight installed on the respective tier(s) of your Splunk architecture. If you are not sure, Splunk details some best practices here. For example, I have the TA installed on the UF, Search Head, and Search Peer in my lab setup. The TA will help normalize your Zeek data to match the Splunk Common Information Model (CIM).

Create or edit the Zeek TA on the Splunk UF system so that inputs.conf points to the location of your Zeek log directory. In the example inputs configuration below, the first monitor stanza tells the UF to ship any logs it finds under my system's Zeek log directory. The second stanza monitors an arbitrary zeek folder I created under my user account.

If you have the Zeek TA installed on different tiers, you only need to make the input configuration file changes on the Splunk Universal Forwarder system.

Note: Remember configuration file changes for apps go in $SPLUNK_HOME/etc/apps/<app_name>/local
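A hedged sketch of what that inputs.conf might look like (the paths, index name, and sourcetype below are assumptions for illustration; use the values your Zeek TA and environment expect):

```ini
# $SPLUNK_HOME/etc/apps/<app_name>/local/inputs.conf

# Ship logs from the system Zeek log directory (path is an assumption)
[monitor:///opt/zeek/logs/current]
index = zeek
sourcetype = bro:json
disabled = 0

# Ship logs from an arbitrary zeek folder under a user account
[monitor:///home/analyst/zeek]
index = zeek
sourcetype = bro:json
disabled = 0
```

The index named here must exist on the indexer, which we will come back to later.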



Lastly, ensure the UF is configured to ship logs to your SIEM. You can check your running configuration with:

$ splunk show config outputs

If you want to see what is stored on the disk, you can also use the following command:

$ splunk btool outputs list

Reading a Single PCAP with Zeek

If we want to read a single pcap file, we can use the following command:

$ zeek -C -r mta-log4j-training.pcap LogAscii::use_json=T

The command above tells Zeek to ignore invalid packet checksums (-C) and read (-r) the pcap file, writing its logs in JSON format. The log files are written to the current directory, and if you configure the UF correctly, they will be shipped to your SIEM for analysis.

Splitting Large PCAP files

We need to determine how to split a large pcap file into more manageable pieces. For instance, if you have a 1 TB pcap, do you want ten files of 100 gigabytes (GB) apiece or a thousand 1 GB files? On the other hand, maybe you want each file even smaller.
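The chunk count follows directly from the arithmetic; a quick sanity check in the shell:

```shell
# How many chunks a capture yields at a given chunk size (both in GB);
# the 1 TB / 1 GB figures mirror the example above
total_gb=1000
chunk_gb=1
echo "$(( total_gb / chunk_gb )) files"   # prints: 1000 files
```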

The first way to get this done is to use tcpdump:

$ tcpdump -n -r mta-log4j-training.pcap -w mta-log4j-split -C 1

The command line above tells tcpdump not to resolve addresses to names (-n), to read (-r) the pcap file, and to split (-C) it into several files of roughly one megabyte each (tcpdump's -C unit is 1,000,000 bytes) that will, in aggregate, equal the size of the unified pcap. Tcpdump writes (-w) each file to disk using the prefix 'mta-log4j-split' and appends a number to each subsequent file to avoid name collisions (mta-log4j-split, mta-log4j-split1, mta-log4j-split2, and so on).


Be aware that tcpdump splits a pcap file with no concern for session or packet boundaries. This behavior can affect the accuracy of the data reported by Zeek, and you may have trouble getting Wireshark to read the files produced by the split.

The second method uses editcap to split the pcap file and, unlike tcpdump, respects packet boundaries. Editcap comes packaged with Wireshark and splits based on packet count using a lowercase -c. Watch the case: an uppercase -C chops bytes from each packet rather than splitting the file.

$ editcap -c 1000 mta-log4j-training.pcap mta-log4j-split.pcap

The command line above will produce a series of files containing at most one thousand packets each, with a sequence number and the timestamp of the first packet in each chunk inserted into the output filenames.


There are quite a few tools available to split up large PCAP files. Another one I would recommend is SplitCap, which can split pcap files based on each unique TCP/UDP session. SplitCap is a Windows tool, but it can run on Linux using Mono.

Reading Multiple PCAP Files with Zeek

So now that we have our PCAP file split into more manageable pieces, we can create a bash script that will read each of these files and help you keep track of the progress.

#!/usr/bin/env bash

# Directory to search, filename prefix to match, where to move parsed
# files, and the zeek binary to use -- adjust these for your environment
LOCATION="."
PREFIX="mta-log4j-split"
DONE_DIR="parsed"
ZEEK="zeek"

# Create directory to store parsed files
if [[ ! -d $DONE_DIR ]]; then
  mkdir "$DONE_DIR"
fi

# -maxdepth 1 keeps already-parsed files in $DONE_DIR from matching again
for i in $(find "$LOCATION" -maxdepth 1 -iname "${PREFIX}*" 2>/dev/null | sort); do
  echo "Zeek Parsing: $i"
  "$ZEEK" -C -r "$i" LogAscii::use_json=T local
  mv "$i" "$DONE_DIR"
done
Note: This script does not come with any warranty, and you are urged to test it thoroughly before employing it in production.

The shell script above sorts the list of split pcap files found under the LOCATION you specify that match your PREFIX. Each file is then passed to Zeek for processing, and completed files are automatically moved to a separate directory. So it should be reasonably safe to re-run the script if it is interrupted for some reason.

Splunk Indexer Configuration

On the Splunk Indexer or Search Peer Cluster, you should have created the index that is listed in the inputs configuration file we looked at earlier. I would also recommend installing the TA for Zeek there. If you don't already have data flowing to your indexer, remember to check your firewall rules and ensure the receiving port is enabled. The default port is 9997.
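If receiving is not yet enabled, a minimal sketch of the setting on a standalone indexer (the file location may differ in a clustered deployment):

```ini
# $SPLUNK_HOME/etc/system/local/inputs.conf on the indexer
[splunktcp://9997]
disabled = 0
```

Equivalently, running `splunk enable listen 9997` on the indexer accomplishes the same thing.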

Carving vs Ingesting Large PCAP Files

Carving a packet capture file means extracting actionable data from the pcap. Using the one-terabyte pcap file example, we can carve out specific information for analysis if we know the timeframes, artifacts, protocols, or other information that can help with scoping. Both tcpdump and editcap can carve the data we are interested in out of a large pcap and reduce it to a much more manageable file. Below are two examples of carving the data you need from a large pcap.

$ tcpdump -n -r large.pcap -w small.pcap 'tcp and port 22'

The command above will read the large pcap file and extract all the traffic where the source or destination port is 22, and the protocol is TCP.

Next, we take a look at using editcap to scope a packet capture.

$ editcap -F pcap -v -A "2021-08-15 14:00:00" -B "2021-08-15 15:00:00" large.pcap small.pcap

The command above will carve out one hour of traffic from the specified date in the large pcap and store it in the small pcap. Then you can perform analysis on this much smaller subset of the pcap file.

The takeaway here is to assess whether you can test your hypothesis by carving the data out of the pcap, forgoing the overhead of parsing and ingesting huge pcap files. If carving is not feasible, or you don't have the data points necessary to make it viable, then the pipeline described in this post is warranted.

Bonus: Using Security Onion for PCAP Ingest

Security Onion (SO) is an open-source Linux distribution for threat hunting, enterprise security monitoring, and log management. You can use SO to ingest pcap files with the so-import-pcap tool. Do not use your production SO system to ingest pcap files; instead, build and keep a standalone SO system specifically for importing them. When running the Security Onion setup, choose the IMPORT installation type, which exists for precisely this use case.

Single PCAP file import:

$ sudo so-import-pcap mta-log4j-training.pcap

Multiple PCAP file import:

$ sudo so-import-pcap ./mta-log4j-split0000 ./mta-log4j-split0001 ./mta-log4j-split0002

Whenever you ingest a pcap file using the so-import-pcap tool, a hunt link is generated. However, when you import multiple pcap files, the hunt link is only generated for the last file. To avoid this issue, the SO team recommends using a for loop to ingest multiple pcap files.
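A minimal sketch of such a loop. The IMPORT_CMD variable is not part of Security Onion; it is a hypothetical override added here only so the loop logic can be dry-run (IMPORT_CMD=echo) on a system without so-import-pcap:

```shell
#!/usr/bin/env bash
# Import each split pcap individually so every file gets its own hunt link.
# IMPORT_CMD is a hypothetical override for dry-running; it defaults to the
# real so-import-pcap invocation.
IMPORT_CMD="${IMPORT_CMD:-sudo so-import-pcap}"

shopt -s nullglob   # skip the loop entirely if no files match
for f in ./mta-log4j-split*; do
  $IMPORT_CMD "$f"
done
```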

Security Onion uses the Elastic Stack, so you will not be ingesting the pcap data into Splunk by default. However, you can install a Splunk Universal Forwarder to monitor the Zeek log directory, and when you import pcap files, they will be shipped to both your Splunk instance and the local Elastic Stack. Using Security Onion also gives you the benefit of importing pcap data into an ecosystem designed for blue-team operations.

That’s it. If you have any questions feel free to reach out.

Thanks for reading.
