Migrating Splunk Storage to S3 SmartStore

Splunk is software that lets you index, search, analyze, and visualize data at scale. I use it to ingest logs from my honeypots, homelab, and other projects. Initially, I ran Splunk in a local virtual machine, which worked fine apart from the downtime, so I soon migrated it to an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance. Migrating to the cloud was also a forcing function to increase my knowledge of AWS. I learned something new about the ecosystem almost daily, and cost optimization was one of the most valuable lessons. What follows is a short guide on how I transitioned an existing Splunk deployment to S3 SmartStore.

Data Storage

Splunk technical documentation currently recommends that EC2 instances using Elastic Block Store (EBS) use gp2 volumes of at least 300GB. When configuring an instance, the default volume type is gp2, and the baseline IOPS scales with the size of the volume at 3 IOPS per GB. Splunk requires at least 800 IOPS, so, not surprisingly, a 300GB volume gives us 900 IOPS.
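
The 900 IOPS figure follows from how gp2 scales. Here is a minimal sketch of that arithmetic, assuming the published gp2 baseline of 3 IOPS per GiB with a floor of 100 and a cap of 16,000. The function name is my own, not an AWS API.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Approximate baseline IOPS for a gp2 volume of the given size."""
    # gp2 scales at 3 IOPS per GiB, with a floor of 100 and a cap of 16,000.
    return max(100, min(3 * size_gib, 16_000))


for size in (300, 500):
    print(f"{size} GiB -> {gp2_baseline_iops(size)} IOPS")
```

A 300GB volume clears the 800 IOPS requirement with a little headroom, which is presumably why that size is the recommended minimum.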

I started with a 500GB gp2 EBS volume since I had an index with over 300GB of existing data. The problem, I soon realized, was that I would quickly need more storage as the amount of data increased. Splunk does have a configuration option (maxVolumeDataSizeMB) to limit aggregate index storage, but using it meant accepting data loss and limited retention.

After some research, I realized I could use the Splunk SmartStore feature to decouple and scale storage. Thankfully, this isn’t an all-or-nothing process; a hybrid approach is just a configuration choice. The rest of this post highlights some changes I needed to make for the migration. Remember that this isn’t a standalone tutorial, so I’ve included reference links as much as possible.

S3 Bucket Creation

To use Splunk SmartStore, first select a compatible cloud storage service. I chose Amazon S3 because I already use AWS and want to leverage Identity and Access Management (IAM) roles. Amazon has detailed documentation on creating an S3 bucket; the one thing I will point out is that the bucket must be in the same region as your EC2 instance. The current S3 bucket defaults are reasonably secure, but adjust them based on your use case. I strongly recommend keeping “Block all public access” enabled.

S3 IAM Role Creation

Since we are using an EC2 instance, we can leverage IAM roles instead of managing API keys. We will need to create an IAM role for an AWS Service. The IAM policy below includes the required permissions for SmartStore S3 operations. Remember to replace “bucket-name” with your actual bucket name. You may be tempted to use the prebuilt AmazonS3FullAccess policy, but please understand that this will likely give your Splunk EC2 instance full access to every S3 bucket in your account.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SplunkSmartStore",
      "Effect": "Allow",
      "Action": [ ... ],
      "Resource": "arn:aws:s3:::bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [ ... ],
      "Resource": "arn:aws:s3:::bucket-name/*"
    }
  ]
}

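The policy has two statements because bucket-level actions apply to the bucket ARN, while object-level actions apply to the keys inside it. Here is a rough sketch of generating such a policy for a given bucket. The helper name and the action lists are illustrative assumptions on my part; check the Splunk documentation for the exact permissions SmartStore requires.

```python
import json


def smartstore_policy(bucket, bucket_actions, object_actions):
    """Build a two-statement IAM policy scoped to a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SplunkSmartStore",
                "Effect": "Allow",
                "Action": list(bucket_actions),
                # Bucket-level actions apply to the bucket ARN itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": list(object_actions),
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Illustrative action lists only (an assumption, not the authoritative set).
policy = smartstore_policy(
    "splunkbucket",
    bucket_actions=["s3:ListBucket"],
    object_actions=["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
)
print(json.dumps(policy, indent=2))
```

Scoping both Resource entries to one bucket is what keeps this narrower than AmazonS3FullAccess.
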
SmartStore Configuration

When enabling SmartStore, we need to edit the indexes.conf file to reflect the settings shown below. The location of this configuration file will vary based on your deployment type and how you deploy configurations. If you require more details, Splunk has excellent documentation on how to set up SmartStore.

WARNING: According to Splunk documentation, at the time of this writing, you cannot revert an index that has been converted to using SmartStore.

# indexes.conf
# Enable SmartStore
[volume:remote_store]
storageType = remote
path = s3://data
remote.s3.bucket_name = splunkbucket

Let’s assume the S3 bucket I created for the SmartStore migration is called splunkbucket, and I made a directory within this bucket called data. That is where the values of the last two configuration options come from. I’m using the v2 addressing scheme for the path setting. If you see configurations with the bucket name included in the path setting, they are using v1 addressing, which Amazon will eventually deprecate.
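
To make the mapping concrete, here is a small sketch that reads a volume stanza and reports which bucket and directory it points to. It assumes the stanza is named [volume:remote_store], which I’m inferring from the remotePath setting used later, and the helper name is hypothetical.

```python
import configparser

CONF = """
[volume:remote_store]
storageType = remote
path = s3://data
remote.s3.bucket_name = splunkbucket
"""


def remote_location(conf_text, volume="remote_store"):
    """Return (bucket, prefix) for a SmartStore volume stanza."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep Splunk's case-sensitive setting names
    parser.read_string(conf_text)
    stanza = parser[f"volume:{volume}"]
    # With v2 addressing the bucket comes from its own setting and the
    # path holds only the directory inside the bucket.
    prefix = stanza["path"].split("://", 1)[1].strip("/")
    return stanza["remote.s3.bucket_name"], prefix


bucket, prefix = remote_location(CONF)
print(f"bucket={bucket} prefix={prefix}")
```

This only parses the text; it does not contact S3 or validate that the bucket exists.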

Now let’s turn our attention to the index stanzas, which are also in the indexes.conf file. If you use local storage without volumes, your configuration should be similar to what I have below.

# Indexes
[wineventlog]
homePath   = $SPLUNK_DB/wineventlog/db
coldPath   = $SPLUNK_DB/wineventlog/colddb
thawedPath = $SPLUNK_DB/wineventlog/thaweddb
disabled   = false

[nsm]
homePath   = $SPLUNK_DB/nsm/db
coldPath   = $SPLUNK_DB/nsm/colddb
thawedPath = $SPLUNK_DB/nsm/thaweddb
disabled   = false

[honeypot]
homePath   = $SPLUNK_DB/honeypot/db
coldPath   = $SPLUNK_DB/honeypot/colddb
thawedPath = $SPLUNK_DB/honeypot/thaweddb
disabled   = false

Converting the wineventlog index to remote storage simply requires adding the remotePath setting. This one-line addition is really all that is necessary, and after restarting Splunk, you can check that the S3 bucket is receiving the indexed data. Be aware that migrating an index with existing data to SmartStore triggers large data uploads that could affect search and indexing performance.

[wineventlog]
homePath   = $SPLUNK_DB/wineventlog/db
coldPath   = $SPLUNK_DB/wineventlog/colddb
thawedPath = $SPLUNK_DB/wineventlog/thaweddb
remotePath = volume:remote_store/$_index_name
disabled   = false

Note: Remember to restart Splunk after making configuration changes.

If you have additional indexes that you want to use SmartStore, you would add the remotePath line to those index stanzas. For example, I have two Splunk internal indexes configured for SmartStore, shown below. You do not need the other lines for the default indexes.

remotePath = volume:remote_store/$_index_name

remotePath = volume:remote_store/$_index_name
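
Adding the line by hand is fine for a few indexes. If you have many, the edit can be scripted. Below is a rough sketch that appends remotePath to the named stanzas in an indexes.conf text. The helper name is hypothetical, and I would run something like this against a copy of the file first.

```python
REMOTE_PATH = "remotePath = volume:remote_store/$_index_name"


def add_remote_path(conf_text, index_names):
    """Append remotePath to each named index stanza that lacks it."""
    wanted = set(index_names)
    out = []
    current = None       # name of the stanza being read
    has_remote = False   # whether that stanza already sets remotePath

    def flush():
        if current in wanted and not has_remote:
            # Insert before any blank lines that trail the stanza.
            pos = len(out)
            while pos > 0 and out[pos - 1].strip() == "":
                pos -= 1
            out.insert(pos, REMOTE_PATH)

    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            flush()  # finish the previous stanza before starting a new one
            current, has_remote = stripped[1:-1], False
        elif stripped.startswith("remotePath"):
            has_remote = True
        out.append(line)
    flush()  # finish the last stanza in the file
    return "\n".join(out) + "\n"


SAMPLE = """[wineventlog]
homePath   = $SPLUNK_DB/wineventlog/db
disabled   = false

[nsm]
homePath   = $SPLUNK_DB/nsm/db
disabled   = false
"""

result = add_remote_path(SAMPLE, ["wineventlog"])
print(result)
```

Running it a second time on its own output changes nothing, because stanzas that already set remotePath are skipped.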

Purging Orphaned Data

So now that you have all your existing and new data using S3, how do we free up local storage? Thankfully, Splunk has a native command that allows us to clean out indexed data from local storage.

WARNING: This is a dangerous operation, and there is no ‘undo’ or CTRL-Z to get your data back. Triple-check that you have the correct index and that the data has been uploaded to S3 before you attempt this command.

splunk:$ /opt/splunk/bin/splunk clean eventdata -index wineventlog

When this command completes, you should be able to take advantage of the reclaimed space or perform any storage reduction operations you need.

Cost Optimizations

The general idea of cost optimization is to maximize value while reducing costs. In my case, I was using an on-demand EC2 instance with a 500GB EBS volume. I’ve transitioned to using a reserved instance, and with the migration to SmartStore, I reduced my EBS volume to 300GB. These changes resulted in 66% cost savings.

Thanks for reading.
