General Introduction
- Splunk Connect for Kafka is a sink connector that allows a Splunk software administrator to subscribe to a Kafka topic and stream the data to the Splunk HTTP Event Collector. After the Splunk platform indexes the events, you can then directly analyze the data or use it as a contextual data feed to correlate with other Kafka-related data in the Splunk platform.
- There are primarily two methods for getting metrics into Splunk when using agents such as etcd and collectd: either a direct TCP/UDP input or the HTTP Event Collector (HEC). Using HEC is considered a best practice because of the resiliency and scalability of the HEC endpoint and the ability to easily scale the collection tier horizontally (a minimal example follows).
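For context, here is a minimal sketch of posting a single event to a HEC endpoint with curl; the host, token, and sourcetype are placeholders, not values from this setup.
# Send one test event to the HEC event endpoint (default port 8088); -k skips TLS certificate validation
curl -k "https://<splunk_host>:8088/services/collector/event" \
  -H "Authorization: Splunk <HEC_TOKEN>" \
  -d '{"event": "sample metric payload", "sourcetype": "collectd_http"}'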
Splunk Connect For Kafka
Splunk Connect for Kafka introduces a scalable approach to tap into the growing volume of data flowing into Kafka. With the largest Kafka clusters processing over one trillion messages per day and Splunk deployments reaching petabytes ingested per day, this scalability is critical.
Overall, the connector provides:
- High scalability, allowing linear scaling, limited only by the hardware supplied to the Kafka Connect environment
- High reliability by ensuring at-least-once delivery of data to Splunk
- Ease of data onboarding and simple configuration with Kafka Connect framework and Splunk’s HTTP Event Collector
Procedural Flow
Step 1: Start the ZooKeeper Server
Kafka uses ZooKeeper, so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with Kafka to get a quick-and-dirty single-node ZooKeeper instance.
> bin/zookeeper-server-start.sh config/zookeeper.properties
[2013-04-22 15:01:37,495] INFO Reading configuration from: config/zookeeper.properties (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
...
Step 2: Start Kafka Cluster
After starting the ZooKeeper server, the Kafka brokers can run properly without any interruptions:
> bin/kafka-server-start.sh config/server.properties
[2013-04-22 15:01:47,028] INFO Verifying properties (kafka.utils.VerifiableProperties)
[2013-04-22 15:01:47,051] INFO Property socket.send.buffer.bytes is overridden to 1048576 (kafka.utils.VerifiableProperties)
...
Step 3: Create a topic
Let’s create a topic named “test” with a single partition and only one replica:
> bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
We can now see that topic if we run the list topic command:
> bin/kafka-topics.sh --list --bootstrap-server localhost:9092
test
Alternatively, instead of manually creating topics, you can configure your brokers to auto-create topics when a non-existent topic is published to.
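If you rely on auto-creation, the relevant broker settings live in config/server.properties. The sketch below uses illustrative values; the partition and replication defaults are assumptions you should tune for your cluster.
# config/server.properties
# Create missing topics automatically the first time they are referenced
auto.create.topics.enable=true
# Defaults applied to auto-created topics
num.partitions=1
default.replication.factor=1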
How it Works
As a sink connector, Splunk Connect for Kafka takes advantage of the Kafka Connect framework to horizontally scale workers to push data from Kafka topics to Splunk Enterprise or Splunk Cloud. Once the plugin is installed and configured on a Kafka Connect cluster, new tasks will run to consume records from the selected Kafka topics and send them to your Splunk indexers through HTTP Event Collector, either directly or through a load balancer.
Key configurations include:
- Load Balancing by specifying a list of indexers, or by using a load balancer
- Indexer Acknowledgements for guaranteed at-least-once delivery (if using a load balancer, sticky sessions must be enabled)
- Index Routing via the connector configuration by specifying a 1-to-1 mapping of topics to indexes (see the sketch after this list), or via props.conf on the indexers for record-level routing
- Metrics using collectd, raw mode, and the collectd_http pretrained sourcetype
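As a minimal sketch of the 1-to-1 topic-to-index mapping, the relevant connector settings would look like the fragment below; the topic and index names are hypothetical.
"topics": "web_logs,app_metrics",
"splunk.indexes": "web,metrics"
With this mapping, records from web_logs land in the web index and records from app_metrics land in the metrics index.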
Where can I get it?
- The latest release is available on Splunkbase
- The connector is also open source and available on GitHub
Deployment Considerations
- Clone the repo from https://github.com/splunk/kafka-connect-splunk
- Verify that a Java 8 JRE or JDK is installed.
- Run mvn package. This will build the jar in the /target directory. The name will be splunk-kafka-connect-[VERSION].jar.
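Put together, the build steps might look like the following; the exact jar name depends on the version you check out.
> git clone https://github.com/splunk/kafka-connect-splunk.git
> cd kafka-connect-splunk
> mvn package
> ls target/splunk-kafka-connect-*.jar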
First, we need to get our hands on the packaged JAR file (see above) and install it across all Kafka Connect cluster nodes that will be running the Splunk connector. Kafka Connect has two modes of operation: standalone mode and distributed mode. To start Kafka Connect, we run the following command from the $KAFKA_CONNECT_HOME directory, accompanied by the worker properties file connect-distributed.properties. Ensure that your Kafka cluster and ZooKeeper are running.
Establish Connection
Run $KAFKA_HOME/bin/connect-distributed.sh $KAFKA_HOME/config/connect-distributed.properties to start Kafka Connect.
Kafka ships with a few default properties files; however, the Splunk connector requires the worker properties below to function correctly. You can either modify connect-distributed.properties or create a new properties file.
Note the two options that need some extra configuration:
- bootstrap.servers – This is a comma-separated list of where your Kafka brokers are located.
- plugin.path – To make the JAR visible to Kafka Connect, ensure that when Kafka Connect is started, this is set to the folder path where the connector was installed in the earlier section.
Configuration Properties
#These settings may already be configured if you have deployed a connector in your Kafka Connect environment
bootstrap.servers=<BOOTSTRAP_SERVERS>
plugin.path=<PLUGIN_PATH>
#Required
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.flush.interval.ms=10000
#Recommended
group.id=kafka-connect-splunk-hec-sink
Run the following command to create connector tasks. Adjust topics to configure the Kafka topics to be ingested, splunk.indexes to set the destination Splunk indexes, splunk.hec.token to set your HTTP Event Collector (HEC) token, and splunk.hec.uri to the URI of your destination Splunk HEC endpoint.
Splunk Indexing using HEC Endpoint
curl <hostname>:8083/connectors -X POST -H "Content-Type: application/json" -d '{
  "name": "kafka-connect-splunk",
  "config": {
    "connector.class": "com.splunk.kafka.connect.SplunkSinkConnector",
    "tasks.max": "1",
    "topics": "topic_name",
    "splunk.hec.uri": "https://localhost:8088",
    "splunk.hec.token": "1B901D2B-576D-40CD-AF1E-98141B499534",
    "splunk.hec.ack.enabled": "true",
    "splunk.hec.raw": "false",
    "splunk.hec.ssl.validate.certs": "false",
    "splunk.hec.track.data": "true"
  }
}' | jq
Now is a good time to check that Splunk Connect for Kafka has been installed correctly and is ready to be deployed. Verify that data is flowing into your Splunk platform instance by searching the index specified in the configuration. Use the following commands to check status and to manage connectors and tasks:
Useful Commands
# List active connectors
curl http://localhost:8083/connectors
# Get kafka-connect-splunk connector status
curl http://localhost:8083/connectors/kafka-connect-splunk/status | jq
# Get kafka-connect-splunk connector config info
curl http://localhost:8083/connectors/kafka-connect-splunk/config
# Delete kafka-connect-splunk connector
curl http://localhost:8083/connectors/kafka-connect-splunk -X DELETE
# Get kafka-connect-splunk connector task info
curl http://localhost:8083/connectors/kafka-connect-splunk/tasks
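Once the status endpoint shows the connector and its tasks running, a quick sanity check is to run a search against the index you configured; the index name below is a placeholder.
index=<your_index> | head 10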