Build and Configure a Kafka Cluster on AWS EC2: A DevOps Guide
Apache Kafka is a distributed event streaming platform for building real-time data pipelines and streaming applications. Running it on AWS EC2 gives you full control over versions, configuration, and infrastructure, which suits DevOps teams that want to manage the cluster themselves. This guide walks you through the end-to-end setup process.
1. Prerequisites
Before diving in, ensure the following:
An AWS account with permission to launch EC2 instances and configure networking components.
Basic knowledge of SSH, Linux commands, and Kafka architecture.
A VPC, subnets, and security groups already in place (or permission to create them).
2. Architecture Overview
Your Kafka cluster will consist of the following components:
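Zookeeper nodes: 3 nodes (an odd number, so the ensemble keeps a majority quorum if one node fails). Note that Zookeeper mode is deprecated since Kafka 3.5 and removed in Kafka 4.0 in favor of KRaft; this guide targets Kafka 3.7 in Zookeeper mode.
Kafka brokers: At least 3 EC2 instances (required for the replication factor of 3 used below).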
Clients: Producer/consumer applications or test utilities.
Optional monitoring tools: e.g., Prometheus + Grafana.
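Why an odd number? A Zookeeper ensemble keeps serving requests only while a strict majority of its servers is up. The short shell sketch below (the function names are illustrative, not part of Kafka or Zookeeper) works through that arithmetic for small ensembles:

```shell
#!/bin/sh
# Majority (quorum) size for an ensemble of n servers: floor(n/2) + 1.
zk_quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Servers that can fail while a majority remains: n - quorum = floor((n-1)/2).
zk_tolerated_failures() {
  echo $(( $1 - $1 / 2 - 1 ))
}

# Print one line per ensemble size from 1 to 5 servers.
zk_quorum_table() {
  n=1
  while [ "$n" -le 5 ]; do
    echo "servers=$n quorum=$(zk_quorum_size "$n") tolerates=$(zk_tolerated_failures "$n")"
    n=$((n + 1))
  done
}
```

Running zk_quorum_table shows that a 4-node ensemble tolerates only one failure, the same as 3 nodes, while 5 nodes tolerate two. An even node count adds cost without adding fault tolerance.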
3. Step-by-Step Setup
Step 1: Launch EC2 Instances
Choose Amazon Linux 2 or Ubuntu.
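Select an instance type (e.g., t3.medium or larger).
Configure the network: place all nodes in the same VPC. A single availability zone keeps latency low and avoids cross-AZ data transfer charges, while spreading nodes across three zones lets the cluster survive an AZ outage.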
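Open the required ports in the security group, ideally only to other cluster members and trusted clients rather than 0.0.0.0/0:
Kafka: 9092
Zookeeper client: 2181
Zookeeper peer communication and leader election: 2888, 3888
SSH: 22 (restricted to your admin IP range)
(Optional: Prometheus 9090 and Grafana 3000 by default)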
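The port rules above can be applied with the AWS CLI. This is a sketch under stated assumptions: every cluster node shares one security group (the ID is a placeholder you pass in), the internal ports are opened only to members of that same group, and SSH is limited to an admin CIDR you supply. The function prints the commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Print aws ec2 authorize-security-group-ingress commands for the cluster ports.
# Assumptions: $1 = the cluster's security group ID (placeholder, e.g. sg-0123...),
#              $2 = the CIDR allowed to SSH in (e.g. your office or bastion range).
kafka_sg_rules() {
  sg_id="$1"
  admin_cidr="$2"
  # Kafka (9092), Zookeeper client (2181), peer (2888) and leader election (3888):
  # reachable only from instances that belong to the same security group.
  for port in 9092 2181 2888 3888; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --source-group $sg_id"
  done
  # SSH only from the admin range.
  echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port 22 --cidr $admin_cidr"
}
```

For example, kafka_sg_rules sg-0123456789abcdef0 203.0.113.0/24 prints five commands; append | sh to apply them. Producers and consumers running outside the group would need their own rule for 9092.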
Step 2: Install Java
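Kafka requires a Java runtime. Kafka 3.7 supports Java 8, 11, 17, and 21, but Java 8 support has been deprecated since Kafka 3.0, so prefer Java 11 or 17. On each EC2 instance, run the command for your distribution:
Amazon Linux 2:
sudo yum install java-17-amazon-corretto-headless -y
Ubuntu:
sudo apt-get update && sudo apt-get install -y openjdk-17-jre-headless
Then confirm the version:
java -version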
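If you run a mix of Amazon Linux 2 and Ubuntu hosts, the install command can be picked from /etc/os-release. A minimal sketch, assuming Java 17 and only those two distributions (the helper names are hypothetical):

```shell
#!/bin/sh
# Choose a Java 17 install command from the distribution ID in os-release.
# Assumption: only Amazon Linux 2 ("amzn") and Ubuntu ("ubuntu") are handled.
java_install_cmd() {
  case "$1" in
    amzn)   echo "sudo yum install -y java-17-amazon-corretto-headless" ;;
    ubuntu) echo "sudo apt-get update && sudo apt-get install -y openjdk-17-jre-headless" ;;
    *)      echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Read the ID field from an os-release file (defaults to /etc/os-release).
# Sourcing happens in a subshell so its variables do not leak into the caller.
detect_os_id() {
  ( . "${1:-/etc/os-release}" && echo "$ID" )
}
```

On a host: sh -c "$(java_install_cmd "$(detect_os_id)")" && java -version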
Step 3: Download and Install Kafka
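On each instance (Apache moves older releases from downloads.apache.org to archive.apache.org, so use the archive URL for 3.7.0):
wget https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz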
tar -xzf kafka_2.13-3.7.0.tgz
cd kafka_2.13-3.7.0
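Parameterizing the version makes upgrades and repeat installs less error-prone. The sketch below builds the archive.apache.org URL (Apache keeps every release there once newer ones ship) and unpacks the tarball into a directory of your choice; the /opt target and the stable kafka symlink are assumptions, not something Kafka requires.

```shell
#!/bin/sh
# Build the Apache archive URL for a Kafka release.
# Assumption: archive.apache.org keeps each release under dist/kafka/<version>/.
kafka_archive_url() {
  kafka_version="$1"   # e.g. 3.7.0
  scala_version="$2"   # e.g. 2.13
  echo "https://archive.apache.org/dist/kafka/${kafka_version}/kafka_${scala_version}-${kafka_version}.tgz"
}

# Download and unpack a release into a target directory (e.g. /opt), then point a
# stable "kafka" symlink at it so paths in scripts do not change on upgrade.
install_kafka() {
  url="$(kafka_archive_url "$1" "$2")"
  dest="$3"
  archive="${dest}/$(basename "$url")"
  wget -q -O "$archive" "$url" || return 1
  tar -xzf "$archive" -C "$dest" || return 1
  ln -sfn "${dest}/kafka_$2-$1" "${dest}/kafka"
}
```

For example, install_kafka 3.7.0 2.13 /opt, run as a user that can write to /opt, leaves the release in /opt/kafka_2.13-3.7.0 with /opt/kafka pointing at it.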
Step 4: Configure Zookeeper Cluster
On each Zookeeper node, edit config/zookeeper.properties so that it lists all three ensemble members:
tickTime=2000
dataDir=/tmp/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=<zk1_private_ip>:2888:3888
server.2=<zk2_private_ip>:2888:3888
server.3=<zk3_private_ip>:2888:3888
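On each Zookeeper node, create a file named myid in the dataDir (/tmp/zookeeper/myid) containing that node's ID (1, 2, or 3), matching its server.N line. /tmp is fine for a test cluster but may be cleaned on reboot, so use a persistent directory in production.
Start Zookeeper on all three nodes: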
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
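Because every node needs the same server list but a different myid, it helps to generate both from one list of IPs. A minimal sketch (helper names are hypothetical; IDs are assigned in argument order, so the first IP becomes server.1):

```shell
#!/bin/sh
# Print zookeeper.properties for an ensemble.
# Usage: render_zk_properties <data_dir> <zk1_ip> <zk2_ip> <zk3_ip>
render_zk_properties() {
  data_dir="$1"; shift
  printf '%s\n' "tickTime=2000" "dataDir=${data_dir}" "clientPort=2181" \
    "initLimit=5" "syncLimit=2"
  # One server.N line per IP; 2888 = peer port, 3888 = leader election port.
  id=1
  for ip in "$@"; do
    echo "server.${id}=${ip}:2888:3888"
    id=$((id + 1))
  done
}

# Write this node's ID into <data_dir>/myid, creating the directory if needed.
write_myid() {
  mkdir -p "$1" && echo "$2" > "$1/myid"
}
```

Run it on each Zookeeper node with the same IP list, changing only the ID (the 10.0.1.x addresses are placeholders for your private IPs):
render_zk_properties /tmp/zookeeper 10.0.1.10 10.0.1.11 10.0.1.12 > config/zookeeper.properties
write_myid /tmp/zookeeper 1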
Step 5: Configure Kafka Brokers
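On each broker, edit config/server.properties (.properties files do not support trailing comments, so keep comments on their own lines):
# unique per broker
broker.id=1
listeners=PLAINTEXT://<broker_private_ip>:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=<zk1_private_ip>:2181,<zk2_private_ip>:2181,<zk3_private_ip>:2181
As with dataDir, move log.dirs off /tmp for anything beyond testing.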
Once the Zookeeper ensemble is running, start Kafka on each broker:
bin/kafka-server-start.sh -daemon config/server.properties
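The broker file can be generated the same way. This sketch (a hypothetical helper) writes only the four settings shown above and keeps the comment on its own line, because Java .properties files treat # as a comment only at the start of a line.

```shell
#!/bin/sh
# Print a minimal server.properties for one broker.
# Usage: render_broker_properties <broker_id> <broker_private_ip> <log_dir> <zk_ip>...
render_broker_properties() {
  broker_id="$1"; broker_ip="$2"; log_dir="$3"; shift 3
  # Build "zk1:2181,zk2:2181,zk3:2181" from the remaining arguments.
  zk_connect=""
  for ip in "$@"; do
    zk_connect="${zk_connect:+${zk_connect},}${ip}:2181"
  done
  printf '%s\n' \
    "# Must be unique for every broker in the cluster" \
    "broker.id=${broker_id}" \
    "listeners=PLAINTEXT://${broker_ip}:9092" \
    "log.dirs=${log_dir}" \
    "zookeeper.connect=${zk_connect}"
}
```

For example: render_broker_properties 1 10.0.1.20 /tmp/kafka-logs 10.0.1.10 10.0.1.11 10.0.1.12 > config/server.properties, changing the ID and IP on each broker.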
Step 6: Create a Kafka Topic
From one of the broker nodes:
bin/kafka-topics.sh --create --topic test-topic --bootstrap-server <broker_ip>:9092 --replication-factor 3 --partitions 3
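Passing several brokers to --bootstrap-server lets the client connect even if one of them is down, and kafka-topics.sh --describe shows whether each partition got its three replicas. A small sketch (helper names are hypothetical; run describe_topic from the Kafka directory):

```shell
#!/bin/sh
# Join broker IPs into a --bootstrap-server value such as "ip1:9092,ip2:9092,ip3:9092".
bootstrap_servers() {
  list=""
  for ip in "$@"; do
    list="${list:+${list},}${ip}:9092"
  done
  echo "$list"
}

# Show leader, replicas and in-sync replicas (ISR) for every partition of a topic.
describe_topic() {
  bin/kafka-topics.sh --describe --topic "$1" --bootstrap-server "$2"
}
```

For example: describe_topic test-topic "$(bootstrap_servers 10.0.1.20 10.0.1.21 10.0.1.22)". In a healthy cluster each partition lists three replicas, and the Isr column contains all three.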
Step 7: Test Kafka Setup
Producer (type a few messages, one per line):
bin/kafka-console-producer.sh --topic test-topic --bootstrap-server <broker_ip>:9092
Consumer (in a second terminal; the messages should appear):
bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server <broker_ip>:9092
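For a repeatable check (for example after a config change), the console tools can also run non-interactively. A sketch under assumptions: KAFKA_HOME points at the unpacked Kafka directory, the topic already exists, and a 10-second idle timeout is long enough for your cluster. Messages are tagged with a per-run prefix so that earlier messages in the topic are not counted.

```shell
#!/bin/sh
# Round trip through a topic: produce tagged messages, read the topic back,
# and succeed only if every tagged message arrived.
# Usage: kafka_smoke_test <topic> <bootstrap_servers> [count]
kafka_smoke_test() {
  topic="$1"; bootstrap="$2"; count="${3:-5}"
  run_id="smoke-$$"   # per-run tag so older messages in the topic are ignored
  # The console producer sends one message per stdin line and exits at EOF.
  seq 1 "$count" | sed "s/^/${run_id}-/" |
    "$KAFKA_HOME/bin/kafka-console-producer.sh" --topic "$topic" --bootstrap-server "$bootstrap" || return 1
  # Read from the start; the consumer exits after 10 s without new messages.
  received="$("$KAFKA_HOME/bin/kafka-console-consumer.sh" --topic "$topic" \
    --bootstrap-server "$bootstrap" --from-beginning --timeout-ms 10000 2>/dev/null |
    grep -c "^${run_id}-")"
  echo "sent=$count received=$received"
  [ "$received" -eq "$count" ]
}
```

For example: export KAFKA_HOME="$PWD", then kafka_smoke_test test-topic <broker_ip>:9092 5; a non-zero exit status means some messages did not come back.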
4. Hardening and Optimization
Use systemd services for managing Kafka/Zookeeper.
Mount dedicated EBS volumes and point log.dirs and dataDir at them instead of /tmp, so data lives on durable storage.
Enable monitoring via JMX and integrate with Prometheus.
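Tune the JVM (for example, heap size via KAFKA_HEAP_OPTS) and broker settings such as default.replication.factor and min.insync.replicas for production load.
Add security: TLS for encryption in transit, SASL for client authentication, and Kafka ACLs for authorization. IAM-based client authentication is an Amazon MSK feature; on self-managed EC2, use IAM to control access to the AWS resources themselves.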
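For the systemd point above, a minimal unit file might look like the following. Assumptions: Kafka is unpacked at a path you pass in (for example /opt/kafka), it runs as a dedicated kafka user you have created, and the LimitNOFILE value is an illustrative starting point. The -daemon flag is omitted because systemd supervises the foreground process itself; a Zookeeper unit follows the same pattern with zookeeper-server-start.sh.

```shell
#!/bin/sh
# Print a systemd unit for a Kafka broker.
# Usage: render_kafka_unit <kafka_home>
render_kafka_unit() {
  kafka_home="$1"
  cat <<EOF
[Unit]
Description=Apache Kafka broker
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=kafka
ExecStart=${kafka_home}/bin/kafka-server-start.sh ${kafka_home}/config/server.properties
ExecStop=${kafka_home}/bin/kafka-server-stop.sh
Restart=on-failure
LimitNOFILE=100000

[Install]
WantedBy=multi-user.target
EOF
}
```

To install it: render_kafka_unit /opt/kafka | sudo tee /etc/systemd/system/kafka.service, then sudo systemctl daemon-reload && sudo systemctl enable --now kafka.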
5. Automation with Ansible or Terraform
For large-scale deployments, automate instance provisioning and Kafka configuration using:
Terraform to provision the VPC, subnets, EC2 instances, and security groups.
Ansible to install Java and Kafka and to configure systemd units.
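A common hand-off between the two tools is to take the private IPs that Terraform reports and turn them into an Ansible inventory. A sketch, assuming space-separated IP lists; the group names and the zk_id and broker_id host variables are hypothetical and must match whatever your playbooks use:

```shell
#!/bin/sh
# Print an Ansible INI inventory for the cluster.
# Usage: render_inventory "<zk ips>" "<broker ips>"   (space-separated lists)
render_inventory() {
  echo "[zookeeper]"
  n=1
  for ip in $1; do   # intentionally unquoted: split the space-separated list
    echo "${ip} zk_id=${n}"
    n=$((n + 1))
  done
  echo
  echo "[kafka]"
  n=1
  for ip in $2; do   # intentionally unquoted: split the space-separated list
    echo "${ip} broker_id=${n}"
    n=$((n + 1))
  done
}
```

For example: render_inventory "10.0.1.10 10.0.1.11 10.0.1.12" "10.0.1.20 10.0.1.21 10.0.1.22" > inventory.ini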
6. Monitoring and Logging
Use Prometheus JMX Exporter to collect metrics from Kafka and Zookeeper.
Visualize metrics with Grafana.
Forward logs to CloudWatch or an ELK stack.
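The JMX Exporter usually runs as a Java agent inside each broker, added through the KAFKA_OPTS environment variable that Kafka's start scripts pass to the JVM. A sketch under assumptions: the jar and YAML paths are placeholders, port 7071 is an arbitrary choice for the exporter's HTTP endpoint, and the catch-all rule is only a starting point (it exports every MBean).

```shell
#!/bin/sh
# Build the -javaagent option for the Prometheus JMX Exporter.
# Format: -javaagent:<jar>=<http_port>:<config_yaml>
jmx_exporter_opts() {
  jar="$1"; port="$2"; config="$3"
  echo "-javaagent:${jar}=${port}:${config}"
}

# Minimal exporter config exposing every MBean (fine for a first look, noisy later).
render_jmx_exporter_config() {
  printf '%s\n' 'rules:' '  - pattern: ".*"'
}

# Prometheus scrape job for the brokers' exporter endpoints.
# Usage: render_scrape_config <port> <broker_ip>...
render_scrape_config() {
  port="$1"; shift
  echo "scrape_configs:"
  echo "  - job_name: kafka"
  echo "    static_configs:"
  targets=""
  for ip in "$@"; do
    targets="${targets:+${targets}, }'${ip}:${port}'"
  done
  echo "      - targets: [${targets}]"
}
```

Before starting a broker: export KAFKA_OPTS="$(jmx_exporter_opts /opt/jmx/jmx_prometheus_javaagent.jar 7071 /opt/jmx/kafka.yml)", then add the output of render_scrape_config 7071 10.0.1.20 10.0.1.21 10.0.1.22 to prometheus.yml and open port 7071 to the Prometheus server.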
Conclusion
Deploying Kafka on AWS EC2 gives you complete control over the environment and its tuning. It demands more operational effort than managed services such as Amazon MSK (Amazon Managed Streaming for Apache Kafka), but it suits teams that need custom configurations or want direct control over infrastructure costs.
