Apache Kafka : an open-source, distributed publish-subscribe messaging system (message broker) developed under the Apache Software Foundation and written in Scala and Java.
Features of Kafka
High Throughput : Supports millions of messages with modest hardware
Scalability : Highly scalable distributed system; brokers can be added with no downtime
Replication : Messages are replicated across the cluster, so they stay available to multiple subscribers and consumers can be rebalanced in case of failures
Durability : Messages are persisted to disk
Stream Processing : Used with real-time stream processing frameworks like Apache Spark and Apache Storm
Data Loss : With proper configuration, Kafka can ensure zero data loss
Various components of Kafka:
Topic – a stream of messages belonging to the same type
Producer – an application that publishes messages to a topic
Brokers – a set of servers where the published messages are stored
Consumer – an application that subscribes to one or more topics and pulls data from the brokers
Topic :
A topic is like a table in a database, identified by its name.
A topic is split into partitions.
Example: Topic 1 -- Partition 0, Partition 1, Partition 2.
Explain the role of the offset.
Messages contained in the partitions are assigned a unique ID number that is called the offset. The role of the offset is to uniquely identify every message within the partition.
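- Kafka stores the offsets up to which each consumer group has read.
- These offsets are stored in a separate internal topic called "__consumer_offsets".
- Once a consumer has processed the data it received from Kafka, it should commit the offsets.
- An offset is specific to a partition: offset 3 in partition 1 and offset 3 in partition 2 refer to different messages.
- Offsets are ordered only within a partition, not across partitions.
- Data and committed offsets are kept for only one week by default; both retention periods are configurable.
- Once data is written it cannot be changed (immutable).
- If no key is specified, the producer spreads messages across the partitions (round-robin or in sticky batches, depending on the client version) instead of tying them to one partition.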
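The offset rules above can be illustrated with a small in-memory model. This is only a sketch and does not use any Kafka client: the `TopicLog` class, its method names and the group name "group-1" are made up for illustration. It shows that offsets are assigned per partition, in order, and that a consumer group resumes from the offset it last committed.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for one topic; not a Kafka client."""

    def __init__(self, num_partitions):
        # One append-only list per partition; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]
        # Committed offsets keyed by (group, partition): the next offset to read.
        self.committed = defaultdict(int)

    def append(self, partition, message):
        """Append a message and return the offset it was assigned."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def poll(self, group, partition, max_records=10):
        """Return (offset, message) pairs starting at the group's committed offset."""
        start = self.committed[(group, partition)]
        log = self.partitions[partition]
        return list(enumerate(log[start:start + max_records], start=start))

    def commit(self, group, partition, next_offset):
        """Record the offset the group should resume from."""
        self.committed[(group, partition)] = next_offset


topic = TopicLog(num_partitions=2)
topic.append(0, "a")   # offset 0 in partition 0
topic.append(0, "b")   # offset 1 in partition 0
topic.append(1, "x")   # offset 0 in partition 1, unrelated to partition 0

records = topic.poll("group-1", 0)
last_offset = records[-1][0]
topic.commit("group-1", 0, last_offset + 1)   # commit only after processing
```

After the commit, polling partition 0 again for the same group returns nothing new, while partition 1 still starts at its own offset 0.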
What is a Consumer Group?
Consumer groups exist to enhance parallelism.
Consumer groups are a Kafka concept. Every Kafka consumer group consists of one or more consumers that jointly consume a set of subscribed topics.
You should not have more consumers in a group than partitions.
If a topic has 3 partitions, you should not have 4 consumers in one group, because the consumers in a group share the partitions among themselves. With 3 partitions and 4 consumers, three consumers each get one partition and the fourth sits idle and does nothing.
A consumer only has to specify a broker address and a topic name to read; Kafka takes care of pulling the data from the right brokers.
Within a partition, messages are read in offset order (0, 1, 2, ...); across partitions they are read in parallel.
B1 - Topic 1 - partition 0 - 0,1,2,3,4
B2 - Topic 2 - partition 1 - 0,1,2,3,4,5,6,7
Each consumer within a group reads from its own exclusive set of partitions.
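The "3 partitions, 4 consumers" case can be sketched in a few lines. This is a simplified round-robin assignment written for illustration; Kafka's real assignors are more involved, and the function name `assign_partitions` and the consumer names are made up.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets exactly one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 3 partitions, 3 consumers: one partition each.
three = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])

# 3 partitions, 4 consumers: the fourth consumer gets nothing and stays idle.
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])

# 3 partitions, 2 consumers: one consumer reads two partitions.
two = assign_partitions([0, 1, 2], ["c1", "c2"])
```

Here `four["c4"]` is an empty list, which is the idle consumer described above.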
Brokers
- A Kafka cluster is composed of multiple brokers (servers).
- Each broker contains certain topic partitions.
- After connecting to any one broker, you are connected to the entire cluster.
- A cluster with 3 brokers can be seen as follows; data is distributed by partition.
Broker 1    Broker 2    Broker 3
Topic 1     Topic 1     Topic 1
P-0         P-2         P-1
Topic 2     Topic 2     Topic 1
P-1         P-0         P-0
The replication factor should be greater than 1 (usually 2 or 3), so that every partition has a copy on more than one broker.
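As a rough illustration of how partitions and their replicas end up spread over brokers, the sketch below places each partition's copies on consecutive brokers. This is an assumption-laden simplification, not Kafka's actual placement algorithm; the function name `place_replicas` and the broker names are hypothetical.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Give each partition `replication_factor` copies on distinct brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        # Start at a different broker for each partition and wrap around.
        placement[p] = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
    return placement


layout = place_replicas(3, ["broker-1", "broker-2", "broker-3"], 2)
# layout[0] lists the brokers holding a copy of partition 0, and so on.
```

With a replication factor of 2 on 3 brokers, losing any single broker still leaves one copy of every partition.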
Partitions :
- Partitions are the main concurrency mechanism in Kafka.
- A topic is divided into one or more partitions, enabling producer and consumer loads to be scaled.
What is the role of the ZooKeeper?
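In ZooKeeper-based deployments, Kafka uses ZooKeeper to keep cluster metadata: which brokers are alive, topic configuration, and the election of the controller broker. Older consumers also stored the offsets consumed by each consumer group, per topic and partition, in ZooKeeper; newer consumers store them in the "__consumer_offsets" topic instead.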
Is it possible to use Kafka without ZooKeeper?
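In older, ZooKeeper-based releases, no: the brokers depend on ZooKeeper for coordination, and if ZooKeeper is down for some reason, the cluster cannot reliably service client requests. Newer releases can run in KRaft mode, where the brokers manage the cluster metadata themselves and ZooKeeper is not needed.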
Explain the concept of Leader and Follower.
Every partition in Kafka has one server that plays the role of Leader and zero or more servers that act as Followers. The Leader handles all read and write requests for the partition, while the Followers passively replicate the Leader. If the Leader fails, one of the Followers takes over as the new Leader. Because the Leaders of different partitions are spread across the brokers, this also balances load across the servers.
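The failover described above can be modelled in a few lines. This is a toy sketch, not how the brokers implement it: the `Partition` class and the broker ids are made up, and "promote the first surviving in-sync follower" is a simplifying assumption.

```python
class Partition:
    """Toy model of one partition's replica set; broker ids are made up."""

    def __init__(self, replicas):
        self.replicas = list(replicas)   # broker ids holding a copy
        self.in_sync = list(replicas)    # replicas that are caught up
        self.leader = self.replicas[0]   # first replica starts as Leader

    def followers(self):
        """Return the in-sync replicas that are not the Leader."""
        return [b for b in self.in_sync if b != self.leader]

    def broker_failed(self, broker_id):
        """Drop a failed broker and promote a Follower if it was the Leader."""
        if broker_id in self.in_sync:
            self.in_sync.remove(broker_id)
        if broker_id == self.leader:
            # Promote the first surviving in-sync Follower, if any is left.
            self.leader = self.in_sync[0] if self.in_sync else None
        return self.leader


p = Partition(replicas=[1, 2, 3])
leader_before = p.leader           # broker 1 leads, brokers 2 and 3 follow
leader_after = p.broker_failed(1)  # broker 1 dies, a Follower takes over
```

If every replica fails, `leader` becomes `None`, which corresponds to the partition being unavailable until a replica comes back.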
Why is replication critical in Kafka?
Replication is what makes Kafka durable. It ensures that published messages are not lost and can still be consumed in the event of a machine error, a program error or frequent software upgrades.
How do you define a Partitioning Key?
Within the Producer, the role of a Partitioning Key is to indicate the destination partition of the message. By default, a hashing-based Partitioner is used to determine the partition ID given the key. Alternatively, users can plug in a customized Partitioner.
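A hashing-based partitioner boils down to "hash the key, take it modulo the number of partitions". The sketch below shows the idea only: `choose_partition` is a hypothetical helper, and `zlib.crc32` stands in for the murmur2 hash that Kafka's Java client uses, so the partition numbers it produces will not match a real cluster.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition id with a deterministic hash."""
    if key is None:
        raise ValueError("this sketch only covers keyed messages")
    # crc32 is a stand-in hash; the real Java client hashes the key bytes with murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# The same key always lands in the same partition, which preserves
# per-key ordering as long as the partition count does not change.
first = choose_partition("user-42", 3)
second = choose_partition("user-42", 3)
```

Note that changing the number of partitions changes the result of the modulo, so existing keys may move to different partitions.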