Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file added solutions/communication_kafka/architecture.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
371 changes: 371 additions & 0 deletions solutions/communication_kafka/index.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,371 @@
//Category=Communication;Kafka;Microservice Platforms;Tracing;Logging;Retry;
//Product=Apache Kafka;Spring Kafka;Spring Cloud Sleuth;Spring Boot;
//Maturity level=Advanced

:toc:

= Implementing Kafka with Spring Boot and Spring Kafka Module

== Abstract

The goal of documentation is to present the Spring Kafka functionalities
in the context of microservices. The main focus would be how Kafka-based
messaging solutions are implemented with Spring Boot. Furthermore, the
documentation also provides information regarding the implementation of
logging, tracing, and retry within an application.

Spring Kafka, Docker, and Postman are the tools being used during the
creation of this document. To run both Kafka message broker and
Zookeeper, a Docker image is used.

== Introduction and Goals

The goal of this solution is to provide guidance on how to use Kafka
efficiently with spring boot leveraging the spring-kafka module.

____
The Spring for Apache Kafka (spring-kafka) project applies core Spring
concepts to the development of Kafka-based messaging solutions. It
provides a "template" as a high-level abstraction for sending messages.
--
https://spring.io/projects/spring-kafka[https://spring.io/projects/spring-kafka]
____

Basic concepts and functionalities of kafka can be found
https://developer.confluent.io/what-is-apache-kafka/[here].

In short, Apache Kafka was built to make real-time data available to all
the applications that need to use it. Kafka models events as key/value
pairs. These pairs are saved in an distributed append-only log in
specific topics. This way Kafka can be used as a distributed pub/sub
messaging system that replaces traditional message brokers like ActiveMQ
and RabbitMQ.

This solution concentrates on the following aspects:

* Configuration
* Basic functionalities and Retry
* Logging
* Tracing

== Context and Scope

There are several common use cases for kafka (_further reading:
https://kafka.apache.org/uses[https://kafka.apache.org/uses]_), such as:

* Messaging
* Website Activity Tracking
* Metrics
* Log Aggregation
* Stream Processing
* Event Sourcing

This solution focuses not on the many use cases, it is limited to the
implementation of the following functionalities in a general spring boot
application:

* Configuration
* Basic functionalities and Retry
* Logging
* Tracing

The tools that are set for this solution are as followed:

* Spring Boot: An extension of the popular Spring frameworks, but it has
a few unique features that make it ideal for quickly developing web
applications and microservices.
* Spring Kafka: Extends the simple and common Spring template programming
model with a KafkaTemplate as well as Message-driven POJOs via
@KafkaListener annotation.
* Java / Kotlin
* Spring Cloud Sleuth: Provides Spring Boot auto-configuration for
distributed tracing (_read more:
https://spring.io/projects/spring-cloud-sleuth[https://spring.io/projects/spring-cloud-sleuth]_).

=== Solution Strategy

==== Configuration

This provides basic guidelines on how to set up prerequisites before
implementing spring-kafka, such as defining the necessary dependencies
and setting up specific spring-kafka configurations.

To use `spring-kafka` in a project, add the following dependency to the
`pom.xml` of a maven project:

....
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
....

For a gradle project add the following line to the dependencies in the
`build.gradle` file:

....
implementation 'org.springframework.kafka:spring-kafka'

....

To set specific spring-kafka configurations, the spring boot
`application.yml` can be used:

....
spring:
application:
name: shipkafka
kafka:
bootstrap-servers: "localhost:9092"
producer:
key-serializer: "org.apache.kafka.common.serialization.LongSerializer"
value-serializer: "org.springframework.kafka.support.serializer.JsonSerializer"
consumer:
key-deserializer: "org.apache.kafka.common.serialization.LongDeserializer"
value-deserializer: "org.springframework.kafka.support.serializer.JsonDeserializer"
properties:
spring:
json:
trusted:
packages: "*"
....

The Kafka broker is specified at `localhost:9092` in this example
configuration.

Spring-kafka uses the type String for keys and values per default. This
behavior can be changed by setting the specific serializers and
deserializers like in the example.

To deserialize objects received in json, the specific object needs to be
added to the trusted packages. In this example all packages of the
application are trusted through the use of the wildcard `*`.

==== Creating topics

The spring-boot application can create topics automatically:

....
@Bean
public NewTopic bookings() {
return TopicBuilder.name("bookings")
.partitions(2)
.compact()
.build();
}
....

This creates a topic with the name `bookings` with two partitions and
compact logging. Further options for `TopicBuilder` can be found
https://docs.spring.io/spring-kafka/api/org/springframework/kafka/config/TopicBuilder.html[here].

==== Sending messages

The class `KafkaTemplate` simplifies the sending of messages to the
broker. It can be autowired.

....
private final KafkaTemplate<Long, Object> longTemplate;
....

This defines a template for sending messages with a `Long` key and an
object as a value. The Object will be serialized as json as specified in
the `application.yml`.

The class has the methods `send()` for sending messages. The different
methods can be looked up in the
https://docs.spring.io/spring-kafka/api/org/springframework/kafka/core/KafkaTemplate.html[class
documentation].

....
longTemplate.send(topic, key, message);
....

This sends a message with a key to the specified topic.

==== Receiving messages

To receive messages you have to define a listener.

The listener is defined by implementing and annotating a method like in
the following example:

....
@KafkaListener(id = "bookings", topics = "bookings", groupId = "ship")
public void listenBookings(Booking booking){
...
}
....

Receiving messages from a topic is simplified with the
https://docs.spring.io/spring-kafka/reference/html/#annotation-properties[`@KafkaListener`]
annotation. In this example messages of the type Booking are consumed
from the `bookings` topic.

==== Retry

Failures in a distributed system may happen, i.e. failed message
process, network errors, runtime exceptions. Therefore, the retry logic
implementation is something essential to have.

It is important to note that Retries in Kafka can be quickly implemented
at the consumer side. This is known as Simple Blocking Retries. To
accomplish visible error handling without causing real-time disruption,
Non-Blocking Retries and Dead Letter Topics are implemented.

Non-Blocking Retries can easily be added to a listener:

....
@RetryableTopic(attempts = "3", backoff = @Backoff(delay = 2_000, maxDelay = 10_000, multiplier = 2))
@KafkaListener(id = "bookings", topics = "bookings", groupId = "ship")
public void listenBookings(Booking booking){
...
}

@DltHandler
public void listenBookingsDlt(Booking booking){
LOG.info("Received DLT message: {}", booking);
}

....

In this example the `@RetryableTopic` annotation attempts to process a
received message 3 times. The first retry is done after a delay of 2
seconds. Each further attempt multiplies the delay by 2 with a max delay
of 10 seconds. If the message couldn't be processed, it gets send to the
deadletter topic annotated with `@DltHandler`.

....
bookings-retry-5000
....

Each retry creates a new topic like in the example above.

DLT creates a topic for messages that couldn't get processed. The topic
gets named like the example below:

....
bookings-dlt
....

Further information can be found in the
https://docs.spring.io/spring-kafka/reference/html/#retry-topic[official
documentation].

==== Logging

Spring-kafka doesn't log everything that's happening in the applicaiton.
The usage of Slf4J is recommended to implement further logging. It's
straightforward yet adaptable, allowing for better readability and
performance. Sending and receiving messages should be logged
appropriately. It needs to be implemented manually as spring-kafka
doesn't create logs of it automatically.

This is a simple example for logging received messages:

....
LOG.info("Received message: {}", message);
....

==== Tracing

In microservice architecture, tracing is implemented to monitor
applications as well as to help identifying where errors or failures
occur, which may cause poor performance. In applications that may
contain several services, it is necessary to trace the invocation from
one service to another.

The Spring Cloud Sleuth library adds tracing to spring-kafka. The
dependency can be added to a project by adding the following to the
`pom.xml` file:

....
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-dependencies</artifactId>
<version>${release.train.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
</dependencies>
....

For a gradle project add the following to the `build.gradle` file:

....
dependencyManagement {
imports {
mavenBom "org.springframework.cloud:spring-cloud-dependencies:2021.0.2"
}
}

dependencies{
implementation 'org.springframework.cloud:spring-cloud-starter-sleuth'
}
....

This will add a traceId and spanId to the Slf4J logs. If an application
name is specified in the `application.yml` like in the example, the
service name will be added to the logs as well.

Further information can be found in the
https://spring.io/projects/spring-cloud-sleuth[official documentation].

==== Health Monitoring

Spring-kafka doesn't provide an inbuild health indicator for Kafka as a
general implementation for all use cases isn't possible. Further
information and updates on the situation can be found on
https://github.com/spring-projects/spring-boot/issues/14088#issuecomment-830410907[GitHub].

=== Alternatives

This section presents some alternatives to Apache Kafka as a message
broker.

==== ActiveMQ

While Kafka is designed to process a high load of data in real-time,
ActiveMQ is intended to process only a small number of messages with a
high level of reliability. It can also be used for ETL jobs.

ActiveMQ is a push-based messaging system. The producer has to ensure
that messages are delivered to the consumers. The acknowlegment of
messages decreases the throughput and increases the latency. Kafka is
pull-based and the consumers consume the messages from the topic.

Kafka is a complex system, but it is highly scalable through partitions
and replicas. ActiveMQ is much simpler to deploy, but the performance
slows down the more consumers there are.

If big data processing is not required, ActiveMQ is a good alternative
to Kafka.

==== RabbitMQ

RabbitMQ is a traditional message broker. It's highly flexible,
supporting many messaging protocols like AMQP, MQTT and STOMP.
Functionalities can be added through plugins.

Unlike Kafka, RabbitMQ is mainly a push-based system. Consumers define a
prefetch limit. Messages will be prefetched until this limit is met.
After messages are processed and acknowleged, more messages can be
fetched until the limit is met again.

While events are stored in append-only logs in Kafka, RabbitMQ stores
the events in queues. Messages are retained until they are acknowleged.

RabbitMQ is easier to deploy than Kafka, but it cant reach the
scalability and performance of Kafka.

If big data processing is not required and a highly flexible message
broker is required, RabbitMQ is a good alternative to Kafka.