Quickly Build Data Streaming API Using gRPC

Kamesh Sampath
9 min read · Dec 8, 2023

It’s that holiday time of the year, a good time to learn and improve your knowledge of technology. This time I thought I would learn more about building APIs and applications using Data Streaming platforms.

Leading up to this story, I shared my knowledge on the fundamentals of Data Streaming; take a look at it if you wish to brush up on the jargon used in the Data Streaming world.

When learning any technical concept, in this case Data Streaming, the natural flow for any developer is to try building an API or application. That’s exactly what I started to do as well. In this story I will share my personal experiments on how to build a Data Streaming API, and the framework that helped me do it effectively and quickly.

Data Streaming Platform

Apache Kafka is the name that rings a bell when you are in need of a Data Streaming platform. While Apache Kafka is best in class, when learning Data Streaming and building applications a developer might need a platform that has:

  • Rich Developer Experience — CLI to manage the platform and its resources
  • A GUI console to the platform — enhances the debugging capabilities during development
  • Cloud Native — supported on containers, cloud, Kubernetes, etc.
  • A light footprint on developer laptops

While looking out for one such Data Streaming platform is when I bumped into Redpanda. I felt it was the right choice for a developer’s Data Streaming needs, and it is 100% compatible with the Kafka API.

Redpanda ships with a CLI, a.k.a. rpk. With rpk, bootstrapping your local Apache Kafka compatible Redpanda cluster is just a matter of seconds; all you need to do is run a simple command, rpk container start (assuming you have Docker Desktop running).

The next logical thing a developer does with Apache Kafka is creating topics. The rpk topic * commands help you manage topics, and produce and consume messages. Let us get familiar with the rpk topic group of commands:

# create a topic called "todo-list"
rpk topic create todo-list

# produce a message to topic todo-list
rpk topic produce todo-list
# producing a message with key:value format and \n as the message separator
rpk topic produce -f "%k:%v\n" todo-list
# finally, what other options are available while producing
rpk topic produce --help

# consume messages from topic todo-list
rpk topic consume todo-list
# consume only the latest messages
rpk topic consume --offset=end todo-list
# finally, what other options are available while consuming
rpk topic consume --help

Check the rpk CLI getting-started guides for more details on the other commands and options.

Redpanda also allows you to manipulate topics and their messages using the Redpanda Console; this greatly enhances debugging capabilities, thereby improving developer productivity.

We will explore other related features of the Redpanda Console, e.g. the Schema Registry and automatic deserialization of Protobuf messages, in the upcoming sections of the story.

Redpanda List of Topics

With the Data Streaming platform in place, the next focus was a framework that could be used to build the API that will interact with the Redpanda topics.

So what is required to build an effective Data Streaming API?

  • Data Streams are events carrying payloads with a defined data structure associated with them. This means the framework we choose to implement the API should support Data Validation. Data Validation is usually done by defining a Schema — a set of rules and validations — for the data, and exchanging only data that conforms to the Schema. This helps producers and consumers know what data to expect and how to manipulate it.
  • In Data Streaming the producers and consumers are usually loosely coupled, which creates an implicit need for the framework to support clean interface contracts. The producers and consumers can then be built independently, with the interface definitions as the source of truth.
  • The interface definitions also allow producers and consumers to be written in different programming languages, i.e. polyglot.
  • Data Stream events can flow in one direction (uni-directional) or both ways (bi-directional). This adds another requirement for the underlying framework: it must support these data flow patterns.
  • Natural support for security features like SSL, TLS, Application Layer Transport Security (ALTS), token-based authentication, etc.
  • Operational needs like monitoring, logging, telemetry, health checks, etc.

The natural choice for building APIs has always been REST; with years of development we understand that REST is good but has its own constraints. It may not support one or more of the aforementioned needs of building a Data Streaming API without adding extra dependencies.

In my humble opinion, adding extra dependencies puts more burden on the developer to maintain the API and can possibly open up security vulnerabilities.

Nothing against REST, but I personally felt it was not my choice for building a Data Streaming API. Then what is the framework that can help me here? It is gRPC.

gRPC is a modern open source high performance Remote Procedure Call (RPC) framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in last mile of distributed computing to connect devices, mobile applications and browsers to backend services.

Source: https://grpc.io

Let us quickly analyse how gRPC maps to the points above for building an effective Data Streaming API:

  • gRPC naturally supports Protobuf — Protocol Buffers are a language-neutral, platform-neutral, extensible mechanism for serializing structured data. Protobuf solves the need for a Schema to define the data structure of event payloads.
  • A clean contract is defined via an Interface Definition Language (IDL); with Protobuf we can define both the messages (the data structures for the payload) and the services (the API methods) in one place.
  • Using Protobuf allows us to naturally develop consumers/producers in multiple programming languages (polyglot). As of writing this story, Protobuf supports Java, Go, Ruby, C++, Objective-C, PHP, Python, C#, Dart, etc.
  • gRPC is built on top of HTTP/2, thereby naturally supporting uni-directional and bi-directional message flows. By adhering to HTTP/2, gRPC can also provide the enhanced performance required by Data Streaming applications, as event volumes are bound to spike.
  • gRPC naturally supports the security semantics of SSL/TLS/ALTS as part of the framework itself, without the need for a plugin or external dependency.
  • gRPC is built with support for monitoring, logging, telemetry, and health checks, which are essential for building and exposing APIs.

With enough theory covered, let me quickly walk through the demo todo-app; considering the brevity of the story I will just list the important code snippets. But you can find the completed demo with more detailed instructions on my GitHub repo.

Application Overview

The repo has a docker-compose.yml that allows you to start the Redpanda server and Console; the Console will be reachable at http://localhost:8080.

docker compose up redpanda-0 console -d

The todo-app has the following Interface contract (IDL) defined using protobuf.
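
(The embedded snippet did not survive in this version of the story, so here is a minimal sketch of todo.proto, reconstructed from the service and message names used in the snippets below; the package name, field numbers, and the AddTodo request/response shapes are my assumptions, so treat the repo as the source of truth.)

syntax = "proto3";

package todo;

import "google/protobuf/empty.proto";
import "google/protobuf/timestamp.proto";

// Task is the payload ingested into the todo-list topic.
message Task {
  string title = 1;
  string description = 2;
  bool completed = 3;
  google.protobuf.Timestamp last_updated = 4;
}

// TodoResponse wraps a Task with the topic coordinates it was read from.
message TodoResponse {
  Task task = 1;
  int32 partition = 2;
  int64 offset = 3;
}

// Error carries consumer fetch errors back to the streaming client.
message Error {
  string topic = 1;
  int32 partition = 2;
  string message = 3;
}

message Errors {
  repeated Error error = 1;
}

// TodoListResponse is either a todo or a batch of errors.
message TodoListResponse {
  oneof response {
    TodoResponse todo = 1;
    Errors errors = 2;
  }
}

message AddTodoRequest {
  Task task = 1;
}

service Todo {
  // AddTodo produces a Task to the todo-list topic.
  rpc AddTodo(AddTodoRequest) returns (TodoResponse);
  // TodoList streams every message ingested into the topic (server-side streaming).
  rpc TodoList(google.protobuf.Empty) returns (stream TodoListResponse);
}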

The todo-app application takes care of registering todo.proto with the Redpanda Schema Registry using the Topic Name subject-naming strategy. For the demo we just name the schema subject todo-list-value, allowing the schema to be used for serializing/deserializing the topic’s value.

Todo Schema Registration
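
For illustration, here is a minimal sketch of what that registration could look like using franz-go’s sr (Schema Registry) package; the registry URL, the proto file path, and running this as a standalone program are my assumptions, not code lifted from the repo.

package main

import (
    "context"
    "log"
    "os"

    "github.com/twmb/franz-go/pkg/sr"
)

func main() {
    // Redpanda exposes a Schema Registry; 8081 is its default port (assumed for this demo setup).
    rcl, err := sr.NewClient(sr.URLs("http://localhost:8081"))
    if err != nil {
        log.Fatal(err)
    }

    // Read the Protobuf schema definition from disk (hypothetical path).
    schema, err := os.ReadFile("proto/todo.proto")
    if err != nil {
        log.Fatal(err)
    }

    // Topic Name strategy: the subject "<topic>-value" ties the schema to the
    // values of the topic, here "todo-list-value".
    ss, err := rcl.CreateSchema(context.Background(), "todo-list-value", sr.Schema{
        Schema: string(schema),
        Type:   sr.TypeProtobuf,
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("registered schema: subject=%s id=%d version=%d", ss.Subject, ss.ID, ss.Version)
}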

The following snippet shows the implementation of the TodoList service method of the Todo service, which allows us to do server-side streaming (uni-directional) to list the todos, i.e. the messages that get ingested into the todo-list topic:

// TodoList implements todo.TodoServer.
func (s *Server) TodoList(empty *emptypb.Empty, stream todo.Todo_TodoListServer) error {
    ch := make(chan result)
    go s.poll(ch)

    for {
        r := <-ch
        // A result carries either fetch errors or a single record.
        if errs := r.errors; len(errs) > 0 {
            errors := make([]*todo.Error, 0, len(errs))
            for _, err := range errs {
                log.Debugw("Error Details",
                    "Topic", err.Topic,
                    "Partition", err.Partition,
                    "Error", err.Err,
                )
                errors = append(errors, &todo.Error{
                    Topic:     err.Topic,
                    Partition: err.Partition,
                    Message:   err.Err.Error(),
                })
            }
            stream.Send(&todo.TodoListResponse{
                Response: &todo.TodoListResponse_Errors{Errors: &todo.Errors{
                    Error: errors,
                }},
            })
            // Error-only results carry no record; skip the decode below.
            continue
        }

        b := r.record.Value
        task := new(todo.Task)
        if err := s.serde.Decode(b, task); err != nil {
            // Skip sending invalid data, just log the error.
            log.Errorw("Error Decoding task",
                "Data", string(b),
                "Error", err.Error())
        } else {
            stream.Send(&todo.TodoListResponse{
                Response: &todo.TodoListResponse_Todo{
                    Todo: &todo.TodoResponse{
                        Task:      task,
                        Partition: r.record.Partition,
                        Offset:    r.record.Offset,
                    },
                },
            })
        }
    }
}

// poll fetches records from the backend and pushes them onto the channel
func (s *Server) poll(ch chan result) {
    log.Debugf("Started to poll topic:%s", s.config.DefaultProducerTopic())
    // Consumer loop: block until the next fetch, then fan out errors and records.
    for {
        fetches := s.client.PollFetches(context.Background())
        if errs := fetches.Errors(); len(errs) > 0 {
            ch <- result{
                errors: errs,
            }
        }

        fetches.EachPartition(func(p kgo.FetchTopicPartition) {
            for _, r := range p.Records {
                ch <- result{
                    record: r,
                }
            }
        })
    }
}

The implementation uses franz-go as the Go Kafka client library. As you can see from the code snippet, gRPC naturally fits into Data Streaming needs and makes it easy for us to process and send the topic messages seamlessly to the consumer.
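
For context, here is a minimal sketch of how the kgo client and the gRPC server might be wired together; the broker address, consumer group name, and the import path of the generated todo package are assumptions based on the demo setup, not code lifted from the repo.

package main

import (
    "log"
    "net"

    "github.com/twmb/franz-go/pkg/kgo"
    "google.golang.org/grpc"

    // Package generated from todo.proto; the import path is an assumption.
    "github.com/kameshsampath/grpc-todo-app/gen/todo"
)

func main() {
    // Kafka consumer for the todo-list topic, pointed at the local Redpanda broker
    // (19092 is the externally exposed Kafka API port assumed for this compose setup).
    client, err := kgo.NewClient(
        kgo.SeedBrokers("localhost:19092"),
        kgo.ConsumeTopics("todo-list"),
        kgo.ConsumerGroup("todo-app"),
    )
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    lis, err := net.Listen("tcp", ":9090")
    if err != nil {
        log.Fatal(err)
    }

    grpcServer := grpc.NewServer()
    // Server is the type from the snippet above, embedding the kgo client.
    todo.RegisterTodoServer(grpcServer, &Server{client: client})

    log.Println("Server started on port 9090")
    if err := grpcServer.Serve(lis); err != nil {
        log.Fatal(err)
    }
}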

Let us test the application quickly,

docker compose up -d todo-app todo-list

The todo-app is the containerised image of the Todo demo gRPC server, and todo-list is the containerised image of the Todo demo client that streams the list of todos from the backend Kafka topic todo-list.

Assume the todo-app (gRPC server) and todo-list (todo list streaming client) have started successfully.

Open a new terminal and run the following command to check the logs of todo-app,

docker compose logs -f todo-app

A successful start of the server will show the following output (trimmed logs for brevity),

todo-app-server  | ... Server started on port 9090
todo-app-server | ... Started to poll topic

Press ctrl + c to quit the todo-app server logs and run the following command to view the logs of the todo-list application; let us refer to this terminal as the todo-list-logs-terminal:

docker compose logs -f todo-list

In another new terminal, run the following command using grpcurl (a cURL-like utility for gRPC) to post a todo task to the todo-list topic:

PORT=9090 grpcurl -plaintext -d @ "localhost:$PORT" todo.Todo/AddTodo <<EOM
{
"task": {
"title": "Finish gRPC Data Streaming Story",
"description": "Complete the gRPC Data Streaming Medium story, on how to build Data Streaming API using gRPC and Redpanda",
"completed": false
}
}
EOM
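
A quick note: grpcurl needs the service’s schema to encode that JSON into Protobuf. Since no -proto flag is passed above, the server presumably registers the gRPC reflection service (an assumption on my part; check the repo). With reflection enabled you can also explore the API:

# list the services exposed by the server
grpcurl -plaintext localhost:9090 list
# describe the Todo service and its methods
grpcurl -plaintext localhost:9090 describe todo.Todo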

Once the post succeeds, we should see output like the following on the todo-list-logs-terminal (log message formatted for readability):

todo-list  | 2023-12-07T06:35:07.692Z   INFO    client/main.go:56Task  
{
"Title": "Finish gRPC Data Streaming Story",
"Description": "Complete the gRPC Data Streaming Medium story, on how to build Data Streaming API using gRPC and Redpanda",
"Completed": false,
"Last Updated": "Thursday, 01-Jan-70 00:00:00 UTC",
"Partition": 0,
"Offset": 2
}
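
The todo-list client that produced the log above is essentially a gRPC server-streaming consumer. Here is a minimal sketch of what such a client might look like; the generated package import path and the plaintext connection are assumptions matching the demo setup.

package main

import (
    "context"
    "io"
    "log"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"
    "google.golang.org/protobuf/types/known/emptypb"

    // Package generated from todo.proto; the import path is an assumption.
    "github.com/kameshsampath/grpc-todo-app/gen/todo"
)

func main() {
    conn, err := grpc.Dial("localhost:9090",
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        log.Fatal(err)
    }
    defer conn.Close()

    client := todo.NewTodoClient(conn)

    // Open the server-side stream; each Recv blocks until the server
    // pushes the next message consumed from the todo-list topic.
    stream, err := client.TodoList(context.Background(), &emptypb.Empty{})
    if err != nil {
        log.Fatal(err)
    }
    for {
        resp, err := stream.Recv()
        if err == io.EOF {
            return
        }
        if err != nil {
            log.Fatal(err)
        }
        if t := resp.GetTodo(); t != nil {
            log.Printf("Task %q partition=%d offset=%d",
                t.GetTask().GetTitle(), t.GetPartition(), t.GetOffset())
        }
    }
}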

One of the coolest developer-centric features of the Redpanda Console is the automatic deserialization of Protobuf messages. This is highly useful during API development when we want to debug the messages sent to the topic.

Todo Messages Deserialized

Automatic deserialization of Protobuf messages is possible only when the respective schema is registered with the Schema Registry.

Summary

Hope you enjoyed this really quick getting-started guide to building a Data Streaming API using gRPC and Redpanda.

To summarise what we learned:

  • How to build a simple Data Streaming todo-app using Redpanda and gRPC.
  • How gRPC naturally fits the needs of building a Data Streaming API.
  • Briefly explored what Redpanda is and how it can help you run a 100% Kafka-compatible Data Streaming platform on your laptop.
  • Explored a few features of the Redpanda Console.

Explore Redpanda further at https://redpanda.com to see how it can massively simplify your Apache Kafka operational needs with its simplicity and high performance.

Some useful links to read further on the concepts discussed:

The completed todo-app demo is available on https://github.com/kameshsampath/grpc-todo-app
