How to subscribe to a Kafka topic by key - spring-boot

I'm super new to using Kafka, especially within a Kotlin service. I currently have a local broker that I'm able to send events to. I'm trying to figure out how to subscribe to my topic and receive the messages, but I'm not able to actually consume events by key.
import io.confluent.kafka.serializers.KafkaAvroDeserializer
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig
import io.confluent.kafka.serializers.subject.TopicRecordNameStrategy
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.logging.log4j.LogManager
import org.springframework.beans.factory.annotation.Qualifier
import org.springframework.beans.factory.annotation.Value
import org.springframework.context.annotation.Bean
import org.springframework.kafka.annotation.KafkaListener
import org.springframework.stereotype.Service
@Service
class KafkaMessageConsumer {

    @Bean("consumerConfig")
    fun consumerConfigs(
        @Value("\${KAFKA_SASL_USERNAME}")
        kafkaServer: String,
        @Value("\${KAFKA_SASL_USERNAME}")
        userName: String,
        @Value("\${KAFKA_SASL_PASSWORD}")
        password: String,
        @Value("\${KAFKA_SCHEMA_REGISTRY_URL}")
        schemaRegistryUrl: String,
        @Value("\${KAFKA_SCHEMA_REGISTRY_USER_INFO}")
        schemaRegistryUserInfo: String
    ): Map<String, Any> {
        val config: MutableMap<String, Any> = mutableMapOf()
        config[ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG] = kafkaServer
        config[ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG] = StringDeserializer::class.java
        config[ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG] = KafkaAvroDeserializer::class.java
        config[KafkaAvroDeserializerConfig.VALUE_SUBJECT_NAME_STRATEGY] = TopicRecordNameStrategy::class.java
        config[KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_URL_CONFIG] = schemaRegistryUrl
        return config
    }

    @KafkaListener(topics = ["my.kafka.topic"], groupId = "group-id")
    fun consume(
        @Qualifier("consumerConfig")
        config: Map<String, Any>,
        message: String
    ): Unit {
        val consumer: KafkaConsumer<String, String> = KafkaConsumer(config)
        println("message received from topic: $message")
    }
}
This is pretty much all I have. There is a list of IDs in the db that I would like to use so that I only consume the events whose key matches one of those IDs. I'm using this article as a reference.

It is not possible to subscribe to a topic by key.
You can change the listener method's parameters to receive the key in addition to the record value (see the docs on @Header(KafkaHeaders.RECEIVED_KEY)); then you can use a normal if-statement to compare against the specific keys you care about, across all partitions of the topic.
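A minimal sketch of that approach, in Java (the same annotations apply from Kotlin). The topic name, group id, and the set of allowed IDs are placeholders for your own setup; note that older spring-kafka versions use KafkaHeaders.RECEIVED_MESSAGE_KEY instead of RECEIVED_KEY.
import java.util.Set;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.stereotype.Service;

@Service
public class FilteringKafkaConsumer {

    // Placeholder for the IDs you would load from your database.
    private final Set<String> allowedIds = Set.of("id-1", "id-2");

    @KafkaListener(topics = "my.kafka.topic", groupId = "group-id")
    public void consume(@Header(KafkaHeaders.RECEIVED_KEY) String key,
                        @Payload String message) {
        // Every record on the topic reaches this method; filtering by key is up to you.
        if (allowedIds.contains(key)) {
            System.out.println("message received for key " + key + ": " + message);
        }
    }
}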
Related - Message Filtering
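If you prefer to keep the if-check out of the listener itself, spring-kafka's record filtering can discard unwanted records at the container level instead. A sketch, assuming you configure your own ConcurrentKafkaListenerContainerFactory (the bean and field names here are illustrative):
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaFilterConfig {

    // Placeholder for the IDs you would load from your database.
    private final Set<String> allowedIds = Set.of("id-1", "id-2");

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> filteringFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Returning true from the filter strategy discards the record before the listener sees it.
        factory.setRecordFilterStrategy(
                (ConsumerRecord<String, String> record) -> !allowedIds.contains(record.key()));
        return factory;
    }
}
A @KafkaListener that should use this filtering would then reference the factory via containerFactory = "filteringFactory".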


aggregator spring cloud stream with timeout

I want to make an application that receives messages, stores those messages in a list, and later, on a schedule, releases those messages every x amount of time.
I know spring cloud stream has an aggregator that already does this, but I think I need to do it manually because I need to keep a unique message per key and only replace the old message if it matches a specific condition (I think of it as a Set aggregator with conditions).
What I have tried so far is below; the code is also at this link: https://github.com/chalimbu/AggregatorQuestionStack
Processor:
import org.springframework.cloud.stream.annotation.EnableBinding
import org.springframework.cloud.stream.annotation.Input
import org.springframework.cloud.stream.annotation.Output
import org.springframework.cloud.stream.messaging.Processor
import org.springframework.scheduling.annotation.Scheduled
@EnableBinding(Processor::class)
class SetAggregatorProcessor(val storageService: StorageService) {

    @Input
    public fun inputMessage(input: Map<String, Any>) {
        storageService.messages.add(input)
    }

    @Output
    @Scheduled(fixedDelay = 20000)
    public fun produceOutput(): List<Map<String, Any>> {
        val message = storageService.messages
        storageService.messages.clear()
        return message
    }
}
Memory storage.
import org.springframework.stereotype.Service
@Service
class StorageService {
    public var messages: MutableList<Map<String, Any>> = mutableListOf()
}
This code generates the following error when I start pushing messages.
Caused by: org.springframework.integration.MessageDispatchingException: Dispatcher has no subscribers
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:139) ~[spring-integration-core-5.5.8.jar:5.5.8]
The idea is to deploy this app as part of the spring cloud stream (dataflow) platform.
I prefer the declarative approach (over the functional approach), but if somebody knows how to do it the Reactor way, I could settle for that.
Thanks for any help or advice.
Thanks to this example (https://github.com/spring-cloud/spring-cloud-stream-samples/blob/main/processor-samples/sensor-average-reactive-kafka/src/main/java/sample/sensor/average/SensorAverageProcessorApplication.java) I was able to figure something out using Flux, in case someone else needs it:
import java.time.Duration
import java.util.function.BiFunction
import java.util.function.Function
import org.springframework.context.annotation.Configuration
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono

@Configuration
class SetAggregatorProcessor : Function<Flux<Map<String, Any>>, Flux<MutableList<Map<String, Any>>>> {

    override fun apply(data: Flux<Map<String, Any>>): Flux<MutableList<Map<String, Any>>> {
        return data.window(Duration.ofSeconds(20)).flatMap { window: Flux<Map<String, Any>> ->
            this.aggregateList(window)
        }
    }

    private fun aggregateList(group: Flux<Map<String, Any>>): Mono<MutableList<Map<String, Any>>>? {
        return group.reduce(
            mutableListOf(),
            BiFunction<MutableList<Map<String, Any>>, Map<String, Any>, MutableList<Map<String, Any>>> {
                accumulator: MutableList<Map<String, Any>>, element: Map<String, Any> ->
                accumulator.add(element)
                accumulator
            }
        )
    }
}
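One detail worth noting with this functional style (this assumes a typical Spring Cloud Stream 3.x setup, so adjust to your own configuration): the binding is driven by the function bean's name, so the input and output destinations are configured under bindings named setAggregatorProcessor-in-0 and setAggregatorProcessor-out-0, and spring.cloud.function.definition=setAggregatorProcessor is needed if more than one function bean is present.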
Update: https://github.com/chalimbu/AggregatorQuestionStack/tree/main/src/main/kotlin/com/project/co/SetAggregator

throw not found exception if pubsub topic is not available

I am using spring boot to interact with a Pub/Sub topic.
My config class for this connection looks like this:
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.gcp.pubsub.core.PubSubTemplate;
import org.springframework.cloud.gcp.pubsub.core.publisher.PubSubPublisherTemplate;
import org.springframework.cloud.gcp.pubsub.support.PublisherFactory;
import org.springframework.cloud.gcp.pubsub.support.converter.SimplePubSubMessageConverter;
import org.springframework.util.Assert;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.SettableListenableFuture;
import com.google.api.core.ApiFuture;
import com.google.api.core.ApiFutureCallback;
import com.google.api.core.ApiFutures;
import com.google.pubsub.v1.PubsubMessage;
public abstract class PubSubPublisher {
private static final Logger LOGGER = LoggerFactory.getLogger(PubSubPublisher.class);
private final PubSubTemplate pubSubTemplate;
protected PubSubPublisher(PubSubTemplate pubSubTemplate) {
this.pubSubTemplate = pubSubTemplate;
}
protected abstract String topic(String topicName);
public ListenableFuture<String> publish(String topicName, String message) {
LOGGER.info("Publishing to topic [{}]. Message: [{}]", topicName, message);
return pubSubTemplate.publish(topicName, message);
}
}
And I am calling this in my service, like this:
publisher.publish(topic-name, payload);
This publish method is an async one, which always passes and does not wait for acknowledgment. I can add a get() after publish to wait until it gets the response from Pub/Sub.
But I want that, in case my topic is not already present and I try to push some message, it throws some error like "resource not found", while still using only the default async method.
Maybe implementing a callback would help, but I am unable to do that in my code. The current overridden publish method which uses a callback just logs a WARN, not an exception; I want that to be an exception. That is the reason I wanted to implement the callback.
You can check if the topic is already present:
from google.cloud import pubsub_v1
project_id = "projectname"
topic_name = "unknowTopic"
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, topic_name)
try:
    response = publisher.get_topic(topic_path)
except Exception as e:
    print(e)
This returns the error as
404 Resource not found (resource=unknowTopic).
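Since your service code is Java, here is a rough equivalent of that check using the Pub/Sub admin client (a sketch, assuming the google-cloud-pubsub client library is on the classpath; the project and topic ids are placeholders):
import com.google.api.gax.rpc.NotFoundException;
import com.google.cloud.pubsub.v1.TopicAdminClient;
import com.google.pubsub.v1.TopicName;

public class TopicChecker {

    public static void requireTopic(String projectId, String topicId) throws Exception {
        try (TopicAdminClient topicAdminClient = TopicAdminClient.create()) {
            // getTopic throws NotFoundException when the topic does not exist.
            topicAdminClient.getTopic(TopicName.of(projectId, topicId));
        } catch (NotFoundException e) {
            // Rethrow so the caller gets a hard failure instead of a logged WARN.
            throw new IllegalStateException("Topic not found: " + topicId, e);
        }
    }
}
Calling this before (or at startup instead of) each publish gives you the "resource not found" failure you are after without changing the async publish path.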

How to pass region as query param for aws java lambda function

I'm new to AWS Lambda. I'm creating an EC2 instance using an AWS Java Lambda function, in which I'm trying to pass the region dynamically through API Gateway.
I'm passing the region as a query param string. I'm not sure how to get the query param inside the Lambda function. I have gone through similar questions but am unable to understand how to implement it.
Please find the Java Lambda function below:
package com.amazonaws.lambda.demo;
import java.util.List;
import org.json.simple.JSONObject;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.CreateTagsRequest;
import com.amazonaws.services.ec2.model.DescribeInstanceStatusRequest;
import com.amazonaws.services.ec2.model.DescribeInstanceStatusResult;
import com.amazonaws.services.ec2.model.Instance;
import com.amazonaws.services.ec2.model.InstanceNetworkInterfaceSpecification;
import com.amazonaws.services.ec2.model.InstanceStatus;
import com.amazonaws.services.ec2.model.RunInstancesRequest;
import com.amazonaws.services.ec2.model.RunInstancesResult;
import com.amazonaws.services.ec2.model.StartInstancesRequest;
import com.amazonaws.services.ec2.model.Tag;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
public class LambdaFunctionHandler implements RequestHandler<S3Event, String> {
private static final AWSCredentials AWS_CREDENTIALS;
static String ACCESS_KEY="XXXXXXXXXX";
static String SECRET_KEY="XXXXXXXXXXXXXXXX";
static {
// Your accesskey and secretkey
AWS_CREDENTIALS = new BasicAWSCredentials(
ACCESS_KEY,
SECRET_KEY
);
}
private AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
public LambdaFunctionHandler() {}
// Test purpose only.
LambdaFunctionHandler(AmazonS3 s3) {
this.s3 = s3;
}
@Override
public String handleRequest(S3Event event, Context context) {
context.getLogger().log("Received event: " + event);
// Set up the amazon ec2 client
AmazonEC2 ec2Client = AmazonEC2ClientBuilder.standard()
.withCredentials(new AWSStaticCredentialsProvider(AWS_CREDENTIALS))
.withRegion(Regions.EU_CENTRAL_1)
.build();
// Launch an Amazon EC2 Instance
RunInstancesRequest runInstancesRequest = new RunInstancesRequest().withImageId("ami-XXXX")
.withInstanceType("t2.micro") // https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html
.withMinCount(1)
.withMaxCount(1)
.withKeyName("KEY")
.withNetworkInterfaces(new InstanceNetworkInterfaceSpecification()
.withAssociatePublicIpAddress(true)
.withDeviceIndex(0)
.withSubnetId("subnet-XXX")
.withGroups("sg-XXXX"));
RunInstancesResult runInstancesResult = ec2Client.runInstances(runInstancesRequest);
Instance instance = runInstancesResult.getReservation().getInstances().get(0);
String instanceId = instance.getInstanceId();
String instanceip=instance.getPublicIpAddress();
System.out.println("EC2 Instance Id: " + instanceId);
// Setting up the tags for the instance
CreateTagsRequest createTagsRequest = new CreateTagsRequest()
.withResources(instance.getInstanceId())
.withTags(new Tag("Name", "SampleLambdaEc2"));
ec2Client.createTags(createTagsRequest);
// Starting the Instance
StartInstancesRequest startInstancesRequest = new StartInstancesRequest().withInstanceIds(instanceId);
ec2Client.startInstances(startInstancesRequest);
/*// Stopping the Instance
StopInstancesRequest stopInstancesRequest = new StopInstancesRequest()
.withInstanceIds(instanceId);
ec2Client.stopInstances(stopInstancesRequest);*/
//describing the instance
DescribeInstanceStatusRequest describeInstanceRequest = new DescribeInstanceStatusRequest().withInstanceIds(instanceId);
DescribeInstanceStatusResult describeInstanceResult = ec2Client.describeInstanceStatus(describeInstanceRequest);
List<InstanceStatus> state = describeInstanceResult.getInstanceStatuses();
while (state.size() < 1) {
// Do nothing, just wait, have thread sleep if needed
describeInstanceResult = ec2Client.describeInstanceStatus(describeInstanceRequest);
state = describeInstanceResult.getInstanceStatuses();
}
String status = state.get(0).getInstanceState().getName();
System.out.println("status"+status);
JSONObject response=new JSONObject();
response.put("instanceip", instanceip);
response.put("instancestatus", status);
System.out.println("response=>"+response);
return response.toString();
}
}
I would like to pass the query param instead of Regions.EU_CENTRAL_1
// Set up the amazon ec2 client
AmazonEC2 ec2Client = AmazonEC2ClientBuilder.standard()
.withCredentials(new AWSStaticCredentialsProvider(AWS_CREDENTIALS))
.withRegion(Regions.EU_CENTRAL_1)
.build();
Please find the API configuration below:
Any advice on how to achieve that would be really helpful. Thanks in advance.
How are you?
If you want to get those query parameters, you may want to use Lambda Proxy Integration.
That way, your function will receive an APIGatewayProxyRequestEvent, on which you can call Map<String, String> getQueryStringParameters().
You'll need to declare your handler like:
public class APIGatewayHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {
/// Your cool awesome code here!
}
That way, your method will look like:
@Override
public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
Map<String, String> params = event.getQueryStringParameters();
Optional<String> region = Optional.ofNullable(params.get("region"));
// Create EC2 instance. You may need to parse that region string to AWS Region object.
}
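Inside that handler you can then map the region string onto the SDK's Regions enum before building the EC2 client. A sketch that reuses the Optional from above and falls back to eu-central-1 purely as an example default:
// Parse the query-parameter value (e.g. "eu-central-1") into the SDK enum;
// Regions.fromName throws IllegalArgumentException for unknown names.
Regions selectedRegion = Regions.fromName(region.orElse("eu-central-1"));

AmazonEC2 ec2Client = AmazonEC2ClientBuilder.standard()
        .withCredentials(new AWSStaticCredentialsProvider(AWS_CREDENTIALS))
        .withRegion(selectedRegion)
        .build();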
Let me know if that works for you!

Flink throwing serialization error when reading from hbase

When I read from HBase using a RichFlatMapFunction inside a map I am getting a serialization error. What I am trying to do is: if a datastream element equals a particular string, read from HBase, otherwise ignore it. Below is the sample program and the error I am getting.
package com.abb.Flinktest
import java.text.SimpleDateFormat
import java.util.Properties
import scala.collection.concurrent.TrieMap
import org.apache.flink.addons.hbase.TableInputFormat
import org.apache.flink.api.common.functions.RichFlatMapFunction
import org.apache.flink.api.common.io.OutputFormat
import org.apache.flink.api.java.tuple.Tuple2
import org.apache.flink.streaming.api.scala.DataStream
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.scala.createTypeInformation
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
import org.apache.flink.util.Collector
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.filter.BinaryComparator
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.util.Bytes
import org.apache.log4j.Level
import org.apache.flink.api.common.functions.RichMapFunction
object Flinktesthbaseread {
def main(args:Array[String])
{
val env = StreamExecutionEnvironment.createLocalEnvironment()
val kafkaStream = env.fromElements("hello")
val c=kafkaStream.map(x => if(x.equals("hello"))kafkaStream.flatMap(new ReadHbase()) )
env.execute()
}
class ReadHbase extends RichFlatMapFunction[String,Tuple11[String,String,String,String,String,String,String,String,String,String,String]] with Serializable
{
var conf: org.apache.hadoop.conf.Configuration = null;
var table: org.apache.hadoop.hbase.client.HTable = null;
var hbaseconnection:org.apache.hadoop.hbase.client.Connection =null
var taskNumber: String = null;
var rowNumber = 0;
val serialVersionUID = 1L;
override def open(parameters: org.apache.flink.configuration.Configuration) {
println("getting table")
conf = HBaseConfiguration.create()
val in = getClass().getResourceAsStream("/hbase-site.xml")
conf.addResource(in)
hbaseconnection = ConnectionFactory.createConnection(conf)
table = new HTable(conf, "testtable");
// this.taskNumber = String.valueOf(taskNumber);
}
override def flatMap(msg:String,out:Collector[Tuple11[String,String,String,String,String,String,String,String,String,String,String]])
{
//flatmap operation here
}
override def close() {
table.flushCommits();
table.close();
}
}
}
Error:
log4j:WARN No appenders could be found for logger (org.apache.flink.api.scala.ClosureCleaner$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: Task not serializable
at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:172)
at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:164)
at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.scalaClean(StreamExecutionEnvironment.scala:617)
at org.apache.flink.streaming.api.scala.DataStream.clean(DataStream.scala:959)
at org.apache.flink.streaming.api.scala.DataStream.map(DataStream.scala:484)
at com.abb.Flinktest.Flinktesthbaseread$.main(Flinktesthbaseread.scala:45)
at com.abb.Flinktest.Flinktesthbaseread.main(Flinktesthbaseread.scala)
Caused by: java.io.NotSerializableException: org.apache.flink.streaming.api.scala.DataStream
- field (class "com.abb.Flinktest.Flinktesthbaseread$$anonfun$1", name: "kafkaStream$1", type: "class org.apache.flink.streaming.api.scala.DataStream")
- root object (class "com.abb.Flinktest.Flinktesthbaseread$$anonfun$1", <function1>)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1182)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:301)
at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:170)
... 6 more
I tried wrapping the field inside a method and a class, and making the class serializable as well, but no luck. Could someone shed some light on this or suggest a workaround?
The problem is that you're trying to access the kafkaStream variable inside the map function, and a DataStream is simply not serializable. It is just an abstract representation of the data; it doesn't contain the elements themselves, which invalidates your function in the first place.
Instead, do something like this:
kafkaStream.filter(x => x.equals("hello")).flatMap(new ReadHBase())
The filter function will only retain the elements for which the condition is true, and those will be passed to your flatMap function.
I would highly recommend reading the basic API concepts documentation, as there appears to be some misunderstanding of what actually happens when you specify a transformation.

Writing data from JavaDStream<String> in Apache spark to elasticsearch

I am working on a program to process data from Apache Kafka into Elasticsearch. For that purpose I am using Apache Spark. I have gone through many links but am unable to find an example of writing data from a JavaDStream in Apache Spark to Elasticsearch.
Below is the sample Spark code which gets data from Kafka and prints it.
import org.apache.log4j.Logger;
import org.apache.log4j.Level;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;
import scala.Tuple2;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import org.apache.spark.streaming.Durations;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;
import com.google.common.collect.ImmutableMap;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.List;
public class SparkStream {
public static JavaSparkContext sc;
public static List<Map<String, ?>> alldocs;
public static void main(String args[])
{
if(args.length != 2)
{
System.out.println("SparkStream <broker1-host:port,broker2-host:port><topic1,topic2,...>");
System.exit(1);
}
Logger.getLogger("org").setLevel(Level.OFF);
Logger.getLogger("akka").setLevel(Level.OFF);
SparkConf sparkConf=new SparkConf().setAppName("Data Streaming");
sparkConf.setMaster("local[2]");
sparkConf.set("es.index.auto.create", "true");
sparkConf.set("es.nodes","localhost");
sparkConf.set("es.port","9200");
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
Set<String> topicsSet=new HashSet<>(Arrays.asList(args[1].split(",")));
Map<String,String> kafkaParams=new HashMap<>();
String brokers=args[0];
kafkaParams.put("metadata.broker.list",brokers);
kafkaParams.put("auto.offset.reset", "largest");
kafkaParams.put("offsets.storage", "zookeeper");
JavaPairDStream<String, String> messages=KafkaUtils.createDirectStream(
jssc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams,
topicsSet
);
JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
lines.print();
jssc.start();
jssc.awaitTermination();
}
}
One method to save to Elasticsearch is to use the saveToEs method inside a foreachRDD function. Any other method you wish to use would still require the foreachRDD call on your dstream.
For example:
lines.foreachRDD(lambda rdd: rdd.saveToEs("ESresource"))
See here for more
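Since your pipeline is Java and already imports JavaEsSpark, the same idea looks roughly like the sketch below. It assumes each String in the stream is a JSON document and uses "spark/docs" as an example index/type; on older Spark versions you may need an anonymous VoidFunction instead of the lambda:
// Write each non-empty micro-batch of JSON strings to Elasticsearch.
lines.foreachRDD(rdd -> {
    if (!rdd.isEmpty()) {
        JavaEsSpark.saveJsonToEs(rdd, "spark/docs");
    }
});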
dstream.foreachRDD{rdd=>
val es = sqlContext.createDataFrame(rdd).toDF("use headings suitable for your dataset")
import org.elasticsearch.spark.sql._
es.saveToEs("wordcount/testing")
es.show()
}
In this code block, "dstream" is the data stream which observes data from a server such as Kafka. Inside the brackets of "toDF()" you have to provide headings suitable for your dataset. In "saveToEs()" you have to use the Elasticsearch index. Before this you have to create the SQLContext:
val sqlContext = SQLContext.getOrCreate(SparkContext.getOrCreate())
If you are using Kafka to send data you have to add the dependency mentioned below:
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.10.2.1"
Get the dependency
To see the full example, see:
In this example you first have to create the Kafka producer "test" and then start Elasticsearch.
After that, run the program. You can see the full sbt file and code using the above URL.
