Writing flume interceptors to bring the data from solace - hadoop

I want to bring the data from solace to hadoop using flume, can some one let me know how to write interceptor to convert protobuf to avro ?

There's a very detailed integration guide available describing how to use the JMS Flume Source to receive messages from a Solace message bus.
Is this the interface you are using?
If so the blog post by Ken Barr (https://solace.com/blog/devops/solace-as-flume-channel-technical-look) gives an implementation of both Flume Source and Sink. The complete source code is at http://dev.solace.com/wp-content/uploads/solace-flume-channel.tgz
The FlumeEventToSolaceMessageConverter.solaceToFlume() method is the one you'd need to modify to support your protobuf to avro use case. OOTB it just assumes that the body of the JMS message is an avro message.
On GitHub we found a protobuf to avro converter (vpon/protobuf-to-avro) which generates a POJO converter using a .proto schema file.

Related

kafka streams - can I use kafka streams processing in cases where the source is not a kafka topic?

I have an application (call it smscb-router) as shown in the diagram.
It reads data from a legacy system (sms).
Based on the content (callback type), I have to put into corresponding outgoing topic (such as billing-n-cdr, dr-cdr, ...)
I think streams API is better suited in this case, as it has the map functionality to do the content mapping check. What I am unsure is, can I read source data from a non-kafka-topic source.
All the examples that I see on the internet blogs, explain steaming apps with the context of reading from a source topic and put to other destination topics.
So, is this possible to read from a non-topic source, such as say a redis store, or a message queue such as RabbitMQ?
We had a recent implementation, where we had to poll an .xml file from a network attached drive and convert it into the KAFKA Events i.e. publishing each record into an output topic. In such, we wont even call it as something we have developed using a Streams API, but it is just a KAFKA Producer Component.
Java File Poller Module (Quartz time based) -> XML Schema Management -> KAFKA Producer Component -> Output Topic (KAFKA Broker).
And you will get all native features of KAKFA Producer API in terms of retries and you can use producer.send (Sync) or producer.send.get(Asyn) with call-back.
Hope this helps. Streams API is meant for big and something very complex that to be normalized through using Stateful operations.
Thanks,
Christopher
Kafka Streams is only about Topic to Topic Data Streaming
All external system should be integrated by another method :
Ideally Kafka Connect : for example with this one :
https://docs.confluent.io/kafka-connect-rabbitmq-source/current/overview.html
You may also use a manual consumer for the first step, but it always better to reuse all availability mecanism built in Kafka Connect. (No code, just some Json config).
In your schema i would recommend to add one topic and one producer or one connector in front of your Pink Component, then it can become a fully standard Kafka Streams microservice.

Spring Kafka JDBC Connector compatibility

Is Kafka JDBC connect compatible with Spring-Kafka library?
I did follow https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector/ and still have some confusions.
Let's say you want to consume from a Kafka topic and write to a JDBC database. Some of your options are
Use plain Kafka consumer to consume from the topic and use Jdbc api to write the consumed record to database.
Use spring Kafka to consume from the Kafka Topic and spring jdbc template or spring data to write it to the database
Use Kafka connect with Jdbc connector as sink to read from topic and write to a table.
So as you can see
Kafka Jdbc connector is a specialised component that can only do one job.
Kafka Consumer is very genric component which can do lot of job and you will be writing lot of code. In facr, it will be the foundational API from which other frameworks build on and specialise.
Spring Kafka simplfies it and let you deal with kafka records as java objects but doesnt tell you how to write that object to your db.
So they are alternative solutions to fulfil the task. Having said that you may have a flow where different segments are controlled by different teams and for each segment, any of them can be used and Kafka topic will act as joining channel

Is it possible sending websocket messages to a kafka topic?

I am trying to find a way to consume messages that being sent by a websocket to a kafka topic (the messages are sent by the websocket to the address 'ws://address:port/topic_name' and I want to add all of those messages to a kafka topic).
I read about kafka connect and tried to find a way to do it with it but it doesnt seem to work...
thanks in advance :)
There is no Kafka Connector to a socket in Confluent Platform.
I work in a team that use Kafka in production and our source is a socket, so your options are to use platforms that support this socket->Kafka producing, or write one by yourself.
About possible platforms, I think most of them will be overkill though you can utilize them for this problem, some options are:
1. NiFi or MiniFi for smaller loads, use PublishKafka Processor
2. StreamSets with Kafka Producer Destination
3. Apache Flume- not very recommended, this project is stops to evolve.
If you wish to write your own producer, you basically have to create a listener on this port, and produce the incoming messages to Kafka; if this is a web socket, just get the payload of the requests and produce them to Kafka.
Example Kafka Producer Code can be copied from tutorialspoint simple producer example*
Here are some open-source projects examples:
1. https://github.com/DataReply/kafka-connect-socket-source
2. https://github.com/kafka-socket/miniature_engine
3. https://github.com/dhanuka84/kafka-connect-tcp
4. https://github.com/krux/tcp-stream-kafka-producer
The idea of Kafka connect is that you have some sort of external integration that serves as storage. This can be SAP, Salesforce, RDBMS, MQ or anything else that has state. You websocket endpoint does not have data, you can not poll it it is someone else that is invoking it and there fore the data is transfered. Now if you know who is actualy holding the data than you can potentialy build a conector using this guide. https://docs.confluent.io/current/connect/devguide.html
For your particular case, the best you can do is either to use Kafka Producer API https://docs.confluent.io/current/clients/producer.html
and from your websocket enpoint use this producer to post a message to the topic, or even better if you are using spring you can use a higher level abstraction, that will be KafkaTemplate https://docs.spring.io/spring-kafka/reference/html/#sending-messages.
Full disclosure: I work for MigratoryData.
You can check out MigratoryData's solution for Kafka. MigratoryData is a scalable WebSocket server. The MigratoryData Source/Sink Connector for Kafka makes use of Kafka Connect API and can be used to stream data in real-time from Kafka to WebSocket clients and vice versa. The main advantage of the solution is it extends Kafka messaging to WebSocket clients while preserving Kafka's key features like guaranteed delivery, message ordering, etc.

Why to use SpringKafka Template in place existing Kafka Producer Consumer api?

What benefits does spring Kafka template provide?
I have tried the existing Producer/Consumer API by Kafka. That is very simple to use, then why use Kafka template.
Kafka Template internally uses Kafka producer so you can directly use Kafka APIs. The benefit of using Kafka template is it provides different methods for sending message to Kafka topic, kind of added benefits you can see the API comparison between KafkaProducer and KafkaTemplate here:
https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
https://docs.spring.io/spring-kafka/api/org/springframework/kafka/core/KafkaTemplate.html
You can see KafkaTemplate provide many additional ways of sending data to Kafka topics because of various send methods while some calls are the same as Kafka API and are simply forwarded from KafkaTemplate to KafkaProducer.
It's up to the developer what to use. If you feel like working with KafkaTemplate is easy as you don't have to create ProducerRecord a simple send method will do all the work for you.
At a high level, the benefit is that you can externalize your properties objects more easily and you can just focus on the record processing logic
Plus Spring is integrated with lots of other components.
Note: Other options still exist like Reactor Kafka, Alpakka, Apache Camel, Smallrye reactive messaging, Vert.x... But they all wrap the same Kafka API.
So, I'd say you're (marginally) trading efficiency for convinience

Does VoltDB's Connector Sink support Avro messages?

Does VoltDB's Kafka connector support reading avro messages Below is my attempt to configure the connector ?
{
"name":"KafkaSinkConnector",
"config":{
"connector.class":"org.voltdb.connect.kafka.KafkaSinkConnector",
"tasks.max":"1",
"voltdb.servers":"node1:21212,node2:21212,node3:21212",
"voltdb.procedure":"USER.insert",
"voltdb.connection.user":"test",
"voltdb.connection.password":"test",
"topics":"user",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url":"http://kafka-schema-registry:8081",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://kafka-schema-registry:8081",
"key.converter.schemas.enable":"true",
"value.converter.schemas.enable":"true",
}
}
The connector loads with no error messages are displayed but nothing is written to the DB.
Unfortunately, neither the Kafka importer nor the sink connector currently support Avro messages. This is a good feature request that could be added in the future, however.
Full disclosure: I work for VoltDB.
-Andrew

Resources