I am using Flink 1.1.2 and have added the Elasticsearch dependency in Maven as follows:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-elasticsearch2_2.10</artifactId>
<version>1.2.0</version>
</dependency>
My program contains the following code, which reads data from Kafka and inserts it into Elasticsearch:
public class ReadFromKafka {
public static void main(String[] args) throws Exception {
// create execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("group.id", "test");
DataStream<JoinedStreamEvent> message = env.addSource(new FlinkKafkaConsumer09<JoinedStreamEvent>("test",
new JoinSchema(), properties));
System.out.println("reading from kafka");
message.print();
Map<String, String> config = new HashMap<>();
config.put("bulk.flush.max.actions", "1"); // flush inserts after every event
config.put("cluster.name", "elasticsearch_amar"); // default cluster name
List<InetSocketAddress> transports = new ArrayList<>();
// set default connection details
transports.add(new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 9300));
message.addSink(new ElasticsearchSink<>(config,transports,new ElasticInserter()));
env.execute();
} //main
public static class ElasticInserter implements ElasticsearchSinkFunction<JoinedStreamEvent>{
@Override
public void process(JoinedStreamEvent record, RuntimeContext runtimeContext, RequestIndexer requestIndexer) {
Map<String, Integer> json = new HashMap<>();
json.put("Time", record.getPatient_id());
json.put("heart Rate ", record.getHeartRate());
json.put("resp rete", record.getRespirationRate());
IndexRequest rqst = Requests.indexRequest()
.index("nyc-places") // index name
.type("popular-locations") // mapping name
.source(json);
requestIndexer.add(rqst);
} //process
} //ElasticInserter
} //ReadFromKafka
I have installed Elasticsearch using Homebrew and then started it with the elasticsearch command as shown below.
However, when I start my program I get the following error:
My reputation is below 50, so I cannot comment, but I have a few suggestions:
First, check whether ES is actually up; see "Can't Connect to Elasticsearch (through Curl)".
I recommend using a Docker container to start ES, e.g. docker run -d --name es -p 9200:9200 elasticsearch:2 -Des.network.host=0.0.0.0
By the way, you can also try setting the es.network.host value to 0.0.0.0 in the ES config file elasticsearch.yml:
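For reference, the relevant line in elasticsearch.yml would look something like this (assuming a stock ES 2.x install; restart Elasticsearch afterwards):
network.host: 0.0.0.0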
I used Spring Boot Data Redis (version 2.1.3) to connect to a Redis cluster. The configuration is as follows:
@Bean
@Primary
public RedisConnectionFactory myLettuceConnectionFactory(GenericObjectPoolConfig poolConfig) {
RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration();
final List<String> nodeList = redisProperties.getCluster().getNodes();
Set<RedisNode> nodes = new HashSet<RedisNode>();
for (String ipPort : nodeList) {
String[] ipAndPort = ipPort.split(":");
nodes.add(new RedisNode(ipAndPort[0].trim(), Integer.valueOf(ipAndPort[1])));
}
redisClusterConfiguration.setPassword(RedisPassword.of(redisProperties.getPassword()));
redisClusterConfiguration.setClusterNodes(nodes);
redisClusterConfiguration.setMaxRedirects(redisProperties.getCluster().getMaxRedirects());
LettuceClientConfiguration clientConfig = LettucePoolingClientConfiguration.builder()
.commandTimeout(redisProperties.getTimeout())
.poolConfig(poolConfig)
.build();
RedisClusterClient clusterClient ;
LettuceConnectionFactory factory = new LettuceConnectionFactory(redisClusterConfiguration,clientConfig);
return factory;
}
However, while the application is running, the following WARN exception message is always logged:
Well, this seems to be a problem with Lettuce (see "How to map remote host & port to localhost using Lettuce"), but I don't know how to apply that in Spring Boot Data Redis. Any solution is welcome, thank you.
I've got the answer, so let's define a ClientResources like this:
MappingSocketAddressResolver resolver = MappingSocketAddressResolver.create(DnsResolvers.UNRESOLVED ,
hostAndPort -> {
if(hostAndPort.getHostText().startsWith("172.31")){
return HostAndPort.of(ipStr, hostAndPort.getPort());
}
return hostAndPort;
});
ClientResources clientResources = ClientResources.builder()
.socketAddressResolver(resolver)
.build();
Then set it via the LettuceClientConfiguration.clientResources(...) method, and Lettuce works normally.
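For completeness, here is a sketch of how that could be wired into the myLettuceConnectionFactory bean from the question (resolver, poolConfig, redisProperties and redisClusterConfiguration are the objects defined above; illustration only, not tested):
ClientResources clientResources = ClientResources.builder()
        .socketAddressResolver(resolver) // the MappingSocketAddressResolver defined above
        .build();
LettuceClientConfiguration clientConfig = LettucePoolingClientConfiguration.builder()
        .clientResources(clientResources) // plug the resolver into Lettuce
        .commandTimeout(redisProperties.getTimeout())
        .poolConfig(poolConfig)
        .build();
return new LettuceConnectionFactory(redisClusterConfiguration, clientConfig);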
I used Elasticsearch Connector as a Sink to insert data into Elasticsearch (see : https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html).
But I did not find any connector to get data from Elasticsearch as a source.
Is there any connector or example for using Elasticsearch documents as a source in a Flink pipeline?
Regards,
Ali
I don't know of an explicit ES source for Flink. I did see one user talking about using elasticsearch-hadoop as a HadoopInputFormat with Flink, but I don't know if that worked for them (see their code).
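For what it's worth, here is a rough, untested sketch of what that elasticsearch-hadoop + HadoopInputFormat approach might look like with Flink's batch (DataSet) API; the node address, index name and value types are assumptions on my part:
// Sketch only: assumes flink-hadoop-compatibility and elasticsearch-hadoop are on the classpath.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

JobConf jobConf = new JobConf();
jobConf.set("es.nodes", "localhost:9200");      // Elasticsearch HTTP endpoint(s) - placeholder
jobConf.set("es.resource", "my-index/my-type"); // index/type to read from - placeholder

// EsInputFormat (mapred flavor) emits (document id, document fields) pairs.
HadoopInputFormat<Text, MapWritable> esInput =
        new HadoopInputFormat<>(new EsInputFormat<Text, MapWritable>(), Text.class, MapWritable.class, jobConf);

DataSet<Tuple2<Text, MapWritable>> docs = env.createInput(esInput);
docs.print(); // or continue with further transformations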
I finally defined a simple read-from-Elasticsearch function:
public static class ElasticsearchFunction
extends ProcessFunction<MetricMeasurement, MetricPrediction> {
private TransportClient client; // transport client used to query Elasticsearch
public ElasticsearchFunction() throws UnknownHostException {
// settings added here for completeness; cluster.name must match your Elasticsearch cluster
Settings settings = Settings.builder().put("cluster.name", "YOUR_CLUSTER_NAME").build();
client = new PreBuiltTransportClient(settings)
.addTransportAddress(new TransportAddress(InetAddress.getByName("YOUR_IP"), PORT_NUMBER));
}
@Override
public void processElement(MetricMeasurement in, Context context, Collector<MetricPrediction> out) throws Exception {
MetricPrediction metricPrediction = new MetricPrediction();
metricPrediction.setMetricId(in.getMetricId());
metricPrediction.setGroupId(in.getGroupId());
metricPrediction.setBucket(in.getBucket());
// Get the metric measurement from Elasticsearch
SearchResponse response = client.prepareSearch("YOUR_INDEX_NAME")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery("YOUR_TERM", in.getMetricId())) // Query
.setPostFilter(QueryBuilders.rangeQuery("value").from(0L).to(50L)) // Filter
.setFrom(0).setSize(1).setExplain(true)
.get();
SearchHit[] results = response.getHits().getHits();
for(SearchHit hit : results){
String sourceAsString = hit.getSourceAsString();
if (sourceAsString != null) {
ObjectMapper mapper = new ObjectMapper();
MetricMeasurement obj = mapper.readValue(sourceAsString, MetricMeasurement.class);
obj.getMetricId();
metricPrediction.setPredictionValue(obj.getValue());
}
}
out.collect(metricPrediction);
}
}
Another option is Flink's Hadoop Compatibility together with Elasticsearch-Hadoop; see https://github.com/cclient/flink-connector-elasticsearch-source for an example.
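If you go down that route, the dependencies would look roughly like this (artifact names from memory; match the versions to your Flink/Elasticsearch setup):
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-hadoop-compatibility_2.11</artifactId>
<version>1.7.2</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-hadoop</artifactId>
<version>6.5.4</version>
</dependency>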
I am using this docker-compose setup for setting up Kafka locally: https://github.com/wurstmeister/kafka-docker/
docker-compose up works fine, creating topics via shell works fine.
Now I try to connect to Kafka via spring-kafka:2.1.0.RELEASE
When starting up the Spring application it prints the correct version of Kafka:
o.a.kafka.common.utils.AppInfoParser : Kafka version : 1.0.0
o.a.kafka.common.utils.AppInfoParser : Kafka commitId : aaa7af6d4a11b29d
I try to send a message like this
kafkaTemplate.send("test-topic", UUID.randomUUID().toString(), "test");
Sending on client side fails with
UnknownServerException: The server experienced an unexpected error when processing the request
In the server console I get the message Magic v1 does not support record headers
Error when handling request {replica_id=-1,max_wait_time=100,min_bytes=1,max_bytes=2147483647,topics=[{topic=test-topic,partitions=[{partition=0,fetch_offset=39,max_bytes=1048576}]}]} (kafka.server.KafkaApis)
java.lang.IllegalArgumentException: Magic v1 does not support record headers
Googling suggests a version conflict, but the versions seem to fit (org.apache.kafka:kafka-clients:1.0.0 is on the classpath).
Any clues? Thanks!
Edit:
I narrowed down the source of the problem. Sending plain Strings works, but sending JSON via JsonSerializer results in the given problem. Here is the content of my producer config:
@Value("\${kafka.bootstrap-servers}")
lateinit var bootstrapServers: String
@Bean
fun producerConfigs(): Map<String, Any> =
HashMap<String, Any>().apply {
// list of host:port pairs used for establishing the initial connections to the Kafka cluster
put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers)
put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer::class.java)
put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer::class.java)
}
@Bean
fun producerFactory(): ProducerFactory<String, MyClass> =
DefaultKafkaProducerFactory(producerConfigs())
@Bean
fun kafkaTemplate(): KafkaTemplate<String, MyClass> =
KafkaTemplate(producerFactory())
I had a similar issue. Spring Kafka's JsonSerializer (and JsonSerde) adds type info headers to each record by default.
In order to prevent this issue, we need to disable adding the type info headers.
If you are fine with the default JSON serialization, then use the following (the key point here is ADD_TYPE_INFO_HEADERS):
Map<String, Object> props = new HashMap<>(defaultSettings);
props.put(JsonSerializer.ADD_TYPE_INFO_HEADERS, false);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
ProducerFactory<String, Object> producerFactory = new DefaultKafkaProducerFactory<>(props);
But if you need a custom JsonSerializer with a specific ObjectMapper (for example with PropertyNamingStrategy.SNAKE_CASE), you should disable adding type info headers explicitly on the JsonSerializer, because Spring Kafka ignores the DefaultKafkaProducerFactory property ADD_TYPE_INFO_HEADERS in that case (in my opinion, a questionable design decision in Spring Kafka):
JsonSerializer<Object> valueSerializer = new JsonSerializer<>(customObjectMapper);
valueSerializer.setAddTypeInfo(false);
ProducerFactory<String, Object> producerFactory = new DefaultKafkaProducerFactory<>(props, Serdes.String().serializer(), valueSerializer);
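For reference, the customObjectMapper used above could be created roughly like this (SNAKE_CASE is just the naming strategy mentioned as an example; adjust to your needs):
ObjectMapper customObjectMapper = new ObjectMapper()
        .setPropertyNamingStrategy(PropertyNamingStrategy.SNAKE_CASE);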
Or, if we use JsonSerde:
Map<String, Object> jsonSerdeProperties = new HashMap<>();
jsonSerdeProperties.put(JsonSerializer.ADD_TYPE_INFO_HEADERS, false);
JsonSerde<T> jsonSerde = new JsonSerde<>(serdeClass);
jsonSerde.configure(jsonSerdeProperties, false);
Solved. The problem was neither the broker, nor some Docker cache, nor the Spring app.
The problem was a console consumer which I used in parallel for debugging. This was an "old" consumer started with kafka-console-consumer.sh --topic=topic --zookeeper=...
It actually prints a warning when started: Using the ConsoleConsumer with old consumer is deprecated and will be removed in a future major release. Consider using the new consumer by passing [bootstrap-server] instead of [zookeeper].
A "new" consumer with --bootstrap-server option should be used (especially when using Kafka 1.0 with JsonSerializer).
Note: Using an old consumer here can indeed affect the producer.
I just ran a test against that docker image with no problems...
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f093b3f2475c kafkadocker_kafka "start-kafka.sh" 33 minutes ago Up 2 minutes 0.0.0.0:32768->9092/tcp kafkadocker_kafka_1
319365849e48 wurstmeister/zookeeper "/bin/sh -c '/usr/sb…" 33 minutes ago Up 2 minutes 22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp kafkadocker_zookeeper_1
.
@SpringBootApplication
public class So47953901Application {
public static void main(String[] args) {
SpringApplication.run(So47953901Application.class, args);
}
@Bean
public ApplicationRunner runner(KafkaTemplate<Object, Object> template) {
return args -> template.send("foo", "bar", "baz");
}
@KafkaListener(id = "foo", topics = "foo")
public void listen(String in) {
System.out.println(in);
}
}
.
spring.kafka.bootstrap-servers=192.168.177.135:32768
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=false
.
2017-12-23 13:27:27.990 INFO 21305 --- [ foo-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [foo-0]
baz
EDIT
Still works for me...
spring.kafka.bootstrap-servers=192.168.177.135:32768
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=false
spring.kafka.consumer.value-deserializer=org.springframework.kafka.support.serializer.JsonDeserializer
spring.kafka.producer.value-serializer=org.springframework.kafka.support.serializer.JsonSerializer
.
2017-12-23 15:27:59.997 INFO 44079 --- [ main] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
...
value.serializer = class org.springframework.kafka.support.serializer.JsonSerializer
...
2017-12-23 15:28:00.071 INFO 44079 --- [ foo-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [foo-0]
baz
You are probably using a Kafka broker version <= 0.10.x.x (the 0.10 message format does not support record headers).
In that case, you must set JsonSerializer.ADD_TYPE_INFO_HEADERS to false, as below,
Map<String, Object> props = new HashMap<>(defaultSettings);
props.put(JsonSerializer.ADD_TYPE_INFO_HEADERS, false);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
ProducerFactory<String, Object> producerFactory = new DefaultKafkaProducerFactory<>(props);
for your producer factory properties.
If you are using a Kafka version > 0.10.x.x, it should just work fine.
I want to use the Elasticsearch sink in Flink, but I have some trouble with authentication:
I have nginx in front of my Elasticsearch cluster, and I use basic auth in nginx.
But with the Elasticsearch connector I can't add the basic auth credentials to my URL (because of InetSocketAddress).
Do you have any idea how to use the Elasticsearch connector with basic auth?
Thanks for your time.
Here is my code:
val configur = new java.util.HashMap[String, String]
configur.put("cluster.name", "cluster")
configur.put("bulk.flush.max.actions", "1000")
val transportAddresses = new java.util.ArrayList[InetSocketAddress]
transportAddresses.add(new InetSocketAddress(InetAddress.getByName("cluster.com"), 9300))
jsonOutput.filter(_.nonEmpty).addSink(new ElasticsearchSink(configur,
transportAddresses,
new ElasticsearchSinkFunction[String] {
def createIndexRequest(element: String): IndexRequest = {
val jsonMap = parse(element).values.asInstanceOf[java.util.HashMap[String, String]]
return Requests.indexRequest()
.index("flinkTest")
.source(jsonMap);
}
override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer) {
indexer.add(createIndexRequest(element))
}
}))
Flink uses the Elasticsearch Transport Client which connects using a binary protocol on port 9300.
Your nginx proxy is sitting in front of the HTTP interface on port 9200.
Flink isn't going to use your proxy, so there's no need to provide authentication.
If you need to use an HTTP client to connect Flink with Elasticsearch, one solution is to use the Jest library.
You have to create a custom SinkFunction, like this basic Java class:
package fr.gfi.keenai.streaming.io.sinks.elasticsearch5;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Index;
public class ElasticsearchJestSinkFunction<T> extends RichSinkFunction<T> {
private static final long serialVersionUID = -7831614642918134232L;
private JestClient client;
@Override
public void invoke(T value) throws Exception {
String document = convertToJsonDocument(value);
Index index = new Index.Builder(document).index("YOUR_INDEX_NAME").type("YOUR_DOCUMENT_TYPE").build();
client.execute(index);
}
@Override
public void open(Configuration parameters) throws Exception {
// Construct a new Jest client according to configuration via factory
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig.Builder("http://localhost:9200")
.multiThreaded(true)
// Per default this implementation will create no more than 2 concurrent
// connections per given route
.defaultMaxTotalConnectionPerRoute(2)
// and no more 20 connections in total
.maxTotalConnection(20)
// Basic username and password authentication
.defaultCredentials("YOUR_USER", "YOUR_PASSWORD")
.build());
client = factory.getObject();
}
private String convertToJsonDocument(T value) {
//TODO
return "{}";
}
}
Note that you can also use bulk operations for more speed.
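As a rough, untested sketch (reusing the client field from the class above, with placeholder documents), a bulk request with Jest could look like this:
String document1 = "{\"value\":1}";
String document2 = "{\"value\":2}";
Bulk bulk = new Bulk.Builder()
        .defaultIndex("YOUR_INDEX_NAME")
        .defaultType("YOUR_DOCUMENT_TYPE")
        .addAction(new Index.Builder(document1).build())
        .addAction(new Index.Builder(document2).build())
        .build();
client.execute(bulk);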
An example of a Jest implementation for Flink is described in the section "Connecting Flink to Amazon ES" of this post.
Spring Boot as a non-web project is new to me. Please guide me on how to write Spark Streaming code in Spring Boot; I have already worked on a plain Java/Spark project and want to convert it into a Spring Boot non-web application. Any help or suggestion is appreciated.
Here is my Spark config
@Bean
public SparkConf sparkConf() {
SparkConf sparkConf = new SparkConf();
sparkConf.set("spark.app.name", "SparkReceiver"); //The name of application. This will appear in the UI and in log data.
//conf.set("spark.ui.port", "7077"); //Port for application's dashboard, which shows memory and workload data.
sparkConf.set("dynamicAllocation.enabled","false"); //Which scales the number of executors registered with this application up and down based on the workload
//conf.set("spark.cassandra.connection.host", "localhost"); //Cassandra Host Adddress/IP
sparkConf.set("spark.serializer","org.apache.spark.serializer.KryoSerializer"); //For serializing objects that will be sent over the network or need to be cached in serialized form.
sparkConf.set("spark.driver.allowMultipleContexts", "true");
sparkConf.setMaster("local[4]");
return sparkConf;
}
@Bean
public JavaSparkContext javaSparkContext() {
return new JavaSparkContext(sparkConf());
}
@Bean
public SparkSession sparkSession() {
return SparkSession
.builder()
.sparkContext(javaSparkContext().sc())
.appName("Java Spark SQL basic example")
.getOrCreate();
}
@Bean
public JavaStreamingContext javaStreamingContext(){
return new JavaStreamingContext(sparkConf(), new Duration(2000));
}
Here is my testing class
@Autowired
private JavaSparkContext sc;
@Autowired
private SparkSession session;
public void testMessage() throws InterruptedException{
JavaStreamingContext jsc = new JavaStreamingContext(sc, new Duration(2000));
Map<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("zookeeper.connect", "localhost:2181"); //Make all kafka data for this cluster appear under a particular path.
kafkaParams.put("group.id", "testgroup"); //String that uniquely identifies the group of consumer processes to which this consumer belongs
kafkaParams.put("metadata.broker.list", "localhost:9092"); //Producer can find a one or more Brokers to determine the Leader for each topic.
kafkaParams.put("serializer.class", "kafka.serializer.StringEncoder"); //Serializer to use when preparing the message for transmission to the Broker.
kafkaParams.put("request.required.acks", "1"); //Producer to require an acknowledgement from the Broker that the message was received.
Set<String> topics = Collections.singleton("16jnfbtopic");
//Create an input DStream for Receiving data from socket
JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(jsc,
String.class,
String.class,
StringDecoder.class,
StringDecoder.class,
kafkaParams, topics);
//Create JavaDStream<String>
JavaDStream<String> msgDataStream = directKafkaStream.map(new Function<Tuple2<String, String>, String>() {
@Override
public String call(Tuple2<String, String> tuple2) {
return tuple2._2();
}
});
//Create JavaRDD<Row>
msgDataStream.foreachRDD(new VoidFunction<JavaRDD<String>>() {
@Override
public void call(JavaRDD<String> rdd) {
JavaRDD<Row> rowRDD = rdd.map(new Function<String, Row>() {
@Override
public Row call(String msg) {
Row row = RowFactory.create(msg);
return row;
}
});
//Create Schema
StructType schema = DataTypes.createStructType(new StructField[] {DataTypes.createStructField("Message", DataTypes.StringType, true)});
Dataset<Row> msgDataFrame = session.createDataFrame(rowRDD, schema);
msgDataFrame.show();
}
});
jsc.start();
jsc.awaitTermination();
When I run this app I am getting an error; please guide me.
Here is my error log:
[Eclipse error log screenshot]