I'm looking for expert help (I am new to Elasticsearch). I have multiple Elasticsearch nodes.
I am using the Elasticsearch Java library to index JSON documents. How should I handle load balancing across the nodes? Is it possible to handle that from the client side?
--- Elasticsearch transport client code ---
public static Client getTransportClient(String host, int port) {
Settings settings = ImmutableSettings.settingsBuilder()
.put("cluster.name", "ccw_cat_es")
.put("node.name", "catsrch-pdv1-01")
.build();
return new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(host, port));
}
public static IndexResponse doIndex(Client client, String index, String type, String id, Map<String, Object> data) {
return client
.prepareIndex(index, type, id)
.setSource(data)
.execute()
.actionGet();
}
public static void main(String[] args) {
Client client = getTransportClient("catsrch-pdv1-01", 9300); // transport port (9300 by default), not the HTTP port 9200
String index = "orderstatussearch";
String type = "osapi";
String id = null;
Map<String, Object> data = new HashMap<String, Object>();
data.put("OrderNumber", "444");
data.put("PO", "123");
data.put("WID", "ab234");
id= "444";
IndexResponse result = doIndex(client, index, type, id, data);
}
The TransportClient automatically uses a round-robin strategy to load balance across the nodes that it is connected to. In your case, you are only connecting to one node, so there is nothing to balance. You can add other nodes to the list and it will balance across them appropriately.
Alternatively, you can "sniff" out the data nodes automatically by just connecting to one of them with an extra setting applied:
Settings settings = ImmutableSettings.settingsBuilder()
// ...
.put("client.transport.sniff", true)
// ...
.build();
This will then round robin against all data nodes that it finds in the cluster state.
This probably leads to the question: why isn't this the default? The reason is that, if you have standalone client nodes, then they are better proxies to the cluster rather than directly communicating with data nodes. For smaller clusters, this is a perfectly acceptable strategy though.
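To make the round-robin behaviour concrete, here is a minimal, library-free sketch of the idea. It only illustrates the strategy and is not the TransportClient's actual implementation; the class name RoundRobinNodes and the second host name catsrch-pdv1-02 are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out node addresses in rotation.
final class RoundRobinNodes {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = new ArrayList<>(nodes); // defensive copy
    }

    // Returns the next node; floorMod keeps the index valid even if the counter overflows.
    String next() {
        return nodes.get(Math.floorMod(counter.getAndIncrement(), nodes.size()));
    }

    public static void main(String[] args) {
        // The second host name is an assumption, not taken from the question.
        RoundRobinNodes rr = new RoundRobinNodes(
                Arrays.asList("catsrch-pdv1-01:9300", "catsrch-pdv1-02:9300"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.next()); // alternates between the two addresses
        }
    }
}
```

With a single address in the list, every call returns the same node, which is why connecting to only one node leaves nothing to balance.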
I used the Elasticsearch connector as a sink to insert data into Elasticsearch (see: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html).
However, I did not find any connector to read data from Elasticsearch as a source.
Is there any connector or example for using Elasticsearch documents as a source in a Flink pipeline?
Regards,
Ali
I don't know of an explicit ES source for Flink. I did see one user talking about using elasticsearch-hadoop as a HadoopInputFormat with Flink, but I don't know if that worked for them (see their code).
I finally defined a simple function that reads from Elasticsearch:
public static class ElasticsearchFunction
extends ProcessFunction<MetricMeasurement, MetricPrediction> {
public ElasticsearchFunction() throws UnknownHostException {
client = new PreBuiltTransportClient(settings)
.addTransportAddress(new TransportAddress(InetAddress.getByName("YOUR_IP"), PORT_NUMBER));
}
@Override
public void processElement(MetricMeasurement in, Context context, Collector<MetricPrediction> out) throws Exception {
MetricPrediction metricPrediction = new MetricPrediction();
metricPrediction.setMetricId(in.getMetricId());
metricPrediction.setGroupId(in.getGroupId());
metricPrediction.setBucket(in.getBucket());
// Get the metric measurement from Elasticsearch
SearchResponse response = client.prepareSearch("YOUR_INDEX_NAME")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(QueryBuilders.termQuery("YOUR_TERM", in.getMetricId())) // Query
.setPostFilter(QueryBuilders.rangeQuery("value").from(0L).to(50L)) // Filter
.setFrom(0).setSize(1).setExplain(true)
.get();
SearchHit[] results = response.getHits().getHits();
for(SearchHit hit : results){
String sourceAsString = hit.getSourceAsString();
if (sourceAsString != null) {
ObjectMapper mapper = new ObjectMapper();
MetricMeasurement obj = mapper.readValue(sourceAsString, MetricMeasurement.class);
obj.getMetricId();
metricPrediction.setPredictionValue(obj.getValue());
}
}
out.collect(metricPrediction);
}
}
Hadoop Compatibility + Elasticsearch Hadoop
https://github.com/cclient/flink-connector-elasticsearch-source
I want to use the Elasticsearch producer on Flink, but I am having trouble with authentication:
I have nginx in front of my Elasticsearch cluster, and I use basic auth in nginx.
But with the Elasticsearch connector I can't add the basic auth credentials to my URL (because of InetSocketAddress).
Do you have any idea how to use the Elasticsearch connector with basic auth?
Thanks for your time.
Here is my code:
val configur = new java.util.HashMap[String, String]
configur.put("cluster.name", "cluster")
configur.put("bulk.flush.max.actions", "1000")
val transportAddresses = new java.util.ArrayList[InetSocketAddress]
transportAddresses.add(new InetSocketAddress(InetAddress.getByName("cluster.com"), 9300))
jsonOutput.filter(_.nonEmpty).addSink(new ElasticsearchSink(configur,
transportAddresses,
new ElasticsearchSinkFunction[String] {
def createIndexRequest(element: String): IndexRequest = {
val jsonMap = parse(element).values.asInstanceOf[java.util.HashMap[String, String]]
return Requests.indexRequest()
.index("flinkTest")
.source(jsonMap);
}
override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer) {
indexer.add(createIndexRequest(element))
}
}))
Flink uses the Elasticsearch Transport Client which connects using a binary protocol on port 9300.
Your nginx proxy is sitting in front of the HTTP interface on port 9200.
Flink isn't going to use your proxy, so there's no need to provide authentication.
If you need to use an HTTP client to connect Flink with Elasticsearch, one solution is to use the Jest library.
You have to create a custom SinkFunction, like this basic Java class:
package fr.gfi.keenai.streaming.io.sinks.elasticsearch5;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Index;
public class ElasticsearchJestSinkFunction<T> extends RichSinkFunction<T> {
private static final long serialVersionUID = -7831614642918134232L;
private JestClient client;
@Override
public void invoke(T value) throws Exception {
String document = convertToJsonDocument(value);
Index index = new Index.Builder(document).index("YOUR_INDEX_NAME").type("YOUR_DOCUMENT_TYPE").build();
client.execute(index);
}
@Override
public void open(Configuration parameters) throws Exception {
// Construct a new Jest client according to configuration via factory
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig.Builder("http://localhost:9200")
.multiThreaded(true)
// Per default this implementation will create no more than 2 concurrent
// connections per given route
.defaultMaxTotalConnectionPerRoute(2)
// and no more than 20 connections in total
.maxTotalConnection(20)
// Basic username and password authentication
.defaultCredentials("YOUR_USER", "YOUR_PASSWORD")
.build());
client = factory.getObject();
}
private String convertToJsonDocument(T value) {
//TODO
return "{}";
}
}
Note that you can also use bulk operations for more speed.
An example of a Jest implementation for Flink is described in the section "Connecting Flink to Amazon RS" of this post.
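For context, HTTP basic authentication, as enforced by an nginx proxy like the one in the question, comes down to a single request header: the string user:password, Base64-encoded, after the word Basic. Below is a minimal sketch using only the JDK; the class name BasicAuthHeader is invented for this example and is not part of Jest or Flink.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only.
final class BasicAuthHeader {
    // Builds the value of the HTTP "Authorization" header for basic auth.
    static String value(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, matching the placeholders used in the sink class.
        System.out.println("Authorization: " + value("YOUR_USER", "YOUR_PASSWORD"));
    }
}
```

Because the header is only encoded, not encrypted, it should travel over HTTPS whenever the proxy is reachable from outside a trusted network.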
I am sending data from Spring Boot to clients using a STOMP client over WebSocket. It sends data to the first user correctly, but as soon as the number of users increases, only some users receive data. This seems odd, because the behavior should be the same for all users. After extensive research, I found that the cause is that I am connecting to a queue ('/user/queue') and have more than one client listening to it. How can I avoid this problem, or is it impossible to solve?
My controller code:
@Controller
public class ScheduledUpdatesOnTopic {
@Autowired
public SimpMessageSendingOperations messagingTemplate;
@Autowired
private SimpMessagingTemplate template;
DateFormat df = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
Date date = new Date();
String json[][] = {{"Lokesh Gupta","34","India",df.format(date)},{"Meenal","23","Pakistan",df.format(date)},{"Gongo","12","Indonesia",df.format(date)},{"Abraham","17","US",df.format(date)},{"Saddam","56","Iraq",df.format(date)},{"Yimkov","67","Japan",df.format(date)},{"Salma","22","Pakistan",df.format(date)},{"Georgia","28","Russia",df.format(date)},{"Jaquline","31","Sri Lanka",df.format(date)},{"Chenchui","78","China",df.format(date)}};
String t[] = {"Lokesh Gupta","34","India","11/8/2017"};
String temp[][];
int p=0;
int count=0;
private MessageHeaderInitializer headerInitializer;
@MessageMapping("/hello")
public void start(SimpMessageHeaderAccessor accessor) throws Exception
{
String applicantId=accessor.getSessionId();
System.out.println("session id " + applicantId);
this.messagingTemplate.convertAndSendToUser(applicantId,"/queue/cache",json,createHeaders(applicantId));
}
private MessageHeaders createHeaders(String sessionId) {
SimpMessageHeaderAccessor headerAccessor = SimpMessageHeaderAccessor.create(SimpMessageType.MESSAGE);
if (getHeaderInitializer() != null) {
getHeaderInitializer().initHeaders(headerAccessor);
}
headerAccessor.setSessionId(sessionId);
headerAccessor.setLeaveMutable(true);
return headerAccessor.getMessageHeaders();
}
public MessageHeaderInitializer getHeaderInitializer() {
return this.headerInitializer;
}
public void setHeaderInitializer(MessageHeaderInitializer headerInitializer) {
this.headerInitializer = headerInitializer;
}
}
And the client-side JavaScript is:
var socket = new SockJS('/gs-guide-websocket');
var stompClient = Stomp.over(socket);
stompClient.connect({ }, function(frame) {
console.log('Connected this ' + frame);
stompClient.subscribe("/user/queue/cache", function(data) {
// code to display this data..........
});
});
I have to use a queue because that is the only way to send data to particular session IDs. Any help will be appreciated!
It sounds like you need to use the "Request-Reply" messaging pattern.
When the client connects to the server on the common queue, it includes a private return address. This return address can be used to generate a new private message queue name for use by the server and client exclusively (since they are the only two that know the private return address). The server can then send the client data over the private message queue.
The return address could be a random UUID, for example, and the private queue name could be derived from it under the /queue/private. prefix.
This "Request-Reply" messaging pattern is more formally explained here, among other useful messaging patterns:
http://www.enterpriseintegrationpatterns.com/patterns/messaging/ReturnAddress.html
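As a rough, broker-free illustration of the return-address idea, the sketch below keeps the queues in memory. Everything in it is an assumption made for the example: the class name ReturnAddressSketch, the shared queue name /queue/requests, the "|" separator, and the choice to append the UUID to /queue/private. to form the private queue name. A real setup would use STOMP destinations on the broker instead of a map.

```java
import java.util.Map;
import java.util.Queue;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// In-memory stand-in for a broker: destination name -> pending messages.
final class ReturnAddressSketch {
    private final Map<String, Queue<String>> queues = new ConcurrentHashMap<>();

    private Queue<String> queue(String name) {
        return queues.computeIfAbsent(name, k -> new ConcurrentLinkedQueue<>());
    }

    // Assumed naming scheme: one private queue per return address.
    static String privateQueueName(String returnAddress) {
        return "/queue/private." + returnAddress;
    }

    // Client side: publish a request on the shared queue, carrying a fresh return address.
    String sendRequest(String payload) {
        String returnAddress = UUID.randomUUID().toString();
        queue("/queue/requests").add(returnAddress + "|" + payload);
        return returnAddress;
    }

    // Server side: take one request and reply on the queue derived from its return address.
    boolean serveOne() {
        String request = queue("/queue/requests").poll();
        if (request == null) {
            return false; // nothing waiting
        }
        int sep = request.indexOf('|');
        String returnAddress = request.substring(0, sep);
        String payload = request.substring(sep + 1);
        queue(privateQueueName(returnAddress)).add("reply to " + payload);
        return true;
    }

    // Client side: read the next reply from this client's private queue, or null if none.
    String receiveReply(String returnAddress) {
        return queue(privateQueueName(returnAddress)).poll();
    }
}
```

Each client only ever polls the queue named after its own return address, so two clients never compete for the same reply, which is the problem described in the question.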
So I've been reading about Spring Message Relay (Spring Messaging stuff) capability with a RabbitMQ broker. What I want to achieve is as follows:
Have a service (1), which acts as a message relay between rabbitmq and a browser. This works fine now. I'm using MessageBrokerRegistry.enableStompBrokerRelay to do that.
Have another service (2) on the back-end, which will send a message to a known queue onto RabbitMQ and have that message routed to a specific user. As a sender, I want to have a control over who the message gets delivered to.
Normally, you'd use SimpMessagingTemplate to do that. Problem is though, that the origin of the message doesn't actually have access to that template, as it's not acting as a relay, it's not using websockets and it doesn't hold mapping of queue names to session ids.
One way I could think of doing it is to write a simple class on service 1 that listens on all queues and forwards the messages using the simp template. However, I feel this is not an ideal way to do it, and that there might already be a way to do it using Spring.
Can you please advise?
This question got me thinking about the same dilemma I was facing. I have started playing with a custom UserDestinationResolver that arrives at a consistent topic naming scheme that uses just the username and not the session ID used by the default resolver.
That lets me subscribe in JS to "/user/exchange/amq.direct/current-time" but send via a vanilla RabbitMQ application to "/exchange/amq.direct/users.me.current-time" (to a user named "me").
The latest source code is here and I am "registering" it as a @Bean in an existing @Configuration class that I had.
Here's the custom UserDestinationResolver itself:
public class ConsistentUserDestinationResolver implements UserDestinationResolver {
private static final Pattern USER_DEST_PREFIXING_PATTERN =
Pattern.compile("/user/(?<name>.+?)/(?<routing>.+)/(?<dest>.+?)");
private static final Pattern USER_AUTHENTICATED_PATTERN =
Pattern.compile("/user/(?<routing>.*)/(?<dest>.+?)");
@Override
public UserDestinationResult resolveDestination(Message<?> message) {
SimpMessageHeaderAccessor accessor = MessageHeaderAccessor.getAccessor(message, SimpMessageHeaderAccessor.class);
final String destination = accessor.getDestination();
final String authUser = accessor.getUser() != null ? accessor.getUser().getName() : null;
if (destination != null) {
if (SimpMessageType.SUBSCRIBE.equals(accessor.getMessageType()) ||
SimpMessageType.UNSUBSCRIBE.equals(accessor.getMessageType())) {
if (authUser != null) {
final Matcher authMatcher = USER_AUTHENTICATED_PATTERN.matcher(destination);
if (authMatcher.matches()) {
String result = String.format("/%s/users.%s.%s",
authMatcher.group("routing"), authUser, authMatcher.group("dest"));
UserDestinationResult userDestinationResult =
new UserDestinationResult(destination, Collections.singleton(result), result, authUser);
return userDestinationResult;
}
}
}
else if (accessor.getMessageType().equals(SimpMessageType.MESSAGE)) {
final Matcher prefixMatcher = USER_DEST_PREFIXING_PATTERN.matcher(destination);
if (prefixMatcher.matches()) {
String user = prefixMatcher.group("name");
String result = String.format("/%s/users.%s.%s",
prefixMatcher.group("routing"), user, prefixMatcher.group("dest"));
UserDestinationResult userDestinationResult =
new UserDestinationResult(destination, Collections.singleton(result), result, user);
return userDestinationResult;
}
}
}
return null;
}
}
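To see what the two regular expressions in the resolver do without starting Spring, the string mapping can be tried on its own. The sketch below copies the patterns and the format string from the resolver; the wrapper class UserDestinationMapping and its method names are hypothetical and exist only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the resolver's patterns, for experimentation only.
final class UserDestinationMapping {
    private static final Pattern USER_DEST_PREFIXING_PATTERN =
            Pattern.compile("/user/(?<name>.+?)/(?<routing>.+)/(?<dest>.+?)");
    private static final Pattern USER_AUTHENTICATED_PATTERN =
            Pattern.compile("/user/(?<routing>.*)/(?<dest>.+?)");

    // SUBSCRIBE/UNSUBSCRIBE case: the user name comes from the authenticated session.
    static String forSubscription(String destination, String authUser) {
        Matcher m = USER_AUTHENTICATED_PATTERN.matcher(destination);
        if (!m.matches()) {
            return null;
        }
        return String.format("/%s/users.%s.%s", m.group("routing"), authUser, m.group("dest"));
    }

    // MESSAGE case: the user name is embedded in the destination itself.
    static String forMessage(String destination) {
        Matcher m = USER_DEST_PREFIXING_PATTERN.matcher(destination);
        if (!m.matches()) {
            return null;
        }
        return String.format("/%s/users.%s.%s",
                m.group("routing"), m.group("name"), m.group("dest"));
    }

    public static void main(String[] args) {
        System.out.println(forSubscription("/user/exchange/amq.direct/current-time", "me"));
        System.out.println(forMessage("/user/me/exchange/amq.direct/current-time"));
    }
}
```

Both calls in main resolve to /exchange/amq.direct/users.me.current-time, which is the point of the scheme: the subscription made by user "me" and a message addressed to "me" end up on the same session-independent destination.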
I have a hard time understanding how to provide values to Storm, since I am new to it.
I started with the starter kit. I went through TestWordSpout, in which the following code provides new values:
public void nextTuple() {
Utils.sleep(100);
final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
final Random rand = new Random();
final String word = words[rand.nextInt(words.length)];
_collector.emit(new Values(word));
}
So I see it emits one word at a time with _collector.emit(new Values(word));
How can I provide a collection of words directly? Is this possible?
TestWordSpout.java
What I mean is that when nextTuple is called, a new word is selected at random from the list and emitted. After a certain time interval, the emitted sequence may look like this:
@100ms: nathan
@200ms: golda
@300ms: golda
@400ms: jackson
@500ms: mike
@600ms: nathan
@700ms: bertels
What if I already have such a list as a collection and just want to feed it to Storm?
Storm is designed and built to process a continuous stream of data (see the Rationale page for Storm). It is unusual for the input data to be fed directly into the Storm cluster. Generally, the input to Storm comes from JMS queues, Apache Kafka, Twitter feeds, and so on. I assume you would like to pass in a few configuration values; in that case, the following applies.
Given Storm's design purpose, only a limited amount of configuration detail should be passed to Storm, such as RDBMS connection details (Oracle/DB2/MySQL etc.), JMS provider details (IBM MQ/RabbitMQ etc.), or Apache Kafka/HBase details.
For your particular question, or for providing the configuration details for the above products, there are three ways that I can think of:
1. Set the configuration details on the instance of the Spout or Bolt.
For example, declare instance variables and assign their values in the Spout/Bolt constructor, as below:
public class TestWordSpout extends BaseRichSpout {
List<String> listOfValues;
public TestWordSpout(List<String> listOfValues) {
this.listOfValues=listOfValues;
}
}
On the topology submission class, create an instance of Spout with the list of values
List<String> listOfValues=new ArrayList<String>();
listOfValues.add("nathan");
listOfValues.add("golda");
listOfValues.add("mike");
builder.setSpout("word", new TestWordSpout(listOfValues), 3);
These values are available as instance variables in the nextTuple() method
Please look at the Storm integrations at Storm contrib on the configurations set for RDBMS/Kafka etc as above
2. Set the configurations in getComponentConfiguration(). This method is used to override the topology configuration; however, you can pass in a few details as below:
@Override
public Map<String, Object> getComponentConfiguration() {
Map<String, Object> ret = new HashMap<String, Object>();
if(!_isDistributed) {
ret.put(Config.TOPOLOGY_MAX_TASK_PARALLELISM, 1);
return ret;
} else {
List<String> listOfValues=new ArrayList<String>();
listOfValues.add("nathan");
listOfValues.add("golda");
listOfValues.add("mike");
ret.put("listOfValues", listOfValues);
}
return ret;
}
and the configuration details are available in the open() or prepare() method of Spout/Bolt respectively.
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
_collector = collector;
this.listOfValues=(List<String>)conf.get("listOfValues");
}
3. Declare the configurations in a property file and package it in the jar file that is submitted to the Storm cluster. The Nimbus node copies the jar file to the worker nodes and makes it available to the executor threads. The open()/prepare() method can read the property file and assign the values to instance variables.
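For the third option, the parsing step could look like the sketch below. It uses only the JDK and makes several assumptions that are not in the answer: the class name WordListConfig, a property key called words, and a comma-separated value format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper that turns a property file into a list of words.
final class WordListConfig {
    // Reads a comma-separated list from the (assumed) "words" property.
    static List<String> loadWords(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> words = new ArrayList<>();
        for (String word : props.getProperty("words", "").split(",")) {
            String trimmed = word.trim();
            if (!trimmed.isEmpty()) {
                words.add(trimmed); // skip blanks such as a trailing comma
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the property file packaged in the jar.
        System.out.println(loadWords(new StringReader("words=nathan, golda, mike")));
    }
}
```

Inside open() or prepare(), the Reader would typically wrap a classpath resource obtained with getResourceAsStream, and the returned list would be stored in an instance variable such as listOfValues.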
The "Values" type accepts any kind of object, and any number of them.
So you can simply send a List, for instance, from the execute method of a Bolt or from the nextTuple method of a Spout:
List<String> words = new ArrayList<>();
words.add("one word");
words.add("another word");
_collector.emit(new Values(words));
You can add a new field too; just be sure to declare it in the declareOutputFields method:
_collector.emit(new Values(words, "a new field value!"));
And in your declareOutputFields method:
@Override
public void declareOutputFields(final OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declare(new Fields("collection", "newField"));
}
You can get the fields in the next Bolt in the topology from the tuple object given by the execute method:
List<String> collection = (List<String>) tuple.getValueByField("collection");
String newFieldValue = tuple.getStringByField("newField");