How to use BasicAuth with Elasticsearch Connector on Flink

I want to use the Elasticsearch producer with Flink, but I am having some trouble with authentication:
I have nginx in front of my Elasticsearch cluster, and I use basic auth in nginx.
But with the Elasticsearch connector I can't add basic auth to my URL (because of InetSocketAddress).
Do you have any idea how to use the Elasticsearch connector with basic auth?
Thanks for your time.
Here is my code:
val configur = new java.util.HashMap[String, String]
configur.put("cluster.name", "cluster")
configur.put("bulk.flush.max.actions", "1000")

val transportAddresses = new java.util.ArrayList[InetSocketAddress]
transportAddresses.add(new InetSocketAddress(InetAddress.getByName("cluster.com"), 9300))

jsonOutput.filter(_.nonEmpty).addSink(new ElasticsearchSink(configur,
  transportAddresses,
  new ElasticsearchSinkFunction[String] {
    def createIndexRequest(element: String): IndexRequest = {
      val jsonMap = parse(element).values.asInstanceOf[java.util.HashMap[String, String]]
      Requests.indexRequest()
        .index("flinkTest")
        .source(jsonMap)
    }

    override def process(element: String, ctx: RuntimeContext, indexer: RequestIndexer) {
      indexer.add(createIndexRequest(element))
    }
  }))

Flink uses the Elasticsearch TransportClient, which connects using a binary protocol on port 9300.
Your nginx proxy is sitting in front of the HTTP interface on port 9200.
Flink isn't going to use your proxy, so there's no need to provide authentication.

If you need to use an HTTP client to connect Flink to Elasticsearch, one solution is to use the Jest library.
You have to create a custom SinkFunction, like this basic Java class:
package fr.gfi.keenai.streaming.io.sinks.elasticsearch5;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Index;

public class ElasticsearchJestSinkFunction<T> extends RichSinkFunction<T> {

  private static final long serialVersionUID = -7831614642918134232L;

  private JestClient client;

  @Override
  public void invoke(T value) throws Exception {
    String document = convertToJsonDocument(value);
    Index index = new Index.Builder(document).index("YOUR_INDEX_NAME").type("YOUR_DOCUMENT_TYPE").build();
    client.execute(index);
  }

  @Override
  public void open(Configuration parameters) throws Exception {
    // Construct a new Jest client according to configuration via factory
    JestClientFactory factory = new JestClientFactory();
    factory.setHttpClientConfig(new HttpClientConfig.Builder("http://localhost:9200")
        .multiThreaded(true)
        // By default this implementation will create no more than 2 concurrent
        // connections per given route
        .defaultMaxTotalConnectionPerRoute(2)
        // and no more than 20 connections in total
        .maxTotalConnection(20)
        // Basic username and password authentication
        .defaultCredentials("YOUR_USER", "YOUR_PASSWORD")
        .build());
    client = factory.getObject();
  }

  private String convertToJsonDocument(T value) {
    // TODO
    return "{}";
  }
}
Note that you can also use bulk operations for more speed.
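For illustration, a minimal sketch of what batching with Jest's Bulk action could look like inside invoke() (the index and type names are placeholders, and bufferedDocuments stands for whatever buffer of serialized records you accumulate before flushing; the buffering/flushing policy itself is left out):

import io.searchbox.core.Bulk;
import io.searchbox.core.Index;

// Collect several documents into one bulk request instead of
// executing one Index action per record.
Bulk.Builder bulkBuilder = new Bulk.Builder()
    .defaultIndex("YOUR_INDEX_NAME")
    .defaultType("YOUR_DOCUMENT_TYPE");
for (String document : bufferedDocuments) {
    bulkBuilder.addAction(new Index.Builder(document).build());
}
client.execute(bulkBuilder.build());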
An example of a Jest implementation for Flink is described in the section "Connecting Flink to Amazon RS" of this post.

Related

How to redirect Prometheus Metrics to the default spring boot server

I am trying to expose a custom Gauge metric from my Spring Boot application, using Micrometer with the Prometheus registry. I have set up the PrometheusRegistry and configs as per Micrometer Samples - Github, but it creates one more HTTP server for exposing the Prometheus metrics. I need to redirect or expose all the metrics to Spring Boot's default context path - /actuator/prometheus - instead of a new context path on a new port. I have implemented the following code so far -
PrometheusRegistry.java -
package com.xyz.abc.prometheus;

import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.time.Duration;

import com.sun.net.httpserver.HttpServer;

import io.micrometer.core.lang.Nullable;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;

public class PrometheusRegistry {

  public static PrometheusMeterRegistry prometheus() {
    PrometheusMeterRegistry prometheusRegistry = new PrometheusMeterRegistry(new PrometheusConfig() {
      @Override
      public Duration step() {
        return Duration.ofSeconds(10);
      }

      @Override
      @Nullable
      public String get(String k) {
        return null;
      }
    });
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
      server.createContext("/sample-data/prometheus", httpExchange -> {
        String response = prometheusRegistry.scrape();
        httpExchange.sendResponseHeaders(200, response.length());
        OutputStream os = httpExchange.getResponseBody();
        os.write(response.getBytes());
        os.close();
      });
      new Thread(server::start).run();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return prometheusRegistry;
  }
}
MicrometerConfig.java -
package com.xyz.abc.prometheus;

import io.micrometer.core.instrument.MeterRegistry;

public class MicrometerConfig {

  public static MeterRegistry carMonitoringSystem() {
    // Pick a monitoring system here to use in your samples.
    return PrometheusRegistry.prometheus();
  }
}
Here is the code snippet where I am creating a custom Gauge metric. As of now, it's a simple REST API to test (please read the comments in between):
@SuppressWarnings({ "unchecked", "rawtypes" })
@RequestMapping(value = "/sampleApi", method = RequestMethod.GET)
@ResponseBody
// This Timed annotation is working fine and this metric comes in /actuator/prometheus by default
@Timed(value = "car.healthcheck", description = "Time taken to return healthcheck")
public ResponseEntity healthCheck() {
  MeterRegistry registry = MicrometerConfig.carMonitoringSystem();
  AtomicLong n = new AtomicLong();
  // Starting from here, none of the Gauge metrics show up in /actuator/prometheus;
  // instead they go to /sample-data/prometheus on port 8081 as configured.
  registry.gauge("car.gauge.one", Tags.of("k", "v"), n);
  registry.gauge("car.gauge.two", Tags.of("k", "v1"), n, n2 -> n2.get() - 1);
  registry.gauge("car.help.gauge", 89);
  // This thing never works! This gauge metric never shows up under any configured URI.
  Gauge.builder("car.gauge.test", cpu)
      .description("car.device.cpu")
      .tags("customer", "demo")
      .register(registry);
  return new ResponseEntity("Car is working fine.", HttpStatus.OK);
}
I need all the metrics to show up under /actuator/prometheus instead of a new HTTP server getting created. I know that I am explicitly creating a new HTTP server, so the metrics are popping up there. Please let me know how to avoid creating a new HTTP server and redirect all the Prometheus metrics to the default path - /actuator/prometheus. Also, if I use Gauge.builder to define a custom gauge metric, it never works. Please explain how I can make that work as well. Let me know where I am going wrong.
Thank you.
Every time you call MicrometerConfig.carMonitoringSystem(), it creates a new Prometheus registry (and tries to start a new server).
You need to inject the MeterRegistry into the class that is creating the gauge and use that injected MeterRegistry instead.
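A minimal sketch of that approach (the class and endpoint names are hypothetical; it assumes the standard Spring Boot actuator/Micrometer auto-configuration, which already exposes /actuator/prometheus):

import java.util.concurrent.atomic.AtomicLong;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;

@RestController
public class CarHealthController {

  private final AtomicLong cpu = new AtomicLong();

  // Spring Boot auto-configures a PrometheusMeterRegistry; injecting it means
  // the gauge is registered exactly once, on the registry that backs
  // /actuator/prometheus.
  public CarHealthController(MeterRegistry registry) {
    Gauge.builder("car.gauge.test", cpu, AtomicLong::doubleValue)
        .description("car.device.cpu")
        .tags("customer", "demo")
        .register(registry);
  }

  @GetMapping("/sampleApi")
  public ResponseEntity<String> healthCheck() {
    cpu.incrementAndGet(); // update the state object; the gauge samples it on each scrape
    return new ResponseEntity<>("Car is working fine.", HttpStatus.OK);
  }
}

Registering the gauge once in the constructor also avoids re-registering it on every request, which was another problem with the original snippet.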

Elasticsearch Connector as Source in Flink

I used the Elasticsearch connector as a sink to insert data into Elasticsearch (see: https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/connectors/elasticsearch.html).
But I did not find any connector to get data from Elasticsearch as a source.
Is there any connector or example for using Elasticsearch documents as a source in a Flink pipeline?
Regards,
Ali
I don't know of an explicit ES source for Flink. I did see one user talking about using elasticsearch-hadoop as a HadoopInputFormat with Flink, but I don't know if that worked for them (see their code).
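That wiring would look roughly like this (a sketch only, untested here, assuming elasticsearch-hadoop's EsInputFormat together with Flink's Hadoop compatibility wrapper; the es.nodes address and index name are placeholders):

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class EsSourceSketch {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    JobConf conf = new JobConf();
    conf.set("es.nodes", "YOUR_IP:9200");       // HTTP endpoint, not the transport port
    conf.set("es.resource", "YOUR_INDEX_NAME"); // index (and optionally type) to read

    // elasticsearch-hadoop emits (document id, document fields) pairs.
    HadoopInputFormat<Text, MapWritable> inputFormat = new HadoopInputFormat<>(
        new EsInputFormat<Text, MapWritable>(), Text.class, MapWritable.class, conf);

    DataSet<Tuple2<Text, MapWritable>> documents = env.createInput(inputFormat);
    documents.print();
  }
}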
I finally defined a simple function that reads from Elasticsearch:
public static class ElasticsearchFunction
    extends ProcessFunction<MetricMeasurement, MetricPrediction> {

  private final TransportClient client;

  public ElasticsearchFunction() throws UnknownHostException {
    // Assumed settings; at minimum the cluster name (placeholder value).
    Settings settings = Settings.builder()
        .put("cluster.name", "YOUR_CLUSTER_NAME")
        .build();
    client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new TransportAddress(InetAddress.getByName("YOUR_IP"), PORT_NUMBER));
  }

  @Override
  public void processElement(MetricMeasurement in, Context context, Collector<MetricPrediction> out) throws Exception {
    MetricPrediction metricPrediction = new MetricPrediction();
    metricPrediction.setMetricId(in.getMetricId());
    metricPrediction.setGroupId(in.getGroupId());
    metricPrediction.setBucket(in.getBucket());

    // Get the metric measurement from Elasticsearch
    SearchResponse response = client.prepareSearch("YOUR_INDEX_NAME")
        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
        .setQuery(QueryBuilders.termQuery("YOUR_TERM", in.getMetricId())) // Query
        .setPostFilter(QueryBuilders.rangeQuery("value").from(0L).to(50L)) // Filter
        .setFrom(0).setSize(1).setExplain(true)
        .get();

    SearchHit[] results = response.getHits().getHits();
    for (SearchHit hit : results) {
      String sourceAsString = hit.getSourceAsString();
      if (sourceAsString != null) {
        ObjectMapper mapper = new ObjectMapper();
        MetricMeasurement obj = mapper.readValue(sourceAsString, MetricMeasurement.class);
        metricPrediction.setPredictionValue(obj.getValue());
      }
    }
    out.collect(metricPrediction);
  }
}
Hadoop Compatibility + Elasticsearch Hadoop
https://github.com/cclient/flink-connector-elasticsearch-source

Feign with RibbonClient and Consul discovery without Spring Cloud

I was trying to set up Feign to work with RibbonClient, something like MyService api = Feign.builder().client(RibbonClient.create()).target(MyService.class, "https://myAppProd");, where myAppProd is an application which I can see in Consul. Now, if I use Spring annotations for the Feign client (@FeignClient("myAppProd"), @RequestMapping), everything works, as the Spring Cloud module takes care of everything.
If I want to use Feign.builder() and @RequestLine, I get the error:
com.netflix.client.ClientException: Load balancer does not have available server for client: myAppProd.
My first thought was that Feign was built to work with Eureka and only Spring Cloud makes the integration with Consul, but I am unsure about this.
So, is there a way to make Feign work with Consul without Spring Cloud?
Thanks in advance.
In my opinion, it's not Feign working with Consul directly; it's Feign -> Ribbon -> Consul.
RibbonClient needs to find myAppProd's server list from its LoadBalancer.
Without a ServerList, you get the error 'does not have available server for client'.
This job is done by the SpringCloudConsul and SpringCloudRibbon projects; of course you can write another adaptor, it's just some glue code. IMHO, you can import these Spring dependencies into your project but use them in a non-Spring way. Demo code:
Just write a new feign.ribbon.LBClientFactory that generates an LBClient with ConsulServerList (Spring's class):
public class ConsulLBFactory implements LBClientFactory {

  private ConsulClient client;
  private ConsulDiscoveryProperties properties;

  public ConsulLBFactory(ConsulClient client, ConsulDiscoveryProperties consulDiscoveryProperties) {
    this.client = client;
    this.properties = consulDiscoveryProperties;
  }

  @Override
  public LBClient create(String clientName) {
    IClientConfig config =
        ClientFactory.getNamedConfig(clientName, DisableAutoRetriesByDefaultClientConfig.class);
    ConsulServerList consulServerList = new ConsulServerList(this.client, properties);
    consulServerList.initWithNiwsConfig(config);
    ZoneAwareLoadBalancer<ConsulServer> lb = new ZoneAwareLoadBalancer<>(config);
    lb.setServersList(consulServerList.getInitialListOfServers());
    lb.setServerListImpl(consulServerList);
    return LBClient.create(lb, config);
  }
}
And then use it in Feign:
public class Demo {

  public static void main(String[] args) {
    ConsulLBFactory consulLBFactory = new ConsulLBFactory(
        new ConsulClient(),
        new ConsulDiscoveryProperties(new InetUtils(new InetUtilsProperties()))
    );
    RibbonClient ribbonClient = RibbonClient.builder()
        .lbClientFactory(consulLBFactory)
        .build();
    GitHub github = Feign.builder()
        .client(ribbonClient)
        .decoder(new GsonDecoder())
        .target(GitHub.class, "https://api.github.com");
    List<Contributor> contributors = github.contributors("OpenFeign", "feign");
    for (Contributor contributor : contributors) {
      System.out.println(contributor.login + " (" + contributor.contributions + ")");
    }
  }

  interface GitHub {
    @RequestLine("GET /repos/{owner}/{repo}/contributors")
    List<Contributor> contributors(@Param("owner") String owner, @Param("repo") String repo);
  }

  public static class Contributor {
    String login;
    int contributions;
  }
}
You can find this demo code here; add api.github.com to your local Consul before running the demo.

How to balance Elasticsearch nodes using TransportClient Java code

Looking for an expert's help (I am a newbie with Elasticsearch)... I have multiple nodes of Elasticsearch.
I am using the Elasticsearch Java lib for indexing the JSON docs. I would like to know how to handle node balancing; is it possible to handle that from the client side?
Elasticsearch transport client code:
public static Client getTransportClient(String host, int port) {
  Settings settings = ImmutableSettings.settingsBuilder()
      .put("cluster.name", "ccw_cat_es")
      .put("node.name", "catsrch-pdv1-01")
      .build();
  return new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(host, port));
}

public static IndexResponse doIndex(Client client, String index, String type, String id, Map<String, Object> data) {
  return client
      .prepareIndex(index, type, id)
      .setSource(data)
      .execute()
      .actionGet();
}

public static void main(String[] args) {
  Client client = getTransportClient("catsrch-pdv1-01", 9200);
  String index = "orderstatussearch";
  String type = "osapi";
  String id = null;
  Map<String, Object> data = new HashMap<String, Object>();
  data.put("OrderNumber", "444");
  data.put("PO", "123");
  data.put("WID", "ab234");
  id = "444";
  IndexResponse result = doIndex(client, index, type, id, data);
}
The TransportClient will automatically use a round-robin strategy to load balance across the nodes that it is connected to. In your case, you are only connecting to one node, so there is nothing to balance. You can add other nodes to the list and it will balance them appropriately.
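For example, registering a second (hypothetical) node is enough for the client to round-robin between the two; note that the transport client normally talks to port 9300, not the HTTP port:

Client client = new TransportClient(settings)
    .addTransportAddress(new InetSocketTransportAddress("catsrch-pdv1-01", 9300))
    // hypothetical second node; index requests now alternate between both
    .addTransportAddress(new InetSocketTransportAddress("catsrch-pdv1-02", 9300));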
Alternatively, you can "sniff" out the data nodes automatically by just connecting to one of them with an extra setting applied:
Settings settings = ImmutableSettings.settingsBuilder()
// ...
.put("client.transport.sniff", true)
// ...
.build()
This will then round robin against all data nodes that it finds in the cluster state.
This probably leads to the question: why isn't this the default? The reason is that, if you have standalone client nodes, then they are better proxies to the cluster rather than directly communicating with data nodes. For smaller clusters, this is a perfectly acceptable strategy though.

How to purge/delete message from weblogic JMS queue

Is there any way (apart from consuming the messages) to purge/delete messages programmatically from a JMS queue? Even if it is only possible via the WLST command-line tool, it would be of much help.
Here is an example in WLST for a Managed Server running on port 7005:
connect('weblogic', 'weblogic', 't3://localhost:7005')
serverRuntime()
cd('/JMSRuntime/ManagedSrv1.jms/JMSServers/MyAppJMSServer/Destinations/MyAppJMSModule!QueueNameToClear')
cmo.deleteMessages('')
The last command should return the number of messages it deleted.
You can use JMX to purge the queue, either from Java or from WLST (Python). You can find the MBean definitions for WLS 10.0 on http://download.oracle.com/docs/cd/E11035_01/wls100/wlsmbeanref/core/index.html.
Here is a basic Java example (don't forget to put weblogic.jar in the CLASSPATH):
import java.util.Hashtable;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

import weblogic.management.mbeanservers.runtime.RuntimeServiceMBean;

public class PurgeWLSQueue {

  private static final String WLS_USERNAME = "weblogic";
  private static final String WLS_PASSWORD = "weblogic";
  private static final String WLS_HOST = "localhost";
  private static final int WLS_PORT = 7001;

  private static final String JMS_SERVER = "wlsbJMSServer";
  private static final String JMS_DESTINATION = "test.q";

  private static JMXConnector getMBeanServerConnector(String jndiName) throws Exception {
    Hashtable<String, String> h = new Hashtable<String, String>();
    JMXServiceURL serviceURL = new JMXServiceURL("t3", WLS_HOST, WLS_PORT, jndiName);
    h.put(Context.SECURITY_PRINCIPAL, WLS_USERNAME);
    h.put(Context.SECURITY_CREDENTIALS, WLS_PASSWORD);
    h.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
    JMXConnector connector = JMXConnectorFactory.connect(serviceURL, h);
    return connector;
  }

  public static void main(String[] args) {
    try {
      JMXConnector connector =
          getMBeanServerConnector("/jndi/" + RuntimeServiceMBean.MBEANSERVER_JNDI_NAME);
      MBeanServerConnection mbeanServerConnection =
          connector.getMBeanServerConnection();
      ObjectName service = new ObjectName("com.bea:Name=RuntimeService,Type=weblogic.management.mbeanservers.runtime.RuntimeServiceMBean");
      ObjectName serverRuntime = (ObjectName) mbeanServerConnection.getAttribute(service, "ServerRuntime");
      ObjectName jmsRuntime = (ObjectName) mbeanServerConnection.getAttribute(serverRuntime, "JMSRuntime");
      ObjectName[] jmsServers = (ObjectName[]) mbeanServerConnection.getAttribute(jmsRuntime, "JMSServers");
      for (ObjectName jmsServer : jmsServers) {
        if (JMS_SERVER.equals(jmsServer.getKeyProperty("Name"))) {
          ObjectName[] destinations = (ObjectName[]) mbeanServerConnection.getAttribute(jmsServer, "Destinations");
          for (ObjectName destination : destinations) {
            if (destination.getKeyProperty("Name").endsWith("!" + JMS_DESTINATION)) {
              Object o = mbeanServerConnection.invoke(
                  destination,
                  "deleteMessages",
                  new Object[] {""}, // selector expression; empty string matches all messages
                  new String[] {"java.lang.String"});
              System.out.println("Result: " + o);
              break;
            }
          }
          break;
        }
      }
      connector.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
Works great in a single-node environment, but what happens if you are in a clustered environment with ONE migratable JMSServer (currently on node #1) and this code is executing on node #2? Then there will be no JMSServer available and no message will be deleted.
This is the problem I'm facing right now...
Is there a way to connect to the JMSQueue without having the JMSServer available?
[edit]
Found a solution: Use the domain runtime service instead:
ObjectName service = new ObjectName("com.bea:Name=DomainRuntimeService,Type=weblogic.management.mbeanservers.domainruntime.DomainRuntimeServiceMBean");
and be sure to connect to the admin port of the WLS cluster.
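Applied to the Java example above, that would mean connecting to the Domain Runtime MBean server rather than the per-server Runtime one; a sketch (WLS_HOST/WLS_PORT must point at the admin server):

import weblogic.management.mbeanservers.domainruntime.DomainRuntimeServiceMBean;

// Connect to the Domain Runtime MBean server, which lives on the admin
// server and can see JMS servers on every node of the cluster.
JMXConnector connector =
    getMBeanServerConnector("/jndi/" + DomainRuntimeServiceMBean.MBEANSERVER_JNDI_NAME);
ObjectName service = new ObjectName(
    "com.bea:Name=DomainRuntimeService,Type=weblogic.management.mbeanservers.domainruntime.DomainRuntimeServiceMBean");
// From here, navigate via the "ServerRuntimes" attribute (an array) instead
// of the single "ServerRuntime" used in the per-server example above.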
If this is a one-time task, the easiest way would be to do it through the console...
The program at the link below helps you clear only pending messages from the queue, based on the redelivered message parameter:
http://techytalks.blogspot.in/2016/02/deletepurge-pending-messages-from-jms.html
