How do you add routing for the Elasticsearch destination in StreamSets 2.5?

I'm using StreamSets (2.5.1.1) to pipe data to Elasticsearch (5.4.1). My index requires routing, but I don't see how to add routing to the Elasticsearch destination in my pipeline. I thought I could just add a "routing" HTTP parameter, but it needs to be dynamic, and StreamSets doesn't accept an EL expression that references my record (I tried something like ${record:value("/myRoutingId")} as the value).
What is the right way to add routing?

This feature is coming in SDC 2.7.0.0 under SDC-5244.

Related

Apache Camel Specify _id in Elasticsearch connector

I'm using the elastic rest component https://camel.apache.org/components/latest/elasticsearch-rest-component.html
I'm not able to specify the ID of a document in any way, and the documentation seems to be lacking on this point.
The code is more or less like this:
from(RAW_ROUTE)
    .process(new RawProcessor())
    .to("elasticsearch://local?operation=Index&indexName=raw&indexType=_doc");
The RawProcessor sets a Map as the body of the exchange object.
Many thanks in advance
The _id can be set with the indexId header.
See Message Operations:
Index: Adds content to an index and returns the content’s indexId in the body. You can set the indexId by setting the message header with the key "indexId"
You can find example usage in unit test ElasticsearchIndexTest#testIndexWithIDInHeader.

NiFi: RouteOnAttribute processor routes to the wrong processor

I am testing a flow in NiFi that checks for a specific value of a counter using the REST API. I am able to extract the correct value from the REST response. However, when I check the condition in the RouteOnAttribute processor, flow files that should match the condition are routed to the unmatched processor instead.
Attached are the flow and configuration.
I have already checked that my response is "1", but it is still routed to the unmatched branch.
Is there something wrong with the NiFi Expression Language I am using?
Jasim,
First, check that the counter attribute actually holds the value 1.
Then change the expression to something like ${counter:equals('1')} or ${counter:matches('1')} instead of using contains,
because contains is not suitable for your scenario.
Hope this helps.
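To illustrate why an exact comparison is the safer choice here, the following is a minimal Python analogy of the three checks. It is not NiFi code: the function names are hypothetical stand-ins that assume contains is a substring test, equals is an exact string comparison, and matches requires the regular expression to match the whole value.

```python
import re


def contains(value: str, needle: str) -> bool:
    """Substring test, standing in for ${counter:contains('1')}."""
    return needle in value


def equals(value: str, expected: str) -> bool:
    """Exact comparison, standing in for ${counter:equals('1')}."""
    return value == expected


def matches(value: str, pattern: str) -> bool:
    """Whole-value regex match, standing in for ${counter:matches('1')}."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    # Compare how each check treats counters that merely include the digit 1.
    for counter in ["1", "10", "21", "2"]:
        print(counter, contains(counter, "1"), equals(counter, "1"), matches(counter, "1"))
```

Under these assumptions, a counter of 10 or 21 passes the substring test but fails the two exact checks, which is why equals or matches expresses "the counter is exactly 1" more precisely.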

Why can't I search for a UUID in Elasticsearch using filters?

I have a Django app that uses an Elasticsearch database. I use elasticsearch-dsl, and all my filters and queries work. But I have a problem with one parameter, the UUID: I always get 0 results from this request in the shell:
from elasticsearch_dsl import Search

s = Search(index='my_index_name').filter('term', UUID='0deaa49b-15b6-4c10-acb7-d98df800e0df')
response = s.execute()
response
I also use django-rest-elasticsearch and have the same issue there: I get correct REST results with all my other filters, but not with the UUID request. Something like this works, but I need to use filtering:
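from elasticsearch_dsl import Q, Search

q = Q("multi_match", query="0deaa49b-15b6-4c10-acb7-d98df800e0df", fields=["UUID"])
s = Search(index='my_index_name').query(q)  # attach the query to the search before executing
response = s.execute()
response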
Maybe someone knows how to use the UUID in my REST request, because UUID=0deaa49b-15b6-4c10-acb7-d98df800e0df doesn't work.

How do I log all queries in embedded ElasticSearch?

I'm trying to debug an ElasticSearch query. I've enabled explain for the problematic query, and that is showing that the query is doing a product of intermediate scores where it should be doing a sum. (I'm creating the query request using elastic4s.)
The problem is I cannot see what the generated query actually is. I want to determine whether the bug is in elastic4s (generating the query request incorrectly), in my code, or in elasticsearch. So I've enabled logging for the embedded elasticsearch instance used in the tests using the following code:
ESLoggerFactory.setDefaultFactory(new Slf4jESLoggerFactory())
val settings = Settings.settingsBuilder
  .put("path.data", dataDirPath)
  .put("path.home", "/var/elastic/")
  .put("cluster.name", clusterName)
  .put("http.enabled", httpEnabled)
  .put("index.number_of_shards", 1)
  .put("index.number_of_replicas", 0)
  .put("discovery.zen.ping.multicast.enabled", false)
  .put("index.refresh_interval", "10ms")
  .put("script.engine.groovy.inline.search", true)
  .put("script.engine.groovy.inline.update", true)
  .put("script.engine.groovy.inline.mapping", true)
  .put("index.search.slowlog.threshold.query.debug", "0s")
  .put("index.search.slowlog.threshold.fetch.debug", "0s")
  .build
but I can't find any queries being logged in the log file configured in my logback.xml. Other log messages from elasticsearch are appearing there, just not the actual queries.
You can't, at least not directly, and at least not in the ES versions currently available. This has been discussed at some length (e.g. https://github.com/elastic/elasticsearch/issues/9172 and https://github.com/elastic/elasticsearch/issues/12187), and it seems like it may change soon with the rewrite of the tasks API. In the meantime, you can use something like ES Restlog (https://github.com/etsy/es-restlog) and/or put nginx in front of ES and capture the queries in the nginx logs. You can also use tcpdump (e.g. tcpdump -vvv -x -X -i any port 9200) to capture the query as it arrives at the server. One last option is to modify your application to echo the query instead of executing it (and/or to insert the query into ES itself before you execute it, since the query itself is JSON).
In the specific case of elastic4s, it offers the ability to call .show on the elastic4s query object to generate what the JSON body part of the request would have been if the JSON-over-HTTP protocol had been used to send the request, for most types of request. This can then be logged at a convenient point in your code, e.g. if you have one method that generates all ES search queries. The code in Elasticsearch that generates the fake JSON could still have bugs of course, so it should not entirely be trusted. However, it's worth trying to reproduce the issue with the output of .show using Sense against a real Elasticsearch cluster over HTTP - if you can, you (a) know that it's not an elastic4s bug, and (b) can easily manipulate the JSON to try to figure out what's causing the problem.
show calls toString in some cases, so with the plain Elasticsearch API or another JVM-based wrapper on top of it, you can call that to get the JSON string to log.
With embedded Elasticsearch, this is as good as you're going to get in terms of logging - short of putting a breakpoint on the builder invocations and observing the actual Java Elasticsearch request objects that are created (which is the most accurate approach).
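The "log the query at one convenient point in your code" idea from the answers above can be sketched in a language-neutral way. The following Python sketch is only an illustration, not elastic4s or Elasticsearch client code: render_query, logged_search, the logger name, and the execute callable are all hypothetical, and the query is assumed to be available as a plain JSON-serializable dict.

```python
import json
import logging

# Hypothetical logger name; route it to whatever appender you already use.
logger = logging.getLogger("es.queries")


def render_query(body: dict) -> str:
    """Serialize the query body to the JSON that would be sent over HTTP."""
    return json.dumps(body, sort_keys=True, indent=2)


def logged_search(execute, index: str, body: dict):
    """Log the rendered query, then hand it to the real search callable.

    `execute` is an assumed callable taking (index, body); in a real
    application it would wrap the client's search method.
    """
    logger.debug("search index=%s body=%s", index, render_query(body))
    return execute(index, body)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Stand-in for a real client call so the sketch runs without a cluster.
    fake_execute = lambda index, body: {"index": index, "hits": []}
    logged_search(fake_execute, "my_index", {"query": {"match_all": {}}})
```

Funnelling every search through one such wrapper gives a single place to copy the JSON from and replay it against a real cluster over HTTP.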

ElasticSearch API POST/PUT DeDupe

I am looking to use the Elasticsearch RESTful API to send data to my ES instance. Here is some sample data:
[{"subject":"matt","predicate":"likes","object":"coffee","label":"1_10"}]
[{"subject":"james","predicate":"likes","object":"water","label":"1_10"}]
[{"subject":"leo","predicate":"likes","object":"liquor","label":"1_10"}]
[{"subject":"matt","predicate":"likes","object":"coffee","label":"1_10"}]
[{"subject":"matt","predicate":"likes","object":"coffee","label":"1_10"}]
My POST call looks like:
"http://" + url + "/something/quads/"
with the JSON payload.
I was looking at a PUT call and was trying the following:
"http://" + url + "/something/quads/_create"
from this documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/create-doc.html
The problem is that it created the document in ES with the ID _create. Is there something I am doing wrong?
If you use POST calls with URLs like /something/quads/ then ES will automatically generate IDs for your documents.
If, instead, you want to use PUT calls, you need to provide document IDs yourself in the URL /something/quads/123, /something/quads/456, etc.
In your second URL /something/quads/_create, you're missing the document ID. It should be /something/quads/123/_create. Check the docs you've linked to again and you'll see.
Also note that the difference between the two following commands
PUT /something/quads/123
PUT /something/quads/123/_create
is that the second will fail if a document with the ID 123 already exists. The first command, however, will always succeed and overwrite the document with ID 123 if one exists.
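The difference between the three calls can be modelled with a toy in-memory store. This is a minimal Python sketch for illustration only, not an Elasticsearch client: FakeIndex, DocumentExists, and the method names are hypothetical, and the auto-generated IDs are just random hex strings rather than the IDs ES would produce.

```python
import uuid


class DocumentExists(Exception):
    """Raised when a create is attempted for an ID that is already taken."""


class FakeIndex:
    """Toy model of /something/quads holding documents by ID."""

    def __init__(self):
        self.docs = {}

    def post(self, doc: dict) -> str:
        """POST /something/quads/ : the store generates a new ID every time."""
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = doc
        return doc_id

    def put(self, doc_id: str, doc: dict) -> None:
        """PUT /something/quads/<id> : always succeeds and overwrites."""
        self.docs[doc_id] = doc

    def put_create(self, doc_id: str, doc: dict) -> None:
        """PUT /something/quads/<id>/_create : fails if the ID already exists."""
        if doc_id in self.docs:
            raise DocumentExists(doc_id)
        self.docs[doc_id] = doc


if __name__ == "__main__":
    index = FakeIndex()
    doc = {"subject": "matt", "predicate": "likes", "object": "coffee", "label": "1_10"}
    index.post(doc)
    index.post(doc)               # same payload, second document with a new ID
    index.put_create("123", doc)  # first create for ID 123 succeeds
    try:
        index.put_create("123", doc)
    except DocumentExists:
        print("create rejected: ID 123 already exists")
```

In this model, posting the same payload twice stores two documents, while the second create for an existing ID is rejected, mirroring the behaviour described above.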
