JsonQueryElasticSearch Processor in Nifi


I am working with the JsonQueryElasticsearch processor in NiFi (v1.9.2).
The query string is as follows:
{
"query": {
"bool": {
"must": [
{ "match": { "event": "New" }},
{ "match": { "uniqueId": "${unique_id}"}},
{ "match": { "header.schemaVersion": "1.3" }}
]
}
},
"sort" : {
"header.sourceSystemCreationTimestamp" : {"order" : "desc"}
}
}
It returns no results because the value of the ${unique_id} flowfile attribute is blank inside the query. If I hard-code the value in the query, it works as expected. At the processor level, I can see a value for the ${unique_id} attribute.
Thanks very much for your time and help.

(I'm the developer who wrote this processor)
I tried to duplicate the issue by doing the following:
Creating an index with several test documents.
Using GenerateFlowFile -> JsonQueryElasticsearch.
Putting this simple query in the query parameter of JsonQueryElasticsearch:
{
"query": {
"match": {
"from": "${sender}"
}
},
"aggs": {
"senders": {
"terms": {
"field": "from",
"size": 10
}
}
}
}
All of the expected results were returned. If you are attempting to pass the query in via the flowfile content, you cannot use Expression Language (${unique_id}). That's expected behavior because Expression Language is not evaluated on the contents of flowfiles, only on configuration properties.
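To make the distinction concrete, here is a toy Python sketch of attribute substitution. It is not NiFi's implementation: `render_property` and the attribute value are hypothetical and only mimic how a `${name}` reference in a processor property is replaced with a flowfile attribute, while the same text arriving as flowfile content is passed through untouched.

```python
import json
import re

# Toy stand-in for Expression Language: replaces ${name} with an attribute value.
# Real NiFi Expression Language is far richer; this only illustrates the idea.
_ATTR_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def render_property(template: str, attributes: dict) -> str:
    """Substitute ${name} references; unknown names become empty strings."""
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


QUERY_TEMPLATE = '{"query": {"match": {"uniqueId": "${unique_id}"}}}'
attributes = {"unique_id": "abc-123"}  # hypothetical flowfile attribute

# Query configured as a processor property: the reference is resolved.
as_property = render_property(QUERY_TEMPLATE, attributes)

# Query supplied as flowfile content: sent as-is, so the literal text
# "${unique_id}" is what the search would look for.
as_content = QUERY_TEMPLATE

print(json.loads(as_property)["query"]["match"]["uniqueId"])
print(json.loads(as_content)["query"]["match"]["uniqueId"])
```

If the second line of output is what reaches Elasticsearch, an empty result set is the expected outcome.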

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters out results containing a specific word in a value. The query searches user history records, each of which contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
How can the parameters of the search be changed to filter out this result?

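You can use the filter clause of a bool query in Elasticsearch.
Note that filter keeps only the documents that match, and a term query on a keyword field matches the exact value only. If your url field is of type keyword, the query below therefore selects records whose url is exactly /history; to exclude records instead, the clause belongs under must_not.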
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": {
"term": {
"url": "/history"
}
}
}
}
}
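Since the question asks to exclude any URL containing /history, one possible variant moves the condition into must_not and uses a wildcard query instead of an exact term. The Python sketch below only builds the request body; the helper name `build_exclusion_query` is hypothetical, and it assumes url is mapped as keyword. A leading wildcard can be slow on large indices, so treat this as an illustration rather than a recommendation.

```python
import json


def build_exclusion_query(user_id: int, fragment: str, size: int = 50) -> dict:
    """Match a user's records, excluding URLs that contain `fragment`."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {"match": {"user_id": user_id}},
                # must_not drops every document that matches the clause.
                "must_not": {"wildcard": {"url": f"*{fragment}*"}},
            }
        },
    }


body = build_exclusion_query(56678, "/history")
print(json.dumps(body, indent=2))
```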

I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.

Spring Data Elasticsearch - Is Inner Hit supported at root level on query?

I'm trying to use ElasticsearchOperations to write a query like this one:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "dialogLines",
"query": {
"match_phrase": {
"dialogLines.text": {
"query": "No problem"
}
}
}
}
}
]
}
},
"collapse": {
"field": "movieId",
"inner_hits" : {
"name": "by_movie",
"collapse" : {"field" : "boundaryGroup"},
"sort": [{ "boundaryStartInMillis": "desc" }],
"size": 6
}
}
}
I know NestedQuery supports innerHit(InnerHitBuilder innerHitBuilder), but the root level query of type NativeSearchQuery doesn't have that method.
Is there a way to write this query using ElasticsearchOperations?
I'm on version org.springframework.data:spring-data-elasticsearch:3.2.9.RELEASE
Running Elasticsearch v6.8.2

When you create a query using NativeSearchQuery, you normally use
Query query = new NativeSearchQueryBuilder()
.withQuery(queryBuilder)
.build();
The queryBuilder argument is any implementation of org.elasticsearch.index.query.QueryBuilder, so you can use any of the Elasticsearch query builders that you like.
Edit 29.09.2020:
NativeSearchQuery supports a collapse field through the NativeSearchQueryBuilder.withCollapseField(String) method, but there is no support for the inner hits of a collapse field yet.
I created https://jira.spring.io/browse/DATAES-939 to address that.

Elasticsearch: Search in an array of JSONs

I'm using Elasticsearch with the Python library and I have a problem with the search query when the object becomes a little complex. I have objects built like this in my index:
{
"id" : 120,
"name": "bob",
"shared_status": {
"post_id": 123456789,
"text": "This is a sample",
"urls" : [
{
"url": "http://test.1.com",
"displayed_url": "test.1.com"
},
{
"url": "http://blabla.com",
"displayed_url": "blabla.com"
}
]
}
}
Now I want a query that returns this document only if one of the displayed URLs contains the substring "test" and the main document has a "text" field. So I wrote this query:
{
"query": {
"bool": {
"must": [
{"exists": {"field": "text"}}
]
}
}
}
But I don't know what to add to the query for the part: one of the displayed URLs contains the substring "test".
Is that possible? How does iteration over the list work?

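If you didn't define an explicit mapping for your schema, Elasticsearch creates a default mapping based on the input data:
urls will be of type object
displayed_url will be of type string, analyzed with the standard analyzer
Since you don't need any association between url and displayed_url, the current schema will work fine.
You can use a match query for a full-text match. Elasticsearch treats an array as a multi-valued field, so the clause matches if any element matches; no explicit iteration is needed. Keep in mind that match works on the tokens produced by the analyzer, not on arbitrary substrings. The query below uses the short field names from the question; for the sample document above, where both fields sit under shared_status, the full paths would be shared_status.text and shared_status.urls.displayed_url.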
GET _search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "text"
}
},
{
"match": {
"urls.displayed_url": "test"
}
}
]
}
}
}
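Because the question mentions the Python client, here is a sketch that assembles an equivalent body as a dict. It assumes the layout of the sample document, so the fields are addressed by their full paths under shared_status; the helper name and the index name in the trailing comment are hypothetical. The client call is left as a comment because it needs a running cluster and its exact form depends on the client version.

```python
import json


def build_displayed_url_query(term: str) -> dict:
    """Require shared_status.text to exist and a displayed_url to match `term`."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": "shared_status.text"}},
                    # Arrays need no special syntax: the clause matches if
                    # any element of shared_status.urls satisfies it.
                    {"match": {"shared_status.urls.displayed_url": term}},
                ]
            }
        }
    }


body = build_displayed_url_query("test")
print(json.dumps(body, indent=2))
# With the official client this body would be sent roughly as:
#   Elasticsearch().search(index="my-index", body=body)  # hypothetical index
```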

How to check field data is numeric when using inline Script in ElasticSearch

Per our requirement, we need to find the maximum ID among the existing documents before adding a new one. The problem is that the field may also contain string data, so I have to use an inline script in the Elasticsearch query to compute the max ID only for documents that hold integer data, returning 0 otherwise. I am using the following inline script query to find the max key, but it is not working. Can you help me with this?
{
"size":0,
"query":
{"bool":
{"filter":[
{"term":
{"Name":
{
"value":"Test2"
}
}}
]
}},
"aggs":{
"MaxId":{
"max":{
"field":"Key","script":{
"inline":"((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"}}
}
}
}

The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (such as Key) in a max aggregation.
Simply remove the "field":"Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
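As a sanity check on what the aggregation is meant to compute, the same rule can be mirrored client-side. This is only a sketch of the logic (numeric keys count with their value, everything else counts as 0), not the script Elasticsearch runs; the function names and sample values are made up for illustration.

```python
def numeric_or_zero(key: str) -> int:
    """Return the key as an int when it consists of ASCII digits, otherwise 0."""
    return int(key) if key.isascii() and key.isdigit() else 0


def max_numeric_key(keys: list) -> int:
    """Largest numeric key, or 0 when no key is numeric or the list is empty."""
    return max((numeric_or_zero(k) for k in keys), default=0)


keys = ["12", "abc", "7", "x9"]  # made-up sample values
print(max_numeric_key(keys))  # 12
```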

Elasticsearch terms query on array of values

I have data on ElasticSearch index that looks like this
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose name starts with
"ling" (case-insensitively) and get the distinct terms in their original casing, i.e. "Ling Deponte" rather than "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed

Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You want to avoid wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to walk it entirely to find the actual start.
Combining the two should work pretty well though. Here, I use the query to whittle down the results to what you are interested in, then I use your inner aggregation to include only terms based on the value.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
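The include pattern is still needed because the terms aggregation sees every name in the matching documents, not just the one that matched the query. A quick way to see which terms the pattern keeps is to try it locally. The sketch below uses Python's `re` module as a stand-in, on the assumption that this particular pattern behaves the same in Lucene's regular expression syntax; the last two names are invented to show the edge cases.

```python
import re

# Same pattern as the "include" clause. Lucene regular expressions are
# anchored to the whole term, which re.fullmatch mirrors here.
INCLUDE = re.compile(r"(.* )?[lL][iI][nN][gG].*")

people = [
    "Ling Deponte",
    "Dana Madin",
    "Shameka Woodard",
    "Bennie Craddock",
    "Sandie Bakker",
    "Mei Lingard",    # invented: "ling" starts a later word, so it is kept
    "Darling Smith",  # invented: "ling" is mid-word, so it is dropped
]

kept = [name for name in people if INCLUDE.fullmatch(name)]
print(kept)  # ['Ling Deponte', 'Mei Lingard']
```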

Hi, please find the query below; it may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention is to find documents starting with "jav".
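One caveat: the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0, so the request above only works on older clusters. On newer versions the same intent can be expressed as a bool query with a filter clause, and a prefix query avoids the wildcard altogether. The Python sketch below just builds such a body; the helper name is hypothetical and the field name is taken from the query above.

```python
import json


def build_prefix_filter(field: str, prefix: str) -> dict:
    """Non-scoring filter for terms of `field` that start with `prefix`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"prefix": {field: prefix}}
                ]
            }
        }
    }


body = build_prefix_filter("skillNames.raw", "jav")
print(json.dumps(body, indent=2))
```

On a not_analyzed (keyword) field the prefix is compared as stored, so "jav" would not match a term that starts with "Jav".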
