Elasticsearch indexed search template generates empty strings in array - elasticsearch

First of all, this is taken from the documentation:
Passing an array of strings
GET /_search/template
{
"template": {
"query": {
"terms": {
"status": [
"{{#status}}",
"{{.}}",
"{{/status}}"
]
}
}
},
"params": {
"status": [ "pending", "published" ]
}
}
which is rendered as:
{
"query": {
"terms": {
"status": [ "pending", "published" ]
}
}
However, in my scenario I've created what I believe is exactly the same template, but it produces a slightly different output for me:
.."filter" : {
"bool" : {
"must" : [{
"terms" : {
"myTerms" : [
"{{#myTerms}}",
"{{.}}",
"{{/myTerms}}"
],
"_cache" : true
}
}
]
}
}..
That's how I call it later:
GET /passport/_search/template
{
"template": {
"id": "myTemplate"
},
"params": {
"myTerms": ["1", "2"]
}
}
However it's rendered as:
.."myTerms" : ["", "1", "2", ""]..
That wouldn't be an issue by itself, but myTerms are stored as integers and I'd like to keep it that way (though if converting to strings is the only solution, I can live with it). As it is, the query throws an exception because "" cannot be converted to an integer, which is expected behaviour:
NumberFormatException[For input string: \"\"]
How should I deal with that? I don't want to store my templates as files, I prefer them being indexed.
This SO question looked promising: Pass an array of integers to ElasticSeach template, but it isn't clear and the answer didn't solve my issue (I wasn't allowed to store my template like that).
Elasticsearch version used: 1.6.0
Please advise.

I've seen this requirement before and the solution looks hacky, but it works. Basically, the commas in the template are the problem: Mustache iterates over the array and, for each element, emits the element itself ({{.}}) plus any literal text you put between {{#myTerms}} and {{/myTerms}}, commas included.
Also, in your case you shouldn't put double quotes around {{.}}, because then each element ends up surrounded by double quotes. That's why you are seeing "1" in the result; but if you want to match numbers, the output should be a list of numbers, not strings.
So, first of all, get rid of the double quotes. This means storing the whole template as one double-quoted string and escaping any double quotes that should make it into the final result (you'll see what I mean in the example below).
Secondly, the hacky part is emitting the commas yourself while skipping the last one: 1,2,4 must not end with a trailing comma. The solution is to provide the parameters as a list of tuples, where one element of the tuple is the value itself and the other is a boolean: [{"value":1,"comma":true},{"value":2,"comma":true},{"value":4}]. If comma is true, Mustache emits the ,; otherwise (the last element in the array) it doesn't.
POST /_search/template/myTemplate
{"template":"{\"filter\":{\"bool\":{\"must\":[{\"terms\":{\"myTerms\":[{{#myTerms}}{{value}}{{#comma}},{{/comma}}{{/myTerms}}],\"_cache\":true}}]}}}"}
And this is how you should pass the parameters:
{
"template": {
"id": "myTemplate"
},
"params": {
"myTerms": [{"value":1,"comma":true},{"value":2,"comma":true},{"value":4}]
}
}
What this does is to generate something like this:
{
"filter": {
"bool": {
"must": [
{
"terms": {
"myTerms": [1,2,4],
"_cache": true
}
}
]
}
}
}
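If you build the request programmatically, the value/comma tuples can be generated from a plain list instead of being written by hand. A minimal Python sketch (to_comma_tuples is a hypothetical helper, not part of any Elasticsearch client):

```python
def to_comma_tuples(values):
    """Convert a plain list into the value/comma tuples the template expects.

    Every element except the last gets "comma": True so Mustache emits a
    separator after it; the last element omits the flag.
    """
    tuples = []
    for i, value in enumerate(values):
        item = {"value": value}
        if i < len(values) - 1:
            item["comma"] = True
        tuples.append(item)
    return tuples

params = {"myTerms": to_comma_tuples([1, 2, 4])}
# → {"myTerms": [{"value": 1, "comma": True}, {"value": 2, "comma": True}, {"value": 4}]}
```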

Try this out (using the toJson function):
GET /_search/template
{
"template": {
"query": {
"terms": {
"status": {{#toJson}}status{{/toJson}}
}
}
},
"params": {
"status": [ "pending", "published" ]
}
}
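toJson serializes the whole parameter as a single JSON value, so no comma bookkeeping is needed; conceptually it behaves like json.dumps. A Python sketch of what the placeholder expands to (note: toJson only exists in newer Elasticsearch versions, not in 1.6.x, so the tuple workaround above still applies there):

```python
import json

status = ["pending", "published"]
# Roughly what {{#toJson}}status{{/toJson}} substitutes into the template:
rendered = json.dumps(status)
print(rendered)  # → ["pending", "published"]
```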

Related

How to find if the value of the field is an array?

By accident I inserted some values into an index as an array with a single value, instead of inserting them as a single string.
For example:
Instead of inserting "This string" I inserted ["This string"]
I need to find the values that were inserted in the ["String"] form so that I can update them to the way they should be, the plain "String".
The index mapping for the field is keyword, and I can't seem to write a query that finds the values that are arrays.
I can't really delete the index and start over, since there is a lot of data in it.
Let's say the index has this mapping:
{
"mappings": {
"_doc": {
"properties": {
"url": {
"type": "keyword"
}
}
}
}
}
And i inserted two values
PUT <index>/_doc
{
"url": "google.com"
}
PUT <index>/_doc
{
"url": ["google.com"]
}
How can I find the documents that, like the second one, contain an array with a single value?
Note: this is with Elasticsearch version 7.13.1.
Elasticsearch doesn't have a dedicated array data type, so "string" and ["string"] are equivalent.
The following query will find both of your documents.
{
"query": {
"term": {
"url": "google.com"
}
}
}
So, to be fair, you don't need to do anything unless it matters to the application that will later consume the search results and actually expects a string instead of an array.
Try the script filter
GET <index>/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": "params._source.url instanceof List"
}
}
}
}
}
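If inline scripting is disabled, a client-side pass over the documents' _source works too. A minimal Python sketch (fetching via search/scroll omitted; has_array_field is a hypothetical helper):

```python
def has_array_field(source, field):
    """Return True when the field was indexed as a JSON array, e.g. ["google.com"]."""
    return isinstance(source.get(field), list)

# Sample _source documents mirroring the question:
docs = [
    {"url": "google.com"},
    {"url": ["google.com"]},
]
arrays = [d for d in docs if has_array_field(d, "url")]
# → [{"url": ["google.com"]}]
```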

How to change field value when grouping in elasticsearch, using regular expression to remove some information?

I have a string field which contains strings like:
"operation XYZ [1.1]"
"operation XYZ [16.1]"
"Operation ABC [22.3]"
"Operation ABC [12.34]"
When I group this set of information, it gives me four buckets, but I need to remove the trailing " [...]" and make Elasticsearch group on the operation itself; in that case, it must return only two buckets.
Reading the ElasticSearch documentation, I've found:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html and https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
With apparently the following syntax:
POST operations*/_search
{
"aggs": {
"operation": {
"filter": {
"regexp": {
"de_operation": {
"value": "REGULAR EXPRESSION"
}
}
}
}
}
}
I tried the following example:
POST operations*/_search
{
"aggs": {
"operation": {
"filter": {
"regexp": {
"de_operation": {
"value": "^(.*) \[.*\]$"
}
}
}, ...}
But the outcome is:
{
"error": {
"root_cause": [
{
"type": "json_parse_exception",
"reason": "Unrecognized character escape '[' (code 91)\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#2360ea24; line: 7, column: 40]"
}
],
"type": "json_parse_exception",
"reason": "Unrecognized character escape '[' (code 91)\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput#2360ea24; line: 7, column: 40]"
},
"status": 500
}
I know the error is in \[\], but the question is: is it possible to group data based on a regular-expression transformation?
Just to help others here, I've managed to find a solution with help from my friend Caio Paulluci.
When using the Painless language you can execute some Java code and use the functions it provides; in this case I converted the object to a string and then manipulated it using native string functions:
POST operations*/_search
{
"aggs": {
"operation": {
"terms": {
"script": """
String o = doc.de_operation.toString();
String t = o.substring(1, o.length() - 1);
int indexOf = t.indexOf(' [');
return indexOf < 0 ? t : t.substring(0, indexOf);
""",
"size": 200000,
"order": {
"_key": "desc"
}
}
}
}
}
For some reason, when converting the original value to a string, it gets enclosed in '[]'; I'll dig further to find out why and post the result here later.
Another point: I've not found any performance issues with this approach; it seems to add only a minor overhead to the whole operation.
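The substring logic is easy to verify outside Elasticsearch. A Python sketch of the same transformation (operation_key is a hypothetical name; the input mimics the bracket-wrapped doc-value string the script receives):

```python
def operation_key(raw):
    """Mirror the Painless script: strip the outer '[...]' that
    doc.de_operation.toString() adds, then cut the trailing ' [version]'
    suffix if present."""
    trimmed = raw[1:-1]          # drop the doc-value brackets
    idx = trimmed.find(" [")     # locate the version suffix
    return trimmed if idx < 0 else trimmed[:idx]

print(operation_key("[operation XYZ [1.1]]"))   # → operation XYZ
print(operation_key("[Operation ABC [22.3]]"))  # → Operation ABC
```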

Elasticsearch documents that only have terms intersecting a list of terms but no other terms

I have documents that have a list of labels:
{
"fields": {
"label": [
"foo",
"bar",
"baz"
],
"name": [
"Document One"
],
"description" : "A fine first document",
"id" : 1
}
},
{
"fields": {
"label": [
"foo",
"dog"
],
"name": [
"Document Two"
],
"description" : "A fine second document",
"id" : 2
}
}
I have a list of terms:
[ "foo", "bar", "qux", "zip", "baz"]
I want a query that will return documents that have labels in the list of terms - but no other terms.
So given the list above, the query would return Document One, but not Document Two (because it has the term dog, which is not in the list of terms).
I've tried doing a query using a not terms filter, like this:
POST /documents/_search?size=1000
{
"fields": [
"id",
"name",
"label"
],
"filter": {
"not": {
"filter" : {
"bool" : {
"must_not": {
"terms": {
"label": [
"foo",
"bar",
"qux",
"zip",
"baz"
]
}
}
}
}
}
}
}
But that didn't work.
How can I create a query that, given a list of terms, will match documents that only contain terms in the list, and no other terms? In other words, all documents should contain a list of labels that are a subset of the list of supplied terms.
I followed Rohit's suggestion, and implemented an Elasticsearch script filter. You will need to configure your Elasticsearch server to allow dynamic (inline) Groovy scripts.
Here's the code for the Groovy script filter:
def label_map = labels.collectEntries { entry -> [entry, 1] };
def count = 0;
for (def label : doc['label'].values) {
if (!label_map.containsKey(label)) {
return 0
} else {
count += 1
}
};
return count
To use it in an Elasticsearch query, you either need to escape all the newline characters, or place the script on one line like this:
def label_map = labels.collectEntries { entry -> [entry, 1] }; def count = 0; for (def label : doc['label'].values) { if (!label_map.containsKey(label)) { return 0 } else { count += 1 } }; return count
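For clarity, the script's logic can be sketched in Python (subset_score is a hypothetical name): a document passes only when every one of its labels is in the allowed list, and the non-zero return value is the label count.

```python
def subset_score(doc_labels, allowed):
    """Return 0 if any label is outside the allowed list, else the label count.

    Mirrors the Groovy filter: a non-zero return means the document passes.
    """
    allowed_set = set(allowed)
    count = 0
    for label in doc_labels:
        if label not in allowed_set:
            return 0
        count += 1
    return count

allowed = ["foo", "bar", "qux", "zip", "baz"]
print(subset_score(["foo", "bar", "baz"], allowed))  # → 3 (Document One passes)
print(subset_score(["foo", "dog"], allowed))         # → 0 (Document Two rejected)
```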
Here's an Elasticsearch query that's very similar to what I did, including the script filter:
POST /documents/_search
{
"fields": [
"id",
"name",
"label",
"description"
],
"query": {
"function_score": {
"query": {
"filtered": {
"query": {
"bool": {
"minimum_should_match": 1,
"should" : {
"term" : {
"description" : "fine"
}
}
}
},
"filter": {
"script": {
"script": "def label_map = labels.collectEntries { entry -> [entry, 1] }; def count = 0; for (def label : doc['label'].values) { if (!label_map.containsKey(label)) { return 0 } else { count += 1 } }; return count",
"lang": "groovy",
"params": {
"labels": [
"foo",
"bar",
"qux",
"zip",
"baz"
]
}
}
}
}
},
"functions": [
{
"filter": {
"query": {
"match": {
"label": "qux"
}
}
},
"boost_factor": 25
}
],
"score_mode": "multiply"
}
},
"size": 10
}
My actual query required combining the script filter with a function score query, which was hard to figure out how to do, so I'm including it here as an example.
What this does is use the script filter to select documents whose labels are a subset of the labels passed in the query. For my use case (thousands of documents, not millions) this works very quickly - tens of milliseconds.
The first time the script is used, it takes a long time (about 1000 ms), probably due to compilation and caching. But later invocations are 100 times faster.
A couple of notes:
I used the Sense console Chrome plugin to debug the Elasticsearch query. Much better than using curl on the command line! (Note that Sense is now part of Marvel, so you can also get it there.)
To implement the Groovy script, I first installed the Groovy language on my laptop, and wrote some unit tests, and implemented the script. Once I was sure that the script was working, I formatted it to fit on one line and put it into Sense.
You can use a script filter to check whether the terms array contains all the values of the label array in a document. I suggest you make a separate Groovy file (or plain JavaScript file), put it in config/scripts/folderToYourScript, and use it in your query as filter: { script: { script_file: file } }.
In the script file you can use a loop to check the requirement.

Querystring search on array elements in Elastic Search

I'm trying to learn elasticsearch with a simple example application, that lists quotations associated with people. The example mapping might look like:
{
"people" : {
"properties" : {
"name" : { "type" : "string"},
"quotations" : { "type" : "string" }
}
}
}
Some example data might look like:
{ "name" : "Mr A",
"quotations" : [ "quotation one, this and that and these"
, "quotation two, those and that"]
}
{ "name" : "Mr B",
"quotations" : [ "quotation three, this and that"
, "quotation four, those and these"]
}
I would like to be able to use the querystring api on individual quotations, and return the people who match. For instance, I might want to find people who have a quotation that contains (this AND these) - which should return "Mr A" but not "Mr B", and so on. How can I achieve this?
EDIT1:
Andrei's answer below seems to work, with data values now looking like:
{"name":"Mr A","quotations":[{"value" : "quotation one, this and that and these"}, {"value" : "quotation two, those and that"}]}
However, I can't seem to get a query_string query to work. The following produces no results:
{
"query": {
"nested": {
"path": "quotations",
"query": {
"query_string": {
"default_field": "quotations",
"query": "quotations.value:this AND these"
}
}
}
}
}
Is there a way to get a query_string query working with a nested object?
Edit2: Yes it is, see Andrei's answer.
For that requirement to be achieved, you need to look at nested objects, not to query a flattened list of values but individual values from that nested object. For example:
{
"mappings": {
"people": {
"properties": {
"name": {
"type": "string"
},
"quotations": {
"type": "nested",
"properties": {
"value": {
"type": "string"
}
}
}
}
}
}
}
Values:
{"name":"Mr A","quotations":[{"value": "quotation one, this and that and these"}, {"value": "quotation two, those and that"}]}
{"name":"Mr B","quotations":[{"value": "quotation three, this and that"}, {"value": "quotation four, those and these"}]}
Query:
{
"query": {
"nested": {
"path": "quotations",
"query": {
"bool": {
"must": [
{ "match": {"quotations.value": "this"}},
{ "match": {"quotations.value": "these"}}
]
}
}
}
}
}
Unfortunately there is no good way to do that.
https://web.archive.org/web/20141021073225/http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/complex-core-fields.html
When you get a document back from Elasticsearch, any arrays will be in
the same order as when you indexed the document. The _source field
that you get back contains exactly the same JSON document that you
indexed.
However, arrays are indexed — made searchable — as multi-value fields,
which are unordered. At search time you can’t refer to “the first
element” or “the last element”. Rather think of an array as a bag of
values.
In other words, it is always considering all values in the array.
This will return only Mr A
{
"query": {
"match": {
"quotations": {
"query": "quotation one",
"operator": "AND"
}
}
}
}
But this will return both Mr A & Mr B:
{
"query": {
"match": {
"quotations": {
"query": "this these",
"operator": "AND"
}
}
}
}
If scripting is enabled, this should work:
"script": {
"inline": "for (element in _source.quotations) { if (element.contains('this') && element.contains('these')) { return true; } }; return false;"
}
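For reference, the per-quotation requirement that the nested approach enforces can be sketched client-side in Python (matches_single_quotation is a hypothetical helper; data from the question):

```python
def matches_single_quotation(quotations, words):
    """True when at least one quotation contains every word on its own."""
    return any(all(w in q.split() for w in words) for q in quotations)

mr_a = ["quotation one, this and that and these", "quotation two, those and that"]
mr_b = ["quotation three, this and that", "quotation four, those and these"]

print(matches_single_quotation(mr_a, ["this", "these"]))  # → True
print(matches_single_quotation(mr_b, ["this", "these"]))  # → False
```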

ElasticSearch filter by array item

I have the following record in ES:
"authInput" : {
"uID" : "foo",
"userName" : "asdfasdfasdfasdf",
"userType" : "External",
"clientType" : "Unknown",
"authType" : "Redemption_regular",
"uIDExtensionFields" :
[
{
"key" : "IsAccountCreation",
"value" : "true"
}
],
"externalReferences" : []
}
"uIDExtensionFields" is an array of key/value pairs. I want to query ES to find all records where:
"uIDExtensionFields.key" = "IsAccountCreation"
AND "uIDExtensionFields.value" = "true"
This is the filter that I think I should be using but it never returns any data.
GET devdev/authEvent/_search
{
"size": 10,
"filter": {
"and": {
"filters": [
{
"term": {
"authInput.uIDExtensionFields.key" : "IsAccountCreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
Any help you guys could give me would be much appreciated.
Cheers!
UPDATE: with the help of the responses below, here is how I solved my problem:
lowercased the value I was searching for (changed "IsAccountCreation" to "isaccountcreation")
updated the mapping so that "uIDExtensionFields" is a nested type
updated my filter to the following:
GET devhilden/authEvent/_search
{
"size": 10,
"filter": {
"nested": {
"path": "authInput.uIDExtensionFields",
"query": {
"bool": {
"must": [
{
"term": {
"authInput.uIDExtensionFields.key": "isaccountcreation"
}
},
{
"term": {
"authInput.uIDExtensionFields.value": "true"
}
}
]
}
}
}
}
}
There are a few things probably going wrong here.
First, as mconlin points out, you probably have a mapping with the standard analyzer for your key field. It'll lowercase the key. You probably want to specify "index": "not_analyzed" for the field.
Secondly, you'll have to use nested mappings for this document structure and specify the key and the value in a nested filter. That's because otherwise, you'll get a match for the following document:
"uIDExtensionFields" : [
{
"key" : "IsAccountCreation",
"value" : "false"
},
{
"key" : "SomeOtherField",
"value" : "true"
}
]
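To see why that document matches without nested mappings: flattening turns the array of pairs into two independent multi-value fields, so the key from one pair can combine with the value from another. A Python sketch of the difference:

```python
fields = [
    {"key": "IsAccountCreation", "value": "false"},
    {"key": "SomeOtherField", "value": "true"},
]

# Flattened view (what a non-nested mapping indexes): pairing is lost.
flat = {
    "key": [f["key"] for f in fields],
    "value": [f["value"] for f in fields],
}
flat_match = "IsAccountCreation" in flat["key"] and "true" in flat["value"]

# Nested view: both conditions must hold on the same pair.
nested_match = any(
    f["key"] == "IsAccountCreation" and f["value"] == "true" for f in fields
)

print(flat_match)    # → True (false positive)
print(nested_match)  # → False
```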
Thirdly, you'll want to use the bool filter's must rather than the and filter, to ensure proper cacheability.
Lastly, you'll want to put your filter inside a filtered query. The top-level filter is for when you want hits to be filtered but facets/aggregations not to be; that's why it was renamed to post_filter in 1.0.
Here's a few resources you'll want to check out:
Troubleshooting Elasticsearch searches, for Beginners covers the first two issues.
Managing Relations in ElasticSearch covers nested docs (and parent/child)
all about elasticsearch filter bitsets covers and vs. bool.
