Elasticsearch query on array index - elasticsearch

How do I query/filter by index of an array in elasticsearch?
I have a document like this:
PUT /edi832/record/1
{
  "LIN": [ "UP", "123456789" ]
}
I want to search if LIN[0] is "UP" and LIN[1] exists.
Thanks.

This might look like a hack, but it will work. First we apply the token_count type via a multi-field, to capture the number of tokens as a separate field. So the mapping will look like this:
{
  "record" : {
    "properties" : {
      "LIN" : {
        "type" : "string",
        "fields" : {
          "word_count": {
            "type" : "token_count",
            "store" : "yes",
            "analyzer" : "standard"
          }
        }
      }
    }
  }
}
LINK - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count
So to check if the second element exists, it's as easy as checking whether this field's value is greater than or equal to 2. Next we can use a script filter to check whether the token "up" occurs at position 0. Hence a query like the one below should work:
{
  "query": {
    "filtered": {
      "query": {
        "range": {
          "LIN.word_count": {
            "gte": 2
          }
        }
      },
      "filter": {
        "script": {
          "script": "for(pos : _index['LIN'].get('up',_POSITIONS)){ if(pos.position == 0) { return true}};return false;"
        }
      }
    }
  }
}
Advanced scripting - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html
Script filters - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html

Related

Can only use wildcard queries on keyword, text and wildcard fields - not on [id] which is of type [long]

Elasticsearch version 7.13.1
GET test/_mapping
{
  "test" : {
    "mappings" : {
      "properties" : {
        "id" : {
          "type" : "long"
        },
        "name" : {
          "type" : "text"
        }
      }
    }
  }
}
POST test/_doc/101
{
  "id": 101,
  "name": "hello"
}
POST test/_doc/102
{
  "id": 102,
  "name": "hi"
}
Wildcard Search pattern
GET test/_search
{
  "query": {
    "query_string": {
      "query": "*101* *hello*",
      "default_operator": "AND",
      "fields": [
        "id",
        "name"
      ]
    }
  }
}
The error is: "reason" : "Can only use wildcard queries on keyword, text and wildcard fields - not on [id] which is of type [long]"
It was working fine in version 7.6.0.
What changed in the latest ES, and what is the resolution of this issue?
It's not directly possible to perform wildcards on numeric data types. It is better to convert those integers to strings.
You need to modify your index mapping so the field is a string type, e.g.:
PUT /my-index
{
  "mappings": {
    "properties": {
      "code": {
        "type": "text"
      }
    }
  }
}
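Alternatively, if reindexing is not an option, the query_string query accepts a lenient parameter that ignores format-based failures such as running a wildcard against a numeric field. A sketch against the test index above:

```json
GET test/_search
{
  "query": {
    "query_string": {
      "query": "*101* *hello*",
      "default_operator": "AND",
      "lenient": true,
      "fields": [
        "id",
        "name"
      ]
    }
  }
}
```

Note that with lenient the numeric field simply stops participating, so a query that relied on matching the id may return fewer (or no) hits; it suppresses the error rather than making wildcards work on longs.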
Otherwise, if you want to perform a partial search you can use edge n-gram tokenizer
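A minimal sketch of such an index (the index, analyzer, and tokenizer names here are made up for illustration): the edge_ngram tokenizer indexes prefixes of 2-10 characters so that a plain match on 101 or hel finds the documents without wildcards.

```json
PUT test_ngram
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "prefix_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "prefix_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text",
        "analyzer": "prefix_analyzer",
        "search_analyzer": "standard"
      },
      "name": {
        "type": "text",
        "analyzer": "prefix_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
```

Setting search_analyzer to standard keeps the query side from being n-grammed as well, which would otherwise match far too loosely.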

Elasticsearch query all documents where keyword value is greater than X [7.2]

I am trying to find all documents that have a name that is over 32 characters in length.
This is the mapping of the document.
export const boards = {
  handle: {
    type: "text"
  },
  name: {
    type: "keyword"
  },
};
I tried to use Painless to query the size of the field, but the following query did not return any results despite the fact that there are matching documents.
Query
GET /_search
{
  "query": {
    "bool" : {
      "filter" : {
        "script" : {
          "script" : {
            "source": "doc['name'].size() > 32",
            "lang": "painless"
          }
        }
      }
    }
  }
}
I am thinking that it is perhaps related to the keyword type being used.
I figured out from this forum post that size is not the correct method when using keyword. Instead you need to use .value.length(), making my final query look like the following:
{
  "query": {
    "bool" : {
      "filter" : {
        "script" : {
          "script" : {
            "source": "doc['name'].value.length() > 32",
            "lang": "painless"
          }
        }
      }
    }
  }
}
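One caveat worth guarding against: if some documents have no name at all, doc['name'].value throws at query time, so checking size() first keeps the script safe. A sketch with that guard added:

```json
GET /_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc['name'].size() != 0 && doc['name'].value.length() > 32",
            "lang": "painless"
          }
        }
      }
    }
  }
}
```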

ElasticSearch filtering for a tag in array

I've got a bunch of events that are tagged for their audience:
{ id = 123, audiences = ["Public", "Lecture"], ... }
I'm trying to do an ElasticSearch query with filtering, so that the search will only return events that have an exact entry of "Public" in that audiences array (and won't return events that are "Not Public").
How do I do that?
This is what I have so far, but it's returning zero results, even though I definitely have "Public" events:
curl -XGET 'http://localhost:9200/events/event/_search' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "audiences": "Public"
        }
      },
      "query" : {
        "match" : {
          "title" : "[searchterm]"
        }
      }
    }
  }
}'
You could use this mapping for your content type:
{
  "your_index": {
    "mappings": {
      "your_type": {
        "properties": {
          "audiences": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
not_analyzed: index this field, so it is searchable, but index the value exactly as specified; do not analyze it.
With not_analyzed, the term filter then matches the exact stored value ("Public"). If you leave the field analyzed instead (the default standard analyzer lowercases tokens), use a lowercase term value ("public") in the search query.
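On Elasticsearch 5.x and later, string/not_analyzed was replaced by the keyword type. A sketch of the equivalent mapping and query, assuming a typeless (7.x-style) index and the field names from the question:

```json
PUT /events
{
  "mappings": {
    "properties": {
      "audiences": { "type": "keyword" }
    }
  }
}

GET /events/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "[searchterm]" } },
      "filter": { "term": { "audiences": "Public" } }
    }
  }
}
```

Putting the term clause in filter context also lets Elasticsearch cache it and skip scoring it.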

elasticsearch: can I defined synonyms with boost?

Let's say A, B, and C are synonyms. I want to define that B is "closer" to A than C, so that when I search for the keyword A, A comes first in the results, B second, and C last.
Any help?
There is no search-time mechanism (as of yet) to differentiate between matches on synonyms and source field. This is because, when indexed, a field's synonyms are placed into the inverted index alongside the original term, leaving all words equal.
This is not to say however that you cannot do some magic at index time to glean the information you want.
Create an index with two analyzers: one with a synonym filter, and one without.
PUT /synonym_test/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "no_synonyms": {
          "tokenizer": "lowercase"
        },
        "synonyms": {
          "tokenizer": "lowercase",
          "filter": ["synonym"]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "format": "wordnet",
          "synonyms_path": "prolog/wn_s.pl"
        }
      }
    }
  }
}
Use a multi-field mapping so that the field of interest is indexed twice:
PUT /synonym_test/mytype/_mapping
{
  "properties": {
    "mood": {
      "type": "multi_field",
      "fields": {
        "syn": { "type": "string", "analyzer": "synonyms" },
        "no_syn": { "type": "string", "analyzer": "no_synonyms" }
      }
    }
  }
}
Index a test document:
POST /synonym_test/mytype/1
{
  "mood": "elated"
}
At search time, boost the score of hits on the field with no synonyms.
GET /synonym_test/mytype/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "mood.syn": { "query": "gleeful", "boost": 3 } } },
        { "match": { "mood.no_syn": "gleeful" } }
      ]
    }
  }
}
Results in "_score": 0.2696457.
Searching for the original term returns a better score:
GET /synonym_test/mytype/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "mood.syn": { "query": "elated", "boost": 3 } } },
        { "match": { "mood.no_syn": "elated" } }
      ]
    }
  }
}
Results in "_score": 0.6558018.

elasticsearch filtering by the size of a field that is an array

How can I filter documents that have a field which is an array and has more than N elements?
How can I filter documents that have a field which is an empty array?
Is facets the solution? If so, how?
I would have a look at the script filter. The following filter should return only the documents that have at least 10 elements in the fieldname field, which is an array. Keep in mind that this could be expensive depending on how many documents you have in your index.
"filter" : {
  "script" : {
    "script" : "doc['fieldname'].values.length > 10"
  }
}
Regarding the second question: do you really have an empty array there? Or is it just an array field with no value? You can use the missing filter to get documents which have no value for a specific field:
"filter" : {
"missing" : { "field" : "user" }
}
Otherwise I guess you need to use scripting again, similarly to what I suggested above, just with a different length as input. If the length is constant, I'd put it in the params section so that the script will be cached by elasticsearch and reused, since it's always the same:
"filter" : {
  "script" : {
    "script" : "doc['fieldname'].values.length > param1",
    "params" : {
      "param1" : 10
    }
  }
}
javanna's answer is correct for Elasticsearch 1.3.x and earlier; since 1.4 the default scripting language has changed to Groovy (it was MVEL).
To answer the OP's question:
On Elasticsearch 1.3.x and earlier, use this code:
"filter" : {
  "script" : {
    "script" : "doc['fieldname'].values.length > 10"
  }
}
On Elasticsearch 1.4.x and later, use this code:
"filter" : {
  "script" : {
    "script" : "doc['fieldname'].values.size() > 10"
  }
}
Additionally, on Elasticsearch 1.4.3 and later, you will need to enable dynamic scripting, as it has been disabled by default because of a security issue. See: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-scripting.html
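A minimal sketch of the relevant elasticsearch.yml setting on the 1.4.x line (verify against the docs linked above before relying on it):

```yaml
# elasticsearch.yml -- 1.4.x: re-enable dynamic (inline) scripts.
# This reverts a security default; prefer file-based scripts where possible.
script.disable_dynamic: false
```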
Still posting here for anyone stuck in the same situation as me.
Let's say your data looks like this:
{
  "_source": {
    "fieldName": [
      {
        "f1": "value 11",
        "f2": "value 21"
      },
      {
        "f1": "value 12",
        "f2": "value 22"
      }
    ]
  }
}
Then to filter on fieldName with length > 1, for example:
"query": {
  "bool" : {
    "must" : {
      "script" : {
        "script" : {
          "inline": "doc['fieldName.f1'].values.length > 1",
          "lang": "painless"
        }
      }
    }
  }
}
The script syntax is as in the ES 5.4 documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-query.html
Imho the correct way of filtering arrays by size using scripting is:
"filter" : {
  "script" : {
    "script" : "_source.fieldName.size() > 1"
  }
}
If I do it as @javanna suggests, it throws groovy.lang.MissingPropertyException: No such property: length for class: java.lang.String
If you have an array of objects that aren't mapped as nested, keep in mind that Elastic will flatten them into:
attachments: [{size: 123}, {size: 456}] --> attachments.size: [123, 456]
So you want to reference your field as doc['attachments.size'].length, not doc['attachments'].length, which is very counter-intuitive.
Same for doc.containsKey('attachments.size').
The .values part is deprecated and no longer needed.
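So for the flattened attachments example above, a Painless script filter would look something like this sketch (field names as in the example; the containsKey guard skips documents where no attachment sizes were indexed):

```json
GET /_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "source": "doc.containsKey('attachments.size') && doc['attachments.size'].length > 1",
            "lang": "painless"
          }
        }
      }
    }
  }
}
```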
Based on this: https://code.google.com/p/guava-libraries/source/browse/guava/src/com/google/common/collect/RegularImmutableList.java?r=707f3a276d4ea8e9d53621d137febb00cd2128da and on lisak's answer here: there is a size() function which returns the length of the list:
"filter" : {
  "script" : {
    "script" : "doc['fieldname'].values.size() > 10"
  }
}
The easiest way to do this is to "denormalize" your data so that you have a property containing the count and a boolean for whether the field exists. Then you can just search on those properties.
For example:
{
  "id": 31939,
  "hasAttachments": true,
  "attachmentCount": 2,
  "attachments": [
    {
      "type": "Attachment",
      "name": "txt.txt",
      "mimeType": "text/plain"
    },
    {
      "type": "Inline",
      "name": "jpg.jpg",
      "mimeType": "image/jpeg"
    }
  ]
}
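Querying the denormalized properties is then ordinary term/range work with no scripting cost. A sketch using the field names from the example above:

```json
GET /_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "hasAttachments": true } },
        { "range": { "attachmentCount": { "gte": 2 } } }
      ]
    }
  }
}
```

The trade-off is that the count must be kept in sync at index time whenever the array changes.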
When you need to find documents containing a field whose size/length is larger than zero, #javanna gave the correct answer. I only want to add that if your field is a text field and you want to find documents that contain some text in that field, you can't use the same query. You will need to do something like this:
GET index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "FIELD_NAME": {
              "gt": 0
            }
          }
        }
      ]
    }
  }
}
This is not an exact answer to this question because an answer already exists, but it's a solution for a similar problem I had, so maybe somebody will find it useful.
A suggestion about the second question:
How can I filter documents that have a field which is an empty array?
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "fieldname"
        }
      }
    }
  }
}
will return docs with empty fieldname: [] arrays. Using must (rather than must_not) will return the opposite.
Here is what worked for me:
GET index/_search
{
  "query": {
    "bool": {
      "filter" : {
        "script" : {
          "script" : "doc['FieldName'].length > 10"
        }
      }
    }
  }
}
For version 7+:
"filter": {
  "script": {
    "script": {
      "source": "doc['fieldName.keyword'].length > 10",
      "lang": "painless"
    }
  }
}
Ref: https://medium.com/@felipegirotti/elasticsearch-filter-field-array-more-than-zero-8d52d067d3a0