How to find if the value of the field is an array?

How to find if the value of the field is an array? - elasticsearch

By accident i inserted some values inside an index as an array with a single value, instead of inserting it as a single string.
For example:
Instead of inserting "This string" i inserted ["This string"]
I need to find the values that have been inserted in the ["String"] case so that i can update them the way they should be, the normal "String".
The index mapping for the field is a keyword and i can't really seem to a query that finds the values that are arrays.
I can't really delete the index and start over since there is a lot of data in it.
Let's say the index has this mapping:
{
"mappings": {
"_doc": {
"properties": {
"url": {
"type": "keyword"
}
}
}
}
}
And i inserted two values
PUT <index>/_doc
{
"url": "google.com"
}
PUT <index>/_doc
{
"url": ["google.com"]
}
How can i find the documents that are like the second document that are an array of a single value?
Note: This is with elasticsearch version 7.13.1

Elasticsearch doesn't have a dedicated array data type, so "string" and ["string"] are equivalent.
The following query will find both of your documents.
{
"query": {
"term": {
"url": "google.com"
}
}
}
So, to be fair, you don't need to do anything unless it matters for the application that would later consume the search results and actually expect a string instead of array.

Try the script filter
GET <index>/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": "params._source.url instanceof List"
}
}
}
}
}

Related

Find all entries on a list within Kibana via Elasticserach Query DSL

Could you please help me on this? My Kibana Database within "Discover" contains a list of trades. I know want to find all trades within this DB that have been done in specific instruments (ISIN-Number). When I add a filter manually and switch to Elasticserach Query DSL, I find the following:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"obdetails.isin": "CH0253592783"
}
},
{
"match_phrase": {
"obdetails.isin": "CH0315622966"
}
},
{
"match_phrase": {
"obdetails.isin": "CH0357659488"
}
}
],
"minimum_should_match": 1
}
}
}
Since I want to check the DB for more than 200 ISINS, this seems to be inefficient. Is there a way, in which I could just say "show me the trade if it contains one of the following 200 ISINs?".
I already googled and tried this, which did not work:
{
"query": {
"terms": {
"obdetails.isin": [ "CH0357659488", "CH0315622966"],
"boost": 1.0
}
}
}
The query works, but does not show any results.

To conclude. A field of type text is analyzed which basically converts the given data to a list of terms using given analyzers etc. rather than it being a single term.
Given behavior causes the terms query to not match these values.
Rather than changing the type of the field one may add an additional field of type keyword. That way a terms queries can be performed whilst still having the ability to match on the field.
{
"isin": {
"type" "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
The above example will add an extra field called obdetails.isin.keyword which can be used for terms. While still being able to use match queries on obdetails.isin

How to search by non-tokenized field length in ElasticSearch

Say I create an index people which will take entries that will have two properties: name and friends
PUT /people
{
"mappings": {
"properties": {
"friends": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
and I put two entries, each one of them has two friends.
POST /people/_doc
{
"name": "Jack",
"friends": [
"Jill", "John"
]
}
POST /people/_doc
{
"name": "Max",
"friends": [
"John", "John" # Max will have two friends, but both named John
]
}
Now I want to search for people that have multiple friends
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "doc['friends.keyword'].length > 1"
}
}
}
]
}
}
}
This will only return Jack and ignore Max. I assume this is because we are actually traversing the inversed index, and John and John create only one token - which is 'john' so the length of the tokens is actually 1 here.
Since my index is relatively small and performance is not the key, I would like to actually traverse the source and not the inversed index
GET /people/_search
{
"query": {
"bool": {
"filter": [
{
"script": {
"script": {
"source": "ctx._source.friends.length > 1"
}
}
}
]
}
}
}
But according to the https://github.com/elastic/elasticsearch/issues/20068 the source is supported only when updating, not when searching, so I cannot.
One obvious solution to this seems to take the length of the field and store it to the index. Something like friends_count: 2 and then filter based on that. But that requires reindexing and also this appears as something that should be solved in some obvious way I am missing.
Thanks a lot.

There is a new feature in ES 7.11 as runtime fields a runtime field is a field that is evaluated at query time. Runtime fields enable you to:
Add fields to existing documents without reindexing your data
Start working with your data without understanding how it’s structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema
you can find more information here about runtime fields, but how you can use runtime fields you can do something like this:
Index Time:
PUT my-index/
{
"mappings": {
"runtime": {
"friends_count": {
"type": "keyword",
"script": {
"source": "doc['#friends'].size()"
}
}
},
"properties": {
"#timestamp": {"type": "date"}
}
}
}
You can also use runtime fields in search time for more information check here.
Search Time
GET my-index/_search
{
"runtime_mappings": {
"friends_count": {
"type": "keyword",
"script": {
"source": "ctx._source.friends.size()"
}
}
}
}
Update:
POST mytest/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source.arrayLength = ctx._source.friends.size()"
}
}
You can update all of your document with query above and adjust your query.

For everyone wondering about the same issue, I think #Kaveh answer is the most likely way to go, but I did not manage to make it work in my case. It seems to me that source is created after the query is performed and therefore you cannot access source for the purposes of filtering query.
This leaves you with two options:
filter the result on the application level (ugly and slow solution)
actually save the filed length in a separate field. Such as friends_count
possibly there is another option I don't know about(?).

Elasticsearch - Script Filter over a list of nested objects

I am trying to figure out how to solve these two problems that I have with my ES 5.6 index.
"mappings": {
"my_test": {
"properties": {
"Employee": {
"type": "nested",
"properties": {
"Name": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
},
"Surname": {
"type": "keyword",
"normalizer": "lowercase_normalizer"
}
}
}
}
}
}
I need to create two separate scripted filters:
1 - Filter documents where size of employee array is == 3
2 - Filter documents where the first element of the array has "Name" == "John"
I was trying to make some first steps, but I am unable to iterate over the list. I always have a null pointer exception error.
{
"bool": {
"must": {
"nested": {
"path": "Employee",
"query": {
"bool": {
"filter": [
{
"script": {
"script" : """
int array_length = 0;
for(int i = 0; i < params._source['Employee'].length; i++)
{
array_length +=1;
}
if(array_length == 3)
{
return true
} else
{
return false
}
"""
}
}
]
}
}
}
}
}
}

As Val noticed, you cant access _source of documents in script queries in recent versions of Elasticsearch.
But elasticsearch allow you to access this _source in the "score context".
So a possible workaround ( but you need to be careful about the performance ) is to use a scripted score combined with a min_score in your query.
You can find an example of this behavior in this stack overflow post Query documents by sum of nested field values in elasticsearch .
In your case a query like this can do the job :
POST <your_index>/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": """
if (params["_source"]["Employee"].length === params.nbEmployee) {
def firstEmployee = params._source["Employee"].get(0);
if (firstEmployee.Name == params.name) {
return 1;
} else {
return 0;
}
} else {
return 0;
}
""",
"params": {
"nbEmployee": 3,
"name": "John"
}
}
}
}
]
}
}
}
The number of Employee and first name should be set in the params to avoid script recompilation for every use case of this script.
But remember it can be very heavy on your cluster as Val already mentioned. You should narrow the set a document on which your will apply the script by adding filters in the function_score query ( match_all in my example ).
And in any case, it is not the way Elasticsearch should be used and you cant expect bright performances with such a hacked query.

1 - Filter documents where size of employee array is == 3
For the first problem, the best thing to do is to add another root-level field (e.g. NbEmployees) that contains the number of items in the Employee array so that you can use a range query and not a costly script query.
Then, whenever you modify the Employee array, you also update that NbEmployees field accordingly. Much more efficient!
2 - Filter documents where the first element of the array has "Name" == "John"
Regarding this one, you need to know that nested fields are separate (hidden) documents in Lucene, so there is no way to get access to all the nested docs at once in the same query.
If you know you need to check the first employee's name in your queries, just add another root-level field FirstEmployeeName and run your query on that one.

handling both exact and partial search on the same search string

I want to define the schema which can tackle the partial as well as the exact search for the same search value.
The exact search should always return the "exact match", ES should not break the search string into tokens in this case.

For partial match data type of the property should be text and for exact it should be keyword. For having the feasibility to have both partial and exact search without having to index the data to different properties you can leverage using fields. What it does is that it helps to index same data into different ways.
So, lets say you want to index name of persons, and have the ability for partial and exact search. In such case the mapping would be:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Lets index a few docs:
PUT test/_doc/1
{
"name": "Nishant Saini"
}
PUT test/_doc/2
{
"name": "Nishant Kumar"
}
For partial search we have to query name field and it is of type text.
GET test/_doc/_search
{
"query": {
"query_string": {
"query": "Nishant Saini",
"field": [
"name"
]
}
}
}
The above query will return both docs (1 and 2) because one token i.e. Nishant appears in both the document for field name.
For exact search we need to query on name.keyword. To perform exact match we can use term query as below:
{
"query": {
"term": {
"name.keyword": "Nishant Saini"
}
}
}
This would match doc 1 only.

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Given:
Documents of two different types, let's say 'product' and 'category', are indexed to the same Elasticsearch index.
Both document types have a field 'tags'.
Problem:
I want to build a query that returns results of both types, but the documents of type 'product' are allowed to have tags 'X' and 'Y', and the documents of type 'category' are only allowed to have tag 'Z'. How can I achieve this? It appears I can't use product.tags and category.tags since then ES will look for documents' product/category field, which is not what I intend.
Note:
While for the example above there might be some kind of workaround, I'm looking for a general way to target or specify fields of a specific document type when writing queries. I basically want to 'namespace' the field names used in my query so only documents of the type I want to work with are considered.

I think field aliasing would be the best answer for you, but it's not possible.
Instead you can use "copy_to" but I it probably affects index size:
DELETE /test
PUT /test
{
"mappings": {
"product" : {
"properties": {
"tags": { "type": "string", "copy_to": "ptags" },
"ptags": { "type": "string" }
}
},
"category" : {
"properties": {
"tags": { "type": "string", "copy_to": "ctags" },
"ctags": { "type": "string" }
}
}
}
}
PUT /test/product/1
{ "tags":"X" }
PUT /test/product/2
{ "tags":"Y" }
PUT /test/category/1
{ "tags":"Z" }
And you can query one of fields or many of them:
GET /test/product,category/_search
{
"query": {
"term": {
"ptags": {
"value": "x"
}
}
}
}
GET /test/product,category/_search
{
"query": {
"multi_match": {
"query": "x",
"fields": [ "ctags", "ptags" ]
}
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to find if the value of the field is an array? - elasticsearch

Try the script filter GET <index>/_search { "query": { "bool": { "filter": { "script": { "script": "params._source.url instanceof List" } } } } }

Related

Find all entries on a list within Kibana via Elasticserach Query DSL

How to search by non-tokenized field length in ElasticSearch

Elasticsearch - Script Filter over a list of nested objects

handling both exact and partial search on the same search string

How to specify or target a field from a specific document type in queries or filters in Elasticsearch?

Categories

Resources