Elasticsearch exact multiword (array) query for one field - elasticsearch

I try to write a query where I have multiple Exact search terms lets say an array of strings
like
["Q4 Test WC Schüssel", "Q4_18 Bankerlampen", "MORE_SEARCHTERMS"]
And I have an index with an property data.name and I want to search for each of my array strings inside this ONE field for the exact value and I want all entries back where one of my array strings matches.
{
"mappings": {
"_doc": {
"country": {
"type": "keyword"
},
"data": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
I thought this would be an easy task, but I am not shure if I have the wrong google terms where I search for this problem to get an example query.

Use terms query
GET /_search
{
"query": {
"terms": {
"name.keyword": ["Q4 Test WC Schüssel", "Q4_18 Bankerlampen", "MORE_SEARCHTERMS"]
}
}
}

Related

Number of nested objects in Elasticsearch

Looking for a way to get the number of nested objects, for querying, sorting etc.
For example, given this index:
PUT my-index-000001
{
"mappings": {
"properties": {
"some_id": {"type": "long"},
"user": {
"type": "nested",
"properties": {
"first": {
"type": "keyword"
},
"last": {
"type": "keyword"
}
}
}
}
}
}
PUT my-index-000001/_doc/1
{
"some_id": 111,
"user" : [
{
"first" : "John",
"last" : "Smith"
},
{
"first" : "Alice",
"last" : "White"
}
]
}
How to filter by the number of users (e.g. query fetching all documents with more than XX users).
I was thinking to using a runtime_field but this gives an error:
GET my-index-000001/_search
{
"runtime_mappings": {
"num": {
"type": "long",
"script": {
"source": "emit(doc['some_id'].value)"
}
},
"num1": {
"type": "long",
"script": {
"source": "emit(doc['user'].size())" // <- this breaks with "No field found for [user] in mapping"
}
}
}
,"fields": [
"num","num1"
]
}
Is it possible perhaps using aggregations?
Would also be nice to know if I can sort the results (e.g. all documents with more than XX and sorted desc by XX).
Thanks.
You cannot query this efficiently
It is possible to use this hack for it, but I would only do it if you need to do some one-time fetching, not for a regular use case as it uses params._source and is therefore really slow when you have a lot of docs
{
"query": {
"function_score": {
"min_score": 1, # -> min number of nested docs to filter by
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "params._source['user'].size()"
}
}
],
"boost_mode": "replace"
}
}
}
It basically calculates a new score for each doc, where the score is equal to the length of the users array, and then removes all docs under min_score from returning
The best way to do this is to add a userCount field at indexing time (since you know how many elements there are) and then query that field using a range query. Very simple, efficient and fast.
Each element of the nested array is a document in itself, and thus, not queryable via the root-level document.
If you cannot re-create your index, you can leverage the _update_by_query endpoint in order to add that field:
POST my-index-000001/_update_by_query?wait_for_completion=false
{
"script": {
"source": """
ctx._source.userCount = ctx._source.user.size()
"""
}
}

Elasticsearch - extract values from a flattened field inside a nested type using a runtime field script

I have this mapping, a big document actually, a lot of fields excluded for brevity
{
"items": {
"mappings": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword"
},
"costs": {
"type": "nested",
"properties": {
"id": {
"type": "keyword"
},
"costs_samples": {
"type": "flattened"
}
}
}
}
}
}
}
costs_samples is a flattened field type huge collection of possible costs (sometimes more than 10k entries), based on some dynamic dimensions. Have to highlight that costs_sample cannot live outside costs, since at query time some conditions from costs should be composed with should or match clauses, like (costs.country=this_value AND some_other costs_samples_condition.
I would like to be able to extract and eventually inject a new field at costs level as a runtime field, and then use that field to sort, filter and aggregate.
Something like this
{
"runtime_mappings": {
"costs.selected_cost": {
"type": "long",
"script": {
"source":
"for (def cost : doc['costs.costs_samples']) { if(cost.values!= null) {emit(Long.parseLong(cost.values.some_dynamic_identity_known_at_query_time.last))} }"
}
}
},
"query":{
"nested": {
"path": "costs",
"query": {
"bool": {
"filter": [
{
"terms": {
"costs.id": ["id-1","id-2"]
}
},
{
"term": {
"costs.selected_cost": 10
}
}
]
}
}
}
},
"fields": ["costs.selected_cost"]
}
The trouble is selected_cost it is not created / returned by ES. No error message.
Where did I do wrong? Documentation was not helpful.
Maybe worth mention that I've also tried with 2 different documents, like items and costs and then execute kind of a join operation, but the performance tests were really poor.
Thanks!
Long story short, there is no way to use a runtime field inside a nested field type.

Wildcard doesn't work as expected when querying by more than a word

If I search documents containing e.g "called" in "message" field I get an expected result, but when I search for "was called", "was called*" or
"*was called*"
I get nothing, although I have a lot of documents whose message field contains the following content "Application was called by REST API".
Here is a part of a query I send:
"wildcard": {
"message": {
"wildcard": "was called",
"boost": 1.0
}
}
Here is a part of the mapping:
"mappings": {
"doc": {
"dynamic_templates": [
{
"message_field": {
"path_match": "message",
"match_mapping_type": "string",
"mapping": {
"norms": false,
"type": "text"
}
}
},
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"fields": {
"keyword": {
"ignore_above": 256,
"type": "keyword"
}
},
"norms": false,
"type": "text"
}
}
}
],
"properties": {
...
"message": {
"type": "text",
"norms": false
}
}
}
}
Indexes I search in are automatically created by Logstash.
I have a similar problem with another field; I have the following value in the field: "NP-00121". *00121 works, but *-00121 doesn't.
edit: and one example more: I have a "requestUri" field containing "/api/v1/log/rest", "/api/v1/log/notification" etc. when I send the following wildcard query I get nothing "/api/v1*".
So it looks like problem appears when using spaces and dashes. Could anyone help me to solve this problem?
Wildcards are used within tokens. Your message field is indexed as text, and so will be tokenized into words.
Basically, you don't need wildcards for a query like "was called". Simply use a phrase query like:
"query": {
"match_phrase" : {
"message" : "was called"
}
}
or if you prefer a query string query:
"query": {
"query_string" : {
"query" : "message:\"was called\""
}
}
A wildcard query would be useful for searching for partial terms, something like:
"query": {
"wildcard" : { "message" : "call*" }
}
If you wanted to find all docs that contain "call", "called" or "calling".
For values like NP-00121, or for URIs, it would likely be more useful if those fields were not analyzed. As it is these are getting separated into tokens ('np' and '00121'), thus the problem you are seeing. You can index these fields as the "keyword" type instead of "text", to have the whole field indexed as a single, unanalyzed token.

Why prefix returns documents without the specific prefix?

I want to return only documents which their name start with "pizza". this is what I've done:
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name": "pizza"
}
}
}
}
}
But I've got these 3 documents:
{
"name": "Viana Pizza",
"city": "Mashhad",
"address": "Vakil abad",
"foods": ["Pizza"],
"salad": true,
"rate": 5.0
}
{
"name": "Pizza Pizza",
"city": "Mashhad",
"address": "Bahar st",
"foods": ["Pizza"],
"salad": true,
"rate": 8.5
}
{
"name": "Reza Pizza",
"city": "Tehran",
"address": "Vali Asr",
"foods": ["Pizza"],
"salad": true,
"rate": 7.5
}
As you can see, Only one of them has "pizza" in the beginning of the name field.
What's wrong?
Probably, the simplest explanation given that you didn't provide the actual mapping, is that you have th e "name" field as "string" and "analyzed" (the default). Which means that "Reza Pizza" will be transformed to "reza" and "pizza" terms.
And your filter will match against terms, not against entire fields. Because ES analyzes the fields and forms terms when the standard mapping is used.
You need to either change your "name" field to "not_analyzed" or add another field to mirror the "name" but this mirror field to be "not_analyzed". Also, for text "pizza" (lowercase) to work in this case you need to create a custom analyzer.
Below you have the solution with the mirror field:
PUT /pizza
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword_lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"restaurant": {
"properties": {
"name": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"analyzer": "my_keyword_lowercase_analyzer"
}
}
}
}
}
}
}
And in searching you need to use the mirror field:
GET /pizza/restaurant/_search
{
"query": {
"filtered": {
"filter": {
"prefix": {
"name.raw": "pizza"
}
}
}
}
}
That's all about Elasticsearch analyzers. Let's read the documentation on prefix filter:
Filters documents that have fields containing terms with a specified prefix (not analyzed).
Here we can see that this filter matches terms, not the whole field value. When you index the document, ES splits your field values to terms using analyzers. Default analyzer splits value by whitespace and convert parts to lowercse. So all three results have term pizza in the name field and pizza term perfectly matches pizza prefix. If you want to match field value as is - I'd suggest you to map name field as not_analyzed

Elasticsearch: sorting Spanish double names alphabetically

I am doing an Elasticsearch query and I want the results ordered alphabetically by last name. My problem: the last names are all Spanish double names, and ES doesn't order them the way I would like it.
I would prefer the order to be:
Batres Rivera
Batrín Chojoj
Fion Morales
Lopez Giron
Martinez Castellanos
Milán Casanova
This is my query:
{
"query": {
"match_all": {}
},
"sort": [
{
"Last Name": {
"order": "asc"
}
}
]
}
The order that I get with this is:
Batres Rivera
Batrín Chojoj
Milán Casanova
Martinez Castellanos
Fion Morales
Lopez Giron
So it is not sorting by the first string, but by either of both (Batres, Batrín, Casanova, Castellanos, Fion, Giron).
If I try additionally
{
"order": "asc",
"mode": "max"
}
then I get:
Batrín Chojoj
Lopez Giron
Martinez Castellanos
Milán Casanova
Fion Morales
Batres Rivera
All the fields are indexed by default, I checked with
curl -XGET localhost/my_index/_mapping
and I get back
my_index: {
my_type: {
properties: {
FirstName: {
type: string
}LastName: {
type: string
}MiddleName: {
type: string
}
...
}
}
}
Does anyone know how to make the results to be ordered to be ordered alphabetically by the beginning string of the last name?
Thanks!
The problem is that your LastName field is analyzed, so the string Batres Rivera is indexed as a multi-value field with two terms: batres and rivera. But this isn't like an ordered array, it's more like a "bag of values". So when you try to sort on the field, it chooses one of the terms (the min or max) and sorts on that.
What you need to do is to store the LastName as a single term (Batres Rivera) for sorting purposes, by mapping the field as
{ "type": "string", "index": "not_analyzed"}
Obviously you can't then use that field for search purposes: you wouldn't be able to search for rivera and match on that field.
The way to support both searching and sorting is to use multi-fields: ie index the same value in two ways, one for searching and one for sorting.
In 0.90.* the syntax for multi-fields is:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "multi_field",
"fields": {
"LastName": {
"type": "string"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
In 1.0.* the multi_field type has been removed and now any core field type supports sub-fields as follows:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
So you can use the LastName field for searching, and the LastName.raw field for sorting:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
Language specific sorting
You should also look at using the ICU analysis plugin to sort using the Spanish sort order (or collation). This is a bit more complex but is worth using:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_folding"
]
},
"es_sorting": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"spanish"
]
}
},
"filter": {
"spanish": {
"type": "icu_collation",
"language": "es"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"analyzer": "folding",
"fields": {
"raw": {
"type": "string",
"analyzer": "es_sorting"
}
}
}
}
}
}
}'
We create a folding analyzer which we'll use for the LastName field, which will analyze a string like Muñoz Rivera into the two terms munoz (without the ~) and rivera. So a user can search for munoz or muñoz and either will match.
Then we create the es_sorting analyzer which indexes the proper sort order for muñoz rivera (lowercased) in Spanish.
Searching would be done in the same way:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
We need to know how you are indexing the name.
Please check this discussion link.
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-way-to-search-terms-lower-cased-td932996.html
This should be very helpful for your case. This depends on your mapping settings. What analyzer you use for the name field.
Need your mapping definition to decide on a proper solution.

Resources