Elasticsearch search in array - elasticsearch

I'm trying to find a way to match several terms within the same array.
Example dataset:
"_id":"23424232",
"vehicule":[
{ "tags":["kawasaki","suzuki","ducati"] },
{ "tags":["opel","mercedes","ford"] }
]
If I search for someone with "kawasaki" and "opel" in the same tags array, I expect 0 hits, but Elasticsearch returns the customer.
Query
"query": {
"bool": {
"must": [
{ "term": { "vehicule.tags" : "kawasaki"}},
{ "term": { "vehicule.tags" : "opel"}}
]
}
}
Mapping
"vehicule": {
"include_in_parent": true,
"type": "nested",
"properties": {
"tags":{
"type":"string",
"analyzer":"code_tokenizer"
},
I think this happens because Elasticsearch sees tags as one flat array, as shown below, and I would like to avoid that. How can I do that?
"tags":["kawasaki","suzuki","ducati","opel","mercedes","ford"]

I found a solution that works for me.
{
"query": {
"nested": {
"path": "vehicule",
"query": {
"bool": {
"must": [
{
"term": {
"vehicule.tags": "suzuki"
}
},
{
"term": {
"vehicule.tags": "opel"
}
}
]
}
}
}
}
}
With that query, Elasticsearch finds 0 customers :)
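To make the difference concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code) that mimics the two behaviours discussed above: a flattened view, where all tags of a customer end up in one list, and a per-object view, which is roughly what a nested query evaluates. The function names are made up for illustration, and the document shape assumes each entry of vehicule is an object with its own tags array.

```python
# Toy copy of the customer document from the question (assumed shape).
customer = {
    "_id": "23424232",
    "vehicule": [
        {"tags": ["kawasaki", "suzuki", "ducati"]},
        {"tags": ["opel", "mercedes", "ford"]},
    ],
}


def matches_flattened(doc, wanted):
    """Mimic the flattened view: all tags are merged into one set."""
    all_tags = {tag for vehicle in doc["vehicule"] for tag in vehicle["tags"]}
    return all(term in all_tags for term in wanted)


def matches_same_object(doc, wanted):
    """Mimic a nested query: every term must occur in the same object."""
    return any(
        all(term in vehicle["tags"] for term in wanted)
        for vehicle in doc["vehicule"]
    )


print(matches_flattened(customer, ["kawasaki", "opel"]))      # True
print(matches_same_object(customer, ["kawasaki", "opel"]))    # False
print(matches_same_object(customer, ["kawasaki", "suzuki"]))  # True
```

The first call corresponds to the plain bool/term query that unexpectedly returned the customer; the second corresponds to the nested query that returns 0 hits.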

Related

Can you reference other queries in Elasticsearch percolator?

Can percolator queries reference other stored query docs in a percolator index? For example, given I have the following Boolean query, with _id=1, already indexed in the percolator:
{
"query": {
"bool": {
"must": [
{ "term": { "tag": "wow" } }
]
}
}
}
Could I have another query, with _id=2, indexed (note that I'm making up the _percolator_ref_id terms query key):
{
"query": {
"bool": {
"should": [
{ "term": { "tag": "elasticsearch" } },
{ "terms" : { "_percolator_ref_id": [1] } }
]
}
}
}
If I percolated the following document:
{ "tag": "wow" }
I would expect both _id=1 and _id=2 queries to match. Does some functionality like _percolator_ref_id exist?
Thanks!
Edit: To clarify, I do not know beforehand how many query references appear in a given query (e.g., the _id=2 query could reference 10 other queries potentially).
You can do something like the following.
Two percolator fields (query1 and query) are registered in the index below:
PUT myindex
{
"mappings": {
"properties": {
"query1": {
"type": "percolator"
},
"query": {
"type": "percolator"
},
"field": {
"type": "text"
}
}
}
}
You can use bool with must/should to combine the different queries:
GET /myindex/_search
{
"query": {
"bool": {
"must": [
{
"percolate": {
"field": "query",
"document": {
"field": "fox jumps over the lazy dog"
}
}
},
{
"percolate": {
"field": "query1",
"document": {
"field": "fox jumps over the lazy dog"
}
}
}
]
}
}
}

ElasticSearch multimatch substring search

I have to combine two filters to meet these requirements:
- the r.status field contains one of a specific list of values
- at least one of several text fields contains the search value.
The resulting query (built with NEST, but that doesn't matter) looks like:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elasticsearch offers many full-text search options: multi_match, phrase, wildcards, etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I checked by running only those.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such a query looks ugly. I'm looking for a more elegant solution.
The Elasticsearch way is to use the ngram tokenizer.
The ngram analyzer splits your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
Worl
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the subterms are generated, you can use a match query on the subfield.
Another point: it is odd to use must within a filter. If you are interested in the score, use must; otherwise use filter. Read this article for a good understanding.
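The sliding window described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tokenizer's actual implementation: the function name ngrams is made up, the maximum size of 5 is an assumption chosen to fit the example, and splitting on whitespace is assumed (in Elasticsearch that depends on how the tokenizer is configured).

```python
def ngrams(text, min_gram=3, max_gram=5):
    """Return sliding-window n-grams for each whitespace-separated word."""
    grams = []
    for word in text.split():
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # window would run past the end of the word
                grams.append(word[start:start + size])
    return grams


terms = ngrams("Hello World")
print(terms)
# A substring search then becomes an exact lookup among the generated terms.
print("ell" in terms)
```

Because every substring of length 3 to 5 is indexed as its own term, a plain match query on the ngram subfield can find "ell" or "orl" without wildcards.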

How to join two queries in one using elasticsearch?

Hi, I want to combine two queries into one in Elasticsearch, but I don't know how. I think I should use an aggregation, but it isn't clear to me how to do it. Could you help me? My ES version is 5.1.2.
First filter by status and name:
POST test_lite/_search
{
"aggs": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"match": {
"STATUS": "Now"
}
},
{
"match": {
"NAME": "PRUDENTL"
}
}
]
}
}
}
}
}
Then search the filtered records for the word "english" in the description:
POST /test_lite/_search
{
"query": {
"wildcard" : { "DESCRIPTION" : "*english*" }
}
}
The only query needed is:
POST test_lite/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"STATUS": "Now"
}
},
{
"match": {
"NAME": "PRUDENTL"
}
},
{"wildcard" : { "DESCRIPTION" : "*english*" }}
]
}
}
}

elastic exists query for nested documents

I have nested documents such as:
"someField": "hello",
"users": [
{
"name": "John",
"surname": "Doe",
"age": 2
}
]
According to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-exists-query.html, the above should match:
GET /_search
{
"query": {
"exists" : { "field" : "users" }
}
}
whereas the following should not:
"someField": "hello",
"users": []
Unfortunately, neither matches. Any ideas?
The example mentioned on the Elasticsearch blog refers to string and array-of-string types, not to nested types.
The following query should work for you:
{
"query": {
"nested": {
"path": "users",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "users"
}
}
]
}
}
}
}
}
Also, you can refer to this issue for more info, which discusses this usage pattern.
This works for me:
GET /type/_search?pretty=true
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "outcome",
"query": {
"exists": {
"field": "outcome.outcomeName"
}
}
}
}
]
}
}
}
With the following index mapping:
{
"index_name": {
"mappings": {
"object_name": {
"dynamic": "strict",
"properties": {
"nested_field_name": {
"type": "nested",
"properties": {
"some_property": {
"type": "keyword"
}
}
}
}
}
}
}
}
I needed to use this query:
GET /index_name/_search
{
"query": {
"nested": {
"path": "nested_field_name",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "nested_field_name.some_property"
}
}
]
}
}
}
}
}
Elasticsearch version 5.4.3
The answer from user3775217 worked for me, but I needed to tweak it to work as expected for must_not. Essentially, the bool/must needs to be wrapped around the nested portion of the query:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "users",
"query": {
"exists": {
"field": "users"
}
}
}
}
]
}
}
}
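As a rough sketch of the pattern these answers converge on, the following Python helpers build the request body as plain dictionaries, with the bool clause wrapped around the nested query so that must can be swapped for must_not. The helper names are hypothetical, and the field users.name is an assumption taken from the sample document in the question.

```python
import json


def nested_exists(path, field):
    """Build a nested query that checks a field inside the nested objects."""
    return {"nested": {"path": path, "query": {"exists": {"field": field}}}}


def has_nested(path, field):
    """Documents with at least one nested object containing the field."""
    return {"query": {"bool": {"must": [nested_exists(path, field)]}}}


def lacks_nested(path, field):
    """Documents with no nested object containing the field."""
    return {"query": {"bool": {"must_not": [nested_exists(path, field)]}}}


# Request body for "documents whose users array is empty or missing".
print(json.dumps(lacks_nested("users", "users.name"), indent=2))
```

Keeping must_not outside the nested clause matters: placed inside, it would be evaluated against each nested object separately rather than against the parent document.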

Elasticsearch boost score with nested query

I have the following query in Elasticsearch version 1.3.4:
{
"filtered": {
"query": {
"bool": {
"should": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "adobe creative suite"
}
}
]
}
}
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
},
{
"bool": {
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 5
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 5
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
},
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "ajax"
}
},
{
"term": {
"skills.name.original": "html"
}
}
]
}
}
]
}
}
}
Mappings look like this:
skills: {
type: "nested",
include_in_parent: true,
properties: {
name: {
type: "multi_field",
fields: {
name: {type: "string"},
original: {type : "string", analyzer : "string_lowercase"}
}
}
}
}
and finally the document structure, for skills (excluded other parts), looks like this:
"skills":
[
{
"name": "java",
"source": [
"linkedin",
"facebook"
]
},
{
"name": "html",
"source": [
"meetup"
]
}
]
My goal with this query is to first filter out some irrelevant hits with the filters (bottom of the query), then score a person by searching the whole document for the match_phrase "java", with extra boosting if it also contains the match_phrase "adobe creative suite". The query then checks the nested value where we get a hit in "skills" to see which source(s) the skill came from, and boosts the score based on the source or sources the nested object has.
This kind of works, in that I don't get any errors, but the final score is odd and it's hard to see whether it's working. If I give a small boost, let's say 2, the score goes DOWN slightly: my top hit at the moment has a score of 32.176407 with boost = 1. With a boost of 5 it goes down to 31.637703. I would expect it to go up, not down. With a boost of 1000, the score goes down to 2.433376.
Is this the right way to do this, or is there a better/easier way? I could change the structure, mappings, etc. And why is my score decreasing?
Edit: I have simplified the query a little, only dealing with one "skill":
{
"filtered": {
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
}
],
"minimum_should_match": 1
}
}
]
}
}
],
"should": [
{
"nested": {
"path": "skills",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
}
],
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 1.2
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 1.2
}
}
}
]
}
}
}
}
]
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
}
]
}
}
}
The problem now is that I have two similar documents whose only difference is the "source" value on the skill "java": "linkedin" and "meetup" respectively. In my new query they both get the same boost, but the final _score is very different for the two documents.
From the query explanation for doc 1:
"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"
and for doc two:
"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"
These values are the only ones that differ, and I can't see why.
I can't answer the question regarding the boost, but how many shards do you have on the index?
TF and IDF are calculated per shard, not per index, and this could be creating the difference in score.
https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ.
If you reindex with only 1 shard, does the outcome change?
Edit: Also, the doc range in the explanation is the range of docs considered in the shard, and you can use it to calculate the IDF for each doc and verify the scores.
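To illustrate how per-shard statistics can produce different scores for otherwise identical matches, here is a small Python sketch. It assumes the classic Lucene TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and the shard statistics are invented for the example, not taken from the question.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF inverse document frequency (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the same term on two different shards.
shard_a = {"num_docs": 1000, "doc_freq": 10}   # term is rare on this shard
shard_b = {"num_docs": 1000, "doc_freq": 400}  # term is common on this shard

idf_a = classic_idf(shard_a["doc_freq"], shard_a["num_docs"])
idf_b = classic_idf(shard_b["doc_freq"], shard_b["num_docs"])

print(f"shard A idf: {idf_a:.4f}")
print(f"shard B idf: {idf_b:.4f}")
# Two identical documents landing on different shards would be scored
# with different IDF values, even though the query is the same.
print(idf_a > idf_b)
```

With a single shard both documents share the same statistics, which is why reindexing into one shard is a quick way to test this explanation.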

Resources