Imagine that I have an index with the following three documents representing images and their colors.
[
{
"id": 1,
"intensity": {
"red": 0.6,
"green": 0.1,
"blue": 0.3
}
},
{
"id": 2,
"intensity": {
"red": 0.5,
"green": 0.6,
"blue": 0.0
}
},
{
"id": 3,
"intensity": {
"red": 0.98,
"green": 0.0,
"blue": 0.0
}
}
]
It the user wants a "red image" (selected in a dropdown or in a “tag cloud”), it is very convenient to do a range query over the floats (maybe intensity.red > 0.5). I can also use the score of that query to get the "red-est" image ranked highest.
However, if I would like to offer free-text search, it gets harder. My solution to that would to index the documents as the following (eg use the if color > 0.5 then append(colors, color_name) at index time):
[
{
"id": 1,
"colors": ["red"]
},
{
"id": 2,
"colors": ["green", "red"]
}
{
"id": 3,
"colors": ["red"]
}
]
I could now use a query_string or a match on the colors field and then search for "red", but all of a sudden I lost my ranking possibilities. ID 3 is far more red than ID 1 (0.98 vs 0.6) but the score will be similar?
My question is: Can I have the cake and eat it too?
One solution I see is to have one index that turns free text into "keywords" which I later use in the actual search.
POST image_tag_index/_search {query: "redish"} -> [ "red" ]
POST images/_search {query: {"red" > 0.5}} -> [ {id: 1}, {id: 3}]
But then I need to fire two searches for every search, but maybe that is the only option?
You can make use of nested data type along with function_score query to get the desired result.
You need to change the way you are storing image data. The mapping will be as below:
PUT test
{
"mappings": {
"_doc": {
"properties": {
"id": {
"type": "integer"
},
"image": {
"type": "nested",
"properties": {
"color": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"intensity": {
"type": "float"
}
}
}
}
}
}
}
Index the image data as below:
PUT test/_doc/1
{
"id": 1,
"image": [
{
"color": "red",
"intensity": 0.6
},
{
"color": "green",
"intensity": 0.1
},
{
"color": "blue",
"intensity": 0.3
}
]
}
The above corresponds to the first image data the you posted in the question. Similarly you can index other images data.
Now when a user search for red the query should be build as below:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "image",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"match": {
"image.color": "red"
}
},
{
"range": {
"image.intensity": {
"gt": 0.5
}
}
}
]
}
},
"field_value_factor": {
"field": "image.intensity",
"modifier": "none",
"missing": 0
}
}
}
}
}
]
}
}
}
You can see in the above query that I have used the field value of image.intensity to calculate the score.
Related
Assume that we have this index in OpenSearch:
{
"settings": {
"index.knn": True,
"number_of_replicas": 0,
"number_of_shards": 1,
},
"mappings": {
"properties": {
"title": {"type": "text"},
"tag": {"type": "text"},
"e1": {
"type": "knn_vector",
"dimension": 512,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "nmslib",
"parameters": {"ef_construction": 512, "m": 24},
},
},
"e2": {
"type": "knn_vector",
"dimension": 512,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "nmslib",
"parameters": {"ef_construction": 512, "m": 24},
},
},
"e3": {
"type": "knn_vector",
"dimension": 512,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "nmslib",
"parameters": {"ef_construction": 512, "m": 24},
},
},
}
},
}
And we want to perform a search over all the fields (approximate knn for the vector fields). What would be the correct way to do this in OpenSearch?
I have this query that works but I'm not sure if it is the correct way of doing this and if it uses approximate knn:
{
"size": 10,
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {
"knn": {
"e1": {
"vector": [0, 1, 2, 3],
"k": 10,
},
}
},
"weight": 1,
}
},
{
"function_score": {
"query": {
"knn": {
"e2": {
"vector": [0, 1, 2, 3],
"k": 10,
},
}
},
"weight": 1,
}
},
{
"function_score": {
"query": {
"knn": {
"e3": {
"vector": [0, 1, 2, 3],
"k": 10,
},
}
},
"weight": 1,
}
},
{
"function_score": {
"query": {
"match": {"title": "title"}
},
"weight": 0.1,
}
},
{
"function_score": {
"query": {"match": {"tag": "tag"}},
"weight": 0.1,
}
},
]
}
},
"_source": False,
}
In other words, I want to know how this which is for ElasticSearch can be done in OpenSearch.
Edit 1:
I want to do this Elasticsearch new feature in OpenSearch. The question is how and also what does the query mentioned above does exactly.
First of all, searching multiple kNN fields in Elasticsearch is not yet supported.
Here you can find the development, not yet released, related to issue #91187 and PR #92118 that was merged for version 8.7... the current version is 8.6.
Looking at the OpenSearch documentation for k-NN, it does not appear to be supported there either.
However, regarding the query you provided:
knn search was not defined well... the right way is, for example:
{
"query": {
"knn": {
"my_vector": {
"vector": [2, 3, 5, 6],
"k": 2
}
}
}
}
where my_vector is the name of your vector field while vector is the query vector (i.e. query text encoded into the corresponding vectors) that must have the same number of dimensions as the vector field you are searching against.
the match query value was not defined well. Here the documentation.
the use of the function_score is unclear and not properly correct.
Finally, if you are interested in vector search with OpenSearch, we recently wrote a blog post in which we provide a detailed description of the new neural search plugin introduced with version 2.4.0 through an end-to-end testing experience.
I need to search in array of ElasticSearch. I've documents like
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "128GB"
},
{
"color": "white",
"memory": "64GB"
}
],
},
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "64GB"
},
{
"color": "white",
"memory": "64GB"
}
],
}
I want to search iphone 9 with color = black and memory = 64GB. I'm using following query
_search?q=product_name:"iPhone 9"+AND+features.color:"black"+AND+features.memory:"64GB"
Only the second record from the document should get listed, but this query is displaying both the records as it matches color with first array and memory with second array. How can I achieve the correct result?
Elasticsearch has no concept of inner objects. Therefore, it flattens object hierarchies into a simple list of field names and values.
Your document will be transformed internally and stored as
{
"product_name" : "iPhone 9",
"features.color" : [ "black", "white" ],
"features.memory" : [ "128GB", "64GB" ]
}
The associate between color and memory is lost.
If you need to maintain independence of each inner object of array , you need to use nested type
Nested type can be only queried using nested query.
PUT index-name
{
"mappings": {
"properties": {
"features": {
"type": "nested"
}
}
}
}
PUT index-name/_doc/1
{
"product_name": "iPhone 9",
"features":[
{
"color": "black",
"memory": "128GB"
},
{
"color": "white",
"memory": "64GB"
}
],
}
GET index-name/_search
{
"query": {
"nested": {
"path": "features",
"query": {
"bool": {
"must": [
{ "match": { "features.color": "black" }},
{ "match": { "features.memory": "64GB" }}
]
}
}
}
}
}
I'm working with elasticsearch Query dsl, and I can't find a way to express the following:
Return results that have the field "price" > min budget and have "price" < max Budget and have has_price=true and also return all results that have "has_price=false"
In other words, I would like to use a range filter on results only that have has_price field set to true, otherwise, on results that have has_price set to false don't take in consideration the filter
Here's the mapping:
{
"formations": {
"mappings": {
"properties": {
"code": {
"type": "text"
},
"date": {
"type": "date",
"format": "dd/MM/yyyy"
},
"description": {
"type": "text"
},
"has_price": {
"type": "boolean"
},
"place": {
"type": "text"
},
"price": {
"type": "float"
},
"title": {
"type": "text"
}
}
}
}
}
The following query combines the 2 scenarios as 2 should clauses in a bool-query. And as there are only should clauses, minimum_should_match will be 1, meaning that at least one should-clause has to match:
Abstract Code Snippet
GET formations/_search
{
"query": {
"bool": {
"should": [
{ <1st scenario: has_price = false> },
{ <2nd scenario> has_price = true AND price IN budget_range}
]
}
}
}
Actual Sample Code Snippets
# 1. Create the index and populate it with some sample documents
POST formations/_bulk
{"index": {"_id": 1}}
{"has_price": true, "price": 2.0}
{"index": {"_id": 2}}
{"has_price": true, "price": 3.0}
{"index": {"_id": 3}}
{"has_price": true, "price": 4.0}
{"index": {"_id": 4}}
{"has_price": false, "price": 2.0}
{"index": {"_id": 5}}
{"has_price": false, "price": 3.0}
{"index": {"_id": 6}}
{"has_price": false, "price": 4.0}
# 2. Query assuming min_budget = 2.0 and max_budget = 4.0
GET formations/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"filter": {
"term": {
"has_price": false
}
}
}
},
{
"bool": {
"filter": [
{
"term": {
"has_price": true
}
},
{
"range": {
"price": {
"gt": 2,
"lt": 4
}
}
}
]
}
}
]
}
}
}
# 3. Result Snippet (4 hits: 3 from 1st scenario & 1 from 2nd scenario)
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
...
Don't forget to add the Claus "minimum_should_match": 1 to your bool-query in case you add another non-should-clause to your bool-query.
Let me know if this answers your question & solves your issue.
I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results, however, if I remove the second term and leave it only with the language_id clause, then it works as expected.
I'm sure this is down to a misunderstading on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure if you need the entire record or not, this solution gives every record that has language_id: 1 and has an office.data.office.id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index for proofing against false hits, one with different language_id and one with different office ids and only the matching one returned.
If you only need the office data, then that's a bit different but still solvable.
I've got a very small dataset of documents put in ES :
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I've got a wrong result.
ES gives
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red in the first record and took the property position=P in the second record.
How can I specify that the search must match the 2 two terms in the same record (within or not a list of nested records) ?
In fact, the good answer is only the document 1, with John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly create inner object for your field (team in particular example) and flattens it to structure like
{
'team.code': ['red', 'green'],
'team.position: ['S', 'P']
}
So you lose your order. To avoid this you need explicitly put nested mapping, index your document as always and query them with nested query
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
PUT so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will result with empty hits.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}