In our Elasticsearch collection of products, we have an array of hashes called "nutrients". A partial example of the data would be:
"_source": {
"quantity": "150.0",
"id": 1001,
"barcode": "7610809001066",
"nutrients": [
{
"per_hundred": "1010.0",
"name_fr": "Énergie",
"per_portion": "758.0",
"name_de": "Energie",
"per_day": "9.0",
"name_it": "Energia",
"name_en": "Energy"
},
{
"per_hundred": "242.0",
"name_fr": "Énergie (kCal)",
"per_portion": "181.0",
"name_de": "Energie (kCal)",
"per_day": "9.0",
"name_it": "Energia (kCal)",
"name_en": "Energy (kCal)"
},
{
"per_hundred": "18.0",
"name_fr": "Matières grasses",
"per_portion": "13.5",
"name_de": "Fett",
"per_day": "19.0",
"name_it": "Grassi",
"name_en": "Fat"
},
In the search, we are trying to bring back the products based on an exact match on two of the fields contained in the nutrients array. What I am finding is that the conditions seem to be combined with OR and not AND.
The two attempts have been:
"query": {
"bool": {
"must": [
{ "match": { "nutrients.name_fr": "Énergie" } },
{ "match": { "nutrients.per_hundred": "242.0" } }
]
}
}
and
"query": {
"filtered": {
"filter": {
"and": [
{ "term": { "nutrients.name_fr": "Énergie" } },
{ "term": { "nutrients.per_hundred": "242.0" } }
]
}
}
}
Both of these do in fact bring back entries with Énergie and 242.0, but they also match entries where 242.0 belongs to a different name_fr, e.g.:
{
"per_hundred": "242.0",
"name_fr": "Acide folique",
"per_portion": "96.0",
"name_de": "Folsäure",
"per_day": "48.0",
"name_it": "Acido folico",
"name_en": "Folic acid"
},
They also match inexactly, i.e. they match "Énergie (kCal)" when we want to match only "Énergie".
On your first problem:
You have to map the nutrients field as nested, so that each object inside it can be queried on its own. See Elasticsearch Nested Objects.
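To illustrate why the original queries behave like OR, here is a minimal Python sketch. It is a simplified model and not Elasticsearch code: the helper names (flatten, matches_flattened, matches_nested) are made up for illustration. The request body at the end assumes that nutrients is mapped with "type": "nested" and that both sub-fields hold exact values (for example keyword fields), which the question does not confirm.

```python
# Simplified model, not Elasticsearch code: shows why an array of plain
# objects matches across objects, while nested objects do not.
nutrients = [
    {"name_fr": "Énergie", "per_hundred": "1010.0"},
    {"name_fr": "Énergie (kCal)", "per_hundred": "242.0"},
    {"name_fr": "Matières grasses", "per_hundred": "18.0"},
]


def flatten(objects):
    """Model the default object mapping: one multi-valued field per key."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, set()).add(value)
    return flat


def matches_flattened(objects, wanted):
    """Each condition is checked against the pooled values of its field."""
    flat = flatten(objects)
    return all(value in flat.get(key, set()) for key, value in wanted.items())


def matches_nested(objects, wanted):
    """All conditions must hold inside one and the same object."""
    return any(
        all(obj.get(key) == value for key, value in wanted.items())
        for obj in objects
    )


wanted = {"name_fr": "Énergie", "per_hundred": "242.0"}
flat_hit = matches_flattened(nutrients, wanted)  # values come from two objects
nested_hit = matches_nested(nutrients, wanted)   # no single object has both

# Request body under the assumptions stated above (nested mapping,
# exact-value sub-fields); adjust field types to your own mapping.
nested_query = {
    "query": {
        "nested": {
            "path": "nutrients",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"nutrients.name_fr": "Énergie"}},
                        {"term": {"nutrients.per_hundred": "242.0"}},
                    ]
                }
            },
        }
    }
}
```

In this model the flattened check succeeds even though no single nutrient has both values, which is the behaviour described in the question; the nested check only succeeds when one object satisfies both conditions.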
I have an ES index where one of my mappings stores a simple array of named entities pre-set at the point of ingestion.
I'm trying to search my index using a given array of entities, to return documents that contain many of the same entities.
Some code for illustration...
GET /test_data/_search
{
"query": {
"match": {
"entities": ['Trump', 'CNN', 'Oklahoma', 'Tiktok', 'Tulsa']
}
}
}
However, this returns a parse exception -- What would be the best method to search fields containing arrays using another array?
Thanks
If you're looking for exact matches then change match to terms -- this functions as an OR query:
GET /test_data/_search
{
"query": {
"terms": {
"entities": [
"Trump",
"CNN",
"Oklahoma",
"Tiktok",
"Tulsa"
]
}
}
}
otherwise use a bool-should array of match queries:
GET /test_data/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"entities": "Trump"
}
},
{
"match": {
"entities": "CNN"
}
},
{
"match": {
"entities": "Oklahoma"
}
},
...
]
}
}
}
You can define how many of them should match with the minimum_should_match param.
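As a rough sketch of that last point, the bool-should body can be generated from the list of entities. The function name entities_query is hypothetical, and the field name entities is taken from the question.

```python
def entities_query(entities, minimum_should_match=1):
    """Build a bool/should body with one match clause per entity.

    minimum_should_match controls how many of the clauses must match
    for a document to be returned.
    """
    return {
        "query": {
            "bool": {
                "should": [{"match": {"entities": e}} for e in entities],
                "minimum_should_match": minimum_should_match,
            }
        }
    }


# Require at least 3 of the 5 entities to be present.
body = entities_query(
    ["Trump", "CNN", "Oklahoma", "Tiktok", "Tulsa"],
    minimum_should_match=3,
)
```

The resulting dictionary can be passed as the request body to whichever client you use.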
I have an index mapping like this:
{
"listings": {
"mappings": {
"listing": {
"properties": {
"auctionOn": {
"type": "date"
},
"inspections": {
"type": "nested",
"properties": {
"endsOn": {
"type": "date"
},
"startsOn": {
"type": "date"
}
}
},
// more fields.. snipped for brevity
}
}
}
}
}
and I would like to perform the following search (it needs to be a bool filter; no scoring is required):
return documents where any of the inspections.startsOn values matches any of the dates provided (if they are provided)
OR
return documents where auctionOn matches the date provided (if it's provided)
The user can also specify to search a) inspections only or b) auctions only. If neither is specified, either of the dates needs to match.
So in other words, possible searches:
Search where there are any inspections/auctions
Search where there are any inspections
Search where there are any auctions
Search where there are any inspections/auctions on the dates provided
Search where there are any inspections on the dates provided
Search where there are any auctions on the dates provided
Now, I'm already in a bool query filter:
{
"query":
{
"bool":
{
"filter":[{"terms":{"location.suburb":["Camden"]}}]
}
}
}
and I need this new filter to be separate. So this is like a nested OR filter within the main bool filter?
So if provided "Suburb = Camden, Dates = ['2018-11-01','2018-11-02']"
then it should return documents where the suburb = Camden and either the inspections or auction date includes one of the dates provided.
I'm kinda stumped on how to do it, so any help would be much appreciated!
There will be a lot of bool query combinations for the cases you mentioned in the question. Taking the example you mention, i.e.
So if provided "Suburb = Camden, Dates = ['2018-11-01','2018-11-02']"
then it should return documents where the suburb = Camden and either
the inspections or auction date includes one of the dates provided.
Assuming your location filter is working as expected, the additions to the query for the dates part of that example will be:
{
"query": {
"bool": {
"filter": [
{
"terms": {
"location.suburb": [
"Camden"
]
}
},
{
"bool": {
"should": [
{
"terms": {
"auctionOn": [
"2018-11-01",
"2018-11-02"
]
}
},
{
"nested": {
"path": "inspections",
"query": {
"bool": {
"should": [
{
"terms": {
"inspections.startsOn": [
"2018-11-01",
"2018-11-02"
]
}
},
{
"terms": {
"inspections.endsOn": [
"2018-11-01",
"2018-11-02"
]
}
}
]
}
}
}
}
]
}
}
]
}
}
}
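Since the question lists six combinations, one way to avoid writing each body by hand is to generate the filter. The sketch below is only one possible approach: the function names and the inspections/auctions flags are hypothetical, and it assumes that "any inspections/auctions" (no dates given) can be expressed with an exists filter. It checks only inspections.startsOn, as in the question; add an endsOn clause if you need it, as in the query above.

```python
def date_clause(field, dates):
    """terms filter when dates are given, otherwise an exists filter."""
    if dates:
        return {"terms": {field: list(dates)}}
    return {"exists": {"field": field}}


def listing_filter(suburbs, dates=None, inspections=True, auctions=True):
    """Build the bool filter for the suburb plus the optional date part."""
    should = []
    if auctions:
        should.append(date_clause("auctionOn", dates))
    if inspections:
        # inspections is a nested field, so its clause is wrapped in "nested".
        should.append(
            {
                "nested": {
                    "path": "inspections",
                    "query": date_clause("inspections.startsOn", dates),
                }
            }
        )
    filters = [{"terms": {"location.suburb": list(suburbs)}}]
    if should:
        # At least one of the date clauses has to match.
        filters.append({"bool": {"should": should, "minimum_should_match": 1}})
    return {"query": {"bool": {"filter": filters}}}


body = listing_filter(["Camden"], dates=["2018-11-01", "2018-11-02"])
auctions_only = listing_filter(["Camden"], inspections=False)
```

Calling listing_filter with dates=None and both flags set covers "any inspections/auctions"; switching one flag off covers the "inspections only" and "auctions only" cases.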
I'm using Elasticsearch with the Python library and I have a problem with search queries when the object becomes a little bit complex. I have objects built like this in my index:
{
"id" : 120,
"name": "bob",
"shared_status": {
"post_id": 123456789,
"text": "This is a sample",
"urls" : [
{
"url": "http://test.1.com",
"displayed_url": "test.1.com"
},
{
"url": "http://blabla.com",
"displayed_url": "blabla.com"
}
]
}
}
Now I want a query that returns this document only if one of the displayed URLs contains the substring "test" and there is a field "text" in the main document. So I wrote this query:
{
"query": {
"bool": {
"must": [
{"exists": {"field": "text"}}
]
}
}
}
But I don't know what query to add for the part "one of the displayed URLs contains the substring test".
Is that possible? How does the iteration over the list work?
If you didn't define an explicit mapping for your schema, Elasticsearch creates a default mapping based on the input data:
urls will be of type object
displayed_url will be of type string and will use the standard analyzer
As you don't need any association between url and displayed_url, the current schema will work fine.
You can use a match query for a full-text match. Since both fields sit inside shared_status in your document, refer to them by their full path:
GET _search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "shared_status.text"
}
},
{
"match": {
"shared_status.urls.displayed_url": "test"
}
}
]
}
}
}
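On the question of iteration: an array is treated as a multi-valued field, so the clause matches if any one of the displayed URLs matches. Note also that match works on whole tokens. If "test" can appear inside a longer token, a wildcard query is one option. The sketch below is only an illustration: url_query is a hypothetical helper, and the full field paths are an assumption based on the sample document, where both fields sit under shared_status.

```python
def url_query(fragment, substring=False):
    """Build the search body for 'has text' AND 'a displayed URL matches'.

    substring=False uses match (whole-token match on the analyzed field).
    substring=True uses a wildcard query; leading wildcards can be slow,
    and the fragment should be lowercase for a lowercased, analyzed field.
    """
    field = "shared_status.urls.displayed_url"
    if substring:
        url_clause = {"wildcard": {field: f"*{fragment}*"}}
    else:
        url_clause = {"match": {field: fragment}}
    return {
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": "shared_status.text"}},
                    url_clause,
                ]
            }
        }
    }


token_body = url_query("test")
substring_body = url_query("test", substring=True)
```

Either dictionary can be passed as the request body through the Python client you are already using.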
I have an index with documents like this:
{
"id":2,
"name":"test home",
"city_id":"31",
"county_id":"1",
"zip_code":"123",
"residencePlans":[
{
"id" : 1,
"unit_price_from":480240,
"bathrooms_count":3,
"interior_area_sqrft":23,
"floor_range_hight":5,
"bedrooms_count":5,
"elevator_type_id":4,
"price_psqft":3756
},
{
"id" : 2,
"unit_price_from":123456,
"bathrooms_count":1,
"interior_area_sqrft":12,
"floor_range_hight":4,
"bedrooms_count":2,
"elevator_type_id":3,
"price_psqft":1234
}
]
}
I then apply some filters. Some of them apply to the top-level object and some to the nested objects.
I need to get back only the residencePlans that match the filters applied to them. E.g. a filter on residencePlans.bathrooms_count >= 3 should return only the residence plan with id = 1 and not 2:
{
"id": [2],
"residencePlans.id": [1]
}
I mapped residencePlans as nested, but it doesn't help.
Check out the documentation here: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-query.html
And here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html
Something like this should do it
{
"query": {
"bool": {
"must": [
{ "match": { "id": 2 }},
{
"nested": {
"path": "residencePlans",
"query": {
"bool": {
"must": [
{ "range": { "residencePlans.bathrooms_count": { "gte": 3 }}}
]
}
},
"inner_hits": {}
}
}
]
}
}
}
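With inner_hits, each returned parent hit carries the nested objects that matched, keyed by the nested path. A minimal Python sketch of reading them follows. The sample_response below is hand-written to show the shape only and is not real output from a cluster; matching_plans is a hypothetical helper.

```python
# Hand-written sample shaped like a search response with inner_hits;
# trimmed to the keys used below, not real cluster output.
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"id": 2, "name": "test home"},
                "inner_hits": {
                    "residencePlans": {
                        "hits": {
                            "hits": [
                                {"_source": {"id": 1, "bathrooms_count": 3}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}


def matching_plans(response, path="residencePlans"):
    """Map each parent id to the ids of its nested objects that matched."""
    result = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        plans = inner.get("hits", {}).get("hits", [])
        result[hit["_source"]["id"]] = [p["_source"]["id"] for p in plans]
    return result


plans_by_residence = matching_plans(sample_response)
```

For the sample above this yields residence 2 with plan 1, which corresponds to the result asked for in the question.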
I've revised my answer to take into account the particulars of filtering your top level document and your nested documents. Please let me know if it works for you!
I have an object mapping that uses nested objects (props in our example) in a tag-like fashion.
Each tag can belong to a client/user, and we want to allow our users to run query_string style searches against props.name.
The issue is that when we run our query against an object with multiple props, the object is returned if any one of the props matches the filter, even when the others don't. We want the opposite: if one prop fails the filter, don't return the object, rather than returning it as soon as one prop matches.
I have posted a comprehensive example here: https://gist.github.com/d2kagw/1c9d4ef486b7a2450d95
Thanks in advance.
I believe what you need here is a flattened list of values, like an array of values. The major difference between an array and nested objects is that the latter "knows" which value of a nested property corresponds to which value of another property in the same nested object. An array of values, on the other hand, flattens the values of a given property, and you lose the "association" between a client_id and a name. That is, with arrays you have props.client_id = [null, 2] and props.name = ["petlover", "premiumshopper"].
With your nested filter you want to match that string against all values of props.name, meaning ALL nested props.name values of one parent doc need to match. This doesn't happen with nested objects, because the nested documents are separate and are queried separately. If at least one nested document matches, the parent is considered a match.
In other words, a query like "query": "props.name:(carlover NOT petlover)" basically needs to run against a flattened list of values, just like arrays. You need that query run against ["carlover", "petlover"].
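The difference can be modelled in a few lines of Python. This is a simplified illustration, not Elasticsearch code: the function names are made up, and the props data is a hypothetical example rather than the data from the gist.

```python
def name_filter(names, required, forbidden):
    """True if all required names are present and no forbidden one is."""
    names = set(names)
    return set(required) <= names and not (set(forbidden) & names)


def nested_match(props, required, forbidden):
    """Nested semantics: each prop is tested alone; one hit is enough."""
    return any(name_filter([p["name"]], required, forbidden) for p in props)


def flattened_match(props, required, forbidden):
    """Flattened semantics: test the pooled list of names on the parent."""
    return name_filter([p["name"] for p in props], required, forbidden)


# Hypothetical parent document with two props.
props = [
    {"client_id": 1, "name": "carlover"},
    {"client_id": None, "name": "petlover"},
]

# Model of props.name:(carlover NOT petlover)
nested_result = nested_match(props, ["carlover"], ["petlover"])
flattened_result = flattened_match(props, ["carlover"], ["petlover"])
```

Here nested_result is true because the carlover prop passes on its own, while flattened_result is false because petlover is also present on the parent.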
My suggestion is to set "include_in_parent": true on your nested documents (meaning the parent keeps a flattened, array-like list of values) and to change the queries a bit:
for the query_string part, use the flattened properties, so that the query is matched against the combined list of elements, not element by element.
for the match (or term, see below) and missing parts, use the nested properties, because you can have nulls in there. A missing filter on an array matches only if the whole array is missing, not a single value in it, so the flattened approach used for the query_string part cannot be used here.
optionally, for the client_id match I would use term, since the value is an integer rather than a string and is not_analyzed by default.
That being said, these are the changes:
{
"mappings" : {
...
"props": {
"type": "nested",
"include_in_parent": true,
...
should (and does) return zero results
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "props.name:((carlover AND premiumshopper) NOT petlover)" }
}
},
{
"nested": {
"path": "props",
"filter": {
"or": [ { "query": { "match": { "props.client_id": 1 } } }, { "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 1
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{"query": {"query_string": { "query": "props.name:(carlover NOT petlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "match": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 2
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{ "query": {"query_string": { "query": "props.name:(* NOT carlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "term": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } }
]
}
}
}
]
}
}
}
}
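The queries above use the filtered query and the and, or and missing filters, which were removed in Elasticsearch 5.0. Assuming a cluster on 5.x or later, a rough equivalent can be written with bool, where missing becomes must_not + exists. The sketch below is untested against the index from the question, and props_query is a hypothetical helper name.

```python
def props_query(query_string, client_id):
    """bool-based equivalent of the filtered/and/or/missing queries above."""
    # Nested part: client_id equals the given value OR client_id is missing.
    client_or_missing = {
        "bool": {
            "should": [
                {"term": {"props.client_id": client_id}},
                {"bool": {"must_not": [{"exists": {"field": "props.client_id"}}]}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "query": {
            "bool": {
                "filter": [
                    # Runs against the flattened values kept in the parent.
                    {"query_string": {"query": query_string}},
                    {"nested": {"path": "props", "query": client_or_missing}},
                ]
            }
        }
    }


body = props_query("props.name:(carlover NOT petlover)", client_id=1)
```

The query_string clause still relies on "include_in_parent": true in the mapping, as described above.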