Source filtering for first of nested objects - elasticsearch

I'm using source filtering on a query to just return a small set of information and within that information I want the first photo url of an object:
"query" : { ... },
"_source" : ["name", "photos.thumbnail_url"]
Of course this returns the thumbnail_url for all of the photos stored against each item.
Is there any way to get just the first nested photo's thumbnail_url using source filtering?
(I appreciate that one way to do it would be to index the first photo separately and just source filter to include that separate field but I'm asking whether it is possible without a separate field.)
(I've guessed a few approaches without luck: photos.0.thumbnail_url, photos[0].thumbnail_url, photos.first.thumbnail_url)

Just an attempt, as I don't know your mapping, with inner_hits:
{
"size": 5,
"_source": ["name"],
"query": {
"filtered": {
"query": {
"match": {
"name": "whatever"
}
},
"filter": {
"nested": {
"path": "photos",
"query": {
"match_all": {}
},
"inner_hits": {
"size": 1,
"_source": ["thumbnail_url"]
}
}
}
}
}
}
or use a script field:
"script_fields": {
"FIELD": {
"script": "urls=_source.photos.thumbnail_url;return urls[0]"
}
}
and I don't think it's possible to do this with source filtering.

Related

Nested Fields, Wildcard Queries and Aggregations in Elasticsearch

I have an index that collects web redirects data for various sites. I am using a nested field to collect the data as shown in the mapping below:
"chain": {
"type": "nested",
"properties": {
"url.position": {
"type": "long"
},
"url.full": {
"type": "text"
},
"url.domain": {
"type": "keyword"
},
"url.path": {
"type": "keyword"
},
"url.query": {
"type": "text"
}
}
}
As you can imagine, each document contains an array of url chains, the size of the array being equal to number of web redirects. I want to get aggregations based on wildcard/regexp matches to url.query field. Here is a sample query:
GET push_url_chain/_search
{
"query": {
"nested": {
"path": "chain",
"query": {
"regexp": {
"chain.url.query": "aff_c.*"
}
}
}
},
"size": 0,
"aggs": {
"dataFields": {
"nested": {
"path": "chain"
},
"aggs": {
"offers": {
"terms": {
"field": "chain.url.domain",
"size": 30
}
}
}
}
}
}
The above query does produce aggregated results but not the way I want.
I want to see chain.url.domain aggregations for the urls that contain the aff_c.* phrase. Right now it is looking at all the urls in the chain and then aggregating the buckets by doc_count regardless of whether that url/domain has the particular phrase. I hope I have been able to explain this clearly. How do I get my results to show bucket aggregations that contain domains that have aff_c.* phrase match to the query field of the url.
I would also like to know how I can use = or / in my wildcard or regexp queries. It is not producing any results if I use the above symbols in my queries.
Tha
Nested query returns all documents where a nested document matches the condition, you get matched nested docs only in inner_hits.
Aggregation is applied on top of these documents, so all domains are coming in terms
You need to use nested aggregation to gets only matching terms.
{
"size": 0,
"aggs": {
"Name": {
"nested": {
"path": "chain"
},
"aggs": {
"matched_doc": {
"filter": { --> filter for url
"match_phrase_prefix": {
"chain.url.query": "abc"
}
},
"aggs": {
"domain": {
"terms": {
"field": "chain.url.domain", -- terms for matched url
"size": 10
}
}
}
}
}
}
}
}
You can use match_phrase_prefix instead of regex. It has better performance.
Standard analyzer while generating tokens removes "/","=". So if you want to use regex or wildcard and look for these , you need to use keyword field not text field.

How to count number of objects in a nested field in elastic search?

How to count number of objects in a nested filed in elastic search?
Sample mapping :
"base_keywords": {
"type": "nested",
"properties": {
"base_key": {
"type": "text"
},
"category": {
"type": "text"
},
"created_at": {
"type": "date"
},
"date": {
"type": "date"
},
"rank": {
"type": "integer"
}
}
}
I would like to count number of objects in nested filed 'base_keywords'.
You would need to do this with inline script. This is what worked for me: (Using ES 6.x):
GET your-indices/_search
{
"aggs": {
"whatever": {
"sum": {
"script": {
"inline": "params._source.base_keywords.size()"
}
}
}
}
}
Aggs are normally good for counting and grouping, for nested documents you can use nested aggs:
"aggs": {
"MyAggregation1": {
"terms": {
"field": "FieldA",
"size": 0
},
"aggs": {
"BaseKeyWords": {
"nested": { "path": "base_keywords" },
"aggs": {
"BaseKeys": {
"terms": {
"field": "base_keywords.base_key.keyword",
"size": 0
}
}
}
}
}
}
}
You don't specify what you want to count, but aggs are quite flexible for grouping and counting data.
The "doc_count" and "key" behave similar to an sql group by + count()
Updated (This assumes you have a .keyword field create the "keys" values, since a property of type "text" can't be aggregated or counted:
{
"aggs": {
"MyKeywords1Agg": {
"nested": { "path": "keywords1" },
"aggs": {
"NestedKeywords": {
"terms": {
"field": "keywords1.keys.keyword",
"size": 0
}
}
}
}
}
}
For simply counting the number of nested keys you could simply do this:
{
"aggs": {
"MyKeywords1Agg": {
"nested": { "path": "keywords1" }
}
}
}
If you want to get some grouping on the field values on the "main" document or the nested documents, you will have to extend your mapping / data model to include terms that are aggregatable, which includes most data types in elasticsearch except "text", ex.: dates, numbers, geolocations, keywords.
Edit:
Example with aggregating on a unique identifier for each top level document, assuming you have a property on it called "WordMappingId" of type integer
{
"aggs": {
"word_maping_agg": {
"terms": {
"field": "WordMappingId",
"size": 0,
"missing": -1
},
"aggs": {
"Keywords1Agg": null,
"nested": { "path": "keywords1" }
}
}
}
}
If you don't add any properties to the "word_maping" document on the top level there is no way to do an aggregation for each unique document. The builtin _id field is by default not aggregateable, and I suggest you include a unique identifier from the source data on the top level to aggregate on.
Note: the "missing" parameter will put all documents that don't have the WordMappingId property set in a bucked with the supplied value, this makes sure you're not missing any documents in the search results.
Aggs can support a behaviour similar to a group by in SQL, but you need something to actually group it by, and according to the mapping you supplied there are no such fields currently in your index.
I was trying to do similar to understand production data distribution
The following query helped me find top 5
{
"query": {
"match_all": {}
},
"aggs": {
"n_base_keywords": {
"nested": { "path": "base_keywords" },
"aggs": {
"top_count": { "terms": { "field": "_id", "size" : 5 } }
}
}
}
}

Elasticsearch - Aggregations on part of bool query

Say I have this bool query:
"bool" : {
"should" : [
{ "term" : { "FirstName" : "Sandra" } },
{ "term" : { "LastName" : "Jones" } }
],
"minimum_should_match" : 1
}
meaning I want to match all the people with first name Sandra OR last name Jones.
Now, is there any way that I can get perform an aggregation on all the documents that matched the first term only?
For example, I want to get all of the unique values of "Prizes" that anybody named Sandra has. Normally I'd just do:
"query": {
"match": {
"FirstName": "Sandra"
}
},
"aggs": {
"Prizes": {
"terms": {
"field": "Prizes"
}
}
}
Is there any way to combine the two so I only have to perform a single query which returns all of the people with first name Sandra or last name Jones, AND an aggregation only on the people with first name Sandra?
Thanks alot!
Use post_filter.
Please refer the following query. Post_filter will make sure that your bool should clause don't effect your aggregation scope.
Aggregations are filtered based on main query as well, but they are unaffected by post_filter. Please refer to the link
{
"from": 0,
"size": 20,
"aggs": {
"filtered_lastname": {
"filter": {
"query": {
"match": {
"FirstName": "sandra"
}
}
},
"aggs": {
"prizes": {
"terms": {
"field": "Prizes",
"size": 10
}
}
}
}
},
"post_filter": {
"bool": {
"should": [{
"term": {
"FirstName": "Sandra"
}
}, {
"term": {
"LastName": "Jones"
}
}],
"minimum_should_match": 1
}
}
}
Running a filter inside the aggs before aggregating on prizes can help you achieve your desired usecase.
Thanks
Hope this helps

How to display "ALL" the nested documents in an object in separate rows from elasticsearch?

I have a nested object in the following form:
{
"name": "Multi G. Enre",
"books": [
{
"name": "Guns and lasers",
"genre": "scifi",
"publisher": "orbit"
},
{
"name": "Dead in the night",
"genre": "thriller",
"publisher": "penguin"
}
]
}
I tried the following JSON query for the above document:
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "books",
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"books.publisher": "penguin"
}
},
{
"term": {
"books.genre": "thriller"
}
}
]
}
}
}
}
}
}
}
}
So,I would like to see the second nested document i.e. "Dead in the night" as the result but, for anything I search only the first document i.e. "Guns and lasers" is displayed in the table in elasticsearch head plugin.
So, is there any way I can display the nested documents separately based on the search query and not just the first document?
I'm new to elasticsearch,so would appreciate any type of responses. THANKYOU!
You need to use inner_hits in your query.
Moreover, if you want to only retrieve the matching nested document and nothing else, you can add "_source":["books"] to your query and only the matching nested books will be returned, nothing else.
UPDATE
Sorry, I misunderstood your comment. You can add "_source": false and the top-level document will not be returned. Only the nested matching document.

how to make a query on a field I have not defined a mapping for

I have a field current_country that I am adding to brands, and which has not been defined in my elasticsearch mapping.
I would like to do a filtered query on this, since it is not defined I suppose it is not analyzed and a term query should work.
This is the query I am doing
{
"index": "products",
"type": "brand",
"body": {
"from": 0,
"size": 100,
"sort": [
{
"n_name": "asc"
}
],
"query": {
"filtered": {
"query": {
"function_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"current_country": "DK"
}
}
]
}
}
}
}
}
}
}
}
which returns no documents from the index.
I run the following query to check if current country exists
{
"index": "products",
"type": "brand",
"body": {
"from": 0,
"size": 100,
"sort": [
{
"n_name": "asc"
}
],
"query": {
"filtered": {
"query": {
"function_score": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "current_country"
}
}
]
}
}
}
}
}
}
}
}
which returns a total of 693 documents.
here is an example document from the index, returned when I ran the query above.
{
"_index": "products",
"_type": "brand",
"_id": "195da951241478LuxoLivingbrand",
"_score": null,
"_source": {
"categories": [
"Bordlamper og designer bordlamper der giver liv og lys"
],
"image": "http://www.fotoagent.dk/single_picture/11385/138/mega/and_tradition_flowerpot_bordlampe_lilla.jpg",
"top_price": 1695,
"low_price": 1695,
"n_name": "&Tradition",
"name": "&Tradition",
"current_country": "DK",
"current_currency": "DKK"
}
}
How can I query against current_country (preferably a filtered query).
If you do not define any mapping for a field, elasticsearch tries to detect the field as string/date/numeric. If it detects the field as string then it will use the default analyzer (standard analyzer) to analyze your input. Since standard analyzer uses lowercase token filter your input string is indexed as "dk". As term filters does not analyze the input, "DK" won't match "dk".
It can be solved by various means.
(hack) You can lowercase your input filter term. this won't work for phrases.
(better) define a mapping for your input. You can dynamically change mapping/ add new mapping easily

Resources