Elasticsearch: Aggregate results of query

I have an Elasticsearch index containing products, which I can query for different search terms. Every product contains a field shop_id that references the shop it belongs to. Now I'm trying to display a list of all shops that hold products matching my query (to filter by shop).
From what I've read in similar questions, I need an aggregation. I ended up building this query:
curl -XGET 'http://localhost:9200/searchindex/_search?search_type=count&pretty=true' -d '{
  "query" : {
    "match" : {
      "_all" : "playstation"
    }
  },
  "aggregations": {
    "shops_count": {
      "terms": {
        "field": "shop_id"
      }
    }
  }
}'
This should search for "playstation" and aggregate the results by shop_id. Unfortunately, it only returns:
Data too large, data would be larger than limit of [8534150348] bytes]
I get the same error even with queries that return only 2 results.
The index contains more than 90,000,000 products.

I would suggest that's a job for a filter aggregation.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
Note: I don't know the product mapping of your index, so if the filter below doesn't work, try another filter from http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
{
  "aggs" : {
    "in_stock_playstation" : {
      "filter" : { "term" : { "change_me_to_field_for_product" : "playstation" } },
      "aggs" : {
        "shop_count" : { "terms" : { "field" : "shop_id" } }
      }
    }
  }
}
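A minimal Python sketch of the same request, assuming you build the body in code before sending it: it returns aggregation results only via "size": 0 (newer Elasticsearch versions use this instead of search_type=count, which was deprecated in 2.0) and reads the shop buckets back out of the response. The product field name passed in is a placeholder, just like change_me_to_field_for_product above, and max_shops is a hypothetical parameter exposing the terms aggregation's "size", which defaults to 10 buckets.

```python
import json


def shop_facet_body(search_text, product_field, max_shops=10):
    """Search body that returns only aggregation results, no hits."""
    return {
        "size": 0,  # skip the hits; newer versions use this instead of search_type=count
        "aggs": {
            "in_stock_playstation": {
                # product_field is a placeholder for the field holding the product name
                "filter": {"term": {product_field: search_text}},
                "aggs": {
                    # terms returns 10 buckets unless "size" is raised
                    "shop_count": {"terms": {"field": "shop_id", "size": max_shops}}
                },
            }
        },
    }


def shops_from_response(response):
    """Map each shop_id bucket key to the number of matching products."""
    agg = response["aggregations"]["in_stock_playstation"]["shop_count"]
    return {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}


if __name__ == "__main__":
    # "name" is a hypothetical product field; adjust it to your mapping
    print(json.dumps(shop_facet_body("playstation", "name"), indent=2))
```

The filter bucket in the response carries its own doc_count plus the nested shop_count buckets, which is why the parser walks through in_stock_playstation first.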

Related

How to query a specific list of indexes in Elasticsearch

I am creating date-based Elasticsearch indexes like logs-2017-06-10, logs-2018-07-10 and logs-2019-06-11; the date suffix can be any valid date.
How can I limit my search query to the indexes for specific days?
For example, if I want to search between 2018-06-09 and 2018-06-11, then only the following indexes should be searched:
logs-2018-06-09, logs-2018-06-10 and logs-2018-06-11
I tried the wildcard *, but it doesn't help here:
logs-2018-06-* searches the indexes logs-2018-06-01 through logs-2018-06-30, which is not what I want.
How can I limit it to only
logs-2018-06-09, logs-2018-06-10 and logs-2018-06-11?
GET /_search
{
  "query": {
    "indices" : {
      "indices" : ["index1", "index2"],
      "query" : { "term" : { "tag" : "wow" } },
      "no_match_query" : { "term" : { "tag" : "kow" } }
    }
  }
}
From: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-indices-query.html
I did not find any way to build a dynamic list of indices to run my search query against.
As an alternative, I now run my search query against all the indices and restrict the results with a date range query.
GET logs-2019-*/_search
{
  "query": {
    "range" : {
      "timestamp" : {
        "time_zone": "+01:00",
        "gte": "2015-01-01 00:00:00",
        "lte": "now"
      }
    }
  }
}
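Elasticsearch also accepts a comma-separated list of index names in the request path (for example GET logs-2018-06-09,logs-2018-06-10/_search), so the exact list of daily indexes can be generated on the client side. A minimal sketch, assuming the logs-YYYY-MM-DD naming from the question; daily_indices and search_path are hypothetical helper names. If some days may have no index, adding ignore_unavailable=true to the request keeps a missing name from failing the whole search.

```python
from datetime import date, timedelta


def daily_indices(start, end, prefix="logs-"):
    """Return the daily index names from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    # date.isoformat() yields YYYY-MM-DD, matching the index suffix
    return [f"{prefix}{(start + timedelta(days=i)).isoformat()}" for i in range(days + 1)]


def search_path(start, end, prefix="logs-"):
    """Build a multi-index search path such as 'logs-a,logs-b/_search'."""
    return ",".join(daily_indices(start, end, prefix)) + "/_search"


if __name__ == "__main__":
    print(search_path(date(2018, 6, 9), date(2018, 6, 11)))
```

For the range in the question this produces logs-2018-06-09,logs-2018-06-10,logs-2018-06-11/_search. Very long ranges can make the URL unwieldy, in which case the wildcard-plus-range-query approach above is the simpler fallback.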

How to limit number of documents returned for each term in Elasticsearch terms query?

I'm trying to get documents matching a specified list of terms, like this:
GET /_search
{
  "query" : {
    "terms" : {
      "md5" : ["file_1", "file_2"]
    }
  }
}
Is it possible to limit the Elasticsearch results to just one document for each term? As a result, I should have one document for "file_1", one document for "file_2", and so on.
What I'm trying to accomplish is to get the Elasticsearch _id of the most recent document for each term in the list. Can I do it this way, or do I need a separate request for each term?
There are two ways to get N documents for each term.
One way is to perform one request for each term.
The other way is to use the top hits aggregation (see the top_hits aggregation documentation).
GET /_search
{
  "query" : {
    "terms" : {
      "md5" : ["file_1", "file_2"]
    }
  },
  "aggs": {
    "top-docs": {
      "terms": {
        "field": "md5",
        "size": 3
      },
      "aggs": {
        "top_tag_hits": {
          "top_hits": {
            "size" : 1
          }
        }
      }
    }
  }
}
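To get the most recent document per term, the top_hits sub-aggregation also needs a sort; without one, it returns the best-scoring hit. A minimal Python sketch that builds such a body and extracts one _id per term, assuming a hypothetical date field called timestamp marks recency. The terms "size" is set to the number of requested values so no bucket is dropped (the default is 10), and "_source": false is used because only the _id is needed.

```python
def latest_per_term_body(values, field="md5", sort_field="timestamp"):
    """One bucket per value, each holding only its most recent document."""
    return {
        "size": 0,  # only the aggregation is needed, not the top-level hits
        "query": {"terms": {field: values}},
        "aggs": {
            "top-docs": {
                # one bucket per requested value; the default of 10 would truncate longer lists
                "terms": {"field": field, "size": len(values)},
                "aggs": {
                    "top_tag_hits": {
                        "top_hits": {
                            "size": 1,
                            # sort_field is a hypothetical date field marking recency
                            "sort": [{sort_field: {"order": "desc"}}],
                            "_source": False,  # only the _id is needed
                        }
                    }
                },
            }
        },
    }


def latest_ids(response):
    """Map each term to the _id of its most recent document."""
    buckets = response["aggregations"]["top-docs"]["buckets"]
    return {b["key"]: b["top_tag_hits"]["hits"]["hits"][0]["_id"] for b in buckets}
```

This keeps everything in a single request; the trade-off against one request per term is that the terms aggregation needs the md5 field to be aggregatable (a keyword or not_analyzed field).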

Elasticsearch max on string fields

In SQL, it is possible to use MAX() on string fields to get a distinct value (assuming the GROUP BY is correct).
This is not possible in Elasticsearch, since the max aggregation only works on numeric fields. However, I want to retrieve the values of some string fields after my aggregations so that I can display them.
For example, assuming a generic books structure:
{
  "aggs" : {
    "group_by_author" : { "terms" : { "field" : "author"},
      "aggs" : {
        "books_published" : { "value_count" : { "field" : "name"}},
        "distinct_title" : { "max" : {"field" : "some_relevant_field_name"}}
      }
    }
  }
}
Here I cannot perform the max on some_relevant_field_name since it is a string. Is there an alternative way to do this apart from more aggregations?
If you want to find the distinct book titles for each author, you could use a "terms" aggregation for "distinct_title" instead:
{
  "aggs":{
    "group_by_author":{
      "terms":{
        "field":"author"
      },
      "aggs":{
        "books_published":{
          "value_count":{
            "field":"name"
          }
        },
        "distinct_title":{
          "terms":{
            "field":"some_relevant_field_name"
          }
        }
      }
    }
  }
}
It should create a bucket for each book title within each author's bucket, as described in the terms aggregation documentation.
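Reading the nested buckets back gives the "display these values" result the question asks for. A minimal Python sketch, assuming the response shape of the nested terms aggregation above; titles_by_author is a hypothetical helper name. It also assumes some_relevant_field_name is indexed as a keyword (not_analyzed in older versions); on an analyzed text field the buckets would hold individual tokens rather than whole titles, and each terms level returns at most 10 buckets unless its "size" is raised.

```python
def titles_by_author(response):
    """Flatten nested terms buckets into {author: [title, ...]}."""
    result = {}
    for author in response["aggregations"]["group_by_author"]["buckets"]:
        # each author bucket carries its own distinct_title sub-aggregation
        result[author["key"]] = [
            title["key"] for title in author["distinct_title"]["buckets"]
        ]
    return result
```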

Difference between a simple query containing query_string and a bool query in Elasticsearch

I wrote the following query to fetch records from an Elasticsearch cluster.
{
  "query" : {
    "query_string" : {
      "query" : "One Record"
    }
  },
  "explain" : true
}
However, I later found that the following query produces the same results:
{
  "query" : {
    "bool" : {
      "should" : {
        "query_string" : {
          "query" : "One Record"
        }
      }
    }
  },
  "explain" : true
}
Will both the above queries always produce the same results?
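A bool query merely combines other queries and sums the scores of the clauses that match. With a single should clause and no must or filter clauses, that clause has to match, so the two queries above will always return the same documents with the same scores.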

How to search for a term and match a boolean condition

I've had good success getting results for searches using the syntax below, but I'm having trouble adding a boolean condition.
http://localhost:9200/index_name/type_name/_search?q=test
My documents look like:
{
  "isbn":"9780307414922",
  "name":"Dark of the Night",
  "adult":false
}
Here's my best guess as to how to achieve what I'm trying to do.
{
  "query_string": {
    "default_field": "_all",
    "query": "test"
  },
  "from": 0,
  "size": 20,
  "terms": {
    "adult": true
  }
}
However, this results in "Parse Failure [No parser for element [query_string]]]; }]".
I'm using Elasticsearch 0.20.5.
How can I match documents containing a search term the way "?q=test" does and filter by the document's adult property?
Thanks in advance.
Your adult == true clause has to be part of the query; you can't pass a term clause as a top-level parameter to search.
So you could add it to the query as a query clause, in which case you need to join both query clauses using a bool query, as follows:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "test"
          }
        },
        {
          "term" : {
            "adult" : true
          }
        }
      ]
    }
  },
  "from" : 0,
  "size" : 20
}
'
Really, though, query clauses should be used for:
- full text search
- clauses which affect the relevance score
However, your adult == true clause is not used to change the relevance, and it doesn't involve full text search. It's a yes/no condition; in other words, it is better applied as a filter clause.
This means that you need to wrap your full text query (_all contains test) in a query clause which accepts both a query and a filter, namely the filtered query:
curl -XGET 'http://127.0.0.1:9200/_all/_search?pretty=1' -d '
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "adult" : true
        }
      },
      "query" : {
        "query_string" : {
          "query" : "test"
        }
      }
    }
  },
  "from" : 0,
  "size" : 20
}
'
Filters are usually faster because:
- they don't have to score documents, just include or exclude them
- they can be cached and reused
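The filtered query fits the 0.20.5 version in the question, but it was deprecated in Elasticsearch 2.0 and removed in 5.0. On newer versions the same intent is expressed as a bool query with the full-text part in must and the yes/no condition in filter. A minimal Python sketch that builds that body; adult_search_body is a hypothetical helper name, and the "adult" field comes from the question's documents.

```python
import json


def adult_search_body(text, adult=True, from_=0, size=20):
    """bool query: scored full-text clause plus a non-scoring filter."""
    return {
        "query": {
            "bool": {
                # full-text part: contributes to the relevance score
                "must": {"query_string": {"query": text}},
                # yes/no condition: only includes or excludes, and can be cached
                "filter": {"term": {"adult": adult}},
            }
        },
        "from": from_,
        "size": size,
    }


if __name__ == "__main__":
    print(json.dumps(adult_search_body("test"), indent=2))
```

json.dumps turns the Python True into the JSON literal true, so the printed body can be sent as-is to the _search endpoint.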
