Elastic Search max on string fields

In SQL, it is possible to use MAX() on string fields to get a single value per group (assuming the GROUP BY is correct).
This is not possible in Elasticsearch, because the max aggregation only works on numeric fields. However, I want to retrieve the values of some string fields after my aggregations so that I can display them.
For example, assuming a generic books structure:
{
"aggs" : {
"group_by_author" : { "terms" : { "field" : "author"},
"aggs" : {
"books_published" : { "sum" : { "field" : "name"}},
"distinct_title" : { "max" : {"field" : "some_relevant_field_name"}}
}
}
}
}
Here I cannot perform the max on some_relevant_field_name because it is a string. Is there an alternative way to do this, apart from adding more aggregations?

If you want to find the distinct book titles for each author, you could try using a "terms" aggregation for "distinct_title" instead of "max":
{
"aggs":{
"group_by_author":{
"terms":{
"field":"author"
},
"aggs":{
"books_published":{
"sum":{
"field":"name"
}
},
"distinct_title":{
"terms":{
"field":"some_relevant_field_name"
}
}
}
}
}
}
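This creates, inside each author bucket, one bucket per distinct value of some_relevant_field_name, as described in the terms aggregation documentation. Note that a terms aggregation returns only the top 10 buckets by default (use "size" to change that), and it needs a keyword field rather than an analyzed text field unless fielddata is enabled.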
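To make the shape of the result concrete, here is a minimal Python sketch. It assumes hypothetical fields "author" and "title" (a keyword field) and made-up sample documents. It builds the request body for the nested terms aggregation (leaving out the sum) and models in memory the per-author title counts the buckets would report; unlike a real terms aggregation, the model does not sort or truncate to the top 10 buckets.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def build_distinct_title_body(group_field: str, title_field: str) -> dict:
    """Request body: one terms bucket per author, with a terms sub-aggregation per title."""
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "group_by_author": {
                "terms": {"field": group_field},
                "aggs": {"distinct_title": {"terms": {"field": title_field}}},
            }
        },
    }


def model_distinct_titles(docs: Iterable[dict], group_field: str, title_field: str) -> Dict[str, Dict[str, int]]:
    """In-memory model of the bucket doc counts: author -> {title: doc_count}.

    A real terms aggregation also sorts buckets by doc_count and keeps only the top `size` (default 10).
    """
    buckets: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        buckets[doc[group_field]][doc[title_field]] += 1
    return {author: dict(titles) for author, titles in buckets.items()}


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    books = [
        {"author": "alice", "title": "Rivers"},
        {"author": "alice", "title": "Rivers"},
        {"author": "alice", "title": "Mountains"},
        {"author": "bob", "title": "Deserts"},
    ]
    print(json.dumps(build_distinct_title_body("author", "title"), indent=2))
    print(model_distinct_titles(books, "author", "title"))
```

Each title bucket plays the role that MAX() played in SQL: if an author has exactly one title bucket, that is the single value to display.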

Related

How to query a specific list of indexes in elastic search

I am creating date-based Elasticsearch indices such as logs-2017-06-10, logs-2018-07-10 and logs-2019-06-11; the date suffix can be any valid date.
How can I limit my search query to the indices of specific days?
For example, if I want to search between 2018-06-09 and 2018-06-11, then only the following indices should be searched by my query:
logs-2018-06-09, logs-2018-06-10 and logs-2018-06-11
I tried the * wildcard, but it does not help here:
logs-2018-06-* searches the indices logs-2018-06-01 to logs-2018-06-30, which is not what I want.
How can I limit it to only
logs-2018-06-09, logs-2018-06-10 and logs-2018-06-11?
GET /_search
{
"query": {
"indices" : {
"indices" : ["index1", "index2"],
"query" : { "term" : { "tag" : "wow" } },
"no_match_query" : { "term" : { "tag" : "kow" } }
}
}
}
From: https://www.elastic.co/guide/en/elasticsearch/reference/5.4/query-dsl-indices-query.html
I did not find a way to build the list of indices for my search query dynamically.
As an alternative, I now run my search query against all indices matching a wildcard pattern and narrow it down with a date range query:
GET logs-2019-*/_search
{
"query": {
"range" : {
"timestamp" : {
"time_zone": "+01:00",
"gte": "2015-01-01 00:00:00",
"lte": "now"
}
}
}
}
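The search API also accepts a comma-separated list of index names in the request path, so one way to target exactly the days in a range is to generate that list on the client. Below is a minimal Python sketch, assuming the logs-YYYY-MM-DD naming from the question. The `ignore_unavailable=true` parameter makes Elasticsearch skip days that have no index instead of returning an error.

```python
from datetime import date, timedelta
from typing import List


def daily_indices(prefix: str, start: date, end: date) -> List[str]:
    """One index name per day from start to end, both inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    return [f"{prefix}{(start + timedelta(days=i)).isoformat()}" for i in range((end - start).days + 1)]


def search_path(prefix: str, start: date, end: date) -> str:
    """Path for a GET request that targets only the listed daily indices."""
    indices = ",".join(daily_indices(prefix, start, end))
    # ignore_unavailable=true: skip days that have no index instead of failing the request
    return f"/{indices}/_search?ignore_unavailable=true"


if __name__ == "__main__":
    print(search_path("logs-", date(2018, 6, 9), date(2018, 6, 11)))
```

For the range in the question this prints `/logs-2018-06-09,logs-2018-06-10,logs-2018-06-11/_search?ignore_unavailable=true`. A range query on timestamp can still be added to the body if the first or last day is only partially needed.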

Elasticsearch tags aggregation with specific keys

I have an array field with tags and a fixed list of the 10 most popular tags (which I got from a previous terms aggregation call).
Can I get document counts for the current search for exactly these keys (the tags in my list)? Like a terms aggregation, but only for specific keys.
Thanks!
Take a look at filtering terms aggregations, especially the include parameter. It would be easier to show you if you provided a specific example of your problem, but here is the example from the docs that should help you figure out how to solve your problem:
{
"aggs" : {
"JapaneseCars" : {
"terms" : {
"field" : "make",
"include" : ["mazda", "honda"]
}
},
"ActiveCarManufacturers" : {
"terms" : {
"field" : "make",
"exclude" : ["rover", "jensen"]
}
}
}
}
You can use the include or exclude parameters inside a terms aggregation to filter the bucket keys:
{
"size": 0,
"aggs": {
"my_agg": {
"terms": {
"field": "agg_field",
"include": ["key1", "key2", "key3"]
}
}
}
}
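A minimal Python sketch of the include approach for the tag case in the question, assuming a hypothetical array field "tags" and made-up documents. Because a terms aggregation returns at most `size` buckets (10 by default), the sketch sets `size` to the number of keys. Keys that match no document do not produce a bucket, since `min_doc_count` defaults to 1, and the in-memory model mirrors that.

```python
from collections import Counter
from typing import Dict, Iterable, List


def tag_count_body(field: str, keys: List[str]) -> dict:
    """terms aggregation restricted to `keys` via include."""
    return {
        "size": 0,
        "aggs": {
            "my_agg": {
                "terms": {
                    "field": field,
                    "include": list(keys),
                    # At most `size` buckets come back (default 10), so leave room for every key.
                    "size": max(len(keys), 1),
                }
            }
        },
    }


def model_tag_counts(docs: Iterable[dict], field: str, keys: List[str]) -> Dict[str, int]:
    """Documents per listed tag; tags with no matching document are left out."""
    wanted = set(keys)
    counts: Counter = Counter()
    for doc in docs:
        # A document is counted once per distinct tag, even if the tag repeats.
        for tag in set(doc.get(field, [])):
            if tag in wanted:
                counts[tag] += 1
    return dict(counts)


if __name__ == "__main__":
    docs = [{"tags": ["a", "b"]}, {"tags": ["a", "c", "a"]}, {"tags": ["d"]}]
    print(tag_count_body("tags", ["a", "c", "z"]))
    print(model_tag_counts(docs, "tags", ["a", "c", "z"]))
```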

ElasticSearch : IN equivalent operator in ElasticSearch

I am trying to find the Elasticsearch query equivalent to IN / NOT IN in SQL.
I know we can use a query_string query with multiple ORs to get the same result, but that ends up with a lot of ORs.
Can anyone share an example?
Similar to what Chris suggested as a comment, the analogous replacement for IN is the terms filter (queries imply scoring, which may improve the returned order).
SELECT * FROM table WHERE id IN (1, 2, 3);
The equivalent Elasticsearch 1.x filter would be:
{
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"id" : [1, 2, 3]
}
}
}
}
}
The equivalent Elasticsearch 2.x+ filter would be:
{
"query" : {
"bool" : {
"filter" : {
"terms" : {
"id" : [1, 2, 3]
}
}
}
}
}
The important takeaway is that the terms filter (and the terms query, for that matter) works on exact matches. It is implicitly an OR operation, similar to IN.
If you wanted to invert it, you could use the not filter, but I would suggest using the slightly more verbose bool/must_not filter (to get in the habit of also using bool/must and bool).
{
"query" : {
"bool" : {
"must_not" : {
"terms" : {
"id" : [1, 2, 3]
}
}
}
}
}
Overall, the bool compound query syntax is one of the most important filters in Elasticsearch, as are the term (singular) and terms filters (plural, as shown).
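A minimal Python sketch that builds the 2.x+ bool/filter and bool/must_not bodies, together with a rough in-memory model of their matching rules: exact matches only, and a multi-valued field matches IN if any of its values is listed. One difference from SQL worth noting: a document that lacks the field matches the must_not version, whereas in SQL a NULL never satisfies NOT IN. The field name "id" and the documents are illustrative.

```python
from typing import Iterable, List


def terms_query(field: str, values: Iterable, negate: bool = False) -> dict:
    """bool query: terms in `filter` for IN, in `must_not` for NOT IN."""
    clause = {"terms": {field: list(values)}}
    return {"query": {"bool": {"must_not" if negate else "filter": [clause]}}}


def matches(doc: dict, field: str, values: Iterable, negate: bool = False) -> bool:
    """Rough model of whether `doc` is returned by terms_query(field, values, negate).

    Values are compared with Python ==; Elasticsearch would first convert them to the field's mapped type.
    """
    raw = doc.get(field)
    if raw is None:
        doc_values: List = []
    elif isinstance(raw, list):
        doc_values = raw
    else:
        doc_values = [raw]
    wanted = set(values)
    hit = any(v in wanted for v in doc_values)  # implicit OR across the listed values
    return not hit if negate else hit


if __name__ == "__main__":
    print(terms_query("id", [1, 2, 3]))
    print(matches({"id": 2}, "id", [1, 2, 3]), matches({}, "id", [1, 2, 3], negate=True))
```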
1. terms
You can use the terms query in Elasticsearch; it acts as IN.
A terms query checks whether the value matches any of the values in the provided array.
2. must_not
must_not can be used as NOT in Elasticsearch.
For example:
GET my_index/my_type/_search
{
"query" : {
"bool" : {
"must":[
{
"terms": {
"id" : ["1234","12345","123456"]
}
},
{
"bool" : {
"must_not" : [
{
"match":{
"id" : "123"
}
}
]
}
}
]
}
}
}
3. exists
If it helps, you can also use the "exists" query to check whether a field exists.
For example,
to check that the field exists:
"exists" : {
"field" : "mobileNumber"
}
to check that a field does not exist:
"bool":{
"must_not" : [
{
"exists" : {
"field" : "mobileNumber"
}
}
]
}
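As far as I know, a field counts as missing for the exists query when it is absent, null, an empty array, or an array containing only null, while an empty string still counts as existing. Below is a rough Python model of those rules, plus helpers that build the two clauses above for a given field such as mobileNumber. It does not model mapping settings such as null_value or ignore_above, which also affect the result.

```python
def exists_clause(field: str) -> dict:
    return {"exists": {"field": field}}


def missing_clause(field: str) -> dict:
    """'Field does not exist' expressed as bool/must_not + exists."""
    return {"bool": {"must_not": [exists_clause(field)]}}


def field_exists(doc: dict, field: str) -> bool:
    """Rough model of the exists query for one top-level field."""
    value = doc.get(field)
    if value is None:            # absent or explicit null
        return False
    if isinstance(value, list):  # [] and [None] carry no indexed value
        return any(v is not None for v in value)
    return True                  # includes "" (an empty string counts as existing)


if __name__ == "__main__":
    for doc in ({"mobileNumber": "123"}, {"mobileNumber": None}, {"mobileNumber": []}, {}):
        print(doc, field_exists(doc, "mobileNumber"))
```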
Based on your question, I wrote the query below using a script.
I hope this helps you solve your problem.
SQL query:
select * from tablename where fieldname in ('AA','BB');
Elasticsearch:
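{
"query": {
"bool": {
"must": [{
"script": {
"script": {
"lang": "painless",
"source": "doc['fieldname'].size() > 0 && ['AA', 'BB'].contains(doc['fieldname'].value)"
}
}
}],
"should": [],
"must_not": []
}
}
}
This assumes fieldname is a keyword field, so that doc['fieldname'] has doc values. A script query runs the script for every candidate document, so for a plain IN the terms query is usually the faster choice.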

Elastic Search - Filter value by latest date

In Elasticsearch we have a document with different values. Each value has a period, which tells whether the value is still current and during which period it applies.
Example below:
"": [
{
"memberType": "user",
"period": {
"validFrom": "1964-08-23",
"validTo": "2008-12-31"
}
},
{
"memberType": "admin",
"period": {
"validFrom": "2008-12-31",
"validTo": null
}
}
]
In our query, I want to filter by memberType, but only take the newest member type into account. So if I filter by memberType "user", the document above should not match, because the current memberType is admin.
In the above example, I could use a bool filter on memberType combined with a missing filter on the validTo field.
But if the person is no longer valid at all, both entries have a validTo date, and then I have to look at the newest date.
How can I achieve that? I'm thinking of a nested query or a custom script filter, but I don't know how to express the query.
Thanks in advance
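Provided that the field name of this array is memberDetails and it is mapped as nested, you can use this query for the case where the current entry has no validTo. Inside a nested query the fields are referenced by their full path, and in current Elasticsearch versions the missing filter is expressed as bool/must_not with an exists query:
{
"query": {
"nested": {
"path": "memberDetails",
"query": {
"bool": {
"must": [
{
"term": { "memberDetails.memberType": "user" }
}
],
"must_not": [
{
"exists": { "field": "memberDetails.period.validTo" }
}
]
}
}
}
}
}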
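The query above only covers the case where the current entry has no validTo. For the general rule described in the question (use the entry with the newest period), one option is to apply the rule on the client to the documents returned by a broader nested query. Below is a minimal Python sketch of that rule, assuming the array is named memberDetails as in the answer and the dates are yyyy-mm-dd strings. An open-ended period (validTo null) is treated as the newest; otherwise the latest validTo wins, with validFrom as a tie-breaker. Doing this inside Elasticsearch would need a script or a field precomputed at index time, which the sketch does not attempt.

```python
from datetime import date
from typing import List, Optional


def _period_key(entry: dict):
    period = entry["period"]
    valid_to = period.get("validTo")
    # An open-ended period (validTo is null) sorts after every real date.
    end = date.fromisoformat(valid_to) if valid_to else date.max
    return (end, date.fromisoformat(period["validFrom"]))


def current_entry(entries: List[dict]) -> Optional[dict]:
    """Entry with the newest period, or None for an empty list."""
    return max(entries, key=_period_key) if entries else None


def current_type_is(doc: dict, member_type: str, array_field: str = "memberDetails") -> bool:
    """True if the newest entry in the array has the requested memberType."""
    entry = current_entry(doc.get(array_field, []))
    return entry is not None and entry["memberType"] == member_type


if __name__ == "__main__":
    doc = {"memberDetails": [
        {"memberType": "user", "period": {"validFrom": "1964-08-23", "validTo": "2008-12-31"}},
        {"memberType": "admin", "period": {"validFrom": "2008-12-31", "validTo": None}},
    ]}
    print(current_type_is(doc, "user"), current_type_is(doc, "admin"))
```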

Elasticsearch: Aggregate results of query

I have an Elasticsearch index containing products, which I can query for different search terms. Every product has a field shop_id that references the shop it belongs to. Now I want to display a list of all shops that have products matching my query, so that the results can be filtered by shop.
From what I have read in similar questions, I have to use an aggregation. In the end I built this query:
curl -XGET 'http://localhost:9200/searchindex/_search?search_type=count&pretty=true' -d '{
"query" : {
"match" : {
"_all" : "playstation"
}
},
"aggregations": {
"shops_count": {
"terms": {
"field": "shop_id"
}
}
}
}'
This should search for playstation and aggregate the results by shop_id. Sadly, it only returns:
Data too large, data would be larger than limit of [8534150348] bytes].
I also tried it with queries returning only 2 results.
The index contains more than 90,000,000 products.
I would suggest that this is a job for a filter aggregation:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html
Note: I don't know the product mapping in your index, so if the filter below doesn't work, try another filter from http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
{
"aggs" : {
"in_stock_playstation" : {
"filter" : { "term" : { "change_me_to_field_for_product" : "playstation" } },
"aggs" : {
"shop_count" : { "terms" : { "field" : "shop_id" } }
}
}
}
}
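A minimal Python sketch of the same request body, with a hypothetical "product" field in place of change_me_to_field_for_product and "size": 0 so that only aggregations are returned (in Elasticsearch 2.x and later this replaces search_type=count from the question). The in-memory model shows what the buckets count: the documents that pass the filter, grouped by shop_id. It does not address the memory limit from the error message in the question.

```python
from collections import Counter
from typing import Iterable


def shops_body(product_field: str, product: str, shop_field: str = "shop_id") -> dict:
    """filter aggregation with a terms sub-aggregation on the shop field."""
    return {
        "size": 0,  # no hits, aggregations only
        "aggs": {
            "in_stock_playstation": {
                "filter": {"term": {product_field: product}},
                "aggs": {"shop_count": {"terms": {"field": shop_field}}},
            }
        },
    }


def model_shops(docs: Iterable[dict], product_field: str, product: str, shop_field: str = "shop_id") -> dict:
    """doc_count of the filter bucket and per-shop counts of its terms sub-aggregation."""
    matching = [d for d in docs if d.get(product_field) == product]
    return {"doc_count": len(matching), "shop_count": dict(Counter(d[shop_field] for d in matching))}


if __name__ == "__main__":
    docs = [
        {"product": "playstation", "shop_id": 1},
        {"product": "playstation", "shop_id": 2},
        {"product": "playstation", "shop_id": 1},
        {"product": "xbox", "shop_id": 3},
    ]
    print(shops_body("product", "playstation"))
    print(model_shops(docs, "product", "playstation"))
```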
