Aggregation in Elasticsearch by field value data

I have the data below and I want to aggregate by the status field. I'm not sure how to compare the value of status with Rejected or Success and get the count of results.
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2874,
"max_score": 1,
"hits": [
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.one",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Success"
}
},
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.two",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Success"
}
},
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.three",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Rejected"
}
},
{
"_index": "testfiles",
"_type": "testfiles",
"_id": "testfile.four",
"_score": 1,
"_source": {
"businessDate": 20171013,
"status": "Rejected"
}
}
]
}
}
Can someone help with how to achieve this using an Elasticsearch aggregation?
The expected response is something like:
"aggregations": {
"success_records": 2,
"rejected_records": 2
}

Assuming the status field is of type text, you'll need to update it to a multi-field with a keyword sub-field, which is needed for aggregation. Then query using:
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "statuses": {
      "terms": {
        "field": "status.raw"
      }
    }
  }
}
If status is already a keyword field, change status.raw to status in the above query.
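If the status field currently has no keyword sub-field, a minimal mapping sketch would look like the following (my_index stands for your index name, "testfiles" in the question; the type segment in the URL applies to 6.x and earlier and is dropped on 7.x; existing documents need to be reindexed before the sub-field is populated):
PUT my_index/_mapping/testfiles
{
  "properties": {
    "status": {
      "type": "text",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }
  }
}
If you want the exact named counts from the expected response rather than generic terms buckets, a filters aggregation is one way to get them (again a sketch, assuming the keyword sub-field above):
GET my_index/_search
{
  "size": 0,
  "aggs": {
    "statuses": {
      "filters": {
        "filters": {
          "success_records": { "term": { "status.raw": "Success" } },
          "rejected_records": { "term": { "status.raw": "Rejected" } }
        }
      }
    }
  }
}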

Related

How to sort by date with ElasticSearch

I have an index with a date field as follows:
{
"properties": {
"productCreationDate": {
"format": "YYYY-MM-dd'T'HH:mm:ssXXX",
"type": "date"
},
}
}
When I perform a search like this:
{
"size": 5,
"from": 0,
"sort": [
{
"productCreationDate": {
"order": "desc"
}
}
],
"track_scores": false
}
I get the documents back in insertion order. Note the sort values in the response (Elasticsearch 7.9):
{
"took": 24,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^14",
"_score": null,
"_source": {
"productCreationDate": "2020-08-14T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^28",
"_score": null,
"_source": {
"productCreationDate": "2020-08-28T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^19",
"_score": null,
"_source": {
"productCreationDate": "2020-08-19T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^27",
"_score": null,
"_source": {
"productCreationDate": "2020-08-27T18:21:51+02:00",
},
"sort": [
1577722911000
]
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "product^26",
"_score": null,
"_source": {
"productCreationDate": "2020-08-26T18:21:51+02:00",
},
"sort": [
1577722911000
]
}
]
}
}
What am I missing?
Edit: Thanks to @zaid warsi and @Yeikel I changed the format to yyyy and now I get a new order:
15
26
27
19
28
14
Which is even weirder since I asked for 5 documents.
YYYY is not the year pattern you want in Elasticsearch's built-in date formats: it denotes the week-based year, not the calendar year.
Try changing your date format to yyyy-MM-dd'T'HH:mm:ssXXX and it should work.
Refer to this for the valid built-in date formats, or you might need to define your own in the mapping.
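For reference, a corrected mapping sketch for a new index (the index name here is made up; since the existing documents were parsed with the week-based year, their stored sort values are already wrong, as the identical sort values in the response show, so reindexing into an index with the fixed format is the safest route):
PUT my-index-fixed
{
  "mappings": {
    "properties": {
      "productCreationDate": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ssXXX"
      }
    }
  }
}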

mysql field="value" in elasticsearch

When I search for "google", I want to display only the items that contain exactly that word and nothing else.
How can I search only for items whose value is just the word "google"?
Request body
(Request created in postman)
{
"query": {
"bool": {
"must": [
{
"match": {
"body": "google"
}
}
]
}
}
}
Response body
(Request created in postman)
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.6587735,
"hits": [
{
"_index": "s_t",
"_type": "_doc",
"_id": "3",
"_score": 0.6587735,
"_source": {
"body": "google"
}
},
{
"_index": "s_t",
"_type": "_doc",
"_id": "4",
"_score": 0.5155619,
"_source": {
"body": "google map"
}
},
{
"_index": "s_t",
"_type": "_doc",
"_id": "5",
"_score": 0.5155619,
"_source": {
"body": "google-map"
}
}
]
}
}
I need this output
(Request created in postman)
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 0.69381464,
"hits": [
{
"_index": "s_t",
"_type": "_doc",
"_id": "3",
"_score": 0.69381464,
"_source": {
"body": "google"
}
}
]
}
}
In MySQL I can reach my goal with a query like this:
select * from s_t where body='google'
Well, I assume you rely on automatic mapping or use a text field in your mappings. Specify .keyword in your query. Note this is case sensitive.
{
"query": {
"bool": {
"must": [
{
"match": {
"body.keyword": "google"
}
}
]
}
}
}
If you only want to query your body field with an exact match, you need to map it as (or reindex it with) a keyword field. Take a look at: Exact match in elastic search query
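If the body.keyword sub-field already exists (the default dynamic mapping adds one for text fields), a term query is arguably the more idiomatic way to express this exact, non-analyzed match; a sketch using the index name from the response (still case sensitive):
GET s_t/_search
{
  "query": {
    "term": {
      "body.keyword": "google"
    }
  }
}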

elasticsearch max aggregation return more than one result

I'm running the following query:
POST myindex/_search
{
"aggs": {
"minSamp": {
"min": {
"field": "sample"
}
}
}
}
part of the result:
{
"took": 15,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 91,
"max_score": 1,
"hits": [
{
"_index": "myindex",
"_type": "myindex",
"_id": "HyYmY2oB06bGDsjT4C7Z",
"_score": 1,
"_source": {
"sample": 119267,
"age": 6,
"comp": 11
}
},
{
"_index": "myindex",
"_type": "myindex",
"_id": "HyYmY2oB06bGDsjT4C79",
"_score": 1,
"_source": {
"sample": 117100,
"age": 9,
"comp": 7
}
}
]
}
}
....
I want to get only one result back (the smallest "sample" value), but instead I get lots of full documents in the response.
1. What is wrong?
2. Can I get one response for multiple indices? For example, if my query is for all indices that start with "my":
POST my*/_search
Thanks
By default, hits returns 10 documents. You need to set "size": 0 in your query if you don't want documents returned, i.e. only the aggregation is needed:
{
  "size": 0,
  "aggs": {
    "minSamp": {
      "min": {
        "field": "sample"
      }
    }
  }
}
link for reference.
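For the second part of the question, the same body can be sent against an index pattern so the aggregation is computed across every matching index; a minimal sketch for indices starting with "my":
POST my*/_search
{
  "size": 0,
  "aggs": {
    "minSamp": {
      "min": {
        "field": "sample"
      }
    }
  }
}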

Get all data of specific fields

I'm using Elasticsearch 5.1.1. How do I get all the data for just these fields (FeatureValue, FeatureName)?
Sample document:
{
"_index": "rawdata",
"_type": "feed",
"_id": "591031",
"_score": 1,
"_source": {
"sourceproductname": "1-5-Size Relays",
"zmfrid": 4,
"sourcetitle": null,
"featurename": "Coil Magnetic System",
"localsourcepath": null,
"sourcingdate": "2017-01-08T22:00:00.000Z",
"migrationstatus": 3,
"featrueunit": null,
"inputkeyword": null,
"#version": "1",
"sourcetype": "DirectFeed",
"id": 591031,
"sourceid": 674,
"partdataid": null,
"partid": null,
"featurecondition": null,
"sourcingstatus": null,
"sourcetaxonomypath": "1-5-Size Relays",
"sourcename": "CrunchBase ",
"tags": [],
"sourceurl": "N/A",
"#timestamp": "2017-01-10T11:51:54.095Z",
"featurevalue": "Non-Polarized, Monostable",
"mfr": "feed",
"partdataattributeid": null,
"supplierfamily": "null",
"partnumber": "4-1617072-5"
}
}
I tried this
POST /rawdata/feed/_search?pretty=true
{
"_source": ["FeatureValue", "FeatureName"],
"query": {
"match_all":{}
}
}
sample result
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 386424,
"max_score": 1,
"hits": [
{
"_index": "rawdata",
"_type": "feed",
"_id": "591031",
"_score": 1,
"_source": {}
}
You simply need to lowercase the field names in the source filter, since they are lowercased in your documents:
POST /rawdata/feed/_search?pretty=true
{
"_source": ["featurevalue", "featurename"], <--- change this
"query": {
"match_all":{}
}
}
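Note that this still returns only the default 10 hits per request; with 386,424 matching documents you would need to page through them, for example with the scroll API (a sketch; the batch size and scroll timeout are arbitrary):
POST /rawdata/feed/_search?scroll=1m
{
  "size": 1000,
  "_source": ["featurevalue", "featurename"],
  "query": {
    "match_all": {}
  }
}
Each response contains a _scroll_id; keep fetching the next batch with it until no hits come back:
POST /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}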

Get every Nth result in Elasticsearch

I have this large set of data and I want a sample that I can use in a graph. For this I don't need all of the data, I need every Nth item.
For instance if I have 4000 results, and I only need 800 results, I want to be able to get every 5th result.
So something like: get, skip, skip, skip, skip, get, skip, skip, skip, ...
I was wondering if such a thing is possible in Elasticsearch?
You're better off using a scripted filter. Otherwise you're needlessly using the score. Filters are just like queries, but they don't use scoring.
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['unique_counter'].value % n == 0",
"params" : {
"n" : 5
}
}
}
}
}
}
You're also better off not using dynamic scripting in real world usage.
That said, you probably want to take a look at aggregations for graphing analytical information about your data rather than taking an arbitrary sample.
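For example, if the graph is meant to show how a metric is distributed rather than individual points, a histogram aggregation summarizes the whole data set without sampling at all; a sketch, where the index name, field name, and interval are placeholders to adapt to your data:
POST /test_index/_search
{
  "size": 0,
  "aggs": {
    "value_buckets": {
      "histogram": {
        "field": "value",
        "interval": 100
      }
    }
  }
}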
One way you could do it is with random scoring. It won't give you precisely every nth item according to a rigid ordering, but if you can relax that requirement this trick should do nicely.
To test it I set up a simple index (I mapped "doc_id" to "_id" just so the documents would have some contents, so that part isn't required, in case that's not obvious):
PUT /test_index
{
"mappings": {
"doc": {
"_id": {
"path": "doc_id"
}
}
}
}
Then I indexed ten simple documents:
POST /test_index/doc/_bulk
{"index":{}}
{"doc_id":1}
{"index":{}}
{"doc_id":2}
{"index":{}}
{"doc_id":3}
{"index":{}}
{"doc_id":4}
{"index":{}}
{"doc_id":5}
{"index":{}}
{"doc_id":6}
{"index":{}}
{"doc_id":7}
{"index":{}}
{"doc_id":8}
{"index":{}}
{"doc_id":9}
{"index":{}}
{"doc_id":10}
Now I can pull back three random documents like this:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "some seed"
}
}
]
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.93746644,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.93746644,
"_source": {
"doc_id": 1
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "10",
"_score": 0.926947,
"_source": {
"doc_id": 10
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "5",
"_score": 0.79400194,
"_source": {
"doc_id": 5
}
}
]
}
}
Or a different random three like this:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "some other seed"
}
}
]
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.817295,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": 0.817295,
"_source": {
"doc_id": 4
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "8",
"_score": 0.469319,
"_source": {
"doc_id": 8
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.4374538,
"_source": {
"doc_id": 3
}
}
]
}
}
Hopefully it's clear how to generalize this method to what you need. Just take out however many documents you want, in however many chunks you need to keep it performant.
Here is all the code I used to test:
http://sense.qbox.io/gist/a02d4da458365915f5e9cf6ea80546d2dfabc75d
EDIT: Actually now that I think about it, you could also use scripted scoring to get precisely every nth item, if you set it up right. Maybe something like,
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "if(doc['doc_id'].value % 3 == 0){ return 1 }; return 0;"
}
}
]
}
}
}
...
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"doc_id": 3
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "6",
"_score": 1,
"_source": {
"doc_id": 6
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "9",
"_score": 1,
"_source": {
"doc_id": 9
}
}
]
}
}
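One caveat with the script_score version: documents for which the script returns 0 still match, they just sort after the score-1 documents, so with a larger size they would eventually appear in the results. If that matters, min_score can exclude them; a sketch based on the query above:
POST /test_index/_search
{
  "min_score": 1,
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": "if(doc['doc_id'].value % 3 == 0){ return 1 }; return 0;"
          }
        }
      ]
    }
  }
}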
