I need some help with querying in Elasticsearch.
The API response looks something like this:
{
"took": 58,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1020900,
"max_score": 1,
"hits": [
{
"_index": "index-20192029",
"_type": "_doc",
"_id": "urn:22291760",
"_score": 1,
"_source": {
"user_id": 1234567,
"document": [
{
"documentType": "application/pdf",
"documentUrl": "http://somethingxyz1234.pdf"
},
{
"documentType": "application/xml",
"documentUrl": "http://somethingxyz1234.xml"
}
], .....
How do I get only the URL whose documentType is XML?
I tried:
"_source": ["user_id", "document.documentType", "document.documentUrl"],
"query": {
"bool": {
"must": {
"match": { "document.documentType": "application/xml" }
}
}
}
But that also returned the PDF entry.
I want documentUrl to contain only the URL of the XML document.
Thanks
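If document is mapped as a nested field, you can use inner_hits to return only the nested entries that matched the query. The example below matches application/pdf; use application/xml for your case.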
GET test/_search
{
"query": {
"nested": {
"path": "document",
"query": {
"term": {
"document.documentType": {
"value": "application/pdf"
}
}
},
"inner_hits": {}
}
}
}
Results:
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "pv36jIIB-X7q7ErxEhyg",
"_score" : 0.6931471,
"_source" : {
"document" : [
{
"documentType" : "application/pdf",
"documentUrl" : "http://somethingxyz1234.pdf"
},
{
"documentType" : "application/xml",
"documentUrl" : "http://somethingxyz1234.xml"
}
]
},
"inner_hits" : {
"document" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "pv36jIIB-X7q7ErxEhyg",
"_nested" : {
"field" : "document",
"offset" : 0
},
"_score" : 0.6931471,
"_source" : {
"documentType" : "application/pdf",
"documentUrl" : "http://somethingxyz1234.pdf"
}
}
]
}
}
}
}
]
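Note that the full document array is still present in the top-level _source; the matching entries are the ones under inner_hits. As a rough sketch, assuming the response has already been parsed into Python dicts shaped like the result above (the helper name matched_urls is made up for illustration), the URLs could be collected like this:

```python
def matched_urls(response_hits, path="document"):
    """Collect documentUrl values from the inner_hits of each top-level hit."""
    urls = []
    for hit in response_hits:
        # Hits without inner_hits simply contribute nothing.
        inner = hit.get("inner_hits", {}).get(path, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            urls.append(nested_hit["_source"]["documentUrl"])
    return urls


# Trimmed copy of the result shown above.
sample_hits = [
    {
        "_id": "pv36jIIB-X7q7ErxEhyg",
        "inner_hits": {
            "document": {
                "hits": {
                    "hits": [
                        {
                            "_nested": {"field": "document", "offset": 0},
                            "_source": {
                                "documentType": "application/pdf",
                                "documentUrl": "http://somethingxyz1234.pdf",
                            },
                        }
                    ]
                }
            }
        },
    }
]

print(matched_urls(sample_hits))
```

Adding "_source": false at the top level of the search body would drop the full array from the response if only the inner hits are needed.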
I have some documents that contain the fields id, size, etc.
I want to find all the possible size values where id = 1 or 2.
Is this possible?
You can use a terms query with source filtering. Here is a working example.
Index Data:
{
"id": 1,
"size": 1
}
{
"id": 2,
"size": 2
}
{
"id": 3,
"size": 3
}
Search Query:
{
"_source": "size",
"query": {
"terms": {
"id": [
1,
2
]
}
}
}
Search Result:
"hits": [
{
"_index": "66381642",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"size": 1
}
},
{
"_index": "66381642",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"size": 2
}
}
]
If you want the distinct sizes for those ids, you should use an aggregation.
POST your_index/_search
{
"size": 0,
"query": {
"terms": {
"id": [
"1",
"2"
]
}
},
"aggs": {
"sizes": {
"terms": {
"field": "size"
}
}
}
}
The response contains each unique size along with the number of documents that have that size:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sizes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1
},
{
"key" : 2,
"doc_count" : 1
}
]
}
}
}
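Reading the distinct values out of that response is a matter of walking the buckets. A small sketch, assuming the response is already a Python dict and the aggregation is named sizes as above (the helper name unique_sizes is hypothetical):

```python
def unique_sizes(response, agg_name="sizes"):
    """Return the bucket keys of a terms aggregation, i.e. the distinct values."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the aggregation response shown above.
sample_response = {
    "aggregations": {
        "sizes": {
            "buckets": [
                {"key": 1, "doc_count": 1},
                {"key": 2, "doc_count": 1},
            ]
        }
    }
}

print(unique_sizes(sample_response))
```

A terms aggregation returns only the top buckets (10 by default), so its own "size" setting needs raising if more distinct values are expected.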
I'm collecting logs with Elasticsearch and looking up the results with a query.
When I run the following query:
GET test/_search
{
"query": {
"match_all":{
}
}
}
the result comes back as follows:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 100,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "test",
"_id" : "1a2b3c4d5e6f",
"_score" : 1.0,
"_source" : {
"team" : "Marketing",
"number" : "3",
"name" : "Mark"
}
},
{
"_index" : "test",
"_id" : "1a2b3c4d5e66",
"_score" : 1.0,
"_source" : {
"team" : "HR",
"number" : "1",
"name" : "John"
}
},
........
But I want the result to look like the following (only a specific value from each hit):
{
"name": "Mark"
},
{
"name": "John"
},
So, how can I query for a specific value in the hits?
Thanks.
You can simply use the source filtering feature of ES. In your case, the query would look like this:
{
"_source": "name",
"query": {
"match_all": {}
}
}
And it returns search results like
"hits": [
{
"_index": "64214413",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"name": "Mark"
}
},
{
"_index": "64214413",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"name": "John"
}
}
]
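Source filtering trims each _source, but the hit metadata (_index, _id, _score) is still part of the response. To end up with exactly the list from the question, the remaining step can be done client-side. A minimal sketch, assuming the hits array has been parsed into Python dicts (the helper name source_only is made up):

```python
def source_only(hits):
    """Drop the hit metadata and keep just each hit's _source object."""
    return [hit["_source"] for hit in hits]


# Trimmed copy of the hits shown above.
sample_hits = [
    {"_index": "64214413", "_id": "1", "_score": 1.0, "_source": {"name": "Mark"}},
    {"_index": "64214413", "_id": "2", "_score": 1.0, "_source": {"name": "John"}},
]

print(source_only(sample_hits))
```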
I have the following data in an Elasticsearch index called products
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"prod_id" : 1,
"currency" : "USD",
"price" : 1
}
},
{
"_index" : "products",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"prod_id" : 2,
"currency" : "INR",
"price" : 60
}
},
{
"_index" : "products",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"prod_id" : 3,
"currency" : "EUR",
"price" : 2
}
},
{
"_index" : "products",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"prod_id" : 5,
"currency" : "MYR",
"price" : 1
}
}
]
}
}
I am sorting the data based on the price field,
and I have the following query to do so:
GET products/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [{
"script_score": {
"script": {
"params": {
"USD": 1,
"SGD": 0.72,
"MYR": 0.24,
"INR": 0.014,
"EUR": 1.12
},
"source": "doc['price'].value * (doc.currency.value == 'eur'? params.EUR : doc.currency.value == 'myr' ? params.MYR : doc.currency.value == 'inr' ? params.INR : 1)"
}
}
}]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
Because the currency field in the products index is of type text,
it is indexed with the standard analyzer, which converts it to lowercase.
I want to optimise this part of the script, as I may end up with 20-30 currencies:
"source": "doc['price'].value * (doc.currency.value == 'eur'? params.EUR : doc.currency.value == 'myr' ? params.MYR : doc.currency.value == 'inr' ? params.INR : 1)"
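I was able to optimize the source script with the following working solution. It reads the currency from the currency.keyword sub-field, which is not analyzed and so keeps the original upper-case value that matches the keys in params: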
GET products/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [{
"script_score": {
"script": {
"params": {
"USD": 1,
"SGD": 0.72,
"MYR": 0.24,
"INR": 0.014,
"EUR": 1.12
},
"source": "doc['price'].value * params[doc['currency.keyword'].value]"
}
}
}]
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
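The optimized script is just a map lookup keyed by the currency code. The same idea in Python, as a sketch for checking the expected ordering offline: the rates and products are copied from above, while the fallback rate of 1 for an unlisted currency is an assumption borrowed from the original ternary chain (the Painless one-liner has no explicit fallback, so its behaviour for a currency missing from params is worth testing separately).

```python
# Conversion rates copied from the "params" block of the query.
RATES = {"USD": 1, "SGD": 0.72, "MYR": 0.24, "INR": 0.014, "EUR": 1.12}


def normalised_price(product, rates=RATES, default_rate=1):
    """Price times the rate for the product's currency (assumed fallback: 1)."""
    return product["price"] * rates.get(product["currency"], default_rate)


# The four documents from the products index.
products = [
    {"prod_id": 1, "currency": "USD", "price": 1},
    {"prod_id": 2, "currency": "INR", "price": 60},
    {"prod_id": 3, "currency": "EUR", "price": 2},
    {"prod_id": 5, "currency": "MYR", "price": 1},
]

# Same ordering as sorting by _score descending.
ranked = sorted(products, key=normalised_price, reverse=True)
print([p["prod_id"] for p in ranked])
```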
I have an index in Elasticsearch whose documents contain duplicate field values. In the query result I need to remove all duplicates and get only distinct values. For example:
PUT localhost:9200/person
{
"mappings" : {
"person" : {
"properties" : {
"name" : { "type" : "keyword" }
}
}
}
}
POST localhost:9200/person/person
{
"name": "John"
}
{
"name": "John"
}
{
"name": "Marry"
}
{
"name": "Tomas"
}
I'm trying to remove the duplicates with a terms aggregation on the "name" field, but it doesn't work.
GET localhost:9200/person/person/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "dasdfdLBpnM0"
}
}
]
}
},
"aggs": {
"top-names": {
"terms": {
"field": "name",
"size": 3
},
"aggs": {
"top_names_hits": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.9506482,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H-5D8GoB8pRyckNSVUeN",
"_score": 0.9506482,
"_source": {
"name": "Tomas"
}
},
{
"_index": "person",
"_type": "person",
"_id": "He5D8GoB8pRyckNSPEfa",
"_score": 0.7700638,
"_source": {
"name": "John"
}
},
{
"_index": "person",
"_type": "person",
"_id": "HO5D8GoB8pRyckNSN0fo",
"_score": 0.71723765,
"_source": {
"name": "John"
}
}
]
},
"aggregations": {
"top-names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 2,
"top_names_hits": {
"hits": {
"total": 2,
"max_score": 0.7700638,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "He5D8GoB8pRyckNSPEfa",
"_score": 0.7700638,
"_source": {
"name": "John"
}
}
]
}
}
},
{
"key": "Marry",
"doc_count": 1,
"top_names_hits": {
"hits": {
"total": 1,
"max_score": 0.66815424,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "Iu5D8GoB8pRyckNScUdv",
"_score": 0.66815424,
"_source": {
"name": "Marry"
}
}
]
}
}
},
{
"key": "Tomas",
"doc_count": 1,
"top_names_hits": {
"hits": {
"total": 1,
"max_score": 0.9506482,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H-5D8GoB8pRyckNSVUeN",
"_score": 0.9506482,
"_source": {
"name": "Tomas"
}
}
]
}
}
}
]
}
}
}
The aggregation was also applied to the document with name = "Marry", which is not among the query hits. I don't understand why. How can I apply the aggregation only to the query results?
Below is, more or less, the blueprint of an Elasticsearch query:
{
"size": n, // Return n of the documents matched by the "query" section (to the frontend)
"query": {
// This is where you specify which documents you want,
// using any filter/bool/match query condition.
// In your case there is no restricting condition (random_score only affects scoring),
// so every document matches and "size" limits how many are returned. In your case it returns 3.
},
"aggs": {
// The aggregation is applied to all documents
// matched by the "query" section, not only to the "size" hits that are returned.
// In your case it runs on every document in the index, as noted in the "query" section above.
}
}
}
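The point of the blueprint is that "size" trims the returned hits but not the set of documents the aggregation sees. A toy Python imitation of that behaviour (no Elasticsearch involved; the search function and its arguments are invented for illustration):

```python
from collections import Counter


def search(docs, matches, size):
    """Mimic the blueprint: the query picks docs, the aggregation counts names among them."""
    matched = [doc for doc in docs if matches(doc)]
    buckets = Counter(doc["name"] for doc in matched)
    # Only the hits are cut to `size`; the buckets cover every matched doc.
    return {"hits": matched[:size], "buckets": dict(buckets)}


docs = [{"name": "John"}, {"name": "John"}, {"name": "Marry"}, {"name": "Tomas"}]

# No real condition: every document matches, so every name gets a bucket
# even though only three hits are returned.
everything = search(docs, lambda doc: True, size=3)

# A real condition narrows both the hits and the buckets.
only_john = search(docs, lambda doc: doc["name"] == "John", size=3)

print(everything["buckets"])
print(only_john["buckets"])
```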
Aggregation Query:
To get what you are looking for, use the simplified query below, which keeps your terms aggregation with top hits as a sub-aggregation.
POST person/_search
{
"size": 0, <------- This is to say, I don't want "query" results to be returned and that I only want below aggregation results.
"aggs": {
"top-names": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"top_hits_documents": { <------- Top hits would return the actual documents
"top_hits": {
"size": 1
}
}
}
}
}
}
With "size": 0 at the very top, no query hits are returned; and because there is no "query" section, the aggregation runs over all the documents.
Only the aggregation results are returned.
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ] <------ Notice this. No query results returned
},
"aggregations" : { <------ Aggregation Result starts
"top-names" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "John", <------- This is to say there's a value called John
"doc_count" : 2, <------- John occurs in two documents.
"top_hits_documents" : {
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "John"
}
}
]
}
}
},
{
"key" : "Marry",
"doc_count" : 1,
"top_hits_documents" : {
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "Marry"
}
}
]
}
}
},
{
"key" : "Thomas",
"doc_count" : 1,
"top_hits_documents" : {
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Thomas"
}
}
]
}
}
}
]
}
}
}
Hope that helps!
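To turn the buckets into a plain list of distinct documents, take the single top hit from each bucket. A short sketch, assuming the response is a parsed Python dict and uses the aggregation names from the query above (the helper name distinct_documents is hypothetical):

```python
def distinct_documents(response, agg_name="top-names", hits_name="top_hits_documents"):
    """Return one _source per bucket, i.e. one document per distinct name."""
    docs = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        top = bucket[hits_name]["hits"]["hits"]
        if top:  # top_hits was asked for size 1, so take the first entry
            docs.append(top[0]["_source"])
    return docs


# Trimmed copy of the aggregation response shown above.
sample_response = {
    "aggregations": {
        "top-names": {
            "buckets": [
                {"key": "John", "doc_count": 2,
                 "top_hits_documents": {"hits": {"hits": [{"_id": "2", "_source": {"name": "John"}}]}}},
                {"key": "Marry", "doc_count": 1,
                 "top_hits_documents": {"hits": {"hits": [{"_id": "3", "_source": {"name": "Marry"}}]}}},
                {"key": "Thomas", "doc_count": 1,
                 "top_hits_documents": {"hits": {"hits": [{"_id": "4", "_source": {"name": "Thomas"}}]}}},
            ]
        }
    }
}

print(distinct_documents(sample_response))
```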
This should be a no-brainer, as Jolt was built mainly to transform ES and MongoDB responses, but I am unable to figure it out.
I want to parse an ES response (shown further below) and print only selected fields. For instance, I want to transform the response into:
{
"time" : 63,
"totalhits":100,
"0": { response1.field1, response1.field2 },
"1": { response2.field1, response2.field2 },
"2": { response3.field1, response3.field2 },
}
{
"took" : 63,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1000,
"max_score" : null,
"hits" : [ {
"_index" : "bank",
"_type" : "account",
"_id" : "0",
"sort": [0],
"_score" : null,
"_source" : {"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
}, {
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"sort": [1],
"_score" : null,
"_source" : {"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
}, ...
]
}
}
The spec file I have so far is:
[
{
"operation": "shift",
"spec": {
"hits": {
"*": {
"*": "&"
}
}
}
}
]
I figured it out. The following spec works:
[
{
"operation": "shift",
"spec": {
"took": "took",
"hits": {
"total": "total_hits",
"hits": {
"*": {
"_source": {
"country": "Response[&2].country",
"city": "Response[&2].city",
"year": "Response[&2].year"
}
}
}
}
}
}
]
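For readers who want to check what that spec produces without running Jolt, here is a rough Python imitation of the same mapping. It is only a sketch of the shift's effect, not Jolt itself: the function name shift_like is invented, and details such as how Jolt treats hits with none of the listed fields are not reproduced.

```python
def shift_like(es_response, fields=("country", "city", "year")):
    """Approximate the shift spec: copy took, hits.total and selected _source fields."""
    out = {
        "took": es_response["took"],
        "total_hits": es_response["hits"]["total"],
        "Response": [],
    }
    for hit in es_response["hits"]["hits"]:
        source = hit.get("_source", {})
        # Keep only the listed fields that are actually present in this hit.
        out["Response"].append({name: source[name] for name in fields if name in source})
    return out


# Trimmed version of the bank response from the question.
sample = {
    "took": 63,
    "hits": {
        "total": 1000,
        "hits": [
            {"_id": "0", "_source": {"firstname": "Bradshaw", "city": "Hobucken", "state": "CO"}},
            {"_id": "1", "_source": {"firstname": "Amber", "city": "Brogan", "state": "IL"}},
        ],
    },
}

print(shift_like(sample))
```

With the bank sample only city is among the listed fields, so each Response entry carries just that key.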