ElasticSearch Aggregation Query possible? - elasticsearch

I have JSON documents with the following structure, and I want to run an aggregation query on them in Elasticsearch:
{
"user_id": 1,
"gender_id": 1,
"locale_id": 6,
"age": 38,
"hometown_city_id": 1002,
"current_city_id": 672,
"books": [
{
"b_id": 2065,
"b_name": "aut qui assumenda ut",
"b_cat": 56
},
{
"b_id": 2527,
"b_name": "libero et laudantium",
"b_cat": 132
},
...
]
}
What I basically want is, for example, the top 5 books read by users with "gender_id": 1 (male) who have also read the book with "b_id": 2065.
Being a beginner with Elasticsearch, I haven't found a solution yet. I know the Aggregation Module (https://github.com/elasticsearch/elasticsearch/issues/3300) is coming with v1.0, but I couldn't get it running.
If somebody has already implemented something similar, please help! Thanks a lot in advance!

Maybe something like this (not sure about the nested books.field path):
{
"query" : {
"match_all" : { }
},
"facets" : {
"filter_one" : {
"filter" : {
"and" : [
{ "term" : { "gender_id" : 1 } },
{ "term" : { "books.b_id" : 2065 } }
]
}
},
"book_cnt" : {
"terms" : {
"field" : "books.b_name",
"size" : 5
}
}
}
}

Thanks @mconlin for the hint. The query that worked for me is the following:
{
"query": {
"filtered": {
"filter": {
"and" : [
{ "term": { "gender_id": 1 } },
{ "term": { "books.b_id": 2065} }
]
}
}
},
"facets": {
"book_cnt": {
"terms": {
"field": "books.b_id",
"size": 5
}
}
}
}
It turns out it's better to put the filter in the query itself rather than in a separate facet, because the terms facet is computed over the documents the query matches.
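Facets were removed in Elasticsearch 2.0 and the `filtered` query in 5.0, so on a current cluster the same idea would be written with a `bool` filter and a `terms` aggregation. The following is a minimal sketch in Python that only builds the request body. The function names and the `top_books` aggregation name are made up for illustration, and it assumes `books` is mapped as a plain object array (not `nested`), as the working query above implies. Because every matching user has read the seed book, that book will always be the largest bucket, so the sketch asks for one extra bucket and drops the seed book on the client.

```python
def build_top_books_request(gender_id, seed_book_id, top_n=5):
    """Build a search body: users of one gender who read the seed book, bucketed by book id."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits themselves
        "query": {
            "bool": {
                "filter": [
                    {"term": {"gender_id": gender_id}},
                    {"term": {"books.b_id": seed_book_id}},
                ]
            }
        },
        "aggs": {
            "top_books": {
                # one extra bucket, because the seed book itself is counted too
                "terms": {"field": "books.b_id", "size": top_n + 1}
            }
        },
    }


def drop_seed_book(buckets, seed_book_id, top_n=5):
    """Remove the seed book from the terms buckets and keep the first top_n of the rest."""
    others = [bucket for bucket in buckets if bucket["key"] != seed_book_id]
    return others[:top_n]
```

The body returned by `build_top_books_request(1, 2065)` would be sent to the index's `_search` endpoint, and `drop_seed_book` would then be applied to `response["aggregations"]["top_books"]["buckets"]`.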

Related

Elasticsearch - Retrieving a list of documents which last status is 51

I have an index in Elasticsearch with this kind of documents:
"transactionId" : 5588,
"clientId" : "1",
"transactionType" : 1,
"transactionStatus" : 51,
"locationId" : 12,
"images" : [
{
"imageId" : 5773,
"imagePath" : "http://some/url/path",
"imageType" : "dummyData",
"ocrRead" : "XYZ999",
"imageName" : "SOMENUMBERSANDCHARACTERS.jpg",
"ocrConfidence" : "94.6",
"ROITopLeftCoordinate" : "839x251",
"ROIBottomRightCoordinate" : "999x323"
}
],
"creationTimestamp" : 1669645709130,
"current" : true,
"timestamp" : 1669646359686
It's an append-only store, where a record is never updated. For instance:
- A new record is added with "transactionStatus": 10.
- When the transaction changes status, a new record is added for the same transactionId with "transactionStatus": 51.
and so on.
What I want to achieve is to get a list of 10 records whose latest status is 51, but I can't write the correct query.
Here is what I've tried:
{ "size": 10,
"query": {
"match_all": {}
},
"collapse": {
"field": "transactionId",
"inner_hits": {
"name": "most_recent",
"size": 1,
"sort": [{"timestamp": "desc"}]
}
},
"post_filter": {
"term": {
"transactionStatus": "51"
}
}
}
If I change "transactionStatus" in the post_filter term from 51 to, let's say, 10, it returns a transactionId whose latest record does not have status 10.
I'm not sure I have explained this properly. I apologize for my English; it is not my native language.
GET test_status/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"transactionStatus": 51
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "desc"
}
}
]
}
This query filters on "transactionStatus": 51 and then sorts by timestamp, newest first. Let me know if something is missing.
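Filtering on status 51 before looking at the timestamps returns every record that has status 51, including transactions that later moved on to another status. One possible approach, sketched here under assumptions: let `collapse` keep the newest record per `transactionId` (the top-level `sort` decides which document represents each group), apply no status filter in the query, and keep only the groups whose representative has status 51 on the client. The function names are hypothetical. `select_latest_with_status` mirrors the same logic on plain dicts, so it can be tried without a cluster.

```python
def build_latest_per_transaction_request(page_size=50):
    """Search body returning the newest record of each transactionId."""
    return {
        "size": page_size,
        "query": {"match_all": {}},
        # the sort decides which document represents each collapsed group
        "sort": [{"timestamp": {"order": "desc"}}],
        "collapse": {"field": "transactionId"},
    }


def select_latest_with_status(records, status, limit=10):
    """Keep transactions whose most recent record has the given status."""
    latest = {}
    for record in records:
        seen = latest.get(record["transactionId"])
        if seen is None or record["timestamp"] > seen["timestamp"]:
            latest[record["transactionId"]] = record
    newest_first = sorted(latest.values(), key=lambda r: r["timestamp"], reverse=True)
    return [r for r in newest_first if r["transactionStatus"] == status][:limit]
```

Because the status check happens after the search, more than 10 groups may have to be fetched (hence the larger `page_size`) to end up with 10 matching transactions.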

How to get the best matching document in Elasticsearch?

I have an index where I store all the places used in my documents. I want to use this index to see if the user mentioned one of the places in the text query I receive.
Unfortunately, I have two documents whose name is similar enough to trick Elasticsearch scoring: Stockholm and Stockholm-Arlanda.
My test phrase is intyg stockholm and this is the query I use to get the best matching document.
{
"size": 1,
"query": {
"bool": {
"should": [
{
"match": {
"name": "intyg stockholm"
}
}
],
"must": [
{
"term": {
"type": {
"value": "4"
}
}
},
{
"terms": {
"name": [
"intyg",
"stockholm"
]
}
},
{
"exists": {
"field": "data.coordinates"
}
}
]
}
}
}
As you can see, I use a terms query to find the interesting documents and I use a match query in the should part of the root bool query to use scoring to get the document I want (Stockholm) on top.
This code worked locally (where I run ES in a container) but it broke when I started testing on a cluster hosted in AWS (where I have the exact same dataset). I found this explaining what happens and adding the search type argument actually fixes the issue.
Since that workaround is best not used in production, I'm looking for another way to get the expected result.
Here are the two documents:
// Stockholm
{
"type" : 4,
"name" : "Stockholm",
"id" : "42",
"searchableNames" : [
"Stockholm"
],
"uniqueId" : "Place:42",
"data" : {
"coordinates" : "59.32932349999999,18.0685808"
}
}
// Stockholm-Arlanda
{
"type" : 4,
"name" : "Stockholm-Arlanda",
"id" : "1832",
"searchableNames" : [
"Stockholm-Arlanda"
],
"uniqueId" : "Place:1832",
"data" : {
"coordinates" : "59.6497622,17.9237807"
}
}
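The difference between a single local node and a multi-shard cluster is consistent with relevance being computed from per-shard term statistics. This is an assumption about what the linked explanation covered, and the "search type argument" is presumably `dfs_query_then_fetch`. A small sketch of the effect, using the BM25 idf formula that Lucene applies, with made-up shard sizes and document frequencies:

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 inverse document frequency for a term found in doc_freq of doc_count documents."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: the same term is rare on one shard and common on the other.
shard_a = {"docs": 1000, "docs_with_term": 2}
shard_b = {"docs": 1000, "docs_with_term": 40}

idf_a = bm25_idf(shard_a["docs"], shard_a["docs_with_term"])
idf_b = bm25_idf(shard_b["docs"], shard_b["docs_with_term"])

# With index-wide statistics both shards would score the term identically.
idf_global = bm25_idf(
    shard_a["docs"] + shard_b["docs"],
    shard_a["docs_with_term"] + shard_b["docs_with_term"],
)

print(f"shard A idf={idf_a:.2f}, shard B idf={idf_b:.2f}, global idf={idf_global:.2f}")
```

A document on the shard where the term is rare gets a higher idf than an equally good match on the other shard, which can reorder two near-identical names. For a small lookup index like this one, keeping it on a single primary shard is one way to make the statistics index-wide without changing the search type.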

Custom highlights in elastic search

I am a newbie to Elasticsearch. I have a task where I have to highlight certain queries with specific tags.
I am using a query similar to the one shown for the Elasticsearch intervals query. The problem is that I have to highlight "my favourite food" with one HTML tag, say "favorite", and "cold porridge" / "hot water" with a different HTML tag, say "state".
How can I do that?
POST _search
{
"query": {
"intervals" : {
"my_text" : {
"all_of" : {
"ordered" : true,
"intervals" : [
{
"match" : {
"query" : "my favourite food",
"max_gaps" : 0,
"ordered" : true
}
},
{
"any_of" : {
"intervals" : [
{ "match" : { "query" : "hot water" } },
{ "match" : { "query" : "cold porridge" } }
]
}
}
]
},
"boost" : 2.0,
"_name" : "favourite_food"
}
}
}
}
You can use the Highlighting feature in Elasticsearch as follows:
GET /index_name/_search
{
"query": {},
"highlight": {
"fields": {
"content": {
"type": "unified",
"number_of_fragments": 0,
"pre_tags": [
"<first_filter>",
"<second_filter>",
"<third_filter>"
],
"post_tags": [
"</first_filter>",
"</second_filter>",
"</third_filter>"
]
}
}
}
}
The order in which the tags are applied depends on the order in which the filters are applied. Also note that setting number_of_fragments to 0 returns the entire field content with the hits tagged.

Highlight not working along with term lookup filter

I'm new to Elasticsearch and have been exploring it for the past few days. My requirement is to get the matched keywords highlighted.
So I have 2 indices
http://localhost:9200/lookup/type/1?pretty
Output
{
"_index" : "lookup",
"_type" : "type",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source":{"terms":["Apache Storm","Kafka","MR","Pig","Hive","Hadoop","Mahout"]}
}
And another one as follows:
http://localhost:9200/skillsetanalyzer/resume/_search?fields=keySkills
Output
{"took":19,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":3,"max_score":1.0,"hits":[{"_index":"skillsetanalyzer","_type":"resume","_id":"1","_score":1.0,"fields":{"keySkills":["Core Java","J2EE","Struts 1.x","SOAP based Web Services using JAX-WS","Maven","Ant","JMS","Apache Storm","Kafka","RDBMS (MySQL","Tomcat","Weblogic","Eclipse","Toad","TIBCO product Suite (Administrator","Business Work","Designer","EMS)","CVS","SVN"]}},
The query below returns the correct results but does not highlight the matched keywords:
curl -XGET 'localhost:9200/skillsetanalyzer/resume/_search?pretty' -d '
{
"query":
{"filtered":
{"filter":
{"terms":
{"keySkills":
{"index":"lookup",
"type":"type",
"id":"1",
"path":"terms"
},
"_cache_key":"1"
}
}
}
},
"highlight": {
"fields":{
"keySkills":{}
}
}
}'
The field "keySkills" is not analyzed and its type is string. I can't figure out what is wrong with the query.
Any pointers would be appreciated.
~Shweta
Highlighting works against the query, but you are only filtering the results, so there is nothing for the highlighter to match. You need to specify a highlight_query alongside your filter, like this:
{
"query": {
"filtered": {
"filter": {
"terms": {
"keySkills": [
"MR","Pig","Hive"
]
}
}
}
},
"highlight": {
"fields": {
"keySkills": {
"highlight_query": {
"terms": {
"keySkills": [
"MR","Pig","Hive"
]
}
}
}
}
}
}
I hope this helps.
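Since the filter and the highlight_query have to carry the same terms, it can help to build both from one list. A minimal sketch that only constructs the request body; the helper name is hypothetical, and it uses the `bool`/`filter` form of newer Elasticsearch versions instead of `filtered`, which is an assumption about the target version:

```python
import json


def build_highlighted_terms_request(field, terms):
    """Filter on a list of exact terms and highlight the same terms in the same field."""
    term_list = list(terms)
    return {
        "query": {
            "bool": {"filter": [{"terms": {field: list(term_list)}}]}
        },
        "highlight": {
            "fields": {
                # filters alone give the highlighter nothing to match,
                # so the terms are repeated as a highlight_query
                field: {"highlight_query": {"terms": {field: list(term_list)}}}
            }
        },
    }


body = build_highlighted_terms_request("keySkills", ["MR", "Pig", "Hive"])
print(json.dumps(body, indent=2))
```

With the terms lookup from the question, the list would first have to be read from the `terms` field of the lookup document and then passed to the helper.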

Elasticsearch wildcard and sort it by how exact the query

I have documents like these:
[{ name : 'Mark'}, { name : 'Aaron Mark'}, { name : 'Jane'}, { name : 'Mary'}, { name : 'Mark Joseph'}]
Every time I do a wildcard search for name : 'Mar*', it returns this:
[{ name : 'Mark'}, { name : 'Aaron Mark'}, { name : 'Mary'}, { name : 'Mark Joseph'}]
but the result I want is this:
[{ name : 'Mark'}, { name : 'Mark Joseph'}, { name : 'Mary'}, { name : 'Aaron Mark'}]
Thanks
You can try a combination of queries like the one below to get the desired order:
"bool": {
"should": [
{
"wildcard": {
"name": {
"value": "*Mar*"
}
}
},
{
"prefix": {
"name": {
"value": "Mar",
"boost": 2
}
}
},
{
"term": {
"name": {
"value": "Mar",
"boost": 4
}
}
}
]
}
This may not give exactly the order you specified, because "Mar*" cannot differentiate between "Mark" and "Mary", but it should give you an idea of what can be done.
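To see how the boosts stack up, the bool/should query can be approximated locally. This is a rough sketch, not how Elasticsearch scores internally: it assumes `name` is an unanalyzed (keyword) field, writes the prefix value without the trailing `*` (a prefix query takes the literal prefix), and treats every matching clause as contributing just its boost, whereas a real term score also depends on index statistics. The clause table and function names are made up for illustration.

```python
from fnmatch import fnmatchcase

# (clause kind, value, boost), mirroring the three should clauses
CLAUSES = [
    ("wildcard", "*Mar*", 1.0),
    ("prefix", "Mar", 2.0),
    ("term", "Mar", 4.0),
]


def clause_matches(kind, value, name):
    """Case-sensitive match of one clause against a whole field value."""
    if kind == "wildcard":
        return fnmatchcase(name, value)
    if kind == "prefix":
        return name.startswith(value)
    return name == value  # term: exact match


def rank(names):
    """Sum the boosts of the matching clauses and sort by that score, highest first."""
    scored = []
    for name in names:
        score = sum(boost for kind, value, boost in CLAUSES
                    if clause_matches(kind, value, name))
        if score > 0:  # a document matching no clause is not returned
            scored.append((score, name))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep input order
    return scored


names = ["Mark", "Aaron Mark", "Jane", "Mary", "Mark Joseph"]
for score, name in rank(names):
    print(score, name)
```

In this approximation "Mark", "Mary" and "Mark Joseph" tie at 3.0 and "Aaron Mark" comes last with 1.0, which shows both the effect of the prefix boost and why the three prefix matches cannot be ordered among themselves by this query alone.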
