How to get the number of hits of several matching fields in one record? - elasticsearch

I have records similar to:
{
"who": "John",
"hobby": [
{"name": "gardening",
"skills": 2
},
{"name": "sleeping",
"skills": 3
},
{"name": "darts",
"skills": 2
}
]
}
,
{
"who": "Mary",
"hobby": [
{"name": "gardening",
"skills": 2
},
{"name": "volleyball",
"skills": 3
},
{"name": "kung-fu",
"skills": 2
}
]
}
I am looking to build a query that answers the question: "how many hobbies with skills=2 do we have?"
The answer for the example above would be 3 ("gardening" is common to both, and each has another unique one).
Every "query" or "query"+"aggs" combination I tried returns, in ['hits']['hits'] or ['aggregations']['sources']['buckets'], the number of matching documents, which is two in the case above (one for "John" and one for "Mary", each of them satisfying the query).
Is there a way to build a query so that it returns the total number of fields (in the example above: the elements of the list "hobby") that matched the query? (Fields, not documents.)
Note: If my documents were flat:
{"who": "John", "name": "gardening", "skills": 2},
{"who": "John", "name": "sleeping", "skills": 3},
(...)
{"who": "Mary", "name": "kung-fu", "skills": 2}
then a simple "query" to match "skills": 2 plus an aggregation on "name" would have done the job.

Yes, you can achieve this with the nested type and using inner_hits and/or nested aggregations.
So here is the mapping you should use (note that the hobby list is declared with type nested):
curl -XPUT localhost:9200/hobbies -d '{
"mappings": {
"hob": {
"properties": {
"who": {
"type": "string"
},
"hobby": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"skills": {
"type": "integer"
}
}
}
}
}
}
}'
Then we can insert your two sample documents using the _bulk endpoint like this:
curl -XPOST localhost:9200/hobbies/hob/_bulk -d '
{"index":{}}
{"who":"John", "hobby":[{"name": "gardening","skills": 2},{"name": "sleeping","skills": 3},{"name": "darts","skills": 2}]}
{"index":{}}
{"who":"Mary", "hobby":[{"name": "gardening","skills": 2},{"name": "volleyball","skills": 3},{"name": "kung-fu","skills": 2}]}
'
And finally, we can query your index for how many hobbies have skills: 2 like this:
curl -XPOST localhost:9200/hobbies/hob/_search -d '{
"_source": false,
"query": {
"nested": {
"path": "hobby",
"query": {
"term": {
"hobby.skills": 2
}
},
"inner_hits": {}
}
},
"aggs": {
"hobbies": {
"nested": {
"path": "hobby"
},
"aggs": {
"skills": {
"filter": {
"term": {
"hobby.skills": 2
}
},
"aggs": {
"by_field": {
"terms": {
"field": "hobby.name"
}
}
}
}
}
}
}
}'
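What this query returns:
In the hits part, each document's inner_hits contain only the nested hobby objects with skills: 2, four in total across the two documents.
In the aggs part, the skills filter aggregation reports a doc_count of 4 (nested objects, not parent documents), and its by_field sub-aggregation gives a breakdown of the 3 distinct hobby names with skills: 2.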
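To see why the two numbers differ, here is a small client-side simulation in Python over the two sample documents. It uses plain dicts standing in for the indexed data and does not call Elasticsearch; it only mirrors what the filter (count of matching nested objects) and the terms sub-aggregation (breakdown by name) compute.

```python
from collections import Counter

# The two sample documents from the question, as plain Python dicts.
docs = [
    {"who": "John", "hobby": [
        {"name": "gardening", "skills": 2},
        {"name": "sleeping", "skills": 3},
        {"name": "darts", "skills": 2},
    ]},
    {"who": "Mary", "hobby": [
        {"name": "gardening", "skills": 2},
        {"name": "volleyball", "skills": 3},
        {"name": "kung-fu", "skills": 2},
    ]},
]


def matching_hobbies(docs, skills):
    """Every nested hobby object whose skills equals the given value
    (what the filter inside the nested aggregation counts)."""
    return [h for d in docs for h in d["hobby"] if h["skills"] == skills]


def breakdown_by_name(hobbies):
    """Matching hobbies counted per name (what the terms sub-aggregation reports)."""
    return Counter(h["name"] for h in hobbies)


matches = matching_hobbies(docs, 2)
counts = breakdown_by_name(matches)
print(len(matches))  # -> 4 nested objects with skills=2
print(len(counts))   # -> 3 distinct hobby names with skills=2
print(dict(counts))
```

The first number corresponds to "how many matching fields", the second to "how many distinct hobbies"; the question's expected answer of 3 is the latter.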

Related

Elasticsearch score from 0 to 1 for searching similar documents to the one that exists

I need to calculate a relative score from 0 to 1 when searching for documents similar to an existing one.
The existing document should have a score of 1, and the score of every other matching document should be calculated relative to it, so it will be <= 1. However, the existing document itself should be excluded from the search. Is it possible to do this on the Elasticsearch side, rather than calculating the score manually in a programming language like this:
match_doc_score/search_doc_score
Let's imagine we have an index person with the following mapping:
{
"properties": {
"person_id": {
"type": "keyword"
},
"fullname": {
"type": "text"
},
"email": {
"type": "keyword"
},
"phone": {
"type": "keyword"
},
"country_of_birth": {
"type": "keyword"
}
}
}
And I have 3 persons inside the index:
Person 1:
{
"person_id": 1,
"fullname": "John Snow",
"email": "john#gmail.com",
"phone": "111-11-11",
"country_of_birth": "Denmark"
}
Person 2:
{
"person_id": 2,
"fullname": "Snow John",
"email": "john#gmail.com",
"phone": "222-22-22",
"country_of_birth": "Denmark"
}
Person 3:
{
"person_id": 3,
"fullname": "Peter Wislow",
"email": "peter#gmail.com",
"phone": "111-11-11",
"country_of_birth": "Denmark"
}
We find persons that are similar to Person 1 by this query:
{
"query": {
"bool": {
"should": [
{
"match": {
"fullname": {
"query": "John Snow",
"boost": 6
}
}
},
{
"term": {
"email": {
"value": "john#gmail.com",
"boost": 5
}
}
},
{
"term": {
"phone": {
"value": "111-11-11",
"boost": 4
}
}
},
{
"term": {
"country_of_birth": {
"value": "Denmark",
"boost": 2
}
}
}
],
"must_not": [
{
"term": {
"person_id": "1"
}
}
]
}
}
}
As you can see:
Person 1 and Person 2 match on fullname, email and country of birth.
Person 1 and Person 3 match on phone and country of birth.
Is it possible to get 0..1 scoring relative to the document in the index that is a full match (Person 1)?
I know there is a more_like_this query, but real-life search queries can be complicated, so more_like_this is not a good option. Even the Elasticsearch documentation says that if you need more control over the query, you should use bool query combinations.
I have not tried it, but the field_value_factor function of the function_score query looks like it might solve this.
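The asker's fallback formula (match_doc_score / search_doc_score) can at least be made predictable. A minimal Python sketch, assuming each should clause is wrapped in a constant_score query (so a matching clause contributes exactly its boost) and Elasticsearch 6 or later (no coord factor, so the bool query sums the scores of its matching clauses). Under those assumptions the reference document scores the sum of all boosts, and every other hit can be divided by it. `constant_score` and `relative_scores` here are hypothetical helpers, not Elasticsearch APIs.

```python
# Boost of each should clause in the query above. Assumption: every clause is
# wrapped in constant_score, so a matching clause contributes exactly its boost
# and the bool query adds up the contributions.
BOOSTS = {"fullname": 6, "email": 5, "phone": 4, "country_of_birth": 2}


def constant_score(matched_fields):
    """Score a document would get under the constant_score assumption."""
    return sum(BOOSTS[f] for f in matched_fields)


def relative_scores(hit_scores, reference_score):
    """Divide each hit's score by the reference document's own score,
    giving values in 0..1."""
    return {doc_id: score / reference_score for doc_id, score in hit_scores.items()}


# Person 1 matches all of its own clauses, so its score is the maximum.
reference = constant_score(BOOSTS.keys())
hits = {
    2: constant_score(["fullname", "email", "country_of_birth"]),
    3: constant_score(["phone", "country_of_birth"]),
}
print(reference)                        # -> 17
print(relative_scores(hits, reference))
```

Without constant_score, the match clause's score depends on term statistics, so the reference score would have to come from running the query once without the must_not clause and reading Person 1's score.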

Query elasticsearch nested field by index (order of insertion)

I have an Elasticsearch document with some nested objects (mapped as a nested field),
for example:
{
"FirstName": "Test",
"LastName": "Test",
"Cost": 322.54,
"Email": "test#test.com",
"Vehicles": [
{
"Year": 2000,
"Make": "Mazda",
"Model": "6"
},
{
"Year": 2012,
"Make": "Ford",
"Model": "F150"
}
]
}
I am trying to run aggregations on a specific index of the array; for example, I want to sum the cost of documents that have a Ford as the make, but only for the first vehicle.
Is this possible at all? There is almost no information on the internet about Elasticsearch nested fields, and nothing about their index/order.
It is possible to achieve what you want, but because the position of a nested object within the array is not itself a searchable field, you need to add the index order as an explicit field inside your nested documents:
{
"FirstName": "Test",
"LastName": "Test",
"Cost": 322.54,
"Email": "test#test.com",
"Vehicles": [
{
"Year": 2000,
"Make": "Mazda",
"Model": "6",
"Index": 0
},
{
"Year": 2012,
"Make": "Ford",
"Model": "F150",
"Index": 1
}
]
}
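Since the position has to be stored explicitly, it is easiest to add it on the client just before indexing. A minimal Python sketch, where `add_vehicle_index` is a hypothetical helper and the document is the sample above as a dict:

```python
import copy


def add_vehicle_index(doc):
    """Return a copy of the document where each nested vehicle carries its
    0-based position in the array as an explicit "Index" field."""
    out = copy.deepcopy(doc)  # leave the caller's document untouched
    for position, vehicle in enumerate(out.get("Vehicles", [])):
        vehicle["Index"] = position
    return out


doc = {
    "FirstName": "Test",
    "LastName": "Test",
    "Cost": 322.54,
    "Email": "test#test.com",
    "Vehicles": [
        {"Year": 2000, "Make": "Mazda", "Model": "6"},
        {"Year": 2012, "Make": "Ford", "Model": "F150"},
    ],
}
print(add_vehicle_index(doc)["Vehicles"])
```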
And then you can query your index using the two conditions on Index and the Make like this:
{
"query": {
"nested": {
"path": "Vehicles",
"query": {
"bool": {
"filter": [
{
"match": {
"Vehicles.Index": 0
}
},
{
"match": {
"Vehicles.Make": "Ford"
}
}
]
}
}
}
}
}
With the sample document above, this query returns no results, as expected, because the Ford is the second vehicle (Index 1), not the first.
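The question also asks for the sum of Cost. One way to get it is to keep the same nested query, set size to 0, and add a top-level sum aggregation on Cost, which only covers the documents matched by the query. Sketched here as a Python dict that could be sent as the request body with any client; the helper name `first_vehicle_cost_query` and the aggregation name `total_cost` are illustrative.

```python
import json


def first_vehicle_cost_query(make, position=0):
    """Build a search body matching documents whose vehicle at the given
    array position has the given make, and summing Cost over them."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {
            "nested": {
                "path": "Vehicles",
                "query": {
                    "bool": {
                        "filter": [
                            {"match": {"Vehicles.Index": position}},
                            {"match": {"Vehicles.Make": make}},
                        ]
                    }
                },
            }
        },
        # Runs on the parent documents matched by the nested query above.
        "aggs": {"total_cost": {"sum": {"field": "Cost"}}},
    }


print(json.dumps(first_vehicle_cost_query("Ford"), indent=2))
```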

Elasticsearch: Retrieving filtered and unfiltered count in one request

I am using the following mapping in one of my ElasticSearch indices:
"mappings": {
"my-mapping": {
"properties": {
"id": {
"type": "keyword"
},
"groupId": {
"type" : "keyword"
},
"title": {
"type": "text"
}
}
}
}
I now want to count elements matching a search string that may be present inside "title", grouped by my groupId. I can achieve that using aggregations and buckets:
/indexname/_search
{
"query" : {
"term" : {
"title" : "sky"
}
},
"aggs": {
"filtered_buckets": {
"terms": {
"field": "groupId"
}
}
}
}
Additionally, I want to know the count of all elements regardless of the filter. I could simply achieve that using a search without a query:
/indexname/_search
{
"aggs": {
"filtered_buckets": {
"terms": {
"field": "groupId"
}
}
}
}
The current problem is: is there any way to generate aggregation data containing both the filtered count and the unfiltered count, for only those groups that had a filtered hit, in one request?
For example:
"buckets": [
{
"key": "257786",
"doc_count": 3024,
"filtered_doc_count" : 202
},
{
"key": "254640",
"doc_count": 3010,
"filtered_doc_count" : 1
},
{
"key": "252256",
"doc_count": 2367,
"filtered_doc_count" : 5
},
...
]
One way I see is to split this into two requests: first request all filtered buckets (their IDs), then request the counts of those specific buckets using "terms" : { "id" : ["4", "65", "404"] }. This is not very nice, and I don't want to send two requests (_msearch does not help here).
A second, equally unappealing solution would be to store the total counts somewhere on all of my entities.
Is there any way to achieve what I described in a single request?
PS: Please correct me, if the question is unclear.
Based on these:
How to filter terms aggregation
http://nocf-www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
I made this:
PUT test
{
"mappings": {
"type1": {
"properties": {
"id": {
"type": "keyword"
},
"groupId": {
"type" : "keyword"
},
"title": {
"type": "text"
}
}
}
}
}
PUT test/type1/1
{
"id":1,
"groupId": 1,
"title": "asd"
}
PUT test/type1/2
{
"id":2,
"groupId": 1,
"title": "sky"
}
PUT test/type1/3
{
"id":3,
"groupId": 2,
"title": "sky"
}
PUT test/type1/4
{
"id":4,
"groupId": 2,
"title": "sky"
}
PUT test/type1/5
{
"id":5,
"groupId": 2,
"title": "sky"
}
POST test/type1/_search
{
"aggs": {
"categories-filtered": {
"filter": {"term": {"title": "sky"}},
"aggs": {
"names": {
"terms": {"field": "groupId"}
}
}
},
"categories": {
"terms": {"field": "groupId"}
}
}
}
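The response still contains the two aggregations side by side, so producing the bucket list from the question (doc_count plus filtered_doc_count, only for groups with a filtered hit) takes a small merge on the client. A minimal Python sketch, assuming `aggs` is the parsed `aggregations` object of the response above and groupId is mapped as keyword (so bucket keys are strings); `merge_counts` is a hypothetical helper, and the sample `aggs` below is written by hand from the five test documents, not captured from a real response. Note that a terms aggregation returns only its top 10 buckets by default, so a group may be missing from the unfiltered side unless `size` is raised.

```python
def merge_counts(aggs):
    """Combine the unfiltered and filtered terms buckets into one list,
    keeping only the groups that had at least one filtered hit."""
    totals = {b["key"]: b["doc_count"] for b in aggs["categories"]["buckets"]}
    merged = []
    for bucket in aggs["categories-filtered"]["names"]["buckets"]:
        merged.append({
            "key": bucket["key"],
            # None if the group fell outside the unfiltered top-N buckets.
            "doc_count": totals.get(bucket["key"]),
            "filtered_doc_count": bucket["doc_count"],
        })
    return merged


# Hand-written from the test documents:
# groupId 1 has 2 docs (1 with "sky"), groupId 2 has 3 docs (all with "sky").
aggs = {
    "categories": {"buckets": [
        {"key": "2", "doc_count": 3},
        {"key": "1", "doc_count": 2},
    ]},
    "categories-filtered": {"doc_count": 4, "names": {"buckets": [
        {"key": "2", "doc_count": 3},
        {"key": "1", "doc_count": 1},
    ]}},
}
print(merge_counts(aggs))
```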

elasticsearch search results with sub query

I'm getting started with Elasticsearch and not sure if this is possible with one query, along with pagination. I have an index with two types: user & blog. Example mapping:
"mappings": {
"user": {
"properties": {
"name" : { "type": "string" }
}
},
"blog": {
"properties": {
"title" : { "type": "string" },
"author_name" : { "type": "string" }
}
}
}
sample data
user:
[
{"name": "jemmy"},
{"name": "Tom"}
]
blog:
[
{"title": "foo bar", "author": "jemmy"},
{"title": "magic foo", "author": "Tom"},
{"title": "bigdata for dummies", "author": "Tom"},
{"title": "elasticsearch", "author": "Tom"},
{"title": "JS cookbook", "author": "jemmy"}
]
I'd like to query the index in such a way that, when I search for blogs, it runs a subquery for each match. For example:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
}
}
Expected (pseudo) results:
[
{
title: "foo bar",
author_name: "Jemmy",
author_post_count: 2
},
{
title: "magic foo",
author_name: "Tom",
author_post_count: 3
}
]
Here author_post_count is the number of blog posts the user has authored. If it could return those blog posts instead of the count, that would be great too. Is this possible? Perhaps the terminology I'm using is not right, but I hope my question is clear.
Try something like this:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
},
"aggs": {
"counting_posts": {
"global": {},
"aggs": {
"authors": {
"terms": {
"field": "author",
"size": 10
}
}
}
}
}
}
Be careful with the terms aggregation, though: it works on the terms as they are stored in the index after analysis, not on the values you actually indexed (they may be lowercased or split into tokens, depending on the analyzer).
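Elasticsearch does not attach the count to each hit itself: the global aggregation returns per-author totals next to the hits, and the two have to be joined on the client. A minimal Python sketch, assuming `hits` and `buckets` were taken from the response above and that `author` is not analyzed (otherwise the bucket keys would be lowercased tokens, as warned above); `attach_post_counts` is a hypothetical helper and the data is hand-written from the sample, not a captured response.

```python
def attach_post_counts(hits, buckets):
    """Add author_post_count to each hit's _source using the terms buckets
    from the global aggregation (key = author, doc_count = number of posts)."""
    counts = {b["key"]: b["doc_count"] for b in buckets}
    return [
        # 0 if the author fell outside the aggregation's top "size" buckets.
        {**hit["_source"], "author_post_count": counts.get(hit["_source"]["author"], 0)}
        for hit in hits
    ]


hits = [
    {"_source": {"title": "foo bar", "author": "jemmy"}},
    {"_source": {"title": "magic foo", "author": "Tom"}},
]
buckets = [{"key": "Tom", "doc_count": 3}, {"key": "jemmy", "doc_count": 2}]
print(attach_post_counts(hits, buckets))
```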

Elasticsearch OR filtered query does not return results

I have the following data set:
{
"_index": "myIndex",
"_type": "myType",
"_id": "220005",
"_score": 1,
"_source": {
"id": "220005",
"name": "Some Name",
"type": "myDataType",
"doc_as_upsert": true
}
}
Doing a direct match query like so:
GET typo3data/destination/_search
{
"query": {
"match": {
"name": "Some Name"
}
},
"size": 500
}
Will return the data just fine:
"hits": {
"total": 1,
"max_score": 3.442347,
"hits": [...
Doing an OR query, however (I am not sure which syntax is correct: the first is taken from the Elasticsearch docs, the second is a working query taken from another project using the same versions):
GET typo3data/destination/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"or": {
"filters": [
{
"term": {
"name": "Some Name"
}
}
]
}
}
}
},
"size": 500
}
or
{
"query":
{
"match_all": {}
},
"filter":
{
"or":
[
{ "term": { "name": "Some Name"} },
{ "term": { "name": "Some Other Name"} }
]
},
"size": 1000
}
Neither returns anything.
The mapping for the name field is:
"name": {
"type": "string",
"index": "not_analyzed"
}
Elasticsearch version is 1.4.4.
When "Some Name" is indexed into an analyzed field, it is broken into tokens as follows:
"Some Name" => [ "some", "name" ]
A normal match query applies the same analysis to the query text before matching, so a document qualifies as a result if either "some" or "name" is present:
match query ("Some Name") => search for term "some" or "name"
The term query does not analyze or tokenize your query. This means that it looks for the exact token or term "Some Name", which is not present.
term query ("Some Name") => search for term "Some Name"
Hence you won't see any results.
Things should work fine if you make the field not_analyzed, but then make sure the case also matches.
You can read more about this here.
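The difference can be illustrated with a rough Python approximation of the standard analyzer (lowercase, then split into alphanumeric words; the real analyzer follows Unicode word-boundary rules, so this is only a sketch). A match query analyzes its input the same way as the indexed text, while a term query looks up its input verbatim.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase and split into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_query(indexed_terms, query_text):
    """A match query analyzes the query text and matches if any term is indexed."""
    return any(term in indexed_terms for term in analyze(query_text))


def term_query(indexed_terms, value):
    """A term query looks up the value exactly as given, with no analysis."""
    return value in indexed_terms


analyzed_field = set(analyze("Some Name"))  # {"some", "name"}
not_analyzed_field = {"Some Name"}          # the whole value is a single term

print(match_query(analyzed_field, "Some Name"))     # True
print(term_query(analyzed_field, "Some Name"))      # False
print(term_query(not_analyzed_field, "Some Name"))  # True
print(term_query(not_analyzed_field, "some name"))  # False: case must match
```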
After extending our mapping to include every field we have:
PUT typo3data/_mapping/destination
{
"destination": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"parentId": {
"type": "integer"
},
"type": {
"type": "string"
},
"generatedUid": {
"type": "integer"
}
}
}
}
the or-filters worked. So the general answer is: if you run into such a problem, check your mappings closely, and err on the side of doing too much work on them rather than too little.
If someone has an explanation why this might be happening, I will gladly pass the answer mark on to it.
