elasticsearch search results with sub query - elasticsearch

I'm getting started with Elasticsearch and am not sure whether this is possible in one query, along with pagination. I have an index with two types: user and blog. Example mapping:
"mappings": {
"user": {
"properties": {
"name" : { "type": "string" }
}
},
"blog": {
"properties": {
"title" : { "type": "string" },
"author_name" : { "type": "string" }
}
}
}
}
sample data
user:
[
{"name": "jemmy"},
{"name": "Tom"}
]
blog:
[
{"title": "foo bar", "author": "jemmy"},
{"title": "magic foo", "author": "Tom"},
{"title": "bigdata for dummies", "author": "Tom"},
{"title": "elasticsearch", "author": "Tom"},
{"title": "JS cookbook", "author": "jemmy"}
]
I'd like to query the index in such a way that when I search for blogs, it runs a subquery on each match. For example:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
}
}
Expected (pseudo) results:
[
{
title: "foo bar",
author_name: "Jemmy",
author_post_count: 2
},
{
title: "magic foo",
author_name: "Tom",
author_post_count: 3
}
]
Here author_post_count is the number of blog posts that the user has authored. If it could return those blog posts instead of the count, that would be great too. Is this possible? Perhaps the terms I'm using are not right, but I hope my question is clear.

Try something like this:
POST /test_index/blog/_search
{
"query": {
"match": {
"_all": "foo"
}
},
"aggs": {
"counting_posts": {
"global": {},
"aggs": {
"authors": {
"terms": {
"field": "author",
"size": 10
}
}
}
}
}
}
Be careful with the terms aggregation, though: it works on the tokens actually stored in the index, not on the original values you sent. If author is an analyzed string field, the bucket keys will be lowercased and multi-word values will be split into separate tokens.
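The aggregation returns the per-author counts next to the hits rather than inside them, so the two still have to be joined on the client. A minimal Python sketch of that join, assuming the response shape of the query above; the function name attach_post_counts and the sample response are illustrative and hand-written, not captured from a cluster:

```python
def attach_post_counts(response):
    """Merge search hits with the global per-author buckets."""
    buckets = response["aggregations"]["counting_posts"]["authors"]["buckets"]
    # Bucket keys are index terms, so they arrive lowercased for an analyzed field.
    counts = {b["key"]: b["doc_count"] for b in buckets}
    results = []
    for hit in response["hits"]["hits"]:
        src = hit["_source"]
        results.append({
            "title": src["title"],
            "author_name": src["author"],
            # Assumption: lowercasing only lines up for single-token author names.
            "author_post_count": counts.get(src["author"].lower(), 0),
        })
    return results


# Hand-written response in the shape Elasticsearch returns for the query above.
sample_response = {
    "hits": {"hits": [
        {"_source": {"title": "foo bar", "author": "jemmy"}},
        {"_source": {"title": "magic foo", "author": "Tom"}},
    ]},
    "aggregations": {"counting_posts": {"doc_count": 5, "authors": {"buckets": [
        {"key": "tom", "doc_count": 3},
        {"key": "jemmy", "doc_count": 2},
    ]}}},
}

merged = attach_post_counts(sample_response)
print(merged)
```

Because the global aggregation ignores the query, the counts cover all posts by each author, not only the posts that matched "foo".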

Related

Aggregate by property on parent document with Elasticsearch join field

I have an Elasticsearch index that uses a join type field to relate two types of indexed documents to each other via a parent-child relation: posts which are parents of comments.
posts have a category keyword field, and comments belong to posts. I would like to find the number of comments in each post category, like so:
// what query do I need to get this result?
{
"aggregations" : {
"comment-counts-by-post-category" : {
"buckets" : [
{
"key" : "Dogs",
"doc_count" : 2
},
{
"key" : "Cats",
"doc_count" : 1
}
]
}
}
}
Here is a complete example:
I have an index with the following mapping:
PUT posts-index/
{
"mappings": {
"properties": {
"post": {
"type": "object",
"properties": {
"category": {
"type": "keyword"
}
}
},
"text": {
"type": "keyword"
},
"post_comment_join": {
"type": "join",
"relations": {
"post": "comment"
}
}
}
}
}
I create two posts, one in the Dogs category, and one in the Cats category:
PUT posts-index/_doc/post-1
{
"text": "this is a dog post",
"post": {
"category": "Dogs"
},
"post_comment_join": {
"name": "post"
}
}
PUT posts-index/_doc/post-2
{
"text": "this is a cat post",
"post": {
"category": "Cats"
},
"post_comment_join": {
"name": "post"
}
}
Then, I create a few comments (in this case, 2 on the dog post and 1 on the cat post)
PUT posts-index/_doc/comment-1?routing=1&refresh
{
"text": "this is comment 1 for post 1",
"post_comment_join": {
"name": "comment",
"parent": "post-1"
}
}
PUT posts-index/_doc/comment-2?routing=1&refresh
{
"text": "this is comment 2 for post 1",
"post_comment_join": {
"name": "comment",
"parent": "post-1"
}
}
PUT posts-index/_doc/comment-3?routing=1&refresh
{
"text": "this is a comment 1 for post 2",
"post_comment_join": {
"name": "comment",
"parent": "post-2"
}
}
I can search for all comment documents using a has_parent query:
POST posts-index/_search
{
"query": {
"has_parent": {
"parent_type": "post",
"query": {
"match_all": {}
}
}
}
}
{
"hits": {
"total": {
"value": 3,
"relation": "eq"
},
"max_score": 1.0,
"hits": [ /* returns the 3 comments */ ]
}
}
What I can't figure out is how to find the number of comments in each category.
I've looked into Parent Aggregations, but they seem to only allow you to aggregate based on the type of the parent. In this case, all parents are of type post, so that doesn't help.
I've also tried using a basic terms aggregation using the join_field#parent_field syntax:
POST posts-index/_search
{
"query": {
"has_parent": {
"parent_type": "post",
"query": {
"match_all": {}
}
}
},
"aggs": {
"agg-by-post-category": {
"terms": {
"field": "post_comment_join#post.category"
}
}
}
}
// returns { "buckets": [] } in the aggs
Unfortunately, this returns no results. It seems as though the post_comment_join#post syntax can be used to aggregate by parent doc, but not by an attribute on the parent doc. (i.e., by the _id field of a post, but not by post.category)
Can anyone help me figure out the right aggs syntax to return all comments grouped by their parent post's category?
Again, here is the result I'm looking for:
{
"aggregations" : {
"comment-counts-by-post-category" : {
"buckets" : [
{
"key" : "Dogs",
"doc_count" : 2
},
{
"key" : "Cats",
"doc_count" : 1
}
]
}
}
}
Platform details
Amazon Opensearch service version 7.9
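You can use either of the two queries below. The first returns every post that has comments, with the per-post comment total in inner_hits; the second uses a children aggregation to count comments per category.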
GET posts-index/_search
{
"query": {
"has_child": {
"type": "comment",
"inner_hits": {
"_source": false,
"size": 0
},
"query": {
"match_all": {}
}
}
}
}
GET posts-index/_search
{
"aggs": {
"top-tags": {
"terms": {
"field": "post.category",
"size": 10
},
"aggs": {
"to-answers": {
"children": {
"type": "comment"
},
"aggs": {
"comments-count": {
"value_count": {
"field": "text"
}
}
}
}
}
}
}
}
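To get from the second query's response to the shape asked for in the question, the nested buckets need a small amount of unpacking. A minimal Python sketch, assuming the aggregation names used above (top-tags, to-answers); the function name and the sample response are illustrative and hand-written, not output from a real cluster:

```python
def comment_counts_by_category(response):
    """Map each post category to the number of child comment documents."""
    buckets = response["aggregations"]["top-tags"]["buckets"]
    # The outer doc_count counts posts; the children bucket's doc_count counts comments.
    return {b["key"]: b["to-answers"]["doc_count"] for b in buckets}


# Hand-written response in the shape a terms + children aggregation returns.
sample_response = {
    "aggregations": {"top-tags": {"buckets": [
        {"key": "Dogs", "doc_count": 1,
         "to-answers": {"doc_count": 2, "comments-count": {"value": 2}}},
        {"key": "Cats", "doc_count": 1,
         "to-answers": {"doc_count": 1, "comments-count": {"value": 1}}},
    ]}}
}

counts = comment_counts_by_category(sample_response)
print(counts)
```

Note that the doc_count on each category bucket is the number of posts in that category, which is why the comment count has to be read from the children sub-aggregation.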

Search for flattened field existence in ElasticSearch

I'm storing an arbitrary nested object as a flattened field "_meta" which contains various information related to a product.
Here is the mapping for that field:
"mappings": {
"dynamic": "strict",
"properties": {
"_meta": {
"type": "flattened"
},
...
So when trying to search for:
{
"query": {
"exists": {
"field": "_meta.user"
}
}
}
I'm expecting to retrieve all documents that have that field populated. I get zero hits, although if I search for a particular document, I can see that at least one document has that field populated:
"user": {
"origin_title": "some title",
"origin_title_en": "some other title",
"address": "some address",
"performed_orders_count": 0,
"phone": "some phone",
"name": "some name",
"tariff": null,
"proposal_image_background_color": null
},
So how exactly does searching through a flattened data field work?
Why am I not getting any results?
TL;DR
It is because of the way flattened fields work.
In your case:
{
"_meta":{
"user": {
"name": "some name"
}
}
}
The representations Elasticsearch makes available for searching are:
{
"_meta": ["some name"],
"_meta.user.name": "some name"
}
To reproduce
For the set up:
PUT /74025685/
{
"mappings": {
"dynamic": "strict",
"properties": {
"_meta":{
"type": "flattened"
}
}
}
}
POST /_bulk
{"index":{"_index":"74025685"}}
{"_meta":{"user": "some user"}}
{"index":{"_index":"74025685"}}
{"_meta":{"user": null, "age": 10}}
{"index":{"_index":"74025685"}}
{"_meta":{"user": ""}}
{"index":{"_index":"74025685"}}
{"_meta":{"user": {"username": "some user"}}}
This query is going to find 2 records:
GET 74025685/_search
{
"query": {
"term": {
"_meta": {
"value": "some user"
}
}
}
}
This one is only going to match the first document:
GET 74025685/_search
{
"query": {
"term": {
"_meta.user": {
"value": "some user"
}
}
}
}
The same applies to the exists query.
This one will only return the last document:
GET 74025685/_search
{
"query": {
"exists": {
"field": "_meta.user.username"
}
}
}
Whereas this one is going to return the 1st and 3rd:
GET 74025685/_search
{
"query": {
"exists": {
"field": "_meta.user"
}
}
}
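The results of the four queries above follow from one rule: a flattened field only indexes leaf values, once under the root field and once under the full dotted key. A simplified Python model of that rule, not Elasticsearch's actual implementation; the helper names are made up for illustration and arrays are not handled:

```python
def leaf_values(obj, prefix=""):
    """Yield (dotted_key, value) for every non-null leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from leaf_values(value, path)
        elif value is not None:
            # Leaf values are treated as strings (keywords) in this model.
            yield path, str(value)


def exists(doc, field, root="_meta"):
    """True if some leaf sits exactly at the given dotted key."""
    key = field[len(root) + 1:]
    return any(path == key for path, _ in leaf_values(doc[root]))


def term(doc, field, value, root="_meta"):
    """Match against the root field (any leaf) or against one dotted key."""
    leaves = list(leaf_values(doc[root]))
    if field == root:
        return any(v == value for _, v in leaves)
    return (field[len(root) + 1:], value) in leaves


docs = [
    {"_meta": {"user": "some user"}},
    {"_meta": {"user": None, "age": 10}},
    {"_meta": {"user": ""}},
    {"_meta": {"user": {"username": "some user"}}},
]


def matching(predicate):
    """Return the 1-based positions of the documents that match."""
    return [i for i, d in enumerate(docs, 1) if predicate(d)]


print(matching(lambda d: term(d, "_meta", "some user")))
print(matching(lambda d: term(d, "_meta.user", "some user")))
print(matching(lambda d: exists(d, "_meta.user.username")))
print(matching(lambda d: exists(d, "_meta.user")))
```

In the original question, user is an object, so no leaf sits at the key user itself and exists on _meta.user finds nothing; querying one of its leaves, such as _meta.user.name, would match.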

How to boost specific terms in elastic search?

If I have the following mapping:
PUT /book
{
"settings": {},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"author": {
"type": "text"
}
}
}
}
How can I boost specific authors higher than others?
In case of the below example:
PUT /book/_doc/1
{
"title": "car parts",
"author": "john smith"
}
PUT /book/_doc/2
{
"title": "car",
"author": "bob bobby"
}
PUT /book/_doc/3
{
"title": "soap",
"author": "sam sammy"
}
PUT /book/_doc/4
{
"title": "car designs",
"author": "joe walker"
}
GET /book/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "car" }},
{ "match": { "title": "parts" }}
]
}
}
}
How do I make sure that books by "joe walker" appear at the top of the search results?
One solution is to make use of function_score.
The function_score allows you to modify the score of documents that are retrieved by a query.
(from the Elasticsearch documentation)
Based on your mapping, try running a query like this:
GET book/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"title": "car"
}
},
{
"match": {
"title": "parts"
}
}
]
}
},
"functions": [
{
"filter": {
"match": {
"author": "joe walker"
}
},
"weight": 30
}
],
"max_boost": 30,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
The query inside function_score is the same should query that you used.
Now we want to take all the results of that query and give more weight (a higher score) to joe walker's books, so that his books are ranked above the others.
To achieve that, we add a function (inside functions) that computes a new score for each document returned by the query that also matches the joe walker filter.
You can experiment with the weight and the other parameters.
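To see what the settings do to the ranking, here is a rough Python sketch of the arithmetic for score_mode max and boost_mode multiply. It is a simplification under stated assumptions: the base query scores are invented for illustration, the author filter is reduced to a token overlap check, and a document that matches no function is assumed to get a neutral function score of 1:

```python
def boosted_score(query_score, author, functions, max_boost):
    """Combine a query score with weight functions (score_mode=max, boost_mode=multiply)."""
    weights = [f["weight"] for f in functions if f["filter"](author)]
    # Assumption: when no function matches, the function score is neutral (1).
    func_score = max(weights) if weights else 1.0
    # max_boost caps the function score before it is combined with the query score.
    func_score = min(func_score, max_boost)
    return query_score * func_score


def author_matches_joe_walker(author):
    """Simplified stand-in for a match query: any shared token counts."""
    return bool({"joe", "walker"} & set(author.lower().split()))


functions = [{"filter": author_matches_joe_walker, "weight": 30}]

# Hypothetical base scores for the three books that match "car" or "parts".
books = {
    "1": ("john smith", 1.2),
    "2": ("bob bobby", 0.6),
    "4": ("joe walker", 0.5),
}

final = {
    doc_id: boosted_score(score, author, functions, max_boost=30)
    for doc_id, (author, score) in books.items()
}
ranking = sorted(final, key=final.get, reverse=True)
print(final)
print(ranking)
```

Because the boost multiplies the query score, a book by joe walker that does not match the query at all (score 0, or not returned) is not pulled into the results; the boost only reorders what the query already found.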
Hope it helps

Elasticsearch OR filtered query does not return results

I have the following data set:
{
"_index": "myIndex",
"_type": "myType",
"_id": "220005",
"_score": 1,
"_source": {
"id": "220005",
"name": "Some Name",
"type": "myDataType",
"doc_as_upsert": true
}
}
Doing a direct match query like so:
GET typo3data/destination/_search
{
"query": {
"match": {
"name": "Some Name"
}
},
"size": 500
}
Will return the data just fine:
"hits": {
"total": 1,
"max_score": 3.442347,
"hits": [...
Doing an OR-query however (I am not sure which syntax is correct, the first syntax is taken from elasticsearch docs, the second is a working query taken from another project with the same versions):
GET typo3data/destination/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"or": {
"filters": [
{
"term": {
"name": "Some Name"
}
}
]
}
}
}
},
"size": 500
}
or
{
"query":
{
"match_all": {}
},
"filter":
{
"or":
[
{ "term": { "name": "Some Name"} },
{ "term": { "name": "Some Other Name"} }
]
},
"size": 1000
}
Does not return anything.
The mapping for the name field is:
"name": {
"type": "string",
"index": "not_analyzed"
}
Elasticsearch version is 1.4.4.
When "some name" is indexed into an analyzed field, it is broken into tokens as follows:
"some name" => [ "some" , "name" ]
A normal match query runs the same analysis on the search text before matching. If either "some" or "name" is present, the document qualifies as a result.
match query ("some name") => search for term "some" or "name"
The term query does not analyze or tokenize your query. This means that it looks for an exact token or term of "some name", which is not present.
term query ("some name") => search for term "some name"
Hence you won't see any results.
Things should work fine if you make the field not_analyzed, but then make sure the case also matches.
You can read more about the same here.
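The difference between the two query types can be mimicked in a few lines of Python. This is only a rough model: analyze is a stand-in for the standard analyzer (lowercase, split on whitespace), and all function names are made up for illustration:

```python
def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def index_terms(value, analyzed):
    """Terms stored in the index: tokens if analyzed, the whole value otherwise."""
    return analyze(value) if analyzed else [value]


def match_query(terms, text, analyzed):
    """match analyzes the search text the same way the field was analyzed (OR semantics)."""
    query_terms = analyze(text) if analyzed else [text]
    return any(t in terms for t in query_terms)


def term_query(terms, text):
    """term looks the search text up as-is, with no analysis."""
    return text in terms


analyzed_terms = index_terms("Some Name", analyzed=True)
exact_terms = index_terms("Some Name", analyzed=False)

print(analyzed_terms)
print(match_query(analyzed_terms, "Some Name", analyzed=True))
print(term_query(analyzed_terms, "Some Name"))
print(term_query(exact_terms, "Some Name"))
print(term_query(exact_terms, "some name"))
```

So if a term filter on name returns nothing while match works, the likely explanation is that the field is still analyzed in the live mapping, whatever the intended mapping says.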
After extending our mapping to include every field we have:
PUT typo3data/_mapping/destination
{
"someType": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
},
"parentId": {
"type": "integer"
},
"type": {
"type": "string"
},
"generatedUid": {
"type": "integer"
}
}
}
}
The or-filters were working. So the general answer is: If you have such a problem, check your mappings closely and rather do too much work on them than too little.
If someone has an explanation why this might be happening, I will gladly pass the answer mark on to it.

elasticsearch nested query, i think

I come from a relational database background, where something like this would be simple, but I can't figure it out here. I've been trying to learn Elasticsearch for a week or so and I'm trying to figure out what I think is a nested query. Here's some sample data:
PUT /myindex/pets/_mapping
{
"pets": {
"properties": {
"name": {
"type": "string"
},
"pet": {
"type": "nested",
"properties": {
"name": {"type": "string"}
}
}
}
}
}
POST /myindex/pets/
{"pet": {"name": "rosco"}, "name": "sam smith"}
POST /myindex/pets/
{"pet": {"name": "patches"}, "name": "sam smith"}
POST /myindex/pets
{"pet": {"name": "rosco"}, "name": "jack mitchell"}
What would the query look like that only returns documents matching:
owner name is "sam smith"
pet name is "rosco"
I've tried a mix of bool, match, nested, and filtered/filter queries, but I just keep getting errors. Messages like this stand out in the errors:
nested: ElasticsearchParseException[Expected field name but got START_OBJECT \"nested\"];
Here was the query:
GET /myindex/pets/_search
{
"query": {
"match": {
"name": "sam smith"
},
"nested": {
"path": "pet",
"query": {
"match": {
"pet.name": "rosco"
}
}
}
}
}
I'm beginning to think that I just can't target something this specific due to the relevance-based nature of Elasticsearch.
Any ideas?
Man, these queries are tricky sometimes... This seems to work:
GET /myindex/pets/_search
{
"query": {
"filtered": {
"query": {
"match": {
"name": "sam smith"
}
},
"filter": {
"nested": {
"path": "pet",
"query": {
"match": {
"pet.name": "rosco"
}
}
}
}
}
}
}
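The original attempt failed because the query object contained two sibling clauses (match and nested); it may hold only one, so the two conditions have to be wrapped in a compound query. The filtered query used above was removed in Elasticsearch 5.0, and on later versions the usual wrapper is bool. A sketch that builds such a request body in Python; the helper name owner_with_pet is hypothetical and the body has not been run against a cluster here:

```python
import json


def owner_with_pet(owner_name, pet_name):
    """Build a search body: owner name must match, pet name is a nested filter."""
    return {
        "query": {
            # A single top-level clause; bool combines the two conditions.
            "bool": {
                "must": [{"match": {"name": owner_name}}],
                "filter": [{
                    "nested": {
                        "path": "pet",
                        "query": {"match": {"pet.name": pet_name}},
                    }
                }],
            }
        }
    }


body = owner_with_pet("sam smith", "rosco")
print(json.dumps(body, indent=2))
```

Placing the nested clause under filter mirrors the filtered query above: it restricts the results without contributing to the score.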

Resources