I have an index of users with the following structure:
User
book_ids:[] //array of book ids
books : [{
book_id:
name:
}] //array of books
I want to create a query that returns a map from each book id to the number of users who have read it.
The result should also include books that no user has read.
Here is a very simplified version of the query:
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"books",
"query": {
"bool": {
"must": {
"terms": {
"books.book_id": [100,200] // book ids provided as a parameter
}
}
}
}
}
}
]
}
},
"aggs":{
"books":{
"terms":{
"field":"book_ids",
"include":[100,200] // book ids provided as a parameter
}
}
},
"size":0
}
The result of the query is:
buckets: [
{key: 100, doc_count: 53}
]
So 53 users have read the book with id 100, but no user has read the book with id 200 (it is missing from the response).
How can I change the query to get the following result?
buckets: [
{key: 100, doc_count: 53},
{key: 200, doc_count: 0}
]
A terms aggregation does not add a bucket to the result if the given term does not exist in the index.
You can use a filters aggregation for this purpose, with one named filter per book id:
{
"query": {
...
},
"aggs": {
"books": {
"filters": {
"filters": {
"100": { "match": { "book_ids": 100 } },
"200": { "match": { "book_ids": 200 } }
}
}
}
},
"size": 0
}
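Because the book ids arrive as a parameter, the filters map does not have to be written by hand. Below is a minimal Python sketch, assuming the request body is assembled as a plain dict before it is sent. The function name build_book_count_aggs is hypothetical, and term is used instead of match because the ids are exact values.

```python
import json


def build_book_count_aggs(book_ids):
    """Build a filters aggregation with one named bucket per requested book id."""
    return {
        "books": {
            "filters": {
                "filters": {
                    # The bucket name is the book id as a string, so a book
                    # with no readers still comes back with doc_count 0.
                    str(book_id): {"term": {"book_ids": book_id}}
                    for book_id in book_ids
                }
            }
        }
    }


# Example request body for the two ids used in the question.
body = {"size": 0, "aggs": build_book_count_aggs([100, 200])}
print(json.dumps(body, indent=2))
```

The query part of the request is left out here; it can be added to body unchanged.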
To reproduce:
# index some book ids; id 5 is deliberately missing
POST /_bulk
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [1, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [4, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [6, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [7, 2, 3] }
GET /72201832/_search
{
"size": 0,
"aggs": {
"books": {
"filters": {
"filters": {
"1": { "term": {"book_ids": "1"} },
"2": { "term": {"book_ids": "2"} },
"3": { "term": {"book_ids": "3"} },
"4": { "term": {"book_ids": "4"} },
"5": { "term": {"book_ids": "5"} },
"6": { "term": {"book_ids": "6"} },
"7": { "term": {"book_ids": "7"} }
}
}
}
}
}
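With named filters, the buckets come back as an object keyed by filter name rather than as a list. The Python sketch below flattens such a response into the id-to-count map the question asks for. The sample_response is hand-written for illustration (its counts are derived from the four documents indexed above, not captured from a live cluster), and book_counts is a hypothetical helper name.

```python
def book_counts(response, agg_name="books"):
    """Map each bucket name to its doc_count for a keyed filters aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {int(key): bucket["doc_count"] for key, bucket in buckets.items()}


# Hand-written stand-in for the search response (an assumption, not real output).
sample_response = {
    "aggregations": {
        "books": {
            "buckets": {
                "1": {"doc_count": 1},
                "2": {"doc_count": 4},
                "3": {"doc_count": 4},
                "4": {"doc_count": 1},
                "5": {"doc_count": 0},
                "6": {"doc_count": 1},
                "7": {"doc_count": 1},
            }
        }
    }
}

counts = book_counts(sample_response)
print(counts[5])  # book 5 is present in the map with a count of 0
```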
I am attempting to update a rather sizable Elasticsearch query. I am very new to Elasticsearch and am having a hard time wrapping my head around everything that is happening here.
This is the original query:
elasticsearch_query search_models, {
query: {
filtered: {
query: {
function_score: {
query: {
bool: {
must: [
{
multi_match: {
operator: "and",
type: "cross_fields",
query: params[:term],
fuzziness: (params[:fuzzy] || 0),
fields: [
"name^2", "address", "email", "email2",
"primary_contact", "id", "lotname^7",
"lotname_keyword^9", "corp_name^4",
"vin^4", "serial_number^3", "dba_names",
"title_number", "title_tracking",
"full_name^3", "username", "reference", "user_name",
"phone", "text_number", "name_keyword^9"
],
},
}
],
must_not: [
{ term: { access: { value: 2 } } },
{ term: { ledger: { value: "payroll" } } },
{ term: { ledger: { value: "credit card" } } }
],
should: [
{ term: { active: { value: 1, boost: 100 } } },
{ term: { active: { value: 2, boost: 50 } } },
{ term: { active: { value: 3, boost: 0.05 } } },
{ term: { active: { value: 4, boost: 50 } } },
{ term: { active: { value: 5, boost: 50 } } },
{ term: { active: { value: 6, boost: 50 } } },
{ term: { branch_id: { value: current_branch.id, boost: 100 } } }
]
}
},
functions: [
{
filter: { term: { auto_declined: 1 } },
boost_factor: 0.3
},
{
filter: { term: { auto_declined: 0 } },
boost_factor: 0.0001
},
{
filter: { term: { access: 1 } }, # current employee
boost_factor: 10
},
{
filter: { term: { access: 0 } },
boost_factor: 0.2
},
{
filter: { term: { unit_status: 1 } }, # current unit
boost_factor: 2
},
{
filter: { type: {value: 'txn'} },
boost_factor: 0.4
}
]
}
},
filter: {
"or" =>
{ filters: [
{ term: { "branch_id" => current_branch.id }},
{ type: { "value" => "auction" }},
{ type: { "value" => "fee_schedule"}},
{ type: { "value" => "unit"}},
{
"and" => [
{ type: { "value" => "user" }},
{ "or" => [
{ term: { "access" => 1 }}
]}
]
}
]}
}
}
}
}
This is where I have gotten to thus far:
elasticsearch_query search_models, {
query: {
bool: {
must: {
function_score: {
query: {
bool: [
must: {
multi_match: {
query: params[:term],
type: "cross_fields",
operator: "and",
fields: [
"name^2", "address", "email", "email2",
"primary_contact", "id", "lotname^7",
"lotname_keyword^9", "corp_name^4",
"vin^4", "serial_number^3", "dba_names",
"title_number", "title_tracking",
"full_name^3", "username", "reference", "user_name",
"phone", "text_number", "name_keyword^9"
]
}
},
must_not: [
{ term: { access: { value: 2 } } },
{ term: { ledger: { value: "payroll" } } },
{ term: { ledger: { value: "credit card" } } }
],
should: [
{ term: { active: { value: 1, boost: 100 } } },
{ term: { active: { value: 2, boost: 50 } } },
{ term: { active: { value: 3, boost: 0.05 } } },
{ term: { active: { value: 4, boost: 50 } } },
{ term: { active: { value: 5, boost: 50 } } },
{ term: { active: { value: 6, boost: 50 } } },
{ term: { branch_id: { value: current_branch.id, boost: 100 } } }
]
]
},
functions: [
{
filter: { term: { auto_declined: 1 } },
weight: 0.3
},
{
filter: { term: { auto_declined: 0 } },
weight: 0.0001
},
{
filter: { term: { access: 1 } }, # current employee
weight: 10
},
{
filter: { term: { access: 0 } },
weight: 0.2
},
{
filter: { term: { unit_status: 1 } }, # current unit
weight: 2
},
{
filter: { type: {value: 'txn'} },
weight: 0.4
}
]
}
},
filter: [
{ term: { branch_id: current_branch.id } },
{ type: { value: "auction" } },
{ type: { value: "fee_schedule"} },
{ type: { value: "unit"} },
bool: {
must: {
bool: {
should: [
{ type: { value: "user" } },
{ term: { access: 1 } }
]
}
}
}
]
}
}
}
I have:
replaced 'filtered' with 'bool' and 'must'
replaced 'boost_factor' with 'weight'
removed 'fuzziness' from the 'cross_fields' type 'multi_match'
attempted to update the 'or' and 'and' logic with newer 'bool' syntax.
The first three changes seem to have resolved their respective errors, but I am stuck on the filter with the 'or' and 'and' logic. I would greatly appreciate some guidance!
This is the error I am receiving:
[400] {"error":{"root_cause":[{"type":"parsing_exception","reason":"[bool] query malformed, no start_object after query name","line":1,"col":61}],"type":"parsing_exception","reason":"[bool] query malformed, no start_object after query name","line":1,"col":61},"status":400}
If any further information would be helpful, please let me know.
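Your query produces invalid JSON, which is why you get the exception. In the Ruby hash, 'bool: [' passes an array where Elasticsearch expects an object, and that '[' is what sits at column 61 of the serialized query. The query below has no JSON errors: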
{
"query": {
"bool": {
"must": {
"function_score": {
"query": {
"bool": {
"must": [{
"multi_match": {
"query": "params[:term]",
"type": "cross_fields",
"operator": "and",
"fields": [
"name^2", "address", "email", "email2",
"primary_contact", "id", "lotname^7",
"lotname_keyword^9", "corp_name^4",
"vin^4", "serial_number^3", "dba_names",
"title_number", "title_tracking",
"full_name^3", "username", "reference", "user_name",
"phone", "text_number", "name_keyword^9"
]
}
}],
"must_not": [
{ "term": { "access": { "value": 2 } } },
{ "term": { "ledger": { "value": "payroll" } } },
{ "term": { "ledger": { "value": "credit card" } } }
],
"should": [
{ "term": { "active": { "value": 1, "boost": 100 } } }
]
}
},
"functions": [
{
"filter": { "term": { "auto_declined": 1 } },
"weight": 0.3
},
{
"filter": { "term": { "auto_declined": 0 } },
"weight": 0.0001
},
{
"filter": { "term": { "access": 1 } },
"weight": 10
},
{
"filter": { "term": { "access": 0 } },
"weight": 0.2
},
{
"filter": { "term": { "unit_status": 1 } },
"weight": 2
},
{
"filter": { "type": {"value": "txn"} },
"weight": 0.4
}
]
}
},
"filter": [
{ "term": { "branch_id": "current_branch.id" } },
{ "type": { "value": "auction" } },
{ "type": { "value": "fee_schedule"} },
{ "type": { "value": "unit"} },
{"bool": {
"must": {
"bool": {
"should": [
{ "type": { "value": "user" } },
{ "term": { "access": 1 } }
]
}
}
}
}
]
}
}
}
Compare it with your version and adjust the curly braces and square brackets accordingly.
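One thing worth checking once the syntax error is gone: every clause in a bool 'filter' array must match, whereas the legacy 'or' filter in the original query needed only one of its branches to match. A possible translation, sketched in Python as a plain dict and assuming the original semantics should be kept, puts the branches in 'should' and the former 'and' in an inner 'must'. The function name and the branch_id argument are hypothetical. The 'type' clauses are copied from the question and may not be available in every Elasticsearch version.

```python
def branch_or_type_filter(branch_id):
    """Express the legacy or/and filter with bool should/must clauses."""
    return {
        "bool": {
            # Any one of these branches is enough, like the old "or" filter.
            "should": [
                {"term": {"branch_id": branch_id}},
                {"type": {"value": "auction"}},
                {"type": {"value": "fee_schedule"}},
                {"type": {"value": "unit"}},
                {
                    # The old "and": a user document that also has access 1.
                    "bool": {
                        "must": [
                            {"type": {"value": "user"}},
                            {"term": {"access": 1}},
                        ]
                    }
                },
            ],
            "minimum_should_match": 1,
        }
    }


# 42 is a placeholder for current_branch.id.
outer_filter = branch_or_type_filter(branch_id=42)
```

The resulting dict would take the place of the whole outer 'filter' array in the query above.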
I have the following document structure:
{
product_name: "Product1",
product_id: 1,
...,
articles: [
{
article_name: 'Article 101',
id: 101,
some_param: 10,
clients: []
},
{
article_name: 'Article 102',
id: 102,
some_param: 11,
clients: [
{
client_id: 10001,
client_name: "some client 1001"
}
...
]
}
]
},
{
product_name: "Product2",
product_id: 2,
...,
articles: [
{
article_name: 'Article 101',
id: 101,
some_param: 10,
clients: []
},
{
article_name: 'Article 102',
id: 102,
some_param: 10,
clients: [
{
client_id: 10001,
client_name: "some client 1001"
}
...
]
}
]
}
I need to get a document (product) ONLY if at least one of its articles matches two conditions (a single article must match both): articles.some_param = 10 AND articles.clients.client_id = 10001
So I need to get only the product with id 2.
I'm currently using this query, which is incorrect (and I know why) because it fetches both documents:
{
"query": {
"bool": {
"filter": [
{
"term": {
"articles.clients.id": 10001
}
},
{
"terms": {
"articles.some_param": 10
}
}
]
}
}
}
How can I write a query that returns only products that have at least one article matching both conditions: articles.some_param = 10 AND articles.clients.client_id = 10001
i.e., one that returns only the product with ID 2?
Something like this:
{
"query": {
"nested": {
"path": "articles",
"query": {
"bool": {
"must": [
{
"term": {
"articles.some_param": {
"value": 10
}
}
},
{
"nested": {
"path": "articles.clients",
"query": {
"term": {
"articles.clients.client_id":{
"value": 10001
}
}
}
}
}
]
}
}
}
}
}
UPDATE:
Try wrapping the second query in a bool:
{
"query": {
"nested": {
"path": "articles",
"query": {
"bool": {
"must": [
{
"term": {
"articles.some_param": {
"value": 10
}
}
},
{
"bool":{
"must" : [
{
"nested": {
"path": "articles.clients",
"query": {
"term": {
"articles.clients.client_id":{
"value": 10001
}
}
}
}
}
]
}
}
]
}
}
}
}
}
P.S. I could be mistaken about the path in the second nested query, as I was not able to test it, so you may need to experiment with that path.
P.P.S. Note that clauses in a filter context do not calculate scores, so use must if you need relevance scoring.
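To see why the original flat query returned both products while a nested query returns only product 2, here is a small Python simulation. It does not call Elasticsearch. It only mimics the two matching semantics on the sample documents from the question, under the assumption that articles and articles.clients are both mapped as nested (which the nested query requires). All function names are hypothetical.

```python
# Sample documents from the question, reduced to the fields that matter.
products = [
    {"product_id": 1, "articles": [
        {"id": 101, "some_param": 10, "clients": []},
        {"id": 102, "some_param": 11, "clients": [{"client_id": 10001}]},
    ]},
    {"product_id": 2, "articles": [
        {"id": 101, "some_param": 10, "clients": []},
        {"id": 102, "some_param": 10, "clients": [{"client_id": 10001}]},
    ]},
]


def flat_match(product, some_param, client_id):
    """Object-style matching: values from all articles are pooled together."""
    params = {a["some_param"] for a in product["articles"]}
    clients = {c["client_id"] for a in product["articles"] for c in a["clients"]}
    return some_param in params and client_id in clients


def nested_match(product, some_param, client_id):
    """Nested-style matching: one article must satisfy both conditions."""
    return any(
        a["some_param"] == some_param
        and any(c["client_id"] == client_id for c in a["clients"])
        for a in product["articles"]
    )


flat_ids = [p["product_id"] for p in products if flat_match(p, 10, 10001)]
nested_ids = [p["product_id"] for p in products if nested_match(p, 10, 10001)]
print(flat_ids, nested_ids)  # [1, 2] [2]
```

Product 1 passes the flat check because some_param 10 and client 10001 both occur somewhere in the document, just not in the same article.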
I have the following document mapping
{
properties: {
id: {
type: 'keyword'
},
rel: {
type: 'nested',
properties: {
type: {
type: 'keyword'
},
...
}
}
}
}
Ultimately I want to plot an x-y chart where the x axis is the count of type t1 and the y axis is the count of type t2. For example, the following documents
{ id: 1, rel: [ { type: t1, ... }, { type: t1, ... }, { type: t2, ... }] }
{ id: 2, rel: [ { type: t1, ... }, { type: t1, ... }] }
{ id: 3, rel: [ { type: t1, ... }, { type: t1, ... }] }
map to three (x, y) points, (2, 1), (2, 0) and (2, 0), which I plot on the x-y plane like this, labeling each coordinate with the number of documents that fall on it:
^
|
| 1
+---2-->
Right now I use the following aggregation:
{
"_source": false,
"aggregations": {
"g1": {
"terms": {
"field": "id",
"size": 10000
},
"aggregations": {
"rel": {
"nested": {
"path": "rel"
},
"aggregations": {
"filter-t1": {
"filter": {
"terms": {
"rel.type": [
"t1"
]
}
}
},
"filter-t2": {
"filter": {
"terms": {
"rel.type": [
"t2"
]
}
}
}
}
}
}
}
}
}
to get the following result
{
"aggregations": {
"g1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1",
"doc_count": 1,
"rel": {
"doc_count": 4942,
"filter-t1": {
"doc_count": 6
},
"filter-t2": {
"doc_count": 20
}
}
},
{
"key": "2",
"doc_count": 1,
"rel": {
"doc_count": 3039,
"filter-t1": {
"doc_count": 6
},
"filter-t2": {
"doc_count": 11
}
}
}
...
and then calculate the number of documents at each coordinate in the API layer.
The problem is that the index holds millions of documents, so fetching every bucket in a single request does not work. I also have not found a way to paginate aggregations; from and size seem to apply only to the returned hits.
Is there a way to achieve what I want in Elasticsearch?
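For reference, the API-layer step described above, counting how many documents fall on each (x, y) coordinate, could look like the following Python sketch. It assumes the bucket shape shown in the response above. The helper name points_histogram is hypothetical, and the sample buckets are hand-built from the three example documents rather than taken from a real response.

```python
from collections import Counter


def points_histogram(buckets):
    """Count documents per (t1 count, t2 count) coordinate."""
    return Counter(
        (b["rel"]["filter-t1"]["doc_count"], b["rel"]["filter-t2"]["doc_count"])
        for b in buckets
    )


# Hand-built buckets for the three example documents (an assumption).
sample_buckets = [
    {"key": "1", "rel": {"filter-t1": {"doc_count": 2}, "filter-t2": {"doc_count": 1}}},
    {"key": "2", "rel": {"filter-t1": {"doc_count": 2}, "filter-t2": {"doc_count": 0}}},
    {"key": "3", "rel": {"filter-t1": {"doc_count": 2}, "filter-t2": {"doc_count": 0}}},
]

histogram = points_histogram(sample_buckets)
print(histogram)  # Counter({(2, 0): 2, (2, 1): 1})
```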
I have a list of customers with this structure:
{
"name" : "Toya Romano",
"hungry" : false,
"date" : 1420090500020
}
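I would like to get the ratio of people who are hungry. How can I do it with an Elasticsearch query? I am running ES 2.3.
This is a rather hacky approach because of this issue, but it should work. A bucket_script has to sit inside a multi-bucket aggregation, so a filters aggregation with a single empty filter is used as a wrapper: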
{
"size": 0,
"aggs": {
"whatever": {
"filters": {
"filters": [{}]
},
"aggs": {
"all_people": {
"filter": {}
},
"hungry_count": {
"filter": {
"term": {
"hungry": true
}
}
},
"hungry_ratio": {
"bucket_script": {
"buckets_path": {
"total_hungry": "hungry_count._count",
"all": "all_people._count"
},
"script": "total_hungry/all"
}
}
}
}
}
}
The result looks like this:
"buckets": [
{
"doc_count": 5,
"all_people": {
"doc_count": 5
},
"hungry_count": {
"doc_count": 3
},
"hungry_ratio": {
"value": 0.6
}
}
]
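The value can be sanity-checked against the two counts in the bucket. Here is a tiny Python sketch that uses the bucket above as a hand-copied dict. The helper name hungry_ratio is hypothetical, and the guard for an empty index is an addition of this sketch, not part of the query.

```python
def hungry_ratio(bucket):
    """Recompute hungry_count / all_people from the bucket's doc counts."""
    total = bucket["all_people"]["doc_count"]
    if total == 0:
        return None  # no documents, so the ratio is undefined
    return bucket["hungry_count"]["doc_count"] / total


# Copied by hand from the result shown above.
bucket = {
    "doc_count": 5,
    "all_people": {"doc_count": 5},
    "hungry_count": {"doc_count": 3},
}
print(hungry_ratio(bucket))  # 0.6
```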