Multiple OR filter in Elasticsearch - elasticsearch

Hello, I'm having trouble deciding whether the following query is a correct way to express multiple OR conditions in Elasticsearch. I want to select all the unique values (the actual rows, not a count).
My best attempt at the query is:
GET mystash/_search
{
"aggs": {
"uniques":{
"filter":
{
"or":
[
{ "term": { "url.raw" : "/a.json" } },
{ "term": { "url.raw" : "/b.json" } },
{ "term": { "url.raw" : "/c.json"} },
{ "term": { "url.raw" : "/d.json"} }
]
},
"aggs": {
"unique" :{
"terms" :{
"field" : "id.raw",
"size" : 0
}
}
}
}
}
}
The equivalent SQL would be
SELECT DISTINCT id
FROM json_record
WHERE
json_record.url = 'a.json' OR
json_record.url = 'b.json' OR
json_record.url = 'c.json' OR
json_record.url = 'd.json'
I would like to know whether the query above is correct, since the data will be used for report generation.

Some remarks:
You should use a query filter instead of an aggregation filter. As written, your request has no query, so it matches all documents.
You can replace your or+term filter with a single terms filter.
You can set size=0 at the root of the request to get only the aggregation result and no search hits.
Example code:
{"size":0,
"query" :{"filtered":{"filter":{"terms":{"url":["a", "b", "c"]}}}},
"aggs" :{"unique":{"terms":{"field":"id", "size" :0}}}
}
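On Elasticsearch 5.0 and later the filtered query no longer exists, and a terms aggregation no longer accepts a size of 0. The following is a minimal Python sketch of how an equivalent request body might be built for those versions. It assumes the field names url.raw and id.raw from the question; the helper name and the max_ids cap are hypothetical and only there for illustration.

```python
import json


def build_unique_ids_body(urls, max_ids=10000):
    """Build a search body that filters on several URLs and buckets by id.

    Assumes Elasticsearch 5.0+ (bool/filter instead of filtered) and the
    field names url.raw and id.raw from the question. max_ids is an
    illustrative cap on the number of buckets returned.
    """
    return {
        "size": 0,  # return aggregation results only, no search hits
        "query": {"bool": {"filter": {"terms": {"url.raw": list(urls)}}}},
        "aggs": {"unique": {"terms": {"field": "id.raw", "size": max_ids}}},
    }


if __name__ == "__main__":
    body = build_unique_ids_body(["/a.json", "/b.json", "/c.json", "/d.json"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request. If the number of distinct ids can exceed the cap, a different approach to paging through buckets would be needed.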

Related

Elasticsearch: How to filter results with a specific word in a value using elasticsearch

I need to add a parameter to my search that filters results containing a specific word in a value. The query is searching for user history records and contains a url key. I need to filter out /history and any other url containing that string.
Here's my current query:
GET /user_log/_search
{
"size" : 50,
"query": {
"match": {
"user_id": 56678
}
}
}
Here's an example of a record, boiled down to just the value we're looking at:
"_source": {
"url": "/history?page=2&direction=desc",
},
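How can the parameters of the search be changed to filter out this result?
You can use the filter clause of the bool query in Elasticsearch.
If your url field is of type keyword, you can use the query below. Note that a term filter keeps only the documents whose url is exactly "/history"; to exclude documents instead, put the clause under must_not.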
{
"query": {
"bool": {
"must": {
"match": {
"user_id": 56678
}
},
"filter": {
"term": {
"url": "/history"
}
}
}
}
}
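Since the original goal was to exclude every url that contains "/history", here is a hedged Python sketch of a request body that does so with must_not and a wildcard query. It assumes url is mapped as keyword (so the whole string is matched as one term); the helper name is hypothetical. Patterns with a leading wildcard can be slow on large indices.

```python
import json


def build_exclude_url_body(user_id, fragment, size=50):
    """Match one user's records but drop any whose url contains fragment.

    Assumes the url field is mapped as keyword. The leading and trailing
    '*' make the wildcard match the fragment anywhere in the value.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {"match": {"user_id": user_id}},
                "must_not": {
                    "wildcard": {"url": {"value": f"*{fragment}*"}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_exclude_url_body(56678, "/history"), indent=2))
```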
I found a way to solve my specific issue. Instead of filtering on the url I'm filtering on a different value. Here's what I'm using now:
{
"size" : 50,
"query": {
"bool" : {
"must" : {
"match" : { "user_id" : 56678 }
},
"must_not": {
"match" : { "controller": "History" }
}
}
}
}
I'm still going to leave this question open for a while to see if anyone has other ways of solving the original problem.

Elastic Search Multiple Filter values for the same field

Say that I have to filter car makers in an Elasticsearch index (ES 7.15), where the field car_maker is mapped as keyword and holds one of a limited number of car maker names:
{
"mappings": {
"properties": {
"car_maker": {
"type": "keyword"
}
}
}
}
GET /cars/_search
{
"query": {
"bool": {
"filter": [{
"term": {
"car_maker": "Honda"
}
}]
}
}
}
This works fine together with a match query. The filter does not take part in score calculation, as desired.
Now I would like to filter on more car makers in that query (something like a should query):
{
"query": {
"bool": {
"filter" : [
{"term" : { "car_maker" : "Honda"}},
{"term" : { "car_maker" : "Ferrari"}}
]
}
}
}
This is not going to work. I don't get any error from the ES query engine, but I don't get any results either. Of course it is always possible to apply more filters to different fields such as car_maker and car_color, but how can I do the opposite: apply several values (Honda, Ferrari, etc.) to the same filter field car_maker, as in the example above, without affecting the score calculation?
You might want to try the following filter query:
{
"query" : {
"bool" : {
"filter" : {
"terms" : {
"car_maker" : ["Honda", "Ferrari"]
}
}
}
}
}
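The reason the two-term version returns nothing is that every clause in a bool filter must match, so a document would need car_maker to equal both values at once, while a single terms clause matches any of the listed values. The following Python sketch only simulates that behaviour in memory to make the difference visible; it is not Elasticsearch code, and the sample documents are made up.

```python
def matches(doc, clause):
    """Evaluate one term or terms clause against a flat document."""
    kind, body = next(iter(clause.items()))
    field, expected = next(iter(body.items()))
    if kind == "term":
        return doc.get(field) == expected
    if kind == "terms":
        return doc.get(field) in expected
    raise ValueError(f"unsupported clause: {kind}")


def bool_filter(docs, clauses):
    """Keep documents that satisfy every clause, as bool.filter does."""
    return [d for d in docs if all(matches(d, c) for c in clauses)]


# Made-up sample documents, one maker per document.
cars = [{"car_maker": "Honda"}, {"car_maker": "Ferrari"}, {"car_maker": "Fiat"}]

# Two term clauses are ANDed: no single-valued document can satisfy both.
two_terms = bool_filter(cars, [
    {"term": {"car_maker": "Honda"}},
    {"term": {"car_maker": "Ferrari"}},
])

# One terms clause ORs its values.
one_terms = bool_filter(cars, [{"terms": {"car_maker": ["Honda", "Ferrari"]}}])

print(two_terms)
print(one_terms)
```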

Elasticsearch filter multiple terms with only matching results and not any of them

How can I get only the results that match all of the terms in a multi-term search? I have this sample table, where titleid is mapped as an integer field and personid is a keyword:
titleid:1,personid:a
titleid:3,personid:a
titleid:1,personid:b
titleid:2,personid:b
titleid:1,personid:c
titleid:5,personid:c
The expected result is:
titleid:1
With a sample query like this one:
{"query":
{"bool":
{"filter": [
{"terms": {"personid": ["a", "b", "c"]}}
]
}}}
I have the following results:
titleid: 1,2,3,5
Maybe this will help: I wrote the query in SQL and got the expected result. I asked the query to give me the titleid values whose row count matches the number of searched parameters. This is only to explain the goal better; the idea is to use Elasticsearch.
select titleid
from (
select count(titleid) as title_count, titleid
from table1
where personid in ('a','b','c')
group by titleid
) as vw
where title_count = 3
If you only want records with titleid == 1 AND personid == 'a', you can filter on both fields. Only the bool query uses must, should, and must_not. A filter removes non-matching documents by definition, so every clause in it behaves like a must.
"query": {
"bool": {
"filter": [
{
"term": {
"titleid": { "value": 1 }
}
},
{
"term": {
"personid": { "value": "a" }
}
}
]
}
}
UPDATE:
It now looks like you want to filter your results, aggregate them, and then filter on the aggregated buckets. There are several metrics and bucket aggregations that can help.
Using bucket selector aggregation (this isn't tested but should be very close if not correct)
{
"size": 0,
"query": {
"bool": {
"filter": { "terms": { "personid": ["a","b","c"] } }
}
},
"aggs": {
"title_id": {
"terms": { "field": "titleid" },
"aggs": {
"count_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.the_doc_count == 3"
}
}
}
}
}
}
However, be aware that pipeline aggregations work on the outputs produced by other aggregations, so the overall amount of work needed to calculate the initial doc_counts will be the same. Since the script needs to be executed for each input bucket, the operation might be slow for high-cardinality fields with many thousands of terms.
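The grouping logic behind this approach can be sketched in plain Python, which may help when checking results from the index. This is only an in-memory illustration using the sample rows from the question; the function name is hypothetical. It compares the set of distinct persons per title rather than a raw document count, which guards against duplicate rows (in Elasticsearch, a cardinality sub-aggregation on personid would play a similar role).

```python
from collections import defaultdict


def titles_shared_by_all(rows, person_ids):
    """Return the title ids that appear for every one of person_ids.

    rows is an iterable of (titleid, personid) pairs.
    """
    wanted = set(person_ids)
    persons_by_title = defaultdict(set)
    for title_id, person_id in rows:
        if person_id in wanted:  # same role as the terms filter
            persons_by_title[title_id].add(person_id)
    # Keep only titles whose set of persons covers everyone requested.
    return sorted(t for t, p in persons_by_title.items() if p == wanted)


# Sample rows from the question.
rows = [(1, "a"), (3, "a"), (1, "b"), (2, "b"), (1, "c"), (5, "c")]

print(titles_shared_by_all(rows, ["a", "b", "c"]))  # [1]
```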

Return distinct values in Elasticsearch

I am trying to solve an issue where I have to get distinct results in a search.
{
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "ABC",
"favorite_cars" : [ "ferrari","toyota" ]
}, {
"name" : "GEORGE",
"favorite_cars" : [ "honda","Hyundae" ]
}
When I perform a term query on favorite_cars for "ferrari", I get two results whose name is ABC. I want only one result to be returned in this case, so my requirement is to apply a distinct on the name field and receive a single result.
Thanks
One way to achieve what you want is to use a terms aggregation on the name field and then a top_hits sub-aggregation with size 1, like this:
{
"size": 0,
"query": {
"term": {
"favorite_cars": "ferrari"
}
},
"aggs": {
"names": {
"terms": {
"field": "name"
},
"aggs": {
"single_result": {
"top_hits": {
"size": 1
}
}
}
}
}
}
That way, you'll get a single term ABC and, nested inside it, a single matching document.
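Reading the result back means walking the buckets of the names aggregation and taking the first hit of each single_result. A minimal Python sketch, assuming the response has already been parsed into a dict and uses the aggregation names from the query above; the helper name and the sample response are illustrative only:

```python
def one_hit_per_name(response):
    """Map each bucket key (name) to the _source of its single top hit."""
    buckets = response["aggregations"]["names"]["buckets"]
    return {
        b["key"]: b["single_result"]["hits"]["hits"][0]["_source"]
        for b in buckets
        if b["single_result"]["hits"]["hits"]  # skip buckets without hits
    }


# Illustrative response fragment, trimmed to the fields used above.
sample = {
    "aggregations": {
        "names": {
            "buckets": [
                {
                    "key": "ABC",
                    "doc_count": 2,
                    "single_result": {
                        "hits": {
                            "hits": [
                                {
                                    "_source": {
                                        "name": "ABC",
                                        "favorite_cars": ["ferrari", "toyota"],
                                    }
                                }
                            ]
                        }
                    },
                }
            ]
        }
    }
}

print(one_hit_per_name(sample))
```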

Elasticsearch DSL query from an SQL statement

I'm new to Elasticsearch. I don't think I fully understand the concept of queries and filters. In my case I just want to use filters, as I don't need advanced features like scoring.
How would I convert the following SQL statement into elasticsearch query?
SELECT * FROM advertiser
WHERE company like '%com%'
AND sales_rep IN (1,2)
What I have so far:
curl -XGET 'localhost:9200/advertisers/advertiser/_search?pretty=true' -d '
{
"query" : {
"bool" : {
"must" : {
"wildcard" : { "company" : "*com*" }
}
}
},
"size":1000000
}'
How do I add the OR filters on the sales_rep field?
Thanks
Add a "should" clause after your must clause. In a bool query with no must clause, at least one should clause must match by default; when a must clause is present, the should clauses are optional unless you set "minimum_should_match". Check out the bool query docs.
For your case, this should work.
"should" : [
{
"term" : { "sales_rep_id" : "1" }
},
{
"term" : { "sales_rep_id" : "2" }
}
],
"minimum_should_match" : 1,
The same concept works for bool filters. Just change "query" to "filter". The bool filter docs are here.
I came across this post 4 years too late...
Anyway, perhaps the following code could be useful...
{
"query": {
"filtered": {
"query": {
"wildcard": {
"company": "*com*"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"sales_rep_id": [ "1", "2" ]
}
}
]
}
}
}
}
}
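The filtered query used above was removed in Elasticsearch 5.0. A hedged Python sketch of how the same conditions might be expressed on newer versions puts both clauses in a bool filter, so neither affects scoring. It assumes the field names company and sales_rep_id from the answers above; the helper name is hypothetical.

```python
import json


def build_advertiser_body(company_pattern, sales_rep_ids):
    """Build a non-scoring search body: company LIKE pattern AND rep IN ids.

    Assumes Elasticsearch 5.0+ and the field names company and
    sales_rep_id. Clauses inside bool.filter are ANDed together, and the
    terms clause matches any of the listed ids.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"wildcard": {"company": company_pattern}},
                    {"terms": {"sales_rep_id": list(sales_rep_ids)}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_advertiser_body("*com*", [1, 2]), indent=2))
```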

Resources