Elasticsearch: having a should inside of a should

I am searching through logs with wildcards on multiple fields. These wildcard queries are inside a should clause. Besides the fields that I search with wildcards, every log also has an "action_id" field. I want to return only the logs that match one or more of the wildcards and have one or more of the action_ids that I want (A or B in the example).
What doesn't work:
q={
"size" : 20,
"sort" : [
{ "clsfd_id" : {"order" : "asc"}},
],
"fields":
"query":{
"filtered":{
"query":{
"match_all":{
}
},
"filter":{
"bool":{
"should":[
{
"query":{
"term":{
"id":"*12*"
}
}} #gets filled dynamically with wildcard queries
],
"should":[
{"query":{
"term":{
"action_id":"A"
}
}
},
{"query":{
"term":{
"action_id":"B"
}
}
},
]
}
}
}
}
}
Example doc:
{
"_index": "logs",
"_type": "log",
"_id": "AVdPQBuRkYFjD-WA9CiE",
"_score": null,
"_source": {
"username": "Ιδιώτης",
"user_id": null,
"action_on": "Part",
"ip": "sensitive info",
"idents": [
"sensiti",
"sensitive info"
],
"time": "2016-09-21T19:18:11.184576",
"id": 5993765,
"changes": "bla bla bla",
"action_id": "A"
}
}
Note: this is Elasticsearch 1.7.

"I want to return only the logs that are matching the wildcards"
Shouldn't it then be in a must clause instead of should? Also, the terms filter can take multiple candidate values, so you don't need to create separate filters for action_id A and B. When reading the docs, note that the query DSL changed quite a lot between the 1.x and 2.x versions of Elasticsearch: filters and queries have been more or less "merged" together.
Edit: Based on the comment, this should work, assuming the original query was functional (sorry, too lazy to test it):
{
"size": 20,
"sort": [{"clsfd_id" : {"order" : "asc"}}],
"query":{
"filtered":{
"query":{"match_all":{}},
"filter":{
"bool":{
"should":[
{
"query":{
"term": {"id":"*12*"}
}
},
{
"query":{
"term": {"id":"*23*"}
}
}
],
"must":[
{
"query":{
"terms": {"action_id": ["A", "B"]}
}
}
]
}
}
}
}
}
The more verbose option is to create a new "query => filtered => filter => should => terms" instead of the multi-term filter.
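Since the wildcard clauses are filled in dynamically, the request body can be assembled in client code. What follows is a minimal Python sketch, assuming Elasticsearch 1.x syntax (a filtered query with a bool filter) and assuming the searched fields are indexed as strings, since wildcard queries work on string terms. The helper name build_log_query and its parameters are hypothetical, and the body has not been run against a cluster.

```python
import json


def build_log_query(wildcards, action_ids, size=20):
    """Build a 1.x-style filtered query (hypothetical helper).

    wildcards:  list of (field, pattern) pairs, OR'ed together
    action_ids: allowed action_id values, matched with one terms filter
    """
    # One wildcard query per pattern, wrapped in a "query" filter so it
    # can sit inside the bool filter's should clause.
    should = [
        {"query": {"wildcard": {field: pattern}}}
        for field, pattern in wildcards
    ]
    return {
        "size": size,
        "sort": [{"clsfd_id": {"order": "asc"}}],
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "should": should,
                        # A single terms filter replaces one term filter per id.
                        "must": [{"terms": {"action_id": list(action_ids)}}],
                    }
                },
            }
        },
    }


body = build_log_query([("id", "*12*"), ("id", "*23*")], ["A", "B"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body; adding another wildcard only means appending another (field, pattern) pair.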

Does this work?
{
"size" : 20,
"sort" : [{
"date" : {
"order" : "asc"
}
}
],
"fields" : [],
"query" : {
"filtered" : {
"query" : {
"wildcard" : {
"id" : {
"value" : "*12*"
}
}
},
"filter" : {
"terms" : {
"action_id" : ["A", "B"]
}
}
}
} }

Related

Elasticsearch query_string filter with Fields when not empty string

I'm trying to build a query_string with the Elasticsearch DSL. In SQL style, my query looks like this:
SELECT NAME, DESCRIPTION, URL, FACEBOOK_URL, YEAR_CREATION FROM MY_INDEX WHERE FACEBOOK_URL <> '' AND (MATCH('NAME: sometext OR DESCRIPTION: sometext')) AND YEAR_CREATION > 2000
I don't know how to include a filter for a non-empty value of FACEBOOK_URL.
Thanks for help...
@Kamal's point is clear: you should check the type of your "FACEBOOK" field, which must be of type keyword, not text.
Please see the below mapping, sample documents, the request query and response.
Note that I may not have added all the fields but only the concerned fields so as to mirror the query you've added.
Mapping:
PUT facebook
{
"mappings": {
"properties": {
"name":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"description":{
"type": "text",
"fields": {
"keyword":{
"type":"keyword"
}
}
},
"facebook_url":{
"type": "keyword"
},
"year_creation":{
"type": "date"
}
}
}
}
Sample Docs:
In the below 4 documents, only the 3rd document mentioned would be something that you would want to be returned.
Docs 1 and 2 have empty values of facebook_url while doc 4 does not have the field in the first place at all.
POST facebook/_doc/1
{
"name": "sometext",
"description": "sometext",
"facebook_url": "",
"year_creation": "2019-01-01"
}
POST facebook/_doc/2
{
"name": "sometext",
"description": "sometext",
"facebook_url": "",
"year_creation": "2019-01-01"
}
POST facebook/_doc/3
{
"name" : "sometext",
"description" : "sometext",
"facebook_url" : "http://mytest.fb.link",
"year_creation" : "2019-01-01"
}
POST facebook/_doc/4
{
"name": "sometext",
"description": "sometext",
"year_creation": "2019-01-01"
}
Request Query:
POST facebook/_search
{
"_source": ["name", "description","facebook_url","year_creation"],
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"name": "sometext"
}
},
{
"match": {
"description": "sometext"
}
}
]
}
},
{
"exists": {
"field": "facebook_url"
}
},
{
"range": {
"year_creation": {
"gte": "2000-01-01"
}
}
}
],
"must_not": [
{
"term": {
"facebook_url": {
"value": ""
}
}
}
]
}
}
}
I think the query is self-explanatory.
I have added an Exists query so that a document without that field does not appear in the result. For empty values, I've added a clause in must_not.
Notice that in my design I've mapped facebook_url as a keyword type, as it makes no sense to have it as text. For that reason, I've used a Term query.
Also note that for date filtering I've made use of a Range query. Do go through the links for more clarification, as it is important to understand how each of these queries works.
Response:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 2.148216,
"hits" : [
{
"_index" : "facebook",
"_type" : "_doc",
"_id" : "3",
"_score" : 2.148216,
"_source" : {
"facebook_url" : "http://mytest.fb.link",
"year_creation" : "2019-01-01",
"name" : "sometext",
"description" : "sometext"
}
}
]
}
}
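The reason both an exists clause and a must_not clause are needed can be checked with a small in-memory sketch. The Python below only mimics the two conditions on the four sample documents (it does not talk to Elasticsearch, and the function names are made up for illustration). It relies on the documented behaviour that an empty string still counts as an existing value for the exists query.

```python
def has_value(doc, field):
    """Mimic `exists`: the field is present and not null ("" still counts)."""
    return doc.get(field) is not None


def matches(doc):
    # must: exists on facebook_url; must_not: term facebook_url == ""
    return has_value(doc, "facebook_url") and doc["facebook_url"] != ""


docs = {
    1: {"name": "sometext", "facebook_url": ""},
    2: {"name": "sometext", "facebook_url": ""},
    3: {"name": "sometext", "facebook_url": "http://mytest.fb.link"},
    4: {"name": "sometext"},
}

matching_ids = [doc_id for doc_id, doc in docs.items() if matches(doc)]
print(matching_ids)  # [3]
```

Documents 1 and 2 pass the exists check but are removed by must_not, while document 4 never passes exists, which leaves only document 3.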
Updated Answer:
Change the mapping of ANNEE_CREATION from integer to date, as that is the correct type for date fields.
The query in your question does not apply a range query on that date field, so one is added below.
Note that the must_not logic has to be applied on the keyword sub-field of FACEBOOK, not on the text field.
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":" Bordeaux",
"fields":[
"VILLE",
"ADRESSE",
"FACEBOOK"
]
}
},
{
"exists":{
"field":"FACEBOOK"
}
}
],
"must_not":[
{
"term":{
"FACEBOOK.keyword":{ <------ Make sure this is a keyword field
"value":""
}
}
}
],
"filter":[
{
"range":{
"FONDS_LEVEES_TOTAL":{
"gt":0
}
}
},
{
"range":{ <----- Apply the range query here based on what you've mentioned in question
"ANNEE_CREATION":{ <----- Make sure this is the date field
"gte": "2015" <----- Make sure you apply correct query parameter in range query
}
}
}
]
}
},
"track_total_hits":true,
"from":0,
"size":8,
"_source":[
"FACEBOOK",
"NOM",
"ANNEE_CREATION",
"FONDS_LEVEES_TOTAL"
]
}
As expected only the document having Id 3 is returned as result.

ElasticSearch source filtering array of objects

Here is a document
{
"Id": "1",
"Name": "Thing",
"Prices": [
{"CompanyId": "1", "Price": "11.11"},
{"CompanyId": "2", "Price": "12.12"},
{"CompanyId": "3", "Price": "13.13"}
]
}
And here is the associated ElasticSearch schema:
"Prices" : {
"type" : "nested",
"properties" : {
"CompanyId": {
"type" : "integer"
},
"Price" : {
"type" : "scaled_float",
"scaling_factor" : 100
}
}
}
If a user is buying for CompanyId = 3, then the supplier doesn't want them to be able to see the preferential pricing for, say, CompanyId = 1.
Therefore I need to use a source filter to remove all prices for which the CompanyId is not 3.
I have found that this works.
"_source":{
"excludes": ["Prices.companyId.CompanyId"]
}
But I don't understand how or why.
It can't possibly work because the required CompanyId is not mentioned anywhere in the whole ElasticSearch search JSON.
Adding a full search JSON:
{
"query":{
"bool":{
"must":[
{
"match_all":{
}
}
],
"filter":{
"match":{
"PurchasingViews":6060
}
}
}
},
"size":20,
"aggs":{
"CompanyName.raw":{
"terms":{
"field":"CompanyName.raw",
"size":20,
"order":{
"_count":"desc"
}
}
}
},
"_source":{
"excludes":[
"PurchasingViews",
"ContractFilters",
"SearchField*",
"Keywords*",
"Menus*",
"Prices.companyId.CompanyId"
]
}
}
Result:
{
"took":224,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"skipped":0,
"failed":0
},
"hits":{
"total":1173525,
"max_score":1.0,
"hits":[
{
"_index":"products_purchasing",
"_type":"product_purchasing",
"_id":"12787114",
"_score":1.0,
"_source":{
"CompanyName":"...",
"Prices":[
{
"CompanyId":1474,
"Price":697.3
}
],
"CompanyId":571057,
"PartNumber":"...",
"LongDescription_en":"...",
"Name_en":"...",
"DescriptionSnippet_en":"...",
"ProductId":9605985,
"Id":12787114
}
}
]
},
"aggregations":{
"CompanyName.raw":{
"doc_count_error_upper_bound":84,
"sum_other_doc_count":21078,
"buckets":[
{
"key":"...",
"doc_count":534039
}
]
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
I believe the "nested" type in your mapping is what created the reference you are asking about.
I would also suggest framing the query as looking for company 3 only, rather than "excluding" everything except 3.
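One way to frame the request as "looking for 3 only" is a nested query with inner_hits, combined with excluding the whole Prices array from _source. The Python sketch below only builds such a request body. It assumes the Elasticsearch version in use supports inner_hits on nested queries, the helper name build_price_query is hypothetical, and the body has not been tested against the index from the question.

```python
import json


def build_price_query(company_id):
    """Return only the nested price rows for one company (hypothetical helper)."""
    return {
        # Drop the full Prices array from _source ...
        "_source": {"excludes": ["Prices"]},
        "query": {
            "nested": {
                "path": "Prices",
                "query": {"term": {"Prices.CompanyId": company_id}},
                # ... and get the matching nested objects back as inner hits.
                "inner_hits": {},
            }
        },
    }


print(json.dumps(build_price_query(3), indent=2))
```

Note that this shape also restricts the hits to documents that have a price for the given company, which may or may not be what is wanted.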

Elastic - Multiple filter query syntax

Hello I have the following query that I am running:
{
"_source": [
"source1",
"source2",
"source3",
"source4",
],
"query": {
"bool": {
"minimum_should_match": 1,
"must": {
"filter": [
{
"term": {
"_type": {
"value": "someval1"
}
}
},
{
"term": {
"_type": {
"value": "someval2"
}
}
}
],
"query_string": {
"analyze_wildcard": "true",
"query": "tesla*",
"rewrite": "scoring_boolean"
}
}
}
},
"size": 50,
"sort": [
"_score"
]
}
That is currently returning:
'"reason":"[bool] malformed query, expected [END_OBJECT] but found [FIELD_NAME]","line":1,"col":343},"status":400}'
Any idea how to use multiple filters on a query? I was able to do it just fine on elastic 2.4 but since OR is now deprecated as well as filtered, I am a bit lost.
Thanks!
The syntax of the query is wrong. filter should not be wrapped inside the must statement; it should be at the same level as must. Also, when must holds more than one clause it has to be an array of query objects, since a single object cannot carry several queries. So your query should look like this:
{
"_source":[
"source1",
"source2",
"source3",
"source4"
],
"query":{
"bool":{
"minimum_should_match":1,
"must":[
{
"query_string":{
"analyze_wildcard":"true",
"query":"tesla*",
"rewrite":"scoring_boolean"
}
}
],
"filter":{
"bool":{
"should":[
{
"term":{
"_type":{
"value":"someval1"
}
}
},
{
"term":{
"_type":{
"value":"someval2"
}
}
}
]
}
}
}
},
"size":50,
"sort":[
"_score"
]
}
I think your filters are meant to be OR'ed, which is why I wrapped them inside should.
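When the OR'ed filters all target the same field, the should wrapper can be collapsed into a single terms query, which matches a document if the field holds any of the listed values. Here is a minimal Python sketch of that variant; the helper name build_search and its parameters are hypothetical, and the field name is passed in rather than assumed.

```python
import json


def build_search(text, field, values, size=50):
    """Scored query_string in must, unscored OR filter via terms (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": text, "analyze_wildcard": True}}
                ],
                # filter sits next to must, not inside it
                "filter": [{"terms": {field: list(values)}}],
            }
        },
        "size": size,
        "sort": ["_score"],
    }


print(json.dumps(build_search("tesla*", "_type", ["someval1", "someval2"]), indent=2))
```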

Query based on Fields existing in different Indices in Elasticsearch

I've got the following query
{
"from":0,
"size":50000,
"_source":[
"T121",
"timestamp"
],
"sort":{
"timestamp":{
"order":"asc"
}
},
"query":{
"bool":{
"must":{
"range":{
"timestamp":{
"gte":"2017-01-17 11:44:41.347",
"lte":"2017-02-18 11:44:47.878"
}
}
},
"must":{
"exists":{
"field":"T121"
}
}
}
}
}
http://172.22.23.169:9200/index1,index2,Index3/_search?pretty
With this URL I want to query over a number of indices in Elasticsearch and only return those documents where a specific field exists.
Is it possible to put a list of fields in the "exists" clause, so that a document is returned
if "field1" OR "field2" OR "field3" exists in it and is skipped otherwise, or do I have to script such a case?
To search across all indices, use http://172.22.23.169:9200/_search?pretty
To search across selected indices, add the following clause to the "bool" query:
"must": {
"terms": {
"_index": [
"index1",
"index2"
]
}
}
To OR multiple "exists" conditions, you can use a should clause with one exists query per field and set "minimum_should_match" to control how many of them have to match.
{
"from":0,
"size":50000,
"_source":[
"T121",
"timestamp"
],
"sort":{
"timestamp":{
"order":"asc"
}
},
"query":{
"bool":{
"must":{
"range":{
"timestamp":{
"gte":"2017-01-17 11:44:41.347",
"lte":"2017-02-18 11:44:47.878"
}
}
},
"should":[
{
"exists":{
"field":"field1"
}
},
{
"exists":{
"field":"field2"
}
},
{
"exists":{
"field":"field3"
}
}
],
"minimum_should_match":1
}
}
}
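For an arbitrary list of field names, the same body can be generated in code. The following Python sketch builds the bool query with one exists clause per field; the helper name build_any_field_exists_query is hypothetical and the timestamp field name is taken from the query above.

```python
import json


def build_any_field_exists_query(fields, gte, lte):
    """Range on timestamp AND at least one of `fields` exists (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": [{"range": {"timestamp": {"gte": gte, "lte": lte}}}],
                "should": [{"exists": {"field": name}} for name in fields],
                # Without this, should clauses next to a must clause are optional.
                "minimum_should_match": 1,
            }
        }
    }


body = build_any_field_exists_query(
    ["field1", "field2", "field3"],
    "2017-01-17 11:44:41.347",
    "2017-02-18 11:44:47.878",
)
print(json.dumps(body, indent=2))
```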

Is it possible to use a more-like-this query on nested fields?

I have an "event" type based on a (nested) press article, including the title, and the text, which both have multifields.
I've tried :
{
"query":{
"nested":{
"path":"article",
"query":{
"mlt":{
"fields":["article.title.search","article.text.search"],
"max_query_terms": 20,
"min_term_freq": 1,
"include": "false",
"like":[{
"_index":"myindex",
"_type":"event",
"doc":{
"article":{
"title":"this is the title",
"text":"this is the body of the article"
}
}
}]
}
}
}
}
}
But it always returns 0 hits.
{
"query": {
"nested":{
"path":"articles",
"query":{
"more_like_this" : {
"fields" : ["articles.brand", "articles.category", "articles.material"],
"like" : [
{
"_index" : "$index",
"_type" : "$type",
"_id" : "$id"
}
],
"min_term_freq" : 1,
"max_query_terms" : 20
}
}
}
}
}
This works for me, provided that the nested fields you use are mapped with term vectors enabled:
"brand": {
"type": "string",
"index": "not_analyzed",
"term_vector": "yes"
}
Refer to: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
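The working query above can also be generated for any stored document. The Python sketch below builds the nested more_like_this body from an index name, a document id and a nested path; the helper name build_nested_mlt is hypothetical, the _type entry is left out on the assumption that a typeless index is used, and the result still depends on the term vector mapping described above.

```python
import json


def build_nested_mlt(index, doc_id, path, subfields, max_query_terms=20):
    """more_like_this wrapped in a nested query (hypothetical helper)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "more_like_this": {
                        # Field names must carry the nested path as a prefix.
                        "fields": [f"{path}.{name}" for name in subfields],
                        "like": [{"_index": index, "_id": doc_id}],
                        "min_term_freq": 1,
                        "max_query_terms": max_query_terms,
                    }
                },
            }
        }
    }


body = build_nested_mlt("myindex", "42", "articles", ["brand", "category"])
print(json.dumps(body, indent=2))
```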
