Exclude empty array fields - but include documents missing the field - in elasticsearch

I'm trying to run a query against elasticsearch that will find documents where one of the following conditions applies:
The document is missing the given field (tags) OR
The document has the value foo as an element of the tags array
The problem is that my current query will return documents that have a tags field where the value is an empty array. Presumably this is because elasticsearch is treating an empty array as the same thing as not having the field at all. Here's the full query I'm running that's returning the bad results:
{
"from": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "_rankings.public"
}
},
{
"or": [
{
"missing": {
"existence": true,
"field": "tags",
"null_value": false
}
},
{
"terms": {
"execution": "or",
"tags": [
"foo"
]
}
}
]
}
]
}
},
"query": {
"match_all": {}
}
}
},
"size": 10000,
"sort": [
{
"_rankings.public": {
"ignore_unmapped": true,
"order": "asc"
}
}
]
}

I don't think you can achieve this easily "out of the box", for the reason you already mentioned: Elasticsearch sees no difference between an empty array and a field with no values in it.
Your only option might be to define a "null_value" for the "tags" field and, if you have any control over the data that goes into your documents, to index a "[]" array as '["_your_null_value_of_choice_"]'. Then change "null_value": false to true in your query.
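The data-side half of that suggestion can be sketched as a small preprocessing step that runs before indexing. This is only an illustration: the sentinel string `_null_` and the helper name `normalize_tags` are made up here, and the sentinel is assumed to be the same value you configure as "null_value" in the mapping.

```python
from copy import deepcopy

# Hypothetical sentinel; assumed to match the "null_value" set in the mapping.
EMPTY_TAGS_SENTINEL = "_null_"


def normalize_tags(doc, field="tags", sentinel=EMPTY_TAGS_SENTINEL):
    """Return a copy of doc in which an empty array in `field` becomes [sentinel].

    Documents that lack the field entirely are left untouched, so they
    still count as "missing" at query time.
    """
    normalized = deepcopy(doc)
    if field in normalized and normalized[field] == []:
        normalized[field] = [sentinel]
    return normalized


docs = [
    {"id": 1},                   # field missing: should still match "missing"
    {"id": 2, "tags": ["foo"]},  # contains foo: should match the terms filter
    {"id": 3, "tags": []},       # empty array: gets the sentinel, matches neither
]
prepared = [normalize_tags(d) for d in docs]
```

After this step an empty array is indexed as a real value, so it is no longer indistinguishable from a document that has no tags field at all.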

Related

ElasticSearch - query documents where the nested field is empty array []

I am trying to filter by the empty arrays of the nested field.
I tried many different queries, including scripts and flattened fields, but couldn't retrieve any results. Does anyone have experience with this? Is it possible in ES? I also want to aggregate (count results) by the same empty array field value [].
mapping
suitability:
  type: "nested"
  properties:
    group:
      type: "keyword"
    code:
      type: "keyword"
in the index, I have this nested field in every document
"suitability": [
{
"group": "RG309",
"code": 1
},
{
"group": "RG318",
"code": 1
},
{
"group": "RG355",
"code": 2
}
]
also some documents have an empty nested field
"suitability": []
query for empty suitability results (DOESN'T WORK - always returns total_hits: 0)
GET /_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"ignore_unmapped": [
true
],
"path": "suitability",
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "suitability"
}
}
]
}
}
}
}
]
}
},
"track_total_hits": true
}
query for non-empty suitability (THIS WORKS: returns all results)
{
"query": {
"bool": {
"must": [
{
"nested": {
"ignore_unmapped": [
true
],
"path": "suitability",
"query": {
"bool": {
"must": [
{
"terms": {
"suitability.rule_result": [
"1",
"2",
"3"
]
}
}
]
}
}
}
}
]
}
},
"track_total_hits": true
}

Elasticsearch wildcard query on numeric fields without using mapping

I'm looking for a creative solution, because I can't change the mapping: the solution is already in production.
I have this query:
{
"size": 4,
"query": {
"bool": {
"filter": [
{
"range": {
"time": {
"from": 1597249812405,
"to": null
}
}
},
{
"query_string": {
"query": "*181*",
"fields": [
"deId^1.0",
"deTag^1.0"
],
"type": "best_fields",
"default_operator": "or",
"max_determinized_states": 10000,
"enable_position_increments": true,
"fuzziness": "AUTO",
"fuzzy_prefix_length": 0,
"fuzzy_max_expansions": 50,
"phrase_slop": 0,
"escape": false,
"auto_generate_synonyms_phrase_query": true,
"fuzzy_transpositions": true,
"boost": 1
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
},
"sort": [
{
"time": {
"order": "asc"
}
}
]
}
The "deId" field is an integer in Elasticsearch, and the query returns nothing (though it should).
Is there a way to run wildcard searches on numeric fields without using the multi-field option, which requires a mapping change?
Once you index an integer, ES does not treat the individual digits as position-sensitive tokens. In other words, it's not directly possible to perform wildcards on numeric datatypes.
There are some sub-optimal ways of solving this (think scripting & String.substring) but the easiest would be to convert those integers to strings.
Let's look at an example deId of 123181994:
POST prod/_doc
{
"deId_str": "123181994"
}
then
GET prod/_search
{
"query": {
"bool": {
"filter": [
{
"query_string": {
"query": "*181*",
"fields": [
"deId_str"
]
}
}
]
}
}
}
works like a charm.
Since your index/mapping is already in production, look into _update_by_query and stringify all the necessary numbers in a single call. After that, if you don't want to (and/or cannot) pass the strings at index time, use ingest pipelines to do the conversion for you.
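A request body for that _update_by_query call might look like the sketch below, built as a plain Python dict. The helper name `build_stringify_request` is hypothetical, the field names follow the example above, and the Painless one-liner is an assumption that has not been run against a real cluster.

```python
import json


def build_stringify_request(source_field="deId", target_field="deId_str"):
    """Build an _update_by_query body that copies a number into a string field."""
    return {
        "script": {
            "lang": "painless",
            # Field names are passed as params so the script source stays generic.
            "source": "ctx._source[params.target] = ctx._source[params.source].toString()",
            "params": {"source": source_field, "target": target_field},
        },
        # Only touch documents that actually have the numeric field.
        "query": {"exists": {"field": source_field}},
    }


body = build_stringify_request()
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of POST prod/_update_by_query. Because deId_str is a new field, it should be picked up by dynamic mapping as a string without editing the existing mapping by hand.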

What's the most efficient way to filter out items with member that don't contain the terms?

What's the most efficient way to write the following query?
Get 5 items that have a member field containing any of these items: ['item1', 'item2'].
should: [{ terms : {member: ['item1', 'item2'] } }]
If you find only 3 items, get 2 more where member is empty.
How do I finish this query?
You can use a should clause that combines a terms query with a must_not exists query. It fetches documents where the member field matches the input terms as well as documents where the field doesn't exist. Set size to 5 to get the top 5 documents from the result.
{
"size": 5,
"query": {
"bool": {
"should": [
{
"terms": {
"member": [
"1",
"2",
"3"
]
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "member"
}
}
]
}
}
]
}
}
}
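If the list of terms changes per request, the same body can be generated rather than written by hand. A minimal sketch, assuming a hypothetical helper named `build_member_query`:

```python
def build_member_query(values, size=5, field="member"):
    """Match documents whose `field` contains any of `values`, or that lack `field`."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Clause 1: the field contains at least one of the values.
                    {"terms": {field: list(values)}},
                    # Clause 2: the field does not exist on the document.
                    {"bool": {"must_not": [{"exists": {"field": field}}]}},
                ]
            }
        },
    }


query = build_member_query(["item1", "item2"])
```

Since the bool query has only should clauses, a document has to satisfy at least one of them to be returned.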

elasticsearch inner join

I have an index in which some documents contain valid "category" data and "url" data (an analyzed field) but no "respsize".
On the other hand, documents that contain "respsize" data (greater than 0) also contain "url" data but no "category" data.
I think you get the point: I need a join or intersection, that is, a query that returns all documents containing respsize and category that share the same url.
Here is what I have so far (the url field is analyzed, the rest are not_analyzed).
Here are the documents that have category:
And here are the other documents, which have respsize, that I need to combine with them based on url.
I need a DSL query that returns records sharing the same url token (in this scenario, www.domainname.com), merging category and respsize.
I simply want the field "category":"27" from img1 to appear in the second img as well, along with all the other fields.
Here is my query, which does not work:
GET webproxylog/accesslog/_search
{
"query": {
"filtered": {
"filter" : {
"and" : {
"filters": [
{
"not": {
"filter": {
"terms": {
"category": [
"-",
"-1",
"0"
]
},
"term": {
"respsize": "0"
}
}
},
"term": {
"category": "www.hurriyet.com.tr"
}
}
],
"_cache" : true
}
}
}
},
"sort": [
{
"respsize": {
"order": "desc"
}
}
]
}
You can try the query below. It requires the url field to match the value you specify (the must clause), and then at least one of the two should clauses must be true: either category is not one of the given terms, or respsize is greater than 0.
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"url": "www.hurriyet.com.tr"
}
}
],
"should": [
{
"not": {
"terms": {
"category": [
"-",
"-1",
"0"
]
}
}
},
{
"range": {
"respsize": {
"gt": 0
}
}
}
]
}
}
}
}
}
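The filtered query and the not filter used above belong to older Elasticsearch releases and were removed in later versions. On a newer cluster, roughly the same logic could be expressed with a plain bool query, sketched here as a Python dict. The helper name `build_url_query` is hypothetical, and minimum_should_match is set explicitly on the assumption that at least one of the two optional clauses has to hold.

```python
def build_url_query(url, excluded_categories):
    """Require the url, then require at least one of the two optional clauses."""
    return {
        "query": {
            "bool": {
                # Must match the url; filter context, so it does not affect scoring.
                "filter": [{"term": {"url": url}}],
                "should": [
                    # Category is present in none of the excluded values.
                    {"bool": {"must_not": [
                        {"terms": {"category": list(excluded_categories)}}
                    ]}},
                    # Response size is strictly positive.
                    {"range": {"respsize": {"gt": 0}}},
                ],
                # Without this, should clauses are optional once a filter is present.
                "minimum_should_match": 1,
            }
        }
    }


body = build_url_query("www.hurriyet.com.tr", ["-", "-1", "0"])
```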

query not applying custom score

I'm running the following query. My problem is that the custom score (script_score) is not being applied. Am I doing something wrong?
{
"query": {
"bool": {
"must": [
{
"terms": {
"tactics": [
"user_id",
"type_user",
"browser_plugins",
"cashback"
]
}
}
]
},
"script_score": {
"script": "type_user === 2 ? 1 : 2"
}
},
"from": "0",
"size": 50,
"sort": {
"name": {
"order": "desc",
"ignore_unmapped": true
}
}
}
The script_score section in your query gets ignored. If you want it to be taken into account, you need to wrap your existing bool query in a function_score query, where you can use the script_score part as well.
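The wrapping step can be sketched as follows, with the query built as a Python dict. The helper name `wrap_in_function_score` is hypothetical, and the script string is an assumption: it reads the field through doc['type_user'].value, and the exact syntax depends on the scripting language your Elasticsearch version uses.

```python
def wrap_in_function_score(inner_query, script_source):
    """Wrap an existing query so that script_score is applied to its hits."""
    return {
        "function_score": {
            "query": inner_query,
            "script_score": {"script": script_source},
        }
    }


# The bool query from the question, unchanged.
bool_query = {
    "bool": {
        "must": [
            {"terms": {"tactics": [
                "user_id", "type_user", "browser_plugins", "cashback"
            ]}}
        ]
    }
}

body = {
    "query": wrap_in_function_score(
        bool_query,
        # Assumed script syntax; adjust to your scripting language.
        "doc['type_user'].value == 2 ? 1 : 2",
    ),
    "from": 0,
    "size": 50,
}
```

Note that the sort on name from the original request is left out here: as long as results are sorted by a field, the custom score will not determine their order.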
