Value does not exist - elasticsearch

I am familiar with checking if a field exists using the exists query. I am wondering if there is a way to check if a value does not exist instead; something like this:
GET /_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "user",
"value": "id"
}
}
}
}
}
Update:
I want to add that it is a compound query so counting the result will not work.

If you want to check that if a particular field value exists or not, then you can simply use a match query. There is no need to use exists query with the must_not clause.
If the document matching the field value is there in your index, then its count will come in the search result. hits.total.value will you the count of matching documents.
Adding a working example
Index Data:
{
"user": "abc"
}
Search Query:
{
"size":0,
"query": {
"match": {
"user": "abc"
}
}
}
Search Result:
"hits": {
"total": {
"value": 1, // note this
"relation": "eq"
},
"max_score": null,
"hits": []
}
Search Query:
{
"size":0,
"query": {
"match": {
"user": "def"
}
}
}
Search Result:
"hits": {
"total": {
"value": 0, // note this
"relation": "eq"
},
"max_score": null,
"hits": []
}
Another option is to use count API
GET /_count
{
"query": {
"match": {
"user": "def"
}
}
}
Search Result:
{
"count": 0, // note this
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
}
}

Related

How do I use the whitespace analyzer correctly?

I am currently having an issue where I cannot search for UUID's in my logs. For instance, I have a fieldname "log" and in there is a full log, for example:
"log": "time=\"2022-10-10T07:46:00Z\" level=info msg=\"message to endpoint (outgoing)\" message=\"{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}\"\n",
I want to get this log in elastic search, and via Postman I send this:
{
"query": {
"match": {
"log": {
"analyzer": "whitespace",
"query": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
}
},
"size": 50,
"from": 0
}
As a response I get:
{
"took": 930,
"timed_out": false,
"num_reduce_phases": 2,
"_shards": {
"total": 581,
"successful": 581,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
But when I search on "8503fb5a" alone, then I get the wanted results. This means the dashes are still causing issues, but I thought using the whitespace analyzer should fix this? Am I doing something wrong?
These are the fields I have.
You not required to use whitespace analyzer.
You have 2 option to search entire UUID.
First, You can use match query with operator set to and:
{
"query": {
"match": {
"log":{
"query": "8503fb5a-3899-4305-8480-6ddc0f5df296",
"operator": "and"
}
}
}
}
Second, You can use match_phrase query which will search for exact match.
{
"query": {
"match_phrase": {
"log": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
}
}

Elasticsearch - How do i search on 2 fields. 1 must be null and other must match search text

I am trying to do a search on elasticsearch 6.8.
I don't have control over the elastic search instance, meaning i cannot control how the data is indexed.
I have data structured like this when i do a match. all search:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 15.703552,
"hits": [ {
"_index": "(removed index)",
"_type": "_doc",
"_id": "******** (Removed id)",
"_score": 15.703552,
"_source": {
"VCompany": {
"cvrNummer": 12345678,
"penheder": [
{
"pNummer": 1234567898,
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
}
],
"vMetadata": {
"nyesteNavn": {
"navn": "company1",
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
},
}
}
}
}
}]
The json might not be fully complete because i removed some unneeded data. So what I am trying to do is search where: "vCompany.vMetaData.nyesteNavn.gyldigTil" is null and where "vCompany.vMetaData.nyesteNavn.navn" will match a text string.
I tried something like this:
{
"query": {
"bool": {
"must": [
{"match": {"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"}}
],
"should": {
"terms": {
"Vrvirksomhed.penheder.periode.gyldigTil": null
}
}
}
}
You need to use must_not with exists query like below to check if field is null or not. Below query will give result where company1 is matching and Vrvirksomhed.penheder.periode.gyldigTil field is null.
{
"query": {
"bool": {
"must": [
{
"match": {
"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"
}
}
],
"must_not": [
{
"exists": {
"field": "Vrvirksomhed.penheder.periode.gyldigTil"
}
}
]
}
}
}

Elasticsearch aggregation shows incorrect total

Elasticsearch version is 7.4.2
I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.
{
"size": 10,
"from": 0,
"query": {
"bool": {
"must": [
{
"exists": {
"field": "firstName"
}
},
{
"query_string": {
"query": "*",
"fields": [
"params.display",
"params.description",
"params.name",
"lastName"
]
}
},
{
"match": {
"status": "DONE"
}
}
],
"filter": [
{
"term": {
"success": true
}
}
]
}
},
"sort": {
"createDate": "desc"
},
"collapse": {
"field": "lastName.keyword",
"inner_hits": {
"name": "lastChange",
"size": 1,
"sort": [
{
"createDate": "desc"
}
]
}
},
"aggs": {
"total": {
"cardinality": {
"field": "lastName.keyword"
}
}
}
}
It returns:
"aggregations": {
"total": {
"value": 429896
}
}
So ~430k results, but in pagination we stop getting results around the 426k mark. Meaning, when I run the query with
{
"size": 10,
"from": 427000,
...
}
I get:
{
"took": 2215,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10000,
"relation": "gte"
},
"max_score": null,
"hits": []
},
"aggregations": {
"total": {
"value": 429896
}
}
}
But if I change from to be 426000 I still get results.
You are comparing the cardinality aggregation value of your field lastName.keyword to your total documents in the index, which is two different things.
You can check the total no of documents in your index using the count API and from/size you are defined at query level ie it brings the documents matching your search query and as you don't have track_total_hits it shows 10k with relation gte means there are more than 10k documents matching your search query.
When it comes to your aggregation, I can see in both the case it returns the count as 429896 as this aggregation is not depend on the from/size you are mentioning for your query.
I was surprised when I found out that the cardinality parameter has Precision control.
Setting the maximum value was the solution for me.

Multiple Match Phrase Prefixes Return Zero Results In Elasticsearch

I have the following Elasticsearch, version 2.3, query which produces zero results.
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
Output from above query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Output of above query with _explain
{
"_index": "index_name",
"_type": "doc_type",
"_id": "_explain",
"_version": 4,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
However, when I do either of the following I get results including the one document that matches both parts of the above query. If I include the full phone number then the document will appear in the results.
Phone numbers are stored as strings without any formatting. i.e. "1234567890".
Any reason why the two prefix query returns zero results?
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
I was able to get the results I wanted by changing the phone number query to a regexp query instead of a match_phrase_prefix query.
{
"query": {
"bool": {
"must": [
{
"regexp": {
"phone": "123[0-9]+"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}

Elasticsearch Cardinality Aggregation giving completely wrong results

I am saving each page view of a website in an ES index, where each page is recognized by an entity_id.
I need to get the total count of unique page views since a given point in time.
I have the following mapping:
{
"my_index": {
"mappings": {
"page_views": {
"_all": {
"enabled": true
},
"properties": {
"created": {
"type": "long"
},
"entity_id": {
"type": "integer"
}
}
}
}
}
}
According to the Elasticsearch docs, the way to do that is using a cardinality aggregation.
Here is my search request:
GET my_index/page_views/_search
{
"filter": {
"bool": {
"must": [
[
{
"range": {
"created": {
"gte": 9999999999
}
}
}
]
]
}
},
"aggs": {
"distinct_entities": {
"cardinality": {
"field": "entity_id",
"precision_threshold": 100
}
}
}
}
Note, that I have used a timestamp in the future, so no results are returned.
And the result I'm getting is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
},
"aggregations": {
"distinct_entities": {
"value": 116
}
}
}
I don't understand how the unique page visits could be 116, giving that there are no page visits at all for the search query. What am I doing wrong?
Your aggregation is returning the global value for the cardinality. If you want it to return only the cardinality of the filtered set, one way you could do that is to use a filter aggregation, then nest your cardinality aggregation inside that. Leaving out the filtered query for clarity (you can add it back in easily enough), the query I tried looks like:
curl -XPOST "http://localhost:9200/my_index/page_views/_search " -d'
{
"size": 0,
"aggs": {
"filtered_entities": {
"filter": {
"bool": {
"must": [
[
{
"range": {
"created": {
"gte": 9999999999
}
}
}
]
]
}
},
"aggs": {
"distinct_entities": {
"cardinality": {
"field": "entity_id",
"precision_threshold": 100
}
}
}
}
}
}'
which returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"filtered_entities": {
"doc_count": 0,
"distinct_entities": {
"value": 0
}
}
}
}
Here is some code you can play with:
http://sense.qbox.io/gist/bd90a74839ca56329e8de28c457190872d19fc1b
I used Elasticsearch 1.3.4, by the way.

Resources