How do I use the whitespace analyzer correctly? - elasticsearch

I am currently having an issue where I cannot search for UUIDs in my logs. For instance, I have a field named "log" that contains a full log line, for example:
"log": "time=\"2022-10-10T07:46:00Z\" level=info msg=\"message to endpoint (outgoing)\" message=\"{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}\"\n",
I want to find this log in Elasticsearch, so via Postman I send this:
{
  "query": {
    "match": {
      "log": {
        "analyzer": "whitespace",
        "query": "8503fb5a-3899-4305-8480-6ddc0f5df296"
      }
    }
  },
  "size": 50,
  "from": 0
}
As a response I get:
{
  "took": 930,
  "timed_out": false,
  "num_reduce_phases": 2,
  "_shards": {
    "total": 581,
    "successful": 581,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
But when I search for "8503fb5a" alone, I get the expected results. This suggests the dashes are still causing issues, but I thought using the whitespace analyzer would fix this. Am I doing something wrong?

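You don't need the whitespace analyzer. The analyzer parameter of a match query only changes how the query text is analyzed; the log field itself was indexed with its mapped analyzer (the standard analyzer, if the field uses the default text mapping), which splits the UUID on the hyphens into 8503fb5a, 3899, 4305, 8480 and 6ddc0f5df296. The whitespace analyzer keeps the UUID as one single term, which never appears in the index, so nothing matches. Searching for 8503fb5a alone works because that term was indexed.
You have two options to search for the entire UUID.
First, you can use a match query with operator set to and, which requires every term of the UUID to be present in the field: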
{
  "query": {
    "match": {
      "log": {
        "query": "8503fb5a-3899-4305-8480-6ddc0f5df296",
        "operator": "and"
      }
    }
  }
}
Second, you can use a match_phrase query, which requires the terms to appear consecutively and in the same order, so it effectively matches the whole UUID:
{
  "query": {
    "match_phrase": {
      "log": "8503fb5a-3899-4305-8480-6ddc0f5df296"
    }
  }
}
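To see why the whitespace analyzer returns nothing while the two options above work, here is a minimal Python sketch. It assumes the log field uses the standard analyzer and approximates that analyzer as "lowercase, then split on anything that is not a letter or digit"; the real standard tokenizer follows Unicode word-boundary rules (UAX #29), but it also breaks on hyphens. The helper names are hypothetical.

```python
import re

# Rough stand-in for the standard analyzer (assumption): lowercase the text,
# then split on runs of characters that are not ASCII letters or digits.
def approx_standard(text: str) -> list[str]:
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

# The whitespace analyzer splits on whitespace only (and does not lowercase).
def whitespace(text: str) -> list[str]:
    return text.split()

# Default match behaviour (operator "or"): any query term may match.
def match_or(doc: list[str], query: list[str]) -> bool:
    return any(t in doc for t in query)

# operator "and": every query term must appear, anywhere in the field.
def match_and(doc: list[str], query: list[str]) -> bool:
    return all(t in doc for t in query)

# match_phrase: the query terms must appear consecutively, in order.
def match_phrase(doc: list[str], query: list[str]) -> bool:
    n = len(query)
    return any(doc[i:i + n] == query for i in range(len(doc) - n + 1))

log = ('time="2022-10-10T07:46:00Z" level=info msg="message to endpoint (outgoing)" '
       'message="{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}"')
uuid = "8503fb5a-3899-4305-8480-6ddc0f5df296"

indexed = approx_standard(log)  # terms as stored at index time

print(whitespace(uuid))                                # one term: the whole UUID
print(match_or(indexed, whitespace(uuid)))             # False: that term was never indexed
print(match_or(indexed, approx_standard("8503fb5a")))  # True
print(match_and(indexed, approx_standard(uuid)))       # True
print(match_phrase(indexed, approx_standard(uuid)))    # True
```

The design difference between the two options: match_and only checks that each piece is present somewhere in the field, while match_phrase also checks order and adjacency, so match_phrase is the stricter choice for an identifier like a UUID.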

Related

Elasticsearch aggregation shows incorrect total

Elasticsearch version is 7.4.2
I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.
{
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "firstName"
          }
        },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        {
          "match": {
            "status": "DONE"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "success": true
          }
        }
      ]
    }
  },
  "sort": {
    "createDate": "desc"
  },
  "collapse": {
    "field": "lastName.keyword",
    "inner_hits": {
      "name": "lastChange",
      "size": 1,
      "sort": [
        {
          "createDate": "desc"
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword"
      }
    }
  }
}
It returns:
"aggregations": {
  "total": {
    "value": 429896
  }
}
So there should be about 430k results, but when paginating we stop getting results around the 426k mark. That is, when I run the query with
{
  "size": 10,
  "from": 427000,
  ...
}
I get:
{
  "took": 2215,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "total": {
      "value": 429896
    }
  }
}
But if I change from to 426000, I still get results.
You are comparing the value of the cardinality aggregation on your lastName.keyword field with the total number of documents in your index, which are two different things.
You can check the total number of documents in your index with the count API. The from/size you define apply at the query level, i.e. they page through the documents matching your search query. Because you haven't set track_total_hits, hits.total stops counting at 10,000 and reports the relation gte, which means more than 10,000 documents match your search query.
As for your aggregation, it returns 429896 in both cases because aggregations do not depend on the from/size of your query.
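As a sketch of the point about hits.total, the Python below builds a request body that sets track_total_hits to true, so that Elasticsearch 7.x counts every matching document instead of stopping at its default of 10,000, and shows how to read the value/relation pair in a response. The helper names and the shortened query are hypothetical; the "gte" sample total is the one from the question's response.

```python
# Hypothetical helper: add exact hit counting to an existing request body.
def with_exact_total(body: dict) -> dict:
    return {**body, "track_total_hits": True}

# Interpret hits.total from a 7.x search response.
def describe_total(total: dict) -> str:
    if total["relation"] == "eq":
        return f"exactly {total['value']} documents"
    # relation "gte": counting stopped at this lower bound
    return f"at least {total['value']} documents"

body = with_exact_total({"size": 10, "from": 0, "query": {"match": {"status": "DONE"}}})
print(body["track_total_hits"])
print(describe_total({"value": 10000, "relation": "gte"}))  # at least 10000 documents
```

Counting every hit exactly costs more than stopping at 10,000, which is why it is opt-in.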
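I was surprised to find out that the cardinality aggregation has precision control: its count is approximate, and the precision_threshold parameter trades memory for accuracy.
Setting precision_threshold to its maximum value was the solution for me.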
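A minimal sketch of the question's "total" aggregation with an explicit threshold, written as a Python dict. Elasticsearch caps precision_threshold at 40000 (the default is 3000). Counts below the threshold are expected to be close to exact; with roughly 430k distinct last names the count remains an estimate even at the maximum. The helper name is hypothetical.

```python
import json

# Elasticsearch caps precision_threshold at 40000 (the default is 3000);
# larger values behave the same as 40000.
MAX_PRECISION_THRESHOLD = 40000

# Hypothetical helper: the question's "total" aggregation with an explicit threshold.
def total_agg(threshold: int = MAX_PRECISION_THRESHOLD) -> dict:
    return {
        "total": {
            "cardinality": {
                "field": "lastName.keyword",
                "precision_threshold": min(threshold, MAX_PRECISION_THRESHOLD),
            }
        }
    }

print(json.dumps({"size": 0, "aggs": total_agg()}, indent=2))
```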

Value does not exist

I am familiar with checking if a field exists using the exists query. I am wondering if there is a way to check if a value does not exist instead; something like this:
GET /_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "user",
          "value": "id"
        }
      }
    }
  }
}
Update:
I want to add that this is part of a compound query, so counting the results will not work.
If you want to check whether a particular field value exists, you can simply use a match query. There is no need to use an exists query with a must_not clause.
If documents matching the field value exist in your index, their count will appear in the search result: hits.total.value gives you the number of matching documents.
Here is a working example:
Index Data:
{
  "user": "abc"
}
Search Query:
{
  "size": 0,
  "query": {
    "match": {
      "user": "abc"
    }
  }
}
Search Result:
"hits": {
  "total": {
    "value": 1, // note this
    "relation": "eq"
  },
  "max_score": null,
  "hits": []
}
Search Query:
{
  "size": 0,
  "query": {
    "match": {
      "user": "def"
    }
  }
}
Search Result:
"hits": {
  "total": {
    "value": 0, // note this
    "relation": "eq"
  },
  "max_score": null,
  "hits": []
}
Another option is to use the count API:
GET /_count
{
  "query": {
    "match": {
      "user": "def"
    }
  }
}
Search Result:
{
  "count": 0, // note this
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  }
}
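A minimal Python sketch of the same check, assuming the responses have already been parsed into dicts (for example by whatever HTTP client you use). The helper names are hypothetical, and the sample responses mirror the ones shown above.

```python
# Hypothetical helpers: decide whether a value exists from either response shape.
def exists_from_search(response: dict) -> bool:
    # _search response: hits.total.value (7.x format)
    return response["hits"]["total"]["value"] > 0

def exists_from_count(response: dict) -> bool:
    # _count response: top-level "count"
    return response["count"] > 0

def match_body(field: str, value: str) -> dict:
    # _search body with size 0: only the total is needed, not the documents
    return {"size": 0, "query": {"match": {field: value}}}

abc_search = {"hits": {"total": {"value": 1, "relation": "eq"}, "max_score": None, "hits": []}}
def_count = {"count": 0, "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}}

print(match_body("user", "abc"))
print(exists_from_search(abc_search))  # True
print(exists_from_count(def_count))    # False
```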

Elasticsearch multiple indices suggestions

I have the following problem. This is my implementation of a "did you mean" query. If I use only one index, the results fit perfectly. If I use multiple indices, I don't get any results.
Does this query only work for single indices?
GET index1/_search
{
  "suggest": {
    "text": "exmple",
    "multi_phrase": {
      "phrase": {
        "field": "all",
        "size": 5,
        "gram_size": 3,
        "collate": {
          "query": {
            "source": {
              "bool": {
                "must": [
                  {
                    "match_all": {}
                  }
                ],
                "filter": {
                  "multi_match": {
                    "query": "{{suggestion}}",
                    "type": "cross_fields",
                    "fields": [
                      "name",
                      "name2"
                    ],
                    "operator": "AND",
                    "lenient": true
                  }
                }
              }
            }
          },
          "params": {
            "field_name": "all"
          }
        }
      }
    }
  }
}
If I run this query against a single index, everything works fine. If I use multiple indices, the results are empty:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "multi_phrase": [
      {
        "text": "example",
        "offset": 0,
        "length": 9,
        "options": []
      }
    ]
  }
}
I found the solution myself: I had to use the confidence parameter. From the documentation:
The confidence level defines a factor applied to the input phrases
score which is used as a threshold for other suggest candidates. Only
candidates that score higher than the threshold will be included in
the result. For instance a confidence level of 1.0 will only return
suggestions that score higher than the input phrase. If set to 0.0 the
top N candidates are returned. The default is 1.0.
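The answer does not say which value was used. As a sketch, assuming confidence is set to 0.0 (which, per the quoted documentation, returns the top N candidates instead of only those scoring above the input phrase), the suggester from the question could be built like this in Python. The helper name is hypothetical, and the collate block is left out for brevity.

```python
import json

# Hypothetical helper: the question's phrase suggester with an explicit confidence.
# 0.0 is an assumed example value, not necessarily the one the author used.
def phrase_suggest(text: str, confidence: float = 0.0) -> dict:
    return {
        "suggest": {
            "text": text,
            "multi_phrase": {
                "phrase": {
                    "field": "all",
                    "size": 5,
                    "gram_size": 3,
                    "confidence": confidence,
                    # the collate block from the question is omitted here
                }
            },
        }
    }

print(json.dumps(phrase_suggest("exmple"), indent=2))
```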

Elasticsearch doesn't return highlight results

I'm sending a request like this:
{
  "from": 0,
  "query": {
    "match": {
      "_all": "presidencia"
    }
  },
  "aggs": {
    //... some aggregations
  },
  "highlight": {
    "fields": {
      "nomeOrgaoSuperior": {}
    }
  }
}
But my response doesn't include the highlight field.
Response:
{
  "took": 68,
  "timed_out": false,
  "_shards": {"total": 15, "successful": 15, "failed": 0},
  "hits": {
    "total": 692785,
    "max_score": 0.48536316,
    "hits": [
      //Some hits...
    ]
  },
  "aggregations": {
    //some aggs ...
  }
}
Do I need some extra configuration on my index?
Found the problem. I was trying to highlight a field that wasn't analysed with my analyser. My search text was analysed, but the field I wanted highlighted wasn't, so the highlighter never found a match.
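To illustrate the mismatch, here is a simplified Python model (not how the highlighter is actually implemented). If nomeOrgaoSuperior is stored without analysis, its whole value is one single term, so the analyzed query term "presidencia" never equals any term in that field and there is nothing to highlight. The field value "PRESIDENCIA DA REPUBLICA" and the helper names are hypothetical.

```python
# Hypothetical analyzers for illustration only.
def analyzed(text: str) -> list[str]:
    # lowercase, then split on whitespace
    return text.lower().split()

def not_analyzed(text: str) -> list[str]:
    # the whole value is kept as a single term
    return [text]

# Terms a highlighter could mark: query terms that also occur in the field.
def highlightable(query_terms: list[str], field_terms: list[str]) -> set[str]:
    return set(query_terms) & set(field_terms)

value = "PRESIDENCIA DA REPUBLICA"  # hypothetical field value
query = analyzed("presidencia")

print(highlightable(query, not_analyzed(value)))  # set(): nothing to highlight
print(highlightable(query, analyzed(value)))      # {'presidencia'}
```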

Fuzziness behavior on a match_phrase query

A few days ago I ran into this "problem". I was running a match_phrase query on my index. Everything worked as expected until I ran the same search with multi-word nouns (before, I was using single-word nouns, e.g. university). I made one misspelling and the search did not find the document; if I removed a word (say, the one that was spelled correctly), the search worked and the document was found.
Here is the example I made:
The settings
PUT index1
{
  "mappings": {
    "myType": {
      "properties": {
        "field1": {
          "type": "string",
          "analyzer": "standard"
        }
      }
    }
  }
}
POST index1/myType/1
{
  "field1": "Commercial Banks"
}
Case 1: Single noun search
GET index1/myType/_search
{
  "query": {
    "match": {
      "field1": {
        "type": "phrase",
        "query": "comersial",
        "fuzziness": "AUTO"
      }
    }
  }
}
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.19178303,
    "hits": [
      {
        "_index": "index1",
        "_type": "myType",
        "_id": "1",
        "_score": 0.19178303,
        "_source": {
          "field1": "Commercial Banks"
        }
      }
    ]
  }
}
Case 2: Multiple noun search
GET index1/myType/_search
{
  "query": {
    "match": {
      "field1": {
        "type": "phrase",
        "query": "comersial banks",
        "fuzziness": "AUTO"
      }
    }
  }
}
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
So, in the second case, why am I not finding the document with the match_phrase query? Is there something I am missing?
These results make me doubt what I thought I knew.
Am I using fuzzy search incorrectly? I'm not sure whether this is a bug or whether I simply don't understand the behavior.
Many thanks in advance for reading my question. I hope you can help me with this.
Fuzziness is not supported in phrase queries.
Currently, ES is silent about it, i.e. it allows you to specify the parameter but doesn't warn you that it is not supported. A pull request (#18322, related to issue #7764) exists that will remedy this problem. Once it is merged into ES 5, this query will error out.
In the breaking changes document for 5.0, we can see that this won't be supported:
The multi_match query will fail if fuzziness is used for cross_fields, phrase or phrase_prefix type. This parameter was undocumented and silently ignored before for these types of multi_match.
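For the single-word case, here is a Python sketch of what fuzziness AUTO allows. With its default bounds (AUTO:3,6), a term of length 0 to 2 must match exactly, a term of length 3 to 5 allows one edit, and longer terms allow two. The sketch uses plain Levenshtein distance as a simplification (Elasticsearch can additionally count an adjacent transposition as a single edit, which makes no difference for this pair). "comersial" is two edits away from "commercial", within the budget for a 9-character term, which would explain why Case 1 found the document if that single-term query was treated as a fuzzy term query. The function names are hypothetical.

```python
# Edit budget for fuzziness "AUTO" with its default bounds (AUTO:3,6).
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2

# Plain Levenshtein distance (insertions, deletions, substitutions).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or keep)
        prev = cur
    return prev[-1]

def fuzzy_term_matches(query_term: str, indexed_term: str) -> bool:
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)

print(levenshtein("comersial", "commercial"))         # 2
print(fuzzy_term_matches("comersial", "commercial"))  # True
```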
