Elasticsearch not finding match for document that contains query - elasticsearch

I am trying to search an index for documents whose exception field contains both "semaphore" AND "RabbitMQ.Client.Impl".
Example exception:
System.ObjectDisposedException: The semaphore has been disposed.
at System.Threading.SemaphoreSlim.Release(Int32 releaseCount)
at RabbitMQ.Client.Impl.AsyncConsumerWorkService.WorkPool.HandleConcurrent(Work work, IModel model, SemaphoreSlim limiter)
When I search for "semaphore", the document is returned. Great!
POST /logs-2023-01/_search?pretty=true
{
"query": {
"bool": {
"must": [
{
"match": {
"exception": "semaphore"
}
},
{
"range": {
"logDate": {
"gte": "now-43200m"
}
}
}
]
}
},
"size": 1000
}
Query above returns:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 7.5582323,
"hits": [
{
"_index": "logs-2023-01",
"_type": "record",
"_id": "q21yk4UBAdlSjmEEw5gy",
"_score": 7.5582323,
"_source": {
"applicationName": "k8s-application",
"logDate": "2023-01-08T22:13:59.873",
"logLevel": "Error",
"loggerName": "TaskScheduler.UnobservedTaskException.Logger",
"machineName": "k8s-pod-6755d4997c-rztgl",
"threadId": "2",
"message": "An unobserved task exception occurred. The semaphore has been disposed.",
"exception": """
System.ObjectDisposedException: The semaphore has been disposed.
at System.Threading.SemaphoreSlim.Release(Int32 releaseCount)
at RabbitMQ.Client.Impl.AsyncConsumerWorkService.WorkPool.HandleConcurrent(Work work, IModel model, SemaphoreSlim limiter)
""",
"sortDate": "2023-01-08T22:13:59.000027026"
}
}
]
}
}
However, when I run the same search with the query "RabbitMQ.Client.Impl" (which is 100% contained in the exception), I get nothing. Why?
POST /logs-2023-01/_search?pretty=true
{
"query": {
"bool": {
"must": [
{
"match": {
"exception": "RabbitMQ.Client.Impl"
}
},
{
"range": {
"logDate": {
"gte": "now-43200m"
}
}
}
]
}
},
"size": 1000
}
Query above returns:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}

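TL;DR
A match query only matches whole tokens, exactly as the analyzer produced them.
Solution
Tokens are generated at ingestion time by the analyzer.
The default (standard) analyzer lowercases the text and splits it on word boundaries such as whitespace and parentheses, but it does not split on a period that sits between two letters.
This means rabbitmq.client.impl.asyncconsumerworkservice.workpool.handleconcurrent is indexed as a single token,
which does not match the query token rabbitmq.client.impl.
You can use match_phrase_prefix instead, which treats the last query term as a prefix,
with the following query: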
GET 75236255/_search
{
"query": {
"match_phrase_prefix": {
"exception": "RabbitMQ.Client.Impl"
}
}
}
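To see why the plain match query misses while the prefix variant hits, here is a rough Python approximation of the tokenization described above. The regular expression is an assumption that only mimics the relevant behaviour (lowercasing, keeping letter.letter runs together); it is not Elasticsearch's real tokenizer. The combined_query dict is a sketch of how both of the asker's conditions might be put into one request, reusing the field names from the question.

```python
import re

# Rough stand-in for the standard analyzer: lowercase, then keep runs of
# letters/digits joined by single dots as one token. This is an
# approximation for illustration only, not the real Unicode segmentation.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")

def approx_tokens(text):
    return TOKEN_RE.findall(text.lower())

exception = (
    "System.ObjectDisposedException: The semaphore has been disposed.\n"
    "at System.Threading.SemaphoreSlim.Release(Int32 releaseCount)\n"
    "at RabbitMQ.Client.Impl.AsyncConsumerWorkService.WorkPool."
    "HandleConcurrent(Work work, IModel model, SemaphoreSlim limiter)"
)

tokens = approx_tokens(exception)
query_token = approx_tokens("RabbitMQ.Client.Impl")[0]

# match needs an identical token; match_phrase_prefix accepts a token
# that merely starts with the last query term.
exact_hit = query_token in tokens
prefix_hit = any(t.startswith(query_token) for t in tokens)
semaphore_hit = "semaphore" in tokens
print(exact_hit, prefix_hit, semaphore_hit)  # False True True

# Sketch of a request body combining both conditions from the question.
combined_query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"exception": "semaphore"}},
                {"match_phrase_prefix": {"exception": "RabbitMQ.Client.Impl"}},
                {"range": {"logDate": {"gte": "now-43200m"}}},
            ]
        }
    },
    "size": 1000,
}
```

The _analyze API on the actual index is the reliable way to check which tokens were really produced for the exception field.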

Related

In Painless, remove value from array

In Splunk SPL, it's easy to remove a value from an array....
| eval Account_Name = mvindex(Account_Name, 0)
Windows security logs reference the account name as the machine name in array(0);
array(1) contains the actual executing account name.
I need to do the same thing as the mvindex function in Painless.
I find lots of hits when searching for this but haven't found anything that works. There must be a simple way to remove an array value.
Are you looking for something like the following?
POST sample_index/_doc
{
"Account_Name": [
"machine-name",
"account-name"
]
}
POST sample_index/_update_by_query
{
"query": {
"match_all": {}
},
"script": {
"source": "ctx._source['Account_Name'].remove(0)",
"lang": "painless"
}
}
GET sample_index/_search
The result of the search:
{
"took": 892,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "sample_index",
"_id": "t3RIY4YBnUNkT6fHnBrI",
"_score": 1,
"_source": {
"Account_Name": [
"account-name"
]
}
}
]
}
}
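The one-line script above assumes every matched document holds Account_Name as an array with at least one entry. As a hedged sketch, the request body below adds guards inside the Painless source and skips documents that don't fit; the guard conditions and the exists filter are assumptions about the data, not part of the original answer. The drop_first helper is a hypothetical Python mirror of what the script does to one _source.

```python
import json

# Painless source with guards: only touch documents where Account_Name is
# a list with more than one entry, then drop index 0 (List.remove(int)).
# Anything else is left unchanged via ctx.op = 'noop'.
GUARDED_SCRIPT = (
    "def names = ctx._source['Account_Name'];"
    " if (names instanceof List && names.size() > 1) { names.remove(0); }"
    " else { ctx.op = 'noop'; }"
)

update_body = {
    # Assumption: only documents that actually have the field are of interest.
    "query": {"exists": {"field": "Account_Name"}},
    "script": {"lang": "painless", "source": GUARDED_SCRIPT},
}

def drop_first(source):
    """Hypothetical Python mirror of the script's effect on one _source dict."""
    names = source.get("Account_Name")
    if isinstance(names, list) and len(names) > 1:
        del names[0]
    return source

print(json.dumps(update_body, indent=2))
print(drop_first({"Account_Name": ["machine-name", "account-name"]}))
```

The body would be sent to POST sample_index/_update_by_query in place of the unguarded version.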

Elasticsearch - How do i search on 2 fields. 1 must be null and other must match search text

I am trying to do a search on Elasticsearch 6.8.
I don't have control over the Elasticsearch instance, meaning I cannot control how the data is indexed.
My data is structured like this when I run a match_all search:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 15.703552,
"hits": [ {
"_index": "(removed index)",
"_type": "_doc",
"_id": "******** (Removed id)",
"_score": 15.703552,
"_source": {
"VCompany": {
"cvrNummer": 12345678,
"penheder": [
{
"pNummer": 1234567898,
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
}
],
"vMetadata": {
"nyesteNavn": {
"navn": "company1",
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
},
}
}
}
}
}]
The JSON may be incomplete because I removed some unneeded data. What I am trying to do is find documents where "vCompany.vMetaData.nyesteNavn.gyldigTil" is null and "vCompany.vMetaData.nyesteNavn.navn" matches a text string.
I tried something like this:
{
"query": {
"bool": {
"must": [
{"match": {"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"}}
],
"should": {
"terms": {
"Vrvirksomhed.penheder.periode.gyldigTil": null
}
}
}
}
}
You need to use must_not with an exists query, as below, to check whether a field is null. The query below returns documents where the name matches company1 and the Vrvirksomhed.penheder.periode.gyldigTil field is null or missing.
{
"query": {
"bool": {
"must": [
{
"match": {
"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"
}
}
],
"must_not": [
{
"exists": {
"field": "Vrvirksomhed.penheder.periode.gyldigTil"
}
}
]
}
}
}
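What must_not plus exists actually excludes depends on what exists counts as "having a value". The sketch below approximates the documented rules in Python, assuming the mapping defines no null_value for the field and ignoring nested arrays; field_exists and name_with_missing_field are hypothetical helper names, not Elasticsearch APIs.

```python
def field_exists(value):
    """Approximate what the exists query treats as 'has a value'."""
    if value is None:
        return False
    if isinstance(value, list):
        # [] and [None] count as missing; [None, "x"] counts as present.
        return any(v is not None for v in value)
    return True  # note: "" and "-" still count as present

def name_with_missing_field(name_field, text, missing_field):
    """Build the bool query: name must match, other field must be absent."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {name_field: text}}],
                "must_not": [{"exists": {"field": missing_field}}],
            }
        }
    }

body = name_with_missing_field(
    "Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn",
    "company1",
    "Vrvirksomhed.penheder.periode.gyldigTil",
)
print(body)
print(field_exists(None), field_exists([]), field_exists(""))
```

One consequence worth keeping in mind: an empty string is still a value, so documents storing "" for the date field would not be returned by this query.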

Multiple Match Phrase Prefixes Return Zero Results In Elasticsearch

I have the following Elasticsearch, version 2.3, query which produces zero results.
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
Output from above query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Output of above query with _explain
{
"_index": "index_name",
"_type": "doc_type",
"_id": "_explain",
"_version": 4,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
However, when I run either of the following queries on its own, I get results, including the one document that matches both parts of the query above. If I include the full phone number, the document also appears in the results.
Phone numbers are stored as strings without any formatting, i.e. "1234567890".
Is there any reason why the query with two prefixes returns zero results?
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
I was able to get the results I wanted by changing the phone number query to a regexp query instead of a match_phrase_prefix query.
{
"query": {
"bool": {
"must": [
{
"regexp": {
"phone": "123[0-9]+"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
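One detail of the regexp workaround is easy to miss: the pattern has to match the entire indexed term, not just part of it. The sketch below uses Python's re.fullmatch as the closest analogue for this simple character-class pattern (Lucene's regexp syntax differs from Python's in other respects). The prefix_alternative body is an untested assumption that a prefix query would serve the same purpose when the phone number is indexed as a single token.

```python
import re

def regexp_term_match(pattern, term):
    # A regexp query must match the whole term, so fullmatch (not search)
    # is the closest Python analogue for this simple pattern.
    return re.fullmatch(pattern, term) is not None

print(regexp_term_match("123[0-9]+", "1234567890"))  # True
print(regexp_term_match("123[0-9]+", "0123456789"))  # False: pattern is anchored
print(regexp_term_match("123[0-9]+", "123"))         # False: + needs one more digit
print(regexp_term_match("123[0-9]*", "123"))         # True

# Assumption: phone is indexed as one token, so a prefix query may be a
# simpler alternative to the regular expression.
prefix_alternative = {
    "query": {
        "bool": {
            "must": [
                {"prefix": {"phone": "123"}},
                {"match_phrase_prefix": {"firstname": "First"}},
            ]
        }
    }
}
```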

Difference between a "plain" terms query and a terms query using a filter

I am trying to understand the difference between:
a "plain" Elasticsearch terms query that returns a certain number of hits,
and a filtered query (therefore using a filter) that returns the same number of hits.
Here is the terms query:
GET _search
{
"query": {
"terms": {
"childcareTypes": [
"SOLE_CHARGE",
"OUT_OF_SCHOOL",
"BABY_SITTING"
],
"minimum_match": 3
}
}
}
Here is the filtered version:
GET _search
{
"query": {
"filtered": {
"filter": {
"terms": {
"childcareTypes": [
"SOLE_CHARGE",
"OUT_OF_SCHOOL",
"BABY_SITTING"
],
"execution": "and"
}
}
}
}
}
Both return a total hits of 8000 (against my index).
Here is the result from the "plain" terms query:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8000,
"max_score": 5.134171,
"hits": [
{
"_index": "bignibou",
"_type": "advertisement",
"_id": "AUs2T2lt3L5LNr7nkot2",
"_score": 5.134171,
"_source": {
"childcareWorkerType": "AUXILIAIRE_PARENTALE",
"childcareTypes": [
"SOLE_CHARGE",
"OUT_OF_SCHOOL",
"BABY_SITTING"
],
"address": {
"latitude": 48.8532558,
"longitude": 2.36584
},
"giveBath": "EMPTY"
}
},
...
Here is the result from the "filtered" query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 8000,
"max_score": 1,
"hits": [
{
"_index": "bignibou",
"_type": "advertisement",
"_id": "AUs2T2lt3L5LNr7nkot2",
"_score": 1,
"_source": {
"childcareWorkerType": "AUXILIAIRE_PARENTALE",
"childcareTypes": [
"SOLE_CHARGE",
"OUT_OF_SCHOOL",
"BABY_SITTING"
],
"address": {
"latitude": 48.8532558,
"longitude": 2.36584
},
"giveBath": "EMPTY"
}
},
....
Then what are the differences between the two?
This comes down to the difference between queries and filters (more information here).
In your case, unlike the terms query, the terms filter:
is cached
doesn't compute a score: all matching documents get the same _score of 1 (look at your results)
Consequently, the main practical difference is that the filtered query will usually be faster than the 'plain' terms query, at the cost of relevance ranking.
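The filtered query shown in the question is Elasticsearch 1.x syntax; it was deprecated in 2.0 and removed in 5.0. As a hedged sketch, the same idea (require all three values, skip scoring) could be expressed on later versions with constant_score wrapping a bool filter. The all_terms_filter helper is hypothetical, and it assumes childcareTypes is indexed without analysis so that term clauses match the uppercase values exactly.

```python
import json

def all_terms_filter(field, values):
    """Require every value on `field`, in filter context (no relevance scoring)."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        # One term clause per value: all of them must match.
                        "filter": [{"term": {field: v}} for v in values]
                    }
                }
            }
        }
    }

body = all_terms_filter(
    "childcareTypes", ["SOLE_CHARGE", "OUT_OF_SCHOOL", "BABY_SITTING"]
)
print(json.dumps(body, indent=2))
```

As with the filtered version, every hit comes back with the same constant score, which is the visible sign that no relevance calculation took place.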

Elasticsearch Cardinality Aggregation giving completely wrong results

I am saving each page view of a website in an ES index, where each page is recognized by an entity_id.
I need to get the total count of unique page views since a given point in time.
I have the following mapping:
{
"my_index": {
"mappings": {
"page_views": {
"_all": {
"enabled": true
},
"properties": {
"created": {
"type": "long"
},
"entity_id": {
"type": "integer"
}
}
}
}
}
}
According to the Elasticsearch docs, the way to do that is using a cardinality aggregation.
Here is my search request:
GET my_index/page_views/_search
{
"filter": {
"bool": {
"must": [
[
{
"range": {
"created": {
"gte": 9999999999
}
}
}
]
]
}
},
"aggs": {
"distinct_entities": {
"cardinality": {
"field": "entity_id",
"precision_threshold": 100
}
}
}
}
Note that I have used a timestamp in the future, so no results are returned.
And the result I'm getting is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
},
"aggregations": {
"distinct_entities": {
"value": 116
}
}
}
I don't understand how the unique page visits could be 116, given that there are no page visits at all for the search query. What am I doing wrong?
Your aggregation is returning the global value for the cardinality, because a top-level filter is applied to the hits only after the aggregations have been computed. If you want it to return only the cardinality of the filtered set, one way you could do that is to use a filter aggregation, then nest your cardinality aggregation inside that. Leaving out the top-level filter for clarity (you can add it back in easily enough), the query I tried looks like:
curl -XPOST "http://localhost:9200/my_index/page_views/_search" -d'
{
"size": 0,
"aggs": {
"filtered_entities": {
"filter": {
"bool": {
"must": [
[
{
"range": {
"created": {
"gte": 9999999999
}
}
}
]
]
}
},
"aggs": {
"distinct_entities": {
"cardinality": {
"field": "entity_id",
"precision_threshold": 100
}
}
}
}
}
}'
which returns:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 0,
"hits": []
},
"aggregations": {
"filtered_entities": {
"doc_count": 0,
"distinct_entities": {
"value": 0
}
}
}
}
Here is some code you can play with:
http://sense.qbox.io/gist/bd90a74839ca56329e8de28c457190872d19fc1b
I used Elasticsearch 1.3.4, by the way.
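The behaviour in this thread can be reproduced with a toy model of the request phases. The sketch below is plain Python, not Elasticsearch code, and the 500 page views are made-up sample data chosen so that 116 distinct entity_id values exist. It contrasts a top-level filter (trims hits only), a filter aggregation (trims what the aggregation sees), and a range placed in the query itself (trims both).

```python
# Made-up sample data: 500 page views over 116 distinct entities,
# all created well before the cutoff used in the question.
docs = [{"created": 1400000000 + i, "entity_id": i % 116} for i in range(500)]

def match_all(doc):
    return True

def in_future(doc):
    return doc["created"] >= 9999999999

def search(docs, query=match_all, post_filter=match_all, agg_filter=None):
    """Toy model: returns (number of hits, distinct entity_id count)."""
    scope = [d for d in docs if query(d)]        # aggregations see this set
    hits = [d for d in scope if post_filter(d)]  # top-level filter trims hits only
    agg_docs = scope if agg_filter is None else [d for d in scope if agg_filter(d)]
    distinct = len({d["entity_id"] for d in agg_docs})
    return len(hits), distinct

print(search(docs, post_filter=in_future))  # (0, 116): as in the question
print(search(docs, agg_filter=in_future))   # (500, 0): filter aggregation
print(search(docs, query=in_future))        # (0, 0): range inside the query
```

The second call corresponds to the answer's request, which is why its response still reports hits while the nested cardinality is 0.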

Resources