What happens when performing a match query on a date field in Elasticsearch?

Why can I perform a query of the following type:
GET myindex/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "@timestamp": "454545645656" } }
      ]
    }
  }
}
when the field type is the following one?
"mappings": {
  "fluentd": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
Does it make sense? Does the query value pass through the analyzer, and what is the field value compared against?

No. Even though you are using a match query, and match queries are normally analyzed (meaning they go through the same analyzers applied to the field at index time), that is not what happens here. As explained in the official ES doc on the date datatype:
Queries on dates are internally converted to range queries on this long representation
You can test it yourself by using the explain=true param on your search query; more info can be found in the explain API documentation.
I did this for your search query, and you can see in the result (the _explanation part) that it shows a range query on the date field.
URL: /_search?explain=true
"hits": [
  {
    "_shard": "[date-index][0]",
    "_node": "h2H2MJd5T5-b1cUSkHVHcw",
    "_index": "date-index",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "@timestamp": "454545645656"
    },
    "_explanation": {
      "value": 1.0,
      "description": "@timestamp:[454545645656 TO 454545645656]", --> see the range query
      "details": []
    }
  }
]
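As the doc excerpt above says, a date field is stored internally as a long (milliseconds since the epoch), and queries on it become range queries on that long. A minimal Python sketch of that conversion (the helper name is mine, not an ES API):

```python
from datetime import datetime, timezone

def to_epoch_millis(date_str: str) -> int:
    """Convert an ISO-8601 date string to the epoch-millisecond long
    that Elasticsearch stores internally for date fields."""
    dt = datetime.fromisoformat(date_str).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

# A purely numeric string like "454545645656" is already interpreted as
# epoch milliseconds, so the match query above effectively becomes the
# range query [454545645656 TO 454545645656].
print(to_epoch_millis("2017-10-29T00:04:31"))  # 1509235471000
```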


How to perform a filter on aggregation results in Elasticsearch?

I have an Elasticsearch index that contains a certain field on which I want to perform a wildcard query. The issue is that the field value is duplicated across many docs, hence I want to use an aggregation first to get the unique values for that field and then perform a wildcard query on top of that. Is there a way I can perform the query on aggregation results in Elasticsearch?
I believe you can find the results you need by collapsing your search results rather than using your strategy of first obtaining the aggregation results and then running a wildcard query.
Adding a working example with index data (with the default mapping), search query and search result.
Index Data:
{
  "role": "example123",
  "number": 1
}
{
  "role": "example",
  "number": 2
}
{
  "role": "example",
  "number": 3
}
Search Query:
{
  "query": {
    "wildcard": {
      "role": "example*"
    }
  },
  "collapse": {
    "field": "role.keyword"
  }
}
Search Result:
"hits": [
  {
    "_index": "72724517",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "role": "example123",
      "number": 1
    },
    "fields": {
      "role.keyword": [
        "example123"
      ]
    }
  },
  {
    "_index": "72724517",
    "_id": "2",
    "_score": 1.0,
    "_source": {
      "role": "example",
      "number": 2
    },
    "fields": {
      "role.keyword": [
        "example"
      ]
    }
  }
]
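Conceptually, collapse keeps only the top-ranked hit for each distinct value of the collapse field. A rough Python sketch of that behavior (the function and sample data are illustrative, not an ES API):

```python
def collapse_hits(hits, field):
    """Mimic Elasticsearch's collapse: keep only the first (top-scored)
    hit for each distinct value of `field`, preserving hit order."""
    seen = set()
    collapsed = []
    for hit in hits:  # hits are assumed already sorted by _score
        key = hit["_source"][field]
        if key not in seen:
            seen.add(key)
            collapsed.append(hit)
    return collapsed

hits = [
    {"_id": "1", "_source": {"role": "example123", "number": 1}},
    {"_id": "2", "_source": {"role": "example", "number": 2}},
    {"_id": "3", "_source": {"role": "example", "number": 3}},
]
print([h["_id"] for h in collapse_hits(hits, "role")])  # ['1', '2']
```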

How to add fuzziness to a search_as_you_type field in Elasticsearch?

I've been trying to add some fuzziness to my search_as_you_type field in Elasticsearch, but never got the needed query. Does anyone have an idea how to implement this?
Fuzzy Query returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
The fuzziness parameter can be specified as:
AUTO -- generates an edit distance based on the length of the term.
For lengths:
0..2 -- must match exactly
3..5 -- one edit allowed
Greater than 5 -- two edits allowed
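The AUTO rules above can be sketched as a small Python function (the defaults of 3 and 6 mirror ES's documented AUTO thresholds; the function name is mine):

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Edit distance chosen by AUTO (or AUTO:low,high): terms shorter
    than `low` must match exactly, terms shorter than `high` allow
    one edit, and longer terms allow two edits."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2

print(auto_fuzziness("ab"))       # 0 -> must match exactly
print(auto_fuzziness("prodc"))    # 1 -> one edit allowed
print(auto_fuzziness("product"))  # 2 -> two edits allowed
```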
Adding working example with index data and search query.
Index Data:
{
  "title": "product"
}
{
  "title": "prodct"
}
Search Query:
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "prodc",
        "fuzziness": 2,
        "transpositions": true,
        "boost": 5
      }
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "test",
    "_type": "_doc",
    "_id": "1",
    "_score": 2.0794415,
    "_source": {
      "title": "product"
    }
  },
  {
    "_index": "test",
    "_type": "_doc",
    "_id": "2",
    "_score": 2.0794415,
    "_source": {
      "title": "prodct"
    }
  }
]
Refer to these blogs for a detailed explanation of the fuzzy query:
https://www.elastic.co/blog/found-fuzzy-search
https://qbox.io/blog/elasticsearch-optimization-fuzziness-performance
Update 1:
Refer to the official ES documentation:
The fuzziness , prefix_length , max_expansions , rewrite , and
fuzzy_transpositions parameters are supported for the terms that are
used to construct term queries, but do not have an effect on the
prefix query constructed from the final term.
There are open issues and discussion threads stating that fuzziness does not work with bool_prefix multi_match (search-as-you-type):
https://github.com/elastic/elasticsearch/issues/56229
https://discuss.elastic.co/t/fuzziness-not-work-with-bool-prefix-multi-match-search-as-you-type/229602/3
I know this question was asked long ago, but I think this worked for me.
Since Elasticsearch allows a single field to be indexed in multiple ways via multi-fields, my mapping is like below.
PUT products
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "product_type": {
            "type": "search_as_you_type"
          }
        }
      }
    }
  }
}
After adding some data to the index, I queried it like this:
GET products/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "prodc",
            "type": "bool_prefix",
            "fields": [
              "title.product_type",
              "title.product_type._2gram",
              "title.product_type._3gram"
            ]
          }
        },
        {
          "multi_match": {
            "query": "prodc",
            "fuzziness": 2
          }
        }
      ]
    }
  }
}

Elasticsearch - pass fuzziness parameter in query_string

I have a fuzzy query with customized AUTO:10,20 fuzziness value.
{
  "query": {
    "match": {
      "name": {
        "query": "nike",
        "fuzziness": "AUTO:10,20"
      }
    }
  }
}
How to convert it to a query_string query? I tried nike~AUTO:10,20 but it is not working.
It's possible with query_string as well. Let me show it using the same example the OP provided: both the OP's match query and the query_string query fetch the same document with the same score.
According to the ES docs, Elasticsearch supports the AUTO:10,20 format, which is shown in my example as well.
Index mapping:
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}
Index a document:
{
  "name": "nike"
}
Search query using match with fuzziness:
{
  "query": {
    "match": {
      "name": {
        "query": "nike",
        "fuzziness": "AUTO:10,20"
      }
    }
  }
}
And the result:
"hits": [
  {
    "_index": "so-query",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9808292,
    "_source": {
      "name": "nike"
    }
  }
]
query_string with fuzziness:
{
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "nike",
      "fuzziness": "AUTO:10,20"
    }
  }
}
And the result:
"hits": [
  {
    "_index": "so-query",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.9808292,
    "_source": {
      "name": "nike"
    }
  }
]
Lucene syntax only allows you to specify "fuzziness" with the tilde symbol "~", optionally followed by 0, 1 or 2 to indicate the edit distance.
Elasticsearch Query DSL supports a configurable special value for AUTO which then is used to build the proper Lucene query.
You would need to implement that logic on your application side, by evaluating the desired edit distance based on the length of your search term and then use <searchTerm>~<editDistance> in your query_string-query.
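That application-side logic might look like the following Python sketch (assuming the AUTO:10,20 thresholds from the question; the helper name is mine):

```python
def lucene_fuzzy(term: str, low: int = 10, high: int = 20) -> str:
    """Build a Lucene fuzzy clause equivalent to AUTO:low,high,
    since the ~ operator only accepts a literal edit distance."""
    if len(term) < low:
        distance = 0          # too short: require an exact match
    elif len(term) < high:
        distance = 1          # medium length: one edit allowed
    else:
        distance = 2          # long terms: two edits allowed
    return f"{term}~{distance}" if distance else term

print(lucene_fuzzy("nike"))          # nike (len 4 < 10: exact match)
print(lucene_fuzzy("sneakerstore"))  # sneakerstore~1
```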

Why does Elasticsearch calculate a score for term queries?

I want to make a simple query based on knowing a unique field value using a term query. For instance:
{
  "query": {
    "term": {
      "products.product_id": {
        "value": "Ubsdf-234kjasdf"
      }
    }
  }
}
Regarding term queries, Elasticsearch documentation states:
Returns documents that contain an exact term in a provided field.
You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.
On the other hand, the documentation also suggests that the _score is calculated for queries where relevancy matters (which is not the case for filter context, which involves exact matching).
I find it a bit confusing. Why does Elasticsearch calculate a _score for term queries, which are supposed to be concerned with exact matches and not relevancy?
Term queries are not analyzed, i.e. they skip the analysis phase and are used for exact matches, but their score is still calculated when they are used in query context.
When you use a term query in filter context, you are not searching on it but rather filtering on it, hence no score is calculated.
More info on query and filter context can be found in the official ES doc.
Examples of a term query in both query context and filter context are shown below.
Term query in query context
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "c"
          }
        }
      ]
    }
  },
  "size": 10
}
And the result, with a score:
"hits": [
  {
    "_index": "cpp",
    "_type": "_doc",
    "_id": "4",
    "_score": 0.2876821, --> notice the score is calculated
    "_source": {
      "title": "c"
    }
  }
]
Term query in filter context
{
  "query": {
    "bool": {
      "filter": [ --> previous clause replaced by `filter`
        {
          "term": {
            "title": "c"
          }
        }
      ]
    }
  },
  "size": 10
}
And the search result with filter context:
"hits": [
  {
    "_index": "cpp",
    "_type": "_doc",
    "_id": "4",
    "_score": 0.0, --> notice the score is 0
    "_source": {
      "title": "c"
    }
  }
]
Filter context means that you need to wrap your term query inside a bool/filter query, like this:
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "products.product_id": {
            "value": "Ubsdf-234kjasdf"
          }
        }
      }
    }
  }
}
The above query will not compute scores.

Why is Elasticsearch filter showing all records?

I am using Elasticsearch 5.5 and trying to run a filter query on some metrics data. For example:
{
  "_index": "zabbix_test-us-east-2-node2-2017.10.29",
  "_type": "jmx",
  "_id": "AV9lcbNtvbkfeNFaDYH2",
  "_score": 0.00015684571,
  "_source": {
    "metric_value_number": 95721248,
    "path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
    "@timestamp": "2017-10-29T00:04:31.014Z",
    "@version": "1",
    "host": "18.221.245.150",
    "index": "zabbix_test-us-east-2-node2",
    "metric_path": "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used",
    "type": "jmx"
  }
},
{
  "_index": "zabbix_test-us-east-2-node2-2017.10.29",
  "_type": "jmx",
  "_id": "AV9lcbNtvbkfeNFaDYIU",
  "_score": 0.00015684571,
  "_source": {
    "metric_value_number": 0,
    "path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
    "@timestamp": "2017-10-29T00:04:31.030Z",
    "@version": "1",
    "host": "18.221.245.150",
    "index": "zabbix_test-us-east-2-node2",
    "metric_path": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count",
    "type": "jmx"
  }
}
I am running the following query:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "metric_path": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
        }
      }
    }
  }
}
Even then, it is displaying all records. However, if I use the following text, it works and shows only exact matches:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "metric_path": "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used"
        }
      }
    }
  }
}
Can anyone please tell me what I am doing wrong here?
Thanks.
You didn't mention anything about mappings, so I suppose you're using dynamic mapping - you've just indexed documents like these two into your Elasticsearch.
Once you visit
{yourhost}/zabbix_test-us-east-2-node2-2017.10.29/_mapping
you will see that the metric_path field probably has type text, which is the default for strings. As the documentation states:
A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed
So your field is processed by an analyzer, and in the end you're not executing the match against something like zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count but rather against some analyzed form of it, probably split on periods and some other special characters.
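As a rough illustration of what "analyzed form" means here, a crude Python approximation (this is not the real standard analyzer, which uses Unicode word segmentation, so the exact token boundaries may differ):

```python
import re

def rough_standard_analyze(text: str):
    """Crude approximation of text-field analysis: lowercase the
    input and split on non-alphanumeric characters. The real
    standard analyzer's boundaries can differ in edge cases."""
    return [t for t in re.split(r"[^a-z0-9_]+", text.lower()) if t]

print(rough_standard_analyze(
    "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"))
# ['zabbix_test', 'us', 'east', '2', 'node2', 'clientrequest',
#  'readlatency', 'count']
```

A match query then only needs to match one of these individual terms, which is why both documents are returned.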
So if you want to perform filtering like you posted, you should define your index mapping explicitly before indexing any documents. You don't have to do it for every property, but at least metric_path should be defined as keyword. You can start with:
PUT {yourhost}/zabbix_test-us-east-2-node2-2017.10.29
{
  "mappings": {
    "jmx": {
      "properties": {
        "metric_path": {
          "type": "keyword"
        }
      }
    }
  }
}
Then you should index your documents. Mappings for the other fields will be established dynamically by ES, but both of the queries you attached will then return exactly one result - just as you expect.
