Why is Elasticsearch filter showing all records? - elasticsearch

I am using Elasticsearch 5.5 and trying to run a filter query on some metrics data. For example, here are two documents from my index:
{
"_index": "zabbix_test-us-east-2-node2-2017.10.29",
"_type": "jmx",
"_id": "AV9lcbNtvbkfeNFaDYH2",
"_score": 0.00015684571,
"_source": {
"metric_value_number": 95721248,
"path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
"#timestamp": "2017-10-29T00:04:31.014Z",
"#version": "1",
"host": "18.221.245.150",
"index": "zabbix_test-us-east-2-node2",
"metric_path": "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used",
"type": "jmx"
}
},
{
"_index": "zabbix_test-us-east-2-node2-2017.10.29",
"_type": "jmx",
"_id": "AV9lcbNtvbkfeNFaDYIU",
"_score": 0.00015684571,
"_source": {
"metric_value_number": 0,
"path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
"#timestamp": "2017-10-29T00:04:31.030Z",
"#version": "1",
"host": "18.221.245.150",
"index": "zabbix_test-us-east-2-node2",
"metric_path": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count",
"type": "jmx"
}
}
I am running the following query:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
"query": {
"bool": {
"must": {
"match": {
"metric_path" : "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
}
}
}
}
}
Even then, it displays all records. However, if I use the following query, it works and shows only the exact matches:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
"query": {
"bool": {
"must": {
"match": {
"metric_path" : "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used"
}
}
}
}
}
Can anyone please tell me what I am doing wrong here?
Thanks.

You didn't mention anything about mappings, so I suppose you're using dynamic mapping - you've just indexed documents like these two into your Elasticsearch.
Once you visit
{yourhost}/zabbix_test-us-east-2-node2-2017.10.29/_mapping
you will see that the metric_path field probably has the type text, which is the default for strings. As the documentation states:
A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed
So your field is processed by an analyzer, and in the end you're not matching against the literal value zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count but against its analyzed form, which is probably split on periods, hyphens, and other special characters.
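You can check what the analyzer actually produces with the _analyze API; a quick sketch against your index (the exact token list depends on the analyzer configured in your mapping):
GET /zabbix_test-us-east-2-node2-2017.10.29/_analyze
{
  "field": "metric_path",
  "text": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
}
With a default text mapping, the response shows the value broken into several lowercased tokens rather than kept as one single term, which is why the match query finds far more documents than the one you meant.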
So if you want to filter the way you posted, you should define the mapping explicitly before indexing any documents. You don't have to do it for every property, but at least metric_path should be mapped as keyword. You can start with:
PUT {yourhost}/zabbix_test-us-east-2-node2-2017.10.29
{
"mappings": {
"jmx": {
"properties": {
"metric_path": {
"type": "keyword"
}
}
}
}
}
Then index your documents. Mappings for the other fields will still be established dynamically by ES, but both of the queries you attached will return exactly one result - just as you expect.
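Note that if these documents were already indexed with dynamic mapping, ES 5.x normally also creates a keyword sub-field named metric_path.keyword next to the analyzed text field. If your mapping shows that sub-field, a term filter against it should return only the exact match without any re-indexing; a sketch, assuming the default dynamic mapping:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "metric_path.keyword": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
        }
      }
    }
  }
}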

Related

Username search in Elasticsearch

I want to implement a simple username search within Elasticsearch. I don't want weighted username searches yet, so I would expect it wouldn't be too hard to find resources on how to do this. But in the end, I came across n-grams and a lot of outdated Elasticsearch tutorials, and I completely lost track of the best practice for doing this.
This is my current setup, but it is really bad because it matches so many unrelated usernames:
{
"settings": {
"index" : {
"max_ngram_diff": "11"
},
"analysis": {
"analyzer": {
"username_analyzer": {
"tokenizer": "username_tokenizer",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"username_tokenizer": {
"type": "ngram",
"min_gram": "1",
"max_gram": "12"
}
}
}
},
"mappings": {
"properties": {
"_all" : { "enabled" : false },
"username": {
"type": "text",
"analyzer": "username_analyzer"
}
}
}
}
I am using the newest Elasticsearch and I just want to query similar/exact usernames. I have a user DB and users should be able to search for each other, nothing too fancy.
If you want to search for exact usernames, then you can use the term query.
The term query returns documents that contain an exact term in a provided field. If you have not defined an explicit index mapping, then you need to add .keyword to the field name; that sub-field is stored verbatim as a keyword instead of being passed through the standard analyzer.
There is no need to use an n-gram tokenizer if you want to search for the exact term.
Adding a working example with index data, index mapping, search query, and search result
Index Mapping:
{
"mappings": {
"properties": {
"username": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Index Data:
{
"username": "Jack"
}
{
"username": "John"
}
Search Query:
{
"query": {
"term": {
"username.keyword": "Jack"
}
}
}
Search Result:
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"username": "Jack"
}
}
]
Edit 1:
To match for similar terms, you can use the fuzziness parameter along with the match query
{
"query": {
"match": {
"username": {
"query": "someting",
"fuzziness":"auto"
}
}
}
}
Search Result will be
"hits": [
{
"_index": "68844541",
"_type": "_doc",
"_id": "3",
"_score": 0.6065038,
"_source": {
"username": "something"
}
}
]
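If you also want a single query that prefers exact usernames but still tolerates small typos, one option is to combine both ideas in a bool query. This is just a sketch using the mapping above; the boost value is illustrative:
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "username.keyword": {
              "value": "Jack",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "username": {
              "query": "Jack",
              "fuzziness": "auto"
            }
          }
        }
      ]
    }
  }
}
Exact matches score higher because of the term clause, while near misses still come back through the fuzzy match clause.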

How to add fuzziness to search as you type field in Elasticsearch?

I've been trying to add some fuzziness to my search_as_you_type field in Elasticsearch, but never got the needed query. Does anyone have an idea how to implement this?
Fuzzy Query returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance.
The fuzziness parameter can be specified as:
AUTO -- It generates an edit distance based on the length of the term.
For lengths:
0..2 -- must match exactly
3..5 -- one edit allowed
Greater than 5 -- two edits allowed
Adding working example with index data and search query.
Index Data:
{
"title":"product"
}
{
"title":"prodct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prodc",
"fuzziness":2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0794415,
"_source": {
"title": "produt"
}
}
]
Refer to these blogs for a detailed explanation of the fuzzy query:
https://www.elastic.co/blog/found-fuzzy-search
https://qbox.io/blog/elasticsearch-optimization-fuzziness-performance
Update 1:
Refer to the official ES documentation:
The fuzziness, prefix_length, max_expansions, rewrite, and fuzzy_transpositions parameters are supported for the terms that are used to construct term queries, but do not have an effect on the prefix query constructed from the final term.
There are some open issues and discussion threads stating that fuzziness does not work with bool_prefix multi_match (search-as-you-type):
https://github.com/elastic/elasticsearch/issues/56229
https://discuss.elastic.co/t/fuzziness-not-work-with-bool-prefix-multi-match-search-as-you-type/229602/3
I know this question was asked long ago, but I think this worked for me.
Since Elasticsearch allows a single field to be declared with multiple data types, my mapping is as below.
PUT products
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"product_type": {
"type": "search_as_you_type"
}
}
}
}
}
}
After adding some data to the index, I queried it like this:
GET products/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "prodc",
"type": "bool_prefix",
"fields": [
"title.product_type",
"title.product_type._2gram",
"title.product_type._3gram"
]
}
},
{
"multi_match": {
"query": "prodc",
"fuzziness": 2
}
}
]
}
}
}

Elasticsearch - pass fuzziness parameter in query_string

I have a fuzzy query with a customized AUTO:10,20 fuzziness value.
{
"query": {
"match": {
"name": {
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
}
How to convert it to a query_string query? I tried nike~AUTO:10,20 but it is not working.
It's possible with query_string as well. Let me show it using the same example the OP provided: both the match query from the OP and the query_string query below fetch the same document with the same score.
And according to this and this ES docs, Elasticsearch supports the AUTO:10,20 format, which is shown in my example as well.
Index mapping:
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Index some doc
{
"name" : "nike"
}
Search query using match with fuzziness
{
"query": {
"match": {
"name": {
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
}
And result
"hits": [
{
"_index": "so-query",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"name": "nike"
}
}
]
Query_string with fuzziness
{
"query": {
"query_string": {
"fields": ["name"],
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
And result
"hits": [
{
"_index": "so-query",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"name": "nike"
}
}
]
Lucene syntax only allows you to specify "fuzziness" with the tilde symbol "~", optionally followed by 0, 1 or 2 to indicate the edit distance.
Elasticsearch Query DSL supports a configurable special value for AUTO which then is used to build the proper Lucene query.
You would need to implement that logic on your application side, by evaluating the desired edit distance based on the length of your search term and then using <searchTerm>~<editDistance> in your query_string query.
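For example, under AUTO:10,20 semantics a 4-character term like nike would be sent as-is (it must match exactly), a term of 10 to 19 characters would get ~1 appended, and a term of 20 characters or more would get ~2. A sketch of what the application might end up sending for a hypothetical 14-character term (nikesportswear is made up purely for illustration):
{
  "query": {
    "query_string": {
      "fields": ["name"],
      "query": "nikesportswear~1"
    }
  }
}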

Elasticsearch search on _source fields are not working

I have a document in Elasticsearch 5, as follows:
{
"_index": "my_index",
"_type": "json",
"_id": "document_id",
"_score": 1,
"_source": {
"message": "{\"the_id\": \"custom_id\", \"more\": \"Data\"}",
"type": "json",
"the_id": "custom_id",
"#timestamp": "2017-04-03T13:31:39.995Z",
"port": 48038,
"#version": "1",
"host": "127.0.0.1"
}
}
When I query for the _id from the Kibana console as follows, it works fine and returns the record:
GET _search
{
"query": {
"bool": {
"filter": [
{ "term": { "_id": "document_id" }}
]
}
}
}
But if I query on a _source-level field, in this case the_id, I don't get any result:
GET _search
{
"query": {
"bool": {
"filter": [
{ "term": { "the_id": "custom_id" }}
]
}
}
}
How can I make sure I am always able to query on _source-level fields?
Since the default dynamic mapping is used in this case, Elasticsearch creates multi-fields (the_id and the_id.keyword) for your the_id field: the_id is mapped with the text type and the_id.keyword with the keyword type.
Because term queries match against the exact value of a field, you have to use the_id.keyword in your query.
To read more about this, see the section "Why doesn't the term query match my document?" in the official docs here.
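Concretely, keeping your filter exactly as it is and only switching the field to the keyword sub-field should return the document; a sketch, assuming the dynamically created the_id.keyword sub-field exists in your mapping:
GET _search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "the_id.keyword": "custom_id" }}
      ]
    }
  }
}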

Elasticsearch Multi-Field With 'Raw' Value Not Being Created

I'm attempting to add an un-analyzed version of an analyzed field, as a 'raw' multi-field, as per the ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
This seems to be a common, well-supported pattern.
I've created the following index / field :
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"userName": {
"type": "string",
"analyzer": "autocomplete",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
If I query the index directly, i.e. GET /person, I see the mapping as I've posted above, so I'm confident that there wasn't a syntax error, etc.
However, when we're pushing data into the index, a userName.raw field is not being created.
{
"_index": "person",
"_type": "employee",
"_id": "2",
"_version": 1,
"found": true,
"_source": {
"username": "Test Value"
}
}
Anyone see something I'm missing?
Thanks!
EDIT:
This was a novice mistake when creating my index.
PUT person
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Notice the person key being PUT into the 'person' index. This was creating a nested person object.
The correct syntax is to remove the extra "person" key:
PUT person
{
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Please see Linoy.M.K's answer, as he is correct.
The 'raw' field will not appear when retrieving a record by ID. It's only useful as part of a query.
Adding multi-fields will not modify your source document, meaning your source document will always contain only username, not username.raw.
The additional fields are useful when searching: you can now query both username and username.raw to achieve different behavior, as below.
GET /person/employee/_search
{
"query": {
"match": {
"username": "Te"
}
}
}
GET /person/employee/_search
{
"query": {
"match": {
"username.raw": "Test Value"
}
}
}
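Since the raw sub-field is mapped as not_analyzed in the question's mapping, a term query is arguably the more idiomatic way to express the exact-match case; a small sketch using the same field and value as above:
GET /person/employee/_search
{
  "query": {
    "term": {
      "username.raw": "Test Value"
    }
  }
}
It returns the same document as the match on username.raw; the term query just makes the exact-match intent explicit.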
