Cannot search Elasticsearch with string containing "/0"

I'm using Elasticsearch to search by ID. My IDs look like 21.11101/0000-0000-9B71-2. However, when I search for one, ES throws an error saying:
Failed to parse query [21.11101/0000-0000-9B71-2]
Encountered: EOF after : "/0000-0000-9B71-2"
I think this happens because ES thinks that /0 is an EOF character. How can I tell ES to treat my search query as normal text? Here is my query:
GET _search
{
  "query": {
    "query_string": {
      "query": "21.11101/0000-0000-9B71-2"
    }
  }
}
And here is my _mapping:
"PID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
I want to use a prefix query, so basically my query will look like this:
GET _search
{
  "query": {
    "prefix": {
      "ID": "21.11101/00"
    }
  }
}
But that query returns 0 hits, I guess because of the error above. So how can I solve this problem?
Thanks for your help.

Let's break it down a little bit: what your mapping says is:
"PID": { "type": "text", ... which means the content of PID field will be text with default analyzer, which standard. It will tokenize your input text by white-spaces and some word-delimiters, like -(hyphen), /(forward slash) and some other.
... "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } will prompt ES to keep additional version of your PID field as a keyword, which will be accessible via alias PID.keyword. The important limitation of multi-fields is that you can't perform search against it. You can only make sorting and aggregation.
Taking that into account, my suggestion is as simple as specifying a non-standard analyzer that won't tokenize the input text (the keyword analyzer):
curl -XPUT http://localhost:9200/my_pids -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "pid": {
      "properties": {
        "PID": {
          "type": "text",
          "analyzer": "keyword",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}'
If you won't use sorting or aggregations on your PID.keyword field, feel free to drop the fields section as well.
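With such a mapping the whole PID is kept as a single token, so the prefix query from the question should start matching. A quick sketch against the my_pids index created above:

GET my_pids/_search
{
  "query": {
    "prefix": {
      "PID": "21.11101/00"
    }
  }
}

Note that prefix queries are term-level and skip analysis, so the given prefix is compared directly against the indexed token.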
Hope it helps :)

Related

Is it possible to declare "text" and "keyword" on the same field in elasticsearch?

I want to be able to both search on a field and aggregate it. Is there any option to declare a field both as text and keyword?
I've read a little bit about multifields. Is that a possible solution?
By default, Elasticsearch will map text values to a text field, with a keyword sub-field. Your mapping would look like this:
"mappings": {
"properties": {
"newfield": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
As a consequence, it will be possible both to perform full-text search on newfield and to run exact keyword searches and aggregations using the newfield.keyword field.
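For instance, a single request can combine both (a sketch; the index name my_index is made up):

GET my_index/_search
{
  "query": {
    "match": { "newfield": "some words" }
  },
  "aggs": {
    "by_value": {
      "terms": { "field": "newfield.keyword" }
    }
  }
}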
Relevant documentation: Dynamic field mapping

Elasticsearch match string with spaces, colons, dashes exactly

I'm using Elasticsearch 6.8 and trying to write a query in a Python notebook. Here is the mapping used for the index I'm working with:
{ "mapping": { "news": { "properties": { "dateCreated": { "type": "date", "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis" }, "itemId": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "market": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "timeWindow": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } }
I'm trying to search for the exact string "[2020-08-16 10:00:00.0,2020-08-16 11:00:00.0]" in the "timeWindow" field (which is a "text" type, not a "date" field), and also to select by market="en-us" (market is a "text" field too). This string has spaces, colons, commas and a lot of other whitespace-like characters, and I don't know how to write the right query.
At the moment I have this query:
res = es.search(index='my_index',
                doc_type='news',
                body={
                    'size': size,
                    'query': {
                        "bool": {
                            "must": [
                                {
                                    "simple_query_string": {
                                        "query": "[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]",
                                        "default_operator": "and",
                                        "minimum_should_match": "100%"
                                    }
                                },
                                {"match": {"market": "en-us"}}
                            ]
                        }
                    }
                })
The problem is that it doesn't match my simple_query_string for the timeWindow string exactly (I understand that this string gets tokenized and split into parts like "2020", "08", "17", "00", "01", etc., and that each token is analyzed separately), and I'm getting different timeWindow values that I want to exclude, like:
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 00:05:00.0,2020-08-17 01:05:00.0]'
...
'[2020-08-17 00:50:00.0,2020-08-17 01:50:00.0]'
'[2020-08-17 00:55:00.0,2020-08-17 01:55:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]']
Is there a way to do what I want?
UPD (and answer):
My current query uses "term" and "timeWindow.keyword"; this combination lets me do an exact search for a string with spaces and other whitespace characters:
res = es.search(index='msn_click_events', doc_type='news', body={
    'size': size,
    'query': {
        "bool": {
            "must": [
                {
                    "term": {
                        "timeWindow.keyword": tw
                    }
                },
                {"match": {"market": "en-us"}}
            ]
        }
    }
})
And this query selects only the right timeWindow values (strings):
['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
'[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]'
'[2020-08-17 02:00:00.0,2020-08-17 03:00:00.0]'
...
'[2020-08-17 22:00:00.0,2020-08-17 23:00:00.0]'
'[2020-08-17 23:00:00.0,2020-08-18 00:00:00.0]']
On your timeWindow field you need a keyword (exact) search, but you are using a full-text query. As you defined this field as a text field, and as you already guessed correctly, it gets analyzed at index time, hence you are not getting the correct results.
If you are using dynamic mapping, then a .keyword sub-field is generated for every text field in the mapping, so you can simply use timeWindow.keyword in your query and it will work.
If you have defined the mapping yourself, then you need to add a keyword field to store the timeWindow, reindex the data, and use that keyword field in the query to get the expected results.
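A rough sketch of that last option (the index name my_index is made up; on 6.8 with mapping types the endpoint would be my_index/_mapping/news):

PUT my_index/_mapping
{
  "properties": {
    "timeWindow": {
      "type": "text",
      "fields": {
        "keyword": { "type": "keyword", "ignore_above": 256 }
      }
    }
  }
}

POST my_index/_update_by_query?conflicts=proceed

Adding a sub-field to an existing text field is an allowed mapping change, but documents indexed before the change only get the new sub-field once they are reindexed, which is what the _update_by_query pass is for.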

How to set elasticsearch index mapping as not_analysed for all the fields

I want my Elasticsearch index to match the exact value for all the fields. How do I map my index as "not_analysed" for all the fields?
I'd suggest making use of multi-fields in your mapping (which is the default behavior if you aren't creating the mapping yourself, i.e. with dynamic mapping).
That way you can switch between full-text search and exact-match searches as required.
Note that for exact matches you need the keyword datatype plus a term query. Sample examples are provided in the links I've specified.
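For instance, a minimal exact-match sketch (index and field names here are made up for illustration):

GET products/_search
{
  "query": {
    "term": {
      "color.keyword": "dark blue"
    }
  }
}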
Hope it helps!
You can use a dynamic_templates mapping for this. By default, Elasticsearch makes string fields type text with index: true, like below:
{
  "products2": {
    "mappings": {
      "product": {
        "properties": {
          "color": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
As you can see, it also creates a keyword field as a multi-field. This keyword field is indexed but not analyzed like text. If you want to drop this default behaviour, you can use the configuration below while creating the index:
PUT products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "index": false
            }
          }
        }
      ]
    }
  }
}
After doing this, the index mapping will look like below:
{
  "products": {
    "mappings": {
      "product": {
        "dynamic_templates": [
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "type": "keyword",
                "index": false
              }
            }
          }
        ],
        "properties": {
          "color": {
            "type": "keyword",
            "index": false
          },
          "type": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}
Note: I don't know your exact use case, but you can use the multi-field feature as mentioned by @Kamal. Otherwise, you cannot search on fields mapped with index: false. Also, you can use the dynamic_templates mapping to keep some fields analyzed.
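If you want exact matching on every field while keeping them searchable, one variant (mine, not from the original answer; the index name is made up) is to map strings to keyword but leave them indexed:

PUT products_v2
{
  "mappings": {
    "product": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

keyword fields are indexed but never analyzed, so term queries then match the stored values exactly.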
Please check the documentation for more information :
https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic-templates.html
I also explained this behaviour in an article; sorry, but it is in Turkish. You can check its example code samples with Google Translate if you want.

Avoid creating dual mappings from logstash

I notice that Logstash creates an extra "keyword" field in the index mapping for every string field that it extracts from the log files and sends to Elasticsearch.
There are many fields that I've removed completely with the prune plugin, but there are other fields that I don't want to remove completely, but I also don't need to have a *.keyword for them.
Is there a way to have logstash only create *.keyword fields for some fields and not others? Specifically, is there a way for logstash to have a whitelist of fields that it is OK to create *.keywords for, and not do it for anything else?
(using elasticsearch 6.x)
I think you need to change the mapping of the desired fields. The mapping page shows the default text type mapping:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_mapping_changes.html
I tried to set up a field without a keyword sub-field, and it worked, except that you couldn't aggregate on that field (I tried a terms aggregation) even if you set index: true in the mapping. I might have missed something, but I think this is where you should start.
The solution I'm working with for now is dynamic templates.
I can map some fields to just a keyword, and others to both text and a keyword. For example:
{
  "mappings": {
    "doc": {
      "dynamic_templates": [
        {
          "match_my_custom_fields": {
            "match_mapping_type": "string",
            "match": "custom_prefix_*",
            "mapping": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            }
          }
        },
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      ],
      "properties": {
        "geoip": {
          "dynamic": true,
          "properties": {
            "ip": {
              "type": "ip"
            },
            "location": {
              "type": "geo_point"
            },
            "latitude": {
              "type": "half_float"
            },
            "longitude": {
              "type": "half_float"
            }
          }
        }
      }
    }
  }
}
This way, everything beginning with custom_prefix_ will have both a text field and a raw keyword sub-field, and everything else will just be a keyword.
Of course, I somehow broke the geoip.geo_point that was being emitted from the geoip logstash plugin, and now my map visualizations won't work, so I need to figure out how to restore that.
EDIT: Got geo_point working again, see the "geoip" prop

ElasticSearch 5.1 Fielddata is disabled in text field by default [ERROR: trying to use aggregation on field]

Having this field in my mapping:

"answer": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},
I try to execute this aggregation:

"aggs": {
  "answer": {
    "terms": {
      "field": "answer"
    }
  }
},
but I get this error:
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [answer] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."
Do I have to change my mapping or am I using the wrong aggregation? (I just updated from 2.x to 5.1.)
You need to aggregate on the keyword sub-field, like this:
"aggs": {
"answer": {
"terms": {
"field": "answer.keyword"
}
},
That will work.
In the aggregation, just add .keyword to answer. It worked for me. For text fields we need to add the keyword suffix:
"field": "answer.keyword"
Adding to @Val's answer, you can also set fielddata to true in your mapping itself:
"answer": {
"type": "text",
"fielddata": true, <-- add this line
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
