Wildcard query over _all field on Elasticsearch

I'm trying to perform wildcard queries over the _all field. An example query could be:
GET index/type/_search
{
  "from": 0,
  "size": 1000,
  "query": {
    "bool": {
      "must": {
        "wildcard": {
          "_all": "*tito*"
        }
      }
    }
  }
}
The thing is that, to use a wildcard query, the _all field needs to be not_analyzed; otherwise the query won't work as expected. See the ES documentation for more info.
I tried to set the mappings over the _all field using this request:
PUT index
{
  "mappings": {
    "type": {
      "_all": {
        "enabled": true,
        "index_analyzer": "not_analyzed",
        "search_analyzer": "not_analyzed"
      },
      "_timestamp": {
        "enabled": "true"
      },
      "properties": {
        "someProp": {
          "type": "date"
        }
      }
    }
  }
}
But I'm getting the error analyzer [not_analyzed] not found for field [_all].
I want to know what I'm doing wrong and whether there is another (better) way to perform this kind of query.
Thanks.

Have you tried removing:
"search_analyzer": "not_analyzed"
Also, I wonder how well a wildcard across all properties will scale. Have you looked into NGrams? See the docs here.
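For reference, a minimal sketch of what an nGram-based setup could look like (the filter and analyzer names here are invented for illustration); fields indexed with such an analyzer can then be matched with ordinary match queries instead of wildcards:
PUT index
{
  "settings": {
    "analysis": {
      "filter": {
        "substring_filter": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 10
        }
      },
      "analyzer": {
        "substring_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "substring_filter"]
        }
      }
    }
  }
}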

Most probably you wanted to use the option
"index": "not_analyzed"
The index attribute of a string field (and _all is a string field) determines whether that field is analyzed or not.
"search_analyzer" determines which analyzer is applied to the user-entered query, and is only relevant if the index attribute is set to analyzed.
"index_analyzer" determines which analyzer is applied to documents at index time, again only relevant if the index attribute is set to analyzed.

Related

How to preserve original term during transliteration in Elasticsearch with ICU plugin?

I'm using the following ICU transform filter to perform transliteration:
"transliterate": {
"type": "icu_transform",
"id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
}
The current problem is that this filter replaces the original term in the index, so searching in the native language is not possible with a terms query like this:
{
  "terms": {
    "field": [
      "term"
    ],
    "boost": 1.0
  }
}
Is there any way to make the icu_transform filter produce two terms, the original one and the transliterated one?
If not, I think the optimal solution would be a mapping that copies the value to another field, where that field's analyzer omits the transliterate filter. Can you suggest something more efficient?
I'm using Elasticsearch 5.6.4.
Multi-fields allow you to index the same source value to different fields in different ways. You can index to a field with the standard analyzer and to another field with an analyzer that applies the ICU transform filter. For example,
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "my_field": {
          "type": "text",
          "fields": {
            "latin": {
              "type": "text",
              "analyzer": "latin"
            }
          }
        }
      }
    }
  }
}
Then you can query the my_field or my_field.latin field.
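Note that the latin analyzer referenced above is not built in; a minimal sketch of how it could be defined in the same PUT my_index request, reusing the transliterate filter from the question (the analyzer name and filter chain are assumptions for illustration):
"settings": {
  "analysis": {
    "filter": {
      "transliterate": {
        "type": "icu_transform",
        "id": "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC"
      }
    },
    "analyzer": {
      "latin": {
        "tokenizer": "standard",
        "filter": ["lowercase", "transliterate"]
      }
    }
  }
}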

How to add analyzer at query level in elasticsearch?

I need to remove stop words from the query in Elasticsearch. I am able to apply an analyzer at index level, but how do I apply an analyzer at query or search level in Elasticsearch?
You have to configure your Elasticsearch mappings to add a search_analyzer to the fields you want analyzed at query time, like:
{
  "service": {
    "_source": { "enabled": true },
    "properties": {
      "name": { "type": "string", "index": "not_analyzed" },
      "name_snow": { "type": "string", "search_analyzer": "simple_analyzer", "index_analyzer": "snowball_analyzer" }
    }
  }
}
When you query this field, the entered terms will be analyzed first and then run against the shards.
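For example, with the mapping above, a query against the name_snow field is run through simple_analyzer before hitting the index (the query text is only an illustration):
{
  "query": {
    "match": {
      "name_snow": "running shoes"
    }
  }
}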

How to search with keyword analyzer?

I have the keyword analyzer as the default analyzer, like so:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "default": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
But now I can't search anything, e.g.:
{
  "query": {
    "query_string": {
      "query": "cast"
    }
  }
}
This gives me 0 results although "cast" is a common value in the indexed documents (http://gist.github.com/baelter/b0720a52ee5a27e27d3a).
Searching for "*" works fine, by the way.
I only have explicit defaults in my mapping:
{
  "oceanography_point": {
    "_all": {
      "enabled": true
    },
    "properties": {}
  }
}
The index behaves as if no fields are included in _all, because field:value queries work fine.
Am I misusing the keyword analyzer?
Using the keyword analyzer, you can only do an exact string match.
Let's assume that you have used the keyword analyzer and no filters.
In that case, for a string indexed as "Cast away in forest", neither a search for "cast" nor for "away" will work. You need to search for the exact string "Cast away in forest" to match it. (Assuming no lowercase filter is used, you need to give the right case too.)
A better approach would be to use multi-fields to declare one copy as keyword-analyzed and the other as normally analyzed.
You can search on one of these fields and aggregate on the other, as sketched below.
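A minimal sketch of such a multi-field mapping (the title field name is invented for illustration; the pre-5.x "string" syntax matches the rest of this thread):
"oceanography_point": {
  "properties": {
    "title": {
      "type": "string",
      "analyzer": "standard",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
Full-text searches would then go against title, while facets and aggregations would use title.raw.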
Okay, after some 15 hours of trial and error, I can conclude that this works for search:
{
  "settings": {
    "index": {
      "analysis": {
        "tokenizer": {
          "default": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
However, this breaks faceting, so I ended up using a dynamic template instead:
"dynamic_templates" : [
{
"strings_not_analyzed" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "string",
"index" : "not_analyzed"
}
}
}
],

Elasticsearch: filter for a substring in the value of a document field?

I am new to Elasticsearch. I have the following mapping for a string field:
"ipAddress": {
"type": "string",
"store": "no",
"index": "not_analyzed",
"omit_norms": "true",
"include_in_all": false
}
A document with a value in the ipAddress field looks like:
"ipAddress": "123.3.4.12 134.4.5.6"
Notice that in the above there are two IP addresses, separated by a blank.
Now I need to filter documents based on this field. This is an example filter value:
123.3.4.12
The filter value is always a single IP address, as shown above.
I looked at the filters at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
and I cannot seem to find the right filter for this. I tried the term filter,
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": { "ipAddress": "123.3.4.12" }
      }
    }
  }
}
but it seems that it returns a document only when the filter value matches the value of the document's field exactly.
Can anyone help me out on this?
Update:
Based on John Petrone's suggestion, I got it working by defining a whitespace-tokenizer-based analyzer as follows:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "blank_sep_analyzer": {
            "tokenizer": "whitespace"
          }
        }
      }
    }
  },
  "mappings": {
    "ipAddress": {
      "type": "string",
      "store": "no",
      "index": "analyzed",
      "analyzer": "blank_sep_analyzer",
      "omit_norms": "true",
      "include_in_all": false
    }
  }
}
The problem is that the field is not analyzed, so if you have 2 IP addresses in it the term is actually the full field, e.g. "123.3.4.12 134.4.5.6".
I'd suggest a different approach: if you are always going to have lists of IP addresses separated by spaces, consider using the whitespace tokenizer, which splits the field into tokens on whitespace and should produce separate tokens that the IP address will then match:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-whitespace-tokenizer.html
Another approach could be to store the IP addresses as an array; then the current mapping would work. You would just have to split the IP addresses when indexing the document, as sketched below.
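A minimal sketch of the array approach (index, type, and document id are invented for this example; the not_analyzed mapping from the question stays as it is, and each array element becomes its own term, so the term filter above would then match):
PUT index/type/1
{
  "ipAddress": ["123.3.4.12", "134.4.5.6"]
}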

Specify which fields are indexed in ElasticSearch

I have a document with a number of fields that I never query on, so I would like to turn off indexing for those fields to save resources. I believe I need to disable the _all field, but how do I specify which fields are indexed then?
By default, all the fields are also indexed within the _all special field, which provides the so-called catchall feature out of the box. However, you can specify for each field in your mapping whether you want to add it to the _all field or not, through the include_in_all option:
"person" : {
"properties" : {
"name" : {
"type" : "string", "store" : "yes", "include_in_all" : false
}
}
}
The above example disables the default behaviour for the name field, which won't be part of the _all field.
Otherwise, if you don't need the _all field at all for a specific type you can disable it like this, again in your mapping:
"person" : {
"_all" : {"enabled" : false},
"properties" : {
"name" : {
"type" : "string", "store" : "yes"
}
}
}
When you disable it, your fields will still be indexed separately, but you won't have the catchall feature that _all provides. You will then need to query your specific fields instead of relying on the _all special field; that's it. In fact, when you query and don't specify a field, Elasticsearch queries the _all field under the hood, unless you override the default field to query.
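For instance, a query_string query can be pointed at a specific field through default_field instead of relying on _all (the field name here is just the one from the mapping example above):
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "john"
    }
  }
}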
Each string field has an index param in the mapping config, which defaults to analyzed. That means that, besides being included in the _all field, each field is also indexed on its own.
And for the _all field, the reference says:
By default, it is enabled and all fields are included in it for ease of use.
So, to completely disable indexing for a field you have to specify (if the _all field is enabled):
"mappings": {
"your_mapping": {
"properties": {
"field_not_to_index": {
"type": "string",
"include_in_all": false,
"index": "no"
}
}
}
}
For the fields that should be queried on, there are two options: if you query them through the _all field, include them in _all and set "index": "no" on the field itself to save resources; if you query those fields directly, set the index param to a positive value (analyzed or not_analyzed) and disable the _all field to save resources.
The following is an important doc page for understanding the index settings in Elasticsearch:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/mapping-intro.html
For your problem, ideally you should set the "index" flag to "no" in the field's properties.
You can use the enabled setting to disable a particular field or an entire mapping.
ElasticSearch Doc
Disable field mapping (e.g. the session_data field):
{
  "mappings": {
    "_doc": {
      "properties": {
        "session_data": {
          "enabled": false
        }
      }
    }
  }
}
Disable the entire mapping:
{
  "mappings": {
    "_doc": {
      "enabled": false
    }
  }
}
Set dynamic to false and disable the _all field; specify only the required fields in the mapping.
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
{
  "mappings": {
    "candidates": {
      "_all": {
        "enabled": false
      },
      "dynamic": "false",
      "properties": {
        "tags": {
          "type": "text"
        },
        "derivedAttributes": {
          "properties": {
            "city": {
              "type": "text"
            },
            "zip5": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
The _all field has been deprecated since 6.0. Use the following instead:
"mappings": {
"dynamic":"false",
"properties": {
"field_to_index":{"index": true, "type": "text"}
}
According to the ES docs:
Setting dynamic to false doesn’t alter the contents of the _source field at all. The _source will still contain the whole JSON document that you indexed. However, any unknown fields will not be added to the mapping and will not be searchable.
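As a quick illustration (index and field names are invented for this sketch), with "dynamic": "false" a document containing an unmapped field is still indexed and returned intact from _source, but a query on that unmapped field matches nothing:
PUT my_index/_doc/1
{
  "field_to_index": "hello",
  "unmapped_field": "world"
}

GET my_index/_search
{
  "query": {
    "match": { "unmapped_field": "world" }
  }
}
The second request returns no hits because unmapped_field was never added to the mapping, while the document's _source still contains it.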
