ElasticSearch - Upgraded indexed field to a Multi Field - new field is empty - elasticsearch

Having noticed my sort on an indexed string field doesn't work properly, I've discovered that it's sorting analyzed strings so "bags of words" and if I want it to work properly I have to sort on the non-analyzed string. My plan was to just change the string field to a multi-field, using information I found in those two articles:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime (Upgrade to a multi-field part)
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html
Using Sense I've created this field mapping
PUT myindex/_mapping/type
{
"properties": {
"Title": {
"type": "string",
"fields": {
"Raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
And then I try to sort my search results using the newly made field. I've put all of the name variations I could think of after reading the articles:
POST myindex/_search
{
"_source" : ["Title","titlemap.Raw","titlemap.Title","titlemap.Title.Raw","Title.Title","Raw","Title.Raw"],
"size": 6,
"query": {
"multi_match": {
"query": "title",
"fields": ["Title^5"
],
"fuzziness": "auto",
"type": "best_fields"
}
},
"sort": {
"Title.Raw": "asc"
}
}
And that's what I get in response:
{
"_index": "myindex_2015_11_26_12_22_38",
"_type": "type",
"_id": "1205",
"_score": null,
"_source": {
"Title": "The title of the item"
},
"sort": [
null
]
}
Only the Title field's value is shown in the response and the sort criterium is null for every result.
Am I doing something wrong, or is there another way to do that?

The index name is not the same after re-indexing and thus the default mapping gets installed... that's probably why.
I suggest using an index template instead, so you don't have to care when to create the index and ES will do it for you. The idea is to create a template with the proper mapping you need and then ES will create every new index whenever it deems necessary, add the myindex alias and apply the proper mapping to it.
curl -XPUT localhost:9200/_template/myindex_template -d '{
"template": "myindex_*",
"settings": {
"number_of_shards": 1
},
"aliases": {
"myindex": {}
},
"mappings": {
"type": {
"properties": {
"Title": {
"type": "string",
"fields": {
"Raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
Then whenever you launch your re-indexing process a new index with a new name will be created BUT with the proper mapping and the proper alias.

Related

How to make Elasticsearch aggregation only create 1 bucket?

I have an Elasticsearch index which contains a field called "host". I'm trying to send a query to Elasticsearch to get a list of all the unique values of host in the index. This is currently as close as I can get:
{
"size": 0,
"aggs": {
"hosts": {
"terms": {"field": "host"}
}
}
}
Which returns:
"buckets": [
{
"key": "04",
"doc_count": 201
},
{
"key": "cyn",
"doc_count": 201
},
{
"key": "pc",
"doc_count": 201
}
]
However the actual name of the host is 04-cyn-pc. My understanding is that it is spliting them up into keywords so I try something like this:
{
"properties": {
"host": {
"type": "text",
"fields": {
"raw": {
"type": "text",
"analyzer": "keyword",
"fielddata": true
}
}
}
}
}
But it returns illegal_argument_exception "reason": "Mapper for [host.raw] conflicts with existing mapping in other types:\n[mapper [host.raw] has different [index] values, mapper [host.raw] has different [analyzer]]"
As you can probably tell i'm very new to Elasticsearch and any help or direction would be awesome, thanks!
Try this instead:
{
"properties": {
"host": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
}
Elastic automatically indexes string fields as text and keyword type if you do not specify a mapping. In your example if you do not want your field to be analyzed for full text search, you should just define that fields' type as keyword. So you can get rid of burden of analyzed text field. With the mapping below you can easily solve your problem without changing your agg query.
"properties": {
"host": {
"type": "keyword"
}
}

Elasticsearch Multi-Field With 'Raw' Value Not Being Created

I'm attempting to add an un-analyzed version of an analyzed field, as a 'raw' multi-field, as per the ElasticSearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/2.4/multi-fields.html
This seems to be a common, well-supported pattern.
I've created the following index / field :
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"userName": {
"type": "string",
"analyzer": "autocomplete",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
If I query the index directly, i.e. GET /person, I see the mapping as I've posted above, so I'm confident that there wasn't a syntax error, etc.
However, when we're pushing data into the index, a userName.raw field is not being created.
{
"_index": "person",
"_type": "employee",
"_id": "2",
"_version": 1,
"found": true,
"_source": {
"username": "Test Value"
}
}
Anyone see something I'm missing?
Thanks!
EDIT:
This was a novice mistake when creating my index.
PUT person
{
"person": {
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Notice the person key is being PUT in the 'person' index. This was creating a nested person.
Correct syntax is to remove the extra "person"
PUT person
{
"aliases": {},
"mappings": {
"employee": {
"properties": {
"email": {
Please see Linoy.M.K's answer, as he is correct.
The 'raw' field will not appear when retrieving a record by ID. Its only useful as part of a query.
Adding multiple analyzers will not modify your source document means your source document will always have username only not username.raw
Added analyzers are useful when you do searching, means you can now search with username and username.raw to achieve different behavior like below.
GET /person/employee/_search
{
"query": {
"match": {
"username": "Te"
}
}
}
GET /person/employee/_search
{
"query": {
"match": {
"username.raw": "Test Value"
}
}
}

Dynamic Mapping for an object field that unwraps the parent path

I am evaluating whether ElasticSearch can meet the needs of a new system I'm building. It looks amazing, so I'm really hopeful I can figure out a mapping strategy that works.
In this system, administrators can define fields to be associated with documents dynamically. So a given type (in the elasticsearch sense of the word) can have any number of fields, which I do not know the name of ahead of time. And each field can be of any type: int, date, string, etc.
An example document may look like:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"tagline": "Ask not what your company can do for you, but..."
}
Notice that there are 2 string fields. Awesome. My problem though is that I want the "tagline" to be analyzed, but I do not want "title" to be analyzed.
Remember I don't know the names of these fields ahead of time. And there could be multiple fields of each type. So there could be 10 string fields of various names, 3 of which should be analyzed and 7 of which should not.
Another requirement I have is that the name the administrator gives the field should also be what they can search by. So, for example, if they want to find all the Vice Interns who have something to say, the lucene query may be:
+title:"Vice Intern" +tagline:"company"
So my thought was that I could define a dynamic mapping. Since I don't know the names of the fields ahead of time, it seems like a great approach. The key though is coming up with a way of differentiating string fields that should be analyzed and ones that shouldn't be!
I thought, hey, I'll just put all the fields that need analyzing into a nested object, like this:
{
"name": "bob",
"age": 22,
"title": "Vice Intern",
"textfields": {
"tagline": "Ask not what your company can do for you, but...",
"somethingelse": "lorem ipsum",
}
}
Then, in my dynamic mapping, I have a way of mapping those fields differently:
{
"mytype": {
"dynamic_templates": {
"nested_textfields": {
"match": "textfields",
"match_mapping_type": "string",
"mapping": {
"index": "analyzed",
"analyzer": "default"
}
}
}
}
}
I know that isn't right, I actually need some kind of nested mapping, but no matter, because if I understand it correctly, even if I got that working, it would mean those fields are searched for (via lucene syntax) like this:
+title:"Vice Intern" +textfields.tagline:"company"
And I don't want the "textfields" prefix. Since I'm the one providing the textfields object that wraps the text fields, I know that the fields within it are still uniquely named across the entire document.
I thought of using a pattern match instead. So instead of wrapping them in a "textfields" object, I could prefix them, like "textfield_tagline". But when doing that, the {name} token in the dynamic mapping includes the prefix, I don't see a way to just pull out the "*" portion.
Any solution which gets me the necessary behavior is a correct answer. Even if that involves nested mapping information into the documents themselves (can you do that? I've seen something like that, I think...).
EDIT:
I've attempted the following dynamic template. I'm trying to use index_name to remove the 'textfields.' in the index. This dynamic template just doesn't seem to match though, because after putting a document and looking at the mapping I see no analyzer specified.
{
"mytype" : {
"dynamic_templates":
[
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
}
]
}
}
I was able to duplicate the results that you asked for specifically with the following index creation (with mappings), document, and search query. The type does vary a bit, but it serves the purpose of the example.
Index Settings
PUT http://localhost:9200/sandbox
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 0
}
},
"mappings": {
"mytype": {
"dynamic_templates": [
{
"indexedfields": {
"path_match": "indexedfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "analyzed",
"analyzer": "default",
"index_name": "{name}",
"fields": {
"sort": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}_sort"
}
}
}
}
},
{
"textfields": {
"path_match": "textfields.*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed",
"index_name": "{name}"
}
}
},
{
"strings": {
"path_match": "*",
"match_mapping_type" : "string",
"mapping": {
"type": "string",
"index": "not_analyzed"
}
}
}
]
}
}
}
Document
PUT http://localhost:9200/sandbox/mytype/1
{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
Search
POST http://localhost:9200/sandbox/mytype/_search
{
"query": {
"query_string": {
"query": "message:\"great balls\""
}
},
"filter":{
"query":{
"query_string":{
"query":"username:\"User Name\""
}
}
},
"from":0,
"size":10,
"sort":[
]
}
The search returns the following response:
{
"took":2,
"timed_out":false,
"_shards":{
"total":1,
"successful":1,
"failed":0
},
"hits":{
"total":1,
"max_score":0.19178301,
"hits":[
{
"_index":"sandbox",
"_type":"mytype",
"_id":"1",
"_score":0.19178301,
"_source":{
"indexedfields":{
"hello":"Hello world",
"message":"The great balls of the world are on fire"
},
"textfields":{
"username":"User Name",
"projectname":"Project Name"
}
}
}
]
}
}

Saving variable types under a single key in elasticsearch?

I have bunch of documents coming in from fluentd and I'm saving then to elasticsearch with fluent-plugin-elasticsearch.
Some of those documents have a string under the key name and some have an object.
Example
{
"name": "foo"
}
and
{
"name": {
"en": "foo",
"fi": "bar"
}
}
These documents are the same type in terms of my application and they are saved to same elasticsearch index.
But elasticsearch has an issue with this. When the second document is saved to elasticsearch it throws this error:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [name]
This seems to happen because elasticsearch has set the key name to be type of string. I can see this using curl http://localhost:9200/fluentd-[tagname]/_mapping and it obviously doesn't like it when I try save an object to it afterwards.
So is there any way to workaround this in elasticsearch?
I cannot control the incoming documents and there are multiple keys with variable types - not just name. So I cannot make a single hack for that key only.
This is pretty annoying since those documents are completely left out of elasticsearch and sent to /dev/null.
If this is completely impossible - is possible to at least save those documents to a file or something so I wouldn't lose them?
Here's my template for the fluentd-* indices:
{
"fluentd_template": {
"template": "fluentd-*",
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"index": {
"query": {
"default_field": "msg"
},
"analysis" : {
"analyzer" : {
"default" : {
"type" : "keyword"
}
}
}
}
},
"mappings": {
"_default_": {
"_all": {
"enabled": false
},
"_source": {
"compress": true
},
"properties": {
"#timestamp": {
"type": "date",
"index": "not_analyzed"
}
}
}
}
}
}

Elasticsearch: sorting Spanish double names alphabetically

I am doing an Elasticsearch query and I want the results ordered alphabetically by last name. My problem: the last names are all Spanish double names, and ES doesn't order them the way I would like it.
I would prefer the order to be:
Batres Rivera
Batrín Chojoj
Fion Morales
Lopez Giron
Martinez Castellanos
Milán Casanova
This is my query:
{
"query": {
"match_all": {}
},
"sort": [
{
"Last Name": {
"order": "asc"
}
}
]
}
The order that I get with this is:
Batres Rivera
Batrín Chojoj
Milán Casanova
Martinez Castellanos
Fion Morales
Lopez Giron
So it is not sorting by the first string, but by either of both (Batres, Batrín, Casanova, Castellanos, Fion, Giron).
If I try additionally
{
"order": "asc",
"mode": "max"
}
then I get:
Batrín Chojoj
Lopez Giron
Martinez Castellanos
Milán Casanova
Fion Morales
Batres Rivera
All the fields are indexed by default, I checked with
curl -XGET localhost/my_index/_mapping
and I get back
my_index: {
my_type: {
properties: {
FirstName: {
type: string
}LastName: {
type: string
}MiddleName: {
type: string
}
...
}
}
}
Does anyone know how to make the results to be ordered to be ordered alphabetically by the beginning string of the last name?
Thanks!
The problem is that your LastName field is analyzed, so the string Batres Rivera is indexed as a multi-value field with two terms: batres and rivera. But this isn't like an ordered array, it's more like a "bag of values". So when you try to sort on the field, it chooses one of the terms (the min or max) and sorts on that.
What you need to do is to store the LastName as a single term (Batres Rivera) for sorting purposes, by mapping the field as
{ "type": "string", "index": "not_analyzed"}
Obviously you can't then use that field for search purposes: you wouldn't be able to search for rivera and match on that field.
The way to support both searching and sorting is to use multi-fields: ie index the same value in two ways, one for searching and one for sorting.
In 0.90.* the syntax for multi-fields is:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "multi_field",
"fields": {
"LastName": {
"type": "string"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
In 1.0.* the multi_field type has been removed and now any core field type supports sub-fields as follows:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}'
So you can use the LastName field for searching, and the LastName.raw field for sorting:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
Language specific sorting
You should also look at using the ICU analysis plugin to sort using the Spanish sort order (or collation). This is a bit more complex but is worth using:
curl -XPUT "http://localhost:9200/my_index" -d'
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"type": "custom",
"tokenizer": "icu_tokenizer",
"filter": [
"icu_folding"
]
},
"es_sorting": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase",
"spanish"
]
}
},
"filter": {
"spanish": {
"type": "icu_collation",
"language": "es"
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"LastName": {
"type": "string",
"analyzer": "folding",
"fields": {
"raw": {
"type": "string",
"analyzer": "es_sorting"
}
}
}
}
}
}
}'
We create a folding analyzer which we'll use for the LastName field, which will analyze a string like Muñoz Rivera into the two terms munoz (without the ~) and rivera. So a user can search for munoz or muñoz and either will match.
Then we create the es_sorting analyzer which indexes the proper sort order for muñoz rivera (lowercased) in Spanish.
Searching would be done in the same way:
curl -XGET "http://localhost:9200/my_index/my_type/_search" -d'
{
"query": {
"match": {
"LastName": "rivera"
}
},
"sort": "LastName.raw"
}'
We need to know how you are indexing the name.
Please check this discussion link.
http://elasticsearch-users.115913.n3.nabble.com/Is-there-a-way-to-search-terms-lower-cased-td932996.html
This should be very helpful for your case. This depends on your mapping settings. What analyzer you use for the name field.
Need your mapping definition to decide on a proper solution.

Resources