Elasticsearch: how to define a mapping with nested fields?

I want to define a mapping with nested fields. According to this documentation, the payload to /order-statistics/_mapping/order looks like this:
{
  "mappings": {
    "order": {
      "properties": {
        "order_no": {
          "type": "string"
        },
        "order_products": {
          "type": "nested",
          "properties": {
            "order_product_no": {
              "type": "int"
            },
            "order_product_options": {
              "type": "nested",
              "properties": {
                "order_product_option_no": {
                  "type": "int"
                }
              }
            }
          }
        }
      }
    }
  }
}
I've already created the order-statistics index with curl -XPUT 'localhost:9200/order-statistics', and I'm using predefined types such as int, string and double. But I get the following error and can't find what's wrong:
{
"error":{
"root_cause":[
{
"type":"mapper_parsing_exception",
"reason":"Root mapping definition has unsupported parameters: [mappings : {order={properties={order_no={type=string}, order_products={type=nested, properties={order_product_no={type=int}, order_product_options={type=nested, properties={order_product_option_no={type=int}}}}}}}}]"
}
],
"type":"mapper_parsing_exception",
"reason":"Root mapping definition has unsupported parameters: [mappings : {order={properties={order_no={type=string}, order_products={type=nested, properties={order_product_no={type=int}, order_product_options={type=nested, properties={order_product_option_no={type=int}}}}}}}}]"
},
"status":400
}
Could someone explain why this doesn't work?

You are using int as the type for some fields, which is not a valid type in either 2.x or 5.x. For integer values, please use integer or long, depending on the values you want to store. For details, please see the docs on core mapping types.
Which version of Elasticsearch are you using, 2.x or 5.x? If you are already on 5.x, you should use keyword or text for your string fields instead of plain string, which was the name up to 2.x. But this is still only a warning.
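As a sketch of what a corrected request body could look like, assuming Elasticsearch 5.x types (integer, keyword) and the typed /order-statistics/_mapping/order endpoint from the question: note that the error message names mappings itself as the unsupported parameter. The PUT mapping endpoint expects the field definitions at the root of the body, while the "mappings" wrapper belongs only in the create-index request (PUT /order-statistics). Choosing keyword for order_no and the helper name order_properties are assumptions for illustration; the code only builds and prints JSON and does not contact a cluster.

```python
import json


def order_properties():
    """Field definitions shared by both request bodies (ES 5.x types assumed)."""
    return {
        "order_no": {"type": "keyword"},  # assumption: exact-match order numbers
        "order_products": {
            "type": "nested",
            "properties": {
                "order_product_no": {"type": "integer"},  # "int" is not a valid type
                "order_product_options": {
                    "type": "nested",
                    "properties": {
                        "order_product_option_no": {"type": "integer"},
                    },
                },
            },
        },
    }


# Body for PUT /order-statistics/_mapping/order: no "mappings" wrapper,
# because the index and type are already part of the URL.
put_mapping_body = {"properties": order_properties()}

# Body for PUT /order-statistics when creating the index and its mapping together:
# here the "mappings" wrapper (keyed by type name in 5.x) is expected.
create_index_body = {"mappings": {"order": {"properties": order_properties()}}}

print(json.dumps(put_mapping_body, indent=2))
```

Either body works on its own: send the first to an index that already exists, or delete the index and create it in one step with the second.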
Additionally, you should be aware of the implications of using nested instead of plain object. A nested type is necessary if you store an array of objects and want to query for more than one property of such an object, with the guarantee that a document matches only when a single object in the array satisfies all of your conditions. But this comes at a cost, because each nested object is indexed as a separate hidden document, so consider using the simple object type if it works for you. For more details, please see the docs on the nested data type, especially the warning at the end.
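To illustrate why nested matters, here is a small pure-Python simulation (not Elasticsearch code) of the two behaviors: an object array is flattened into one multi-valued list per sub-field, so conditions can be satisfied by different array elements, while nested semantics require a single element to satisfy all of them. The color field and the sample order are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample: one order with two products ("color" is an invented field).
order = {
    "order_products": [
        {"order_product_no": 1, "color": "red"},
        {"order_product_no": 2, "color": "blue"},
    ]
}


def flatten_object_array(doc, field):
    """Mimic how an `object` array is indexed: one multi-valued list per sub-field."""
    flat = defaultdict(list)
    for obj in doc[field]:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches_object(doc, field, conditions):
    """Object semantics: each condition may be met by a *different* array element."""
    flat = flatten_object_array(doc, field)
    return all(value in flat.get(f"{field}.{key}", []) for key, value in conditions.items())


def matches_nested(doc, field, conditions):
    """Nested semantics: one array element must meet every condition."""
    return any(
        all(obj.get(key) == value for key, value in conditions.items())
        for obj in doc[field]
    )


query = {"order_product_no": 1, "color": "blue"}  # no single product has both
print(matches_object(order, "order_products", query))  # True (a false positive)
print(matches_nested(order, "order_products", query))  # False
```

If your queries never combine several properties of the same array element, the flattened object behavior is harmless and you can avoid the cost of nested.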

Related

How to update the data type of a field in Elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU, which is currently mapped as a string. The index name is health_gateway.
I have made some changes to the Python code that generates the data, so the field Data.CPU is now an integer. But Elasticsearch still shows it as a string. How can I update its data type?
I tried running the following command in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
"doc" : {
"properties" : {
"Data.CPU" : {"type" : "integer"}
}
}
}
But it gave me below error:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
}
],
"type" : "illegal_argument_exception",
"reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
},
"status" : 400
}
There is also this document, which says we can convert the data type using mutate, but I am not able to understand it properly.
I do not want to delete and recreate the index, because I have created a visualization based on it, and it would also be deleted along with the index. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a literal dot in the field name there. Since 5.0, a dotted name such as Data.CPU is expanded into an object field Data with a sub-field CPU, so both forms end up as the same mapping.
Nevertheless, there are several ways forward in your situation; here are a few of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
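Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a sub-field of the object Data, rather than a field literally named Data.CPU. The outer type (keyword here) must match the type the field already has, and the new sub-field is only populated for documents indexed after the change: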
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
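The reindex route can be sketched as two requests, assuming a 7.x cluster (typeless mappings, matching the include_type_name error above) and a hypothetical target index name health_gateway_v2. The Python below only assembles the method, path and JSON body of each request; sending them (for example from Kibana Dev Tools) is left to you.

```python
import json

OLD_INDEX = "health_gateway"
NEW_INDEX = "health_gateway_v2"  # hypothetical name for the target index

# Step 1: PUT /health_gateway_v2 with the corrected, typeless (7.x) mapping.
create_body = {
    "mappings": {
        "properties": {
            "Data": {
                "properties": {
                    "CPU": {"type": "integer"}
                }
            }
        }
    }
}

# Step 2: POST /_reindex copies every document from the old index into the new one.
# Numeric strings such as "42" are coerced to integers by the new mapping.
reindex_body = {"source": {"index": OLD_INDEX}, "dest": {"index": NEW_INDEX}}

requests_to_send = [
    ("PUT", f"/{NEW_INDEX}", create_body),
    ("POST", "/_reindex", reindex_body),
]

for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

The target index must exist with the integer mapping before the reindex starts; otherwise dynamic mapping would map CPU as a string again.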
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
With the mapping below, sent to PUT health_gateway/_mapping, Data.CPU.raw will be of integer type:
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "integer"
            }
          }
        }
      }
    }
  }
}
Alternatively, you can create a new index with the correct mapping and reindex the data into it using the Reindex API.

Elasticsearch document aliases

I have multiple mappings which come from the same data source but have small differences, like the example below.
{
  "type_A": {
    "properties": {
      "name": {
        "type": "string"
      },
      "meta_A": {
        "type": "string"
      }
    }
  }
}
{
  "type_B": {
    "properties": {
      "name": {
        "type": "string"
      },
      "meta_B": {
        "type": "string"
      }
    }
  }
}
What I want to be able to do is:
Directly query specific fields (like meta_A)
Directly query all documents from the data source
Query all documents from a specific mapping
What I was looking into is the type filter, so preferably I could write a query like this:
{
"query": {
"filtered" : {
"filter" : {
"type" : { "value" : "unified_type" }
}
}
// other query clauses
}
}
So instead of listing "type_A" and "type_B" in an or clause of the type filter, I would like to use this "unified_type", but without giving up the possibility of querying "type_A" directly.
How could I achieve this?
I don't think that it's possible. However, you could use the copy_to parameter: your fields stay as they are now, and their values are also copied into a unified field.
The copy_to parameter allows you to create custom _all fields. In other words, the values of multiple fields can be copied into a group field, which can then be queried as a single field. For instance, the first_name and last_name fields can be copied to the full_name field as follows:
So you'd be copying both "meta_A" and "meta_B" into some "unified_meta" field and querying that one.
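A minimal sketch of that idea for the two types from the question, assuming the same pre-5.x era (string type, one mapping per type) and a hypothetical target field named unified_meta. The Python only builds the mapping bodies and a sample match query; it does not send them.

```python
import json


def type_mapping(meta_field):
    """Mapping for one type; `meta_field` is also copied into the shared `unified_meta` field."""
    return {
        "properties": {
            "name": {"type": "string"},
            meta_field: {"type": "string", "copy_to": "unified_meta"},
            "unified_meta": {"type": "string"},  # hypothetical group field
        }
    }


mappings = {"type_A": type_mapping("meta_A"), "type_B": type_mapping("meta_B")}

# Query across both types through the shared field (sent to /<index>/_search);
# meta_A and meta_B remain individually queryable as before.
unified_query = {"query": {"match": {"unified_meta": "some value"}}}

print(json.dumps(mappings, indent=2))
print(json.dumps(unified_query, indent=2))
```

Since unified_meta is defined identically in both types, the shared field stays consistent across the index, which older versions require for fields with the same name.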

Avoid mapping multiple fields in Elasticsearch

I have the following problem when indexing documents in Elasticsearch: my documents contain some fields that do not appear in other documents, so I end up with a mapping of more than 100,000 fields. Here is an example:
If I send something like this to an empty index:
{"example":{
"a1":123,
"a2":444,
"a3":52566,
"a4":7,
.....
"aN":11
}
}
It will create the following mapping:
{"example" : {
"properties" : {
"a1" : {
"type" : "long"
},
"a2" : {
"type" : "long"
},
"a3" : {
"type" : "long"
},
"a4" : {
"type" : "long"
},
.....
"aN" : {
"type" : "long"
}
}
}
}
Then if I send another document:
{"example":{
"b1":123,
"b2":444,
"b3":52566,
"b4":7,
.....
"bN":11
}
}
It will create a mapping twice as large as the one above.
The real object is more complex than this, but the mapping has become so big that it is killing the server.
How can I address this? Would a multi-field work in this scenario? I tried several ways, but it doesn't seem to work.
Thanks.
It is pretty tough to give you a definitive answer given that we have no idea of your use case, but my initial guess is that if you have a mapping of thousands of fields that have no logical bond, you've probably made some wrong choices about the architecture of your data. Could you tell us why you need thousands of fields with different names for a single document type? As it is, there's not much we can do to point you in the right direction.
If you really want to do this, create a mapping like the example below:
POST /index_name/_mapping/type_name
{
"type_name": {
"enabled": false
}
}
This gives the required behavior: Elasticsearch stops creating mappings for these fields and no longer parses or indexes their content, although the original document is still kept in _source.
See these links to get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-object-type.html#_enabled_3
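To see why disabling helps, here is a rough pure-Python approximation (not Elasticsearch's actual algorithm) of how many leaf fields dynamic mapping would add for documents shaped like the ones in the question, with and without an object marked "enabled": false. Treating example as an object field, and the helper name mapped_fields, are assumptions for illustration; the last lines build a mapping body that disables only that object instead of the whole type.

```python
import json


def mapped_fields(doc, prefix="", disabled=frozenset()):
    """Roughly mimic dynamic mapping: collect the leaf field paths a document would add.
    Objects listed in `disabled` (mapped with "enabled": false) contribute nothing."""
    paths = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if path in disabled:
            continue  # kept in _source, but never parsed into the mapping
        if isinstance(value, dict):
            paths.extend(mapped_fields(value, f"{path}.", disabled))
        else:
            paths.append(path)
    return paths


doc_a = {"example": {f"a{i}": i for i in range(1, 6)}}
doc_b = {"example": {f"b{i}": i for i in range(1, 6)}}

print(len(mapped_fields(doc_a)) + len(mapped_fields(doc_b)))  # 10 distinct fields
print(len(mapped_fields(doc_a, disabled={"example"})))        # 0

# Mapping body that disables only the "example" object; other fields are still indexed.
mapping_body = {"properties": {"example": {"type": "object", "enabled": False}}}
print(json.dumps(mapping_body))
```

The trade-off is the one described above: the disabled content can no longer be searched or aggregated, only retrieved from _source.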

Elasticsearch correct way to specify nested field for timestamp

Elasticsearch's documentation states that I can map timestamp to a custom property with the _timestamp mapping. The example on their website shows that:
{
"tweet" : {
"_timestamp" : {
"enabled" : true,
"path" : "post_date"
}
}
}
Will cause 2009-11-15T14:12:12 to be used as the timestamp value for:
{
"message" : "You know, for Search",
"post_date" : "2009-11-15T14:12:12"
}
But what if I wanted to map
{
"message" : "You know, for Search",
"nested": {
"post_date" : "2009-11-15T14:12:12"
}
}
How can I map my nested post_date? What should the path property be?
EDIT: In my properties mapping, I didn't set "type":"nested" for my nested objects; I just listed their sub-properties in their properties property. This is the default properties mapping generated by ES 1.4.1 during the first indexing.
You need to give the complete path when accessing nested fields.
In this case, you can use "nested.post_date".
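The dotted path simply walks from the document root through each object to the inner field. The sketch below uses a hypothetical helper, resolve_path, to show which value the path "nested.post_date" picks out of the question's document; it illustrates the path notation only and is not how Elasticsearch resolves it internally. Keep in mind that _timestamp exists only in older releases (it was deprecated in 2.0 and removed in 5.0).

```python
def resolve_path(doc, path):
    """Follow a dotted path such as "nested.post_date" through nested JSON objects."""
    value = doc
    for part in path.split("."):
        value = value[part]
    return value


doc = {
    "message": "You know, for Search",
    "nested": {"post_date": "2009-11-15T14:12:12"},
}

# The _timestamp mapping from the question (ES 1.x), using the full path to the inner field.
mapping = {"tweet": {"_timestamp": {"enabled": True, "path": "nested.post_date"}}}

print(resolve_path(doc, mapping["tweet"]["_timestamp"]["path"]))  # 2009-11-15T14:12:12
```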

How to specify integer type for csv-river generated type in Elasticsearch

It seems I'm way out of my league here, and I hope that someone from the Elasticsearch community can tune in and point out my error.
I wanted to try out Kibana + Elasticsearch for fiddling with some COUNTER3-compliant stats we have, and for this I needed to import CSV files.
I chose the CSV River plugin to import the CSV files.
The CSV river was configured with the following PUT request to http://localhost:9200/_river/test_csv_river/_meta
{
"type": "csv",
"csv_file": {
"folder": "C:\\localdata\\UsageStats\\COUNTER3",
"first_line_is_header": "true"
}
}
This works fine, but it turns out that the number fields are imported as strings, not numbers. I confirmed this by checking the index's type mapping: http://localhost:9200/test_csv_river/csv_type/_mapping
{
test_csv_river: {
mappings: {
csv_type: {
properties: {
ft_total: {
type: "string"
},
.....
}
}
}
}
}
This related SO question made me believe that I can change the type's field properties AFTER I have created the index: How to update a field type in elasticsearch?
But when I made the following PUT request to http://localhost:9200/test_csv_river/csv_type/_mapping
{
"csv_type" : {
"properties" : {
"ft_total" : {"type" : "integer", "store" : "yes"}
}
}
}
I got this error:
{"error":"MergeMappingException[Merge failed with failures {[mapper [ft_total] of different type, current_type [string], merged_type [integer]]}]","status":400}
Is there no way to update that type from string to integer?
Alternatively, is there any way I can make sure the index is created with a specific field predefined as "type" : "integer" when using the CSV river plugin ?
