ElasticSearch Fields Mapping to String by default when indexing

Let me first explain my scenario.
I am fetching data from RDBMS and pushing it into ElasticSearch.
The fetched results are in the form of a List of Maps, and I am preparing a bulk index request like this:
BulkRequestBuilder bulkRequest = client.prepareBulk();
for (Map<String, Object> singleDataRow : resultList) // the List of Maps fetched from the RDBMS
{
    // use the NAME column as the document id and the whole row map as the document source
    IndexRequest indexRequest = new IndexRequest("testindex", "testtype",
            String.valueOf(singleDataRow.get("NAME")))
            .source(singleDataRow);
    bulkRequest.add(indexRequest);
}
bulkRequest.execute().actionGet();
Each map contains String keys mapped to mixed value types: String, BigDecimal, BigInteger, etc.
e.g.
{ "BIRTHDATE" : "2015-03-05", "NAME" : "deepankar", "AGE" : 22, "AMOUNT" : 15.5 }
But when I look at the mapping of my testtype in testindex, every field is mapped as "type" : "string".
Why do the fields not map to "type" : "long" or even "type" : "date", as Elasticsearch does by default?

Elasticsearch will attempt to 'guess' the field type on the first insert, unless you create the index and map the fields beforehand.
There are two possible reasons why your fields are being indexed as string instead of long or any other type:
You're not really sending these fields as numbers, i.e. you're sending '10' instead of 10.
You've already inserted at least one document that had a string value for that field: if your first document contained AGE: '22', Elasticsearch set that field to type: string, and all future inserts into it will be treated as strings (see the illustration below).
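For illustration, a minimal sketch (the index name and document ids are throwaway examples). The first request locks AGE to string; the numeric value in the second request is still indexed into that existing string field:
curl -XPUT 'http://localhost:9200/testindex/testtype/1' -d '{ "AGE" : "22" }'
curl -XPUT 'http://localhost:9200/testindex/testtype/2' -d '{ "AGE" : 22 }'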
If you want to make sure, you can delete the current index, re-create it and manually set up mapping before inserting the first document, like so:
curl -XPUT 'http://localhost:9200/testindex/_mapping/testtype' -d '
{
  "testtype" : {
    "properties" : {
      "birthdate" : { "type" : "date", "format" : "dateOptionalTime" },
      "name" : { "type" : "string" },
      "age" : { "type" : "long" },
      "amount" : { "type" : "double" }
    }
  }
}
'
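You can then verify what was actually stored (a quick sanity check, assuming the default localhost setup):
curl -XGET 'http://localhost:9200/testindex/_mapping?pretty'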

Related

Elasticsearch indexing timestamp-field fails

I fail at indexing timestamp fields with ElasticSearch (version 7.10.2) and I do not understand why.
So I create an index with the following mapping. You can copy & paste it directly to Kibana:
PUT /my-dokumente
{
  "mappings" : {
    "properties" : {
      "aufenthalt" : {
        "properties" : {
          "aufnahme" : {
            "properties" : {
              "zeitpunkt" : {
                "type" : "date",
                "format" : "yyyy-MM-dd HH:mm:ss",
                "ignore_malformed" : true
              }
            }
          },
          "entlassung" : {
            "properties" : {
              "zeitpunkt" : {
                "type" : "date",
                "format" : "yyyy-MM-dd HH:mm:ss",
                "ignore_malformed" : true
              }
            }
          }
        }
      }
    }
  }
}
Then I try to index a document:
PUT /my-dokumente/dokumente/1165963
{
  "aufenthalt" : {
    "aufnahme" : {
      "zeitpunkt" : "2019-08-18 15:02:13"
    },
    "entlassung" : {
      "zeitpunkt" : "2019-08-20 10:29:22"
    }
  }
}
Now I get this error:
"mapper [aufenthalt.entlassung.zeitpunkt] cannot be changed from type [date] to [text]"
Why is Elasticsearch not parsing my date?
I also tried many different mapping settings, like strict_date_hour_minute_second, and I sent the timestamp as "2019-08-18T15:02:13" or "2019-08-18T15:02:13Z" and even converted it to epoch millis, but I always get some different error message, for example: Cannot update parameter [format] from [strict_date_hour_minute_second] to [strict_date_optional_time||epoch_millis].
So the basic question is just: how can I send a timestamp value to Elasticsearch (with Kibana/cURL)?
PS: I am not using a client SDK like the Java High Level REST Client, which is why I am asking about plain Kibana/cURL.
It can't be that complicated. What am I missing?
Thank you!
Mapping types were removed in 7.x; refer to the official documentation on the removal of mapping types.
You need to use _doc instead of a custom type name in the URL when indexing a document into Elasticsearch 7.x.
Modify the URL to PUT /my-dokumente/_doc/1165963.
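With that change, the document from the question indexes cleanly against the mapping above (same payload, only the type name in the URL replaced by _doc):
PUT /my-dokumente/_doc/1165963
{
  "aufenthalt" : {
    "aufnahme" : {
      "zeitpunkt" : "2019-08-18 15:02:13"
    },
    "entlassung" : {
      "zeitpunkt" : "2019-08-20 10:29:22"
    }
  }
}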

elasticsearch: how to define mapping with nested fields?

I am trying to define a mapping with nested fields. According to the documentation, the payload to /order-statistics/_mapping/order looks like:
{
  "mappings" : {
    "order" : {
      "properties" : {
        "order_no" : {
          "type" : "string"
        },
        "order_products" : {
          "type" : "nested",
          "properties" : {
            "order_product_no" : {
              "type" : "int"
            },
            "order_product_options" : {
              "type" : "nested",
              "properties" : {
                "order_product_option_no" : {
                  "type" : "int"
                }
              }
            }
          }
        }
      }
    }
  }
}
I've already created the order-statistics index with a call to curl -XPUT 'localhost:9200/order-statistics', and I'm using predefined types such as int, string, and double. But I get the following error and can't find what's wrong:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "Root mapping definition has unsupported parameters: [mappings : {order={properties={order_no={type=string}, order_products={type=nested, properties={order_product_no={type=int}, order_product_options={type=nested, properties={order_product_option_no={type=int}}}}}}}}]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Root mapping definition has unsupported parameters: [mappings : {order={properties={order_no={type=string}, order_products={type=nested, properties={order_product_no={type=int}, order_product_options={type=nested, properties={order_product_option_no={type=int}}}}}}}}]"
  },
  "status" : 400
}
Could someone explain why this does not work?
You are using int as type for some fields which is not a valid type in either 2.x or 5.x. For integer values, please use integer or long depending on the values you want to store. For details, please see the docs on core mapping types.
Which version of elasticsearch are you using - 2.x or 5.x? If you are on 5.x already, you should go with keyword or text for your string fields instead of using just string which was the naming up to 2.x. But this is still only a warning.
Additionally, you should be aware of the implications when using nested instead of just object. Using a nested type is necessary if you store an array of objects and want to query for more than one property of such an object with the guarantee that only these documents match where one of the nested objects in the array matches all your conditions. But this comes at a cost, so consider using the simple object type, if this works for you. For more details, please see the docs on nested data type and especially the warning at the end.
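Putting this together, a corrected payload for the _mapping endpoint might look like the sketch below. Note that the body must start at the type name, without the outer "mappings" wrapper, which is what the "Root mapping definition has unsupported parameters: [mappings ...]" error is complaining about; integer is used in place of int:
curl -XPUT 'localhost:9200/order-statistics/_mapping/order' -d '
{
  "order" : {
    "properties" : {
      "order_no" : {
        "type" : "string"
      },
      "order_products" : {
        "type" : "nested",
        "properties" : {
          "order_product_no" : {
            "type" : "integer"
          },
          "order_product_options" : {
            "type" : "nested",
            "properties" : {
              "order_product_option_no" : {
                "type" : "integer"
              }
            }
          }
        }
      }
    }
  }
}
'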

Elasticsearch document aliases

I have multiple mappings which come from the same datasource but have small differences, like the example below.
{
  "type_A" : {
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "meta_A" : {
        "type" : "string"
      }
    }
  }
}
{
  "type_B" : {
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "meta_B" : {
        "type" : "string"
      }
    }
  }
}
What I want to be able to do is:
Directly query specific fields (like meta_A)
Directly query all documents from the datasource
Query all documents from a specific mapping
What I was looking into is the type filter, so preferably I could write a query like this:
{
  "query" : {
    "filtered" : {
      "filter" : {
        "type" : { "value" : "unified_type" }
      }
    }
    // other query clauses
  }
}
So instead of typing "type_A", "type_B" in an or clause in the type filter, I would like to have this "unified_type", but without giving up the possibility to directly query "type_A".
How could I achieve this?
I don't think that's possible. However, you could use the copy_to functionality, so you would keep your fields as they are now and have their values copied into a unified field.
The copy_to parameter allows you to create custom _all fields. In
other words, the values of multiple fields can be copied into a group
field, which can then be queried as a single field. For instance, the
first_name and last_name fields can be copied to the full_name field
as follows:
So you'd be copying both "meta_A" and "meta_B" into some "unified_meta" field and query this one.
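A minimal sketch of what that could look like, assuming the pre-5.x string type used in the question and a throwaway index name my_index:
curl -XPUT 'localhost:9200/my_index' -d '
{
  "mappings" : {
    "type_A" : {
      "properties" : {
        "name" : { "type" : "string" },
        "meta_A" : { "type" : "string", "copy_to" : "unified_meta" },
        "unified_meta" : { "type" : "string" }
      }
    },
    "type_B" : {
      "properties" : {
        "name" : { "type" : "string" },
        "meta_B" : { "type" : "string", "copy_to" : "unified_meta" },
        "unified_meta" : { "type" : "string" }
      }
    }
  }
}
'
Queries against unified_meta then match values originally sent to either meta_A or meta_B, while each meta field stays directly queryable.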

Changing elasticsearch mapping

Take the simplest case of indexing the following document in elasticsearch
{
"name": "Mark",
"age": 28
}
With automatic mapping, the mapping for this index would now look like:
"mappings" : {
  "doc" : {
    "properties" : {
      "age" : { "type" : "long" },
      "name" : { "type" : "string" }
    }
  }
}
But say I then wanted to allow the case where this document should be indexed
{
"name": "Bill",
"age": "seven"
}
If I try this, the mapping does not update and Elasticsearch throws an error, since there is a conflict with the type of the age property.
Is there any way to do this so that both docs could be automatically indexed and consequently queryable?
Mappings are defined per type, so what you could do is have two types in your index:
numeric
alphabetical
And split the documents according to the value in the age field. If you run a query, you can query both types (see the sketch below).
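A minimal sketch of that approach, assuming a hypothetical index named people (pre-6.x, where one index can hold multiple types):
PUT /people/numeric/1
{ "name" : "Mark", "age" : 28 }

PUT /people/alphabetical/2
{ "name" : "Bill", "age" : "seven" }

GET /people/numeric,alphabetical/_search
{ "query" : { "match_all" : {} } }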
You can add new fields to a mapping, but you cannot change the mapping of an existing field. To do that you need to drop the index, create a new mapping, and index the data again.
For more info, refer to the reference documentation.
You can't change an existing mapping; you can only add new fields to it.
Otherwise you have to delete the old mapping and create a new mapping for that particular index.
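A minimal sketch of the delete-and-recreate flow (destructive: DELETE drops all indexed data; the index name and the choice of mapping age as string are assumptions for illustration, since a numeric JSON value like 28 can still be indexed into a string field, so both documents would be accepted):
DELETE /my_index

PUT /my_index
{
  "mappings" : {
    "doc" : {
      "properties" : {
        "name" : { "type" : "string" },
        "age" : { "type" : "string" }
      }
    }
  }
}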

How to add multiple object types to elasticsearch using jdbc river?

I'm using the jdbc river to successfully add one object type, "contacts", to Elasticsearch. How can I add another object type with different fields? I'd like to add "companies" as well.
What I have is below. Do I need to do a separate PUT statement? If I do, no new data appears to be added to elasticsearch.
PUT /_river/projects_river/_meta
{
  "type" : "jdbc",
  "index" : {
    "index" : "ALL",
    "type" : "project",
    "bulk_size" : 500,
    "max_bulk_requests" : 1,
    "autocommit" : true
  },
  "jdbc" : {
    "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "poll" : "30s",
    "strategy" : "poll",
    "url" : "jdbc:sqlserver://connectionstring",
    "user" : "username",
    "password" : "password",
    "sql" : "select ContactID as _id, * from Contact"
  }
}
Also, when search returns results, how can I tell if they are of type contact or company? Right now they all have a type of "jdbc", and changing that in the code above throws an error.
You can achieve what you want by adding extra columns to your SQL query.
Just as you aliased ContactID AS _id, you can also define indexName AS _index and indexType AS _type in your SQL query.
Also, if you need another river, add rivers under different _river names.
In your case such as,
PUT /_river/projects_river2/_meta + Query ....
PUT /_river/projects_river3/_meta + Query ....
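A sketch of the _index/_type column idea (the literal index and type names here are illustrative assumptions):
"sql" : "select ContactID as _id, 'contacts' as _index, 'contact' as _type, * from Contact"
A second river for companies could then select CompanyID as _id, 'companies' as _index, 'company' as _type, and search hits would come back with _type contact or company instead of jdbc.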
For anyone else who stumbles across this, please see the official documentation for the syntax first: https://github.com/jprante/elasticsearch-river-jdbc/wiki/How-bulk-indexing-isused-by-the-JDBC-river
Here's the final PUT statement I used:
PUT /_river/contact/_meta
{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url" : "connectionstring",
    "user" : "username",
    "password" : "password",
    "sql" : "select ContactID as _id, * from Contact",
    "poll" : "5m",
    "strategy" : "simple",
    "index" : "contact",
    "type" : "contact"
  }
}
