Disable date detection in Tire's elasticsearch mapping - elasticsearch

I'm indexing a document with a property obj_properties, which is a hash of property name -> property value. Elasticsearch is inferring that some of the property values are dates, leading to the following error when it encounters a subsequent value for the same property that can't be parsed as a date:
org.elasticsearch.index.mapper.MapperParsingException: failed to parse date field <NON-DATE FIELD within obj_properties>
So, I'd like to disable date detection for obj_properties and anything nested within it. Per
http://elasticsearch-users.115913.n3.nabble.com/Date-Detection-not-always-wanted-tp1638890p1639415.html
(Note: I believe the linked post contains a typo -- the field should be date_formats rather than date_format -- but I've tried it both ways.)
I've created the following mapping:
mapping do
  indexes :name
  indexes :obj_properties, type: "object", date_formats: "none"
end
but I continue to receive the same exception. The properties in obj_properties are not known ahead of time, so it's not possible to create an exhaustive mapping of types. Any ideas? Is disabling date detection the correct approach?

You can turn off date detection for a particular type by specifying it in the mapping:
curl -XPUT 'http://127.0.0.1:9200/myindex/?pretty=1' -d '
{
  "mappings" : {
    "mytype" : {
      "date_detection" : 0
    }
  }
}
'
or for all types in an index by specifying it in the default mapping:
curl -XPUT 'http://127.0.0.1:9200/myindex/?pretty=1' -d '
{
  "mappings" : {
    "_default_" : {
      "date_detection" : 0
    }
  }
}
'

In Tire, you can pass date_detection as an option to mapping:
mapping(date_detection: false) do
  indexes :name
  indexes :obj_properties, type: "object"
end
Then curl 'http://127.0.0.1:9200/myindex/_mapping?pretty=1' will include date_detection = false. Note that I believe this applies to the entire type, not to a particular field.
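If you do need something finer-grained, one workaround (a sketch, not from the original answers; the template name is made up) is a dynamic template that forces every field under obj_properties to be mapped as a string, which sidesteps date detection for those fields:
curl -XPUT 'http://127.0.0.1:9200/myindex/mytype/_mapping' -d '
{
  "mytype" : {
    "dynamic_templates" : [
      {
        "obj_properties_as_strings" : {
          "path_match" : "obj_properties.*",
          "mapping" : { "type" : "string" }
        }
      }
    ]
  }
}
'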

Related

Changing type of property in index type's mapping

I have index mapping for type 'T1' as below:
"T1" : {
"properties" : {
"prop1" : {
"type" : "text"
}
}
}
And now I want to change the type of prop1 from text to keyword. I don't want to delete the index. I have also read people suggesting to create another property with the new type and replace it, but then I would have to update old documents, which I'm not interested in doing. I tried to use the PUT mapping API as below, but it never works.
PUT /indexName/T1/_mapping -d
{
  "T1" : {
    "properties" : {
      "prop1" : {
        "type" : "keyword"
      }
    }
  }
}
Is there any way to achieve this?
An existing field's mapping cannot be modified, hence the PUT API you have used will not work. You will have to create a new index with the updated mapping and reindex all the data into it.
To prevent downtime you can always use alias:
https://www.elastic.co/blog/changing-mapping-with-zero-downtime
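A minimal sketch of the alias flip described in that post (the index and alias names here are hypothetical):
curl -XPOST 'http://localhost:9200/_aliases' -d '
{
  "actions" : [
    { "remove" : { "index" : "myindex_v1", "alias" : "myindex" } },
    { "add" : { "index" : "myindex_v2", "alias" : "myindex" } }
  ]
}
'
Because clients query the alias rather than the concrete index, the switch is atomic and needs no application changes.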
A mapping cannot be updated once it is persisted. The only option is to create a new index with the correct mappings and reindex your data using the reindex API provided by ES.
You can read about the reindex API here:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/docs-reindex.html
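For example, a minimal _reindex call copying everything from the old index into a new one created with the corrected mapping (the index names here are hypothetical):
curl -XPOST 'localhost:9200/_reindex' -d '
{
  "source" : { "index" : "old_index" },
  "dest" : { "index" : "new_index" }
}
'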

Change Mapping for Field for ALL OF LOGSTASH Created indexes

I would like to change the type of the field location to geo_point. I'm using ES with Logstash, and as y'all know, indices are generated with the name logstash-yyyy-mm-dd.
I first created a logstash index and named it logstash-2016-03-29, like so:
curl -XPUT 'http://localhost:9200/logstash-2016-03-29'
Then, I changed the mapping for (supposedly) all the indices matching logstash-* using the following:
curl -XPOST "http://localhost:9200/logstash-*/_mapping/logs" -d '{
"properties" : {
"location" : { "type":"geo_point" }
}
}'
And when I ran the Logstash configuration file, all the location fields in the index logstash-2016-03-29 were indeed of type geo_point.
However, today, the auto-generated index logstash-2016-03-30 had a location field of type string instead of geo_point. I thought the type should be applied to ANY index whose name starts with logstash-. Apparently, I was wrong. How can I fix this so that any future index created by Logstash that has the location field gets that field mapped as geo_point instead of string?
Thanks.
You should define it using an index template:
curl -XPUT 'localhost:9200/_template/template_2' -d '
{
  "template" : "logstash-*",
  "mappings" : {
    "logs" : {
      "properties" : {
        "location" : { "type" : "geo_point" }
      }
    }
  }
}
'
(Note the pattern needs the trailing wildcard, "logstash-*", to match the daily indices.)
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
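Templates are applied only at index-creation time, which is why your earlier _mapping call changed the existing indices but not the ones created afterwards. To confirm the template is registered (a quick check, not part of the original answer):
curl 'localhost:9200/_template/template_2?pretty'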

How to create a common mapping template for indices?

For the app I created, the indices are generated once a week, and the type and nature of the data do not vary. That implies I need the same mapping for these indices. Is it possible in Elasticsearch to apply the same mapping to all the indices as they are created? This would save me the overhead of defining the mapping each time an index is created.
Definitely, you can use what is called an index template. Since your mapping type is stable, that's the perfect condition for using index templates.
It's as easy as creating an index. See below: whenever you want to index a document into an index whose name matches my_*, ES will select that template and create the index for you using the given mappings, settings, and aliases:
curl -XPUT 'localhost:9200/_template/template_1' -d '{
  "template" : "my_*",
  "settings" : {
    "number_of_shards" : 1
  },
  "aliases" : {
    "my_alias" : {}
  },
  "mappings" : {
    "my_type" : {
      "properties" : {
        "my_field": { "type": "string" }
      }
    }
  }
}'
It's basically the technique used by Logstash when it needs to index new logs for each new day in a new daily index.
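As a quick usage sketch (the index name here is hypothetical), simply indexing into a name that matches the pattern is enough; the template's mappings, settings, and alias are applied when the index is auto-created:
curl -XPUT 'localhost:9200/my_index_1/my_type/1' -d '{ "my_field" : "hello" }'
curl 'localhost:9200/my_index_1/_mapping?pretty'
The second call should show my_field mapped as a string, exactly as defined in the template.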
You can employ an index template to address your problem. The official documentation can be found here.
A use case of how to apply the same with examples can be found in this blog

Is there a way to apply the synonym token filter in ElasticSearch to field names rather than the value?

Consider the following JSON file:
{
  "titleSony": "Matrix",
  "cast": [
    {
      "firstName": "Keanu",
      "lastName": "Reeves"
    }
  ]
}
Now, I know in ElasticSearch, you can apply a synonym token filter to field values as given in the following link: Elasticsearch Analysis: Synonym token filter.
Hence, I can create a "synonym.txt" file with Matrix => Matx, then if I search for titleSony:Matx, it will return the documents with Matrix as well.
Now, what I would like is to create a synonym for the field name titleSony. For example - titleSony => titleAll, such that when I search for titleAll, I should get all documents with titleSony as well.
Is there any way to accomplish this in ElasticSearch?
Yes, somewhat. Elasticsearch has some default behavior very similar to this, which I'll touch on in a bit.
The feature you're looking for is called "Copy to field." It allows you to specify that the terms in one field should be copied into another. This is useful for consolidating terms you expect to match into a single field, to help simplify your query when you would like to match against any one of a number of fields.
In this example, you would specify in your mapping that the terms in the titleSony field ought to be copied into the titleAll field. Presumably you'd have other fields (say, titleDisney) which also copy into that field as well. So a search against titleAll will effectively match the other fields whose terms are copied into it.
An excerpt of your mapping might look something like this:
{
  "movies" : {
    "properties" : {
      "titleSony" : { "type" : "string", "copy_to" : "titleAll" },
      "titleDisney" : { "type" : "string", "copy_to" : "titleAll" },
      "titleAll" : { "type" : "string" },
      "cast" : { ... },
      ...
    }
  }
}
I mentioned earlier that Elasticsearch does something like this. By default it creates a special field called _all into which all the document's terms are copied. This field lets you construct very simple queries to match against terms that occur in any field on the document. So as you see, this is a fairly common convention in Elasticsearch. (Elasticsearch mapping: _all field.)
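As a usage sketch against the mapping above (the index name movies is taken from that excerpt), a single match query on titleAll finds documents regardless of which title field originally held the term:
curl -XGET 'localhost:9200/movies/_search?pretty' -d '
{
  "query" : { "match" : { "titleAll" : "Matrix" } }
}
'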

Can I create a document with the update API if the document doesn't exist yet

I have a very simple question:
I want to update multiple documents in Elasticsearch. Sometimes the document already exists, but sometimes not. I don't want to use a GET request to check the existence of the document (this is decreasing my performance). I want my update request to directly index the document if it doesn't exist yet.
I know that we can use upsert to create a non-existing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist. I don't know if upsert can do this.
Can you provide me some explanation?
Thanks in advance!
This is doable using the update API. It does require that you specify the ID of each document, since the update API needs the ID to determine whether the document is present.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want from the update API with the following call:
POST /cars/car/3/_update
{
  "doc": {
    "color" : "brown",
    "brand" : "ford"
  },
  "doc_as_upsert" : true
}
This API call will add the document to the index, since it does not exist yet.
Running the call a second time, after changing the color of the car, will update the existing document instead of creating a new one.
POST /cars/car/3/_update
{
  "doc": {
    "color" : "black",
    "brand" : "ford"
  },
  "doc_as_upsert" : true
}
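For completeness, the plain upsert parameter the question mentions works too: it supplies a full document to index when none exists, while "doc" is applied when one does. A minimal sketch (the document ID is hypothetical):
POST /cars/car/4/_update
{
  "doc" : { "color" : "red" },
  "upsert" : { "color" : "red", "brand" : "fiat" }
}
doc_as_upsert simply saves you from repeating the document twice.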
AFAIK when you index the documents (with a PUT call), the existing version gets replaced with the newer version. If the document did not exist, it gets created. There is no need to make a distinction between INSERT and UPDATE in ElasticSearch.
UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch"
}'
For the bulk API, push an update action followed by a doc_as_upsert payload:
bulks.push({
  update: {
    _index: 'index',
    _type: 'type',
    _id: id
  }
});
bulks.push({ doc_as_upsert: true, doc: your_doc });
// then send everything in one request, e.g. with the elasticsearch.js client:
client.bulk({ body: bulks });
As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback.
after_commit on: :update do
  begin
    __elasticsearch__.update_document
  rescue Elasticsearch::Transport::Transport::Errors::NotFound
    __elasticsearch__.index_document
  end
end
I think you want the "create" action.
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create: (Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index: (Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update: (Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc: (Optional, object) The partial document to index. Required for update operations.
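A minimal bulk request illustrating the three actions (the index name and IDs here are hypothetical; each action line is followed by its source or partial document where required):
POST /_bulk
{ "create" : { "_index" : "cars", "_id" : "10" } }
{ "color" : "green", "brand" : "honda" }
{ "index" : { "_index" : "cars", "_id" : "10" } }
{ "color" : "green", "brand" : "honda" }
{ "update" : { "_index" : "cars", "_id" : "10" } }
{ "doc" : { "color" : "silver" } }
Here create would fail if a document with _id 10 already exists, index replaces it unconditionally, and update patches just the color field.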
