jprante Elasticsearch JDBC river changing the date value

I am trying to index MySQL records in Elasticsearch using jprante's Elasticsearch JDBC river. I just noticed that the value in the date field is getting changed in the index.
Mapping:
"content_date": {
    "type": "date"
}
content_date field for a record in MySQL -> 2012-10-06 02:11:30
After running the JDBC river:
content_date field for the same record in Elasticsearch -> 2012-10-05T20:41:30Z
River:
curl -XPUT 'localhost:9200/_riv_index/_riv_type/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/db",
        "user" : "user",
        "password" : "password",
        "sql" : "select * from table where id=2409",
        "poll" : "1d",
        "versioning" : false
    },
    "index" : {
        "index" : "myindex",
        "type" : "mytype"
    }
}'
A change in the date format is acceptable, but why is the date value itself changing?
The river is applying the local-timezone-to-UTC offset (5 hours 30 minutes here) to the MySQL record's date before saving it in Elasticsearch. How do I stop this time conversion?

From the Elasticsearch point of view, here is what the docs say:
The date type is a special type which maps to JSON string type. It follows a specific format that can be explicitly set. All dates are UTC. Internally, a date maps to a number type long, with the added parsing stage from string to long and from long to string.
Not sure that you can change it.

The solution for this issue is to use the timezone setting in the jdbc block:
"timezone" : "TimeZone.getDefault()"
Also, I am saving date and time in separate fields in my MySQL DB:
| Field | Type | Null | Key | Default | Extra |
| date  | date | YES  |     | NULL    |       |
| time  | time | YES  |     | NULL    |       |
Elasticsearch uses the Joda time format to store dates, so it automatically converts my date to a datetime. Since the date field has no time component, zeros are automatically appended to it.
I need this date/time split because I display the data via Kibana. As a workaround I converted the date and time columns to varchar(20) (a bad idea, I know), and it is working fine now.

Related

ES SQL result doesn't use correct date mapping

I'm experimenting with the SQL options in Elasticsearch and I noticed that a timestamp field I mapped as "strict_date_optional_time_nanos||epoch_millis" doesn't show up as it was indexed. This is what the timestamp column looks like when I do a SELECT * FROM index:
|        timeStamp         |
+--------------------------+
| 1970-01-20T04:38:39.243Z |
The actual value indexed is: 1675772407310 (9th of Feb 13:59:24). I cannot seem to find information as to why it's this way.
I believe that Elasticsearch already internally performs the conversion to the datetime type.
In this case, you can do a cast to get the value in epoch format.
GET _sql?format=txt
{
    "query": """SELECT CAST("timeStamp" AS BIGINT) FROM "test" """
}

Error parsing date histogram filter in Kibana visualization

I'm using Kibana version 4.4.1 with ES 2.2.0 from the Debian repos.
I have a field with the following type defined:
"InvitationTime" : {
"type" : "date",
"format" : "dd/MM/yyyy HH:mm:ss Z"
}
I created a data table visualization with a date histogram aggregation on this field. When I click on one of the dates to filter, though, I get an error where Kibana tries to parse the millis-since-epoch value of the field using my field's format.
Is this a bug or am I doing something wrong?
You need to modify your mapping with the following format instead:
"format" : "dd/MM/yyyy HH:mm:ss Z||epoch_millis"
You also need to recreate your index and re-index your data. It should work fine.
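A sketch of that re-creation step in ES 2.x syntax (the index and type names here are assumptions; only the format string comes from the fix):
curl -XDELETE 'localhost:9200/myindex'
curl -XPUT 'localhost:9200/myindex' -d '{
    "mappings": {
        "mytype": {
            "properties": {
                "InvitationTime": {
                    "type": "date",
                    "format": "dd/MM/yyyy HH:mm:ss Z||epoch_millis"
                }
            }
        }
    }
}'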

Field-specific versioning in Elasticsearch

There is a good deal of documentation about how Elasticsearch supports document level external versioning. However, if one wants to do a partial update (say, to a specific field), it'd be useful to have this type of version checking at the field level.
For instance, say I have an object field name, with primitive fields value and timestamp. I only want the partial updates to succeed if the timestamp value is greater than the value currently in Elasticsearch.
Is there an easy way to do this? Can it be done with a script? Or is there a more standard way of doing it?
Yes, it's very easy using a script. See https://www.elastic.co/guide/en/elasticsearch/reference/2.0/docs-update.html.
I've written an example here that updates the "value" field if and only if the specified timestamp (given in the parameter update_time) is greater than the "timestamp" field. If the timestamp field's value is less than the update_time parameter, the document is updated; otherwise the update is not performed.
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "if (ctx._source.name.timestamp > update_time) { ctx.op = \"none\" }; ctx._source.name.value = value; ctx._source.name.timestamp = update_time;",
        "params" : {
            "update_time" : 432422,
            "value": "My new value"
        }
    }
}'
You can get the current time in the script if desired, rather than passing as a parameter e.g.:
update_time = DateTime.now().getMillis()
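Putting that together, a sketch of the same update with the time taken inside the script; this assumes Groovy scripting (the ES 2.x default) with dynamic scripting enabled, and you may need the fully qualified org.joda.time.DateTime depending on the script sandbox:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
    "script" : {
        "inline": "def update_time = DateTime.now().getMillis(); if (ctx._source.name.timestamp > update_time) { ctx.op = \"none\" }; ctx._source.name.value = value; ctx._source.name.timestamp = update_time;",
        "params" : {
            "value": "My new value"
        }
    }
}'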

How to detect changes in a database and automatically add new rows to an Elasticsearch index

What I've already done:
I connected my HBase to Elasticsearch via this tutorial:
http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html
I get an index with the HBase table content, but after adding a new row to HBase, it is not automatically added to the Elasticsearch index. I tried adding this line to my conf:
"schedule" : "* 1/5 * ? * *"
and mapping:
"mappings": {
"jdbc" : {
"_id" : {
"path" : "ID"
}
}
}
which assigns _id = ID, where ID has a unique value in my HBase table.
It works well: when I add a new row to HBase, it is uploaded to the index in less than 5 minutes. But it is bad for performance, because every 5 minutes it executes the full query and skips the old content only because _id has to be unique. That is fine for a small DB, but I have over 10 million rows in my HBase table, so my index is working all the time.
Is there any solution or plugin for Elasticsearch that automatically detects changes in the DB and adds only the new rows to the index?
I create the index using:
curl -XPUT 'localhost:9200/_river/jdbc/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:phoenix:localhost",
        "user" : "",
        "password" : "",
        "sql" : "select ID, MESSAGE from test",
        "schedule" : "* 1/5 * ? * *"
    }
}'
Thanks for help.
You're looking for something called a "river" plugin. There are various ones around supporting all kinds of databases and even the physical file system. However, the one you're looking for is the HBase River Plugin.
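Alternatively, if you stay with the JDBC river, one way to avoid rescanning everything is to make the poll incremental. This is only a sketch: it assumes the table has a last-modified timestamp column (UPDATED_AT below is hypothetical, not in the original schema), and the MySQL-style five-minute window expression must be rewritten in your SQL dialect (Phoenix in this case):
curl -XPUT 'localhost:9200/_river/jdbc/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:phoenix:localhost",
        "user" : "",
        "password" : "",
        "sql" : "select ID, MESSAGE from test where UPDATED_AT > NOW() - INTERVAL 5 MINUTE",
        "schedule" : "* 1/5 * ? * *"
    }
}'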

ElasticSearch index unix timestamp

I have to index documents containing a 'time' field whose value is an integer representing the number of seconds since epoch (aka unix timestamp).
I've been reading ES docs and have found this:
http://www.elasticsearch.org/guide/reference/mapping/date-format.html
But it seems that if I want to submit unix timestamps and have them stored in a 'date' field (an integer field is not useful for me), I have only two options:
Implement my own date format
Convert to a supported format at the sender
Is there any other option I missed?
Thanks!
If you supply a mapping that tells ES the field is a date, it can use epoch millis as an input. If you want ES to auto-detect, you'll have to provide ISO 8601 or another discoverable format.
Update: I should also note that you can influence what strings ES will recognize as dates in your mapping. http://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html
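As a further note: recent Elasticsearch versions (2.x and later) also ship a built-in epoch_second date format, which matches the question's seconds-since-epoch input directly. A sketch with made-up index and type names, in the same console style as the example below:
PUT /tslogger

PUT /tslogger/_mapping/event
{
    "event": {
        "properties": {
            "time": {
                "type": "date",
                "format": "epoch_second"
            }
        }
    }
}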
In case you want to use Kibana, which I expect, and visualize according to the time of a log entry, you will need at least one field to be a date field.
Please note that you have to set the field as date type BEFORE you input any data into the /index/type. Otherwise it will be stored as long and unchangeable.
Simple example that can be pasted into the marvel/sense plugin:
# Make sure the index isn't there
DELETE /logger

# Create the index
PUT /logger

# Add the mapping of properties to the document type `mem`
PUT /logger/_mapping/mem
{
    "mem": {
        "properties": {
            "timestamp": {
                "type": "date"
            },
            "free": {
                "type": "long"
            }
        }
    }
}

# Inspect the newly created mapping
GET /logger/_mapping/mem
Run each of these commands in series.
Generate free mem logs
Here is a simple script that echoes to your terminal and logs to your local Elasticsearch:
while true; do
    # Extract the free-memory figure in bytes from the last line of `free`
    memfree=$(free -b | tail -n 1 | tr -s ' ' | cut -d ' ' -f 4)
    echo "$memfree"
    # Index a document with an epoch-millis timestamp and the free byte count
    curl -XPOST "localhost:9200/logger/mem" -d "{ \"timestamp\": $(date +%s%3N), \"free\": $memfree }"
    sleep 1
done
Inspect the data in Elasticsearch
Paste this into your Marvel/Sense console:
GET /logger/mem/_search
Now you can move to Kibana and do some graphs. Kibana will autodetect your date field.
