Deleted records not updating in Elasticsearch JDBC river plugin - elasticsearch

With the Elasticsearch JDBC river, if I delete a record in MySQL it still shows up in the index, even though I have enabled auto-commit. How do I keep MySQL and Elasticsearch in sync, and how do I do delta imports in Elasticsearch?
{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/testrivet",
        "user" : "root",
        "password" : "Gemini*123",
        "sql" : [
            {
                "statement" : "select *, empid as _id from empdata"
            }
        ],
        "strategy" : "simple",
        "schedule" : "0 0-59 0-23 ? * *",
        "autocommit" : true,
        "metrics" : { "enabled" : true }
    },
    "index" : {
        "autocommit" : true
    }
}

Indeed, once a record is deleted from your database, your JDBC river has no way to retrieve it anymore in order to delete the corresponding document in ES.
An alternative is to "soft-delete" records in your database by setting a flag (e.g. a new boolean column). The flag would be true while the record is active and false once the record is deleted. That way, when your import process runs, you still get all records, and based on that flag you know which documents to delete from Elasticsearch.
There are other ways, but they involve adding another component to the mix, so if this does the job I'd suggest going that route.
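For instance, a minimal sketch of the idea, assuming a hypothetical boolean column named active on empdata (the index and type names below are placeholders too): import the flag along with every row, then periodically remove the flagged documents with delete-by-query, which is available as an API in the river-era 1.x releases:
"statement" : "select *, empid as _id, active from empdata"
curl -XDELETE 'localhost:9200/myindex/empdata/_query' -d '{
    "query" : { "term" : { "active" : false } }
}'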

Related

How to get creation time of indices in Elasticsearch using Jest

I am trying to delete the indices in Elasticsearch which were created more than 24 hours ago. I am not finding a way to get the creation time of the indices for a particular node. With Spring Boot Elasticsearch this can be accomplished; however, I am using the Jest API.
You can get the settings.index.creation_date value that was stored at index creation time.
With curl you can get it easily using:
curl -XGET localhost:9200/your_index/_settings
You get:
{
    "your_index" : {
        "settings" : {
            "index" : {
                "creation_date" : "1460663685415", <--- this is what you're looking for
                "number_of_shards" : "5",
                "number_of_replicas" : "1",
                "version" : {
                    "created" : "1040599"
                },
                "uuid" : "dIG5GYsMTueOwONu4RGSQw"
            }
        }
    }
}
With Jest, you can get the same value using:
import io.searchbox.indices.settings.GetSettings;
GetSettings getSettings = new GetSettings.Builder().build();
JestResult result = client.execute(getSettings);
You can then inspect the JestResult in order to find the creation_date.
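For example, a minimal sketch of that extraction, assuming the index is named your_index and that you scope the request with the builder's addIndex, walking the Gson JsonObject that Jest returns:
import com.google.gson.JsonObject;
import io.searchbox.client.JestResult;
import io.searchbox.indices.settings.GetSettings;

// Fetch the settings for a single index and walk down to creation_date.
GetSettings getSettings = new GetSettings.Builder().addIndex("your_index").build();
JestResult result = client.execute(getSettings);
JsonObject indexSettings = result.getJsonObject()
        .getAsJsonObject("your_index")
        .getAsJsonObject("settings")
        .getAsJsonObject("index");
String creationDate = indexSettings.get("creation_date").getAsString(); // epoch millis as a string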
If I may suggest something, curator would be a much handier tool for achieving what you need.
Simply run this once a day:
curator delete indices --older-than 1 --time-unit days
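For instance, a sketch of a crontab entry (assuming curator is on the PATH) that runs it every day at midnight:
0 0 * * * curator delete indices --older-than 1 --time-unit days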

Not getting incremental data with JDBC importer from SQL to Elasticsearch

As per the JDBC importer documentation:
It is recommended to use timestamps in UTC for synchronization. This example fetches all product rows which have been added since the last run, using a millisecond-resolution column mytimestamp:
{
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/test",
        "user" : "",
        "password" : "",
        "sql" : [
            {
                "statement" : "select * from \"products\" where \"mytimestamp\" > ?",
                "parameter" : [ "$metrics.lastexecutionstart" ]
            }
        ],
        "index" : "my_jdbc_index",
        "type" : "my_jdbc_type"
    }
}
I want to import data incrementally based on a column modifiedDate whose format is 2015-08-20 14:52:09, and I use a scheduler which runs every minute. I tried the following value for the sql key:
"statement" : "select * from \"products\" where \"modifiedDate\" > ?",
But the data was not loaded.
Am I missing something?
The format of lastexecutionstart is like this: "2016-03-27T06:37:09.165Z".
It contains 'T' and 'Z', while your modifiedDate values do not, so the comparison never matches; that is why your data was not loaded.
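One way around it, as a sketch, assuming MySQL and that modifiedDate is stored in UTC (STR_TO_DATE treats the T and Z in the format string as literal characters to match), is to parse the parameter inside the SQL statement:
"sql" : [
    {
        "statement" : "select * from products where modifiedDate > STR_TO_DATE(?, '%Y-%m-%dT%H:%i:%s.%fZ')",
        "parameter" : [ "$metrics.lastexecutionstart" ]
    }
]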
If you want to know more, here is the link:
https://github.com/jprante/elasticsearch-jdbc

How to add multiple object types to Elasticsearch using the JDBC river?

I'm using the JDBC river to successfully add one object type, "contacts", to Elasticsearch. How can I add another object type with different fields? I'd like to add "companies" as well.
What I have is below. Do I need to do a separate PUT statement? If I do, no new data appears to be added to Elasticsearch.
PUT /_river/projects_river/_meta
{
    "type" : "jdbc",
    "index" : {
        "index" : "ALL",
        "type" : "project",
        "bulk_size" : 500,
        "max_bulk_requests" : 1,
        "autocommit" : true
    },
    "jdbc" : {
        "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "poll" : "30s",
        "strategy" : "poll",
        "url" : "jdbc:sqlserver://connectionstring",
        "user" : "username",
        "password" : "password",
        "sql" : "select ContactID as _id, * from Contact"
    }
}
Also, when search returns results, how can I tell if they are of type contact or company? Right now they all have a type of "jdbc", and changing that in the code above throws an error.
You can achieve what you want by adding extra columns to your SQL query.
Just like ContactID AS _id, you can also define indexName AS _index and indexType AS _type in your SQL query, as shown below.
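A minimal sketch, with the literal index and type values as placeholders for your own names:
select ContactID as _id, 'all' as _index, 'contact' as _type, * from Contact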
Also, if you need another river, add rivers under different _river names. In your case, something like:
PUT /_river/projects_river2/_meta + Query ....
PUT /_river/projects_river3/_meta + Query ....
Anyone else who stumbles across this, please see the official documentation for the syntax first: https://github.com/jprante/elasticsearch-river-jdbc/wiki/How-bulk-indexing-is-used-by-the-JDBC-river
Here's the final PUT statement I used:
PUT /_river/contact/_meta
{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url" : "connectionstring",
        "user" : "username",
        "password" : "password",
        "sql" : "select ContactID as _id, * from Contact",
        "poll" : "5m",
        "strategy" : "simple",
        "index" : "contact",
        "type" : "contact"
    }
}

Elasticsearch querying alias with routing giving partial results

In an effort to create a multi-tenant architecture for my project, I've created an Elasticsearch cluster with an index 'tenant':
"tenant" : {
"some_type" : {
"_routing" : {
"required" : true,
"path" : "tenantId"
},
Now,
I've also created some aliases -
"tenant" : {
"aliases" : {
"tenant_1" : {
"index_routing" : "1",
"search_routing" : "1"
},
"tenant_2" : {
"index_routing" : "2",
"search_routing" : "2"
},
"tenant_3" : {
"index_routing" : "3",
"search_routing" : "3"
},
"tenant_4" : {
"index_routing" : "4",
"search_routing" : "4"
}
I've added some data with tenantId = 2.
After all that, I tried to query 'tenant_2', but I only got partial results, while querying the 'tenant' index directly returns the full results.
Why is that?
I was sure that routing is supposed to query all the shards that documents with tenantId = 2 reside on.
When you have created aliases in Elasticsearch, you have to do all operations using the aliases only, be it indexing, updating or searching. Documents that were indexed directly against the index (without routing) are distributed across shards by their _id, so a search routed to "2" only reaches one shard and misses the rest; that is why you see partial results.
Try reindexing the data again, if possible (if it is a test index, I hope it is).
Remove all the indices:
curl -XDELETE 'localhost:9200/_all' # Warning! Don't use this in production.
Use this command only if it is a test index.
Create the index again and create the aliases again. Do all indexing, search and delete operations through the alias names; even the import of data should be done via the alias name.
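A minimal sketch of that flow with curl, reusing the index, alias and field names from the question (the document body is just an illustration):
# add a routed alias for tenant 2 on the freshly created index
curl -XPOST 'localhost:9200/_aliases' -d '{
    "actions" : [
        { "add" : { "index" : "tenant", "alias" : "tenant_2",
                    "index_routing" : "2", "search_routing" : "2" } }
    ]
}'
# index and search through the alias only, never the bare index
curl -XPUT 'localhost:9200/tenant_2/some_type/1' -d '{ "tenantId" : 2 }'
curl -XGET 'localhost:9200/tenant_2/some_type/_search'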

Index huge data into Elasticsearch

I am new to Elasticsearch and have a large amount of data (more than 16k rows in a MySQL table). I need to push this data into Elasticsearch and am facing problems indexing it.
Is there a way to make indexing data faster? How do I deal with huge data?
Expanding on the Bulk API
You will make a POST request to the /_bulk endpoint.
Your payload must follow this format, where \n is the newline character:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
...
Make sure your JSON is not pretty printed.
The available actions are index, create, update and delete.
Bulk Load Example
To answer your question, here is what it looks like if you just want to bulk load data into your index:
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
The first line contains the action and metadata. In this case, we are calling create. We will be inserting a document of type type1 into the index named test with a manually assigned id of 3 (instead of elasticsearch auto-generating one).
The second line contains all the fields in your mapping, which in this example is just field1 with a value of value3.
You just concatenate as many of these pairs as you'd like to insert into your index.
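For example, a minimal sketch with curl, reusing the two lines above (the bulk API requires the body to end with a newline; bash's $'...' quoting turns each \n into a real newline):
curl -XPOST 'localhost:9200/_bulk' --data-binary $'{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }\n{ "field1" : "value3" }\n'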
This may be an old thread, but I thought I would comment anyway for anyone who is looking for a solution to this problem. The JDBC river plugin for Elasticsearch is very useful for importing data from a wide array of supported DBs.
The JDBC river source is here: https://github.com/jprante/elasticsearch-jdbc
Using Git Bash's curl command, I PUT the following configuration document to allow communication between the ES instance and the MySQL instance:
curl -XPUT 'localhost:9200/_river/uber/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "strategy" : "simple",
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/elastic",
        "user" : "root",
        "password" : "root",
        "sql" : "select * from tbl_indexed",
        "poll" : "24h",
        "max_retries" : 3,
        "max_retries_wait" : "10s"
    },
    "index" : {
        "index" : "uber",
        "type" : "uber",
        "bulk_size" : 100
    }
}'
Ensure you have the mysql-connector-java-VERSION-bin JAR in the river-jdbc plugin directory, which contains the JDBC river's necessary JAR files.
Try the bulk API:
http://www.elasticsearch.org/guide/reference/api/bulk.html
