I have the following setup:
MySQL RDBMS server
Elasticsearch server
My requirement is to periodically copy data from the MySQL RDBMS and update the Elasticsearch server with it. Currently I am following the approach below:
A batch job reads all the data from MySQL using Spring Data JPA.
It then pushes all of that data to the Elasticsearch server using Spring Data Elasticsearch.
This approach is cumbersome and inefficient. Is there a way to read only the updated rows using Spring Data and update the Elasticsearch index accordingly?
Using jdbc-river etc. is not an option for me, as the application uses Spring Data Elasticsearch to fetch data and search over Elasticsearch; I think it would not function properly with jdbc-river.
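One common approach is to keep a last-modified audit column on the source tables and have the job query only the rows changed since its previous run. Below is a minimal sketch of that idea, assuming a hypothetical Product entity with an updatedAt column maintained on every write (all entity, repository, and index names here are placeholders):

```java
import java.time.Instant;
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Assumed entity, mapped both to the MySQL table and the ES index.
@Entity
@Document(indexName = "products")
class Product {
    @Id Long id;
    String name;
    Instant updatedAt; // assumed audit column, set on every insert/update
}

interface ProductJpaRepository extends JpaRepository<Product, Long> {
    // Derived query: only rows touched after the given instant.
    List<Product> findByUpdatedAtAfter(Instant since);
}

interface ProductEsRepository extends ElasticsearchRepository<Product, Long> {}

@Component
class IncrementalSyncJob {

    private final ProductJpaRepository jpa;
    private final ProductEsRepository es;
    private Instant lastRun = Instant.EPOCH;

    IncrementalSyncJob(ProductJpaRepository jpa, ProductEsRepository es) {
        this.jpa = jpa;
        this.es = es;
    }

    // Runs every five minutes and re-indexes only the rows that changed
    // since the previous run, instead of copying the whole table.
    // Requires @EnableScheduling on a configuration class.
    @Scheduled(fixedDelay = 300_000)
    public void sync() {
        Instant now = Instant.now();
        es.saveAll(jpa.findByUpdatedAtAfter(lastRun));
        lastRun = now;
    }
}
```

Note that hard deletes will not be picked up this way; you would need soft-delete flags or a change-data-capture pipeline for those.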
I am trying to write a method using ElasticsearchRestTemplate to fetch data from Elasticsearch using the query DSL. I looked into the documentation, but it is not clear to me how to get the data from Elasticsearch using Java. Can anyone please help me fetch data from Elasticsearch using the Java client (ElasticsearchRestTemplate) with the query DSL?
Spring Data Elasticsearch does not support Querydsl.
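It does, however, accept a raw query DSL string through a StringQuery. A minimal sketch, assuming Spring Data Elasticsearch 4.x and a hypothetical Article document class mapped to an articles index:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.StringQuery;

// Hypothetical document class for this sketch.
@Document(indexName = "articles")
class Article {
    @Id String id;
    String title;
}

public class ArticleSearch {

    private final ElasticsearchRestTemplate template;

    public ArticleSearch(ElasticsearchRestTemplate template) {
        this.template = template;
    }

    // Wraps a raw query DSL snippet (just the "query" part of a search
    // body) in a StringQuery and maps the hits back to Article objects.
    public List<Article> findByTitle(String title) {
        StringQuery query = new StringQuery(
                "{\"match\": {\"title\": \"" + title + "\"}}");
        SearchHits<Article> hits =
                template.search(query, Article.class, IndexCoordinates.of("articles"));
        return hits.getSearchHits().stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
    }
}
```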
I have a REST API through which I am sending and receiving messages to/from a Kafka server using Spring Boot. Now I want to save those messages to Elasticsearch. Can anyone help with how to do it?
Actually, this is a systematic job; in a way it is like setting up a database storage architecture.
TO BE SIMPLE AND SHORT:
First you need to decide which ES version you want to use, because there are breaking changes between ES 2.x and 7.x, and those differences may affect how you design the schema of your storage.
Assuming you use the latest 7.x ES, you will need to create the index(es) where the data fetched from Kafka is to be stored. Check out https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
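With the 7.x high-level REST client, for example, creating an index can be as small as the sketch below ("messages-index" is a placeholder name):

```java
import java.io.IOException;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;

public class IndexSetup {

    // Creates the target index once, before any documents are written.
    static void createIndex(RestHighLevelClient client) throws IOException {
        client.indices().create(new CreateIndexRequest("messages-index"),
                RequestOptions.DEFAULT);
    }
}
```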
Once the indexes are created, you need to learn some basics about the ES high-level and low-level REST clients. The low-level REST client gives you the basic connection to the ES cluster via HTTP, while the high-level REST client APIs cover operations such as document CRUD, search, and aggregations on your data. You can easily find the dependencies via Maven and use them in your Spring Boot application. Check out https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high.html
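Putting the pieces together, here is a minimal sketch of the consuming side, assuming Spring Kafka, JSON-formatted messages, and the placeholder topic/index names from above:

```java
import java.io.IOException;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class MessageIndexer {

    private final RestHighLevelClient client;

    public MessageIndexer(RestHighLevelClient client) {
        this.client = client;
    }

    // Consumes each Kafka message and indexes it as a JSON document.
    @KafkaListener(topics = "messages", groupId = "es-indexer")
    public void onMessage(String message) throws IOException {
        IndexRequest request = new IndexRequest("messages-index")
                .source(message, XContentType.JSON);
        client.index(request, RequestOptions.DEFAULT);
    }
}
```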
Is there any way to fetch incremental data from an Oracle database using a user-defined query over JDBC?
We are OK with using Spark, Kafka, or plain JDBC.
The only requirement is that it must be able to support a heavy load.
You've not specified the destination. If it's a Kafka topic, then it makes sense to use Apache Kafka for the extract too, via Kafka Connect.
In that case, you can use the Kafka Connect JDBC connector to do this. See here for the specifics on using incremental mode with a custom query.
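A minimal sketch of such a source connector configuration (the connection details, query, and column names are placeholders for your environment):

```properties
# Sketch of a JDBC source connector in incremental mode with a custom query.
name=oracle-incremental-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:oracle:thin:@//db-host:1521/ORCLPDB
connection.user=scott
connection.password=tiger
# Custom query instead of a whole-table copy. Connect appends the WHERE
# clause for the incrementing/timestamp columns itself, so don't add one.
query=SELECT id, name, updated_at FROM orders
mode=timestamp+incrementing
incrementing.column.name=id
timestamp.column.name=updated_at
# With a custom query, records go to the topic named by the prefix alone.
topic.prefix=oracle-orders
```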
++ EDIT ++
If your final target is BigQuery then you can use Kafka Connect for that too with the appropriate BigQuery connector. You can see an example of it in action here.
I know how to write a Kafka consumer and insert/update each record into an Oracle database, but I want to leverage the Kafka Connect API and the JDBC Sink Connector for this purpose. Apart from the property file, in my search I couldn't find a complete executable example with detailed steps to configure and write the relevant code in Java to consume a Kafka topic with JSON messages and insert/update (merge) a table in an Oracle database using the Kafka Connect API with the JDBC Sink Connector. Can someone demonstrate an example, including configuration and dependencies? Are there any disadvantages to this approach? Do we anticipate any potential issues when table data increases to millions of rows?
Thanks in advance.
There won't be an example for your specific use case, because the JDBC connector is meant to be generic.
Here is one configuration example with an Oracle database.
All you need is
A topic of some format
key.converter and value.converter to be set to deserialize that topic
Your JDBC string and database schema (tables, projection fields, etc)
Any other JDBC Sink-specific options
All this goes in a Java properties / JSON file, not Java source code
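A minimal sketch of such a properties file for an Oracle target, assuming JSON messages that carry the schema/payload envelope (the URL, credentials, topic, and key names are placeholders):

```properties
name=oracle-jdbc-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
tasks.max=1
topics=orders
connection.url=jdbc:oracle:thin:@//db-host:1521/ORCLPDB
connection.user=scott
connection.password=tiger
# Plain JSON needs the schema/payload envelope so the sink knows the
# column types, hence schemas.enable=true.
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
# upsert gives insert-or-update (merge) semantics keyed on the record key.
insert.mode=upsert
pk.mode=record_key
pk.fields=id
auto.create=true
```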
If you have a specific issue creating this configuration, please comment.
Do we anticipate any potential issues when table data increases to millions?
Well, those issues would be related to the database server, not to Kafka Connect. For example, the disk filling up, or increased load while accepting continuous writes.
Are there any disadvantages with this approach?
You'd have to handle deduplication or record expiration (e.g. for GDPR) separately, if you wanted that.
Is there any tutorial or guide for migrating data from MSSQL to MySQL or any other database?
I am trying to migrate data from MSSQL to MySQL using Apache NiFi, with some modifications to the data and schema.
Check this SO post for my answer on migrating between two different RDBMSs in Apache NiFi. You may need some extra processors, such as a JoltTransformJSON after the ConvertAvroToJSON processor, to make the schema/data changes; a sample Jolt spec is sketched below.
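For the schema changes, a JoltTransformJSON spec can rename fields as records pass through. A tiny sketch, where the emp_id to employee_id rename is purely hypothetical and "*": "&" passes every other field through unchanged:

```json
[
  {
    "operation": "shift",
    "spec": {
      "emp_id": "employee_id",
      "*": "&"
    }
  }
]
```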