Table To Elasticsearch - Create/Update

I have a table that is 50GB in size, and I want to move it to Elasticsearch for faster search. Currently even a count(1) query takes several minutes to run.
Is there an open source project which does this, so that we can use it for faster searching of results? Also, whenever there are considerable updates in the table, the changes need to be pushed to Elasticsearch.
Please advise.
Thanks
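
One common pattern for this is sketched below, assuming a hypothetical big_table with an updated_at column and a Postgres source; none of these names come from the question. Bulk-index the whole table once, then re-run the same job periodically, selecting only rows that changed since the last run:

    from elasticsearch import Elasticsearch, helpers
    import psycopg2  # any DB-API driver works the same way

    es = Elasticsearch(["http://localhost:9200"])
    db = psycopg2.connect("dbname=mydb")

    def changed_rows(since):
        cur = db.cursor()
        cur.execute(
            "SELECT id, col_a, col_b, updated_at FROM big_table "
            "WHERE updated_at > %s",
            (since,),
        )
        for id_, col_a, col_b, updated_at in cur:
            # Reusing the primary key as _id makes the job idempotent:
            # rows that changed overwrite their old documents.
            yield {
                "_index": "big_table",
                "_id": id_,
                "_source": {
                    "col_a": col_a,
                    "col_b": col_b,
                    "updated_at": updated_at.isoformat(),
                },
            }

    # Initial load: pass the epoch. Later runs pass the timestamp of the
    # last successful sync (persist it somewhere durable between runs).
    helpers.bulk(es, changed_rows("1970-01-01"))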

Related

Fastest Search for data in DB -> Migration to NoSQL

I'm planning to do the following:
a. migrate a table of 80GB to NoSQL - https://github.com/msathis/SQLToNoSQLImporter
b. write a full-text NoSQL query to fetch the records that are currently matched with a LIKE operator or contains
Is it going to give me the results in a second? Or should I go for a Solr kind of server to get results quickly, or Elasticsearch or CouchDB?
Basically the table has 30 columns, all string/text, and I want to do a full-text search in each column selected by the user.
Please advise
Thanks
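
For the "LIKE on any of 30 text columns" requirement, the Elasticsearch equivalent is a single multi_match query over whichever fields the user selected; it runs against the inverted index instead of scanning rows, which is where the sub-second speed comes from. A minimal sketch, with hypothetical index and field names:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    resp = es.search(index="records", body={
        "query": {
            "multi_match": {
                "query": "search terms",
                # Use ["*"] to search every column, or list only the
                # columns the user selected in the UI.
                "fields": ["column_a", "column_b"],
            }
        },
        "size": 20,
    })
    for hit in resp["hits"]["hits"]:
        print(hit["_id"], hit["_source"])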

mongoDB laravel search query taking too much time

I have 400,000+ records stored in MongoDB with regular indexes, but when I fire an update or search query through Laravel Eloquent it takes too much time to fetch the particular records.
In the where condition we use indexed columns only.
We are using an Atlas M10 cluster instance with multiple replicas.
So if anyone has some idea about this, please share it with us.
(Attached: my replication lag graph, my profiler data, and the indexes in my schema.)
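
Before blaming the cluster, it is worth confirming that the slow query actually uses an index. A quick sketch with PyMongo (the collection name and filter are placeholders): the winning plan should report an IXSCAN stage, not a COLLSCAN.

    from pymongo import MongoClient

    client = MongoClient("mongodb+srv://<your-atlas-uri>")
    coll = client.mydb.records

    # explain() returns the query plan: "IXSCAN" means an index is used,
    # "COLLSCAN" means a full collection scan despite your indexes.
    plan = coll.find({"status": "active"}).explain()
    print(plan["queryPlanner"]["winningPlan"])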

How to add calculations to an Elastic Search database?

I'm using Elasticsearch to index large amounts of sensor data for analytics purposes. The table has 4 million+ rows and is growing fast - we expect 40 million within the next year. This makes Elasticsearch seem like a natural fit, especially with tools such as Kibana to easily display the data.
Elasticsearch seems great, however there are some more complex calculations that have to be performed as well. One such calculation is our "average user time", where we take two data points (the timestamp of an item being picked up and the timestamp of it being placed back), subtract one from the other, and average all of these for one specific customer over a specific timeframe. The SQL query would look something like "select * from events where event_type = 'object picked up' or event_type = 'object placed back down'"; we would then take all these events, get the diffs of their timestamps, add them all together and divide by the count.
To my understanding, these types of calculations are not the kind of thing Elasticsearch is meant to do. I've had people recommend Hadoop, but that could take a long time to set up, and we could instead use a fast language like Go or Node/JavaScript to batch-process things and add them to the DB periodically... but what is the right way to do this, allowing for future scalability and working nicely with Elasticsearch?
Our setup is: Rails, AngularJS, Elasticsearch, Heroku, Postgres.
Maybe you could try using scripted metrics. In combination with filters, they can give you a more or less proper solution for your problem:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html
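
To make that concrete, here is a rough sketch of such a scripted metric aggregation (syntax for recent Elasticsearch versions), pairing pick-up and put-down events in the reduce phase. The item_id, event_type, customer_id and timestamp field names are assumptions for illustration, and event_type is assumed to be a keyword field:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    body = {
        "size": 0,
        "query": {"bool": {"filter": [
            {"term": {"customer_id": 42}},
            {"range": {"timestamp": {"gte": "now-30d"}}},
            {"terms": {"event_type": [
                "object picked up", "object placed back down"]}},
        ]}},
        "aggs": {"avg_hold_millis": {"scripted_metric": {
            # Collect (item_id, event_type, epoch-millis) tuples per shard.
            "init_script": "state.events = []",
            "map_script": """
                state.events.add([
                    'id': doc['item_id'].value,
                    'type': doc['event_type'].value,
                    'ts': doc['timestamp'].value.toInstant().toEpochMilli()])
            """,
            "combine_script": "return state.events",
            # Pair each pick-up with its put-down per item, average diffs.
            "reduce_script": """
                Map up = [:]; Map down = [:];
                for (shard in states) {
                    for (e in shard) {
                        if (e.type == 'object picked up') { up[e.id] = e.ts; }
                        else { down[e.id] = e.ts; }
                    }
                }
                long total = 0; long n = 0;
                for (entry in up.entrySet()) {
                    if (down.containsKey(entry.getKey())) {
                        total += down[entry.getKey()] - entry.getValue();
                        n++;
                    }
                }
                return n == 0 ? 0 : total / n;
            """,
        }}},
    }

    result = es.search(index="events", body=body)
    print(result["aggregations"]["avg_hold_millis"]["value"])

If the two events can be paired at write time instead, a plain avg aggregation over a precomputed duration field is far cheaper than running a script per document.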

how do you update or sync with a jdbc river

A question about rivers and data syncing with a production database using Elasticsearch:
Are rivers suited only for bulk loading data initially, or do they somehow listen or monitor for changes?
If I have a nightly import of data, is it better to just delete the rivers and indexes, and re-index and recreate the rivers?
If I update or change a river, do I have to delete and re-create the index?
How do I set up a schedule with a river to fetch new data periodically? Can it store the last max id so that it can do diff queries in the SQL to select into the river?
Any suggestions on a better way to keep the database and Elasticsearch in sync - without calling individual index update functions with a PUT command?
All of the Elasticsearch rivers are different - some are provided directly by Elasticsearch, many more are developed by third parties:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html
Each operates differently, so to answer your questions you have to choose a specific river. For your case, since you're looking to index data from a production database, I'll assume that the JDBC river is what you would use:
https://github.com/jprante/elasticsearch-river-jdbc
This river will index data from your JDBC source, including picking up changes. It can do so on a schedule (there is detailed documentation on the schedule parameter on this page: https://github.com/jprante/elasticsearch-river-jdbc). However, this river will not pick up deletes:
https://github.com/jprante/elasticsearch-river-jdbc/issues/213
You may find this discussion useful; it concerns working around the lack of delete support by building a new river/index daily and using index aliases: ElasticSearch river JDBC MySQL not deleting records
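
That alias trick looks roughly like the sketch below (hypothetical index names): rebuild into a fresh dated index, then atomically repoint the alias the application queries, so deleted rows simply disappear along with the old index.

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])

    # The application always queries the alias "table_x"; each nightly
    # rebuild indexes into a new dated index, then the alias is swapped
    # in a single atomic call and the stale index is dropped.
    es.indices.update_aliases(body={"actions": [
        {"remove": {"index": "table_x_20140101", "alias": "table_x"}},
        {"add": {"index": "table_x_20140102", "alias": "table_x"}},
    ]})
    es.indices.delete(index="table_x_20140101")
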
You can just map the id in your DB to _id with an alias; this way Elasticsearch will identify whether the document has changed or not.
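
Putting the two answers together, creating the river might look roughly like this. This is a sketch, not a tested configuration: the connection URL, credentials, table and index names are placeholders, and it assumes an Elasticsearch 1.x node with the JDBC river plugin and a matching JDBC driver installed.

    import json
    import requests

    river = {
        "type": "jdbc",
        "jdbc": {
            "url": "jdbc:mysql://localhost:3306/mydb",
            "user": "es",
            "password": "secret",
            # Aliasing the primary key to _id lets the river overwrite
            # changed documents instead of indexing duplicates.
            "sql": "select id as _id, * from table_x",
            # Quartz cron expression: poll for new data every 5 minutes.
            "schedule": "0 0/5 * * * ?",
            "index": "table_x",
        },
    }

    resp = requests.put(
        "http://localhost:9200/_river/my_jdbc_river/_meta",
        data=json.dumps(river),
    )
    print(resp.json())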

Solr (re)indexing database

I've looked at several other questions related to mine, but so far I couldn't find a solution for my issue.
Here is the situation:
A database with table table_x
A cronjob which checks every 2 minutes to index newly added/updated content in table_x using Solr
Extra information about the cronjob and table_x
- The cronjob checks a field in table_x to determine whether a row has to be indexed with Solr or not
- table_x contains over 400k records
What we want is for Solr to reindex the whole table_x. But there are (we think) 2 factors that are not clear to us:
- What will happen when Solr is indexing all 400k records and the cronjob detects more records to be indexed?
- What will happen when a search query is performed on the website while Solr is indexing all 400k records?
Is there someone who can answer this for me?
Kind regards,
Pim
The question has two parts:
What happens to indexing when changes are detected while the initial indexing is going on?
You can make the second cron trigger wait till the first is completed. This is more of an application question, and how you handle it is up to you.
How is a query affected by new indexing or indexing in progress?
Which version of Solr are you using? You can use Near Real-Time (NRT) search to see the changes before hard commits.
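
With Solr 4.x+, a soft commit makes new documents searchable almost immediately without the cost of a hard commit, so queries during the big reindex keep working and gradually see the new data. A small sketch using the pysolr client; the core URL and document are placeholders:

    import pysolr

    solr = pysolr.Solr("http://localhost:8983/solr/table_x")

    # softCommit makes the added docs visible to searches near-instantly;
    # a periodic hard commit (e.g. autoCommit in solrconfig.xml) still
    # flushes them durably to disk.
    solr.add(
        [{"id": "row-42", "title": "freshly updated row"}],
        softCommit=True,
    )

    results = solr.search("title:freshly")
    print(len(results))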
