i have 400000+ records now stored in MongoDB with a regular indexed but when i fire a update or search query through laravel elenquote it's taking too much time to get the particular records.
in where condition we have use indexed columns only.
we are using atlas M10 cluster instance with multiple replicas
so anyone have a some idea about it please share us
my replication lag graph
this is my profiler data
My Indexs in schema
Related
I'm planning to do the following
a. migrate table of 80gb to nosql - https://github.com/msathis/SQLToNoSQLImporter
b. write a full text nosql query to fetch the records which has LIKE operator or contains
Does it going to give me the results in a second? Are i should go for Solr kind of server to get results quickly or Elastic Search or CouchDB.
Basically a table has 30 columns and all are string/text and wanted to do full text search in each column selected by the user.
Please advise
Thanks
I have 2 indexes and they both have one common field (basically relationship).
Now as elastic search is not giving filters from multiple indexes, should we store them in memory in variable and filter them in node.js (which basically means that my application itself is working as a database server now).
We previously were using MongoDB which is also a NoSQL DB but we were able to manage it through aggregate queries but seems the elastic search is not providing that.
So even if we use both databases combined, we have to store results of them somewhere to further filter data from them as we are giving users advanced search functionality where they are able to filter data from multiple collections.
So should we store results in memory to filter data further? We are currently giving advanced search in 100 million records to customers but that was not having the advanced text search that elastic search provides, now we are planning to provide elastic search text search to customers.
What do you suggest should we use the approach here to make MongoDB and elastic search together? We are using node.js to serve data.
Or which option to choose from
Denormalizing: Flatten your data
Application-side joins: Run multiple queries on normalized data
Nested objects: Store arrays of objects
Parent-child relationships: Store multiple documents through joins
https://blog.mimacom.com/parent-child-elasticsearch/
https://spoon-elastic.com/all-elastic-search-post/simple-elastic-usage/denormalize-index-elasticsearch/
Storing things client side in memory is not the solution.
First of all the simplest way to solve this problem is to simply make one combined index. Its very trivial to do this. Just insert all the documents from index 2 into index 1. Prefix all fields coming from index-2 by some prefix like "idx2". That way you won't overwrite any similar fields. You can use an ingestion pipeline to do this, or just do it client side. You only will ever do this once.
After that you can perform aggregations on the single index, since you have all the data in one-index.
If you are using somehting other than ES as your primary data-store you need to reconfigure the indexing operation to redirect everything that was earlier going into index-2 to go into index-1 as well(with the prefixed terms).
100 million records is trivial for something like ELasticsearch. Doing anykind of "joins" client side is NOT RECOMMENDED, as this will obviate the entire value of using ES.
If you need any further help on executing this, feel free to contact me. I have 11 years exp in ES. And I have seen people struggle with "joins" for 99% of the time. :)
The first thing to do when coming from MySQL/PostGres or even Mongodb is to restructure the indices to suit the needs of data-querying. Never try to work with multiple indices, ES is not built for that.
HTH.
We are trying to do a one-time data load from Oracle to Elastic Search.
We have evaluated Logstash but the indexing is taking a lot of time.
We have tried Apache Nifi but are facing difficulty in loading nested objects and computed results in apache Nifi.
We are trying to maintain one-many relations in nested objects(We have an oracle query to fetch these results) and we are also maintaining the result of a hierarchical query as a field in the index.
We are looking for an open-source alternative and an efficient approach to load around 10 tables with 3 million records each from Oracle to Elastic Search.
Please suggest.
I am working on a project where I have two big tables (parent and child) in Oracle. One is having 65 Million and the other 80 Million records. In total, data from 10 columns are required from these tables and saved as one document into Elastic search. The load for two tables can be done separately also. What are two comparable options to move data (one time load) from these tables into Elastic search and out of the two which one would you recommend? The requirement is that it should be Fast and simple so that it can not only be used for one time data load but also be used in case there is a failure and the elastic search index needs to be created again from scratch.
As already suggested one option may be logstash: the advantage of logstash is simplicity, but it can be complicated to monitor and it can be difficult to configure if you have to transform some field during the ingestion.
One alternative can be nifi: it offers jdbc and elasticsearch plugin, and you can monitor, start and stop the ingestion directly with the web interface. It is possible with nifi to build a more complex and robust pipeline: handling exceptions, translating data types and performing data enrichment.
I am building an application where i am tracking user activity changes and showing the activity logs to the users. Here are a few points :
Insert 100 million records per day.
These records to be indexed and available in search results immediately(within a few seconds).
Users can filter records on any of the 10 fields that are exposed.
I think both Mongo and Oracle will not accomplish what you need. I would recommend offloading the search component from your primary data store, maybe something like ElasticSearch:
http://www.elasticsearch.org/
My recommendation is ElasticSearch as your primary use-case is "filter" (Facets in ElasticSearch) and search. Is it written to scale-up (otherwise Lucene is also good) and keeping big data in mind.
100 million records a day sounds like you would need a rapidly growing server farm to store the data. I am not familiar with how Oracle would distribute these data, but with MongoDB, you would need to shard your data based on the fields that your search queries are using (including the 10 fields for filtering). If you search only by shard key, MongoDB is intelligent enough to only hit the machines that contain the correct shard, so it would be like querying a small database on one machine to get what you need back. In addition, if the shard keys can fit into the memory of each machine in your cluster, and are indexed with MongoDB's btree indexing, then your queries would be pretty instant.