We are currently pulling a large amount of data from an Oracle database and then performing the calculations on the web side to generate HTML reports. I am using the Groovy & Grails framework for report generation.
The problem is that the calculations are very heavy, so generating a report on the web side takes a long time.
I am planning to re-architect the reporting so that reports are generated quickly.
I don't have any control over the Oracle database, as it is a third-party production database.
I don't want to replicate the database, because it has millions of records; I can't schedule replication, and it would slow down production.
I finally came up with a caching architecture that acts as a kind of calculation engine.
Can anyone help me by suggesting the best approach?
Thanks
What is the structure of your data? Do you need to query it, so that SQL would help you, or is it binary/document data?
Do you need persistence (durability) or not?
Redis is fast. But if you have a single-threaded app using MS SQL and its bulk importer, that is incredibly fast too.
Redis is a key/value store, so you need to perform a separate SET for every column of your domain object, which can make it slower than an RDBMS where a single INSERT carries all the columns.
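For illustration, here is a minimal sketch of keeping all of a domain object's columns together in a single Redis hash using the Jedis client; the key name and field names are made up for the example:

import redis.clients.jedis.Jedis;

import java.util.Map;

public class RedisReportCacheSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Hypothetical key: one hash per domain object keeps all columns
            // under a single key instead of one SET per column.
            String key = "report:customer:42";
            jedis.hset(key, "totalOrders", "137");
            jedis.hset(key, "totalRevenue", "12499.50");
            jedis.hset(key, "lastOrderDate", "2016-03-01");

            // Read the whole object back in one round trip.
            Map<String, String> report = jedis.hgetAll(key);
            System.out.println(report);
        }
    }
}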
Or, if your results are in the form of JSON objects, Mongo can be very useful.
It just depends on your data and purpose of persistence.
We are converting from SQL Server to Cassandra for various reasons. The back-end system has been converted and is working, and now we are focusing on the front-end systems.
In the current system we have a number of Telerik data grids where the app loads all the data and search/sort/filter is done in the grid itself. We want to avoid this and push the search/sort/filter down to the DB. In SQL Server this is not a problem because of ad-hoc queries; however, in Cassandra it becomes very confusing.
If every operation were allowed, then of course a Cassandra table would have to model the data that way. But I was wondering how this is handled in real-world scenarios with large amounts of data and many columns.
For instance, if I had a grid with columns 1, 2, 3, 4, what is the best course of action?
Highly control what the user can do
Create a lot of tables to model the data and pick the one to select from
Don't allow the user to do any data operations
Like any NoSQL system, Cassandra performs queries on primary keys best. You can of course use secondary indexes, but they will be a lot slower.
So the recommended way is to create Materialized Views for all possible queries.
Another way is to use something like Apache Ignite on top of Cassandra for the analytics, but as I understand it, you don't want to use grids for some reason.
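As a rough illustration of the materialized-view approach, here is a sketch using the DataStax Java driver; the keyspace, table, view, and column names are invented for the example, and in practice you would create the schema once rather than on every run:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

public class GridQuerySketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().withKeyspace("reports").build()) {
            // Base table keyed for the "default" grid access path (column1).
            session.execute(
                "CREATE TABLE IF NOT EXISTS grid_rows ("
              + "  column1 text, column2 text, column3 int, column4 timestamp,"
              + "  PRIMARY KEY (column1, column4))");

            // A materialized view keyed for a different access path (filter by column2).
            session.execute(
                "CREATE MATERIALIZED VIEW IF NOT EXISTS grid_rows_by_column2 AS "
              + "SELECT * FROM grid_rows "
              + "WHERE column2 IS NOT NULL AND column1 IS NOT NULL AND column4 IS NOT NULL "
              + "PRIMARY KEY (column2, column1, column4)");

            // The application picks the table or view that matches the user's filter/sort.
            ResultSet rs = session.execute(
                "SELECT column1, column3 FROM grid_rows_by_column2 WHERE column2 = 'foo'");
            for (Row row : rs) {
                System.out.println(row.getString("column1") + " -> " + row.getInt("column3"));
            }
        }
    }
}

The design cost is that each extra search/sort path becomes another view (or denormalized table) that Cassandra must keep updated on every write.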
I'm designing a data REST API for the purpose of dynamic reporting. Basically, you pass data to it (along with the functions to apply to the data) and it returns HTML with those functions applied. Typically these functions would be filtering, grouping, aggregating, and sorting (what a regular RDBMS would offer).
I'm contemplating using an in-memory DB for this. By doing so, I'd simply leverage the functions offered by the DB rather than having to implement them myself.
However, this requires the service to load the data (perhaps bulk load) and then run a series of dynamically constructed queries, as part of every service call.
The data (input) to be loaded in the database can be max 100K rows. Certainly not millions!
But the service can be accessed by different threads (each will load its own data set into the database and read concurrently). Of course the JDBC connections will be pooled, and the tables will be truncated at the end of every transaction.
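A minimal sketch of that per-call flow, assuming H2 in in-memory mode and plain JDBC (the table, columns, and query are illustrative only):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class InMemoryReportSketch {
    public static void main(String[] args) throws Exception {
        // Private in-memory database; it is dropped when the last connection closes.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:report")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE sales (region VARCHAR(50), amount DECIMAL(12,2))");
                st.execute("INSERT INTO sales VALUES ('north', 100.00), ('north', 250.00), ('south', 75.00)");
            }

            // The grouping/aggregation requested by the caller, built dynamically in the real service.
            String dynamicQuery =
                "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC";
            try (PreparedStatement ps = conn.prepareStatement(dynamicQuery);
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + " = " + rs.getBigDecimal("total"));
                }
            }
        }
    }
}

One possible simplification: if each request uses its own in-memory database name, the explicit truncate step may become unnecessary, since the database disappears when its connection closes.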
I'm asking myself whether I am going overboard and over-exploiting in-memory DBs. I have only used them myself (and mostly hear about them), especially H2 and HSQL, in the context of integration testing.
Would be interested to hear your views.
Our application (Java, Spring, Hibernate) uses Postgres to store data.
We are looking to add an analysis engine to the application. I want to explore using a NoSQL DB to run the analysis on. This is partly an attempt to learn NoSQL, and partly to free the main application activity from the performance penalty (as much as possible).
So I want data changes to also sync to the NoSQL DB (in addition to Postgres). Any sync mechanism will affect the performance of the main data/transaction activity.
Is it a good idea to push the data changes to a message bus and free the main transaction as early as possible? Can anyone point me to frameworks/technologies/ideas that address this problem of the same data going to two different data stores?
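For what it's worth, a bare-bones sketch of the "push changes to a bus" idea, assuming Kafka as the bus; the topic name and JSON payload are made up for the example:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ChangeEventPublisher {
    private final KafkaProducer<String, String> producer;

    public ChangeEventPublisher(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    /** Called after the Postgres transaction commits; the send is asynchronous. */
    public void publish(String entityId, String changeJson) {
        producer.send(new ProducerRecord<>("entity-changes", entityId, changeJson));
    }

    public void close() {
        producer.close();
    }
}

A separate consumer would then apply these events to the analytics store, so the main transaction only pays for an asynchronous send.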
The simplest solution would be sending data to a Postgres read replica and running your analytics queries on that. The performance impact is minimal and this would save a lot of time compared to alternative approaches.
Unless you really know what you are doing, I would avoid NoSQL for this kind of application. If your dataset is too big for a Postgres read replica, you might want to use Redshift, which is a columnar datastore optimized for the types of analytics queries typically performed.
I have been using a cache for a long time. We store data against some key and fetch it from the cache whenever required. I know that Stack Overflow and many other sites rely heavily on caching. My question is: do they always use a key-value mechanism for caching, or do they form some SQL-like query within the cache? For instance, say I want to view last week's report, and the report's contents vary each day. Do I need to store a different report against each day (with the day as the key), or can I get this result by forming some query that aggregates results across different keys? Does any caching product (like Redis) provide this functionality?
Thanks In Advance
A cache is always a key-value hash table. That is how it stays so fast. If you're doing querying, then you're not doing caching.
What you may be trying to ask is this: you could have a table in your database that contains aggregated report data, and query against that pre-calculated table.
One of the reasons a cache (e.g. memcached) is fast is the simplicity of its data access and querying protocol.
The more functionality you add, the more you have to trade off on efficiency. A full-fledged SQL engine in a "caching" database is not a good design. However, you can use a data-structure-oriented database like Redis and design your cached data to suit your querying needs, for example one set or one hash per date.
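A small sketch of that "one hash per date" layout using the Jedis client; the key pattern, field name, and dates are invented for the example:

import redis.clients.jedis.Jedis;

import java.util.Arrays;
import java.util.List;

public class DailyReportCacheSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Writer side: one hash per day, populated when the daily report is generated.
            jedis.hset("report:2016-05-02", "pageViews", "1200");
            jedis.hset("report:2016-05-03", "pageViews", "900");

            // Reader side: "last week" is just an aggregation across the per-day keys.
            List<String> days = Arrays.asList("report:2016-05-02", "report:2016-05-03");
            long totalPageViews = 0;
            for (String day : days) {
                String value = jedis.hget(day, "pageViews");
                if (value != null) {
                    totalPageViews += Long.parseLong(value);
                }
            }
            System.out.println("Page views for the period: " + totalPageViews);
        }
    }
}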
Going a step further, you can use databases like MongoDB or MemSQL, which are quite fast and have rich querying support, so running an aggregation report once in a while won't be an issue.
However, as a design decision, you will have to accept that their caching throughput will not be as high as memcached's or Redis's.
At work we are thinking of moving from Oracle to a NoSQL database, so I have to run some tests on Cassandra and MongoDB. I have to move a lot of tables to the NoSQL database, and the idea is to keep the data synchronized between the two platforms.
So I created a simple procedure that selects from the Oracle DB and inserts into Mongo. Some of my colleagues pointed out that there may be an easier (and more professional) way to do it.
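For context, the kind of select-and-insert procedure described might look roughly like this in Java, assuming plain JDBC for Oracle and the MongoDB sync driver; the connection strings, table, and column names are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OracleToMongoCopy {
    public static void main(String[] args) throws Exception {
        try (Connection oracle = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
             MongoClient mongo = MongoClients.create("mongodb://localhost:27017")) {

            MongoCollection<Document> target =
                mongo.getDatabase("reports").getCollection("customers");

            try (Statement st = oracle.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name, created FROM customers")) {
                while (rs.next()) {
                    // One document per row; a real migration would batch with insertMany.
                    target.insertOne(new Document("_id", rs.getLong("id"))
                        .append("name", rs.getString("name"))
                        .append("created", rs.getTimestamp("created")));
                }
            }
        }
    }
}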
Has anybody had this problem before? How did you solve it?
If your goal is to copy your existing structure from Oracle to a NoSQL database as-is, then you should probably reconsider the move in the first place. By doing that, you lose the benefits of going to a non-relational data store.
A good first step would be to take a long look at your existing structure and determine how it can be modified to have a positive impact on your application. Also consider a hybrid system. Cassandra is great for a lot of things, but if you need a relational system and already use a lot of Oracle functionality, it likely makes sense for most of your database to stay in Oracle, while moving the pieces that require frequent writes and would benefit from a different structure to Mongo or Cassandra.
Once you've made the decisions about your structure, I would suggest writing scripts/programs, or adding a module to your existing app, to write the data in the new format to the new data store. That gives you the most fine-grained control over every step of the process, which, in a large system-wide architectural change, is something I would want to have.
You can also consider using components of the Hadoop ecosystem to perform this kind of ETL task. For that, you need to model your Cassandra DB according to your requirements.
The steps could be to migrate your Oracle table data to HDFS (preferably using Sqoop) and then write a MapReduce job to transform the data and insert it into your Cassandra data model.