How to configure mondrian olap to use ehcache for perforamnce boost

How to configure mondrian olap to use ehcache for perforamnce boost - mondrian

I want to configure Mondrian OLAP to use EhCache. Has anyone tried to do this, please share your thoughts and how to do on the same.

You should take a look at Mondrian's SegmentCache SPI. It allows you to implement custom cache providers for cell data. There is an example available in the default in-memory implementation.

Also take a look at one of the CTools - CDC - Community Distributed Cache - lets you control the cache and so on..
http://forums.pentaho.com/showthread.php?95994-CDC-Community-Distributed-Cache

Related

Use redis cache with EF Core

guys.
I use EF Core with caching. The current cache realization needs objects to support the IMemoryCache interface.
But all distributed caches realized IDistributionCache and there is no direct way to say DbContext to use, for example, Redis Cache instead of an in-memory cache.
Are there any existing solutions for the Redis cache integration? Or I must realize my own proxy between IDistributionCache and IMemoryCache.
My current .NET version is the .NET 5.0 - preview 4.
Thank you.

I am looking into this also. You might want to look at
https://github.com/VahidN/EFCoreSecondLevelCacheInterceptor

Which cache system is better for mondrian SegmentCache

Which cache system is better for mondrian SegmentCache?
Memcached, Redis, Hazelcast or anything else?

Not so much because it's better, but because setting it up is simpler, the Community Distributed Cache uses Hazelcast to create a cluster of cache nodes which keep results in memory. From previous experience I think it works quite nicely and it's very easy to set up. You can find the CDC plugin in the Pentaho Marketplace or on github: https://github.com/webdetails/cdc

Hbase vs Cassandra: Which is better for a timeseries data storage?

I use my API logs to extract information like:
In this period of time how many are the users of my API ?
Or in this period of time, what type of services are called the most ?
Almost all the information I extract depend on the timestamp. Actually I use MongoDB and I added the time-stamp as an index(for 80GB, indexes size is 12GB).
A migration to cassandra or Hbase was recommended for me. And I want to know which is better for my use case:
Analysis for timeseries data.
Both good write and read performance are required.
Possibility of using hadoop to do my data analysis.
Thanks for sharing your point of view or your experience.

Advantages of Cassandra:
Cassandra generally shows better performance (though both are excellent).
Cassandra is substantially easier to setup and manage from an operational stand point (though there are tools that will help either way).
Advantages of HBase:
Native to the hadoop ecosystem
HBase will require you installing hadoop anyway, and you get a nice two-for-one. To use Cassandra you will probably need to go to use DataStax Enterprise, a commercial, non-open source product, OR investigate using Spark for your analytics work which has an open-source connector with Cassandra.

Chocolate or Vanilla ice cream - which is better?
I would suggest that you would be the best decision maker. Set up development environments for each option, and this will tell you much more about operational and tuning issues than, I think, anyone else might be able to give you. :)

map a rbms to a dfs

I'm trying to take xwiki (rbms) and map it to a cloud base (dfs) without using MySQL. Any ideas?

Officially, only RDBMSs known to Hibernate are supported, so there's no support for nosql databases yet. There is a GSoC project proposed for developing support for AppEngine, and one of the research projects that XWiki is participating in, Compatible One, also needs cloud storage for XWiki. So, official support for clouds is going to come pretty soon.

You may want to look up sqoop -- a tool to move data from a relational database to hbase and back again

hazelcast vs ehcache

Question is clear as you see in the title, it would be appreciated to hear your ideas about adv./disadv. differences between them.
UPDATE:
I have decided to use Hazelcast because of the advantages like distributed caching/locking mechanism as well as the extremely easy configuration while adapting it to your application.

We tried both of them for one of the largest online classifieds and e-commerce platform. We started with ehcache/terracotta(server array) cause it's well-known, backed by Terracotta and has bigger community support than hazelcast. When we get it on production environment(distributed,beyond one node cluster) things changed, our backend architecture became really expensive so we decided to give hazelcast a chance.
Hazelcast is dead simple, it does what it says and performs really well without any configuration overhead.
Our caching layer is on top of hazelcast for more than a year, we are quite pleased with it.

Even though Ehcache has been popular among Java systems, I find it less flexible than other caching solutions. I played around with Hazelcast and yes it did the job, it was easy to get running etc and it is newer than Ehcache. I can say that Ehcache has much more features than Hazelcast, is more mature, and has big support behind it.
There are several other good cache solutions as well, with all different properties and solutions such as good old Memcache, Membase (now CouchBase), Redis, AppFabric, even several NoSQL solutions which provides key value stores with or without persistence. They all have different characteristics in the sense they implement CAP theorem, or BASE theorem along with transactions.
You should care more about, which one have the functionality you want in your application, again, you should consider CAP theorem or BASE theorem for your application.
This test was done very recently with Cassandra on the cloud by Netflix. They reached to million writes per second with about 300 instances. Cassandra is not a memory cache but you data model is like a cache, which is consist of key value pairs. You can as well use Cassandra as a distributed memory cache.

Hazelcast has been a nightmare to scale and stability is still a major issue.
The dedicated client to grid component choices are
The messy version that cant survive node loss anywhere, negating the point of backups (superclient), or
An incredibly slow native client option that does not allow for any type of load balancing to processing nodes in the grid.
If any host could request records from this data grid it would be a sweet design, but you are stuck with those two lackluster option to get anything out of it.
Also multiple issues with database thread pools locking up on individual members and not writing anything to the databases, causing permanent records loss is a frequent issue and we often have to take the whole thing down for hours to refresh any of the JVM's. Split brain is also still an issue, although in 1.9.6 it seems to have calmed down a little.
Rallying to move to Ehcache and improving the database layer instead of using this as a band-aid.

Hazelcast serializes everything whenever there is a node (standard-one), so the data you will save to Hazelcast must implement serialization.
http://open.bekk.no/efficient-java-serialization/

Hazelcast has been a nightmare for me. I was able to get it "working" in a clustered Websphere environment. I use the term "working" loosely. First, all of Hazelcast's documentation is out of date and only shows examples using deprecated method calls. Trying to use the new code without comments in the Javadocs and no examples in the documentation is very hard. Also, the J2EE container code simply does not work at this point because it does not support XA transactions in Websphere. An error is thrown calling code that follows their only J2EE example explicitly(it does look like Milestone 3.0 is addressing this). I had to forget about joining Hazelcast to a J2EE transaction. It does seem Hazelcast is definitely geared to a non EJB/Non-J2EE container environment. Making calls to Hazelcast.getAllInstances() fails to retain any information about Hazelcast's state when switching from one enterprise java bean to another. That forces me to create a new Hazelcast instance just to run calls that give me access to my data. That causes many Hazelcast Instances to start up on the same JVM. Also,retrieving data from Hazelcast is not fast. I tried retrieving data using both the Native Client and directly as a member of the cluster. I stored 51 lists, each containing only 625 objects in Hazelcast. I could not perform a query directly on a list and did not want to store a map just to get access to that feature (SQL operations can be performed on a map). It took about a half second to retrieve each list of 625 objects because Hazelcast Serializes the entire list and sends it over the wire rather than just giving me the delta (what has changed). Another thing, I had to switch to a TCPIP configuration and explicitly list the ip addresses of the servers I wanted to be in the cluster. The default Multicast configuration did not work and from the group discussions in google, other people are experiencing that difficulty as well. To sum up; I did eventually get 8 machines communicating in a cluster through many hours of torturous programmatic configuration and trial and error (the documentation will be little help) but when I did, I still had no control over the number of instances and partitions being created on each JVM due to the half finished nature of Hazelcast for EJB/J2EE and it was VERY SLOW. I implemented a real use case in the unemployment insurance application I work on and the code was much faster making direct calls to the database. It would have been cool if Hazelcast worked as advertised because I really did not want to use a separate service to implement what I am trying to do. I have used MongoDB extensively so I may skip the whole in memory cache and just serialize my objects as documents in a separate repository.

One advantage of Ehcache is that it is backed by a company (Terracotta) that does extensive performance, failover, and platform testing in a large performance lab. Terracotta provides support, indemnity, etc. For many companies, that sort of thing is important.
I have not used Hazelcast but I've heard that it is easy to use and that it works. I haven't heard anything with respect to scalability or performance of Hazelcast vs Terracotta/Ehcache but given the amount of scalability and failover testing that Terracotta does, it's hard for me to imagine that Hazelcast would be competitive in a production deployment. But I presume it would work fine for smaller uses.
[Bias: I'm a former employee of Terracotta.]

Developers describe Ehcache as "Java's Most Widely-Used Cache". Ehcache is an open-source, standards-based cache for boosting performance, offloading your database, and simplifying scalability. It's the most widely-used Java-based cache because it's robust, proven, and full-featured. Ehcache scales from in-process, with one or more nodes, all the way to mixed in-process/out-of-process configurations with terabyte-sized caches. On the other hand, Hazelcast is detailed as "Clustering and highly scalable data distribution platform for Java". With its various distributed data structures, distributed caching capabilities, elastic nature, memcache support, integration with Spring and Hibernate and more importantly with so many happy users, Hazelcast is feature-rich, enterprise-ready and developer-friendly in-memory data grid solution.
Ehcache and Hazelcast are primarily classified as "Cache" and "In-Memory Databases" tools respectively.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio