Is there a way to store multiple types of objects in a single CacheStore in Apache Ignite? - performance

I'm trying to do some in-memory calculations using Apache Ignite with CacheStore implementations, but this setup forces me into cross-cache joins, which are not performing efficiently.
So is there a way to store multiple types of objects in the same cache store?

Ignite SQL performance does not depend on whether a join is cross-cache or not. If you have performance issues, there must be another reason.
But in any case, you can store multiple data types in a single cache; there is no limitation. How the CacheStore is configured in this case depends on which implementation you use.
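For illustration, here is a minimal sketch of one way to declare two SQL-queryable value types in a single cache via query entities; the Person and Company types and their fields are assumptions for the example, not from the question:

    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import org.apache.ignite.cache.QueryEntity;
    import org.apache.ignite.configuration.CacheConfiguration;

    public class MultiTypeCacheConfig {
        public static CacheConfiguration<Object, Object> build() {
            CacheConfiguration<Object, Object> cfg = new CacheConfiguration<>("multiTypeCache");

            // First value type: a hypothetical Person, queryable via SQL.
            QueryEntity person = new QueryEntity(Long.class.getName(), "Person");
            LinkedHashMap<String, String> personFields = new LinkedHashMap<>();
            personFields.put("name", String.class.getName());
            personFields.put("companyId", Long.class.getName());
            person.setFields(personFields);

            // Second value type: a hypothetical Company, stored in the same cache.
            QueryEntity company = new QueryEntity(Long.class.getName(), "Company");
            LinkedHashMap<String, String> companyFields = new LinkedHashMap<>();
            companyFields.put("title", String.class.getName());
            company.setFields(companyFields);

            // Both "tables" now live in one cache, so a join between them is no
            // longer cross-cache. Note that keys must not collide across the two
            // types, since they share one key space.
            cfg.setQueryEntities(Arrays.asList(person, company));
            return cfg;
        }
    }

With such a configuration, a SQL join between Person and Company runs within the single cache.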

Maybe I'm missing the point but can't you just have your things implement the same base class?

Related

Cache only specific tables in Spring boot

I have a table with millions of rows (with 98% reads, maybe 1-2% writes) which has references to a couple of other config tables (with maybe 20 entries each). What are the best practices for caching the tables in this case? I cannot cache the table with millions of rows, but at the same time, I also don't want to hit the DB for the config tables. Is there a workaround for this? I'm using Spring Boot, and the data is in Postgres.
Thanks.
First of all, let me refer to this:
What are the best practices for caching the tables in this case
I don't think you should "cache tables", as you say. In the application, you work with the data, and this is what should be cached. This means the object you cache should already be in a structure that includes these relations. Of course, in order to fetch the whole object from the database you can use JOINs, but once the object is cached, it doesn't matter anymore; the translation from the relational model to the object model has already been done.
Now the question is too broad because the actual answer can vary on the technologies you use, nature of data, and so forth.
You should answer the following questions before you design the cache (the list is off the top of my head, but hopefully you'll get the idea):
What is the cache invalidation strategy? You say there are 2% writes; if the data gets updated, the data in the cache may become stale. Is that ok?
A generalization of the previous question: if you have multiple instances (JVMs) of the same application and one of them triggers an update to the DB data, what should happen to the other apps' caches?
How long can stale/invalid data reside in the cache?
Do the use cases of your application access all the data from the tables with the same frequency, or is some data more "interesting" (for example, the oldest data is never read, but the latest data is always "hot")? Probably, with millions of rows, the JVM can't hold all these objects in the heap at the same time, so there should be some "slice" of this data...
What are the performance implications of having the cache? How does it affect GC behavior?
What technologies can be used in your case? (Maybe due to regulations/licensing some technologies are just not available; this is more often the case in large organizations.)
Based on these observations you can go with:
In-memory cache:
Spring integrates with various in-memory cache technologies; you can also use them without Spring at all. To name a few:
Google Guava cache (for older Spring cache implementations)
Caffeine (for newer Spring cache implementations; see the sketch after this list)
An in-memory map of keys/values
In-memory but in another process:
Redis
Infinispan
These caches are slower than those in the previous category but can still be significantly faster than the DB.
Data grids:
Hazelcast
Off-heap memory-based caches (this means you store the data off-heap, so it's not eligible for garbage collection)
Postgres-related solutions. For example, you can still go to the DB, but since you can opt to keep the index in memory, the queries will be significantly faster.
ORM-specific caches (Hibernate, for example, has its own caches).
Some kind of mix of all of the above.
Implementing your own solution - this is something you probably shouldn't do as a first attempt to address the issue, because caching can be tricky.
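To make the in-process option concrete, here is a minimal Spring Boot sketch that caches only the small config tables with Caffeine; ConfigEntry and ConfigRepository are hypothetical names, and the size and TTL values are assumptions chosen to illustrate the invalidation questions above:

    import java.time.Duration;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import org.springframework.cache.CacheManager;
    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.cache.annotation.EnableCaching;
    import org.springframework.cache.caffeine.CaffeineCacheManager;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.stereotype.Service;

    @Configuration
    @EnableCaching
    class CacheConfig {
        @Bean
        CacheManager cacheManager() {
            CaffeineCacheManager manager = new CaffeineCacheManager("configEntries");
            manager.setCaffeine(Caffeine.newBuilder()
                    .maximumSize(100)                            // ~20 rows per config table, so plenty
                    .expireAfterWrite(Duration.ofMinutes(10)));  // bounds how long stale data can live
            return manager;
        }
    }

    @Service
    class ConfigService {
        private final ConfigRepository repository; // hypothetical Spring Data repository

        ConfigService(ConfigRepository repository) {
            this.repository = repository;
        }

        @Cacheable("configEntries")
        public ConfigEntry byId(long id) {
            return repository.findById(id).orElseThrow(); // hits Postgres only on a cache miss
        }

        @CacheEvict(value = "configEntries", key = "#entry.id")
        public ConfigEntry update(ConfigEntry entry) {    // the rare write path evicts the stale entry
            return repository.save(entry);
        }
    }

The table with millions of rows stays uncached and is read straight from Postgres; only the tiny config tables live in the heap, so the GC impact is negligible.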
In the end, let me provide a link to a very interesting session given by Michael Plod about caching. I believe it will help you find the solution that works best for you.

Which is better: ORM (Apache Cayenne), JDBC or Spring JDBC?

I am working with multiple databases, like MS SQL Server and PostgreSQL, with heavy transactions and complex queries. I have read that plain JDBC is faster than an ORM. I was thinking of using an ORM because I do not want to write a different query for each database for the same work, and also to standardize my DAO layer. I am mapping my database tables without using foreign keys, and for an ORM like Apache Cayenne I have to map tables with foreign key constraints so I can use joins or other multiple-table operations. Is it good to use an ORM, or is plain JDBC fine?
From your problem description, you already have an understanding of the tradeoffs involved. So this is really a decision that you need to make for yourself based on those tradeoffs.
My only advice here is to take a second look at the performance requirements. While an ORM does introduce the overhead of creating, storing and managing objects, in all but a few cases you can safely ignore this overhead for the sake of a better abstraction. Also, when working with JDBC you very often end up writing your own code to convert a ResultSet into objects, which incurs its own overhead. So you may not end up with faster code, while forfeiting all the benefits of a clean object model and a framework that manages it.
So my own preference is to go with the better abstraction (ORM in this case), and then use the framework's tools to optimize performance. E.g., to speed up processing of large ResultSets, Cayenne provides a few techniques: result iterators, DataRow queries, paginated queries, etc.
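As a rough illustration of those techniques against the Cayenne 4.x API (the Order entity is a hypothetical mapped class, not from the question):

    import java.util.List;
    import org.apache.cayenne.DataRow;
    import org.apache.cayenne.ObjectContext;
    import org.apache.cayenne.ResultIterator;
    import org.apache.cayenne.query.ObjectSelect;

    public class LargeResultExamples {
        void process(ObjectContext context) {
            // 1. Result iterator: stream rows one at a time instead of
            //    materializing the whole result list in memory.
            try (ResultIterator<Order> it = ObjectSelect.query(Order.class).iterator(context)) {
                for (Order order : it) {
                    // handle one object at a time
                }
            }

            // 2. DataRow query: skip entity instantiation entirely and work
            //    with raw map-like rows.
            List<DataRow> rows = ObjectSelect.dataRowQuery(Order.class).select(context);

            // 3. Paginated query: objects are resolved lazily in pages of 100.
            List<Order> paged = ObjectSelect.query(Order.class).pageSize(100).select(context);
        }
    }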
On the other hand I would use JDBC or something like MyBatis when it is not possible to cleanly model your data as entities. E.g. when there are no natural relationships, all access happens via stored procedures, etc. Doesn't seem like your case though.

How to retrieve the data from the database without using the Apache Jackrabbit datastore?

I have integrated Jackrabbit with an Oracle database and I am storing data using Jackrabbit. If I don't want to retrieve the data using Jackrabbit, in what way can I get the data? In the database, the data is stored as a blob type.
The way Jackrabbit stores the data in the DB is an implementation detail, and it does not magically map this into a "nice" DB schema, if that's what you mean. (The hierarchical nature and all the JCR features make this impossible.) It's a bit like having a Unix file system and then asking how you can read the low-level inodes etc. from the file system implementation - you really should not.
Last but not least, note that while Jackrabbit is running, nothing else (except for a Jackrabbit cluster setup) must write to the DB (the tables used by Jackrabbit), as this will easily lead to data corruption.
As @TedTrippin already mentioned above, an ORM framework would make things much easier. But if you really want to do it manually in Oracle, the approach would be:
Study the code of the OCM (http://jackrabbit.apache.org/jcr/object-content-mapping.html), then get the content according to the logic of its associations and relations from Oracle, probably not in one but in multiple queries per document; possibly with user-defined functions, which are supported in Oracle and might make things easier.
It would be interesting to know the background of your question. You tagged it with "Spring" and "CMS". I don't see any reason why you would want to access the data directly from Oracle; it's tedious. In case you want to provide an API for the content to an external system, or in case you have lost a CMS that once sat in front of the Jackrabbit repo and just used it as a content store, you could still use such an ORM/OCM framework standalone to make it easier to access the data.

Couchbase as a cache and cache invalidation

I'm thinking about using Couchbase as a cache layer. I'm aware of the many advantages provided by Couchbase, like the easy scalability. But what interests me more is the rich document model of couchbase, compared to the simple key-value one of memcached.
My RDBMS is SQL Server, and we use NHibernate. The queries and the database are already quite optimized and I think that caching is the best option for further scaling.
My project is to implement a simple relational model between entities (much simpler than the one in the RDBMS) to handle invalidation. When an entity is invalidated (removed from the cache) by the application, all dependent entities can also be removed. The logic of defining the dependencies between entities would be handled at the application level by a dedicated component. There would be 10 or 12 different entity types (I don't want to cache my whole application domain).
My document model in Couchbase would look like this:
Key (the one generated by the application), keys' format depends on entity type
Hashed key (to have a uniform unique key across all entities)
Entity
Dependencies - list of hashed keys of the entities that must be removed when the main entity is removed
So my questions are:
On invalidation, we would need to resolve a graph of dependencies (asynchronously). Is it fast to look for specific keys with around 500k entities?
Any feedback on the general idea?
Maintaining the dependencies between entities can be quite simplified, and might not be such a big issue.
Pierre
I use Couchbase 2.2 in production as a persistent cache layer and am really happy with it (running about 2M documents). My app gets really fast reads (about 1 millisecond). Your idea is valid and I don't see anything wrong with using Couchbase as an entity store for invalidation. It's a mature and very stable product.
You are correct in your entity design. You can have a main JSON doc that holds a list of references to other child documents, so that before deleting the main document you delete all the children first.
Also, and I'm not sure if it's applicable in your case, you can take advantage of Couchbase's ability to expire documents. When you insert a key/value (JSON doc) you can specify a TTL (time to live) if you know it upfront. This way you don't need to explicitly delete entities from Couchbase.
The delete operation itself is fast (you can run it as an asynchronous operation), and 500K documents is a really small size for a Couchbase cluster. You should see get operations under 1 millisecond.
But consider having a minimum of 3 Couchbase nodes in one cluster, so that you can take one node down at any given point in time without compromising the data stored in the cluster. See Sizing a Couchbase Server 2.0 cluster.
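For illustration, here is a rough sketch of that document model and the cascading invalidation, written against the current Couchbase Java SDK (3.x; the 2.2-era SDK discussed in this answer has a different API). The field names and the recursive cascade are assumptions, not a definitive implementation:

    import java.time.Duration;
    import com.couchbase.client.java.Collection;
    import com.couchbase.client.java.json.JsonArray;
    import com.couchbase.client.java.json.JsonObject;
    import com.couchbase.client.java.kv.UpsertOptions;

    public class EntityCache {
        private final Collection collection; // opened from a Couchbase bucket elsewhere

        EntityCache(Collection collection) {
            this.collection = collection;
        }

        // Store an entity together with its dependency list and an optional TTL.
        void put(String hashedKey, JsonObject entity, JsonArray dependentKeys, Duration ttl) {
            JsonObject doc = JsonObject.create()
                    .put("entity", entity)
                    .put("dependencies", dependentKeys); // hashed keys to cascade-remove
            collection.upsert(hashedKey, doc,
                    UpsertOptions.upsertOptions().expiry(ttl)); // Couchbase expires it for us
        }

        // Invalidate an entity: remove its dependents first, then the entity itself.
        void invalidate(String hashedKey) {
            JsonObject doc = collection.get(hashedKey).contentAsObject();
            for (Object dep : doc.getArray("dependencies")) {
                invalidate((String) dep); // recursive cascade; assumes no cycles
            }
            collection.remove(hashedKey); // a production version would handle missing docs
        }
    }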
Some additional resources:
10 things developers should know about Couchbase
Top 10 things an Ops / Sys admin must know about Couchbase
App Development with Documents, their Schemas and Relationships
Couchbase Models
Here are my thoughts:
On invalidation, we would need to resolve a graph of dependencies (asynchronously). Is it fast to look for specific keys with around 500k entities?
Are you looking for keys in your RDBMS or in CB? If in CB, you will need to use a view/index; now, views are disk-based, but stored in sorted order, so they are no slower than SQL indices. Accessing them in parallel will be faster than in series. It will be the slow point in your operation, though, if you use CB.
Continuing along with this thought, I have used CB successfully to store and navigate a hierarchical data structure with 500k+ nodes in it. CB performs well, but does take a few seconds to spit out the whole index if I need it (which I do if I need to do a mass-update operation).
Any feedback on the general idea?
The idea is sound. In fact, I'm seeing 10x the performance of SQL with hierarchical queries when I run them on my Couchbase cluster. I also found that a single Couchbase instance outperforms multiple instances when doing an index lookup - I do not know why that is (the 2-instance CB index is 5x faster than my SQL setup). To speed things up further, you can parallelize the queries to the CB index.

mongodb: force in-memory

After using MyISAM for years, with 3 indexes and around 500 columns over millions of rows, I wonder how to "force" MongoDB to keep its indexes in memory for fast read performance.
In general, it is a simply structured table, and all queries are of the form WHERE index1=.. OR index2=.. OR index3=.. (MyISAM), and pretty simple in MongoDB as well.
It would be nice if MongoDB managed the indexes and RAM on its own.
However, I am not sure whether it does, and how MongoDB can best speed up these index-only queries.
Thanks
It's nice if mongodb is managing the index and ram on its own.
MongoDB does not manage the RAM at all. It uses memory-mapped files and basically "pretends" that everything is RAM all of the time.
Instead, the operating system is responsible for managing which objects are kept in RAM, typically on an LRU basis.
You may want to check the sizes of your indexes. If you cannot keep all of those indexes in RAM, then MongoDB will likely perform poorly.
However, I am not sure if it does and about the way mongodb can speed up these queries on indexs-only best.
MongoDB can use covered indexes to answer a query directly from the index. However, you have to be very specific about the fields returned: if you include fields that are not part of the index, it will not be an "index-only" query.
The default behavior is to include all fields, so you will need to look at the specific queries and make the appropriate changes to allow "index-only" execution. Note that these queries cannot include the _id field, which may cause issues down the line.
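As a sketch of what such a covered query looks like with the modern MongoDB Java driver (the database, collection, and field names are assumptions based on the question):

    import com.mongodb.client.FindIterable;
    import com.mongodb.client.MongoClient;
    import com.mongodb.client.MongoClients;
    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.Indexes;
    import com.mongodb.client.model.Projections;
    import org.bson.Document;

    public class CoveredQueryExample {
        public static void main(String[] args) {
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> coll =
                        client.getDatabase("test").getCollection("rows");

                // Index on the queried field.
                coll.createIndex(Indexes.ascending("index1"));

                // Project only indexed fields and exclude _id so the query can be
                // answered entirely from the index, never touching the documents.
                FindIterable<Document> covered = coll.find(Filters.eq("index1", 42))
                        .projection(Projections.fields(
                                Projections.include("index1"),
                                Projections.excludeId()));

                covered.forEach(doc -> System.out.println(doc.toJson()));
            }
        }
    }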
You don't need to "force" Mongo to store indexes in memory. An index is brought into memory when you use it and then stays in memory until the OS kicks it out.
MongoDB will automatically use a covered index when it can.
