JBoss searchable cache - caching

I have JBoss EAP 5.1 and I am using Struts 1.1 in it; the JDK is 1.6. Can this stack be used with the JBoss searchable cache? How much data can be stored in the JBoss searchable cache? I want to create a Google-search-style, search-as-you-type page for my application.

I understand that you want to create a searchable page. To answer your question appropriately, it would help to know what it is you are looking to search.
Are you searching among objects stored in memory, among rows in a database, or across a file system?
The fact that you mention the JBoss searchable cache makes me believe you might be searching among objects in memory.
If that is the case, you can only store as much data as the memory allocated to your JVM allows. For example, if your JVM has 1 GB allocated, you have approximately 1 GB minus what your runtime needs. If you need more memory, you should consider a data grid like Infinispan, where you can store all the data in memory in a grid that is highly available and also gives you MapReduce capabilities. You can also use Hibernate OGM to write SQL-style queries that span the entire grid.
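As a rough illustration of the data-grid idea, here is a minimal local Infinispan sketch (the cache name and keys are made up, and a real grid would use a clustered configuration rather than this local one):

import org.infinispan.Cache;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class GridSketch {
    public static void main(String[] args) {
        // Local, in-memory cache manager; a real deployment would pass a
        // clustered configuration so entries are replicated across nodes.
        DefaultCacheManager manager = new DefaultCacheManager();
        manager.defineConfiguration("searchable", new ConfigurationBuilder().build());

        Cache<String, String> cache = manager.getCache("searchable");
        cache.put("user:42", "some searchable payload");
        System.out.println(cache.get("user:42"));

        manager.stop();
    }
}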
Hope this helps.
Good luck!

Related

Cache only specific tables in Spring Boot

I have a table with millions of rows (about 98% reads, maybe 1-2% writes) which has references to a couple of config tables (with maybe 20 entries each). What are the best practices for caching the tables in this case? I cannot cache the table with millions of rows, but at the same time I also don't want to hit the DB for the config tables. Is there a workaround for this? I'm using Spring Boot, and the data is in Postgres.
Thanks.
First of all, let me refer to this:
What are the best practices for caching the tables in this case
I don't think you should "cache tables" as you say. In the application you work with the data, and the data is what should be cached. This means the object you cache should already be in a structure that includes these relations. Of course, in order to fetch the whole object from the database you can use JOINs, but once the object is cached it no longer matters: the translation from the relational model to the object model has already been done.
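To illustrate with a hypothetical sketch (all names made up): the cached value is a denormalized object that already embeds the resolved config entries, so a cache hit touches neither table.

// Hypothetical unit of caching: the main row plus the already-resolved
// config values it references. Built once (e.g., from a JOIN); after
// that, the relational shape no longer matters to cache readers.
public record RowWithConfig(long id, String payload,
                            String configLabelA, String configLabelB) {
}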
Beyond that, the question is quite broad, because the actual answer varies with the technologies you use, the nature of the data, and so forth.
You should answer the following questions before you design the cache (the list is off the top of my head, but hopefully you'll get the idea):
What is the cache invalidation strategy? You say there are 1-2% writes; when the data gets updated, the data in the cache may become stale. Is that acceptable?
A generalization of the previous question: if you have multiple instances (JVMs) of the same application and one of them triggers an update to the DB data, what should happen to the other apps' caches?
How long may stale/invalid data reside in the cache?
Do your application's use cases access all the data in the tables with the same frequency, or is some data more "interesting" (for example, the oldest data is never read, but the latest data is always "hot")? With millions of rows, the JVM probably cannot hold all these objects in the heap at once, so there should be some "slice" of this data.
What are the performance implications of having the cache? How does it affect GC behavior?
What technologies can be used in your case? (Due to regulations or licensing, some technologies may simply not be available; this is more often the case in large organizations.)
Based on these observations you can go with:
In-memory cache:
Spring integrates with various in-memory cache technologies; you can also use them without Spring at all. To name a few:
Google Guava cache (for older Spring cache implementations)
Caffeine (for newer Spring cache implementations; see the sketch after this list)
A plain in-memory key/value map
In memory but in another process:
Redis
Infinispan
Now, these caches are slower than those listed in the previous category, but can still be significantly faster than the DB.
Data Grids:
Hazelcast
Off-heap memory-based caches (this means you store the data off-heap, so it is not eligible for garbage collection)
Postgres-related solutions. For example, you can still go to the DB, but since you can opt to keep the index in memory, the queries will be significantly faster.
ORM-specific caches (Hibernate, for example, has its own second-level cache).
Some mix of all of the above.
Implement your own solution. This is something you probably shouldn't do as a first attempt to address the issue, because caching is tricky to get right.
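As a small illustration of the first option, here is a minimal Spring-plus-Caffeine sketch (the cache name "configTables" and the service are invented; only the annotations and the CaffeineCacheManager API come from Spring):

import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
@EnableCaching
class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager("configTables");
        // The size bound plus expiry doubles as a crude invalidation
        // strategy: stale config entries live at most 10 minutes.
        manager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(1_000)
                .expireAfterWrite(10, TimeUnit.MINUTES));
        return manager;
    }
}

@Service
class ConfigService {
    // First call hits the DB; later calls with the same id are served
    // from the "configTables" cache.
    @Cacheable("configTables")
    public String findConfigLabel(long id) {
        return loadFromDatabase(id); // repository call elided
    }

    private String loadFromDatabase(long id) {
        return "label-" + id; // placeholder for a real query
    }
}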
In the end, let me provide a link to a very interesting session given by Michael Plod about caching. I believe it will help you find the solution that works best for you.

Difference between In-Memory cache and In-Memory Database

I was wondering if I could get an explanation of the differences between in-memory caches (Redis, Memcached), in-memory data grids (GemFire), and in-memory databases (VoltDB). I'm having a hard time distinguishing the key characteristics of the three.
Cache: by definition it is stored in memory. Any data stored in memory (RAM) for faster access is called a cache. Examples: Ehcache, Memcached. Typically you put an object into the cache with a String as the key and access the cache using that key. It is very straightforward. It is up to the application when to access the cache versus the database, and no complex processing happens in the cache. If the cache spans multiple machines, it is called a distributed cache. For example, Netflix uses EVCache, which is built on top of Memcached, to store the movie recommendations you see on the home screen.
In-memory database: it has all the features of a cache plus some processing/querying capabilities. Redis falls under this category. Redis supports multiple data structures, and you can query the data in Redis (for example: get the last 10 accessed items, get the most-used item, and so on). It can span multiple machines, is usually very performant, and also supports persistence to disk if needed. For example, Twitter uses a Redis database to store timeline information.
I don't know about GemFire and VoltDB, but even Memcached and Redis are very different. Memcached is really simple caching: a place to store variables in a very uncomplicated fashion and then retrieve them, so you don't have to go to a file or do a database lookup every time you need that data. The variable types are very simple. Redis, on the other hand, is actually an in-memory database with a very interesting selection of data types. It has a wonderful data type for sorted lists, which works great for applications such as leaderboards: you add your new record to the data, and it gets sorted automagically.
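For instance, a minimal sketch of that sorted-list (sorted set) idea using the Jedis client (the key and player names are invented):

import redis.clients.jedis.Jedis;

public class LeaderboardSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // ZADD keeps members ordered by score automatically
            jedis.zadd("leaderboard", 4200, "alice");
            jedis.zadd("leaderboard", 3100, "bob");
            jedis.zadd("leaderboard", 5600, "carol");

            // Top 3 players, highest score first; no application-side sorting
            System.out.println(jedis.zrevrange("leaderboard", 0, 2));
        }
    }
}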
So I wouldn't get too hung up on the categories. You really need to examine each tool individually to see what it can do for you and the application you're building. It's kind of like trying to draw comparisons between NoSQL databases: they are all very different and do different things well.
I would add that things in the "database" category tend to have more features to protect and replicate your data than a simple "cache". A cache is (usually) temporary, whereas database data should be persistent. Many cache solutions I've seen do not persist to disk, so if you lost power to your whole cluster, you'd lose everything in the cache.
But there are some cache solutions that have persistence and replication features too, so the line is blurry.
An in-memory cache is a common query store and therefore relieves the DB of read workloads; Redis is a common example. A web site might, for instance, store the popular searches made by clients, thereby relieving the DB of some load.
An in-memory database additionally provides query functionality on top of caching (e.g., storing session data in RAM as temporary storage).
Memcached falls into the temporary-store caching category.

Hibernate Search RAMProvider - not sure how it works

I'm using Spring Boot and, for easier setup (no user-rights manipulation), I decided to use the RAM provider instead of FS. Can anyone confirm my line of thought?
Whenever I restart, I lose the index.
Anytime something goes through Hibernate it is auto-indexed, as there are @Indexed annotations on the proper entities.
In case of a restart I need to rebuild the index, as it is lost, using:
// Needs org.hibernate.search.jpa.FullTextEntityManager and
// org.hibernate.search.jpa.Search from the Hibernate Search JPA API
try {
    FullTextEntityManager fullTextEntityManager =
        Search.getFullTextEntityManager(entityManager);
    // Rebuilds the whole index from the database (mass indexer)
    fullTextEntityManager.createIndexer().startAndWait();
} catch (InterruptedException e) {
    System.out.println(
        "An error occurred trying to build the search index: " +
        e.toString());
}
If I use FSDirectoryProvider instead, the index is automatically reloaded from the FS and the code above is no longer necessary, unless the ORM entities change; I guess I would then need to manually force re-indexing somehow.
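For reference, a minimal bootstrap sketch of how that provider choice is configured (property names as in Hibernate Search 5.x; the persistence-unit name and index path are made up):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class SearchBootstrap {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // "filesystem" persists the Lucene index across restarts;
        // "ram" keeps it in memory only, so it is lost on shutdown.
        props.put("hibernate.search.default.directory_provider", "filesystem");
        props.put("hibernate.search.default.indexBase", "/var/lucene/indexes");

        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory("myPersistenceUnit", props);
        // ... obtain an EntityManager, run the mass indexer if needed ...
        emf.close();
    }
}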
Is there some DBDirectory implementation one can depend on? In that case, would the index file be loaded into RAM, or would each update to the index be written to the DB separately?
All of your assumptions are nearly correct.
An in-memory index is lost under two circumstances:
JVM shutdown
index gets reopened while application is still running
You need to reindex your entities only if:
you changed the way your entities are analyzed or tokenized during indexing or searching
you added entity properties to the index or removed them from it
you changed relations between entities in a way that affects your index
At the time of writing there is no database-based directory. In the past I tried to adapt Compass's JdbcDirectory; unfortunately, I never had the time to go further than a working proof of concept.
There has been an open issue in the project tracker since 2011. It seems that in the near future there won't be official support for a database-driven directory in Hibernate Search.
Keep in mind that an in-memory index is only sufficient for small data:
Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments.
You can use the Infinispan Directory as an alternative: it keeps the index in memory but maintains a replica on durable storage.
The Infinispan project provides both:
an Apache Lucene Directory implementation
a Hibernate Search DirectoryProvider
A pointer to the source code
Infinispan is meant to aggressively cache data in memory, but has several options to offload such data to permanent storage by enabling a CacheStore in its configuration.
Among the many CacheStore implementations, you might be interested in:
the FSCacheStore, which stores data on the filesystem
the JDBC-based CacheStore, which is often a good combination for Hibernate
There are many more alternatives, like connecting to cloud storage, popular NoSQL databases, etc. Infinispan also supports real-time replication across nodes, so your options for index storage in Hibernate Search are pretty much limitless.
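Building on the bootstrap sketch in the question above, and assuming the hibernate-search-infinispan integration module is on the classpath, switching the Lucene directory to Infinispan should be a one-property change (the persistence-unit name is again made up):

import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public class InfinispanDirectoryBootstrap {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Store the index in an Infinispan grid instead of RAM or the FS;
        // durability then comes from the grid's configured CacheStore.
        props.put("hibernate.search.default.directory_provider", "infinispan");

        EntityManagerFactory emf =
            Persistence.createEntityManagerFactory("myPersistenceUnit", props);
        emf.close();
    }
}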

Torquebox Infinispan Cache - Too many open files

I looked around, and apparently Infinispan has a limit on the number of keys you can store when persisting data to the FileStore; I get a "too many open files" exception.
I love the idea of TorqueBox and was eager to slim down the stack and just use Infinispan instead of Redis. I have an app that needs to cache a lot of data. The queries are computationally expensive and need to be re-computed daily (phone and other productivity metrics per agent in a call center).
I don't run a cluster, though I understand the cache would persist as long as at least one app instance is running. I would much rather persist the cache. Has anybody run into this issue, and is there a workaround?
Yes, Infinispan's FileCacheStore used to have an issue with opening too many files. The new SingleFileStore in 5.3.x solves that problem, but it looks like Torquebox still uses Infinispan 5.1.x (https://github.com/torquebox/torquebox/blob/master/pom.xml#L277).
I am also using the Infinispan cache in a live application.
Basically, we store database queries and their results in the cache, for tables that are not updatable and smaller in data size.
There are two approaches to designing it:
Use each query as a key and its data as the value.
This leads to too many entries in the cache when many different queries are placed into it.
Use xyz as the key and a Map as the value (the Map contains the queries as keys and their data as values).
This leads to a single entry in the cache. Whenever data is needed from this cache (I call it the query cache), retrieve the Map first using key xyz, then find the query in the Map itself.
We are using the second approach.
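A minimal sketch of that second approach (cache name, key, and query are made up; results are plain strings here for brevity):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.infinispan.Cache;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class QueryCacheSketch {
    public static void main(String[] args) {
        DefaultCacheManager manager = new DefaultCacheManager();
        manager.defineConfiguration("queryCache", new ConfigurationBuilder().build());
        Cache<String, Map<String, String>> queryCache = manager.getCache("queryCache");

        // One cache entry ("xyz") whose value maps query -> result,
        // instead of one cache entry per query.
        Map<String, String> queries = new ConcurrentHashMap<>();
        queries.put("SELECT code, label FROM country", "...result rows...");
        queryCache.put("xyz", queries);

        // Lookup: fetch the inner map first, then find the query in it.
        System.out.println(queryCache.get("xyz").get("SELECT code, label FROM country"));
        manager.stop();
    }
}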

MongoDB: force in-memory

After using MyISAM for years, with 3 indexes and around 500 columns over millions of rows, I wonder how to "force" MongoDB to keep indexes in memory for fast read performance.
In general, it is a simply structured table, and all queries are WHERE index1=... or index2=... or index3=... (in MyISAM terms), and pretty simple in MongoDB as well.
It would be nice if MongoDB managed the indexes and RAM on its own.
However, I am not sure whether it does, and how MongoDB can best speed up these queries using only the indexes.
Thanks
It would be nice if MongoDB managed the indexes and RAM on its own.
MongoDB does not manage the RAM at all. It uses memory-mapped files and basically "pretends" that everything is in RAM all of the time.
Instead, the operating system is responsible for managing which objects are kept in RAM, typically on an LRU basis.
You may want to check the sizes of your indexes. If you cannot keep all of those indexes in RAM, MongoDB will likely perform poorly.
However, I am not sure whether it does, and how MongoDB can best speed up these queries using only the indexes.
MongoDB can use covered indexes to answer queries directly from the index. However, you have to be very specific about the fields returned: if you include fields that are not part of the index, the query will not be "index-only".
The default behavior is to return all fields, so you will need to look at your specific queries and make the appropriate changes to allow "index-only" execution. Note that such queries must exclude the _id field, which may cause issues down the line.
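A minimal covered-query sketch with the MongoDB Java driver (database, collection, field name, and value are made up): the filter and the projection both use only index1, and _id is excluded, so the query can be answered from the index alone.

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.excludeId;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class CoveredQuerySketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> rows =
                client.getDatabase("mydb").getCollection("rows");

            // Filter and projection restricted to the indexed field; _id
            // excluded, so no document fetch is needed ("index-only").
            for (Document d : rows.find(eq("index1", "someValue"))
                                  .projection(fields(include("index1"), excludeId()))) {
                System.out.println(d.toJson());
            }
        }
    }
}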
You don't need to "force" Mongo to store indexes in memory. An index is brought into memory when you use it and then stays in memory until the OS evicts it.
MongoDB will automatically use a covered index when it can.
