My application is internationalized for all tables: each table has a companion table for other languages, keyed by a language code such as 'en-us'. Hitting the database every time a page is rendered makes the application slow, so we implemented a message source by extending the AbstractMessageSource class, following http://forum.springsource.org/showthread.php?t=15223. With that approach, however, all messages are held in memory. As the tables grow in size or number, the message map grows too, and we run into memory problems. So we plan to keep the messages on disk using Ehcache. Could you provide a sample? And is this a valid way to store the objects?
Change the Map entries in DataSourceMessageSource to:
/** Cache holding already generated MessageFormats per message code and Locale
* Map
/** all messages (for all basenames) per locale
* Map
That will get you going. You also need an ehcache.xml with a cache entry for each of the above. You should specify overflowToDisk=true.
Note that you will incur a deserialization cost. If you see high CPU usage from that, it might be worth restructuring the code to return specifically what you need rather than a whole map.
Greg Luck
I want to perform a data health check on a huge volume of data, which can be either in an RDBMS or in cloud file storage such as Amazon S3. Which tool would be appropriate for this? It should give me the number of rows, the rows that do not match a given schema (data type validation), the average volume for a given time period, and so on.
I do not want to use a big data platform like Qubole or Databricks because of the extra cost involved. I found Drools, which can perform similar operations, but it would need to read the full data into memory and associate it with a POJO before validation. Any alternatives where I do not have to load the full data into memory would be appreciated.
You can avoid loading the full data into memory by using the StatelessKieSession object of Drools. A StatelessKieSession works only on the current event: it does not maintain the state of any event and does not keep objects in memory. See the Drools documentation on StatelessKieSession for details.
Alternatively, you can use a stateful KieSession and give events an expiry with the @expires declaration, which expires an event after the specified time. See the Drools documentation on @expires for details.
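The row-at-a-time idea can also be illustrated without Drools. The following is a minimal plain-Java sketch, not a Drools example: it reads lines one at a time, counts the rows, and counts the rows that fail a made-up two-column schema. The class name, the comma delimiter and the schema (an integer id and a non-blank name) are assumptions for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical streaming check: only one line is held in memory at a time. */
final class StreamingRowCheck {
    final long totalRows;
    final long invalidRows;

    private StreamingRowCheck(long totalRows, long invalidRows) {
        this.totalRows = totalRows;
        this.invalidRows = invalidRows;
    }

    /** Assumed schema: exactly two comma-separated fields, an integer id and a non-blank name. */
    static boolean matchesSchema(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2 || fields[1].trim().isEmpty()) {
            return false;
        }
        try {
            Long.parseLong(fields[0].trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Reads the source line by line and keeps only the two running counters. */
    static StreamingRowCheck scan(Reader source) {
        long total = 0;
        long invalid = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                total++;
                if (!matchesSchema(line)) {
                    invalid++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StreamingRowCheck(total, invalid);
    }

    public static void main(String[] args) {
        StreamingRowCheck result = scan(new StringReader("1,alpha\nx,beta\n3,\n4,delta\n"));
        System.out.println(result.totalRows + " rows, " + result.invalidRows + " invalid");
    }
}
```

The same loop works for any Reader, so the source could be a file or a stream downloaded from object storage; memory use stays constant regardless of the number of rows.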
I have a table with millions of rows (about 98% reads, maybe 1-2% writes) which references a couple of other config tables (with maybe 20 entries each). What are the best practices for caching the tables in this case? I cannot cache the table with millions of rows, but at the same time I don't want to hit the DB for the config tables. Is there a workaround for this? I'm using Spring Boot, and the data is in Postgres.
Thanks.
First of all, let me refer to this:
What are the best practices for caching the tables in this case
I don't think you should "cache tables" as you say. In the application you work with data, and that is what should be cached. This means the object that you cache should already be in a structure that includes these relations. Of course, in order to fetch the whole object from the database you can use JOINs, but once the object is cached that no longer matters: the translation from the relational model to the object model has already been done.
Now the question is too broad because the actual answer can vary on the technologies you use, nature of data, and so forth.
You should answer the following questions before you design the cache (the list is off the top of my head, but hopefully you'll get the idea):
What is the cache invalidation strategy? You say there are 2% writes. When the data gets updated, the data in the cache may become stale. Is that OK?
A generalization of the previous question: if you have multiple instances (JVMs) of the same application and one of them triggers an update to the DB data, what should happen to the other instances' caches?
How long can stale or invalid data reside in the cache?
Do the use cases of your application access all the data in the tables with the same frequency, or is some data more "interesting" (for example, the oldest data is not read, but the latest data is always "hot")? Probably, if it's millions of rows, the JVM doesn't have all these objects in the heap at the same time, so there should be some "slice" of this data...
What are the performance implications of having the cache? How does it affect the GC behavior?
Which technologies can be used in your case? (Due to regulations or licensing, some technologies may simply not be available; this is more often the case in large organizations.)
Based on these observations you can go with:
In-memory cache:
Spring integrates with various in-memory cache technologies, and you can also use them without Spring at all. To name a few:
Google Guava cache (for older Spring cache implementations)
Caffeine (for newer Spring cache implementations)
In memory map of key / value
In memory but in another process:
Redis
Infinispan
Now, these caches are slower than those listed in the previous category but still can be significantly faster than the DB.
Data Grids:
Hazelcast
Off-heap memory-based caches (this means that you store the data off-heap, so it is not managed by the garbage collector)
Postgres-related solutions. For example, you can still go to the DB, but since you can opt for keeping the index in memory, the queries will be significantly faster.
ORM-specific caches (Hibernate, for example, has its own cache as well).
Some kind of mix of all above.
Implement your own solution - well, this is something that probably you shouldn't do as the first attempt to address the issue, because caching can be tricky.
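For the small config tables from the question, the "in-memory map of key / value" option can be sketched as follows. This is a minimal sketch, assuming the config rows are loaded by a caller-supplied function; `ConfigCache`, the loader and `invalidate` are hypothetical names, and in a Spring Boot application the same effect would more commonly come from Spring's cache abstraction than from hand-written code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache for a small lookup table. */
final class ConfigCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the real DB query

    ConfigCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Call after a write so the next read goes back to the loader. */
    void invalidate(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger dbHits = new AtomicInteger();
        ConfigCache<Integer, String> cache = new ConfigCache<>(id -> {
            dbHits.incrementAndGet(); // pretend this is a SELECT on the config table
            return "config-" + id;
        });
        cache.get(1);
        cache.get(1);
        cache.invalidate(1);
        cache.get(1);
        System.out.println("loader calls: " + dbHits.get());
    }
}
```

This sketch deliberately ignores the multi-instance question raised above: `invalidate` only clears the cache of the JVM it is called in.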
In the end, let me point you to a very interesting session about caching given by Michael Plod. I believe it will help you find the solution that works best for you.
I have an instance of Laravel up and running with a load balancer in place. We've set up memcached (two server nodes) to handle session management. So far the site is running fine in our test environment. The site largely ties into a web-based API, so we only store a few values (other than user authentication data) in a user's session to work with the site.
After a short amount of usage by one or two users, there are about 3000 items in the cache. I don't have full access to the nodes, so I don't know exactly what the items are. However we don't appear to be maxing out the nodes with memory and the application functionality is good.
Is this to be expected? I understand that the cache management will clear out old records over time as they expire, so these could just be "remnant" data records, but this is my first time working with memcached so I want to verify that this is normal behavior.
It's quite normal for any caching solution to rack up a number of items. Especially for lots of small objects it's often more efficient for a cache to keep them beyond their expiry (but no longer serve them) and then clear them out in a big sweep periodically.
"Remnant records" pretty much describes it.
As long as your application performs as expected, I wouldn't worry. You should worry when you get a lot of cache misses for objects that were supposed to be in cache but kicked out before expiry due to lack of memory to store them all.
Yes
It is normal to have lots of records in Memcache. But you need to have proper session management.
Store only a small number of values per session (data that most of the API calls require, such as the user access token).
Cache expiration
The biggest challenge when using Memcache is avoiding cache staleness while still writing clean code. Most developers store data to Memcache and delete or update data when it changes. This strategy can get messy very quickly – Memcache code becomes riddled throughout an application. Rails’ Sweepers can help with this problem, but other languages and frameworks don’t have similar alternatives.
One simple strategy to avoid code complexity is to write data to Memcache with an expiration. Data with an expiration will automatically expire when the expiration is reached. Most applications can benefit from time-based cache expiration with infrequently changing content such as static assets, headers, footers, blog posts, etc.
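The idea of writing with an expiration can be sketched in-process as follows. This is not memcached client code; it is a minimal Java illustration of time-based expiry, assuming a caller-supplied clock so that the behavior can be demonstrated without sleeping. `ExpiringCache` and its methods are hypothetical names. With memcached itself, the expiration time is passed along with the stored value and the server handles the rest.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical cache whose entries stop being served once their TTL has passed. */
final class ExpiringCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;

        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    ExpiringCache(LongSupplier clock) {
        this.clock = clock;
    }

    void put(K key, V value, long ttlMillis) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns null for missing or expired keys; expired entries are dropped lazily on read. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry);
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        ExpiringCache<String, String> cache = new ExpiringCache<>(() -> now[0]);
        cache.put("footer", "<footer>...</footer>", 1000);
        System.out.println(cache.get("footer") != null); // still fresh
        now[0] = 1000;
        System.out.println(cache.get("footer") != null); // TTL reached
    }
}
```

No code has to delete or update the entry when the underlying content changes; the cost is that readers may see data that is up to one TTL old.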
List management
A simple list stored in Memcache can be useful for maintaining denormalized relationships.
For example, an e-commerce website may want to store a small table of recent purchases. Rather than keeping a serialized list in Memcache and recalculating it when a new purchase is made, append and prepend can be used to store denormalized data, avoiding a database query.
Note: Memcache only supports a maximum value size of 1 MB. Be careful when creating lists that may grow larger than the maximum allowed value size.
Also check these links:
https://cloud.google.com/appengine/docs/adminconsole/memcache
http://docs.oracle.com/cd/E17952_01/refman-5.6-en/ha-memcached-faq.html
http://symas.com/mdb/memcache/
Anyone an idea?
The issue is: I am writing a high-performance application. It has a SQL database which I use for persistence. In-memory objects get updated, then the changes are queued for a disk write (which is pretty much always an insert into a versioned table). The small window of risk is accepted: in case of a crash, program code will resync local state with external systems.
Now, quite often I need to run lookups on certain values, and it would be nice to have a standard interface. Basically a bag of objects, but with the ability to run queries efficiently against an in-memory index. For example, I have a table of "instruments" which all have a unique code, and I need to look up this code... about 30,000 times per second, as I get updates for every instrument.
Does anyone know of a decent high-performance library for this?
You should be able to use an in-memory SQLite database (:memory:) with System.Data.SQLite.
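For the specific case of looking up an instrument by its unique code, a plain hash index is a simpler alternative to an embedded SQL engine. The sketch below shows that alternative, not the SQLite approach suggested above, and it is written in Java rather than .NET purely for illustration. `InstrumentIndex`, `Instrument` and the `lastPrice` field are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory index keyed by the unique instrument code. */
final class InstrumentIndex {
    static final class Instrument {
        final String code;
        volatile double lastPrice; // assumed field, updated on every tick

        Instrument(String code, double lastPrice) {
            this.code = code;
            this.lastPrice = lastPrice;
        }
    }

    private final Map<String, Instrument> byCode = new ConcurrentHashMap<>();

    void add(Instrument instrument) {
        byCode.put(instrument.code, instrument);
    }

    /** Average constant-time lookup by code; returns null if the code is unknown. */
    Instrument findByCode(String code) {
        return byCode.get(code);
    }

    /** Applies an update to a known instrument; returns false for unknown codes. */
    boolean applyUpdate(String code, double price) {
        Instrument instrument = byCode.get(code);
        if (instrument == null) {
            return false;
        }
        instrument.lastPrice = price;
        return true;
    }

    public static void main(String[] args) {
        InstrumentIndex index = new InstrumentIndex();
        index.add(new Instrument("ABC", 10.0));
        index.applyUpdate("ABC", 10.5);
        System.out.println(index.findByCode("ABC").lastPrice);
    }
}
```

A hash index only answers exact-match lookups on one key. Queries over several attributes or ranges would need additional indexes, which is where an in-memory SQL database becomes more attractive.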
I am working on a basic Struts-based application that is experiencing major spikes in memory. We have a monitoring tool that shows one request per user adding 3 MB to the JVM heap. Are there any tips to encourage earlier garbage collection, free up memory or improve performance?
The application is a basic Struts application but there are a lot of rows in the JSP report, so there may be a lot of objects created. But it isn't stuff you haven't seen before.
Perform a set of database queries.
Create a serializable POJO bean. This represents a row.
Add a row to an array list.
Set the array list to the form object when the action is invoked.
The JSP logic will iterate through the list from the ActionForm and the data is displayed to the user.
Notes:
1. The form is in session scope and possibly that array list of data (maybe this is an issue).
2. The POJO bean contains 20 or so fields, a mix of String or BigDecimal data.
The report can have 300 to 1200 or so rows. So there are at least that many objects created.
Given the information you provide, I'd estimate that you're typically loading 1 to 2 megabytes of data for a result: 750 rows * 20 fields * 100 bytes per field = 1,500,000 bytes, or about 1.4 MB. Now consider all of the temporary objects needed between the database and the final markup. 3 MB isn't surprising.
I'd only be concerned if that memory seems to have leaked; i.e., the next garbage collection of the young generation space doesn't collect all of those objects.
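The back-of-the-envelope estimate in this answer can be reproduced for the whole range of report sizes mentioned in the question. This is a rough sketch: the 100 bytes per field is the answer's own assumption, and the class and method names are made up for illustration.

```java
/** Hypothetical helper reproducing the rows x fields x bytes-per-field estimate. */
final class ReportMemoryEstimate {
    /** Rough payload size in bytes; ignores object headers and temporary objects. */
    static long estimateBytes(int rows, int fieldsPerRow, int bytesPerField) {
        return (long) rows * fieldsPerRow * bytesPerField;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {300, 750, 1200}) {
            long bytes = estimateBytes(rows, 20, 100);
            System.out.printf("%d rows -> %d bytes (%.2f MiB)%n",
                    rows, bytes, bytes / (1024.0 * 1024.0));
        }
    }
}
```

For 750 rows this gives 1,500,000 bytes, about 1.43 MiB; the 1200-row case gives 2,400,000 bytes, so the largest reports are already close to the observed 3 MB before counting any temporary objects.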
When designing reports to be rendered in a web application, consider the number of records fetched from the database.
If the number of records is high and the overall record set takes a lot of memory, consider paginating the report.
As far as possible, do not invoke the garbage collector explicitly, for two reasons:
Garbage collection is a costly process, as it scans the whole of the memory.
Most production servers are tuned at the JVM level to avoid explicit garbage collection.
I believe the problem is the ArrayList in the ActionForm, which needs a huge chunk of memory. I would write the query results directly to the response: read a row from the ResultSet, write it to the response, read the next row, write it, and so on. Maybe it's not MVC, but it would be better for your heap :-)
ActionForms are fine for CRUD operations, but for reports ... I don't think so.
Note: if the ActionForm has scope=session the instance will be alive (along with the huge arraylist) until session expires. If scope=request the instance will be available for the GC.
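The "read a row, write a row" approach can be sketched as follows. This is a minimal illustration, assuming the rows arrive through an Iterator (standing in for walking a JDBC ResultSet) and are written to a PrintWriter (standing in for the writer obtained from the servlet response). `StreamingReportWriter` is a hypothetical name, and the sketch does no HTML escaping.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper that renders rows one at a time instead of collecting them in a list. */
final class StreamingReportWriter {
    /** Writes each row as an HTML table row and returns the number of rows written. */
    static int writeRows(Iterator<String[]> rows, PrintWriter out) {
        int count = 0;
        while (rows.hasNext()) {
            String[] row = rows.next(); // only the current row is referenced
            out.print("<tr>");
            for (String cell : row) {
                out.print("<td>");
                out.print(cell); // NOTE: real code must escape HTML here
                out.print("</td>");
            }
            out.print("</tr>\n");
            count++;
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        Iterator<String[]> rows = Arrays.asList(
                new String[] {"ACME", "12.50"},
                new String[] {"Globex", "7.25"}).iterator();
        int written = writeRows(rows, new PrintWriter(buffer));
        System.out.println(written + " rows written");
        System.out.print(buffer);
    }
}
```

Because no row outlives its loop iteration, each row becomes eligible for collection as soon as it has been written, and nothing is left in the session afterwards.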