There is something I did not really understand with Amazon RDS (the PostgreSQL version). Some queries take a lot of time to return their results. I have set all the relevant indexes (as shown with EXPLAIN), so I think it's not due to my schema design.
I do not use a big machine (m3.xlarge), as bigger ones are too expensive. My database size is about 300GB.
It seems that Postgres does not use all the available memory (only ~5GB; the "Freeable memory" report in the console shows that there are always ~10GB freeable...). I tried to tune my "parameter group" as proposed in tune-your-postgres-rds-instance, in particular setting EFFECTIVE_CACHE_SIZE to 70%, but it did not change anything.
I'm probably wrong somewhere... Any ideas?
To make more memory available to your queries, you would tune work_mem.
There are implications to doing that, since that amount of memory can be allocated per backend (per sort or hash operation), not just once globally.
effective_cache_size doesn't actually allocate any memory at all. It's an optimizer parameter: a hint to the planner about how much data is likely to be cached.
"Freeable memory" is a good thing - it means that the memory is (most likely) currently being used for Postgres data sitting in the operating system cache.
You can increase shared_buffers to allow Postgres to use more of its own memory for caching, but there are limits to its effectiveness, which is why you don't usually want to dedicate more than about 25% of available memory to it.
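If it helps to see the arithmetic behind those rules of thumb, here is a rough sizing sketch in Python. The 15 GB of RAM for an m3.xlarge and the connection count are assumptions, the percentages are the conventional guidelines mentioned above (not hard rules), and on RDS the resulting values would be applied through the parameter group:

# Rough Postgres memory sizing sketch based on common rules of thumb.
# Assumptions: ~15 GB of RAM (m3.xlarge) and ~100 concurrent connections.
ram_gb = 15
max_connections = 100

shared_buffers_gb = ram_gb * 0.25          # Postgres' own buffer cache, ~25% of RAM
effective_cache_size_gb = ram_gb * 0.75    # planner hint: shared_buffers + OS page cache
# work_mem can be allocated per sort/hash per backend, so keep it conservative:
work_mem_mb = ram_gb * 1024 * 0.25 / max_connections

print(f"shared_buffers       ~ {shared_buffers_gb:.1f} GB")
print(f"effective_cache_size ~ {effective_cache_size_gb:.1f} GB")
print(f"work_mem             ~ {work_mem_mb:.0f} MB per sort/hash operation")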
Suppose you had a server with 24G RAM at your disposal, how much memory would you allocate to (Tomcat to run) eXist?
I'm setting up our new webserver, with an Intel Xeon E5649 (2.53GHz) processor, running Ubuntu 12.04 64-bit. eXist is running as a webapp inside Tomcat, and the db is only used for querying 'stable' collections --that is, no updates are being executed to the resources inside eXist.
I've been experimenting with different heap sizes (via -Xms and -Xmx settings when starting the Tomcat process), and so far haven't noticed much difference in response time for queries against eXist. In other words, it doesn't seem to matter much whether the JVM is allocated 4G or 16G. I have also upped the #cacheSize and #collectionCache in eXist's WEB-INF/conf.xml file to e.g. 8192M, but this doesn't seem to have much effect. I suppose these settings /do/ have an influence when eXist is running inside Tomcat?
I know each situation is different (and I know there's a Tomcat server involved), but are there some rules of thumb for eXist performance w.r.t. the memory it is allocated? I'd like to get at a sensible memory configuration for a setup with a larger amount of RAM available.
This question was asked and answered on the exist-open mailing list. The answer from wolfgang@exist-db.org was:
Giving more memory to eXist will not necessarily improve response times. "Bad"
queries may consume lots of RAM, but the better your queries are optimized, the
less RAM they need: most of the heavy processing will be done using index
lookups and the optimizer will try to reduce the size of the node sets to be
passed around. Caching memory thus has to be large enough to hold the most
relevant index pages. If this is already the case, increasing the caching space
will not improve performance anymore. On the other hand, a too small cacheSize
or collectionCache will result in a recognizable bottleneck. For example, a
batch upload of resources or creating a backup can take several hours (instead
of e.g. minutes) if #collectionCache is too small.
If most of your queries are optimized to use indexes, 8gb RAM for eXist does
usually give you enough room to handle the occasional high load. Ideally you
could run some load tests to see what the maximum memory use actually is. For
#cacheSize, I rarely have to go beyond 512m. The setting for #collectionCache
depends on the number of collections and documents in the database. If you have
tens or hundreds of thousands of collections, you may have to increase it up to
768m or more. As I said above, you will recognize a sudden breakdown in
performance during uploads or backups if the collectionCache becomes too small.
So to summarize, a reasonable setting for me would be: -Xmx8192m,
#cacheSize="512m", #collectionCache="768m". If you can afford giving 16G main
memory it certainly won’t hurt. Also, if you are using the lucene index or the
new range index, you should consider increasing the #buffer setting in the
corresponding index module configurations in conf.xml as well:
<module id="lucene-index" buffer="256" class="org.exist.indexing.lucene.LuceneIndex" />
<module id="range-index" buffer="256" class="org.exist.indexing.range.RangeIndex"/>
We would be using Virtuoso for storing RDF data; the triple count will be 100 million to start with. I need to know what the typical RAM, CPU, disk, etc. should be for this. Querying will be with SPARQL, and there will be some fairly complex queries.
Kindly provide your inputs.
The average size of a Virtuoso version 6.x triple (quad) is about 30 bytes, so for 100 million triples you would need about 3GB of RAM. This is the most critical component, as it enables the database working set to fit in memory; for best performance, data should not need to be loaded from disk once the database is "warmed up". This is especially the case when running complex queries. In terms of disk, the faster it is, the quicker the database can be loaded into memory, checkpoints performed, etc., so SSDs or similar devices are recommended where possible, especially if memory is limited and reading data from disk is at times unavoidable. In terms of processor, any standard commodity 64-bit processor available today would suffice, typically running on a Linux x86_64 system of your choice; as said, though, memory is always the most critical component.
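As a back-of-the-envelope sketch of that sizing (the 30 bytes per quad is the figure quoted above; the buffer rule of thumb of roughly two thirds of free RAM divided by the ~8 KB page size is taken from the performance tuning document linked below, so treat the exact numbers as assumptions and check them against your own virtuoso.ini):

# Back-of-the-envelope Virtuoso sizing sketch; assumptions noted above.
triples = 100_000_000
bytes_per_quad = 30                                     # average quad size quoted above
print(f"working set ~ {triples * bytes_per_quad / 1e9:.1f} GB")   # ~ 3.0 GB

ram_gb = 8                                              # assumed RAM on the host
number_of_buffers = int(ram_gb * 1e9 * 0.66 / 8000)     # virtuoso.ini NumberOfBuffers
max_dirty_buffers = int(number_of_buffers * 0.75)       # virtuoso.ini MaxDirtyBuffers
print(f"NumberOfBuffers ~ {number_of_buffers}, MaxDirtyBuffers ~ {max_dirty_buffers}")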
See the following Virtuoso FAQ and performance tuning documents for more details:
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/#FAQ
I fired up a Zilla instance of Heroku Postgres, which is advertised as having a 17GB memory cache.
When I run show all; I see:
effective_cache_size | 12240000kB
Does this mean the cache is 12GB and not 17GB? Or am I missing something? Queries run much slower when my dataset goes above the 12GB point.
There is a limit on the available memory on the underlying hardware (17G for a zilla). This amount of memory cannot be used entirely for the "hot dataset" cache, however. Many other aspects of normal postgres operations also require memory, as you may imagine. Some of that includes establishing a connection (which spawns a backend), queries requiring joins, queries requiring sorts, or aggregates like count, sum, max, etc. Additionally, processes such as autovacuum also use part of that available memory.
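For the 12GB-vs-17GB part of the question, the arithmetic is just a unit conversion plus the usual practice of setting effective_cache_size to a fraction of physical memory rather than all of it; a quick sketch (and keep in mind that effective_cache_size is only a planner estimate, not an allocation):

# Convert the reported setting and compare it with the advertised 17 GB.
setting_kb = 12_240_000                      # effective_cache_size | 12240000kB
gib = setting_kb * 1024 / 1024**3            # Postgres "kB" means 1024 bytes
print(f"effective_cache_size ~ {gib:.1f} GiB")        # ~ 11.7 GiB
print(f"fraction of 17 GB RAM ~ {gib / 17:.0%}")      # ~ 69%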
For example, I have a database with 20 GB of data and only 2 GB of RAM, and swap is off. Will I be able to find and insert data? How bad would performance be?
It's best to Google this, but many sources say that when your working set outgrows your RAM size, performance will drop significantly.
Sharding might be an interesting option, rather than adding more RAM.
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
http://highscalability.com/blog/2011/9/13/must-see-5-steps-to-scaling-mongodb-or-any-db-in-8-minutes.html
http://blog.boxedice.com/2010/12/13/mongodb-monitoring-keep-in-it-ram/
http://groups.google.com/group/mongodb-user/browse_thread/thread/37f80ff39258e6f4
Can MongoDB work when the size of the database is larger than RAM?
What does it mean to fit "working set" into RAM for MongoDB?
You might also want to read up on the Foursquare outage last year:
http://highscalability.com/blog/2010/10/15/troubles-with-sharding-what-can-we-learn-from-the-foursquare.html
http://groups.google.com/group/mongodb-user/browse_thread/thread/528a94f287e9d77e
http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
side-note:
you said "swap is off" ... ? why? You should always have a sufficient swap space on a UNIX system! Swap-size = 1...2-times RAM size is a good idea. Using a fast partition is a good idea. Really bad things happen if your UNIX system runs out of RAM and doesn't have Swap .. processes just die inexplicably.. that is a bad very thing! especially in production. Disk is cheap! add a generous swap partition! :-)
It really depends on the size of your working set.
MongoDB can handle a very large database and still be very fast if your working set is less than your RAM size.
The working set is the set of documents and indexes you are working with at any one time.
Here is a link which might help you understand this : http://www.colinhowe.co.uk/2011/02/23/mongodb-performance-for-data-bigger-than-memor/
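If you want to check where you stand, a rough comparison of data and index sizes against RAM can be done with the standard dbstats and serverStatus commands (sketched here with pymongo; the database name and connection string are placeholders, and the exact fields reported vary by MongoDB version):

# Rough working-set check: compare data/index sizes against memory in use.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # placeholder connection string
db = client["mydb"]                                 # placeholder database name

stats = db.command("dbstats")
server = client.admin.command("serverStatus")

print("data size  (GB):", round(stats["dataSize"] / 1024**3, 2))
print("index size (GB):", round(stats["indexSize"] / 1024**3, 2))
print("resident   (GB):", round(server["mem"]["resident"] / 1024, 2))   # reported in MB

# Indexes plus the documents you touch regularly should ideally fit in RAM;
# with 2 GB of RAM and 20 GB of data, only a small hot subset can stay cached.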
I'm developing a client/server application where the server holds large pieces of data, such as big images or video files, which are requested by the client, and I need to create an in-memory caching system on the client to hold a few of those large items and speed up the process. Just to be clear, each individual image or video is not that big, but the overall size of all of them can be really big.
But I'm faced with the "how much data should I cache" problem and was wondering if there are some kind of golden rules on Windows about what strategy I should adopt. The caching is done on the client, I do not need caching on the server.
Should I stay under x% of global memory usage at all times? And how much would that be? What will happen if another program is launched and takes up a lot of memory? Should I empty the cache?
Should I check how much free memory is available prior to caching and use a fixed percentage of that memory for my needs?
I hope I do not have to go there, but should I ask the user how much memory he is willing to allocate to my application? If so, how do I calculate a sensible default for that setting, which is also what users who never touch it will get?
Rather than creating your own caching algorithm, why don't you write the data to a file with the FILE_ATTRIBUTE_TEMPORARY attribute and make use of the client machine's own file cache?
Although this approach appears to imply that you use a file, if there is memory available in the system then the file will never leave the cache and will remain in memory the whole time.
Some advantages:
You don't need to write any code.
The system cache takes account of all the other processes running. It would not be practical for you to take that on yourself.
On 64 bit Windows the system can use all the memory available to it for the cache. In a 32 bit Delphi process you are limited to the 32 bit address space.
Even if your cache is full and your files do get flushed to disk, local disk access is much faster than querying the database and then transmitting the files over the network.
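A minimal sketch of the idea, shown here in Python via ctypes purely to illustrate the flags (in Delphi you would pass the same values to CreateFile); the path, the optional delete-on-close flag, and the lack of error handling are all simplifications:

# Windows-only sketch: create a cache file with FILE_ATTRIBUTE_TEMPORARY so the
# system cache manager favours keeping its contents in memory.
import ctypes
from ctypes import wintypes

GENERIC_READ  = 0x80000000
GENERIC_WRITE = 0x40000000
CREATE_ALWAYS = 2
FILE_ATTRIBUTE_TEMPORARY  = 0x00000100    # hint: keep the data in the system cache
FILE_FLAG_DELETE_ON_CLOSE = 0x04000000    # optional: remove the file when the handle closes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.restype = wintypes.HANDLE
kernel32.CreateFileW.argtypes = [
    wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, wintypes.LPVOID,
    wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE,
]

handle = kernel32.CreateFileW(
    r"C:\Temp\cached_image.tmp",          # hypothetical cache file path
    GENERIC_READ | GENERIC_WRITE,
    0, None, CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
    None)

data = b"...image or video bytes..."
written = wintypes.DWORD(0)
kernel32.WriteFile(handle, data, len(data), ctypes.byref(written), None)
# ...later, SetFilePointer/ReadFile to serve the cached data...
kernel32.CloseHandle(handle)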
It depends on what other software runs on the server. I would make it possible to configure it manually at first. Develop a system that can use a specific amount of memory. If you can, build it so that you can change that value while it is running.
If you have those options, you can do some tweaking to see what works best. I don't know any golden rules, but I figure you should be able to set a percentage of total (or total available) memory, with a specific minimum amount of memory kept free for the system at all times. If you reserve a minimum of, say, 500 MB for the server OS, you can use the rest, or 90% of the rest, for your cache. But those numbers depend on the version of the OS and the other applications running on the server.
I think it's best to make the numbers configurable from the outside and create a management tool that lets you set the values manually first. Then, if you found out what works best, you can deduct formulas to calculate those values, and integrate them in your management tool. This tool should not be an integral part of the cache program itself (which will probably be a service without GUI anyway).
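As a sketch of that kind of rule (the 500 MB reserve and the 90% figure are just the example numbers above, and psutil is one convenient way to read the machine's memory figures; the cache itself is left out):

# Sketch: derive a cache budget from total RAM minus a reserve for the OS.
import psutil

def cache_budget_bytes(reserve_mb: int = 500, fraction: float = 0.9) -> int:
    total = psutil.virtual_memory().total
    usable = total - reserve_mb * 1024 * 1024   # keep a minimum free for the system
    return max(0, int(usable * fraction))

print(f"cache budget ~ {cache_budget_bytes() / 1024**3:.1f} GB")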
Questions:
Can one image be requested by multiple clients? Or can one image be requested multiple times within a short interval?
How short is the interval?
Is the network speed really high? Higher than the speed of the hard drive? If you have a normal network, then the hard drive will be able to read the files from disk and deliver them over the network in real time, especially since Windows already does some good caching, so the most recent files are already in the cache.
Is the main purpose of the computer running the server app to run the server, or is it just a normal computer also used for other tasks? In other words, is it a dedicated server or a normal workstation/desktop?
but should I ask the user how much memory he is willing to allocate to my application?
I would definitely go there!
If the user thinks the server application is not an important application, they will probably give it low priority (a small cache). If, on the other hand, they think it is the most important running app, they will allow it to allocate all the RAM it needs at the expense of other, less important applications.
Just deliver the application with that setting set by default to an acceptable value (something like x% of the total amount of RAM). I would use around 70% of total RAM if the main purpose of the computer is to host this server application, and about 40-50% if its purpose is to be a 'general use' computer.
A server application usually needs resources set aside for its own use by its administrator. I would not worry about other applications' behaviour, but I would care about being a "polite" application, so the memory cache size and so on should be configurable by the administrator, who is the only one who knows how to configure his systems properly (usually...).
Default values should in any case take into consideration how much memory is available overall, especially on 32-bit systems with less than 4GB of memory (as long as Delphi delivers only 32-bit apps), to leave something free for the operating system and avoid too-frequent swapping. Asking the user to select it at setup is also advisable.
If the application is the only one running on a server, a value between 40% and 75% of available memory could be OK (depending on how much memory is needed beyond the cache), but again, ask the user, because it's almost impossible to know what other running applications may need. You can also have a minimum and a maximum cache size: start by allocating the lower value, then grow it when and if needed, and shrink it if necessary.
On a 32 bit system this is a kind of memory usage that could benefit from using PAE/AWE to access more than 3GB of memory.
Update: you can also monitor cache hits/misses and work out which cache size would best fit the user's needs (it could be too small, but too large as well), and then advise the user about that.
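A minimal sketch of that kind of hit/miss bookkeeping (a size-bounded LRU with counters; the class and its size limit are illustrative, not part of any existing library):

# Sketch: size-bounded LRU cache that tracks hits and misses so the cache size
# can later be tuned, grown, or shrunk as suggested above.
from collections import OrderedDict

class MeasuredCache:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.items = OrderedDict()            # key -> bytes, in recency order
        self.size = self.hits = self.misses = 0

    def get(self, key):
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)       # mark as most recently used
            return self.items[key]
        self.misses += 1
        return None                           # caller fetches from the server

    def put(self, key, data):
        if key in self.items:                 # replacing an entry: adjust the size
            self.size -= len(self.items[key])
        self.items[key] = data
        self.items.move_to_end(key)
        self.size += len(data)
        while self.size > self.max_bytes and len(self.items) > 1:
            _, evicted = self.items.popitem(last=False)   # evict least recently used
            self.size -= len(evicted)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0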
To be honest, the questions you ask would not be my main concern. I would be more concerned with how effective my cache would be. If your files are really that big, how many can you hold in the cache? And if your client/server app has many users, what are the chances that your cache will actually hold something someone will use again?
It might be worth doing an analysis before you burn too much time on the fine details.