Need to load the whole PostgreSQL database into RAM - performance

How do I put my whole PostgreSQL database into RAM for faster access? I have 8GB of memory and I want to dedicate 2GB to the DB. I have read about the shared_buffers setting, but it only caches the most frequently accessed fragments of the database. I need a solution where the whole DB is put into RAM, any read happens from the RAM copy, and any write operation first writes to the RAM copy and then to the DB on the hard drive (something like the default fsync = on combined with shared_buffers in the PostgreSQL configuration settings).

I have asked myself the same question for a while. One of the disadvantages of PostgreSQL is that it does not seem to support an IN MEMORY storage engine the way MySQL does...
Anyway, I ran into an article a couple of weeks ago describing how this could be done, although it only seems to work on Linux. I can't really vouch for it since I have not tried it myself, but it does make sense, because a PostgreSQL tablespace is indeed assigned to a location on a mounted filesystem.
However, even with this approach, I am not sure you could put your index(es) into RAM as well; I do not think MySQL forces HASH indexes on its IN MEMORY tables for nothing...
I also wanted to do something similar to improve performance, since I am also working with huge data sets. I am using Python; it has dictionary data types, which are basically hash tables of {key: value} pairs. Using these is very efficient and effective. Basically, to get my PostgreSQL table into RAM, I load it into such a Python dictionary, work with it, and persist it back to the DB once in a while; it's worth it if used well.
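As a rough sketch of that load-work-persist pattern using psycopg2 (the table and column names here are made up for the example):

import psycopg2

# Hypothetical table "items(item_id, payload)" -- adjust to your own schema.
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()

# Load the whole table into a dictionary keyed by its primary key.
cur.execute("SELECT item_id, payload FROM items")
cache = {item_id: payload for item_id, payload in cur.fetchall()}

# Work against the in-memory dictionary...
cache[42] = "new value"

# ...and persist the changes back to PostgreSQL once in a while.
for item_id, payload in cache.items():
    cur.execute("UPDATE items SET payload = %s WHERE item_id = %s", (payload, item_id))
conn.commit()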
If you are not using Python, I am pretty sure there is a similar dictionary-mapping data structure in your language.
Hope this helps!

If you are pulling data by id, use memcached (http://www.danga.com/memcached/) together with PostgreSQL.
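A hedged sketch of that cache-aside pattern in Python, assuming pymemcache and psycopg2 (the key prefix, table name, and expiry are invented for the example):

import json
import psycopg2
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))
pg = psycopg2.connect("dbname=mydb user=myuser")

def get_record(record_id):
    # Try memcached first; fall back to PostgreSQL on a miss.
    key = "record:%d" % record_id
    cached = mc.get(key)
    if cached is not None:
        return json.loads(cached)
    cur = pg.cursor()
    cur.execute("SELECT id, payload FROM records WHERE id = %s", (record_id,))
    row = cur.fetchone()
    if row is None:
        return None
    record = {"id": row[0], "payload": row[1]}
    mc.set(key, json.dumps(record), expire=300)  # cache for 5 minutes
    return record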

Set up an old-fashioned RAMdisk and tell pg to store its data there.
Be sure you back it up well though.

Perhaps something like a Tangosol Coherence cache if you're using Java.

With only an 8GB database, if you've already optimized all the SQL activity and you're ready to solve query problems with hardware, I suggest you're in trouble. This is just not a scalable solution in the long term. Are you sure there is nothing you can do to make substantial differences on the software and database design side?

I haven't tried this myself (yet) but:
There is an official Docker image available for Postgres - https://hub.docker.com/_/postgres/
Docker supports tmpfs mounts that live entirely in memory - https://docs.docker.com/storage/tmpfs/
Theoretically, it should be possible to combine the two.
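I haven't verified this end to end either, but a sketch with the Docker SDK for Python might look like the following (the password and the 2 GB tmpfs size are placeholders; /var/lib/postgresql/data is the image's default data directory):

import docker

client = docker.from_env()

# Run the official postgres image with its data directory on an in-memory tmpfs.
# Everything in the tmpfs is lost when the container stops.
container = client.containers.run(
    "postgres",
    detach=True,
    environment={"POSTGRES_PASSWORD": "secret"},  # placeholder password
    tmpfs={"/var/lib/postgresql/data": "rw,size=2g"},
    ports={"5432/tcp": 5432},
)
print(container.id)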
If you do this, you might also want to tweak seq_page_cost and random_page_cost to reflect the relative storage costs. See https://www.postgresql.org/docs/current/runtime-config-query.html
The pre-existing advice about query optimization and increasing shared_buffers still stands, though. Chances are that if you're having these problems on a database this small, simply putting it into RAM isn't the right fix.

One solution is to use the Fujitsu version of PostgreSQL, which supports in-memory columnstore indexes...
https://www.postgresql.fastware.com/in-memory-columnar-index-brochure
But it costs a lot...
Or run MS SQL Server with the In-Memory OLTP (in-memory tables) feature... Even the free Express edition has it!

Related

How to run statistical analyses on Oracle server using SAS

In order to take advantage of an Oracle server's vastly larger disk space and RAM, is it possible to run a SAS procedure (e.g., proc glimmix or proc nlmixed) on a dataset stored on the server using the ODBC interface?
Or am I limited to extracting datasets to my PC via ODBC and not actually manipulating or analyzing the data with SAS while the data resides on the server?
At the end of the day, some of the work will have to be done by SAS on your PC, assuming you're doing anything complicated (like GLIMMIX would be). SAS (in particular 9.3 or newer) is pretty smart about making the database do as much work as possible; for example, even some PROC MEANS may execute fully on the database side.
However, this is only true to the extent that the procedure can be translated into database functionality without extraordinary measures. SAS isn't likely to perform a regression on the database side, since that's not native Oracle. The data has to make its way across the (likely limited) bandwidth, to some extent.
You can certainly do a lot to limit what you have to do in SAS. Any presummarization can be done in Oracle; any other data prep work prior to the actual PROC GLIMMIX can likely be done in Oracle. You can certainly give it a shot by simply using libname connections and doing something like
proc glimmix data=oracle.table ... options ... ;
run;
and seeing what happens - maybe it'll surprise you, or even me, in how much it handles in-database. It might bring it over locally, it might not.
You may want to consider asking a question with a simplified version of what you're doing, including example data, and simply asking if anyone has any ideas for improving performance. There's a lot of tweaking that can be done, and perhaps some of us here can help.

MongoDB: force in-memory

After using MyISAM for years, with 3 indexes and around 500 columns over millions of rows, I wonder how to "force" MongoDB to keep indexes in memory for fast read performance.
In general, it is a simply structured table, and all queries are WHERE index1=.., index2=.., or index3=.. (MyISAM) and pretty simple in MongoDB as well.
It would be nice if MongoDB managed the indexes and RAM on its own.
However, I am not sure whether it does, or how MongoDB can best speed up these index-only queries.
Thanks
It would be nice if MongoDB managed the indexes and RAM on its own.
MongoDB does not manage the RAM at all. It uses memory-mapped files and basically "pretends" that everything is in RAM all of the time.
Instead, the operating system is responsible for managing which pages are kept in RAM, typically on an LRU basis.
You may want to check the sizes of your indexes. If you cannot keep all of those indexes in RAM, then MongoDB will likely perform poorly.
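For example, with pymongo you can compare a collection's total index size against available RAM (the database and collection names are placeholders):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]

# collStats reports sizes in bytes; compare totalIndexSize to your available RAM.
stats = db.command("collStats", "mycollection")
print(stats["totalIndexSize"], "bytes of indexes in total")
print(stats["indexSizes"])  # per-index breakdown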
However, I am not sure whether it does, or how MongoDB can best speed up these index-only queries.
MongoDB can use covered indexes to answer a query directly from the index, without touching the documents. However, you have to be very specific about the fields returned. If you include fields that are not part of the index, then it will not perform "index-only" queries.
The default behavior is to include all fields, so you will need to look at the specific queries and make the appropriate changes to allow "index-only". Note that these queries do not include the _id, which may cause issues down the line.
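As a hedged example with pymongo (the field names mirror the question and are placeholders), a query can only be covered if the projection returns just indexed fields and excludes _id:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["mydb"]["mycollection"]

# Single-field index on index1 (a placeholder field name from the question).
coll.create_index("index1")

# Project only the indexed field and exclude _id so the query can be covered.
for doc in coll.find({"index1": 42}, {"index1": 1, "_id": 0}):
    print(doc)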
You don't need to "force" Mongo to store indexes in memory. An index is brought into memory when you use it and then stays in memory until the OS evicts it.
MongoDB will automatically use a covered index when it can.

Postgres Hstore vs. Redis - performance wise

I read about hstore in Postgres, something similar to what Redis offers as well.
Our application is written in NodeJS. Two questions:
Performance-wise, is Postgres HStore comparable to Redis?
For session storage, what would you recommend - Redis, or Postgres with some other kind of data type (like hstore, or maybe even a usual relational table)? And how bad is one option vs. the other?
Another constraint, is that we will need to use the data that is already in PostgreSQL and combine it with the active sessions (which we aren't sure where to store at this point, if in Redis or PostgreSQL).
From what we have read, we have been pointed to Redis as a session manager, but due to the PostgreSQL constraint, we are not sure how to combine both, or what performance issues may arise.
Thanks!
Redis will be faster than Postgres because Pg offers reliability guarantees on your data (when the transaction is committed, it is guaranteed to be on disk), whereas Redis persists asynchronously (periodic snapshots, or an append-only file that is fsynced at intervals), so it shouldn't be the only store for critical data.
Redis seems like a good option for your session data, or heck even store in a cookie or in your client side Javascript. But if you need data from your database on every request then it might not be even worth involving Redis. It very much depends on your application.
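The app in the question is Node.js, but as a rough sketch of the session pattern in Python with redis-py (the key prefix and TTL are arbitrary choices):

import json
import uuid
import redis

r = redis.Redis(host="localhost", port=6379)

def create_session(user_id, ttl_seconds=3600):
    # Store the session as JSON under a namespaced key that expires on its own.
    session_id = uuid.uuid4().hex
    r.setex("session:" + session_id, ttl_seconds, json.dumps({"user_id": user_id}))
    return session_id

def get_session(session_id):
    data = r.get("session:" + session_id)
    return json.loads(data) if data is not None else None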
Using PostgreSQL as a session manager is usually a bad idea.
For versions older than 9.1 there was a physical limit on transactions per second, dictated by the persistent storage. For session management you usually don't need MVCC (there are no collisions), so MVCC is pure overhead, and databases without MVCC and full ACID guarantees can be significantly faster (10x or 100x).
I know of a use case where PostgreSQL was used for session management and performance was really terrible and unstable: an e-shop with about 10,000 live sessions. When session management was moved to memcached, performance and stability increased significantly. PostgreSQL can probably handle 100 live sessions without problems; for higher numbers there are better tools.

Need: In memory object database, transactional safety, indices, LINQ, no persistence

Anyone an idea?
The issue is: I am writing a high-performance application. It has a SQL database which I use for persistence. In-memory objects get updated, then the changes are queued for a disk write (which is pretty much always an insert into a versioned table). The small window of risk is accepted - in case of a crash, program code will resync local state with external systems.
Now, quite often I need to run lookups on certain values, and it would be nice to have a standard interface. Basically a bag of objects, but with the ability to run queries efficiently against an in-memory index. For example, I have a table of "instruments" which all have a unique code, and I need to look up this code... about 30,000 times per second, as I get updates for every instrument.
Anyone an idea for a decent high performance library for this?
You should be able to use an in-memory SQLite database (:memory:) with System.Data.SQLite.
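To illustrate the :memory: idea (shown here with Python's built-in sqlite3 module rather than System.Data.SQLite; the table layout is invented), a unique code column gives you an index for fast lookups:

import sqlite3

# ":memory:" keeps the entire database in RAM for this connection only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instruments (code TEXT PRIMARY KEY, price REAL)")

conn.executemany(
    "INSERT INTO instruments (code, price) VALUES (?, ?)",
    [("ABC", 101.5), ("XYZ", 99.0)],
)

# The primary key is backed by an index, so lookups by code stay fast.
row = conn.execute(
    "SELECT price FROM instruments WHERE code = ?", ("ABC",)
).fetchone()
print(row[0])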

Database with low memory requirements and Ruby interface

I need a database with low memory requirements for a small virtual server with little memory. At the moment I'm stuck with SQLite and Kyoto Cabinet or Tokyo Cabinet. The database should have a Ruby interface.
Ideally I want to avoid key-value-stores, because I have “complex” queries (more complex than looking up a single key) and tuples as keys. On the other hand I don't want to have a fixed schema and avoid the planning and migration efforts of a SQL database. A database server is also not necessary because only a single application will use the database.
Do you have any recommendations and numbers for me?
There is schema-less PostgreSQL (PostgreSQL 9.2 + JSON). It's not as hard or confusing to set up as I thought. You get lots of flexibility with queries while still getting the benefits of schema-less storage. PG 9.2 includes plv8js, a language handler that allows you to create functions in JavaScript. Here is one example of how you can index and query JSON docs in PG 9.2: http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html
CouchDB (Use BigCouch. Based on CouchDB, but fewer bugs/problems.):
very low memory requirements.
schema-less.
HTTP-based interface. Ruby has plenty of HTTP clients. HTTP caching (like Varnish) can also speed reads.
creative/complex queries. You can create indexes and queries on any key in the document (record). You can get very creative with queries since the indexes are very programmable (see the sketch after this answer).
Downsides:
Learning curve of setting up your queries/indexes.
You have to schedule a type of cleanup operation called "compaction".
Data will take up more space compared to other databases.
More: http://www.paperplanes.de/2010/7/26/10_annoying_things_about_couchdb.html
If disk is cheap and memory expensive, it would make a good candidate for your needs.
"...another strength of CouchDB, which has proven to serve thousands of concurrent requests only needed about 10MB of RAM - how awesome is that?!?!" (From: http://www.larsgeorge.com/2009/03/hbase-vs-couchdb-in-berlin.html )
SQLite3 is a great fit for what you are trying to do. It's used by a lot of companies as their embedded app database because it's flexible, fast, well tested, and has a small footprint. It's easy to create and blow away tables so it plays well with testing or single-application-use data stores.
The SQL language it uses is rich enough to do normal things but I'd recommend using Sequel with it. It's a great ORM and easily lets you treat it as a full-blown ORM, or drop all the way down to talking raw SQL to the DBM.
You are probably looking for a solution that only needs a database file and no running server. In that case, SQLite should be a good choice - if you don't need it, just close the connection and that's it. SQLite has everything you need from an RDBMS (except for enforcing foreign keys directly, but that can be done with triggers, or with PRAGMA foreign_keys = ON in newer versions), with a very small memory footprint, so you are probably more worried about the memory your ORM (if any) uses.
Personally, I use SQLite for that use case as well, as it is portable and easy to access and install (which shouldn't be a problem on a server anyway, but in a desktop application it is).
BerkeleyDB with SQLite API is what you need.
http://www.oracle.com/technetwork/database/berkeleydb/overview/sql-160887.html
