HBase slower in RAMdisk - hadoop

I have a general question about using Apache HBase with a RAMdisk.
There is a big collection of data in a single table, about 25GB in total.
With this data I am doing some basic aggregations, using a Java program.
As I have enough RAM avaiable I tried to put this data set into a RAMdisk using tmpfs:
mount -t tmpfs -o size=40G none /home/user/ramdisk
Then I stopped HBase, copied the content of the data folder into the RAMdisk.
Finally I created a symbolic link, linking the old data directory to the new one and started HBase again.
It works, but when I process the aggregations now, It became slightly slower than before.
I could image of not having that much impact of using a RAMdisk, if HBase compresses the data (Snappy-compression is activated) and so on... but I can't guess why a faster medium would lead to a slower access of the data. There is enough available RAM left such that this cannot be the bottleneck.
Maybe someone has a general idea or insight about this?

I think it's going to be one of two things:
A: Do you really have more than 40G of free ram before allocating the disk ? I'm impressed & all if you actually had that much free, but seeing ram free afterwards isn't an indicator that you didn't just use a big chunk of swap.
B: compression (even something fast like snappy) is going to hurt performance... particularly for something like a database engine that has a lot of wacky optimization in it. You're right that a ramdisk should be ludicrously faster, but it having to jump all over your database queries, and then having to jump all over the compressed image to decompress chunks, has to have a pretty big overhead.

Related

Clojure Time Series Analysis

I have a large data set (200GB uncompressed, 9GB compressed in bz2 -9 ) of stock tick data.
I want to run some basic time series analysis on them.
My machine has 16GB of RAM.
I would prefer to:
keep all data, compressed, in memory
decompress that data on the fly, and stream it [so nothing ever hits disk]
do all analysis in memory
Now, I think there's nice interactions here with Clojure's laziness, and future objects (i.e. I can define objects s.t. when I try to access them, I'll decompress them on the fly.)
Question: what are the things I should keep in mind when doing high performance time series analysis in Clojure?
I'm particular interested in tricks involving:
efficiently storing tick data in memory
efficiently doing computation
weird convolutions to reduce # of passes over the data
Books / articles / research paper suggestions welcome. (I'm a CS PhD student).
Thanks.
Some ideas:
In terms of storing the compressed data, I don't think you will be able to do much better than your OS's own file system caching. Just make sure it s configured to use 11GB+ of RAM for file system caching and it should pull your whole compressed data set into memory as it is read the first time.
You should then be able to define your Clojure code to pull into the data lazily via a ZipInputStream, which will perform the decompression for you.
If you need to perform a second pass on the data, just create a new ZipInputStream on the same file. OS level caching should ensure that you don't hit the disk again.
I have heard of systems like that implemented in Java. It is possible. You'll certainly want to understand how to create your own lazy sequences in order to accomplish this. I also wouldn't hesitate to drop down into Java if you need to make sure that you're dealing with the primitive types that you want to deal with. e.g. Clojure won't generate code to do math on 32-bit ints, it will only generate code to work with longs, and if you don't want that it could be a pain.
It would also be worth some effort to make your in-memory format compatible with a disk format. That would give you the option of memory mapping files, or (at the very least) make your startup easy if your program were to crash. e.g. It could just read the files on disk to recover its previous state.

What happens when mongodb is out of memory?

For example i have db with 20 GB of data and only 2 GB ram,swap is off. Will i be able to find and insert data? How bad perfomance would be?
it's best to google this, but many sources say that when your working set outgrows your RAM size the performance will drop significantly.
Sharding might be an interesting option, rather than adding more RAM..
http://www.mongodb.org/display/DOCS/Checking+Server+Memory+Usage
http://highscalability.com/blog/2011/9/13/must-see-5-steps-to-scaling-mongodb-or-any-db-in-8-minutes.html
http://blog.boxedice.com/2010/12/13/mongodb-monitoring-keep-in-it-ram/
http://groups.google.com/group/mongodb-user/browse_thread/thread/37f80ff39258e6f4
Can MongoDB work when size of database larger then RAM?
What does it mean to fit "working set" into RAM for MongoDB?
You might also want to read-up on the 4square outage last year:
http://highscalability.com/blog/2010/10/15/troubles-with-sharding-what-can-we-learn-from-the-foursquare.html
http://groups.google.com/group/mongodb-user/browse_thread/thread/528a94f287e9d77e
http://blog.foursquare.com/2010/10/05/so-that-was-a-bummer/
side-note:
you said "swap is off" ... ? why? You should always have a sufficient swap space on a UNIX system! Swap-size = 1...2-times RAM size is a good idea. Using a fast partition is a good idea. Really bad things happen if your UNIX system runs out of RAM and doesn't have Swap .. processes just die inexplicably.. that is a bad very thing! especially in production. Disk is cheap! add a generous swap partition! :-)
It really depends on the size of your working set.
MongoDB can handle a very large database and still be very fast if your working set is less than your RAM size.
The working set is the set of documents you are working on a time and indexes.
Here is a link which might help you understand this : http://www.colinhowe.co.uk/2011/02/23/mongodb-performance-for-data-bigger-than-memor/

Can I use tmpfs in this way?

I have plenty of RAM on my development machine.
Can I move my Rails' application sources to the tmpfs partition in order to gain performance boost because of in-memory storage which is order of magnitude faster than HDD?
I understand that tmpfs is temporary storage by its nature BUT can I use it for this task if I'll write some script to move sources from HDD partition to tmpfs and backup it back to HDD before reboot?
Is it sane?
It is sane to use a RAMdisk to speed up access to read only resources.
It is quite dangerous to use this approach for wriable resources because if you lose power during operation you are guaranteed to lose the data. If you don't mind losing data or you implement some form of cache mechanism so that data "saved" to the RAMdisk is copied back to hard disk pretty soon after it is written, then this approach is ok with read/write data.
However, check out the hardware and OS you're running under as well. If you have SSD disks or disks and I/O subsystems with large caches, you may find that performance isn't that bad in the first place. On the OS front, (for example) Windows Vista & 7 will use any spare RAM for disk caching, and this works very effectively, which means you will see little or no performance gain using a RAMdisk.
A RAM disk or cache also only works if you have enough RAM. If there isn't enough RAM in your PC, you'll end up using VM and performance will get worse, not better.
You can quickly try doing this manually to see what sort of performance change you achieve, and then make up your mind if the gain is worth the pain (of copying data from/to the HDD and the extra risks involved).
Yes, as long as you don't mind losing the data if your machine reboots unexpectedly (e.g. power loss). I don't know what your use case is, but there are cases where the need for performance exceeds the need for securely saving every data persistently e.g. if you don't mind a few hours' worth of data loss). If your use case falls into that category, then tmpfs is a perfectly fine solution.
You can use it this way, but it doesn't make much sense:
If you have enough RAM, then the files will be in the filesystem cache (i.e. in RAM) anyway. So, you don't win anything by using tmpfs, but you also don't lose anything.
If you don't have enough RAM, the tmpfs will be flushed out to swap. Now, your Rails sources eat up precious swap space despite the fact there is already a copy on disk in the filesystem. So, you lose swap space and you don't win anything on performance (whether the files are read back in from swap or the filesystem is equally expensive).
If you don't want to take that first time hit until all the files are in the cache, you could put something like this in your development environment startup scripts:
find /usr/lib/ruby/gems/1.9.1/{rails,action,active}* -exec cat '{}' + > /dev/null
Which will read all the Rails files and echo them into /dev/null and as a side effect pull them into the cache. (Do this while getting your coding coffee.)

On-demand paging to allow analysis of large amounts of data

I am working on an analysis tool that reads output from a process and continuously converts this to an internal format. After the "logging phase" is complete, analysis is done on the data. The data is all held in memory.
However, due to the fact that all logged information is held in memory, there is a limit on the duration of the logging. For most use cases this is ok, but it should be possible to run for longer, even if this will hurt performance.
Ideally, the program should be able to start using hard drive space in addition to RAM once the RAM usage reaches a certain limit.
This leads to my question:
Are there any existing solutions for doing this? It has to work on both Unix and Windows.
To use the disk after memory is full, we use Cache technologies such as EhCache. They can be configured with the amount of memory to use, and to overflow to disk.
But they also have smarter algorithms you can configure as needed, such as sending to disk data not used in the last 10 minutes etc... This could be a plus for you.
Without knowing more about your application it is not possible to provide a perfect answer. However it does sound a bit like you are re-inventing the wheel. Have you considered using an in-process database library like sqlite?
If you used that or similar it will take care of moving the data to and from the disk and memory and give you powerful SQL query capabilities at the same time. Even if your logging data is in a custom format if each item has a key or index of some kind a small light database may be a good fit.
This might seem too obvious, but what about memory mapped files? This does what you want and even allows a 32 bit application to use much more than 4GB of memory. The principle is simple, you allocate the memory you need (on disk) and then map just a portion of that into system memory. You could, for example, map something like 75% of the available physical memory size. Then work on it, and when you need another portion of the data, just re-map. The downside to this is that you have to do the mapping manually, but that's not necessarily bad. The good thing is that you can use more data than what fits into physical memory and into the per-process memory limit. It works really great if you actually use only part of the data at any given time.
There may be libraries that do this automatically, like the one KLE suggested (though I do not know that one). Doing it manually means you'll learn a lot about it and have more control, though I'd prefer a library if it does exactly what you want with regard to how and when the disk is being used.
This works similar on both Windows on Unix. For Windows, here is an article by Raymond Chen that shows a simple example.

Optimizing locations of on-disk data for sequential access

I need to store large amounts of data on-disk in approximately 1k blocks. I will be accessing these objects in a way that is hard to predict, but where patterns probably exist.
Is there an algorithm or heuristic I can use that will rearrange the objects on disk based on my access patterns to try to maximize sequential access, and thus minimize disk seek time?
On modern OSes (Windows, Linux, etc) there is absolutely nothing you can do to optimise seek times! Here's why:
You are in a pre-emptive multitasking system. Your application and all it's data can be flushed to disk at any time - user switches task, screen saver kicks in, battery runs out of charge, etc.
You cannot guarantee that the file is contiguous on disk. Doing Aaron's first bullet point will not ensure an unfragmented file. When you start writing the file, the OS doesn't know how big the file is going to be so it could put it in a small space, fragmenting it as you write more data to it.
Memory mapping the file only works as long as the file size is less than the available address range in your application. On Win32, the amount of address space available is about 2Gb - memory used by application. Mapping larger files usually involves un-mapping and re-mapping portions of the file, which won't be the best of things to do.
Putting data in the centre of the file is no help as, for all you know, the central portion of the file could be the most fragmented bit.
To paraphrase Raymond Chen, if you have to ask about OS limits, you're probably doing something wrong. Treat your filesystem as an immutable black box, it just is what it is (I know, you can use RAID and so on to help).
The first step you must take (and must be taken whenever you're optimising) is to measure what you've currently got. Never assume anything. Verify everything with hard data.
From your post, it sounds like you haven't actually written any code yet, or, if you have, there is no performance problem at the moment.
The only real solution is to look at the bigger picture and develop methods to get data off the disk without stalling the application. This would usually be through asynchronous access and speculative loading. If your application is always accessing the disk and doing work with small subsets of the data, you may want to consider reorganising the data to put all the useful stuff in one place and the other data elsewhere. Without knowing the full problem domain it's not possible to to be really helpful.
Depending on what you mean by "hard to predict", I can think of a few options:
If you always seek based on the same block field/property, store the records on disk sorted by that field. This lets you use binary search for O(log n) efficiency.
If you seek on different block fields, consider storing an external index for each field. A b-tree gives you O(log n) efficiency. When you seek, grab the appropriate index, search it for your block's data file address and jump to it.
Better yet, if your blocks are homogeneous, consider breaking them down into database records. A database gives you optimized storage, indexing, and the ability to perform advanced queries for free.
Use memory-mapped file access rather than the usual open-seek-read/write pattern. This technique works on Windows and Unix platforms.
In this way the operating system's virtual memory system will handle the caching for you. Accesses of blocks that are already in memory will result in no disk seek or read time. Writes from memory back to disk are handled automatically and efficiently and without blocking your application.
Aaron's notes are good too as they will affect initial-load time for a chunk that's not in memory. Combine that with the memory-mapped technique -- after all it's easier to reorder chunks using memcpy() than by reading/writing from disk and attempting swapouts etc.
The most simple way to solve this is to use an OS which solves that for you under the hood, like Linux. Give it enough RAM to hold 10% of the objects in RAM and it will try to keep as many of them in the cache as possible reducing the load time to 0. The recent server versions of Windows might work, too (some of them didn't for me, that's why I'm mentioning this).
If this is a no go, try this algorithm:
Create a very big file on the harddisk. It is very important that you write this in one go so the OS will allocate a continuous space on disk.
Write all your objects into that file. Make sure that each object is the same size (or give each the same space in the file and note the length in the first few bytes of of each chunk). Use an empty harddisk or a disk which has just been defragmented.
In a data structure, keep the offsets of each data chunk and how often it is accessed. When it is accessed very often, swap its position in the file with a chunk that is closer to the start of the file and which has a lesser access count.
[EDIT] Access this file with the memory-mapped API of your OS to allow the OS to effectively cache the most used parts to get best performance until you can optimize the file layout next time.
Over time, heavily accessed chunks will bubble to the top. Note that you can collect the access patterns over some time, analyze them and do the reorder over night when there is little load on your machine. Or you can do the reorder on a completely different machine and swap the file (and the offset table) when that's done.
That said, you should really rely on a modern OS where a lot of clever people have thought long and hard to solve these issues for you.
That's an interesting challenge. Unfortunately, I don't know how to solve this out of the box, either. Corbin's approach sounds reasonable to me.
Here's a little optimization suggestion, at least: Place the most-accessed items at the center of your disk (or unfragmented file), not at the start of end. That way, seeking to lesser-used data will be closer by average. Err, that's pretty obvious, though.
Please let us know if you figure out a solution yourself.

Resources