Is a hard restart of Redis required to free memory?

I recently came across an SO question where the OP asked in which scenarios Redis frees up memory. A hard restart was suggested as one option, but it was untested in the case of Redis. Can anyone confirm whether this works?
I have a live environment and I don't want to restart redis-server, but its memory footprint is debilitating now and I'm on the verge of a server migration. So it's important for me to remove as much bloat as possible (and there's a ton of bloat).

I'm not sure what you mean by "bloat", but attaching your server's INFO ALL output may be helpful.
By default, Redis uses jemalloc as its memory allocator. The allocator is in charge of actually returning RAM to the OS after Redis frees it. Redis v4 and above include the ability to force the allocator to purge that freed RAM (the MEMORY PURGE command, see https://github.com/antirez/redis-doc/pull/851).
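A quick sketch of checking fragmentation and forcing a purge with the redis-py client (the localhost instance and a jemalloc build of Redis 4+ are assumptions):

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed local instance

# mem_fragmentation_ratio well above 1.0 means the allocator is holding
# RAM that Redis has already freed.
mem = r.info("memory")
print(mem["used_memory_human"], mem.get("mem_fragmentation_ratio"))

# Ask jemalloc to release dirty pages back to the OS (Redis 4+ only).
r.memory_purge()

print(r.info("memory").get("mem_fragmentation_ratio"))
```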
Regardless of purging, there's also the matter of memory fragmentation. While v4 has the experimental active defragmentation feature, a restart is the only way to "fix" fragmentation in prior versions.
To avoid a restart and the downtime involved, use Redis replication: create a slave, fail your apps over to it, and then restart the original master.
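Sketched with redis-py and placeholder addresses (10.0.0.1 is the existing master, 10.0.0.2 the fresh replica), the failover could look like this:

```python
import time
import redis

master = redis.Redis(host="10.0.0.1", port=6379)   # placeholder addresses
replica = redis.Redis(host="10.0.0.2", port=6379)

# Point the new instance at the existing master and wait for the initial sync.
replica.slaveof("10.0.0.1", 6379)
while replica.info("replication").get("master_link_status") != "up":
    time.sleep(1)

# After repointing your apps at 10.0.0.2, promote it (SLAVEOF NO ONE)...
replica.slaveof()

# ...and the old master can now be restarted or retired without downtime.
```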

Related

How does memcached work? (details inside)

As I understand it, memcached caches stuff to virtual/physical memory. I have a WordPress installation with W3TC installed, and my server can use either Disk or Memcached, so I went for Memcached.
When I check the memory usage in cPanel, it's at 0 (physical is at ~100 MB). When I try to load the site, memory usage jumps to 100-300 MB (different values for the two types of memory, but both end up around 300 MB). CPU usage jumps to 100% and stays like that for a few minutes.
So how does memcached work, then? It doesn't make sense to me. Would I be better off using the Disk cache instead? The site is utterly slow too, unless I'm reloading it or revisiting pages I've already visited; then it's lightning-fast. The Disk cache, however, seems slow-ish in general too...
What am I supposed to do? Is there a way of "fixing" this, if there is something to fix in the first place, of course?
Any info or insight is appreciated,
Thanks!
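One way to check whether W3TC actually reaches memcached is to watch the daemon's hit/miss counters between page loads. A minimal sketch with the pymemcache client; the default localhost:11211 address is an assumption:

```python
from pymemcache.client.base import Client

mc = Client(("localhost", 11211))  # default memcached address, adjust as needed
stats = mc.stats()

# If get_hits climbs relative to get_misses after a few page loads, the cache
# is working; flat counters mean W3TC never reaches memcached at all.
print("items :", stats[b"curr_items"], "bytes:", stats[b"bytes"])
print("hits  :", stats[b"get_hits"], "misses:", stats[b"get_misses"])
```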

Monitoring I/O requests

One of my Railo web applications generates too many I/O requests.
Since it's hosted on an Amazon EC2 instance, this directly hurts my billing because of EBS disk activity (hundreds of millions of operations).
How can I monitor I/O requests? The perfect tool would allow me to find which template/component makes intensive I/O.
I'm already using FusionReactor and that's great for profiling memory spaces and so on, but it doesn't have anything for I/O.
You could start by using the operating system's monitoring tools to see whether you have mainly reads or writes. The next step is to look at memory, even though this appears to be a disk I/O issue: maybe your servers are low on memory and thrashing the drives as they swap pages in and out.
If you have not done so, turn on the template cache; this will stop Railo checking the file system on every page request (provided you have the memory).
If you have plenty of memory (both for your OS and for the JVM) and template caching is on, start looking for your busy pages in FusionReactor and check for cffile, cfdirectory and other file-related tags in those pages... good luck.
Also, query-of-queries is often a culprit in high disk I/O: internally a database is used, which pages large result sets to disk, if I remember correctly.
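To put numbers on the first suggestion (reads vs. writes), here is a small sketch using the psutil Python package; filtering JVM processes by the name "java" is an assumption about how Railo shows up on your box:

```python
import time
import psutil

# Sample system-wide disk counters twice to see whether reads or writes dominate.
before = psutil.disk_io_counters()
time.sleep(10)
after = psutil.disk_io_counters()

print("reads/s :", (after.read_count - before.read_count) / 10)
print("writes/s:", (after.write_count - before.write_count) / 10)

# Per-process view: find the JVM running Railo and inspect its I/O counters.
for p in psutil.process_iter(["pid", "name"]):
    if p.info["name"] and "java" in p.info["name"].lower():
        print(p.info["pid"], p.io_counters())
```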

What's the status of LevelDB? Is it safe for use in production?

Does anyone know how well tested LevelDB is and what its status is for production use? It's a relatively new library, and when I checked the source code it didn't appear to handle errors too well. Does anyone use LevelDB in production and can comment on my question?
We use LevelDB on our website, but wrapped in SSDB (https://github.com/ideawu/ssdb), a LevelDB network server with hash/zset data type support. Our SSDB instance serves 100 million queries per day.
LevelDB has a lot of high-visibility problems (https://github.com/bitcoin/bitcoin/issues/2770), and the code is so poorly written that a bounty was needed to find a fix (https://bitcointalk.org/index.php?topic=337294.0;all). The leveldb discussion group is predominantly bug reports about very fundamental database functionality that fails to work as advertised (https://groups.google.com/forum/#!forum/leveldb); e.g., "snapshots" aren't actually snapshots and can be tainted by subsequent writes (https://groups.google.com/forum/#!topic/leveldb/IAKJaL2zqZM), etc.
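For reference, this is the contract a snapshot is supposed to honor, sketched with the plyvel Python binding (the database path is arbitrary):

```python
import plyvel

db = plyvel.DB("/tmp/leveldb-demo", create_if_missing=True)  # arbitrary path
db.put(b"key", b"v1")

snap = db.snapshot()    # should pin a point-in-time view of the data
db.put(b"key", b"v2")   # a write that happens after the snapshot was taken

# Per the documented contract the snapshot must still see the old value;
# the thread linked above reports cases where this guarantee was violated.
assert snap.get(b"key") == b"v1"
assert db.get(b"key") == b"v2"

db.close()
```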
On the date that this question was asked, LevelDB was certainly NOT production ready and anyone who thought so was delusional. The code quality is abysmal, as confirmed by independent developers https://twitter.com/rescrv/status/406106256890286080
One place it is used in a production environment is the Bitcoin project. Within Bitcoin, its usage is critical for the security of the platform. See the release notes for Bitcoin-Qt 0.8.0.
How do you qualify "relatively new", given it came out in 2011?
Can you please give more detail on "not handling errors too well"?
LevelDB is used as a backend in Riak and HyperDex, both of which have customised it to improve throughput under huge loads. There was a great video from RICON East 2013 explaining the Riak changes made by Basho (taken down at some point prior to 2019-03).
Note that RocksDB is another major fork, by Facebook, which is recommended for server-side use. The history of its fork from LevelDB is on Wikipedia. You can read about how RocksDB handles errors on this page:
Currently in RocksDB, any error during a write operation (write to WAL, Memtable Flush, background compaction etc) causes the database instance to go into read-only mode by default and further user writes are not accepted....
Call DB::Resume() to manually resume the DB and put it in read-write mode. This function will clear the error, purge any obsolete files, and restart background flush and compaction operations. At present, it only supports resuming from background errors that happen during compaction. In the future, we will add more cases.

Best EC2 setup for redis server [closed]

We are deploying a large-scale web application that uses only Redis as a data store. I notice that the benchmark of our Redis master is around 8,000 transactions per second on EC2, far less than the benchmarks stated for dedicated hardware.
I understand that there is a performance penalty for running Redis on a virtual machine like EC2, but I would love some pointers from people who have deployed Redis in production environments on EC2 on what EC2 setup you have found most effective for getting more out of redis.
Thanks.
EC2, with its virtualized hardware, is probably not the best environment to run Redis on, but it is a popular one, and there are a number of points to know to get the best from Redis on this platform.
I'm one of the authors of http://redis.io/topics/benchmarks and http://redis.io/topics/latency which cover most of the topics I present below. This is just a summary of the main points.
Virtualization toll
It is not specific to EC2, but Redis is significantly slower when running on a VM (in terms of maximum supported throughput). This is due to the fact that, for basic operations, Redis does not add much overhead to the epoll/read/write system calls required to handle client connections (like memcached, or other efficient key/value stores). System calls are typically more expensive on a VM, and they represent a significant part of Redis activity (especially in benchmarks). Under those conditions, a 50% decrease in maximum throughput compared to bare metal is not uncommon.
Of course, it also depends on the quality of the hypervisor. For EC2, Xen is used.
Benchmarking in good conditions
Benchmarking can be tricky, especially on a platform like EC2. One often forgotten point is to ensure a proper configuration for both the benchmark client and the server. For instance, do not run redis-benchmark on a CPU-starved micro-instance (which will likely be throttled down by Amazon) while targeting your Redis server. Both machines are equally important to get a good maximum throughput.
Actually, to evaluate Redis performance, you need to:
run redis-benchmark locally (on the same machine as the server), assuming you have more than one vCPU core;
run redis-benchmark remotely (from a different VM), on a machine whose QoS configuration is equivalent to the server machine's.
This lets you evaluate and compare the performance of the machines and the network, as in the sketch below.
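Besides redis-benchmark itself, a quick way to compare the local and remote round-trip cost from Python (the remote hostname is a placeholder):

```python
import time
import redis

def median_ping_ms(host, n=1000):
    """Median PING round-trip time in milliseconds for one server."""
    r = redis.Redis(host=host, port=6379)
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        r.ping()
        samples.append((time.perf_counter() - t0) * 1000)
    samples.sort()
    return samples[n // 2]

print("local :", median_ping_ms("127.0.0.1"), "ms")
print("remote:", median_ping_ms("10.0.0.2"), "ms")  # placeholder benchmark VM
```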
On EC2, you will get the best results with second-generation M3 instances (or high-memory, or cluster compute instances), so you can benefit from HVM (hardware virtualization) instead of relying on slower paravirtualization.
The fork issue
This is not specific to EC2, but rather to Xen: forking a large process can be really slow on Xen (it looks better with KVM). For Redis this is a big problem if you plan to use persistence: both persistence options (RDB and AOF) require the main thread to fork and launch background save or rewrite processes.
In some cases, fork latency can freeze the Redis event loop for several seconds. The more memory the Redis instance manages, the higher the latency.
On EC2, be sure to use an HVM-enabled instance (M3, high-memory, cluster); it will mitigate the issue.
Then, if you have large memory requirements and your application can tolerate it, consider running several smaller Redis instances on the same machine and sharding your data. This can decrease the latency due to fork operations to an acceptable level.
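A minimal client-side sharding sketch (ports are placeholders; real deployments often use consistent hashing so that adding a shard does not remap every key):

```python
import binascii
import redis

# Several small instances on the same machine, each with a modest memory cap.
shards = [redis.Redis(port=p) for p in (6379, 6380, 6381, 6382)]

def shard_for(key: str) -> redis.Redis:
    # Stable CRC32-based placement of keys onto instances.
    return shards[binascii.crc32(key.encode()) % len(shards)]

shard_for("user:42").set("user:42", "...")
print(shard_for("user:42").get("user:42"))
```

Each instance forks a much smaller process, so the per-fork pause shrinks accordingly.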
Persistence configuration
This is a key point to get good performance from Redis (both on VM and bare metal). So please take the time to carefully read http://redis.io/topics/persistence
If you use RDB, keep in mind that the copy-on-write mechanism will start duplicating memory pages once the background save process has been forked off. So you need to ensure there is enough memory for Redis itself, plus some margin to cope with the COW. The amount of extra memory depends on your workload: the more you write to the instance, the more extra memory you need.
Please note that writing a file may also consume some memory (because of the filesystem cache), so during a Redis background save you need to account for the Redis memory, the COW overhead, and the size of the dump file.
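You can watch both the fork cost and the background saves from the INFO output; a sketch with redis-py against an assumed local instance:

```python
import redis

r = redis.Redis()  # assumed local instance

# Duration of the last fork in microseconds; on Xen this can be enormous.
print("last fork:", r.info("stats")["latest_fork_usec"], "us")

# Whether a background save is running and how the last one ended.
p = r.info("persistence")
print("bgsave in progress:", p["rdb_bgsave_in_progress"])
print("last bgsave status:", p["rdb_last_bgsave_status"])
```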
The machine running the Redis server must never swap. If it does, the result will be catastrophic. Contrary to some other stores, Redis is not virtual memory friendly.
With Linux, be sure to set sensible system parameters: vm.overcommit_memory=1 and vm.swappiness=0 (or a very low value anyway). Do not use old kernel versions: they are quite bad at enforcing a low swappiness, resulting in swapping when a large file is written.
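A small sketch to verify those two settings from Python (Linux only; it reads the live values from /proc/sys):

```python
def sysctl(name: str) -> str:
    # Read a kernel parameter from /proc/sys, e.g. "vm.swappiness".
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return f.read().strip()

# Expect "1" and a low value such as "0"; fix via sysctl/sysctl.conf otherwise.
print("vm.overcommit_memory =", sysctl("vm.overcommit_memory"))
print("vm.swappiness        =", sysctl("vm.swappiness"))
```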
If you use AOF, review the fsync options. It is a tradeoff between raw performance and durability of the write operations. You need to make a choice and define a strategy.
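For instance, the fsync policy can be inspected and changed at runtime; everysec is the usual middle ground between always and no. A redis-py sketch against an assumed local instance:

```python
import redis

r = redis.Redis()  # assumed local instance

print(r.config_get("appendfsync"))  # current policy: always / everysec / no

# everysec fsyncs once per second: bounded data loss, far cheaper than 'always'.
r.config_set("appendfsync", "everysec")
```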
You also need to get familiar with the EC2 storage options. On some VMs, you have the choice between ephemeral storage and EBS. On others, you only have EBS.
Ephemeral storage is generally faster, and you will probably have fewer issues than with EBS, but you can easily lose your data in case of a disk failure, a reboot of the host, etc. You can imagine putting RDB snapshots on ephemeral storage and then copying the resulting files to EBS directories, as a tradeoff between performance and robustness.
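That snapshot-copy tradeoff could be as simple as the following; both mount points are placeholders for wherever your ephemeral disk and EBS volume live:

```python
import shutil
import time

# Placeholder mount points: ephemeral disk for fast dumps, EBS for durability.
EPHEMERAL_RDB = "/mnt/ephemeral/redis/dump.rdb"
EBS_BACKUP_DIR = "/mnt/ebs/redis-backups/"

# After each background save completes, archive the dump with a timestamp.
stamp = time.strftime("%Y%m%d-%H%M%S")
shutil.copy2(EPHEMERAL_RDB, EBS_BACKUP_DIR + "dump-" + stamp + ".rdb")
```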
EBS is remote storage: it may eat into the standard network bandwidth allocated to the VM and impact Redis's maximum throughput. If you plan to use EBS, consider selecting the "EBS-optimized" option to establish a QoS separation between the standard network and storage links.
Finally, a very common setup for performance-demanding instances on EC2 is to deactivate persistence on the master and only activate it on a slave instance. It is probably less safe for the data, but it may prevent a lot of potential latency issues on the master.
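Sketched with redis-py and placeholder hosts, that split looks like:

```python
import redis

master = redis.Redis(host="10.0.0.1")   # placeholder addresses
replica = redis.Redis(host="10.0.0.2")

# Master: no RDB snapshots, no AOF; it serves traffic with minimal latency.
master.config_set("save", "")
master.config_set("appendonly", "no")

# Replica: takes over all persistence duties.
replica.config_set("save", "900 1 300 10")
replica.config_set("appendonly", "yes")
```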

Get AppFabric Cache out of throttled mode

Hosting an AppFabric cache on the same machine as a SQL Server instance presents some known challenges, one of them being that SQL Server takes up most of the RAM, thereby putting the cache into throttled mode.
When this occurs and I have freed up enough memory, how do I put the cache back into a non-throttled state? I can't seem to find a PowerShell command to fit my need.
Yes, I know it's bad practice to host the two on the same machine, but those are the terms.
If the cache service detects there is enough memory to accept adds/puts, it will come out of the throttled state on its own; this is not controlled by the user.
Here is the complete answer: http://msdn.microsoft.com/en-us/library/ff921030.aspx
