Does STXXL save the state after crash? - stxxl

I'm writing a webapp in C++ and would like to use a fast database on SSD.
It seems like STXXL would be faster than other choices (SQLite, MongoDB, ...).
Do STXXL's containers keep the data after accidental shutdown (server crash)?
Can the program that was using stxxl be easily restarted from the previous state?

Related

Is a hard restart of redis required to free memory?

I recently came upon an SO question where the OP asked in which scenarios Redis frees up memory. It seems a hard restart was recommended as a potential way, however this is untested in the case of Redis. Can anyone let me know for sure whether this works?
I have a live environment and I don't want to have to restart redis-server, but its memory footprint is debilitating now and I'm on the verge of a server migration. So it's important for me to remove as much bloat as possible (and there's a ton of bloat).
I'm not sure what you mean by "bloat", but attaching your server's INFO ALL output may be helpful.
By default, Redis uses jemalloc as a memory allocator. The allocator is in charge of actually freeing RAM for the OS to reclaim, after Redis frees it. Redis v4 and above include the ability to force the allocator to purge the freed RAM (MEMORY PURGE, see https://github.com/antirez/redis-doc/pull/851).
Regardless of purge, there's also the matter of memory fragmentation. While v4 has the experimental active defrag feature, a restart is the way to "fix" that in prior versions.
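For example, assuming Redis 4.0+ and the stock redis-cli, the fragmentation ratio can be checked and a purge requested like this (MEMORY PURGE asks the allocator to release freed pages back to the OS):
redis-cli INFO memory | grep mem_fragmentation_ratio
redis-cli MEMORY PURGE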
To avoid the downtime involved in a restart, use Redis' replication to create a slave and fail your apps over to it before restarting the original master.

Monitoring I/O requests

One of my Railo web applications generates too many I/O requests.
Since it's hosted on an Amazon EC2 instance, that directly drives up my bill because of EBS disk activity (hundreds of millions of operations).
How can I monitor I/O requests? The perfect tool would allow me to find which template/component makes intensive I/O.
I'm already using FusionReactor and that's great for profiling memory spaces and so on, but it doesn't have anything for I/O.
You could start out by using the operating system monitoring tools to see whether you have mainly reads or writes. The next step is looking at memory, even though it presents as a disk I/O issue: maybe your servers are low on memory and thrashing the drives as they swap pages in and out.
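For example, on a Linux EC2 instance something like the following shows whether the load is mostly reads or writes, and (if installed) which processes are generating the I/O; exact tool availability depends on your distribution:
iostat -x 5
iotop -o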
If you have not done so, turn on the template cache; this will stop Railo checking the file system on every page request (provided you have the memory).
If you have plenty of memory (both for your OS and for the JVM) and you have template caching on, start looking for your busy pages in FusionReactor and check for cffile, cfdirectory and other file-system tags in those pages. Good luck.
Also, use of query-of-queries is often a culprit in high disk I/O, as internally a database is used which pages large result sets out to disk, if I remember correctly.

Twisted process is huge

A Twisted app I have was constantly getting killed due to memory problems. The program grew in size, consuming all of the system's memory before being shut down by the OS. Restart and repeat.
This is on a virtual server, so I doubled the memory, and the issue resolved itself - the daemon stabilized at around 1.25GB of memory.
Does anyone have advice on how I can best profile this to tell where all the memory is getting sucked up into?
If info on the app helps: I'm using the Twisted reactor and internet.timer.TimerService to poll a database for items to update through three 'services'. The items to process are pushed into a twisted.internet.defer.DeferredList, and their processing occurs in a deferToThread block. In the deferred process there are a handful of blocking operations (fetching web pages, etc.) and a lot of HTML parsing (Beautiful Soup and other libraries). I've suggested the reactor threadpool size to be 10, and each 'service' defers to a thread using a SemaphoreService that has 10 tokens. I really expected this daemon to max out at around 400MB of memory, not 3x that.
This is more of a generic share of thoughts on how I debug memory leak/usage problems in my Twisted applications.
Twisted has SSH server support, and it is something which I add to almost all of my projects in development.
The SSH service provides interactive Python interpreter access to the running process, with the Python garbage collector available and a number of helper functions which allow me to a) inspect the count of instances of a given class, b) start and stop inspection of changes in that count over time, and c) get all references to instances of that class. The nice thing about the interactive interpreter is that it allows ad-hoc introspection of offending instances, their relation to other objects and the state of the process they live in. This has so far always proven a valuable instrument for pinpointing the exact location where I have forgotten or not foreseen a reference-release problem in my projects.
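A minimal sketch of that kind of helper, using only the standard library (the helper names here are illustrative, not the actual ones from my projects):
import gc
from collections import Counter

def instance_counts(top=20):
    # Count live objects per class name and return the most common ones.
    return Counter(type(o).__name__ for o in gc.get_objects()).most_common(top)

def count_diff(before, after):
    # Show which classes grew between two snapshots taken with instance_counts().
    before = dict(before)
    return sorted(((name, n - before.get(name, 0)) for name, n in after),
                  key=lambda item: item[1], reverse=True)

def referrers_of(cls, limit=3):
    # Fetch the referrers of a few instances of cls to see what keeps them alive.
    instances = [o for o in gc.get_objects() if type(o) is cls][:limit]
    return [gc.get_referrers(o) for o in instances]
Taking a snapshot with instance_counts(), exercising the suspect code path, and diffing with count_diff() usually narrows a leak down to a handful of classes; referrers_of() then shows what is holding on to them.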

Web application very slow in Tomcat 7

I implemented a web application. When the Tomcat service starts it works very quickly, but after some hours, as more users come in, it gets slow (with only about 15 users).
Checking usage statistics: RAM is at about 20%, CPU at about 25%.
Server Features:
RAM 8GB
Processor i7
Windows Server 2008 64bit
Tomcat 7
MySql 5.0
Struts2
-Xms1024m
-Xmx1024m
PermGen = 1024
MaxPermGen = 1024
I do not use a separate web server; we publish directly on Tomcat.
Even at midnight the slowness persists (with only 1 user online).
The only solution I have is to restart the Tomcat service, and then the response time is excellent again.
Is there anyone who has experienced this issue? Any clue would be appreciated.
Not enough details provided. Need more information :(
Use htop or top to find memory and CPU usage per process & per thread.
CPU
A constant 25% CPU usage on a 4-core system can indicate that a single-core application/thread is running at 100% CPU on the only core it is able to use.
Which application is eating the CPU ?
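For example, per-thread CPU usage of the Tomcat process can be watched with something like the following (assuming a Linux-like environment, as in the df example below; substitute the real PID):
top -H -p <tomcat_pid>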
Memory
20% memory is ~1.6GB. It is a bit more than I would expect for an idle server running only Tomcat + MySQL, but -Xms1024m tells the JVM to preallocate 1GB of memory, so that explains it.
Change the Tomcat settings to -Xms512m and -Xmx2048m. Watch Tomcat's memory usage while you throw some users at it. If it keeps growing until it reaches 2GB... then freezes, that can indicate a memory leak.
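For example, if Tomcat is started via catalina.bat, those options can go in an optional bin\setenv.bat; if it runs as a Windows service, set the same values in the service's JVM options (e.g. via tomcat7w.exe) instead:
set CATALINA_OPTS=-Xms512m -Xmx2048m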
Disk
Use df -h to check disk usage. A full partition can cause the issues you are experiencing.
Filesystem Size Used Avail Usage% Mounted on
/cygdrive/c 149G 149G 414M 100% /
(If you just noticed from this example that my laptop is running out of space, you're doing it right :D)
Logs
Logs are awesome. Yet they have a bad habit of filling up the disk. Check the logs' disk usage. Are logs being written/erased/rotated properly when new users connect? Does erasing logs fix the issue? (Copy them somewhere for future analysis before you erase them.)
If not, logs are STILL awesome. They have the good habit of helping you track bugs. Check the Tomcat logs. You may want to set the logging level to debug. What happens last before the website dies? Any useful error message? Are user connections still received and accepted by Tomcat?
Application
I suppose that the 25% CPU goes to Tomcat (and not MySQL). Tomcat doesn't fail by itself; the application running on it must be failing. Try removing the application from Tomcat (you can put a hello-world app in its place). Can Tomcat keep working overnight without your application? It probably can, in which case the fault is in the application.
Enable full debug logging in your application and try to track the issue. Run it straight from Eclipse in debug mode and throw users at it. Does it fail consistently in the same way?
If yes, hit "pause" in the Eclipse debugger and check what the application is doing. Look at the piece of code each thread is currently running plus its call stack. Repeat that a few times. If there is a deadlock, an infinite loop, or something similar, you can find it this way.
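If you cannot attach a debugger to the production server, a thread dump gives you the same kind of snapshot; for example, with the JDK's jstack (substitute the real Tomcat PID):
jstack <tomcat_pid> > threads.txt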
You will have found the issue by now if you are lucky. If not, you're unfortunate and it's a tricky bug that might be deep inside the application. That can get tricky to trace. Determination will lead to success. Good luck =)
For performance-related issues, we can follow these guidelines:
You can equalize the sizes of -Xms and -Xmx, and increase them, for effectiveness:
-Xms2048m
-Xmx2048m
You can also enable PermGen to be garbage collected:
-XX:+UseConcMarkSweepGC -XX:+CMSPermGenSweepingEnabled -XX:+CMSClassUnloadingEnabled
If the page changes too frequently to make this option logical, try temporarily caching the dynamic content, so that it doesn't need to be regenerated over and over again. Any techniques you can use to cache work that's already been done instead of doing it again should be used - this is the key to achieving the best Tomcat performance.
If there is any database-related issue, then follow SQL query performance tuning.
Also consider rotating the catalina.out log file without restarting Tomcat.
In detail, there are two ways.
The first, which is more direct, is that you can rotate catalina.out by adding a simple pipe to the log rotation tool of your choice in Catalina's startup shell script, in place of the default >> redirection into catalina.out. This will look something like:
2>&1 | WeaponOfChoice "$CATALINA_BASE"/logs/catalina.out &
Simply replace "WeaponOfChoice" with your favorite log rotation tool.
The second way is less direct, but ultimately better. The best way to handle the rotation of catalina.out is to make sure it never needs to rotate: simply set the "swallowOutput" property to true for all Contexts in server.xml.
This will route System.err and System.out to whatever logging implementation you have configured, or to JULI if you haven't configured one.
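A minimal sketch of such a Context entry in server.xml (path and docBase are placeholders for your application):
<Context path="/myapp" docBase="myapp" swallowOutput="true"/>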
See more at: Tomcat Catalina Out
I experienced a very slow stock Tomcat dashboard on a clean CentOS 7 install and found the following cause and solution:
Slow start up times for Tomcat are often related to Java's SecureRandom implementation. By default, it uses /dev/random as an entropy source. This can be slow as it uses system events to gather entropy (e.g. disk reads, key presses, etc). As the urandom manpage states:
When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.
Source: https://www.digitalocean.com/community/questions/tomcat-8-5-9-restart-is-really-slow-on-my-centos-7-2-droplet
Fix it by adding the following configuration option to your tomcat.conf or (preferred) a custom file into /tomcat/conf/conf.d/:
JAVA_OPTS="-Djava.security.egd=file:/dev/./urandom"
We encountered a similar problem; the cause was catalina.out, the standard destination log file for System.out and System.err. Its size kept on increasing, slowing things down until Tomcat ultimately crashed. This problem was solved by rotating catalina.out. We were using Red Hat, so we made a shell script to rotate catalina.out.
Here are some links:
Mulesoft article on catalina (also contains two methods of rotating):
Tomcat Catalina Introduction
If "catalina.out" is not the problem then try this instead:-
Mulesoft article on optimizing tomcat:
Tuning Tomcat Performance For Optimum Speed
We had a problem which looks similar to yours. Tomcat was slow to respond, but the access log showed just milliseconds per answer. The problem was streaming responses: one of our services returned real-time data that users could subscribe to. EPOLL was becoming bloated, and network requests couldn't get through to Tomcat. What's more interesting, the CPU was mostly idle (since no one could ask the server to do anything) and the acceptor/poller threads were sitting in WAIT, not RUNNING or IN_NATIVE.
At the time we just limited the number of such requests and everything became normal.

Diagnosing pathological behavior of a piece of cluster software

I'm using a kind of load balancer over a small cluster that is able to achieve >2000 rps on zero-duration requests (i.e. ones that are immediately satisfied by the worker nodes).
But as soon as the requests stop being zero-duration and start taking even 1 ms, performance immediately drops >10x. The data being transferred in both directions is identical and is about 2 KB in size.
This is definitely not related to saturation of the cluster or of the network throughput, because 200 rps of 1 ms requests is a very tiny load and the network is 10 Gbit. Besides, the CPU load is just some 2-5% both on the load balancer and on the worker nodes.
I wonder whether this might be related to some pathological behavior of the OS scheduler or the OS network stack (i.e. some special-case behavior for very short interactions).
How might I diagnose the reason? Which perfcounters to watch? What tools or methodologies to use?
(Just in case someone simply knows the answer to my particular problem, I'm talking about the MS HPC Server 2008 R2's "WCF Broker", running on Windows Server 2008 R2 over Hyper-V)
One thing you can do is use ETW tracing to try to understand what the nodes are doing while your WCF job is running. On HPC Server, I sometimes use clusrun to run xperf and collect traces on all or specific nodes. There are a number of tools you can use for analyzing ETW traces, including xperf itself. I haven't done any serious work using HPC SOA (WCF), but I did write a simple WCF raytracer app and then used xperf to profile it on several of the nodes.
It turned out to be a completely network-unrelated issue having to do with peculiarities of the scheduling mechanism of HPC Server. I resolved the issue by setting the configuration option "serviceRequestPrefetchCount" to 0 in the loadBalancing section of the WCF service config file.
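For reference, a sketch of roughly what that looks like in the service's config file; only the relevant attribute is shown, and any other attributes of the loadBalancing element are left at their defaults:
<loadBalancing serviceRequestPrefetchCount="0" />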
I'm assuming that there are some shared resources with some kind of locking system in place? Is locking a bottleneck? It's hard to guess without seeing the system.
Do you have a way to profile the workers? What are they spending most of their time on, especially in the fast vs slow scenarios?
