My understanding is that node.js is designed to scale by adding processes rather than by spawning threads in a process. In fact, from watching an awesome introductory video by Ryan Dahl, I get the idea that spawning threads is forbidden in node.js. I like the simplicity of this approach, but I am concerned that there might be a downside when running on Windows, since process creation is more expensive on Windows than on Linux.
Given modern hardware and the fact that node.js processes can be expected to be relatively long running, does process overhead still create a significant advantage for Linux when considering hosting node.js? To put it in concrete terms, if we assume an organization that is using the Windows stack only, but is planning a big move onto node.js, is there a point in considering a new OS because of this issue?
No. Node.js runs in a single process and doesn't spawn processes during execution.
You might have gotten the impression that node uses processes to scale because you can add a process per CPU core to let node take advantage of your multicore computer (you'll need a load-balancer-like solution for this, though). Still, you don't spawn processes on the fly. So yes, you can run node perfectly fine on Windows (or Azure) without much of a performance hit, if any.
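To make the "one process per core, started once" model concrete, here is a minimal sketch of the pattern in Python. It is not node's own mechanism: it starts os.cpu_count() long-lived worker processes up front and round-robins jobs to them over pipes, so no process is created per request. The worker program (squaring integers read from stdin) and the helper names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical worker program: reads one integer per line, writes its square back.
WORKER_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(int(line) ** 2, flush=True)\n"
)


def start_workers(n=None):
    """Start one long-lived worker per CPU core, once, at startup."""
    n = n or os.cpu_count() or 1
    return [
        subprocess.Popen([sys.executable, "-c", WORKER_CODE],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)
        for _ in range(n)
    ]


def dispatch(workers, jobs):
    """Tiny round-robin 'load balancer': hand jobs to the existing workers."""
    results = []
    for first in range(0, len(jobs), len(workers)):
        batch = jobs[first:first + len(workers)]
        # Give one job to each worker so the batch runs in parallel...
        for worker, job in zip(workers, batch):
            worker.stdin.write(f"{job}\n")
            worker.stdin.flush()
        # ...then collect the replies in the same order.
        for worker, _ in zip(workers, batch):
            results.append(int(worker.stdout.readline()))
    return results


def stop_workers(workers):
    """Close each worker's input so its loop ends, then reap it."""
    for worker in workers:
        worker.stdin.close()
        worker.wait()
        worker.stdout.close()
```

The process-creation cost is paid once in start_workers; after that, dispatch only writes to and reads from pipes, which is why a long-running server barely notices how expensive process creation is on a given OS.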
I am running redis on Windows and I am having some performance issues. The machine is a Xeon E5 with 32 GB RAM and SSDs in a hardware RAID, running Windows Server 2012. There are some other processes running, but they are not critical and are idle most of the time.
I noticed performance problems and operations time out very often, so I ran "redis-cli --intrinsic-latency 100". The output shows that the max latency goes up to 15000 microseconds, which I think is very slow.
I also ran a memory profiler: the read/write performance is not great (5 GB/s), but I don't think this should be the bottleneck. At the moment I have absolutely no idea what to try.
Can you give me some tips on how to find the performance problem?
Windows has no "fork" like Linux does. Without it, dumping your redis db would mean "stopping the world" while "dump.rdb" is written to disk. However, they did implement a "copy-on-write" strategy that doesn't stop redis; it just copies values while dumping (redis clients still get responses from redis during the dump). It is in their version log: https://github.com/MSOpenTech/Redis
There is a replacement for the UNIX fork() API that simulates the copy-on-write behavior using a memory mapped file.
This is the real bottleneck of redis on Windows, as it adds overhead and is more complex (bugs?). It is explained here: http://blogs.msdn.com/b/interoperability/archive/2012/04/26/here-s-to-the-first-release-from-ms-open-tech-redis-on-windows.aspx
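For contrast, this is roughly what the Linux build gets for free from fork(). The following is a minimal POSIX-only Python sketch, not Redis code; snapshot_with_fork and the keep_serving callback are hypothetical names. The child serializes the data exactly as it was at fork() time while the parent keeps modifying its own copy, and the kernel copies only the memory pages that actually change. CPython touches more pages than Redis would (reference counts), but the snapshot semantics are the same.

```python
import json
import os


def snapshot_with_fork(data, path, keep_serving=None):
    """Dump `data` to `path` from a forked child while the parent keeps working.

    POSIX only. Returns the child's exit code (0 on success).
    """
    pid = os.fork()
    if pid == 0:
        # Child: its view of `data` is frozen at fork() time (copy-on-write).
        code = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            code = 0
        finally:
            os._exit(code)  # leave immediately; skip the parent's cleanup handlers
    # Parent: keeps handling writes right away (hypothetical stand-in callback).
    if keep_serving is not None:
        keep_serving(data)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

The Windows port has to emulate this step (the memory-mapped-file replacement quoted above), which is where the extra overhead comes from.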
As a result, you could try running redis on Linux to test whether this is a performance issue of the Windows port. Also, the more often you write dump.rdb, the bigger the overhead (you can change the frequency, or disable it completely for testing).
Finally, it could also be a network problem: check that it is not a network rule or hardware problem (not enough throughput, a bad cable, firewalls...). Are your redis clients on the same physical machine?
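One way to separate network latency from server-side stalls is to time round trips from the client machine and compare the result with the same probe run on the Redis host itself. A minimal sketch, assuming Redis listens on the default port 6379 with no AUTH or TLS: it sends inline PING commands over a plain TCP socket and expects "+PONG" replies. The function name is hypothetical.

```python
import socket
import time


def ping_latency(host="127.0.0.1", port=6379, samples=100):
    """Time `samples` PING round trips over one TCP connection.

    Returns (min_ms, max_ms). Assumes no AUTH/TLS is required.
    """
    rtts = []
    with socket.create_connection((host, port), timeout=2) as sock:
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(b"PING\r\n")  # Redis accepts inline commands
            reply = b""
            while not reply.endswith(b"\r\n"):
                chunk = sock.recv(64)
                if not chunk:
                    raise ConnectionError("server closed the connection")
                reply += chunk
            rtts.append((time.perf_counter() - start) * 1000.0)
            if reply != b"+PONG\r\n":
                raise RuntimeError(f"unexpected reply: {reply!r}")
    return min(rtts), max(rtts)
```

If the maximum is much worse from the client than on the server, suspect the network path; if it is bad on both, the host itself is stalling.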
I have been using a Windows port of Redis called "Memurai". They have a developer edition free of charge.
Now, in one of their blog posts they claim to have solved the fork() problem. See the excerpt below.
Memurai performance seems good to me, even with persistence enabled (both RDB and AOF), although I have not run any specific tests myself. There's another blog post about Memurai performance here.
It's worth giving it a try.
"Internally, Redis uses the fork() system call to perform asynchronous writes, but that’s not an option for Memurai because fork() doesn’t exist on Windows. Instead, Memurai uses Windows shared memory to implement a start-of-the-art version of fork() that’s finely tuned for performance and..."
I am trying to port an MS VC++ program to run on a Rocks cluster! I am not very good with Linux, but I am eager to learn, and I imagine porting it wouldn't be an impossible task for me. However, I do not understand how to take advantage of the cluster nodes, because the code I execute only runs on the front-end server (obviously).
I have read a little about MPI, and it seems like I should use MPI to communicate between nodes. The program is currently written such that a main thread synchronizes all worker threads. The main thread also receives commands to manipulate the simulation or query its state. If the simulation is properly set up, communication between executing threads can be significantly minimized. What I don't understand is how to start the process on the compute nodes and how to handle node failures. Are there other things I should also consider when porting my program to run on a cluster?
The first step is porting the threaded MS VC++ program to run on a single Linux machine.
Once you have gotten past that point, then modify your program to use MPI in addition to threads (or instead of threads). You can do this on a single computer as well.
To run the program on multiple nodes of your cluster, you will need to submit it to whatever scheduling system your cluster uses. The command for this depends on the scheduling software used on your Rocks cluster; ask your administrator. It may look something like mpirun -np 32 yourprogram.
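When moving from threads to MPI, each process typically learns its own rank and the total number of ranks, then works on its own slice of the simulation. A minimal Python sketch, assuming the optional mpi4py package (MPI.COMM_WORLD.Get_rank() and Get_size()); without it, the script falls back to a single rank. block_range and N_CELLS are hypothetical names for illustration.

```python
def block_range(n_items, rank, size):
    """Return the half-open range [start, stop) of items owned by `rank`.

    The first n_items % size ranks get one extra item, so loads differ by at most one.
    """
    if not 0 <= rank < size:
        raise ValueError("rank must be in [0, size)")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def mpi_rank_and_size():
    """Rank and world size from mpi4py if available, else a single-process fallback."""
    try:
        from mpi4py import MPI  # optional dependency
    except ImportError:
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    N_CELLS = 1000  # hypothetical size of the simulation grid
    rank, size = mpi_rank_and_size()
    start, stop = block_range(N_CELLS, rank, size)
    print(f"rank {rank} of {size} owns cells {start}..{stop - 1}")
```

Launched with something like mpirun -np 32 python sim.py (sim.py being a hypothetical file name), each rank prints a different, non-overlapping range; the role of the old main thread usually moves to rank 0.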
Handling failures in the nodes is a broad question. Your first pass should probably just report the failure, then fail the program. If the program doesn't take too long to compute on the cluster, then restarting it, adjusting for the failed node, may be good enough. Beyond that, your application can write to disk the intermediate information needed to resume where it left off. This is called checkpointing your application. Thus, when a node fails, the job fails, but restarting the job doesn't start from the beginning. Much more advanced would be actually detecting node failures and rescheduling the work unit that was on the failed node. This assumes that the work unit doesn't have non-idempotent side effects. This sort of thing gets really complicated. Checkpointing is likely good enough.
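A minimal sketch of checkpointing in Python, with a hypothetical state dictionary and step counter standing in for the real simulation. The state is written to a temporary file and renamed over the checkpoint, so a crash mid-write never leaves a half-written file, and a restarted job resumes from the last saved step instead of step 0.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` to a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data reached the disk
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, every=10):
    """Hypothetical job loop that resumes from the last checkpoint."""
    state = load_checkpoint(path, {"step": 0, "value": 0})
    while state["step"] < total_steps:
        state["value"] += state["step"]  # stand-in for one unit of simulation work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

The checkpoint interval is a trade-off: saving more often costs I/O, saving less often means more recomputation after a node failure.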
Is there a shell, or technique, that protects me against my entire machine hanging if a process goes haywire?
I'm using ubuntu 10.10.
If you restrict your resources with limits, then you can even prevent fork bombs from killing your machine.
Here's a nice tutorial: http://www.cyberciti.biz/tips/linux-limiting-user-process.html
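The limits from that tutorial are the same per-process rlimits that the shell's ulimit builtin sets. A minimal Linux sketch in Python applies them to a single command via resource.setrlimit in the child, just before exec; run_limited and its parameters are hypothetical names. Note that RLIMIT_NPROC counts every process (on Linux, also every thread) of the user, so it must be set above what that user already runs.

```python
import resource
import subprocess


def _lower_limit(res, value):
    """Set soft and hard limits for `res`, never above the current hard limit."""
    _, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(res, (value, value))


def run_limited(cmd, cpu_seconds=None, max_bytes=None, max_procs=None, timeout=None):
    """Run `cmd` under rlimits so a runaway child can't take the machine down."""
    def apply_limits():
        # Runs in the child after fork() and before exec(): only the child is limited.
        if cpu_seconds is not None:
            _lower_limit(resource.RLIMIT_CPU, cpu_seconds)  # killed by SIGXCPU/SIGKILL
        if max_bytes is not None:
            _lower_limit(resource.RLIMIT_AS, max_bytes)  # larger allocations fail
        if max_procs is not None:
            _lower_limit(resource.RLIMIT_NPROC, max_procs)  # fork() fails past this
    return subprocess.run(cmd, preexec_fn=apply_limits, timeout=timeout).returncode
```

For example, run_limited(["./suspicious-program"], cpu_seconds=60, max_procs=200) (hypothetical program name) kills the program after one minute of CPU time and stops it from forking without bound.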
I bought a multi-core CPU for that (it does not protect all cases in theory, but for all practical purposes it lets at least one core stay free almost always - to let your interactive "kill" command run.) :p
I'm running an application right now that seems to be running at full throttle, but even though the fan seems to be spinning at its max and Activity Monitor reports that the application is using 100% of the processor, I suspect that it is using at most 100% of only one of the two cores on my machine.
How can I tell OS X to allow an application to use 100%, or as much as the OS allows, of the processing power of my computer? I have tried terminal commands like "nice" and "renice" to set the priority of this process, but still can't get it to run at full throttle.
I would also like to know how to do the opposite: set a limit on the processor usage of an app, for example, set app X to run at 20%.
Is this possible to do without modifying the code of the app?
The answer depends on whether your application is multi-threaded or not. If it is single-threaded, which it is unless you have specifically made it multi-threaded, then the process will run on one core of your multi-core hardware at a time. There is nothing you can do about this from the outside: a single thread can only execute on one core at a time.
If your program is multi-threaded then it is possible to have different threads executing on separate cores. This will increase the overall usage of the process and allow figures greater than 100%.
You cannot, however, force the machine to use "all" of the available processing power, but you can influence it with nice.
To reduce the amount of processor time used, you can use nice to lower the priority of the process, so that it yields to other processes when they compete for the CPU. If you are root, you can also use nice to increase the priority of your process.
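The same nice/renice adjustments can be made from a script. A minimal Python sketch (run_niced and renice are hypothetical helper names): os.nice raises the nice value of a child before it starts, and os.setpriority changes an already-running process, like renice. This only shifts priority relative to competing processes; it is not a percentage cap, and it does not let a single-threaded program use more than one core.

```python
import os
import subprocess


def run_niced(cmd, increment=10, **popen_kwargs):
    """Start `cmd` with its nice value raised by `increment` (lower priority)."""
    # preexec_fn runs in the child between fork and exec, so only the child is affected.
    return subprocess.Popen(cmd, preexec_fn=lambda: os.nice(increment), **popen_kwargs)


def renice(pid, niceness):
    """Like `renice`: set the nice value of an already-running process.

    Unprivileged users may only raise it (lower the priority); negative values need root.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
```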
Problem: I have a developer's machine (read: fast, lots of memory), but the user has a user's machine (read: slow, not very much memory).
I can simulate a slow network using Fiddler (http://www.fiddler2.com/fiddler2/)
I can look at how CPU is used over time for a process using Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx).
Is there any way I can restrict the amount of CPU a process can have, or the amount of memory a process can have in order to simulate a users machine more effectively? (In order to isolate performance problems for instance)
I suppose I could use a VM, but I'm looking for something a bit lighter.
I'm using Windows XP, but a solution for any Windows machine would be welcome. Thanks.
The Platform SDK used to come with stress tools for doing just this back in the good old days (STRESS.EXE and CPUSTRESS.EXE in the SDK), and they might still be there (check your Platform SDK and/or Visual Studio installation for these two files; unfortunately, I have neither the PSDK nor VS installed on the machine I'm typing from).
Other tools:
memory: performance & reliability (e.g. handling failed memory allocation): can use EatMem
CPU: performance & reliability (e.g. race conditions): can use CPU Burn, Prime95, etc
handles (GDI, User): reliability (e.g. handling failed GDI resource allocation): ??? you may have to write your own, but running out of GDI handles (buggy GTK apps would eat them all until all other apps on the system started falling dead like flies) is a real test for any Windows app
disk: performance & reliability (e.g. handling disk full): DiskFiller, etc.
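If none of those tools are at hand, a crude do-it-yourself stand-in is easy to write. This Python sketch is not EatMem or CPU Burn themselves, and the function names are hypothetical: it holds a chosen amount of RAM with every page touched, and starts busy-looping processes to occupy cores while you exercise the application under test.

```python
import mmap
import subprocess
import sys


def eat_memory(n_bytes):
    """Allocate `n_bytes` and touch every page so it is really resident."""
    buf = bytearray(n_bytes)
    for i in range(0, n_bytes, mmap.PAGESIZE):
        buf[i] = 1  # write one byte per page so the OS must back it with RAM
    return buf  # keep this reference alive for as long as the pressure should last


def burn_cpu(workers, seconds):
    """Start `workers` busy-looping processes for `seconds`; returns the Popen objects."""
    code = (
        "import time\n"
        f"end = time.monotonic() + {float(seconds)}\n"
        "while time.monotonic() < end:\n"
        "    pass\n"
    )
    return [subprocess.Popen([sys.executable, "-c", code]) for _ in range(workers)]
```

Separate processes are used for the CPU burners because Python threads would share one interpreter lock and occupy only about one core between them.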
AppVerifier has a low-resource simulation feature.
You could also try setting the priority of your process to be very low.
You can run MemAlloc to chew up RAM, possibly a few copies at once.
I found a related question:
Set Windows process (or user) memory limit
The accepted answer to that question links to the Windows API function SetProcessWorkingSetSize, so it's not exactly a tool that can limit the amount of memory a process can use.
In terms of changing the amount of CPU resources a process can use, if you don't mind the granularity of per-core limiting of resources, Task Manager can change the processor affinity of a process.
In Task Manager, right-click a process and select "Set Affinity...", then select the processor cores that the process can be assigned to.
If the development machine has many cores but the user machine only has one, then, rather than allowing the process to run on all the available cores, set the process' processor affinity to only one core.
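The Task Manager step can also be scripted, which helps when the app is restarted often. A minimal Python sketch: on Windows it calls the Win32 SetProcessAffinityMask through ctypes, on Linux os.sched_setaffinity; pin_to_one_core is a hypothetical helper name, and it simply picks the lowest allowed core. On Linux the mask is per thread, so pin the process before it starts its threads.

```python
import os
import sys


def pin_to_one_core(pid=0):
    """Restrict a process to a single CPU core (pid 0 = the calling process).

    Returns the index of the chosen core.
    """
    if hasattr(os, "sched_setaffinity"):  # Linux
        core = min(os.sched_getaffinity(pid))
        os.sched_setaffinity(pid, {core})
        return core
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentProcess.restype = wintypes.HANDLE
        kernel32.OpenProcess.restype = wintypes.HANDLE
        kernel32.OpenProcess.argtypes = (wintypes.DWORD, wintypes.BOOL, wintypes.DWORD)
        kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
        kernel32.SetProcessAffinityMask.argtypes = (wintypes.HANDLE, ctypes.c_size_t)
        kernel32.CloseHandle.argtypes = (wintypes.HANDLE,)
        PROCESS_SET_INFORMATION = 0x0200
        PROCESS_QUERY_INFORMATION = 0x0400
        if pid == 0:
            handle = kernel32.GetCurrentProcess()  # pseudo-handle, no CloseHandle needed
        else:
            handle = kernel32.OpenProcess(
                PROCESS_SET_INFORMATION | PROCESS_QUERY_INFORMATION, False, pid)
            if not handle:
                raise ctypes.WinError(ctypes.get_last_error())
        try:
            if not kernel32.SetProcessAffinityMask(handle, 1):  # bit 0 = first core
                raise ctypes.WinError(ctypes.get_last_error())
        finally:
            if pid != 0:
                kernel32.CloseHandle(handle)
        return 0
    raise NotImplementedError("this sketch only covers Linux and Windows")
```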
It has nothing to do with SetProcessWorkingSetSize.
Just use the Win32 kernel APIs to restrict CPU usage.