Is there a way to scale up RabbitMQ RPC? - performance

Hi I'm using RabbitMQ RPC to distribute tasks to workers (Request/Response)
I studied rpc_client.py and rpc_server.py on the RabbitMQ tutorial and I saw to scale up we must run another process (rpc_server).
I want to assign about 1k tasks per second but it's too slow
Can you help me to solve this problem
System Spec
RAM : 8GB
CPU : Intel® Core™ i7-7700 CPU # 3.60GHz × 8

In order to scale you can either run multiple processes of python server or use multithreading to handle the requests. Ofcourse it would be important to use a threadsafe client library for RabbitMQ like AMQPStorm.
Here is another example of multithreading using Pika library (note that Pika is not threadsafe)

Related

Realtime Collect GC metrics for .Net Core 2.1 App running on Linux

I want to realtime collect GC metrics from applications running on Linux in production. I have an assumption that it is not possible to obtain reliable data about the application memory state(all memory types) running from docker or kubernetes. Accordingly, the data received from the GC may also be not unreliable or completely unreliable.
Perhaps someone faced with similar issues and can share experiences
Very grateful for any answer.

deploy bolts/spout to a specific supervisor

We are running a storm application using a single type on instance in AWS and a single topology to run our system.
This is causing some resource limitation issues.
The way we want to address this is by splitting our IO intense bolts into a cluster of a few dozens t1.small machines (for example) and all our CPU intense bolts to two large machines with lots of cpu & memory.
Basically what i am asking is, is there a way to start all this supervisors and then deploy one topology that include cpu intense bolts on the big machines and to the small machines the deploy IO bolts?
You can implement a custom scheduler using interface IScheduler.
See
http://www.exogeni.net/2015/04/enabling-site-aware-scheduling-for-apache-storm-in-exogeni/
https://dcvan24.wordpress.com/2015/04/07/metadata-aware-custom-scheduler-in-storm/
https://github.com/xumingming/storm-lib/blob/master/src/jvm/storm/DemoScheduler.java

Can Sidekiq take advantage of multiple CPU cores?

I am new to Sidekiq and use it with Ruby on Amazon EC2 instances to do some work using ImageMagick with images.
While running it I realized that every worker runs on the same core. I use EC2 c3.2xlarge machines and they have 8 cores. It shows CPU usage is 15% but one core used 100%, and the others are using 0%.
Can Sidekiq use different CPU cores for different workers? If it can, is this inefficiency caused by ImageMagic and how can I make it to use other cores?
If you want to utilize multiple cores using MRI, you'll need to start multiple Sidekiq processes; having multiple threads configured for your sidekiq instance is not enough.
So if you wanted to use all 8 cores, you would start 8 processes. They will all feed off of the same queue, so there's no need to worry about jobs being processed multiple times.

Node.js CPU load balancing

I created test with JMeter to test performance of Ghost blogging platform. Ghost written in Node.js and was installed in cloud server with 1Gb RAM, 1 CPU.
I noticed after 400 concurrent users JMeter getting errors. Till 400 concurrent users load is normal. I decide increase CPU and added 1 CPU.
But errors reproduced and added 2 CPUs, totally 4 CPUs. The problem is occuring after 400 concurrent users.
I don't understand why 1 CPU can handle 400 users and the same results with 4 CPUs.
During monitoring I noticed that only one CPU is busy and 3 other CPUs idle. When I check JMeter summary in console there were errors, about 5% of request. See screenshot.
I would like to know is it possible to balance load between CPUs?
Are you using cluster module to load-balance and Node 0.10.x?
If that's so, please update your node.js to 0.11.x.
Node 0.10.x was using balancing algorithm provided by an operating system. In 0.11.x the algorithm was changed, so it will be more evenly distributed from now on.
Node.js is famously single-threaded (see this answer): a single node process will only use one core (see this answer for a more in-depth look), which is why you see that your program fully uses one core, and that all other cores are idle.
The usual solution is to use the cluster core module of Node, which helps you launch a cluster of Node processes to handle the load, by allowing you to create child processes that all share the same server ports.
However, you can't really use this without fixing Ghost's code. An option is to use pm2, which can wrap a node program, by using the cluster module for you. For instance, with four cores:
$ pm2 start app.js -i 4
In theory this should work, except if Ghost relies on some global variables (that can't be shared by every process).
Use cluster core and for load balancing nginx. Thats bad part about node.js. Fantastic framework, but developer has to enter into load balancing mess. While java and other runtimes makes is seamless. Anyway, nothing is perfect.

monitoring application (CPU and cache usage) on single Linux box with 80 cores

I am looking for a performance monitoring tool for my application which will collect/visualize in realtime the CPU and cache usage on single Linux box like IBM System or HP ProLiant with typical configuration 8 processors / 80 cores.
Application is the home-grown multithreaded C+ code which uses OpenMP.
This monitoring tool should not run 24 hours per day; it should not do e-mail notification.
I will run this tool just before sending commands to my apps, the apps will execute the command (it may take as a maximum few minutes only). During this time interval I need to analyze:
- usage of cores
- data movement between processors
- usage of L1, L2, L3 caches
- some other metrics (help me here) which can help to find bottleneck in application
performance and resource utilization
I guess that tools like Nagios / Zabbix are too heavy for this task.
From another side using the command-line tools like "top" and "sar" for 80 cores not very convenient and plotting (not necessary real-time) would be nice to have...
While getting the per core usage is rather easy - the other values might prove to be not practical, not at least without running that application within a profiler of some sorts.
Measuring QPI utilization is something highly non-trivial if at all possible. Intel's vTune might be able to acquire such things but only when running instrumented version of your binaries.
Also on x86 there is no way to figure out L1,L2,L3 usage of any kind - you can grab the low level CPU counters to measure cache misses though (but would probably need to use instrumented/profiled binaries and always withan something like vTune or PAPI).
You could "easily" setup something to pull all the lower level performance counters into SNMP and grab the SNMP values via standard SNMP capable monitoring tools but be aware that SNMP pulling is something that you don't want to occur more than 1-2/s max. Or pull that info into something like collectd.
I'm also having the impression that you don't understand the problem domain of monitoring tools. They are not ment to be used as low level analysis probes for finding application level/system bottlenecks - at best you could get some hints which resource (from a 10K feet view) is running under full utilization. Monitoring and alterting tools are something that operations staff needs to use to understand which part of their IT system is currently used and how, to gather historical data and predict future resource utilization and to be alerted when something breaks.
SiteScope, Hyperic or any combination of shell scripts, native OS utilities and a DB to store the results may do the job.

Resources