How many requests per second can libmemcached handle? - performance

I have a Linux server with 2 GB of memory and an Intel Core 2 Duo 2.4 GHz CPU, and I am developing a networking system. I use
libmemcached/memcache to store and access packet info. I want to know how many requests per second
libmemcached can handle on a plain Linux server. Thanks!

There are too many things that could affect the request rate (CPU speed, other hardware drivers, exact kernel version, request size, cache hit rate, and so on, ad infinitum). There's no such thing as a "plain Linux server."
Since it sounds like you've got fixed hardware, your best bet is to test the hardware you've got, and see how well it performs under your desired load.

Related

Freeswitch verto performance is low

This is my first question, thanks in advance for all your help!
I set up a FreeSWITCH server and called it from a web page. I found that the FreeSWITCH server uses more than 13 percent CPU during the call. When idle it uses less than 1 percent, and if I call it with a SIP client it uses about 4 percent. Does anyone know why it uses 9 percent more CPU with Verto? Below is some detailed information.
Freeswitch version: 1.7.0+git~20151231T160311Z~de5bbefdf0~64bit (git de5bbef 2015-12-31 16:03:11Z 64bit).
I use Ubuntu 14.04 on an Intel i5 CPU with 16 GB of RAM.
The SIP client used is Zoiper on Windows.
You did not say anything about the codecs that you're using. If you use OPUS in your WebRTC client and it needs to be transcoded by FreeSWITCH, then that workload looks plausible. OPUS is quite an expensive codec in terms of the CPU effort needed for transcoding.
The same applies to the SIP client. The CPU load depends significantly on the codec encoding/decoding work during the call. In the ideal situation, both legs of a call use the same codec, and then your FreeSWITCH server would only be busy sending and receiving RTP frames, without heavy processing of the payload.
Keep in mind that the primary platform for FreeSWITCH is Debian Jessie, and there may also be issues with the kernel or libraries in Ubuntu, as nobody has taken care to analyze and optimize for this platform.

High CPU load but low CPU usage and RAM usage

I am running a mobile website to get the live running status of any train in India. It is http://www.spoturtrain.com . The full code is written in PHP; Nginx is used as the web server and php-fpm as the application server. All PHP requests are proxied to the app server. During peak traffic hours in the morning, the system load shoots up to 4, but the CPU% and the memory usage are low. Please take a look at the snapshot of the top command of the server.
The %CPU displayed in the bottom section is per-thread, which means the percentage of one CPU core used by the indicated thread. The CPU(s) section indicates the total amount of available CPU that is being utilized, so it is possible to have one thread reporting that it is using 100% CPU, while only 25% (4 cores) or 12.5% (8 cores) of the overall CPU cycles are being consumed.
Analyzing thread CPU usage on Linux
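The arithmetic behind those 25% and 12.5% figures can be sketched in a few lines of Python. The function name `overall_cpu_percent` is made up for this illustration.

```python
def overall_cpu_percent(per_thread_percents, cores):
    """Convert per-thread %CPU values (each relative to ONE core) into a share of all cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return sum(per_thread_percents) / cores

# One thread saturating a single core:
print(overall_cpu_percent([100.0], 4))  # 25.0
print(overall_cpu_percent([100.0], 8))  # 12.5
```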
You don't really ask a question, so it's hard to tell if you want some advice or are just asking to have the numbers explained. As @Charles states, a typical "acceptable" load is 1 per CPU core before noticeable performance degradation occurs, but in the case of PHP running on most web servers, you may (but probably won't in most cases) start noticing problems at anything above 1. Whether or not you do will largely depend on your disk and network I/O.
Whether or not the performance is acceptable for your application isn't something I can answer, but you can take a look at this thread as a starting point for the options for getting your web server to thread requests.
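To relate a load figure to that rule of thumb, divide it by the number of cores. The Python sketch below does this with a hypothetical helper, `load_per_core`. The core count of the server in the question is not stated, so the 4 used in the example is an assumption.

```python
import os

def load_per_core(load_average, cores):
    """Load average divided by core count; about 1.0 means every core has one runnable task."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores

# A load of 4 on an (assumed) 4-core machine sits right at the rule-of-thumb limit.
print(load_per_core(4.0, 4))  # 1.0

if __name__ == "__main__" and hasattr(os, "getloadavg"):
    one_minute, _, _ = os.getloadavg()           # 1-, 5-, 15-minute averages
    print(load_per_core(one_minute, os.cpu_count() or 1))
```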
What is thread safe or non thread safe in PHP
Whether or not you can do anything about it depends on your hosting situation.

What is named.exe process and how to avoid consuming high CPU rates

I have a Windows Server 2008 machine with Plesk running two web sites.
Sometimes the server slows down and there is a named.exe process making the CPU peak at 100%.
It lasts a short period of time and after a while it comes back.
I would like to know what this process is for and how to configure it so that it does not consume this much CPU and slow down my sites.
This must be a DNS service, also known as BIND. High CPU usage may indicate one of the following:
- DNS is re-reading its configuration. In this case the high CPU usage should coincide with your activities in Plesk, i.e. adding and removing domains.
- Someone (normally another DNS server) is pulling data from your DNS server. This is a normal process. As you say it lasts only a short period of time, it doesn't look like a DNS DDoS.
AFAIK there is no default way in Windows to restrict software from taking 100% CPU if no other apps require CPU at the moment.
Look into the "DNS Treewalk Suite" system, stop the process, and run an antivirus scan.
Check the system error log.

Monitoring an application (CPU and cache usage) on a single Linux box with 80 cores

I am looking for a performance monitoring tool for my application that will collect and visualize, in real time, the CPU and cache usage on a single Linux box such as an IBM System or HP ProLiant with a typical configuration of 8 processors / 80 cores.
The application is home-grown multithreaded C++ code that uses OpenMP.
This monitoring tool does not need to run 24 hours per day, and it does not need to send e-mail notifications.
I will run this tool just before sending commands to my apps; the apps will then execute the command (which takes a few minutes at most). During this time interval I need to analyze:
- usage of cores
- data movement between processors
- usage of L1, L2, L3 caches
- some other metrics (help me here) that can help to find bottlenecks in application performance and resource utilization
I guess that tools like Nagios / Zabbix are too heavy for this task.
On the other hand, using command-line tools like "top" and "sar" for 80 cores is not very convenient, and plotting (not necessarily real-time) would be nice to have...
While getting the per-core usage is rather easy, the other values might prove impractical, at least without running the application within a profiler of some sort.
Measuring QPI utilization is highly non-trivial, if possible at all. Intel's VTune might be able to acquire such things, but only when running an instrumented version of your binaries.
Also, on x86 there is no way to figure out L1, L2, L3 usage of any kind. You can grab the low-level CPU counters to measure cache misses, though (but you would probably need to use instrumented/profiled binaries, and always with something like VTune or PAPI).
You could "easily" set up something to pull all the lower-level performance counters into SNMP and grab the SNMP values via standard SNMP-capable monitoring tools, but be aware that SNMP polling is something you don't want to occur more than once or twice per second at most. Or pull that info into something like collectd.
I also have the impression that you misunderstand the problem domain of monitoring tools. They are not meant to be used as low-level analysis probes for finding application-level or system bottlenecks; at best you could get some hints about which resource (from a 10,000-foot view) is running at full utilization. Monitoring and alerting tools are what operations staff use to understand which parts of their IT system are currently used and how, to gather historical data and predict future resource utilization, and to be alerted when something breaks.
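For the "rather easy" part, per-core usage on Linux can be derived from two snapshots of /proc/stat. The Python sketch below shows one way to do it. The function names are hypothetical, and the busy/idle split (idle plus iowait counted as idle) is one common convention rather than the only one.

```python
import time

def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with 'cpu0', 'cpu1', ...; plain 'cpu' is the aggregate.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user, nice, system, idle, iowait, irq, softirq, steal
        ticks = [int(v) for v in fields[1:9]]
        total = sum(ticks)
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        cores[fields[0]] = (total - idle, total)
    return cores

def per_core_usage(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        delta_total = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / delta_total if delta_total > 0 else 0.0
    return usage

def sample(interval_s=1.0, path="/proc/stat"):
    """Take two snapshots interval_s apart and return per-core usage in percent."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return per_core_usage(before, after)
```

Calling `sample()` in a loop and writing the result to a CSV file would give data that can be plotted afterwards for all 80 cores. It does not address the cache or interconnect metrics, which need hardware counters.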
SiteScope, Hyperic or any combination of shell scripts, native OS utilities and a DB to store the results may do the job.

How to improve Tx performance in a USB device driver?

I developed a device driver for a USB 1.1 device on Windows 2000 and later with the Windows Driver Model (WDM).
My problem is the pretty bad Tx performance when using 64-byte bulk transfers. Depending on the USB host controller used, the maximum packet throughput is either 1000 packets (UHCI) or 2000 packets (OHCI) per second. I've developed a similar driver on Linux kernel 2.6 that reaches roughly 5000 packets per second.
The Linux driver uses up to 10 asynchronous bulk transfers, while the Windows driver uses 1 synchronous bulk transfer. Comparing the two makes it clear why the performance is so bad, but I have already tried asynchronous bulk transfers on Windows as well, without success (no performance gain).
Does anybody have some tips and tricks for boosting the performance on Windows?
I've now managed to speed up sending to about 6.6k messages/s. The solution was pretty simple: I've just implemented the same mechanism as in the Linux driver.
So now I'm scheduling up to 20 URBs at once and, what should I say, it worked.
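To put the packet rates from this thread into bandwidth terms, the short Python sketch below converts packets per second into payload throughput. The helper name `bulk_throughput_mbit` is made up for this illustration, and the figures count only the 64-byte payload, ignoring USB protocol overhead.

```python
def bulk_throughput_mbit(packets_per_second, payload_bytes=64):
    """Payload throughput in Mbit/s for fixed-size bulk packets (protocol overhead ignored)."""
    return packets_per_second * payload_bytes * 8 / 1_000_000

# Packet rates as reported in the question and the follow-up.
reported = [
    ("Windows, UHCI, 1 synchronous transfer", 1000),
    ("Windows, OHCI, 1 synchronous transfer", 2000),
    ("Linux 2.6, up to 10 asynchronous transfers", 5000),
    ("Windows, up to 20 URBs queued", 6600),
]

for label, pps in reported:
    print(f"{label}: {bulk_throughput_mbit(pps):.2f} Mbit/s")
```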
What kind of throughput are you getting? USB 1.1 is limited to 12 Mbit/s at full speed (1.5 Mbit/s at low speed).
It might be a limitation that you'll have to live with. The one thing you must never do is starve the system of resources. I've seen so many poor driver implementations where the driver hogs system resources in a failed attempt to increase its own performance.
My guess is that you're using the wrong API calls. Have you looked at the USB samples in the Win32 DDK?
