ibm apiconnect developer toolkit slow - performance

Why is the apic command so slow? Am I the only one seeing this? What should I do to diagnose the problem? It happens on my computer, on our deployment server, and on the PCs of the other people working on this project.
$ time apic help
Error: The `help` command is not valid.
real 0m15.852s
user 0m0.045s
sys 0m0.076s
or
$ time apic -v
API Connect: v5.0.6.1 (apiconnect: v2.5.17)
real 0m14.710s
user 0m0.046s
sys 0m0.091s

The APIC toolkit for APIC v5 is built on Node.js. It loads a lot of modules every time the apic command is invoked, so startup speed depends mainly on disk speed (but also on CPU and memory).
My test shows that the initial load takes about 9 seconds on my two-year-old laptop:
$ time apic -v
API Connect: v5.0.8.4-iFix (apiconnect: v2.8.29)
real 0m9.206s
user 0m0.092s
sys 0m0.091s
This test was run in Git Bash on a Windows 10 machine.
A second try (with most files already in the OS file cache) is about 2.5 seconds faster:
$ time apic -v
API Connect: v5.0.8.4-iFix (apiconnect: v2.8.29)
real 0m6.625s
user 0m0.030s
sys 0m0.090s
The good news is that the new version of the apic CLI toolkit is written in Go and compiled to a native binary for each platform. This one is fast by any standard, and more capable too.
$ time apic version
APIConnect toolkit c81e13c07d3c2c7730827610fcaf08bbec88fe04 (Built 2020-02-10T23:21:01Z) (Tag o.c3148da-g.c81e13c)
real 0m0.192s
user 0m0.141s
sys 0m0.094s
This test is done on the same laptop but using Ubuntu under WSL.
I would suggest either migrating to v2018 or using a faster machine with a modern SSD (not all SSDs are equal).
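If you want to quantify the warm-cache effect yourself, a minimal sketch is to time the command twice and drop the page cache in between (Linux-only, needs root). Here `sleep 0.2` stands in for the slow `apic` binary, and the timing uses GNU `date`:

```shell
# Time a command in milliseconds; `sleep 0.2` stands in for `apic -v`.
# For a true cold start on Linux, first flush the page cache (as root):
#   sync && echo 3 > /proc/sys/vm/drop_caches
start=$(date +%s%N)              # nanoseconds since epoch (GNU date)
sleep 0.2                        # replace with: apic -v
end=$(date +%s%N)
echo "elapsed: $(( (end - start) / 1000000 )) ms"
```

Running it once after dropping caches and once again immediately afterwards shows how much of the startup cost is pure disk I/O versus module parsing.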

Related

How to restart a service when the CPU usage is over N%?

I have a website hosted on a server (ubuntu 18.04) with 2 core CPU and 4GB RAM. My website usually has 200 concurrent sessions in real-time (online users) on average.
For those 200 online users, resource usage is roughly:
50% of the RAM
65% of the CPU
It should be noted that my website is a Q&A website, so users come to ask questions in different fields. Sometimes a question is asked on a TV contest and people immediately come to my website to search for it, or they search on Google, find a link to my website, and flood in.
In that case, my server's CPU usage goes over 90%, mostly because of the MySQL service.
There is another scenario as well: when the Google crawler starts indexing my website's links or checking for broken ones, the same CPU spike happens. The point is, I cannot increase the server resources at the moment; I will do that in the future when I find a sponsor for my website.
So, as a workaround, I'm trying to write a script that restarts the MySQL service automatically when CPU usage is over 90%. Currently I do that manually, when I see that my website is down or pages are loading slowly.
After some research, I found I can get the real-time CPU usage percentage with this command:
echo $[100-$(vmstat 1 2|tail -1|awk '{print $15}')]
Also, I restart MySQL this way:
systemctl restart mysql
My exact question is: how do I write that condition as a Linux bash script?
#!/bin/bash
if <how to check CPU usage> then
systemctl restart mysql
fi
If you really want to go this route, just check whether the usage is over 90%, then run the script periodically from cron.
#!/bin/bash
# Idle CPU is column 15 of vmstat's last output line; usage = 100 - idle.
(( usage = 100 - $(vmstat 1 2 | tail -n1 | awk '{print $15}') ))
if (( usage > 90 )); then
    systemctl restart mysql
fi
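For reference, here is what that vmstat pipeline is doing, demonstrated against a canned sample line (the numbers are made up; real use pipes `vmstat 1 2 | tail -n1` instead). Column 15 of vmstat's output is "id", the idle CPU percentage, so usage is 100 minus that:

```shell
# $15 of vmstat's last line is "id" (idle CPU %); usage = 100 - idle.
# The line below is a canned sample, not real vmstat output.
line="1  0      0 123456  7890 456789    0    0     1     2   33   44  5  3 92  0  0"
idle=$(echo "$line" | awk '{print $15}')
usage=$(( 100 - idle ))
echo "CPU usage: ${usage}%"    # prints: CPU usage: 8%
```

A crontab entry such as `* * * * * /usr/local/bin/check-cpu.sh` (path hypothetical) would then run the check every minute.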

Chrony sets system time but does not sync RTC

I have configured Chrony with the rtcsync directive, which SHOULD "enable kernel synchronisation of the hardware real-time clock (RTC)", but that is not happening.
Chrony sets the system time correctly via NTP, but the RTC is left untouched, and I can't figure out why. My guess is that the kernel doesn't act on Chrony's request to sync the RTC, but that is just a guess.
Versions
Kernel: 4.19
Chrony: 3.5
UPDATE:
It appears that the external RTC is registered only after the kernel has tried to access it, and this prevents syncing the RTC with the NTP-synced system time.
from dmesg:
...
[ 6.317060] hctosys: unable to open rtc device (rtc)
...
[ 14.303503] rtc-ds1307 9-0068: registered as rtc0
...
I've put a temporary workaround in place: a cron job that updates the hardware clock with hwclock every 10 minutes.
To get rtcsync working, you have to set the CONFIG_RTC_SYSTOHC and CONFIG_RTC_SYSTOHC_DEVICE kernel options properly, since rtcsync simply asks the kernel to sync the system time to the RTC; the kernel does so approximately every 11 minutes.
However, a better approach is to use rtcfile (and rtcdevice): in that case Chrony handles the RTC itself. It will even compute the RTC's drift, which can then be corrected if the RTC supports a trimming mechanism.
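A minimal chrony.conf fragment for that approach might look like this (the drift-file path is an example, and the device path assumes the external DS1307 registered as rtc0, as in the dmesg output above):

```
# Let chrony manage the RTC itself instead of the kernel's 11-minute sync.
rtcfile /var/lib/chrony/rtc     # where chrony stores measured RTC drift
rtcdevice /dev/rtc0             # the DS1307 from the dmesg output above
```

Note that, per chrony's documentation, rtcfile cannot be combined with rtcsync, so the rtcsync directive must be removed when switching to this approach.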

Network performance issues and slow tcp_write_xmit/tcp_ack calls with a lot of save_stack calls on an OpenVZ kernel

I ran into bad network performance on CentOS. The issue was observed on the latest OpenVZ RHEL7 kernel (3.10-based) on a Dell server with 24 cores and a Broadcom 5720 NIC, regardless of whether the workload ran on the host system or in an OpenVZ container. The server receives RTMP connections and re-proxies the RTMP streams to other consumers. Reads and writes were unstable and streams froze periodically for a few seconds.
I started checking the system with strace and perf. strace affects the system heavily, so it seems only perf can help. I used the OpenVZ debug kernel with debugfs enabled. The system spends too much time in the swapper process (according to the perf data). I built a flame graph for the system under load (100 Mbit/s in, 200 Mbit/s out) and noticed that the kernel spent too much time in tcp_write_xmit and tcp_ack. On top of these calls I see save_stack calls.
On the other hand, I tested the same scenario on an Amazon EC2 instance (latest Amazon Linux AMI 2017.09) and perf doesn't show any such issue. The total sample count was 300,000; the system spends 82% of its time in swapper according to the perf samples, but net_rx_action (and consequently tcp_write_xmit and tcp_ack) within swapper accounts for only 1,797 samples (0.59% of the total). On top of the net_rx_action call in the flame graph I don't see any calls related to stack traces.
The output from the OpenVZ system looks different: of 1,833,152 samples, 500,892 (27%) were in the swapper process, and 194,289 samples (10.5%) were in net_rx_action.
The full SVG of calls on vzkernel7 is here, and the SVG of the EC2 instance's calls is here. You can download them and open them in a browser to explore the flame graphs interactively.
So I want to ask for help, and I have a few questions:
Why doesn't the flame graph from the EC2 instance contain as many save_stack calls as the one from my server?
Does perf force the system to call save_stack, or is it a kernel setting? Can it be disabled, and how?
Does Xen on the EC2 host process all the tcp_ack and related calls? Is it possible that the host system on the EC2 server does some of this work and the guest system doesn't see it?
Thanks for any help.
I've read the kernel sources and have answers to my questions.
The save_stack calls are caused by the Kernel Address Sanitizer (KASAN) feature, which was enabled in the OpenVZ debug kernel by the CONFIG_KASAN option. When this option is enabled, on every kmem_cache_free call the kernel invokes __cache_free:
static inline void __cache_free(struct kmem_cache *cachep, void *objp,
                                unsigned long caller)
{
        /* Put the object into the quarantine, don't touch it for now. */
        if (kasan_slab_free(cachep, objp))
                return;
        ___cache_free(cachep, objp, caller);
}
With CONFIG_KASAN disabled, kasan_slab_free returns false (see include/linux/kasan.h). The OpenVZ debug kernel was built with CONFIG_KASAN=y; the Amazon AMI's kernel wasn't.
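To check whether your own kernel was built with KASAN, you can grep its build config. The excerpt below is canned for illustration; on a real system the config usually lives in /boot/config-$(uname -r) or /proc/config.gz:

```shell
# Canned kernel .config excerpt; on a real system run instead, e.g.:
#   grep '^CONFIG_KASAN=' /boot/config-"$(uname -r)"
config='CONFIG_SLUB=y
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y'
if echo "$config" | grep -q '^CONFIG_KASAN=y'; then
    echo "KASAN enabled: every slab free records a stack trace"
fi
```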

How many requests per second can libmemcached handle?

I have a Linux server with 2 GB of memory and an Intel Core 2 Duo 2.4 GHz CPU, and I am developing a networking system. I use libmemcached/memcached to store and access packet info. I want to know how many requests per second libmemcached can handle on a plain Linux server. Thanks!
There are too many things that could affect the request rate (CPU speed, other hardware drivers, exact kernel version, request size, cache hit rate, etc ad infinitum). There's no such thing as a "plain linux server."
Since it sounds like you've got fixed hardware, your best bet is to test the hardware you've got, and see how well it performs under your desired load.
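As a concrete starting point, libmemcached ships a benchmark tool, memaslap (memslap in older releases), that can load-test a running memcached instance. The server address, concurrency, and duration below are placeholder values to adapt to your setup:

```shell
# Load-test a local memcached for 10 s with 16 concurrent connections.
# Guarded so the script degrades gracefully if the tool isn't installed.
if command -v memaslap >/dev/null 2>&1; then
    memaslap --servers=127.0.0.1:11211 --concurrency=16 --time=10s
else
    echo "memaslap not found; install libmemcached's tools first"
fi
```

Run it with request sizes and a read/write mix that resemble your packet-info workload; the ops/s it reports on your actual hardware is the only number that matters.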

How do I improve Tx performance in a USB device driver?

I developed a device driver for a USB 1.1 device on Windows 2000 and later using the Windows Driver Model (WDM).
My problem is the pretty bad Tx performance when using 64-byte bulk transfers. Depending on the USB host controller used, the maximum packet throughput is either 1000 packets (UHCI) or 2000 packets (OHCI) per second. I've developed a similar driver on Linux kernel 2.6 that achieves roughly 5000 packets per second.
The Linux driver uses up to 10 asynchronous bulk transfers, while the Windows driver uses 1 synchronous bulk transfer, so the comparison makes it clear why the performance is so bad. However, I have already tried asynchronous bulk transfers on Windows as well, without success (no performance gain).
Does anybody have tips and tricks on how to boost the performance on Windows?
I've now managed to speed up sending to about 6.6k messages/s. The solution was pretty simple: I just implemented the same mechanism as in the Linux driver.
So now I'm scheduling up to 20 URBs at once and, what can I say, it worked.
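For perspective, here is the bus throughput the quoted packet rates imply for 64-byte packets (integer arithmetic, so values are truncated):

```shell
# kbit/s = packets/s * 64 bytes * 8 bits / 1000
for rate in 1000 2000 5000 6600; do
    echo "${rate} pkt/s -> $(( rate * 64 * 8 / 1000 )) kbit/s"
done
```

Even the improved 6.6k pkt/s is only about 3.4 Mbit/s, well under USB 1.1's 12 Mbit/s full-speed ceiling, which suggests the original bottleneck was the driver leaving the bus idle between transfers rather than the bus itself.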
What kind of throughput are you getting? USB 1.1 is limited to 12 Mbit/s at full speed (1.5 Mbit/s at low speed), and bulk transfers are only available at full speed.
It might be a limitation you'll have to live with. The one thing you must never do is starve the system of resources; I've seen many poor driver implementations that hog system resources in an utter failure to increase their own performance.
My guess is that you're using the wrong API calls. Have you looked at the USB samples in the Win32 DDK?
