I'm using lftp (lftp --version shows Version 4.0.9) in mirror mode to test the performance of some SFTP servers. I'm especially interested in the throughput (bytes/sec) when I run lftp with different numbers of concurrent connections.
When I ran the test with 25 concurrent connections it gave me a rather strange number of 5866 seconds as the download time. To check the real time spent on the download I used the time command (as suggested in this related question).
The output was:
$ time lftp -e 'mirror --parallel=25 (rest of the command-line)'
21732837094 bytes transferred in 5866 seconds (3.53M/s)
real 4m31.315s
user 1m25.977s
sys 1m38.041s
My first thought was that those 5866 seconds were the sum of the time spent by every connection, so dividing that by 25 gives me 234.64 seconds (or 03m54.64s), which is still kind of far from 4m31.315s.
Does anyone have insight into how the numbers from lftp are calculated?
Before lftp 4.5.0, mirror incorrectly summed the overlapping durations of the parallel transfers. This was fixed in 4.5.0 to count wall-clock time during which any of the transfers was active.
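As a quick sanity check using the numbers from your output (plain arithmetic, not anything lftp itself prints), you can see that the reported rate is total bytes divided by the summed time, while the real throughput comes from the wall-clock time:

awk 'BEGIN {
    bytes = 21732837094; reported = 5866; connections = 25; real = 271.315   # real = 4m31.315s
    printf "reported rate   = %.2f MiB/s\n", bytes / (reported * 1048576)    # matches the 3.53M/s lftp shows
    printf "per connection  = %.2f s\n",     reported / connections
    printf "wall-clock rate = %.2f MiB/s\n", bytes / (real * 1048576)
}'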
Related
I need to ping more than once per second for a project, and someone wants a live monitor of how long the server has been up, with high accuracy.
This is not for DoSing.
You can use the -i option to control the interval (in seconds). You can specify sub-second intervals, but note that only a superuser can set the interval to less than 0.2 seconds:
$ ping -i 0.5 example.com
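If you need an interval shorter than 0.2 seconds, run it as root, for example:
$ sudo ping -i 0.1 example.com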
While using Nagios with multiple hosts spread over the network, host status shows a noticeable lag and takes a long time to be reflected in the Nagios server CGI. What is the optimal NRPE/Nagios configuration to speed up status updates for a distributed host environment?
In my case I use Nagios Core 4.1
NRPE 1.5
server/clients: Amazon EC2
The GUI is usually only updated once each minute (automatically), though clicking refresh can give you 'nearly' the latest information. I say nearly because there is a distinct processing loop inside the Nagios core that means it is never quite real time. NRPE runs at the speed of your network connection - it does little else besides sending and receiving tiny amounts of data. About the only delay here is the time it takes to actually perform the check and send back the response - which, of course, has way too many factors to mention. Try looking at the output of
[nagioshome]/bin/nagiostats
There are several entries that tell you:
'Latency' - the time between when the check was scheduled to start, and the actual start time.
'Execution Time' - the amount of time checks are actually taking to run.
These entries each show three numbers: Min / Max / Avg.
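To pull out just those two metrics, something like this works (assuming nagiostats lives in your Nagios bin directory, as above):

[nagioshome]/bin/nagiostats | grep -Ei 'latency|execution time'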
High latency numbers (in my book that means Avg is greater than 1 second) usually mean your Nagios server is overworked. There are a few things you can do to improve latency times, and these are outlined in the 'nagios.cfg' file. This latency has nothing to do with network speed or the speed of NRPE - it is primarily hardware speed. If you're already using the optimal values specified in nagios.cfg, then it's time to find some faster hardware.
High execution times (for me, an Avg greater than 5 seconds) can be blamed on just about everything except your Nagios system. They can be caused by faulty networks (improper packet routing), overloaded networks, faulty and/or poorly designed checks, slow target systems, ... the list is endless. Nothing you do with the Nagios and/or NRPE configs will help lower these values. Well, you could disable NRPE's encryption to improve wire time; but if you have encryption enabled in the first place, then it's not likely you'd want it disabled.
I wrote this simple Bash script to detect the incidence of error pages:
date;
iterations=10000;
count_error=0;
count_expected=0;
for ((counter = 0; counter < iterations; ++counter)); do
if curl -s http://www.example.com/example/path | grep -iq error;
then
((count_error++));
else
((count_expected++));
fi;
sleep 0.1;
done;
date;
echo count_error=$count_error count_expected=$count_expected
I'm finding that the total execution time does not scale linearly with the iteration count: 10 iterations take 00:00:12, 100 take 00:01:46, 1000 take 00:17:24, 10000 take ~50 minutes, and 100000 take ~10 hours.
Can anyone provide insight into the non-linearity and/or suggest improvements to the script? Is curl unable to fire requests at a rate of 10/sec? Is GC having to periodically clear internal buffers that fill up with response text?
Here are a few thoughts:
You are not creating 10 requests per second here (as you stated in the question); instead you are running the requests sequentially, i.e. as many per second as possible.
The ; at the end of each line is not required in Bash.
When testing your script from my machine against a different URL, 10 iterations take 3 seconds, 100 take 31 seconds, and 1000 take 323 seconds, so the execution time scales linearly in this range.
You could try using htop or top to identify performance bottlenecks on your client or server.
The apache benchmark tool ab is a standard tool to benchmark web servers and available on most distributions. See manpage ab(1) for more information.
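For example, to fire 1000 requests at the URL from your script with 10 concurrent clients (-n is the total number of requests, -c the concurrency):

ab -n 1000 -c 10 http://www.example.com/example/path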
Is there some way, like a script we can run n times in a loop, to benchmark classic ASP CPU computation time and disk I/O from server to server? That is, run it on a local workstation, then a local Pentium dev server with a rotational disk, then an entry-level production server with RAID 01, then a virtualized cloud production server, then a quad-core Xeon server with SSDs, etc., so we can see "how much faster" the servers are from one to the next.
The only tricky part is testing SQL and I/O. Perhaps the script could compute at random and then write a 10 GB file to disk, or a series of 1 GB files, and then time how long it takes to create them, write them, copy/move them, read them in, calculate an MD5 on them, and spit out the results to the client, say, in a loop of n times. What we're trying to do is prove, with a control set of code and tasks, server performance from one machine to the next.
-- EDIT --
Of course, as pointed out by jbwebtech below, VBS does indeed have its own Timer() function, which I had completely forgotten about. And so...
Well, you can just get the time before you run each process, run the process, and then subtract the start time from the current time...
<%
Dim start
start = Timer()
'Run my process...
...
...
Response.Write(Timer() - start)
%>
It's perhaps not perfect, but it will give you back a fairly useful reading. Timer() returns the number of seconds (including a fractional part) that have elapsed since midnight, so subtracting the start value gives you the time taken to execute the code block in seconds and fractions of a second. If you want the result in minutes or hours, just divide by 60 or 3600. One caveat: because the counter resets at midnight, a run that crosses midnight will give a misleading result.
I am currently trying to understand why some of my requests in my Python Heroku app take >30 seconds. Even simple requests which do absolutely nothing.
One of the things I've done is look into the load average on my dynos. I did three things:
1) Look at the Heroku logs. Once in a while, it will print the load. Here are examples:
Mar 16 11:44:50 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 11.900
Mar 16 11:45:11 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 8.386
Mar 16 11:45:32 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 6.798
Mar 16 11:45:53 d.0b1adf0a-0597-4f5c-8901-dfe7cda9bce0 heroku[web.2] Dyno load average (1m): 8.031
2) Run "heroku run uptime" several times, each time hitting a different machine (verified by running "hostname"). Here is sample output from just now:
13:22:09 up 3 days, 13:57, 0 users, load average: 15.33, 20.55, 22.51
3) Measure the load average on the machines on which my dynos live by using psutil to send metrics to graphite. The graphs confirm numbers of anywhere between 5 and 20.
I am not sure whether this explains simple requests taking very long or not, but can anyone say why the load average numbers on Heroku are so high?
Heroku sub-virtualizes hosts into the guest 'dynos' you are using via LXC. When you run 'uptime' you are seeing the whole host's uptime, NOT your container's, and as pointed out by @jon-mountjoy you are getting a new LXC container, not one of your running dynos, when you do this.
https://devcenter.heroku.com/articles/dynos#isolation-and-security
Heroku’s dyno load calculation also differs from the traditional UNIX/LINUX load calculation.
The Heroku load average reflects the number of CPU tasks that are in the ready queue (i.e. waiting to be processed). The dyno manager takes the count of runnable tasks for each dyno roughly every 20 seconds and computes an exponentially damped moving average over the previous 30 minutes, where the period is either 1, 5, or 15 minutes (in seconds), count_of_runnable_tasks is the number of tasks in the queue at a given point in time, and avg is the exponential load average calculated in the previous iteration.
https://devcenter.heroku.com/articles/log-runtime-metrics#understanding-load-averages
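As a rough illustration (my own sketch, not Heroku's actual code), a single 20-second update of the 1-minute average would look something like this, where 8.386 is one of the 1m values from your logs and count=12 is just a made-up number of runnable tasks:

awk -v avg=8.386 -v count=12 -v interval=20 -v period=60 'BEGIN {
    decay = exp(-interval / period)   # weight kept from the previous average
    printf "new 1m load: %.3f\n", avg * decay + count * (1 - decay)
}'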
The difference between Heroku's load average and Linux is that Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system.
On CPU-bound dynos I would presume this doesn't make much difference. On an I/O-bound dyno, the load averages reported by Heroku would be much lower than what you would see if you could get a TRUE uptime inside the LXC container.
You can also have your running dynos emit periodic load messages by enabling log-runtime-metrics.
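For example (substituting your own app name; the feature takes effect after a restart):

heroku labs:enable log-runtime-metrics -a your-app
heroku restart -a your-app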
Perhaps it's expected dyno idling?
PS. I suspect there's no point running heroku run uptime - that will run it in a new one-off dyno every time.