What is the maximum throughput of Loggly? - performance

How many requests per second from a client can Loggly handle? I am only able to get around 10–20 requests processed per second and I am wondering if this is normal.

I just ran a bunch of tests and found that it can't really handle much over a TCP connection using syslog-ng.
Here are my test results for anyone wanting to try it.
I used BalaBit's "loggen" program for this and sent 200-byte messages to the TCP port Loggly assigned to me.
Note that although the syslog RFC (3164 at least) states that a log message should not exceed 1024 bytes, I used 200-byte messages to be fair and because many messages are that small.
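For readers who want to see what such a load test does without loggen, here is a minimal Python sketch of the same idea: open a TCP connection, send fixed-size 200-byte lines at a target rate, and stop when the peer breaks the pipe. It is not loggen's implementation; the message layout, the newline framing, and the helper names `make_message` and `send_at_rate` are assumptions for illustration.

```python
import socket
import time


def make_message(seq: int, size: int = 200) -> bytes:
    """Build one newline-terminated, syslog-style line of exactly `size` bytes."""
    # PRI <13> = facility user (1) * 8 + severity notice (5)
    head = f"<13>loadtest: seq={seq:010d} ".encode("ascii")
    if size > 1024:
        raise ValueError("RFC 3164 caps a syslog packet at 1024 bytes")
    if size < len(head) + 1:
        raise ValueError("size is too small for the header")
    # Pad with filler so every message has the same length, newline included.
    return head + b"x" * (size - len(head) - 1) + b"\n"


def send_at_rate(sock: socket.socket, rate: float, count: int, size: int = 200):
    """Send `count` messages at roughly `rate` msg/s over a connected stream socket.

    Returns (messages_sent, elapsed_seconds, error); error is None unless the
    peer closed the connection (the "Broken pipe" case loggen reports).
    """
    interval = 1.0 / rate
    start = time.monotonic()
    sent = 0
    try:
        for seq in range(count):
            # Schedule against the start time so per-send overhead does not drift.
            delay = start + seq * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            sock.sendall(make_message(seq, size))
            sent += 1
    except (BrokenPipeError, ConnectionResetError) as exc:
        return sent, time.monotonic() - start, exc
    return sent, time.monotonic() - start, None
```

To point it at a real endpoint, pass a socket from `socket.create_connection((host, port))`, where `host` and `port` are whatever your provider assigns, e.g. `send_at_rate(sock, rate=500, count=50000)`.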
Signed up for a free account.
Configured a TCP connection for testing.
Tried sending at various rates; results:
Test 1: FAIL
loggen -iS -r 6000 -s 200 -I 100 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 1392.13 msg/sec, count=18296, time=13.142, (average) msg size=200, bandwidth=271.74 kB/sec
Test 2: FAIL
loggen -iS -r 4000 -s 200 -I 100 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 2767.16 msg/sec, count=121146, time=43.779, (average) msg size=200, bandwidth=540.15 kB/sec
Test 3: FAIL
loggen -iS -r 2500 -s 200 -I 100 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 1931.27 msg/sec, count=85878, time=44.467, (average) msg size=200, bandwidth=376.98 kB/sec
Test 4: FAIL
loggen -iS -r 2000 -s 200 -I 100 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 1617.72 msg/sec, count=83134, time=51.389, (average) msg size=200, bandwidth=315.78 kB/sec
Test 5: FAIL
loggen -iS -r 1000 -s 200 -I 100 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 936.50 msg/sec, count=63331, time=67.624, (average) msg size=200, bandwidth=182.81 kB/sec
Test 6: PASS for duration configured, FAIL for > 100 seconds - SEE TEST 7
loggen -iS -r 500 -s 200 -I 100 logs.loggly.com 16225
average rate = 325.00 msg/sec, count=32501, time=100.001, (average) msg size=200, bandwidth=63.44 kB/sec
Test 7: FAIL - ran a new test at 500 EPS for a longer period, and the pipe broke after 255 seconds:
loggen -iS -r 500 -s 200 -I 10000 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 323.35 msg/sec, count=82642, time=255.577, (average) msg size=200, bandwidth=63.12 kB/sec
Test 8: FAIL (ran longer at 200 EPS, but still failed)
loggen -iS -r 200 -s 200 -I 10000 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 163.53 msg/sec, count=234090, time=1431.470, (average) msg size=200, bandwidth=31.92 kB/sec
Test 9: FAIL (again, ran longer but still failed)
loggen -iS -r 50 -s 200 -I 10000 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 47.36 msg/sec, count=89325, time=1886.014, (average) msg size=200, bandwidth=9.25 kB/sec
Test 10: FAIL? (same result: lost the connection again. Hard to believe they can’t handle 10 EPS?)
loggen -iS -r 10 -s 200 -I 10000 logs.loggly.com 16225
Send error Broken pipe, results may be skewed.
average rate = 9.94 msg/sec, count=1568, time=157.770, (average) msg size=200, bandwidth=1.94 kB/sec
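To compare these runs, it helps to relate each achieved rate to the requested `-r` rate. A minimal Python sketch that parses loggen's summary line in exactly the format printed above (the regex and the helper names are assumptions, not part of loggen):

```python
import re

# Matches the loggen summary line exactly as printed in the tests above.
SUMMARY = re.compile(
    r"average rate = (?P<rate>[\d.]+) msg/sec, count=(?P<count>\d+), "
    r"time=(?P<time>[\d.]+), \(average\) msg size=(?P<size>\d+), "
    r"bandwidth=(?P<bw>[\d.]+) kB/sec"
)


def parse_summary(line: str) -> dict:
    """Extract rate, count, time, message size and bandwidth from one summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a loggen summary line: {line!r}")
    return {
        "rate": float(m["rate"]),
        "count": int(m["count"]),
        "time": float(m["time"]),
        "size": int(m["size"]),
        "bandwidth_kb": float(m["bw"]),
    }


def delivery_ratio(target_rate: float, line: str) -> float:
    """Fraction of the requested rate (-r) that loggen actually achieved."""
    return parse_summary(line)["rate"] / target_rate
```

For example, Test 1 reached about 23% of the requested 6,000 msg/sec, while Test 10 reached about 99% of 10 msg/sec, yet both ended with a broken pipe.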
I did some web searching to see what Loggly can actually handle, but there's only marketing material saying it is scalable, not how scalable it is.
I did find this:
http://twitter.com/jordansissel/status/5948244626509824
Which is only 22 events per second…
Full Disclosure: I am the founder of LogZilla, so I was testing out the competition because we are launching a cloud-based syslog solution.
My tests show that our software is able to handle anywhere from 2,000 to 12,000 events per second depending on which servers we're using in the cloud.

I really don't know, but I've been searching for a logging solution for Node.js as well, without luck.
Why?
Because all of the ones I've checked (I didn't check them all) use synchronous disk writes, which badly degrades performance.
So if you ask me, you should reconsider your needs and log only what you really need.
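The alternative to synchronous disk writes is to let the caller only enqueue a record and have a background worker do the blocking I/O. The answer is about Node.js; as a swapped-in illustration of the same idea, here is a minimal sketch using Python's standard `logging.handlers.QueueHandler` and `QueueListener` (the helper name `make_async_logger` is hypothetical):

```python
import logging
import logging.handlers
import queue


def make_async_logger(name: str, path: str):
    """Return (logger, listener). The logger only enqueues records; a background
    thread owned by the QueueListener performs the actual file writes.
    Call once per logger name, otherwise handlers are added twice."""
    q: queue.Queue = queue.Queue(-1)  # unbounded queue between caller and writer
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    listener = logging.handlers.QueueListener(q, file_handler)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(q))
    logger.propagate = False  # keep records away from the (synchronous) root handlers
    listener.start()
    return logger, listener
```

Call `listener.stop()` before exiting; it drains the queue so records still waiting are written.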

I ran tests similar to the ones in Clayton's answer, as his results made me worried that Loggly would drop messages if I sent too many at the same time. I wanted to see whether the problems Clayton encountered in 2012 still exist today.
That said, here is what I found running loggen for 60 seconds with a target rate of 100,000 messages a second.
$ loggen -iS -r 100000 -s 200 -I 60 logs-01.loggly.com port
average rate = 34885.98 msg/sec, count=2093163, time=60.000, (average) msg size=200, bandwidth=6809.74 kB/sec
I was also curious what some competitors would return for similar tests and I found the following:
Papertrail
loggen -iS -D -r 100000 -s 200 -I 60 logs2.papertrailapp.com PORT
average rate = 24344.71 msg/sec, count=1461327, time=60.026, (average) msg size=200, bandwidth=4752.09 kB/sec
Logentries
$ loggen -iS -D -r 100000 -s 200 -I 60 api.logentries.com PORT
average rate = 14076.76 msg/sec, count=844609, time=60.000, (average) msg size=200, bandwidth=2747.78 kB/sec
Obviously these are not hard numbers that will stay the same, as systems change over time. This just gives a point-in-time reference for how each service responded when I ran the tests. Your mileage will vary!
Update: I ran a longer (10,000-second, nearly 3-hour) test against Loggly and got the following:
loggen -iS -r 100000 -s 200 -I 10000 logs-01.loggly.com port
average rate = 15869.22 msg/sec, count=158692177, time=10000.000, (average) msg size=200, bandwidth=3097.67 kB/sec

Related

Large result is slow anywhere but local

I have a fairly large query running on ClickHouse. When I run it on localhost with the command-line client, it consistently takes about 0.7 seconds. The issue is when querying from C# / HTTP / Postman: there it takes about 10 times longer to return the data. The result is only about 3–4 MB, so I don't think it's a size issue.
I have tried monitoring network latency, but nothing stands out.
On the host it works like a charm, but from outside it does not :( What should I do?
I expect the latency to be a few hundred ms, but it turns out to be 7 seconds :/
Check the timings with curl (https://clickhouse.yandex/docs/en/interfaces/http/, https://stackoverflow.com/a/22625150)
and compare local vs. remote.
ClickHouse's HTTP interface usually performs almost the same as the native TCP protocol, and HTTP can even be faster for small result sets (around 10 rows).
Again: the problem is not HTTP.
Example:
time clickhouse-client -q "select number, arrayMap(x->sipHash64(number,x), range(10)) from numbers(10000)" >native.out
real 0m0.034s
time curl -S -o http.out 'http://localhost:8123/?query=select%20number%2C%20arrayMap(x-%3EsipHash64(number%2Cx)%2C%20range(10))%20from%20numbers(10000)'
real 0m0.017s
ls -l http.out native.out
2108707 Oct 1 16:17 http.out
2108707 Oct 1 16:17 native.out
10,000 rows, about 2 MB.
HTTP is faster: 0.017 s vs. 0.034 s.
Canada -> Germany (OpenVPN):
time curl -S -o http.out 'http://user:xxx@cl.host.x:8123/?query=select%20number%2C%20arrayMap(x-%3EsipHash64(number%2Cx)%2C%20range(10))%20from%20numbers(10000)'
real 0m1.619s
ping cl.host.x
PING cl.host.x (10.253.52.6): 56 data bytes
64 bytes from 10.253.52.6: icmp_seq=0 ttl=61 time=131.710 ms
64 bytes from 10.253.52.6: icmp_seq=1 ttl=61 time=133.711 ms
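A minimal Python sketch of the local-vs-remote comparison suggested above, separating the wait for the first body byte from the total transfer time. It is similar in spirit to curl's timing output but not identical, and the helper name `time_get` is hypothetical. If the first byte arrives quickly at both ends but the total time grows with distance, the transfer over the link (here a VPN with a roughly 130 ms round trip), not ClickHouse or HTTP, is the likely bottleneck.

```python
import time
import urllib.request


def time_get(url: str, chunk_size: int = 65536):
    """Return (time_to_first_byte_s, total_s, bytes_read) for a GET of `url`.

    The first-byte time is taken when the first body chunk is read, i.e. after
    the response headers have already arrived.
    """
    start = time.perf_counter()
    nbytes = 0
    ttfb = None
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if ttfb is None:
                ttfb = time.perf_counter() - start
            if not chunk:
                break
            nbytes += len(chunk)
    total = time.perf_counter() - start
    return ttfb, total, nbytes
```

Run it against `http://localhost:8123/?query=...` on the server, then against the same query on the remote host, and compare the two tuples.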

Ping in shell scripts: some packet loss, but exit code $? is zero. How can I detect it?

Sometimes my DSL router fails in this strange manner:
luis@balanceador:~$ sudo ping 8.8.8.8 -I eth9
[sudo] password for luis:
PING 8.8.8.8 (8.8.8.8) from 192.168.3.100 eth9: 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=47 time=69.3 ms
ping: sendmsg: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=3 ttl=47 time=68.0 ms
ping: sendmsg: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=5 ttl=47 time=68.9 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=47 time=67.2 ms
ping: sendmsg: Operation not permitted
64 bytes from 8.8.8.8: icmp_seq=8 ttl=47 time=67.2 ms
^C
--- 8.8.8.8 ping statistics ---
8 packets transmitted, 5 received, 37% packet loss, time 7012ms
rtt min/avg/max/mdev = 67.254/68.183/69.391/0.906 ms
luis@balanceador:~$ echo $?
0
As can be seen, the exit code $? is 0, so I cannot simply detect that the command failed; the exit status signals no error to a script.
What is the proper way to detect that there was some packet loss?
Do I need to parse the output with grep, or is there a simpler method?
According to the man page, by default (on Linux), if ping does not receive any reply packets at all, it exits with code 1. But if both a packet count (-c) and a deadline (-w, in seconds) are specified, and fewer than count packets are received before the deadline, it also exits with code 1. On other errors it exits with code 2.
ping 8.8.8.8 -I eth9 -c 3 -w 3
So the exit code will be 1 if 3 replies are not received within 3 seconds.
As @mklement0 noted, ping on BSD behaves a bit differently:
The ping utility exits with one of the following values:
0 - at least one response was heard from the specified host.
2 - the transmission was successful but no responses were received.
So in this case, one workaround is to send the probes one at a time in a loop (options go before the host, since BSD ping stops parsing options at the first operand):
ip=8.8.8.8
count=3
for i in $(seq "$count"); do
    ping -c 1 -I eth9 "$ip"
    if [ $? -eq 2 ]; then
        # no reply to this probe: stop and propagate exit code 2
        exit 2
    fi
done
Of course, if you need full statistics, just count the "2" and "0" exit codes in variables, then print the result or set the exit code after the loop as needed.
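If you do want the loss itself rather than just a pass/fail exit code, the summary line can be parsed. A minimal Python sketch, assuming the Linux iputils and BSD summary formats shown in this thread; the helper names are hypothetical, and `ping_with_stats` uses the Linux `-c`/`-w` options discussed above:

```python
import re
import subprocess
from typing import Optional, Tuple

# Linux iputils: "8 packets transmitted, 5 received, ..."
# BSD/macOS:     "3 packets transmitted, 3 packets received, ..."
STATS = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_stats(output: str) -> Tuple[int, int, float]:
    """Return (transmitted, received, loss_fraction) from ping's summary line."""
    m = STATS.search(output)
    if m is None:
        raise ValueError("no ping statistics line found")
    tx, rx = int(m.group(1)), int(m.group(2))
    loss = 1 - rx / tx if tx else 1.0
    return tx, rx, loss


def ping_with_stats(host: str, count: int = 3, deadline: int = 3,
                    iface: Optional[str] = None):
    """Run Linux ping with -c/-w; return (exit_code, stats or None)."""
    cmd = ["ping", "-c", str(count), "-w", str(deadline)]
    if iface:
        cmd += ["-I", iface]
    cmd.append(host)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    try:
        stats = parse_ping_stats(proc.stdout)
    except ValueError:
        stats = None  # e.g. an error (exit code 2) before any statistics were printed
    return proc.returncode, stats
```

On the output in the question, the loss is computed from the counts (3 of 8 lost, 0.375) rather than from the 37% figure ping printed.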

Pentaho "Get file from FTP" times out

Pentaho's "Get a file from FTP" step fails randomly. Sometimes it downloads the file properly; sometimes it doesn't, returning the error:
Error getting files from FTP : Read timed out
The timeout is set to 100 seconds, yet the read actually fails after less than one second.
Contrary to what the Get a file from FTP documentation says about the timeout, it is not in seconds but in milliseconds, so a value of 100 means 100 ms, which explains the sub-second failures.
Change it to a reasonable value such as 60000 (1 minute in ms) and your import will work.

net-snmp snmptrap sending samples

I'm new to SNMP. I just configured the agent and the manager, and I'm able to receive the traps sent by the agent. But I noticed that the manager only receives the traps at 10-second intervals, and I need to receive each trap as soon as I generate it.
Here is my script, which is intended to capture the average signal power a client has with an access point. Samples are taken every second, and I need to send each trap to the manager in less than 1 second.
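while :
do
    # average signal of the associated station, e.g. "-42" (dBm)
    valor=$(iw dev wlan0 station dump \
        | grep 'signal avg:' | awk '{print $3}')
    # SNMPv1 trap: enterprise OID, agent address, generic 6 (enterpriseSpecific), specific 99, uptime 55
    snmptrap -v 1 -c public 192.168.1.25 '1.2.3.4.5.6' \
        '192.168.1.1' 6 99 '55' 1.11.12.13.14.15 s "$valor"
    # keep a local record of every sample
    echo "$valor" >> muestras.txt
    sleep 1
done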
But surprisingly, the traps seem to be generated only every 10 seconds, or maybe the manager receives them with a 10-second delay. I don't know whether the problem is in the agent or in the manager, but I'm sure the agent generates a sample every second, because "muestras.txt" shows that.
Hope you can help me!
Greetings!
I found the answer.
The problem was on the server that runs snmptrapd. I simply passed the -n option to snmptrapd, and that solved everything!
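The -n option tells snmptrapd to use numeric addresses instead of attempting hostname lookups. The poster doesn't say why that helped; one plausible explanation (an assumption) is that each incoming trap was waiting on a slow reverse DNS lookup of the sender's address. This Python sketch times a single reverse lookup with `socket.gethostbyaddr`, which can be run on the manager host to test that hypothesis; the helper name is hypothetical, and Python's resolver path may not match snmptrapd's exactly.

```python
import socket
import time


def reverse_lookup_seconds(ip: str):
    """Time one reverse DNS (PTR) lookup of `ip`; return (seconds, hostname or None)."""
    start = time.perf_counter()
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror / socket.gaierror: no PTR record or resolver failure
        host = None
    return time.perf_counter() - start, host
```

For example, `reverse_lookup_seconds("192.168.1.1")` checks the agent address used in the script; a result of several seconds would support the DNS explanation.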

Ruby concurrency: non-blocking I/O vs threads

I am playing around with concurrency in Ruby (1.9.3-p0), and have created a very simple, I/O-heavy proxy task. First, I tried the non-blocking approach:
require 'rack'
require 'rack/fiber_pool'
require 'em-http'
require 'em-synchrony'
require 'em-synchrony/em-http'
proxy = lambda {|*|
  result = EM::Synchrony.sync EventMachine::HttpRequest.new('http://google.com').get
  [200, {}, [result.response]]
}
use Rack::FiberPool, :size => 1000
run proxy
=begin
$ thin -p 3000 -e production -R rack-synchrony.ru start
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 5.602 seconds
HTML transferred: 21900 bytes
Requests per second: 17.85 [#/sec] (mean)
Time per request: 5602.174 [ms] (mean)
=end
Hmm, I thought I must be doing something wrong. An average request time of 5.6s for a task where we are mostly waiting for I/O? I tried another one:
require 'sinatra'
require 'sinatra/synchrony'
require 'em-synchrony/em-http'
get '/' do
  EM::HttpRequest.new("http://google.com").get.response
end
=begin
$ ruby sinatra-synchrony.rb -p 3000 -e production
== Sinatra/1.3.1 has taken the stage on 3000 for production with backup from Thin
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 5.476 seconds
HTML transferred: 21900 bytes
Requests per second: 18.26 [#/sec] (mean)
Time per request: 5475.756 [ms] (mean)
=end
Hmm, a little better, but not what I would call a success. Finally, I tried a threaded implementation:
require 'rack'
require 'excon'
proxy = lambda {|*|
  result = Excon.get('http://google.com')
  [200, {}, [result.body]]
}
run proxy
=begin
$ thin -p 3000 -e production -R rack-threaded.ru --threaded --no-epoll start
>> Thin web server (v1.3.1 codename Triple Espresso)
$ ab -c100 -n100 http://localhost:3000/
Concurrency Level: 100
Time taken for tests: 2.014 seconds
HTML transferred: 21900 bytes
Requests per second: 49.65 [#/sec] (mean)
Time per request: 2014.005 [ms] (mean)
=end
That was really, really surprising. Am I missing something here? Why is EM performing so badly here? Is there some tuning I need to do? I tried various combinations (Unicorn, several Rainbows configurations, etc.), but none of them came even close to simple, old-fashioned blocking I/O with threads.
Ideas, comments and - obviously - suggestions for better implementations are very welcome.
See how your "Time per request" exactly equals the total "Time taken for tests"? This is an arithmetic artifact of the reporting, caused by your request count (-n) being equal to your concurrency level (-c). The mean time is total-time * concurrency / num-requests, so when -n == -c the reported mean will be the time of the longest request. You should run ab with -n several times larger than -c to get reasonable measurements.
You seem to be using an old version of ab, as a relatively current one reports far more detailed results by default. Running directly against Google, I see the same total-time == mean-time pattern when -n == -c, and more reasonable numbers when -n > -c. You really want to look at the requests per second, the mean across all concurrent requests, and the final service-level breakdown to get a better understanding.
$ ab -c50 -n50 http://google.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking google.com (be patient).....done
Server Software: gws
Server Hostname: google.com
Server Port: 80
Document Path: /
Document Length: 219 bytes
Concurrency Level: 50
Time taken for tests: 0.023 seconds <<== note same as below
Complete requests: 50
Failed requests: 0
Write errors: 0
Non-2xx responses: 50
Total transferred: 27000 bytes
HTML transferred: 10950 bytes
Requests per second: 2220.05 [#/sec] (mean)
Time per request: 22.522 [ms] (mean) <<== note same as above
Time per request: 0.450 [ms] (mean, across all concurrent requests)
Transfer rate: 1170.73 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 2 0.6 3 3
Processing: 8 9 2.1 9 19
Waiting: 8 9 2.1 9 19
Total: 11 12 2.1 11 22
WARNING: The median and mean for the initial connection time are not within a normal deviation
These results are probably not that reliable.
Percentage of the requests served within a certain time (ms)
50% 11
66% 12
75% 12
80% 12
90% 12
95% 12
98% 22
99% 22
100% 22 (longest request) <<== note same as total and mean above
$ ab -c50 -n500 http://google.com/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking google.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests
Server Software: gws
Server Hostname: google.com
Server Port: 80
Document Path: /
Document Length: 219 bytes
Concurrency Level: 50
Time taken for tests: 0.110 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Non-2xx responses: 500
Total transferred: 270000 bytes
HTML transferred: 109500 bytes
Requests per second: 4554.31 [#/sec] (mean)
Time per request: 10.979 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 2401.69 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 1 1 0.7 1 3
Processing: 8 9 0.7 9 13
Waiting: 8 9 0.7 9 13
Total: 9 10 1.3 10 16
Percentage of the requests served within a certain time (ms)
50% 10
66% 11
75% 11
80% 12
90% 12
95% 13
98% 14
99% 15
100% 16 (longest request)
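The arithmetic described in this answer can be checked against the runs above. A minimal Python sketch that recomputes ab's derived figures from "Time taken for tests", -c and -n (the helper name `ab_derived` is hypothetical):

```python
def ab_derived(total_s: float, concurrency: int, requests: int):
    """Recompute ApacheBench's derived figures from total time, -c and -n.

    Returns (requests_per_sec, time_per_request_ms, time_per_request_all_ms):
      requests_per_sec        = n / total
      time_per_request_ms     = c * total / n   ("mean")
      time_per_request_all_ms = total / n       ("mean, across all concurrent requests")
    """
    rps = requests / total_s
    per_request_ms = concurrency * total_s * 1000.0 / requests
    across_all_ms = total_s * 1000.0 / requests
    return rps, per_request_ms, across_all_ms
```

With the question's threaded run (100 requests at -c100 in 2.014 s) this gives 49.65 req/s and 2014 ms, matching ab's output: with -n == -c the mean is simply the total time. For the -c50 -n500 run it gives 11.0 ms and 0.22 ms; ab reports 10.979 ms because it uses the unrounded total rather than the printed 0.110 s.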
