Heroku syslog drain: too many TCP sessions

I set a logging drain on my Heroku app to send logs to my other server.
My rsyslogd receives the logs from Heroku fine, but after a few hours rsyslog starts dropping incoming requests because too many TCP connections are open:
Oct 18 06:28:17 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
Oct 18 06:28:24 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
Oct 18 06:28:24 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
Oct 18 06:28:26 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
[...]
I tried increasing the maximum number of sessions allowed in the rsyslogd configuration (I set it to 1000, which should normally be enough to handle everything).
Same issue, so I increased the value to 3000. I have fewer issues now, but 3000 max sessions seems particularly high.
$ModLoad imtcp
$InputTCPServerRun 514
$InputTCPMaxSessions 3000
Do you think there is something else to do? Do I need to decrease this number?
Maybe there is a better way to deal with logs coming from the Heroku Logplex.
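For comparison, the same imtcp settings can be written in rsyslog's newer RainerScript style, where the session limit is a module parameter. This is only a minimal sketch assuming rsyslog v7 or later, carrying over the 3000 value from the legacy directives above:
# load the TCP input module with a higher session limit
module(load="imtcp" MaxSessions="3000")
# listen for incoming syslog traffic on TCP 514
input(type="imtcp" port="514")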

Related

Strongswan not establishing connection

I'm creating a VPN using strongSwan. It's my first time using this tool. I followed a tutorial to set it up, but I've hit a blocker where the peer connection times out. The status is 0 up, 1 connecting.
I have tried on different servers; the same issue happens.
ipsec.conf
conn conec-example
authby=secret
left=%defaultroute
leftid=<public_IP_1>
leftsubnet=<private_ip_1>/20
right=<public_IP_2>
rightsubnet=<private_ip_2>/20
ike=aes256-sha2_256-modp1024!
esp=aes256-sha2_256!
keyingtries=0
ikelifetime=1h
lifetime=8h
dpddelay=30
dpdtimeout=120
dpdaction=restart
auto=start
ipsec.secrets
public_IP_1 public_IP_2 : PSK "randomprivatesharedkey"
Here is part of the logs:
Aug 18 17:29:01 ip-x charon: 10[IKE] retransmit 2 of request with message ID 0
Aug 18 17:29:01 ip-x charon: 10[NET] sending packet: from x.x[500] to x.x.x.x[500] (334 bytes)
Aug 18 17:30:19 ip-x charon: 13[IKE] retransmit 5 of request with message ID 0
Aug 18 17:30:19 ip-x charon: 13[NET] sending packet: from x.x[500] to x.x.x.129[500] (334 bytes)
Aug 18 17:31:35 charon: 16[IKE] giving up after 5 retransmits
Aug 18 17:31:35 charon: 16[IKE] peer not responding, trying again (2/0)
I expected a successful connection after setting this up, but had no success. How can I resolve this? Any ideas?
Based on the log excerpt, strongSwan cannot reach the other peer.
There is far too little information to provide an exact answer; the topology and addressing plan, the relevant AWS security group settings, and both VPN peers' configurations would be needed.
Still, let me offer a few hints on what to check in order to connect successfully via VPN:
UDP ports 500 and 4500 must be open on both VPN peers. In AWS, this means the security group associated with the EC2 instance running strongSwan must contain explicit rules allowing incoming UDP traffic on ports 500 and 4500 (see the example commands after this list). An EC2 instance is always behind NAT, so ESP/AH packets will be encapsulated in UDP packets.
Any firewall on both VPN peers has to allow the UDP traffic mentioned in the previous point.
Beware that the UDP encapsulation affects the MTU of the traffic going through the VPN connection.
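As an illustration, the corresponding security group rules could be added with the AWS CLI roughly like this (the group ID and the peer's address are placeholders to replace with your own values):
# allow IKE (UDP 500) and NAT-T (UDP 4500) from the remote VPN peer
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 500 --cidr 203.0.113.25/32
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 4500 --cidr 203.0.113.25/32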

Socket.io behind HAProxy behind Google Cloud load balancer giving connection errors

We are trying to configure our Socket.io socket servers behind HAProxy, and in front of HAProxy we are using a Google Cloud load balancer so that HAProxy is not a single point of failure, as described in this post: https://medium.com/google-cloud/highly-available-websockets-on-google-cloud-c74b35ee20bc#.o6xxj5br8.
At the Google cloud load balancer we are using TCP Load balancing with SSL Proxy with Proxy Protocol ON.
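For context, the proxy protocol setting mentioned above is toggled on the Google Cloud SSL proxy roughly like this with gcloud (the target proxy name is a placeholder):
# enable PROXY protocol v1 so HAProxy's accept-proxy bind can read the real client address
gcloud compute target-ssl-proxies update my-ssl-proxy --proxy-header PROXY_V1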
HAProxy is configured to use cookies so that a client always connects to the same server. However, since cookies might not be available on all our clients' systems, we decided to use the source load-balancing algorithm in HAProxy as well. Here is the HAProxy configuration:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
maxconn 16384
tune.ssl.default-dh-param 2048
user haproxy
group haproxy
daemon
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
# https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3
defaults
mode http
log global
option httplog
option http-server-close
option dontlognull
option redispatch
option contstats
retries 3
backlog 10000
timeout client 25s
timeout connect 5s
timeout server 25s
timeout tunnel 3600s
timeout http-keep-alive 1s
timeout http-request 15s
timeout queue 30s
timeout tarpit 60s
default-server inter 3s rise 2 fall 3
option forwardfor
frontend public
bind *:443 ssl crt /etc/ssl/private/key.pem ca-file /etc/ssl/private/cert.crt accept-proxy
maxconn 50000
default_backend ws
backend ws
timeout check 5000
option tcp-check
option log-health-checks
balance source
cookie QUIZIZZ_WS_COOKIE insert indirect nocache
server ws1 socket-server-1:4000 maxconn 4096 weight 10 check rise 1 fall 3 check cookie ws1 port 4000
server ws2 socket-server-1:4001 maxconn 4096 weight 10 check rise 1 fall 3 check cookie ws2 port 4001
server ws3 socket-server-2:4000 maxconn 4096 weight 10 check rise 1 fall 3 check cookie ws3 port 4000
server ws4 socket-server-2:4001 maxconn 4096 weight 10 check rise 1 fall 3 check cookie ws4 port 4001
This is, however, giving connection errors for around 5% of our clients compared to our old single-server setup. Any suggestions?
Edit: "Connection errors" means that the client was not able to connect to the socket server and the socket.io client was throwing connection errors.
Thanks in advance.

Haproxy 1.5 performance issues

I need to improve the performance of an HAProxy 1.5 instance running as a load balancer on Ubuntu 14.04. I have analytics-like code on many sites, and for every pageview the client requests between 2 and 5 different scripts of ours. The other day we received more than 1k requests per second on the load balancer and it started to run really slowly. It reached the active sessions limit of 2000 at a rate of 1000 per second. In the configuration we use an http-keep-alive timeout of 100 ms to keep the connection open until it is closed. How can we improve this? What is the best config for this use case? I may be leaving out many details here; please ask for them if there is info missing.
EDIT
Here are some details:
I'm running an AWS Ubuntu 14.04 server on a c3.xlarge virtual machine. There we use HAProxy 1.5 to load balance web traffic between several app instances. (Every app has its own HAProxy to load balance between its own app instances, deployed one per core.)
The server only has HAProxy and no other software installed.
The bottleneck, as per the HAProxy stats page, is the frontend load balancer: at that moment it had a session rate of 258 and 2000 current sessions (2000 being the max), while all the apps had a session rate of 96 and 0/1 current sessions. I would post an image, but because of my reputation points I can't do that.
This was the configuration at that point in time:
global
maxconn 18000
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
# https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3
defaults
mode http
retries 2
option redispatch
timeout connect 5s
timeout client 15s
timeout server 15s
timeout http-keep-alive 1
frontend public
log 127.0.0.1 local0 notice
option dontlognull
option httplog
bind *:80
bind *:443 ssl crt /etc/ssl/private/server.pem
default_backend rely_apps
frontend private
bind 127.0.0.1:80
stats enable
stats auth xxx:xxx
stats admin if LOCALHOST
stats uri /haproxy?stats
stats show-legends
stats realm Haproxy\ Statistics
backend rely_apps
option forwardfor
balance roundrobin
option httpchk
server app1 10.0.0.1:80 check
server app2 10.0.0.2:80 check
server app3 10.0.0.3:80 check
The number of connections was very high; it seems like it was not closing them (or closing them at a really low rate).
CPU and memory usage was really low.
We then changed that config to the following one and it's working without problems:
global
maxconn 64000
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
tune.bufsize 16384
tune.maxrewrite 1024
nbproc 4
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# Default ciphers to use on SSL-enabled listening sockets.
# For more information, see ciphers(1SSL). This list is from:
# https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
ssl-default-bind-options no-sslv3
defaults
mode http
retries 2
option redispatch
option forceclose
option http-pretend-keepalive
timeout connect 5s
timeout client 15s
timeout server 15s
timeout http-keep-alive 1s
frontend public
log 127.0.0.1 local0 notice
option dontlognull
option httplog
maxconn 18000
bind *:80
bind *:443 ssl crt /etc/ssl/private/server.pem
default_backend rely_apps
#frontend private
# bind 127.0.0.1:80
stats enable
stats auth xxx:xxx
stats admin if LOCALHOST
stats uri /haproxy?stats
stats show-legends
stats realm Haproxy\ Statistics
backend rely_apps
option forwardfor
balance leastconn
option httpchk
server app1 10.0.0.1:80 check maxconn 100
server app2 10.0.0.2:80 check maxconn 100
server app3 10.0.0.3:80 check maxconn 100
However, all connections are now being closed on return (and we have the same rate of sessions and requests).
This is also not good, because we have to open a new connection for every client request (and we have 3-4 requests per client).
How can we achieve a good keep-alive (something like 100 ms, I think, could work) without hitting the max connections limit?
Thanks.
The numbers you give are very, very low.
Please give more details about your architecture, the type of server, and any third-party software running on it (such as iptables); also share your configuration.
Baptiste
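For what it's worth, keep-alive in HAProxy 1.5 is usually expressed along these lines rather than with forceclose; this is only a sketch of the relevant directives, not a tested replacement for the configuration above:
defaults
    mode http
    option http-keep-alive          # keep client connections open between requests
    option prefer-last-server      # reuse the same server connection when possible
    timeout http-keep-alive 100ms  # close idle keep-alive connections after 100 ms
    timeout client 15s
    timeout server 15s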

Requests going through EC2 load balancer hangs

I have set up a simple proxy on an EC2 instance using Tinyproxy (default config, listening on and allowing all incoming connections). This works well. If, for debugging, I fill in the IP address and port in my browser's proxy settings, I can browse through the proxy without any issues. Everything works. However, if I create an EC2 load balancer in front of the instance (making sure to forward the HTTP port correctly), it just hangs when I browse through the load balancer's IP. This seems like a puzzle to me. The instance is running, the load balancer reports "in service", and going around the load balancer works, but going through it just hangs. What am I missing, and where should I look for the error?
UPDATE
I have now had a look at the Tinyproxy logs. When trying to access google.com directly through the instance's proxy, I see logs like this:
CONNECT Apr 30 20:41:33 [1862]: Request (file descriptor 6): GET http://google.com/ HTTP/1.1
INFO Apr 30 20:41:33 [1862]: No upstream proxy for google.com
CONNECT Apr 30 20:41:33 [1862]: Established connection to host "google.com" using file descriptor 7.
INFO Apr 30 20:41:33 [1862]: Closed connection between local client (fd:6) and remote client (fd:7)
CONNECT Apr 30 20:41:33 [1901]: Connect (file descriptor 6): x1-6-84-1b-ADDJF-20-07-92.fsdfe [430.12327.65117.615]
CONNECT Apr 30 20:41:33 [1901]: Request (file descriptor 6): GET http://www.google.ie/?gws_rd=cr&ei=_V9hU8DeFMTpPJjygIgC HTTP/1.1
INFO Apr 30 20:41:33 [1901]: No upstream proxy for www.google.ie
CONNECT Apr 30 20:41:33 [1901]: Established connection to host "www.google.ie" using file descriptor 7.
However, if I try to access Google through the load balancer, which then forwards to the instance, I see logs like this:
CONNECT Apr 30 20:42:54 [1860]: Request (file descriptor 6): GET / HTTP/1.1
CONNECT Apr 30 20:42:54 [1869]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [10.238.155.237]
CONNECT Apr 30 20:42:54 [2037]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [10.238.155.237]
INFO Apr 30 20:42:54 [1860]: process_request: trans Host GET http://google.com:8888/ for 6
INFO Apr 30 20:42:54 [1860]: No upstream proxy for google.com
CONNECT Apr 30 20:43:12 [1861]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [1230.23845.515.2537]
CONNECT Apr 30 20:43:12 [2035]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [143.238.12345.117]
ERROR Apr 30 20:43:12 [2035]: read_request_line: Client (file descriptor: 6) closed socket before read.
ERROR Apr 30 20:43:12 [1861]: read_request_line: Client (file descriptor: 6) closed socket before read.
ERROR Apr 30 20:43:12 [2035]: Error reading readble client_fd 6
ERROR Apr 30 20:43:12 [1861]: Error reading readble client_fd 6
WARNING Apr 30 20:43:12 [2035]: Could not retrieve request entity
WARNING Apr 30 20:43:12 [1861]: Could not retrieve request entity
From what I notice, the ELB is trying to send the request to port 8888.
You can get ELB access logs. These access logs can help you determine the time taken for a request at different stages, e.g.:
request_processing_time: Total time elapsed (in seconds) from the time the load balancer receives the request and sends the request to a registered instance.
backend_processing_time: Total time elapsed (in seconds) from the time the load balancer sends the request to a registered instance and the instance begins sending the response headers.
response_processing_time: Total time elapsed (in seconds) from the time the load balancer receives the response header from the registered instance until it starts sending the response to the client. This processing time includes both the queuing time at the load balancer and the connection acquisition time from the load balancer to the backend.
...and a lot more information. You need to configure access logging first. Please follow the articles below to get a better understanding of ELB access logs:
Access Logs for Elastic Load Balancers
Access Logs
These logs may or may not solve your problem, but they are certainly a good place to start. Besides, you can always check with AWS Technical Support for a more in-depth analysis.
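For reference, access logging on a classic ELB can be turned on with the AWS CLI roughly as follows (the load balancer name, bucket, and prefix are placeholders, and the S3 bucket needs a policy allowing the ELB to write to it):
aws elb modify-load-balancer-attributes --load-balancer-name my-load-balancer --load-balancer-attributes "{\"AccessLog\":{\"Enabled\":true,\"S3BucketName\":\"my-elb-logs\",\"EmitInterval\":5,\"S3BucketPrefix\":\"tinyproxy-elb\"}}"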
It sounds like you're trying to use ELB in HTTP mode as, essentially, something resembling a forward proxy, or at least a gateway to a forward proxy that you're running behind it. That's not ELB's intended application, so it isn't surprising that it wouldn't work in that configuration.
ELB in HTTP listener mode is intended to be used as a reverse proxy in front of a web endpoint/origin server.
Configuring your ELB listener to use TCP mode instead of HTTP mode should allow your intended configuration to work, but this is somewhat outside the optimum application of ELB.
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html
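As a rough illustration of that suggestion (the load balancer name and ports are placeholders, and this assumes a classic ELB), the HTTP listener could be replaced with a TCP one via the AWS CLI:
# remove the existing HTTP listener on port 80, then recreate it as plain TCP to the proxy port
aws elb delete-load-balancer-listeners --load-balancer-name my-load-balancer --load-balancer-ports 80
aws elb create-load-balancer-listeners --load-balancer-name my-load-balancer --listeners "Protocol=TCP,LoadBalancerPort=80,InstanceProtocol=TCP,InstancePort=8888"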

Frozen connection using ssh over Amazon EC2 using ubuntu

When I am connected to Amazon EC2 using the secure shell and don't type anything for a few minutes, everything freezes. I can't type anything or exit. After a few minutes I get a message from the server...
Last login: Fri Dec 6 23:21:28 2013 from pool-173-52-249-158.nycmny.east.verizon.net
ubuntu@ip-172-31-31-33:~$ Write failed: Broken pipe
Some of you must have had this problem before. Could you shed some light on the situation for a newb using the cloud?
Try the options below:
Explore ServerAliveCountMax and ServerAliveInterval. These settings are configured in /etc/ssh/ssh_config on the SSH client side.
From man ssh_config:
ServerAliveCountMax
Sets the number of server alive messages (see below) which may be sent without ssh(1) receiving any messages back from the server. If this threshold is reached while server alive messages are being sent, ssh will disconnect from the server, terminating the session. It is important to note that the use of server alive messages is very different from TCPKeepAlive (below). The server alive messages are sent through the encrypted channel and therefore will not be spoofable. The TCP keepalive option enabled by TCPKeepAlive is spoofable. The server alive mechanism is valuable when the client or server depend on knowing when a connection has become inactive.
The default value is 3. If, for example, ServerAliveInterval (see below) is set to 15 and ServerAliveCountMax is left at the default, if the server becomes unresponsive, ssh will disconnect after approximately 45 seconds. This option applies to protocol version 2 only; in protocol version 1 there is no mechanism to request a response from the server to the server alive messages, so disconnection is the responsibility of the TCP stack.
And
ServerAliveInterval
Sets a timeout interval in seconds after which if no data has been received from the server, ssh(1) will send a message through the encrypted channel to request a response from the server. The default is 0, indicating that these messages will not be sent to the server, or 300 if the BatchMode option is set. This option applies to protocol version 2 only. ProtocolKeepAlives and SetupTimeOut are Debian-specific compatibility aliases for this option.
Similar settings are also available on the server side: ClientAliveInterval and ClientAliveCountMax. These settings are placed in /etc/ssh/sshd_config on the server side.
From man sshd_config:
ClientAliveCountMax
Sets the number of client alive messages (see below) which may be sent without sshd(8) receiving any messages back from the client. If this threshold is reached while client alive messages are being sent, sshd will disconnect the client, terminating the session. It is important to note that the use of client alive messages is very different from TCPKeepAlive (below). The client alive messages are sent through the encrypted channel and therefore will not be spoofable. The TCP keepalive option enabled by TCPKeepAlive is spoofable. The client alive mechanism is valuable when the client or server depend on knowing when a connection has become inactive.
The default value is 3. If ClientAliveInterval (see below) is set to 15, and ClientAliveCountMax is left at the default, unresponsive SSH clients will be disconnected after approximately 45 seconds. This option applies to protocol version 2 only.
And
ClientAliveInterval
Sets a timeout interval in seconds after which if no data has been received from the client, sshd(8) will send a message through the encrypted channel to request a response from the client. The default is 0, indicating that these messages will not be sent to the client. This option applies to protocol version 2 only.
It looks like your firewalls (at the different locations) are dropping the sessions due to inactivity.
I would try, as @slayedbylucifer stated, something like this in your ~/.ssh/config:
Host *
ServerAliveInterval 60
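Equivalently, if you control the server, the same effect can be achieved from the server side. A minimal sketch for /etc/ssh/sshd_config (the values are examples, and sshd needs to be reloaded afterwards):
# /etc/ssh/sshd_config (server side)
# probe the client every 60 seconds of inactivity and
# drop the session after 3 unanswered probes (~180 s)
ClientAliveInterval 60
ClientAliveCountMax 3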
