strongSwan not establishing connection on Amazon EC2

I'm creating a VPN using strongSwan. It's my first time using this tool, and I followed a tutorial to set it up. I've hit a blocker: the peer connection times out. The status is 0 up, 1 connecting.
I have tried on different servers, and the same issue happens.
ipsec.conf
conn conec-example
    authby=secret
    left=%defaultroute
    leftid=<public_IP_1>
    leftsubnet=<private_ip_1>/20
    right=<public_IP_2>
    rightsubnet=<private_ip_2>/20
    ike=aes256-sha2_256-modp1024!
    esp=aes256-sha2_256!
    keyingtries=0
    ikelifetime=1h
    lifetime=8h
    dpddelay=30
    dpdtimeout=120
    dpdaction=restart
    auto=start
ipsec.secrets
public_IP_1 public_IP_2 : PSK "randomprivatesharedkey"
Here is part of the logs:
Aug 18 17:29:01 ip-x charon: 10[IKE] retransmit 2 of request with message ID 0
Aug 18 17:29:01 ip-x charon: 10[NET] sending packet: from x.x[500] to x.x.x.x[500] (334 bytes)
Aug 18 17:30:19 ip-x charon: 13[IKE] retransmit 5 of request with message ID 0
Aug 18 17:30:19 ip-x charon: 13[NET] sending packet: from x.x[500] to x.x.x.129[500] (334 bytes)
Aug 18 17:31:35 charon: 16[IKE] giving up after 5 retransmits
Aug 18 17:31:35 charon: 16[IKE] peer not responding, trying again (2/0)
I expected the connection to come up after this setup, but it never does. How can I resolve this? Any ideas?

Based on the log excerpt, strongSwan cannot reach the other peer: every IKE request sent to UDP port 500 is retransmitted without a reply.
There is too little information here for an exact answer; the topology and addressing plan, the relevant AWS security group settings, and the configuration of both VPN peers are needed.
Still, here are a few hints for getting the VPN connection established:
UDP ports 500 and 4500 must be open on both VPN peers. In AWS, this means that a security group associated with the EC2 instance running strongSwan must contain explicit rules allowing incoming UDP traffic on ports 500 and 4500. An EC2 instance is always behind a NAT, so ESP packets will be encapsulated in UDP packets (port 4500).
Any firewall on either VPN peer has to allow the UDP traffic mentioned in the previous point.
Beware that the UDP encapsulation affects the MTU of the traffic going through the VPN connection.
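To make the first hint concrete, here is a minimal sketch in Python that builds the two ingress rules. The dictionaries follow the shape that boto3's `authorize_security_group_ingress` accepts in `IpPermissions`; the function name and the peer address are assumptions for illustration, and nothing here calls AWS.

```python
import ipaddress

IKE_PORT = 500      # IKE negotiation
NAT_T_PORT = 4500   # IKE and ESP encapsulated in UDP (NAT traversal)


def ipsec_ingress_rules(peer_cidr):
    """Build ingress rules allowing IKE and NAT-T traffic from one peer.

    Hypothetical helper: it only builds the rule dictionaries.
    """
    # Validate the CIDR early so a typo fails here, not in a later AWS call.
    network = ipaddress.ip_network(peer_cidr, strict=False)
    return [
        {
            "IpProtocol": "udp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": str(network)}],
        }
        for port in (IKE_PORT, NAT_T_PORT)
    ]


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address standing in for <public_IP_2>.
    for rule in ipsec_ingress_rules("203.0.113.10/32"):
        print(rule)
```

Restricting the source to the peer's public address rather than 0.0.0.0/0 is a design choice of this sketch, not something the tutorial setup requires.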

How to diagnose AWS port 25 egress block

I'm having trouble diagnosing what appears to be a complete blockage of outbound port 25 connections on AWS EC2.
I'm aware of the port 25 throttling, but I don't think that's the issue, because:
I've been running this mail server for at least 7 years
Although I can't recall for sure, I'm fairly certain that I filled out the form to remove sending limitations ~ 7 years ago
The server only sends a few dozen emails per day
I've been running tcpdump on the interface for a while, and there are no more than a few attempts per hour to send outbound packets to anyone on port 25
I don't have any emails from AWS indicating I've exceeded a quota
(as an aside, the above said, is there a way to tell if AWS has turned on throttling, and/or what is the actual quota?)
I can telnet to port 25 on the AWS private networks (another aside, where does AWS perform the throttling?):
$ telnet 172.31.14.133 25
Trying 172.31.14.133...
Connected to 172.31.14.133.
Escape character is '^]'.
220 <mymailserver>.com ESMTP Postfix
I cannot telnet to the outside world from the mail server, nor from another EC2 instance set up in this VPC for testing purposes, nor from an EC2 server set up in a different VPC. For example, the exact telnet that worked above does not work if I replace the private IP address with the public one (but I can telnet to the public one from the outside world).
The outbound security group rules are Ports all Protocols all 0.0.0.0/0
The network ACL for the VPC, both inbound and outbound, is Type ALL Traffic Protocol ALL Port Range ALL Destination 0.0.0.0/0 ALLOW
Looking at the mail logs, it appears that no outbound SMTP traffic has succeeded since January 28th. I would think even if this were throttling, something would have worked somewhere along the way, and I'm now at a complete loss on how to move forward with diagnosing this.
Update: Per suggestions below, I've gone ahead and requested removal of the limit. We'll see how that goes, but I'm still unconvinced it's the problem.
Additionally, I've turned on CloudWatch logs for the VPC. The server in question has sent 14 packets outbound to port 25 in the last 12 hours, so I really would think it would be below any throttling limit. When I look at the logs, the entries are marked as "REJECT", but still no luck on figuring out what is doing the rejecting. Is there any way to determine what "rule" is causing the reject?
Any ideas?
TIA!
From Remove the Port 25 Restriction From Your EC2 Instance:
Amazon EC2 restricts traffic on port 25 of all EC2 instances by default, but you can request for this restriction to be removed.
It says that you must:
Create a DNS A record
Request AWS to remove the port 25 restriction on your instance via a Request to Remove Email Sending Limitations form
Alternatively, you could consider using Amazon Simple Email Service (Amazon SES) to send email, rather than sending it directly from the instance.
It seems that something is blocking the traffic on port 25. Please check the following:
Check whether any rules in the VPC network ACL block the traffic.
Check whether iptables on the OS has been changed recently.
Check for any recent changes to DNS / Route 53.
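The flow log entries marked "REJECT" in the question can be narrowed down with a small script. Below is a sketch in Python that assumes the records use the default VPC flow log format (14 space-separated fields); the sample lines are invented for illustration. The default format has no field naming a rule, so this only shows which interface, addresses, and ports were rejected.

```python
from collections import namedtuple

# Field order of the default (version 2) VPC flow log record.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()
FlowRecord = namedtuple("FlowRecord", FIELDS)

# Invented sample records, not real log data.
SAMPLE = [
    "2 123456789012 eni-0abc 172.31.14.133 198.51.100.25 49152 25 6 1 60 "
    "1600000000 1600000060 REJECT OK",
    "2 123456789012 eni-0abc 172.31.14.133 198.51.100.25 49153 443 6 5 400 "
    "1600000000 1600000060 ACCEPT OK",
    "2 123456789012 eni-0abc - - - - - - - 1600000000 1600000060 - NODATA",
]


def parse_record(line):
    """Return a FlowRecord, or None if the line is not in the default format."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return FlowRecord(*parts)


def rejected_smtp(lines, port="25"):
    """Yield REJECT records for TCP (protocol 6) traffic to the given port."""
    for line in lines:
        rec = parse_record(line)
        if rec and rec.action == "REJECT" and rec.protocol == "6" and rec.dstport == port:
            yield rec


if __name__ == "__main__":
    for rec in rejected_smtp(SAMPLE):
        print(rec.interface_id, rec.srcaddr, "->", rec.dstaddr, rec.dstport)
```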

Not able to ssh over a public wifi network - Operation timed out

I'm trying to access a server via ssh over a public wifi network, but for some reason I'm getting the error:
connect to host ***.***.***.*** port 22: Operation timed out
Upon further investigation, I found out I could not ping any remote server as well:
admin ~ $ ping google.com
PING google.com (216.58.216.46): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
^C
--- google.com ping statistics ---
6 packets transmitted, 0 packets received, 100.0% packet loss
I never encountered this problem from my home network, and since I'm trying this from a Library's Public Wifi network, I wonder if it's the public network that's causing hindrances. I am however able to access the internet flawlessly through my browsers. Sorry I'm really not well versed with network stuff, but I'd appreciate any insights to get through this problem.
Port 22 may be blocked on that network. Try the same connection from another network; if it works there, the public network is the cause. If you are sure port 22 is open on the network, check that the SSH server is running on the remote machine and listening for clients on that port.
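Since browsing works on that network while ping does not, ICMP is evidently filtered there, so ping says little about port 22. A TCP connection attempt is more telling. The following is a minimal Python sketch of such a probe; the function name and the result labels are my own, not part of any tool.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connection and summarize the outcome as a short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all, which is typical of a firewall dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. DNS failure or no route to host.
        return f"error: {exc}"
```

Calling `probe_tcp(host, 22)` and `probe_tcp(host, 443)` against the same server from the library network would show whether only port 22 is filtered: a "timeout" on 22 together with "open" on 443 points at the network rather than at the server.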

Requests going through EC2 load balancer hang

I have set up a simple proxy on an EC2 instance using Tinyproxy (default config, listening on and allowing all incoming connections). This works well. If, for debugging, I enter the instance's IP address and port in my browser's proxy settings, I can browse through the proxy without any issues. Everything works. However, if I create an EC2 load balancer in front of the instance (making sure to forward the HTTP port correctly), it just hangs when I browse through the load balancer's IP. This seems like a puzzle to me. The instance is running, the load balancer reports "in service", and going around the load balancer works, but going through it just hangs. What am I missing, and where should I look for the error?
UPDATE
I have now had a look at the Tinyproxy logs. When I access google.com directly through the instance's proxy, I see logs like this:
CONNECT Apr 30 20:41:33 [1862]: Request (file descriptor 6): GET http://google.com/ HTTP/1.1
INFO Apr 30 20:41:33 [1862]: No upstream proxy for google.com
CONNECT Apr 30 20:41:33 [1862]: Established connection to host "google.com" using file descriptor 7.
INFO Apr 30 20:41:33 [1862]: Closed connection between local client (fd:6) and remote client (fd:7)
CONNECT Apr 30 20:41:33 [1901]: Connect (file descriptor 6): x1-6-84-1b-ADDJF-20-07-92.fsdfe [430.12327.65117.615]
CONNECT Apr 30 20:41:33 [1901]: Request (file descriptor 6): GET http://www.google.ie/?gws_rd=cr&ei=_V9hU8DeFMTpPJjygIgC HTTP/1.1
INFO Apr 30 20:41:33 [1901]: No upstream proxy for www.google.ie
CONNECT Apr 30 20:41:33 [1901]: Established connection to host "www.google.ie" using file descriptor 7.
However, if I try to access Google through the load balancer, which then forwards to the instance, I see logs like this:
CONNECT Apr 30 20:42:54 [1860]: Request (file descriptor 6): GET / HTTP/1.1
CONNECT Apr 30 20:42:54 [1869]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [10.238.155.237]
CONNECT Apr 30 20:42:54 [2037]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [10.238.155.237]
INFO Apr 30 20:42:54 [1860]: process_request: trans Host GET http://google.com:8888/ for 6
INFO Apr 30 20:42:54 [1860]: No upstream proxy for google.com
CONNECT Apr 30 20:43:12 [1861]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [1230.23845.515.2537]
CONNECT Apr 30 20:43:12 [2035]: Connect (file descriptor 6): ip-432-2383245-53.eu-west-1.compute.internal [143.238.12345.117]
ERROR Apr 30 20:43:12 [2035]: read_request_line: Client (file descriptor: 6) closed socket before read.
ERROR Apr 30 20:43:12 [1861]: read_request_line: Client (file descriptor: 6) closed socket before read.
ERROR Apr 30 20:43:12 [2035]: Error reading readble client_fd 6
ERROR Apr 30 20:43:12 [1861]: Error reading readble client_fd 6
WARNING Apr 30 20:43:12 [2035]: Could not retrieve request entity
WARNING Apr 30 20:43:12 [1861]: Could not retrieve request entity
From what I can see, the ELB is trying to send the request through port 8888.
You can enable ELB access logs. They can help you determine how much time a request spends in each stage, e.g.:
request_processing_time: Total time elapsed (in seconds) from the time the load balancer receives the request until it sends the request to a registered instance.
backend_processing_time: Total time elapsed (in seconds) from the time the load balancer sends the request to a registered instance until the instance begins sending the response headers.
response_processing_time: Total time elapsed (in seconds) from the time the load balancer receives the response header from the registered instance until it starts sending the response to the client. This processing time includes both queuing time at the load balancer and the connection acquisition time from the load balancer to the backend.
...and a lot more information. You need to configure the access logs first. Please follow below articles to get more understanding around using ELB access logs:
Access Logs for Elastic Load Balancers
Access Logs
These logs may or may not solve your problem, but they are certainly a good place to start. Besides, you can always check with AWS Technical Support for more in-depth analysis.
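Once access logs are enabled, the three timing fields can be pulled out with a few lines of Python. This sketch assumes the Classic Load Balancer entry layout (timestamp, load balancer name, client, backend, the three timings, status codes, byte counts, quoted request); the sample entry is invented, so check the field order against your own logs before relying on it.

```python
import shlex

# Leading fields of a Classic Load Balancer access-log entry (assumed order).
FIELDS = (
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request",
)
TIMING_FIELDS = FIELDS[4:7]

# Invented entry for illustration only.
SAMPLE_LINE = (
    '2014-04-30T20:42:54.000000Z my-elb 198.51.100.7:52000 10.0.0.5:8888 '
    '0.000050 18.250000 0.000020 200 200 0 0 '
    '"GET http://my-elb.example.com:80/ HTTP/1.1"'
)


def parse_elb_entry(line):
    """Map the leading fields of one access-log entry to their names."""
    values = shlex.split(line)  # keeps the quoted request as a single field
    entry = dict(zip(FIELDS, values))
    for key in TIMING_FIELDS:
        entry[key] = float(entry[key])
    return entry


def slowest_phase(entry):
    """Return the name of the timing field with the largest value."""
    return max(TIMING_FIELDS, key=lambda key: entry[key])


if __name__ == "__main__":
    entry = parse_elb_entry(SAMPLE_LINE)
    print(slowest_phase(entry), entry[slowest_phase(entry)])
```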
It sounds like you're trying to use ELB in HTTP mode as, essentially, something resembling a forward proxy, or at least a gateway to a forward proxy that you're running behind it. That's not ELB's intended application, so it isn't surprising that it wouldn't work in that configuration.
ELB in HTTP listener mode is intended to be used as a reverse proxy in front of a web endpoint/origin server.
Configuring your ELB listener to use TCP mode instead of HTTP mode should allow your intended configuration to work, but this is somewhat outside the optimum application of ELB.
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-listener-config.html
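The Tinyproxy logs in the question show the difference between the two modes: the direct request arrives as `GET http://google.com/ HTTP/1.1`, while the request via the ELB arrives as `GET / HTTP/1.1`. A forward proxy expects the first form. The Python sketch below classifies a request line by the form of its target, using the form names from the HTTP/1.1 specification; the function itself is only an illustration, and reading the logs this way is my inference rather than something the ELB documentation states.

```python
def request_target_form(request_line):
    """Classify an HTTP/1.1 request line by the form of its request target."""
    method, target, _version = request_line.split(" ", 2)
    if method == "CONNECT":
        return "authority-form"   # host:port, used to open a tunnel
    if target.startswith(("http://", "https://")):
        return "absolute-form"    # what a client sends to a forward proxy
    if target.startswith("/"):
        return "origin-form"      # what a client sends to an origin server
    if target == "*":
        return "asterisk-form"    # server-wide OPTIONS request
    return "unknown"


if __name__ == "__main__":
    # Both request lines are copied from the logs in the question.
    print(request_target_form("GET http://google.com/ HTTP/1.1"))
    print(request_target_form("GET / HTTP/1.1"))
```

With a TCP listener the load balancer passes the bytes through unchanged, so the proxy would still see the absolute-form request line it needs.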

Frozen connection using ssh over Amazon EC2 using ubuntu

When I am connected to Amazon EC2 using the secure shell and don't type anything for a few minutes, everything freezes. I can't type anything or exit. After a few minutes I get a message from the server...
Last login: Fri Dec 6 23:21:28 2013 from pool-173-52-249-158.nycmny.east.verizon.net
ubuntu@ip-172-31-31-33:~$ Write failed: Broken pipe
Some of you must have had this problem before. Could you shed some light on the situation for a newbie using the cloud?
Try below options:
Explore ServerAliveCountMax and ServerAliveInterval. These settings are set in /etc/ssh/ssh_config on SSH client side.
from man ssh_config:
ServerAliveCountMax
Sets the number of server alive messages (see below) which may be sent without ssh(1) receiving any
messages back from the server. If this threshold is reached while server alive messages are being sent, ssh
will disconnect from the server, terminating the session. It is important to note that the use of server
alive messages is very different from TCPKeepAlive (below). The server alive messages are sent through
the encrypted channel and therefore will not be spoofable. The TCP keepalive option enabled by
TCPKeepAlive is spoofable. The server alive mechanism is valuable when the client or server depend on
knowing when a connection has become inactive.
The default value is 3. If, for example, ServerAliveInterval (see below) is set to 15 and
ServerAliveCountMax is left at the default, if the server becomes unresponsive, ssh will disconnect after
approximately 45 seconds. This option applies to protocol version 2 only; in protocol version 1 there is
no mechanism to request a response from the server to the server alive messages, so disconnection is the
responsibility of the TCP stack.
And
ServerAliveInterval
Sets a timeout interval in seconds after which if no data has been received from the server, ssh(1) will
send a message through the encrypted channel to request a response from the server. The default is 0,
indicating that these messages will not be sent to the server, or 300 if the BatchMode option is set.
This option applies to protocol version 2 only. ProtocolKeepAlives and SetupTimeOut are Debian-specific
compatibility aliases for this option.
Similar settings are also available on the server side: ClientAliveInterval and ClientAliveCountMax. These settings are placed in /etc/ssh/sshd_config on the server side.
from man sshd_config:
ClientAliveCountMax
Sets the number of client alive messages (see below) which may be sent without sshd(8) receiving any
messages back from the client. If this threshold is reached while client alive messages are being sent,
sshd will disconnect the client, terminating the session. It is important to note that the use of client
alive messages is very different from TCPKeepAlive (below). The client alive messages are sent through
the encrypted channel and therefore will not be spoofable. The TCP keepalive option enabled by
TCPKeepAlive is spoofable. The client alive mechanism is valuable when the client or server depend on
knowing when a connection has become inactive.
The default value is 3. If ClientAliveInterval (see below) is set to 15, and ClientAliveCountMax is left
at the default, unresponsive SSH clients will be disconnected after approximately 45 seconds. This
option applies to protocol version 2 only.
And
ClientAliveInterval
Sets a timeout interval in seconds after which if no data has been received from the client, sshd(8) will
send a message through the encrypted channel to request a response from the client. The default is 0,
indicating that these messages will not be sent to the client. This option applies to protocol version 2
only.
It looks like a firewall (at the locations you connect from) is dropping your sessions due to inactivity.
As @slayedbylucifer suggested, I would try something like this in your ~/.ssh/config:
Host *
    ServerAliveInterval 60
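The man page arithmetic quoted above (an interval of 15 with the default count of 3 gives roughly 45 seconds) can be written as a one-line calculation. This is just a sketch in Python with a made-up function name, useful for checking what a given pair of settings implies.

```python
def seconds_until_disconnect(alive_interval, alive_count_max=3):
    """Approximate time before ssh gives up on an unresponsive peer.

    Returns None when the interval is 0, because no alive messages are
    sent in that case and ssh itself never times the session out.
    """
    if alive_interval <= 0:
        return None
    return alive_interval * alive_count_max


if __name__ == "__main__":
    print(seconds_until_disconnect(15))   # the man page example
    print(seconds_until_disconnect(60))   # the setting suggested above
```

Note that for the frozen-session problem the interval matters more than the product: the keepalive traffic every 60 seconds is what stops an idle-timeout firewall from dropping the connection.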

Heroku syslog drain : too many tcp sessions

I set a logging drain on my Heroku app, to send logs to my other server.
My rsyslogd receives the logs from Heroku fine, but after a few hours rsyslog starts to drop incoming requests because too many TCP connections are open:
Oct 18 06:28:17 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
Oct 18 06:28:24 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
Oct 18 06:28:24 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
Oct 18 06:28:26 localhost rsyslogd-2079: too many tcp sessions - dropping incoming request [try http://www.rsyslog.com/e/2079 ]
[...]
I tried increasing the maximum number of sessions allowed in the rsyslogd configuration (I set it to 1000, which should normally be enough to handle everything).
The issue persisted, so I increased the value to 3000. I have fewer issues now, but I think 3000 max sessions is particularly high.
$ModLoad imtcp
$InputTCPServerRun 514
$InputTCPMaxSessions 3000
Do you think there is something else I should do? Do I need to decrease this number?
Maybe there is a better way to deal with logs coming from Heroku Logplex.
