retrieve a specific section of a webpage - bash

I need to populate a list in a BASH script with data retrieved from a portion of a webpage. Currently, I have manually created a static list but I want the list contents to be generated dynamically. That way if new items get added to the page, subsequent runs of the script will reflect those new items.
source page: https://support.apple.com/en-us/HT210060
I would like to extract the section entitled "macOS, iOS, and tvOS" to wind up with a list like:
updateServers="appldnld.apple.com 80
gg.apple.com 80
gg.apple.com 443
gnf-mdn.apple.com 443
gnf-mr.apple.com 443
gs.apple.com 80
gs.apple.com 443
ig.apple.com 443
mesu.apple.com 80
mesu.apple.com 443
ns.itunes.apple.com 443
oscdn.apple.com 80
oscdn.apple.com 443
osrecovery.apple.com 80
osrecovery.apple.com 443
skl.apple.com 443
swcdn.apple.com 80
swdist.apple.com 443
swdownload.apple.com 80
swdownload.apple.com 443
swpost.apple.com 80
swscan.apple.com 443
updates-http.cdn-apple.com 80
updates.cdn-apple.com 443
xp.apple.com 443"
Ultimately I'd like to output each section of the page into its own separate list, but for now the portion above is my main concern.
Thank you all in advance. This is a great community.

An HTML parser is a better tool for this type of task (e.g. Ruby's Nokogiri or Python's Beautiful Soup). For a pure-Bash solution you can use this script (assuming your grep supports -P):
#!/bin/bash
wget -q https://support.apple.com/en-us/HT210060 -O- \
| \grep -ziP "(?s)<h3>macos.*?<h3>" \
| xargs -0 \
| \grep -P "<tr><td>|<td>[\d, ]+</td>" \
| sed 's:.*<td>\(.*\)</td>:\1:'
How it works:
wget with -O- downloads the website and sends it to standard output
grep -ziP "(?s)..." uses PCRE to make a multi-line search from the <h3> heading that starts with "macos" until the next <h3>
the rest of the script gets the text inside host and port columns
Output:
$ script.sh
appldnld.apple.com
80
gg.apple.com
443, 80
...
80
updates.cdn-apple.com
443
xp.apple.com
443
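The output above puts each host on one line and its ports on the next, while the question asks for one "host port" pair per line. A minimal sketch of that last step follows. The function name pair_hosts_ports is hypothetical, and it assumes the input strictly alternates between a host line and a comma-separated port line, as in the sample output.

```shell
#!/bin/bash
# pair_hosts_ports (hypothetical helper): reads alternating lines on stdin
# (a host name, then its comma-separated ports) and prints one
# "host port" pair per line, sorted by host and then numerically by port.
pair_hosts_ports() {
    local host ports port
    local -a port_list
    while IFS= read -r host && IFS= read -r ports; do
        # Split a string such as "443, 80" on commas and spaces.
        IFS=', ' read -r -a port_list <<<"$ports"
        for port in "${port_list[@]}"; do
            printf '%s %s\n' "$host" "$port"
        done
    done | LC_ALL=C sort -k1,1 -k2,2n
}
```

Assuming the answer's script is saved as script.sh, the list could then be filled with updateServers=$(./script.sh | pair_hosts_ports).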

Related

Unable to bind any program to IPv4 TCP port 80 on Mac [duplicate]

I have the following very simple docker-compose.yml, running on a Mac:
version: "3.7"
services:
  apache:
    image: httpd:2.4.41
    ports:
      - 80:80
I run docker-compose up, then I run this curl and Apache returns content:
/tmp/test $ curl -v http://localhost
* Trying ::1:80...
* TCP_NODELAY set
* Connected to localhost (::1) port 80 (#0)
> GET / HTTP/1.1
> Host: localhost
> User-Agent: curl/7.66.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Sat, 26 Oct 2019 18:30:03 GMT
< Server: Apache/2.4.41 (Unix)
< Last-Modified: Mon, 11 Jun 2007 18:53:14 GMT
< ETag: "2d-432a5e4a73a80"
< Accept-Ranges: bytes
< Content-Length: 45
< Content-Type: text/html
<
<html><body><h1>It works!</h1></body></html>
* Connection #0 to host localhost left intact
However, if I try to access the container using 127.0.0.1 instead of localhost, I get connection refused:
/tmp/test $ curl -v http://127.0.0.1
* Trying 127.0.0.1:80...
* TCP_NODELAY set
* Connection failed
* connect to 127.0.0.1 port 80 failed: Connection refused
* Failed to connect to 127.0.0.1 port 80: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 127.0.0.1 port 80: Connection refused
Localhost does point to 127.0.0.1:
/tmp/test $ ping localhost
PING localhost (127.0.0.1): 56 data bytes
And netstat shows port 80 bound on all local IP addresses:
/tmp/test $ netstat -tna | grep 80
...
tcp46 0 0 *.80 *.* LISTEN
...
I came to this actually trying to access the container using a custom domain I had on my /etc/hosts file pointing to 127.0.0.1. I thought there was something wrong with that domain name, but then I tried 127.0.0.1 and didn't work either, so I'm concluding there is something very basic about docker I'm not doing right.
Why is curl http://localhost working but curl http://127.0.0.1 is not?
UPDATE
It seems localhost is resolving to IPv6 ::1, so port forwarding seems to be working on IPv6 but not IPv4 addresses. Does that make any sense?
UPDATE 2
I wasn't able to fix it, but pointing my domain name to ::1 instead of 127.0.0.1 in my /etc/hosts serves as a workaround for the time being.
UPDATE 3
8 months later I bumped into the same issue and found my own question here, still unanswered. But this time I can't apply the same workaround, because I need to bind the port forwarding to my IPv4 address so it can be accessed from other hosts.
Found the culprit: pfctl
AFAIK, pfctl is not supposed to run automatically but my /System/Library/LaunchDaemons/com.apple.pfctl.plist said otherwise.
The packet filter was configured to redirect all incoming traffic on port 80 to 8080, and on port 443 to 8443. This happens without any process actually listening on ports 80 and 443, which is why lsof and netstat wouldn't show anything.
/Library/LaunchDaemons/it.winged.httpdfwd.plist has the following
<key>ProgramArguments</key>
<array>
<string>sh</string>
<string>-c</string>
<string>echo "rdr pass proto tcp from any to any port {80,8080} -> 127.0.0.1 port 8080" | pfctl -a "com.apple/260.HttpFwdFirewall" -Ef - && echo "rdr pass proto tcp from any to any port {443,8443} -> 127.0.0.1 port 8443" | pfctl -a "com.apple/261.HttpFwdFirewall" -Ef - && sysctl -w net.inet.ip.forwarding=1</string>
</array>
<key>RunAtLoad</key>
The solution was simply to listen on ports 8080 and 8443. All requests to ports 80 and 443 are now being redirected transparently.
While debugging this I found countless open questions about similar problems without answers. I hope this helps somebody.
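Because lsof and netstat do not show such redirects, it can help to search the pf rules themselves. The sketch below is a heuristic, not a pf parser: the function name redirects_for_port is hypothetical, and it assumes one rdr rule per line in the form shown in the plist above.

```shell
#!/bin/bash
# redirects_for_port PORT (hypothetical helper): reads pf rules on stdin and
# prints every "rdr" rule whose match side (the text before "->") names PORT.
redirects_for_port() {
    awk -v port="$1" '
        $1 == "rdr" {
            arrow = index($0, "->")
            if (arrow == 0) next
            left = substr($0, 1, arrow - 1)       # part before the redirect target
            if (!match(left, / port /)) next      # rule does not match on a port
            spec = substr(left, RSTART + RLENGTH) # e.g. "{80,8080} "
            # Require PORT as a whole number, so 80 does not match 8080.
            if (spec ~ ("(^|[^0-9])" port "([^0-9]|$)")) print
        }
    '
}
```

For example: sudo pfctl -s nat | redirects_for_port 80. Rules loaded into an anchor, as in the plist above, are listed only when that anchor is named, e.g. sudo pfctl -a "com.apple/260.HttpFwdFirewall" -s nat.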

display open ports grouped by process

given a netstat output, how can i display the selected open ports grouped by process?
what i got so far:
:~# netstat -tnlp | awk '/25|80|443|465|636|993/ {proc=split($7,pr,"/"); port=split($4,po,":"); print pr[2], po[port]}'
haproxy 636
haproxy 993
haproxy 993
haproxy 465
haproxy 465
exim4 25
apache2 80
exim4 25
apache2 443
desired output (in one line):
apache2 (80 443), exim4 (25), haproxy (465 636 993)
please note:
i have duplicated lines because they listen on different IPs, but i only need one (sort -u is ok)
if possible, i'd like to sort by process and then by port
the main goal is to have this single line displayed to the user on ssh logon, using motd (i got this part covered)
Try this (the part from sort onward is the later addition):
netstat -tnlp | awk '/25|80|443|465|636|993/ {proc=split($7,pr,"/"); port=split($4,po,":"); print pr[2], po[port]}' | sort -u | awk '{a[$1] = ($1 in a ? a[$1] " " : "") $2} END {for (i in a) printf "%s (%s), ", i, a[i]}'
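awk's for (i in a) loop visits keys in an unspecified order, and a plain sort orders ports as text, so the one-liner does not guarantee the requested ordering. A minimal sketch that sorts first and then groups consecutive lines is shown below. The function name group_ports is hypothetical, and it expects the "process port" pairs printed by the first awk command on stdin.

```shell
#!/bin/bash
# group_ports (hypothetical helper): reads "process port" pairs on stdin and
# prints one line such as "apache2 (80 443), exim4 (25), haproxy (465 636 993)".
group_ports() {
    # Sort by process name, then numerically by port, dropping duplicates,
    # so awk only has to notice where the process name changes.
    LC_ALL=C sort -u -k1,1 -k2,2n |
    awk '
        NR == 1 || $1 != prev {      # first port of a new process
            if (NR > 1) printf "), "
            printf "%s (%s", $1, $2
            prev = $1
            next
        }
        { printf " %s", $2 }         # further ports of the same process
        END { if (NR > 0) print ")" }
    '
}
```

Usage would be the original netstat and awk pipeline followed by | group_ports.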

Spring Boot port forwarding 80 to 8080

I recently created a spring boot app and launched it to my remote server. I am running centos7 and I have modified iptables to send port 80 to 8080 but that seemed to do nothing. I also currently have this in a .htaccess file to make it work:
RewriteCond %{SERVER_PORT} 80$ [NC]
RewriteRule index.html$ http://%{HTTP_HOST}:8080/ [P,S=1]
RewriteRule (.*) http://%{HTTP_HOST}:8080%{REQUEST_URI} [P]
My problem with the current solution is that it works great for the base URL blah.com, but any subsequent link off of that page will have blah.com:8080/page.html. How do I better manage the URLs that are displayed to the client so they don't include the port?
I think the real problem is that hepsia is running and appears to have installed httpd on port 80 already. Does anyone know where I can add a virtualhost to hepsia's implementation of httpd?
Thanks in advance for any help
On many Linux distributions, the above answer will not work unless your application is running as root. The standard way around this is to run your application behind a web server (which listens on port 80) and forward that web server's requests to your app.
If this is overkill for your purpose you can set up iptables routing / redirect.
First make sure your ports are open
sudo iptables -I INPUT 1 -p tcp --dport 8080 -j ACCEPT
sudo iptables -I INPUT 1 -p tcp --dport 80 -j ACCEPT
Then the redirect as follows
sudo iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 8080
Change the Boot app to listen on port 80.
In application.properties...
server.port=80
Good luck.

Splitting extra columns in a text file into separate lines (keeping first column)

I'm going to try to describe my problem and my end goal as best as I can, here it goes:
I have a script that fetches AWS ELB information (the ELB name plus the ports that are associated with a specific certificate ARN).
So, in the end I have a text file (I call it elb_ports) and it looks something like this:
ccds-lb 636
cf-router 443 4443
dev-cf-router 443 4443
eng-jenkins-monit 443
gitlab-lb 443
gitlab-mattermost-elb 443
jenkins-np-elb 443
saml 443
uaa 443
I have another script that runs after that one. I want it to go through the elb_ports file and replace the certificates with a new one. According to Amazon's documentation, replacing a certificate requires two things from that elb_ports file: the load balancer name and the load balancer port.
So basically their command looks like this
aws elb set-load-balancer-listener-ssl-certificate \
--load-balancer-name my-load-balancer \
--load-balancer-port 443 \
--ssl-certificate-id arn:aws:iam::123456789012:server-certificate/my-new-certificate
I want to be able to loop through the file and execute the command above for each ELB and port, but my problem is with the ELBs that have multiple ports associated with the cert, like cf-router 443 4443 for example.
So my idea was to split that into two lines, so like this:
cf-router 443
cf-router 4443
But I'm not sure how to add cf-router (for example) to the ports that come after the first one (there could be more than two ports using the same cert).
I hope I was able to explain my problem and end goal clearly, if this isn't a good method, I'm open to suggestions also.
EDIT: Perhaps something like this is beneficial, but not sure how to tailor it to my needs.. Like put each line in an array and the space as a delimiter and then loop through each line putting arr(1) (load balancer name) and then the load balancer port, but not sure how to count and go through >arr(2) in bash.
To split out your extra columns into separate lines:
while read -r lb_name lb_ports_str; do         ## split line into lb name and port list string
  read -r -a lb_ports <<<"$lb_ports_str"       ## split out port list string into an array
  for port in "${lb_ports[@]}"; do             ## iterate through that array
    printf '%s %s\n' "$lb_name" "$port"        ## handle each port separately
  done
done <elb_ports                                ## reading lines from elb_ports
Of course, that printf could be any other line referring to $lb_name and $port -- meaning you could potentially run your code that's installing new certificates here.
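As a sketch of that last point, the loop can wrap the aws command from the question directly. The function name replace_certs and the DRY_RUN variable are hypothetical. By default the commands are only printed, so they can be reviewed before anything is changed.

```shell
#!/bin/bash
# replace_certs FILE CERT_ARN (hypothetical helper): for every
# "name port [port ...]" line in FILE, issue one AWS CLI call per port.
replace_certs() {
    local file=$1 cert_arn=$2
    local lb_name lb_ports_str port
    local -a lb_ports runner=()
    # Print the commands unless DRY_RUN=0 is set explicitly.
    if [ "${DRY_RUN:-1}" != 0 ]; then
        runner=(echo)
    fi
    # Read the file on descriptor 3 so the aws call cannot consume its lines.
    while read -r lb_name lb_ports_str <&3 || [ -n "$lb_name" ]; do
        [ -n "$lb_name" ] || continue    # skip blank lines
        read -r -a lb_ports <<<"$lb_ports_str"
        for port in "${lb_ports[@]}"; do
            "${runner[@]}" aws elb set-load-balancer-listener-ssl-certificate \
                --load-balancer-name "$lb_name" \
                --load-balancer-port "$port" \
                --ssl-certificate-id "$cert_arn"
        done
    done 3<"$file"
}
```

Running replace_certs elb_ports with the new certificate ARN prints one command per ELB and port. Setting DRY_RUN=0 in front of the same call would execute them instead.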

execute shell script in ruby

I want to execute the following shell script
system('echo "
rdr pass on lo0 inet proto tcp from any to 192.168.99.1 port 80 -> 192.168.99.1 port 8080
rdr pass on lo0 inet proto tcp from any to 192.168.99.1 port 443 -> 192.168.99.1 port 4443
" | sudo pfctl -ef - > /dev/null 2>&1; echo "==> Fowarding Ports: 80 -> 8080, 443 -> 4443 & Enabling pf"'
)
This works fine, i now want to pass the IP address loaded from a YAML file, i tried the following
config.yaml
configs:
  use: 'home'
  office:
    public_ip: '192.168.99.2'
  home:
    public_ip: '192.168.99.1'
Vagrantfile
require 'yaml'
current_dir = File.dirname(File.expand_path(__FILE__))
configs = YAML.load_file("#{current_dir}/config.yaml")
vagrant_config = configs['configs'][configs['configs']['use']]
system('echo "
rdr pass on lo0 inet proto tcp from any to '+vagrant_config['public_ip']+' port 80 -> '+vagrant_config['public_ip']+' port 8080
rdr pass on lo0 inet proto tcp from any to '+vagrant_config['public_ip']+' port 443 -> '+vagrant_config['public_ip']+' port 4443
" | sudo pfctl -ef - > /dev/null 2>&1; echo "==> Fowarding Ports: 80 -> 8080, 443 -> 4443 & Enabling pf"'
)
The second method does not work, nor does it show any error. Can someone point me in the right direction? What I want is to read public_ip dynamically from a config file or variable.
Thanks
UPDATE 1
I get the following output
pfctl: Use of -f option, could result in flushing of rules
present in the main ruleset added by the system at startup.
See /etc/pf.conf for further details.
No ALTQ support in kernel
ALTQ related functions disabled
pfctl: pf already enabled
What can be possibly wrong?
For troubleshooting purposes, it would be wise to output the command you're going to run prior to sending it out to system.
cmd = 'echo "
rdr pass on lo0 inet proto tcp from any to '+vagrant_config['public_ip']+' port 80 -> '+vagrant_config['public_ip']+' port 8080
rdr pass on lo0 inet proto tcp from any to '+vagrant_config['public_ip']+' port 443 -> '+vagrant_config['public_ip']+' port 4443
" | sudo pfctl -ef - > /dev/null 2>&1; echo "==> Fowarding Ports: 80 -> 8080, 443 -> 4443 & Enabling pf"'
puts "Command to run:\n\n#{cmd}"
system( cmd )
Then, it would be wise to make the output from the system command visible. To make sure you get this feedback, I suggest you replace
sudo pfctl -ef - > /dev/null 2>&1
with (adding '-v' for more verbose output - pfctl man page)
sudo pfctl -evf -
and then look for the output and/or error messages.
Then, once the bugs are sorted out, you can put it back into stealthy, quiet mode :D
Also, since you are running with sudo you'll need to make sure the shell you're running within has sudo privileges and also make sure you're not being prompted for a password unknowingly.

Resources