How to use Curl to also output first 20 characters from the response? - bash

I'm using curl to retrieve the http_code, size_header, redirect_url and website title with:
#!/bin/bash
FILE="$1"
while read LINE; do
curl -H 'Cache-Control: no-cache' -i -s -k -o >(perl -l -0777 -ne 'print $1 if /<title.*?>\s*(.*?)\s*<\/title/si') --silent --max-time 2 --write-out '%{http_code} %{size_header} %{redirect_url} ' "$LINE"
echo " $LINE"
done < ${FILE}
but I would also like to retrieve the first 20 characters of the response to have more information.
The idea is to get this output
%{http_code} %{size_header} %{redirect_url} $website_title $website_first_20_bytes
I only need to add the $website_first_20_bytes to the output. How can I achieve this?
PS: Not the first 20 characters of the response headers, only of the page source (the body).

So you probably mean something like this then (I added a bunch of newlines etc. to the output which you can trim as you please):
#!/bin/bash
FILE="$1"
while read -r LINE; do
# read the response to a variable
response=$(curl -H 'Cache-Control: no-cache' -s -k --max-time 2 --write-out '%{http_code} %{size_header} %{redirect_url} ' "$LINE")
# get the title
title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'<<<"$response")
# read the write-out from the last line
read -r http_code size_header redirect_url < <(tail -n 1 <<<"$response")
printf "Status: %s\n" "$http_code"
printf "Size: %s\n" "$size_header"
printf "Redirect-url: %s\n" "$redirect_url"
printf "Url: %s\n" "$LINE"
printf "Title: %s\n" "$title"
# head -c 20 prints only the first 20 bytes of the response
printf "Body: %s\n" "$(head -c 20 <<<"$response")"
done < "$FILE"
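One caveat with the script above: curl appends the --write-out text directly after the body, so if the body does not end with a newline, tail -n 1 returns the last body line and the write-out values together. A minimal sketch of one way around this, assuming curl is called with --write-out '\n%{http_code} %{size_header} %{redirect_url}' so the values always sit on a line of their own. The split_response helper and the canned sample response are made up for illustration and stand in for real curl output:

```shell
#!/bin/bash
# Hypothetical helper: split "body + final write-out line" into its parts.
# Assumes the write-out string started with '\n' (see the lead-in above).
split_response() {
  local response=$1
  meta=${response##*$'\n'}   # last line: the write-out values
  body=${response%$'\n'*}    # everything before the last line
  first20=${body:0:20}       # first 20 characters of the body
}

# Canned response standing in for real curl output (no redirect, so the last field is empty)
sample=$'<html><head><title>Example Domain</title></head>\n200 334 '
split_response "$sample"
read -r http_code size_header redirect_url <<<"$meta"
printf '%s %s %s %s\n' "$http_code" "$size_header" "${redirect_url:-none}" "$first20"
```

Note that ${body:0:20} counts characters rather than bytes, which differs from head -c 20 for multi-byte text.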

Related

How to retrieve the real redirect location header with Curl? without using {redirect_url}

I realized that curl's %{redirect_url} does not always show the redirect URL exactly as the server sent it. For example, if the response header is Location: https:/\example.com, this will redirect to https:/\example.com, but curl's %{redirect_url} shows redirect_url: https://host-domain.com/https:/\example.com and does not display the real Location header from the response. (I would like to see the real Location: value.)
This is the Bash script I'm working with:
#!/bin/bash
# Usage: urls-checker.sh domains.txt
FILE="$1"
while read -r LINE; do
# read the response to a variable
response=$(curl -H 'Cache-Control: no-cache' -s -k --max-time 2 --write-out '%{http_code} %{size_header} %{redirect_url} ' "$LINE")
# get the title
title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'<<<"$response")
# read the write-out from the last line
read -r http_code size_header redirect_url < <(tail -n 1 <<<"$response")
printf "***Url: %s\n\n" "$LINE"
printf "Status: %s\n\n" "$http_code"
printf "Size: %s\n\n" "$size_header"
printf "Redirect-url: %s\n\n" "$redirect_url"
printf "Title: %s\n\n" "$title"
# head -c 100 prints only the first 100 bytes of the response
printf "Body: %s\n\n" "$(head -c 100 <<<"$response")"
done < "${FILE}"
How can I make the "Redirect-url:" line print the original Location: header value, without having to use redirect_url?
To read the exact Location header field value, as returned by the server, you can use the -i/--include option, in combination with grep.
For example:
$ curl 'http://httpbin.org/redirect-to?url=http:/\example.com' -si | grep -oP 'Location: \K.*'
http:/\example.com
Or, if you want to read all headers, content and the --write-out variables line (according to your script):
response=$(curl -H 'Cache-Control: no-cache' -s -i -k --max-time 2 --write-out '%{http_code} %{size_header} %{redirect_url} ' "$url")
# break the response in parts
headers=$(sed -n '1,/^\r$/p' <<<"$response")
content=$(sed -e '1,/^\r$/d' -e '$d' <<<"$response")
read -r http_code size_header redirect_url < <(tail -n1 <<<"$response")
# get the real Location
location=$(grep -oP 'Location: \K.*' <<<"$headers")
Fully integrated in your script, this looks like:
#!/bin/bash
# Usage: urls-checker.sh domains.txt
file="$1"
while read -r url; do
# read the response to a variable
response=$(curl -H 'Cache-Control: no-cache' -s -i -k --max-time 2 --write-out '%{http_code} %{size_header} %{redirect_url} ' "$url")
# break the response in parts
headers=$(sed -n '1,/^\r$/p' <<<"$response")
content=$(sed -e '1,/^\r$/d' -e '$d' <<<"$response")
read -r http_code size_header redirect_url < <(tail -n1 <<<"$response")
# get the real Location
location=$(grep -oP 'Location: \K.*' <<<"$headers")
# get the title
title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'<<<"$content")
printf "***Url: %s\n\n" "$url"
printf "Status: %s\n\n" "$http_code"
printf "Size: %s\n\n" "$size_header"
printf "Redirect-url: %s\n\n" "$location"
printf "Title: %s\n\n" "$title"
printf "Body: %s\n\n" "$(head -c 100 <<<"$content")"
done < "$file"
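Two details may matter when extracting the header this way: HTTP header names are case-insensitive (over HTTP/2 curl prints them in lowercase), and each header line ends with a carriage return, which the captured value keeps. A hedged sketch that handles both, using a canned header block instead of a live request. The sample headers are invented for illustration:

```shell
#!/bin/bash
# Canned headers standing in for the output of `curl -si` (CRLF line endings, lowercase name)
headers=$'HTTP/1.1 302 FOUND\r\ncontent-type: text/html\r\nlocation: http:/\\example.com\r\n\r\n'

# Match the header name case-insensitively, drop the "name: " prefix,
# then strip the trailing carriage return
location=$(grep -i '^location:' <<<"$headers" | sed 's/^[^:]*:[[:space:]]*//' | tr -d '\r')

printf 'Redirect-url: %s\n' "$location"
```

This variant also avoids grep -P, which is not available in every grep build.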
Following @randomir's answer, and since I only needed the raw redirect URL, I use this command in my batch script:
curl -w "%{redirect_url}" -o /dev/null -s "https://stackoverflow.com/q/46507336/3019002"
https:/\example.com is not a legal URL(*). The fact that this works in browsers is an abomination (that I've fought against), and curl doesn't do that. %{redirect_url} shows exactly the URL curl would redirect to...
A URL should use two forward slashes, so the above should look like http://example.com.
(*) = I refuse to accept the WHATWG "definition".

While read loop in parallel

I have the following while loop in a bash script, but I would like to run the iterations in parallel (and am failing). Can anyone point me in the right direction, please?
Thanks!
while read LINE; do
RAYID=$(echo "$LINE" | jq -r .rayId)
LINE="$(echo $LINE | sed 's/\([[:digit:]]\{13\}\)[[:digit:]]\{6\}/\1/g')"
args=( -XPUT "localhost:9200/els/logs/$RAYID?pipeline=geoip-els" -H "Content-Type: application/json" -d "$LINE" )
curl "${args[@]}" > /dev/null 2>&1
done <<< "$ELS_LOGS"
** EDITED
In addition to what @TomFenech stated, which is correct, I want to add that it is a good idea to put wait after done, so the script does not finish until all background tasks have completed.
function doSomething(){
RAYID=$(echo "$1" | jq -r .rayId )
LINE="$(echo "$1" | sed 's/\([[:digit:]]\{13\}\)[[:digit:]]\{6\}/\1/g' )"
# send the rewritten line, as in the original loop
args=( -XPUT "localhost:9200/els/logs/$RAYID?pipeline=geoip-els" -H "Content-Type: application/json" -d "$LINE" )
curl "${args[@]}" > /dev/null 2>&1
}
while read -r LINE; do
# quote the argument so the whole JSON line arrives as "$1"
doSomething "$LINE" &
done <<< "$ELS_LOGS"
wait
Regards!
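With a large input, starting one background job per line can launch a lot of curl processes at once. A rough sketch of one way to cap that, running the jobs in batches and calling wait after each batch. The process_record function, the max_jobs value and the sample records are assumptions for illustration; process_record stands in for the real curl call:

```shell
#!/bin/bash
outdir=$(mktemp -d)

# Hypothetical stand-in for the real curl request: writes one file per record
process_record() {
  sleep 0.1
  printf '%s\n' "$1" > "$outdir/$2"
}

max_jobs=2   # assumed limit, tune to taste
n=0
while IFS= read -r line; do
  n=$((n + 1))
  process_record "$line" "$n" &
  # after starting max_jobs jobs, wait for the whole batch before continuing
  if (( n % max_jobs == 0 )); then wait; fi
done <<< $'{"rayId":"a"}\n{"rayId":"b"}\n{"rayId":"c"}\n{"rayId":"d"}\n{"rayId":"e"}'
wait   # collect the jobs of the last, possibly partial, batch

count=$(( $(ls "$outdir" | wc -l) ))
echo "processed $count records"
rm -rf "$outdir"
```

Batching is the simplest form of throttling; a slow job holds up its whole batch, so tools such as xargs -P may be a better fit for uneven workloads.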

Curl echo Server response Bash

I'm trying to create a bash script that checks the status code of each URL in a list and echoes the server name from the response headers. I'm new to this.
#!/bin/bash
while read LINE; do
curl -o /dev/null --silent --head --write-out '%{http_code}' "$LINE"
echo " $LINE" &
curl -I /dev/null --silent --head | grep -Fi Server "$SERVER"
echo " $SERVER"
done < dominios-https
I get the following output
301 http://example.com
grep: : No such file or directory
1) while read LINE skips the last line if the text file does not end with a newline.
2) You never set "$SERVER", so grep receives an empty file name and complains about it.
3) Not all servers return a "Server:" header.
Try this:
scriptDir=$( dirname -- "$0" )
for siteUrl in $( < "$scriptDir/myUrl.txt" )
do
if [[ -z "$siteUrl" ]]; then continue; fi # skip empty entries
httpCode=$( curl -o /dev/null --silent --head --write-out '%{http_code}' "$siteUrl" )
echo "HTTP_CODE = $httpCode"
# match the header name case-insensitively and drop the trailing carriage return
headServer=$( curl --silent --head "$siteUrl" | grep -i '^Server:' | awk '{print $2}' | tr -d '\r' )
echo "Server header = $headServer"
done
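Regarding point 1 above, a common idiom keeps the while read loop and still processes a final line that lacks a trailing newline: read returns a non-zero status at end of file, but it has already filled the variable, so the loop can test for that. A small sketch, with made-up input supplied through printf:

```shell
#!/bin/bash
count=0
# `|| [[ -n "$url" ]]` lets the body run once more for an unterminated last line
while IFS= read -r url || [[ -n "$url" ]]; do
  count=$((count + 1))
  last=$url
done < <(printf 'http://example.com\nhttp://example.org')   # note: no trailing newline

echo "read $count lines, last one: $last"
```

Unlike the for loop over $( < file ), this does not split lines on spaces or expand glob characters.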

Using bash and curl to do multiple posts

This is what I have done so far;
#!/bin/bash
url="asdf.com/check "
for i in $(cat query.txt); do
content=$"curl --data "email=$i" "$url")"
echo "$content" >> output.txt
done
You can use your loop like this:
#!/bin/bash
url="asdf.com/check"
while read -r line; do
curl --data "email=$line" "$url"
done < query.txt > output.txt
You only need a single output redirection, placed right after done as shown.
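To see the effect of redirecting once after done without sending real requests, here is a small dry-run sketch. The fake_post function and the sample addresses are invented for illustration and stand in for the curl --data call:

```shell
#!/bin/bash
# Hypothetical stand-in for: curl --data "email=$1" "$url"
fake_post() { printf 'posted email=%s\n' "$1"; }

tmp=$(mktemp -d)
printf 'a@example.com\nb@example.com\n' > "$tmp/query.txt"

# The output file is opened once for the whole loop, not once per iteration
while IFS= read -r line; do
  fake_post "$line"
done < "$tmp/query.txt" > "$tmp/output.txt"

cat "$tmp/output.txt"
```

Because the file is opened once with >, it is truncated at the start of the run; use >> after done instead if earlier results should be kept.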

BASH output column formatting

First time posting. HELLO WORLD. Working on my first script that just simply checks if a list of my websites are online and then returns the HTTP code and the amount of time it took to return that to another file on my desktop.
-- THIS SCRIPT WILL BE RUNNING ON MAC OSX --
I would like to amend my script so that it formats its output into 3 neat columns.
Currently I have:
#!/bin/bash
file="/Users/USER12/Desktop/url-list.txt"
printf "" > /Users/USER12/Desktop/url-results.txt
while read line
do
printf "$line" >> /Users/USER12/Desktop/url-results.txt
printf "\t\t\t\t" >> /Users/USER12/Desktop/url-results.txt
curl -o /dev/null --silent --head --write-out '%{http_code} %{time_total}' "$line" >> /Users/USER12/Desktop/url-results.txt
printf "\n" >> /Users/USER12/Desktop/url-results.txt
done <"$file"
which outputs in the following format
google.com 200 0.389
facebook.com 200 0.511
abnormallyLongDomain.com 200 0.786
but I would like to format it into neatly aligned columns for easy reading:
DOMAIN_NAME              HTTP_CODE   RESPONSE_TIME
google.com               200         0.389
facebook.com             200         0.511
abnormallyLongDomain.com 200         0.486
Thanks for the help everyone!!
column is very nice. You are, however, already using printf which gives you fine control over the output format. Using printf's features also allows the code to be somewhat simplified:
#!/bin/bash
file="/Users/USER12/Desktop/url-list.txt"
log="/Users/USER12/Desktop/url-results.txt"
fmt="%-25s%-12s%-12s\n"
printf "$fmt" DOMAIN_NAME HTTP_CODE RESPONSE_TIME > "$log"
while read line
do
read code time < <(curl -o /dev/null --silent --head --write-out '%{http_code} %{time_total}' "$line")
printf "$fmt" "$line" "$code" "$time" >> "$log"
done <"$file"
With the above defined format, the output looks like:
DOMAIN_NAME              HTTP_CODE   RESPONSE_TIME
google.com               301         0.305
facebook.com             301         0.415
abnormallyLongDomain.com 000         0.000
You can fine-tune the output format, such as spacing or alignment, by changing the fmt variable in the script.
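As an illustration of such fine-tuning, the sketch below changes only the format string: a wider first column and a right-aligned time column. The widths are arbitrary choices and the row values are copied from the sample output above:

```shell
#!/bin/bash
# %-26s = left-aligned, 26 wide; %-11s = left-aligned, 11 wide; %13s = right-aligned, 13 wide
fmt='%-26s%-11s%13s\n'

printf "$fmt" DOMAIN_NAME HTTP_CODE RESPONSE_TIME
printf "$fmt" google.com 301 0.305
printf "$fmt" abnormallyLongDomain.com 000 0.000

# Every row is padded to the same total width (26 + 11 + 13 characters)
row=$(printf "$fmt" google.com 301 0.305)
echo "row width: ${#row}"
```

A field longer than its width is printed in full rather than truncated, so the first width should be at least as long as the longest expected domain.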
Further Refinements
The above code opens and closes the log file with each loop. This can be avoided as Charles Duffy suggests, simply by using exec to redirect stdout to the log file before the first printf statement:
#!/bin/bash
file="/Users/USER12/Desktop/url-list.txt"
exec >"/Users/USER12/Desktop/url-results.txt"
fmt="%-25s%-12s%-12s\n"
printf "$fmt" DOMAIN_NAME HTTP_CODE RESPONSE_TIME
while read line
do
read code time < <(curl -o /dev/null --silent --head --write-out '%{http_code} %{time_total}' "$line")
printf "$fmt" "$line" "$code" "$time"
done <"$file"
Alternatively, as Chepner suggests, the print statements can be grouped:
#!/bin/bash
file="/Users/USER12/Desktop/url-list.txt"
fmt="%-25s%-12s%-12s\n"
{
printf "$fmt" DOMAIN_NAME HTTP_CODE RESPONSE_TIME
while read line
do
read code time < <(curl -o /dev/null --silent --head --write-out '%{http_code} %{time_total}' "$line")
printf "$fmt" "$line" "$code" "$time"
done <"$file"
} >"/Users/USER12/Desktop/url-results.txt"
An advantage of grouping is that, after the group, stdout is automatically restored to its normal value.
Shortened a bit
#!/bin/bash
file="./url.txt"
fmt="%s\t%s\t%s\n"
( printf "$fmt" "DOMAIN_NAME" "HTTP_CODE" "RESPONSE_TIME"
while read -r line
do
printf "$fmt" "$line" $(curl -o /dev/null --silent --head --write-out '%{http_code} %{time_total}' "$line")
done <"$file" ) | column -t > ./out.txt
You don't need to redirect every printf; instead, you can enclose that part of your script in (...) to run it in a subshell and redirect its output once. Print the fields separated by a single tab and use the column command to format them nicely.
In any case, it is usually better not to put file names (or headers) into the script, and to reduce it to:
#!/bin/bash
while read -r line
do
printf "%s\t%s\t%s\n" "$line" $(curl -o /dev/null --silent --head --write-out '%{http_code} %{time_total}' "$line")
done | column -t
and use it like:
myscript.sh < url-list.txt >result.txt
This allows you to use your script in pipelines, like:
something_produces_urls | myscript.sh | grep 200 > somewhere.txt

Resources