Bash script to test status of site

I have a script for testing the status of a site, so I can run a command if the site is offline. However, I've since realised that, because the site is proxied through Cloudflare, it always shows a 200 status, even if the site is offline. So I need to come up with another approach. I tried testing the site using curl and a HEAD request; both get the wrong response (from Cloudflare).
What I have found is that an HTTPie command gets the response I need, although only when I use the -h option (I have no idea why that makes a difference, since visually the output looks identical to when I don't use -h).
Assuming this is an okay way to go about reaching my aim ... I'd like to know how I can test if a certain string appears more than 0 times.
The string is location: https:/// (with three forward slashes).
The command I use to get the header info from the actual site (and not simply from what Cloudflare is dishing up) is http -h https://example.com/.
I am able to test for the string using http -h https://example.com/ | grep -c 'location: https:///', which outputs 1 when the string exists.
What I now want to do is run a command if the output is 1. But this is where I need help. My bash skills are minimal, and I am going about it the wrong way. What I came up with (which doesn't work) is:
#!/bin/bash
STR=$(http -h https://example.com/)
if (( $(grep -c 'location: https:///' $STR) != 1 )); then
    echo "Site is UP"
    exit
else
    echo "Site is DOWN"
    sudo wo clean --all && sudo wo stack reload --all
fi
Please explain to me why it's not working, and how to do this correctly.
Thank you.
ADDITIONS:
What the script is testing for is an odd situation in which the site suddenly starts redirecting to, literally, https:///. This obviously causes the site to be down. Safari, for instance, takes this as a redirection to localhost. Chrome simply spits the dummy with a redirect error, ERR_INVALID_REDIRECT.
When this is occurring, the headers from the site are:
HTTP/2 301
server: nginx
date: Thu, 12 May 2022 10:19:58 GMT
content-type: text/html; charset=UTF-8
content-length: 0
location: https:///
x-redirect-by: WordPress
x-powered-by: WordOps
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
referrer-policy: no-referrer, strict-origin-when-cross-origin
x-download-options: noopen
x-srcache-fetch-status: HIT
x-srcache-store-status: BYPASS
I chose to test for the string location: https:/// since that's the most specific (and unique) indicator of this issue. I could also test for HTTP/2 301.
The intention of the script is to remedy the problem when it occurs, as a temporary solution whilst I figure out what's causing WordPress to generate such an odd redirect, and in case it happens whilst I am not at work, or sleeping. :-) I will have a cron job running the script every 5 mins, so at least the site is never down for longer than that.
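For reference, a crontab entry running such a script every five minutes could look like the line below (a sketch; /usr/local/bin/check-site.sh is a hypothetical path, and the job must run as a user who can run the wo commands via sudo without a password prompt):
*/5 * * * * /usr/local/bin/check-site.sh >> /var/log/check-site.log 2>&1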

grep reads a file, not a string. Also, you need to quote strings, especially if they might contain whitespace or shell metacharacters.
More tangentially, grep -q is the usual way to check if a string exists at least once. Perhaps see also Why is testing “$?” to see if a command succeeded or not, an anti-pattern?
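A minimal sketch of the difference (the URL and string are the question's own):
# anti-pattern: run the pipeline, then separately inspect $?
http -h https://example.com/ | grep -q 'location: https:///'
if [ $? -eq 0 ]; then
    echo "redirect found"
fi
# preferred: use the pipeline itself as the if condition
if http -h https://example.com/ | grep -q 'location: https:///'; then
    echo "redirect found"
fi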
I can see no reason to save the string in a variable which you only examine once; though if you want to (for debugging reasons etc) probably avoid upper case variables. See also Correct Bash and shell script variable capitalization
Note that grep succeeding here means the bad redirect was found, i.e. the site is down, so the remediation command belongs in that branch; the separate exit then becomes unnecessary, since the script simply ends after the fi.
Nothing here is Bash-specific, so I changed the shebang to use sh instead, which is more portable and sometimes faster. Perhaps see also Difference between sh and bash
#!/bin/sh
if http -h https://example.com/ | grep -q 'location: https:///'
then
    echo "Site is DOWN"
    sudo wo clean --all && sudo wo stack reload --all
else
    echo "Site is UP"
fi
For basic diagnostics, probably try http://shellcheck.net/ before asking for human assistance.

Related

Bash commands putting out extra information which results in issues with scripts

Okay, hopefully I can explain this correctly, as I have no idea what's causing this or how to resolve it.
For some reason, bash commands (on a CentOS 6.x server) are displaying more information than "normal", and that causes issues with certain scripts. I have no clue if there is a name for this behaviour, but hopefully someone knows a solution.
First example.
Correct / good server:
[root@goodserver ~]# vzctl enter 3567
entered into CT 3567
[root@example /]#
(this is the correct behaviour)
Incorrect / bad server:
[root@badserver /]# vzctl enter 3127
Entering CT
entered into CT 3127
Open /dev/pts/0
[root@example /]#
With the "bad" server it will display more information as usual, like:
Entering CT
Open /dev/pts/0
It's as if it's printing extra information about what it's doing.
Of course, the above is purely cosmetic; however, for several bash scripts we use, these extra lines cause real problems.
A part of the script we use relies on the following command (there are more, but this is mainly an example of what's wrong):
DOMAIN=`vzctl exec $VEID 'hostname -d'`
The result of the above command is written into /etc/named.conf.
On the GOOD server it would be added in the named.conf like this:
zone "example.com" {
type master;
file "example.com";
allow-transfer {
200.190.100.10;
200.190.101.10;
common-allow-transfer;
};
};
The above is correct.
On the BAD server it would be added in the named.conf like this:
zone "Executing command: hostname -d
example.com" {
type master;
file "Executing command: hostname -d
example.com";
allow-transfer {
200.190.100.10;
200.190.101.10;
common-allow-transfer;
};
};
So it adds output describing the action it performs, in this example "Executing command: hostname -d".
Here is another example, running the command on a good server and on the bad server.
Bad server:
[root@bad-server /]# DOMAIN=`vzctl exec 3333 'hostname -d'`
[root@bad-server /]# echo $DOMAIN
Executing command: hostname -d example.com
Good server:
[root@good-server ~]# DOMAIN=`vzctl exec 4444 'hostname -d'`
[root@good-server ~]# echo $DOMAIN
example.com
My knowledge is limited, but I have tried several things, such as checking rsyslog and grub.conf, and nothing seems out of the ordinary.
I have no clue why it's displaying the extra information.
Probably it's something simple / stupid, but I have been trying to solve this for hours now and I really have no clue...
So any help is really appreciated.
Added information:
Both servers use: kernel.printk = 7 4 1 7
(I don't know if that's useful)
Well (thanks to Aaron for pointing me in the right direction), I finally found the little culprit which was causing all the issues I experienced with this script (which worked for every other server, so there was obviously no need to change the script itself).
The issues were caused by the VERBOSE level set in vz.conf (located in the /etc/vz/ directory). There is an option in there called "VERBOSE", and in my case it was set to 3.
According to OpenVZ's website it does the following:
Increments logging level up from the default. Can be used multiple times.
Default value is set to the value of VERBOSE parameter in the global
configuration file vz.conf(5), or to 0 if not set by VERBOSE parameter.
After I changed VERBOSE=3 to VERBOSE=0 my script worked fine once again (as it did for every other server). :-)
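For anyone checking their own setup, inspecting and fixing the setting is straightforward (a sketch; the path is as given above):
grep -n 'VERBOSE' /etc/vz/vz.conf   # show the current level
# then edit that line so it reads:
VERBOSE=0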
So a big shoutout to Aaron for pointing me in the right direction. The answer is easy when you know where to look!
Sorry to say, but I am kinda disappointed by ndim's reaction. This is the 2nd time he has been very unhelpful and rude in his responses. He clearly didn't read the issue I posted correctly. Oh well.
I would make sure to properly parse the output of the command. In this case, we are only interested in lines of the form
entered into CT 12345
One way of doing this would be to pipe everything through sed, printing only the number when the line looks as above (note that with sed's default basic regular expressions, the grouping parentheses and the braces need backslashes in front of them):
whateverthecommand | sed -n 's/^entered into CT \([0-9]\{1,\}\)$/\1/p'
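The same idea applies to the DOMAIN assignment from the question: rather than trusting the raw output, strip the verbose noise (a sketch; it assumes the extra line always starts with "Executing command:"):
DOMAIN=$(vzctl exec "$VEID" 'hostname -d' | grep -v '^Executing command:')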

What is the fastest way to perform an HTTP request and check for 404?

Recently I needed to check a huge list of filenames to see whether they exist on a server. I did this by running a for loop which tried to wget each of those files. That was effective enough, but took about 30 minutes in this case. I wonder if there is a faster way to check whether a file exists or not (since wget is for downloading files, not for performing thousands of requests).
I don't know if that information is relevant, but it's an Apache server.
curl would be the best option in a for loop. Here is a straightforward, simple way; run this in your for loop:
curl -I --silent http://www.yoururl/linktodetect | grep -m 1 -c 404
What this does is check the HTTP response headers for a 404 returned for the link. If the file/link is missing and the server returns a 404, the command outputs the number 1; otherwise, if the file/link is valid and does not return a 404, it outputs the number 0.
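Putting that into a loop over a list of filenames might look like this (a sketch; files.txt, one filename per line, and the base URL are placeholders):
#!/bin/sh
while read -r name; do
    if [ "$(curl -I --silent "http://www.example.com/$name" | grep -m 1 -c 404)" -eq 1 ]; then
        echo "MISSING: $name"
    else
        echo "OK: $name"
    fi
done < files.txt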

Curl range not working (downloads entire file)

curl -v -r 0-500 http://somefile -o localfile
It should download just the first 501 bytes, no? Instead, it just downloads the entire thing, all 67 megabytes. Thanks, curl! Could my company's proxy servers be blocking this feature somehow? I am skeptical about that, since the downloads themselves do work, just not the range feature. Am I missing something?
As a client you could always abort the download when you have received what you want.
By using head, you will be able to limit the saved data to the first 501 bytes, even if the server does not honour the Range header:
curl -v -r 0-500 http://somefile | head -c 501 > localfile
It should download just the first 501 bytes, no?
It depends on the server. From man curl:
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
As you can see in the response from the server, it's using HTTP/1.1, so it's not surprising that the range feature is not enabled on the server side.
Please try the following command, which sets the Range header explicitly:
curl -H "range: bytes=354-500" -O http://example.com/file.extension

How to verify AB responses?

Is there a way to make sure that AB gets proper responses from the server? For example:
To force it to output the response of a single request to STDOUT, or
To ask it to check that some text fragment is included in the response body.
I want to make sure that authentication worked properly and that I am measuring the response time of the target page, not the login form.
Currently I just replace ab -n 100 -c 1 -C "$MY_COOKIE" $MY_REQUEST with curl -b "$MY_COOKIE" $MY_REQUEST | lynx -stdin.
If it's not possible, is there an alternative more comprehensive tool that can do that?
You can use the -v option as listed in the man doc:
-v verbosity
Set verbosity level - 4 and above prints information on headers, 3 and above prints response codes (404, 200, etc.), 2 and above prints warnings and info.
https://httpd.apache.org/docs/2.4/programs/ab.html
So it would be:
ab -n 100 -c 1 -C "$MY_COOKIE" -v 4 $MY_REQUEST
This will spit out the response headers and HTML content. The 3 value will be enough to check for a redirect header.
I didn't try piping it to Lynx but grep worked fine.
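For example, a single-request sanity check for a fragment that is only visible after login might look like this (a sketch; the fragment is a placeholder):
ab -n 1 -c 1 -C "$MY_COOKIE" -v 4 "$MY_REQUEST" | grep -q 'Welcome back' && echo "authentication OK"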
Apache Bench is good for a cursory glance at your system but is not very sophisticated. I am currently attempting to tune a web service and am finding that AB does not measure the complete response time when the transfer of the body is taken into account. Also, as you mention, you cannot verify what is returned.
My current recommendation is Apache JMeter. http://jmeter.apache.org/
I am having much better success with it. You may find the Response Assertion useful for your situation. http://jmeter.apache.org/usermanual/component_reference.html#Response_Assertion

curl -i and curl -I returning different results

My understanding was that curl -i and curl -I would return virtually the same results, except that curl -i would return the standard output along with the header and curl -I would only return the header -- the header of both being the same. We've been doing some gzipped and un-gzipped testing with Varnish and stumbled upon the oddity that curl -i shows X-Cache: HIT but curl -I returns X-Cache: MISS! How this is possible I am unsure, and that is precisely my question in this post.
Here are some more details that may or may not make a difference:
The URL is usually SSL enforced (https) but both HTTP and HTTPS have been tested to receive same results
The results are consistent
The "Is Varnish Running?" site says "Yes! Sort of"
curl sends different HTTP requests to the server (or Varnish in this case) when you use the -I option. Normally, curl will send a GET request, but when you specify -I, it sends HEAD instead (essentially telling the server to just send the header, not the actual content). I'm not particularly familiar with Varnish, but it appears to normally cache both GET and HEAD requests -- but in your case it might be configured to do something different, or the backend server may be triggering a difference... In any case, I'm pretty sure it's GET vs. HEAD that's making the cache respond differently with -i vs. -I.
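One way to compare like with like is to dump the headers of a real GET request instead of sending HEAD (a sketch):
# headers via a HEAD request
curl -sI https://example.com/
# headers via a GET request, discarding the body
curl -sD - -o /dev/null https://example.com/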
Did you check in different orders?
See http://anothersysadmin.wordpress.com/2008/04/22/x-cache-and-x-cache-lookup-headers-explained/ for some details on X-Cache.
