What is the fastest way to perform a HTTP request and check for 404? - bash

Recently I needed to check for a huge list of filenames if they exist on a server. I did this by running a for loop which tried to wget each of those files. That was efficient enough, but took about 30 minutes in this case. I wonder if there is a faster way to check whether a file exists or not (since wget is for downloading files and not performing thousands of requests).
I don't know if that information is relevant, but it's an Apache server.

Curl would be the best option in a for loop and here is a straight forward simple way, run this in your forloop
curl -I --silent http://www.yoururl/linktodetect | grep -m 1 -c 404
What this simply does is check the http response header for a 404 returned on the link and if its detected as a missing file/link throwing a 404 then the command line output will display you a number 1; otherwise, if the file/link is valid and does not return a 404 then the command line output will display you a number 0.

Related

Parse download speed from wget output in terminal

I have the following command
sudo wget --output-document=/dev/null http://speedtest.pixelwolf.ch which outputs
--2016-03-27 17:15:47-- http://speedtest.pixelwolf.ch/
Resolving speedtest.pixelwolf.ch (speedtest.pixelwolf.ch)... 178.63.18.88, 2a02:418:3102::6
Connecting to speedtest.pixelwolf.ch (speedtest.pixelwolf.ch) | 178.63.18.88|:80... connected.
HTTP Request sent, awaiting response... 200 OK
Length: 85 [text/html]
Saving to: `/dev/null`
100%[======================>]85 --.-K/s in 0s
2016-03-27 17:15:47 (8.79 MB/s) - `dev/null` saved [85/85]
I'd like to be able to parse the (8.79 MB/s) from the last line and store this in a file (or any other way I can get this into a local PHP file easily), I tried to store the full output by changing my command to --output-document=/dev/speedtest however this just saved "Could not reach website" in the file and not the terminal output of the command.
Not quite sure where to start with this, so any help would be awesome.
Not sure if it helps, but my intention is for this stored value (8.79) in this instance to be read by a PHP file and handled there, every 30 seconds which I'll achieve by: while true; do (run speed test and save speed variable to a file cmd); php handleSpeedTest.php; sleep 5; done where handleSpeedTest.php will be able to read that stored value and handle it accordingly.
I changed the URL to one that works. Redirected stderr onto stdout. Used grep --only-matching (-o) and a regex.
sudo wget -O /dev/null http://www.google.com 2>&1 | grep -o '\([0-9.]\+ [KM]B/s\)'

Curl range not working(downloads entire file)

curl -v -r 0-500 http://somefile -o localfile
It should download just the first 501 bytes, no? Instead, it just downloads the entire thing. All 67 megabytes. Thanks curl! Could my companies proxy servers be blocking this feature somehow? I am skeptical about that, since the downloads themselves do work, just not the range feature. Am I missing something?
As a client you could always abort the download when you have received what you want.
By using head, you will be able to limit the download to 500 bytes, even if the server does not accept the range-header
curl -v -r 0-500 http://somefile |head -c 500 > localfile
It should download just the first 501 bytes, no?
It depends on the server. From man curl:
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
As you can see in the response from the server, it's using HTTP/1.1. So it's not surprising that the range feature is not supported at the server side.
Please use the following command
curl -H "range: bytes=354-500" -O http://example.com/file.extension

How to verify AB responses?

Is there a way to make sure that AB gets proper responses from server? For example:
To force it to output the response of a single request to STDOUT OR
To ask it to check that some text fragment is included into the response body
I want to make sure that authentication worked properly and i am measuring response time of the target page, not the login form.
Currently I just replace ab -n 100 -c 1 -C "$MY_COOKIE" $MY_REQUEST with curl -b "$MY_COOKIE" $MY_REQUEST | lynx -stdin .
If it's not possible, is there an alternative more comprehensive tool that can do that?
You can use the -v option as listed in the man doc:
-v verbosity
Set verbosity level - 4 and above prints information on headers, 3 and above prints response codes (404, 200, etc.), 2 and above prints warnings and info.
https://httpd.apache.org/docs/2.4/programs/ab.html
So it would be:
ab -n 100 -c 1 -C "$MY_COOKIE" -v 4 $MY_REQUEST
This will spit out the response headers and HTML content. The 3 value will be enough to check for a redirect header.
I didn't try piping it to Lynx but grep worked fine.
Apache Benchmark is good for a cursory glance at your system but is not very sophisticated. I am currently attempting to tune a web service and am finding that AB does not measure complete response time when considering the transfer of the body. Also as you mention you can not verify what is returned.
My current recommendation is Apache JMeter. http://jmeter.apache.org/
I am having much better success with it. You may find the Response Assertion useful for your situation. http://jmeter.apache.org/usermanual/component_reference.html#Response_Assertion

curl -i and curl -I returning different results

My understanding was that curl -i and curl -I would return virtually the same results except that curl -i would return the standard output along with the header and curl -I would only return the header -- the header of both being the same. We've been doing some gzip and un-gzipped testing with Varnish and stumbled upon the oddity that curl -i shows X-Cache: HIT but curl -I returns X-Cache: MISS! How this is possible, I am unsure and that is precisely my question in this post.
Here are some more details that may or may not make a difference:
The URL is usually SSL enforced (https) but both HTTP and HTTPS have been tested to receive same results
The results are consistent
Is Varnish Running site says "Yes! Sort of"
curl sends different HTTP requests to the server (or Varnish in this case) when you use the -I option. Normally, curl will send a GET request, but when you specify -I, it sends HEAD instead (essentially telling the server to just send the header, not the actual content). I'm not particularly familiar with Varnish, but it appears to normally cache both GET and HEAD requests -- but in your case it might be configured to do something different, or the backend server may be triggering a difference... In any case, I'm pretty sure it's GET vs. HEAD that's making the cache respond differently with -i vs. -I.
did you check in different orders?
see: http://anothersysadmin.wordpress.com/2008/04/22/x-cache-and-x-cache-lookup-headers-explained/ for some details on X-Cache

Using CURL to download file and view headers and status code

I'm writing a Bash script to download image files from Snapito's web page snapshot API. The API can return a variety of responses indicated by different HTTP response codes and/or some custom headers. My script is intended to be run as an automated Cron job that pulls URLs from a MySQL database and saves the screenshots to local disk.
I am using curl. I'd like to do these 3 things using a single CURL command:
Extract the HTTP response code
Extract the headers
Save the file locally (if the request was successful)
I could do this using multiple curl requests, but I want to minimize the number of times I hit Snapito's servers. Any curl experts out there?
Or if someone has a Bash script that can respond to the full documented set of Snapito API responses, that'd be awesome. Here's their API documentation.
Thanks!
Use the dump headers option:
curl -D /tmp/headers.txt http://server.com
Use curl -i (include HTTP header) - which will yield the headers, followed by a blank line, followed by the content.
You can then split out the headers / content (or use -D to save directly to file, as suggested above).
There are three options -i, -I, and -D
> curl --help | egrep '^ +\-[iID]'
-D, --dump-header FILE Write the headers to FILE
-I, --head Show document info only
-i, --include Include protocol headers in the output (H/F)

Resources