Using CURL to download file and view headers and status code - bash

I'm writing a Bash script to download image files from Snapito's web page snapshot API. The API can return a variety of responses indicated by different HTTP response codes and/or some custom headers. My script is intended to be run as an automated Cron job that pulls URLs from a MySQL database and saves the screenshots to local disk.
I am using curl. I'd like to do these 3 things using a single CURL command:
Extract the HTTP response code
Extract the headers
Save the file locally (if the request was successful)
I could do this using multiple curl requests, but I want to minimize the number of times I hit Snapito's servers. Any curl experts out there?
Or if someone has a Bash script that can respond to the full documented set of Snapito API responses, that'd be awesome. Here's their API documentation.
Thanks!

Use the dump headers option:
curl -D /tmp/headers.txt http://server.com

Use curl -i (include HTTP header) - which will yield the headers, followed by a blank line, followed by the content.
You can then split out the headers / content (or use -D to save directly to file, as suggested above).
There are three options -i, -I, and -D
> curl --help | egrep '^ +\-[iID]'
-D, --dump-header FILE Write the headers to FILE
-I, --head Show document info only
-i, --include Include protocol headers in the output (H/F)

Related

What is the fastest way to perform a HTTP request and check for 404?

Recently I needed to check for a huge list of filenames if they exist on a server. I did this by running a for loop which tried to wget each of those files. That was efficient enough, but took about 30 minutes in this case. I wonder if there is a faster way to check whether a file exists or not (since wget is for downloading files and not performing thousands of requests).
I don't know if that information is relevant, but it's an Apache server.
Curl would be the best option in a for loop and here is a straight forward simple way, run this in your forloop
curl -I --silent http://www.yoururl/linktodetect | grep -m 1 -c 404
What this simply does is check the http response header for a 404 returned on the link and if its detected as a missing file/link throwing a 404 then the command line output will display you a number 1; otherwise, if the file/link is valid and does not return a 404 then the command line output will display you a number 0.

Curl range not working(downloads entire file)

curl -v -r 0-500 http://somefile -o localfile
It should download just the first 501 bytes, no? Instead, it just downloads the entire thing. All 67 megabytes. Thanks curl! Could my companies proxy servers be blocking this feature somehow? I am skeptical about that, since the downloads themselves do work, just not the range feature. Am I missing something?
As a client you could always abort the download when you have received what you want.
By using head, you will be able to limit the download to 500 bytes, even if the server does not accept the range-header
curl -v -r 0-500 http://somefile |head -c 500 > localfile
It should download just the first 501 bytes, no?
It depends on the server. From man curl:
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
As you can see in the response from the server, it's using HTTP/1.1. So it's not surprising that the range feature is not supported at the server side.
Please use the following command
curl -H "range: bytes=354-500" -O http://example.com/file.extension

How to verify AB responses?

Is there a way to make sure that AB gets proper responses from server? For example:
To force it to output the response of a single request to STDOUT OR
To ask it to check that some text fragment is included into the response body
I want to make sure that authentication worked properly and i am measuring response time of the target page, not the login form.
Currently I just replace ab -n 100 -c 1 -C "$MY_COOKIE" $MY_REQUEST with curl -b "$MY_COOKIE" $MY_REQUEST | lynx -stdin .
If it's not possible, is there an alternative more comprehensive tool that can do that?
You can use the -v option as listed in the man doc:
-v verbosity
Set verbosity level - 4 and above prints information on headers, 3 and above prints response codes (404, 200, etc.), 2 and above prints warnings and info.
https://httpd.apache.org/docs/2.4/programs/ab.html
So it would be:
ab -n 100 -c 1 -C "$MY_COOKIE" -v 4 $MY_REQUEST
This will spit out the response headers and HTML content. The 3 value will be enough to check for a redirect header.
I didn't try piping it to Lynx but grep worked fine.
Apache Benchmark is good for a cursory glance at your system but is not very sophisticated. I am currently attempting to tune a web service and am finding that AB does not measure complete response time when considering the transfer of the body. Also as you mention you can not verify what is returned.
My current recommendation is Apache JMeter. http://jmeter.apache.org/
I am having much better success with it. You may find the Response Assertion useful for your situation. http://jmeter.apache.org/usermanual/component_reference.html#Response_Assertion

curl -i and curl -I returning different results

My understanding was that curl -i and curl -I would return virtually the same results except that curl -i would return the standard output along with the header and curl -I would only return the header -- the header of both being the same. We've been doing some gzip and un-gzipped testing with Varnish and stumbled upon the oddity that curl -i shows X-Cache: HIT but curl -I returns X-Cache: MISS! How this is possible, I am unsure and that is precisely my question in this post.
Here are some more details that may or may not make a difference:
The URL is usually SSL enforced (https) but both HTTP and HTTPS have been tested to receive same results
The results are consistent
Is Varnish Running site says "Yes! Sort of"
curl sends different HTTP requests to the server (or Varnish in this case) when you use the -I option. Normally, curl will send a GET request, but when you specify -I, it sends HEAD instead (essentially telling the server to just send the header, not the actual content). I'm not particularly familiar with Varnish, but it appears to normally cache both GET and HEAD requests -- but in your case it might be configured to do something different, or the backend server may be triggering a difference... In any case, I'm pretty sure it's GET vs. HEAD that's making the cache respond differently with -i vs. -I.
did you check in different orders?
see: http://anothersysadmin.wordpress.com/2008/04/22/x-cache-and-x-cache-lookup-headers-explained/ for some details on X-Cache

How do I execute an HTTP PUT in bash?

I'm sending requests to a third-party API. It says I must send an HTTP PUT to http://example.com/project?id=projectId
I tried doing this with PHP curl, but I'm not getting a response from the server. Maybe something is wrong with my code because I've never used PUT before. Is there a way for me to execute an HTTP PUT from bash command line? If so, what is the command?
With curl it would be something like
curl --request PUT --header "Content-Length: 0" http://website.com/project?id=1
but like Mattias said you'd probably want some data in the body as well so you'd want the content-type and the data as well (plus content-length would be larger)
If you really want to only use bash it actually has some networking support.
echo -e "PUT /project?id=123 HTTP/1.1\r\nHost: website.com\r\n\r\n" > \
/dev/tcp/website.com/80
But I guess you also want to send some data in the body?
Like Mattias suggested, Bash can do the job without further tools. If you want to send data, you have to preset at least "Content-length". With variables "host", "port", "resource" and "data" defined, you can do a HTTP put with
echo -e "PUT /$resource HTTP/1.1\r\nHost: $host:$port\r\nContent-Length: ${#data}\r\n\r\n$data\r\n" > /dev/tcp/$host/$port
I tested this with a Rest API and it workes fine.

Resources