regular expression inside a cURL call - bash

I have a cURL call like this:
curl --silent --max-filesize 500 --write-out "%{http_code}\t%{url_effective}\n" 'http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.[200-210].dmg' -o /dev/null
This call generates a list of URLs with their HTTP codes (normally 200 or 404), like this:
404 http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.203.dmg
404 http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.204.dmg
200 http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.205.dmg
404 http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.206.dmg
The only valid URLs are the ones preceded by the 200 HTTP code, so I would like to put a regular expression in the cURL call so that it only downloads the URLs whose lines start with 200.
Any ideas on how to do this without writing a bash script?
Thank you in advance.

You can use the following:
curl --silent -f --max-filesize 500 --write-out "%{http_code}\t%{url_effective}\n" -o '#1.dmg' 'http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.[200-210].dmg'
This will try to reach every URL and, when the response is neither a 404 nor too large, download it into a file whose name is based on the index in the URL.
The -f flag makes curl avoid outputting the body of the response when the HTTP code isn't a success code, while the -o flag specifies an output file, where #1 corresponds to the effective value of your [200-210] range (adding other [] or {} ranges would let you refer to other parts of the URL by their index).
Note that during my tests, the --max-filesize 500 flag prevented the download of the only URL which didn't end up in a 404, fmpa_17.0.2.205.dmg.
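If you also want just the list of valid URLs from the report, a minimal sketch (assuming the same range as above; --max-filesize is dropped here because, as noted, 500 bytes is too small for the real installer) is to filter the --write-out lines with grep, since the downloaded files go to the numbered .dmg outputs and only the report reaches stdout:
curl --silent -f --write-out "%{http_code}\t%{url_effective}\n" -o '#1.dmg' 'http://fmdl.filemaker.com/maint/107-85rel/fmpa_17.0.2.[200-210].dmg' | grep '^200'
The grep '^200' keeps only the report lines whose status code is 200.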

Related

Getting a list of urls with wget using regex

I'm starting with page:
https://mysite/a"
I'd like to spider the page getting the full urls of any nested urls below this that begin with the same stem (like https://mysite/a/b ).
I've tried:
$ wget -r --spider --accept-regex "https://...*" 'https://.../' 2>test.txt
which produces a large amount of output, including what appear to be the URLs I'm after, like:
--2018-04-21 15:04:48-- https://mysite/a/
Reusing existing connection to mysite:443.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'a/index.html.tmp.tmp'
How do I just print out a list of the urls?
Edit:
changed it to
$ wget -r --spider 'https://mysite/a/' |grep 'https://mysite/a*' 2>test.txt
as a test. No output is being saved in test.txt; the file is empty.
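One way to get just that list (a sketch, assuming wget writes its spider log to stderr, which is why the grep above never sees it) is to merge stderr into stdout and extract the URLs from the log:
wget -r --spider 'https://mysite/a/' 2>&1 | grep -oE 'https://mysite/a[^ ]*' | sort -u > test.txt
Here 2>&1 sends the log into the pipe, grep -o prints only the matching URLs, and sort -u de-duplicates them before they are saved to test.txt.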

encode curl GET request and construct the URL

I have the following example of a dynamically generated URL for a GET request:
http://api.jwplatform.com/v1/videos/create?api_format=json&api_key=dgfgfg&api_nonce=554566&api_timestamp=1500296525&custom.videoId=905581&description=تقليد بداية العام&downloadurl=http://media.com/media/mymedia.mp4&sourceformat=mp4&sourcetype=url&sourceurl=http://media.com/media/mymedia.mp4&title=الغطس بالمياه الباردة.. تقليد بداية العام&api_signature=5cd337198ead0768975610a135e2
which includes the following vars:
api_key=
api_nonce=
api_timestamp=
custom.videoId=
description=
downloadurl=
sourceurl=
title=
api_signature=
sourceformat=mp4
sourcetype=url
I'm trying to send a curl GET command and get the response back, but I always fail for 2 reasons:
the URLs in the request should be UTF-8 encoded,
and the 2nd one is that I always get curl: (6) Couldn't resolve host for
each var, as if curl is not taking the URL as one URL and breaks it into
different calls, for example:
curl: (6) Couldn't resolve host 'بداية'
Input domain encoded as `ANSI_X3.4-1968'
Failed to convert العام&downloadurl=http: to ACE; System iconv failed
getaddrinfo(3) failed for العام&downloadurl=http::80
curl -X -G -v -H "Content: agent-type: application/x-www-form-urlencoded" http://api.jwplatform.com/v1/videos/create?api_format=json&{vars}
Any tips on how I can achieve this and construct the curl command in the right format?
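One common approach (a sketch rather than a confirmed fix; the values are the example ones from above and the signature calculation is left out) is to quote the base URL and let curl -G with --data-urlencode build and percent-encode the query string:
curl -G -v 'http://api.jwplatform.com/v1/videos/create' \
  --data-urlencode 'api_format=json' \
  --data-urlencode 'api_key=dgfgfg' \
  --data-urlencode 'api_nonce=554566' \
  --data-urlencode 'api_timestamp=1500296525' \
  --data-urlencode 'custom.videoId=905581' \
  --data-urlencode 'description=تقليد بداية العام' \
  --data-urlencode 'downloadurl=http://media.com/media/mymedia.mp4' \
  --data-urlencode 'sourceformat=mp4' \
  --data-urlencode 'sourcetype=url' \
  --data-urlencode 'sourceurl=http://media.com/media/mymedia.mp4' \
  --data-urlencode 'title=الغطس بالمياه الباردة.. تقليد بداية العام' \
  --data-urlencode 'api_signature=5cd337198ead0768975610a135e2'
Quoting each key=value pair stops the shell from splitting the command on & and spaces, and --data-urlencode handles the UTF-8 percent-encoding, which addresses both failure reasons.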

Curl taking too long to send https request using command line

I have implemented a shell script which sends an HTTPS request to a proxy with an authorization header, using a GET request.
Here is my command :
curl -s -o /dev/null -w "%{http_code}" -X GET -H "Authorization: 123456:admin05" "https://www.mywebpage/api/request/india/?ID=123456&Number=9456123789&Code=01"
It takes around 12 seconds before the request is sent to the proxy and a status code like 200, 400, 500, etc. comes back.
Is it possible to reduce this time and make it faster using curl?
Please advise me for such a case.
Thanks.
Use option -v or --verbose along with --trace-time
It gives details of the actions being taken along with timings.
This includes DNS resolution, the SSL handshake, etc. A line starting with '>' means a header/body is being sent, '<' means it is being received.
Based on the differences in the operation sequence, you can work out whether the server is taking time to respond (the time between request and response) or whether the delay is network latency or bandwidth (the time the response takes to arrive).
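As an extra sketch on top of that (reusing the URL and header from the question), curl's --write-out timing variables can break down where the 12 seconds go without reading the verbose trace:
curl -s -o /dev/null -H "Authorization: 123456:admin05" -w "dns: %{time_namelookup}s\nconnect: %{time_connect}s\ntls: %{time_appconnect}s\nfirst byte: %{time_starttransfer}s\ntotal: %{time_total}s\n" "https://www.mywebpage/api/request/india/?ID=123456&Number=9456123789&Code=01"
A large time_namelookup points at slow DNS, while a big gap between time_appconnect and time_starttransfer means the server itself is slow to answer.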

What is the fastest way to perform an HTTP request and check for 404?

Recently I needed to check, for a huge list of filenames, whether they exist on a server. I did this by running a for loop which tried to wget each of those files. That was efficient enough, but took about 30 minutes in this case. I wonder if there is a faster way to check whether a file exists or not (since wget is for downloading files, not for performing thousands of requests).
I don't know if that information is relevant, but it's an Apache server.
curl would be the best option in a for loop, and here is a straightforward, simple way; run this in your for loop:
curl -I --silent http://www.yoururl/linktodetect | grep -m 1 -c 404
What this does is check the HTTP response headers for a 404 returned on the link. If the file/link is missing and throws a 404, the command line output will show the number 1; otherwise, if the file/link is valid and does not return a 404, the output will show the number 0.
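For a huge list, a faster variant (a sketch, assuming a file named urls.txt with one URL per line; the file name is illustrative) is to let xargs run several of these HEAD requests in parallel and print the status code next to each URL:
xargs -P 10 -I {} curl -o /dev/null --silent --head --write-out '%{http_code} {}\n' {} < urls.txt
--head sends HEAD requests so nothing is downloaded, and -P 10 keeps 10 requests in flight at once; piping the output through grep '^404' (or grep -v '^404') then separates the missing files from the existing ones.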

Using CURL to download file and view headers and status code

I'm writing a Bash script to download image files from Snapito's web page snapshot API. The API can return a variety of responses indicated by different HTTP response codes and/or some custom headers. My script is intended to be run as an automated Cron job that pulls URLs from a MySQL database and saves the screenshots to local disk.
I am using curl. I'd like to do these 3 things using a single CURL command:
Extract the HTTP response code
Extract the headers
Save the file locally (if the request was successful)
I could do this using multiple curl requests, but I want to minimize the number of times I hit Snapito's servers. Any curl experts out there?
Or if someone has a Bash script that can respond to the full documented set of Snapito API responses, that'd be awesome. Here's their API documentation.
Thanks!
Use the dump headers option:
curl -D /tmp/headers.txt http://server.com
Use curl -i (include HTTP headers), which will yield the headers, followed by a blank line, followed by the content.
You can then split out the headers / content (or use -D to save directly to file, as suggested above).
There are three options -i, -I, and -D
> curl --help | egrep '^ +\-[iID]'
-D, --dump-header FILE Write the headers to FILE
-I, --head Show document info only
-i, --include Include protocol headers in the output (H/F)
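Putting those options together, a single command can cover all three points (a sketch; the endpoint and file names below are placeholders, not Snapito's documented URL):
# placeholder endpoint and file names
curl --silent -D headers.txt -o snapshot.png --write-out '%{http_code}\n' 'https://api.snapito.com/your-endpoint'
The headers go to headers.txt via -D, the image body goes to snapshot.png via -o, and the HTTP status code is printed to stdout by --write-out so the cron script can test it and discard the saved file when the request wasn't successful.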
