I want to write some code I can run in bash that takes a list of URLs and checks whether each one returns a 404. If a site does not return a 404, its URL should be written to the output list.
So in the end I should have a list of working sites.
I do not know how to write this code.
This looks like something that could work, right?:
How to check if a URL exists or returns 404 with Java?
You can use this code and build on it as necessary:
#!/bin/bash
array=( "http://www.stackoverflow.com" "http://www.google.com" )
for url in "${array[#]}"
do
if ! curl -s --head --request GET ${url} | grep "404 Not Found" > /dev/null
then
echo "Output URL not returning 404 ${url}"
fi
done
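To get exactly what the question asks for (a file containing the working sites), the same idea can be extended into a small loop that reads URLs from a file and appends the good ones to an output file. This is only a minimal sketch; urls.txt and working.txt are hypothetical file names, and anything other than a 404 status is treated as "working":
#!/bin/bash
# Sketch: read one URL per line from urls.txt and append every URL
# that does not answer with HTTP 404 to working.txt.
while read -r url; do
  # -o /dev/null discards the body, -w '%{http_code}' prints only the status code
  status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  if [ "$status" != "404" ]; then
    echo "$url" >> working.txt
  fi
done < urls.txt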
Thanks for your help. I found a package for Linux called linkchecker. It does exactly what I want.
I'm trying to download some files from a URL. I made a tiny script in Ruby to get these files. Here is the script:
require 'nokogiri'
require 'open-uri'
(1..2).each do |season|
  (1..3).each do |ep|
    season = season.to_s.rjust(2, '0')
    ep = ep.to_s.rjust(2, '0')
    page = Nokogiri::HTML(open("https://some-url/s#{season}e#{ep}/releases"))
    page.css('table.table tbody tr td a').each do |el|
      link = el['href']
      `curl "https://some-url#{link}"` if link.match('sujaidr.srt$')
    end
  end
end
puts "done"
But the response from curl is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>Redirecting...</title>
<h1>Redirecting...</h1>
<p>You should be redirected automatically to target URL:
/some-url/s0Xe0Y/releases. If not click the link.
When I use wget, the redirected page is downloaded. I tried setting the user agent, but that doesn't work. The server redirects the link only when I try to download the files through curl or other CLIs like wget, aria2c, httpie, etc., and I can't find any solution so far.
How can I do this?
Solved
I decided to use the Watir webdriver to do this. It works great for now.
If you want to download the file rather than the page doing the redirection, try using the -L option within your code, for example:
curl -L "https://some-url#{link}"
From the curl man page:
-L, --location
    (HTTP) If the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), this option will make curl redo the request on the new place.
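In the context of the script above, following the redirect and actually saving the file (rather than capturing its body in the backticks) could look roughly like this. This is only a sketch; -O keeps the remote file name, and the URL is a placeholder in the style of the question:
# Follow redirects (-L) and save the file under its remote name (-O)
curl -L -O "https://some-url/path/to/file.srt"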
If you are using Ruby, instead of calling curl or other third-party tools, you may want to use something like this:
require 'net/http'
# Must be somedomain.net instead of somedomain.net/, otherwise, it will throw exception.
Net::HTTP.start("somedomain.net") do |http|
  resp = http.get("/flv/sample/sample.flv")
  open("sample.flv", "wb") do |file|
    file.write(resp.body)
  end
end
puts "Done."
See the answer this example came from: https://stackoverflow.com/a/2263547/1135424
I use Curl inside a Ruby script to scrape this kind of page:
https://www.example.com/page.html?a=1&b=2&c=3&d=4
Using this code:
value = `curl #{ARGV[0]} | grep "findMe:"`
result = value.scan(/findMe: (.*)/).flatten.first.split('$').last.gsub(',', '').to_f
puts result
But since that page uses a 301 redirect, it was failing, so I added the -L option:
value = `curl -L #{ARGV[0]} | grep "findMe:"`
result = value.scan(/findMe: (.*)/).flatten.first.split('$').last.gsub(',', '').to_f
puts result
It now returns the result fine, but the script doesn't end. In an SSH terminal it looks as if it is waiting for something related to the other parameters after displaying the result, and I must press Enter to abort the script.
When I implement this in a web app, the script doesn't work, probably because of this parameter-related problem.
So how can I tell curl to abort right after displaying the result, or to ignore any parameters in the given URL in the first place (since they don't affect page content)? Then it would just scrape this part:
https://www.example.com/page.html
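For what it's worth, the behaviour described above is consistent with the unquoted & characters in the URL being interpreted by the shell inside the backticks as background operators. A minimal sketch of both workarounds (quoting the URL, or stripping the query string before calling curl), assuming that diagnosis and using the example URL from the question:
url='https://www.example.com/page.html?a=1&b=2&c=3&d=4'
# Quoting the URL keeps the shell from treating each '&' as a background operator.
curl -sL "$url" | grep "findMe:"
# Or drop the query string entirely, since it does not affect the page content.
curl -sL "${url%%\?*}" | grep "findMe:"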
I'm trying to use curl in a script I'm developing that will download a bunch of files. I used the -# switch with curl to show a progress bar instead of the full details, which are not of interest. However, the output looks something like this:
######################################################################## 100.0%
######################################################################## 100.0%
######################################################################## 100.0%
This is not descriptive at all. I thought of adding a line before each download to show what is going to be downloaded, but I did not like the result. Is there a way for curl to output something like what we get from wget:
file1.zip 100%[=============================>] 33.05K --.-KB/s in 0.1s
file2.zip 100%[=============================>] 46.26K --.-KB/s in 0.1s
file3.zip 100%[=============================>] 19.46K --.-KB/s in 0.1s
I don't want to use wget instead, though, as it is not available on OS X by default and would require whoever uses my script to install wget first via port or other methods, which is inconvenient.
I found a good way to solve this by using the curl-progress script (https://gist.github.com/sstephenson/4587282), which wraps curl with a custom-drawn progress bar.
By default, the curl-progress script does not show the file name in front of the progress bar, but it is totally customisable. I had to modify the print_progress function so it takes one additional argument, the name of the file to be downloaded. Therefore, I modified the printf statement inside print_progress so it prints the file name in a suitable location before the progress bar:
print_progress() {
  local bytes="$1"
  local length="$2"
  local fileName="$3" # I added this third variable
  ...
  ...
  printf "\x1B[0G %-10s%-6s\x1B[7m%*s\x1B[0m%*s\x1B[1D" \
    "$fileName" "${percent}%" "$on" "" "$off" "" >&4
}
Now the print_progress function expects three arguments, so I modified the call to print_progress to pass the third argument:
print_progress "$bytes" "$length" "$2"
Here $2 refers to the second argument passed to curl-progress. This is an example of downloading an arbitrary file from the web:
$ ./curl-progress -so "file1.zip" "http://download.thinkbroadband.com/20MB.zip"
I will still have to ship a copy of the curl-progress script along with mine, but it is better than asking the user to install wget first.
For the particular case where curl is used to download only one file, as in the examples in this thread, we could use this one-liner function:
# Usage : curlp URL
curlp(){ f=${1##*/}; printf "%28s%s" "" "$f"; COLUMNS=27 curl -# "$1" -o "$f"; }
It provides a progress bar of roughly 20 # characters (i.e. one # for every 5% of progress). You can of course hack it to add more options, maybe two arguments where the first is the URL and the second the local file name, etc.
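For example, reusing the download URL from earlier in this thread:
$ curlp "http://download.thinkbroadband.com/20MB.zip"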
I'm using a web tool that has inbound webhooks. It provides me with a URL to which I can POST a string, and the string gets logged into the system.
I would like to create a script that my team and I can use from the terminal to do something like this:
~: appName
~: What is the webHook URL?
Here I can copy and paste the URL the tool gives me, and it gets stored.
Then from now I can do this:
~: appName This is a message that I want to send...
This sends the string as a POST to the webhook. Ideally it would be something I can share with non-techies and that's easy to set up, and I have no idea how to even start.
I am assuming you want this to be strictly shell.
In the end you want to use something like curl (bash):
curl --data "msg=$2" $url
The $url variable could come from a flat file (app.txt) that is just key/value pairs, keyed by appName.
Your first script would need to append to that file (app.txt):
echo "$1 $2" >> app.txt
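Putting those pieces together, a minimal sketch of the two scripts might look like this. The names register-app and send-msg, and the app.txt format (one "appName url" pair per line), are assumptions for illustration:
#!/bin/bash
# register-app: store an "appName webhookURL" pair as one line in app.txt
echo "$1 $2" >> app.txt

#!/bin/bash
# send-msg: look up the URL for the app named in $1 and POST the rest as msg
url=$(awk -v app="$1" '$1 == app {print $2}' app.txt)
shift
curl --data "msg=$*" "$url"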
This is how you can get started:
#!/bin/bash
msg=$1
url=""
[ ! -f webhookurl ] || url=$(cat webhookurl)  # webhookurl is a file where you put the url
if [ "$url" == "" ]; then
  read -p "What is the webHook URL? " url
  echo "$url" > webhookurl
fi
# Now start posting the message
curl --data "msg=$msg" "$url"
Save it as appname, then run appname like this:
./appname "message to send"
It will ask for the URL the first time and save it in the webhookurl file in the same folder as the script for future use.
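A first-run session might look roughly like this (the webhook URL shown is a made-up placeholder):
$ ./appname "Build finished"
What is the webHook URL? https://hooks.example.com/T000/B000
$ ./appname "Another message"
(no prompt this time; the URL is read from the webhookurl file)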
I would like to send a POST message using a bash script. The body should contain an image, but I don't know how to include it.
You can use cURL for that:
curl -F File=@/path/to/your/file http://your.url
If that does not work, please add more details to your question.
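If the endpoint expects a particular field name or content type, the same approach extends naturally. A small sketch, where the field names, file path, and URL are placeholders:
# Multipart POST: an image file plus an extra text field.
# ';type=image/jpeg' sets that part's Content-Type explicitly.
curl -F "image=@/path/to/photo.jpg;type=image/jpeg" \
     -F "caption=my photo" \
     http://your.url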