wget errors break shell script - how to prevent that? - shell

I have a huge file with lots of links to files of various types to download. Each line is one download command like:
wget 'URL1'
wget 'URL2'
...
and there are thousands of those.
Unfortunately some URLs look really ugly, like for example:
http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc
It opens OK in a browser, but confuses wget.
I'm getting an error:
./tasks001.sh: line 35: syntax error near unexpected token `1'
./tasks001.sh: line 35: `wget 'http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc''
I've tried both URL and 'URL' ways of specifying what to download.
Is there a way to make a script like that run unattended?
I'm OK if it'll just skip the file it couldn't download.

Do not (ab)use the shell.
Save your URLs to some file (let's say my_urls.lst) and do:
wget -i my_urls.lst
Wget will handle quoting etc. on its own.
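For example, a minimal sketch, assuming every line of the existing script really has the form wget 'URL' (tasks001.sh is the script name taken from the error message in the question):
sed -e "s/^wget '//" -e "s/'$//" tasks001.sh > my_urls.lst   # keep only the URLs
wget -i my_urls.lst                                          # let wget read them itself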

I think you need to use double quotes (") and not single quotes (') around the URL.
If that still doesn't work, try escaping the paren characters ( and ) with a backslash: \( and \)
Which shell are you using? Bash? zsh?

This doesn't exactly answer your question but:
Both of the following commands work directly in a bash shell:
wget "http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc"
and
wget 'http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc'
Can you check to see if either of those works for you?

What seems to be happening is that your shell is doing something with the ( characters. I would try using double quotes " instead of single quotes ' around your URL.
If you wish to suppress the messages, you can redirect standard output with >/dev/null or standard error with 2>/dev/null under Unix. Under other operating systems it may be something else.
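And if you just want the script to keep going and skip files it could not download, a hedged sketch using one of the URLs from the question (wget_errors.log is just an example name):
wget 'http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc' 2>> wget_errors.log || true
The || true makes the line succeed even when wget fails, so set -e or && chains will not abort the run.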

Related

sed with regular expression as a bash variable

We have an application that keeps some info in an encrypted file. To edit the file we have to put the text editor name in an environment variable in bash, for example, EDITOR=vi. Then we run the application and it opens the decrypted file in vi. I am trying to come up with a bash script that updates the encrypted file. The only solution that I can think of is passing sed command instead of vi to the EDITOR variable. It works perfectly for something like EDITOR='sed -i s#aaaa#bbbb#'.
The problem starts when I need a space and a regular expression. For example: EDITOR='sed -i -r "s#^(\s*masterkey: )(.*)#\1xxxxx#"', which returns an error. I tried running the EDITOR in bash with $EDITOR test.txt and I can see the problem. It doesn't like the double quotes and the space between them, so I added a backslash before the double quotes and \s instead of the space. Now it says "unterminated address regex". For several hours I googled and couldn't find any solution. I tried replacing single quotes with double quotes and vice versa, and everything else I could find on the internet, but no luck.
How can I escape and which characters should I escape here?
Update:
Maybe if I explain the whole situation somebody could suggest an alternative solution. There is an application written in Ruby and it runs inside a container. The Ruby application has a secret_key_base for production, and we are supposed to change the key with EDITOR=vi rails credentials:edit --environment=production. I don't know Ruby and Google did not return any Ruby solution for automating this, so I could only think of passing sed instead of vi to the Ruby application.
How can I escape and which characters should I escape here?
That is not possible. Word splitting on the result of an expansion cannot be escaped from inside the result of that expansion; it will always run. Note that filename expansion also runs over the result of the expansion.
Create an executable file with the script content and set EDITOR to it.
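A minimal sketch of that first approach, assuming GNU sed inside the container and a made-up path /usr/local/bin/set-masterkey (the rails command is the one from the update):
cat > /usr/local/bin/set-masterkey <<'EOF'
#!/bin/bash
# Rewrite the masterkey value in whatever file the application passes us.
exec sed -i -E 's#^(\s*masterkey: )(.*)#\1xxxxx#' "$@"
EOF
chmod +x /usr/local/bin/set-masterkey
EDITOR=/usr/local/bin/set-masterkey rails credentials:edit --environment=production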
You could export a bash shell function; after some tries I got to:
myeditor() {
  # Rewrite the masterkey value in the file(s) the application passes in.
  sed -i -E 's#^(\s*masterkey: )(.*)#\1xxxxx#' "$@"
}
export -f myeditor
# Word splitting on "$EDITOR file" turns this into: bash -c "$@" _ myeditor file
# (the quotes are literal), so the child bash runs the exported function on the file.
EDITOR='bash -c "$@" _ myeditor'
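A quick way to check the mechanics outside the application, with a made-up file test.yml and GNU sed assumed:
printf '  masterkey: oldsecret\n' > test.yml
$EDITOR test.yml     # expands to: bash -c "$@" _ myeditor test.yml (the quotes are literal)
cat test.yml         # now prints:   masterkey: xxxxx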

wget prints an error - why?

I want to get weather information from weatherstack.com with wget.
But when I use wget on my Mac I run into a problem. The error:
[1] 7943
zsh: no matches found: http://api.weatherstack.com/current?access_key=ACCESS_KEY
[1] + exit 1 wget
Command: wget http://api.weatherstack.com/current?access_key=ACCESS_KEY&query=London
This has nothing to do with wget in particular, but with how the shell parses the command line. It sees the "&" in the URL and interprets it as the "run this in the background" operator, and zsh also treats the "?" as a filename wildcard, which is where the "no matches found" message comes from.
To avoid this, you need to put the URL in quotes to "hide" the special characters from the shell. Try
wget "http://api.weatherstack.com/current?access_key=YOUR_KEY&query=London"

cURL Scraping Wrong Webpage

I am attempting to scrape a webpage that requires a login using curl in the Mac Terminal but can't seem to get it right. I have a cookies.txt file with my login info that I am reading into the command, but I can't get it to scrape the intended page. When I run
curl -b /Users/dwm8/Desktop/cookies.txt -o /Users/dwm8/Desktop/file.txt https://kenpom.com/team.php?team=Duke&y=2002
the contents of file.txt are the webpage data from https://kenpom.com/team.php?team=Duke instead of https://kenpom.com/team.php?team=Duke&y=2002. Is there a fix for this? Thanks for the help.
& is a shell metacharacter that separates commands and indicates that the command before it should be run in the background. So, your command:
curl ... https://kenpom.com/team.php?team=Duke&y=2002
gets parsed as two separate commands:
curl ... https://kenpom.com/team.php?team=Duke & # The & means run curl in the background
y=2002 # This just sets a shell variable
In order to get the shell to treat & as part of the argument to curl rather than a command separator, you need to quote it (either single- or double-quotes would work) or escape it with a backslash:
curl ... 'https://kenpom.com/team.php?team=Duke&y=2002'
curl ... "https://kenpom.com/team.php?team=Duke&y=2002"
curl ... https://kenpom.com/team.php\?team=Duke\&y=2002
Oh, and notice that I also escaped the ? in that last example? That's because ? is also a shell metacharacter (specifically, a wildcard). In this case it probably wouldn't cause any trouble, but it's safest to quote or escape it just in case. And since it's hard to keep track of exactly which characters can cause trouble, I'll recommend quoting instead of escaping, and just quoting everything that you're at all unsure about.
You need to wrap the URL part in quotes.

Escaping the output of date in a BASH script

I'm working on what should be a very simple BASH script. What I want to do is pull an image from a web camera using curl and write it to a file whose name is datestamped.
#! /bin/bash
DATE=$(date +%Y-%m-%d_%H-%M)
DIRECTORY1=home/manager/security_images/Studio_1/
TARGET1=${DIRECTORY1}${DATE}.jpg
curl http://web@192.168.180.211/snapshot.cgi > $TARGET1
When I try to run this I am told that there is no such file or directory. I believe this is due to an error in my escaping but I have tried seemingly every combination of quotation marks around the variables at each stage and still can't get it to work. I just don't understand what is going wrong and could really use some pointers towards what I'm doing wrong.
Many thanks
No, it’s just a typo.
DIRECTORY1=home/manager/security_images/Studio_1/
^^
Should be
DIRECTORY1=/home/manager/security_images/Studio_1/
of course.
As for escaping: even though only safe characters are used now, so the quotes are technically superfluous, quoting all $variables by default is a good habit in shell scripting; there are very few cases where you do not want to use them.
Double quoting the redirection target should be enough:
curl http://web@192.168.180.211/snapshot.cgi > "$TARGET1"
Just make sure the path to it exists. You can run your script with
set -xv
to see how variables are interpolated.
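Putting the two answers together, a sketch of the corrected script (the mkdir -p line is an addition here, just to make sure the target directory exists):
#!/bin/bash
DATE=$(date +%Y-%m-%d_%H-%M)
DIRECTORY1=/home/manager/security_images/Studio_1/
TARGET1="${DIRECTORY1}${DATE}.jpg"
mkdir -p "$DIRECTORY1"
curl "http://web@192.168.180.211/snapshot.cgi" > "$TARGET1"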

How to format a Windows path to a Unix path on Cygwin command line

When using Cygwin, I frequently copy a Windows path and manually edit all of the slashes to Unix format. For example, if I am using Cygwin and need to change directory I enter:
cd C:\windows\path
then edit this to
cd C:/windows/path
(Typically, the path is much longer than that). Is there a way to use sed, or something else to do this automatically? For example, I tried:
echo C:\windows\path|sed 's|\\|g'
but got the following error
sed: -e expression #1, char 7: unterminated `s' command
My goal is to reduce the typing, so maybe I could write a program which I could call. Ideally I would type:
conversionScript cd C:\windows\path
and this would be equivalent to typing:
cd C:/windows/path
Thanks all. Apparently all I need are single quotes around the path:
cd 'C:\windows\path'
and Cygwin will convert it. Cygpath would work too, but it also needs the single quotes to prevent the shell from eating the backslash characters.
Read about the cygpath command.
somecommand `cygpath -u WIN_PATH`
For example, with the path from the question (single-quoted so the shell does not eat the backslashes, as noted above):
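cd "$(cygpath -u 'C:\windows\path')"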
cmd.exe doesn't like single quotes. You should use double quotes
C:\test>echo C:\windows\path|sed "s|\\|/|g"
C:/windows/path
You can replace backslashes with slashes using Unix sed.
Below I use a star "*" to separate the fields of the s directive:
sed "s*\\\*/*g"
The trick is to use one backslash more than you might think is needed.
To answer your question: to achieve
cd C:\windows\path
Since you are in bash, this just works as you want - but add single quotes:
cd 'C:\windows\path'
As noted by @bmargulies and @Jennette - cygpath is your friend - it would be worth reading its man page:
man cygpath
