Bash - how to escape underscore in URL

I have a little script in bash on macOS, where I use an array with dates like 19000105 in the format yyyymmdd.
In that script I pass the dates of that array to a loop like:
for i in "${list[@]}"; do
wget -A pdf -nc -E -nd --no-check-certificate URL$iURL$i_tif.pdf
done
where wget opens a URL to download a PDF. To make it work, I need to insert the date twice into the URL, at different places.
At one point, however, the URL contains an underscore right after the inserted date, so that part needs to look like this: 19000105_tif/jpegs/.
I thought I needed to add curly braces like {$i}_tif/ to escape it, but then the URL comes out as %7B18500105%7D_tif/, which is wrong.
If I leave the braces off and write $i_tif/, the URL comes out as just /jpegs/, with the date and the tif part before it missing entirely.
How can I insert the dates correctly when an underscore follows them in the URL?

Using ${i} instead of $i should solve this
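For example, a minimal sketch of the corrected loop (here BASE stands in for the real URL prefix, which the question does not give):
list=(19000105 19000106)
for i in "${list[@]}"; do
    # ${i} keeps the date separate from the literal _tif that follows it
    wget -A pdf -nc -E -nd --no-check-certificate "BASE/${i}/${i}_tif/jpegs/${i}.pdf"
done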

Related

How do I combine a string and a variable into one command and execute it in bash?

To get into detail, I am writing a shell script to automate usage of a plugin.
To do so, I am using xclip to pull a URL from the X clipboard, then append it to the end of a command with arguments and execute the combined command.
I use url="$(xclip -o)" to get the URL from the clipboard, then com='youtube-dl -x --audio-format mp3 ' to set the initial string.
I've been clumsily stumbling through attempts at printf and defining new strings such as str=$com $url (and many variants of those). I haven't written anything in a long time and know I'm screwing up something pretty basic. Is anybody able to help?
To concatenate two strings with a space in an assignment:
str=$com' '$url
This can also be written
str=$com" "$url
or
str="$com $url"
then the command can just be launched
$str
However,
str=$com $url
is the syntax for running $url with the environment variable str set to $com.
Also, if url is a string that could contain spaces or tabs, an array should be used instead to avoid word splitting when the command is called:
str=( $com "$url" )
"${str[@]}"

CURL cuts off URL after 74 characters

I'm writing a script that takes a list of ~300 URLs as input, which have the following format:
http://long.domain.prefix/folder/subfolder/filename.html
Of that URL, I'd like to save filename.html in ./folder/subfolder/ - if that folder structure doesn't exist, it must be created. This partly works: the folders are being created on disk, but no files are downloaded.
My script looks like this:
#!/bin/bash
for line in `cat list.txt`; do
# strips the URL prefix and trailing slash
name=${line#http://long.domain.prefix\/}
/usr/bin/curl -m 10 -f -o $name --create-dirs $fullname
done;
For some reason, the $name variable is cut off after exactly 74 characters, which obviously results in HTTP error codes. I can't give out the exact URLs, but rest assured they are correct, as long as the full URL is being used.
How can I prevent this odd cutting-off behavior?
Thanks to Etan Reisner, the solution was to convert the file to Unix-style line endings.
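For reference, one way to do that conversion on list.txt before running the loop (dos2unix does the same job if it is installed; the filename is taken from the script above):
tr -d '\r' < list.txt > list.unix.txt && mv list.unix.txt list.txt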

Wget: read URL from file, add sequence of number to the URL

I am reading a file (with URLs) line by line:
#!/bin/bash
while read line
do
url=$line
wget $url
wget $url_{001..005}.jpg
done < $1
First, I want to download the primary URL, as you can see with wget $url. After that I want to append a sequence of numbers to the URL (_001.jpg, _002.jpg, _003.jpg, _004.jpg, _005.jpg):
wget $url_{001..005}.jpg
...but for some reason it's not working.
Sorry, I missed out one thing: the URLs are like http://xy.com/052914.jpg. Is there an easy way to add _001 before the extension, to get http://xy.com/052914_001.jpg? Or do I have to remove ".jpg" from the file containing the URLs and simply add it to the variable later?
Try encapsulating your variable name:
wget ${url}_{001..005}.jpg
Bash is trying to expand the variable $url_ in your command.
Another way is to escape the underscore character:
wget $url\_{001..005}.jpg
As for your follow-up about the .jpg within the URL, see substring expansion in the bash manual:
wget ${url:0: -4}_{001..005}.jpg
The :0: -4 means: expand the variable starting from position zero (the first character), stopping 4 characters short of the end.
Or, from this answer:
wget ${url%.jpg}_{001..005}.jpg
The %.jpg removes a trailing .jpg specifically, and it also works on older versions of bash.
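As a quick illustration with a sample URL of the same shape as in the question (the value itself is made up), brace expansion happens before parameter expansion, so a single unquoted word expands to all five names:
url=http://xy.com/052914.jpg
echo ${url%.jpg}_{001..005}.jpg
# prints http://xy.com/052914_001.jpg through http://xy.com/052914_005.jpg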

Sed/bash to append partial filename and date to new filename in bash

I have a bunch of csv files in a folder. I need a sed/bash script to accomplish the following for each file in the folder:
Take filename.csv, run a script on it, and write the result of that run to filename+CurrentDate.csv, where the current date is just the date without spaces.
For example:
abc.csv is turned into abc_02-03-15.csv after a function is performed on it.
I figure the code would look something like this:
#!/bin/bash
dir="/var/tmp/"
now=$(date)
for f in "$dir"/*; do
ScriptName
sed OriginalFileName > "OriginalFileName+$now.csv"
done
Please advise and thanks for the help in advance.
Assuming that your filtering script takes its input on stdin and writes to stdout:
dir="/var/tmp/"
now=$(date "+%Y-%m-%d")
for f in "$dir"/*; do
ScriptName <"$f" >"${f}_$now.csv"
done
As should be fairly obvious, within the loop, $f refers to the input name. When that's substituted into a string with characters after it that are valid in variable names, it's necessary to use curly braces to disambiguate: $f_$now.csv would be looking for a variable named f_, not f.
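A two-line illustration of the difference, with made-up values:
f=report; now=2015-02-03
echo "$f_$now.csv" "${f}_$now.csv"
# prints: 2015-02-03.csv report_2015-02-03.csv   ($f_ is unset, so the first name loses its prefix)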
The dates here are in YYYY-MM-DD form. This is STRONGLY preferable to DD-MM-YY or MM-DD-YY for a few reasons: it sorts in lexicographic order identically to its numeric order (so tools that do a plain ASCII sort, like ls, will show files named this way newest-to-oldest or oldest-to-newest), and it's unambiguous for human readers, whether they come from countries where DD-MM or MM-DD is conventional.
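A quick demonstration of that sorting property, with made-up filenames:
printf '%s\n' abc_2015-02-03.csv abc_2014-12-31.csv | sort
# abc_2014-12-31.csv
# abc_2015-02-03.csv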

Bash curl and variable in the middle of the url

I need to read certain data using curl. I'm basically reading keywords from a file:
while read line
do
curl 'https://gdata.youtube.com/feeds/api/users/'"${line}"'/subscriptions?v=2&alt=json' \
> '/home/user/archive/'"$line"
done < textfile.txt
Anyway, I haven't found a way to form the URL for curl so that it works. I've tried just about every possible single- and double-quoted version. I've tried basically:
'...'"$line"'...'
"..."${line}"..."
'...'$line'...'
and so on. Just name it and I'm pretty sure I've tried it.
When I print out the URL, in the best case it's formed as:
/subscriptions?v=2&alt=jsoneeds/api/users/KEYWORD FROM FILE
or something similar. If you know what could be the cause of this I would appreciate the information. Thanks!
It's not a quoting issue. The problem is that your keyword file is in DOS format -- that is, each line ends with carriage return & linefeed (\r\n) rather than just linefeed (\n). The carriage return is getting read into the line variable, and included in the URL. The giveaway is that when you echo it, it appears to print:
/subscriptions?v=2&alt=jsoneeds/api/users/KEYWORD FROM FILE
but it's really printing:
https://gdata.youtube.com/feeds/api/users/KEYWORD FROM FILE
/subscriptions?v=2&alt=json
...with just a carriage return between them, so the second overwrites the first.
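You can reproduce the same overwriting effect in isolation with a made-up string:
printf 'https://example.com/feeds/api/users/KEYWORD\r/subscriptions?v=2\n'
# On a terminal, the text after the \r overwrites the start of the line,
# which is why the echoed URL looks scrambled.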
So what can you do about it? Here's a fairly easy way to trim the CR at the end of the line:
cr=$'\r'
while read line
do
line="${line%$cr}"
curl "https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json" \
> "/home/user/archive/$line"
done < textfile.txt
Your current version should work, I think. More elegant is to use a single pair of double quotes around the whole URL with the variable in ${}:
"https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json"
Just use it like this; it should be sufficient:
curl "https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json" > "/home/user/archive/${line}"
If your shell gives you issues with the &, just use \&, but it works fine for me without it.
If the data from the file can contain spaces and you have no objection to spaces in the file name in the /home/user/archive directory, then what you've got should be OK.
Given the contents of the rest of the URL, you could even just write:
while read line
do
curl "https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json" \
> "/home/user/archive/${line}"
done < textfile.txt
where strictly the ${line} could be just $line in both places. This works because the strings are fixed and don't contain shell metacharacters.
Since your code is close to this, but you say you're seeing the keywords from the file in the wrong place, maybe a little rewriting for ease of debugging is in order:
while read line
do
url="https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json"
file="/home/user/archive/${line}"
curl "$url" > "$file"
done < textfile.txt
Since the strings may end up containing spaces, it seems (do you need to expand spaces to + in the URL?), the quotes around the variables are strongly recommended. You can now run the script with sh -x (or add a line set -x to the script) and see what the shell thinks it is doing as it is doing it.
