Wget: read URLs from a file and append a sequence of numbers to each URL - bash

I am reading a file of URLs line by line:
#!/bin/bash
while read line
do
url=$line
wget $url
wget $url_{001..005}.jpg
done < $1
First, I want to download the primary URL, as you can see with wget $url. After that, I want to append a sequence of numbers to the URL (_001.jpg, _002.jpg, _003.jpg, _004.jpg, _005.jpg):
wget $url_{001..005}.jpg
...but for some reason it's not working.
Sorry, I left out one thing: the URLs look like http://xy.com/052914.jpg. Is there an easy way to add _001 before the extension, giving http://xy.com/052914_001.jpg? Or do I have to remove ".jpg" from the file containing the URLs and then add it back to the variable later?

Another way is to escape the underscore character:
wget $url\_{001..005}.jpg

Try encapsulating your variable name:
wget ${url}_{001..005}.jpg
Without the braces, Bash treats the underscore as part of the variable name and tries to expand $url_, which is unset, instead of $url.
As for your jpg within the URL followup, see substring expansion in the bash manual.
wget ${url:0: -4}_{001..005}.jpg
The :0: -4 means: expand the variable starting at position zero (the first character) and stop 4 characters before the end. A negative length like this requires Bash 4.2 or newer.
Or from this answer:
wget ${url%.jpg}_{001..005}.jpg
%.jpg removes only a trailing .jpg and also works on older versions of Bash.
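Putting the pieces from the answers above together, a minimal sketch might look like the following. The function name numbered_urls is made up for illustration, and the wget call is left as a comment so the example runs as a dry run without touching the network. It assumes every URL ends in .jpg and that Bash 4 or newer is available, since older versions do not zero-pad ranges such as {001..005}.

```shell
#!/bin/bash
# numbered_urls (hypothetical helper): print the primary URL followed by
# its five numbered variants. Assumes the URL ends in ".jpg".
numbered_urls() {
    local url=$1
    local base=${url%.jpg}    # strip the trailing .jpg
    # Brace expansion runs first, then "${base}" is expanded in each word.
    printf '%s\n' "$url" "${base}"_{001..005}.jpg
}

# Dry run on the sample URL from the question.
numbered_urls "http://xy.com/052914.jpg"

# To download for real, feed the list file through the function and hand
# the resulting URLs to wget ("-i -" reads URLs from standard input):
#   while IFS= read -r line; do numbered_urls "$line"; done < "$1" | wget -i -
```

Printing the URLs first makes it easy to check the expansion before any download starts.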

Related

Strip text from string in shell script

I have the variables below in a large and important SH file and I need to remove some data from a variable and keep only part of the text.
I get "repoTest" with a link to an internal git repo and I need the variable "nameAppTest" to only receive the constant data after the last "/".
Example:
I get: repoTest="ssh://git@code.br.repo.local/code/ecsb/name-repo.git"
I tried to split it: nameAppTest=$(echo "$repoTest"|cut -d'/' -f5|sed -e 's/.git//g')
What I get: echo "$nameAppTest" prints ecsb.
What I expect to receive: name-repo
Here's a nifty trick:
nameAppTest=$(basename "$repoTest" .git)
This uses basename to get just the last component of the URL and strip the extension in one step.
You can also use sh parameter expansion to do it in two steps without any external programs:
# Remove everything up to and including the last /
temp="${repoTest##*/}"
# Remove the trailing .git
nameAppTest="${temp%.git}"
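As a side note on why the cut attempt printed ecsb: the empty string between the two slashes of ssh:// counts as field 2, so name-repo.git is field 6, not field 5. The two-step parameter expansion avoids counting fields altogether. Here is a small sketch that wraps it in a function; repo_name is a hypothetical name chosen for this example.

```shell
#!/bin/sh
# repo_name (hypothetical helper): print the last path component of a
# repository URL with a trailing ".git" removed, if there is one.
repo_name() {
    temp=${1##*/}                   # drop everything up to the last /
    printf '%s\n' "${temp%.git}"    # drop a trailing .git
}

repoTest="ssh://git@code.br.repo.local/code/ecsb/name-repo.git"
nameAppTest=$(repo_name "$repoTest")
echo "$nameAppTest"
```

Because only the text after the last slash is kept, this works regardless of how many path components the URL has.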

Bash - how to escape underscore in URL

I have a little script in bash on macOS, where I use an array with dates like 19000105 in the format yyyymmdd.
In that script I parse the dates of that array to a loop like:
for i in "${list[@]}"; do
wget -A pdf -nc -E -nd --no-check-certificate URL$iURL$i_tif.pdf
done
where wget opens a URL to download a PDF. To make it work, I need to insert the date into the URL twice, at different places.
The URL, however, contains at one point an underscore right after I insert the date, which needs to look like this: 19000105_tif/jpegs/.
I thought I needed to add curly braces like {$i}_tif/ to escape it; however, the URL then comes out as %7B18500105%7D_tif/, which is wrong.
If I leave out the curly braces, as in $i_tif/, the URL comes out as /jpegs/: the date and the _tif part before it are missing entirely.
How can I add the dates correctly with an underscore in the URL right after?
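Using ${i} instead of $i should solve this. Without the braces, the shell reads $i_tif as a single variable name, which is unset and expands to nothing.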
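A short sketch of the difference follows. The base address and the path layout are placeholders, since the real URL is not shown in the question; only the ${i}_tif part is the point.

```shell
#!/bin/bash
# base is a placeholder; the question does not show the real URL.
base="https://example.com/archive"
list=(19000105 19000112)

for i in "${list[@]}"; do
    # ${i}_tif expands i and then appends the literal text "_tif".
    # $i_tif would instead look up a variable named i_tif.
    url="${base}/${i}/${i}_tif/jpegs/"
    echo "$url"
done
```

The braces go around the variable name, after the dollar sign. Writing {$i} keeps the braces as literal characters, which is why they showed up URL-encoded as %7B and %7D.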

How to write a script to fetch the address of the links to .rar files on a webpage?

I have a pile of 50 .rar files on a web server and I want to download them all.
The file names have nothing in common other than the .rar extension.
I wanted to try aria2 to download all of them together, but I think I need to write a script to fetch the addresses of all the .rar files.
I have no idea how to start writing the script. Any hint will be appreciated.
You can try wget with the -A option in your shell script:
wget -r "https://foo/" -P /tmp -A "*.rar"
Here is an explanation of what -A does, from the wget manual:
Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern into quotes to prevent your shell from expanding it, like in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
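If the goal is a list of addresses to hand to aria2 rather than a recursive wget download, one rough way to sketch it is to pull the links out of the page's HTML. The helper name extract_rar_links is hypothetical, and the pattern assumes double-quoted href attributes with absolute URLs; relative links would still need the site prefix added. The demo runs on inline sample HTML instead of a live page.

```shell
#!/bin/bash
# extract_rar_links (hypothetical helper): read HTML on standard input
# and print the value of every href="..." attribute ending in .rar.
extract_rar_links() {
    grep -oE 'href="[^"]+\.rar"' | sed -e 's/^href="//' -e 's/"$//'
}

# Demo on sample markup; the URLs are placeholders.
extract_rar_links <<'EOF'
<a href="https://foo/files/alpha.rar">alpha</a>
<a href="https://foo/notes.txt">notes</a>
<a href="https://foo/files/beta.rar">beta</a>
EOF
```

With a real page, the output of something like curl -s "https://foo/" | extract_rar_links could be saved to a file and passed to aria2c with its -i option.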

CURL cuts off URL after 74 characters

I'm writing a script that takes a list of ~300 URLs as input, which have the following format:
http://long.domain.prefix/folder/subfolder/filename.html
From that URL, I'd like to save filename.html in ./folder/subfolder/; if that folder structure doesn't exist, it must be created. That part works: the folders are written to disk, but no files are downloaded.
My script looks like this:
#!/bin/bash
for line in `cat list.txt`; do
# strips the URL prefix and trailing slash
name=${line#http://long.domain.prefix\/}
/usr/bin/curl -m 10 -f -o $name --create-dirs $fullname
done;
For some reason, the $name variable is cut off after exactly 74 characters, which obviously results in HTTP error codes. I can't give out the exact URLs, but rest assured they are correct, as long as the full URL is being used.
How can I prevent this odd cutting-off behavior?
Thanks to Etan Reisner, the solution was to convert the file to Unix-style line endings.
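For reference, here is one way the conversion could be done on the fly, without rewriting list.txt. This is only a sketch: the demo file is a temporary stand-in for the real list, and the curl call is replaced by an echo so nothing is downloaded.

```shell
#!/bin/bash
# Build a small demo list with DOS (CRLF) line endings.
list=$(mktemp)
printf 'http://long.domain.prefix/folder/subfolder/filename.html\r\n' > "$list"

# tr -d '\r' deletes every carriage return, leaving Unix line endings.
while IFS= read -r line; do
    name=${line#http://long.domain.prefix/}   # strip the URL prefix
    echo "would save to: $name"
done < <(tr -d '\r' < "$list")

rm -f "$list"
```

Converting the file once, for example with a tool such as dos2unix, has the same effect and keeps the loop simpler.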

Bash curl and variable in the middle of the url

I need to read certain data using curl. I'm reading keywords from a file:
while read line
do
curl 'https://gdata.youtube.com/feeds/api/users/'"${line}"'/subscriptions?v=2&alt=json' \
> '/home/user/archive/'"$line"
done < textfile.txt
However, I haven't found a way to build the URL so that curl accepts it. I've tried just about every combination of single and double quotes, for example:
'...'"$line"'...'
"..."${line}"..."
'...'$line'...'
and so on. Name a variant and I'm pretty sure I've tried it.
When I print the URL, in the best case it comes out as:
/subscriptions?v=2&alt=jsoneeds/api/users/KEYWORD FROM FILE
or something similar. If you know what could be the cause of this I would appreciate the information. Thanks!
It's not a quoting issue. The problem is that your keyword file is in DOS format -- that is, each line ends with carriage return & linefeed (\r\n) rather than just linefeed (\n). The carriage return is getting read into the line variable, and included in the URL. The giveaway is that when you echo it, it appears to print:
/subscriptions?v=2&alt=jsoneeds/api/users/KEYWORD FROM FILE
but it's really printing:
https://gdata.youtube.com/feeds/api/users/KEYWORD FROM FILE
/subscriptions?v=2&alt=json
...with just a carriage return between them, so the second overwrites the first.
So what can you do about it? Here's a fairly easy way to trim the CR at the end of the line:
cr=$'\r'
while read line
do
line="${line%$cr}"
curl "https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json" \
> "/home/user/archive/$line"
done < textfile.txt
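To confirm that a carriage return really is the culprit, it can help to print the variable in a form that makes invisible characters visible. The sketch below simulates a line read from a DOS-format file (the value KEYWORD is a stand-in) and uses Bash's printf %q, which prints a shell-quoted version of the value.

```shell
#!/bin/bash
# Simulate a line read from a DOS-format file: it ends in a carriage return.
line=$'KEYWORD\r'

# %q prints the value in shell-quoted form, so the otherwise invisible
# carriage return shows up as \r.
printf '%q\n' "$line"

# Trim it the same way as in the loop above and print again.
cr=$'\r'
line="${line%$cr}"
printf '%q\n' "$line"
```

If the first line of output contains \r and the second does not, the trim is doing its job.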
Your current version should work, I think. A more elegant approach is to use a single pair of double quotes around the whole URL, with the variable in ${}:
"https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json"
Just use it like this; that should be sufficient:
curl "https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json" > "/home/user/archive/${line}"
Inside double quotes the & needs no escaping, so this works as written. Only an unquoted URL would need \& to stop the shell from treating & as the background operator.
If the data from the file can contain spaces and you have no objection to spaces in the file name in the /home/user/archive directory, then what you've got should be OK.
Given the contents of the rest of the URL, you could even just write:
while read line
do
curl "https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json" \
> "/home/user/archive/${line}"
done < textfile.txt
where strictly the ${line} could be just $line in both places. This works because the strings are fixed and don't contain shell metacharacters.
Since your code is close to this, but you say that you're seeing the keywords from the file in the wrong place, maybe a little rewriting for ease of debugging is in order:
while read line
do
url="https://gdata.youtube.com/feeds/api/users/${line}/subscriptions?v=2&alt=json"
file="/home/user/archive/${line}"
curl "$url" > "$file"
done < textfile.txt
Since the strings may end up containing spaces (do you need to convert spaces to + in the URL?), the quotes around the variables are strongly recommended. You can now run the script with sh -x (or add a set -x line to the script) and watch what the shell thinks it is doing as it runs.

Resources