Wget with input-file and output-document - bash

I have a list of URLs which I would like to feed into wget using --input-file.
However I can't work out how to control the --output-document value at the same time,
which is simple if you issue the commands one by one.
I would like to save each document as the MD5 of its URL.
cat url-list.txt | xargs -P 4 wget
And xargs is there because I also want to make use of the max-procs features for parallel downloads.

Don't use cat. You can have xargs read from a file. From the man page:
--arg-file=file
-a file
Read items from file instead of standard input. If you use this option, stdin remains unchanged when commands are run. Otherwise, stdin is redirected from /dev/null.
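Putting the pieces together, here is a minimal sketch (assuming GNU xargs and md5sum, and the question's url-list.txt) that reads the list with -a, downloads four URLs at a time, and names each file after the MD5 of its URL:

```shell
# -a reads URLs from the file, -P 4 runs four downloads in parallel,
# -I {} substitutes one URL per invocation. md5sum prints "<hash>  -"
# for stdin, so cut keeps only the hash for use as the filename.
xargs -a url-list.txt -P 4 -I {} sh -c \
  'wget -q "$1" -O "$(printf %s "$1" | md5sum | cut -d" " -f1)"' _ {}
```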

How about using a loop?
while read -r line
do
  # md5sum prints "<hash>  -" for stdin; keep only the hash,
  # and use printf so no trailing newline is hashed
  md5=$(printf '%s' "$line" | md5sum | cut -d' ' -f1)
  wget ... "$line" ... --output-document "$md5" ...
done < url-list.txt

In your question you use -P 4, which suggests you want your solution to run in parallel. GNU Parallel http://www.gnu.org/software/parallel/ may help you:
cat url-list.txt | parallel 'wget {} --output-document "$(echo {} | md5sum | cut -d" " -f1)"'

You can do it like this:
while read -r url
do
  wget "$url" -O "$(echo "$url" | md5)"
done < url-list.txt
good luck

Related

bash pipelines and retaining colour

I prefer bat over pygmentize, as bat has better language support and is, in my opinion, a better all-round app. However, one thing I would like is the ability to retain its syntax-highlighted output even through more or grep or other programs. I guess colour is lost because more and other apps do not support it, but often in Linux, when something seems impossible, there is some smart trick to achieve it. So if I pipe bat output to more or grep, is there a way to retain the colour that is part of the bat output?
e.g. bat ~/.bashrc | more
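One common trick (a sketch; it assumes bat and less are installed) is to force the producer to emit colour even when writing to a pipe, and to use a pager that passes ANSI escapes through:

```shell
# bat strips colour when stdout is not a terminal; --color=always
# overrides that, and less -R renders the ANSI escape sequences
# instead of showing them literally.
bat --color=always --paging=never ~/.bashrc | less -R
```

The same idea applies to other tools: ask each producer for colour explicitly (e.g. grep --color=always) rather than relying on TTY detection.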
# Get latest bat release and install
bat_releases=https://github.com/sharkdp/bat/releases/
content=$(wget "$bat_releases" -q -O -)
firstlink=$(grep -oP 'href="/sharkdp/bat/releases/\K[^"]*_amd64\.deb' <<< "$content" | head -1)
DL=$bat_releases$firstlink
ver_and_filename=$(grep -oP 'https://github.com/sharkdp/bat/releases/download/\K[^"]*\.deb' <<< "$DL")
IFS='/' read -ra my_array <<< "$ver_and_filename"
ver=${my_array[0]}
filename=${my_array[1]}
IFS='.' read -ra my_array <<< "$filename"
extension=${my_array[-1]}
extension_with_dot="."$extension
filename_no_extension=${filename%%${extension_with_dot}*}
[ ! -f "/tmp/$filename" ] && wget -P /tmp/ "$DL"
sudo dpkg -i "/tmp/$filename"

How to Copy and Rename multiple files using shell

I want to copy only the 20180721 files from the Outgoing to the Incoming folder. I also want to remove the leading numbers from the file name and rename -1 to -3. I want to keep my commands to a minimum, so I am using the pax command below.
Filename:
216118105741_MOM-09330-20180721_102408-1.jar
Output expected:
MOM-09330-20180721_102408-3.jar
I have tried this command and it's doing most of the work, apart from removing the number at the front of the file name. Can anyone help?
Command used:
pax -rw -pe -s/-1/-3/ ./*20180721*.jar ../Incoming/
Try this simple script using just parameter expansion:
for file in *20180721*.jar; do
new=${file#*_}
cp -- "$file" "/path/to/destination/${new%-*}-3.jar"
done
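Tracing the expansions on the question's sample name makes this answer easier to follow:

```shell
file='216118105741_MOM-09330-20180721_102408-1.jar'
new=${file#*_}             # strip the shortest prefix ending in "_":
                           #   MOM-09330-20180721_102408-1.jar
result="${new%-*}-3.jar"   # ${new%-*} drops the trailing "-1.jar",
                           # then "-3.jar" is appended
echo "$result"             # MOM-09330-20180721_102408-3.jar
```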
You can try this.
In general (globbing directly, rather than parsing ls output):
for i in files-to-copy-*; do
  cp "$i" "$(echo "$i" | sed 's/rename-from/rename-to/g')"
done
In your case:
for i in *_MOM*; do
  cp "$i" "$(echo "$i" | sed 's/^[0-9]*_//; s/-1\.jar$/-3.jar/')"
done
pax only applies the first successful substitution even if the -s option is specified more than once. You can pipe the output to a second pax instance, though.
pax -w -s ':^[^_]*_::p' *20180721*.jar | (builtin cd ../Incoming; pax -r -s ':1[.]jar$:3.jar:p')

how to print names of files being downloaded

I'm trying to write a bash script that downloads all the .txt files from a website 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'.
So far I have wget -A txt -r -l 1 -nd 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/' but I'm struggling to find a way to print the name of each file to the screen (when downloading). That's the part I'm really stuck on. How would one print the names?
Thoughts?
EDIT: this is what I have done so far, but I'm trying to filter out a lot of stuff like ghcnd-inventory.txt</a></td><td align=...
wget -O- $LINK | tr '"' '\n' | grep -e .txt | while read line; do
echo Downloading $LINK$line ...
wget $LINK$line
done
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -O- $LINK | tr '"' '\n' | grep -e .txt | grep -v align | while read line; do
echo Downloading $LINK$line ...
wget -nv $LINK$line
done
Slight optimization of Sundeep's answer:
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -q -O- $LINK | sed -E '/.*href="[^"]*\.txt".*/!d;s/.*href="([^"]*\.txt)".*/\1/' | wget -nv -i- -B$LINK
The sed command eliminates all lines not matching href="xxx.txt" and extracts only the xxx.txt part of the others. It then passes the result to another wget that uses it as the list of files to retrieve. The -nv option tells wget to be as quiet as possible: it still prints the name of each file it downloads, but almost nothing else. Warning: this works only for this particular web site and does not descend into subdirectories.
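To see what the sed stage does, here is the same expression run on a fabricated line of directory-listing HTML (the href value is illustrative):

```shell
html='<td><a href="ghcnd-inventory.txt">ghcnd-inventory.txt</a></td>'
printf '%s\n' "$html" |
  sed -E '/.*href="[^"]*\.txt".*/!d;s/.*href="([^"]*\.txt)".*/\1/'
# prints: ghcnd-inventory.txt
```

Lines without an href="....txt" are deleted by the !d clause; matching lines are reduced to just the captured filename.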

Inline comments for Bash?

I'd like to be able to comment out a single flag in a one-line command. Bash only seems to have from # till end-of-line comments. I'm looking at tricks like:
ls -l $([ ] && -F is turned off) -a /etc
It's ugly, but better than nothing. Is there a better way?
The following seems to work, but I'm not sure whether it is portable:
ls -l `# -F is turned off` -a /etc
My preferred approach is the one described in: Commenting in a Bash script
This will have some overhead, but technically it does answer your question
echo abc `#put your comment here` \
def `#another chance for a comment` \
xyz etc
And for pipelines specifically, there is a cleaner solution with no overhead
echo abc | # normal comment OK here
tr a-z A-Z | # another normal comment OK here
sort | # the pipelines are automatically continued
uniq # final comment
How to put a line comment for a multi-line command
I find it easiest (and most readable) to just copy the line and comment out the original version:
#Old version of ls:
#ls -l $([ ] && -F is turned off) -a /etc
ls -l -a /etc
$(: ...) is a little less ugly, but still not good.
Here's my solution for inline comments in between multiple piped commands.
Example uncommented code:
#!/bin/sh
cat input.txt \
| grep something \
| sort -r
Solution for a pipe comment (using a helper function):
#!/bin/sh
pipe_comment() {
cat -
}
cat input.txt \
| pipe_comment "filter down to lines that contain the word: something" \
| grep something \
| pipe_comment "reverse sort what is left" \
| sort -r
Or if you prefer, here's the same solution without the helper function, but it's a little messier:
#!/bin/sh
cat input.txt \
| cat - `: filter down to lines that contain the word: something` \
| grep something \
| cat - `: reverse sort what is left` \
| sort -r
Most commands allow args to come in any order. Just move the commented flags to the end of the line:
ls -l -a /etc # -F is turned off
Then to turn it back on, just uncomment and remove the text:
ls -l -a /etc -F
How about storing it in a variable?
#extraargs=-F
ls -l $extraargs -a /etc
If you know a variable is empty, you could use it as a comment. Of course if it is not empty it will mess up your command.
ls -l ${1# -F is turned off} -a /etc
§ 10.2. Parameter Substitution
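A quick demonstration of why this works: ${1#pattern} strips a matching prefix from $1, and expands to nothing at all when $1 is empty or unset, so the "pattern" can carry arbitrary comment text:

```shell
set --    # clear the positional parameters, so $1 is unset
echo before ${1# this whole pattern is really a comment} after
# prints: before after
```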
For disabling a part of a command like a && b, I simply created an empty script x which is on path, so I can do things like:
mvn install && runProject
when I need to build, and
x mvn install && runProject
when not (using Ctrl + A and Ctrl + E to move to the beginning and end).
As noted in comments, another way to do that is Bash built-in : instead of x:
$ : Hello world, how are you? && echo "Fine."
Fine.
Note that comment text embedded with $(...) does not survive into ps -ef output, since the substitution happens before the command runs. My scenario is that I want a dummy parameter that can be used to identify the very process. Mostly I use this method, but it is not workable everywhere. For example, python program.py would become
mkdir -p MyProgramTag; python MyProgramTag/../program.py
MyProgramTag would then be the tag for identifying the started process.

creating a file downloading script with checksum verification

I want to create a shell script that reads files from a .diz file, where information about the various source files needed to compile a certain piece of software (ImageMagick in this case) is stored. I am using Mac OS X Leopard 10.5 for these examples.
Basically I want an easy way to maintain these .diz files that hold the information for up-to-date source packages. I would just need to update them with URLs, version information and file checksums.
Example line:
libpng:1.2.42:libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:9a5cbe9798927fdf528f3186a8840ebe
script part:
while IFS=: read app version file url md5
do
echo "Downloading $app Version: $version"
curl -L -v -O $url 2>> logfile.txt
$calculated_md5=`/sbin/md5 $file | /usr/bin/cut -f 2 -d "="`
echo $calculated_md5
done < "files.diz"
Actually I have more than one question concerning this:
How best to calculate and compare the checksums? I wanted to store MD5 checksums in the .diz file and compare them by string comparison, "cut"ting the hash out of the tool's output.
Is there a way to tell curl another filename to save to? (In my case the filename gets ugly: libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks.)
I seem to have issues with the backticks that should direct the output of the piped md5 and cut into the variable $calculated_md5. Is the syntax wrong?
Thanks!
The following is a practical one-liner (note the two spaces before the - in the checksum line, which the coreutils checksum-file format expects):
curl -s -L <url> | tee <destination-file> |
  sha256sum -c <(echo "a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70  -") ||
  rm -f <destination-file>
wrapping it up in a function taking 3 arguments:
- the url
- the destination
- the sha256
download() {
  curl -s -L "$1" | tee "$2" | sha256sum -c <(echo "$3  -") || rm -f "$2"
}
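The same tee-and-verify pattern can be tried locally without curl. This sketch assumes GNU sha256sum and writes the checksum line to a temporary file rather than using the <(...) process substitution above (which requires bash); the paths are illustrative:

```shell
printf 'hello\n' > /tmp/src.txt
expected=$(sha256sum < /tmp/src.txt | cut -d' ' -f1)
# checksum-file line: "<hash>  -" means "verify stdin" (two spaces)
echo "$expected  -" > /tmp/sum.txt
# stream the data through tee (saving a copy) into the verifier;
# delete the saved copy if the checksum does not match
cat /tmp/src.txt | tee /tmp/dest.txt | sha256sum -c /tmp/sum.txt ||
  rm -f /tmp/dest.txt
```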
while IFS=: read -r app version file url md5
do
  echo "Downloading $app Version: $version"
  # use -o for the output file; define $outputfile yourself
  curl -L -v "$url" -o "$outputfile" 2>> logfile.txt
  # use $(..) instead of backticks, and no $ on the left of an assignment
  calculated_md5=$(/sbin/md5 "$file" | /usr/bin/cut -f 2 -d "=")
  # compare the md5
  case "$calculated_md5" in
    "$md5" )
      echo "md5 ok"
      echo "do something else here";;
  esac
done < "files.diz"
My curl has a -o (--output) option to specify an output file. There's also a problem with your assignment to calculated_md5: it shouldn't have the dollar sign at the front when you assign to it. I don't have /sbin/md5 here, so I can't comment on that. What I do have is md5sum; if you have it too, you might consider it as an alternative. In particular, it has a --check option that works from a file listing of md5sums, which might be handy for your situation. HTH.
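A sketch of the --check workflow mentioned above (the filenames are illustrative):

```shell
# record a checksum, then verify it later
printf 'sample data\n' > sample.txt
md5sum sample.txt > files.md5   # lines look like: <hash>  sample.txt
md5sum -c files.md5             # prints "sample.txt: OK" on success
```

This is handy for a .diz-style workflow: regenerate files.md5 whenever the package list changes, and a single md5sum -c verifies every downloaded file at once.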
