creating a file downloading script with checksum verification - bash

I want to create a shell script that reads from a .diz file, where information about the various source files needed to compile a certain piece of software (ImageMagick in this case) is stored. I am using Mac OS X Leopard 10.5 for these examples.
Basically I want an easy way to maintain these .diz files that hold the information for up-to-date source packages. I would just need to update the .diz files with URLs, version information and file checksums.
Example line:
libpng:1.2.42:libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:9a5cbe9798927fdf528f3186a8840ebe
script part:
while IFS=: read app version file url md5
do
echo "Downloading $app Version: $version"
curl -L -v -O $url 2>> logfile.txt
$calculated_md5=`/sbin/md5 $file | /usr/bin/cut -f 2 -d "="`
echo $calculated_md5
done < "files.diz"
Actually I have more than one question concerning this:
How do I best calculate and compare the checksums? I wanted to store MD5 checksums in the .diz file and compare them via string comparison, "cut"ting the hash out of the md5 output.
Is there a way to tell curl to save to a different filename? (In my case the filename gets ugly: libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks.)
I seem to have issues with the backticks that should direct the output of the piped md5 and cut into the variable $calculated_md5. Is the syntax wrong?
Thanks!

The following is a practical one-liner:
curl -s -L <url> | tee <destination-file> |
sha256sum -c <(echo "a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70 -") ||
rm -f <destination-file>
Wrapping it up in a function taking 3 arguments:
- the url
- the destination
- the sha256
download() {
curl -s -L "$1" | tee "$2" | sha256sum -c <(echo "$3 -") || rm -f "$2"
}
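Hypothetical usage, reusing the libpng URL from the question and the example checksum from above (substitute the real SHA-256 of your file):
download "http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2" libpng-1.2.42.tar.bz2 a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70
If the checksum does not match, sha256sum -c exits non-zero and the half-written destination file is removed.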

while IFS=: read app version file url md5
do
    echo "Downloading $app Version: $version"
    # use -o for the output file; define $outputfile yourself
    curl -L -v "$url" -o "$outputfile" 2>> logfile.txt
    # use $(..) instead of backticks, and no $ on the variable being assigned
    calculated_md5=$(/sbin/md5 "$file" | /usr/bin/cut -f 2 -d "=")
    # compare md5
    case "$calculated_md5" in
        "$md5" )
            echo "md5 ok"
            echo "do something else here";;
    esac
done < "files.diz"
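If you want $outputfile to come straight out of the .diz line, one possibility (an assumption on my part, based on the example line where the file field carries a ?use_mirror suffix) is to strip everything from the first ? onward with a parameter expansion:
outputfile=${file%%\?*}   # libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks -> libpng-1.2.42.tar.bz2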

My curl has a -o (--output) option to specify an output file. There's also a problem with your assignment to $calculated_md5: it shouldn't have the dollar sign at the front when you assign to it. I don't have /sbin/md5 here, so I can't comment on that. What I do have is md5sum; if you have it too, you might consider it as an alternative. In particular, it has a --check option that works from a file listing md5sums, which might be handy for your situation. HTH.
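A minimal sketch of that --check workflow (filenames are illustrative):
# generate a checksum list once
md5sum libpng-1.2.42.tar.bz2 > checksums.md5
# verify later: prints "libpng-1.2.42.tar.bz2: OK" and exits non-zero on any mismatch
md5sum --check checksums.md5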

Related

Bash input problem after computing size of folder with du for pv when gpg prompts user

I'm working on a script to encrypt a bunch of folders using tar, gzip and gpg, plus pv with du and awk to keep track of progress. Here is the line that causes problems:
tar cf - "$f" | pv -s $(($(du -sk "$f" | awk '{print $1}') * 1024)) | gzip | gpg -e -o "$output/$(basename "$f").tar.gz.gpg"
This works well most of the time. However, if the output file already exists, gpg prompts the user, asking whether to overwrite the file or not. In that case, when the script exits, the console is left broken: what I type no longer appears, pressing Enter does not create a new line, and so on.
The problem does not appear if the output file does not exist yet, nor if pv's -s option is omitted or computed without du and awk (e.g. $((500 * 500)); that doesn't break the console, but obviously the progress bar is completely off).
The problem is reproducible even when running this command line outside of a script and replacing $f and $output with actual values.
The likely culprit is gpg's interactive overwrite prompt fighting with pv over the terminal: gpg changes the tty settings to read the answer, and they are not always restored afterwards. Perhaps one or a combination of these changes will help.
Change the gpg command to write to stdout, redirected to the file you want, so gpg never needs to prompt about overwriting: gpg -e -o - > "$output/$(basename "$f").tar.gz.gpg".
Calculate the file size with stat: stat -c "%s" "$f".
The whole line might then look like this:
tar cf - "$f" | pv -s $(stat -c "%s" "$f") | gzip | gpg -e -o - > "$output/$(basename "$f").tar.gz.gpg"
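If a session does get into that broken state, typing stty sane and pressing Enter (even though the keystrokes are not echoed) should restore normal terminal behaviour:
stty sane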

bash pipelines and retaining colour

I prefer bat over pygmentize, as bat has better language support and is a better all-round app in my opinion. However, one thing I would like is the ability to retain its syntax-highlighted output even through more or grep or other programs. I guess colour is lost because more and other apps do not pass it through, but often in Linux, when something seems impossible, I find there is some smart trick to achieve it. So if I pipe bat output to more or grep, is there a way to retain the colour that is part of the bat output?
e.g. bat ~/.bashrc | more
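One commonly suggested trick (assuming versions of bat and less that support these flags, which current ones do): force bat to emit colour even when stdout is not a terminal, and page with less -R so the ANSI escape codes are interpreted rather than printed literally:
bat --color=always ~/.bashrc | less -R
With grep the escape codes travel through as plain bytes, so colours may survive on matching lines but can also interfere with pattern matching.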
# Get latest bat release and install
bat_releases=https://github.com/sharkdp/bat/releases/
content=$(wget "$bat_releases" -q -O -)
firstlink=$(grep -oP 'href="/sharkdp/bat/releases/\K[^"]*_amd64\.deb' <<< "$content" | head -1)
DL=$bat_releases$firstlink
ver_and_filename=$(grep -oP 'https://github.com/sharkdp/bat/releases/download/\K[^"]*\.deb' <<< "$DL")
IFS='/' read -ra my_array <<< "$ver_and_filename"
ver=${my_array[0]}
filename=${my_array[1]}
IFS='.' read -ra my_array <<< "$filename"
extension=${my_array[-1]}
extension_with_dot="."$extension
filename_no_extension=${filename%%${extension_with_dot}*}
# "exe" is a user-defined wrapper here; plain wget works too
[ ! -f "/tmp/$filename" ] && exe wget -P /tmp/ "$DL"
sudo dpkg -i "/tmp/$filename"

How to oneline two variables via echo?

I am trying to search for files and separate the path and the version into variables, because each will be needed later for creating a directory and unzipping a .jar into the desired path.
file=$(find /home/user/Documents/test/ -path *.jar)
version=$(echo "$file" | grep -P -o '[0-9].[0-9].[0-9].[0-9]')
path=$(echo "$file" | sed 's/\(.*\)[/].*/\1/')
newpath=$(echo "${path}/${version}")
echo "$newpath"
result
> /home/user/Documents/test/gb0500
> /home/user/Documents/test/gb0500 /home/user/Documents/test/gb0500
> /home/user/Documents/test /home/user/Documents/test/1.3.2.0
> 1.3.2.1
> 1.3.2.2
> 1.2.0.0
> 1.3.0.0
Funnily enough, it only works when there is a single line.
What else I tried:
file=$(find /home/v990549/Dokumente/test/ -path *.jar)
version=$(grep -P -o '[0-9].[0-9].[0-9].[0-9]')
path=$(sed 's/\(.*\)[/].*/\1/')
while read $file
do
echo "$path$version"
done
I have no experience in scripting; this is just what I figured out over the last few days. I am just practicing and trying to make life easier.
find output:
/home/user/Documents/test/gb0500/gb0500-koetlin-log4j2-web-1.3.2.0-javadoc.jar
/home/user/Documents/test/gb0500/gb0500-koetlin-log4j2-web-1.3.2.1-javadoc.jar
/home/user/Documents/test/gb0500/gb0500-koetlin-log4j2-web-1.3.2.2-javadoc.jar
/home/user/Documents/test/gb0500-co-log4j2-web-1.2.0.0-javadoc.jar
/home/user/Documents/test/gb0500-commons-log4j2-web-1.3.0.0-javadoc.jar
As both variables version and path are newline-separated, how about:
file=$(find /home/user/Documents/test/ -path *.jar)
version=$(echo "$file" | grep -P -o '[0-9].[0-9].[0-9].[0-9]')
path=$(echo "$file" | sed 's/\(.*\)[/].*/\1/')
paste -d "/" <(echo "$path") <(echo "$version")
Result:
/home/user/Documents/test/gb0500/1.3.2.0
/home/user/Documents/test/gb0500/1.3.2.1
/home/user/Documents/test/gb0500/1.3.2.2
/home/user/Documents/test/1.2.0.0
/home/user/Documents/test/1.3.0.0
BTW I do not recommend storing multiple filenames in a single newline-separated variable, for several reasons:
Filenames may contain a newline character.
It is not easy to manipulate the values of each line.
For instance, if file contained just one filename, you could write the third line simply as path=${file%/*}.
Hope this helps.
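A minimal sketch of that per-file alternative (my own suggestion, using the question's version regex with the dots escaped so they match literal dots, and NUL-delimited output so odd filenames are safe):
find /home/user/Documents/test/ -name '*.jar' -print0 |
while IFS= read -r -d '' file
do
    version=$(grep -Po '[0-9]\.[0-9]\.[0-9]\.[0-9]' <<< "$file")
    path=${file%/*}
    echo "$path/$version"
done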

how to print names of files being downloaded

I'm trying to write a bash script that downloads all the .txt files from a website 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'.
So far I have wget -A txt -r -l 1 -nd 'http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/' but I'm struggling to find a way to print the name of each file to the screen (when downloading). That's the part I'm really stuck on. How would one print the names?
Thoughts?
EDIT: this is what I have done so far, but I'm trying to remove a lot of leftover markup like ghcnd-inventory.txt</a></td><td align=...
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -O- $LINK | tr '"' '\n' | grep -e .txt | while read line; do
echo Downloading $LINK$line ...
wget $LINK$line
done
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -O- $LINK | tr '"' '\n' | grep -e .txt | grep -v align | while read line; do
echo Downloading $LINK$line ...
wget -nv $LINK$line
done
Slight optimization of Sundeep's answer:
LINK='http://www1.ncdc.noaa.gov/pub/data/ghcn/daily/'
wget -q -O- $LINK | sed -E '/.*href="[^"]*\.txt".*/!d;s/.*href="([^"]*\.txt)".*/\1/' | wget -nv -i- -B$LINK
The sed command eliminates all lines not matching href="xxx.txt" and extracts only the xxx.txt part of the others. It then passes the result to another wget, which uses it as the list of files to retrieve. The -nv option tells wget to be as quiet as possible: it prints the name of each file as it downloads it, but almost nothing else. Warning: this works only for this particular web site and does not descend into sub-directories.

Rewriting 3 commands into one command or script that can be run on cron

I'm currently using 3 different commands to achieve my goal of downloading a zip, extracting it, converting the txt file to UTF-8 and then converting the CSV to JSON!
First I have:
wget https://www.example.com/example.zip -O temp.zip; unzip -o temp.zip; rm temp.zip
Which is good, but the first problem is how to rename the extracted file so it is the same every time for the next steps, since it can have a different name inside the zip each day. Next I run this script, depending on the filename, to convert from ISO to UTF-8:
sh dir_iconv.sh example1.txt ISO8859-1 UTF-8
Which is this script:
#!/bin/bash
ICONVBIN='/usr/bin/iconv' # path to iconv binary
if [ $# -lt 3 ]
then
    echo "$0 dir from_charset to_charset"
    exit 1
fi
for f in "$1"/*
do
    if test -f "$f"
    then
        echo -e "\nConverting $f"
        /bin/mv "$f" "$f.old"
        $ICONVBIN -f "$2" -t "$3" "$f.old" > "$f"
        rm -f "$f.old"
    else
        echo -e "\nSkipping $f - not a regular file"
    fi
done
And then finally I run a ruby script csv2json - https://github.com/darwin/csv2json - that is called as follows (pipe delimited) to give me a json output:
csv2json -s '|' example1.txt > example1.json
Is there a simple way to roll this into one command or script that can be called?
Pipe all your commands one after another and, if necessary, put them in a shell script file. One caveat: plain unzip cannot read an archive from standard input (the zip format needs random access to its central directory), but funzip can extract the first member of a zip from a pipe:
wget -qO- https://www.example.com/example.zip | funzip | iconv -f ISO8859-1 -t UTF-8 | csv2json -s '|' > example.json
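To run it from cron, a sketch of the same pipeline wrapped in a script (the URL is the question's placeholder, and the script path is my assumption):
#!/bin/bash
# fetch-example.sh - download, unzip, re-encode and convert in one pass
set -e
wget -qO- https://www.example.com/example.zip |
funzip |
iconv -f ISO8859-1 -t UTF-8 |
csv2json -s '|' > example.json
and a crontab entry to run it daily at, say, 03:00:
0 3 * * * /usr/local/bin/fetch-example.sh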
