bash to remove files from a file passed as a variable

I asked this once before, but now the bash below seems to delete and also download all the files in the input file. All six lines in the input are filenames, read one at a time into the $line variable. When I echo $line I can see the filenames, and the files do get deleted, but they also get downloaded, which I don't need, and I am not sure why they do. Thank you :).
file1.txt
file2.txt
file3.txt
file1.pdf
file2.pdf
file3.pdf
# add filenames as variable and remove files from list
while IFS= read -r line; do
    echo "$line"   # only here to verify that the filenames are in the variable
    wget --user=xxxxx --password=xxx --xxxx --method=DELETE \
        "xxx://www.example.com/xx/xxx/xxx/$line"
done < /home/cmccabe/list
rm /home/cmccabe/list

You can use the -O /dev/null option, i.e.:
wget --user=xxxxx --password=xxx --xxxx --method=DELETE \
    "xxx://www.example.com/xx/xxx/xxx/$line" -O /dev/null
to discard wget's output and avoid saving the response to a file. The files get downloaded in the first place because the server sends a response body for each DELETE request, and by default wget saves whatever it receives.
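If curl is available, a minimal alternative sketch (keeping the question's obfuscated credentials and URL) issues the DELETE without writing any response file, since curl prints the response to stdout by default:
# hedged sketch, not the original poster's exact setup: curl sends the
# DELETE and writes the response to stdout, so nothing is saved to disk
while IFS= read -r line; do
    curl -s --user 'xxxxx:xxx' -X DELETE "xxx://www.example.com/xx/xxx/xxx/$line"
done < /home/cmccabe/list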

Related

Execute rm with string on file and delete line

I have a file (log.txt) with multiple lines:
Uploaded 1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM at 1.9 MB/s, total 3.9 MB
Uploaded 14v58hwKP457ZF32rwIaUFH216yrp9fAB at 317.3 KB/s, total 2.1 MB
Each line in log.txt represents a file that needs to be deleted.
I want to delete the file and then delete the respective line.
Example:
rm 1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM
and, after deleting the file, remove its line from log.txt, leaving only the others:
Uploaded 14v58hwKP457ZF32rwIaUFH216yrp9fAB at 317.3 KB/s, total 2.1 MB
Try this:
#!/bin/bash
logfile="log.txt"
logfilecopy=$( mktemp )
cp "$logfile" "$logfilecopy"
while IFS= read -r line
do
    filename=$( echo "$line" | sed 's/Uploaded \(.*\) at .*/\1/' )
    if [[ -f "$filename" ]]
    then
        tempfile=$( mktemp )
        rm -f "$filename" && grep -vF -- "$line" "$logfile" >"$tempfile" && mv "$tempfile" "$logfile"
    fi
done < "$logfilecopy"
# Cleanup
rm -f "$logfilecopy"
It does the following:
Keeps a copy of the original log file.
Reads each line of this copy using while and read.
For each line, extracts the filename. Note this is done with sed, since a filename could contain spaces; cut would therefore not work as required.
If the file exists, deletes it, removes the line from the log file into a temporary file, then moves the temporary file over the log file. (grep -vF matches the line as a fixed string, so regex metacharacters in filenames are harmless.)
That last step is done with && between commands to ensure each command succeeds before continuing: if the rm fails, the log entry must not be deleted.
Finally, deletes the copy of the original log file.
You can add echo statements and/or -x (as in #!/bin/bash -x) to debug if required.
The following code reads log.txt line by line, captures the filename with a bash ERE and tries to delete that file. When the regex or the deletion fails it outputs the original line.
#!/bin/bash
tmpfile=$( mktemp ) || exit 1
while IFS='' read -r line
do
    [[ $line =~ ^Uploaded\ (.*)\ at ]] &&
        rm -- "${BASH_REMATCH[1]}" ||
        echo "$line"
done < log.txt > "$tmpfile" &&
mv "$tmpfile" log.txt
Remark: the while loop's final result is true unless there's a problem reading log.txt or generating "$tmpfile", so chaining the mv with && ensures that you won't wrongly overwrite the original logfile.
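For illustration, this is how the ERE capture behaves on the first sample line from the question:
# minimal check of the capture group (line taken from the sample log)
line='Uploaded 1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM at 1.9 MB/s, total 3.9 MB'
[[ $line =~ ^Uploaded\ (.*)\ at ]] && echo "${BASH_REMATCH[1]}"
# prints: 1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM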
Another approach using bash4+ and GNU tools.
#!/usr/bin/env bash
##: Save the file names in an array named files using mapfile (aka readarray),
##: process substitution, and GNU grep, which supports the -P flag.
mapfile -t files < <(grep -Po '(?<=Uploaded ).*(?= at)' log.txt)
##: Loop through the files ("${files[@]}") and check whether each exists (-e).
##: If it does, save it in an array named existing_file.
##: Add an additional test if need be, see "help test".
for f in "${files[@]}"; do
    [[ -e $f ]] && existing_file+=("$f")
done
##: Format the array existing_file into a syntax that is accepted
##: by GNU sed, e.g. "/file1|file2|file3|file4/d" and save it
##: in a variable named to_delete.
to_delete=$(IFS='|'; printf '%s' "/${existing_file[*]}/d")
##: delete/remove the existing files.
##: Not sure if ARG_MAX will come up.
echo rm -v -- "${existing_file[@]}"
##: Remove the deleted files (lines that contains the file name)
##: from log.txt using GNU sed.
echo sed -E -i "$to_delete" log.txt
Remove all the echo if you're satisfied with the output.
This is not exactly what you asked for, and it is not perfect, but it just might be what you need.
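As a quick illustration of the sed expression being built (using the two IDs from the sample log):
existing_file=(1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM 14v58hwKP457ZF32rwIaUFH216yrp9fAB)
to_delete=$(IFS='|'; printf '%s' "/${existing_file[*]}/d")
echo "$to_delete"
# prints: /1Y3JxCDpjsId_f8C7YAGAjvHHk-y-QVQM|14v58hwKP457ZF32rwIaUFH216yrp9fAB/d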

Get date from filename and sort into folders?

I'm running wget to get data from an FTP server like this:
wget -r -nH -N --no-parent ftp://username:password@example.com/ -P /home/data/
All of the files are in a format similar to this:
2016_07_10_bob-randomtext.csv.gz
2016_07_11_joe-importantinfo.csv.gz
Right now it's putting all of these files into /home/data/.
What I want to do is take the date from the filename and put each file into its own folder based on that date. For example:
/home/data/2016_07_10/2016_07_10_bob-randomtext.csv.gz
/home/data/2016_07_11/2016_07_11_joe-importantinfo.csv.gz
Based on the answers here, it is possible to get the date from a file name. However, I'm not really sure how to turn that into a folder automatically...
Sorry if this is a bit confusing. Any help or advice would be appreciated.
Keeping the download of all the files in one directory, /home/files:
destination=/home/data
for filename in /home/files/*; do
    if [[ -f "$filename" ]]; then       # ignore it if it's a directory (not a file)
        name=$(basename "$filename")
        datedir=$destination/${name:0:10}   # first 10 characters of the filename
        mkdir -p "$datedir"                 # create the directory if it doesn't exist
        mv "$filename" "$datedir"
    fi
done
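For illustration, the ${name:0:10} expansion pulls the date prefix out of one of the sample names:
name=2016_07_10_bob-randomtext.csv.gz
echo "${name:0:10}"   # prints: 2016_07_10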

download files using bash script using wget

I've been trying to create a simple script that takes a list of files to be downloaded from a .txt file, then uses a loop to read which files need to be downloaded, with the help of a separate .txt file that holds the address the files should be downloaded from. But my problem is I don't know how to do this. I've tried many times and always failed.
file.txt
1.jpg
2.jpg
3.jpg
4.mp3
5.mp4
=====================================
url.txt
url = https://google.com.ph/
=====================================
download.sh
#!/bin/sh
url=$(awk -F = '{print $2}' url.txt)
for i in $(cat file.txt);
do
wget $url
done
Your help is greatly appreciated.
Other than the obvious issue that R Sahu pointed out in his answer, you can avoid:
Using awk to parse your url.txt file.
Using for $(cat file.txt) to iterate over the lines of file.txt.
Here is what you can do:
#!/bin/bash
# Create an array named files that contains the list of filenames
files=($(< file.txt))
# Read through the url.txt file and execute the wget command for every filename
while IFS='=| ' read -r param uri; do
    for file in "${files[@]}"; do
        wget "${uri}${file}"
    done
done < url.txt
Instead of
wget $url
Try
wget "${url}${i}"

bash for command

#!/bin/bash
for i in /home/xxx/sge_jobs_output/split_rCEU_results/*.rCEU.bed
do
intersectBed -a /home/xxx/sge_jobs_output/split_rCEU_results/$i.rCEU.bed -b /home/xxx/sge_jobs_output/split_NA12878_results/$i.NA12878.bed -f 0.90 -r > $i.overlap_90.bed
done
However I got the errors like:
Error: can't determine file type of '/home/xug/sge_jobs_output/split_NA12878_results//home/xug/sge_jobs_output/split_rCEU_results/chr4.rCEU.bed.NA12878.bed': No such file or directory
It seems the script is concatenating the two paths, and I don't know why.
Thanks.
Your i has the format /home/xxx/sge_jobs_output/split_rCEU_results/whatever.rCEU.bed, and you insert it into the other file name, which leads to the duplication. It's probably simplest to switch to the directory and use basename, like this:
pushd /home/xxx/sge_jobs_output/split_rCEU_results
for i in *.rCEU.bed
do
    intersectBed -a "$i" -b ../../sge_jobs_output/split_NA12878_results/$(basename "$i" .rCEU.bed).NA12878.bed -f 0.90 -r > $(basename "$i" .rCEU.bed).overlap_90.bed
done
popd
Notice the use of basename, with which you can replace the extension of a file: If you have a file called filename.foo.bar, basename filename.foo.bar .foo.bar returns just filename.
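For example, with the chr4 file from the error message:
basename chr4.rCEU.bed .rCEU.bed   # prints: chr4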

creating a file downloading script with checksum verification

I want to create a shell script that reads files from a .diz file, where information about the various source files needed to compile a certain piece of software (ImageMagick in this case) is stored. I am using Mac OS X Leopard 10.5 for these examples.
Basically I want an easy way to maintain these .diz files, which hold the information for up-to-date source packages. I would just need to update them with URLs, version information and file checksums.
Example line:
libpng:1.2.42:libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks:9a5cbe9798927fdf528f3186a8840ebe
script part:
while IFS=: read app version file url md5
do
echo "Downloading $app Version: $version"
curl -L -v -O $url 2>> logfile.txt
$calculated_md5=`/sbin/md5 $file | /usr/bin/cut -f 2 -d "="`
echo $calculated_md5
done < "files.diz"
Actually I have more than just one question concerning this:
How do I best calculate and compare the checksums? I wanted to store md5 checksums in the .diz file and compare them with a string comparison after "cut"ting out the string.
Is there a way to tell curl another filename to save to? (In my case the filename gets ugly: libpng-1.2.42.tar.bz2?use_mirror=biznetnetworks.)
I seem to have issues with the backticks that should direct the output of the piped md5 and cut into the variable $calculated_md5. Is the syntax wrong?
Thanks!
The following is a practical one-liner:
curl -s -L <url> | tee <destination-file> |
sha256sum -c <(echo "a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70 -") ||
rm -f <destination-file>
Wrapping it up in a function taking three arguments (the url, the destination and the sha256):
download() {
    curl -s -L "$1" | tee "$2" | sha256sum -c <(echo "$3 -") || rm -f "$2"
}
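A hypothetical invocation, reusing the libpng URL from the question and the sample hash from the one-liner above (the hash is illustrative, not libpng's real checksum):
download 'http://downloads.sourceforge.net/project/libpng/00-libpng-stable/1.2.42/libpng-1.2.42.tar.bz2' \
    libpng-1.2.42.tar.bz2 \
    a748a107dd0c6146e7f8a40f9d0fde29e19b3e8234d2de7e522a1fea15048e70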
while IFS=: read app version file url md5
do
    echo "Downloading $app Version: $version"
    # use -o for the output file; define $outputfile yourself
    # note: IFS=: also splits on the colon in http://, so the url and md5
    # fields come out wrong with the sample line above; a different
    # delimiter in files.diz would avoid that
    curl -L -v "$url" -o "$outputfile" 2>> logfile.txt
    # use $(..) instead of backticks
    calculated_md5=$(/sbin/md5 "$file" | /usr/bin/cut -f 2 -d "=")
    # compare the md5
    case "$calculated_md5" in
        "$md5" )
            echo "md5 ok"
            echo "do something else here";;
    esac
done < "files.diz"
My curl has a -o (--output) option to specify an output file. There's also a problem with your assignment to $calculated_md5: it shouldn't have the dollar sign at the front when you assign to it. I don't have /sbin/md5 here, so I can't comment on that. What I do have is md5sum; if you have it too, you might consider it as an alternative. In particular, it has a --check option that works from a file listing of md5sums, which might be handy for your situation. HTH.
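For illustration, md5sum --check reads "checksum  filename" lines (two spaces between them), so the sample libpng entry could be verified like this:
echo "9a5cbe9798927fdf528f3186a8840ebe  libpng-1.2.42.tar.bz2" | md5sum --check -
# prints: libpng-1.2.42.tar.bz2: OK   (when the file matches)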
