Download files with wget only when the file name is in my list

I want to download a lot of files from a server with wget, but a file should only be stored when its name is in a given list. Otherwise wget should stop fetching that file and start the next one.
I tried the following:
#!/bin/bash
etsienURL="http://www.etsi.org/deliver/etsi_en"
etsitsURL="http://www.etsi.org/deliver/etsi_ts"
listOfStandards=("en_302571" "en_3023630401" "en_3023630501" "en_3023630601" "en_30263702" "en_30263703" "en_302663" "en_302931" "ts_10153901" "ts_10153903" "ts_1026360501" "ts_1027331" "ts_10286801" "ts_10287103" "ts_10289401" "ts_10289402" "ts_102940" "ts_102941" "ts_102942" "ts_102943" "ts_103097" "ts_10324601" "ts_10324603")
wget -r -nd -nc -e robots=off -A.pdf "$etsienURL"
wget -r -nd -nc -e robots=off -A.pdf "$etsitsURL"
for file in *.pdf
do
    relevant=false
    for t in "${listOfStandards[@]}"
    do
        if [[ $(basename "$file" .pdf) == *"$t"* ]]
        then
            relevant=true
            break
        fi
    done
    if [ "$relevant" = false ]
    then
        rm "$file"
    fi
done
With this code all files are downloaded first. After the download the script checks whether the file name, or a part of it, is in the list, and deletes the file otherwise. But this costs a lot of disk space. I want to download a file only if its name contains one of the list items.
Perhaps somebody can help me find a solution.

Found the solution. I forgot the --no-parent option for wget.
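As an aside, recent wget versions (1.14 and later) also support --accept-regex, which filters URLs before they are downloaded. A minimal sketch built on the array above, assuming the PDF URLs contain the list entries (the pattern construction is illustrative, not from the original post; the second URL would be handled the same way):
# join the array entries with "|" to form an alternation pattern
pattern=$(IFS='|'; echo "${listOfStandards[*]}")
# --no-parent keeps wget inside the given directory tree;
# --accept-regex downloads only URLs that match one of the list entries
wget -r -nd -nc --no-parent -e robots=off \
     --accept-regex ".*(${pattern}).*\.pdf" "$etsienURL"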

Related

Access to zipped files without unzipping them

I have a zip file that contains a tar.gz file. I would like to access the content of the tar.gz file, but without unzipping it.
I can list the files in the zip file, but of course when trying to untar one of those files bash says "Cannot open: No such file or directory", since the file does not exist.
for file in $archiveFiles;
#do echo ${file: -4};
do
    if [[ $file == README.* ]]; then
        echo "skipping readme, not relevant"
    elif [[ $file == *.tar.gz ]]; then
        echo "this is a tar.gz, must extract"
        tarArchiveFiles=`tar -tzf $file`
        for tarArchiveFile in $tarArchiveFiles;
        do echo $tarArchiveFile
        done;
    fi
done;
Is it possible to extract it "on the fly", without storing it temporarily? I have the impression that this is doable in Python.
You can't do it without unzipping (obviously), but I assume what you mean is, without unzipping to the filesystem.
unzip has -c and -p options, which both unzip to stdout. -c prints each file's name followed by its contents; -p just dumps the raw unzipped file data to stdout.
So:
unzip -p zipfile.zip path/within/zip.tar.gz | tar zxf -
Or if you want to list the contents of the tarfile:
unzip -p zipfile.zip path/within/zip.tar.gz | tar ztf -
If you don't know the path of the tarfile within the zipfile, you'd need to write something more sophisticated that consumes the output of unzip -c and recognises the filename lines in the output. It may well be better to write something in a "proper" language in this case. Python has a very flexible ZipFile library, and most mainstream languages have something similar.
You can pipe an individual member of a zip file to stdout with the -p option
In your code change
tarArchiveFiles=`tar -tzf $file`
to
tarArchiveFiles=`unzip -p zipfile $file | tar -tzf -`
replace "zipfile" with the name of the zip archive where you sourced $archiveFiles from

Get date from filename and sort into folders?

I'm running wget to get data from an FTP server like this:
wget -r -nH -N --no-parent ftp://username:password@example.com/ -P /home/data/
All of the files are in a format similar to this:
2016_07_10_bob-randomtext.csv.gz
2016_07_11_joe-importantinfo.csv.gz
Right now it's putting all of these files into /home/data/.
What I want to do is get the time from the filename and put it into their own folders based on the date. For example:
/home/data/2016_07_10/2016_07_10_bob-randomtext.csv.gz
/home/data/2016_07_11/2016_07_11_joe-importantinfo.csv.gz
Based on the answers here, it is possible to get the date from a file name. However, I'm not really sure how to turn that into a folder automatically...
Sorry if this is a bit confusing. Any help or advice would be appreciated.
Keeping the download of all the files in one directory, /home/files:
destination=/home/data
for filename in /home/files/*; do
    if [[ -f "$filename" ]]; then            # ignore it if it's a directory (not a file)
        name=$(basename "$filename")
        datedir=$destination/${name:0:10}    # first 10 characters of the filename
        mkdir -p "$datedir"                  # create the directory if it doesn't exist
        mv "$filename" "$datedir"
    fi
done
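If the directory can also contain files that do not start with a date, a slightly more defensive sketch (the regex is an assumption based on the YYYY_MM_DD naming shown above) moves only the files that match:
destination=/home/data
for filename in /home/files/*; do
    [[ -f "$filename" ]] || continue
    name=$(basename "$filename")
    # only handle names that start with a YYYY_MM_DD date
    if [[ $name =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2} ]]; then
        datedir="$destination/${name:0:10}"
        mkdir -p "$datedir"
        mv "$filename" "$datedir"
    fi
done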

bash script to unzip recently uploaded file into server

I want to unzip files automatically after they are uploaded to the server.
I'm not experienced in bash, but I've tried this:
for file in *.zip
do
    unzip -P pcp9100 "$file" -d ./
done
It's not working as I want.
Okay, assuming you want this to be continuously done in a loop, you can do something like:
while true; do
    for file in *.zip; do
        unzip -P pcp9100 "${file}" -d ./
        rm "${file}"
    done
    sleep 3
done
Of course there are several things that can go wrong here.
File has an incorrect password
The file inside is also a zip file and does not have the same password
Permissions are incorrect
First, your permissions should be correct. Secondly, you can create a directory called "ExtractedFiles" and one called "IncorrectPasswords", and then do something like:
while true; do
    for file in *.zip; do
        if unzip -P pcp9100 "${file}" -d ./ExtractedFiles; then
            rm "${file}"                      # remove the archive once it is extracted
        else
            mv "${file}" ./IncorrectPasswords # keep archives that failed to extract
        fi
    done
    sleep 3
done
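If polling with sleep feels wasteful and the inotify-tools package is available, a sketch that reacts to finished uploads instead (directory layout and password as assumed above):
# watch the current directory and handle each file as soon as it is fully written
inotifywait -m -e close_write --format '%f' . | while read file; do
    if [[ $file == *.zip ]]; then
        unzip -P pcp9100 "$file" -d ./ExtractedFiles && rm "$file"
    fi
done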

How to rename wget-downloaded files sequentially?

Let's say I am downloading image files from a website with wget.
wget -H -p -w 2 -nd -nc -A jpg,jpeg -R gif "forum.foo.com/showthread.php?t=12345"
There are 20 images on that page. When downloaded, the images are saved with their original file names.
I want to rename the first image downloaded by wget as
001-original_filename.jpg, the second one as 002-original_filename.jpg, and so on..
What to do? Is bash or curl needed for this?
Note: I am on Windows.
If you have bash installed, run this after the files have been downloaded:
i=1
ls -crt | while read file; do
    newfile=$(printf "%.3d-%s\n" $i "$file")
    mv "$file" "$newfile"
    i=$((i+1))
done
ls -crt : list files sorted by change time (-c with -t), oldest first because of -r (reverse).
%.3d in printf pads the counter to 3 digits with leading zeros.
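For example, with two hypothetical files where cat.jpg was downloaded before dog.jpg, the loop renames them like this:
cat.jpg -> 001-cat.jpg
dog.jpg -> 002-dog.jpg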

For every file in a directory, copy it into another directory, only if it doesn't exist there already

Thank you very much in advance for helping!
I have a directory with some html files
$ ls template/content/html
devel.html
idex.html
devel_iphone.html
devel_ipad.html
I'd like to write a bash function to copy every file in that folder into a new location (introduction/files/), ONLY if a file with the same name doesn't exist already there.
This is what I have so far:
orig_html="template/content/html";
dest_html="introduction/files/";
function add_html {
for f in $orig_html"/*";
do
if [ ! -f SAME_FILE_IN_$dest_html_DIRECTORY ];
then
cp $f $dest_html;
fi
done
}
The capital letters are where I was stuck.
Thank you very much.
Would the -n option be enough for your needs?
-n, --no-clobber
do not overwrite an existing file (overrides a previous -i option)
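A one-line sketch using the paths from the question (note that -n needs a cp that supports it, such as GNU coreutils):
# copy every file, silently skipping names that already exist in the destination
cp -n template/content/html/* introduction/files/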
Use rsync like this:
rsync -c -avz --delete $orig_html $dest_html
which keeps $dest_html identical to $orig_html based on file checksums.
Do you need a bash script? cp supports the -r (recursive) option and the -u (update) option. From the man page:
-u, --update
copy only when the SOURCE file is newer than the destination
file or when the destination file is missing
Your $f variable contains the full path, because of the /*.
Try doing something like:
for ff in $orig_html/*
do
    thisFile=${ff##*/}
    if [ ! -f "${dest_html}/${thisFile}" ]; then
        cp "$ff" "${dest_html}"
    fi
done
