Download files with wget only when the file name is in my list

I want to download a lot of files from a server with wget, but a file should only be stored when its name is in a given list. Otherwise wget should stop fetching that file and start the next one.
I tried the following:
#!/bin/bash
etsienURL="http://www.etsi.org/deliver/etsi_en"
etsitsURL="http://www.etsi.org/deliver/etsi_ts"
listOfStandards=("en_302571" "en_3023630401" "en_3023630501" "en_3023630601" "en_30263702" "en_30263703" "en_302663" "en_302931" "ts_10153901" "ts_10153903" "ts_1026360501" "ts_1027331" "ts_10286801" "ts_10287103" "ts_10289401" "ts_10289402" "ts_102940" "ts_102941" "ts_102942" "ts_102943" "ts_103097" "ts_10324601" "ts_10324603")
wget -r -nd -nc -e robots=off -A.pdf "$etsienURL"
wget -r -nd -nc -e robots=off -A.pdf "$etsitsURL"
for file in *.pdf
do
    relevant=false
    for t in "${listOfStandards[@]}"
    do
        if [[ $(basename "$file" .pdf) == *"$t"* ]]
        then
            relevant=true
            break
        fi
    done
    if [ "$relevant" = false ]
    then
        rm "$file"
    fi
done
With this code all files are downloaded first. After the download the script checks whether the file name, or a part of it, is in the list, and deletes the file otherwise. But this costs a lot of disk space. I want to download a file only if its name contains one of the list items.
Perhaps somebody can help me find a solution.

Found the solution. I forgot the --no-parent option for wget.
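As an aside, recent wget versions (1.14 and later) also support --accept-regex, which filters URLs before they are downloaded. A minimal sketch built on the array above, assuming the PDF URLs contain the list entries (the pattern construction is illustrative, not from the original post; the second URL would be handled the same way):
# join the array entries with "|" to form an alternation pattern
pattern=$(IFS='|'; echo "${listOfStandards[*]}")
# --no-parent keeps wget inside the given directory tree;
# --accept-regex downloads only URLs that match one of the list entries
wget -r -nd -nc --no-parent -e robots=off \
     --accept-regex ".*(${pattern}).*\.pdf" "$etsienURL"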

Related

Access to zipped files without unzipping them

I have a zip file that contains a tar.gz file. I would like to access the content of the tar.gz file, but without unzipping it.
I can list the files in the zip file, but of course when trying to untar one of those files bash says "Cannot open: No such file or directory", since the file does not exist.
for file in $archiveFiles;
#do echo ${file: -4};
do
    if [[ $file == README.* ]]; then
        echo "skipping readme, not relevant"
    elif [[ $file == *.tar.gz ]]; then
        echo "this is a tar.gz, must extract"
        tarArchiveFiles=`tar -tzf $file`
        for tarArchiveFile in $tarArchiveFiles;
        do echo $tarArchiveFile
        done;
    fi
done;
Is it possible to extract it "on the fly", without storing it temporarily? I have the impression that this is doable in Python.
You can't do it without unzipping (obviously), but I assume what you mean is, without unzipping to the filesystem.
unzip has -c and -p options, which both unzip to stdout. -c prints each file's name followed by its contents; -p just dumps the raw unzipped file data to stdout.
So:
unzip -p zipfile.zip path/within/zip.tar.gz | tar zxf -
Or if you want to list the contents of the tarfile:
unzip -p zipfile.zip path/within/zip.tar.gz | tar ztf -
If you don't know the path of the tarfile within the zipfile, you'd need to write something more sophisticated that consumes the output of unzip -c and recognises the filename lines in the output. It may well be better to write something in a "proper" language in this case. Python has a very flexible ZipFile library, and most mainstream languages have something similar.
You can pipe an individual member of a zip file to stdout with the -p option
In your code change
tarArchiveFiles=`tar -tzf $file`
to
tarArchiveFiles=`unzip -p zipfile $file | tar -tzf -`
replace "zipfile" with the name of the zip archive where you sourced $archiveFiles from

Get date from filename and sort into folders?

I'm running wget to get data from an FTP server like this:
wget -r -nH -N --no-parent ftp://username:password@example.com/ -P /home/data/
All of the files are in a format similar to this:
2016_07_10_bob-randomtext.csv.gz
2016_07_11_joe-importantinfo.csv.gz
Right now it's putting all of these files into /home/data/.
What I want to do is get the time from the filename and put it into their own folders based on the date. For example:
/home/data/2016_07_10/2016_07_10_bob-randomtext.csv.gz
/home/data/2016_07_11/2016_07_11_joe-importantinfo.csv.gz
Based on the answers here, it is possible to get the date from a file name. However, I'm not really sure how to turn that into a folder automatically...
Sorry if this is a bit confusing. Any help or advice would be appreciated.
Keeping the download of all the files in one directory, /home/files:
destination=/home/data
for filename in /home/files/*; do
    if [[ -f "$filename" ]]; then            # ignore it if it's a directory (not a file)
        name=$(basename "$filename")
        datedir=$destination/${name:0:10}    # first 10 characters of the filename
        mkdir -p "$datedir"                  # create the directory if it doesn't exist
        mv "$filename" "$datedir"
    fi
done
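If the directory can also contain files that do not start with a date, a slightly more defensive sketch (the regex is an assumption based on the YYYY_MM_DD naming shown above) moves only the files that match:
destination=/home/data
for filename in /home/files/*; do
    [[ -f "$filename" ]] || continue
    name=$(basename "$filename")
    # only handle names that start with a YYYY_MM_DD date
    if [[ $name =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2} ]]; then
        datedir="$destination/${name:0:10}"
        mkdir -p "$datedir"
        mv "$filename" "$datedir"
    fi
done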

bash script to unzip recently uploaded file into server

I want to unzip files automatically after they are uploaded to the server.
I'm not experienced in bash, but I've tried this:
for file in *.zip
do
    unzip -P pcp9100 "$file" -d ./
done
It's not working as I want.
Okay, assuming you want this to be continuously done in a loop, you can do something like:
while true; do
    for file in *.zip; do
        unzip -P pcp9100 "${file}" -d ./
        rm "${file}"
    done
    sleep 3
done
Of course there are several things that can go wrong here.
File has an incorrect password
The file inside is also a zip file and does not have the same password
Permissions are incorrect
First, your permissions should be correct. Secondly, you can create a directory called "ExtractedFiles" and one called "IncorrectPasswords", and then do something like:
while true; do
    for file in *.zip; do
        if unzip -P pcp9100 "${file}" -d ./ExtractedFiles; then
            rm "${file}"                      # remove the archive once it is extracted
        else
            mv "${file}" ./IncorrectPasswords # keep archives that failed to extract
        fi
    done
    sleep 3
done
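If polling with sleep feels wasteful and the inotify-tools package is available, a sketch that reacts to finished uploads instead (directory layout and password as assumed above):
# watch the current directory and handle each file as soon as it is fully written
inotifywait -m -e close_write --format '%f' . | while read file; do
    if [[ $file == *.zip ]]; then
        unzip -P pcp9100 "$file" -d ./ExtractedFiles && rm "$file"
    fi
done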

How to rename wget-downloaded files sequentially?

Let's say I am downloading image files from a website with wget.
wget -H -p -w 2 -nd -nc -A jpg,jpeg -R gif "forum.foo.com/showthread.php?t=12345"
There are 20 images on that page. When downloaded, the images are saved with their original file names.
I want to rename the first image downloaded by wget as
001-original_filename.jpg, the second one as 002-original_filename.jpg, and so on..
What to do? Is bash or curl needed for this?
Note: I am on Windows.
If you have bash installed, run this after the files have been downloaded:
i=1
ls -crt | while read file; do
    newfile=$(printf "%.3d-%s\n" $i "$file")
    mv "$file" "$newfile"
    i=$((i+1))
done
ls -crt : list files sorted by change time (-c with -t), oldest first because of -r (reverse).
%.3d in printf pads the counter to 3 digits with leading zeros.
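For example, with two hypothetical files where cat.jpg was downloaded before dog.jpg, the loop renames them like this:
cat.jpg -> 001-cat.jpg
dog.jpg -> 002-dog.jpg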

For every file in a directory, copy it into another directory, only if it doesn't exist there already

Thank you very much in advance for helping!
I have a directory with some html files
$ ls template/content/html
devel.html
idex.html
devel_iphone.html
devel_ipad.html
I'd like to write a bash function to copy every file in that folder into a new location (introduction/files/), ONLY if a file with the same name doesn't exist already there.
This is what I have so far:
orig_html="template/content/html";
dest_html="introduction/files/";
function add_html {
for f in $orig_html"/*";
do
if [ ! -f SAME_FILE_IN_$dest_html_DIRECTORY ];
then
cp $f $dest_html;
fi
done
}
The capital letters are where I was stuck.
Thank you very much.
Would the -n option be enough for your needs?
-n, --no-clobber
do not overwrite an existing file (overrides a previous -i option)
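A one-line sketch using the paths from the question (note that -n needs a cp that supports it, such as GNU coreutils):
# copy every file, silently skipping names that already exist in the destination
cp -n template/content/html/* introduction/files/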
Use rsync like this:
rsync -c -avz --delete $orig_html $dest_html
which keeps $dest_html identical to $orig_html based on file checksums.
Do you need a bash script? cp supports the -r (recursive) option and the -u (update) option. From the man page:
-u, --update
copy only when the SOURCE file is newer than the destination
file or when the destination file is missing
Your $f variable contains the full path, because of the /*.
Try doing something like:
for ff in $orig_html/*
do
    thisFile=${ff##*/}
    if [ ! -f "${dest_html}/${thisFile}" ]; then
        cp "$ff" "${dest_html}"
    fi
done
