Download files using a bash script with wget - bash

I've been trying to create a simple script that takes a list of files to download from a .txt file, then loops over that list and downloads each file, using a separate .txt file that holds the base address the files should be downloaded from. My problem is that I don't know how to do this; I've tried many times but always failed.
file.txt
1.jpg
2.jpg
3.jpg
4.mp3
5.mp4
=====================================
url.txt
url = https://google.com.ph/
=====================================
download.sh
#!/bin/sh
url=$(awk -F = '{print $2}' url.txt)
for i in $(cat file.txt); do
    wget $url
done
Your help is greatly appreciated.

Other than the obvious issue that R Sahu pointed out in his answer, you can avoid:
Using awk to parse your url.txt file.
Using for i in $(cat file.txt) to iterate through file.txt.
Here is what you can do:
#!/bin/bash

# Create an array "files" that contains the list of filenames
files=($(< file.txt))

# Read through url.txt (splitting each line on '=', '|' or space)
# and execute wget for every filename
while IFS='=| ' read -r param uri; do
    for file in "${files[@]}"; do
        wget "${uri}${file}"
    done
done < url.txt

Instead of
wget $url
Try
wget "${url}${i}"

Related

wget command to save output with user specified name using url input file

I am trying:
1. wget -i url.txt
and
2. wget -O output.ext
How do I combine the two, i.e. download the URLs listed in url.txt and save them, as separate files, under the names I specify?
In this situation, I think, you need two files with the same number of lines, to map each URL to a corresponding name:
url.txt (source file containing your urls, example content given here):
https://svn.apache.org/repos/asf/click/trunk/examples/click-spring-cayenne/README.txt
https://svn.apache.org/repos/asf/click/trunk/examples/click-spring-cayenne/README.txt
output_names.txt (filenames you want to assign):
readme1.txt
readme2.txt
Then you iterate over both files and pass the contents to wget, e.g. with the following script:
#!/bin/bash

# Read each input file into an array, one line per element
IFS=$'\n' read -d '' -r -a url < "$1"
IFS=$'\n' read -d '' -r -a output < "$2"

# Download every URL, saving it under the corresponding output name
len=${#url[@]}
for ((i = 0; i < len; i++))
do
    wget "${url[$i]}" -O "${output[$i]}"
done
Call:
./script url.txt output_names.txt
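If you would rather not read the files into arrays, a shorter alternative (a sketch, assuming no line contains a tab character) is to walk both files in lockstep with paste:
# paste joins the two files line by line, separated by a tab
paste url.txt output_names.txt | while IFS=$'\t' read -r url name; do
    wget "$url" -O "$name"
done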
Define all the URLs in url.txt and give this a try to see if this is what you need:
for url in $(cat url.txt); do wget $url -O $url.out ; done
If your URLs contain path components (one or more slashes), this variant replaces each slash with an underscore so that each URL maps to a valid file name:
for url in $(cat url.txt); do wget $url -O $(echo $url | sed "s/\//_/g").out ; done
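The same substitution can be done without sed, using bash parameter expansion (a sketch; ${url//\//_} replaces every slash in $url with an underscore):
while IFS= read -r url; do
    wget "$url" -O "${url//\//_}.out"
done < url.txt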

Get date from filename and sort into folders?

I'm running wget to get data from an FTP server like this:
wget -r -nH -N --no-parent ftp://username:password@example.com/ -P /home/data/
All of the files are in a format similar to this:
2016_07_10_bob-randomtext.csv.gz
2016_07_11_joe-importantinfo.csv.gz
Right now it's putting all of these files into /home/data/.
What I want to do is get the time from the filename and put it into their own folders based on the date. For example:
/home/data/2016_07_10/2016_07_10_bob-randomtext.csv.gz
/home/data/2016_07_11/2016_07_11_joe-importantinfo.csv.gz
Based on the answers here, it is possible to get the date from a file name. However, I'm not really sure how to turn that into a folder automatically...
Sorry if this is a bit confusing. Any help or advice would be appreciated.
Keeping the download of all the files in one directory, /home/files, you can then sort them into date folders like this:
destination=/home/data

for filename in /home/files/*; do
    if [[ -f "$filename" ]]; then    # ignore it if it's a directory (not a file)
        name=$(basename "$filename")
        datedir=$destination/${name:0:10}    # first 10 characters of the filename
        mkdir -p "$datedir"    # create the directory if it doesn't exist
        mv "$filename" "$datedir"
    fi
done
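If the directory may also contain files that don't start with a date, a slightly stricter variant (a sketch) only moves names matching the YYYY_MM_DD pattern:
destination=/home/data

for filename in /home/files/*; do
    name=$(basename "$filename")
    # only move regular files whose name starts with YYYY_MM_DD
    if [[ -f "$filename" && $name =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2} ]]; then
        datedir=$destination/${name:0:10}
        mkdir -p "$datedir"
        mv "$filename" "$datedir"
    fi
done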

bash to remove files from a file passed as a variable

I asked this once before, but now the bash below seems to delete and download all the files in the input file. Basically, all the lines (6) in the input are files and are read into the $line variable. When I echo $line I can see the files there, and they do get deleted, but they also get downloaded, and I don't need them to be; I am also not sure why they do. Thank you :).
file1.txt
file2.txt
file3.txt
file1.pdf
file2.pdf
file3.pdf
bash
# add filenames as variable and remove files from list
while read line; do
    echo "$line"    # only here to verify that the files are in the variable
    wget --user=xxxxx --password=xxx --xxxx --method=DELETE \
        xxx://www.example.com/xx/xxx/xxx/$line
done < /home/cmccabe/list
rm /home/cmccabe/list
You can use the -O /dev/null option, i.e.:
wget --user=xxxxx --password=xxx --xxxx --method=DELETE \
xxx://www.example.com/xx/xxx/xxx/$line -O /dev/null
This discards the server's response instead of saving it to a file, so nothing is downloaded to disk.
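Applied to the loop from the question, that would be (a sketch; the quotes around the URL are also worth adding):
while read line; do
    wget --user=xxxxx --password=xxx --xxxx --method=DELETE \
        -O /dev/null "xxx://www.example.com/xx/xxx/xxx/$line"
done < /home/cmccabe/list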

How to rename wget-downloaded files sequentially?

Let's say I am downloading image files from a website with wget.
wget -H -p -w 2 -nd -nc -A jpg,jpeg -R gif "forum.foo.com/showthread.php?t=12345"
There are 20 images on that page. When downloaded, the images are saved under their original file names.
I want to rename the first image downloaded by wget as
001-original_filename.jpg, the second one as 002-original_filename.jpg, and so on..
What to do? Is bash or curl needed for this?
Note: I am on windows.
If you have bash installed, run this after the files have been downloaded.
i=1
ls -crt | while read file; do
    newfile=$(printf "%.3d-%s\n" $i "$file")
    mv "$file" "$newfile"
    i=$((i+1))
done
ls -crt: list files sorted by time stamp (-t), oldest first (-r), using the status-change time (-c).
The .3 precision in printf's %.3d pads the number with leading zeros to three digits.
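For example (a quick check of what the format string produces):
printf "%.3d-%s\n" 7 "photo.jpg"    # prints 007-photo.jpg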

bash for command

#!/bin/bash
for i in /home/xxx/sge_jobs_output/split_rCEU_results/*.rCEU.bed
do
    intersectBed -a /home/xxx/sge_jobs_output/split_rCEU_results/$i.rCEU.bed -b /home/xxx/sge_jobs_output/split_NA12878_results/$i.NA12878.bed -f 0.90 -r > $i.overlap_90.bed
done
However, I get errors like:
Error: can't determine file type of '/home/xug/sge_jobs_output/split_NA12878_results//home/xug/sge_jobs_output/split_rCEU_results/chr4.rCEU.bed.NA12878.bed': No such file or directory
It seems the two paths get glued together, and I don't know why.
Thanks.
Your i has the format /home/xxx/sge_jobs_output/split_rCEU_results/whatever.rCEU.bed, and you insert that whole path into the other file name, which leads to the duplication. It's probably simplest to switch to the directory and use basename, like this:
pushd /home/xxx/sge_jobs_output/split_rCEU_results
for i in *.rCEU.bed
do
    intersectBed -a $i -b ../../sge_jobs_output/split_NA12878_results/`basename $i .rCEU.bed`.NA12878.bed -f 0.90 -r > `basename $i .rCEU.bed`.overlap_90.bed
done
popd
Notice the use of basename, with which you can replace the extension of a file: If you have a file called filename.foo.bar, basename filename.foo.bar .foo.bar returns just filename.
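The same renaming can also be done with bash parameter expansion instead of basename (a sketch under the same directory layout; ${i%.rCEU.bed} strips the suffix):
pushd /home/xxx/sge_jobs_output/split_rCEU_results
for i in *.rCEU.bed
do
    sample=${i%.rCEU.bed}    # e.g. chr4.rCEU.bed -> chr4
    intersectBed -a "$i" -b "../split_NA12878_results/${sample}.NA12878.bed" \
        -f 0.90 -r > "${sample}.overlap_90.bed"
done
popd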
