Use wget to download images from a list in CSV - bash

I have a CSV which has three columns: object-ID, image-url1, image-url2. I'd like to be able to run a bash script that does the following for each row in the CSV:
create a new folder using 'object-ID' as the folder name
download both images into that folder
repeat for each row
I've got this code but it needs some help!
IFS=$'\n';
for file in `cat <filename.csv>`; do
echo "Creating folder $object-ID";
mkdir $object-ID
echo "Downloading image 1";
wget $image-url1
echo "Downloading image 2";
wget $image-url2
done

Try this:
while IFS=, read -r objid url1 url2; do
    echo "Creating folder $objid"
    mkdir -p "$objid"
    # Run in a subshell so the cd doesn't affect the next iteration
    (
        cd "$objid"
        echo "Downloading image 1"
        wget "$url1"
        echo "Downloading image 2"
        wget "$url2"
    )
done < myfile.csv
It assumes your CSV uses a comma (,) as the separator. This can be adjusted by changing the IFS=, part in the while loop; see the sketch below.
Also, if $objid contains forward slashes (/), mkdir -p will treat it as a path with subdirectories and create all of them. If that's undesirable, you can replace / in $objid before the mkdir like so:
objid="${objid//\//_}"

With read:
while IFS=',' read -r id image_one image_two; do
    [ ! -d "${id}" ] && mkdir "${id}"
    for img in "${image_one}" "${image_two}"; do
        printf "Downloading %s\n" "${img}"
        wget -P "${id}" "${img}"
        echo "---"
    done
done < file.csv
For each line, it creates a directory based on the id value if that directory doesn't already exist, and retrieves the images into it (using the -P option of wget).

With awk:
awk -F "," '{
print "mkdir",$1"; echo wget -P",$1,$2"; echo wget -P",$1,$3
}' filename.csv | bash
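If you'd rather inspect the generated commands before anything runs, leave off the final | bash so they are only printed (same filename.csv as above):
awk -F "," '{
    print "mkdir -p", $1 "; wget -P", $1, $2 "; wget -P", $1, $3
}' filename.csv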

Related

How to create a bash script to make directories and specific files inside each directory

I wrote a bash script trying to generate one directory named after each file inside the directory from which I run the script.
Original directory = /home/agalvez/data/sims/phylip_format
sim1.phylip
sim2.phylip
Directories to create = sim1 sim2
The contents of these new directories should be a copy of the original file that names the new directory and an extra file called "input". This file should contain the name of the .phylip file as well as the following:
"Name of original file"
U
5
Y
/home/agalvez/data/sims/trees/tree_nodenames.txt
After that I want to run the following command (sequentially) in all these new directories:
phylip dollop < input > screenout
My approach is the following, but it is not working:
!/bin/bash
for f in *.phylip;
mkdir /home/agalvez/data/sims/dollop/$f;
cp $f /home/agalvez/data/sims/dollop/$f;
cd /home/agalvez/data/sims/dollop/$f;
echo "$f" | cat > input;
echo "U" | cat >> input;
echo "5" | cat >> input;
echo "Y" | cat >> input;
echo "/home/agalvez/data/sims/trees/tree_nodenames.txt" | cat >> input;
phylip dollop < input > screenout;
;done
Edit: The error messge looks like this:
line 4: syntax error near unexpected token `mkdir'
line 4: ` mkdir /home/agalvez/data/sims/dollop/$f;'
FINAL SOLUTION:
#!/bin/bash
for f in *.phylip;
do
mkdir /home/agalvez/data/sims/dollop/$f;
cp /home/agalvez/data/sims/phylip_format/$f /home/agalvez/data/sims/dollop/$f;
cd /home/agalvez/data/sims/dollop/$f;
echo "$f" | cat > input;
echo "U" | cat >> input;
echo "5" | cat >> input;
echo "Y" | cat >> input;
echo "/home/agalvez/data/sims/trees/tree_nodenames.txt" | cat >> input;
phylip dollop < input > screenout;
done
The immediate problem is that you are lacking a do at the beginning of the loop body; but you'll want to refactor this code to avoid hardcoding the directory structure etc.
The first line needs to start with literally the two characters # and ! in order to be a valid shebang.
See also When to wrap quotes around a shell variable?
The printf could be replaced with a here document; I like the compactness of printf here.
#!/bin/bash
for f in *.phylip; do
    mkdir -p dollop/"$f"
    cp "$f" dollop/"$f"
    # Run the rest in a subshell so the cd doesn't carry over to the next iteration
    ( cd dollop/"$f" &&
      printf "%s\n" "$f" "U" "5" "Y" \
          "/home/agalvez/data/sims/trees/tree_nodenames.txt" |
      phylip dollop > screenout )
done
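For reference, the here document alternative mentioned above could look roughly like this (a sketch using the same hard-coded values; it assumes phylip is on your PATH, just as the original does):
#!/bin/bash
for f in *.phylip; do
    mkdir -p dollop/"$f"
    cp "$f" dollop/"$f"
    (
        cd dollop/"$f" || exit
        phylip dollop > screenout <<EOF
$f
U
5
Y
/home/agalvez/data/sims/trees/tree_nodenames.txt
EOF
    )
done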
Going forward, try http://shellcheck.net/ for diagnosing many common beginner problems in shell scripts.
Assuming you have a directory named pingping in your ${HOME} folder with files 1.txt, 2.txt, 3.txt, you can accomplish that like this. Modify this code to suit your needs.
#!/bin/bash
working_directory="${HOME}/pingping/"
cd "$working_directory" || exit
for f in *.txt
do
    mkdir "${f%%.*}"
    if [ -f "${f%%.*}.txt" ]
    then
        if [ -d "${f%%.*}" ]
        then
            cp "${f%%.*}.txt" "${f%%.*}"
            echo "Done copying"
            #phylip dollop < input > screenout
            #echo "Successfully ran the command"
        fi
    else
        echo "not found"
    fi
done

Download URLs from CSV into subdirectory given in first field

So I want to export my products into my new website. I have a CSV file with this data:
product id,image1,image2,image3,image4,image5
1,https://img.url/img1-1.png,https://img.url/img1-2.png,https://img.url/img1-3.png,https://img.url/img1-4.png,https://img.url/img1-5.png
2,https://img.url/img2-1.png,https://img.url/img2-2.png,https://img.url/img2-3.png,https://img.url/img2-4.png,https://img.url/img2-5.png
What I want to do is make a script that reads from that file, makes a directory named after the product id, downloads the images of the product and puts them inside their own folder (folder 1 => image1-image5 of product id 1, folder 2 => image1-image5 of product id 2, and so on).
I can make a plain text file instead of using the Excel format if that's easier. Thanks in advance.
Sorry I'm really new here. I haven't done the code yet because I'm clueless, but what I want to do is something like this:
for id in $product_id; do
mkdir $id && cd $id && curl -o $img1 $img2 $img3 $img4 $img5 && cd ..
done
Here is a quick and dirty attempt which should hopefully at least give you an idea of how to handle this.
#!/bin/bash
tr ',' ' ' <products.csv |
while read -r prod urls; do
    mkdir -p "$prod"
    # Potential bug: urls mustn't contain shell metacharacters
    for url in $urls; do
        wget -P "$prod" "$url"
    done
done
You could equivalently do ( cd "$prod" && curl -O "$url" ) if you prefer curl; I generally do, though the availability of an option to set the output directory with wget is convenient.
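A full curl-based variant of the same loop might look like this (just a sketch; curl -O saves each file under its remote name in the current directory, hence the cd in a subshell):
#!/bin/bash
tr ',' ' ' <products.csv |
while read -r prod urls; do
    mkdir -p "$prod"
    for url in $urls; do
        # The subshell keeps the cd from affecting the next URL
        ( cd "$prod" && curl -O "$url" )
    done
done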
If your CSV contains quotes around the fields, or you need to handle URLs which contain shell metacharacters (irregular spaces, wildcards which happen to match files in the current directory, etc.; most prominently &, which tells the shell to run a command in the background), perhaps try something like
while IFS=, read -r prod url1 url2 url3 url4 url5; do
    mkdir -p "$prod"
    wget -P "$prod" "$url1"
    wget -P "$prod" "$url2"
    : etc
done <products.csv
which (modulo the fixed quoting) is pretty close to your attempt.
Or perhaps switch to a less wacky input format, maybe generate it on the fly from the CSV with
awk -F , 'function trim (value) {
    # Trim leading and trailing double quotes
    sub(/^"/, "", value); sub(/"$/, "", value);
    return value; }
{ prod = trim($1);
  for (i = 2; i <= NF; ++i) {
      # print space-separated prod, url
      print prod, trim($i) } }' products.csv |
while read -r prod url; do
    mkdir -p "$prod"
    wget -P "$prod" "$url"
done
which splits the CSV into repeated lines with the same product ID and one URL each, with any CSV quoting removed, then just loops over that instead. mkdir with the -p option helpfully doesn't mind if the directory already exists.
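For the sample CSV above, the awk stage should emit one product/URL pair per line, roughly like this (the header row would also pass through unless you skip it, e.g. by putting NR > 1 in front of the second awk block):
1 https://img.url/img1-1.png
1 https://img.url/img1-2.png
1 https://img.url/img1-3.png
1 https://img.url/img1-4.png
1 https://img.url/img1-5.png
2 https://img.url/img2-1.png
...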
If you followed the good advice that @Aaron gave you, this code can help you. As you seem to be new to bash, I commented the code for better comprehension.
#!/bin/bash
# your csv file
myFile=products.csv
# number of data lines (total lines minus the header)
nLines=$(($(wc -l < "$myFile") - 1))
echo "Data lines=$nLines"
# loop over the data lines of the file
for i in $(seq 1 "$nLines")
do
    # read the whole line (the +1 skips the header row)
    line=$(sed -n "$((i+1))p" "$myFile")
    # first column value
    id=$(echo "$line" | awk -F "," '{print $1}')
    # create the folder if it does not exist
    mkdir "$id" 2>/dev/null
    # number of fields in the line (the id plus the image urls)
    nFields=$(echo "$line" | awk -F "," '{print NF}')
    # go to the id folder
    cd "$id" || continue
    # loop over the image fields of the line
    for j in $(seq 2 "$nFields")
    do
        # get the image url and download it
        img=$(echo "$line" | cut -d "," -f "$j")
        echo "Downloading image $img"; echo
        wget "$img"
    done
    # go back up
    cd ..
done

create and rename multiple copies of files

I have a file input.txt that looks as follows.
abas_1.txt
abas_2.txt
abas_3.txt
1fgh.txt
3ghl_1.txt
3ghl_2.txt
I have a folder ff. The files in this folder are abas.txt, 1fgh.txt, 3ghl.txt. Based on the input file, I would like to create and rename multiple copies in the ff folder.
For example, in the input file abas has three copies. In the ff folder, I need to create three copies of abas.txt and rename them as abas_1.txt, abas_2.txt, abas_3.txt. There is no need to copy and rename 1fgh.txt in the ff folder.
Your valuable suggestions would be appreciated.
You can try something like this (to be run from within your folder ff):
#!/bin/bash
while IFS= read -r fn; do
    [[ $fn =~ ^(.+)_[[:digit:]]+\.([^\.]+)$ ]] || continue
    fn_orig=${BASH_REMATCH[1]}.${BASH_REMATCH[2]}
    echo cp -nv -- "$fn_orig" "$fn"
done < input.txt
Remove the echo if you're happy with it.
If you don't want to run from within the folder ff, just replace the line
echo cp -nv -- "$fn_orig" "$fn"
with
echo cp -nv -- "ff/$fn_orig" "ff/$fn"
The -n option tells cp not to overwrite existing files, and the -v option makes it verbose. The -- tells cp that there are no more options beyond that point, so that it will not be confused if one of the file names starts with a hyphen.
Using for and grep:
#!/bin/bash
for i in *
do
    x="$(echo "$i" | sed 's/^\(.*\)\..*/\1/')_"
    for j in $(grep "$x" input.txt)
    do
        cp -n "$i" "$j"
    done
done
Try this one:
#!/bin/bash
while read -r newFileName; do
    # split the string on the _ delimiter
    arr=(${newFileName//_/ })
    extension="${newFileName##*.}"
    fileToCopy="${arr[0]}.$extension"
    # check for empty: the '1fgh.txt' case
    if [ -n "${arr[1]}" ]; then
        # check whether the file exists
        if [ -f "$fileToCopy" ]; then
            echo "copying $fileToCopy -> $newFileName"
            cp "$fileToCopy" "$newFileName"
        #else
        #    echo "File $fileToCopy does not exist, so it can't be copied"
        fi
    fi
done
You can call your script like this:
cat input.txt | ./script.sh
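Equivalently, redirecting the input file avoids the extra cat process:
./script.sh < input.txt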
If you could change the format of input.txt, I suggest you adjust it in order to make your task easier. If not, here is my solution:
#!/bin/bash
SRC_DIR=/path/to/ff
INPUT=/path/to/input.txt
BACKUP_DIR=/path/to/backup
for cand in $(ls "$SRC_DIR"); do
    grep "^${cand%.*}_" "$INPUT" | while read -r new
    do
        cp -fv "$SRC_DIR/$cand" "$BACKUP_DIR/$new"
    done
done

shell get string

I have some lines that share the same structure, like
1000 AS34_59329 RICwdsRSYHSD11-2-IPAAPEK-93 /ifshk5/BC_IP/PROJECT/T1
1073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IP
AAPEK-93_1.fq.gz /ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_5932
9/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_2.fq.gz /ifshk5/
BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/clean_111220_I631_FCC0E5EACXX_
L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz.total.info 11.824 0.981393
43.8283 95.7401 OK
And I want to get the second field (AS34_59329) to check whether /home/jesse/ already has that folder; if not, create it with mkdir /home/jesse/AS34_59329.
I use this code
! /bin/bash
myPath="/home/jesse/"
while read myline
do
dirname= echo "$myline" | awk -F ' ' '{print $2}'
echo $dirname
myPath= $myPath$dirname
echo $myPath
mkdir -p "$myPath"
done < T11073_all_3254.fq.list
But it doesn't mkdir or show the path name correctly; it shows:
-bash: /home/jesse/: is a directory
/home/jesse/
AS39_59324
read can read each field into a separate variable, and mkdir -p will create a dir only if it doesn't exist:
path="/home/jesse"
while read _ dir _
do
mkdir -p "$path/$dir"
done < T11073_all_3254.fq.list
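If the underscore placeholders are new to you: read assigns the first whitespace-separated field to _ (a throwaway name), the second to dir, and everything left over to the trailing _. A quick check with a made-up line:
printf '1000 AS34_59329 rest of the line\n' | { read -r _ dir _; echo "$dir"; }
# prints: AS34_59329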
for will iterate over each whitespace-separated token. Try this instead.
#!/usr/bin/env bash
# Invoke with the first argument as the file containing the lines:
#   foo.sh <input_filename>
for i in $(cut -d " " -f2 "$1")
do
    if [ -d "/home/jesse/$i" ]
    then
        echo "Directory /home/jesse/$i exists"
    else
        mkdir "/home/jesse/$i"
        echo "Directory /home/jesse/$i created"
    fi
done

How do I use Bash to create a copy of a file with an extra suffix before the extension?

This title is a little confusing, so let me break it down. Basically I have a full directory of files with various names and extensions:
MainDirectory/
image_1.png
foobar.jpeg
myFile.txt
For an iPad app, I need to create copies of these with the suffix #2X appended to the end of all of these file names, before the extension - so I would end up with this:
MainDirectory/
image_1.png
image_1#2X.png
foobar.jpeg
foobar#2X.jpeg
myFile.txt
myFile#2X.txt
Instead of changing the file names one at a time by hand, I want to create a script to take care of it for me. I currently have the following, but it does not work as expected:
#!/bin/bash
FILE_DIR=.
#if there is an argument, use that as the files directory. Otherwise, use .
if [ $# -eq 1 ]
then
$FILE_DIR=$1
fi
for f in $FILE_DIR/*
do
echo "Processing $f"
filename=$(basename "$fullfile")
extension="${filename##*.}"
filename="${filename%.*}"
newFileName=$(echo -n $filename; echo -n -#2X; echo -n $extension)
echo Creating $newFileName
cp $f newFileName
done
exit 0
I also want to keep this to pure bash, and not rely on OS-specific calls. What am I doing wrong, and what can I change to make this work?
#!/bin/sh -e
cd "${1-.}"
for f in *; do
    cp "$f" "${f%.*}#2x.${f##*.}"
done
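If the parameter expansions are unfamiliar: ${f%.*} strips the shortest trailing .something (the extension) and ${f##*.} keeps only the extension, so the cp target inserts the suffix just before the extension. A quick check with a made-up file name:
f=image_1.png
echo "${f%.*}#2x.${f##*.}"
# prints: image_1#2x.png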
It's very easy to do that with awk in one line like this:
ls -1 | awk -F "." ' { print "cp " $0 " " $1 "#2X." $2 }' | sh
With ls -1 you get just the bare list of files; you then pipe it to awk, using the dot (.) as the field separator, and build a shell cp command to create a copy of each file.
I suggest running the command without the final sh pipe first, to check that the generated cp commands are correct. Like this:
ls -1 | awk -F "." ' { print "cp " $0 " " $1 "#2X." $2 }'
