Download URLs from CSV into subdirectory given in first field - bash

So I want to export my products to my new website. I have a CSV file with this data:
product id,image1,image2,image3,image4,image5
1,https://img.url/img1-1.png,https://img.url/img1-2.png,https://img.url/img1-3.png,https://img.url/img1-4.png,https://img.url/img1-5.png
2,https://img.url/img2-1.png,https://img.url/img2-2.png,https://img.url/img2-3.png,https://img.url/img2-4.png,https://img.url/img2-5.png
What I want to do is make a script that reads that file, makes a directory named after the product id, downloads the product's images and puts them inside their own folder (folder 1 => images 1-5 of product id 1, folder 2 => images 1-5 of product id 2, and so on).
I can make a plain text file instead of the Excel format if that's easier. Thanks in advance.
Sorry, I'm really new here. I haven't written the code yet because I'm clueless, but what I want is something like this:
for id in $product_id; do
  mkdir $id && cd $id && curl -o $img1 $img2 $img3 $img4 $img5 && cd ..
done

Here is a quick and dirty attempt which should hopefully at least give you an idea of how to handle this.
#!/bin/bash
tr ',' ' ' <products.csv |
while read -r prod urls; do
  mkdir -p "$prod"
  # Potential bug: urls mustn't contain shell metacharacters
  for url in $urls; do
    wget -P "$prod" "$url"
  done
done
You could equivalently do ( cd "$prod" && curl -O "$url" ) if you prefer curl; I generally do, though the availability of an option to set the output directory with wget is convenient.
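Spelled out, the curl version of the same loop might look like this (a sketch under the same caveat about metacharacters; the subshell keeps the cd from leaking into the outer loop):
tr ',' ' ' <products.csv |
while read -r prod urls; do
  mkdir -p "$prod"
  for url in $urls; do
    # curl -O saves each file under the basename of its URL
    ( cd "$prod" && curl -O "$url" )
  done
done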
If your CSV contains quotes around the fields, or you need to handle URLs which contain shell metacharacters (irregular spaces; wildcards which happen to match files in the current directory; but most prominently &, which tells the shell to run a command in the background), perhaps try something like
while IFS=, read -r prod url1 url2 url3 url4 url5; do
  mkdir -p "$prod"
  wget -P "$prod" "$url1"
  wget -P "$prod" "$url2"
  : etc
done <products.csv
which (modulo the fixed quoting) is pretty close to your attempt.
Or perhaps switch to a less wacky input format, maybe generate it on the fly from the CSV with
awk -F , 'function trim (value) {
    # Trim leading and trailing double quotes
    sub(/^"/, "", value); sub(/"$/, "", value);
    return value; }
  { prod=trim($1);
    for(i=2; i<=NF; ++i) {
      # print space-separated prod, url
      print prod, trim($i) } }' products.csv |
while read -r prod url; do
  mkdir -p "$prod"
  wget -P "$prod" "$url"
done
which splits the CSV into repeated lines with the same product ID and one URL each, with any CSV quoting removed, then simply loops over that instead. mkdir with the -p option helpfully doesn't mind if the directory already exists.
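For the sample CSV above (header line aside; you could skip it by adding NR > 1 before the main awk block), the first stage would emit one product/URL pair per line:
1 https://img.url/img1-1.png
1 https://img.url/img1-2.png
...
2 https://img.url/img2-5.png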

If you followed the good advice that @Aaron gave you, this code can help you. As you seem to be new to bash, I commented the code for better comprehension.
#!/bin/bash
# your csv file
myFile=products.csv
# number of lines in the file
nLines=$(wc -l < "$myFile")
echo "Total Lines=$nLines"
# loop over the data lines of the file, skipping the header
for i in $(seq 1 $((nLines-1)))
do
  # the whole line, and its first column value
  line=$(sed -n "$((i+1))p" "$myFile")
  id=$(echo "$line" | awk -F ";" '{print $1}')
  # create the folder if it does not exist
  mkdir "$id" 2>/dev/null
  # number of fields in the line
  nFields=$(echo "$line" | awk -F ";" '{print NF}')
  # go into the id folder
  cd "$id" || continue
  # loop over the image fields of the line
  for j in $(seq 2 "$nFields")
  do
    # get the image url to download
    img=$(echo "$line" | cut -d ";" -f "$j")
    echo "Downloading image $img"; echo
    # download the image
    wget "$img"
  done
  # go back up
  cd ..
done

Related

Extract a line from a text file using grep?

I have a text file called log.txt, and it logs the file name and the path it was gotten from. So, something like this:
2.txt
/home/test/etc/2.txt
Basically the file name and its previous location. I want to use grep to grab the file's directory, save it as a variable, and move the file back to its original location.
for var in "$@"
do
  if grep "$var" log.txt
  then
    : # code if found
  else
    : # code if not found
  fi
done
This just prints 2.txt and its directory to the console, since the directory contains 2.txt in it.
Thanks.
Maybe flip the logic to make it more efficient?
f=''
while read prev
do case "$prev" in
     */*) [[ -e "$f" ]] && mv "$f" "$prev";;  # move it back home
     *)   f="$prev";;                         # remember the name
   esac
done < log.txt
That walks through all the entries in the log: it remembers each name line, and when it reaches the matching path line it moves the file back if it exists locally. It should be functionally the same, without a grep per file.
If the name is always the same then why save it in the log at all?
If it is, then
while read prev
do f="${prev##*/}"   # strip the path info
   [[ -e "$f" ]] && mv "$f" "$prev"
done < <( grep / log.txt )
Having the file names on the same line would significantly simplify your script. But maybe try something like
# Convert from command-line arguments to lines
printf '%s\n' "$@" |
# Pair up with entries in file
awk 'NR==FNR { f[$0]; next }
  FNR%2 { if ($0 in f) p=$0; else p=""; next }
  p { print "mv \"" p "\" \"" $0 "\"" }' - log.txt |
sh
Test it by replacing sh with cat and see what you get. If it looks correct, switch back.
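For the log sample above, with 2.txt passed as an argument, the cat stage would show:
mv "2.txt" "/home/test/etc/2.txt"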
Briefly, something similar could perhaps be pulled off with printf '%s\n' "$@" | grep -A 1 -Fxf - log.txt, but you end up having to parse the output to pair up the output lines anyway.
Another solution:
for f in `grep -v "/" log.txt`; do
grep "/$f" log.txt | xargs -I{} cp $f {}
done
grep -q (for "quiet") suppresses the output.

Bash Script to Change PDF Titles

I need to change the title on many pdf files. Pdftk works great and I tried to create a bash script (pdftitle) to make it a single pass:
#!/bin/bash
newtitle=$2
pdftk "$1" data_dump output "$1".data.txt;
sed 's/^InfoKey:\sTitle\nInfoValue:\s.*/InfoKey:\sTitle\nInfoValue:'"$newtitle/" "$1".data.txt > "$1".data.fixed.txt;
pdftk "$1" update_info *.data.fixed.txt output "$1".fixed;
mv "$1".fixed "$1";
rm -f ./*.txt
exit;
So on the CLI I would enter:
$> pdftitle mypdf.pdf "New Title"
The data.txt that pdftk creates has many lines, but only two of them are the targets:
...
InfoBegin
InfoKey: Author
InfoValue: Not Me
InfoBegin
InfoKey: Title
InfoValue: Microsoft Word - Old Title.doc
InfoBegin
InfoKey: Creator
InfoValue: PScript5.dll Version 5.2
...
It's the InfoValue line following the Title key that needs to be replaced:
...
InfoKey: Title
InfoValue: Relevant New Title
...
No error messages are produced, but the title remains unchanged. So it seems that sed is having problems here, but I cannot figure out where or how.
Any help will be greatly appreciated.
Your sed doesn't work because sed reads its input one line at a time, so a pattern containing \n will never match. Here's a refactoring using Awk which assumes pdftk can write to and read from stdin/stdout using - as the pseudo-filename argument.
#!/bin/bash
filename=$1
shift
pdftk "$filename" data_dump output - |
awk -v title="$*" '/^InfoKey: Title/ { t=1 }
  t && /^InfoValue:/ { $0 = "InfoValue: " title; t=0 } 1' |
pdftk "$filename" update_info - output "$filename".fixed &&
mv "$filename".fixed "$filename"
The pattern of setting a flag variable when you see one pattern, then acting on a subsequent line if that variable is set, is a simple and very common Awk idiom.
There is no need for trailing semicolons or an explicit exit at the end.
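To see the idiom in isolation, here is a toy example (hypothetical input; it rewrites the line that follows MARKER):
printf '%s\n' aaa MARKER old bbb |
awk '/^MARKER$/ { t=1 } t && !/^MARKER$/ { $0 = "new"; t=0 } 1'
# prints: aaa MARKER new bbb (each on its own line)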
@tripleee provided the solution to make the bash script work perfectly:
#!/bin/bash
filename=$1
shift
pdftk "$filename" data_dump output - |
awk -v title="$*" '/^InfoKey: Title/ { t=1 }
  t && /^InfoValue:/ { $0 = "InfoValue: " title; t=0 } 1' > data.txt
pdftk "$filename" update_info data.txt output "$filename".fixed &&
mv "$filename".fixed "$filename"
rm ./data.txt

Use wget to download images from a list in CSV

I have a CSV which has three columns: object-ID, image-url1, image-url2. I'd like to be able to run a bash script that does the following for each row in the CSV:
create a new folder using 'object-ID' as the folder name
download both images into that folder
repeat for each row
I've got this code but it needs some help!
IFS=$'\n';
for file in `cat <filename.csv>`; do
  echo "Creating folder $object-ID";
  mkdir $object-ID
  echo "Downloading image 1";
  wget $image-url1
  echo "Downloading image 2";
  wget $image-url2
done
Try this:
while IFS=, read objid url1 url2
do
  echo "Creating folder $objid"
  mkdir -p "$objid"
  # Run in a subshell
  (
    cd "$objid"
    echo "Downloading image 1"
    wget "$url1"
    echo "Downloading image 2"
    wget "$url2"
  )
done < myfile.csv
It assumes your CSV uses comma (,) as a separator. This can be adjusted by changing the IFS=, part in the while loop.
Also, if $objid contains forward slashes (/), mkdir -p will treat it as a path with subdirectories and create all of them. If that's undesirable, you can replace the / characters in $objid prior to mkdir like so:
objid="${objid//\//_}"
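For example (hypothetical id):
objid="12/34"
objid="${objid//\//_}"
echo "$objid"   # prints 12_34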
With read:
while IFS=',' read id image_one image_two; do
  [ ! -d "${id}" ] && mkdir "${id}"
  for img in ${image_one} ${image_two}; do
    printf "Downloading %s\n" "${img}"
    wget -P "${id}" "${img}"
    echo "---"
  done
done < file.csv
For each line, this creates a directory based on the id value if it doesn't already exist, then retrieves the images into that directory (using wget's -P option).
With awk, generating the shell commands and piping them to bash:
awk -F "," '{
  print "mkdir", $1 "; wget -P", $1, $2 "; wget -P", $1, $3
}' filename.csv | bash
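To preview the generated commands before running them, pipe to cat instead of bash; for a hypothetical row 42,https://example.com/a.png,https://example.com/b.png you would see:
mkdir 42; wget -P 42 https://example.com/a.png; wget -P 42 https://example.com/b.png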

How do I use Bash to create a copy of a file with an extra suffix before the extension?

This title is a little confusing, so let me break it down. Basically I have a full directory of files with various names and extensions:
MainDirectory/
  image_1.png
  foobar.jpeg
  myFile.txt
For an iPad app, I need to create copies of these with the suffix #2X appended to the file name, before the extension, so I would end up with this:
MainDirectory/
  image_1.png
  image_1#2X.png
  foobar.jpeg
  foobar#2X.jpeg
  myFile.txt
  myFile#2X.txt
Instead of changing the file names one at a time by hand, I want to create a script to take care of it for me. I currently have the following, but it does not work as expected:
#!/bin/bash
FILE_DIR=.
# if there is an argument, use that as the files directory. Otherwise, use .
if [ $# -eq 1 ]
then
  $FILE_DIR=$1
fi
for f in $FILE_DIR/*
do
  echo "Processing $f"
  filename=$(basename "$fullfile")
  extension="${filename##*.}"
  filename="${filename%.*}"
  newFileName=$(echo -n $filename; echo -n -#2X; echo -n $extension)
  echo Creating $newFileName
  cp $f newFileName
done
exit 0
I also want to keep this to pure bash, and not rely on OS-specific calls. What am I doing wrong? What can I change, or what code will work, in order to do what I need?
#!/bin/sh -e
cd "${1-.}"
for f in *; do
  cp "$f" "${f%.*}#2x.${f##*.}"
done
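The two parameter expansions split the name at the last dot; a quick illustration:
f=image_1.png
echo "${f%.*}"    # image_1 (strip the shortest trailing .suffix)
echo "${f##*.}"   # png (keep only the part after the last dot)
Note that a file with no dot at all (say, README) would come out as README#2x.README, so this assumes every file has an extension.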
It's very easy to do that with awk in one line, like this:
ls -1 | awk -F "." ' { print "cp " $0 " " $1 "#2X." $2 }' | sh
With ls -1 you get just the bare list of files; then you pipe that to awk, using the dot (.) as the field separator, to build a shell command that creates a copy of each file.
I suggest running the pipeline without the final sh first, in order to check that the generated cp commands are correct. Like this:
ls -1 | awk -F "." ' { print "cp " $0 " " $1 "#2X." $2 }'

Modify config file using bash script

I'm writing a bash script to modify a config file which contains a bunch of key/value pairs. How can I read the key and find the value and possibly modify it?
A wild stab in the dark for modifying a single value:
sed -c -i "s/\($TARGET_KEY *= *\).*/\1$REPLACEMENT_VALUE/" $CONFIG_FILE
assuming that the target key and replacement value don't contain any special regex characters, and that your key-value separator is "=". Note that the -c option is system-dependent and you may need to omit it for sed to execute.
For other tips on how to do similar replacements (e.g., when the REPLACEMENT_VALUE has '/' characters in it), there are some great examples here.
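As a quick illustration (hypothetical file app.conf containing the line port = 1234; the -c option is omitted here, per the note above):
TARGET_KEY=port
REPLACEMENT_VALUE=8080
sed -i "s/\($TARGET_KEY *= *\).*/\1$REPLACEMENT_VALUE/" app.conf
# app.conf now contains: port = 8080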
Hope this helps someone. I created a self-contained script which required config processing of sorts.
#!/bin/bash
CONFIG="/tmp/test.cfg"
# Use this to set the new config value; needs 2 parameters.
# You could check that $1 and $2 are set, but I am lazy
function set_config(){
  sudo sed -i "s/^\($1\s*=\s*\).*\$/\1$2/" $CONFIG
}
# INITIALIZE CONFIG IF IT'S MISSING
if [ ! -e "${CONFIG}" ] ; then
  # Set default variable value
  sudo touch $CONFIG
  echo "myname=\"Test\"" | sudo tee --append $CONFIG
fi
# LOAD THE CONFIG FILE
source $CONFIG
echo "${myname}" # SHOULD OUTPUT DEFAULT (Test) ON FIRST RUN
myname="Erl"
echo "${myname}" # SHOULD OUTPUT Erl
set_config myname $myname # SETS THE NEW VALUE
Assuming that you have a file of key=value pairs, potentially with spaces around the =, you can delete, modify in-place or append key-value pairs at will using awk even if the keys or values contain special regex sequences:
# Using awk to delete, modify or append keys
# In case of an error the original configuration file is left intact
# Also leaves a timestamped backup copy (omit the cp -p if none is required)
CONFIG_FILE=file.conf
cp -p "$CONFIG_FILE" "$CONFIG_FILE.orig.`date \"+%Y%m%d_%H%M%S\"`" &&
awk -F '[ \t]*=[ \t]*' '$1=="keytodelete" { next }
  $1=="keytomodify" { print "keytomodify=newvalue"; next }
  { print }
  END { print "keytoappend=value" }' "$CONFIG_FILE" >"$CONFIG_FILE~" &&
mv "$CONFIG_FILE~" "$CONFIG_FILE" ||
echo "an error has occurred (permissions? disk space?)"
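For instance, given a hypothetical file.conf containing
keytodelete=1
keytomodify=oldvalue
other=x
the awk pass would produce
keytomodify=newvalue
other=x
keytoappend=value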
Another sed variant, which preserves whatever spacing surrounds the separator (here $old holds the key to match and $replace the new value):
sed "/^$old/s/\(.[^=]*\)\([ \t]*=[ \t]*\)\(.[^=]*\)/\1\2$replace/" configfile
I cannot take any credit for this, as it is a combination of Stack Overflow answers and help from the irc.freenode.net #bash channel, but here are bash functions to both set and read config file values:
# https://stackoverflow.com/a/2464883
# Usage: config_set filename key value
function config_set() {
  local file=$1
  local key=$2
  local val=${@:3}
  ensureConfigFileExists "${file}"
  # create key if not exists
  if ! grep -q "^${key}=" ${file}; then
    # insert a newline just in case the file does not end with one
    printf "\n${key}=" >> ${file}
  fi
  chc "$file" "$key" "$val"
}
function ensureConfigFileExists() {
  if [ ! -e "$1" ] ; then
    if [ -e "$1.example" ]; then
      cp "$1.example" "$1";
    else
      touch "$1"
    fi
  fi
}
# thanks to ixz in #bash on irc.freenode.net
# chc file key value [newkey]: rewrites the line "key=..." as "key=value" (or "newkey=value")
function chc() { gawk -v OFS== -v FS== -e 'BEGIN { ARGC = 1 } $1 == ARGV[2] { print ARGV[4] ? ARGV[4] : $1, ARGV[3]; next } 1' "$@" <"$1" >"$1.1"; mv "$1"{.1,}; }
# https://unix.stackexchange.com/a/331965/312709
# Usage: local myvar="$(config_get myvar)"
function config_get() {
  val="$(config_read_file ${CONFIG_FILE} "${1}")";
  if [ "${val}" = "__UNDEFINED__" ]; then
    val="$(config_read_file ${CONFIG_FILE}.example "${1}")";
  fi
  printf -- "%s" "${val}";
}
function config_read_file() {
  (grep -E "^${2}=" -m 1 "${1}" 2>/dev/null || echo "VAR=__UNDEFINED__") | head -n 1 | cut -d '=' -f 2-;
}
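Hypothetical usage of the pair, assuming CONFIG_FILE points at your config:
CONFIG_FILE=app.cfg
config_set "$CONFIG_FILE" loglevel debug   # creates or updates loglevel=debug
loglevel="$(config_get loglevel)"
echo "$loglevel"                           # debug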
At first I was using the accepted answer's sed solution: https://stackoverflow.com/a/2464883/2683059
However, if the value contains a / character, it breaks.
In general it's easy to extract the info with grep and cut:
cat "$FILE" | grep "^${KEY}${DELIMITER}" | cut -f2- -d"$DELIMITER"
To update, you could do something like this:
mv "$FILE" "$FILE.bak"
cat "$FILE.bak" | grep -v "^${KEY}${DELIMITER}" > "$FILE"
echo "${KEY}${DELIMITER}${NEWVALUE}" >> "$FILE"
This would obviously not maintain the order of the key-value pairs. Add error checking to make sure you don't lose your data.
I have done this:
new_port=$1
sed "s/^port=.*/port=$new_port/" "$CONFIG_FILE" > /yourPath/temp.x
mv /yourPath/temp.x "$CONFIG_FILE"
This will change port= to port=8888 in your config file if you pass 8888 as $1, for example.
Suppose your config file is in the format below:
CONFIG_NUM=4
CONFIG_NUM2=5
CONFIG_DEBUG=n
In your bash script, you can use:
CONFIG_FILE=your_config_file
. $CONFIG_FILE
if [ "$CONFIG_DEBUG" == "y" ]; then
  ......
else
  ......
fi
$CONFIG_NUM, $CONFIG_NUM2, and $CONFIG_DEBUG are what you need.
After you read the values, writing them back is easy:
echo "CONFIG_DEBUG=y" >> $CONFIG_FILE
(Appending like this adds a second CONFIG_DEBUG line; since the file is sourced top to bottom, the last assignment wins.)
