Bash script to download PDF using a CSV with name and url and auto-increment name - bash

I'm trying to create a bash script that reads a CSV with two columns:
first column = name
second column = URL
and downloads the PDF from the URL in the second column. The remote files have random names made of letters and numbers (ending in .pdf), and I want to save each one under the name from the first column.
The names can be duplicated, so when a name is already taken I want to add a number, like:
Example %20 $5000.pdf
Example %20 $5000.1.pdf
Example %20 $5000.2.pdf
I need this because wget and curl will not auto-increment the file name when the output option is used.
I tried a lot of things, but my limitations are costing me too much time.
I created a counter that adds the line number to the end of the name, but with a larger list this produces unnecessarily large numbers (code below).
There should be a better method. Any help will be really appreciated; I'm a beginner at bash scripts.
Thanks for any help in advance!
CSV example:
Example %20 $5000,HTTP://example.com/djdiede.pdf
Example %20 $5000,HTTP://example.com/djdi42322ede.pdf
Example %30 $1000,HTTP://example.com/djd4234iede.pdf
Example %50 $1000,HTTP://example.com/dj43566diede.pdf
Code so far:
#!/bin/bash -e
COUNTER=1
while IFS=, read -r field1 field2
do
COUNTER=$[$COUNTER +1]
if [ "$field1" == "" ]
then
echo "Line $COUNTER field1 is empty or no value set"
elif [ "$field2" == "" ]
then
echo "Line $COUNTER field2 is empty or no value set"
else
pdf_file=$(echo $field1 | tr '/' ' ')
echo "================================================"
echo "Downloading $COUNTER $pdf_file..."
echo "================================================"
pdf_file_test="$pdf_file.pdf"
if [ -e "$pdf_file_test" ]; then
echo -e "\033[32m ^^^ File already exists!!! Adding line number at the end of the file: $pdf_file.$COUNTER.pdf \033[0m" >&2
wget -q -nc -O "$pdf_file."$COUNTER.pdf $field2
else
wget -q -nc -O "$pdf_file".pdf $field2
fi
fi
done < test.csv

This should help. I tried to stay close to your own coding style:
#!/bin/bash -e
LINECOUNTER=0
while IFS=, read -r field1 field2
do
LINECOUNTER=$[$LINECOUNTER +1]
if [ "$field1" == "" ]
then
echo "Line $LINECOUNTER: field1 is empty or no value set"
elif [ "$field2" == "" ]
then
echo "Line $LINECOUNTER: field2 is empty or no value set"
else
pdf_file=$(echo "$field1" | tr '/' ' ')
echo "================================================"
echo "Downloading $LINECOUNTER: $pdf_file..."
echo "================================================"
pdf_file_saveas="$pdf_file.pdf"
FILECOUNTER=0
while [ -e "$pdf_file_saveas" ]
do
FILECOUNTER=$[$FILECOUNTER +1]
pdf_file_saveas="$pdf_file.$FILECOUNTER.pdf"
done
if [ $FILECOUNTER -gt 0 ]
then
echo -e "\033[32m ^^^ File already exists!!! Adding number at the end of the file: $pdf_file_saveas \033[0m" >&2
fi
wget -q -nc -O "$pdf_file_saveas" "$field2"
fi
done < test.csv
Here's what I did:
use two counters: one for lines, one for files
when a file already exists, use file counter + loop to find the next 'empty slot' (i.e. file named <filename>.<counter-value>.pdf that does not exist)
fixed wrong line numbers (line counter needs to start at 0 instead of 1)
added double quotes where necessary/advisable
If you want to improve your script further, here are some suggestions:
instead of the big if ... elif ... else construct, you can use if + continue, e.g. if [ "$field1" == "" ]; then continue; fi or even [ "$field1" == "" ] && continue
instead of terminating on error (#!/bin/bash -e), you could add error detection and handling after the wget call, e.g. if [ $? -ne 0 ]; then echo "failed to download ..."; fi
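The two suggestions above can be combined in a small sketch. The function names next_free_name and download_all are hypothetical (they are not part of the original script), and the sketch assumes the same two-column CSV layout as in the question:

```bash
# next_free_name BASE: print BASE.pdf, or BASE.1.pdf, BASE.2.pdf, ...
# whichever is the first name that does not exist in the current directory.
next_free_name() {
    base=$1
    candidate="$base.pdf"
    n=0
    while [ -e "$candidate" ]; do
        n=$((n + 1))
        candidate="$base.$n.pdf"
    done
    printf '%s\n' "$candidate"
}

# download_all CSV: hypothetical wrapper around the loop from the answer.
download_all() {
    line_no=0
    while IFS=, read -r field1 field2; do
        line_no=$((line_no + 1))
        # Skip incomplete rows early instead of nesting if/elif/else.
        if [ -z "$field1" ] || [ -z "$field2" ]; then
            echo "Line $line_no: missing name or URL, skipping" >&2
            continue
        fi
        target=$(next_free_name "$(printf '%s' "$field1" | tr '/' ' ')")
        # Check wget's exit status directly rather than relying on bash -e.
        if ! wget -q -O "$target" "$field2"; then
            echo "Line $line_no: failed to download $field2" >&2
            rm -f "$target"   # -O creates the file even when the download fails
        fi
    done < "$1"
}
```

It would be called as download_all test.csv. Keeping the name search in its own function makes it easy to try out without downloading anything.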


Is there a way to create an associative array from a text file in bash? [duplicate]

I'm currently creating a list of commands so for example by saying "directory install plugin-name" I can install all needed plugins specified in an external list. This list is just a txt file with all plugin names. But I'm struggling getting all names in an associative array.
I've tried this one:
while IFS=";" read line;
do " communtyList[ $line ]=1 " ;
done < community-list.txt;
The desired output should be
communityList[test1]=1
communityList[test2]=1....
It needs to be an associative array because I want to access it by words and not by index. These words will be passed in as parameters/arguments.
For example "install plugin" instead of "1 plugin"
So I can ask for example this way:
if [ ! -z "${!communtyList[$2]}" ];
Update, here the whole code:
#!/usr/bin/env bash
community(){
declare -A communtyList
while IFS= read line;
do communtyList[$line]=1 ;
done < community-list.txt;
# communtyList[test1]=1
# communtyList[test2]=1
# communtyList[test3]=1
# communtyList[test4]=1
if { [ $1 = 'install' ] || [ $1 = 'activate' ] || [ $1 = 'uninstall' ] || [ $1 = 'deactivate' ] ; } && [ ! -z $2 ] ; then
if [ $2 = 'all' ];
then echo "$1 all community plugins....";
while IFS= read -r line; do echo "$1 $line "; done < community-list.txt;
elif [ ! -z "${!communtyList[$2]}" ];
then echo "$1 community plugin '$2'....";
else
echo -e "\033[0;31m Something went wrong";
echo " Plugin '$2' does not exist.";
echo " Here a list of all available community plugins: ";
echo ${!communtyList[@]}
echo -e " \e[m"
fi
else
echo -e "\033[0;31m Something went wrong";
if [ -z $2 ];
then echo -e "[Plugin name] required. [community][action][plugin name] \e[m"
else
echo " Action '$1' does not exist.";
echo -e " Do you mean some of this? \n install \n activate \n uninstall \e[m"
fi
fi
echo ${!communtyList[@]}
}
"$@"
To use an associative array you have to declare it first
declare -A communityList
Then you can add values
communityList[test1]=1
communityList[test2]=2
...
Or with the declaration
declare -A communityList=(
[test1]=1
[test2]=2
...
)
The quotes around " communtyList[ $line ]=1 " mean you try to evaluate a command whose first character is a space. You want to take out those quotes, and probably put quotes around "$line" instead.
It's also unclear why you have IFS=";" -- you are not splitting the line into fields anyway, so this is not doing anything useful. Are there semicolons in your input file? Where and why; what do they mean?
You should probably prefer read -r unless you specifically require read to do odd things with backslashes in the input.
Finally, as suggested by Ivan, you have to declare the array's type as associative before you try to use it.
With those things out of the way, try
declare -A communityList
while read -r line; do
communityList["$line"]=1
done < community-list.txt
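For the lookup side, a minimal sketch follows. It assumes bash 4 or newer, and the helper names load_plugins and has_plugin are hypothetical. Note that ${!name[key]} as used in the question is indirect expansion, not a membership test; checking whether the key is set is one way to do what the question intends:

```bash
#!/usr/bin/env bash
# Requires bash 4 or newer (associative arrays).
declare -A communityList

# load_plugins FILE: every non-empty line of FILE becomes a key.
load_plugins() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -n $line ]]; then
            communityList["$line"]=1
        fi
    done < "$1"
}

# has_plugin NAME: succeed if NAME is a key of communityList.
has_plugin() {
    [[ -n $1 && -n ${communityList[$1]+set} ]]
}
```

With community-list.txt loaded via load_plugins, a check such as if has_plugin "$2"; then ... fi could replace the [ ! -z ... ] test, and "${!communityList[@]}" lists all known plugin names.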

read textoutput and skip current loop

I have a script with a loop over some directories and in each of them it executes a program.
folders=( "1" "2" )
for i in "${folders[@]}"
do
cd $i
output=$(program)
while read -r line; do
match "$line"
done <<< "$output"
some code here
cd ..
done
Now I want the script to stop the running program if $line matches a given string and then start working on the next element of ${folders[@]}. Basically Ctrl+C from inside the script.
Edit: I cannot access the program and make it stop itself should the string appear.
Thanks
Now I want the script to stop the running program if $line matches
with a given string
if [ "$line" = "Put some similar text in here" ]
then
exit 0
fi
This will stop the program, like you wanted.
then start working on the next element of ${folders[@]}
This is something different.
You can try to switch the code like this ...
folders=( "1" "2" )
for i in "${folders[@]}"
do
cd "$i"
output=$(program)
while read -r line; do
if [ "$line" = "Put some similar text in here" ]
then
break
fi
done <<< "$output"
# some commands ...
cd ..
done
The if condition compares the line with the given string, and the break command leaves the while loop.
Addition
The same code without using $output as temporary storage...
folders=( "1" "2" )
for i in "${folders[@]}"
do
cd "$i"
while read -r line; do
if [ "$line" = "Put some similar text in here" ]
then
break
fi
done <<< "$(program)"
# some commands ...
cd ..
done
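This way you stop reading the external program's output inside the loop. Note that $(program) still runs the program to completion before the loop sees its first line.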
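If the program should not be allowed to run to the end, one alternative (a different technique from the here-string above, shown only as a sketch) is to read from process substitution. The loop then sees lines as they are produced, and once the loop stops reading, the program typically receives SIGPIPE on its next write. The function name read_until_match is hypothetical:

```bash
#!/usr/bin/env bash
# read_until_match STOP CMD [ARGS...]
# Run CMD and echo its output line by line until a line equals STOP.
# Returns 0 if STOP was seen, 1 if CMD ended without printing it.
read_until_match() {
    local stop=$1
    shift
    local line
    while IFS= read -r line; do
        if [ "$line" = "$stop" ]; then
            return 0        # stop reading; the pipe to CMD is closed
        fi
        printf '%s\n' "$line"
    done < <("$@")
    return 1
}

# Possible use inside the folder loop (program is the command from the question):
#   for i in "${folders[@]}"; do
#       ( cd "$i" && read_until_match "Put some similar text in here" program )
#   done
```

Running each folder in a subshell, as in the commented usage, also removes the need for cd .. at the end of every iteration.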

How to remove contact from shell script?

I am creating a simple phonebook using Unix shell scripts. I have gotten all of my functions to work except the removal of a contact after it has been created. I have tried combining grep and sed to accomplish this, but cannot seem to get over the hump. The removal script I've tried is as follows.
#!/bin/sh
#removeContact.sh
echo “Remove Submenu”
echo “Please input First Name:”
read nameFirst
echo “Please input Last Name:”
read nameLast
x=$(grep -e “$nameFirst” -e “$nameLast” ContactList)
echo $x
sed '/'$x'/ d' ContactList;
echo “$nameFirst $nameLast is removed from your contacts”
exit 0
I'm not sure if I am declaring x incorrectly, or if my syntax is wrong when sed is used.
Any help would be greatly appreciated. Thank you.
#!/bin/bash
ContactList="contacts.txt"
export ContactList
exit=0
while [ $exit -ne 1 ]
do
echo "Main Menu"
echo "(a) Add a Contact"
echo "(r) Remove a Contact"
echo "(s) Search a Contact"
echo "(d) Display All Contact’s Information"
echo "(e) Exit"
echo "Your Choice?"
read choice
if [ "$choice" = "a" ]
then
./addContact.sh
elif [ "$choice" = "r" ]
then
./removeContact.sh
elif [ "$choice" = "s" ]
then
./searchContact.sh
elif [ "$choice" = "d" ]
then
./displayContact.sh
elif [ "$choice" = "e" ]
then
exit=1
else
echo "Error"
sleep 2
fi
done
exit 0
#!/bin/sh
#addContact.sh
ContactList="contacts.txt"
echo “Please input First Name:”
read nameFirst
echo “Please input Last Name:”
read nameLast
echo “Please input Phone Number:”
read number
echo “Please Input Address”
read address
echo “Please input Email:”
read email
echo $nameFirst:$nameLast:$number:$address:$email>> ContactList;
echo "A new contact is added to your book."
exit 0
sed '/'$x'/ d' ContactList
won't remove anything from the file ContactList, it will simply output the changes to standard output.
If you want to edit the file in-place, you'll need the -i flag (easy) or to make a temporary file which is then copied back over ContactList (not so easy, but needed if your sed has no in-place editing option).
In addition, since ContactList is a shell variable referencing the real file contacts.txt, you'll need to use $ContactList.
And, as a final note, since you're using the full line content to do deletion, the presence of an address like 1/15 Station St is going to royally screw up your sed command by virtue of the fact it contains the / character.
I would suggest using awk rather than sed for this task since it's much better suited to field-based data. With the record layout:
$nameFirst:$nameLast:$number:$address:$email
you could remove an entry with something like (including my patented paranoid perfect protection policy):
cp contacts.txt contacts.txt.$(date +%Y.%m.%d.%H.%M.%S_$$)
awk <contacts.txt >tmp.$$ -F: "-vF=$nameFirst" "-vL=$nameLast" '
F != $1 || L != $2 {print}'
mv tmp.$$ contacts.txt
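The same idea can be wrapped in a function, shown here as a sketch. The name remove_contact and its argument order are assumptions, and mktemp is used for the temporary file instead of tmp.$$:

```bash
# remove_contact FILE FIRST LAST
# Rewrite FILE without the records whose first two colon-separated
# fields are exactly FIRST and LAST.
remove_contact() {
    file=$1 first=$2 last=$3
    tmp=$(mktemp) || return 1
    # Appending "" forces awk to compare as strings, never as numbers.
    if awk -F: -v first="$first" -v last="$last" \
        '$1 "" != first "" || $2 "" != last ""' "$file" > "$tmp"
    then
        cat "$tmp" > "$file"   # keeps the permissions of the original file
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

In removeContact.sh this could be called as remove_contact "$ContactList" "$nameFirst" "$nameLast". Because the names are matched as whole fields rather than as a regular expression, an address such as 1/15 Station St causes no trouble.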

Compare $1 with another string in bash

I've spent 2 hours with an if statement, that never works like I want:
#should return true
if [ "$1" == "355258054414904" ]; then
Here is the whole script:
#!/bin/bash
param=$1
INPUT=simu_900_imei_user_pass.csv
OLDIFS=$IFS
IFS=,
[ ! -f $INPUT ] && { echo "$INPUT ime not found"; exit 99; }
while read imei email pass
do
echo "First Parameter-IMEI: $1"
if [ "$1" == "355258054414904" ]; then
echo "GOOD"
fi
done < $INPUT
IFS=$OLDIFS
This is the output of the script:
First Parameter-IMEI: 355258054414904
First Parameter-IMEI: 355258054414904
First Parameter-IMEI: 355258054414904
I have seen a lot of pages about the subject, but I can't make it work :(
EDIT: I Join the content of csv for better understanding ! Tx for your help !
4790057be1803096,user1,pass1
355258054414904,juju,capp
4790057be1803096,user2,pass2
358854053154579,user3,pass3
The reason $1 does not match is because $1 means the first parameter given to the script on the command line, while you want it to match the first field read from the file. That value is in $imei.
You probably meant:
if [ "$imei" == "355258054414904" ]; then
echo "GOOD"
fi
Since it is inside the loop where you read input file line by line.
To check content of $1 use:
cat -vet <<< "$1"
UPDATE: To strip \r from $1 have this at top:
param=$(tr -d '\r' <<< "$1")
And then use "$param" in rest of your script.
To test string equality with [ you want to use a single '=' sign.
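Putting these points together, a lookup could be sketched as follows. The function name lookup_email is hypothetical. The sketch compares the field read from the file with the requested IMEI, strips a trailing carriage return from the argument, and scopes IFS to the read command so it does not need to be saved and restored:

```bash
# lookup_email IMEI FILE
# Print the e-mail column of the first row whose IMEI column equals IMEI.
# Returns 1 if no row matches.
lookup_email() {
    want=$1 file=$2
    cr=$(printf '\r')
    want=${want%"$cr"}          # drop a trailing carriage return, if any
    while IFS=, read -r imei email pass; do
        if [ "$imei" = "$want" ]; then
            printf '%s\n' "$email"
            return 0
        fi
    done < "$file"
    return 1
}
```

For example, lookup_email "$1" simu_900_imei_user_pass.csv would print juju for the sample data when $1 is 355258054414904. If the CSV itself has Windows line endings, only the last field (pass) carries the stray carriage return.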

Shell script to validate logger date format in log file

I need to validate my log files:
-All new log lines shall start with date.
-This date will respect the ISO 8601 standard. Example:
2011-02-03 12:51:45,220Z -
Using shell script, I can validate it looping on each line and verifying the date pattern.
The code is below:
#!/bin/bash
processLine(){
# get all args
line="$@"
result=`echo $line | egrep "[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z" -a -c`
if [ "$result" == "0" ]; then
echo "The log is not with correct date format: "
echo $line
exit 1
fi
}
# Make sure we get file name as command line argument
if [ "$1" == "" ]; then
echo "You must enter a logfile"
exit 0
else
file="$1"
# make sure file exist and readable
if [ ! -f $file ]; then
echo "$file : does not exists"
exit 1
elif [ ! -r $file ]; then
echo "$file: can not read"
exit 2
fi
fi
# Set loop separator to end of line
BAKIFS=$IFS
IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$file"
while read -r line
do
# use $line variable to process line in processLine() function
processLine $line
done
exec 0<&3
# restore $IFS which was used to determine what the field separators are
IFS=$BAKIFS
echo SUCCESS
But, there is a problem. Some logs contains stacktraces or something that uses more than one line, in other words, stacktrace is an example, it can be anything. Stacktrace example:
2011-02-03 12:51:45,220Z [ERROR] - File not found
java.io.FileNotFoundException: fred.txt
at java.io.FileInputStream.<init>(FileInputStream.java)
at java.io.FileInputStream.<init>(FileInputStream.java)
at ExTest.readMyFile(ExTest.java:19)
at ExTest.main(ExTest.java:7)
...
will not pass with my script, but it is valid!
So if I run my script on a log file that contains stack traces, for example, it will fail, because it checks every line.
I have the correct pattern for validating the logger date format, but I have no pattern that tells me which lines to skip.
I don't know how to solve this problem. Can somebody help me?
Thanks
You need to anchor your search for the date to the start of the line (otherwise the date could appear anywhere in the line - not just at the beginning).
The following snippet will loop over all lines that do not begin with a valid date. You still have to determine if the lines constitute errors or not.
DATEFMT='^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z'
egrep -v "${DATEFMT}" /path/to/log | while read -r LINE; do
echo "${LINE}" # did not begin with date.
done
So just (silently) discard a single stack trace. In somewhat verbose bash:
STATE=idle
while read -r line; do
case $STATE in
idle)
if [[ $line =~ ^java\..*Exception ]]; then
STATE=readingexception
else
processLine "$line"
fi
;;
readingexception)
if ! [[ $line =~ ^' '*'at ' ]]; then
STATE=idle
processLine "$line"
fi
;;
*)
echo "Urk! internal error [$STATE]" >&2
exit 1
;;
esac
done <logfile
This relies on processLine not continuing on error, else you will need to track a tad more state to avoid two consecutive stack traces.
This makes 2 assumptions:
1. Lines that begin with whitespace are continuations of previous lines. We're matching a leading space or a leading tab.
2. Lines that have non-whitespace characters starting at ^ are new log lines.
If a line matching #2 doesn't match the date format, we have an error, so print the error, and include the line number.
count=0
processLine() {
count=$(( count + 1 ))
line="$@"
result=$( echo $line | egrep '^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z' -a -c )
if (( $result == 0 )); then
# if result = 0, then my line did not start with the proper date.
# if the line starts with whitespace, then it may be a continuation
# of a multi-line log entry (like a java stacktrace)
continues=$( echo $line | egrep "^[[:space:]]" -a -c )
if (( $continues == 0 )); then
# if we got here, then the line did not start with a proper date,
# AND the line did not start with white space. This is a bad line.
echo "The line is not with correct date format: "
echo "$count: $line"
exit 1
fi
fi
}
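The same two assumptions can also be checked in a single grep call instead of a per-line loop. This is only a sketch: the names date_re and bad_lines are hypothetical, and under these assumptions an unindented continuation line (such as the java.io.FileNotFoundException line in the example) is still reported as bad:

```bash
# ISO 8601 style timestamp anchored at the start of the line.
date_re='^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z'

# bad_lines FILE
# Print "number:line" for every line that neither starts with a valid
# date nor starts with whitespace (continuation of a multi-line entry).
# Like grep, it returns 0 when such lines exist and 1 when there are none.
bad_lines() {
    grep -n -v -E -e "$date_re" -e '^[[:space:]]' "$1"
}
```

With -v and several -e patterns, grep selects only the lines that match none of the patterns, which is exactly the "bad line" condition described above.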
Create a condition to check if the line starts with a date. If not, skip that line as it is part of a multi-line log.
processLine(){
# get all args
line="$@"
result=`echo $line | egrep "[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z" -a -c`
if [ "$result" == "0" ]; then
echo "Log entry is multi-lined - continuing."
fi
}
