Bash code for image scraper not working - bash

I have the following code:
#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh
if [ $# -ne 3 ];
then
echo "Usage: $0 URL -d DIRECTORY"
exit -1
fi
for i in {1..4}
do
case $1 in
-d) shift; directory=$1; shift ;;
*) url=${url:-$1}; shift;;
esac
done
mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.]+")
echo Downloading $url
curl -s $url | egrep -o "<img src=[^>]*>" |
sed 's/<img src=\"\([^"]*\).*/\1/g' > /tmp/$$.list
sed -i "s|^/|$baseurl/|" /tmp/$$.list
cd $directory;
while read filename;
do
echo Downloading $filename
curl -s -O "$filename" --silent
done < /tmp/$$.list
And it is run as follows:
gavish#gavish-HP-Mini:~/Desktop$ ./img_downloader.sh http://pngimg.com/upload/tree_PNG3498.png -d ff
Then the next thing that happens is:
Downloading http://upload.wikimedia.org/wikipedia/commons/a/a9/Being_a_twin_means_you_always_have_a_pillow_or_blanket_handy.jpg
But the problem is that the folder on the desktop remains empty even after the download completes, and I have no idea where the files are downloaded.
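One way to narrow this down (a debugging sketch, not a fix from the thread: the -fsS flags just make curl report errors instead of failing silently) is to confirm the working directory and surface curl failures in the download loop:
cd "$directory" || exit 1
pwd                                       # confirm where the downloads will land
while read -r filename
do
    echo "Downloading $filename"
    curl -fsS -O "$filename" || echo "failed: $filename" >&2
done < "/tmp/$$.list"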

Related

How can I use wget to download specific files in a CSV file, and then store those files into specific directories?

I have been attempting to work through a CSV file full of image URLs (about 1000).
Each row is a specific product, with the first cell labelled "id".
I have taken the ID of each line in Excel and created directories for them using a loop with mkdir.
My issue now is that I can't seem to figure out how to download each image and then immediately store it into these folders.
What I am attempting here is to concatenate "fold_name" and "EXT" to get a directory path like "/name_of_folder", then take the links to the images (in cells 5, 6, 7 and 8) and use wget to download from those cells into that directory.
Can anyone assist me with this?
I think this should be straightforward enough.
Thank you!
#!/usr/bin/bash
EXT='/'
while read line
do
fold_name= cut -d$',' -f1
concat= "%EXT" + "%fold_name"
img1= cut -d$',' -f5
img2= cut -d$',' -f6
img3= cut -d$',' -f7
img4= cut -d$',' -f8
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
done < file.csv
You might use the -P switch to designate the target directory; consider the following simple example using some files from the test-images/png repository:
mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
will lead to the following structure:
black
    cs-black-000.png
gray
    cs-gray-7f7f7f.png
white
    cs-white-fff.png
You should use variable names that are less ambiguous.
You need to provide the directory as part of the output filename.
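For illustration, the assignments and the output path could look like this (a hypothetical fragment with made-up variable names, not the full answer):
# No spaces around "=", command output captured with $( ... ), values read back with "$":
fold_name=$(echo "$line" | cut -d',' -f1)
img1=$(echo "$line" | cut -d',' -f5)
wget -O "${fold_name}/$(basename "$img1")" "$img1"   # directory included in the output filename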
"%" is not a bash variable designator. That is a formatting directive (for bash, awk, C, etc.).
The following will provide what you want.
#!/usr/bin/bash
DBG=1
INPUT="${1}"
INPUT="file.csv"
cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt
if [ ${DBG} -eq 1 ]
then
echo -e "\n Input file:"
cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
echo -e "\n Hit return to continue ..." ; read k
fi
REPO_ROOT='/tmp'
grep -v '^#' "${INPUT}" |
while read line
do
topic_name=$(echo "${line}" | cut -f1 -d\, )
test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."
folder="${REPO_ROOT}/${topic_name}"
test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."
if [ ! -d "${folder}" ]
then
mkdir "${folder}"
else
rm -f "${folder}/"*
fi
if [ ! -d "${folder}" ]
then
echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2
else
test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'
url1=$(echo "${line}" | cut -d\, -f5 )
url2=$(echo "${line}" | cut -d\, -f6 )
url3=$(echo "${line}" | cut -d\, -f7 )
url4=$(echo "${line}" | cut -d\, -f8 )
test ${DBG} -eq 1 && {
echo -e "\n URLs extracted:"
echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
}
#imageFile1=$( basename "${url1}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile2=$( basename "${url2}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile3=$( basename "${url3}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile4=$( basename "${url4}" | sed 's+^img_+yourImagePrefix_+' )
imageFile1=$( basename "${url1}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile2=$( basename "${url2}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile3=$( basename "${url3}" | sed 's+^cs-+yourImagePrefix_+' )
test ${DBG} -eq 1 && {
echo -e "\n Image filenames assigned:"
#echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
}
test ${DBG} -eq 1 && {
echo -e "\n WGET process log:"
}
### This form of wget does NOT work for me, although man page says it should.
#wget -P "${folder}" -O "${imageFile1}" "${url1}"
### This form of wget DOES work for me
wget -O "${folder}/${imageFile1}" "${url1}"
wget -O "${folder}/${imageFile2}" "${url2}"
wget -O "${folder}/${imageFile3}" "${url3}"
#wget -O "${folder}/${imageFile3}" "${url3}"
test ${DBG} -eq 1 && {
echo -e "\n Listing of downloaded files:"
ls -l /tmp/topic* 2>>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
}
fi
done
The script is adapted for what I had to work with. :-)
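Putting the two answers together, a minimal sketch assuming the layout described in the question (id in column 1, image URLs in columns 5 to 8; the name file.csv and the per-id directories are assumptions):
#!/bin/bash
# One directory per id; images from columns 5-8 are downloaded into it with wget -P.
while IFS=',' read -r id c2 c3 c4 img1 img2 img3 img4 rest
do
    mkdir -p "$id"
    for url in "$img1" "$img2" "$img3" "$img4"
    do
        [ -n "$url" ] && wget -P "$id" "$url"
    done
done < file.csv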

Looping through each file in directory - bash

I'm trying to perform a certain operation on each file in a directory, but there is a problem with the order it goes through them. It should do one file at a time. The long line (unzipping, grepping, zipping) works fine on a single file outside of a script, so there must be a problem with the loop. Any ideas?
The script should grep through each zipped file and look for word1 or word2. If at least one of them exists, then:
unzip file
grep word1 and word2 and save it to file_done
remove unzipped file
zip file_done to /donefiles/ with original name
remove file_done from original directory
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The code snippet you've provided has a few problems, e.g. an unneeded nested for loop and an erroneous pipeline
(the whole line gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip...).
Note also that your code will work correctly only if the *.gz files don't have spaces (or special characters) in their names.
Also, zgrep -c 'word1\|word2' matches substrings, so a token such as line_starts_withword1_orword2_ would be counted as a hit.
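To see the difference (assuming GNU grep, which the script below also uses for -P):
echo 'line_starts_withword1_orword2_' | grep -c 'word1\|word2'           # prints 1 (substring match)
echo 'line_starts_withword1_orword2_' | grep -c -P '\b(word1|word2)\b'   # prints 0 (word-boundary match)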
Here is the working version of the script:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c -E 'word1|word2' $file) # now counter is the number of word1/word2 occurences in $file
if [[ $counter -gt 0 ]]; then
name=$(basename $file .gz)
zcat $file | grep -E 'word1|word2' > ${name}_done
gzip -f -c ${name}_done > /donefiles/$file
rm -f ${name}_done
else
echo 'nothing to do here'
fi
done
What we can improve here is:
since we are unzipping the file anyway to check for word1|word2 presence, we can do this to a temp file and avoid double-unzipping
we don't need to count how many times word1 or word2 occurs inside the file, we can just check for their presence
${name}_done can be a temp file cleaned up automatically
we can use a while loop to handle file names with spaces
#!/bin/bash
tmp=`mktemp /tmp/gzip_demo.XXXXXX` # create temp file for us
trap "rm -f \"$tmp\"" EXIT INT TERM QUIT HUP # clean $tmp upon exit or termination
find . -maxdepth 1 -mindepth 1 -type f -name '*.gz' | while read f; do
# quotes around $f are now required in case of spaces in it
s=$(basename "$f") # short name w/o dir
gunzip -f -c "$f" | grep -P '\b(word1|word2)\b' > "$tmp"
[ -s "$tmp" ] && gzip -f -c "$tmp" > "/donefiles/$s" # create archive if anything is found
done
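If all you need is the presence check from the second point above, the exit status of zgrep is enough, for example (a small sketch, not part of the script above):
if zgrep -q -E 'word1|word2' "$file"; then
    echo "$file contains word1 or word2"
fi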
It looks like you have an inner loop inside the outer one:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
for file in *.gz; do #<<< HERE
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
done
else
echo "nothing to do here"
fi
done
The inner loop goes through all the files in the directory as soon as any one of them contains word1 or word2. You probably want this:
#!/bin/bash
for file in *.gz; do
counter=$(zgrep -c 'word1\|word2' $file)
if [[ $counter -gt 0 ]]; then
echo $counter
filenoext=${file::-3}
filedone=${filenoext}_done
echo $file
echo $filenoext
echo $filedone
gunzip $file | grep 'word1\|word2' $filenoext > $filedone | rm -f $filenoext | gzip -f -c $filedone > /donefiles/$file | rm -f $filedone
else
echo "nothing to do here"
fi
done

Grep is not showing results even though I used fgrep and -f options

I have used the code below to fetch some values.
But the grep in the code is not showing any results.
#!/bin/bash
file=test.txt
while IFS= read -r cmd;
do
check_address=`grep -c $cmd music.cpp`
if [ $check_address -ge 1 ]; then
echo
else
grep -i -n "$cmd" music.cpp
echo $cmd found
fi
done < "$file"
Note: there are no carriage returns in my text file or .sh file.
I checked using
bash -x check.sh
It just shows:
+grep -i -n "$cmd" music.cpp
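The thread does not include an accepted fix, but two common causes worth ruling out are hidden characters in test.txt and the unquoted $cmd in the first grep. A small diagnostic sketch (these are assumptions, not confirmed by the question):
cat -A test.txt | head                    # reveals hidden characters such as ^M (carriage returns)
while IFS= read -r cmd
do
    count=$(grep -c -- "$cmd" music.cpp)  # quote $cmd so spaces and regex metacharacters survive
    echo "$cmd -> $count match(es)"
done < test.txt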

Unable to mute unzip output in script

I wrote a script that unzips certificates from zips and tests the certs against one of our servers:
#!/bin/bash
WORKINGDIR=$(pwd)
if [ ! -f ./users.zip ]; then
echo "users.zip not found. Exiting."
exit 1
else
unzip users.zip -d users
echo "users.zip extracted."
fi
cd ./users/client
echo "Extracting files..."
for file in `ls *.zip`; do
unzip -j $file -d `echo $file | cut -d . -f 1` &> /dev/null
done
echo "name,result" > $WORKINGDIR/results.csv
i=0 # Total counter
j=0 # Working counter
k=0 # Failed counter
for D in `ls -d */`; do
cd "$D"
SHORT=`find *.p12 | cut -f1 -d "."`
openssl pkcs12 -in `echo $SHORT".p12"` -passin file:./password -passout pass:testpass -out `echo $SHORT".pem"` &> /dev/null
echo "Trying: "$SHORT
((i++))
curl --cert ./`echo $SHORT".pem"`:testpass https://example.com -k &> /dev/null
OUT=$?
if [ $OUT -eq 0 ];then
((j++)) ; echo -e $(tput setaf 2)"\t"$SHORT": OK $(tput sgr0)" ; echo $SHORT",OK" >> $WORKINGDIR/results.csv
else
((k++)) ; echo -e $(tput setaf 1)"\t"$SHORT": FAILED $(tput sgr0)" ; echo $SHORT",FAILED" >> $WORKINGDIR/results.csv
fi
rm `echo $SHORT".pem"`
cd ..
done
echo "Test complete:"
echo "Tested: "$i
echo "Working: "$j
echo "Failed: "$k
echo "Results saved to "$WORKINGDIR"/results.csv"
exit 0
When it gets to the unzipping part I always get this output:
Archive: users.zip
creating: users/keys/
inflating: users/keys/user1.zip
inflating: users/keys/user2.zip
inflating: users/keys/user3.zip
inflating: users/keys/user4.zip
inflating: users/keys/user5.zip
inflating: users/keys/user6.zip
inflating: users/keys/user7.zip
inflating: users/keys/user8.zip
inflating: users/keys/user9.zip
inflating: users/keys/user10.zip
inflating: users/keys/user11.zip
I've tried to redirect the output to /dev/null in different ways:
&> /dev/null
1>&- 2>&-
2>&1
etc.
Nothing works. What's weird is that if I put just the unzipping part of the script into a separate script file:
#!/bin/bash
for file in `ls *.zip`; do
unzip -j $file -d `echo $file | cut -d . -f 1` &> /dev/null
done
It works with no problem. Any thoughts on why this is happening?
The /dev/null behavior is really odd. It’s probably better to just use unzip’s -q (quiet) option.
I figured it out and feel like quite the simpleton.
I was getting the output from the first time I call unzip:
unzip users.zip -d users
and not from the loop:
for file in `ls *.zip`; do
unzip -j $file -d `echo $file | cut -d . -f 1` &> /dev/null
done
I added -qq to the first unzip:
unzip -qq users.zip -d users
and it works as expected.
In the script you posted there is no redirection on the unzip call for users.zip specifically.
unzip users.zip -d users
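So either pass -q/-qq as above, or redirect that first call explicitly, e.g.:
unzip users.zip -d users > /dev/null    # unzip prints its "inflating:" lines on stdout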

Converting FLAC file collection to ALAC in another directory with shell script

I have searched many forums and websites for a way to create an ALAC collection from my FLAC collection with the same directory structure, with no success. Therefore I coded my own shell script and decided to share it here so others can use or improve on it.
Problems I wanted to solve:
Full automation of the conversion. I did not want to run scripts in each and every directory.
Recursive file search.
Moving the whole structure from one location to another by converting FLAC to ALAC and copying the artwork, nothing else.
I did not want FLAC and ALAC files in the same directory (which I believe the script below achieves).
Here is how the script turned out. It works for me, I hope it does for you as well. I am using Linux Mint and bash shell.
2014-12-08 - Made some changes and now it is working fine. Before it was creating multiple copies.
Usage is: ./FLACtoALAC.sh /sourcedirectory /targetdirectory
Here are some explanations:
Source: /a/b/c/d/e/ <- e has flac
/g/f/k <- k has artwork
/l <- l has mp3
Target: /z/u/v/g/f
When the command is run: ./FLACtoALAC.sh /a/b/ /z/u/
I want the structure to look like:
/z/u/v/g/f <- f was already there
/c/d/e/ <- e had flac, so created with the tree following source (/a/b)
/c/g/f/k <- k had artwork, so created with the tree following source (/a/b)
l is not created <- l did not have any of the png, jpg or flac files.
I do not want to create any directory that does not contain png, jpg or flac files,
unless it is a parent of one of those directories.
Now the updated code:
#!/bin/bash
if [[ $1 ]]
then
if [[ ${1:0:1} = / || ${1:0:1} = ~ ]]
then Source_Dir=$1
elif [[ ${1:0:1} = . ]]
then Source_Dir=`pwd`
else Source_Dir=`pwd`'/'$1
fi
else Source_Dir=`pwd`'/'
fi
if [[ $2 ]]
then
if [[ ${2:0:1} = / || ${2:0:1} = ~ ]]
then Target_Dir=$2
elif [[ ${2:0:1} = . ]]
then Target_Dir=`pwd`
else Target_Dir=`pwd`'/'$2
fi
else Target_Dir=`pwd`'/'
fi
echo "Source Directory : "$Source_Dir
echo "Target Directory : "$Target_Dir
typeset -i Source_Dir_Depth
Source_Dir_Depth=`echo $Source_Dir | grep -oi "\/" | wc -l`
typeset -i Target_Dir_Depth
Target_Dir_Depth=`echo $Target_Dir | grep -oi "\/" | wc -l`
echo "Depth of the Source Directory: "$Source_Dir_Depth
echo "Depth of the Target Directory: "$Target_Dir_Depth
echo "Let's check if the Target Directory exists, if not we will create"
typeset -i Number_of_depth_checks
Number_of_depth_checks=$Target_Dir_Depth+1
for depth in `seq 2 $Number_of_depth_checks`
do
Target_Directory_Tree=`echo ${Target_Dir} | cut -d'/' -f-${depth}`
if [[ -d "$Target_Directory_Tree" ]]
then
echo "This directory exists ("$Target_Directory_Tree"), moving on"
else
Create_Directory=`echo ${Target_Dir} | cut -d'/' -f-${depth}`
echo "Creating the directory/subdirectory $Create_Directory"
mkdir -pv "$Create_Directory"
fi
done
Directory_List=`find "${Source_Dir}" -type d -exec sh -c 'ls -tr -1 "{}" | sort | egrep -iq "\.(jpg|png|flac)$"' ';' -print`
oIFS=$IFS
IFS=$'\n'
for directories in $Directory_List
do
echo "Directories coming from the source : $directories"
typeset -i directories_depth
directories_depth=`echo $directories | grep -oi "\/" | wc -l`
echo "Number of sub-directories to be checked: $Source_Dir_Depth"
typeset -i number_of_directories_depth
number_of_directories_depth=$directories_depth+1
for depth in `seq 2 $number_of_directories_depth`
do
Source_Tree=`echo ${Source_Dir} | cut -d'/' -f-${depth}`
Subdirectory_Tree=`echo ${directories} | cut -d'/' -f-${depth}`
Subdirectory_Remaining_Tree=`echo ${directories} | cut -d'/' -f${depth}-`
echo "source tree : $Source_Tree"
echo "subdirectory tree : $Subdirectory_Tree"
if [[ $depth -le $Source_Dir_Depth && $Source_Tree = $Subdirectory_Tree ]]
then
echo "Common Directory, skipping ($Subdirectory_Tree)"
continue
else
export Targetecho=$(echo $Target_Dir | sed -e 's/\r//g')
export Destination_Directory=${Targetecho}${Subdirectory_Remaining_Tree}
echo "Destination directory is : $Destination_Directory"
export Sub_directories_depth=`echo $Destination_Directory | grep -oi "\/" | wc -l`
echo "Total destination depth : $Sub_directories_depth"
echo "Now we are checking target directory structure"
fi
break
done
echo "Getting into the new loop to verify/create target structure"
typeset -i number_of_Sub_directories_depth
number_of_Sub_directories_depth=$Sub_directories_depth+1
for subdepth in `seq 2 $number_of_Sub_directories_depth`
do
Target_Subdirectory_Tree=`echo ${Destination_Directory} | cut -d'/' -f-${subdepth}`
if [[ $subdepth < $number_of_Sub_directories_depth && -d "$Target_Subdirectory_Tree" ]]
then
echo "Directory already exists in the destination ($Target_Subdirectory_Tree)"
elif [[ $subdepth < $number_of_Sub_directories_depth && ! -d "$Target_Subdirectory_Tree" ]]
then
echo "Creating the path in the destination ($Target_Subdirectory_Tree)"
mkdir -pv "$Target_Subdirectory_Tree"
elif [[ $subdepth -eq $number_of_Sub_directories_depth ]]
then
if [[ ! -d "$Destination_Directory" ]]
then
echo "Creating Directory: $Destination_Directory"
mkdir -pv "$Destination_Directory"
fi
echo "Directory already exists in the destination ($Destination_Directory)"
#Flac file processing starts here once the directory is found
Flac_File_List=`(shopt -s nocaseglob ; ls -tr "${directories}"/*.flac | sort)`
echo "List of files in $directories :"
echo $Flac_File_List
for flac_files in $Flac_File_List
do
echo "files : $flac_files"
typeset -i flac_file_depth
flac_file_depth=`echo $flac_files | grep -oi "\/" | wc -l`
flac_file_depth=$flac_file_depth+1
echo "flac_file_depth : $flac_file_depth"
Flac_File_Name=`echo ${flac_files} | cut -d'/' -f${flac_file_depth}`
echo "Flac_File Name : $Flac_File_Name"
Destination_File=${Destination_Directory}'/'${Flac_File_Name}
echo "will convert $Flac_File_Name from $flac_files to $Destination_File"
yes | ffmpeg -i "$flac_files" -vf "crop=((in_w/2)*2):((in_h/2)*2)" -c:a alac "${Destination_File%.flac}.m4a"
done
#Artwork file processing starts here once the directory is found
Art_File_List=`(shopt -s nocaseglob ; ls -tr "${directories}"/*.{png,jpg} | sort)`
echo "List of files in $directories :"
echo $Art_File_List
for art_files in $Art_File_List
do
echo "files : $art_files"
typeset -i art_file_depth
art_file_depth=`echo $art_files | grep -oi "\/" | wc -l`
art_file_depth=$art_file_depth+1
echo "file_depth : $art_file_depth"
Art_File_Name=`echo ${art_files} | cut -d'/' -f${art_file_depth}`
echo "File Name : $Art_File_Name"
Destination_File=${Destination_Directory}'/'${Art_File_Name}
echo "will copy $Art_File_Name from $art_files to $Destination_File"
cp "$art_files" "$Destination_File"
done
else
echo "did nothing!!!"
fi
done
done
IFS=$oIFS
feel free to change, improve, distribute.
Caglar
Try this out:
#!/bin/bash
src_dir="in"
dst_dir="out"
find ${src_dir} -type f -print0|while IFS= read -r -d '' src_file; do
dst_file=${src_file/$src_dir/$dst_dir}
echo "src_file=${src_file} dst_file=${dst_file}"
mkdir -pv "$(dirname "$dst_file")"
# use above variables and run convert command with it here
done
To test how it works:
mkdir in out
cd in
mkdir 1 2 3
find . -type d -exec touch {}/foo {}/bar {}/baz \;
cd ..
./run_my_script.sh
Now you only need to attach your convert function/script/command/whatever and improve it to read src_dir and dst_dir from the command line (I would recommend man bash -> getopts).
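To plug in the conversion from the earlier script, the loop might be narrowed to FLAC files, for example (a sketch; the -iname filter is an assumption, and the ffmpeg options are carried over from the script above):
#!/bin/bash
src_dir="in"
dst_dir="out"
find "${src_dir}" -type f -iname '*.flac' -print0 | while IFS= read -r -d '' src_file; do
    dst_file=${src_file/$src_dir/$dst_dir}
    mkdir -pv "$(dirname "$dst_file")"
    # the crop filter from the original script could be added back here if your FLACs embed artwork
    ffmpeg -y -i "$src_file" -c:a alac "${dst_file%.flac}.m4a"
done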
