I have been trying to process a CSV file full of image URLs (about 1000 rows).
Each row is a specific product, with the first cell labelled "id".
I took the ID of each row in Excel and created a directory for each one using a loop with mkdir.
My issue now is that I can't figure out how to download each image and immediately store it in these folders.
What I am attempting here is to concatenate "EXT" and "fold_name" to build a directory path like "/name_of_folder", take the links to the images (in cells 5, 6, 7 and 8), and run wget on each of them so the files land in that directory.
Can anyone assist me with this?
I think this should be straightforward enough.
Thank you!
#!/usr/bin/bash
EXT='/'
while read line
do
fold_name= cut -d$',' -f1
concat= "%EXT" + "%fold_name"
img1= cut -d$',' -f5
img2= cut -d$',' -f6
img3= cut -d$',' -f7
img4= cut -d$',' -f8
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
done < file.csv
You might use the -P switch to designate the target directory. Consider the following simple example, which uses some files from the test-images/png repository:
mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
This will lead to the following structure:
black
cs-black-000.png
gray
cs-gray-7f7f7f.png
white
cs-white-fff.png
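Applied to the CSV from the question, the -P approach could be combined with a read loop along these lines. This is only a sketch: the function name fetch_csv_images and the DRY_RUN variable are made up for illustration, the header row is assumed to start with "id", and the comma splitting is naive (it assumes no quoted fields that contain commas).

```shell
#!/usr/bin/env bash
# fetch_csv_images CSV_FILE DEST_ROOT   (hypothetical helper, not from the thread)
# Set DRY_RUN=1 to print the wget commands instead of running them.
fetch_csv_images() {
    local csv=$1 root=$2
    local line id url1 url2 url3 url4 url
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                      # drop a trailing CR, common in Excel exports
        # Naive split on commas; fields 2-4 and anything past field 8 are discarded.
        IFS=, read -r id _ _ _ url1 url2 url3 url4 _ <<< "$line"
        [[ -z $id || $id == id ]] && continue   # skip blank lines and the header row
        mkdir -p "$root/$id"
        for url in "$url1" "$url2" "$url3" "$url4"; do
            [[ -n $url ]] || continue           # a row may have fewer than four images
            if [[ ${DRY_RUN:-0} -eq 1 ]]; then
                echo wget -q -P "$root/$id" "$url"
            else
                wget -q -P "$root/$id" "$url"
            fi
        done
    done < "$csv"
}
```

Running `DRY_RUN=1 fetch_csv_images file.csv .` first is a cheap way to check the parsed URLs before downloading anything.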
You should use variable names that are less ambiguous.
You need to provide the directory as part of the output filename.
"%" is not a bash variable designator; bash expands variables with "$" (for example "${fold_name}"). "%" is a formatting directive (for printf in bash, awk, C, etc.).
The following will provide what you want.
#!/usr/bin/bash
DBG=1
INPUT="${1}"
INPUT="file.csv"
cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt
if [ ${DBG} -eq 1 ]
then
echo -e "\n Input file:"
cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
echo -e "\n Hit return to continue ..." ; read k
fi
REPO_ROOT='/tmp'
grep -v '^#' "${INPUT}" |
while read line
do
topic_name=$(echo "${line}" | cut -f1 -d\, )
test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."
folder="${REPO_ROOT}/${topic_name}"
test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."
if [ ! -d "${folder}" ]
then
mkdir "${folder}"
else
rm -f "${folder}/"*
fi
if [ ! -d "${folder}" ]
then
echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2
else
test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'
url1=$(echo "${line}" | cut -d\, -f5 )
url2=$(echo "${line}" | cut -d\, -f6 )
url3=$(echo "${line}" | cut -d\, -f7 )
url4=$(echo "${line}" | cut -d\, -f8 )
test ${DBG} -eq 1 && {
echo -e "\n URLs extracted:"
echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
}
#imageFile1=$( basename "${url1}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile2=$( basename "${url2}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile3=$( basename "${url3}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile4=$( basename "${url4}" | sed 's+^img_+yourImagePrefix_+' )
imageFile1=$( basename "${url1}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile2=$( basename "${url2}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile3=$( basename "${url3}" | sed 's+^cs-+yourImagePrefix_+' )
test ${DBG} -eq 1 && {
echo -e "\n Image filenames assigned:"
#echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
}
test ${DBG} -eq 1 && {
echo -e "\n WGET process log:"
}
### This form of wget does NOT work for me, although man page says it should.
#wget -P "${folder}" -O "${imageFile1}" "${url1}"
### This form of wget DOES work for me
wget -O "${folder}/${imageFile1}" "${url1}"
wget -O "${folder}/${imageFile2}" "${url2}"
wget -O "${folder}/${imageFile3}" "${url3}"
#wget -O "${folder}/${imageFile3}" "${url3}"
test ${DBG} -eq 1 && {
echo -e "\n Listing of downloaded files:"
ls -l /tmp/topic* 2>>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
}
fi
done
The script is adapted for what I had to work with. :-)
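The script above passes "folder/filename" to -O because, as its own comments report, combining -P with -O did not put the file in the folder. As a minimal sketch, the basename and sed step can also be done with bash parameter expansion alone. The helper name image_target and its optional prefix argument are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# image_target FOLDER URL [PREFIX]   (hypothetical helper)
# Prints the path to hand to "wget -O": FOLDER/PREFIX<last path component of URL>.
image_target() {
    local folder=$1 url=$2 prefix=${3:-}
    local name=${url##*/}     # keep everything after the last slash
    name=${name%%\?*}         # drop a query string, if there is one
    printf '%s/%s%s\n' "$folder" "$prefix" "$name"
}
```

It would be used as `wget -O "$(image_target "${folder}" "${url1}")" "${url1}"`.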
I have a txt file (which often gets updated) called hitlist.txt containing a list of words/strings I want to grep a directory against ... like:
# This is just a comment and will not be part of the search
* Blah - this is a category
foo
bar
sibilance
# A new category
* Meh - another category
snakefish
sex panther
My list typically has more than 100 strings, each on its own line. Today, because of a deadline, I simply went through the list and executed the following command for each word:
find -iname "*" -type f -print0 | xargs -0 grep -HniI "foo" >> results.txt
As indicated in the command above, I am interested in the file path and name, as well as the line that contains the matched text. There are multiple categories listed in the file (denoted by *), and I would like to be able to run my script against one, several, or all categories.
I would also like to be able to turn off the -i flag (case insensitivity) as an option. I have a script that recursively finds/lists all files in a directory, and the command I have been using above. Lastly, the hitlist format can be changed completely if necessary.
Set up a ghl() (grep hitlist) shell function to do the work (it depends on GNU grep's -o switch, plus a little sed loop). The output is a list of words from hitlist.txt (or <filename>):
# usage ghl <glob> <filename>
ghl() { grep -o '\* '"$1"' -' "$2" | grep -o '[[:alpha:]]*' | \
while read x ; do \
sed -n '/\* '"$1"'/{:show ;n;/^[^ ]/{p;b show;}}' "$2" ; \
done ; }
Pipe the word list output of ghl, called with an ".*ah" wildcard (which matches the Blah category), into grep -f -, using some ad hoc bash process substitution to generate input text:
ghl '.*ah' hitlist.txt | grep -i -f - <(echo bar) <(echo foo) <(echo Foo)
Output:
/dev/fd/63:bar
/dev/fd/62:foo
/dev/fd/61:Foo
The second grep above can be passed switches as desired (see man grep). For example, the same thing, but case sensitive (i.e. with the -i switch removed):
ghl '.*ah' hitlist.txt | grep -f - <(echo bar) <(echo foo) <(echo Foo)
Output (note the missing uppercase item):
/dev/fd/63:bar
/dev/fd/62:foo
Since grep already has options to handle recursive searches, the rest is only a matter of adding switches as required.
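As a rough sketch of that last step, the category word list can be fed straight into a recursive grep. The function names category_words and search_hits are hypothetical, and awk is swapped in for the grep/sed loop above to pick the words. The hitlist is assumed to use the format from the question ("#" comments, "* Name - ..." category headers, one term per line), and terms are matched as fixed strings (-F).

```shell
#!/usr/bin/env bash
# category_words CATEGORY_REGEX HITLIST   (hypothetical helper)
# Prints the search terms listed under every "* ..." header matching CATEGORY_REGEX.
category_words() {
    awk -v cat="$1" '
        /^#/ || /^[[:space:]]*$/ { next }     # skip comments and blank lines
        /^\*/ { keep = ($0 ~ cat); next }      # a category header switches collection on or off
        keep  { print }                        # emit terms of the selected categories
    ' "$2"
}

# search_hits DIR CATEGORY_REGEX HITLIST [extra grep switches, e.g. -i]
search_hits() {
    local dir=$1 cat=$2 list=$3
    shift 3
    # -r recurse, -H file name, -n line number, -I skip binary files, -F fixed strings
    category_words "$cat" "$list" | grep -rHnI -F "$@" -f - "$dir"
}
```

`search_hits somedir 'Blah|Meh' hitlist.txt -i >> results.txt` would then search two categories case-insensitively; leaving out -i makes the search case sensitive, and '.' as the regex selects every category.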
Your question is extremely vague, but I'm imagining this is more or less what you are looking for.
awk -v cat='Blah|Meh' 'NR==FNR && /^#/ { next } # Skip comments
NR==FNR && /^\*/ { if ($0~cat) c=1; else c=0; next }
NR==FNR { if(c) a[$0]=1; next }
tolower($0) in a { print FILENAME ":" FNR ":" $0 }' Hits.txt files to search
Figuring out how to selectively disable tolower() and rigging it to read a list of file names other than Hits.txt from find should be fairly obvious.
This is what I ended up with:
hitlist format:
# MEH
never,going,to give,you up
# blah
word to,your,mother
Script:
# Set defaults
OUTPUT_FILE="hits.txt"
HITLIST_FILE="hitlist.txt"
# Hold on to the args
ARGLIST=("$@")
# Declare any functions
help ()
{
echo "--------------------------------- Luffa --------------------------------"
echo "Usage: luffa.sh [DIRTOSCRUB]"
echo ""
echo "Searches DIRTOSCRUB for category specific words in $HITLIST_FILE."
echo ""
echo "EXAMPLE: luffa.sh dirtoscrub"
echo ""
echo "--help display this help and exit"
echo "--version display version information and exit"
}
version ()
{
echo "luffa.sh v1.0"
}
process ()
{
if [ ${#FILEARG} -lt 1 ] # check for proper number of args
then
echo "ERROR: Specify directory to be searched."
help
exit 1
else
SEARCH_DIR=${ARGLIST[0]}
fi
echo ""
echo "--------------------------------------------------------- Luffa ---------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "search command: find [DIRTOSCRUB] -type f -print0 | xargs -0 grep -HniI --color=always $word | tee -a ../hits.txt | more" | tee -a "$OUTPUT_FILE"
echo
echo " .,,:::::." | tee -a "$OUTPUT_FILE"
echo " .,,::::~:::::.." | tee -a "$OUTPUT_FILE"
echo " ,,::::~~~~~~::~~:::." | tee -a "$OUTPUT_FILE"
echo " ,:,:~:~~~~~~~~~~~~~~::." | tee -a "$OUTPUT_FILE"
echo " ,,:::~:~~~~~~~~~~~~~~~~~~," | tee -a "$OUTPUT_FILE"
echo " .,,::::~~~~~~~~~~~~~~~~~~~~~~::" | tee -a "$OUTPUT_FILE"
echo " .,::~:~~~~~=~~~~=~~~~~~~~~~~=~~~~." | tee -a "$OUTPUT_FILE"
echo " ,::::~~:~~~=~~~~~~~~=~~=~~~===~~~~~~." | tee -a "$OUTPUT_FILE"
echo " ..:::~~~~=~~=~~~~~~=~~~~=~~===~~==~~~~~~," | tee -a "$OUTPUT_FILE"
echo " .,:::~~~~~~~~~~~~~~~~=~=~~~=~====~===~~~~~~~." | tee -a "$OUTPUT_FILE"
echo " .,::~~~~~~~~~~~~~~=~=~~~~~=~======~=~~~~=~=~~~:" | tee -a "$OUTPUT_FILE"
echo " ..,::~:~~~~~~=~~~=~~~~~~~~=~====+======~===~~~~~~~." | tee -a "$OUTPUT_FILE"
echo " ..,:,:~~~~~~=~::~~=~=~~~=~~=~=~=~======~~~==~~~~~~::." | tee -a "$OUTPUT_FILE"
echo " ,,.::~:=~~~~~~~~~~~~=~=~===~~~====+==~=====~~~~~::,." | tee -a "$OUTPUT_FILE"
echo " ,,,,:I++=:~==~=~~~~~~=~:==~=~+~====~=~===~~~~:~::,:" | tee -a "$OUTPUT_FILE"
echo " .,:+++?77+?=~~~~=~~=~=~~=~~+=~+~~+====~=~~~:::::,::," | tee -a "$OUTPUT_FILE"
echo " ..++++?++?II?=~~=~~~=~~~====~===~=====~~~:~::::::::,." | tee -a "$OUTPUT_FILE"
echo " ..=++?++++++???7+~~~~~~~~+~=~=====~~~~~~~~~::::~:::,,.." | tee -a "$OUTPUT_FILE"
echo " .=+++++++++++++++===:~~=~==+~~=~=~~:~~=~:~:::~::::,,.." | tee -a "$OUTPUT_FILE"
echo " .++++++?++++++?++=?~:~~~~===~==~==~~~~~:::::::::,,,..." | tee -a "$OUTPUT_FILE"
echo " ..=?+++++??+++++++===~::~~~~~~=~~~~~~:~~:::::,:,,,,,." | tee -a "$OUTPUT_FILE"
echo " ...=+?+++++++++=====~:,::,~:::~~~~~:~~~~::::~::::,,,,.." | tee -a "$OUTPUT_FILE"
echo " .=+++++++++++===~==::::,::~~,,,::~~~~~~::::::~:,:,,.." | tee -a "$OUTPUT_FILE"
echo " ..++++++++++=+===~,.,,:::,:~~~~~,.,:~:~::::::,::,:,.." | tee -a "$OUTPUT_FILE"
echo " ...++?++++++++=+=~~. ..,,,,,:,~,::~,:::,:,:,~::::,,.." | tee -a "$OUTPUT_FILE"
echo " .++++++++?++====~. ...,,:,~::~=::,::,:,:::,,,,.." | tee -a "$OUTPUT_FILE"
echo ".++?+++++?++++==~.. .,.:,,:::~,:,,,:::::,,,." | tee -a "$OUTPUT_FILE"
echo "++++++?+???==~=. ...,::~~~:,,:,:::,,." | tee -a "$OUTPUT_FILE"
echo "?+++?????+==~. ..,,,,::,:,,,,,." | tee -a "$OUTPUT_FILE"
echo "+?+++??+==~. ..,,,,,,,,." | tee -a "$OUTPUT_FILE"
echo "+I???+==~. ..,,.." | tee -a "$OUTPUT_FILE"
echo "??++==~." | tee -a "$OUTPUT_FILE"
echo "+===~." | tee -a "$OUTPUT_FILE"
echo "+=~." | tee -a "$OUTPUT_FILE"
echo "~" | tee -a "$OUTPUT_FILE"
echo "--------------------------------------------------------------------------------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "" | tee -a "$OUTPUT_FILE"
# Loop through hitlist
while read -r hitListWords || [[ -n "$hitListWords" ]]
do
# If first character is "#" it's a comment, or line is blank, skip
if [ "$(echo $hitListWords | head -c 1)" != "#" ]; then
if [ ! -z "$hitListWords" -a "$hitListWords" != "" ]; then
# Parse comma delimited category specific hitlist
IFS=',' read -ra categoryWords <<< "$hitListWords"
# Search for occurrences/hits for the hitList word
for categoryWord in "${categoryWords[@]}"; do
echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "$category - \"$categoryWord\"" | tee -a "$OUTPUT_FILE"
echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI "$categoryWord" >> "$OUTPUT_FILE"
find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI --color=always "$categoryWord" | more
echo "" | tee -a "$OUTPUT_FILE"
done
fi
else
category="$(echo "$hitListWords" | cut -d "#" -f 2)"
fi
done < "$HITLIST_FILE"
exit $?
}
# Process the options
while [[ "${ARGLIST[0]}" == -* ]]; do
OPTION="${ARGLIST[0]}"
NUM_OPTS=1;
case $OPTION in
--version)
version
exit 0
;;
--help)
help
exit 0
;;
*)
help
exit 1
;;
esac
ARGLIST=("${ARGLIST[@]:$NUM_OPTS}")
done
FILEARG=${ARGLIST[@]}
process
I am learning to customize wget in a bash function and am having trouble. I would like to display "Downloading (file): %" instead of the messy output of wget. The function below seems close, but I am having trouble calling it for my specific needs.
For example, my standard wget is:
cd 'C:\Users\cmccabe\Desktop\wget'
wget -O getCSV.txt http://xxx.xx.xxx.xxx/data/getCSV.csv
That downloads the .csv as a .txt in the specified directory, along with all the messy wget output.
This function seems like it will do more or less what I need, but I cannot seem to get it to work correctly with my data. Below is what I have tried. Thank you :).
#!/bin/bash
download() {
local url=$1 wget -O getCSV.txt http://xxx.xx.xxx.xxx/data/getCSV.csv
local destin=$2 'C:\Users\cmccabe\Desktop\wget'
echo -n " "
if [ "$destin" ]; then
wget --progress=dot "$url" -O "$destin" 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
else
wget --progress=dot "$url" 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
fi
echo -ne "\b\b\b\b"
echo " DONE"
}
EDITED CODE
#!/bin/bash
download () {
url=http://xxx.xx.xxx.xxx/data/getCSV.csv
destin='C:\Users\cmccabe\Desktop\wget'
echo -n " "
if [ "$destin" ]; then
wget -O getCSV.txt --progress=dot "$url" -O "$destin" 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
else
wget -O getCSV.txt --progress=dot $url 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
fi
echo -ne "\b\b\b\b"
echo " DONE"
menu
}
menu() {
while true
do
printf "\n Welcome to NGS menu (v1), please make a selection from the MENU \n
==================================\n\n
\t 1 Patient QC\n
==================================\n\n"
printf "\t Your choice: "; read menu_choice
case "$menu_choice" in
1) patient ;;
*) printf "\n Invalid choice."; sleep 2 ;;
esac
done
}
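For reference, here is a minimal sketch of how such a function is usually shaped: the URL and destination arrive as arguments ($1 and $2) when the function is called, and the wget command itself does not go on the "local" lines. The progress_filter name is hypothetical; its pipeline is the one from the attempt above and relies on GNU sed's -u switch. It assumes wget's dot-style progress lines end in a percentage followed by speed and remaining time.

```shell
#!/bin/bash
# progress_filter: reads "wget --progress=dot" output on stdin and redraws
# only the percentage in place (hypothetical helper; same pipeline as above).
progress_filter() {
    grep --line-buffered "%" |
        sed -u -e "s,\.,,g" |
        awk '{ printf("\b\b\b\b%4s", $2) }'
}

# download URL [DESTINATION_FILE]
download() {
    local url=$1 destin=$2
    printf 'Downloading %s:     ' "${destin:-$url}"
    if [ -n "$destin" ]; then
        wget --progress=dot -O "$destin" "$url" 2>&1 | progress_filter
    else
        wget --progress=dot "$url" 2>&1 | progress_filter
    fi
    printf '\b\b\b\b DONE\n'
}
```

With the values from the question, the call would be `download http://xxx.xx.xxx.xxx/data/getCSV.csv getCSV.txt`, run after changing into the target directory.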
I am checking to see if a process on a remote server has been killed. The code I'm using is:
if [ `ssh -t -t -i id_dsa headless@remoteserver.com "ps -auxwww |grep pipeline| wc -l" | sed -e 's/^[ \t]*//'` -lt 3 ]
then
echo "PIPELINE STOPPED SUCCESSFULLY"
exit 0
else
echo "PIPELINE WAS NOT STOPPED SUCCESSFULLY"
exit 1
fi
However when I execute this I get:
: integer expression expected
PIPELINE WAS NOT STOPPED SUCCESSFULLY
1
The actual value returned is "1" with no whitespace. I checked that by:
vim <(ssh -t -t -i id_dsa headless@remoteserver.com "ps -auxwww |grep pipeline| wc -l" | sed -e 's/^[ \t]*//')
and then ":set list" which showed only the integer and a line feed as the returned value.
I'm at a loss here as to why this is not working.
If the output of the ssh command is truly just an integer preceded by optional tabs, then you shouldn't need the sed command; the shell will strip the leading and/or trailing whitespace as unnecessary before using it as an operand for the -lt operator.
if [ $(ssh -tti id_dsa headless@remoteserver.com "ps -auxwww | grep -c pipeline") -lt 3 ]; then
It is possible that the result of the ssh command is not the same when you run it manually as when it runs in the script. You might try saving it in a variable so you can print it before testing it in your script:
result=$( ssh -tti id_dsa headless@remoteserver.com "ps -auxwww | grep -c pipeline" )
if [ $result -lt 3 ];
The return value you get is not made up entirely of digits. Maybe some shell metacharacter, linefeed, or similar is getting in your way here:
#!/bin/bash
var=$(ssh -t -t -i id_dsa headless@remoteserver.com "ps auxwww |grep -c pipeline")
echo $var
# just to prove my point here
# Remove all digits and check whether anything is left -> then it's not an integer
test -z "$var" -o -n "`echo $var | tr -d '[0-9]'`" && echo not-integer
# keep only the digits (on a single line) to use them for the arithmetic comparison
var2=$(tr -cd '0-9' <<<"$var")
echo $var2
if [[ $var2 -lt 3 ]]
then
echo "PIPELINE STOPPED SUCCESSFULLY"
exit 0
else
echo "PIPELINE WAS NOT STOPPED SUCCESSFULLY"
exit 1
fi
As user mbratch noticed, I was getting a "\r" in the returned value in addition to the expected "\n". So I changed my sed script so that it stripped out the "\r" instead of the whitespace (which chepner pointed out was unnecessary).
sed -e 's/\r*$//'
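The failure can be reproduced without ssh. The stray "\r" most likely comes from the pseudo-terminal forced by -t -t, since terminals typically turn a newline into CR LF. The sketch below simulates that and strips the carriage return with bash parameter expansion instead of sed. The helper names strip_cr and is_below are hypothetical.

```shell
#!/bin/bash
# strip_cr VALUE: print VALUE with every carriage return removed (hypothetical helper)
strip_cr() {
    printf '%s' "${1//$'\r'/}"
}

# is_below VALUE LIMIT: succeeds only when VALUE is an integer smaller than LIMIT.
# The error message from [ is silenced; a non-integer simply counts as failure.
is_below() {
    [ "$1" -lt "$2" ] 2>/dev/null
}
```

With `result=$'1\r'` standing in for the ssh output, `is_below "$result" 3` fails (that is the "integer expression expected" case), while `is_below "$(strip_cr "$result")" 3` succeeds.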