I have been trying to download the images listed in a CSV file full of image URLs (about 1000).
Each row is a specific product with the first cell labelled "id".
I have taken the ID from each row in Excel and created a directory for each one using a loop with mkdir.
My issue now is that I can't figure out how to download each image and immediately store it in these folders.
What I am attempting here is to concatenate "EXT" and "fold_name" to get a directory path like "/name_of_folder", then take the links to the images (in cells 5, 6, 7 and 8) and run wget on them so that the files end up in that directory.
Can anyone assist me with this?
I think this should be straightforward enough.
Thank you!
#!/usr/bin/bash
EXT='/'
while read line
do
fold_name= cut -d$',' -f1
concat= "%EXT" + "%fold_name"
img1= cut -d$',' -f5
img2= cut -d$',' -f6
img3= cut -d$',' -f7
img4= cut -d$',' -f8
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
done < file.csv
You can use the -P switch to designate the target directory. Consider the following simple example, which uses some files from the test-images/png repository:
mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
This will lead to the following structure:
black
    cs-black-000.png
gray
    cs-gray-7f7f7f.png
white
    cs-white-fff.png
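Applied to the CSV from the question, the same switch might be used in a loop like the sketch below. It assumes that no cell contains a quoted comma, that the header row starts with "id", and that the per-ID directories live under one root directory. The names download_rows and WGET are hypothetical, added here only so the loop can be dry-run before anything is downloaded.

```shell
#!/usr/bin/bash
# Sketch only: download_rows and the WGET override are illustrative names.
# Assumes plain comma-separated fields (no quoted commas inside a cell).
download_rows() {
    local csv=$1 root=${2:-.}
    local line id url
    local -a fields
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}                      # tolerate CRLF endings from Excel exports
        IFS=',' read -r -a fields <<<"$line"
        id=${fields[0]}
        if [[ -z $id || $id == id ]]; then      # skip blank lines and the header row
            continue
        fi
        for url in "${fields[@]:4:4}"; do       # cells 5 to 8 hold the image links
            if [[ -n $url ]]; then
                ${WGET:-wget} -P "$root/$id" "$url"
            fi
        done
    done <"$csv"
}
```

A dry run such as WGET=echo download_rows file.csv . prints one "-P ./<id> <url>" line per image instead of downloading anything.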
You should use variable names that are less ambiguous.
You need to provide the directory as part of the output filename.
"%" is not a bash variable designator; it is a formatting directive (for printf in bash, awk, C, etc.). Bash variables are expanded with "$", as in "$fold_name" or "${fold_name}".
The following will provide what you want.
#!/usr/bin/bash
DBG=1
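# Demo only: the sample data below is written to file.csv, so the first
# command-line argument is deliberately not used as the input path.
INPUT="file.csv"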
cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt
if [ ${DBG} -eq 1 ]
then
echo -e "\n Input file:"
cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
echo -e "\n Hit return to continue ..." ; read k
fi
REPO_ROOT='/tmp'
grep -v '^#' "${INPUT}" |
while IFS= read -r line
do
topic_name=$(echo "${line}" | cut -f1 -d\, )
test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."
folder="${REPO_ROOT}/${topic_name}"
test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."
if [ ! -d "${folder}" ]
then
mkdir "${folder}"
else
rm -f "${folder}/"*
fi
if [ ! -d "${folder}" ]
then
echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2
else
test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'
url1=$(echo "${line}" | cut -d\, -f5 )
url2=$(echo "${line}" | cut -d\, -f6 )
url3=$(echo "${line}" | cut -d\, -f7 )
url4=$(echo "${line}" | cut -d\, -f8 )
test ${DBG} -eq 1 && {
echo -e "\n URLs extracted:"
echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
}
#imageFile1=$( basename "${url1}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile2=$( basename "${url2}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile3=$( basename "${url3}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile4=$( basename "${url4}" | sed 's+^img_+yourImagePrefix_+' )
imageFile1=$( basename "${url1}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile2=$( basename "${url2}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile3=$( basename "${url3}" | sed 's+^cs-+yourImagePrefix_+' )
test ${DBG} -eq 1 && {
echo -e "\n Image filenames assigned:"
#echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
}
test ${DBG} -eq 1 && {
echo -e "\n WGET process log:"
}
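### This form of wget does NOT work for me: when -O is given, wget uses it as
### the literal output path and the -P directory prefix is not applied.
#wget -P "${folder}" -O "${imageFile1}" "${url1}"
### This form of wget DOES work for me (directory included in the -O path)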
wget -O "${folder}/${imageFile1}" "${url1}"
wget -O "${folder}/${imageFile2}" "${url2}"
wget -O "${folder}/${imageFile3}" "${url3}"
#wget -O "${folder}/${imageFile3}" "${url3}"
test ${DBG} -eq 1 && {
echo -e "\n Listing of downloaded files:"
ls -l "${REPO_ROOT}"/topic* 2>>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
}
fi
done
The script is adapted for what I had to work with. :-)
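As a smaller illustration of "provide the directory as part of the output filename", the output path can also be derived with bash parameter expansion instead of basename and sed. This is only a sketch: target_path is a hypothetical helper name, and dropping the query string is an assumption about what the real URLs might look like.

```shell
#!/usr/bin/bash
# Sketch only: target_path is an illustrative helper, not part of the script above.
# Prints ROOT/ID/FILENAME, where FILENAME is the last path component of URL.
target_path() {
    local root=$1 id=$2 url=$3 name
    name=${url##*/}       # keep the text after the last slash (like basename)
    name=${name%%\?*}     # drop a query string such as "?size=large", if any
    printf '%s/%s/%s\n' "$root" "$id" "$name"
}
```

It could then be used as wget -O "$(target_path "${REPO_ROOT}" "${topic_name}" "${url1}")" "${url1}".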
I have a script that tails a log file, and then uploads the line. I would like to have it exit as soon as the first line is read:
#!/bin/bash
tail -n0 -F "$1" | while read LINE; do
(echo "$LINE" | grep -e "$3") && curl -X POST --silent --data-urlencode \
"payload={\"text\": \"$(echo $LINE | sed "s/\"/'/g")\"}" "$2";
done
If you want to exit as soon as the first line is uploaded you can just add a break:
#!/bin/bash
tail -n0 -F "$1" | while read LINE; do
(echo "$LINE" | grep -e "$3") && curl -X POST --silent --data-urlencode \
"payload={\"text\": \"$(echo $LINE | sed "s/\"/'/g")\"}" "$2" && break;
done
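The issue was that the tail command wasn't getting killed: after the loop ends, tail keeps running until its next write to the closed pipe fails. Here is a slightly modified version of my script (I didn't end up needing the echo to stdout):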
#!/bin/bash
tail -n0 -F "$1" | while read -r LINE; do
curl -X POST --data-urlencode "payload={\"text\": \"$(echo "$LINE" | sed "s/\"/'/g")\"}" "$2" && pkill -P $$ tail
done
This answer helped as well: https://superuser.com/questions/270529/monitoring-a-file-until-a-string-is-found
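An alternative to pkill, sketched below, is to run tail in the background writing to a named pipe, so that its PID is known and only that one process is killed. This is a hedged sketch, not the method from the answers above: first_match and its optional third argument are hypothetical, and the printf stands in for the curl upload.

```shell
#!/bin/bash
# Sketch only: first_match is an illustrative name.
# Prints the first line of FILE matching PATTERN, then stops tail.
# The optional third argument is passed to "tail -n" (default 0 = new lines only).
first_match() {
    local file=$1 pattern=$2 start=${3:-0}
    local dir fifo pid line found=1
    dir=$(mktemp -d) || return 1
    fifo=$dir/lines
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }

    # Run tail in the background so that its PID is available in $!.
    tail -n "$start" -F "$file" >"$fifo" 2>/dev/null &
    pid=$!

    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -q -e "$pattern"; then
            printf '%s\n' "$line"    # the curl upload would go here
            found=0
            break
        fi
    done <"$fifo"

    # Stop exactly the tail started above and clean up the named pipe.
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
    rm -rf "$dir"
    return "$found"
}
```

Called as first_match "$1" "$3", it blocks until a matching line is appended to the file and returns once tail has been stopped.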
I have a txt file (which often gets updated) called hitlist.txt containing a list of words/strings I want to grep a directory against ... like:
# This is just a comment and will not be part of the search
* Blah - this is a category
foo
bar
sibilance
# A new category
* Meh - another category
snakefish
sex panther
My list is typically > 100 strings, and each is on its own line. Today, because of a deadline, I simply went through the list and executed the following command for each word:
find -iname "*" -type f -print0 | xargs -0 grep -HniI "foo" >> results.txt
As indicated in the command above, I am interested in the file path and name, as well as the line that contains the matched text. There are multiple categories listed in the file (denoted by *) and I would like to be able to run my script against one, more, or all categories.
I would also like to be able to turn off the -i flag (case insensitivity) as an option. I have a script that recursively finds/lists all files in a directory, and the command I have been using above. Lastly, the hitlist format can be changed completely if necessary.
Set up a ghl() (grep hitlist) shell function to do the work (it depends on GNU grep's -o switch, plus a little sed loop). Its output is a list of words from hitlist.txt (or <filename>):
# usage ghl <glob> <filename>
ghl() { grep -o '\* '"$1"' -' "$2" | grep -o '[[:alpha:]]*' | \
while read x ; do \
sed -n '/\* '"$1"'/{:show ;n;/^[^ ]/{p;b show;}}' "$2" ; \
done ; }
Pipe the word list output of ghl with an ".*ah" wildcard (which matches the Blah category) into grep -f -, plus some ad hoc bash process substitution to generate input text:
ghl '.*ah' hitlist.txt | grep -i -f - <(echo bar) <(echo foo) <(echo Foo)
Output:
/dev/fd/63:bar
/dev/fd/62:foo
/dev/fd/61:Foo
The 2nd grep above can be passed switches as desired, (see man grep). Example, the same thing, but case sensitive, (i.e. remove the -i switch):
ghl '.*ah' hitlist.txt | grep -f - <(echo bar) <(echo foo) <(echo Foo)
Output, (note missing uppercase item):
/dev/fd/63:bar
/dev/fd/62:foo
Since grep already has options to handle recursive searches, the rest is only a matter of adding switches as required.
Your question is extremely vague, but I'm imagining this is more or less what you are looking for.
awk -v cat='Blah|Meh' 'NR==FNR && /^#/ { next } # Skip comments
NR==FNR && /^\*/ { if ($0~cat) c=1; else c=0; next }
NR==FNR { if(c) a[$0]=1; next }
tolower($0) in a { print FILENAME ":" FNR ":" $0 }' Hits.txt files to search
Figuring out how to selectively disable tolower() and rigging it to read a list of file names other than Hits.txt from find should be fairly obvious.
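One way the remaining pieces might fit together is sketched below, using the hitlist format from the question ("#" comments, "* Name - description" category headers, one word per line). The names category_words and search_hits are hypothetical, the category argument is treated as an extended regular expression, and GNU grep is assumed for -r and for reading patterns with -f -.

```shell
#!/bin/bash
# Sketch only: category_words and search_hits are illustrative names.
# Print the hit words that belong to the categories matching the regex $1.
category_words() {
    local categories=$1 hitlist=$2
    awk -v cat="$categories" '
        /^#/ || /^[[:space:]]*$/ { next }                      # skip comments and blank lines
        /^\*/ { keep = ($0 ~ ("^\\* *(" cat ") ")); next }     # category header line
        keep  { print }                                        # word in a selected category
    ' "$hitlist"
}

# Search DIR recursively for the selected words as fixed strings.
# Any extra arguments (for example -i) are passed through to grep.
search_hits() {
    local categories=$1 hitlist=$2 dir=$3
    shift 3
    category_words "$categories" "$hitlist" | grep -rHnIF "$@" -f - -- "$dir"
}
```

For example, search_hits 'Blah|Meh' hitlist.txt some_dir -i would search case-insensitively, and leaving out -i makes the search case-sensitive.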
This is what I ended up with:
hitlist format:
# MEH
never,going,to give,you up
# blah
word to,your,mother
Script:
# Set defaults
OUTPUT_FILE="hits.txt"
HITLIST_FILE="hitlist.txt"
# Hold on to the args
ARGLIST=("$@")
# Declare any functions
help ()
{
echo "--------------------------------- Luffa --------------------------------"
echo "Usage: luffa.sh [DIRTOSCRUB]"
echo ""
echo "Searches DIRTOSCRUB for category specific words in $HITLIST_FILE."
echo ""
echo "EXAMPLE: luffa.sh dirtoscrub"
echo ""
echo "--help display this help and exit"
echo "--version display version information and exit"
}
version ()
{
echo "luffa.sh v1.0"
}
process ()
{
if [ ${#FILEARG} -lt 1 ] # check for proper number of args
then
echo "ERROR: Specify directory to be searched."
help
exit 1
else
SEARCH_DIR=${ARGLIST[0]}
fi
echo ""
echo "--------------------------------------------------------- Luffa ---------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "search command: find [DIRTOSCRUB] -type f -print0 | xargs -0 grep -HniI --color=always $word | tee -a ../hits.txt | more" | tee -a "$OUTPUT_FILE"
echo
echo " .,,:::::." | tee -a "$OUTPUT_FILE"
echo " .,,::::~:::::.." | tee -a "$OUTPUT_FILE"
echo " ,,::::~~~~~~::~~:::." | tee -a "$OUTPUT_FILE"
echo " ,:,:~:~~~~~~~~~~~~~~::." | tee -a "$OUTPUT_FILE"
echo " ,,:::~:~~~~~~~~~~~~~~~~~~," | tee -a "$OUTPUT_FILE"
echo " .,,::::~~~~~~~~~~~~~~~~~~~~~~::" | tee -a "$OUTPUT_FILE"
echo " .,::~:~~~~~=~~~~=~~~~~~~~~~~=~~~~." | tee -a "$OUTPUT_FILE"
echo " ,::::~~:~~~=~~~~~~~~=~~=~~~===~~~~~~." | tee -a "$OUTPUT_FILE"
echo " ..:::~~~~=~~=~~~~~~=~~~~=~~===~~==~~~~~~," | tee -a "$OUTPUT_FILE"
echo " .,:::~~~~~~~~~~~~~~~~=~=~~~=~====~===~~~~~~~." | tee -a "$OUTPUT_FILE"
echo " .,::~~~~~~~~~~~~~~=~=~~~~~=~======~=~~~~=~=~~~:" | tee -a "$OUTPUT_FILE"
echo " ..,::~:~~~~~~=~~~=~~~~~~~~=~====+======~===~~~~~~~." | tee -a "$OUTPUT_FILE"
echo " ..,:,:~~~~~~=~::~~=~=~~~=~~=~=~=~======~~~==~~~~~~::." | tee -a "$OUTPUT_FILE"
echo " ,,.::~:=~~~~~~~~~~~~=~=~===~~~====+==~=====~~~~~::,." | tee -a "$OUTPUT_FILE"
echo " ,,,,:I++=:~==~=~~~~~~=~:==~=~+~====~=~===~~~~:~::,:" | tee -a "$OUTPUT_FILE"
echo " .,:+++?77+?=~~~~=~~=~=~~=~~+=~+~~+====~=~~~:::::,::," | tee -a "$OUTPUT_FILE"
echo " ..++++?++?II?=~~=~~~=~~~====~===~=====~~~:~::::::::,." | tee -a "$OUTPUT_FILE"
echo " ..=++?++++++???7+~~~~~~~~+~=~=====~~~~~~~~~::::~:::,,.." | tee -a "$OUTPUT_FILE"
echo " .=+++++++++++++++===:~~=~==+~~=~=~~:~~=~:~:::~::::,,.." | tee -a "$OUTPUT_FILE"
echo " .++++++?++++++?++=?~:~~~~===~==~==~~~~~:::::::::,,,..." | tee -a "$OUTPUT_FILE"
echo " ..=?+++++??+++++++===~::~~~~~~=~~~~~~:~~:::::,:,,,,,." | tee -a "$OUTPUT_FILE"
echo " ...=+?+++++++++=====~:,::,~:::~~~~~:~~~~::::~::::,,,,.." | tee -a "$OUTPUT_FILE"
echo " .=+++++++++++===~==::::,::~~,,,::~~~~~~::::::~:,:,,.." | tee -a "$OUTPUT_FILE"
echo " ..++++++++++=+===~,.,,:::,:~~~~~,.,:~:~::::::,::,:,.." | tee -a "$OUTPUT_FILE"
echo " ...++?++++++++=+=~~. ..,,,,,:,~,::~,:::,:,:,~::::,,.." | tee -a "$OUTPUT_FILE"
echo " .++++++++?++====~. ...,,:,~::~=::,::,:,:::,,,,.." | tee -a "$OUTPUT_FILE"
echo ".++?+++++?++++==~.. .,.:,,:::~,:,,,:::::,,,." | tee -a "$OUTPUT_FILE"
echo "++++++?+???==~=. ...,::~~~:,,:,:::,,." | tee -a "$OUTPUT_FILE"
echo "?+++?????+==~. ..,,,,::,:,,,,,." | tee -a "$OUTPUT_FILE"
echo "+?+++??+==~. ..,,,,,,,,." | tee -a "$OUTPUT_FILE"
echo "+I???+==~. ..,,.." | tee -a "$OUTPUT_FILE"
echo "??++==~." | tee -a "$OUTPUT_FILE"
echo "+===~." | tee -a "$OUTPUT_FILE"
echo "+=~." | tee -a "$OUTPUT_FILE"
echo "~" | tee -a "$OUTPUT_FILE"
echo "--------------------------------------------------------------------------------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "" | tee -a "$OUTPUT_FILE"
# Loop through hitlist
while read -r hitListWords || [[ -n "$hitListWords" ]]
do
# If first character is "#" the line names a category; blank lines are skipped
if [ "$(echo "$hitListWords" | head -c 1)" != "#" ]; then
if [ -n "$hitListWords" ]; then
# Parse comma delimited category specific hitlist
IFS=',' read -ra categoryWords <<< "$hitListWords"
# Search for occurrences/hits for the hitList word
for categoryWord in "${categoryWords[@]}"; do
echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "$category - \"$categoryWord\"" | tee -a "$OUTPUT_FILE"
echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI -- "$categoryWord" >> "$OUTPUT_FILE"
find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI --color=always -- "$categoryWord" | more
echo "" | tee -a "$OUTPUT_FILE"
done
fi
else
category="$(echo "$hitListWords" | cut -d "#" -f 2)"
fi
done < "$HITLIST_FILE"
exit $?
}
# Process the options
while [[ "${ARGLIST[0]}" == -* ]]; do
OPTION="${ARGLIST[0]}"
NUM_OPTS=1;
case $OPTION in
--version)
version
exit 0
;;
--help)
help
exit 0
;;
*)
help
exit 1
;;
esac
ARGLIST=("${ARGLIST[@]:$NUM_OPTS}")
done
FILEARG="${ARGLIST[*]}"
process
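The hitlist loop from the script above can be tried in isolation with a sketch like the following, which only prints category and word pairs instead of running grep. The name parse_hitlist is hypothetical, and the tab-separated output is an assumption made here to keep words with spaces intact.

```shell
#!/bin/bash
# Sketch only: parse_hitlist is an illustrative name.
# Reads the comma-delimited hitlist format ("# CATEGORY" header lines followed
# by comma-separated words) and prints one "category<TAB>word" pair per line.
parse_hitlist() {
    local line category='' word
    local -a words
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -z $line ]]; then
            continue                      # skip blank lines
        fi
        if [[ $line == '#'* ]]; then
            category=${line#\#}           # drop the leading "#"
            category=${category# }        # and one optional space after it
            continue
        fi
        IFS=',' read -ra words <<<"$line"
        for word in "${words[@]}"; do
            printf '%s\t%s\n' "$category" "$word"
        done
    done <"$1"
}
```

Running parse_hitlist hitlist.txt is a quick way to check that every word is attributed to the intended category before starting a long search.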
Looking for an intercepting proxy made with netcat I found this script:
#!/bin/sh -e
if [ $# != 3 ]
then
echo "usage: $0 <src-port> <dst-host> <dst-port>"
exit 0
fi
TMP=`mktemp -d`
BACK=$TMP/pipe.back
SENT=$TMP/pipe.sent
RCVD=$TMP/pipe.rcvd
trap 'rm -rf "$TMP"' EXIT
mkfifo -m 0600 "$BACK" "$SENT" "$RCVD"
sed 's/^/ => /' <"$SENT" &
sed 's/^/<= /' <"$RCVD" &
nc -l -p "$1" <"$BACK" | tee "$SENT" | nc "$2" "$3" | tee "$RCVD" >"$BACK"
It works nicely, as expected.
Since I need to look closely at the encoding used, and hence at the actual bytes passing through, I tried to change some lines to use hexdump -vC:
#!/bin/sh -e
if [ $# != 3 ]
then
echo "usage: $0 <src-port> <dst-host> <dst-port>"
exit 0
fi
TMP=`mktemp -d`
BACK=$TMP/pipe.back
SENT=$TMP/pipe.sent
RCVD=$TMP/pipe.rcvd
trap 'rm -rf "$TMP"' EXIT
mkfifo -m 0600 "$BACK" "$SENT" "$RCVD"
( hexdump -vC | sed 's/^/ => /' ) <"$SENT" &
( hexdump -vC | sed 's/^/<= /' ) <"$RCVD" &
nc -l -p "$1" <"$BACK" | tee "$SENT" | nc "$2" "$3" | tee "$RCVD" >"$BACK"
Now it's not working anymore. More precisely, I've lost the "realtime" feature of the previous script: every byte sent is dumped in a single batch, then every byte received in another batch, and all of this happens only after the connection is closed.
I suspect some sort of buffering occurs in the pipe (|), but I'm not sure how to:
test this hypothesis;
fix the script to make it work in realtime again.
PS1. I'm using cygwin.
PS2. sh --version outputs:
GNU bash, version 4.1.10(4)-release (i686-pc-cygwin)
Edit:
Removing the | sed ... part (that is, leaving only hexdump -vC <"$SENT" and hexdump -vC <"$RCVD") brings the realtime feature back, increasing my suspicion of the pipeline operator. But the output turns out to be confusing, since sent and received bytes are mixed.
I still couldn't resolve the buffering (?) issue, but I could change the hexdump invocation to make the sed unnecessary:
#!/bin/sh -e
if [ $# != 3 ]
then
echo "usage: $0 <src-port> <dst-host> <dst-port>"
exit 0
fi
TMP=`mktemp -d`
BACK=$TMP/pipe.back
SENT=$TMP/pipe.sent
RCVD=$TMP/pipe.rcvd
trap 'rm -rf "$TMP"' EXIT
mkfifo -m 0600 "$BACK" "$SENT" "$RCVD"
hexdump -v -e '" => %08.8_Ax\n"' -e '" => %08.8_ax " 8/1 "%02x " " " 8/1 "%02x "' -e '" |" 16/1 "%_p" "|\n"' <"$SENT" &
hexdump -v -e '"<= %08.8_Ax\n"' -e '"<= %08.8_ax " 8/1 "%02x " " " 8/1 "%02x "' -e '" |" 16/1 "%_p" "|\n"' <"$RCVD" &
nc -l "$1" <"$BACK" | tee "$SENT" | nc "$2" "$3" | tee "$RCVD" >"$BACK"
Yes, the new hexdump looks ugly, but it works.
For me this question is now open just for the sake of curiosity. I'm still willing to give the "correct answer" points to whoever explains (and fixes) the buffering (?) behavior.
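One way to test the buffering hypothesis is to measure how long a filter takes to emit its first output line when its input arrives in two bursts. The sketch below does only that measurement; first_line_delay is a hypothetical name, and the explanation it is meant to probe (C stdio switching from line buffering to full buffering when stdout is a pipe rather than a terminal) is an assumption about this particular hexdump build, not something confirmed in the thread.

```shell
#!/bin/bash
# Sketch only: first_line_delay is an illustrative name.
# Feeds the given command one 16-byte line, pauses 2 seconds, then feeds a
# second line. Prints the whole seconds that passed before the command emitted
# its first output line.
first_line_delay() {
    local start=$SECONDS
    { printf '0123456789abcde\n'; sleep 2; printf 'fedcba987654321\n'; } |
        "$@" |
        { IFS= read -r _; echo $((SECONDS - start)); cat >/dev/null; }
}
```

Comparing first_line_delay hexdump -vC with first_line_delay stdbuf -oL hexdump -vC should show whether hexdump holds its output back when writing to a pipe: a result of 2 or more for the first and 0 or 1 for the second would support the hypothesis. If so, a candidate fix would be to replace hexdump -vC with stdbuf -oL hexdump -vC in the two sed pipelines, assuming stdbuf from GNU coreutils is available and effective under Cygwin.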