How can I use wget to download specific files in a CSV file, and then store those files into specific directories?

How can I use wget to download specific files in a CSV file, and then store those files into specific directories? - bash

I have been attempting to extract a CSV file full of URL's of images (about 1000).
Each row is a specific product with the first cell labelled "id".
I have taken the ID of each line in excel and created directories for them using a loop with mkdir.
My issue now is that I can't seem to figure out how to download the image, and then immediately store it into these folder's.
What I am attempting here is to use wget by concatenating "fold_name" and "EXT" to get it like a directory "/name_of_folder", and then getting the links to the images (in cell 5,6,7 and 8) and then using wget from these cells, into the directory.
Can anyone assist me with this?
I think this should be straight forward enough.
Thank you!
#!/usr/bin/bash
EXT='/'
while read line
do
fold_name= cut -d$',' -f1
concat= "%EXT" + "%fold_name"
img1= cut -d$',' -f5
img2= cut -d$',' -f6
img3= cut -d$',' -f7
img4= cut -d$',' -f8
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
done < file.csv

You might use -P switch to designate target directory, consider following simple example using some files from test-images/png repository
mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
will lead to following structure
black
cs-black-000.png
gray
cs-gray-7f7f7f.png
white
cs-white-fff.png

You should use variables names that are less ambiguous.
You need to provide the directory as part of the output filename.
"%" is not a bash variable designator. That is a formatting directive (for bash, awk, C, etc.).
The following will provide what you want.
#!/usr/bin/bash
DBG=1
INPUT="${1}"
INPUT="file.csv"
cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt
if [ ${DBG} -eq 1 ]
then
echo -e "\n Input file:"
cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
echo -e "\n Hit return to continue ..." ; read k
fi
REPO_ROOT='/tmp'
grep -v '^#' "${INPUT}" |
while read line
do
topic_name=$(echo "${line}" | cut -f1 -d\, )
test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."
folder="${REPO_ROOT}/${topic_name}"
test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."
if [ ! -d "${folder}" ]
then
mkdir "${folder}"
else
rm -f "${folder}/"*
fi
if [ ! -d "${folder}" ]
then
echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2
else
test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'
url1=$(echo "${line}" | cut -d\, -f5 )
url2=$(echo "${line}" | cut -d\, -f6 )
url3=$(echo "${line}" | cut -d\, -f7 )
url4=$(echo "${line}" | cut -d\, -f8 )
test ${DBG} -eq 1 && {
echo -e "\n URLs extracted:"
echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
}
#imageFile1=$( basename "${url1}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile2=$( basename "${url2}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile3=$( basename "${url3}" | sed 's+^img_+yourImagePrefix_+' )
#imageFile4=$( basename "${url4}" | sed 's+^img_+yourImagePrefix_+' )
imageFile1=$( basename "${url1}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile2=$( basename "${url2}" | sed 's+^cs-+yourImagePrefix_+' )
imageFile3=$( basename "${url3}" | sed 's+^cs-+yourImagePrefix_+' )
test ${DBG} -eq 1 && {
echo -e "\n Image filenames assigned:"
#echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
}
test ${DBG} -eq 1 && {
echo -e "\n WGET process log:"
}
### This form of wget does NOT work for me, although man page says it should.
#wget -P "${folder}" -O "${imageFile1}" "${url1}"
### This form of wget DOES work for me
wget -O "${folder}/${imageFile1}" "${url1}"
wget -O "${folder}/${imageFile2}" "${url2}"
wget -O "${folder}/${imageFile3}" "${url3}"
#wget -O "${folder}/${imageFile3}" "${url3}"
test ${DBG} -eq 1 && {
echo -e "\n Listing of downloaded files:"
ls -l /tmp/topic* 2>>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
}
fi
done
The script is adapted for what I had to work with. :-)

Related

bash optional command in variable

i have a code:
L12(){
echo -e "/tftpboot/log/archive/L12/*/*$sn*L12*.log /tftpboot/log/diag/*$sn*L12*.log"
command="| grep -v hdd"
}
getlog(){
echo $(ls -ltr $(${1}) 2>/dev/null `${command}` | tail -1)
}
however $command does not seem to be inserting | grep -v hdd correctly
i need $command to be either empty or | grep
is there a simple solution to my issue or should i go for different approach
edit:
there may be another problem in there
i am loading a few "modules"
EVAL.sh
ev(){
case "${1}" in
*FAIL*) paint $red "FAIL";;
*PASS*) paint $green "PASS";;
*)echo;;
esac
result=${1}
}
rackinfo.sh (the "main script")
#! /bin/bash
#set -x
n=0
for src in $(ls modules/)
do
source modules/$src && ((n++))
## debugging
# source src/$src || ((n++)) || echo "there may be an issue in $src"
done
## debugging
# x=($n - $(ls | grep src | wc -l))
# echo -e "$x plugin(s) failed to laod correctly"
# echo -e "loaded $n modules"
########################################################################
command=cat
tests=("L12" "AL" "BI" "L12-3")
while read sn
do
paint $blue "$sn\t"
for test in ${tests[#]}
do
log="$(ev "$(getlog ${test})")"
if [[ -z ${log} ]]
then
paint $cyan "${test} "; paint $red "!LOG "
else
paint $cyan "${test} ";echo -ne "$log "
fi
done
echo
done <$1
the results i get are still containing "hdd" for L12()

Set command to cat as a default.
Also, it's best to use an array for commands with arguments, in case any of the arguments is multiple words.
There's rarely a reason to write echo $(command). That's essentially the same as just writing command.
#default command does nothing
command=(cat)
L12(){
echo -e "/tftpboot/log/archive/L12/*/*$sn*L12*.log /tftpboot/log/diag/*$sn*L12*.log"
command=(grep -v hdd)
}
getlog(){
ls -ltr $(${1}) 2>/dev/null | "${command[#]}" | tail -1)
}

Shell: Add string to the end of each line, which match the pattern. Filenames are given in another file

I'm still new to the shell and need some help.
I have a file stapel_old.
Also I have in the same directory files like english_old_sync, math_old_sync and vocabulary_old_sync.
The content of stapel_old is:
english
math
vocabulary
The content of e.g. english is:
basic_grammar.md
spelling.md
orthography.md
I want to manipulate all files which are given in stapel_old like in this example:
take the first line of stapel_old 'english', (after that math, and so on)
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
search in english_old_sync line by line for the pattern '.md'
And append to each line after .md :::#a1
The result should be e.g. of english_old_sync:
basic_grammar.md:::#a1
spelling.md:::#a1
orthography.md:::#a1
of math_old_sync:
geometry.md:::#a1
fractions.md:::#a1
and so on. stapel_old should stay unchanged.
How can I realize that?
I tried with sed -n, while loop (while read -r line), and I'm feeling it's somehow the right way - but I still get errors and not the expected result after 4 hours inspecting and reading.
Thank you!
EDIT
Here is the working code (The files are stored in folder 'olddata'):
clear
echo -e "$(tput setaf 1)$(tput setab 7)Learning directories:$(tput sgr 0)\n"
# put here directories which should not become flashcards, command: | grep -v 'name_of_directory_which_not_to_learn1' | grep -v 'directory2'
ls ../ | grep -v 00_gliederungsverweise | grep -v 0_weiter | grep -v bibliothek | grep -v notizen | grep -v Obsidian | grep -v z_nicht_uni | tee olddata/stapel_old
# count folders
echo -ne "\nHow much different folders: " && wc -l olddata/stapel_old | cut -d' ' -f1 | tee -a olddata/stapel_old
echo -e "Are this learning directories correct? [j ODER y]--> yes; [Other]-->no\n"
read lernvz_korrekt
if [ "$lernvz_korrekt" = j ] || [ "$lernvz_korrekt" = y ];
then
read -n 1 -s -r -p "Learning directories correct. Press any key to continue..."
else
read -n 1 -s -r -p "Learning directories not correct, please change in line 4. Press any key to continue..."
exit
fi
echo -e "\n_____________________________\n$(tput setaf 6)$(tput setab 5)Found cards:$(tput sgr 0)$(tput setaf 6)\n"
#GET && WRITE FOLDER NAMES into olddata/stapel_old
anzahl_zeilen=$(cat olddata/stapel_old |& tail -1)
#GET NAMES of .md files of every stapel and write All to 'stapelname'_old_sync
i=0
name="var_$i"
for (( num=1; num <= $anzahl_zeilen; num++ ))
do
i="$((i + 1))"
name="var_$i"
name=$(cat olddata/stapel_old | sed -n "$num"p)
find ../$name/ -name '*.md' | grep -v trash | grep -v Obsidian | rev | cut -d'/' -f1 | rev | tee olddata/$name"_old_sync"
done
(tput sgr 0)
I tried to add:
input="olddata/stapel_old"
while IFS= read -r line
do
sed -n "$line"p olddata/stapel_old
done < "$input"
The code to change only the english_old_sync is:
lines=$(wc -l olddata/english_old_sync | cut -d' ' -f1)
for ((num=1; num <= $lines; num++))
do
content=$(sed -n "$num"p olddata/english_old_sync)
sed -i "s/"$content"/""$content":::#a1/g"" olddata/english_old_sync
done
So now, this need to be a inner for-loop, of a outer for-loop which holds the variable for english, right?

stapel_old should stay unchanged.
You could try a while + read loop and embed sed inside the loop.
#!/usr/bin/env bash
while IFS= read -r files; do
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/staple_old
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
cp copies the file with a new name, if the goal is renaming the original file name from the content of the file staple_old then change cp to mv
The -n and -i flag from sed was ommited , include it, if needed.
The script also assumes that there are no empty/blank lines in the content of staple_old file. If in case there are/is add an addition test after the line where the do is.
[[ -n $files ]] || continue
It also assumes that the content of staple_old are existing files. Just in case add an additional test.
[[ -e $files ]] || { printf >&2 '%s no such file or directory.\n' "$files"; continue; }
Or an if statement.
if [[ ! -e $files ]]; then
printf >&2 '%s no such file or directory\n' "$files"
continue
fi
See also help test
See also help continue
Combining them all together should be something like:
#!/usr/bin/env bash
while IFS= read -r files; do
[[ -n $files ]] || continue
[[ -e $files ]] || {
printf >&2 '%s no such file or directory.\n' "$files"
continue
}
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/staple_old
Remove the echo's If you're satisfied with the output so the script could copy/rename and edit the files.

bash : grep gives "No such file or directory" while searching for text in log

in bash script i have these relative lines:
find_time(){
args=( -A2 'Finished import for test' /var/apps/log/app.log )
while [ -z "$per_time" ] && [ $COUNTER -lt 3 ] ; do
if grep -q "${args[#]}" ; then
per_time=$(grep "${args[#]}" | grep Time)
output_time
elif ssh user#hostname grep -q -A2 'Finished import for test' /var/apps/log/app.log ; then
per_time=$(ssh user#hostname 'grep -s -A2 "Finished import for test" /var/apps/log/app.log | grep "Time"')
output_time
else
sleep 60
fi;
COUNTER=$((COUNTER+1))
done
echo "find_time OK"
}
output_time () {
time=${perf_time//[!0-9]/}
#convert to seconds and round the value
time=$(echo $time | sed 's/^\(.\{2\}\)/\1./' | awk '{ printf "%.1f\n",$1}')
echo -e "kpi5 : "$time" sec " >> "$result_file"
}
find_time
it gives me right result, however it shows in shell:
grep: import: No such file or directory
grep: for: No such file or directory
grep: test: No such file or directory
can not find out how to solve it.

You should put the whole command inside single quotes.
So, I think you can try this one:
$(ssh user#host 'grep -A2 "Finished import for test " /var/apps/log/app.log | grep "Time"')

bash script to scan for repeated episode numbers, append episode modifier

I use youtube-dl to archive specific blogs. I use a custom bash script (called tvify) to help me organize my content into Plex-ready filenames for later replay via my home Plex server.
Archiving the content works fine, unless a blogger posts more than one video on the same date - if that happens my script creates more than one file for a given month/date and plex sees a duplicate episode. In the plex app, it stuffs them together as distinct 'versions' of the same episode. The result is that the description of the video no longer matches its contents, and only one 'version' appears unless I access an additional sub menu.
The videos get downloaded by you tube-dl kicked off from a cron-job, and that downloader script runs the following to help format their filenames and stuff them into appropriate folders for 'seasons'.
The season is the year when the video was released, and the episode is the combination of the month and date in MMDD format.
Below is my 'tvify' script, which helps perform the filename manipulation and stuffs the file into the proper folder for the season.
#!/bin/bash
mySuff="$1"
echo mySuff="$mySuff"
if [ -z "$1" ]; then
mySuff="*.mp4"
fi
for i in $mySuff
do
prb=`ffprobe -- "$i" 2>&1`
myDate=`echo "$prb" | grep -E 'date\s+:' | cut -d ':' -f 2`
myartist=`echo "$prb" | grep -E 'artist\s+:' | cut -d ':' -f 2`
myTitle=`echo "$prb" | grep -E 'title\s+:' | cut -d ':' -f 2 | sed 's/\//_/g'`
cwd_stub=`pwd | awk -F'/' '{print $NF}'`
if [ -d "s${myDate:1:4}" ]; then echo "Directory found" > /dev/null; else mkdir "s${myDate:1:4}"; fi
[ -d "s${myDate:1:4}" ] && mv -- "$i" "s${myDate:1:4}/${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8} - ${myTitle[#]:1:40} _$i" || mv -- "$i" "${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8} - ${myTitle[#]:1:40} _$i"
done
How can I modify that script to identify if a conflicting year/MMDD file exists, and if so, append an appropriate suffix to the episode number so that plex will interpret them as distinct episodes?

I ended up implementing an array, counting the number of elements in the array, and using that to append the integer:
#!/bin/bash
mySuff="$1"
echo mySuff="$mySuff"
if [ -z "$1" ]; then
mySuff="*.mp4"
fi
for i in $mySuff
do
prb=`ffprobe -- "$i" 2>&1`
myDate=`echo "$prb" | grep -E 'date\s+:' | cut -d ':' -f 2`
myartist=`echo "$prb" | grep -E 'artist\s+:' | cut -d ':' -f 2`
myTitle=`echo "$prb" | grep -E 'title\s+:' | cut -d ':' -f 2 | sed 's/\//_/g'`
cwd_stub=`pwd | awk -F'/' '{print $NF}'`
readarray -t conflicts < <(find . -maxdepth 2 -iname "*s${myDate:1:4}e${myDate:5:8}*" -type f -printf '%P\n')
[ ${#conflicts[#]} -gt 0 ] && _inc=${#conflicts[#]} || _inc=
if [ -d "s${myDate:1:4}" ]; then echo "Directory found" > /dev/null; else mkdir "s${myDate:1:4}"; fi
[ -d "s${myDate:1:4}" ]
&& mv -- "$i" "s${myDate:1:4}/${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8}$_inc - ${myTitle[#]:1:40} _$i"
|| mv -- "$i" "${myartist[#]:1} - s${myDate:1:4}e${myDate:5:8}$_inc - ${myTitle[#]:1:40} _$i"
done

wget bash function without messy output

I am learning to customize wget in a bash function and having trouble. I would like to display Downloading (file):% instead of the messy output of wget. The function below seems close I am having trouble calling it for my specific needs.
For example, my standard wget is:
cd 'C:\Users\cmccabe\Desktop\wget'
wget -O getCSV.txt http://xxx.xx.xxx.xxx/data/getCSV.csv
and that downloads the .csv as a .txt in the directory specified with all the messy wget output.
This function seems like it will do more-or-less what I need, but I can not seem to get it to function correctly using my data. Below is what I have tried. Thank you :).
#!/bin/bash
download() {
local url=$1 wget -O getCSV.txt http://xxx.xx.xxx.xxx/data/getCSV.csv
local destin=$2 'C:\Users\cmccabe\Desktop\wget'
echo -n " "
if [ "$destin" ]; then
wget --progress=dot "$url" -O "$destin" 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
else
wget --progress=dot "$url" 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
fi
echo -ne "\b\b\b\b"
echo " DONE"
}
EDITED CODE
#!/bin/bash
download () {
url=http://xxx.xx.xxx.xxx/data/getCSV.csv
destin='C:\Users\cmccabe\Desktop\wget'
echo -n " "
if [ "$destin" ]; then
wget -O getCSV.txt --progress=dot "$url" -O "$destin" 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
else
wget -O getCSV.txt --progress=dot $url 2>&1 | grep --line-buffered "%" | \
sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
fi
echo -ne "\b\b\b\b"
echo " DONE"
menu
}
menu() {
while true
do
printf "\n Welcome to NGS menu (v1), please make a selection from the MENU \n
==================================\n\n
\t 1 Patient QC\n
==================================\n\n"
printf "\t Your choice: "; read menu_choice
case "$menu_choice" in
1) patient ;;
*) printf "\n Invalid choice."; sleep 2 ;;
esac
done
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How can I use wget to download specific files in a CSV file, and then store those files into specific directories? - bash

Related

bash optional command in variable

Shell: Add string to the end of each line, which match the pattern. Filenames are given in another file

bash : grep gives "No such file or directory" while searching for text in log

bash script to scan for repeated episode numbers, append episode modifier

wget bash function without messy output

Categories

Resources