Bash assistance recursively grepping directory using a file - bash
I have a txt file (which often gets updated) called hitlist.txt containing a list of words/strings I want to grep a directory against ... like:
# This is just a comment and will not be part of the search
* Blah - this is a category
foo
bar
sibilance
# A new category
* Meh - another category
snakefish
sex panther
My list is typically > 100 strings, and each is on its own line. Today, because of a deadline, I simply went through the list and executed the following command for each word:
find . -iname "*" -type f -print0 | xargs -0 grep -HniI "foo" >> results.txt
As indicated in the command above, I am interested in the file path and name, as well as the line that contains the matched text. There are multiple categories listed in the file (denoted by *) and I would like to be able to run my script against one, more, or all categories.
I would also like to be able to turn off the -i flag (case-insensitive matching) as an option. I have a script that recursively finds/lists all files in a directory, and the command I have been using above. Lastly, the hitlist format can be changed completely if necessary.
Set up a ghl() (grep hitlist) shell function to do the work (it depends on GNU grep's -o switch, plus a little sed loop); its output is the list of words from hitlist.txt (or <filename>) for the matching category:
# usage: ghl <glob> <filename>
ghl() {
    # The first two greps isolate the category name(s) matching the glob;
    # the sed loop then prints the lines that follow a matching header.
    grep -o '\* '"$1"' -' "$2" | grep -o '[[:alpha:]]*' |
    while read x ; do
        sed -n '/\* '"$1"'/{:show ;n;/^[^ ]/{p;b show;}}' "$2"
    done
}
Pipe the word-list output of ghl, called with an ".*ah" wildcard (which matches the Blah category), into grep -f -, using some ad hoc bash process substitution to generate input text:
ghl '.*ah' hitlist.txt | grep -i -f - <(echo bar) <(echo foo) <(echo Foo)
Output:
/dev/fd/63:bar
/dev/fd/62:foo
/dev/fd/61:Foo
The second grep above can be passed switches as desired (see man grep). For example, the same thing but case sensitive, i.e. with the -i switch removed:
ghl '.*ah' hitlist.txt | grep -f - <(echo bar) <(echo foo) <(echo Foo)
Output (note the missing uppercase item):
/dev/fd/63:bar
/dev/fd/62:foo
Since grep already has options to handle recursive searches, the rest is only a matter of adding switches as required.
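For instance, here is a sketch of tying that together for the original use case (the dirtoscrub directory name is just a placeholder, and -F makes grep treat the hitlist entries as fixed strings rather than regexes, which may or may not be what is wanted):

ghl '.*ah' hitlist.txt | grep -rHniIF -f - dirtoscrub >> results.txt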
Your question is extremely vague, but I'm imagining this is more or less what you are looking for.
awk -v cat='Blah|Meh' 'NR==FNR && /^#/ { next } # Skip comments
    NR==FNR && /^\*/ { if ($0~cat) c=1; else c=0; next }
    NR==FNR { if(c) a[$0]=1; next }
    tolower($0) in a { print FILENAME ":" FNR ":" $0 }' Hits.txt files to search
Figuring out how to selectively disable tolower(), and rigging it to read the list of file names to search (other than Hits.txt) from find, should be fairly obvious.
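For example, a sketch of feeding it the file list from find (hitgrep.awk and dirtoscrub are placeholder names; the awk program above would be saved into hitgrep.awk):

find dirtoscrub -type f -print0 |
    xargs -0 awk -v cat='Blah|Meh' -f hitgrep.awk Hits.txt

Because Hits.txt sits on the fixed part of the command line, it is re-read by every awk invocation even if xargs has to split a long file list.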
This is what I ended up with:
hitlist format:
# MEH
never,going,to give,you up
# blah
word to,your,mother
Script:
# Set defaults
OUTPUT_FILE="hits.txt"
HITLIST_FILE="hitlist.txt"
# Hold on to the args
ARGLIST=($*)
# Declare any functions
help ()
{
echo "--------------------------------- Luffa --------------------------------"
echo "Usage: luffa.sh [DIRTOSCRUB]"
echo ""
echo "Searches DIRTOSCRUB for category specific words in $HITLIST_FILE."
echo ""
echo "EXAMPLE: luffa.sh dirtoscrub"
echo ""
echo "--help display this help and exit"
echo "--version display version information and exit"
}
version ()
{
echo "luffa.sh v1.0"
}
process ()
{
if [ ${#FILEARG} -lt 1 ] # check for proper number of args
then
echo "ERROR: Specify directory to be searched."
help
exit 1
else
SEARCH_DIR=${ARGLIST[0]}
fi
echo ""
echo "--------------------------------------------------------- Luffa ---------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "search command: find [DIRTOSCRUB] -type f -print0 | xargs -0 grep -HniI --color=always $word | tee -a ../hits.txt | more" | tee -a "$OUTPUT_FILE"
echo
echo " .,,:::::." | tee -a "$OUTPUT_FILE"
echo " .,,::::~:::::.." | tee -a "$OUTPUT_FILE"
echo " ,,::::~~~~~~::~~:::." | tee -a "$OUTPUT_FILE"
echo " ,:,:~:~~~~~~~~~~~~~~::." | tee -a "$OUTPUT_FILE"
echo " ,,:::~:~~~~~~~~~~~~~~~~~~," | tee -a "$OUTPUT_FILE"
echo " .,,::::~~~~~~~~~~~~~~~~~~~~~~::" | tee -a "$OUTPUT_FILE"
echo " .,::~:~~~~~=~~~~=~~~~~~~~~~~=~~~~." | tee -a "$OUTPUT_FILE"
echo " ,::::~~:~~~=~~~~~~~~=~~=~~~===~~~~~~." | tee -a "$OUTPUT_FILE"
echo " ..:::~~~~=~~=~~~~~~=~~~~=~~===~~==~~~~~~," | tee -a "$OUTPUT_FILE"
echo " .,:::~~~~~~~~~~~~~~~~=~=~~~=~====~===~~~~~~~." | tee -a "$OUTPUT_FILE"
echo " .,::~~~~~~~~~~~~~~=~=~~~~~=~======~=~~~~=~=~~~:" | tee -a "$OUTPUT_FILE"
echo " ..,::~:~~~~~~=~~~=~~~~~~~~=~====+======~===~~~~~~~." | tee -a "$OUTPUT_FILE"
echo " ..,:,:~~~~~~=~::~~=~=~~~=~~=~=~=~======~~~==~~~~~~::." | tee -a "$OUTPUT_FILE"
echo " ,,.::~:=~~~~~~~~~~~~=~=~===~~~====+==~=====~~~~~::,." | tee -a "$OUTPUT_FILE"
echo " ,,,,:I++=:~==~=~~~~~~=~:==~=~+~====~=~===~~~~:~::,:" | tee -a "$OUTPUT_FILE"
echo " .,:+++?77+?=~~~~=~~=~=~~=~~+=~+~~+====~=~~~:::::,::," | tee -a "$OUTPUT_FILE"
echo " ..++++?++?II?=~~=~~~=~~~====~===~=====~~~:~::::::::,." | tee -a "$OUTPUT_FILE"
echo " ..=++?++++++???7+~~~~~~~~+~=~=====~~~~~~~~~::::~:::,,.." | tee -a "$OUTPUT_FILE"
echo " .=+++++++++++++++===:~~=~==+~~=~=~~:~~=~:~:::~::::,,.." | tee -a "$OUTPUT_FILE"
echo " .++++++?++++++?++=?~:~~~~===~==~==~~~~~:::::::::,,,..." | tee -a "$OUTPUT_FILE"
echo " ..=?+++++??+++++++===~::~~~~~~=~~~~~~:~~:::::,:,,,,,." | tee -a "$OUTPUT_FILE"
echo " ...=+?+++++++++=====~:,::,~:::~~~~~:~~~~::::~::::,,,,.." | tee -a "$OUTPUT_FILE"
echo " .=+++++++++++===~==::::,::~~,,,::~~~~~~::::::~:,:,,.." | tee -a "$OUTPUT_FILE"
echo " ..++++++++++=+===~,.,,:::,:~~~~~,.,:~:~::::::,::,:,.." | tee -a "$OUTPUT_FILE"
echo " ...++?++++++++=+=~~. ..,,,,,:,~,::~,:::,:,:,~::::,,.." | tee -a "$OUTPUT_FILE"
echo " .++++++++?++====~. ...,,:,~::~=::,::,:,:::,,,,.." | tee -a "$OUTPUT_FILE"
echo ".++?+++++?++++==~.. .,.:,,:::~,:,,,:::::,,,." | tee -a "$OUTPUT_FILE"
echo "++++++?+???==~=. ...,::~~~:,,:,:::,,." | tee -a "$OUTPUT_FILE"
echo "?+++?????+==~. ..,,,,::,:,,,,,." | tee -a "$OUTPUT_FILE"
echo "+?+++??+==~. ..,,,,,,,,." | tee -a "$OUTPUT_FILE"
echo "+I???+==~. ..,,.." | tee -a "$OUTPUT_FILE"
echo "??++==~." | tee -a "$OUTPUT_FILE"
echo "+===~." | tee -a "$OUTPUT_FILE"
echo "+=~." | tee -a "$OUTPUT_FILE"
echo "~" | tee -a "$OUTPUT_FILE"
echo "--------------------------------------------------------------------------------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "" | tee -a "$OUTPUT_FILE"
# Loop through hitlist
while read -re hitListWords || [[ -n "$hitListWords" ]]
do
# If first character is "#" it's a comment, or line is blank, skip
if [ "$(echo $hitListWords | head -c 1)" != "#" ]; then
if [ ! -z "$hitListWords" -a "$hitListWords" != "" ]; then
# Parse comma delimited category specific hitlist
IFS=',' read -ra categoryWords <<< "$hitListWords"
# Search for occurrences/hits for each hitlist word
for categoryWord in "${categoryWords[@]}"; do
echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
echo "$category - \"$categoryWord"\" | tee -a "$OUTPUT_FILE"
echo "---------------------------------------------------" | tee -a "$OUTPUT_FILE"
find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI "$categoryWord" >> "$OUTPUT_FILE"
find "$SEARCH_DIR" -type f -print0 | xargs -0 grep -HniI --color=always "$categoryWord" | more
echo "" | tee -a "$OUTPUT_FILE"
done
fi
else
category="$(echo "$hitListWords" | cut -d "#" -f 2)"
fi
done < "$HITLIST_FILE"
exit $?
}
# Process the options
while [[ "${ARGLIST[0]}" == -* ]]; do
OPTION="${ARGLIST[0]}"
NUM_OPTS=1;
case $OPTION in
--version)
version
exit 0
;;
--help)
help
exit 0
;;
*)
help
exit 1
;;
esac
ARGLIST=(${ARGLIST[@]:$NUM_OPTS})
done
FILEARG=${ARGLIST[@]}
process
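For example, after saving the script as luffa.sh and making it executable (chmod +x luffa.sh), a run against a directory to scrub looks like this, matching the usage shown in help():

./luffa.sh dirtoscrub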
Related
How can I use wget to download specific files in a CSV file, and then store those files into specific directories?
I have been attempting to extract a CSV file full of URLs of images (about 1000). Each row is a specific product, with the first cell labelled "id". I have taken the ID of each line in Excel and created directories for them using a loop with mkdir. My issue now is that I can't seem to figure out how to download the image and then immediately store it into these folders. What I am attempting here is to use wget by concatenating "fold_name" and "EXT" to get it like a directory "/name_of_folder", then getting the links to the images (in cells 5, 6, 7 and 8) and using wget from these cells, into the directory. Can anyone assist me with this? I think this should be straightforward enough. Thank you!

#!/usr/bin/bash
EXT='/'
while read line
do
fold_name= cut -d$',' -f1
concat= "%EXT" + "%fold_name"
img1= cut -d$',' -f5
img2= cut -d$',' -f6
img3= cut -d$',' -f7
img4= cut -d$',' -f8
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
wget -O "%img1" "%concat"
wget -O "%img2" "%concat"
done < file.csv
You might use the -P switch to designate the target directory. Consider the following simple example using some files from the test-images/png repository:

mkdir -p black
mkdir -p gray
mkdir -p white
wget -P black https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png
wget -P gray https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png
wget -P white https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png

which will lead to the following structure:

black
    cs-black-000.png
gray
    cs-gray-7f7f7f.png
white
    cs-white-fff.png
You should use variable names that are less ambiguous. You need to provide the directory as part of the output filename. "%" is not a bash variable designator; that is a formatting directive (for bash, awk, C, etc.). The following will provide what you want.

#!/usr/bin/bash

DBG=1

INPUT="${1}"
INPUT="file.csv"

cat >"${INPUT}" <<"EnDoFiNpUt"
#topic_1,junk01,junk02,junk03,img_101.png,img_102.png,img_103.png,img_104.png
#topic_2,junk04,junk05,junk06,img_201.png,img_202.png,img_203.png,img_204.png
#
topic_1,junk01,junk02,junk03,https://raw.githubusercontent.com/test-images/png/main/202105/cs-black-000.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-gray-7f7f7f.png,https://raw.githubusercontent.com/test-images/png/main/202105/cs-white-fff.png
EnDoFiNpUt

if [ ${DBG} -eq 1 ]
then
    echo -e "\n Input file:"
    cat "${INPUT}" | awk '{ printf("\t %s\n", $0 ) ; }'
    echo -e "\n Hit return to continue ..." ; read k
fi

REPO_ROOT='/tmp'

grep -v '^#' "${INPUT}" | while read line
do
    topic_name=$(echo "${line}" | cut -f1 -d\, )
    test ${DBG} -eq 1 && echo -e "\t topic_name= ${topic_name} ..."

    folder="${REPO_ROOT}/${topic_name}"
    test ${DBG} -eq 1 && echo -e "\t folder= ${folder} ..."

    if [ ! -d "${folder}" ]
    then
        mkdir "${folder}"
    else
        rm -f "${folder}/"*
    fi

    if [ ! -d "${folder}" ]
    then
        echo -e "\n Unable to create directory '${folder}' for saving downloads.\n Bypassing 'wget' actions ..." >&2
    else
        test ${DBG} -eq 1 && ls -ld "${folder}" | awk '{ printf("\n\t %s\n", $0 ) ; }'

        url1=$(echo "${line}" | cut -d\, -f5 )
        url2=$(echo "${line}" | cut -d\, -f6 )
        url3=$(echo "${line}" | cut -d\, -f7 )
        url4=$(echo "${line}" | cut -d\, -f8 )

        test ${DBG} -eq 1 && {
            echo -e "\n URLs extracted:"
            echo -e "\n\t ${url1}\n\t ${url2}\n\t ${url3}\n\t ${url4}"
        }

        #imageFile1=$( basename "${url1}" | sed 's+^img_+yourImagePrefix_+' )
        #imageFile2=$( basename "${url2}" | sed 's+^img_+yourImagePrefix_+' )
        #imageFile3=$( basename "${url3}" | sed 's+^img_+yourImagePrefix_+' )
        #imageFile4=$( basename "${url4}" | sed 's+^img_+yourImagePrefix_+' )

        imageFile1=$( basename "${url1}" | sed 's+^cs-+yourImagePrefix_+' )
        imageFile2=$( basename "${url2}" | sed 's+^cs-+yourImagePrefix_+' )
        imageFile3=$( basename "${url3}" | sed 's+^cs-+yourImagePrefix_+' )

        test ${DBG} -eq 1 && {
            echo -e "\n Image filenames assigned:"
            #echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}\n\t ${imageFile4}"
            echo -e "\n\t ${imageFile1}\n\t ${imageFile2}\n\t ${imageFile3}"
        }

        test ${DBG} -eq 1 && {
            echo -e "\n WGET process log:"
        }

        ### This form of wget does NOT work for me, although man page says it should.
        #wget -P "${folder}" -O "${imageFile1}" "${url1}"

        ### This form of wget DOES work for me
        wget -O "${folder}/${imageFile1}" "${url1}"
        wget -O "${folder}/${imageFile2}" "${url2}"
        wget -O "${folder}/${imageFile3}" "${url3}"
        #wget -O "${folder}/${imageFile3}" "${url3}"

        test ${DBG} -eq 1 && {
            echo -e "\n Listing of downloaded files:"
            ls -l /tmp/topic* 2>>/dev/null | awk '{ printf("\t %s\n", $0 ) ; }'
        }
    fi
done

The script is adapted for what I had to work with. :-)
How can I convert my output to JSON, then save it as a .json file
Get System Information. Here is my script:

#!/bin/bash
echo -e "Manufacturer:\t"`cat /sys/class/dmi/id/chassis_vendor`
echo -e "Product Name:\t"`cat /sys/class/dmi/id/product_name`
echo -e "Version:\t"`cat /sys/class/dmi/id/bios_version`
echo -e "Serial Number:\t"`cat /sys/class/dmi/id/product_serial`
echo -e "PC Name:\t"`hostname`
echo -e "Operating System:\t"`hostnamectl | grep "Operating System" | cut -d ' ' -f5-`
echo -e "Architecture:\t"`arch`
echo -e "Processor Name:\t"`awk -F':' '/^model name/ {print $2}' /proc/cpuinfo | uniq | sed -e 's/^[ \t]*//'`
echo -e "Memory:\t" `dmidecode -t 17 | grep "Size.*MB" | awk '{s+=$2} END {print s / 1024 "GB"}'`
echo -e "HDD Model:\t" `cat /sys/block/sda/device/model`
echo -e "System Main IP:\t"`hostname -I`

I want to display my output like this:

{"Manufacturer":"Lenovo","Product Name":"Thinkpad","Version":"T590","Serial Number":"1234567890"}

Thanks in advance for your help!
Here is a pure bash option. escape takes care of " in either key or value. member generates a key: value pair, and members separates the members with commas:

escape() {
    echo -n "${1/\"/\\\"}"
}

member() {
    echo -en "\"$(escape "$1")\":\"$(escape "$2")\"\x00"
}

members() {
    local sep=''
    echo -n "{"
    while read -d $'\0' member
    do
        echo -n "${sep}$member"
        sep=,
    done
    echo -n "}"
}

declare -A a=(
    [Manufacturer]=`cat /sys/class/dmi/id/chassis_vendor`
    ["Product Name"]=`cat /sys/class/dmi/id/product_name`
    [Version]=`cat /sys/class/dmi/id/bios_version`
)

for k in "${!a[@]}"
do
    member "$k" "${a[$k]}"
done | members > as.JSON
Store the output of all the echo statements in one variable or in a text file; this returns an array of JSON strings:

cat /path/to/your_text_file | jq --raw-input . | jq --slurp .
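Going one step further toward the object shown in the question, here is a sketch (it assumes the script's output keeps its Key:<TAB>Value layout, and sysinfo.sh is a placeholder name for the script):

./sysinfo.sh | jq -R 'capture("^(?<key>[^:]+):\\s*(?<value>.*)$")' | jq -s 'from_entries' > sysinfo.json

capture turns each line into a {key, value} object, and from_entries folds the slurped array of those objects into a single JSON object.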
wget bash function without messy output
I am learning to customize wget in a bash function and having trouble. I would like to display Downloading (file): % instead of the messy output of wget. The function below seems close, but I am having trouble calling it for my specific needs. For example, my standard wget is:

cd 'C:\Users\cmccabe\Desktop\wget'
wget -O getCSV.txt http://xxx.xx.xxx.xxx/data/getCSV.csv

and that downloads the .csv as a .txt in the directory specified, with all the messy wget output. This function seems like it will do more or less what I need, but I can not seem to get it to function correctly using my data. Below is what I have tried. Thank you :).

#!/bin/bash
download() {
    local url=$1
    wget -O getCSV.txt http://xxx.xx.xxx.xxx/data/getCSV.csv
    local destin=$2 'C:\Users\cmccabe\Desktop\wget'
    echo -n " "
    if [ "$destin" ]; then
        wget --progress=dot "$url" -O "$destin" 2>&1 | grep --line-buffered "%" | \
        sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    else
        wget --progress=dot "$url" 2>&1 | grep --line-buffered "%" | \
        sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    fi
    echo -ne "\b\b\b\b"
    echo " DONE"
}

EDITED CODE

#!/bin/bash
download () {
    url=http://xxx.xx.xxx.xxx/data/getCSV.csv
    destin='C:\Users\cmccabe\Desktop\wget'
    echo -n " "
    if [ "$destin" ]; then
        wget -O getCSV.txt --progress=dot "$url" -O "$destin" 2>&1 | grep --line-buffered "%" | \
        sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    else
        wget -O getCSV.txt --progress=dot $url 2>&1 | grep --line-buffered "%" | \
        sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'
    fi
    echo -ne "\b\b\b\b"
    echo " DONE"
    menu
}

menu() {
    while true
    do
        printf "\n Welcome to NGS menu (v1), please make a selection from the MENU \n ==================================\n\n \t 1 Patient QC\n ==================================\n\n"
        printf "\t Your choice: "; read menu_choice
        case "$menu_choice" in
            1) patient ;;
            *) printf "\n Invalid choice."; sleep 2 ;;
        esac
    done
}
unix (cygwin) fifo buffering
Looking for an intercepting proxy made with netcat, I found this script:

#!/bin/sh -e

if [ $# != 3 ]
then
    echo "usage: $0 <src-port> <dst-host> <dst-port>"
    exit 0
fi

TMP=`mktemp -d`
BACK=$TMP/pipe.back
SENT=$TMP/pipe.sent
RCVD=$TMP/pipe.rcvd

trap 'rm -rf "$TMP"' EXIT

mkfifo -m 0600 "$BACK" "$SENT" "$RCVD"

sed 's/^/ => /' <"$SENT" &
sed 's/^/<= /' <"$RCVD" &
nc -l -p "$1" <"$BACK" | tee "$SENT" | nc "$2" "$3" | tee "$RCVD" >"$BACK"

which works nicely, as expected. Since I need to look closely at the encoding used, and hence the actual bytes passing, I tried to change some lines to use hexdump -vC:

#!/bin/sh -e

if [ $# != 3 ]
then
    echo "usage: $0 <src-port> <dst-host> <dst-port>"
    exit 0
fi

TMP=`mktemp -d`
BACK=$TMP/pipe.back
SENT=$TMP/pipe.sent
RCVD=$TMP/pipe.rcvd

trap 'rm -rf "$TMP"' EXIT

mkfifo -m 0600 "$BACK" "$SENT" "$RCVD"

( hexdump -vC | sed 's/^/ => /' ) <"$SENT" &
( hexdump -vC | sed 's/^/<= /' ) <"$RCVD" &
nc -l -p "$1" <"$BACK" | tee "$SENT" | nc "$2" "$3" | tee "$RCVD" >"$BACK"

Now it's not working anymore. Actually, I've lost the "realtime" feature of the previous script: every byte sent is dumped in a single batch, then every byte received in another batch, and all of this only after the connection is closed. I suspect some sort of buffering occurs in the pipe (|), but I'm not sure how to:

1. test this hypothesis;
2. fix the script to make it work in realtime again.

PS1. I'm using cygwin.
PS2. sh --version outputs: GNU bash, version 4.1.10(4)-release (i686-pc-cygwin)

Edit: Removing the | sed ... part (that is, leaving only hexdump -vC <"$SENT" and hexdump -vC <"$RCVD"), the realtime feature is back, increasing my suspicion of the pipeline operator. But the output turns out to be confusing, since sent and received bytes are mixed.
Still I couldn't manage to resolve the buffering (?) issue, but I could change the hexdump invocation to render the sed unnecessary:

#!/bin/sh -e

if [ $# != 3 ]
then
    echo "usage: $0 <src-port> <dst-host> <dst-port>"
    exit 0
fi

TMP=`mktemp -d`
BACK=$TMP/pipe.back
SENT=$TMP/pipe.sent
RCVD=$TMP/pipe.rcvd

trap 'rm -rf "$TMP"' EXIT

mkfifo -m 0600 "$BACK" "$SENT" "$RCVD"

hexdump -v -e '" => %08.8_Ax\n"' -e '" => %08.8_ax " 8/1 "%02x " " " 8/1 "%02x "' -e '" |" 16/1 "%_p" "|\n"' <"$SENT" &
hexdump -v -e '"<= %08.8_Ax\n"' -e '"<= %08.8_ax " 8/1 "%02x " " " 8/1 "%02x "' -e '" |" 16/1 "%_p" "|\n"' <"$RCVD" &
nc -l "$1" <"$BACK" | tee "$SENT" | nc "$2" "$3" | tee "$RCVD" >"$BACK"

Yes, the new hexdump looks ugly, but it works. For me this question is now open just for the sake of curiosity: I'm still willing to give the "correct answer" points to whoever explains (and fixes) the buffering (?) behavior.
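One thing worth trying for the buffering itself is a sketch like the following (untested on Cygwin; it assumes GNU coreutils' stdbuf and GNU sed's -u option are available there), replacing the two background pipelines in the original hexdump | sed version:

( stdbuf -oL hexdump -vC | sed -u 's/^/ => /' ) <"$SENT" &
( stdbuf -oL hexdump -vC | sed -u 's/^/<= /' ) <"$RCVD" &

The idea is consistent with the edit above: hexdump alone stays realtime because its stdout is a terminal and therefore line buffered, but as soon as its stdout is a pipe, stdio switches to block buffering. stdbuf -oL asks for line buffering anyway, and sed -u keeps the second stage from re-buffering the output.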
Syntax error: “(” unexpected (expecting “fi”)
filein="users.csv" IFS=$'\n' if [ ! -f "$filein" ] then echo "Cannot find file $filein" else #... groups=(`cut -d: -f 6 "$filein" | sed 's/ //'`) fullnames=(`cut -d: -f 1 "$filein"`) userid=(`cut -d: -f 2 "$filein"`) usernames=(`cut -d: -f 1 "$filein" | tr [A-Z] [a-z] | awk '{print substr($1,1,1) $2}'`) #... for group in ${groups[*]} do grep -q "^$group" /etc/group ; let x=$? if [ $x -eq 1 ] then groupadd "$group" fi done #... x=0 created=0 for user in ${usernames[*]} do useradd -n -c ${fullnames[$x]} -g "${groups[$x]}" $user 2> /dev/null if [ $? -eq 0 ] then let created=$created+1 fi #... echo "${userid[$x]}" | passwd --stdin "$user" > /dev/null #... echo "Welcome! Your account has been created. Your username is $user and temporary password is \"$password\" without the quotes." | mail -s "New Account for $user" -b root $user x=$x+1 echo -n "..." sleep .25 done sleep .25 echo " " echo "Complete. $created accounts have been created." fi
I'm guessing the problem is that you're trying to capture command output in arrays without actually using command substitution. You want something like this:

groups=( $( cut... ) )

Note the extra set of parentheses with $ in front of the inner set.
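Applied to the first few assignments in the question, that suggestion would look something like this (a sketch, keeping the original cut/sed/awk pipelines):

groups=( $(cut -d: -f 6 "$filein" | sed 's/ //') )
fullnames=( $(cut -d: -f 1 "$filein") )
userid=( $(cut -d: -f 2 "$filein") )
usernames=( $(cut -d: -f 1 "$filein" | tr '[A-Z]' '[a-z]' | awk '{print substr($1,1,1) $2}') )

Arrays are a bash feature, so the script also needs to be run with bash (e.g. bash script.sh, or a #!/bin/bash shebang) rather than plain sh.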