Bash - Strings, Commands and Escaping (oh, my!) - bash

I'm wasting so much time right now trying to figure out something so simple....
Pseudocode (a mixture of several syntaxes, sorry):
cmd1 = "find /my/starting/path -type f | grep -v -f /my/exclude/files"
cmd2 = " nl -ba -s' ' "
cmd3 = " | xargs mv -t /move/here/dir "
echo run_command_and_return_output($cmd1$cmd2)
$cmd1$cmd3 # just this now...
# i don't actually want a function... but the name explains what i want to do
function run_command_and_return_output(){ /* magic */ }
this works....
FIND=$(find $LOG_DIR -type f | grep -v -f $EXCLUDE | nl -ba -s' ')
printf "%s\n" "$FIND"
this does not...
NL="nl -ba -s' '"
FIND=$(find $LOG_DIR -type f -mtime +$ARCH_AGE | grep -v -f $EXCLUDE | $NL)
printf "%s\n" "$FIND"
and neither does this...
NL='nl -ba -s'\'' '\'' '
this definitely does work, though:
find /my/starting/path -type f | grep -v -f /my/exclude/files | nl -ba -s' '
or
FIND=$(find $LOG_DIR -type f -mtime +$ARCH_AGE | grep -v -f $EXCLUDE | nl -ba -s' ' )

Short form: Expanding $foo unquoted runs the content through string-splitting and glob expansion, but not syntactical parsing. This means that characters which would do quoting and escaping in a different context aren't honored as syntax, but are only treated as data.
If you want to run a string through syntactical parsing, use eval -- but mind the caveats, which are large and security-impacting.
Much better is to use the right tools for the job -- building individual simple commands (not pipelines!) in shell arrays, and using functions as the composable unit for constructing complex commands. BashFAQ #50 describes these tools -- and goes into in-depth discussion on which of them is appropriate when.
To get a bit more concrete:
nl=( nl -ba -s' ' )
find_output=$(find "$log_dir" -type f -mtime "+$arch_age" | grep -v -f "$exclude" | "${nl[@]}")
printf "%s\n" "$find_output"
...would be correct, since it tracks the simple command nl as an array.
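To make the difference visible, here is a minimal sketch that counts the words each form produces. The helper count_args and the variable names nl_string and nl_array are made up for illustration; number_lines shows the "function as composable unit" idea under the same assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

nl_string="nl -ba -s' '"    # quotes inside the string are plain data
nl_array=( nl -ba -s' ' )   # quotes are parsed as syntax at assignment time

# Unquoted string: split on whitespace into nl, -ba, -s' and '
count_args $nl_string          # prints 4
# Array: exactly the three intended words, the last one being "-s "
count_args "${nl_array[@]}"    # prints 3

# A function can stand in for any pipeline stage without quoting tricks.
number_lines() { nl -ba -s' '; }
printf 'first\nsecond\n' | number_lines
```

The array form keeps "-s " as one word, which is why "${nl[@]}" behaves like the command typed directly.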

Related

escaping single quotes inside a sh -c call on Mac terminal

I'm trying to pipe a series of manipulations into an xargs call that I can use to swap the first value with the second using the sed command (sed is optional if there's a better way).
Basically I'm grabbing method signature in camel case and appending a prefix while trying to retain camel case.
So it should take...
originalMethodSignature
and replace it with...
givenOriginalMethodSignature
Because I'm using a series of pipes to find and modify the text, I was hoping to use multiple params with xargs, but it seems that most of the questions involving that use sh -c which would be fine but in order for the sed command to be interactive on a Mac terminal I need to use single quotes inside the shell calls' single quotes.
Something like this, where the double quotes preserve the functionality of the single quotes in the sed command...
echo "somePrecondition SomePrecondition" | xargs -L1 sh -c 'find ~/Documents/BDD/Definitions/ -type f -name "Given$1.swift" -exec sed -i "''" "'"s/ $0/ given$1/g"'" {} +'
assuming there's a file called "~/Documents/BDD/Definitions/GivenSomePrecondition.swift" with below code...
protocol GivenSomePrecondition { }
extension GivenSomePrecondition {
func somePrecondition() {
print("empty")
}
}
The first awk goes through a list of Swift protocols that start with the Given keyword (e.g. GivenSomePrecondition); the following stages strip each one down to "somePrecondition SomePrecondition" before hitting the final pipe. My intent is that the final xargs call replaces $0 with given$1 in place (overwriting the file).
The original command in context...
awk '{ if ($1 ~ /^Given/) print $0;}' ~/Documents/Sell/SellUITests/BDDLite/Definitions/HasStepDefinitions.swift \
| tr -d "\t" \
| tr -d " " \
| tr -d "," \
| sort -u \
| xargs -I string sh -c 'str=$(echo string); echo ${str#"Given"}' \
| awk '{ print tolower(substr($1,1,1)) substr($1, 2)" "$1 }' \
| xargs -L1 sh -c '
find ~/Documents/Sell/SellUITests/BDDLite/Definitions/ \
-type f \
-name "Given$1.swift" \
-exec sed -i '' "'"s/ $0/ given$1/g"'" {} +'
You don't need xargs or sh -c, and taking them out reduces the amount of work involved.
echo "somePrecondition SomePrecondition" |
while read -r source replace; do
find ~/Documents/BDD/Definitions/ -type f -name "Given${replace}.swift" -print0 |
while IFS= read -r -d '' filename; do
sed -i '' -e "s/ ${source}/ given${replace}/g" "$filename"
done
done
However, to answer your question rather than sidestep it: you can write functions that use any kind of quotes you want and export them into your subshell, either with export -f yourFunction in a parent process or by putting "$(declare -f yourFunction)" inside the string passed after bash -c (assuming that bash is the same shell used in the parent process defining those functions).
#!/usr/bin/env bash
replaceOne() {
local source replace
source=$1; shift || return
replace=$1; shift || return
sed -i '' -e "s/ ${source}/ given${replace}/g" "$@"
}
# substitute replaceOne into a new copy of bash, no matter what kind of quotes it has
bash -c "$(declare -f replaceOne)"'; replaceOne "$@"'
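A self-contained sketch of the same declare -f technique, with arguments actually passed to the child shell. The function addGiven is hypothetical and only prints the renamed identifier instead of editing files; the _ fills $0 so that the real arguments start at $1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix a camelCase name with "given", keeping camel case.
# It deliberately mixes single and double quotes to show they survive intact.
addGiven() {
  local name=$1 first
  first=$(printf '%s' "${name:0:1}" | tr '[:lower:]' '[:upper:]')
  printf 'given%s%s\n' "$first" "${name:1}"
}

# declare -f prints the function's source; the child bash re-parses it,
# so no manual escaping of the function body is needed.
result=$(bash -c "$(declare -f addGiven)"'; addGiven "$@"' _ originalMethodSignature)
echo "$result"   # givenOriginalMethodSignature
```

The same pattern should work after xargs, e.g. xargs -L1 bash -c "$(declare -f addGiven)"'; addGiven "$@"' _, as long as the child shell is bash.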

bash - forcing globstar asterisk expansion when passed to loop

I am attempting to write a script that tried to use globstar expressions to execute a command (for example ls)
#!/usr/bin/env bash
shopt -s globstar nullglob
DISCOVERED_EXTENSIONS=$(find . -type f -name '*.*' | sed 's|.*\.||' | sort -u | tr '\n' ' ' | sed "s| | ./\**/*.|g" | rev | cut -c9- | rev | echo "./**/*.$(</dev/stdin)")
IFS=$'\n'; set -f
for f in $(echo $DISCOVERED_EXTENSIONS | tr ' ' '\n'); do
ls $f;
done
unset IFS; set +f
shopt -u globstar nullglob
The script output is:
ls: ./**/*.jpg: No such file or directory
ls: ./**/*.mp4: No such file or directory
It is passing ls "./**/*.avi" instead of ls ./**/*.avi (no variable expansion). I attempted to use eval, envsubst and even used a custom expand function, to no avail
The result of echo "$DISCOVERED_EXTENSIONS" is:
./**/*.jpg ./**/*.mp4
What changes can be recommended so that value of $f is the result of glob expansion and not the expression itself?
EDIT: I'm keeping the question up as I have resolved my problem by not using globstar at all which solves my immediate problem but doesn't solve the question.
As pynexj points out, set -f undoes the effect of shopt -s globstar nullglob, so the script as written cannot work; removing set -f breaks it in a different way.
$f is the result of glob expansion
The result of glob expansion is a list of arguments. It could be saved in an array. Saving it is just calling a subshell and transferring data.
mapfile -t -d '' arr < <(bash -c 'printf "%s\0" '"$f")
ls "${arr[@]}"
Notes:
Do not do for i in $(....). Use a while IFS= read -r loop. See the BashFAQ entry on how to read a stream line by line.
I have no idea what is going on at that DISCOVERED_EXTENSIONS long line, but I would find . -maxdepth 1 -type f -name '*.*' -exec bash -c 'printf "%s\n" "${0##*.}"' {} \; | sort -u.
I usually recommend using find instead of globbing and working on pipelines/streams. I guess I would write it as: find . -maxdepth 1 -type f -name '*.*' -exec bash -c 'printf "%s\n" "${0##*.}"' {} \; | sort -u | while IFS= read -r ext; do find . -type f -name "*.$ext" | xargs -d '\n' ls; done
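If you do want to stay with globstar, a pattern held in a variable is expanded when the variable is used unquoted, provided set -f is not in effect. A minimal sketch, assuming bash 4 or newer and a throwaway directory created by mktemp (all file names here are invented for the demo):

```shell
#!/usr/bin/env bash
shopt -s globstar nullglob

# Build a small throwaway tree to match against.
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
touch "$tmp/one.jpg" "$tmp/a/two.jpg" "$tmp/a/b/three.mp4"

pattern="$tmp/**/*.jpg"

# Unquoted expansion: the pattern is globbed and the results land in an array.
# Caveat: word splitting also applies, so this breaks if the pattern itself
# contains whitespace.
matches=( $pattern )
count=${#matches[@]}
printf '%s\n' "${matches[@]}"

# With globbing disabled, the same expansion yields the literal pattern.
set -f
literal=( $pattern )
set +f

rm -rf "$tmp"
```

Here matches holds the two .jpg paths, while literal holds the single unexpanded pattern string, which is the behaviour the question ran into.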

Bash grep -P with a list of regexes from a file

Problem: hundreds of thousands of files in hundreds of directories must be tested against a number of PCRE regexp to count and categorize files and to determine which of regex are more viable and inclusive.
My approach for a single regexp test:
find unsorted_test/. -type f -print0 |
xargs -0 grep -Pazo '(?P<message>User activity exceeds.*?\:\s+(?P<user>.*?))\s' |
tr -d '\000' |
fgrep -a unsorted_test |
sed 's/^.*unsorted/unsorted/' |
cut -d: -f1 > matched_files_unsorted_test000.txt ;
wc -l matched_files_unsorted_test000.txt
find | xargs sidesteps the "too many arguments" error for grep
grep -Pazo is the one doing the heavy lifting: -P is for PCRE regex, -a makes sure files are read as text, and -z -o are there simply because it doesn't work otherwise with the file base I have
tr -d '\000' is to make sure the output is not binary
fgrep -a is to get only the line with the filename
sed is to counteract grep's awesome habit of appending trailing lines to each other (basically removes everything in a line before the filepath)
cut -d: -f1 cuts off the filepath only
wc -l counts the result size of the matched filelist
Result is a file with 10k+ lines like these: unsorted/./2020.03.02/68091ec4-cf04-4843-a4b2-95420756cd53 which is what I want in the end.
Obviously this is not very good, but this works fine for something made out of sticks and dirt. My main objective here is to test concepts and regex, not count for further scaling or anything, really.
So, since grep -P does not support -f parameter, I tried using the while read loop:
(while read regexline ;
do echo "$regexline" ;
find unsorted_test/. -type f -print0 |
xargs -0 grep -Pazo "$regexline" |
tr -d '\000' |
fgrep -a unsorted_test |
sed 's/^.*unsorted/unsorted/' |
cut -d: -f1 > matched_files_unsorted_test000.txt ;
wc -l matched_files_unsorted_test000.txt |
sed 's/^ *//' ;
done) < regex_1.txt
And as you can imagine - it fails spectacularly: zero matches for everything.
I've experimented with the quotemarks in the grep, with the loop type etc. Nothing.
Any help with the current code or suggestions on how to do this otherwise are very appreciated.
Thank you.
P.S. Yes, I've tried pcregrep, but it returns zero matches even on a single pattern. Dunno why.
You could do this, which will be impossibly slow:
find unsorted_test/. -type f -print0 |
while IFS= read -d '' -r file; do
while IFS= read -r regexline; do
grep -Pazo "$regexline" "$file"
done < regex_1.txt
done |
tr -d '\000' | fgrep -a unsorted_test... blablabla
Or for each line:
find unsorted_test/. -type f -print0 |
while IFS= read -d '' -r file; do
while IFS= read -r line; do
while IFS= read -r regexline; do
if grep -Pazo "$regexline" <<<"$line"; then
break
fi
done < regex_1.txt
done < "$file"
done |
tr -d '\000' | fgrep -a unsorted_test... blablabla
Or maybe with xargs.
But I believe you can just join the regular expressions from the file with |:
find unsorted_test/. -type f -print0 |
{
regex=$(< regex_1.txt paste -sd '|')
# or maybe with each regex wrapped in parentheses
# regex=$(< regex_1.txt sed 's/.*/(&)/' | paste -sd '|')
xargs -0 grep -Pazo "$regex"
} |
....
Notes:
To read lines from file use IFS= read -r line. The -d '' option to read is bash syntax.
After a pipe, lines containing only spaces, tabs, or comments are ignored. You can just put your commands on separate lines.
Use grep -F instead of deprecated fgrep.
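A small runnable sketch of the join-with-| idea. The regexes and file contents are invented for the demo, and it assumes a grep built with PCRE support (-P), as GNU grep usually is:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# Hypothetical regex list, one pattern per line.
printf '%s\n' 'User activity exceeds.*' 'login failed \d+' > "$tmp/regex.txt"

# Hypothetical input files: f1 and f3 should match, f2 should not.
printf 'User activity exceeds limit: bob\n' > "$tmp/f1"
printf 'nothing here\n'                     > "$tmp/f2"
printf 'login failed 3 times\n'             > "$tmp/f3"

# Wrap each pattern in parentheses, then join the lines with |.
regex=$(sed 's/.*/(&)/' "$tmp/regex.txt" | paste -sd '|' -)
echo "$regex"

# -l lists each matching file once, which is what the file count needs.
matched_count=$(grep -lP -- "$regex" "$tmp/f1" "$tmp/f2" "$tmp/f3" | wc -l)
echo "$matched_count"

rm -rf "$tmp"
```

This counts files matching any of the patterns in a single pass. To rank the regexes individually you would still need one grep run per pattern.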

bash: programmatically assemble list

I'm trying to write a shell script which is assembling a list that will later be passed to sort -n. If I do:
find . -type f -printf "%s\n" | sort -n
the output is sorted just as I expect. What I can't figure out is how to assemble the list from inside the script itself. Here is the current script which tries to sum up how much space is used in a directory, sorted by file extension:
#!/bin/sh
echo -n "Enter directory/path to analyze: "
read path
extList=` find $path -type f -print | awk ' BEGIN {FS="."}{ print $NF }' | grep -v '/' | sort | uniq `
for ext in $extList; do
byteList=`find $path -type f -name \*.$ext -printf '%s\n' `
sum=0
for b in $byteList; do
sum=$(( $sum + $b ))
done
sum=$(( $sum/1024 ))
list+=`printf " $sum KB $ext\n"`
done
echo $list | sort -n
I've tried a lot of things for the list+= line, but I don't get a true list. I wind up with everything appearing as a single line, unsorted.
Here's a Minimal, Complete, and Verifiable example of what you're seeing:
echo "$(printf 'foo\n')$(printf 'bar\n')"
Expected:
foo
bar
Actual:
foobar
This is because trailing linefeeds are stripped in the contents of $(..) and `..` command substitution.
Instead, you can use $'\n' or a literal linefeed. Both of these will correctly append a linefeed:
list+="foo"$'\n'
list+="bar
"
Once you fix that, here's your next MCVE:
list="foo
bar"
echo $list
Expected:
foo
bar
Actual:
foo bar
This is due to the lack of quoting in echo $list. It should be echo "$list".
However, none of this is the bash way of doing things. Instead of accumulating into a variable and then using the variable, just pipe the data. This is what you're doing:
list=""
for word in foo bar baz
do
list+="$word"$'\n'
done
echo "$list" | sort -n
This is more canonical:
for word in foo bar baz
do
echo "$word"
done | sort -n
One problem is that `cmd` strips trailing newlines. Another is that echo $list doesn't quote "$list", so newlines are printed as spaces.
There's no need to build a list variable to then sort it later, though. Instead, try sorting all of the loop's output.
for ext in $extList; do
...
printf " %s KB %s\n" "$sum" "$ext"
done | sort -n
I'd suggest not storing the extension list in a string either. You could use a function:
extList() {
find "$path" -maxdepth 1 -type f -printf '%P\n' | awk -F. 'NF>1 {print $NF}' | sort -u
}
extList | while IFS= read -r ext; do
...
done | sort -n
Or store them in an array:
readarray -t extList < <(find "$path" -maxdepth 1 -type f -printf '%P\n' | awk -F. 'NF>1 {print $NF}' | sort -u)
for ext in "${extList[@]}"; do
...
done | sort -n
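To see both approaches side by side, here is a minimal sketch with made-up size lines standing in for the real find output. Note that list+= is a bash feature, so the script needs a bash shebang rather than #!/bin/sh.

```shell
#!/usr/bin/env bash
# Invented sample rows in the question's " <size> KB <ext>" shape.
entries=( "30 KB txt" "4 KB jpg" "120 KB log" )

# Approach 1: accumulate into a variable, appending the newline explicitly
# because $(...) would strip it.
list=""
for entry in "${entries[@]}"; do
  list+="$entry"$'\n'
done
sorted_from_var=$(printf '%s' "$list" | sort -n)

# Approach 2: skip the variable and sort the loop's output directly.
sorted_from_pipe=$(
  for entry in "${entries[@]}"; do
    printf '%s\n' "$entry"
  done | sort -n
)

printf '%s\n' "$sorted_from_pipe"
```

Both variables end up with the same three lines in ascending numeric order; the second form just needs less bookkeeping.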

Escape single quotes in long directory name then pass it to xargs [Bash 3.2.48]

In my directory I have subfolders, and I want to list all directories like this:
- ./subfolder
- ./subfolder/subsubfolder1
- ./subfolder/subsubfolder2
- ./subfolder/subsubfolder2/subsubsubfolder
I want to list this structure:
./fol'der/subfol'der/
Here is my code:
echo -n "" > myfile
find . -type d -print0 | xargs -0 -I# | cat | grep -v -P "^.$" | sed -e "s/'/\\\'/g" | xargs -I# echo "- #" >> myfile
The desired output would be like this:
- ./fol'der
- ./fol'der/subfol'der
But the output is:
- ./fol'der
- #
It seems like sed fails at the second occurrence of the single quote (') character, or something. I have no idea. Can you help me? (I'm on OS X 10.7.4.)
I've been grep-ing and sed-ing like an idiot. Thought about a little bit, and I came up with a much more simple solution, a for loop.
echo -n "" > myfile
for folder in $(find . -type d)
do
if [[ $folder != "." ]]
then
echo "- ${folder}" >> myfile
fi
done
My previous solution wasn't working with names containing whitespace, so the correct one is:
echo -n "" > myfile
find . -type d -print0 | while IFS= read -r -d '' folder
do
if [[ "${folder}" != "." ]]
then
echo "- ${folder}" >> myfile
fi
done
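The loop above can be checked against awkward names with a sketch like the following, which builds a throwaway tree (the directory names are invented) and collects the output in a variable instead of myfile:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
mkdir -p "$tmp/fol'der/subfol'der" "$tmp/with space"

# IFS= keeps leading/trailing whitespace, -r keeps backslashes,
# -d '' makes read stop at the NUL bytes emitted by -print0.
out=$(
  cd "$tmp" &&
  find . -type d -print0 |
  while IFS= read -r -d '' folder; do
    if [[ $folder != "." ]]; then
      printf -- '- %s\n' "$folder"
    fi
  done | LC_ALL=C sort
)

printf '%s\n' "$out"
rm -rf "$tmp"
```

Since the names never pass through xargs or sed, the single quotes need no escaping at all.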
With GNU Parallel you can do:
find . -type d -print0 | parallel -q -0 echo '- '{}
Your output will be screwed up if you have any dirs with \n in its name. If you do not have any dirs with \n in the name you can do:
find . -type d -print | parallel -q echo '- '{}
The -q is only needed if you really need two spaces after '-'.
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
Watch the intro videos for GNU Parallel to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
This is on Linux, but it should work on OS X:
find . -type d -print0 | xargs -0 -I # echo '- #'
It works for me regardless of whether the last set of quotes are single or double.
Output:
- ./fol'der
- ./fol'der/subfol'der

Resources