Can anyone help me understand this? - shell

I'm new to shell scripting and I can't understand these lines:
wc -l $x|sed 's/\s\+/|/g'
rc=`echo "$BTEQ_OUT"|grep "RC (return code)"| sed 's/ //g' | cut -d '=' -f2|tr -d "\r\n "`;

When you see a long pipeline, one useful technique for understanding it is to execute it piece by piece:
first, what's in $x?
echo $x
is that the name of a file?
ls -l $x
what does wc do?
wc -l $x
ok, what does the sed part do? (note, \s requires GNU sed)
wc -l $x | sed 's/\s\+/|/g'
Similarly:
echo "$BTEQ_OUT"
echo "$BTEQ_OUT"|grep "RC (return code)"
echo "$BTEQ_OUT"|grep "RC (return code)"| sed 's/ //g'
echo "$BTEQ_OUT"|grep "RC (return code)"| sed 's/ //g' | cut -d '=' -f2
echo "$BTEQ_OUT"|grep "RC (return code)"| sed 's/ //g' | cut -d '=' -f2|tr -d "\r\n ";
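To make the second pipeline concrete, here is a minimal sketch run against made-up input. The contents of BTEQ_OUT below are hypothetical (real BTEQ output will look different); the point is only to show what each stage contributes.

```shell
# Hypothetical sample data, not real BTEQ output.
BTEQ_OUT=$(printf ' *** Query completed.\r\n *** RC (return code) = 8 \r\n')

# grep keeps the matching line, sed deletes every space, cut takes the
# text after '=', and tr removes any leftover CR, LF, or space characters.
rc=$(echo "$BTEQ_OUT" | grep "RC (return code)" | sed 's/ //g' | cut -d '=' -f2 | tr -d "\r\n ")
echo "rc=$rc"
```

With this sample the variable ends up holding just the number, without the carriage return that the input line carried.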

wc -l $x|sed 's/\s\+/|/g'
wc is a tool used for counting; with the -l flag it counts the lines in a file or on standard input.
$x is a variable that probably holds a file name to be passed to wc.
| (called a 'pipe') passes the output of the command before it as the input to the command after it.
sed is a stream editor used to transform text.
's/\s\+/|/g' is a substitution that globally (g) replaces each run of one or more whitespace characters with a pipe symbol '|'.
So this command does the following:
Count how many lines are in $x and, in whatever wc outputs, replace whitespace with pipe symbols.
The fact that they expect multiple outputs from wc -l hints that $x might store more than one file ...
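Here is a small sketch of that multi-file case. The file names and contents are made up, and I have swapped \s\+ for the POSIX class [[:space:]] so that it does not depend on GNU sed. The exact padding that wc prints differs between implementations, so the output may or may not start with a pipe symbol.

```shell
# Two throwaway files so that wc prints one line per file plus a total.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one.txt"
printf 'c\n' > "$tmpdir/two.txt"

# Left unquoted on purpose: $x is split into two file names.
x="$tmpdir/one.txt $tmpdir/two.txt"

# Replace every run of whitespace in the wc output with a pipe symbol.
out=$(wc -l $x | sed 's/[[:space:]][[:space:]]*/|/g')
echo "$out"

rm -rf "$tmpdir"
```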
I'd suggest looking into what the other commands do and how they interact. They are listed below:
echo
tr
cut
pipe

Related

Inline array substitution

I have file with a few lines:
x 1
y 2
z 3 t
I need to pass each line as a parameter to some program:
$ program "x 1" "y 2" "z 3 t"
I know how to do it with two commands:
$ readarray -t a < file
$ program "${a[@]}"
How can I do it with one command? Something like this:
$ program ??? file ???
The (default) options of your readarray command indicate that your file items are separated by newlines.
So in order to achieve what you want in one command, you can take advantage of the special IFS variable to use word splitting w.r.t. newlines (see e.g. this doc) and call your program with a non-quoted command substitution:
IFS=$'\n'; program $(cat file)
As suggested by @CharlesDuffy:
you may want to disable globbing by running beforehand set -f, and if you want to keep these modifications local, you can enclose the whole in a subshell:
( set -f; IFS=$'\n'; program $(cat file) )
to avoid the performance penalty of the parens and of the /bin/cat process, you can write instead:
( set -f; IFS=$'\n'; exec program $(<file) )
where $(<file) is a Bash equivalent to $(cat file) (faster, as it doesn't require forking /bin/cat), and exec consumes the subshell created by the parens.
However, note that the exec trick won't work and should be removed if program is not a real program in the PATH (that is, you'll get exec: program: not found if program is just a function defined in your script).
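As a quick check of the subshell variant, here is a sketch in which program is a hypothetical shell function that only reports its arguments. Because it is a function and not an executable in the PATH, exec is left out, as noted above. The temporary file stands in for the question's file.

```shell
# Hypothetical stand-in for the real program: print the argument count,
# then each argument in angle brackets.
program() { printf '%d args\n' "$#"; printf '<%s>\n' "$@"; }

file=$(mktemp)
printf 'x 1\ny 2\nz 3 t\n' > "$file"

# set -f disables globbing and IFS limits word splitting to newlines;
# both changes stay inside the command substitution's subshell.
result=$( set -f; IFS=$'\n'; program $(<"$file") )
echo "$result"

rm -f "$file"
```

Each line of the file arrives as one argument, including the embedded spaces.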
Passing a set of parameters should be more organized.
In this example I'm looking for a file containing settings such as chk_disk_issue=something, so I set the values by reading a config file whose name I pass in as a parameter.
# -- read specific variables from the config file (if found) --
if [ -f "${file}" ] ;then
    while IFS= read -r line ;do
        if ! [[ $line = *"#"* ]]; then
            var="$(echo $line | cut -d'=' -f1)"
            case "$var" in
                chk_disk_issue)
                    chk_disk_issue="$(echo $line | tr -d '[:space:]' | cut -d'=' -f2 | sed 's/[^0-9]*//g')"
                    ;;
                chk_mem_issue)
                    chk_mem_issue="$(echo $line | tr -d '[:space:]' | cut -d'=' -f2 | sed 's/[^0-9]*//g')"
                    ;;
                chk_cpu_issue)
                    chk_cpu_issue="$(echo $line | tr -d '[:space:]' | cut -d'=' -f2 | sed 's/[^0-9]*//g')"
                    ;;
            esac
        fi
    done < "${file}"
fi
If these are not parameters, find a way for your script to read them as data, and pass in only the file name.
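For comparison, a more compact sketch of the same idea lets read split each line on '=' directly. The config contents below are invented for illustration, and this is one possible variation, not a drop-in replacement for the loop above.

```shell
# Hypothetical config file with the three settings used above.
config=$(mktemp)
cat > "$config" <<'EOF'
# thresholds (sample values)
chk_disk_issue = 90
chk_mem_issue=80%
chk_cpu_issue= 75
EOF

while IFS='=' read -r key value; do
  # Skip any line containing '#', as the loop above does.
  case "$key$value" in *"#"*) continue ;; esac
  key=$(printf '%s' "$key" | tr -d '[:space:]')   # drop spaces around the name
  value=$(printf '%s' "$value" | tr -cd '0-9')    # keep digits only
  case "$key" in
    chk_disk_issue) chk_disk_issue=$value ;;
    chk_mem_issue)  chk_mem_issue=$value ;;
    chk_cpu_issue)  chk_cpu_issue=$value ;;
  esac
done < "$config"
rm -f "$config"

echo "disk=$chk_disk_issue mem=$chk_mem_issue cpu=$chk_cpu_issue"
```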

count all the lines in all folders in bash [duplicate]

wc -l file.txt
outputs number of lines and file name.
I need just the number itself (not the file name).
I can do this
wc -l file.txt | awk '{print $1}'
But maybe there is a better way?
Try this way:
wc -l < file.txt
cat file.txt | wc -l
According to the man page (for the BSD version; I don't have a GNU version to check):
If no files are specified, the standard input is used and no file name is displayed. The prompt will accept input until receiving EOF, or [^D] in most environments.
To do this without the leading space, why not:
wc -l < file.txt | bc
Comparison of Techniques
I had a similar issue attempting to get a character count without the leading whitespace provided by wc, which led me to this page. After trying out the answers here, the following are the results from my personal testing on Mac (BSD Bash). Again, this is for character count; for line count you'd do wc -l. echo -n omits the trailing line break.
FOO="bar"
echo -n "$FOO" | wc -c # " 3" (x)
echo -n "$FOO" | wc -c | bc # "3" (√)
echo -n "$FOO" | wc -c | tr -d ' ' # "3" (√)
echo -n "$FOO" | wc -c | awk '{print $1}' # "3" (√)
echo -n "$FOO" | wc -c | cut -d ' ' -f1 # "" for -f < 8 (x)
echo -n "$FOO" | wc -c | cut -d ' ' -f8 # "3" (√)
echo -n "$FOO" | wc -c | perl -pe 's/^\s+//' # "3" (√)
echo -n "$FOO" | wc -c | grep -ch '^' # "1" (x)
echo $( printf '%s' "$FOO" | wc -c ) # "3" (√)
I wouldn't rely on the cut -f* method in general since it requires that you know the exact number of leading spaces that any given output may have. And the grep one works for counting lines, but not characters.
bc is the most concise, and awk and perl seem a bit overkill, but they should all be relatively fast and portable enough.
Also note that some of these can be adapted to trim surrounding whitespace from general strings, as well (along with echo `echo $FOO`, another neat trick).
How about
wc -l file.txt | cut -d' ' -f1
i.e. pipe the output of wc into cut (where delimiters are spaces and pick just the first field)
How about
grep -ch "^" file.txt
Obviously, there are a lot of solutions to this.
Here is another one though:
wc -l somefile | tr -d "[:alpha:][:blank:][:punct:]"
This only outputs the number of lines, but the trailing newline character (\n) is still present; if you don't want that either, replace [:blank:] with [:space:].
Another way to strip the leading spaces without invoking an external command is to use arithmetic expansion $((exp)):
echo $(($(wc -l < file.txt)))
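Wrapped in a small helper, the arithmetic-expansion approach might look like the sketch below. The function name line_count and the temporary file are my own additions for illustration.

```shell
# Print the number of lines in the file named by $1, with no padding
# and no file name.
line_count() {
  local n
  n=$(wc -l < "$1") || return   # give up if the file cannot be read
  echo "$((n))"                 # arithmetic expansion drops any padding
}

file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$file"
count=$(line_count "$file")
echo "[$count]"
rm -f "$file"
```

The brackets in the last echo are only there to make any stray whitespace visible.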
The best way is to first find all files in the directory, then use awk's NR (number of records) variable to count them.
Below is the command:
find <directory path> -type f | awk 'END{print NR}'
Example: find /tmp/ -type f | awk 'END{print NR}'
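Note that the command above counts files, not lines. If the goal is the total number of lines across all files, as the title suggests, one possible variation (my assumption about the intent, not part of the answer above) is to concatenate the files and count once. The directory layout below is made up.

```shell
# Throwaway directory tree with two small files.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/sub/two.txt"

# Number of regular files under the directory.
file_count=$(find "$dir" -type f | awk 'END{print NR}')

# Total number of lines in all of those files.
line_total=$(find "$dir" -type f -exec cat {} + | wc -l)
line_total=$((line_total))   # strip any padding added by wc

echo "files=$file_count lines=$line_total"
rm -rf "$dir"
```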
This works for me using plain wc -l, with sed stripping the file name (lowercase letters, -, _ and .) and any whitespace:
wc -l big_file.log | sed -E "s/([a-z\-\_\.]|[[:space:]]*)//g"
# 9249133

why shell for expression cannot parse xargs parameter correctly

I have a blacklist that stores a list of tag IDs, e.g. 1-3,7-9, which actually represents 1,2,3,7,8,9. It can be expanded with the shell loop below:
for i in {1..3,7..9}; do for j in {$i}; do echo -n "$j,"; done; done
1,2,3,7,8,9
But first I need to convert - to ..
echo -n "1-3,7-9" | sed 's/-/../g'
1..3,7..9
and then pass it to the for expression as a parameter:
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I @ for i in {@}; do for j in {$i}; do echo -n "$j,"; done; done
zsh: parse error near `do'
echo -n "1-3,7-9" | sed 's/-/../g' | xargs -I @ echo @
1..3,7..9
But the for expression cannot parse it correctly. Why is that?
Because you didn't do anything to stop the outermost shell from picking up the special keywords and characters ( do, for, $, etc ) that you mean to be run by xargs.
xargs isn't a shell built-in; it gets the command line you want it to run for each element on stdin from its arguments. Just like with any other program, if you want ; or any other sequence that is special to the shell to appear in an argument, you need to escape it somehow.
It seems like what you really want here, in my mind, is to invoke in a subshell a command ( your nested for loops ) for each input element.
I've come up with this; it seems to do the job:
echo -n "1-3,7-9" \
| sed 's/-/../g' \
| xargs -I @ \
bash -c "for i in {@}; do for j in {\$i}; do echo -n \"\$j,\"; done; done;"
which gives:
{1..3},{7..9},
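The literal braces in that output come from the order of expansions in Bash: brace expansion runs before parameter expansion, so a range held in a variable is never seen as a range. A minimal sketch, with variable names of my own choosing:

```shell
range="1..3"

# The range is written literally, so brace expansion applies.
literal=$(echo {1..3})

# The braces contain only $range when brace expansion runs, so nothing is
# expanded; the variable is substituted afterwards.
from_var=$(echo {$range})

echo "$literal"
echo "$from_var"
```

The first echo prints the expanded numbers, while the second prints the braces and dots unchanged.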
You could use the following commands to achieve this:
# macOS: the newline in the sed replacement needs special treatment
echo "1-3,7-9" | sed -e 's/-/../g' -e $'s/,/\\\n/g' | xargs -I@ echo 'for i in {@}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,%
# Linux
echo "1-3,7-9" | sed -e 's/-/../g' -e 's/,/\n/g' | xargs -I@ echo 'for i in {@}; do echo -n "$i,"; done' | bash
1,2,3,7,8,9,
This approach is a little complicated, though; awk may be more intuitive:
# awk
echo "1-3,7-9,11,13-17" | awk '{n=split($0,a,","); for(i=1;i<=n;i++){m=split(a[i],a2,"-");for(j=a2[1];j<=a2[m];j++){print j}}}' | tr '\n' ','
1,2,3,7,8,9,11,13,14,15,16,17,%
echo -n "1-3,7-9" | perl -ne 's/-/../g;$,=",";print eval $_'
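If neither xargs nor an external interpreter is required, the expansion can also be sketched in plain Bash with parameter expansion and a C-style loop. The function name expand_ranges is hypothetical, and the sketch assumes the input contains only non-negative integers, hyphens and commas.

```shell
# Expand a list such as 1-3,7-9,11 into 1,2,3,7,8,9,11.
expand_ranges() {
  local spec=$1 part start end i out=()
  local IFS=,                     # split the list, and later join it, on commas
  for part in $spec; do
    start=${part%-*}              # text before the hyphen (or the whole item)
    end=${part#*-}                # text after the hyphen (or the whole item)
    for ((i = start; i <= end; i++)); do
      out+=("$i")
    done
  done
  echo "${out[*]}"
}

expand_ranges "1-3,7-9,11"
```

Unlike the loops above, this does not leave a trailing comma.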

Speed up bash filter function to run commands consecutively instead of per line

I have written the following filter as a function in my ~/.bash_profile:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
exit 0
}
to find lines of anything piped into it matching a regular expression, and highlight matches using ANSI escape codes on a VT100-compatible terminal.
For example, the following finds and highlights the strings bin, U or 1 which are whole words in the last 10 lines of /etc/passwd:
tail /etc/passwd | hilite "\b(bin|[U1])\b"
However, the script runs very slowly as each line forks an echo, egrep and sed.
In this case, it would be more efficient to do egrep on the entire input, and then run sed on its output.
How can I modify my function to do this? I would prefer to not create any temporary files if possible.
P.S. Is there another way to find and highlight lines in a similar way?
sed can do a bit of grepping itself: if you give it the -n flag (or #n instruction in a script) it won't echo any output unless asked. So
while read line
do
echo $line | egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
done
could be simplified to
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
EDIT:
Here's the whole function:
hilite() {
REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g");
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
}
That's all there is to it - no while loop, reading, grepping, etc.
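The \x1b escape in the sed replacement is a GNU extension. As a sketch for other sed implementations, the escape character can be produced with printf, and -E lets the pattern be passed as an extended regular expression without the extra quoting step. This assumes the pattern contains no / characters, and \b may still be unavailable outside GNU sed.

```shell
hilite() {
  local esc
  esc=$(printf '\033')   # literal ESC character for the ANSI sequences
  # Print only the lines where a substitution happened, with each match
  # wrapped in reverse video.
  sed -n -E "s/$1/${esc}[7m&${esc}[0m/gp"
}

# Sample input; in practice anything can be piped in.
printf 'root:x:0\nbin:x:1\ndaemon:x:2\n' | hilite 'bin|root'
```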
If your egrep supports --color, just put this in .bash_profile:
hilite() { command egrep --color=auto "$@"; }
(Personally, I would name the function egrep; hence the usage of command).
I think you can replace the whole while loop with simply
sed -n "s/$REGEX_SED/\x1b[7m&\x1b[0m/gp"
because sed can read from stdin line-by-line so you don't need read
I'm not sure if running egrep and piping to sed is faster than using sed alone, but you can always compare using time.
Edit: added -n and p to sed to print only highlighted lines.
Well, you could simply do this:
egrep "$1" | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
But I'm not sure that it'll be that much faster ; )
Just for the record, this is a method using a temporary file:
hilite() {
export REGEX_SED=$(echo $1 | sed "s/[|()]/\\\&/g")
export FILE=$2
if [ -z "$FILE" ]
then
export FILE=~/tmp
echo -n > $FILE
while read line
do
echo $line >> $FILE
done
fi
egrep "$1" $FILE | sed "s/$REGEX_SED/\x1b[7m&\x1b[0m/g"
return $?
}
which also takes a file/pathname as an optional second argument, and still works for cases like
cat /etc/passwd | hilite "\b(bin|[U1])\b"

Resources