What does the `l` option mean in GNU sed? - bash

I have read the sed manual entry for the -l option. It says:
-l
--line-length=N
Specify the default line-wrap length for the l command. A length of 0 (zero) means to never wrap long lines. If not specified, it is taken to be 70.
I don't know how this is useful. Can someone give me an example?
I thought it would work like this, but this is the result:
[root#kvm ~]# echo 'abcdefg' | sed -l 3 -n '/a/p'
abcdefg

Why not try it and see?
$ echo 'abcdefg' | sed -l 3 'l'
ab\
cd\
ef\
g$
abcdefg
$ echo 'abcdefg' | sed -l 4 'l'
abc\
def\
g$
abcdefg

From sed manual:
Commands which accept address ranges
...
l List out the current line in a ``visually unambiguous''
form.
l width List out the current line in a ``visually
unambiguous'' form,
breaking it at width characters. This is a GNU extension.
The -l N, --line-length=N option specifies the line-wrap length for the 'l' command when no width argument is given explicitly in the sed script.
$ echo abcdefgh | sed -n 'l 5'
abcd\
efgh$
$ echo abcdefgh | sed -n -l 5 'l'
abcd\
efgh$
$ echo abcdefgh | sed -n -l 5 'l 3'
ab\
cd\
ef\
gh$
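Wrapping aside, the main use of the l command is to make otherwise invisible characters visible. A minimal sketch, assuming any POSIX sed (the helper name show_line is made up for illustration):

```shell
# Print each input line in sed's "visually unambiguous" form.
show_line() {
    sed -n 'l'
}

# The input holds a tab and a trailing space; l shows the tab as \t
# and marks the end of the line with $, which exposes the space.
printf 'a\tb \n' | show_line    # prints: a\tb $
```

This is handy for spotting stray tabs, carriage returns, or trailing blanks in a file.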

Related

How to use awk to find a char in a string in bash

I have a char variable called sign and a given string sub. I need to find out how many times this sign appears in the sub and cannot use grep.
For example:
sign = c
sub = mechanic cup cat
echo "$sub" | awk <code i am asking for> | wc -l
And the output should be 4 because c appears 4 times. What should be inside <>?
sign=c
sub='mechanic cup cat'
echo "$sub" |
awk -v sign="$sign" -F '' '{for (i=1;i<=NF;i++){if ($i==sign) cnt++}} END{print cnt}'
Edit:
Changes for the requirements in the comment:
Test if the length of sign is 1 (no = present). If true, change sign and sub to lowercase to ignore the case.
Use ${sign:0:1} to only pass the first character to awk.
sign=c
sub='mechanic Cup cat'
if [ "${#sign}" -eq 1 ]; then
sign=${sign,,}
sub=${sub,,}
fi
echo "$sub" |
awk -v sign="${sign:0:1}" -F '' '{for (i=1;i<=NF;i++){if ($i==sign) cnt++}} END{print cnt}'
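The empty field separator (-F '') is not portable: POSIX leaves its behaviour unspecified, although gawk splits into single characters. A sketch of the same count using substr, which should work in any POSIX awk; the function name count_char is an assumption for illustration:

```shell
# $1 = single character to count, $2 = string to search
count_char() {
    printf '%s\n' "$2" |
    awk -v sign="$1" '{
        # walk the line one character at a time
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) == sign) cnt++
    }
    END { print cnt + 0 }'    # +0 prints 0 instead of an empty line
}

count_char c 'mechanic cup cat'    # prints 4
```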
A combination of Quasimodo's comment and Freddy's lower-case example:
$ sign=c
$ sub='mechanic Cup cat'
A tr + wc solution if ${sign} is a single character.
Count the number of times ${sign} shows up in ${sub}, ignoring case:
$ tr -cd [${sign,,}] <<< ${sub,,} | wc -c
4
Where:
${sign,,} & ${sub,,} - convert to all lowercase
tr -cd [...] - find all characters listed inside the brackets ([]); -d says to delete/remove said characters while -c says to take the complement (ie, remove all but the characters in the brackets), so -cd [${sign,,}] says to remove all but the character stored in ${sign}
<<< .... - here string (allows passing a variable/string to tr on standard input)
wc -c - count the number of characters
NOTE: This only works if ${sign} contains a single character.
A sed solution that should work regardless of the number of characters in ${sign}.
$ sub='mechanic Cup cat'
First we embed a new line character before each occurrence of ${sign,,}:
$ sign=c
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,}
me
chani
c
cup
cat
$ sign=cup
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,}
mechanic
cup cat
Where:
\(${sign,,}\) - find the pattern that matches ${sign} (all lowercase) and assign to position 1
\n\1 - place a newline (\n) in the stream just before our pattern in position 1
At this point we just want the lines that start with ${sign,,}, which is where tail -n +2 comes into play (ie, display lines 2 through n; the older tail +2 spelling is obsolete and not accepted everywhere):
$ sign=c
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail -n +2
chani
c
cup
cat
$ sign=cup
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail -n +2
cup cat
And now we pipe to wc -l to get a line count (ie, count the number of times ${sign} shows up in ${sub} - ignoring case):
$ sign=c
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail -n +2 | wc -l
4
$ sign=cup
$ sed "s/\(${sign,,}\)/\n\1/g" <<< ${sub,,} | tail -n +2 | wc -l
1
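The same non-overlapping, case-insensitive count can also be sketched without sed or tail, using only POSIX parameter expansion plus tr for lowercasing. The function name count_substr is made up for illustration, and the variables it sets are global:

```shell
# $1 = substring to count, $2 = string to search (both lowercased first)
count_substr() {
    needle=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    rest=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    n=0
    # an empty needle would match forever, so report 0 instead
    [ -n "$needle" ] || { echo 0; return; }
    while :; do
        case $rest in
            # cut everything up to and including the first match
            *"$needle"*) rest=${rest#*"$needle"}; n=$((n + 1)) ;;
            *) break ;;
        esac
    done
    echo "$n"
}

count_substr c 'mechanic Cup cat'      # prints 4
count_substr cup 'mechanic Cup cat'    # prints 1
```

Quoting "$needle" inside the pattern makes it match literally, so characters such as * or . in the search string are not treated as wildcards.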

Get exact output of a shell command

The bash manual says regarding command substitution:
Bash performs the expansion by executing command and replacing the command substitution with the standard output of the command, with any trailing newlines deleted.
Demonstration - 3 characters, newlines first:
$ output="$(printf "\n\nx")"; echo -n "$output" | wc -c
3
Here the newlines are not at the end, and do not get removed, so the count is 3.
Demonstration - 3 characters, newlines last:
$ output="$(printf "x\n\n")"; echo -n "$output" | wc -c
1
Here the newlines are removed from the end, so the count is 1.
TL;DR
What is a robust work-around to get the binary-clean output of a command into a variable?
Bonus points for Bourne shell compatibility.
The only way to do it in a "Bourne compatible" way is to use external utilities.
Besides writing one in C, you can use xxd and expr (for example):
$ output="$(printf "x\n\n"; printf "X")" # get the output ending in "X".
$ printf '%s' "${output}" | xxd -p # transform the string to hex.
780a0a58
$ hexstr="$(printf '%s' "${output}" | xxd -p)" # capture the hex
$ expr "$hexstr" : '\(.*\)..' # remove the last two hex ("X").
780a0a
$ hexstr="$(expr "$hexstr" : '\(.*\)..')" # capture the shorter str.
$ printf '%s' "$hexstr" | xxd -p -r | wc -c # convert back to binary.
3
Shortened:
$ output="$(printf "x\n\n"; printf "X")"
$ hexstr="$(printf '%s' "${output}" | xxd -p )"
$ expr "$hexstr" : '\(.*\)..' | xxd -p -r | wc -c
3
The command xxd is being used for its ability to convert back to binary.
Note that wc -c will not give the character count for many Unicode (multibyte) characters:
$ printf "Voilà" | wc -c
6
$ printf "★" | wc -c
3
It will print the count of bytes, not characters.
The length of a variable ${#var} will also fail in older shells.
Of course, to get this to run in a Bourne shell you must use `…` instead of $(…).
In bash, the ${parameter%word} form of Shell Parameter Expansion can be used:
$ output="$(printf "x\n\n"; echo X)"; echo -n "${output%X}" | wc -c
3
This substitution is also specified by POSIX.1-2008.
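One thing the sentinel trick loses is the exit status of the captured command, because the extra print runs last. A minimal sketch that carries the status through the command substitution; the variable names rc and status are arbitrary, and printf 'x\n\n' stands in for the real command:

```shell
# The trailing X is a sentinel that protects the trailing newlines.
output=$(printf 'x\n\n'; rc=$?; printf X; exit "$rc")
status=$?             # exit status of the captured command
output=${output%X}    # drop the sentinel, keeping both newlines

printf '%s' "$output" | wc -c    # 3 bytes: x, newline, newline
```

NUL bytes still cannot be stored in a bash variable, so this is newline-safe rather than fully binary-clean.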

Count of matching word, pattern or value from unix korn shell scripting is returning just 1 as count

I'm trying to get the count of a matching pattern from a variable to check the count of it, but it's only returning 1 as the results, here is what I'm trying to do:
x="HELLO|THIS|IS|TEST"
echo $x | grep -c "|"
Expected result: 3
Actual Result: 1
Do you know why it is returning 1 instead of 3?
Thanks.
grep -c counts lines not matches within a line.
You can use awk to get a count:
x="HELLO|THIS|IS|TEST"
echo "$x" | awk -F '|' '{print NF-1}'
3
Alternatively you can use tr and wc:
echo "$x" | tr -dc '|' | wc -c
3
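The awk one-liner above prints one count per input line, and an empty line would give -1 because NF is 0 there. A sketch that totals the separators over all lines and skips empty ones; the function name count_pipes is an assumption for illustration:

```shell
# Sum the number of | characters over every line of standard input.
count_pipes() {
    # NF is 0 on empty lines, so the NF guard avoids adding -1
    awk -F '|' 'NF { n += NF - 1 } END { print n + 0 }'
}

printf 'hello|this|is\n\na|test\n' | count_pipes    # prints 3
```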
$ echo "$x" | grep -o '|' | grep -c .
3
grep -c does not count the number of matches. It counts the number of lines that match. By using grep -o, we put the matches on separate lines.
This approach works just as well with multiple lines:
$ cat file
hello|this|is
a|test
$ grep -o '|' file | grep -c .
3
The grep manual says:
grep, egrep, fgrep - print lines matching a pattern
and for the -c flag:
instead print a count of matching lines for each input file
and there is just one line that matches.
You don't need grep for this.
pipe_only=${x//[^|]} # remove everything except | from the value of x
echo "${#pipe_only}" # output the length of pipe_only
Try this :
$ x="HELLO|THIS|IS|TEST"; echo -n "$x" | sed 's/[^|]//g' | wc -c
3
With only one pipe with perl:
echo "$x" |
perl -lne 'print scalar(() = /\|/g)'

Split from 40900000 to 409-00-000

Does anybody know a way to convert "40900000" to "409-00-000" with a single command, sed or awk?
I already tried a couple of ways with sed but had no luck at all. I need to do this in bulk: there are around 40k lines and some of these lines are malformed, so they need to be fixed.
Thanks in advance
Using GNU sed, I would do it like this:
sed -r 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
# or, equivalently
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/' filename
The -r or -E enables extended regex mode, which avoids the need to escape all the parentheses
\1 is the first capture group (the bits in between the ( ))
[0-9] means the range zero to nine
{3} means three of the preceding character or range
edit: Thanks for all the comments.
On other systems that lack the -r switch, or its alias -E, you have to escape the ( ) and { } above. That leaves you with:
sed 's/\([0-9]\{3\}\)\([0-9]\{2\}\)\([0-9]\{3\}\)/\1-\2-\3/' filename
At the expense of repetition, you can avoid some of the escapes by simply repeating the [0-9]:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/' filename
For the record, Perl is equally capable of doing this sort of thing:
perl -pwe 's/(\d{3})(\d{2})(\d{3})/$1-$2-$3/' filename
-p means print
-w means enable warnings
-e means execute one line
\d is the "digit" character class (zero to nine)
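Since some of the lines are malformed, it may be safer to anchor the pattern so that only lines consisting of exactly eight digits are rewritten, and everything else passes through untouched. A minimal sketch, assuming a sed that supports -E; the function name format_code is made up for illustration:

```shell
# Rewrite only lines that are exactly eight digits long.
format_code() {
    sed -E 's/^([0-9]{3})([0-9]{2})([0-9]{3})$/\1-\2-\3/'
}

# The second line is already formatted and the third has nine digits,
# so both are left unchanged.
printf '%s\n' 40900000 409-00-000 123456789 | format_code
```

Without the ^ and $ anchors, the nine-digit line would be changed as well, because its first eight digits match the pattern.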
No need to run external commands, bash or ksh can do it themselves.
$ a=12345678
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
123-45-678
$ a=abc-de-fgh
$ [ ${#a} = 8 ] && { b=${a:0:3}-${a:3:2}-${a:5};a=$b;}
$ echo $a
abc-de-fgh
You can use sed, like this:
sed 's/\([0-9][0-9][0-9]\)\([0-9][0-9]\)\([0-9][0-9][0-9]\)/\1-\2-\3/'
or more succinctly, with extended regex syntax:
sed -E 's/([0-9]{3})([0-9]{2})([0-9]{3})/\1-\2-\3/'
For golfing:
$ echo "40900000" | awk '$1=$1' FIELDWIDTHS='3 2 3' OFS='-'
409-00-000
With sed:
sed 's/\(...\)\(..\)\(...\)/\1-\2-\3/'
The dot matches any character, and surrounding it with \( and \) makes it a group. The \1 references the first group.
Just for the fun of it, an awk
echo "40900000" | awk '{a=$0+0} length(a)==8 {$0=substr(a,1,3)"-"substr(a,4,2)"-"substr(a,6)}1'
409-00-000
This tests whether there are 8 digits.
A more complex version (needs GNU awk because of gensub):
echo "40900000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
echo "409-00-000" | awk --re-interval '{print gensub(/([0-9]{3})([0-9]{2})([0-9]{3})/,"\\1-\\2-\\3","g")}'
409-00-000
Workaround from STDIN:
echo "40900000" | grep -E "[0-9]{8}" | cut -c "1-3,4-5,6-8" --output-delimiter=-
from file:
grep -E "[0-9]{8}" filename | cut -c "1-3,4-5,6-8" --output-delimiter=-
But I prefer Tom Fenech's solution.

Extract numbers from filename

In bash I thought of using sed, but I can't figure out how to extract a pattern instead of the usual replace.
For example:
FILENAME = 'blah_blah_#######_blah.ext'
The number of digits (written as "#" in the example above) could be either 7 or 10.
I want to extract only the number
If all you need is to remove anything but digits, you could use
ls | sed -e 's/[^0-9]//g'
to get all digits grouped per filename (123test456.ext will become 123456), or
ls | grep -Eo '[0-9]+'
for all groups of numbers (123test456.ext will turn up 123 and 456)
You can use this simple code:
filename=zc_adsf_qwer132467_xcvasdfrqw
echo ${filename//[^0-9]/} # ==> 132467
Just bash:
shopt -s extglob
filename=zc_adsf_qwer132467_xcvasdfrqw
tmp=${filename##+([^0-9])}
nums=${tmp%%+([^0-9])}
echo $nums # ==> 132467
or, with bash 3.0 or later
[[ "$filename" =~ [0-9]+ ]] && nums=${BASH_REMATCH[0]}
Is there any number anywhere else in the file name? If not:
ls | sed 's/[^0-9][^0-9]*\([0-9][0-9]*\).*/\1/g'
Should work.
A Perl one-liner might work a bit better, because Perl has more advanced regular expression support and lets you specify that the run of digits must be between 7 and 10 long:
ls | perl -ne 's/.*\D+(\d{7,10}).*/$1/;print if /^\d+$/;'
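The 7-to-10-digit constraint can also be sketched with grep, assuming a grep that supports -o (GNU and BSD grep both do). The function name extract_number is made up for illustration, and the sample file names are invented:

```shell
# Print every run of 7 to 10 digits found in the file name given as $1.
extract_number() {
    printf '%s\n' "$1" | grep -Eo '[0-9]{7,10}'
}

extract_number 'blah_blah_1234567_blah.ext'       # prints 1234567
extract_number 'blah_blah_1234567890_blah.ext'    # prints 1234567890
```

Note that {7,10} also accepts runs of 8 or 9 digits; if only 7 or 10 are valid, the result would need an extra length check.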
$ ls -1
blah_blah_123_blah.ext
blah_blah_234_blah.ext
blah_blah_456_blah.ext
Having such files in a directory you run:
$ ls -1 | sed 's/blah_blah_//' | sed 's/_blah.ext//'
123
234
456
or with a single sed run:
$ ls -1 | sed 's/^blah_blah_\([0-9]*\)_blah.ext$/\1/'
This will work for you -
echo $FILENAME | sed -e 's/[^(0-9|)]//g' | sed -e 's/|/,/g'
