BASH: Separate a line into words using tr - bash

Part of my Bash assignment includes reading a text file, then separating each line into words and using them.
The words are separated by |, lines are separated by \n. We were told to use the tr command, but I couldn't find an elegant solution.
An example:
Hello | My | Name | Is | Bill
should give:
Hello
My
Name
Is
Bill
One word per iteration.

You only need one invocation of tr to do the job:
$ echo "Hello | My | Name | Is | Bill" | tr -cs '[:alpha:]' '\n'
Hello
My
Name
Is
Bill
$
The -c option is for 'the complement' of the characters in the first pattern; the -s option 'squeezes' out duplicate replacement characters. So, anything that's not alphabetic is converted to a newline, but consecutive newlines are squeezed to a single newline.
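To see what each option contributes, here is a small sketch that runs the same input with -c alone and with -cs. The variable names are only for illustration.

```shell
line='Hello | My | Name'

# -c alone: every non-alphabetic character (spaces, '|', the final newline)
# becomes its own newline, so blank lines appear between the words.
unsqueezed=$(printf '%s\n' "$line" | tr -c '[:alpha:]' '\n' | wc -l | tr -d ' ')

# -cs: runs of the replacement newline are squeezed down to a single one.
words=$(printf '%s\n' "$line" | tr -cs '[:alpha:]' '\n')

printf 'newlines without -s: %s\n' "$unsqueezed"
printf '%s\n' "$words"
```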
Clearly, if you need to keep 'Everyone else | can | call | me | Fred' with the two words in the first line of output, then you have to work considerably harder:
$ echo "Everyone else | can | call | me | Fred" |
> tr '|' '\n' |
> sed 's/ *$//;s/^ *//'
Everyone else
can
call
me
Fred
$
The sed script here removes leading and trailing blanks, leaving intermediate blanks unchanged. You can replace multiple blanks with a single blank if you need to, and so on and so forth. You can't use tr to conditionally replace a given character (to change some blanks and leave others alone, for example).
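Since the question asks for one word per iteration, the tr and sed pipeline above can feed a while read loop. This is a minimal sketch; split_fields is a hypothetical helper name, and the loop body just prints each field.

```shell
# split_fields (hypothetical helper): print the '|'-separated fields of its
# argument one per line, with leading and trailing spaces removed.
split_fields() {
    printf '%s\n' "$1" | tr '|' '\n' | sed 's/^ *//;s/ *$//'
}

# Consume one field per iteration; IFS= and -r keep each line intact.
split_fields "Everyone else | can | call | me | Fred" |
while IFS= read -r field; do
    printf 'field: [%s]\n' "$field"
done
```

Note that in bash (by default) a loop at the end of a pipeline runs in a subshell, so variables assigned inside it are not visible after the loop ends.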

some other options:
awk:
awk -F'\\| ' -v OFS="\n" '$1=$1'
example:
kent$ echo "Hello | My | Name | Is | Bill" |awk -F'\\| ' -v OFS="\n" '$1=$1'
Hello
My
Name
Is
Bill
grep
grep -o '[^ |]*'
example:
kent$ echo "Hello | My | Name | Is | Bill"|grep -o '[^ |]*'
Hello
My
Name
Is
Bill
sed
sed 's/ | /\n/g'
example:
kent$ echo "Hello | My | Name | Is | Bill" |sed 's/ | /\n/g'
Hello
My
Name
Is
Bill

My favorite perl :)
echo "Hello | My | Name | Is | Bill" | perl -pe 's/\s*\|\s*/\n/g'
will remove the excessive spaces too, so
echo "Hello | My | Name | Is | Bill" | perl -pe 's/\s*\|\s*/\n/g' | cat -vet
will print
Hello$
My$
Name$
Is$
Bill$

Using tr:
echo "Hello | My | Name | Is | Bill" | tr -s '\| ' '\n'
OR if you decide to give awk a chance:
echo "Hello | My | Name | Is | Bill" | awk -F '|' '{for (i=1; i<=NF; i++) {
gsub(/^ +| +$/, "", $i); print $i}}'

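This code should do it: it converts '|' to a newline, then deletes every blank (not only leading and trailing ones, so a field such as 'Everyone else' would lose its inner space):
echo "Hello | My | Name | Is | Bill" | tr '|' '\n' | tr -d '[:blank:]'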

File temp: Hello | My | Name | Is | Bill
$ cat temp | tr '|' '\n' | sed 's/^ *//g'
Hello
My
Name
Is
Bill
$
The sed part gets rid of leading spaces (because there is a space between the '|' and the word); trailing spaces before each '|' are left in place. This will also work for "Hello everyone | My | Name | Is | Bill":
$ cat temp | tr '|' '\n' | sed 's/^ *//g'
Hello everyone
My
Name
Is
Bill
$

Related

How to print before and after dot text Unix

I am trying to print only specific output from sentence like below
Before and after dot text should be printed
InputVar="ABC SDFSG XYZ.AFGAJK JKK"
Expected output :
XYZ.AFGAJK
I am using the cut command, but it is not working:
echo "$InputVar" | cut -d'' -f2
Any other approach ?
Here are a few suggestions. awk with RS set to a space seems easiest. YMMV
$ echo "$InputVar" | cut -d ' ' -f 3
XYZ.AFGAJK
$ echo "$InputVar" | awk '/\./' RS=' '
XYZ.AFGAJK
$ echo "$InputVar" | awk '{for(i=1;i<=NF;i++) if(match($i,"\\.")) print $i}'
XYZ.AFGAJK
$ echo "$InputVar" | sed -n 's/.* \([^ .]*[.][^ .]*\) .*/\1/p'
XYZ.AFGAJK
Using cut:
If you really want to use cut, then you could try:
echo "$InputVar" | cut -d' ' -f3
Which uses a space character as a delimiter (you originally had an empty string, which is not allowed), and extracts field 3 rather than field 2.
Using grep:
You can use grep rather than cut, to match & extract specifically what you want:
echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
Explanation:
The -E option is for extended regex
The -o option is for extracting the matched component only
The regex matches a literal ., surrounded by a non-empty sequence of non-space characters
Comparing the two methods:
Either of these will work with your shown example. But, suppose the input string was instead:
InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
The version using grep would give all the matches (a literal . with non-space characters on either side).
Using cut however, you would need to specify the specific fields you want, i.e.
$ echo "$InputVar" | cut -d' ' -f3,5
XYZ.AFGAJK XYZ.ABC
If you instead wanted just the n-th match, using the grep approach, you could use sed to select the n-th match, e.g.
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+'
XYZ.AFGAJK
XYZ.ABC
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '1q;d'
XYZ.AFGAJK
$ echo "$InputVar" | grep -Eo '[^ ]+\.[^ ]+' | sed '2q;d'
XYZ.ABC
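The grep and sed combination above can be wrapped in a small function. This is only a sketch: nth_dotted is a hypothetical name, and it assumes a grep that supports -o (GNU and BSD grep do, but the option is not in POSIX).

```shell
# nth_dotted N STRING (hypothetical helper): print the N-th space-delimited
# token of STRING that has a literal dot with non-space text on both sides.
nth_dotted() {
    printf '%s\n' "$2" | grep -Eo '[^ ]+\.[^ ]+' | sed -n "${1}p"
}

InputVar="ABC SDFSG XYZ.AFGAJK JKK XYZ.ABC"
nth_dotted 1 "$InputVar"
nth_dotted 2 "$InputVar"
```

Here sed -n with the Np command prints only line N, which is equivalent in effect to the 'Nq;d' form used above except that it keeps reading to the end of the input.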

Dealing with linebreaks in while read line

Maybe I'm using the wrong tool for the job here...
My data looks like this (this is from a json file which has been converted to a csv):
"hostname1",1,""
"hostname2",1,""
"hostname3",0,"yay_some_text
more_text
more_text
"
The first column is the hostname, second is the exit code and the third the result. I usually do something like this and make a moderately pretty table:
cat tmp.file | ( while read line
do
name=$(echo $line | awk -F "," '{print $1}')
exit_code=$(echo $line | awk -F "," '{print $2}')
output=$(echo $line | awk -F "," '{print $3}')
#I can then do stuff with the output here and ultimately do this:
echo -e "|${name}\t|${exit_code}\t|${output}\t|"
done
)
However the third column is causing me no end of problems; I think regardless of what I do, the read line bit will make this impossible. Does anyone have a better method of sorting this? I'd ideally like to keep the linebreaks, but if that's going to be too hard, I'll happily replace them with commas.
Desired output (either is fine):
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text |
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text, more_text, more_text |
Whichever of these you prefer will work robustly* and efficiently using any awk in any shell on every UNIX box:
$ cat tst.awk
{ rec = rec $0 ORS }
/"$/ {
gsub(/[[:space:]]*"[[:space:]]*/,"",rec)
gsub(/,/," | ",rec)
printf "| %s |\n", rec
rec = ""
}
.
$ awk -f tst.awk file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text |
.
$ cat tst.awk
{ rec = rec $0 RS }
/"$/ {
gsub(/[[:space:]]*"[[:space:]]*/,"",rec)
gsub(/,/," | ",rec)
gsub(RS,", ",rec)
printf "| %s |\n", rec
rec = ""
}
.
$ awk -f tst.awk file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text, more_text, more_text |
*robustly assuming your quoted strings never contain commas or escaped double quotes, i.e. it looks like the example you provided and your existing code relies on.
$ gawk -v RS='"\n' -v FPAT='[^,]*|"[^"]*"' -v OFS=' | ' '
{gsub(/"/,""); $1=$1; print OFS $0 OFS}' file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text
|
In your case, one way is to transform the file into a simpler structure before feeding it to the loop:
awk '/[^"]$/ { printf("%s", $0); next } 1' tmp.file | ( while read line
do
name=$(echo $line | awk -F ',' '{print $1}')
exit_code=$(echo $line | awk -F ',' '{print $2}')
output=$(echo $line | awk -F ',' '{print $3}')
#I can then do stuff with the output here and ultimately do this:
echo -e "|${name}\t|${exit_code}\t|${output}\t|"
done
)
If all you want to do is to display as a table, you can use column utility
awk '/[^"]$/ { printf("%s", $0); next } 1' tmp.file | column -t -o " | " -s ,
If you are particular about the starting and ending separator '|', you can simply pipe the output of this command to sed or awk.
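For the comma-joined variant of the desired output, the record joining can be combined with a single read that splits the fields, instead of three awk calls per line. This is a sketch under the same assumption as the answers above (no commas or escaped quotes inside the first two fields); format_table is a hypothetical name and the sample data is copied from the question.

```shell
# Sample input copied from the question.
data='"hostname1",1,""
"hostname2",1,""
"hostname3",0,"yay_some_text
more_text
more_text
"'

# format_table (hypothetical name): read the CSV on stdin, print one row per record.
format_table() {
    # 1. awk joins physical lines with ", " until one ends in a double quote.
    # 2. tr drops the quotes; sed drops the ", " left before the closing quote.
    # 3. read splits on commas; the last variable keeps the rest of the line.
    awk '{ rec = (rec == "" ? $0 : rec ", " $0) }
         /"$/ { print rec; rec = "" }' |
    tr -d '"' |
    sed 's/, $//' |
    while IFS=, read -r name exit_code output; do
        printf '| %s | %s | %s |\n' "$name" "$exit_code" "$output"
    done
}

printf '%s\n' "$data" | format_table
```

Because read assigns everything after the second comma to the last variable, the commas inserted between the joined lines stay inside the output column.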

Shell Scripting array not printing proper values

I have this simple shell script where I search for the ID and port number in a file and save them in an array. However, when I try to print them I do not get the expected results. I loop over the array to print the 1st and 2nd elements, then increase the index by two to print the 3rd and 4th. I also want to print each ID and port pair on a separate line, like this:
ID Port
ID Port
My code is:
myarr=($(less radius-req | grep C4-3A-BE-18-C1-2D -B75 | grep '2018-11\|Port' | grep -v User | grep Source -B1 | awk -F "Port:|id=" '{print $2}' )); for ((i=0;i<"${#myarr[@]}";i+=2)) ; do echo $i; printf "%s\n" "${myarr[$i]}" "${myarr[$i+1]}" ; done;
Even if I try to echo the whole array I only see the last element, whereas I can print each individual element without an issue.
$ myarr=($(less radius-req | grep C4-3A-BE-18-C1-2D -B75 | grep '2018-11\|Port' | grep -v User | grep Source -B1 | awk -F "Port:|id=" '{print $2}' )); echo ${myarr[@]}
45210
$ myarr=($(less radius-req | grep C4-3A-BE-18-C1-2D -B75 | grep '2018-11\|Port' | grep -v User | grep Source -B1 | awk -F "Port:|id=" '{print $2}' )); echo ${myarr[0]}
19
$ myarr=($(less radius-req | grep C4-3A-BE-18-C1-2D -B75 | grep '2018-11\|Port' | grep -v User | grep Source -B1 | awk -F "Port:|id=" '{print $2}' )); echo ${myarr[1]}
45210
$ myarr=($(less radius-req | grep C4-3A-BE-18-C1-2D -B75 | grep '2018-11\|Port' | grep -v User | grep Source -B1 | awk -F "Port:|id=" '{print $2}' )); echo ${myarr[2]}
20
$ myarr=($(less radius-req | grep C4-3A-BE-18-C1-2D -B75 | grep '2018-11\|Port' | grep -v User | grep Source -B1 | awk -F "Port:|id=" '{print $2}' )); echo ${myarr[3]}
45210
From the output you give, I suspect that the problem is due to carriage return characters in the radius-req file. My guess is the file is from Windows (or maybe a web download), which uses carriage return + linefeed as a line terminator. Unix uses just linefeed (aka newline) as a terminator, and unix programs will treat the carriage return as part of the content of the line. Net result: you get things like "19<CR>" and "45210<CR>" as array values, and when you print them it prints them all over top of each other.
If I'm right about the problem, it's pretty easy to fix. Just replace less radius-req (which you shouldn't use anyway, see William Pursell's comment) with tr -d '\r' <radius-req. The tr command does character replacements, -d means just delete instead of replacing, and \r is its notation for the carriage return character. Result: it deletes the carriage returns before they have a chance to mess things up.
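The effect of the fix can be tried without the original file. In this sketch the four values are a hypothetical stand-in for the pipeline output, with CRLF line endings added on purpose; paste then joins every two lines to give the "ID Port" layout asked for.

```shell
# Hypothetical stand-in for the pipeline output: four values with CRLF endings.
sample=$(printf '19\r\n45210\r\n20\r\n45210\r\n')

# Delete the carriage returns, then let paste join each pair of lines
# with a single space, giving one "ID Port" pair per output line.
pairs=$(printf '%s\n' "$sample" | tr -d '\r' | paste -d ' ' - -)
printf '%s\n' "$pairs"
```

Without the tr -d '\r' step each value would still end in a carriage return, which moves the cursor back to the start of the line when printed to a terminal.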

shell - replace string with incrementing value

I have this String:
"a | a | a | a | a | a | a | a"
and I want to replace every " | " with an incrementing value like so:
"a0a1a2a3a4a5a6a"
I know I can use gsub to replace strings:
> echo "a | a | a | a | a | a | a | a" | awk '{gsub(/\ \|\ /, ++i)}1'
a1a1a1a1a1a1a1a
But it seems gsub only increments after each newline, so my solution for now would be first putting a newline after each " | ", then using gsub and deleting the newlines again:
> echo "a | a | a | a | a | a | a | a" | awk '{gsub(/\ \|\ /, " | \n")}1' | awk '{gsub(/\ \|\ /, ++i)}1' | tr -d '\n'
a1a2a3a4a5a6a7a
Which is honestly just disgusting...
Is there a better way to do this?
If perl is okay:
$ echo 'a | a | a | a | a | a | a | a' | perl -pe 's/ *\| */$i++/ge'
a0a1a2a3a4a5a6a
*\| * match | surrounded by zero or more spaces
the e modifier allows Perl code in the replacement section
$i++ use value of $i and increment (default value 0)
You can use awk like this:
s="a | a | a | a | a | a | a | a"
awk -F ' *\\| *' -v OFS="" '{s=""; for(i=1; i<NF; i++) s = s $i i-1; print s $i}' <<< "$s"
a0a1a2a3a4a5a6a
-F ' *\\| *' sets | surrounded by optional spaces as the input field separator.
The for loop goes through each field except the last and appends the field followed by its zero-based position; the last field is printed without a number.
If using just sh is an option, then perhaps substitute until a fixed point is reached:
s=$1 # first argument passed to script, "a | a | a |..."
n=0
while true
do
    prev=$s
    s=${s%" | a"}
    test "$s" = "$prev" && break
    result=$result${n}"a"
    n=$((n + 1))
done
echo "$s$result"
If this program lives in script file digits.sh,
$ sh digits.sh "a | a | a | a | a | a | a | a"
a0a1a2a3a4a5a6a
$
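The loop above relies on every field being the literal string a. A variant that works for arbitrary fields can peel the text off from the front instead. This is a sketch using only POSIX parameter expansion; number_separators is a hypothetical name.

```shell
# number_separators STRING (hypothetical helper): replace each " | " in STRING
# with an incrementing counter starting at 0.
number_separators() {
    rest=$1
    out=
    n=0
    while :; do
        case $rest in
            *' | '*)
                out=$out${rest%%' | '*}$n   # text before the first " | ", then the counter
                rest=${rest#*' | '}         # drop that text and the separator
                n=$((n + 1))
                ;;
            *)
                break
                ;;
        esac
    done
    printf '%s\n' "$out$rest"
}

number_separators "a | a | a | a | a | a | a | a"
```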
Another solution using awk
echo "a | a | a | a | a | a | a | a" |
awk -v RS="[ ]+[|][ ]+" '{printf "%s%s",(f?NR-2:""),$0; f=1}'
you get,
a0a1a2a3a4a5a6a

how to parse a string in Shell script

I want to parse the following string in shell script.
VERSION=2.6.32.54-0.11.def
Here I want to get two value.
first = 263254
second = 11
I am using following to get the first value:
first=`expr substr $VERSION 1 9| sed "s/\.//g" |sed "s/\-//g"`
to get the second:
second=`expr substr $VERSION 10 6| sed "s/\.//g" |sed "s/\-//g"`
Using above code the output is:
first=263254
second=11
The result wont be consistent if version is changed to:
VERSION=2.6.32.54-0.1.def
Here second value will become 1d, but I want it give output of 1 only.
How can I directly parse the number after '-' and before '.d'?
$ first=$(echo $VERSION | cut -d- -f1 | sed 's/\.//g')
$ second=$(echo $VERSION | cut -d- -f2 | cut -d. -f2)
$ first=$(echo $VERSION | cut -d- -f1 | tr -d '.')
$ second=$(echo $VERSION | cut -d- -f2 | cut -d. -f2)
$ echo $first
263254
$ echo $second
11
you don't need multiple processes (sed|sed|sed...). single process with awk should work.
if you have VERSION=xxxx as string:
to get the first:
awk -F'[-=]' '{gsub(/\./,"",$2)}$0=$2'
to get the second:
awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
test:
first:
kent$ echo "VERSION=2.6.32.54-0.1.def"|awk -F'[-=]' '{gsub(/\./,"",$2)}$0=$2'
263254
second
kent$ echo "VERSION=2.6.32.54-0.1.def"|awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
1
kent$ echo "VERSION=2.6.32.54-0.1234.def"|awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
1234
if you have VERSION=xxx as variable $VERSION:
first:
awk -F'-' '{gsub(/\./,"",$1)}$0=$1'
second:
awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
test:
VERSION=2.6.32.54-0.1234.def
kent$ echo $VERSION|awk -F'-' '{gsub(/\./,"",$1)}$0=$1'
263254
kent$ echo $VERSION|awk -F'-|\\.def' '{split($2,a,".")}$0=a[2]'
1234
You should use regular expressions instead of the number of characters.
first=`echo "$VERSION" | sed 's/\.//g' | sed 's/\(.*\)-.*/\1/'`
second=`echo "$VERSION" | sed 's/.*-[0-9]*\.\([0-9]*\).*/\1/'`
\(...\) are used to create a capturing group, and \1 output this group.
first=$(echo ${VERSION} | sed -e 's/^\([^-]*\)-0\.\([0-9]*\)\.def/\1/' -e 's/\.//g')
second=$(echo ${VERSION} | sed -e 's/^\([^-]*\)-0\.\([0-9]*\)\.def/\2/' -e 's/\.//g')
$ first=$(echo $VERSION | awk -F"\." '{gsub(/-.*/,"",$4);print $1$2$3$4}')
$ second=$(echo $VERSION | awk -F"\." '{print $5}' )
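For comparison, most of the work can also be done with POSIX parameter expansion, leaving tr as the only external command. This is a minimal sketch that assumes the version always has the shape shown in the question (digits and dots, a '-', then "0.N.def"); parse_version and its variable names are hypothetical.

```shell
# parse_version STRING (hypothetical helper): set the variables first and second.
parse_version() {
    before=${1%%-*}                               # text before the '-', e.g. 2.6.32.54
    first=$(printf '%s' "$before" | tr -d '.')    # remove the dots
    after=${1#*-}                                 # text after the '-', e.g. 0.11.def
    after=${after#*.}                             # drop everything up to the first dot
    second=${after%%.*}                           # keep what precedes the next dot
}

parse_version "2.6.32.54-0.1.def"
printf 'first=%s second=%s\n' "$first" "$second"
```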

Resources