Extract values from a line containing a string in a specific column - bash

I have an output like this:
| Value | Value2 | Name1 | Type | Date | Status |
| Value1 | Value1 | Name1 | Type1 | Date | Success |
| Value2 | Value2 | Name2 | Type1 | Date | Failed |
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
I want to get each column's value into a variable, for each line whose last column contains the status "Pending".
Here the matching line would be:
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
I want to get each column of this line in a variable:
myvar1=Value2
myvar2=Value2
myvar3=Name3
myvar4=Type1
myvar5=Date
What is the best way to do that?
Thanks

Simply:
while IFS= read -r line; do
    IFS='|' read -r foo myvar{1..6} foo <<<"$line"
    [ "$myvar6" ] && [ -z "${myvar6//*Pending*}" ] && echo "$line"
done <inputfile
Will print:
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
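The brace-expanded read can be checked in isolation. A minimal sketch with the matching line hardcoded (each field keeps the spaces that surrounded its | delimiters, so one value is trimmed as a demo):

```shell
line='| Value2 | Value2 | Name3 | Type1 | Date | Pending |'
# The leading and trailing | yield empty fields, absorbed by the two foo names
IFS='|' read -r foo myvar1 myvar2 myvar3 myvar4 myvar5 myvar6 foo <<<"$line"
# Each value still carries the padding spaces; trim with parameter expansion
myvar3=${myvar3# }; myvar3=${myvar3% }
echo "$myvar3"   # prints: Name3
```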

I'm going to assume the output you mention comes from a command named your_command. If you have it in a file, for example, that command could be cat that_file.
I think that a case statement inside a loop is a legible, elegant solution.
your_command | (
    while IFS= read -r line; do
        case $line in
            *'Pending |')
                IFS='|' read -ra myvar <<< "$line"
                # Unquoted on purpose: word splitting trims the padding spaces
                echo ${myvar[1]}
                echo ${myvar[2]}
                echo ${myvar[3]}
                echo ${myvar[4]}
                echo ${myvar[5]}
                ;;
            *)
                echo "...IGNORED $line"
                ;;
        esac
    done
)
The output with the example you have given is the following:
...IGNORED | Value | Value2 | Name1 | Type | Date | Status |
...IGNORED | Value1 | Value1 | Name1 | Type1 | Date | Success |
...IGNORED | Value2 | Value2 | Name2 | Type1 | Date | Failed |
Value2
Value2
Name3
Type1
Date
If you don't want to use an array, for whatever reason, you can replace the IFS='|' read -ra myvar <<< "$line" line with
myvar1=$(echo "$line" | cut -d'|' -f 2)
myvar2=$(echo "$line" | cut -d'|' -f 3)
myvar3=$(echo "$line" | cut -d'|' -f 4)
myvar4=$(echo "$line" | cut -d'|' -f 5)
myvar5=$(echo "$line" | cut -d'|' -f 6)
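As a quick sanity check of the cut variant, one column can be extracted and trimmed in a pipeline. A sketch with the matching line hardcoded (tr -d ' ' drops the padding, which is only safe if the data itself never contains spaces):

```shell
line='| Value2 | Value2 | Name3 | Type1 | Date | Pending |'
# Field 1 is the empty string before the leading |, so the columns start at -f2
myvar3=$(echo "$line" | cut -d'|' -f4 | tr -d ' ')
echo "$myvar3"   # prints: Name3
```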

First you can select the line. If there is only one line ending with "Pending", this would work:
line=$(grep '| Pending |$' file.txt | sed 's/\s*|\s*/|/g' | sed 's/^|//g')
The variable line now has only the values separated by the pipe symbol, without the spaces around it and with no pipe symbol at the beginning of the line.
Then, if you do not use an array, you can manually assign the variables like
myvar1=$(echo $line | awk -F'|' '{print $1}')
myvar2=$(echo $line | awk -F'|' '{print $2}')
...
If there are many lines containing the keyword "Pending" you have to use an array or a dynamic structure instead of static variable names.
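For the multi-match case, a bash array filled via mapfile keeps one cleaned-up entry per matching line. A sketch that builds its own sample file (the [[:space:]] class is used instead of \s for portability):

```shell
# Sample input standing in for file.txt
cat > file.txt <<'EOF'
| Value1 | Value1 | Name1 | Type1 | Date | Success |
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
EOF

# One array element per matching line, delimiters normalized to a bare |
mapfile -t pending < <(grep '| Pending |$' file.txt |
    sed 's/[[:space:]]*|[[:space:]]*/|/g; s/^|//')

for line in "${pending[@]}"; do
    myvar1=$(echo "$line" | awk -F'|' '{print $1}')
    echo "$myvar1"   # prints: Value2
done
```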

First, what you asked for:
$: while read -r myvar1 myvar2 myvar3 myvar4 myvar5 Pending
> do echo "myvar1=[$myvar1] myvar2=[$myvar2] myvar3=[$myvar3] myvar4=[$myvar4] myvar5=[$myvar5]"
> done < <( sed -n '/[|]\s*Pending\s*[|]\s*$/{ s,[ |], ,g; s/^ //; s/ $//; p; }' file )
myvar1=[Value2] myvar2=[Value2] myvar3=[Name3] myvar4=[Type1] myvar5=[Date]
The sed selects only the records you want (/[|]\s*Pending\s*[|]\s*$/), converts all the delimiter clutter to single spaces (s,[ |], ,g; — this breaks if you have any spaces embedded in your data), strips leading and trailing delimiters (s/^ //; s/ $//;), and prints the result (-n says don't print by default; p; says print this record now).
But I think you should seriously reconsider the spaces around your delimiter, unless you wanted to keep them as part of the data. I'd leave off the leading and maybe the trailing delimiter. I'd also really consider putting them into an array, though I do understand you may want to use the fields by name...just don't call them myvar1, etc.

Another option is this:
awk 'BEGIN { FS=OFS="|" } $(NF-1)~/Pending/ { gsub(/^\s*\|\s*/, "", $0); NF-=2; print $0; }' file.txt | while IFS='|' read -r myVar1 myVar2 myVar3 myVar4 myVar5
do
#Do something
done


Dealing with linebreaks in while read line

Maybe I'm using the wrong tool for the job here...
My data looks like this (it comes from a JSON file that has been converted to CSV):
"hostname1",1,""
"hostname2",1,""
"hostname3",0,"yay_some_text
more_text
more_text
"
The first column is the hostname, second is the exit code and the third the result. I usually do something like this and make a moderately pretty table:
cat tmp.file | ( while read line
do
name=$(echo $line | awk -F "," '{print $1}')
exit_code=$(echo $line | awk -F "," '{print $2}')
output=$(echo $line | awk -F "," '{print $3}')
#I can then do stuff with the output here and ultimately do this:
echo -e "|${name}\t|${exit_code}\t|${output}\t|"
done
)
However, the third column is causing me no end of problems; I think regardless of what I do, the read line bit will make this impossible. Does anyone have a better method of handling this? I'd ideally like to keep the linebreaks, but if that's going to be too hard, I'll happily replace them with commas.
Desired output (either is fine):
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text |
or:
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text, more_text, more_text |
Whichever of these you prefer will work robustly* and efficiently using any awk in any shell on every UNIX box:
$ cat tst.awk
{ rec = rec $0 ORS }
/"$/ {
gsub(/[[:space:]]*"[[:space:]]*/,"",rec)
gsub(/,/," | ",rec)
printf "| %s |\n", rec
rec = ""
}
$ awk -f tst.awk file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text |
$ cat tst.awk
{ rec = rec $0 RS }
/"$/ {
gsub(/[[:space:]]*"[[:space:]]*/,"",rec)
gsub(/,/," | ",rec)
gsub(RS,", ",rec)
printf "| %s |\n", rec
rec = ""
}
$ awk -f tst.awk file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text, more_text, more_text |
*robustly assuming your quoted strings never contain commas or escaped double quotes, i.e. it looks like the example you provided and your existing code relies on.
Alternatively, with GNU awk you can set RS and FPAT to handle the multi-line quoted field directly:
$ gawk -v RS='"\n' -v FPAT='[^,]*|"[^"]*"' -v OFS=' | ' '
{gsub(/"/,""); $1=$1; print OFS $0 OFS}' file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text
|
In your case, one way is to transform the file to a simpler structure before using it:
awk '/[^"]$/ { printf("%s", $0); next } 1' tmp.file | ( while IFS= read -r line
do
name=$(echo "$line" | awk -F ',' '{print $1}')
exit_code=$(echo "$line" | awk -F ',' '{print $2}')
output=$(echo "$line" | awk -F ',' '{print $3}')
#I can then do stuff with the output here and ultimately do this:
echo -e "|${name}\t|${exit_code}\t|${output}\t|"
done
)
If all you want to do is display a table, you can use the column utility:
awk '/[^"]$/ { printf("%s", $0); next } 1' tmp.file | column -t -o " | " -s ,
If you are particular about the leading and trailing separator '|', you can simply pipe the output of this command to sed or awk.
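For example, a short sed appended to the pipeline adds the missing borders. A sketch shown on a fixed sample row, since column's exact spacing varies between implementations:

```shell
# A row as column -t might emit it; add "| " at the start and " |" at the end
row='"hostname1"  |  1  |  ""'
echo "$row" | sed 's/^/| /; s/$/ |/'
# prints: | "hostname1"  |  1  |  "" |
```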

How to get nth line of a file in bash?

I want to extract from a file named datax.txt the second line, which is:
0/0/0/0/0/0 | 0/0/0/0/0/0 | 0/0/0/0/0/0
And then I want to store in 3 variables the 3 sequences 0/0/0/0/0/0.
How am I supposed to do that?
Read the 2nd line into variables a,b and c.
read a b c <<< $(awk -F'|' 'NR==2{print $1 $2 $3}' datax)
the key is to split the problem in two:
you want to get the nth line of a file -> see here
you want to split a line in chunks according to a delimiter -> that's the job of many tools, cut is one of them
For future questions, be sure to include a more complete dataset; here is one for now. I changed the second line a bit so that we can verify that we got the right column:
f.txt
4/4/4/4/4/4 | 4/4/4/4/4/4 | 4/4/4/4/4/4
0/0/0/0/a/0 | 0/0/0/0/b/0 | 0/0/0/0/c/0
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
8/8/8/8/8/8 | 8/8/8/8/8/8 | 8/8/8/8/8/8
Then a proper script building on the two key actions described above:
extract.bash
file=$1
target_line=2
# get the n-th line
# https://stackoverflow.com/questions/6022384/bash-tool-to-get-nth-line-from-a-file
line=$(head -n "$target_line" "$file" | tail -1)
# get the n-th field on a line, using delimiter '|'
var1=$(echo "$line" | cut --delimiter='|' --fields=1)
echo $var1
var2=$(echo "$line" | cut --delimiter='|' --fields=2)
echo $var2
var3=$(echo "$line" | cut --delimiter='|' --fields=3)
echo $var3
aaand:
$ ./extract.bash f.txt
0/0/0/0/a/0
0/0/0/0/b/0
0/0/0/0/c/0
Please try the following:
IFS='|' read -r a b c < <(sed -n 2p datax | tr -d ' ')
Then the variables a, b and c are assigned to each field of the 2nd line.
You can use sed to print a specific line of a file, so for your example on the second line:
sed -n -e 2p ./datax
Set the output of the sed to be a variable:
Var=$(sed -n -e 2p ./datax)
Then split the string into the 3 variables you need:
A="$(echo $Var | cut -d'|' -f1)"
B="$(echo $Var | cut -d'|' -f2)"
C="$(echo $Var | cut -d'|' -f3)"
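Alternatively, the three cut calls can be collapsed into a single read. A sketch using the distinguishable a/b/c sample row from above (tr -d ' ' strips the padding first, which is safe here because the values themselves contain no spaces):

```shell
Var='0/0/0/0/a/0 | 0/0/0/0/b/0 | 0/0/0/0/c/0'
# Delete the spaces, then let read split the remainder on |
IFS='|' read -r A B C <<<"$(echo "$Var" | tr -d ' ')"
echo "$B"   # prints: 0/0/0/0/b/0
```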

shell - replace string with incrementing value

I have this String:
"a | a | a | a | a | a | a | a"
and I want to replace every " | " with an incrementing value like so:
"a0a1a2a3a4a5a6a"
I know I can use gsub to replace strings:
> echo "a | a | a | a | a | a | a | a" | awk '{gsub(/\ \|\ /, ++i)}1'
a1a1a1a1a1a1a1a
But gsub uses the same replacement value for every match within a line — the counter only advances once per record — so my workaround for now is to first put a newline after each " | ", then run gsub, and finally delete the newlines again:
> echo "a | a | a | a | a | a | a | a" | awk '{gsub(/\ \|\ /, " | \n")}1' | awk '{gsub(/\ \|\ /, ++i)}1' | tr -d '\n'
a1a2a3a4a5a6a7a
Which is honestly just disgusting...
Is there a better way to do this?
If perl is okay:
$ echo 'a | a | a | a | a | a | a | a' | perl -pe 's/ *\| */$i++/ge'
a0a1a2a3a4a5a6a
 *\| * matches | surrounded by zero or more spaces
the e modifier allows Perl code in the replacement section
$i++ uses the value of $i and then increments it (default value 0)
You can use awk like this:
s="a | a | a | a | a | a | a | a"
awk -F ' *\\| *' -v OFS="" '{s=""; for(i=1; i<NF; i++) s = s $i i-1; print s $i}' <<< "$s"
a0a1a2a3a4a5a6a
-F ' *\\| *' sets | surrounded by optional spaces as the input field separator.
The for loop walks the fields, appending each field followed by its zero-based position.
If using just sh is an option, then perhaps substitute until a fixed point is reached:
s=$1 # first argument passed to script, "a | a | a |..."
n=0
while true
do
    prev=$s
    s=${s%" | a"}
    test "$s" = "$prev" && break
    result=$result${n}"a"
    n=$((n + 1))
done
echo $s$result
If this program lives in script file digits.sh,
$ sh digits.sh "a | a | a | a | a | a | a | a"
a0a1a2a3a4a5a6a
$
Another solution using awk
echo "a | a | a | a | a | a | a | a" |
awk -v RS="[ ]+[|][ ]+" '{printf "%s%s",(f?NR-2:""),$0; f=1}'
you get:
a0a1a2a3a4a5a6a

Extracting field values from a line matching a specified pattern

I have a log file like this:
2016-04-01 11:16:30.745:[11878][TEST][test]
2016-04-01 11:16:30.745:[11878][TEST][wait|hold|name(0x03154246) 101ms]
....
First, I use grep wait to find the line:
2016-04-01 11:16:30.745:[11878][TEST][wait|hold|name(0x03154246) 101ms]
Then, how can I get the field values?
value1: 2016-04-01 11:16:30.745
value2: 0x03154246
value3: 101
It's hard to figure out from what you wrote, but something like this may be suitable for you:
#!/usr/bin/env sh
set -e
string=$(grep wait "$1")
value1=$(echo "$string" | rev | cut -d ":" -f 2- | rev)
value2=$(echo "$string" | grep -o -E "\(0x.+\)" | sed 's,),,' | sed 's,(,,')
value3=$(echo "$string" | grep -o -E "[0-9]+ms]$" | sed 's,ms],,')
echo value1: "$value1"
echo value2: "$value2"
echo value3: "$value3"
Usage:
$ ./get.sh LOG
value1 is everything up to the last :.
value2 is the 0x... id between the last ( and ).
value3 is the number before ms] at the end of the string.
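The three pipelines can also be folded into a single awk call with match() (a sketch, assuming every line of interest has the same shape as the grepped example; the value1..value3 labels just mirror the question):

```shell
log='2016-04-01 11:16:30.745:[11878][TEST][wait|hold|name(0x03154246) 101ms]'
echo "$log" | awk '{
    # value1: everything before the ":[" that starts the bracketed part
    split($0, a, /:\[/); print "value1: " a[1]
    # value2: the parenthesized hex id, minus the parentheses
    if (match($0, /\(0x[0-9a-fA-F]+\)/))
        print "value2: " substr($0, RSTART+1, RLENGTH-2)
    # value3: the number in front of the trailing "ms]"
    if (match($0, /[0-9]+ms\]$/))
        print "value3: " substr($0, RSTART, RLENGTH-3)
}'
```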

BASH: Separate a line into words using tr

Part of my Bash assignment includes reading a text file, then separating each line into words and using them.
The words are separated by |, lines are separated by \n. We were told to use the tr command, but I couldn't find an elegant solution.
An example:
Hello | My | Name | Is | Bill
should give:
Hello
My
Name
Is
Bill
One word per iteration.
You only need one invocation of tr to do the job:
$ echo "Hello | My | Name | Is | Bill" | tr -cs '[:alpha:]' '\n'
Hello
My
Name
Is
Bill
$
The -c option is for 'the complement' of the characters in the first pattern; the -s option 'squeezes' out duplicate replacement characters. So, anything that's not alphabetic is converted to a newline, but consecutive newlines are squeezed to a single newline.
Clearly, if you need to keep 'Everyone else | can | call | me | Fred' with the two words in the first line of output, then you have to work considerably harder:
$ echo "Everyone else | can | call | me | Fred" |
> tr '|' '\n' |
> sed 's/ *$//;s/^ *//'
Everyone else
can
call
me
Fred
$
The sed script here removes leading and trailing blanks, leaving intermediate blanks unchanged. You can replace multiple blanks with a single blank if you need to, and so on and so forth. You can't use tr to conditionally replace a given character (to change some blanks and leave others alone, for example).
some other options:
awk:
awk -F'\\| ' -v OFS="\n" '$1=$1'
example:
kent$ echo "Hello | My | Name | Is | Bill" |awk -F'\\| ' -v OFS="\n" '$1=$1'
Hello
My
Name
Is
Bill
grep
grep -o '[^ |]*'
example:
kent$ echo "Hello | My | Name | Is | Bill"|grep -o '[^ |]*'
Hello
My
Name
Is
Bill
sed
sed 's/ | /\n/g'
example:
kent$ echo "Hello | My | Name | Is | Bill" |sed 's/ | /\n/g'
Hello
My
Name
Is
Bill
My favorite perl :)
echo "Hello | My | Name | Is | Bill" | perl -pe 's/\s*\|\s*/\n/g'
will remove the excessive spaces too, so
echo "Hello | My | Name | Is | Bill" | perl -pe 's/\s*\|\s*/\n/g' | cat -vet
will print
Hello$
My$
Name$
Is$
Bill$
Using tr:
echo "Hello | My | Name | Is | Bill" | tr -s '\| ' '\n'
OR if you decide to give awk a chance:
echo "Hello | My | Name | Is | Bill" | awk -F '\|' '{for (i=1; i<=NF; i++) {
gsub(/^ +| +$/, "", $i); print $i}}'
This code converts '|' to newlines, then deletes every blank (note: this also removes spaces inside multi-word fields):
echo "Hello | My | Name | Is | Bill" | tr '|' '\n' | tr -d '[:blank:]'
File temp: Hello | My | Name | Is | Bill
$ cat temp | tr '|' '\n' | sed 's/^ *//g'
Hello
My
Name
Is
Bill
$
The sed part gets rid of leading spaces (because there is a space between the '|' and the word. This will also work for "Hello everyone | My | Name | Is | Bill":
$ cat temp | tr '|' '\n' | sed 's/^ *//g'
Hello everyone
My
Name
Is
Bill
$
