Dealing with linebreaks in while read line - bash

Maybe I'm using the wrong tool for the job here...
My data looks like this (it comes from a JSON file that has been converted to CSV):
"hostname1",1,""
"hostname2",1,""
"hostname3",0,"yay_some_text
more_text
more_text
"
The first column is the hostname, second is the exit code and the third the result. I usually do something like this and make a moderately pretty table:
cat tmp.file | ( while read line
do
name=$(echo $line | awk -F "," '{print $1}')
exit_code=$(echo $line | awk -F "," '{print $2}')
output=$(echo $line | awk -F "," '{print $3}')
#I can then do stuff with the output here and ultimately do this:
echo -e "|${name}\t|${exit_code}\t|${output}\t|"
done
)
However the third column is causing me no end of problems; I think regardless of what I do, the read line bit will make this impossible. Does anyone have a better method of sorting this? I'd ideally like to keep the linebreaks, but if that's going to be too hard, I'll happily replace them with commas.
Desired output (either of these is fine):
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text |
or:
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text, more_text, more_text |

Whichever of these you prefer will work robustly* and efficiently using any awk in any shell on every UNIX box:
$ cat tst.awk
{ rec = rec $0 ORS }
/"$/ {
    gsub(/[[:space:]]*"[[:space:]]*/,"",rec)
    gsub(/,/," | ",rec)
    printf "| %s |\n", rec
    rec = ""
}
$ awk -f tst.awk file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text |
$ cat tst.awk
{ rec = rec $0 RS }
/"$/ {
    gsub(/[[:space:]]*"[[:space:]]*/,"",rec)
    gsub(/,/," | ",rec)
    gsub(RS,", ",rec)
    printf "| %s |\n", rec
    rec = ""
}
$ awk -f tst.awk file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text, more_text, more_text |
*robust assuming your quoted strings never contain commas or escaped double quotes, i.e. the input looks like the example you provided, which is what your existing code relies on too.
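If you'd rather stay in bash, here is a rough sketch of the same accumulate-until-closing-quote idea, under the same assumption (no embedded commas or escaped quotes in the data). The sample data is recreated in tmp.file just to make the snippet self-contained:

```shell
# Recreate the sample data from the question
cat > tmp.file <<'EOF'
"hostname1",1,""
"hostname2",1,""
"hostname3",0,"yay_some_text
more_text
more_text
"
EOF

rows=()
rec=""
while IFS= read -r line; do
    rec+="$line"$'\n'
    [[ $line == *\" ]] || continue      # record not finished yet
    rec=${rec//\"/}                     # drop all double quotes
    while [[ $rec == *$'\n' ]]; do rec=${rec%$'\n'}; done   # trim trailing newlines
    rec=${rec//$'\n'/, }                # join continuation lines with ", "
    IFS=',' read -r name exit_code output <<< "$rec"
    rows+=("| $name | $exit_code | $output |")
    rec=""
done < tmp.file
printf '%s\n' "${rows[@]}"
```

A record is considered complete when a physical line ends with a double quote, which holds for the sample data but would break if a quoted string ever ended a line mid-field.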

$ gawk -v RS='"\n' -v FPAT='[^,]*|"[^"]*"' -v OFS=' | ' '
{gsub(/"/,""); $1=$1; print OFS $0 OFS}' file
| hostname1 | 1 | |
| hostname2 | 1 | |
| hostname3 | 0 | yay_some_text
more_text
more_text
|

In your case, one way is to transform the file to a simpler structure (joining the continuation lines) before feeding it to your existing loop:
awk '/[^"]$/ { printf("%s", $0); next } 1' tmp.file | ( while read line
do
name=$(echo $line | awk -F ',' '{print $1}')
exit_code=$(echo $line | awk -F ',' '{print $2}')
output=$(echo $line | awk -F ',' '{print $3}')
#I can then do stuff with the output here and ultimately do this:
echo -e "|${name}\t|${exit_code}\t|${output}\t|"
done
)
If all you want to do is display it as a table, you can use the column utility:
awk '/[^"]$/ { printf("%s", $0); next } 1' tmp.file | column -t -o " | " -s ,
If you are particular about the leading and trailing separator '|', you can simply pipe the output of this command through sed or awk.
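That final sed step could look like the following sketch, which just wraps each already-formatted line in the outer separators (the sample line stands in for one line of column's output):

```shell
# Hypothetical sample line, as column would emit it
line='hostname1 | 1 |'
# Prepend "| " and append " |" to every line
wrapped=$(printf '%s\n' "$line" | sed 's/^/| /; s/$/ |/')
echo "$wrapped"    # | hostname1 | 1 | |
```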

Related

Extract values from a line containing a string in a specific column

I have an output like this:
| Value | Value2 | Name1 | Type | Date | Status |
| Value1 | Value1 | Name1 | Type1 | Date | Success |
| Value2 | Value2 | Name2 | Type1 | Date | Failed |
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
I want to get each column value into a variable for each line containing the status "Pending" in the last column.
Here the matching line would be:
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
I want to get each column of this line in a variable:
myvar1=Value2
myvar2=Value2
myvar3=Name3
myvar4=Type1
myvar5=Date
What is the best way to do that?
Thanks
Simply:
while IFS= read -r line ;do
IFS='|' read -r foo myvar{1..6} foo <<<"$line"
[ "${myvar6}" ] && [ -z "${myvar6//*Pending*}" ] && echo "$line"
done <inputfile ;
Will print:
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
I'm going to assume the output you mention comes from a command named your_command. If you have it in a file, for example, that command could be cat that_file.
I think that a switch inside a loop is a legible, elegant solution.
your_command | (
while read line; do
case $line in
*'Pending |')
IFS='|' read -ra myvar <<< "$line"
echo ${myvar[1]}
echo ${myvar[2]}
echo ${myvar[3]}
echo ${myvar[4]}
echo ${myvar[5]}
;;
*)
echo ...IGNORED $line
;;
esac
done
)
The output with the example you have given is the following:
...IGNORED | Value | Value2 | Name1 | Type | Date | Status |
...IGNORED | Value1 | Value1 | Name1 | Type1 | Date | Success |
...IGNORED | Value2 | Value2 | Name2 | Type1 | Date | Failed |
Value2
Value2
Name3
Type1
Date
If you don't want to use an array, for whatever reason, you can change the IFS='|' read -ra myvar <<< "$line" line to
myvar1=$(echo $line | cut -d'|' -f 2)
myvar2=$(echo $line | cut -d'|' -f 3)
myvar3=$(echo $line | cut -d'|' -f 4)
myvar4=$(echo $line | cut -d'|' -f 5)
myvar5=$(echo $line | cut -d'|' -f 6)
First you can select the line. If it is only one ending with "Pending", this would work:
line=$(grep '| Pending |$' file.txt | sed 's/\s*|\s*/|/g' | sed 's/^|//g')
The variable line now has only the values separated with the pipe symbol, without the spaces around it and with no pipe symbol at the beginning of the line.
Then, if you do not use an array, you can manually assign the variables like
myvar1=$(echo $line | awk -F'|' '{print $1}')
myvar2=$(echo $line | awk -F'|' '{print $2}')
...
If there are many lines containing the keyword "Pending", you have to use an array or a dynamic structure instead of static variable names.
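A sketch of that array-based approach, collecting every matching line and splitting each into fields (the sample table from the question is recreated in output.txt; the file name is an assumption):

```shell
# Recreate the sample table from the question
cat > output.txt <<'EOF'
| Value | Value2 | Name1 | Type | Date | Status |
| Value1 | Value1 | Name1 | Type1 | Date | Success |
| Value2 | Value2 | Name2 | Type1 | Date | Failed |
| Value2 | Value2 | Name3 | Type1 | Date | Pending |
EOF

# Collect every line ending in "Pending |", then split each on '|'
mapfile -t matches < <(grep 'Pending |$' output.txt)
for line in "${matches[@]}"; do
    IFS='|' read -ra f <<< "$line"   # f[0] is empty (text before the leading '|')
    echo "name=${f[3]// /} type=${f[4]// /}"
done
```

The `${f[3]// /}` expansions strip the padding spaces around each field, which is fine here because the values themselves contain no spaces.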
First, what you asked for:
$: while read -r myvar1 myvar2 myvar3 myvar4 myvar5 Pending
> do echo "myvar1=[$myvar1] myvar2=[$myvar2] myvar3=[$myvar3] myvar4=[$myvar4] myvar5=[$myvar5]"
> done < <( sed -n '/[|]\s*Pending\s*[|]\s*$/{ s,[ |], ,g; s/^ //; s/ $//; p; }' file )
myvar1=[Value2] myvar2=[Value2] myvar3=[Name3] myvar4=[Type1] myvar5=[Date]
The sed selects only the records you want (/[|]\s*Pending\s*[|]\s*$/), converts all the delimiter-crap to single spaces (s,[ |], ,g, which breaks if you have any spaces embedded in your data), strips leading and trailing delimiters (s/^ //; s/ $//;), and prints the result (-n says don't print by default; p; says do print this record now).
But I think you should seriously reconsider the spaces around your delimiter, unless you wanted to keep them as part of the data. I'd leave off the leading and maybe the trailing delimiter. I'd also really consider putting them into an array, though I do understand you may want to use the fields by name...just don't call them myvar1, etc.
Another option is this:
awk 'BEGIN { FS=OFS="|" } $(NF-1)~/Pending/ { gsub(/^\s*\|\s*/, "", $0); NF-=2; print $0; }' file.txt | while IFS='|' read myVar1 myVar2 myVar3 myVar4 myVar5
do
#Do something
done

Shell Script fixed space issues

I am trying to create a file with a header as well as some extracts from the input files. I always get space alignment issues even though I am using fixed widths.
Any advice please? Thanks, here is the sample code:
# print header
printf '%-50s | %-30s | %-30s | %-30s | %-30s | %-30s %-5s \n' "FileName" "Amount Rec" "Payments" "Total Adjustments" "Adjustment Amount" "File Date" "|" >> $msgfile
for file in "$SEARCH_DIR"/*; do
file=`basename "$file"`
recamt=$(awk -F "*" '/BPR/{print $3}' $file)
amount1=$(awk -F "*" '/TS3/{print $5}' $file)
amount1=$(awk -F "*" '/PLB/{print $7}' $file)
printf '%-50s | %-30s | %-30s | %-30s | %-30s | %-30s %-5s \n' "$file" "$recamt" "$amount1" "1" "$amount1" "$weekdate" "|" >> $msgfile
done
The output looks good when I open it in Notepad, but when I send it as an email (mailx), I see spacing issues in the output. Any thoughts?
My output values are as below:
xxxxxxx.xxxxx.xxxxxxxxx.txt | 906262.23 | 393 | 1 | 75297.4 | 06-Mar-2019 |

shell - replace string with incrementing value

I have this String:
"a | a | a | a | a | a | a | a"
and I want to replace every " | " with an incrementing value like so:
"a0a1a2a3a4a5a6a"
I know I can use gsub to replace strings:
> echo "a | a | a | a | a | a | a | a" | awk '{gsub(/\ \|\ /, ++i)}1'
a1a1a1a1a1a1a1a
But it seems gsub only increments after each newline, so my solution for now would be first putting a newline after each " | ", then using gsub and deleting the newlines again:
> echo "a | a | a | a | a | a | a | a" | awk '{gsub(/\ \|\ /, " | \n")}1' | awk '{gsub(/\ \|\ /, ++i)}1' | tr -d '\n'
a1a2a3a4a5a6a7a
Which is honestly just disgusting...
Is there a better way to do this?
If perl is okay:
$ echo 'a | a | a | a | a | a | a | a' | perl -pe 's/ *\| */$i++/ge'
a0a1a2a3a4a5a6a
*\| * match | surrounded by zero or more spaces
e modifier allows to use Perl code in replacement section
$i++ use value of $i and increment (default value 0)
You can use awk like this:
s="a | a | a | a | a | a | a | a"
awk -F ' *\\| *' -v OFS="" '{s=""; for(i=1; i<NF; i++) s = s $i i-1; print s $i}' <<< "$s"
a0a1a2a3a4a5a6a
-F ' *\\| *' sets | surrounded by optional spaces as the input field separator.
The for loop goes through the fields, appending each field followed by its zero-based position; the last field is appended without a number.
If using just sh is an option, then perhaps substitute until a fixed point is reached:
s=$1 # first argument passed to script, "a | a | a |..."
n=0
while true
do
prev=$s
s=${s%" | a"}
test "$s" = "$prev" && break
result=$result${n}"a"
n=$((n + 1))
done
echo $s$result
If this program lives in script file digits.sh,
$ sh digits.sh "a | a | a | a | a | a | a | a"
a0a1a2a3a4a5a6a
$
Another solution using awk
echo "a | a | a | a | a | a | a | a" |
awk -v RS="[ ]+[|][ ]+" '{printf "%s%s",(f?NR-2:""),$0; f=1}'
you get:
a0a1a2a3a4a5a6a

doing an awk followed by a head command + bash

This gives me the duplicates and the number of times it is repeated
$ awk -F "\"*,\"*" '{print $2}' file.csv | sort | uniq -c | sort -nr | head -n 2
4 12345
3 56789
What I then want to do is add up the first column (4+3). I can do this if I write the output above to a file test, as follows:
$ awk -F" " '{print $1}' test
4
3
$ awk -F" " '{print $1}' test | paste -sd+
4+3
$ awk -F" " '{print $1}' test | paste -sd+ | bc
7
But I want to be able to do this in one line, ideally without writing to a file, and I would like to understand why the following does not work:
awk -F "\"*,\"*" '{print $2}' file.csv | sort | uniq -c | sort -nr | head -n 2 | awk -F" " '{print $1}' | paste -sd+ | bc
My 2nd awk seems to not like the input.
Can anyone advise how I do this, and what I am doing wrong?
EDIT1 - file.csv looks like:
"Date","Number"
"2015-11-01","12345"
"2015-11-01","12345"
"2015-11-01","12345"
"2015-11-01","12345"
"2015-11-01","56789"
"2015-11-01","56789"
"2015-11-01","56789"
awk to the rescue!
... | sort -nr | awk 'NR<=2{sum+=$1} END{print sum}'
You can do the row selection and the summation in awk as well.
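For instance, a sketch that does the counting in awk too, so only the sort and the final sum remain outside, with no temp file or bc needed (file.csv is recreated from the question's EDIT1):

```shell
# Recreate file.csv from the question
cat > file.csv <<'EOF'
"Date","Number"
"2015-11-01","12345"
"2015-11-01","12345"
"2015-11-01","12345"
"2015-11-01","12345"
"2015-11-01","56789"
"2015-11-01","56789"
"2015-11-01","56789"
EOF

# Count field 2 occurrences in awk (stripping the quotes first), sort by
# count, take the top two, and sum their counts in a second awk
total=$(awk -F, 'NR>1 { gsub(/"/,""); count[$2]++ }
                 END  { for (k in count) print count[k], k }' file.csv |
        sort -nr | head -n 2 | awk '{ sum += $1 } END { print sum }')
echo "$total"    # 7
```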

BASH: Separate a line into words using tr

Part of my Bash assignment includes reading a text file, then separating each line into words and using them.
The words are separated by |, lines are separated by \n. We were told to use the tr command, but I couldn't find an elegant solution.
An example:
Hello | My | Name | Is | Bill
should give:
Hello
My
Name
Is
Bill
One word per iteration.
You only need one invocation of tr to do the job:
$ echo "Hello | My | Name | Is | Bill" | tr -cs '[:alpha:]' '\n'
Hello
My
Name
Is
Bill
$
The -c option is for 'the complement' of the characters in the first pattern; the -s option 'squeezes' out duplicate replacement characters. So, anything that's not alphabetic is converted to a newline, but consecutive newlines are squeezed to a single newline.
Clearly, if you need to keep 'Everyone else | can | call | me | Fred' with the two words in the first line of output, then you have to work considerably harder:
$ echo "Everyone else | can | call | me | Fred" |
> tr '|' '\n' |
> sed 's/ *$//;s/^ *//'
Everyone else
can
call
me
Fred
$
The sed script here removes leading and trailing blanks, leaving intermediate blanks unchanged. You can replace multiple blanks with a single blank if you need to, and so on and so forth. You can't use tr to conditionally replace a given character (to change some blanks and leave others alone, for example).
some other options:
awk:
awk -F'\\| ' -v OFS="\n" '$1=$1'
example:
kent$ echo "Hello | My | Name | Is | Bill" |awk -F'\\| ' -v OFS="\n" '$1=$1'
Hello
My
Name
Is
Bill
grep
grep -o '[^ |]*'
example:
kent$ echo "Hello | My | Name | Is | Bill"|grep -o '[^ |]*'
Hello
My
Name
Is
Bill
sed
sed 's/ | /\n/g'
example:
kent$ echo "Hello | My | Name | Is | Bill" |sed 's/ | /\n/g'
Hello
My
Name
Is
Bill
My favorite perl :)
echo "Hello | My | Name | Is | Bill" | perl -pe 's/\s*\|\s*/\n/g'
will remove the excessive spaces too, so
echo "Hello | My | Name | Is | Bill" | perl -pe 's/\s*\|\s*/\n/g' | cat -vet
will print
Hello$
My$
Name$
Is$
Bill$
Using tr:
echo "Hello | My | Name | Is | Bill" | tr -s '\| ' '\n'
OR if you decide to give awk a chance:
echo "Hello | My | Name | Is | Bill" | awk -F '\|' '{for (i=1; i<=NF; i++) {
sub(/ /, "", $i); print $i}}'
This code should do it: it converts '|' to newlines and deletes all blank characters (note this removes every space, so it only suits single-word fields):
echo "Hello | My | Name | Is | Bill" | tr '|' '\n' | tr -d '[:blank:]'
File temp: Hello | My | Name | Is | Bill
$ cat temp | tr '|' '\n' | sed 's/^ *//g'
Hello
My
Name
Is
Bill
$
The sed part gets rid of leading spaces (because there is a space between the '|' and the word). This will also work for "Hello everyone | My | Name | Is | Bill":
$ cat temp | tr '|' '\n' | sed 's/^ *//g'
Hello everyone
My
Name
Is
Bill
$
