How to parse data vertically in shell script? - bash

I have data printed out in the console like this:
A B C D E
1 2 3 4 5
I want to manipulate it so A:1 B:2 C:3 D:4 E:5 is printed.
What is the best way to go about it? Should I tokenize the two lines and then print it out using arrays?
How do I go about it in bash?

Awk is good for this.
awk 'NR==1{for(i=1;i<=NF;i++)row[i]=$i} NR==2{for(i=1;i<=NF;i++)printf "%s:%s ",row[i],$i; print ""}' oldfile > newfile
A slightly more readable version for scripts:
#!/usr/bin/awk -f
NR == 1 {
    # remember the fields of the first line (awk fields are 1-based)
    for (i = 1; i <= NF; i++) {
        first_row[i] = $i
    }
}
NR == 2 {
    for (i = 1; i <= NF; i++) {
        printf "%s:%s ", first_row[i], $i
    }
    print ""
}
If you want it to scale vertically, you'll have to say how.

For two lines with any number of elements:
(read LINE
 LINE_ONE=($LINE)
 read LINE
 LINE_TWO=($LINE)
 for i in `seq 0 $((${#LINE_ONE[@]} - 1))`
 do
     echo ${LINE_ONE[$i]}:${LINE_TWO[$i]}
 done)
To do pairs of lines just wrap it in a loop.
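For example, a minimal sketch (assuming the input is a file named file with an even number of lines):
while read -r LINE; do
    LINE_ONE=($LINE)              # first line of the pair: the keys
    read -r LINE || break
    LINE_TWO=($LINE)              # second line of the pair: the values
    for i in `seq 0 $((${#LINE_ONE[@]} - 1))`; do
        echo ${LINE_ONE[$i]}:${LINE_TWO[$i]}
    done
done < file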

This might work for you:
echo -e "A B C D E FFF GGG\n1 2 3 4 5 666 7" |
sed 's/ \|$/:/g;N;:a;s/\([^:]*:\)\([^\n]*\)\n\([^: ]\+ *\)/\2\1\3\n/;ta;s/\n//'
A:1 B:2 C:3 D:4 E:5 FFF:666 GGG:7

Perl one-liner:
perl -lane 'if($.%2){@k=@F}else{print join" ",map{"$k[$_]:$F[$_]"}0..$#F}'
Somewhat more legible version:
#!/usr/bin/perl
my @keys;
while (<>) {
    chomp;
    if ($. % 2) { # odd lines are keys
        @keys = split ' ', $_;
    } else { # even lines are values
        my @values = split ' ', $_;
        print join(' ', map { "$keys[$_]:$values[$_]" } 0..$#values), "\n";
    }
}

Related

awk or bash for split lines

I would like to split a csv file which looks like this:
a|b|1,2,3
c|d|4,5
e|f|6,7,8
the goal is this format:
a|b|1
a|b|2
a|b|3
c|d|4
c|d|5
e|f|6
e|f|7
e|f|8
How can I do this in bash or awk?
With bash:
while IFS="|" read -r a b c; do for n in ${c//,/ }; do echo "$a|$b|$n"; done; done <file
Output:
a|b|1
a|b|2
a|b|3
c|d|4
c|d|5
e|f|6
e|f|7
e|f|8
$ cat hm.awk
{
    s = $0; p = ""
    # `p': everything up to (and including) the last '|'; `s': the rest
    while (i = index(s, "|")) {
        p = p substr(s, 1, i)
        s = substr(s, i + 1)
    }
    n = split(s, a, ",")
    for (i = 1; i <= n; i++)
        print p a[i]
}
Usage:
awk -f hm.awk file.csv
In GNU awk (using split):
$ awk '{n=split($0,a,"[|,]");for(i=3;i<=n;i++) print a[1] "|" a[2] "|" a[i]}' file
With perl:
$ cat ip.csv
a|b|1,2,3
c|d|4,5
e|f|6,7,8
$ perl -F'\|' -lane 'print join "|", @F[0..1],$_ foreach split /,/,$F[2]' ip.csv
a|b|1
a|b|2
a|b|3
c|d|4
c|d|5
e|f|6
e|f|7
e|f|8
It splits the input line on | into the @F array,
then, for every comma-separated value in the 3rd column, prints in the desired format.
For a generic last column,
perl -F'\|' -lane 'print join "|", @F[0..$#F-1],$_ foreach split /,/,$F[-1]' ip.csv

find if two consecutive lines are different and where

How do I find the difference, and the point of difference, between two consecutive lines of a fixed-width file?
sample file:
cat test.txt
1111111111111111122211111111111111
1111111111111111132211111111111111
output:
It should inform the user that there is a difference between the two lines and that the position of the difference is the 18th character (as in the above example).
It would be really helpful if it could list all the positions in case of multiple variations. For example:
11111111111111111211113111
11111111111111111211114111
Here it should say: difference spotted at the 18th and 26th characters.
I was trying something along the following lines, but I'm lost.
while read line
do
    echo $line | sed 's/./ &/g' | xargs -n1 # Not able to apply diff (stupid try)
done <test.txt
Perl to the rescue:
$ echo '11131111111111111211113111
11111111111111111211114111' \
| perl -le '$d = <> ^ <>;
print pos $d while $d =~ /[^\0]/g'
4
23
It XORs the two input strings and reports all positions where the result isn't the null byte, i.e. where the strings were different.
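To see the trick in isolation, here is a minimal sketch with two made-up strings:
$ perl -le '$d = "1121" ^ "1131"; print pos $d while $d =~ /[^\0]/g'
3
The strings differ only at the 3rd character, so the XOR result is non-null only there.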
You can use an empty field separator to make each character a field in awk, and compare each field of every even record with the corresponding field of the preceding odd record:
awk 'BEGIN{ FS="" }
NR%2 {
    split($0, a)
    next
}
{
    print "line # ", NR
    for (i=1; i<=NF; i++)
        if ($i != a[i])
            print "difference spotted in position:", i
}' test.txt
line # 2
difference spotted in position: 18
line # 4
difference spotted in position: 18
difference spotted in position: 23
Where the input data is:
cat test.txt
1111111111111111122211111111111111
1111111111111111132211111111111111
11111111111111111211113111
11111111111111111311114111
PS: It will only work on awk versions that split records into characters when FS is null, e.g. GNU awk, OSX awk, etc.
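On an awk without that behavior, one portable workaround (a sketch mirroring the script above) is to pull each character out with substr():
awk 'NR%2 { n=length($0); for (i=1; i<=n; i++) a[i]=substr($0,i,1); next }
{
    print "line # ", NR
    for (i=1; i<=length($0); i++)
        if (substr($0,i,1) != a[i])
            print "difference spotted in position:", i
}' test.txt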
$ cat tst.awk
{ curr = $0 }
(NR%2)==0 {
    currLgth = length(curr)
    prevLgth = length(prev)
    maxLgth = (currLgth > prevLgth ? currLgth : prevLgth)
    print "Comparing:"
    print prev
    print curr
    for (i=1; i<=maxLgth; i++) {
        prevChar = substr(prev,i,1)
        currChar = substr(curr,i,1)
        if ( prevChar != currChar ) {
            printf "Difference: char %d line %d = \"%s\", line %d = \"%s\"\n", i, NR-1, prevChar, NR, currChar
        }
    }
    print ""
}
{ prev = curr }
$ cat file
1111111111111111122211111111111111
1111111111111111132211111111111111
11111111111111111111111111
11111111111111111111111
$ awk -f tst.awk file
Comparing:
1111111111111111122211111111111111
1111111111111111132211111111111111
Difference: char 18 line 1 = "2", line 2 = "3"
Comparing:
11111111111111111111111111
11111111111111111111111
Difference: char 24 line 3 = "1", line 4 = ""
Difference: char 25 line 3 = "1", line 4 = ""
Difference: char 26 line 3 = "1", line 4 = ""

change range of letters in a specific column

I want to change, in column 11, these characters:
!"#$%&'()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ
for these characters:
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi
so, if I have 000@! in column 11, it should become PPP_@. I tried awk:
awk '{a = gensub(/[@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghi]/, /[!\"\#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJ]/, "g", $11); print a }' file.txt
but it does not work...
Try Perl.
perl -lane '$F[10] =~ y/!"#$%&'"'"'()*+,-.\/0-9:;<=>?@A-J/@A-Z[\\]^_`a-i/;
print join(" ", @F)'
I am assuming by "column 11" you mean a string of several characters after the tenth run of successive whitespace, which is what the -a option splits on by default (basically to simulate Awk). Unfortunately, changes to the array @F do not show up in the output directly, so you have to reconstruct the output line from (the modified) @F, which will normalize the field delimiter to just a single space.
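As a minimal illustration of that round trip (with a made-up input line), modifying a field and rebuilding the line from @F:
$ echo 'a bb c' | perl -lane '$F[1] = uc $F[1]; print join(" ", @F)'
a BB c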
Just change f = 2 to f = 11:
$ cat tst.awk
BEGIN {
    f = 2
    old = "!\"#$%&'()*+,-.\\/0123456789:;<=>?@ABCDEFGHIJ"
    new = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\\\]^_`abcdefghi"
    n = length(old)
    for (i=1; i<=n; i++) {
        map[substr(old,i,1)] = substr(new,i,1)
    }
}
{
    n = length($f)
    newStr = ""
    for (i=1; i<=n; i++) {
        oldChar = substr($f,i,1)
        newStr = newStr (oldChar in map ? map[oldChar] : oldChar)
    }
    $f = newStr
    print
}
$ cat file
a 000@! b
$ awk -f tst.awk file
a PPP_@ b
Note that you have to escape "s and \s in strings.

how to sum each column in a file using bash

I have a file in the following format:
id_1,1,0,2,3,lable1
id_2,3,2,2,1,lable1
id_3,5,1,7,6,lable1
and I want the summation of each column (I have over 300 columns):
9,3,11,10,lable1
How can I do that using bash?
I tried using what is described here, but it didn't work.
Using awk:
$ awk -F, '{for (i=2;i<NF;i++)a[i]+=$i}END{for (i=2;i<NF;i++) printf a[i]",";print $NF}' file
9,3,11,10,lable1
This will print the sum of each column (from i=2 to i=NF-1) as a comma-separated list, followed by the value of the last column from the last row (i.e. lable1).
If the totals would need to be grouped by the label in the last column, you could try this:
awk -F, '
{
    L[$NF]                              # remember each distinct label
    for(i=2; i<NF; i++) T[$NF,i]+=$i    # per-label column totals
}
END{
    for(i in L){
        s=i
        for(j=NF-1; j>1; j--) s=T[i,j] FS s
        print s
    }
}
' file
If the labels in the last column are sorted then you could try without arrays and save memory:
awk -F, '
function labelsum(){
    s=p
    for(i=NF-1; i>1; i--) s=T[i] FS s
    print s
    split(x,T)                          # clear the totals array
}
p!=$NF{
    if(p) labelsum()
    p=$NF
}
{
    for(i=2; i<NF; i++) T[i]+=$i
}
END {
    labelsum()
}
' file
Here's a Perl one-liner:
<file perl -lanF, -E 'for ( 0 .. $#F ) { $sums{ $_ } += $F[ $_ ]; } END { say join ",", map { $sums{ $_ } } sort { $a <=> $b } keys %sums; }'
It will only do sums, so the first and last column in your example will be 0.
This version will follow your example output:
<file perl -lanF, -E 'for ( 1 .. $#F - 1 ) { $sums{ $_ } += $F[ $_ ]; } END { $sums{ $#F } = $F[ -1 ]; say join ",", map { $sums{ $_ } } sort { $a <=> $b } keys %sums; }'
A modified version based on the solution you linked:
#!/bin/bash
colnum=6
filename="temp"
for ((i=2;i<$colnum;++i))
do
    sum=$(cut -d ',' -f $i $filename | paste -sd+ | bc)
    echo -n $sum','
done
head -1 $filename | cut -d ',' -f $colnum
Pure bash solution:
#!/usr/bin/bash
while IFS=, read -a arr
do
    for((i=1;i<${#arr[*]}-1;i++))
    do
        ((farr[$i]=${farr[$i]}+${arr[$i]}))
    done
    farr[$i]=${arr[$i]}
done < file
(IFS=,;echo "${farr[*]}")

change random line with shellscript

How can I easily (quick and dirty) change, say, 10 random lines of a file with a simple shell script?
I thought about abusing ed and generating random commands and line ranges, but I'd like to know if there is a better way.
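For reference, the ed idea could be sketched like this (quick and dirty; replacement is just a placeholder, and GNU shuf is assumed):
c=$(wc -l < file)
{
    shuf -i 1-"$c" -n 10 | while read -r n; do
        printf '%dc\nreplacement\n.\n' "$n"   # change line n, supply new text, end input with .
    done
    echo w
    echo q
} | ed -s file
Since each change replaces exactly one line with one line, the line addresses stay valid no matter what order the commands arrive in.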
awk 'BEGIN{srand()}
{ lines[++c]=$0 }
END{
    while(d<10){
        RANDOM = int(1 + rand() * c)
        if( !( RANDOM in r) ) {
            r[RANDOM]
            print "do something with " lines[RANDOM]
            ++d
        }
    }
}' file
Or, if you have the shuf command:
shuf -n 10 $file | while read -r line
do
    sed -i "s/$line/replacement/" $file
done
Playing off @Dennis' version, this will always output 10. Doing random numbers in a separate array could create duplicates and, consequently, fewer than 10 modifications.
file=~/testfile
c=$(wc -l < "$file")
awk -v c=$c '
BEGIN {
    srand();
    count = 10;
}
{
    if (c*rand() < count) {
        --count;
        print "do something with " $0;
    } else
        print;
    --c;
}
' "$file"
This seems to be quite a bit faster:
file=/your/input/file
c=$(wc -l < "$file")
awk -v c=$c 'BEGIN {
    srand();
    for (i=0;i<10;i++) lines[i] = int(1 + rand() * c);
    asort(lines);
    p = 1
}
{
    if (NR == lines[p]) {
        ++p
        print "do something with " $0
    }
    else print
}' "$file"