Shell script to parse multiple rows from a single column

I am working through a really complex and long multi-conditional statement to do this and was wondering if anyone knew of a simpler method. I have a multi-column/multi-row list that I am trying to parse. What I need to do is take the first row, which has the "*" entry in the 5th position, copy all of its entries into the blank spaces on the next few rows, and then discard that original top row. What complicates this a bit is that sometimes the next few rows may not have an empty space in all the other fields (see the bottom half of the original list). In that case, I want to take the extra entry (Q1 below) and put it at the end of the row, in a new column.
Original list:
A B C D ***** F G
E1
E2
E3
Q R S T ***** V W
U1
Q1 U2
Final output:
A B C D E1 F G
A B C D E2 F G
A B C D E3 F G
Q R S T U1 V W
Q R S T U2 V W Q1
Thanks in advance for help!

The concise/cryptic one-liner:
awk '/[*]/{f=$0;p="[*]+";next}{r=$2?$2:$1;sub(p,r,f);p=r;print $2?f" "$1:f}' file
A B C D E1 F G
A B C D E2 F G
A B C D E3 F G
Q R S T U1 V W
Q R S T U2 V W Q1
Explanation:
/[*]+/ {                    # If the line contains the placeholder pattern
    line = $0               # Store the line
    pat = "[*]+"            # Store the pattern
    next                    # Skip to the next line
}
{
    if (NF == 2)            # If the current line has 2 fields
        replace = $2        # we want to replace with the second field
    else                    # else
        replace = $1        # we want to replace with the first field
    sub(pat, replace, line) # Do the substitution
    pat = replace           # Next time the pattern to replace will have changed
    if (NF == 2)            # If the current line has 2 fields
        print line, $1      # print the line with the replacement and the 1st field
    else                    # else
        print line          # just print the line with the replacement
}
To run the script, save it to a file such as script.awk and run awk -f script.awk file.

Related

grep a list into a multi-column file and get fully matching lines

Not sure how to ask this question, but an example will surely clarify. Suppose I have this file:
$ cat intoThat
a b
a h
a l
a m
b c
b d
b m
c b
c d
c f
c g
c p
d h
d f
d p
and this list:
$ cat grepThis
a
b
c
d
Now I would like to grep grepThis into intoThat, and I would do this:
$ grep -wf grepThis intoThat
which will give an output like this:
**a b**
a h
a l
a m
**b c**
**b d**
b m
**c b**
**c d**
c f
c g
c p
d h
d f
d p
Now, the asterisks are used to highlight the lines that I would like grep to return. These are the lines that have a full match, but... how do I tell grep (or awk or whatever) to get only these lines?
Of course it is possible that some lines do not match any pattern; e.g., the intoThat file may contain some other letters like g, h, l, s, t, etc.
With awk, you could do:
awk 'NR==FNR{ seen[$0]++; next } ($1 in seen && $2 in seen)' grepThis intoThat
a b
b c
b d
c b
c d
NR is set to 1 when awk reads its first record and increments for every subsequent record, continuing across all input files (whether there is one or several) until every record has been read.
FNR is also set to 1 at the first record, but it counts records within the current file and resets to 1 at the start of each new input file.
So NR == FNR is true only while the first input file is being read, and the block that follows it therefore acts on the first file only.
seen is an associative awk array (you can use a different name if you want) keyed by the whole line $0, whose value is the number of times that line has occurred (this idiom is also commonly used to remove duplicate records in awk).
The next statement skips the rest of the commands, so they only actually execute for the file(s) after the first.
The final (...) condition simply checks whether both $1 and $2 are present in the array; if so, the line is printed.
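To see the idiom in isolation, here is a minimal sketch (fileA and fileB are placeholder names for any two files):
awk 'NR==FNR { print "first file: " $0; next } { print "other file: " $0 }' fileA fileB
While fileA is being read, NR == FNR holds, so only the first block runs and its next skips the rest; records from fileB therefore only ever reach the second block.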

I would like to sort rows of a data file by NF increasing

I would like to sort rows of a data file by NF increasing.
input
z a b c d k l p m
m x y h j i
y w
g t y u
output
y w
g t y u
m x y h j i
z a b c d k l p m
I tried the sort command, but it does not work.
How can I do this?
Thanks for the help.
Typically you solve these types of problems by modifying the input stream to add some data, operating on that data, and then removing it. In this case, we want to add the field count to the input stream, sort (numerically) on the field count, and then remove it (using a space as the field delimiter):
awk '{ print NF, $0 }' | sort -n | cut -d' ' -f2-
You can either pipe your data to awk or pass the filename as another argument to awk.
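For example, assuming the question's data is saved in a file named input (a placeholder name), both of these are equivalent:
awk '{ print NF, $0 }' input | sort -n | cut -d' ' -f2-
cat input | awk '{ print NF, $0 }' | sort -n | cut -d' ' -f2-
Each prepends the field count, sorts on it numerically, and strips it again, yielding the rows ordered by increasing NF as in the expected output above.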

Compare file with different number format

First of all I'd like to thank your community; you have been helping me tremendously over the past couple of months, thanks to your detailed answers and comments.
However, I came across a snag. I want to compare 2 files containing simulation data. These files are the result of a previous operation, which consists of extracting the desired data from 2 output files.
So output-file1 -> sorteddata1
output-file2 -> sorteddata2
sorteddata1 looks like this:
0.200000e-4 a b c d e
0.400000e-4 f g h i j
0.560000e-4 k l m n o
.
.
.
sorteddata2:
2.000000E-5 A
3.600000E-5 B
5.600000E-5 C
.
.
.
And this is what I would like, sorteddata3:
0.200000e-4 a b c d e A
0.400000e-4 f g h i j
0.560000e-4 k l m n o C
.
.
.
So if the number in the first column is the same, add the corresponding value from sorteddata2 in the 7th column of sorteddata1.
I wanted to start from here:
Compare files with awk
But the number format in the first column of each file is different, so I don't get any matches. I really want to use awk for this (personal preference; I kind of like it).
The goal is to plot this using gnuplot, so hopefully a blank in the last column won't be a problem.
Any thoughts on this?
You can use sprintf to force the numbers into the same format:
sprintf(format, expression1, ...)
    Return (without printing) the string that printf would have printed
    out with the same arguments (see Printf).
Then the logic is the same as in the linked answer, adding an if/else case to print either the current line on its own or together with the matched value from the other file.
awk 'NR==FNR { value = sprintf("%e", $1)
               a[value] = $2
               next
             }
             { value2 = sprintf("%e", $1)
               print $0, a[value2]
             }' f2 f1
For your given input, it returns:
$ awk 'NR==FNR{value=sprintf("%e", $1); a[value]=$2; next} {value2=sprintf("%e", $1); if (value2 in a) {print $0, a[value2]} else {print}}' f2 f1
0.200000e-4 a b c d e A
0.400000e-4 f g h i j
0.560000e-4 k l m n o C
Note: in the comments you say that the E format gives you an "unterminated string" error. Hence, you can replace the E with e in the number with sub("E","e",$1). All together:
awk 'NR==FNR{value=sprintf("%e", $1); a[value]=$2; next} {sub("E","e",$1); value2=sprintf("%e", $1); print $0, a[value2] }' f2 f1

How to exchange lines in the same file?

I have a text like this (in rows):
A
B
C
D
E
F
and I'd like to swap line B with line D, and line C with line E, obtaining (in rows):
A
D
E
B
C
F
Is there any simple way to do it with bash?
You can use the mapfile builtin to read the entire file into an array of lines, reorder the array however you want, and then write it back out to the file.
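A minimal sketch, assuming the six lines shown are in a file named file.txt (a placeholder name):
# read every line of the file into the array "lines" (indices are zero-based)
mapfile -t lines < file.txt
# write the lines back out in the order A D E B C F
printf '%s\n' "${lines[0]}" "${lines[3]}" "${lines[4]}" "${lines[1]}" "${lines[2]}" "${lines[5]}" > file.txt
Redirecting back to the same file is safe here because mapfile has already finished reading it before the second command opens it for writing.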

How to delete the last space at the end of each line in a text file? (shell scripts)

I have a file like this:
z E l f
A l t E^ t
d Y s
m u s t
z E l f s
x # w e s t
s t e t s
h E p
w i
t E n
o G #
o G # n
m I s x i n
s t O n t
and I need to remove a space at the end of each line.
How can I do it? Thank you in advance.
I assume you want to delete trailing spaces and tabs from the end of each line.
awk '{ sub(/[ \t]+$/, ""); print }' file
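This prints the cleaned lines to standard output; to keep the result, redirect it to a new file (file.clean is just a placeholder name):
awk '{ sub(/[ \t]+$/, ""); print }' file > file.clean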
