change the position of a line in a file using sed - bash

I would like to know how to change the position of a line in a file (preferably using sed). For example, consider the file that contains
goal identifier statement
let statement 1
let statement 2
forall statement
other statements
I would like to be able to do this
goal identifier statement
forall statement
let statement 1
let statement 2
other statements
where I change the position of the forall line and bring it after the goal line. forall and goal are regexps that can be used to identify the lines.

You can try the following. To move line 4 to line 2 (the general case of moving line A to line B, where A > B):
sed -n '2{h; :a; n; 4{p;x;bb}; H; ba}; :b; p' file
or, for A < B (for example, moving line 2 to line 4):
sed -n '2{h; d}; 4{p; x;}; p' file
In the first case (line 4 moved to line 2) you get:
goal identifier statement
forall statement
let statement 1
let statement 2
other statements
In the second case (line 2 moved to line 4) you get:
goal identifier statement
let statement 2
forall statement
let statement 1
other statements
Explanation
sed -n '       # -n: silent option ON (do not auto-print)
2{             # if this is line 2
  h            # replace the contents of the hold space with the pattern space
  :a           # label "a"
  n            # fetch the next line
  4{           # if this is line 4
    p          # print line 4
    x          # exchange the contents of the hold and pattern spaces
    bb         # goto "b"
  }
  H            # append the pattern space to the hold space, with a newline before it
  ba           # goto "a"
}
:b             # label "b"
p              # print
' file
EDIT
If you want to use regexes to identify the lines, you can modify the first command:
sed -n '/goal/{p;n;h;:a;n;/forall/{p;x;bb};H;ba};:b;p' file
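For example, running that command against the sample file from the question (saved here as a hypothetical file named file) should produce the requested order:
$ sed -n '/goal/{p;n;h;:a;n;/forall/{p;x;bb};H;ba};:b;p' file
goal identifier statement
forall statement
let statement 1
let statement 2
other statements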

$ cat r.awk
BEGIN {
    forall_re = "^forall"   # examples of regexps
    goal_re = "^goal"
}

function tag(l) {           # tag a line
    if (l ~ goal_re) return "goal"
    else if (l ~ forall_re) return "forall"
    else return "rest"
}

{   # store entire file in array; give a tag to every line
    lines[NR] = $0
    tags[NR] = tag($0)
}

function swap0(a, i, j,    tmp) {
    tmp = a[i]; a[i] = a[j]; a[j] = tmp
}

function swap(i, j) {
    swap0(lines, i, j); swap0(tags, i, j)
}

function rise(i) {
    # TODO: add error check
    while (i - 1 > 0 && tags[i - 1] != "goal") {
        swap(i, i - 1); i--
    }
}

function process(    i) {
    for (i = 1; i <= NR; i++)
        if (tags[i] == "forall") rise(i)
}

function dump(    i) {      # print the array
    for (i = 1; i <= NR; i++)
        print lines[i]
}

END {
    process()
    dump()
}
An example input file:
$ cat r.txt
goal identifier statement
let statement 1
let statement 2
forall statement A
other statements
goal identifier statement
let statement 1
let statement 2
forall statement B
other statements
Usage:
$ awk -f r.awk r.txt
goal identifier statement
forall statement A
let statement 1
let statement 2
other statements
goal identifier statement
forall statement B
let statement 1
let statement 2
other statements

sed is for simple substitutions on individual lines, that is all. For anything else you should use awk, for every desirable attribute of software (clarity, simplicity, portability, etc.):
$ awk 'NR==FNR{if (/forall/) {f=FNR; v=$0} next} FNR!=f; /goal/{print v} ' file file
goal identifier statement
forall statement
let statement 1
let statement 2
other statements
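For readability, here is the same two-pass idea written out with comments; this is just a sketch of the one-liner above, not a different approach:
awk '
    NR == FNR {               # first pass over the file
        if (/forall/) {       # remember the forall line and its line number
            f = FNR
            v = $0
        }
        next
    }
    FNR != f                  # second pass: print every line except the remembered forall line
    /goal/ { print v }        # and print the saved forall line right after the goal line
' file file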

sed -r '/goal/{        # if we match the "goal" line
  :X                   # label for the branch command
  N                    # append the next line
  /forall[^\n]*$/{     # if we match the "forall" line, move it to just below the "goal" line
    s#^([^\n]*)(.*)(\n[^\n]*)$#\1\3\2#
    b                  # after the move is finished, branch to end
  }
  bX                   # branch to :X to append the next line
}' file
goal identifier statement
forall statement
let statement 1
let statement 2
other statements

A less terrible way using vim to find a line in $FILENAME using regex $REGEX_EXPRESSION and move that line to $LINE_NUMBER:
vim -c "g:$REGEX_EXPRESSION:m$LINE_NUMBER" -cwq "$FILENAME"
Explanation: -c runs a command in vim, so this finds the line(s) matching that regex and moves them to just after the line number specified, and then runs the command wq (write and quit).
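For the sample in the question, a concrete invocation following that pattern might look like this (forall and 1 are just illustrative values for the regex and target line, and the second command is written here with an explicit -c "wq"):
vim -c "g:forall:m1" -c "wq" file
This moves the line matching forall to just below line 1 (the goal line in the question's sample), writes the file and quits.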

How can I find out how many lines are between a number and the next occurrence of the same number in a file?

I have large files that each store results from very long calculations. Here's an example of a file where there are results for five time steps; there are problems with the output at the third, fourth, and fifth time steps.
(Please note that I have been lazy and have used the same numbers to represent the results at each time step in my example. In reality, the numbers would be unique at each time step.)
3
i = 1, time = 1.000, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709
O 23.8223149199 7.2029442290 0.4030956770
3
i = 2, time = 1.500, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709
O 23.8223149199 7.2029442290 0.4030956770
3
i = 3, time = 2.000, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709
O 23.8223149199 (<--Problem: calculation stopped and some numbers are missing)
3
i = 4, time = 2.500, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709 (Problem: calculation stopped and entire row is missing below)
3
i = 5, time = 3.000, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709
O 23.8223149199 7.2029442290 0.4030956770 sdffs (<--Problem: rarely, additional characters can be printed but I figured out how to identify the longest lines in the file and don't have this problem this time)
The problem is that the calculations can fail (and then need to be restarted) while a result is being written to the file. That means that when I try to use the results, I have problems.
My question is: how can I find out when something has gone wrong and the results file has been messed up? The most common problem is that there are not "3" lines of results after the header (the header being the line where there's i = ...). If I could find a problem line, I could then delete that time step.
Here is an example of error output I get when trying to use a messed-up file:
Traceback (most recent call last):
File "/mtn/storage/software/languages/anaconda/Anaconda3-2018.12/lib/python3.7/site-packages/aser/io/extxyz.py", line 593, in read_xyz
nentss = int(line)
ValueError: invalid literal for int() with base 10: '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pythonPostProcessingCode.py", line 25, in <module>
path = read('%s%s' % (filename, fileext) , format='xyz', index=':') # <--This line tells me that Python cannot read in a particular time step because the formatting is messed up.
I am not experienced with scripting/Awk, etc, so if anyone thinks I have not used appropriate question tags, a heads-up would be welcome. Thank you.
The header plus 330 result lines means 331 lines of text per record, so:
awk 'BEGIN { RS="i =" } { split($0,bits,"\n");if (length(bits)-1==331) { print RS$0 } }' file > newfile
Explanation:
awk 'BEGIN {
    RS="i ="
}
{
    split($0,bits,"\n");
    if (length(bits)-1==331) {
        print RS$0
    }
}' file > newfile
Before processing any lines from the file called file, set the record separator to "i =". Then, for each record, use split to break the record ($0) into an array bits, using a newline as the separator. Where the length of the array bits, less 1, is 331, print the record separator followed by the record, redirecting the output to a new file called newfile.
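Since a multi-character RS already requires gawk or mawk, the same check could hypothetically be written without split by letting awk count the newline-separated pieces of each record as fields:
awk 'BEGIN { RS="i ="; FS="\n" } NF-1==331 { print RS $0 }' file > newfile
Here NF-1==331 is the same test as length(bits)-1==331 above.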
It sounds like this is what you want:
$ cat tst.awk
/^ i =/ {
    prt()
    expNumLines = prev + 1
    actNumLines = 2
    rec = prev RS $0
    next
}
NF == 4 {
    rec = rec RS $0
    actNumLines++
}
{ prev = $0 }
END { prt() }

function prt() {
    if ( (actNumLines == expNumLines) && (rec != "") ) {
        print "-------------"
        print rec
    }
}
$ awk -f tst.awk file
-------------
3
i = 3, time = 2.000, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709
-------------
3
i = 5, time = 3.000, E = 1234567
Mg 22.9985897185 6.9311166109 0.7603733573
O 23.0438129644 6.4358253659 1.5992513709
Just change the prt() function to do whatever it is you want to do with valid records.
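For instance, a hypothetical variant of prt() that appends each accepted record to a separate output file (the name newfile is only an example) instead of printing a separator line could look like:
function prt() {
    if ( (actNumLines == expNumLines) && (rec != "") ) {
        print rec > "newfile"
    }
}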
This answer is not really bash-related, but may be of interest if performance is an issue, since you seem to handle very large files.
Provided you can compile a very basic C program, you can build this code:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Constants are hardcoded to make the program more readable
// But they could be passed as program argument
const char separator[]="i =";
const unsigned int requiredlines=331;
int main(void) {
    char* buffer[331] = { NULL, };
    ssize_t buffersizes[331] = { 0, };
    size_t n = requiredlines+1; // Ignore lines until the separator is found
    char* line = NULL;
    size_t len = 0;
    ssize_t nbread;
    size_t i;
    // Iterate through all lines
    while ((nbread = getline(&line, &len, stdin)) != -1) {
        // If the separator is found:
        // - print the record (if valid)
        // - reset the record (always)
        if (strstr(line, separator)) {
            if (n == requiredlines) {
                for (i = 0 ; i < requiredlines ; ++i) printf("%s", buffer[i]);
            }
            n = 0;
        }
        // Add the line to the buffer, unless too many lines have been read
        // (in which case we may discard lines until the separator is found again)
        if (n < requiredlines) {
            if (buffersizes[n] > nbread) {
                strncpy(buffer[n], line, nbread);
                buffer[n][nbread] = '\0';
            } else {
                free(buffer[n]);
                buffer[n] = line;
                buffersizes[n] = nbread+1;
                line = NULL;
                len = 0;
            }
        }
        ++n;
    }
    // Don't forget about the last record, if valid
    if (n == requiredlines) {
        for (i = 0 ; i < requiredlines ; ++i) printf("%s", buffer[i]);
    }
    free(line);
    for (i = 0 ; i < requiredlines ; ++i) free(buffer[i]);
    return 0;
}
The program can be compiled like this:
gcc -c prog.c && gcc -o prog prog.o
Then it may be executed like this:
./prog < infile > outfile
To simplify the code, it reads from stdin and outputs to stdout, but that’s more than enough in Bash considering all the options at your disposal to redirect streams. If need be, the code can be adapted to read/write directly from/to files.
I have tested it on a generated file with 10 million lines and compared it to the awk-based solution.
(time awk 'BEGIN { RS="i =" } { split($0,bits,"\n");if (length(bits)-1==331) { printf "%s",RS$0 } }' infile) > outfile
real 0m24.655s
user 0m24.357s
sys 0m0.279s
(time ./prog < infile) > outfile
real 0m1.414s
user 0m1.291s
sys 0m0.121s
With this example it runs approximately 18 times faster than the awk solution. Your mileage may vary (different data, different hardware) but I guess it should always be significantly faster.
I should mention that the awk solution is impressively fast (for a scripted solution, that is). I first tried to code the solution in C++, and it had similar performance to awk; sometimes it was even slower.
It's a little bit more difficult to spread the match of a record across 2 lines to try and incorporate the i = ..., but I don't think you actually need to. It looks like a new record can be distinguished by the occurrence of a line with only one column. If that is the case, you could do something like:
awk -v c=330 -v p=1 'function pr(n) {
    if( n - p == c) printf "%s", buf;
    buf = ""}
NF == 1 { pr(NR); p = NR; c = $1 }
{buf = sprintf("%s%s\n", buf, $0)}
END {pr(NR+1)}' input-file
In the above, whenever a line with a single column is seen, the expectation is that that many lines will follow in the next record. If that number is not matched, the record is not printed. To avoid that logic, just remove the c = $1 assignment in the NF == 1 rule. The only reason you need the -v c=330 is to enable the removal of that assignment; if you want the single-column line to give the line count of the record, you can omit -v c=330.
You can use csplit to grab each record in separate files, if you are allowed to write intermediate files.
csplit -k infile '/i = /' {*}
Then you can see which records are complete and which ones are not using wc -l xx* (note: xx is the default prefix of split files).
Then you can do whatever you want with those records, including listing files that have exactly 331 lines:
wc -l xx* | sed -n 's/^ *331 \(xx.*\)/\1/p'
Should you want to build a new file with all valid records, simply concatenate them:
wc -l xx* | sed -n 's/^ *331 \(xx.*\)/\1/p' | xargs cat > newfile
You can also, among other uses, archive failed records:
wc -l xx* | sed -n -e '/^ *331 /d' -e 's/^ *[0-9]* \(xx.*\)/\1/p' | xargs cat > failures
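When you are finished with the intermediate files, they can be removed (assuming nothing else in the directory matches the default xx prefix):
rm xx*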

How to delete ruby surrounding block (do/end) in vim

How to delete the surrounding block delimited by do/end in Ruby with vim.
For example
(10..20).map do |i| <CURSOR HERE>
(1..10).map do |j|
p j
end
end
I want to do something like dsb (delete surround block) and get
(1..10).map do |j|
p j
end
Maybe you can make an nnoremap.
Every do/end pair is at the same indent level, so first you should find the line with the matching indent - in this case, the next line with the same indent (since your cursor is on the do line).
So you can write a Vimscript function that finds the next line with the same indent and deletes both lines.
This is an example of such a function. You can customize it as you want, e.g. re-indent the remaining lines.
function! DeleteWithSameIndent(inc)
  " Get the cursor's current position
  let currentPos = getpos('.')
  let currentLine = currentPos[1]
  let firstLine = currentPos[1]
  let matchIndent = 0
  d
  " Look for a line with the same indent level without going out of the buffer
  while !matchIndent && currentLine != line('$') + 1 && currentLine != -1
    let currentLine += a:inc
    let matchIndent = indent(currentLine) == indent('.')
  endwhile
  " If a line is found, go to this line and delete it too
  if (matchIndent)
    let currentPos[1] = currentLine
    call setpos('.', currentPos)
    d
  endif
endfunction
nnoremap di :call DeleteWithSameIndent(1)<CR>

awk: next is illegal inside a function

I have a short shell function to convert human readable byte units into an integer of bytes, so, e.g.,
10m to 10000000
4kb to 4000
1kib to 1024
2gib to 2147483648
Here is the code:
dehumanise() {
  for v in "$@"
  do
    echo $v | awk \
      'BEGIN{IGNORECASE = 1}
       function printpower(n,b,p) {printf "%u\n", n*b^p; next}
       /[0-9]$/{print $1;next};
       /K(B)?$/{ printpower($1, 10, 3)};
       /M(B)?$/{ printpower($1, 10, 6)};
       /G(B)?$/{ printpower($1, 10, 9)};
       /T(B)?$/{ printpower($1, 10, 12)};
       /Ki(B)?$/{printpower($1, 2, 10)};
       /Mi(B)?$/{printpower($1, 2, 20)};
       /Gi(B)?$/{printpower($1, 2, 30)};
       /Ti(B)?$/{printpower($1, 2, 40)}'
  done
}
I found the code somewhere on the internet, and I am not very confident with awk. The function worked fine until I re-installed my MacBook a few days ago. Now it throws an error
awk: next is illegal inside a function at source line 2 in function printpower
context is
function printpower(n,b,p) {printf "%u\n", n*b^p; >>> next} <<<
As far as I understand, next is used in awk to stop processing the current record and move on to the next one. Hence in this case it would end the awk program, as there is only one input line per call.
I tried simply moving the next statement behind the call, as in printpower(...); next.
But this causes the function to give no output at all.
Could someone please help me repair the awk statement?
# awk --version
awk version 20121220
macOS awk version
Solved
The lack of output was probably an issue with the macOS awk version. I installed gawk and replaced the system awk with it:
brew install gawk
brew link --overwrite gawk
Now it works fine without the next statement.
Software design fundamentals: avoid inversion of control. In this case you don't want some subordinate function suddenly taking charge of your whole processing control flow and deciding "screw you all, I'm jumping to the next record". So yes, don't put next inside a function! Having said that, POSIX doesn't say you cannot use next in a function, but neither does it explicitly say you can, so some awk implementations (apparently the one you are using) have decided to disallow it, while gawk and some other awks allow it.
You also have gawk-specific code in your script (IGNORECASE), so it will ONLY work with gawk anyway.
Here's how to really write your script to work in any awk:
awk '
    { $0=tolower($0); b=p=0 }
    /[0-9]$/ { b = 1;  p = 1  }
    /kb?$/   { b = 10; p = 3  }
    /mb?$/   { b = 10; p = 6  }
    /gb?$/   { b = 10; p = 9  }
    /tb?$/   { b = 10; p = 12 }
    /kib$/   { b = 2;  p = 10 }
    /mib$/   { b = 2;  p = 20 }
    /gib$/   { b = 2;  p = 30 }
    /tib$/   { b = 2;  p = 40 }
    p { printf "%u\n", $1*b^p }
'
You can add ; next after every p assignment in the main body if you like, but it won't affect the output; it just improves efficiency, which would matter if your input was thousands of lines long.
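For completeness, here is a minimal sketch of how the original dehumanise() shell function might wrap this portable awk body (the function name and loop come from the question; untested):
dehumanise() {
  for v in "$@"
  do
    echo "$v" | awk '
      { $0 = tolower($0); b = p = 0 }
      /[0-9]$/ { b = 1;  p = 1  }
      /kb?$/   { b = 10; p = 3  }
      /mb?$/   { b = 10; p = 6  }
      /gb?$/   { b = 10; p = 9  }
      /tb?$/   { b = 10; p = 12 }
      /kib$/   { b = 2;  p = 10 }
      /mib$/   { b = 2;  p = 20 }
      /gib$/   { b = 2;  p = 30 }
      /tib$/   { b = 2;  p = 40 }
      p { printf "%u\n", $1 * b ^ p }
    '
  done
}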
As the message says, you can't use next in a function. You have to place it after each function call:
/KB?$/ { printpower($1, 10, 3); next; }
/MB?$/ { printpower($1, 10, 6); next; }
...
But you can just let awk test the remaining patterns (no next anywhere) if you don't mind the extra CPU cycles. Note that the parentheses around B are redundant and I have removed them.
$ dehumanise 1000MiB 19Ki
1048576000
19456
You could use a control variable in your function and check the value of the variable to decide to use next in the main routine.
# MAIN
{
    myfunction(test)
    if (result == 1) next
    # result is not 1, just continue
    # more statements
}

function myfunction(a) {
    # default result is 0
    result = 0
    # some test
    if ($1 ~ /searchterm/) {
        result = 1
    }
}

How to remove line `v u` from a file when line `u v` already exists using unix command

I have the following test data:
a b
a c
b a
b c
b d
c a
c b
c d
d b
d c
and I want to remove lines v u when line u v already exists, using a unix command. For example, here I want to obtain:
a b
a c
b c
b d
c d
I've tried with an awk script, but on a long file it takes too much time:
{
    if(NR==1){
        n1=$1
        n2=$2
        test=0
        k=0
        i = 0
        column1[i]=$1
        column2[i]=$2
        printf "%s %s\n", column1[i], column2[i]
    }
    else{
        for(k=0; k<=i; k++){
            if(column1[k]==$2){
                test=1
                tmp=i
                break
            }
        }
        if(test==1){
            if(column2[tmp]==$1){
                n1=$1
                n2=$2
            }
        }
        else if(n1!=$1||n2!=$2){
            n1=$1
            n2=$2
            i++
            column1[i]=$1
            column2[i]=$2
            printf "%s %s\n", column1[i], column2[i]
        }
        test=0
    }
}
Does someone have an idea?
I think this can be achieved pretty simply:
awk '!seen[$1,$2]++ && !seen[$2,$1]' file
This only prints lines (the default action) when the first and second column have not yet been seen in either order.
The array seen keeps track of every pair of fields by setting a key containing the first and second field. The expression !seen[key]++ is only true the first time that a specific key is tested because the value in the array is incremented each time.
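Running it on the test data from the question (saved as a hypothetical file named file) should reproduce the desired output:
$ awk '!seen[$1,$2]++ && !seen[$2,$1]' file
a b
a c
b c
b d
c d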

awk print between nth occurence of matching patterns

I am attempting to extract data between the nth occurrence of 2 patterns.
Pattern 1: CardDetail
Pattern 2: ]
The input file, input.txt, has thousands of lines that vary in what each line contains. The lines I'm concerned with grabbing data from will always contain CardDetail somewhere in the line. Finding the matching lines is easy enough using awk, but pulling the data between each match and placing each match's data onto its own line is where I'm falling short.
input.txt contains data about network gear and any attached/child devices. It looks something like this:
DeviceDetail [baseProductId=router-5000, cardDetail=[CardDetail [baseCardId=router-5000NIC1, cardDescription=Router 5000 NIC, cardSerial=5000NIC1], CardDetail [baseCardId=router-5000NIC2, cardDescription=Router 5000 NIC, cardSerial=5000NIC2]], deviceSerial=5000PRIMARY, deviceDescription=Router 5000 Base Model]
DeviceDetail [baseProductId=router-100, cardDetail=[CardDetail [baseCardId=router-100NIC1, cardDescription=Router 100 NIC, cardSerial=100NIC1], CardDetail [baseCardId=router-100NIC2, cardDescription=Router 100 NIC, cardSerial=100NIC2]], deviceSerial=100PRIMARY, deviceDescription=Router 100 Base Model]
UPDATE: I forgot to mention in the initial post that I also need the devices' PARENT serials (deviceSerial) listed with them as well.
What I would like the output.txt to look like is something like this:
"router-5000NIC1","Router 5000 NIC","5000NIC1","5000PRIMARY"
"router-5000NIC2","Router 5000 NIC","5000NIC2","5000PRIMARY"
"router-100NIC1","Router 100 NIC","100NIC1","100PRIMARY"
"router-100NIC2","Router 100 NIC","100NIC2","100PRIMARY"
The number of occurrences of CardDetail on a single line could vary between 0 to hundreds depending on the device. I need to be able to extract all of the data by field between each occurrence of CardDetail and the next occurrence of ] and transport them to their own line in a CSV format.
If you have gawk or mawk available, you can do this by (mis)using the record and field splitting capabilities:
awk -v RS='CardDetail *\\[' -v FS='[=,]' -v OFS=',' -v q='"' '
NR > 1 { sub("\\].*", ""); print q $2 q, q $4 q, q $6 q }'
Output:
"router-5000NIC1","Router 5000 NIC","5000NIC1"
"router-5000NIC2","Router 5000 NIC","5000NIC2"
"router-100NIC1","Router 100 NIC","100NIC1"
"router-100NIC2","Router 100 NIC","100NIC2"
Is it sufficient?
$> grep -P -o "(?<=CardDetail).*?(?=\])" input.txt | grep -P -o "(?<=\=).*?(?=\,)"
router-5000NIC1
Router 5000 NIC
router-5000NIC2
Router 5000 NIC
router-100NIC1
Router 100 NIC
router-100NIC2
Router 100 NIC
Here is an example that uses regular expressions. If there are minor variations in the text format, this will handle them. Also this collects all the values in an array; you could then do further processing (sort values, remove duplicates, etc.) if you wish.
#!/usr/bin/awk -f
BEGIN {
    i_result = 0
    DQUOTE = "\""
}
{
    line = $0
    for (;;)
    {
        i = match(line, /CardDetail \[ *([^]]*) *\]/, a)
        if (0 == i)
            break
        # a[1] has the text from the brackets
        s = a[1]
        # replace from this: a, b, c to this: "a","b","c"
        gsub(/ *, */, "\",\"", s)
        s = DQUOTE s DQUOTE
        results[i_result++] = s
        line = substr(line, RSTART + RLENGTH - 1)
    }
}
END {
    for (i = 0; i < i_result; ++i)
        print results[i]
}
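Note that the three-argument form of match() used here is a gawk extension, so run the script with gawk; assuming it is saved under a hypothetical name such as cards.awk:
gawk -f cards.awk input.txt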
P.S. Just for fun I made a Python version.
#!/usr/bin/python
import re
import sys

DQUOTE = "\""
pat_card = re.compile("CardDetail \[ *([^]]*) *\]")
pat_comma = re.compile(" *, *")
results = []

def collect_cards(line, results):
    while True:
        m = re.search(pat_card, line)
        if not m:
            return
        len_matched = len(m.group(0))
        s = m.group(1)
        s = DQUOTE + re.sub(pat_comma, '","', s) + DQUOTE
        results.append(s)
        line = line[len_matched:]

if __name__ == "__main__":
    for line in sys.stdin:
        collect_cards(line, results)
    for card in results:
        print card
EDIT: Here's a new version that also looks for "deviceID" and puts the matched text as the first field.
In AWK you concatenate strings just by putting them next to each other in an expression; there is an implicit concatenation operator when two strings are side by side. So this gets the deviceID text into a variable called s0, using concatenation to put double quotes around it; then later uses concatenation to put s0 at the start of the matched string.
#!/usr/bin/awk -f
BEGIN {
    i_result = 0
    DQUOTE = "\""
    COMMA = ","
}
{
    line = $0
    for (;;)
    {
        i = match(line, /deviceID=([A-Za-z_0-9]*),/, a)
        s0 = DQUOTE a[1] DQUOTE
        i = match(line, /CardDetail \[ *([^]]*) *\]/, a)
        if (0 == i)
            break
        # a[1] has the text from the brackets
        s = a[1]
        # strip the keys: from this: foo=a, bar=b, other=c to this: a, b, c
        gsub(/[A-Za-z_][^=,]*=/, "", s)
        # replace from this: a, b, c to this: "a","b","c"
        gsub(/ *, */, "\",\"", s)
        s = s0 COMMA DQUOTE s DQUOTE
        results[i_result++] = s
        line = substr(line, RSTART + RLENGTH - 1)
    }
}
END {
    for (i = 0; i < i_result; ++i)
        print results[i]
}
Try this
#awk -f myawk.sh temp.txt
BEGIN { RS="CardDetail"; FS="[=,]"; OFS=","; print "Begin Processing "}
$0 ~ /baseCardId/ {gsub("]","",$0);print $2, $4 , $6}
END {print "Process Complete"}
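If I read the script correctly (and with an awk that supports a multi-character RS, such as gawk), running it against the sample input should print something along these lines; the fields are joined by OFS but without the surrounding quotes the question asked for:
Begin Processing
router-5000NIC1,Router 5000 NIC,5000NIC1
router-5000NIC2,Router 5000 NIC,5000NIC2
router-100NIC1,Router 100 NIC,100NIC1
router-100NIC2,Router 100 NIC,100NIC2
Process Complete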
