Inline reordering of substrings via bash

Assume a text file that contains specific lines whose word order should be altered. The words (substrings) are delimited by single spaces. The lines to be altered can be identified by their first character (e.g., ">").
# cat test.txt
>3 test This is
foo bar baz
foo bar qux
>2 test This is
foo bar baz
>1 test This is
foo bar qux
What command (probably in awk) would you use to apply the same ordering process across all lines starting with the key character?
# cat test.txt | sought_command
>This is test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux

Here's one way you could do it using awk:
awk 'sub(/^>/, "") { print ">"$3, $4, $2, $1; next } 1' file
sub returns true (1) when it makes a substitution. The 1 at the end is the shortest true condition, to trigger the default action { print }.

According to your example, like this:
awk '$1~"^>" {sub(">","",$1);print ">"$3,$4,$2,$1;next} {print}' test.txt

The tool best suited for simple substitutions on individual lines is sed:
$ sed -E 's/>([^ ]+)( [^ ]+ )(.*)/>\3\2\1/' file
>This is test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
Awk is the right tool for anything more complicated/interesting. Note that unlike the awk solutions you've received so far the above will continue to work if/when you have more than 4 "words" on a line, e.g.:
$ cat file
>3 test Now is the Winter of our discontent
foo bar baz
foo bar qux
>2 test This is
foo bar baz
>1 test This is
foo bar qux
$ sed -E 's/>([^ ]+)( [^ ]+ )(.*)/>\3\2\1/' file
>Now is the Winter of our discontent test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
$ awk 'sub(/^>/, "") { print ">"$3, $4, $2, $1; next } 1' file
>Now is test 3
foo bar baz
foo bar qux
>This is test 2
foo bar baz
>This is test 1
foo bar qux
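The difference between the two approaches comes from the awk commands naming fields $1 to $4 explicitly. As a minimal sketch, assuming the goal is always "move the first two words to the end, in reverse order", an awk loop can handle any number of remaining words. The function name reorder is hypothetical and used only for illustration:

```shell
# Move the first two words of every ">" line to the end, in reverse order,
# however many words follow them. Other lines pass through unchanged.
reorder() {
  awk '
    /^>/ && NF >= 2 {
      sub(/^>/, "")                 # drop the marker; awk re-splits the fields
      out = ""
      for (i = 3; i <= NF; i++)     # keep everything after the first two words
        out = out $i " "
      print ">" out $2, $1          # then append word 2 and word 1
      next
    }
    1                               # print every other line as-is
  ' "$@"
}

out=$(printf '%s\n' '>3 test Now is the Winter' 'foo bar baz' '>1 test This is' | reorder)
printf '%s\n' "$out"
```

With the three sample lines above this should print ">Now is the Winter test 3", "foo bar baz" and ">This is test 1".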

Related

Awk: write in newline every two fields

I have a document whose lines have different lengths:
TITLE
NAME VALUE foo bar foo bar foo bar foo bar foo bar foo bar
NAME VALUE foo2 bar2 foo2 bar2
NAME VALUE foo3 bar3
I want to delete the title and the first two fields, then print a newline after every two fields, like this:
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
My current output is:
foo bar foo bar foo bar foo bar foo bar foo bar
foo2 bar2 foo2 bar2
foo3 bar3
With this code:
awk -F' ' 'NR>1, NF>2 {
s = ""; for(i = 3; i <= NF; i++) s = s $i " "; print s
}' file_input.txt > file_output.txt
I haven't found a solution.
Can someone help me?
This is my first time on Stack Overflow!
Thank you
I suggest:
awk 'NR>1{ for(i=3; i<=NF; i=i+2){ print $i,$(i+1) } }' file
Output:
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
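Another option uses GNU awk with FPAT set to a pattern that matches two words separated by one or more whitespace characters, so that each pair becomes one field.
The title line holds a single word and therefore yields no fields; the loop starts at field 2 to skip the NAME VALUE pair: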
awk -v FPAT='\\S+\\s+\\S+' '{for(i=2;i<=NF;i++) print $i}' file
Output
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
Or starting from the second line:
awk -v FPAT='\\S+\\s+\\S+' 'NR>1{for(i=2;i<=NF;i++) print $i}' file
A sed alternative (GNU sed, for \w and for \n in the replacement), although it won't win on readability compared to awk:
$ sed -E '1d;s/\w+ \w+ //;s/( \w+) /\1\n/g' file
foo bar
foo bar
foo bar
foo bar
foo bar
foo bar
foo2 bar2
foo2 bar2
foo3 bar3
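The awk loop suggested above assumes an even number of fields after the first two. As a hedged sketch for input where that may not hold, the variant below prints a leftover field on its own line instead of pairing it with an empty value. The function name pairs and the sample data are made up for illustration:

```shell
# Print fields 3..NF two per line, skipping the title line.
pairs() {
  awk 'NR > 1 {
    for (i = 3; i <= NF; i += 2) {
      if (i < NF) print $i, $(i + 1)   # a complete pair
      else        print $i             # a leftover single field
    }
  }' "$@"
}

out=$(printf '%s\n' 'TITLE' 'NAME VALUE foo bar foo2 bar2' 'NAME VALUE foo3 bar3 odd' | pairs)
printf '%s\n' "$out"
```

For the sample input this should give "foo bar", "foo2 bar2", "foo3 bar3" and "odd", one per line. It uses only POSIX awk features, so it does not depend on FPAT.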

Find a line with a certain string, then remove its trailing newline character in bash

I'm trying to prepare a file for a report. What I have is like this:
foo
bar bar oof
bar oof
foo
bar bar
I'm trying to get an output like this:
foo bar bar oof
bar oof
foo bar bar
I wanted to search for a string, in this case 'foo', and within the line where the string is found I have to remove the newline.
I did search but I can only find solutions where 'foo' is also removed. How can I do this?
Using awk:
awk -v search='foo' '$0 ~ search{printf "%s", $0; next}1' infile
If the matching line does not already end with a space, append OFS so that the joined lines stay separated:
awk -v search='foo' '$0 ~ search{printf "%s%s", $0, OFS; next}1' infile
Test Results:
$ cat infile
foo
bar bar oof
bar oof
foo
bar bar
$ awk -v search='foo' '$0 ~ search{printf "%s%s", $0, OFS; next}1' infile
foo bar bar oof
bar oof
foo bar bar
Explanation:
-v search='foo' - sets the awk variable search
$0 ~ search - true if the current line/record matches the regexp stored in that variable
{printf "%s%s", $0, OFS; next} - prints the current record followed by OFS instead of the record separator, then goes to the next line
1 - the trailing 1 is an always-true pattern that triggers the default action, which prints the current record
You can do this quite easily with sed, for example:
$ sed '/^foo$/N;s/\n/ /' file
foo bar bar oof
bar oof
foo bar bar
Explanation
/^foo$/ find lines containing only foo
N read/append next line of input into pattern space.
s/\n/ / substitute the '\n' with a space.
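One detail worth illustrating for the awk approach: when a record is passed directly as the first argument of printf, awk treats it as a format string, so a line containing % can be misprinted or rejected. The sketch below passes the record as data to an explicit "%s%s" format instead. The function name join_next and the sample line are hypothetical:

```shell
# Join every line matching the pattern with the line that follows it.
# Reads standard input; the pattern is the first argument.
join_next() {
  awk -v search="$1" '
    $0 ~ search { printf "%s%s", $0, OFS; next }   # hold the line open with a space
    1                                              # print all other lines normally
  '
}

out=$(printf '%s\n' 'foo 100%' 'bar bar oof' 'bar oof' | join_next 'foo')
printf '%s\n' "$out"
```

The percent sign in the first line should survive unchanged, giving "foo 100% bar bar oof" followed by "bar oof".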

Discrete to continuous number ranges via awk

Assume a text file named file which contains multiple discrete number ranges, one per line. Each range is preceded by a string (i.e., the range name). The lower and upper bound of each range are separated by a dash. Each number range is followed by a semicolon. The individual ranges are sorted (i.e., range 101-297 comes before 1299-1301) and do not overlap.
$ cat file
foo 101-297;
bar 1299-1301;
baz 1314-5266;
Please note that in the example above the three ranges do not form a continuous range that starts at integer 1.
I believe that awk is the appropriate tool to fill the missing number ranges such that all ranges taken together form a continuous range from {1} to {upper bound of the last range}. If so, what awk command/function would you use to perform the task?
$ cat file | sought_awk_command
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
--
Edit 1: Upon closer evaluation, the code suggested below fails at another simple example.
$ cat example2
foo 101-297;
bar 1299-1301;
baz 1302-1314; # Notice that ranges "bar" and "baz" are continuous to one another
qux 1399-5266;
$ awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' example2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
baz 1302-1314;
new3 1302-1398; # ERROR HERE: Notice that the lower bound of range "new3" is derived from the upper bound of "bar" (1301), not of "baz" (1314).
qux 1399-5266;
--
Edit 2: Many thanks to RavinderSingh13 for assistance with solving this question. However, the suggested code still generates output inconsistent with the given objective.
$ cat example3
foo 35025-35144;
bar 35259-35375;
baz 35376-35624;
qux 37911-39434;
$ awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' example3
new1 1-35024;
foo 35025-35144;
new2 35145-35258;
bar 35259-35375;
new3 35376-35375; # ERROR HERE: Notice that range "new3" has been added, even though ranges "bar" and "baz" are contiguous.
baz 35376-35624;
new4 35625-37910;
qux 37911-39434;
try:
awk -F'[ -]' '$3-Q>1{print "new"++o,Q+1"-"$3-1";";Q=$4} 1' Input_file
EDIT: Here is the same solution in multi-line form, with an explanation.
awk -F'[ -]' ' ###Use a space or a dash as the field separator.
$3-Q>1{ ###If the lower bound ($3) exceeds the stored upper bound Q by more than 1, there is a gap, so do the following.
print "new"++o,Q+1"-"$3-1";"; ###Print the string new with the incremented counter o, then the gap from Q+1 to $3-1, then a semicolon.
Q=$4 ###Store the 4th field (the upper bound) of the current line in Q.
}
1 ###Print the current line.
' Input_file ###Name of the input file.
EDIT2: Adding one more answer to meet the OP's additional condition.
awk -F'[ -]' '$3-Q+0>=1{print "new"++o,Q+1"-"$3-1";";Q=$4} {Q=$4;print}' Input_file
This has no problem with ranges that can overlap as you showed in your original example2 where bar 1299-1301; and baz 1301-1314; overlapped at 1301.
$ cat tst.awk
{ split($2,curr,/[-;]/); currStart=curr[1]; currEnd=curr[2] }
currStart > (prevEnd+1) { print "new"++cnt, prevEnd+1 "-" currStart-1 ";" }
{ print; prevEnd=currEnd }
$ awk -f tst.awk file
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
$ awk -f tst.awk example2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
baz 1301-1314;
new3 1315-1398;
qux 1399-5266;
$ awk -f tst.awk example3
new1 1-35024;
foo 35025-35144;
new2 35145-35258;
bar 35259-35375;
baz 35376-35624;
new3 35625-37910;
qux 37911-39434;
$ cat file1
foo 2-100
bar 102-200
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file1
new1 1-1;
foo 2-100
new2 101-101;
bar 102-200
$ cat file2
foo 101-297;
bar 1299-1301;
baz 1314-5266;
$ awk -F' +|[-;]' 'p+1<$2{print "new" ++q, p+1 "-" $2-1 ";"}p=$3' file2
new1 1-100;
foo 101-297;
new2 298-1298;
bar 1299-1301;
new3 1302-1313;
baz 1314-5266;
Explained:
$ awk -F' +|[-;]' ' # FS is a semicolon, a dash, or a run of spaces
p+1 < $2 { # if p (the previous $3) plus 1 is still less than the new $2
print "new"++q,p+1 "-" $2-1 ";" # print a "new" line that fills the gap
}
p=$3 # set the future p; the assignment also acts as a pattern, so the record is printed *
' file2 # * because all upper bounds are above 0
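The failures reported in the question's edits come down to two points: a gap exists only when the next lower bound exceeds the previous upper bound by more than 1, and the previous upper bound has to be updated on every line. A minimal sketch under those assumptions, wrapped in a hypothetical shell function fill_gaps and run on made-up sample ranges:

```shell
# Insert "newN" ranges wherever consecutive ranges leave integers uncovered.
fill_gaps() {
  awk '
    {
      split($2, r, /[-;]/)          # r[1] = lower bound, r[2] = upper bound
      lo = r[1] + 0; hi = r[2] + 0
      if (lo > prev + 1)            # a real gap: at least one missing integer
        print "new" (++n), (prev + 1) "-" (lo - 1) ";"
      print
      if (hi > prev) prev = hi      # never move backwards on overlapping input
    }
  ' "$@"
}

out=$(printf '%s\n' 'foo 3-5;' 'bar 6-9;' 'qux 12-20;' | fill_gaps)
printf '%s\n' "$out"
```

Here foo and bar are contiguous, so no range is inserted between them; the expected result is "new1 1-2;", "foo 3-5;", "bar 6-9;", "new2 10-11;", "qux 12-20;".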

Output only parts from a logfile (a function name and a param value of it)

I haven't found a question covering this specific case. I have a logfile like this:
"foo function1 para1=abc para2=def para3=ghi bar
foo function2 para1=jkl para2=mno para3=pqr bar"
Now I want to run a one-liner in GNU bash that produces this output:
function1 def
function2 mno
foo indicates the start for the function name and bar is the sign for the end of this block. So I want to search for the word "foo", extract the next word (the function name) and then search for para2 and extract only the value.
How can I do this with a one-liner (not a script)?
If this isn't all you need:
$ awk -F'[ =]' '{print $2, $6}' file
function1 def
function2 mno
then edit your question to clarify your requirements and provide more meaningful and truly representative sample input/output.
@Simi: Try:
awk -F'[ ="]' '{for(i=1;i<=NF;i++){if($i=="foo"){printf("%s",$(i+1))};if($i=="para2"){printf(" %s\n",$(i+1))}}}' Input_file
Here the field separator is a space, an equals sign, or a double quote. The loop walks through every field of a line and looks for the strings foo and para2; whenever a field equals one of them, the value of the next field is printed, as per your requirement. Let me know if this helps you.
Perl One Liner
perl -lane 'print "$F[1] ",(split(/=/,$F[3]))[1]' logfile
Input
"foo function1 para1=abc para2=def para3=ghi bar
foo function2 para1=jkl para2=mno para3=pqr bar"
Output
function1 def
function2 mno
foo indicates the start for the function name and bar is the sign for
the end of this block. So I want to search for the word "foo", extract
the next word (the function name) and then search for para2 and
extract only the value.
Making some assumptions, your log file looks like one of the following.
Either on a single line:
$ cat mylog
foo function1 para1=abc para2=def para3=ghi bar foo function2 para1=jkl para2=mno para3=pqr bar
Or with a line break:
$ cat mylog
foo function1 para1=abc para2=def para3=ghi bar
foo function2 para1=jkl para2=mno para3=pqr bar
Output
$ awk -F'[ =]' '{for(i=1;i<=NF;i++)if($i=="foo")print $(i+1),$(i+5) }' mylog
function1 def
function2 mno
If para2 is not always in the same position, you may use this:
$ awk -F'[ =]' '{f=""; for(i=1;i<=NF;i++){if($i=="foo")f=$(i+1); if(f && $i=="para2")print f,$(i+1)}}' mylog
With -F'[ =]', awk sees the fields of a record like this:
foo function1 para1=abc para2=def para3=ghi bar
^   ^         ^     ^   ^     ^
1   2         3     4   5     6
i   (i+1)                   (i+5)
Explanation
awk -F'[ =]' ' # Set the field separator to a space or =
{
  # Loop over the fields in the record;
  # NF gives the number of fields in the current record
  for(i=1;i<=NF;i++)
    # If the field equals foo then
    if($i=="foo")
      # print the next field (i+1) and the 5th field (i+5) after the current index
      print $(i+1),$(i+5)
}
' mylog # input file
If your real input is something else, please post the input and expected output.
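The fixed offset $(i+5) only works while para2 stays in the same position. As a hedged sketch that looks the parameter up by name instead, the function below splits on whitespace only and strips the para2= prefix from whichever field carries it. The function name extract is hypothetical, and the sample lines deliberately shuffle the parameter order:

```shell
# Print "<function name> <para2 value>" for every foo ... bar block.
extract() {
  awk '{
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^"?foo$/)                      # block start, tolerating a leading quote
        name = $(i + 1)
      else if ($i ~ /^para2=/ && name != "") {
        value = $i
        sub(/^para2=/, "", value)              # keep only the part after the =
        print name, value
        name = ""                              # wait for the next foo
      }
    }
  }' "$@"
}

out=$(printf '%s\n' \
  'foo function1 para2=def para1=abc bar' \
  'foo function2 para1=jkl para3=pqr para2=mno bar' | extract)
printf '%s\n' "$out"
```

For these two lines the result should be "function1 def" and "function2 mno". A closing double quote at the very end of the log would stay attached to the last field, which does not matter here because that field is bar.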

sed multiline delete with pattern

I want to delete all multiline occurrences of a pattern like
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 222
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
I want to delete all portions between START-TAG and END-TAG that contain specific IDs.
So to delete ID: 222 only this would remain:
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
I have a blacklist of IDs that should be removed.
I assume a quite simple multiline sed regex script would do it. Can anyone help?
It is very similar to the question "sed multiline replace", but not the same.
You can use the following:
sed '/{START-TAG/{:a;N;/END-TAG}/!ba};/ID: 222/d' data.txt
Breakdown:
/{START-TAG/ { # Match '{START-TAG'
:a # Create label a
N # Read next line into pattern space
/END-TAG}/! # If not matching 'END-TAG}'...
ba # Then goto a
} # End /{START-TAG/ block
/ID: 222/d # If pattern space matched 'ID: 222' then delete it.
Don't use sed for anything that involves multiple lines, just use awk for a robust, portable solution. Given the sample input from the question you referenced, if the blocks are always separated by blank lines:
$ awk -v RS= -v ORS='\n\n' '!/ID: 222/' file
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
Otherwise:
$ awk '/{START-TAG/{f=1} f{rec=rec $0 ORS} /END-TAG}/{if (rec !~ /ID: 222/) print rec; rec=f=""}' file
{START-TAG
foo bar
ID: 111
foo bar
END-TAG}
{START-TAG
foo bar
ID: 333
foo bar
END-TAG}
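The question mentions a blacklist of IDs, while the commands above hard-code a single ID. A minimal sketch of one way to extend the awk approach, assuming the blacklist is a non-empty file with one ID per line and that each block carries a line of the form "ID: <value>". The function name drop_blocks and the temporary file names are hypothetical:

```shell
# usage: drop_blocks BLACKLIST_FILE DATA_FILE
drop_blocks() {
  awk '
    NR == FNR { if ($1 != "") bad[$1]; next }   # first file: remember each ID
    index($0, "{START-TAG") { inblock = 1; rec = ""; drop = 0 }
    inblock {
      rec = rec $0 ORS                          # collect the block line by line
      if ($1 == "ID:" && ($2 in bad)) drop = 1  # blacklisted ID seen in this block
    }
    index($0, "END-TAG}") && inblock {
      if (!drop) printf "%s", rec               # emit the block only if it is clean
      inblock = 0
      next
    }
    !inblock                                    # lines outside any block pass through
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 222 333 > "$tmp/blacklist"
printf '%s\n' '{START-TAG' 'ID: 111' 'END-TAG}' '{START-TAG' 'ID: 222' 'END-TAG}' > "$tmp/data"
out=$(drop_blocks "$tmp/blacklist" "$tmp/data")
rm -rf "$tmp"
printf '%s\n' "$out"
```

With 222 and 333 blacklisted, only the block containing ID: 111 should remain. The NR == FNR idiom relies on the first file being non-empty; with an empty blacklist the data file would be read as the blacklist instead.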
