Starting a new cycle if condition is met in sed - bash

I am performing several commands (GNU sed) on a line and if certain condition is met, I want to skip rest of the commands.
Example:
I want to substitute all d with 4
If line start with A, C or E, skip the rest of the commands (another substitutions etc)
I want to use basic regular expressions only. If I could use extended regex, this would be trivial:
sed -r 's/d/4/g; /^(A|C|E)/! { s/a/1/g; s/b/2/g; s/c/3/g }' data
Now, with BRE, this will work fine but for more conditions, it will be really ugly:
sed 's/d/4/g; /^A/! { /^C/! { /^E/! { s/a/1/g; s/b/2/g; s/c/3/g } } }' data
Example input:
Aaabbccdd
Baabbccdd
Caabbccdd
Daabbccdd
Eaabbccdd
Example output:
Aaabbcc44
B11223344
Caabbcc44
D11223344
Eaabbcc44
This is just an example. I am not looking for different ways to approach the problem. I want to know some better ways to start a new cycle.

I suggest to use b:
/^\(A\|C\|E\)/b
From man sed:
b label: Branch to label; if label is omitted, branch to end of script.

Related

Replace string which is two lines previous to matched pattern

I have a big config.js file and I would like to replace default:false, to default:true, which is on top of field:'$scope.keepEffort'. I tried multiple sed command solutions but nothing seems to work.
{
default:false,
enabled:true,
field:'criticalPath',
filter:false,
filterValue:'',
id:'show-critical-path',
operator:'colorize'
},{
default:false,
enabled:true,
field:'$scope.keepEffort',
filter:false,
filterValue:'',
id:'effort-constant',
operator:'var'
},{
default:false,
enabled:true,
field:'$scope.automaticProgress',
filter:false,
filterValue:'',
id:'automatic-progress',
operator:'var'
},{
default:false,
enabled:true,
field:'groupView',
filter:false,
filterValue:'',
id:'gantt-group-view',
operator:'var'
},{
This is a job for awk. The following does not attempt to match the single quotes since doing so requires some shell quoting that obfuscates the solution. Also, a trailing { is printed. That is easy enough to remove, and the code for doing so is omitted for clarity:
awk '/field:.\$scope.keepEffort/{gsub("default:false","default:true")}1' RS=\{ ORS=\{ input-file
The idea is simply to separate the records by { and then perform the substitution (via gsub) only on records that match the desired line.
This might work for you (GNU sed):
sed ':a;/{/{n;:b;N;/}/!bb;/\$scope.keepEffort/s/\(default:\)false,/\1true,/;ba}' file
Gather up lines between { and } and if those lines contain $scope.keepEffort replace default:false by default:true.
N.B. The addition of the n after matching { which allows the matching of }. Also, the return to :a after gathering a collection so as to be able to match another {.

sed to get string between two patterns

I am working on a latex file from which I need to pick out the references marked by \citep{}. This is what I am doing using sed.
cat file.tex | grep citep | sed 's/.*citep{\(.*\)}.*/\1/g'
Now this one works if there is only one pattern in a line. If there are more than one patterns i.e. \citep in a line, it fails. It fails even when there is only one pattern but more than one closing bracket }. What should I do, so that it works for all the patterns in a line and also for the exclusive bracket I am looking for?
I am working on bash. And a part of the file looks like this:
of the Asian crust further north \citep{TapponnierM76, WangLiu2009}. This has led to widespread deformation both within and
\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic
across the Dawki fault and Naga Hills, increasing eastwards from $\sim$3~mm/yr to $\sim$13~mm/yr \citep{Vernantetal2014}.
GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim Himalaya lies at the transition between this relatively simple
this transition includes deviation of the Himalaya from a perfect arc beyond 89\deg\ longitude \citep{BendickB2001}, reduction
\citep{BhattacharyaM2009, Mitraetal2010}. Rivers Tista, Rangit and Rangli run through Sikkim eroding the MCT and Ramgarh
thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}. Within this sinuous physiography,
\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
field results corroborate well with seismic studies in this region \citep{Actonetal2011, Arunetal2010}. From studies of
On one line, I get answer like this
BilhamE01, TapponnierM76} and by distributed seismicity across the region (Fig. \ref{fig1_2
whereas I am looking for
BilhamE01, TapponnierM76
Another example with more than one /citep patterns gives output like this
Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
whereas I am looking for
Pauletal2015 Mitraetal2005
Can anyone please help?
it's a greedy match change the regex match the first closing brace
.*citep{\([^}]*\)}
test
$ echo "\citep{string} xyz {abc}" | sed 's/.*citep{\([^}]*\)}.*/\1/'
string
note that it will only match one instance per line.
If you are using grep anyway, you can as well stick with it (assuming GNU grep):
$ echo $str | grep -oP '(?<=\\citep{)[^}]+(?=})'
BilhamE01, TapponierM76
For what it's worth, this can be done with sed:
echo "\citep{string} xyz {abc} \citep{string2},foo" | \
sed 's/\\citep{\([^}]*\)}/\n\1\n\n/g; s/^[^\n]*\n//; s/\n\n[^\n]*\n/, /g; s/\n.*//g'
output:
string, string2
But wow, is that ugly. The sed script is more easily understood in this form, which happens to be suitable to be fed to sed via a -f argument:
# change every \citep{string} to <newline>string<newline><newline>
s/\\citep{\([^}]*\)}/\n\1\n\n/g
# remove any leading text before the first wanted string
s/^[^\n]*\n//
# replace text between wanted strings with comma + space
s/\n\n[^\n]*\n/, /g
# remove any trailing unwanted text
s/\n.*//
This makes use of the fact that sed can match and sub the newline character, even though reading a new line of input will not result in a newline initially appearing in the pattern space. The newline is the one character that we can be certain will appear in the pattern space (or in the hold space) only if sed puts it there intentionally.
The initial substitution is purely to make the problem manageable by simplifying the target delimiters. In principle, the remaining steps could be performed without that simplification, but the regular expressions involved would be horrendous.
This does assume that the string in every \citep{string} contains at least one character; if the empty string must be accommodated, too, then this approach needs a bit more refinement.
Of course, I can't imagine why anyone would prefer this to #Lev's straight grep approach, but the question does ask specifically for a sed solution.
f.awk
BEGIN {
pat = "\\citep"
latex_tok = "\\\\[A-Za-z_][A-Za-z_]*" # match \aBcD
}
{
f = f $0 # store content of input file as a sting
}
function store(args, n, k, i) { # store `keys' in `d'
gsub("[ \t]", "", args) # remove spaces
n = split(args, keys, ",")
for (i=1; i<=n; i++) {
k = keys[i]
d[k]
}
}
function ntok() { # next token
if (match(f, latex_tok)) {
tok = substr(f, RSTART ,RLENGTH)
f = substr(f, RSTART+RLENGTH-1 )
return 1
}
return 0
}
function parse( i, rc, args) {
for (;;) { # infinite loop
while ( (rc = ntok()) && tok != pat ) ;
if (!rc) return
i = index(f, "{")
if (!i) return # see `pat' but no '{'
f = substr(f, i+1)
i = index(f, "}")
if (!i) return # unmatched '}'
# extract `args' from \citep{`args'}
args = substr(f, 1, i-1)
store(args)
}
}
END {
parse()
for (k in d)
print k
}
f.example
of the Asian crust further north \citep{TapponnierM76, WangLiu2009}. This has led to widespread deformation both within and
\citep{BilhamE01, Mitraetal2005} and by distributed seismicity across the region (Fig. \ref{fig1_2}). Recent GPS Geodetic
across the Dawki fault and Naga Hills, increasing eastwards from $\sim$3~mm/yr to $\sim$13~mm/yr \citep{Vernantetal2014}.
GPS velocity vectors \citep{TapponnierM76, WangLiu2009}. Sikkim Himalaya lies at the transition between this relatively simple
this transition includes deviation of the Himalaya from a perfect arc beyond 89\deg\ longitude \citep{BendickB2001}, reduction
\citep{BhattacharyaM2009, Mitraetal2010}. Rivers Tista, Rangit and Rangli run through Sikkim eroding the MCT and Ramgarh
thrust to form a mushroom-shaped physiography \citep{Mukuletal2009,Mitraetal2010}. Within this sinuous physiography,
\citep{Pauletal2015} and also in accordance with the findings of \citet{Mitraetal2005} for northeast India. In another study
field results corroborate well with seismic studies in this region \citep{Actonetal2011, Arunetal2010}. From studies of
Usage:
awk -f f.awk f.example
Expected ouput:
BendickB2001
Arunetal2010
Pauletal2015
Mitraetal2005
BilhamE01
Mukuletal2009
TapponnierM76
WangLiu2009
BhattacharyaM2009
Mitraetal2010
Actonetal2011
Vernantetal2014

Code formatting with bash script

I would like to search through a file and find all instances where the last non-blank character is a comma and move the line below that up one. Essentially, undoing line continuations like
private static final double SOME_NUMBERS[][] = {
{1.0, -6.032174644509064E-23},
{-0.25, -0.25},
{-0.16624879837036133, -2.6033824355191673E-8}
};
and transforming that to
private static final double SOME_NUMBERS[][] = {
{1.0, -6.032174644509064E-23}, {-0.25, -0.25}, {-0.16624879837036133, -2.6033824355191673E-8}
};
Is there a good way to do this?
As mjswartz suggests in the comments, we need a sed substitution command like s/,\n/ /g. That, however, does not work by itself because, by default, sed reads in only one line at a time. We can fix that by reading in the whole file first and then doing the substitution:
$ sed 'H;1h;$!d;x; s/,[[:blank:]]*\n[[:blank:]]*/, /g;' file
private static final double SOME_NUMBERS[][] = {
{1.0, -6.032174644509064E-23}, {-0.25, -0.25}, {-0.16624879837036133, -2.6033824355191673E-8}
};
Because this reads in the whole file at once, this is not a good approach for huge files.
The above was tested with GNU sed.
How it works
H;1h;$!d;x;
This series of commands reads in the whole file. It is probably simplest to think of this as an idiom. If you really want to know the gory details:
H - Append current line to hold space
1h - If this is the first line, overwrite the hold space with it
$!d - If this is not the last line, delete pattern space and jump to the next line.
x - Exchange hold and pattern space to put whole file in pattern space
s/,[[:blank:]]*\n[[:blank:]]*/, /g
This looks for lines that end with a comma, optionally followed by blanks, followed by a newline and replaces that, and any leading space on the following line, with a comma and a single space.
I think for large files awk would be better:
awk -vRS=", *\n" -vORS=", " '1' file
On lua-shell, just write like this:
function nextlineup()
vim:normal("j^y$k$pjddk")
end
vim:open("code.txt")
vim:normal("G")
while vim:k() do
vim:normal("$")
if(vim.currc == string.byte(',')) nextlineup();
end
If you are not familier with vim ,this script seems a bit scary and not robust. In fact, every operation in it is precise(and much quicker, because tey are built-in functions).
Since you are processing a code file, i suggest you try it.
here is a demo
Here is a perl solution.
cat file | perl -e '{$c = 0; while () { s/^\s+/ / if ($c); s/,\s*$/,/; print($_); $c = (m/,\s*$/) ? 1: 0; }}'

How to print comments in lex?

So the title might be a little bit misleading, but I can't think of any better way to phrase it.
Basically, I'm writing a lexical-scanner using cygwin/lex. A part of the code reads a token /* . It the goes into a predefined state C_COMMENT, and ends when C_COMMENT"/*". Below is the actual code
"/*" {BEGIN(C_COMMENT); printf("%d: /*", linenum++);}
<C_COMMENT>"*/" { BEGIN(INITIAL); printf("*/\n"); }
<C_COMMENT>. {printf("%s",yytext);}
The code works when the comment is in a single line, such as
/* * Example of comment */
It will print the current line number, with the comment behind. But it doesn't work if the comment spans multiple lines. Rewriting the 3rd line into
<C_COMMENT>. {printf("%s",yytext);
printf("\n");}
doesn't work. It will result in \n printed for every letter in the comment. I'm guessing it has something to do with C having no strings or maybe I'm using the states wrong.
Hope someone will be able to help me out :)
Also if there's any other info you need, just ask, and I'll provide.
The easiest way to echo the token scanned by a pattern is to use the special action ECHO:
"/*" { printf("%d: ", linenum++); ECHO; BEGIN(C_COMMENT); }
<C_COMMENT>"*/" { ECHO; BEGIN(INITIAL); }
<C_COMMENT>. { ECHO; }
None of the above rules matches a newline inside a comment, because in (f)lex . doesn't match newlines:
<C_COMMENT>\n { linenum++; ECHO; }
A faster way of recognizing C comments is with a single regular expression, although it's a little hard to read:
[/][*][^*]*[*]+([^/*][^*][*]+)*[/]
In this case, you'll have to rescan the comment to count newlines, unless you get flex to do the line number counting.
flex scanners maintain a line number count in yylineno, if you request that feature (using %option yylineno). It's often more efficient and always more reliable than keeping the count yourself. However, in the action, the value of yylineno is the line number count at the end of the pattern, not at the beginning, which can be misleading for multiline patterns. A common workaround is to save the value of yylineno in another variable at the beginning of the token scan.

Ruby script for matching 3 patterns on ruby

I have a fail2ban.log from which I want to grab specific fields, from 'Ban' strings. I can grab the data I need using regex one at the time, but I am not able to combine them. A typical 'fail2ban' log file has many strings. I'm interested in strings like these:
2012-05-02 14:47:40,515 fail2ban.actions: WARNING [ssh-iptables] Ban 84.xx.xx.242
xx = numbers (digits)
I want to grab: a) Date and Time, b) Ban (keyword), c) IP address
Here is my regex:
IP = (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
date & time = ^(\d{4}\W\d{2}\W\d{2}\s\d{2}\W\d{2}\W\d{2})
My problem here is, how can I combine these three together. I tried something like this:
^(?=^\d{4}\W\d{2}\W\d{2}\s\d{2}\W\d{2}\W\d{2})(?=\.*d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$)(?=^(?Ban).)*$).*$
but does not work as I would wanted too.
To give a clearer example, here is what I want:
greyjewel:FailMap atma$ cat fail2ban.log |grep Ban|awk -F " " '{print $1, $2, $7}'|tail -n 3
2012-05-02 14:47:40,515 84.51.18.242
2012-05-03 00:35:44,520 202.164.46.29
2012-05-03 17:55:03,725 203.92.42.6
Best Regards
A pretty direct translation of the example
ruby -alne 'BEGIN {$,=" "}; print $F.values_at(0,1,-1) if /Ban/' fail2ban.log
And because I figure you must want them from within Ruby
results = File.foreach("input").grep(/Ban/).map { |line| line.chomp.split.values_at 0, 1, -1 }
If the field placement doesn't change, you don't even need a regex here:
log_line =
'2012-05-02 14:47:40,515 fail2ban.actions: WARNING [ssh-iptables] Ban 84.12.34.242'
date, time, action, ip = log_line.split.values_at(0,1,-2,-1)

Resources