Replace a string two lines above a matched pattern - bash

I have a big config.js file, and I would like to replace default:false, with default:true, in the block where it appears two lines above field:'$scope.keepEffort'. I tried multiple sed solutions, but nothing seems to work.
{
default:false,
enabled:true,
field:'criticalPath',
filter:false,
filterValue:'',
id:'show-critical-path',
operator:'colorize'
},{
default:false,
enabled:true,
field:'$scope.keepEffort',
filter:false,
filterValue:'',
id:'effort-constant',
operator:'var'
},{
default:false,
enabled:true,
field:'$scope.automaticProgress',
filter:false,
filterValue:'',
id:'automatic-progress',
operator:'var'
},{
default:false,
enabled:true,
field:'groupView',
filter:false,
filterValue:'',
id:'gantt-group-view',
operator:'var'
},{

This is a job for awk. The following does not attempt to match the single quotes since doing so requires some shell quoting that obfuscates the solution. Also, a trailing { is printed. That is easy enough to remove, and the code for doing so is omitted for clarity:
awk '/field:.\$scope.keepEffort/{gsub("default:false","default:true")}1' RS=\{ ORS=\{ input-file
The idea is simply to separate the records by { and then perform the substitution (via gsub) only on records that match the desired line.
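For anyone who wants to check the record-splitting idea outside awk, here is a small Python sketch of the same logic. The function name and the trimmed sample are illustrative assumptions, not part of the answer. It splits on {, substitutes only in records that mention field:'$scope.keepEffort', and joins with { again. Because joining restores exactly the separators that were removed, no extra trailing { appears.

```python
def enable_keep_effort(text: str) -> str:
    """Set default:true in the {...} record that mentions field:'$scope.keepEffort'."""
    records = text.split("{")  # like awk's RS="{"
    for i, record in enumerate(records):
        if "field:'$scope.keepEffort'" in record:
            # like gsub(): replace every occurrence, but only in this record
            records[i] = record.replace("default:false", "default:true")
    return "{".join(records)  # rejoin with the separator we split on


sample = """{
default:false,
enabled:true,
field:'criticalPath',
operator:'colorize'
},{
default:false,
enabled:true,
field:'$scope.keepEffort',
operator:'var'
},{
"""
print(enable_keep_effort(sample), end="")
```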

This might work for you (GNU sed):
sed ':a;/{/{n;:b;N;/}/!bb;/\$scope.keepEffort/s/\(default:\)false,/\1true,/;ba}' file
Gather up the lines between { and }, and if those lines contain $scope.keepEffort, replace default:false with default:true.
N.B. The n after matching { prints that line and reads the next one, so the } on a },{ line cannot end the block before it has even started. Also, after a block has been gathered, the script branches back to :a so that the { on the block's last line can be matched and the next block gathered.
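A rough Python analogue of this gather-then-substitute loop, working line by line, may help when adapting it. This is a sketch: the function name and the trimmed-down sample are assumptions. Lines are buffered from the line after a { until a line containing }, and the buffered block gets the substitution (first occurrence only, like s without g) only if it mentions $scope.keepEffort. A },{ line both closes one block and opens the next, which mirrors the branch back to :a.

```python
import re


def fix_blocks(lines):
    """Yield output chunks; the block holding $scope.keepEffort gets default:true."""
    block = None  # None: not gathering; a list: lines gathered since the last {
    for line in lines:
        if block is None:
            yield line                # print the { line unchanged, like sed's n
            if "{" in line:
                block = []            # start gathering from the next line
            continue
        block.append(line)            # like N: append to the gathered block
        if "}" in line:               # block complete
            text = "".join(block)
            if "$scope.keepEffort" in text:
                text = re.sub(r"(default:)false,", r"\1true,", text, count=1)
            yield text
            block = [] if "{" in line else None  # a },{ line opens the next block
    if block:
        yield "".join(block)          # unterminated block: output unchanged


sample = """{
default:false,
field:'criticalPath',
},{
default:false,
field:'$scope.keepEffort',
},{
default:false,
field:'groupView',
}
"""
print("".join(fix_blocks(sample.splitlines(keepends=True))), end="")
```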

Related

skip over a pattern in sed

I wrote a working sed script that replaces multiple spaces between tokens with a single space (it skips lines containing # or //):
#!/bin/sed -f
/.*#/ !{
/\/\//n
# handle more than one space between tokens
s/\([^ ]\)\s\+/\1 /g
}
I run it on Ubuntu like this: ./spaces.sed < spa.txt
spa.txt:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
Now I want to add the ability to skip lines between a start pattern and an end pattern (the commented region can span multiple lines). The patterns are /* and */.
I tried many things, to no avail:
#!/bin/sed -f
/.*#/ !{
/\/\*/,/\*\// {
/\/\*/n #it skips successfully the /* line
n #also skips next line
/\*\// !{
}
}
/\/\//n
# handle more than one space between tokens
s/\([^ ]\)\s\+/\1 /g
}
But the script isn't working as expected.
Expected output:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
Any suggestions?
Thanks
I'd re-engineer the script a bit, to handle # and // comments on their own. With the /* … */ comments, you have to deal with single-line and multi-line variants separately. I'd also use the [[:space:]] notation to spot spaces or tabs.
I prefer to avoid backslashes (an aversion caused by working with troff in the days of my youth; if you've never needed 16 backslashes in a row to get the desired effect, you've not suffered enough), so I use \%…% to choose the % character as the search marker instead of / (which means there's no need to escape the slashes in the pattern with a backslash), and I use [*] instead of \*.
The { p; d; } notation prints the current line, then deletes it and moves on to the next line. (The n command prints the current line and replaces it with the next one, so the rest of the script then runs on that next line; it isn't what you need.) The second semicolon isn't required by GNU sed but is by BSD (macOS) sed. The spaces in those braces are optional but make it easier to read.
Putting this together, you might have spaces.sed like this:
#!/bin/sed -f
# Comments with a #
/#/ { p; d; }
# Comments with //
\%//% { p; d; }
# Single line /* ... */ comments
\%/[*].*[*]/% { p; d; }
# Multi-line /* ... */ comments
\%/[*]%,\%[*]/% { p; d; }
s/\([^[:space:]]\)[[:space:]]\{2,\}/\1 /g
On your sample data (thanks for including it!), this produces:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
That looks like what you wanted.
Limitations
It doesn't remove multiple spaces at the start of a line.
the leading blanks are not removed.
If you have a line with multiple spaces and // or #, the multiple spaces remain:
these spaces // survive
so do # these
If you have multiple single line comments on a single line, you don't get spaces removed in between them:
/* these */ spaces are not /* removed */
If you have a single-line comment and the start of a multi-line comment on a single line, the multi-line comment is not spotted. Similarly, if you have a multi-line comment that ends on a line and has a single-line comment starting after it, then if there are any multiple spaces between the end of the one comment and the start of the next, they are not handled.
/* this */ is not /* handled
very well */ nor are these /* spaces */
This doesn't deal with the subtleties of backslash-newline in the middle of a start or end comment symbol, nor with backslash-newline at the end of a // comment. Only brain-dead programs (or programmers) produce such comments, so it shouldn't be a real problem. Fortunately, you're not writing a compiler; those have to deal with the nonsense. And don't get me started on trigraphs!
It doesn't handle comment-like sequences inside strings (or multi-character character constants):
"/* this is not a comment */"
'/*', ' ', '*/'
However, most of these issues are subtle enough that you're probably OK without dealing with them. If you must deal with them, then you need a program, not a sed script (assuming you value your sanity).
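For experimenting with the rules of spaces.sed outside sed, here is a hedged Python sketch of the same decision order. The function name is an assumption, Python's re module stands in for sed's regex engine, and the sketch has the same limitations listed above. Lines with #, lines with //, single-line /* ... */ comments, and lines inside a multi-line /* ... */ range pass through unchanged; on every other line, runs of two or more whitespace characters after a non-space character become one space. The checks run in the same order as in the script, so, as with p; d in sed, a line caught by an earlier rule never reaches the range test.

```python
import re

SINGLE_LINE_COMMENT = re.compile(r"/\*.*\*/")  # \%/[*].*[*]/%
EXTRA_SPACES = re.compile(r"(\S)\s{2,}")        # \([^[:space:]]\)[[:space:]]\{2,\}


def squeeze_spaces(lines):
    """Yield lines (without trailing newlines) with extra spaces squeezed outside comments."""
    in_block = False  # emulates the \%/[*]%,\%[*]/% range address
    for line in lines:
        if "#" in line or "//" in line or SINGLE_LINE_COMMENT.search(line):
            yield line
        elif in_block:
            if "*/" in line:  # the range ends on the first later line with */
                in_block = False
            yield line
        elif "/*" in line:    # the range starts on a line with /*
            in_block = True
            yield line
        else:
            yield EXTRA_SPACES.sub(r"\1 ", line)


text = "a    b\n/* start\nkeep    these\nend */\nc   d  # kept   as is"
print("\n".join(squeeze_spaces(text.splitlines())))
```

Working on lines without their newlines matters here: Python's \s also matches \n, whereas sed never sees the newline in the pattern space.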

Starting a new cycle if condition is met in sed

I am running several commands (GNU sed) on each line, and if a certain condition is met, I want to skip the rest of the commands.
Example:
I want to substitute all d with 4
If a line starts with A, C, or E, skip the rest of the commands (other substitutions, etc.)
I want to use basic regular expressions only. If I could use extended regex, this would be trivial:
sed -r 's/d/4/g; /^(A|C|E)/! { s/a/1/g; s/b/2/g; s/c/3/g }' data
With BRE, this works fine, but with more conditions it gets really ugly:
sed 's/d/4/g; /^A/! { /^C/! { /^E/! { s/a/1/g; s/b/2/g; s/c/3/g } } }' data
Example input:
Aaabbccdd
Baabbccdd
Caabbccdd
Daabbccdd
Eaabbccdd
Example output:
Aaabbcc44
B11223344
Caabbcc44
D11223344
Eaabbcc44
This is just an example. I am not looking for different ways to approach the problem. I want to know some better ways to start a new cycle.
I suggest using b:
/^\(A\|C\|E\)/b
so the whole command becomes: sed 's/d/4/g; /^\(A\|C\|E\)/b; s/a/1/g; s/b/2/g; s/c/3/g' data
From man sed:
b label: Branch to label; if label is omitted, branch to end of script.
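To make the control flow explicit, here is a hedged Python analogue of that sed script (the function name is illustrative). After the branch, continue skips the remaining commands for that line, and the line is still output, just as sed auto-prints the pattern space when it reaches the end of the script and then starts the next cycle.

```python
import re


def transform(lines):
    """Apply the example's substitutions, skipping the rest for lines starting with A, C or E."""
    out = []
    for line in lines:
        line = line.replace("d", "4")         # s/d/4/g runs on every line
        if re.match(r"A|C|E", line):          # /^\(A\|C\|E\)/b
            out.append(line)                  # end of script: sed auto-prints
            continue                          # start the next cycle
        for old, new in (("a", "1"), ("b", "2"), ("c", "3")):
            line = line.replace(old, new)     # s/a/1/g; s/b/2/g; s/c/3/g
        out.append(line)
    return out


print("\n".join(transform(["Aaabbccdd", "Baabbccdd", "Caabbccdd", "Daabbccdd", "Eaabbccdd"])))
```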

Replace string with text only when a given text precedes it

I have about one hundred Markdown files that contain snippets of LaTeX like this:
<div latex="true" class="task" id="Task">
(#) Delete the fourth patterns from your .teach file and your .data files. Remember to change the second line in each so that Tlearn knows there are now only three patterns.
- They should look like [#fig:dataTeach]
</div>
I'd like to replace the <div> tags with pseudotags that are easier to read, like this:
<task>
(#) Delete the fourth patterns from your .teach file and your .data files. Remember to change the second line in each so that Tlearn knows there are now only three patterns.
- They should look like [#fig:dataTeach]
</task>
This would be trivial if all my <div> tags were marking 'tasks', but I have similar divs for 'journal' and 'highlight'. I need a process that will change the </div> to </task> only when the preceding <div> has the class or id 'task', and likewise for 'journal' and 'highlight'.
Having looked around Stack Overflow for a while, I find many examples of multiline search and replace that do almost what I want to do, but the syntax (particularly for sed) is so difficult to untangle I can't adapt it for the above case. My next option is to write a bash script to loop through line by line, but I have a feeling this might be too fragile.
Cheers
Ian
The following awk command works generically, under the following assumptions:
All opening and closing div tags are on their own lines.
Attribute values are all enclosed in double quotes (").
The new tag name is derived from the value of the class attribute only (this could be generalized if the rules were clearer).
awk -F ' class="' '
/^<div / && NF > 1 { tag=$2; sub("\".*", "", tag); printf "<%s>\n", tag; next }
/^<\/div>/ && tag != "" { printf "</%s>\n", tag; tag=""; next }
1
' file
-F ' class="' effectively splits each line into before (field 1, $1) and after (field 2, $2) the class attribute, if present. Only lines that have such an attribute will therefore have more than 1 field (NF > 1).
Processing the opening div tag:
Pattern /^<div / && NF > 1 therefore only matches lines that start with (^) <div and (&&) contain a class attribute (NF > 1).
tag=$2; sub("\".*", "", tag) extracts the class attribute value from the 2nd field by replacing everything from the first " (the closing " of the attribute value) onward with the empty string, so that only the attribute value remains in variable tag.
printf "<%s>\n", tag prints the attribute value as the replacement opening tag.
next skips the rest of the script and moves to the next input line.
Processing the closing div tag:
/^<\/div>/ && tag != "" matches the closing div tag, assuming that a class attribute value was found in the previous opening tag (tag != "").
printf "</%s>\n", tag prints the new closing tag.
tag="" resets the most recent replacement tag so that any subsequent div elements that do not have class attributes don't accidentally get renamed too.
next skips the rest of the script and moves to the next input line.
All other lines:
1 simply prints all other lines as-is. (1 is a common Awk shorthand for { print }: Pattern 1, interpreted as a Boolean, is by definition true, and a pattern without associated action { ... } prints the input line by default).
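As a cross-check of the logic, here is a hedged Python sketch of the same two-state approach under the same assumptions (div tags on their own lines, double-quoted class values); the function and variable names are made up for illustration. It remembers the class of the most recent opening div and uses it for the next closing div.

```python
import re

CLASS_ATTR = re.compile(r' class="([^"]*)"')  # plays the role of -F ' class="'


def rename_divs(lines):
    """Rename <div class="X"> ... </div> to <X> ... </X>, line by line."""
    tag = ""
    out = []
    for line in lines:
        match = CLASS_ATTR.search(line)
        if line.startswith("<div ") and match:
            tag = match.group(1)          # like tag=$2; sub("\".*", "", tag)
            out.append(f"<{tag}>")
        elif line.startswith("</div>") and tag:
            out.append(f"</{tag}>")
            tag = ""                      # reset, so a later div without a class is untouched
        else:
            out.append(line)              # like awk's 1: print the line unchanged
    return out


example = [
    '<div latex="true" class="task" id="Task">',
    "- They should look like [#fig:dataTeach]",
    "</div>",
]
print("\n".join(rename_divs(example)))
```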
No loop needed. Just pipe the files through this...
sed '/Task/s/<div.*>/<task>/g;s/<\/div>/<\/task>/g'
/Task at the beginning restricts the first command to lines that contain Task.
s/OLD/NEW/ replaces the matched text OLD with NEW.
Adding .* makes the match run from <div to the last > on the line, so the whole opening tag is replaced.
Last but not least, g stands for global and replaces every match on the line, not just the first.
The second command (after the ;) replaces </div> with </task>. Because / is sed's delimiter, the / in </div> has to be escaped with a backslash (\). Note that this second command is not restricted to Task lines, so every </div> in the input becomes </task>, including those closing journal or highlight divs.
Here you go. The output for your file will look like this:
<task>
(#) Delete the fourth patterns from your .teach file and your .data files. Remember to change the second line in each so that Tlearn knows there are now only three patterns.
- They should look like [#fig:dataTeach]
</task>
This might work for you (GNU sed):
v='task|journal|highlight'
sed -ri '/^<div/{:a;N;/^<\/div/M!ba;s/^<.*class="('$v')"[^>]*(.*<\/)div/<\1\2\1/}' file1 file2 file3 ...
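This gathers each div element, from the <div line through the </div> line, in the pattern space. Then, if its class attribute is one of the values in the shell variable v, it replaces the opening and closing tags with that class name; other divs are left unchanged.
N.B. The alternatives are stored in the shell variable v, separated by |.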
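The same gather-and-substitute idea can be sketched in Python with one multi-line regular expression. This is an illustrative analogue, not a line-by-line translation of the sed command, and the names are assumptions. As with v, the alternatives live in one place and are joined with |. It assumes divs are not nested, since the non-greedy body stops at the first </div> at the start of a line.

```python
import re

CLASSES = ["task", "journal", "highlight"]  # same role as the shell variable v

# A whole <div ...> ... </div> block whose class is one of CLASSES.
# re.S lets .*? span newlines; re.M makes ^ match at the start of each line.
DIV_BLOCK = re.compile(
    r'^<div[^>]*class="(' + "|".join(map(re.escape, CLASSES)) + r')"[^>]*>(.*?)^</div>',
    re.S | re.M,
)


def rename_blocks(text):
    """Replace <div class="X">...</div> with <X>...</X> for the listed classes."""
    return DIV_BLOCK.sub(r"<\1>\2</\1>", text)


sample = (
    '<div latex="true" class="task" id="Task">\nbody line\n</div>\n'
    '<div class="note">\nkeep\n</div>\n'
    '<div class="journal">\nj\n</div>\n'
)
print(rename_blocks(sample), end="")
```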
This should do the trick:
$msys\bin\sed -En "s/<div latex=\"true\" class=\"task\" id=\"Task\">/<task>/;T;{:a;N;s/<\/div>/<\/task>/;Ta;p;}" input.txt
These are the building blocks, in case you want to adapt it:
only start the loop if the first replacement triggered: s/<div latex=\"true\" class=\"task\" id=\"Task\">/<task>/;T;
make a loop: {:a;
inside the loop, just collect lines into the pattern space: N;
the loop ends when the second replacement triggers: s/<\/div>/<\/task>/;Ta;
at the end of the loop, just print: p;}
called with extended regular expressions and without default printing (mine is a Windows/MSYS sed, just so you know): $msys\bin\sed -En
Because of -n, only the rewritten task blocks are printed. To keep the rest of each file, drop the n from -En and the final p; so that sed's default printing outputs every line.

Using awk or sed to print column of CSV file enclosed in double quotes

I'm working with a comma-delimited CSV file like the one below. Each cell is enclosed in double quotes, but some cells contain double quotes and/or commas inside the enclosing quotes. The actual file contains around 300 columns and 200,000 rows.
"Column1","Column2","Column3","Column4","Column5","Column6","Column7"
"abc","abc","this, but with "comma" and a quote","18"" inch TV","abc","abc","abc"
"cde","cde","cde","some other, "cde" here","cde","cde","cde"
I need to remove some unneeded columns, merge the last few columns (joining them with </br> instead of ","), and move the second column to the end. The contents of each cell should stay the same, with double quotes and commas as in the original file. Below is an example of the output that I need.
"Column1","Column4","Column5","Column2"
"abc","18"" inch TV","abc</br>abc</br>abc","abc"
"cde","some other, "cde" here","cde</br>cde</br>cde","cde"
In this example I want to remove column3 and merge column 5, 6, 7.
Below is the code that I tried, but it treats the double quotes and/or commas inside cells as field boundaries, so the end of each row comes out differently than I expected.
awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '{print $1,$4,$5"</br>"$6"</br>"$7",$2}' inputfile.csv
sed -i 's#"</br>"#</br>#g' inputfile.csv
sed is used to remove beginning and ending double quote of a cell.
In the output I'm getting right now, if a field contains a double quote, it is treated as the beginning of a new cell, so the following values are often shifted over by a column.
Other code that I have tried treats every comma as the beginning of a new cell, so that won't work either:
awk -F',' 'BEGIN{OFS=",";} {print $1,$4,$5"</br>"$6"</br>"$7",$2}' inputfile.csv
sed -i 's#"</br>"#</br>#g' inputfile.csv
Any help is greatly appreciated. Thanks!
CSV is a loose format. There may be subtle variations in formatting. Your particular format may or may not be expressible with a regular grammar/regular expression. (See this question for a discussion about this.) Even if your particular formatting can be expressed with regular expressions, it may be easier to just whip out a parser from an existing library.
It is not a bash/awk/sed solution as you may have wanted or needed, but Python has a csv module for parsing CSV files. There are a number of options to tweak the formatting. Try something like this:
#!/usr/bin/env python3
import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('infile.csv', 'r', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    inreader = csv.reader(infile)
    outwriter = csv.writer(outfile, quoting=csv.QUOTE_ALL)
    for row in inreader:
        # Merge fields 5,6,7 (indexes 4,5,6) into one
        row[4] = "</br>".join(row[4:7])
        del row[5:7]
        # Copy second field to the end
        row.append(row[1])
        # Remove second and third fields
        del row[1:3]
        # Write manipulated row
        outwriter.writerow(row)
Note that in Python, indexes start with 0 (e.g. row[1] is the second field). The first index of a slice is inclusive, the last is exclusive (row[1:3] is row[1] and row[2] only). Your formatting seems to require quotes around every field, hence the quoting=csv.QUOTE_ALL. There are more options at Dialects and Formatting Parameters.
The above code produces the following output:
"Column1","Column4","Column5</br>Column6</br>Column7","Column2"
"abc","18"" inch TV","abc</br>abc</br>abc","abc"
"cde","some other, cde"" here""","cde</br>cde</br>cde","cde"
There are two issues with this:
It doesn't treat the first row any differently, so the headers of columns 5, 6, and 7 are merged like the other rows.
Your input CSV contains "some other, "cde" here" (third row, fourth column) with unescaped quotes around the cde. There is another case of this on line two, but it was removed since it is in column 3. The result contains incorrect quotes.
If these quotes are properly escaped, your sample input CSV file becomes
infile.csv (escaped quotes):
"Column1","Column2","Column3","Column4","Column5","Column6","Column7"
"abc","abc","this, but with ""comma"" and a quote","18"" inch TV","abc","abc","abc"
"cde","cde","cde","some other, ""cde"" here","cde","cde","cde"
Now consider this modified Python script that doesn't merge columns on the first row:
#!/usr/bin/env python3
import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open('infile.csv', 'r', newline='') as infile, open('outfile.csv', 'w', newline='') as outfile:
    inreader = csv.reader(infile)
    outwriter = csv.writer(outfile, quoting=csv.QUOTE_ALL)
    first_row = True
    for row in inreader:
        if first_row:
            first_row = False
        else:
            # Merge fields 5,6,7 (indexes 4,5,6) into one
            row[4] = "</br>".join(row[4:7])
        # Drop fields 6 and 7 (indexes 5,6); on data rows they were merged above
        del row[5:7]
        # Copy second field (index 1) to the end
        row.append(row[1])
        # Remove second and third fields
        del row[1:3]
        # Write manipulated row
        outwriter.writerow(row)
The output outfile.csv is
"Column1","Column4","Column5","Column2"
"abc","18"" inch TV","abc</br>abc</br>abc","abc"
"cde","some other, ""cde"" here","cde</br>cde</br>cde","cde"
This is your sample output, but with properly escaped "some other, ""cde"" here".
This may not be precisely what you wanted, not being a sed or awk solution, but I hope it is still useful. Processing more complicated formats may justify more complicated tools. Using an existing library also removes a few opportunities to make mistakes.
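To make the index arithmetic in these scripts concrete, here are the same list operations applied to a plain list of the header names (a small illustration; no CSV parsing involved). This is exactly what the first script does to the header row.

```python
row = ["Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7"]

row[4] = "</br>".join(row[4:7])  # row[4:7] is indexes 4, 5, 6 (7 is excluded)
del row[5:7]                     # drop indexes 5 and 6, now merged into index 4
row.append(row[1])               # copy the second field (index 1) to the end
del row[1:3]                     # drop indexes 1 and 2 (Column2 and Column3)

print(row)  # ['Column1', 'Column4', 'Column5</br>Column6</br>Column7', 'Column2']
```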
This might be an oversimplification of the problem but this has worked for me with your test data:
cat /tmp/inputfile.csv | sed 's#\"\,\"#|#g' | sed 's#"</br>"#</br>#g' | awk 'BEGIN {FS="|"} {print $1 "," $4 "," $5 "</br>" $6 "</br>" $7 "," $2}'
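Please note that I am on a Mac. The commas in the awk print statement are quoted so that they appear literally in the output; unquoted commas would instead insert the output field separator, which defaults to a space.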

Code formatting with bash script

I would like to search through a file and find all instances where the last non-blank character is a comma and move the line below that up one. Essentially, undoing line continuations like
private static final double SOME_NUMBERS[][] = {
{1.0, -6.032174644509064E-23},
{-0.25, -0.25},
{-0.16624879837036133, -2.6033824355191673E-8}
};
and transforming that to
private static final double SOME_NUMBERS[][] = {
{1.0, -6.032174644509064E-23}, {-0.25, -0.25}, {-0.16624879837036133, -2.6033824355191673E-8}
};
Is there a good way to do this?
As mjswartz suggests in the comments, we need a sed substitution command like s/,\n/ /g. That, however, does not work by itself because, by default, sed reads in only one line at a time. We can fix that by reading in the whole file first and then doing the substitution:
$ sed 'H;1h;$!d;x; s/,[[:blank:]]*\n[[:blank:]]*/, /g;' file
private static final double SOME_NUMBERS[][] = {
{1.0, -6.032174644509064E-23}, {-0.25, -0.25}, {-0.16624879837036133, -2.6033824355191673E-8}
};
Because this reads in the whole file at once, this is not a good approach for huge files.
The above was tested with GNU sed.
How it works
H;1h;$!d;x;
This series of commands reads in the whole file. It is probably simplest to think of this as an idiom. If you really want to know the gory details:
H - Append current line to hold space
1h - If this is the first line, overwrite the hold space with it
$!d - If this is not the last line, delete pattern space and jump to the next line.
x - Exchange hold and pattern space to put whole file in pattern space
s/,[[:blank:]]*\n[[:blank:]]*/, /g
This looks for lines that end with a comma, optionally followed by blanks, followed by a newline and replaces that, and any leading space on the following line, with a comma and a single space.
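The same read-everything-then-substitute approach can be sketched in Python for comparison (illustrative only; [ \t] stands in for [[:blank:]], which matches spaces and tabs). In practice the text would come from reading the whole file at once, so the same caveat about huge files applies.

```python
import re


def join_continued_lines(text):
    """Join each line ending in a comma with the next line, like the sed command."""
    # ,[[:blank:]]*\n[[:blank:]]*  ->  ", "
    return re.sub(r",[ \t]*\n[ \t]*", ", ", text)


source = (
    "private static final double SOME_NUMBERS[][] = {\n"
    "    {1.0, -6.032174644509064E-23},\n"
    "    {-0.25, -0.25},\n"
    "    {-0.16624879837036133, -2.6033824355191673E-8}\n"
    "};\n"
)
print(join_continued_lines(source), end="")
```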
For large files, I think awk would be better (this needs an awk that accepts a regular expression as RS, such as GNU awk):
awk -vRS=", *\n" -vORS=", " '1' file
With lua-shell, you could write something like this:
function nextlineup()
vim:normal("j^y$k$pjddk")
end
vim:open("code.txt")
vim:normal("G")
while vim:k() do
vim:normal("$")
if vim.currc == string.byte(',') then nextlineup() end
end
If you are not familiar with vim, this script may look a bit scary and fragile. In fact, every operation in it is precise (and fast, because they are built-in functions).
Since you are processing a code file, I suggest you try it.
Here is a perl solution.
cat file | perl -e '{$c = 0; while (<>) { s/^\s+/ / if ($c); s/,\s*$/,/; print($_); $c = (m/,\s*$/) ? 1: 0; }}'
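For comparison, here is a hedged Python sketch of the same streaming idea (the function name is an assumption). Like the Perl one-liner, it keeps only one line in memory at a time, so unlike the sed version it does not need to read the whole file first. For use as a filter, it could be called as join_streaming(sys.stdin, sys.stdout).

```python
import io
import re


def join_streaming(lines, out):
    """Write lines to out, joining every line that ends in a comma with the next one."""
    continued = False  # like $c: did the previous line end with a comma?
    for line in lines:
        if continued:
            line = re.sub(r"^\s+", " ", line)   # s/^\s+/ / if ($c)
        ends_with_comma = re.search(r",\s*$", line) is not None
        line = re.sub(r",\s*$", ",", line)       # s/,\s*$/,/ also drops the newline
        out.write(line)
        continued = ends_with_comma


demo = io.StringIO("x = {\n    {1.0, 2.0},\n    {3.0, 4.0}\n};\n")
result = io.StringIO()
join_streaming(demo, result)
print(result.getvalue(), end="")
```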
