How to delete everything between two :'s, but not if between {}'s? [duplicate] - bash

I have a text file like this:
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
I need to delete any text found inside any :'s, including the :'s, but not if they fall inside a pair of { and }.
Anything between a { and } is safe, including :'s.
Anything not between a { and } but found between : and : is deleted.
The :'s found outside { and } are all deleted.
The output would look like this:
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .
There is only one set of braces per line.
The paired braces are never split across lines.
There could be any number of :'s on the line, inside or outside the braces.
:'s always come in pairs.
How can I delete everything between colons, including the colons themselves, but not when protected by braces?
My best attempt so far is to use awk -F"{" '{ print $1 }' > file1.txt, awk -F"{" '{ print $2 }' > file2.txt, etc. to split the lines around the braces into different files, run sed on the files that hold the text outside the braces (but not on the files holding the data inside the braces), and then assemble everything back together with paste. This solution is far too complicated.

This will do as you ask
use strict;
use warnings;
my $data = do {
local $/;
<DATA>;
};
my @parts = split m/ ( \{ [^{}]* \} ) /x, $data;
for (@parts) {
    s/ : [^:]* : //gx unless /^\{/;
}
print @parts, "\n";
__DATA__
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
output
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .

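This is simple; try the following:
perl -pe 's/(\{[^{}]*\})|:[^:]*:/$1/g' file
Any text inside { } is captured in $1 and written back unchanged, so it is skipped. :) A colon pair matched by the second alternative leaves $1 empty, so it is deleted.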
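As a hedged illustration of the same alternation trick outside Perl, here is a minimal shell sketch using sed -E. The function name strip_colons is made up for this example. It assumes a sed that supports -E (GNU and BSD sed do) and that expands a back-reference to an unmatched group as an empty string, which is the behaviour the substitution relies on.

```shell
# strip_colons (hypothetical helper): filter stdin, deleting ":...:" spans
# that are not inside a "{...}" pair.
strip_colons() {
    # Alternative 1 captures a whole {...} group and writes it back via \1.
    # Alternative 2 matches a colon pair; \1 is then empty, so it is deleted.
    sed -E 's/(\{[^{}]*\})|:[^:]*:/\1/g'
}

printf '%s\n' ':This: is :still :yet another {:example:} of :some text:.' | strip_colons
```

Because the regex engine takes the leftmost match, a colon inside braces is never reached on its own: the brace group that starts earlier matches first and is copied through. Note that the spaces around a deleted colon pair are kept, so the real output contains a leading space or a double space where the sample output in the question shows one.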

In Perl:
#!/usr/bin/env perl
while (<>) {
    my @chars = split //;
    foreach my $c (@chars) {
        if ($c eq "{" .. $c eq "}") {
            print "$c";    # inside braces: keep everything
        } elsif ($c eq ":" ... $c eq ":") {
            # inside a colon pair: print nothing
        }
        else {
            print "$c";
        }
    }
}
or put more succinctly:
while (<>) {
print grep {/\{/ .. /\}/ or not /:/ ... /:/} split //;
}

Counting braces and colons:
perl -ne '
$b = $c = 0;
for $char (split //) {
$b++ if $char eq "{";
$b-- if $char eq "}";
if ($b > 0) {
print $char;
}
else {
if ($c == 0 and $char eq ":") {
$c++;
}
else {
print $char if $c == 0;
$c-- if $c == 1 and $char eq ":";
}
}
}
' <<END
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
END
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .

Related

Perl - Substitute after nth delimiter

I would like some help with a substitution I want to do on the lines of a file that look like this:
aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
I want to replace every . with , but only after the fifth occurrence of the delimiter ;. Everything else needs to remain unchanged. So the desired output file would look like this:
aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;
I'm interested in doing this primarily in Perl so I can incorporate it into a larger program, but any solutions in bash / awk are welcome as well. Thanks in advance.
This awk one-liner should work for you:
awk -F';' -v OFS=";" '{for(i=6;i<=NF;i++)gsub("[.]",",",$i)}7' file
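It loops over every ;-separated field from the 6th onward and replaces all . with , in each one. The trailing 7 is an always-true pattern, so awk prints every (possibly modified) line.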
Test with your data:
kent$ cat f
aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
kent$ awk -F';' -v OFS=";" '{for(i=6;i<=NF;i++)gsub("[.]",",",$i)}7' f
aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;
I used an array slice @fields[ 5 .. $#fields ] to access only the elements to be changed.
#!/usr/bin/perl
use warnings;
use strict;
my @input = qw( aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
                asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
                perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
              );
my @expected = qw( aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
                   asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
                   perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;
                 );
sub process {
    my (@input) = @_;
    my @output;
    for my $line (@input) {
        my @fields = split /;/, $line;
        s/\./,/ for @fields[ 5 .. $#fields ];
        push @output, join ';', @fields, q();
    }
    return \@output
}
use Test::More tests => 1;
is_deeply(process(@input), \@expected);
while (my $line = <DATA>) {
if ($line =~ /^(?:[^;]*;){5}/) {
substr($line, $+[0]) =~ y/./,/;
}
print $line;
}
__DATA__
aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
perl -pe 's/(.*?;){5}\K(.*)/$2 =~ s!\.!,!rg /ge'
Skip everything up to and including the 5th ; ((.*?;){5}\K),
and apply the substitution . to , to the rest of the line ($2 =~ s!\.!,!rg). The /r modifier needs Perl 5.14 or later.
# Note: this replaces every ; from the 6th one onward with , (GNU sed); it leaves the . characters unchanged
sed -i 's/;/,/6g' filename
cat filename
aoipp;dadada.12312;ss;1245454;Xiop;12.12,45.3,47.897,31.5,
asdfafd;14355.54664;peasd;125.1;900.2;76.897,67.456,asdfdf,
perio;777.2;ipoes;900.34;2;1980.45,870.98,67.67,

ignore spaces while comparing tokens files

I have a script that works with two files. I have to compare both files and display the entries that do not match, e.g.:
file1
file2
content of file1:
abcd
efgh
ijk
content of file2:
abcd=123
efgh=
ijkl=1213
If no match occurs, it should display something like "matching did not occur".
If the name matches but the value for that name is not present in file2, it should display something like "the value is missing".
E.g. efgh is present in both files but the value of efgh is not present in file2, so it should display that the matching value is not present.
file="$HOME/SAMPLE/token_values.txt"
while read -r var
do
if grep "$var" environ.ref >/dev/null
then
:
else
print "$var ((((((Not Present))))))" >> final13.txt
fi
done < "$file"
I guess this script would do it :
#!/bin/bash
#below line removes the blank lines in the first file
fileprocessed1=$( sed '/^$/d' your_file1 )
#below line removes the blank lines and replaces the = with blank space in the second file
fileprocessed2=$( sed '{/^$/d};{s/=/\ /g}' your_file2 )
paste <(echo "$fileprocessed1") <(echo "$fileprocessed2")| awk '{
if($1 == $2)
{
if(length($3) == 0)
{
print NR" : Match found but value Missing for "$2
}
else
{
print NR" : Match found for "$1" with value "$3
}
}
else
{
print NR" : No match for "$1
}
}'
would give :
1 : Match found for abcd with value 123
2 : Match found but value Missing for efgh
3 : No match for ijk
for the files you have given.
But I really hope somebody would come with a one-liner for this one. :)
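The paste-based script above pairs the two files line by line, so it depends on both files listing their names in the same order. As a hedged alternative, here is a minimal sketch that looks each name up by key instead. The function name check_tokens and the message wording are made up for this example, and it assumes values contain no = character. Stripping all spaces and tabs first is one way to read the "ignore spaces" part of the question.

```shell
# check_tokens TOKENS_FILE VALUES_FILE (hypothetical helper)
# TOKENS_FILE holds one name per line; VALUES_FILE holds name=value lines.
check_tokens() {
    awk -F'=' '
        { gsub(/[ \t\r]+/, "") }                  # ignore stray whitespace
        $0 == "" { next }                         # skip blank lines
        NR == FNR { value[$1] = $2; next }        # first file read: name -> value
        !($1 in value)  { print $1 ": matching not found"; next }
        value[$1] == "" { print $1 ": value is missing"; next }
        { print $1 ": " value[$1] }
    ' "$2" "$1"
}
```

Called as check_tokens file1 file2 on the sample files from the question, this should report a value for abcd, a missing value for efgh, and no match for ijk.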
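I would tackle it something like this in Perl:
#!/usr/bin/env perl
use strict;
use warnings;
# Perl does not expand "~", so build the path from $ENV{HOME}
open ( my $file1, '<', "$ENV{HOME}/SAMPLE/token_values.txt" ) or die $!;
chomp ( my @tokens = <$file1> );
open ( my $file2, '<', 'environ.ref' ) or die $!;
# name => value for every "name=value" line
my %data = map { /(\w+)=(\w*)/ } <$file2>;
for my $thing ( @tokens ) {
    # report tokens that are absent or that have an empty value
    print $thing,"\n" unless defined $data{$thing} and $data{$thing} ne '';
}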

awk print first occurrence after match

I'm trying to print a portion of a text file between two patterns, then return only the first occurrence. Should be simple but I can't seem to find a solution.
cat test.txt
if (var == "Option_1"){
document.write("<td>head1</td>")
document.write("<td>text1</td>")
}
if (var == "Option_2"){
document.write("<td>head2</td>")
document.write("<td>text2</td>")
}
if (var == "Option_1"){
document.write("<td>head3</td>")
document.write("<td>text3</td>")
}
This prints all matches:
awk '/Option_1/,/}/' test.txt
I need it to return only the first, i.e.:
if (var == "Option_1"){
document.write("<td>head1</td>")
document.write("<td>text1</td>")
}
Thanks!
Never use range expressions as they make trivial jobs very slightly briefer but then require a complete rewrite or duplicate conditions for even slightly more interesting tasks. Always use a flag:
$ awk '/Option_1/{f=1} f{print; if (/}/) exit}' file
if (var == "Option_1"){
document.write("<td>head1</td>")
document.write("<td>text1</td>")
}
I assumed that there are no } inside the if blocks.
Using GNU sed :
sed -n '/Option_1/{:a N;s/}/}/;Ta;p;q}' file
Here's how it works :
/Option_1/{ #search for Option_1
:a #create label a
N; #append next line to pattern space
s/}/}/; #substitute } with }
Ta; #if substitution failed, jump to label a
p; #print pattern space
q #exit
}
Adding somewhat to Ed Morton's answer, you can rewrite it to handle a nested if condition, or any other pair of braces inside the if statement (e.g. the braces of a for loop).
awk '/Option_1/{f=1} f{ if(/{/){count++}; print; if(/}/){count--; if(count==0) exit}}' filename
output for:
if (var == "Option_1"){
document.write("<td>head1</td>")
if (condition){
//code
}
document.write("<td>text1</td>")
}
if (var == "Option_2"){
document.write("<td>head2</td>")
document.write("<td>text2</td>")
}
if (var == "Option_1"){
document.write("<td>head3</td>")
document.write("<td>text3</td>")
}
is:
if (var == "Option_1"){
document.write("<td>head1</td>")
if (condition){
//code
}
document.write("<td>text1</td>")
}
count tracks the number of currently open braces: it goes up on a line containing { and down on a line containing }, and awk exits once it drops back to 0.
My input differs from the one in the question, but the information may be useful.
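The brace-counting idea can also be wrapped in a small shell function. This is a minimal sketch, not the answer's exact code: the name first_block is made up, and unlike the one-liner above it counts every brace on a line (via gsub) rather than at most one per line. It assumes the opening brace sits on the line that matches the pattern.

```shell
# first_block PATTERN FILE (hypothetical helper): print the first block that
# starts at a line matching PATTERN and ends where the braces balance again.
first_block() {
    awk -v pat="$1" '
        !found && $0 ~ pat { found = 1 }
        found {
            print
            line = $0
            depth += gsub(/[{]/, "", line)    # every opening brace on the line
            depth -= gsub(/[}]/, "", line)    # every closing brace on the line
            if (depth <= 0) exit
        }
    ' "$2"
}
```

For example, first_block 'Option_1' filename should print the same nested block as shown above and then stop.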
sed '/Option_1/,/}/ !d;/}/q' YourFile
This deletes every line outside the /Option_1/,/}/ range and quits after the first line containing }, so only one section is printed.
For non-GNU sed, replace the ; after d with a real newline.
You can do,
awk '/Option_1/,/}/{print; if ($0 ~ /}/) exit}' test.txt
This exits after printing the first match

delete lines between two patterns without deleting the pattern

I have a file like the one below:
[NAMES]
biren
bikash
dibya
[MAIL]
biren_k
bikash123
dibya008
My output should look like this:
[NAMES]
[MAIL]
I tried the code below just to remove the lines between NAMES and MAIL, but it did not work.
sed -n '/NAMES/{p; :a; N; /MAIL/ba; s/.*\n//}; p' input.txt
Can anyone help, please? I would prefer Perl code if possible.
NOTE: Besides [NAMES] and [MAIL], I have a lot of headers in my actual file; I have shown just two here. I have to replace the contents below some of the headers (not all, only selected headers, which sit at arbitrary line numbers) with new contents, but first I need to delete the contents below them. That's why I need my output like this. Any suggestions, please?
You can modify sed as
$ sed '/\[NAMES\]/, /\[MAIL\]/ {/^\[/p; d}' input
[NAMES]
[MAIL]
biren_k
bikash123
dibya008
Please try this; it may be helpful for your question:
my %hashes = (
    "[NAMES]" => "<br/>kumar<br/>avi<br/><br/>\n",
    "[MAIL]" => "<br/>biren_k<br/>bikash123<br/>dibya008<br/>\n"
);
my @arr = <DATA>;
my @newarr;
foreach my $snarr (@arr)
{
    chomp($snarr);
    # keep only header lines, each followed by its replacement content
    push(@newarr, "$snarr\n$hashes{$snarr}") if $hashes{$snarr};
}
print @newarr;
__DATA__
[NAMES]
biren
bikash
dibya
[MAIL]
biren_k
bikash123
dibya008
Just replace the lines between my @erase = qw[ and ]; with the HEADERS you want to empty out.
#!/usr/bin/env perl
use strict;
use warnings;
push @ARGV, 'file.txt';
# here list out the HEADERS
# which content you wanna erase
my @erase = qw[
NAMES
MAIL
];
my %dump;
my $header;
# build a hash from your file
while (<>) {
if (/^\[([^\]]+)\]$/) {
$header = $1;
$dump{$header} = "";
next;
}
$dump{$header} .= $_ if $header;
}
# replace the content
# with empty string
foreach (@erase) {
$dump{$_} = "";
}
# now print it back to <STDOUT>
foreach (sort keys %dump) {
print "[$_]\n$dump{$_}\n";
}
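The hash-based script above prints the sections in sorted key order rather than in the order they appear in the file. As a hedged alternative, here is a minimal streaming sketch of the same "list the headers to empty" idea that keeps the original order. The function name empty_sections is made up for this example, and it assumes each header is a whole line of the form [NAME].

```shell
# empty_sections FILE NAME... (hypothetical helper): print FILE with the body
# of each named [NAME] section removed; headers and other sections are kept.
empty_sections() {
    file=$1
    shift
    awk -v list="$*" '
        BEGIN {
            n = split(list, name, " ")
            for (i = 1; i <= n; i++) drop["[" name[i] "]"] = 1
        }
        /^\[.*\]$/ { skipping = ($0 in drop); print; next }   # header line
        !skipping                                             # body line
    ' "$file"
}
```

With the sample file, empty_sections input.txt NAMES should keep [NAMES], [MAIL] and the three mail entries, while empty_sections input.txt NAMES MAIL should leave only the two header lines.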
I found a solution to my problem:
my @name_var = ();
while (<STDIN>)
{
    last if ($_ =~ /^\n/ );
    push(@name_var, $_);
}
my @mail_add = ();
while (<STDIN>)
{
    last if ($_ =~ /^\n/ );
    push(@mail_add, $_);
}
open(my $var, "input.txt") || die("Input File not found");
open(my $out, ">temp.txt") || die("Temp File not created");
while($line = <$var>)
{
    # print $line;
    if( $line =~ /\[NAMES\]/)
    {
        print $out $line;
        print $out @name_var;
        # skip the old contents up to the next blank line
        while(($line = <$var>) && ($line !~ /^\n/))
        {
        }
    }
    if( $line =~ /\[MAIL\]/)
    {
        print $out $line;
        print $out @mail_add;
        # skip the old contents up to the next blank line
        while(($line = <$var>) && ($line !~ /^\n/))
        {
        }
    }
    print $out $line;
}
close($var);
close($out);
# copy the temporary file back over the input file
open($var1,">input.txt") || die("failed to open\n");
open($out1,"<temp.txt") || die("failed to open\n");
while($fl = <$out1>)
{
    print $var1 $fl;
}
close($var1);
close($out1);
Thank you all. I got the solution from Stack Overflow, PerlMonks and a few more sites related to Perl.

Script to migrate data from one source to another

I have a .h file, among other things, containing data in this format
struct X[]{
{"Field", "value1 value2 value"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd sdfdsf sdfs"};
/****************/
};
I have text file containing, values that I want to replace in .h file with new values
value1 Valuesdfdsf1
value2 Value1dfsdf
value3 Value1_another
sfsd sfsd_ewew
sdfdsf sdfdsf_ew
sdfs sfsd_new
And the resulting .h file will contain the replacements from the text file above. Everything else remains the same.
struct X[]{
{"Field1", "value11 value12 value232"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd_ewew sdfdsf_ew sdfs_new"};
/****************/
};
Please help me come with a solution to accomplish it using unix tools: awk, perl, bash, sed, etc
cat junk/n2.txt | perl -e '{use File::Slurp; my @r = File::Slurp::read_file("junk/n.txt"); my %r = map {chomp; (split(/\s+/,$_))[0,1]} @r; while (<>) { unless (/^\s*\{"/) {print $_; next;}; my ($pre,$values,$post) = ($_ =~ /^(\s*\{"[^"]+", ")([^"]+)(".*)$/); my @new_values = map { exists $r{$_} ? $r{$_}:$_ } split(/\s+/,$values); print $pre . join(" ",@new_values) . $post . "\n"; }}'
Result:
struct X[]{
{"Field", "value1 Value1dfsdf value"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd_ewew sdfdsf_ew sfsd_new"};
/****************/
};
Code untangled:
use File::Slurp;
my @replacements = File::Slurp::read_file("junk/n.txt");
my %r = map {chomp; (split(/\s+/,$_))[0,1]} @replacements;
while (<>) {
    unless (/^\s*\{"/) {print $_; next;}
    my ($pre,$values,$post) = ($_ =~ /^(\s*\{"[^"]+", ")([^"]+)(".*)$/);
    my @new_values = map { exists $r{$_} ? $r{$_} : $_ } split(/\s+/, $values);
    print $pre . join(" ",@new_values) . $post . "\n";
}
#!/usr/bin/perl
use strict; use warnings;
# you need to populate %lookup from the text file
my %lookup = qw(
value1 Valuesdfdsf1
value2 Value1dfsdf
value3 Value1_another
sfsd sfsd_ewew
sdfdsf sdfdsf_ew
sdfs sfsd_new
);
while ( my $line = <DATA> ) {
if ( $line =~ /^struct \w+\Q[]/ ) {
print $line;
process_struct(\*DATA, \%lookup);
}
else {
print $line;
}
}
sub process_struct {
my ($fh, $lookup) = @_;
while (my $line = <$fh> ) {
unless ( $line =~ /^\{"(\w+)", "([^"]+)"\}([,;])\s+/ ) {
print $line;
return;
}
my ($f, $v, $p) = ($1, $2, $3);
$v =~ s/(\w+)/exists $lookup->{$1} ? $lookup->{$1} : $1/eg;
printf qq|{"%s", "%s"}%s\n|, $f, $v, $p;
}
return;
}
__DATA__
struct X[]{
{"Field", "value1 value2 value"},
{"Field2", "value11 value12 value232"},
{"Field3", "x y z"},
{"Field4", "a bbb s"},
{"Field5", "sfsd sdfdsf sdfs"};
/****************/
};
Here's a simple looking program:
use strict;
use warnings;
use File::Copy;
use constant {
OLD_HEADER_FILE => "headerfile.h",
NEW_HEADER_FILE => "newheaderfile.h",
DATA_TEXT_FILE => "data.txt",
};
open (HEADER, "<", OLD_HEADER_FILE) or
die qq(Can't open file old header file ") . OLD_HEADER_FILE . qq(" for reading);
open (NEWHEADER, ">", NEW_HEADER_FILE) or
die qq(Can't open file new header file ") . NEW_HEADER_FILE . qq(" for writing);
open (DATA, "<", DATA_TEXT_FILE) or
die qq(Can't open file data file ") . DATA_TEXT_FILE . qq(" for reading);
#
# Put Replacement Data in a Hash
#
my %dataHash;
while (my $line = <DATA>) {
chomp($line);
my ($key, $value) = split (/\s+/, $line);
$dataHash{$key} = $value if ($key and $value);
}
close (DATA);
#
# NOW PARSE THOUGH HEADER
#
while (my $line = <HEADER>) {
chomp($line);
if ($line =~ /^\s*\{"Field/) {
foreach my $key (keys(%dataHash)) {
$line =~ s/\b$key\b/$dataHash{$key}/g;
}
}
print NEWHEADER "$line\n";
}
close (HEADER);
close (NEWHEADER);
copy(NEW_HEADER_FILE, OLD_HEADER_FILE) or
die qq(Unable to replace ") . OLD_HEADER_FILE . qq(" with ") . NEW_HEADER_FILE . qq(");
I could make it more efficient by using map, but that would make it harder to understand.
Basically:
I open three files: the original header, the new header I'm building, and the data file.
I first put my data into a hash where the replacement text is keyed by the original text. (I could have done it the other way around if I wanted.)
I then go through each line of the original header.
** If I see a line that looks like it's a field line, I know that I might have to do a replacement.
** For each entry in my %dataHash, I substitute the $key with the $dataHash{$key} replacement value. I use \b to mark word boundaries. This way, value11 is not changed just because value1 appears inside it.
** Now I write the line to my new header file. If I didn't replace anything, I just write the original line.
Once I finish, I copy the new header over the old header file.
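The steps above (load the replacements into a lookup table, then rewrite whole words on the field lines only) can be sketched with awk as well. This is a minimal sketch under stated assumptions, not a translation of the Perl program: the function name apply_map is made up, it writes to stdout instead of copying over the header, and it assumes each field line has the form {"Name", "word word word"} with the words separated by whitespace. Comparing whole words against the table plays the role that \b plays in the Perl version.

```shell
# apply_map DATA_FILE HEADER_FILE (hypothetical helper): print HEADER_FILE with
# each whole word in the quoted value of a field line replaced via DATA_FILE.
apply_map() {
    awk '
        NR == FNR { if (NF >= 2) repl[$1] = $2; next }   # data file: old -> new
        /^[ \t]*[{]"/ {
            n = split($0, part, "\"")      # part[4] is the quoted value string
            if (n >= 5) {
                m = split(part[4], word, " ")
                value = ""
                for (i = 1; i <= m; i++) {
                    w = (word[i] in repl) ? repl[word[i]] : word[i]
                    value = value (i > 1 ? " " : "") w
                }
                part[4] = value
                line = part[1]
                for (i = 2; i <= n; i++) line = line "\"" part[i]
                print line
                next
            }
        }
        { print }                          # every other line is copied as-is
    ' "$1" "$2"
}
```

A possible invocation would be apply_map data.txt headerfile.h > newheaderfile.h, using the file names from the Perl program above. One design consequence: runs of several spaces inside a value are collapsed to a single space when the words are joined again.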
This script should work
keyval is the file containing key value pairs
filetoreplace is the file containing data to be modified
The file named changed will contain the changes
#!/bin/sh
echo
keylist=$(awk '{ print $1 }' keyval)
while IFS= read -r line
do
    for i in $keylist
    do
        if printf '%s\n' "$line" | grep -wq -- "$i"; then
            # look up the replacement for exactly this key
            value=$(awk -v k="$i" '$1 == k { print $2 }' keyval)
            line=$(printf '%s\n' "$line" | sed -e "s/$i/$value/g")
        fi
    done
    printf '%s\n' "$line" >> changed
done < filetoreplace
This might be kind of slow if your files are big.
gawk -F '[ \t]*|"' 'FNR == NR {repl[$1]=$2;next}{for (f=1;f<=NF;++f) for (r in repl) if ($f == r) $f=repl[r]; print} ' keyfile file.h
