grep between two lines with specified string - shell

I have this simple flat file (file.txt):
a43
test1
abc
cvb
bnm
test2
test1
def
ijk
xyz
test2
kfo
I need all lines between test1 and test2 in two forms. The first form creates two new files, like:
newfile1.txt :
test1
abc
cvb
bnm
test2
newfile2.txt
test1
def
ijk
xyz
test2
and the second form creates only one new file, like:
newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2
Do you have any suggestions?
EDIT
For the second form, I used this:
sed -n '/test1/,/test2/p' file.txt > newfile.txt
But it gives me a result like:
test1abccvbbnmtest2test1defijkxyztest2
I need a newline after each group, like:
test1abccvbbnmtest2
test1defijkxyztest2

You can use this awk:
awk -v fn="newfile.txt" '
/test1/ {
    f = "newfile" ++n ".txt";   # start the next group file
    s = 1
}
s {
    print > f;                  # copy the line into the current group file
    printf "%s", $0 > fn        # append it, without a newline, to the summary file
}
/test2/ {
    close(f);
    print "" > fn;              # end the summary line
    s = 0
}
END {
    close(fn)
}' file.txt
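Running it on the sample file.txt, a quick check of the outputs should show (per the desired output in the question):
$ cat newfile1.txt
test1
abc
cvb
bnm
test2
$ cat newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2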

Perl, like sed and other languages, has the ability to select ranges of lines from a file, so it's a good fit for what you're trying to do.
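In its simplest form that is the flip-flop operator; for instance, this one-liner (a minimal sketch) is the Perl equivalent of the sed command from the question:
perl -ne 'print if /test1/ .. /test2/' file.txt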
This solution ended up being a lot more complicated than I thought it would be. I see no good reason to use it over @anubhava's awk solution. But I wrote it, so here it is:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use IO::Handle;   # for ->say on lexical filehandles (only autoloaded on perl >= 5.14)

use constant {
    RANGE_START  => qr/\Atest1\z/,
    RANGE_END    => qr/\Atest2\z/,
    SUMMARY_FILE => 'newfile.txt',
    GROUP_FILE   => 'newfile%d.txt',
};

my $n = 1;   # starting number of group file
my @wg;      # storage for "working group" of lines

# Open summary file to write to.
open(my $sfh, '>', SUMMARY_FILE) or die $!;

while (my $line = <>) {
    chomp $line;

    # If the line is within the range, add it to our working group.
    push @wg, $line if $line =~ RANGE_START .. $line =~ RANGE_END;

    if ($line =~ RANGE_END) {
        # We are at the end of a group, so summarize it and write it out.
        unless (@wg > 2) {
            # Discard any partial or empty groups.
            @wg = ();
            next;
        }

        # Write a line to the summary file.
        $sfh->say(join '', @wg);

        # Write out all lines to the group file.
        my $group_file = sprintf(GROUP_FILE, $n);
        open(my $gfh, '>', $group_file) or die $!;
        $gfh->say(join "\n", @wg);
        close($gfh);
        printf STDERR "WROTE %s with %d lines\n", $group_file, scalar @wg;

        # Get ready for the next group.
        $n++;
        @wg = ();
    }
}
close($sfh);
printf STDERR "WROTE %s with %d groups\n", SUMMARY_FILE, $n - 1;
To use it, write the above lines into a file named e.g. ranges.pl, and make it executable with chmod +x ranges.pl. Then:
$ ./ranges.pl file.txt
WROTE newfile1.txt with 5 lines
WROTE newfile2.txt with 5 lines
WROTE newfile.txt with 2 groups
$ cat newfile1.txt
test1
abc
cvb
bnm
test2
$ cat newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2

For the second form, you can add a newline after "test2" by appending \n:
sed -n '/test1/,/test2/p' file.txt | sed -e 's/test2/test2\n/g' > newfile.txt
sed is not well suited to creating multiple files, so for the first form you should find another solution.
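If you only need the second form, a single awk pass can also produce it without the second sed (a sketch along the same lines as the awk answer above):
awk '/test1/{s=1} s{printf "%s", $0} /test2/{print ""; s=0}' file.txt > newfile.txt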

Unscramble words Challenge - improve my bash solution

There is a Capture the Flag challenge.
I have two files; one with scrambled text like this, with about 550 entries:
dnaoyt
cinuertdso
bda
haey
tolpap
...
The second file is a dictionary with about 9,000 entries:
radar
ccd
gcc
fcc
historical
...
The goal is to find the right, unscrambled version of the word, which is contained in the dictionary file.
My approach is to sort the characters of each word from the first file, then check whether a word from the second file has the same length. If so, I sort it too and compare them.
This is my fully functional bash script, but it is very slow.
#!/bin/bash
while IFS="" read -r p || [ -n "$p" ]
do
    var=0
    ro=$(echo "$p" | perl -F -lane 'print sort @F')
    len_ro=${#ro}
    while IFS="" read -r o || [ -n "$o" ]
    do
        ro2=$(echo "$o" | perl -F -lane 'print sort @F')
        len_ro2=${#ro2}
        let "var+=1"
        if [ "$len_ro" == "$len_ro2" ]; then
            if [ "$ro" == "$ro2" ]; then
                echo "$o" >> new.txt
                echo "$var" >> whichline.txt
            fi
        fi
    done < dictionary.txt
done < scrambled-words.txt
I have also tried converting all characters to ASCII integers and summing each word, but while comparing I realized that different character patterns can produce the same sum.
[edit]
For the record:
- no anagrams are contained in the dictionary
- to get the flag, you need to export the unscrambled words as one blob and make a SHA hash out of it (that's the flag)
- link to the CTF for anyone who wants the files: https://challenges.reply.com/tamtamy/user/login.action
You're better off creating a lookup dictionary (keyed by the sorted word) from the dictionary file.
Your loop body is executed 550 * 9,000 = 4,950,000 times (O(N*M)).
The solution I propose executes two loops of at most 9,000 passes each (O(N+M)).
Bonus: It finds all possible solutions at no cost.
#!/usr/bin/perl
use strict;
use warnings qw( all );
use feature qw( say );

my $dict_qfn      = "dictionary.txt";
my $scrambled_qfn = "scrambled-words.txt";

sub key { join "", sort split //, $_[0] }

my %dict;
{
    open(my $fh, "<", $dict_qfn)
        or die("Can't open \"$dict_qfn\": $!\n");
    while (<$fh>) {
        chomp;
        push @{ $dict{key($_)} }, $_;
    }
}

{
    open(my $fh, "<", $scrambled_qfn)
        or die("Can't open \"$scrambled_qfn\": $!\n");
    while (<$fh>) {
        chomp;
        my $matches = $dict{key($_)};
        say "$_ matches @$matches" if $matches;
    }
}
I wouldn't be surprised if this only takes one millionth of the time of your solution for the sizes you provided (and it scales so much better than yours if you were to increase the sizes).
I would do something like this with gawk (note that asort is gawk-specific):
gawk '
NR == FNR {
    dict[csort()] = $0
    next
}
{
    print dict[csort()]
}
function csort(    chars, sorted, n, i) {
    n = split($0, chars, "")
    asort(chars)
    for (i = 1; i <= n; i++)    # iterate in index order; "for (i in chars)" order is unspecified
        sorted = sorted chars[i]
    return sorted
}' dictionary.txt scrambled-words.txt
Here's a perl-free solution I came up with using sort and join:
sort_letters() {
    # Splits each letter onto a line, sorts the letters, then joins them
    # e.g. "hello" becomes "ehllo"
    echo "${1}" | fold -b1 | sort | tr -d '\n'
}

# For each input file...
for input in "dict.txt" "words.txt"; do
    # Convert each line to [sorted] [original],
    # then sort and save the results with a .sorted extension
    while read -r original; do
        sorted=$(sort_letters "${original}")
        echo "${sorted} ${original}"
    done < "${input}" | sort > "${input}.sorted"
done

# Join the two files on the [sorted] word,
# outputting the scrambled and unscrambled words
join -j 1 -o 1.2,2.2 "words.txt.sorted" "dict.txt.sorted"
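One caveat: join expects both inputs sorted under the same collation it uses for comparison, so setting a byte-wise locale before the sorts and the join is a cheap safeguard (one extra line at the top of the script):
export LC_ALL=C   # make sort and join agree on collation order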
I tried something very similar, but a bit different.
#!/bin/bash

exec 3<scrambled-words.txt
while read -r line <&3; do
    printf "%s" ${line} | perl -F -lane 'print sort @F'
done >scrambled-words_sorted.txt
exec 3>&-

exec 3<dictionary.txt
while read -r line <&3; do
    printf "%s" ${line} | perl -F -lane 'print sort @F'
done >dictionary_sorted.txt
exec 3>&-

printf "" > whichline.txt
exec 3<scrambled-words_sorted.txt
while read -r line <&3; do
    counter="$((++counter))"
    grep -n -e "^${line}$" dictionary_sorted.txt | cut -d ':' -f 1 | tr -d '\n' >>whichline.txt
    printf "\n" >>whichline.txt
done
exec 3>&-
As you can see I don't create a new.txt file; instead I only create whichline.txt, with a blank line where the word doesn't match. You can easily paste them together to create new.txt (a sketch of that follows below).
The logic behind the script is nearly the logic behind yours, with the exception that I call perl fewer times and save two support files.
I think (but I am not sure) that creating them and cycling over only one file will be better than ~5 million perl calls. This way perl is called "only" ~10k times.
Finally, I decided to use grep because it's (maybe) the fastest regex matcher, and by searching for the entire line the length check is implicit in the regex.
Please note that what @benjamin-w said still applies; in that case grep will reply badly, and I did not handle that!
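For instance, new.txt could be rebuilt from whichline.txt afterwards (a sketch, assuming one dictionary line number, or a blank, per line):
while read -r n; do
    if [ -n "$n" ]; then
        sed -n "${n}p" dictionary.txt    # fetch the matching dictionary word
    else
        printf "\n"                      # keep a blank where nothing matched
    fi
done < whichline.txt > new.txt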
I hope this could help [:

Perl - Substitute after nth delimiter

I would like some help with a substitution I want to do on the lines of a file that look like this:
aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
I want to replace every . with , but only after the fifth occurrence of the delimiter ;. Everything else needs to remain unchanged. So the desired output file would look like this:
aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;
I'm interested in doing this primarily in perl so I can incorporate it into a larger program, but any solutions in bash/awk are welcome as well. Thanks in advance.
This awk one-liner should work for you:
awk -F';' -v OFS=";" '{for(i=6;i<=NF;i++)gsub("[.]",",",$i)}7' file
It starts from the 6th ;-separated field (everything after the fifth delimiter) and, for each field, replaces all . with ,. The trailing 7 is just an always-true pattern that makes awk print the (modified) line.
Test with your data:
kent$ cat f
aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
kent$ awk -F';' -v OFS=";" '{for(i=6;i<=NF;i++)gsub("[.]",",",$i)}7' f
aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;
I used an array slice @fields[ 5 .. $#fields ] to access only the elements to be changed.
#!/usr/bin/perl
use warnings;
use strict;

my @input = qw( aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
                asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
                perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
              );
my @expected = qw( aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
                   asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
                   perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;
                 );

sub process {
    my (@input) = @_;
    my @output;
    for my $line (@input) {
        my @fields = split /;/, $line;
        s/\./,/ for @fields[ 5 .. $#fields ];
        push @output, join ';', @fields, q();
    }
    return \@output
}

use Test::More tests => 1;
is_deeply(process(@input), \@expected);
The same idea without splitting into fields: match everything up to and including the fifth ;, then transliterate the rest of the line in place:
while (my $line = <DATA>) {
    if ($line =~ /^(?:[^;]*;){5}/) {
        substr($line, $+[0]) =~ y/./,/;
    }
    print $line;
}
__DATA__
aoipp;dadada.12312;ss;1245454;Xiop;12.12;45.3;47.897;31.5;
asdfafd;14355.54664;peasd;125.1;900.2;76.897;67.456;asdfdf;
perio;777.2;ipoes;900.34;2;1980.45;870.98;67.67;
perl -pe 's/(.*?;){5}\K(.*)/$2 =~ s!\.!,!rg/ge'
Skip everything up to and including the fifth ; ((.*?;){5}\K),
and apply the . to , substitution to the rest of the line ($2 =~ s!\.!,!rg).
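Checked against the sample input (saved as file), this produces exactly the desired output:
$ perl -pe 's/(.*?;){5}\K(.*)/$2 =~ s!\.!,!rg/ge' file
aoipp;dadada.12312;ss;1245454;Xiop;12,12;45,3;47,897;31,5;
asdfafd;14355.54664;peasd;125.1;900.2;76,897;67,456;asdfdf;
perio;777.2;ipoes;900.34;2;1980,45;870,98;67,67;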
# sed can substitute from the nth occurrence onwards, but note that this
# replaces the ; delimiters themselves rather than the dots, so it does
# not produce the desired output shown in the question
sed -i 's/;/,/6g' filename
cat filename
aoipp;dadada.12312;ss;1245454;Xiop;12.12,45.3,47.897,31.5,
asdfafd;14355.54664;peasd;125.1;900.2;76.897,67.456,asdfdf,
perio;777.2;ipoes;900.34;2;1980.45,870.98,67.67,
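If you want a sed that does produce the desired output, an iterative substitution works: keep converting the first . that appears after the fifth ; until none is left (a GNU sed sketch):
sed -E -e ':a' -e 's/^(([^;]*;){5}[^.]*)\./\1,/' -e 'ta' filename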

ignore spaces while comparing tokens files

I have a script with two files. I have to compare both files and display the mismatching content, e.g.:
file1
file2
content of file1:
abcd
efgh
ijk
content of file2:
abcd=123
efgh=
ijkl=1213
If a match doesn't occur, it should report that no match occurred.
If the name matches but its value is not present in file2, it should report that the value is missing.
E.g. efgh is present in both files but the value of efgh is not present in file2, so it should report that the matching value is not present.
file="$HOME/SAMPLE/token_values.txt"
while read -r var
do
if grep "$var" environ.ref >/dev/null
then
:
else
print "$var ((((((Not Present))))))" >> final13.txt
fi
done < "$file"
I guess this script would do it:
#!/bin/bash
# the line below removes the blank lines in the first file
fileprocessed1=$( sed '/^$/d' your_file1 )
# the line below removes the blank lines and replaces the = with a blank space in the second file
fileprocessed2=$( sed '{/^$/d};{s/=/\ /g}' your_file2 )

paste <(echo "$fileprocessed1") <(echo "$fileprocessed2") | awk '{
    if ($1 == $2) {
        if (length($3) == 0) {
            print NR" : Match found but value Missing for "$2
        } else {
            print NR" : Match found for "$1" with value "$3
        }
    } else {
        print NR" : No match for "$1
    }
}'
would give:
1 : Match found for abcd with value 123
2 : Match found but value Missing for efgh
3 : No match for ijk
for the files you have given.
But I really hope somebody comes up with a one-liner for this one. :)
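For what it's worth, a single awk pass over both files comes fairly close to one (a sketch; it assumes file1 holds the token names and file2 the name=value pairs):
awk -F'=' '
NR == FNR { names[$1] = 1; next }    # first file: remember the token names
$1 in names {
    if ($2 == "") print $1 " : match found but value missing"
    else          print $1 " : match found with value " $2
    seen[$1] = 1
}
END { for (n in names) if (!(n in seen)) print n " : no match" }
' file1 file2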
I would tackle it something like this in perl:
#!/usr/bin/env perl
use strict;
use warnings;

# note: perl's open() does not expand "~", so build the path from $ENV{HOME}
open ( my $file1, '<', "$ENV{HOME}/SAMPLE/token_values.txt" ) or die $!;
chomp ( my @tokens = <$file1> );

open ( my $file2, '<', 'environ.ref' ) or die $!;
my %data = map { /(\w+)=(\w*)/ } <$file2>;

for my $thing ( @tokens ) {
    print $thing, "\n" unless ( $data{$thing} // '' ) eq '';
}

How to remove common lines between two files without sorting? [duplicate]

This question already has answers here:
Compare 2 files and remove any lines in file2 when they match values found in file1
(4 answers)
Closed 8 years ago.
I have two files, not sorted, which have some lines in common.
file1.txt
Z
B
A
H
L
file2.txt
S
L
W
Q
A
The way I'm using to remove common lines is the following:
sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt
comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt
Output:
B
H
Z
The problem is that I want to keep the order of file1.txt, I mean:
Desired output:
Z
B
H
One solution I thought of is to loop over all the lines of file2.txt and run:
sed -i "/^${line_file2}$/d" file1.txt
But if the files are big, performance may suffer.
Do you like my idea?
Do you have an alternative way to do it?
You can use just grep (-v to invert the match, -f to read the patterns from a file). This keeps the lines from input1 that do not match any line in input2; add -Fx if the lines should be compared as fixed whole-line strings rather than as substring regexes:
grep -vf input2 input1
Gives:
Z
B
H
Or with awk (read file2's lines into an array, then print only the file1 lines that are not in it):
awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1
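With the sample files, this keeps file1.txt's order:
$ awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2.txt file1.txt
Z
B
H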
I've written a little Perl script that I use for this kind of thing. It can do more than what you ask for but it can also do what you need:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('hvfcmdk:', \%opts);

my $missing = $opts{m} || undef;
my $column  = $opts{k} || undef;
my $common  = $opts{c} || undef;
my $verbose = $opts{v} || undef;
my $fast    = $opts{f} || undef;
my $dupes   = $opts{d} || undef;
$missing = 1 unless $common || $dupes;
&usage() unless $ARGV[1];
&usage() if $opts{h};

my (%found, %k, %fields);

if ($column) {
    die("The -k option only works in fast (-f) mode\n") unless $fast;
    $column--;    ## So I don't need to count from 0
}

open(my $F1, "$ARGV[0]") || die("Cannot open $ARGV[0]: $!\n");
while (<$F1>) {
    chomp;
    if ($fast) {
        my @aa = split(/\s+/, $_);
        $k{$aa[0]}++;
        $found{$aa[0]}++;
    }
    else {
        $k{$_}++;
        $found{$_}++;
    }
}
close($F1);

my $n = 0;
open(F2, "$ARGV[1]") || die("Cannot open $ARGV[1]: $!\n");
my $size = 0;
if ($verbose) {
    while (<F2>) {
        $size++;
    }
}
close(F2);

open(F2, "$ARGV[1]") || die("Cannot open $ARGV[1]: $!\n");
while (<F2>) {
    next if /^\s+$/;
    $n++;
    chomp;
    print STDERR "." if $verbose && $n % 10 == 0;
    print STDERR "[$n of $size lines]\n" if $verbose && $n % 800 == 0;
    if ($fast) {
        my @aa = split(/\s+/, $_);
        $k{$aa[0]}++ if defined($k{$aa[0]});
        $fields{$aa[0]} = \@aa if $column;
    }
    else {
        foreach my $key (keys(%found)) {
            if (/\Q$key/) {
                $k{$key}++;
                $found{$key} = undef unless $dupes;
            }
        }
    }
}
close(F2);
print STDERR "[$n of $size lines]\n" if $verbose;

if ($column) {
    if ($missing) { print "$fields{$_}[$column]\n" for grep { $k{$_} <= 1 } keys %k; }
    if ($common)  { print "$fields{$_}[$column]\n" for grep { $k{$_} >  1 } keys %k; }
    if ($dupes)   { print "$fields{$_}[$column]\n" for grep { $k{$_} >  2 } keys %k; }
}
else {
    if ($missing) { print "$_\n" for grep { $k{$_} <= 1 } keys %k; }
    if ($common)  { print "$_\n" for grep { $k{$_} >  1 } keys %k; }
    if ($dupes)   { print "$_\n" for grep { $k{$_} >  2 } keys %k; }
}

sub usage {
    print STDERR <<EndOfHelp;
USAGE: list_compare.pl FILE1 FILE2

This script will compare FILE1 and FILE2, searching for the
contents of FILE1 in FILE2 (and NOT vice versa). FILE1 must
be one search pattern per line; the search pattern need only be
contained within one of the lines of FILE2.

OPTIONS:
    -c : Print patterns COMMON to both files
    -f : Search only the first characters of each line of FILE2
         for the search pattern given in FILE1
    -d : Print duplicate entries
    -m : Print patterns MISSING in FILE2 (default)
    -h : Print this help and exit
EndOfHelp
    exit(0);
}
In your case, you would run it as
list_compare.pl -cf file1.txt file2.txt
The -f option makes it compare only the first word (defined by whitespace) of file2 and greatly speeds things up. To compare the entire line, remove the -f.

Use awk to parse source code

I'm looking to create documentation from source code that I have. I've been looking around and something like awk seems like it will work, but I've had no luck so far. The information is split across two files, file1.c and file2.c.
Note: I've set up an automatic build environment for the program. This detects changes in the source and builds it. I would like to generate a text file containing a list of any variables which have been modified since the last successful build. The script I'm looking for would be a post-build step, and would run after compilation
In file1.c I have a list of function calls (all the same function) that have a string name to identify them such as:
newFunction("THIS_IS_THE_STRING_I_WANT", otherVariables, 0, &iAlsoNeedThis);
newFunction("I_WANT_THIS_STRING_TOO", otherVariable, 0, &iAnotherOneINeed);
etc...
The fourth parameter in each function call is a variable whose value is assigned in file2.c. For example:
iAlsoNeedThis = 25;
iAnotherOneINeed = 42;
etc...
I'm looking to output the list to a txt file in the following format:
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
Is there any way of doing this?
Thanks
Here is a start:
NR==FNR {                      # Only true when we are reading the first file
    split($1, s, "\"")         # Get the string in quotes from the first field
    gsub(/[^a-zA-Z]/, "", $4)  # Remove the non-alpha chars from the fourth field
    m[$4] = s[2]               # Map the variable name to the quoted string
    next
}
$1 in m {                      # Match field one of file2 against field four of file1
    sub(/;/, "")               # Get rid of the ;
    print m[$1], $2, $3        # Print output
}
Saving this as script.awk and running it with your example produces:
$ awk -f script.awk file1 file2
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
Edit:
The modifications you require affect the first line of the script:
NR==FNR && $3=="0," && /start here/,/end here/ {
You can do it in the shell like so:
#!/bin/sh
# define the variables from file2.c by stripping spaces, semicolons, etc.
eval $(sed 's/[^a-zA-Z0-9=]//g' file2.c)

while read -r line; do
    case $line in
    (newFunction*)
        set -- $line
        string=${1#*\"}                        # strip up to the opening quote
        string=${string%%\"*}                  # strip from the closing quote onwards
        while test $# -gt 1; do shift; done    # keep only the last argument
        x=${1#&}                               # drop the leading &
        x=${x%);}                              # drop the trailing );
        eval x=\$$x                            # indirection: fetch that variable's value
        printf '%s = %s\n' $string $x
    esac
done < file1.c
Assumptions: newFunction is at the start of the line, nothing follows the );, and whitespace is exactly as in your samples. Output:
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
You can source file2.c so the variables are defined in bash. Then, you just have to print $iAlsoNeedThis to get the value from iAlsoNeedThis = 25;
It can be done with . file2.c (note this works only if the assignments contain no spaces around =).
Then, what you can do is:
while read line; do
    name=$(echo $line | cut -d"\"" -f2)
    value=$(echo $line | cut -d"&" -f2 | cut -d")" -f1)
    echo $name = ${!value}
done < file1.c
to get the THIS_IS_THE_STRING_I_WANT, I_WANT_THIS_STRING_TOO text.
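Since the sample assignments in file2.c do contain spaces around =, sourcing it directly will fail; stripping the noise first (the same trick used in the sh answer above) makes that step work:
eval "$(sed 's/[^a-zA-Z0-9=]//g' file2.c)"   # turns "iAlsoNeedThis = 25;" into iAlsoNeedThis=25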
