ignore spaces while comparing tokens files - bash

I have a script which reads two files. I have to compare both files and display the mismatching content, e.g.:
content of file1:
abcd
efgh
ijk
content of file2:
abcd=123
efgh=
ijkl=1213
If no match occurs, it should report that the token is not present.
If the name matches but the value for that name is missing in file2, it should report that the value is missing.
E.g. efgh is present in both files, but the value of efgh is not present in file2, so it should report that the matching value is not present.
file="$HOME/SAMPLE/token_values.txt"
while read -r var
do
if grep "$var" environ.ref >/dev/null
then
:
else
print "$var ((((((Not Present))))))" >> final13.txt
fi
done < "$file"

I guess this script would do it:
#!/bin/bash
# remove the blank lines in the first file
fileprocessed1=$( sed '/^$/d' your_file1 )
# remove the blank lines and replace each = with a space in the second file
fileprocessed2=$( sed '/^$/d; s/=/ /g' your_file2 )
paste <(echo "$fileprocessed1") <(echo "$fileprocessed2") | awk '{
    if ($1 == $2) {
        if (length($3) == 0) {
            print NR" : Match found but value missing for "$2
        } else {
            print NR" : Match found for "$1" with value "$3
        }
    } else {
        print NR" : No match for "$1
    }
}'
would give:
1 : Match found for abcd with value 123
2 : Match found but value missing for efgh
3 : No match for ijk
for the files you have given.
But I really hope somebody comes up with a one-liner for this one. :)
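In the spirit of a one-liner, here is a minimal single-pass awk sketch (my addition, not from the thread; it assumes the tokens in the first file never contain = and that hash-order output is acceptable):

awk -F= 'NR==FNR { want[$1]; next }
         ($1 in want) { seen[$1] = ($2 == "" ? "value missing" : "ok") }
         END { for (t in want) print t " : " ((t in seen) ? seen[t] : "not present") }' file1 file2

Unlike the paste version, this does not depend on the two files having their tokens on matching line numbers.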

I would tackle it something like this in Perl:
#!/usr/bin/env perl
use strict;
use warnings;

# note: Perl does not expand ~, so build the path from $ENV{HOME}
open ( my $file1, '<', "$ENV{HOME}/SAMPLE/token_values.txt" ) or die $!;
chomp ( my @tokens = <$file1> );
open ( my $file2, '<', 'environ.ref' ) or die $!;
my %data = map { /(\w+)=(\w*)/ } <$file2>;
for my $thing ( @tokens ) {
    # report tokens that are absent or have an empty value
    print $thing, "\n" if !exists $data{$thing} or $data{$thing} eq '';
}
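With the sample files from the question, this reports the two problem tokens, e.g. (compare_tokens.pl being a hypothetical name for the script above):

$ perl compare_tokens.pl
efgh
ijk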

Related

Retrieve specific values from file

I have a file test.cf containing:
process {
withName : teq {
file = "/path/to/teq-0.20.9.txt"
}
}
process {
withName : cad {
file = "/path/to/cad-4.0.txt"
}
}
process {
withName : sik {
file = "/path/to/sik-20.0.txt"
}
}
I would like to retrieve the value associated with teq, cad and sik at the end of each block.
I was first thinking about something like
grep -E 'teq' test.cf
and getting only the second row, then removing the recurring part of the line.
But it may be easier to do something like:
for a in test.cf
do
line=$(sed -n '{$a}p' test.cf)
if line=teq
#next line using sed -n?
do print nextline &> teq.txt
else if line=cad
do print nextline &> cad.txt
else if line=sik
do print nextline &> sik.txt
done
(obviously it doesn't work)
EDIT:
output wanted:
teq.txt containing teq-0.20.9, cad.txt containing cad-4.0 and sik.txt containing sik-20.0
Is there a good way to do that? Thank you for your comments.
Based on your given sample:
awk '/withName/{close(f); f=$3 ".txt"}
     /file/{sub(/.*\//, ""); sub(/\.txt".*/, ""); print > f}' ip.txt
/withName/{close(f); f=$3 ".txt"}: if the line contains withName, save the filename in f using the third field; close() closes any previously opened file handle.
/file/{sub(/.*\//, ""); sub(/\.txt".*/, "")}: if the line contains file, remove everything except the required value.
print > f: print the modified line, redirected to the filename in f.
if you can have multiple entries, use >> instead of >
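As a quick check, assuming the question's sample is saved as test.cf (the answer used ip.txt as a placeholder name):

$ awk '/withName/{close(f); f=$3 ".txt"}
       /file/{sub(/.*\//, ""); sub(/\.txt".*/, ""); print > f}' test.cf
$ cat teq.txt
teq-0.20.9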
Here is a solution in awk:
awk '/withName/{name=$3} /file =/{print $3 > (name ".txt")}' test.cf
/withName/{name=$3}: when I see the line containing "withName", I save that name
When I see the line containing "file =", I print the third field to that file (the concatenated filename is parenthesized, since an unparenthesized expression after > is not portable across awk implementations).
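One caveat (my observation, not from the original answer): in a line like file = "/path/to/teq-0.20.9.txt", $3 is the whole quoted path, so this variant writes "/path/to/teq-0.20.9.txt" into teq.txt rather than the bare teq-0.20.9; the sub() calls from the previous answer are still needed to trim the value.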

Find pattern in line and find last word, if matches write line and previous one to file

Looking for lines which start with "DESC:": find the last word, and if it matches, write the line and the previous one to another file.
I read Replacing the last word of a line only if match string found, but unfortunately it did not get me there.
I have a long text file in which 2 consecutive lines belong together: one with the file path and the next one with a description, e.g.:
PATH: /all movies/DE/0051.mkv
DESC: Bloodshot German
PATH: /all movies/DE/0052.mkv
DESC: Birds of Prey German
PATH: /all movies/EN/0074.mkv
DESC: Army of One English
So actually: if the last word matches in a line which starts with "DESC:", then write that line and the previous one to another file.
I currently use a 'while read' loop, but that is very slow.
DIR="c:/all movies/"
FILE1="${DIR}/movies_GE.txt"; echo "MOVIES GERMAN" > ${FILE1)
FILE2="${DIR}/movies_EN.txt"; echo "MOVIES ENGLISH" > ${FILE2)
while read LINE1; do
if [[ ${LINE1:0:4} = "PATH:" ]]; then
read LINE2
if [[ ${LINE2:0:4} = "DESC:" ]]; then
LASTWORD=`awk '{print $NF}' <<< ${LINE2}`
if grep -iq "German" <<< ${LASTWORD}; then echo ${LINE1} >> ${FILE1}; echo ${LINE2} >> ${FILE1}; fi
if grep -iq "English" <<< ${LASTWORD}; then echo ${LINE1} >> ${FILE2}; echo ${LINE2} >> ${FILE2}; fi
fi
fi
done < ${DIR}/all movies/movies_ALL.txt
Is there a (much) better/faster solution, e.g. with sed?
I tried:
sed -ir '/^"DESC:":.*/s/^(.* )German$//g' ${FILE1}
sed -ir '/^"DESC:":.*/s/^(.* )English$//g' ${FILE2}
awk -v dir=/some/dir/ '
BEGIN{
    # provide mapping: last word -> filename to write to
    # (dir is concatenated directly, so it needs the trailing slash)
    files["English"] = dir "movies_EN.txt"
    files["German"]  = dir "movies_DE.txt"
}
# remember the path line
/^PATH: /{path=$0}
# on a description line
/^DESC: /{
    # extract the last word
    w=$0; gsub(/ *$/, "", w); gsub(/.* /, "", w);
    # write to one of the files, if a mapping exists
    if (w in files) {
        # trailing \n added so successive pairs do not run together
        printf "%s\n%s\n", $0, path >> files[w]
    }
}'
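A hypothetical run, with by_lang.awk standing in for the program body above and movies_ALL.txt for the input file:

$ awk -v dir=./ -f by_lang.awk movies_ALL.txt
$ cat movies_DE.txt
DESC: Bloodshot German
PATH: /all movies/DE/0051.mkv
DESC: Birds of Prey German
PATH: /all movies/DE/0052.mkv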
This might work for you (GNU sed):
sed -ne 'N;/\nDESC/{/.*German$/w file1' -e '/.*English$/w file2' -e '};D' file
Turn off implicit printing -n.
Maintain a two line window.
If the second line begins with DESC and the last word is either German or English, write the two-line window to file1 or file2 respectively.
Delete the first line and repeat.
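Checked against the question's sample (assuming it is saved as movies_ALL.txt), file2 ends up with the English pair, written as the whole pattern space:

$ sed -ne 'N;/\nDESC/{/.*German$/w file1' -e '/.*English$/w file2' -e '};D' movies_ALL.txt
$ cat file2
PATH: /all movies/EN/0074.mkv
DESC: Army of One English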
Awk would be a better candidate for this:
awk '/^PATH/ { path=$0 } /^DESC/ && /English$/ { printf "%s\n%s\n",$0,path }' file > newfile
If the line starts with "PATH", set the variable path to the line ($0). Then, when a line starts with "DESC" and ends with "English", print that line and path to newfile.
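For example, against the sample data (note that this prints the DESC line before its PATH line):

$ awk '/^PATH/ { path=$0 } /^DESC/ && /English$/ { printf "%s\n%s\n",$0,path }' movies_ALL.txt
DESC: Army of One English
PATH: /all movies/EN/0074.mkv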

grep between two lines with specified string

I have this simple flat file (file.txt):
a43
test1
abc
cvb
bnm
test2
test1
def
ijk
xyz
test2
kfo
I need all lines between test1 and test2, in two forms. The first form creates two new files, like
newfile1.txt :
test1
abc
cvb
bnm
test2
newfile2.txt
test1
def
ijk
xyz
test2
and the second form creates only one new file, like:
newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2
Do you have any suggestions?
EDIT
For the second form, I used this:
sed -n '/test1/,/test2/p' file.txt > newfile.txt
But it gives me a result like
test1abccvbbnmtest2test1defijkxyztest2
I need a line break, like:
test1abccvbbnmtest2
test1defijkxyztest2
You can use this awk:
awk -v fn="newfile.txt" '/test1/ {
f="newfile" ++n ".txt";
s=1
} s {
print > f;
printf "%s", $0 > fn
} /test2/ {
close(f);
print "" > fn;
s=0
} END {
close(fn)
}' file
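Run against the sample file.txt, this produces both requested forms at once:

$ cat newfile1.txt
test1
abc
cvb
bnm
test2
$ cat newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2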
Perl, like sed and other languages, has the ability to select ranges of lines from a file, so it's a good fit for what you're trying to do.
This solution ended up being a lot more complicated than I thought it would be. I see no good reason to use it over @anubhava's awk solution. But I wrote it, so here it is:
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;
use IO::Handle;   # for the ->say method on lexical filehandles
use constant {
    RANGE_START  => qr/\Atest1\z/,
    RANGE_END    => qr/\Atest2\z/,
    SUMMARY_FILE => 'newfile.txt',
    GROUP_FILE   => 'newfile%d.txt',
};
my $n = 1;   # starting number of group file
my @wg;      # storage for "working group" of lines
# Open summary file to write to.
open(my $sfh, '>', SUMMARY_FILE) or die $!;
while (my $line = <>) {
    chomp $line;
    # If the line is within the range, add it to our working group.
    push @wg, $line if $line =~ RANGE_START .. $line =~ RANGE_END;
    if ($line =~ RANGE_END) {
        # We are at the end of a group, so summarize it and write it out.
        unless (@wg > 2) {
            # Discard any partial or empty groups.
            @wg = ();
            next;
        }
        # Write a line to the summary file.
        $sfh->say(join '', @wg);
        # Write out all lines to the group file.
        my $group_file = sprintf(GROUP_FILE, $n);
        open(my $gfh, '>', $group_file) or die $!;
        $gfh->say(join "\n", @wg);
        close($gfh);
        printf STDERR "WROTE %s with %d lines\n", $group_file, scalar @wg;
        # Get ready for the next group.
        $n++;
        @wg = ();
    }
}
close($sfh);
printf STDERR "WROTE %s with %d groups\n", SUMMARY_FILE, $n - 1;
To use it, write the above lines into a file named e.g. ranges.pl, and make it executable with chmod +x ranges.pl. Then:
$ ./ranges.pl file.txt
WROTE newfile1.txt with 5 lines
WROTE newfile2.txt with 5 lines
WROTE newfile.txt with 2 groups
$ cat newfile1.txt
test1
abc
cvb
bnm
test2
$ cat newfile.txt
test1abccvbbnmtest2
test1defijkxyztest2
For the second form you can add a newline after "test2" by appending \n:
sed -n '/test1/,/test2/p' file.txt | sed -e 's/test2/test2\n/g' > newfile.txt
sed is not well suited to creating multiple files, so for the first form you should look for another solution.

How to delete everything between two :'s, but not if between {}'s? [duplicate]

This question already has an answer here:
How to delete a pattern when it is not found between two symbols in Perl?
(1 answer)
Closed 8 years ago.
I have a text file like this:
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
I need to delete any text found inside any :'s, including the :'s, but not if they fall inside a pair of { and }.
Anything between a { and } is safe, including :'s.
Anything not between a { and } but found between : and : is deleted.
The :'s found outside { and } are all deleted.
The output would look like this:
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .
There is only one set of braces per line.
The paired braces are never split across lines.
There could be any number of :'s on the line, inside or outside the braces.
:'s always come in pairs.
How can I delete everything between colons, including the colons themselves, but not when protected by braces?
My best attempt so far is to use awk -F"{" '{ print $1 }' > file1.txt, awk -F"{" '{ print $2 }' > file2.txt, etc. to split the lines around the braces into different files, run sed on the specific files to remove the parts (but not on the files containing the data inside the braces), and then reassemble everything with paste; but this solution is far too complicated.
This will do as you ask
use strict;
use warnings;
my $data = do {
    local $/;
    <DATA>;
};
my @parts = split m/ ( \{ [^{}]* \} ) /x, $data;
for (@parts) {
    s/ : [^:]* : //gx unless /^\{/;
}
print @parts, "\n";
__DATA__
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
output
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .
This is simple; try the following:
perl -pe 's/({[^{}]*})|:[^:]*:/$1/g' file
All text inside { } is captured in $1 and put back, so it is skipped. :)
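To verify, assuming the sample text is saved as file:

$ perl -pe 's/({[^{}]*})|:[^:]*:/$1/g' file

This reproduces the expected output shown in the question, except that a doubled space remains where a colon-delimited span is removed mid-sentence.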
In Perl:
#!/usr/bin/env perl
while (<>) {
    my @chars = split //;
    foreach my $c (@chars) {
        if ($c eq "{" .. $c eq "}") {
            print "$c";
        } elsif ($c eq ":" ... $c eq ":") {
            # skip characters inside a colon pair
        } else {
            print "$c";
        }
    }
}
or put more succinctly:
while (<>) {
print grep {/\{/ .. /\}/ or not /:/ ... /:/} split //;
}
Counting braces and colons:
perl -ne '
    $b = $c = 0;
    for $char (split //) {
        $b++ if $char eq "{";
        $b-- if $char eq "}";
        if ($b > 0) {
            print $char;
        }
        else {
            if ($c == 0 and $char eq ":") {
                $c++;
            }
            else {
                print $char if $c == 0;
                $c-- if $c == 1 and $char eq ":";
            }
        }
    }
' <<END
This is {an example} of : some of the: text.
This is yet {another : example :} of some of the text.
:This: is :still :yet another {:example:} of :some text:.
END
This is {an example} of text.
This is yet {another : example :} of some of the text.
is yet another {:example:} of .

How to remove common lines between two files without sorting? [duplicate]

This question already has answers here:
Compare 2 files and remove any lines in file2 when they match values found in file1
(4 answers)
Closed 8 years ago.
I have two files, not sorted, which have some lines in common.
file1.txt
Z
B
A
H
L
file2.txt
S
L
W
Q
A
The way I'm using to remove common lines is the following:
sort -u file1.txt > file1_sorted.txt
sort -u file2.txt > file2_sorted.txt
comm -23 file1_sorted.txt file2_sorted.txt > file_final.txt
Output:
B
H
Z
The problem is that I want to keep the order of file1.txt, I mean:
Desired output:
Z
B
H
One solution I thought of is doing a loop to read all the lines of file2.txt and:
sed -i '/^${line_file2}$/d' file1.txt
But if the files are big, the performance may suffer.
Do you like my idea?
Do you have any alternative to do it?
You can use just grep (-v for invert, -f for file). Grep lines from input1 that do not match any line in input2:
grep -vf input2 input1
Gives:
Z
B
H
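One caveat worth adding (my note, not part of the original answer): with plain -f, each line of input2 is treated as a regex and matched anywhere within a line of input1, so e.g. a pattern A would also remove a line BAG. For a literal whole-line comparison, the safer form is:

grep -vxFf input2 input1

where -F takes the patterns as fixed strings and -x requires them to match the whole line.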
grep or awk:
awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2 file1
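How that works (my gloss): NR==FNR is true only while the first file argument (file2) is being read, so a[$0]=1 records each of its lines; while reading file1, the bare pattern !a[$0] prints only lines that were never recorded, preserving file1's order. With the question's files:

$ awk 'NR==FNR{a[$0]=1;next}!a[$0]' file2.txt file1.txt
Z
B
H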
I've written a little Perl script that I use for this kind of thing. It can do more than what you ask for but it can also do what you need:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts;
getopts('hvfcmdk:', \%opts);
my $missing = $opts{m} || undef;
my $column  = $opts{k} || undef;
my $common  = $opts{c} || undef;
my $verbose = $opts{v} || undef;
my $fast    = $opts{f} || undef;
my $dupes   = $opts{d} || undef;
$missing = 1 unless $common || $dupes;
&usage() unless $ARGV[1];
&usage() if $opts{h};
my (%found, %k, %fields);
if ($column) {
    die("The -k option only works in fast (-f) mode\n") unless $fast;
    $column--;    ## So I don't need to count from 0
}
open(my $F1, "$ARGV[0]") || die("Cannot open $ARGV[0]: $!\n");
while (<$F1>) {
    chomp;
    if ($fast) {
        my @aa = split(/\s+/, $_);
        $k{$aa[0]}++;
        $found{$aa[0]}++;
    }
    else {
        $k{$_}++;
        $found{$_}++;
    }
}
close($F1);
my $n = 0;
open(F2, "$ARGV[1]") || die("Cannot open $ARGV[1]: $!\n");
my $size = 0;
if ($verbose) {
    while (<F2>) {
        $size++;
    }
}
close(F2);
open(F2, "$ARGV[1]") || die("Cannot open $ARGV[1]: $!\n");
while (<F2>) {
    next if /^\s+$/;
    $n++;
    chomp;
    print STDERR "." if $verbose && $n % 10 == 0;
    print STDERR "[$n of $size lines]\n" if $verbose && $n % 800 == 0;
    if ($fast) {
        my @aa = split(/\s+/, $_);
        $k{$aa[0]}++ if defined($k{$aa[0]});
        $fields{$aa[0]} = \@aa if $column;
    }
    else {
        foreach my $key (keys(%found)) {
            if (/\Q$key/) {
                $k{$key}++;
                $found{$key} = undef unless $dupes;
            }
        }
    }
}
close(F2);
print STDERR "[$n of $size lines]\n" if $verbose;
if ($column) {
    $missing && do map { my @aa = @{$fields{$_}}; print "$aa[$column]\n" unless $k{$_} > 1 } keys(%k);
    $common  && do map { my @aa = @{$fields{$_}}; print "$aa[$column]\n" if $k{$_} > 1 } keys(%k);
    $dupes   && do map { my @aa = @{$fields{$_}}; print "$aa[$column]\n" if $k{$_} > 2 } keys(%k);
}
else {
    $missing && do map { print "$_\n" unless $k{$_} > 1 } keys(%k);
    $common  && do map { print "$_\n" if $k{$_} > 1 } keys(%k);
    $dupes   && do map { print "$_\n" if $k{$_} > 2 } keys(%k);
}
sub usage {
    print STDERR <<EndOfHelp;
USAGE: compare_lists.pl FILE1 FILE2
This script will compare FILE1 and FILE2, searching for the
contents of FILE1 in FILE2 (and NOT vice versa). FILE1 must
be one search pattern per line; the search pattern need only be
contained within one of the lines of FILE2.
OPTIONS:
  -c : Print patterns COMMON to both files
  -f : Search only the first characters of each line of FILE2
       for the search pattern given in FILE1
  -d : Print duplicate entries
  -m : Print patterns MISSING in FILE2 (default)
  -h : Print this help and exit
EndOfHelp
    exit(0);
}
In your case, you would run it as
compare_lists.pl -mf file1.txt file2.txt
(-m, the default, prints the lines of file1.txt that are missing from file2.txt.) The -f option makes it compare only the first word (delimited by whitespace) of each line and greatly speeds things up. To compare the entire line, remove the -f.
