Looking for Search Algorithm Name - algorithm

I assume there is a name for what I describe here.
Basically, if I search for "word1 word2 word3" (without quotes) and I have this array:
["word1 word2",
"word1 word2 word3",
"word3 word2 word1",
"word2 word3 word1",
"word1 word3 word2 word4",
"word1 word4 word3",
"word4 word1 word2 word3"]
It should return these found results:
word1 word2 word3
word3 word2 word1
word2 word3 word1
word1 word3 word2 word4
word4 word1 word2 word3
Is there any name for such an algorithm?

The description would be:
"Search for all strings that contain all of the following words, in any permutation". So maybe it should be called "Permutation Search": http://www.keyworddiscovery.com/feature-permutation-search.html

If you also allow "word1 word4 word2 word3" to be returned, it would be called 'keyword-based search' or 'full-text search', with the restriction that the searched text must contain all of the keywords (and not only a subset).

What you are doing is basically
Search : search-set{1,2,3}
In :
sample-space-set{
set{1,2,3}
set{1,2,3,4}
set{2,3,4,5}
}
Result:
result-set{
set{1,2,3}
set{1,2,3,4}
}
Which could be put more concisely as:
Find every set in the sample-space-set of which the search-set is a subset.
So basically, the name of the algorithm could be
'find all the mother-of-subsets'
(I really don't know what the reverse of the subset relationship is called. If you know, do let us all know.)
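The set-based description above can be sketched in a few lines of Python. This is only an illustration, not a named algorithm: the function name `contains_all_words` is made up for this example, and it assumes words are separated by whitespace and compared case-sensitively. The reverse of "subset" is usually called "superset", so the check is whether each candidate's word set is a superset of the query's word set.

```python
def contains_all_words(query: str, candidates: list[str]) -> list[str]:
    """Return the candidates whose word set is a superset of the query's word set."""
    wanted = set(query.split())
    # `wanted <= other` is Python's subset test: every wanted word is in other.
    return [text for text in candidates if wanted <= set(text.split())]


# The array from the question.
candidates = [
    "word1 word2",
    "word1 word2 word3",
    "word3 word2 word1",
    "word2 word3 word1",
    "word1 word3 word2 word4",
    "word1 word4 word3",
    "word4 word1 word2 word3",
]

matches = contains_all_words("word1 word2 word3", candidates)
for match in matches:
    print(match)
```

Word order and extra words are ignored, so "word1 word2" and "word1 word4 word3" are dropped (each is missing a query word) while the other five strings are kept.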

Related

windows cmd syntax remove first word in text file on every line

Hi, I want the following behavior in a batch script on Windows 2012 or later.
scenario:
Example input.txt:
Word1 Word2 Word3.......Wordn
Word1 Word2 Word3.......Wordn
Word1 Word2 Word3.......Wordn
Word1 Word2 Word3.......Wordn
Example output.txt:
Word2 Word3 ........Wordn
Word2 Word3 ........Wordn
Word2 Word3 ........Wordn
Word2 Word3 ........Wordn
Used syntax:
@echo off
(for /f "tokens=1,* usebackq" %%a in ("input.txt") do @echo %%b)>"output.txt"
type output.txt gives just rubbish:
%b
%b
%b
%b
Tested without @, but it made no difference. I checked many examples; apparently this should have worked in older Windows versions.

Solr Proximity search

I need to implement search result ranking as described at the link below.
https://pastebin.com/abuwJxQp
Example 1:
Word1 Word2 Word3
Doc1 Word1
Doc2 Word 1 Word3
Doc3 Word 1 Word2 Word3
Doc4 Word 2 Word 3
Doc5 Word2 Word1 Word3
------------>
Doc3
Doc5
Doc4
Doc2
Doc1
Example 2:
Word1 Word2 Word3
Doc1 (Title) Word1 Word2 Word3
Doc2 (Abstract) Word1 Word3 Word3
Doc3 (AuthorName) Word1 Word2 Word3
Doc4 (JournalName) Word1 Word2 Word3
Doc5 (PublisherName) Word1 Word2 Word3
------------>
Doc1
Doc2
?
Example 3:
Word1 Word2 Word3
Doc1 Word1
Doc2 Word 1 Word3
Doc3 Word 1 Word2 Word3
Doc4 Word 2 Word 3
Doc5 Word2 Word1 Word3
Doc6 Word1 Word1 Word1 Word1 Word1 Word1 Word1 Word1 Word1 Word1 Word1 Word1
Doc7 Word1 Word2 Word1 Word2 Word1 Word2 Word1 Word2 Word1 Word2 Word1 Word2 Word1 Word2 Word1 Word2
------------>
Doc3
Doc5
Doc4
Doc2
Doc6?
Doc1
You might want to look into "distance" measures; see the answers to this question: "Edit Distance Similarity in Lucene/Solr".

How to call a variable declared in another perl file?

I have script1.pl and script2.pl. I want script2.pl to be able to read the value of $string from script1.pl.
script1.pl
$string="word1 word2 word3 word4 word5 word6 word7 word8 word9";
$cmd="perl \"My\\File\\Path\\script2.pl\"";
system ($cmd);
script2.pl
print $string;
Note: I'm using perl for Windows.
Best practice is to use a module; see perlmod.
In your case, you can use require. Make sure that the required file returns a true value by ending it with 1;.
script1.pl:
#!/usr/bin/perl
use warnings;
use strict;
our $string = "word1 word2 word3 word4 word5 word6 word7 word8 word9";
our $cmd = "perl \"My\\File\\Path\\script2.pl\"";
system ($cmd);
1;
script2.pl:
#!/usr/bin/perl
use strict;
use warnings;
use vars qw($string);
require "script1.pl";
print $string, "\n";
Output:
word1 word2 word3 word4 word5 word6 word7 word8 word9
While you can make that work, you're much better off passing in the variable as command line arguments or, if there's a lot of data, via STDIN.
# script1.pl
my $cmd = qq[$^X "My\\File\\Path\\script2.pl"];
my @words = qw[word1 word2 word3 word4 word5 word6 word7 word8 word9];
system $cmd, @words;
# script2.pl
print join ", ", @ARGV;
This doesn't scale well. You're better off rewriting script2.pl as a library and calling a function.
# mylibrary.pl
sub print_stuff {
    print join ", ", @_;
}
# script1.pl
require 'mylibrary.pl';
print_stuff(qw[word1 word2 word3 word4 word5 word6 word7 word8 word9]);
For a handful of functions this will work fine. Eventually you'll want to look into writing modules.

Replacing a string in a Table with another string

I'm trying to solve a problem using the sed command.
I have a table with data (a few rows and columns).
I want to be able to replace the string at position i,j with a new string.
For example:
word1 word2 word3 word4
word5 word6 word7 word8
word9 word10 word11 word12
given the input 1,1 (zero-based row and column) and abc, it should return
word1 word2 word3 word4
word5 abc word7 word8
word9 word10 word11 word12
And if possible, print it to a new file.
Thanks
Using awk might be easier:
awk -v c=1 -v r=1 -v w='abc' 'NR==r+1{$(c+1)=w}1' file
word1 word2 word3 word4
word5 abc word7 word8
word9 word10 word11 word12
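For comparison, the same cell replacement can be sketched in Python. This is a hedged illustration rather than a drop-in replacement for the awk one-liner: the function name `replace_cell` is hypothetical, fields are assumed to be separated by whitespace, and row and column are zero-based as in the question's example (1,1 targets word6).

```python
def replace_cell(lines, row, col, new_value):
    """Return a copy of lines with the whitespace-separated field at (row, col) replaced.

    row and col are zero-based. Only the modified line is rebuilt, joined
    with single spaces; all other lines are passed through unchanged.
    """
    result = []
    for index, line in enumerate(lines):
        fields = line.split()
        if index == row and col < len(fields):
            fields[col] = new_value
            result.append(" ".join(fields))
        else:
            result.append(line)
    return result


table = [
    "word1 word2 word3 word4",
    "word5 word6 word7 word8",
    "word9 word10 word11 word12",
]

updated = replace_cell(table, 1, 1, "abc")
print("\n".join(updated))
```

To save the result to a new file, as the question asks, the joined text could be written out with the standard `open(path, "w")` call. With the awk command itself, redirecting its output (`> newfile`) does the same job.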

commenting out lines with matching string

Looking for a simple shell script (with sed or awk) to comment out lines of a text file if a string exists in the line(s). As an example, a text file with the following:
line1 word1 word2
line2 word3 word4
line3 word5 word6
line4 word1 word7
line5 word10 word11
To be changed to:
#line1 word1 word2
line2 word3 word4
line3 word5 word6
#line4 word1 word7
line5 word10 word11
As you see, only the lines with the string "word1" are commented out.
I believe this will do it for you.
sed -i .backup "/[[:<:]]word1[[:>:]]/s/^/#/g" file
I think your question is similar to "How do I add a comment (#) in front of a line after a key word search match".
Please correct me if I am wrong. I hope this helps.
Try this:
$ sed -e '/[[:<:]]word1[[:>:]]/ s/^/# /' < file
# line1 word1 word2
line2 word3 word4
line3 word5 word6
# line4 word1 word7
line5 word10 word11
How does this work? The sed man page says,
The form of a sed command is as follows:
[address[,address]]function[arguments]
Later in the man page, it clarifies that an address can be a regular expression, which causes the function to be applied to each line matching the regular expression. So what the command given above does is, if the line contains the standalone word word1, apply the substitution function to replace the beginning-of-line anchor with "# ".
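The address-plus-substitution idea can also be sketched in Python, for readers whose sed does not support the `[[:<:]]` and `[[:>:]]` word-boundary classes (they are a BSD-style extension; GNU sed spells word boundaries differently). The function name `comment_out` is made up for this example, and `\b` from Python's `re` module stands in for the word-boundary match.

```python
import re


def comment_out(lines, word, prefix="#"):
    """Prefix every line that contains `word` as a standalone word."""
    # \b matches at a word boundary, so "word1" does not match inside "word10".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [prefix + line if pattern.search(line) else line for line in lines]


lines = [
    "line1 word1 word2",
    "line2 word3 word4",
    "line3 word5 word6",
    "line4 word1 word7",
    "line5 word10 word11",
]

commented = comment_out(lines, "word1")
print("\n".join(commented))
```

Only line1 and line4 get the prefix; line5 is left alone because word10 and word11 merely start with "word1".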
