Elasticsearch - querying documents where the intersection of two arrays is nonempty

I have a document structure as follows:
{
"documentId": 123,
"someOtherInfo": {...},
"permissions": ["a", "b", ..., "g"]
}
Users themselves have a permission set ["x", "y", "z"]. Business rule: user A is allowed to view document X if and only if at least one of the user's permissions matches one of the document's permissions. Or, put mathematically, if the intersection is nonempty:
["a", "b", ..., "g"] ∩ ["x", "y", "z"] ≠ ∅
I am building a search engine that needs to find all documents a user has access to. I want to store the documents in Elasticsearch for all the great querying capabilities it provides, but how do I add a restriction for permissions using the ES DSL? Many thanks.

You need a terms query, which accepts an array of values to match against the field. It matches documents containing any of the provided terms. As an example, the following will match a document with permissions = ["a", "b", "c"] but not one with permissions = ["a", "t", "c"]:
{
  "query": {
    "terms": {
      "permissions": [
        "x",
        "y",
        "z",
        "b"
      ]
    }
  }
}
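As a rough sketch of how this query body might be built per user, the Ruby snippet below assembles the same terms query from a permission list and prints it as JSON. The helper names `permissions_query` and `visible?` are hypothetical, and no Elasticsearch client is assumed. `visible?` only mirrors the intended intersection rule locally; it is not how Elasticsearch evaluates the query. The terms query compares exact terms, so this also assumes `permissions` is mapped as a keyword (non-analyzed) field.

```ruby
require 'json'

# Hypothetical helper: builds the body of a terms query that matches
# documents whose "permissions" array shares at least one value with
# the user's permission list.
def permissions_query(user_permissions)
  { query: { terms: { permissions: user_permissions } } }
end

# Local mirror of the business rule (non-empty intersection), handy for
# sanity-checking expectations. Elasticsearch itself is not involved here.
def visible?(document_permissions, user_permissions)
  !(document_permissions & user_permissions).empty?
end

user_permissions = %w[x y z b]
puts JSON.pretty_generate(permissions_query(user_permissions))

visible?(%w[a b c], user_permissions) # true, "b" is shared
visible?(%w[a t c], user_permissions) # false, nothing is shared
```

The printed body can then be sent with whatever HTTP or Elasticsearch client the application already uses.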

Related

How can we reverse the keys and values of hash with an efficient algorithm like this?

Suppose you are implementing the Like function of a chat app.
You want to store in the db the names of all the people who liked a specific comment.
For memory reasons, you don't want to also store which comments a specific person has liked so far.
So what you want to do is:
# comment-id1: (user1, user2, user3)
# comment-id2: (user2)
# comment-id3: (user1)
# You want to convert the above into the below with some code
# user1: (comment-id1, comment-id3)
# user2: (comment-id1, comment-id2)
# user3: (comment-id1)
Is there any efficient way to achieve this?
EDIT:
Someone commented that the above example cannot be done efficiently.
How about this?
data = {
  "field_a": {
    "name": "index_a",
    "used_fields": ["a", "b", "c"]
  },
  "field_b": {
    "name": "index_b",
    "used_fields": ["d", "b", "d"]
  },
  "field_c": {
    "name": "index_c",
    "used_fields": ["a"]
  }
}
# you want to convert the above into the below
# a. index_a, index_c
# b. index_a, index_b
# c. index_a
# d. index_b
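One possible way to do this conversion in Ruby is a single pass that appends each key to the list of every value it mentions, which is linear in the total number of (key, value) pairs. This is only a sketch: the method name `invert_multimap` and the sample hashes are made up for illustration, and string keys are assumed.

```ruby
# Hypothetical helper: turns { key => [values] } into { value => [keys] }.
# Each (key, value) pair is visited once; duplicate values under the same
# key are collapsed with uniq.
def invert_multimap(hash)
  hash.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(key, values), inverted|
    values.uniq.each { |value| inverted[value] << key }
  end
end

likes = {
  "comment-id1" => %w[user1 user2 user3],
  "comment-id2" => %w[user2],
  "comment-id3" => %w[user1]
}
liked_by_user = invert_multimap(likes)
# user1 -> comment-id1, comment-id3
# user2 -> comment-id1, comment-id2
# user3 -> comment-id1

# The second example, reduced to { index name => used fields } first.
data = {
  "field_a" => { "name" => "index_a", "used_fields" => %w[a b c] },
  "field_b" => { "name" => "index_b", "used_fields" => %w[d b d] },
  "field_c" => { "name" => "index_c", "used_fields" => %w[a] }
}
fields_by_index = data.values.to_h { |f| [f["name"], f["used_fields"]] }
indexes_by_field = invert_multimap(fields_by_index)
# a -> index_a, index_c; b -> index_a, index_b; c -> index_a; d -> index_b
```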

Elasticsearch filter by whitelist

I have a list of string values which is used as a whitelist.
For example:
{
"whitelist": ["a", "b", "c"]
}
In Elasticsearch I have many documents, each of which contains an array of strings, like:
doc_1
{
"values": ["a", "y", "b"]
}
doc_2
{
"values": ["a", "c"]
}
I want to filter out all the documents that contain at least one value that is not contained in the whitelist.
In the example above, the requested result is doc_2, as it doesn't contain any value which is not in the whitelist, while doc_1 does contain value "y".
Is there any way to do this without external code outside of Elasticsearch?

Finding dictionary words within a source text, using Ruby

Using Ruby, I need to output a list of words, found in a dictionary, that can be formed by eliminating letters from a source text.
E.g., if I input the source text "crazed", I want to get not only words like "craze" and "razed", whose letters are in the same order AND adjacent to each other within the source text, but ALSO words like "rad" and "red", because those words exist and can be found by eliminating select letters from "crazed" while the output words retain letter order. BUT words like "dare" or "race" should not be in the output list, because the order of the letters in "dare" or "race" is not the same as the order of those letters in "crazed". (If "raed" or "crae" were words in the dictionary, they WOULD be part of the output.)
My thought was to go through the source text in a binary manner
(for "crazed", we'd get:
000001 = "d";
000010 = "e";
000011 = "ed";
000100 = "z";
000101 = "zd";
000111 = "zed";
001000 = "a";
001001 = "ad"; etc.)
and compare each result with words in a dictionary, though I don't know how to code that, nor whether that is most efficient. This is where I would greatly benefit from your help.
Also, the length of the source text would be variable; it wouldn't necessarily be six letters long (like "crazed"). Inputs would potentially be much larger (20-30 characters, possibly more).
I've searched here and found questions about anagrams and about words that can be in any letter order, but not specifically what I'm looking for. Is this even possible in Ruby? Thank you.

First let's read the words of a dictionary into an array, after chomping, downcasing and removing duplicates (if, for example, the dictionary contains both "A" and "a", as does the dictionary on my Mac that I've used below).
DICTIONARY = File.readlines("/usr/share/dict/words").map { |w| w.chomp.downcase }.uniq
#=> ["a", "aa", "aal", "aalii",..., "zyzomys", "zyzzogeton"]
DICTIONARY.size
#=> 234371
The following method generates all combinations of one or more characters of a given word, respecting order, and for each, joins the characters to form a string, checks to see if the string is in the dictionary, and if it is, saves the string to an array.
To check if a string matches a word in the dictionary I perform a binary search, using the method Array#bsearch. This makes use of the fact that the dictionary is already sorted in alphabetical order.
def subwords(word)
  arr = word.chars
  (1..word.size).each.with_object([]) do |n,a|
    arr.combination(n).each do |comb|
      w = comb.join
      a << w if DICTIONARY.bsearch { |dw| w <=> dw }
    end
  end
end
subwords "crazed"
# => ["c", "r", "a", "z", "e", "d",
# "ca", "ce", "ra", "re", "ae", "ad", "ed",
# "cad", "rad", "red", "zed",
# "raze", "craze", "crazed"]
Yes, that particular dictionary contains all those strings (such as "z") that don't appear to be English words.
Another example.
subwords "importance"
#=> ["i", "m", "p", "o", "r", "t", "a", "n", "c", "e",
# "io", "it", "in", "ie", "mo", "mr", "ma", "me", "po", "pa", "or",
# "on", "oe", "ra", "re", "ta", "te", "an", "ae", "ne", "ce",
# "imp", "ima", "ion", "ira", "ire", "ita", "ian", "ice", "mor", "mot",
# "mon", "moe", "man", "mac", "mae", "pot", "poa", "pon", "poe", "pan",
# "pac", "ort", "ora", "orc", "ore", "one", "ran", "tan", "tae", "ace",
# "iota", "ione", "iran", "mort", "mora", "morn", "more", "mote",
# "moan", "mone", "mane", "mace", "port", "pore", "pote", "pone",
# "pane", "pace", "once", "rane", "race", "tane",
# "impot", "moran", "morne", "porta", "ponce", "rance",
# "import", "impone", "impane", "prance",
# "portance",
# "importance"]
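For the longer inputs mentioned in the question (20-30 characters), the number of combinations grows as 2^n - 1, which is roughly a billion candidates at n = 30. A different technique, not the one used above, is to walk the dictionary once and test each word for being an order-preserving subsequence of the source text. The sketch below assumes that approach; `subsequence?`, `subwords_by_scan` and the tiny sample word list are hypothetical names introduced for illustration.

```ruby
# True if `word` can be formed by deleting letters from `source` while
# keeping the remaining letters in their original order.
def subsequence?(word, source)
  pos = 0
  word.each_char do |ch|
    idx = source.index(ch, pos) # next occurrence of ch at or after pos
    return false if idx.nil?
    pos = idx + 1
  end
  true
end

# Scans the whole word list once; cost is roughly
# (number of words) x (length of the source text).
def subwords_by_scan(source, dictionary)
  dictionary.select do |w|
    !w.empty? && w.size <= source.size && subsequence?(w, source)
  end
end

sample_words = %w[craze razed rad red dare race zed]
found = subwords_by_scan("crazed", sample_words)
# found contains craze, razed, rad, red and zed, but not dare or race
```

With this layout the running time depends on the dictionary size rather than on 2^n, so a 30-character source costs about the same per word as a 6-character one.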

A more extensive solution, which also finds words formed from the letters in any order, is below. The catch with using combination alone to find possible subwords is that the permutations of each combination are missed. E.g., drawing from 'importance', the combination 'mpa' will arise at some point. Since this isn't a dictionary word, it is skipped, which costs us its permutation 'map', a dictionary subword of 'importance'. The solution below finds more possible dictionary words. I agree that my method can be optimized for speed.
#steps
#split string at ''
#find combinations for n=2 all the way to n=word.size
#for each combination
#find the permutations of all the arrangements
#then
#join the array
#check to see if word is in dictionary
#and it's not already collected
#if it is, add to collecting array
require 'set'
Dictionary = File.readlines('dictionary.txt').map(&:chomp).to_set
Dictionary.size # 39501
def subwords(word)
  # split string at ''
  arr = word.split('')
  # excluding single-letter words
  # you can change 2 to 1 in the line below to select single-letter words too
  (2..word.size).each_with_object([]) do |n, a|
    # find combinations for n=2 all the way to n=word.size
    arr.combination(n).each do |comb|
      # for each combination
      # find the permutations of all the arrangements
      comb.permutation(n).each do |perm|
        # join the array
        w = perm.join
        # check to see if word is in dictionary and it's not already collected
        if Dictionary.include?(w) && !a.include?(w)
          # if it is, add to collecting array
          a << w
        end
      end
    end
  end
end
p subwords('crazed')
#["car", "arc", "rec", "ace", "cad", "are", "era", "ear", "rad", "red", "adz", "zed", "czar", "care", "race", "acre", "card", "dace", "raze", "read", "dare", "dear", "adze", "daze", "craze", "cadre", "cedar", "crazed"]
p subwords('battle')
#["bat", "tab", "alb", "lab", "bet", "tat", "ate", "tea", "eat", "eta", "ale", "lea", "let", "bate", "beat", "beta", "abet", "bale", "able", "belt", "teat", "tale", "teal", "late", "bleat", "table", "latte", "battle", "tablet"]
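On the speed point: one way the any-order variant could be optimized is to skip permutations entirely and instead check, for each dictionary word, whether the source text contains enough of every letter. The sketch below assumes that letter-counting approach, which is a different technique from the code above. `anagram_subwords` and the sample word list are hypothetical, and `Array#tally` requires Ruby 2.7 or newer.

```ruby
# Returns the dictionary words that can be spelled from the letters of
# `source` in any order, using each letter at most as often as it occurs.
def anagram_subwords(source, dictionary, min_length: 2)
  available = source.chars.tally # letter => how many times it occurs
  dictionary.select do |w|
    w.size >= min_length && w.size <= source.size &&
      w.chars.tally.all? { |ch, n| available.fetch(ch, 0) >= n }
  end
end

sample_words = %w[car arc battle tablet latte zebra a]
from_crazed = anagram_subwords("crazed", sample_words)
# from_crazed contains car and arc
from_battle = anagram_subwords("battle", sample_words)
# from_battle contains battle, tablet and latte
```

The work is proportional to the dictionary size times the word length, and each word is reported once without the extra `include?` check.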

How to grab all values in a hash without specifying individual values in Ruby?

This is an add-on to a question I asked yesterday, but I felt it warranted a new question.
I am taking a JSON response and want to extract all the values per iteration and put them into an array
@response = { "0"=>{"forename_1"=>"John", "surname_1"=>"Smith", "forename_2"=>"Josephine", "surname_2"=>"Bradley", "middle_1"=>""},
"1"=>{"forename_1"=>"Chris", "surname_1"=>"Jenkins", "forename_2"=>"Christine", "surname_2"=>"Sugar", "middle_1"=>""},
"2"=>{"forename_1"=>"Billy", "surname_1"=>"Bob", "forename_2"=>"Brenda", "surname_2"=>"Goodyear", "middle_1"=>""},
"Status" => 100
}
At present this method takes specific values that I want and puts them into the array I want.
col = @response.values.grep(Hash).map { |h| "#{h['forename_1']} #{h['surname_1']} #{h['forename_2']} #{h['surname_2']} #{h['middle_1']}" }
Is there a way however to say grab ALL the values and place them into an array (I have a response where over 25 key/value pairs are returned).
At the moment if middle_1 has no value then a " " gets put into the array, ideally I would like to remove these.
Ideally I would like my newly formed array to look like
["John Smith Josephine Bradley", "Chris Jenkins Christine Sugar", "Billy Bob Brenda Goodyear"]
Even though no middle_1 is supplied, there are no double spaces in the array. I would like to learn how to tackle this.

Maybe an example of "cracking" the hash and extracting what you need will help:
h = {a1: "a", b2: "b", c3: "", d4: nil, e5: "e"}
values = h.values.map(&:to_s).reject(&:empty?)
# => ["a", "b", "e"]
values.join(" ")
# => "a b e"
Let's consider the h.values.map(&:to_s).reject(&:empty?):
values = h.values
# => ["a", "b", "", nil, "e"]
values = values.map(&:to_s)
# => ["a", "b", "", "", "e"]
values = values.reject(&:empty?)
# => ["a", "b", "e"]
Hope that gives you some idea how you can proceed.
Good luck!
UPDATE
For provided hash you can quite easily reuse what I have described above like:
col = @response.values
  .grep(Hash)
  .map { |h| h.values.map(&:to_s).reject(&:empty?).join(" ") }
p col
# => ["John Smith Josephine Bradley", "Chris Jenkins Christine Sugar", "Billy Bob Brenda Goodyear"]

Using String.succ with different rules?

I currently need to code an id generator and would like some help with how to do it. Basically, the id consists of digits and letters. I wanted to use succ, but it doesn't quite do what I want. Here is the order I would like to have:
[0, 1, 2, 3, ... , 8, 9, "a", "b", "c", "d", ... , "x", "y", "z", "00", "01", "02", ..., "0a", ...]
Do you think it's possible to pass succ an array defining what comes next? Basically, I would just pass something like this:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
EDIT:
Basically I want to know the next id from an id. For example, I have a34b and this would give me a34c or a329 would give me a32a.

The id you are trying to generate can be seen as a base 36 number. So we can use the String#to_i and Integer#to_s methods (Fixnum#to_s in older Rubies), which convert between bases from 2 to 36.
Note: I also added a String#prev method as it may make sense here; but such a method isn't provided in the standard API.
Warning: Monkey patching core classes isn't a good practice; I just posted it as the question specifically mentioned String#succ; it may be better to subclass String to create a new id type.
Credits: Idea from this answer.
class String
  def succ
    (self.to_i(36) + 1).to_s(36)
  end
  def prev
    (self.to_i(36) - 1).to_s(36)
  end
end
'a329'.succ # => "a32a"
"a32a".prev # => "a329"

It seems like you basically want the id to be in base 36 (which is the numbers 0-9 plus the letters a-z). To increment a string using base 36, you should do the following:
Translate the string into an integer: nextid = last_id.to_i(36)
Add one: nextid += 1
Convert back to a string: nextid = nextid.to_s(36)
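One caveat with plain base-36 conversion is that leading zeros are lost: "00".to_i(36) is 0, and "z" is followed by "10" rather than the "00" listed in the question. If that exact ordering matters, a carry-based increment over a fixed alphabet is one possible alternative. The sketch below assumes that approach, without patching String. `ID_ALPHABET` and `next_id` are hypothetical names.

```ruby
# Digits first, then lowercase letters, as in the question's ordering.
ID_ALPHABET = (('0'..'9').to_a + ('a'..'z').to_a).freeze

# Returns the id that follows `id`: the rightmost character is advanced,
# carrying to the left on overflow; when every position overflows, the id
# grows by one character and restarts at all zeros ("z" -> "00").
def next_id(id)
  chars = id.chars
  i = chars.size - 1
  while i >= 0
    pos = ID_ALPHABET.index(chars[i])
    raise ArgumentError, "invalid character #{chars[i].inspect}" if pos.nil?
    if pos < ID_ALPHABET.size - 1
      chars[i] = ID_ALPHABET[pos + 1]
      return chars.join
    end
    chars[i] = ID_ALPHABET.first # overflow at this position, carry left
    i -= 1
  end
  ID_ALPHABET.first + chars.join
end

next_id("a34b") # => "a34c"
next_id("a329") # => "a32a"
next_id("z")    # => "00"
next_id("0z")   # => "10"
```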

Resources