I have Elasticsearch documents with a structure like this:
{
"name": "item1",
"storages": [
{"items": ["a", "b", "c", "d", "e", "f"]},
{"items": ["a 1", "b 2", "c 3", "d 4", "e 5", "f 6"]}
]
}
{
"name": "item2",
"storages": [
{"items": ["d", "e", "f", "g", "h", "i", "j"]},
{"items": ["d 4", "e 5", "f 6", "g 7", "h 8", "i 9", "j 10"]}
]
}
and I want to search for a sequence of strings, for example ["d 4", "e 5"].
For this I use a more_like_this query:
{
"query": {
"more_like_this" : {
"fields" : ["storages.items"],
"like" : ["d 4","e 5"],
"min_term_freq": 1,
"min_doc_freq": 1
}
}
}
It almost works, but it returns "_score": 0.1620518 for the first document and "_score": 0.13890153 for the second.
I want to boost the score for terms near the beginning of the array ("items"): because "d 4" and "e 5" appear at the beginning of the array in the second document, that document should be ranked higher.
Is there a way to create such a query in Elasticsearch? Maybe it should not be a more_like_this query?
The tricky part is that the query could be something like ["d 4", "e 5", "xxx"] ("xxx" is not present in the document, but that's OK).
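To make the desired ranking concrete, here is a plain-Ruby sketch of a position-weighted score computed client-side, outside Elasticsearch. The weighting 1.0 / (position + 1) and the helper name position_score are hypothetical choices for illustration, not an Elasticsearch feature; terms missing from a document (like "xxx") simply contribute nothing.

```ruby
# Hypothetical client-side re-ranking; Elasticsearch does NOT do this itself.
# Each query term found in an "items" array adds 1.0 / (index + 1),
# so terms near the start of an array weigh more.
def position_score(doc, query_terms)
  doc["storages"].sum do |storage|
    items = storage["items"]
    query_terms.sum do |term|
      idx = items.index(term)
      idx ? 1.0 / (idx + 1) : 0.0 # absent terms (e.g. "xxx") add nothing
    end
  end
end

docs = [
  { "name" => "item1",
    "storages" => [
      { "items" => %w[a b c d e f] },
      { "items" => ["a 1", "b 2", "c 3", "d 4", "e 5", "f 6"] }
    ] },
  { "name" => "item2",
    "storages" => [
      { "items" => %w[d e f g h i j] },
      { "items" => ["d 4", "e 5", "f 6", "g 7", "h 8", "i 9", "j 10"] }
    ] }
]

query = ["d 4", "e 5", "xxx"]

# Highest score first.
ranked = docs.sort_by { |d| -position_score(d, query) }
ranked.each { |d| puts "#{d['name']}: #{position_score(d, query).round(3)}" }
```

With this weighting, item2 ranks first because "d 4" and "e 5" sit at positions 0 and 1 of its second array, while in item1 they are at positions 3 and 4.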
As you can see in this answer to a related question, arrays are indexed (made searchable) as multivalue fields, which are unordered, so you can't count on the order when you search.
Even worse, an array of objects is not stored the way you might think:
Arrays of objects do not work as you would expect: you cannot query each object independently of the other objects in the array. If you need to be able to do this then you should use the nested datatype instead of the object datatype.
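A minimal sketch of what the nested alternative could look like, assuming Elasticsearch 7.x or later (typeless mappings) and that "items" stays a text field. The mapping and query are built as Ruby hashes and printed as JSON request bodies; nothing is sent to a cluster. Note that nested only keeps each storage object separate; it still does not make array position affect the score.

```ruby
require "json"

# Hypothetical index mapping: "storages" is nested, so each storage object
# is indexed as its own hidden document and matched independently.
mapping = {
  "mappings" => {
    "properties" => {
      "name"     => { "type" => "keyword" },
      "storages" => {
        "type"       => "nested",
        "properties" => { "items" => { "type" => "text" } }
      }
    }
  }
}

# Each phrase becomes a match_phrase clause, so "d 4" must appear as a phrase
# rather than as the separate tokens "d" and "4". "xxx" simply fails to match.
phrases = ["d 4", "e 5", "xxx"]
query = {
  "query" => {
    "nested" => {
      "path"       => "storages",
      "score_mode" => "max", # use the best-matching storage object's score
      "query"      => {
        "bool" => {
          "should" => phrases.map { |p| { "match_phrase" => { "storages.items" => p } } }
        }
      }
    }
  }
}

puts JSON.pretty_generate(mapping)
puts JSON.pretty_generate(query)
```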
I have a document structure as follows:
{
"documentId": 123,
"someOtherInfo": {...},
"permissions": ["a", "b", ..., "g"]
}
Users themselves have a permission set ["x", "y", "z"]. Business rule: user A is allowed to view document X if and only if at least one of the user's permissions matches one of the document's permissions. Put mathematically, the intersection must be nonempty:
["a", "b", ..., "g"] ∩ ["x", "y", "z"] ≠ ∅
I am building a search engine that needs to find all documents a user has access to. I want to store them in Elasticsearch for all the great querying capabilities it provides, but how do I add a permissions restriction using the ES DSL? Many thanks.
You need a terms query, which accepts an array of values to match. It matches documents containing any of the provided terms. For example, the following matches a document with permissions = ["a", "b", "c"] but not one with permissions = ["a", "t", "c"]:
{
"query": {
"terms": {
"permissions": [
"x",
"y",
"z",
"b"
]
}
}
}
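A sketch in Ruby (the language used elsewhere on this page) that builds the same terms query from a user's permission set, placed in a bool filter so it restricts results without affecting scoring, plus a plain-Ruby check of the business rule for comparison. The helper names permission_filter and can_view? are hypothetical, and the permissions field is assumed to be mapped as keyword so the stored values are matched exactly.

```ruby
require "json"

# Hypothetical helper: combine any main query with a terms filter on permissions.
def permission_filter(user_permissions, main_query = { "match_all" => {} })
  {
    "query" => {
      "bool" => {
        "must"   => main_query,
        "filter" => { "terms" => { "permissions" => user_permissions } }
      }
    }
  }
end

# The same rule evaluated locally: the intersection must be non-empty.
def can_view?(doc_permissions, user_permissions)
  !(doc_permissions & user_permissions).empty?
end

user = ["x", "y", "z", "b"]
puts JSON.pretty_generate(permission_filter(user))
p can_view?(["a", "b", "c"], user) # => true
p can_view?(["a", "t", "c"], user) # => false
```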
Assume the following data, where each tuple contains a person's name, age and the books they have read:
list = [
  ["Peter", 21, ["Book 1", "Book 2", "Book 3", "Book 4"]],
  ["Amy", 19, ["Book 3", "Book 4"]],
  ["Sanders", 32, ["Book 1", "Book 2"]],
  ["Charlie", 21, ["Book 4", "Book 5", "Book 6"]],
  ["Amanda", 21, ["Book 2", "Book 5"]]
]
What is the optimal way to extract names grouped by the books read, into the following format (basically an array of arrays, each containing a book name and an array of the names of people who read it)?
results = [
["Book 1", ["Sanders", "Peter"]],
["Book 2", ["Sanders", "Amanda", "Peter"]],
["Book 3", ["Peter", "Amy"]],
["Book 4", ["Charlie", "Peter", "Amy"]],
["Book 5", ["Amanda","Charlie"]],
["Book 6", ["Charlie"]]
]
I've tried the following iterative method, which extracts the lists of names and puts them into a hash keyed by book title.
book_hash = Hash.new([])
list.each { |name, age, books|
  books.each { |x| book_hash[x] = book_hash[x] + [name] }
}
results = book_hash.to_a.sort
However, the above method seems rather inefficient when handling large datasets containing millions of names. I've attempted to use Enumerable#group_by, but so far I haven't been able to make it work with nested arrays.
Does anyone have any idea about the above?
Hash output is more suitable:
list.each_with_object({}) do |(name, age, books), hash|
books.each do |book|
(hash[book] ||= []) << name
end
end
If you must make it an array, then append a .to_a to the output of the above.
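Since the question mentions trying group_by, here is a sketch of that route, assuming Ruby 2.4+ for Hash#transform_values: flatten the data into [book, name] pairs with flat_map, then group by book. Like the each_with_object answer above, it avoids the book_hash[x] + [name] step in the question, which copies the whole array on every append.

```ruby
list = [
  ["Peter",   21, ["Book 1", "Book 2", "Book 3", "Book 4"]],
  ["Amy",     19, ["Book 3", "Book 4"]],
  ["Sanders", 32, ["Book 1", "Book 2"]],
  ["Charlie", 21, ["Book 4", "Book 5", "Book 6"]],
  ["Amanda",  21, ["Book 2", "Book 5"]]
]

# One [book, name] pair per book a person has read.
pairs = list.flat_map { |name, _age, books| books.map { |book| [book, name] } }

# Group the pairs by book, keep only the names, then sort by book title.
# Hash#sort returns an array of [book, names] pairs.
results = pairs.group_by(&:first)
               .transform_values { |ps| ps.map(&:last) }
               .sort
p results
```

Names within each book come out in input order, so they may be ordered differently from the sample in the question, but each book lists the same readers.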
I have a hash like this:
{"examples"=>
[{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>711},
"url"=> "http://example.com/1",
"reference"=>"abc",
"text"=> "Sample text 1",
"title"=> "Sample Title 1",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115},
{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>3243},
"url"=> "http://example.com/2",
"reference"=>"dec",
"text"=> "Sample text 2",
"title"=> "Sample Title 2",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115},
{"year"=>1999,
"provider"=>{"name"=>"abc", "id"=>191920},
"url"=> "http://example.com/3",
"reference"=>"wer",
"text"=> "Sample text 3",
"title"=> "Sample Title 3",
"documentId"=>30091286,
"exampleId"=>786652043,
"rating"=>357.08115}]
}
and I would like to create a new array by pulling out the keys and values for just the "text", "url" and "title" keys, like below:
[
{"text"=> "Sample text 1", "title"=> "Sample Title 1", "url"=> "http://example.com/1"},
{"text"=> "Sample text 2", "title"=> "Sample Title 2", "url"=> "http://example.com/2"},
{"text"=> "Sample text 3", "title"=> "Sample Title 3", "url"=> "http://example.com/3"}
]
Any help is sincerely appreciated.
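You can do it like this:
keys = ["text", "title", "url"]
hash['examples'].map do |example|
  # pair each wanted key with its value, then build a new hash from the pairs
  keys.zip(example.values_at(*keys)).to_h
end
If you are on a Ruby version older than 2.1 (where Array#to_h is not available), use:
Hash[keys.zip(example.values_at(*keys))]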
Here's another way this could be done (where h is the hash given in the question).
KEEPERS = ['text','url','title']
h.each_key.with_object({}) { |k,g|
  g[k] = h[k].map { |e| e.select { |sk,_| KEEPERS.include? sk } } }
  #=> {"examples"=>[
  #     {"url"=>"http://example.com/1", "text"=>"Sample text 1",
  #      "title"=>"Sample Title 1"},
  #     {"url"=>"http://example.com/2", "text"=>"Sample text 2",
  #      "title"=>"Sample Title 2"},
  #     {"url"=>"http://example.com/3", "text"=>"Sample text 3",
  #      "title"=>"Sample Title 3"}]}
Here we simply create a new hash (denoted by the outer block variable g) which has all the keys of the original hash h (just one, "examples", but there could be more), and for each associated value, which is an array of hashes, we use Enumerable#map and Hash#select to retain only the desired key/value pairs from each of those hashes.
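On newer Rubies the same extraction can be written with built-ins: Hash#slice (Ruby 2.5+) keeps only the listed keys, and Hash#transform_values (Ruby 2.4+) rebuilds the outer hash as in the answer above. The sample data below is a trimmed copy of the question's hash, keeping only some of its fields.

```ruby
h = {
  "examples" => [
    { "year" => 1999, "url" => "http://example.com/1", "reference" => "abc",
      "text" => "Sample text 1", "title" => "Sample Title 1" },
    { "year" => 1999, "url" => "http://example.com/2", "reference" => "dec",
      "text" => "Sample text 2", "title" => "Sample Title 2" },
    { "year" => 1999, "url" => "http://example.com/3", "reference" => "wer",
      "text" => "Sample text 3", "title" => "Sample Title 3" }
  ]
}
keepers = ["text", "title", "url"]

# Array of trimmed hashes, as the question asks for.
flat = h["examples"].map { |e| e.slice(*keepers) }

# Same shape as the input hash, with each example trimmed.
nested = h.transform_values { |arr| arr.map { |e| e.slice(*keepers) } }

p flat
p nested
```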
I need a query to be split into words everywhere a non-word character is used. For example:
query = "I am a great, boy's and I like! to have: a lot-of-fun and #do$$nice&acti*vities+enjoy good ?times."
It should output:
["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]
This does the trick, but is there a simpler way?
query.split(/[ ,'!:\#\$\&\*+?.-]/)
query.split(/\W+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]
query.scan(/\w+/)
# => ["I", "am", "a", "great", "boy", "s", "and", "I", "like", "to", "have", "a", "lot", "of", "fun", "and", "do", "nice", "acti", "vities", "enjoy", "good", "times"]
This is different from the expected output in that it does not include empty strings.
I am adding this answer because sawa's did not exactly reproduce the desired output:
# Split using any single non-word character:
query.split(/\W/) #=> ["I", "am", "a", "great", "", "boy", "s", "and", "I", "like", "", "to", "have", "", "a", "lot", "of", "fun", "and", "", "do", "", "nice", "acti", "vities", "enjoy", "good", "", "times"]
If you do not want the empty strings in the result, just use sawa's answer.
The pattern above will create many empty strings if the string contains multiple consecutive spaces, since each extra space is matched again and creates a new splitting point. To avoid that, we can add an alternation:
# Split using any number of spaces or a single non-word character:
query.split(/\s+|\W/)
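A small sketch comparing the three patterns on a made-up string with a doubled space, plus one detail worth knowing: String#split drops trailing empty strings by default, which is why none of the outputs above end with "" after "times" even though the query ends in ".". Passing a negative limit keeps them.

```ruby
s = "a  b, c."

p s.split(/\W/)      # => ["a", "", "b", "", "c"]       every single non-word char splits
p s.split(/\s+|\W/)  # => ["a", "b", "", "c"]           whitespace runs collapse, punctuation still splits
p s.split(/\W+/)     # => ["a", "b", "c"]               any run of non-word chars is one separator
p s.split(/\W/, -1)  # => ["a", "", "b", "", "c", ""]   negative limit keeps the trailing ""
```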
I have a hash in Ruby that looks like this:
{"NameValues"=>[
{"Name"=>"Field 1", "Values"=>["Data 1"]},
{"Name"=>"Field 2", "Values"=>["Data 2"]},
{"Name"=>"Field 3", "Values"=>["Data 3"]},
{"Name"=>"Field 4", "Values"=>["Data 4"]},
{"Name"=>"Field 5", "Values"=>["Data 5"]}
]}
I want to select the contents of the "Values" element by using the value of the "Name" element, e.g., locate the "Data 3" string by searching for "Field 3".
You could use the Enumerable#find method to find the hash by name:
hash = {"NameValues"=>[
{"Name"=>"Field 1", "Values"=>["Data 1"]},
{"Name"=>"Field 2", "Values"=>["Data 2"]},
{"Name"=>"Field 3", "Values"=>["Data 3"]},
{"Name"=>"Field 4", "Values"=>["Data 4"]},
{"Name"=>"Field 5", "Values"=>["Data 5"]}
]}
p hash['NameValues'].find{ |h| h['Name'] == 'Field 3'}['Values']
#=> ["Data 3"]
find basically iterates through the NameValues array until a matching element is found. You can then get the Values from the returned element.
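Two variations, sketched under the assumption of Ruby 2.3+ (for the &. safe-navigation operator): find returns nil when no name matches, so chaining ['Values'] directly would raise NoMethodError; and if you look up many names, building a Name => Values hash once avoids scanning the array on every lookup. The helper name values_for is hypothetical.

```ruby
hash = {"NameValues"=>[
  {"Name"=>"Field 1", "Values"=>["Data 1"]},
  {"Name"=>"Field 2", "Values"=>["Data 2"]},
  {"Name"=>"Field 3", "Values"=>["Data 3"]},
  {"Name"=>"Field 4", "Values"=>["Data 4"]},
  {"Name"=>"Field 5", "Values"=>["Data 5"]}
]}

# Returns nil instead of raising when the name is absent.
def values_for(hash, name)
  hash["NameValues"].find { |h| h["Name"] == name }&.fetch("Values")
end

# One pass to build a lookup table for repeated queries.
index = hash["NameValues"].each_with_object({}) { |h, acc| acc[h["Name"]] = h["Values"] }

p values_for(hash, "Field 3") # => ["Data 3"]
p values_for(hash, "Field 9") # => nil
p index["Field 3"]            # => ["Data 3"]
```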