Query nested items containing all words - elasticsearch

I have an index in Elasticsearch that contains simple nested items, defined this way:
'index' : 'items',
'body' : {
'name' : {'type' : 'string'},
'steps' : {
'type' : 'nested',
'text' : {'type' : 'string'},
}
}
Each step is a line in the object definition. Suppose I have the following four objects:
obj1:
foo
obj2:
bar
obj3:
foo bar
obj4:
foo
bar
I want to be able to search for objects that have a line containing all the words in the query. So if I query with 'foo bar', only 'obj3' should appear in the result.
My current query is as follows:
'index' : 'items',
'body' : {
  'query' : {
    'match' : {
      'steps.text' : {
        'query' : 'foo bar',
        'operator' : 'and'
      }
    }
  }
}
This query almost works (it filters out obj1 and obj2, as each contains only one of the words), but obj4 still appears.
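To see why obj4 slips through: when the steps are indexed as ordinary (flattened) objects rather than as separate nested documents, all step texts of one object end up in a single multi-valued field, and operator 'and' only checks that each word occurs somewhere in that field. A plain-Ruby sketch of that behaviour, with no Elasticsearch involved; the sample data and the flattened_match? helper are illustrative, and the whitespace tokenizer is a simplifying assumption rather than the real analyzer:

```ruby
# Sample objects (assumed): each object is a list of step lines, like obj1..obj4.
OBJECTS = {
  "obj1" => ["foo"],
  "obj2" => ["bar"],
  "obj3" => ["foo bar"],
  "obj4" => ["foo", "bar"]
}.freeze

# Simplified tokenizer (assumption): lowercase and split on whitespace.
def tokens(text)
  text.downcase.split
end

# Flattened behaviour: the steps of one object are pooled into a single set of
# terms, so "and" only requires each query word to appear in *some* step.
def flattened_match?(steps, query)
  terms = steps.flat_map { |step| tokens(step) }
  tokens(query).all? { |word| terms.include?(word) }
end

hits = OBJECTS.select { |_name, steps| flattened_match?(steps, "foo bar") }.keys
puts hits.inspect # => ["obj3", "obj4"]
```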
So is there a way to tell Elasticsearch "at least one step matches all the words"?
Thanks in advance,
Vincent

Finally solved the issue :)
The query should have been:
{
  nested: {
    path: 'steps',
    query: {
      match: {
        'steps.text': {
          query: 'foo bar',
          operator: 'AND'
        }
      }
    }
  }
}
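This way it only returns items where a single step contains both 'foo' and 'bar': the nested query evaluates each step as a separate hidden document, so all the words must occur within the same step.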
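The difference the nested query makes can be sketched in plain Ruby as well: each step is tested on its own, and an object matches only if one single step contains every word. The sample data, the whitespace tokenizer and the nested_match? helper are assumptions for illustration; the last part just builds the request body as a Ruby hash, using the full field path steps.text, which is the form current Elasticsearch documentation uses inside a nested query.

```ruby
require "json"

# Sample objects (assumed), mirroring obj1..obj4: each object is a list of steps.
OBJECTS = {
  "obj1" => ["foo"],
  "obj2" => ["bar"],
  "obj3" => ["foo bar"],
  "obj4" => ["foo", "bar"]
}.freeze

# Simplified tokenizer (assumption): lowercase and split on whitespace.
def tokens(text)
  text.downcase.split
end

# Nested behaviour: every step is matched on its own, and the object is a hit
# only if at least one single step contains all of the query words.
def nested_match?(steps, query)
  wanted = tokens(query)
  steps.any? { |step| (wanted - tokens(step)).empty? }
end

hits = OBJECTS.select { |_name, steps| nested_match?(steps, "foo bar") }.keys
puts hits.inspect # => ["obj3"]

# The request body from the answer, built as a Ruby hash and serialized to JSON.
query = {
  nested: {
    path: "steps",
    query: { match: { "steps.text" => { query: "foo bar", operator: "and" } } }
  }
}
body = JSON.generate({ query: query })
puts body
```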

Related

Create nested fields from XPath & check for existing documents

I have two questions:
Parsing XML data and adding it to an array in a record in an index.
Checking for an existing record in an index and, if it exists, adding the new data to the array of the existing record.
I have a JDBC input that has an XML column:
input {
jdbc {
....
statement => "SELECT event_xml....
}
}
then an xml filter to parse the data.
How do I make the last 3 xpaths into an array? Do I need a mutate or ruby filter? I can't seem to figure it out.
filter {
xml {
source => "event_xml"
remove_namespaces => true
store_xml => false
force_array => false
xpath => [ "/CaseNumber/text()", "case_number" ]
xpath => [ "/FormName/text()", "[conversations][form_name]" ]
xpath => [ "/EventDate/text()", "[conversations][event_date]" ]
xpath => [ "/CaseNote/text()", "[conversations][case_note]" ]
}
}
so that it looks something like this in Elasticsearch:
{
"case_number" : "12345",
"conversations" :
[
{
"form_name" : "form1",
"event_date" : "2019-01-09T00:00:00Z",
"case_note" : "this is a case note"
}
]
}
My second question: if a record with the unique case_number "12345" already exists, then instead of creating a new record I want to add the new XML values to its conversations array, so that it looks like this:
{
"case_number" : "12345",
"conversations" : [
{
"form_name" : "form1",
"event_date" : "2019-01-09T00:00:00Z",
"case_note" : "this is a case note"
},
{
"form_name" : "form2",
"event_date" : "2019-05-09T00:00:00Z",
"case_note" : "this is another case note"
}
]
}
My output section:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "cases"
manage_template => false
}
}
Is this possible? Thanks.
This ruby filter created the array:
ruby {
code => '
event.set("conversations", [Hash[
"publish_event_id", event.get("publish_event_id"),
"form_name", event.get("form_name"),
"event_date", event.get("event_date"),
"case_note", event.get("case_note")
]])
'
}
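The ruby filter can be exercised outside Logstash with a small stand-in for the event object. FakeEvent below is a hypothetical class that only mimics event.get and event.set; the field values are made up, and the sketch assumes form_name, event_date, case_note and publish_event_id are top-level fields, as the filter reads them:

```ruby
# Hypothetical stand-in for a Logstash event: only get/set are mimicked.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end
end

# Made-up field values for one parsed XML row.
event = FakeEvent.new({
  "case_number" => "12345",
  "publish_event_id" => "evt-1",
  "form_name" => "form1",
  "event_date" => "2019-01-09T00:00:00Z",
  "case_note" => "this is a case note"
})

# Same idea as the filter: wrap the flat fields in a one-element array of hashes.
keys = %w[publish_event_id form_name event_date case_note]
event.set("conversations", [keys.map { |k| [k, event.get(k)] }.to_h])
p event.get("conversations")
```

Building the hash from a key list keeps the field names in one place, which makes it easier to add or drop a field later.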
The output was resolved with:
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "cases"
document_id => "%{case_number}"
action => "update"
doc_as_upsert => true
script => "
boolean recordExists = false;
for (int i = 0; i < ctx._source.conversations.length; i++)
{
if(ctx._source.conversations[i].publish_event_id == params.event.get('conversations')[0].publish_event_id)
{
recordExists = true;
}
}
if(!recordExists){
ctx._source.conversations.add(params.event.get('conversations')[0]);
}
"
manage_template => false
}
}
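The update script's logic (create the document if it is missing, otherwise append the conversation unless one with the same publish_event_id is already there) can be sketched in plain Ruby against an in-memory hash standing in for the cases index. upsert_case is a hypothetical helper, and the "event becomes the new document" branch reflects my understanding of doc_as_upsert combined with a script, not something verified against a particular plugin version:

```ruby
# In-memory stand-in (assumption) for the "cases" index: case_number => document.
# upsert_case is a hypothetical helper mirroring the update action plus script.
def upsert_case(index, event)
  id = event["case_number"] # document_id => "%{case_number}"
  incoming = event["conversations"].first
  doc = index[id]

  if doc.nil?
    # No document yet: as I understand doc_as_upsert, the event itself is stored.
    index[id] = event.merge("conversations" => event["conversations"].dup)
  else
    # Existing document: same check as the Painless loop, by publish_event_id.
    seen = doc["conversations"].any? do |c|
      c["publish_event_id"] == incoming["publish_event_id"]
    end
    doc["conversations"] << incoming unless seen
  end
  index
end

index = {}
first = { "case_number" => "12345",
          "conversations" => [{ "publish_event_id" => "e1", "form_name" => "form1" }] }
second = { "case_number" => "12345",
           "conversations" => [{ "publish_event_id" => "e2", "form_name" => "form2" }] }

upsert_case(index, first)
upsert_case(index, second)
upsert_case(index, second) # replayed event: already present, so it is skipped
p index["12345"]["conversations"].map { |c| c["publish_event_id"] } # => ["e1", "e2"]
```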

How to find all documents that don't have an array or it is smaller than

I'm trying to find all documents that either don't have a tags array or whose tags array has fewer than 2 elements. How do I do this? I'm trying this, but it doesn't work:
db.collection.find({
'text' => { '$exists' => true }, # I need this one too
'tags' => {
'$or' => [
{ '$exists' => false },
{ '$lt' => ['$size', 2] }
]
}
})
It's Ruby, btw. MongoDB version is 4.
I'm getting:
unknown operator: $or
You can use the query below:
db.collection.find({
text: { $exists: true },
$or: [{
tags: { $exists: false }
}, {
$expr: { $lt: [{ $size: '$tags' }, 2] }
}]
})
A slightly shorter version of MauriRamone's answer:
db.getCollection('test').find({
$and:[
{"text":{$exists:true} },
{$where: "!this.tags || this.tags.length < 2"}
]
})
However, $where is slow (it evaluates JavaScript for each document and cannot use indexes), so other options, such as Anthony's, should be preferred.
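Your original query wasn't working because $or is a logical operator that takes an array of complete filter expressions; it cannot be placed under a field name such as tags. You also need $expr for the size comparison, because the $size query operator only matches an exact array length.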
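For the Ruby side of the question, the accepted filter can be written as a plain Ruby hash, which is the form the Ruby driver's Collection#find accepts; it is not executed against a database here. The matches? method is a hypothetical plain-Ruby mirror of the same condition for checking sample documents offline, and it assumes tags, when present, is always an array (the $size expression errors on non-array values):

```ruby
# The $expr-based filter from the answer above, in Ruby hash syntax
# (pass it to collection.find with the Ruby driver; not executed here).
FILTER = {
  "text" => { "$exists" => true },
  "$or" => [
    { "tags" => { "$exists" => false } },
    { "$expr" => { "$lt" => [{ "$size" => "$tags" }, 2] } }
  ]
}.freeze

# Hypothetical plain-Ruby mirror of the same condition, for offline checks.
def matches?(doc)
  doc.key?("text") && (!doc.key?("tags") || doc["tags"].size < 2)
end

# Made-up sample documents.
docs = [
  { "text" => "a" },                     # no tags at all
  { "text" => "b", "tags" => [] },       # empty array
  { "text" => "c", "tags" => ["x"] },    # one tag
  { "text" => "d", "tags" => %w[x y] },  # two tags: excluded
  { "tags" => [] }                       # no text: excluded
]

selected = docs.select { |d| matches?(d) }.map { |d| d["text"] }
p selected # => ["a", "b", "c"]
```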
Try using $where in your query, like this:
db.getCollection('test').find({
$and:[
{"text":{$exists:true} },
{
$or:[
{"tags":{$exists:false}},
{$where: "this.tags.length < 2"}
]
}
]
})
I used Robomongo to test; you will need to translate the query into Ruby syntax.
Regards.

Map keys with the same name

A GET to an API endpoint I'm working with returns JSON in which the contacts come back in an inconsistent order, either
{"contacts"=>[
{"id"=>$UUID_0, "name"=>nil, "email"=>$EMAIL_0, "phone"=>$PHONE_0, "type"=>"foo"},
{"id"=>$UUID_1, "name"=>nil, "email"=>$EMAIL_1, "phone"=>$PHONE_1, "type"=>"bar"}
]}
or
{"contacts"=>[
{"id"=>$UUID_1, "name"=>nil, "email"=>$EMAIL_1, "phone"=>$PHONE_1, "type"=>"bar"},
{"id"=>$UUID_0, "name"=>nil, "email"=>$EMAIL_0, "phone"=>$PHONE_0, "type"=>"foo"}
]}
The "type" values are the only stable values in these responses, so I'd like to map this so that the contact types become keys containing the other pairs:
{
"foo"=>{"id"=>$UUID_0, "name"=>$NAME_0, "email"=>$EMAIL_0, "phone"=>$PHONE_0},
"bar"=>{"id"=>$UUID_1, "name"=>$NAME_1, "email"=>$EMAIL_1, "phone"=>$PHONE_1}
}
A solution is not obvious to me.
If you use Ruby on Rails, or at least ActiveSupport, you can try index_by instead of group_by: it won't put the values into arrays.
hash['contacts'].index_by {|r| r['type']}
=>
{
"bar" => {
"id" => "asdf",
"name" => nil,
"email" => "EMAIL_1",
"phone" => "PHONE_1",
"type" => "bar"
},
"foo" => {
"id" => "asdf",
"name" => nil,
"email" => "EMAIL_0",
"phone" => "PHONE_0",
"type" => "foo"
}
}
In plain Ruby, without ActiveSupport:
Hash[data['contacts'].map { |c| [c['type'], c] }]
This can be done with Enumerable#reduce:
hash['contacts'].reduce({}) {|m,c| m[c['type']] = c;m}
How it works:
An empty hash is the starting point.
The block is called once for each element in the contacts list. The block receives the hash that we're building as m and the current contact as c.
In the block, assign c to the hash based on its type and return the hash so far.
Final result is the last return value of the block.
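None of the answers drop the "type" key from the values, although the desired output omits it. A sketch in core Ruby (no ActiveSupport), using Enumerable#each_with_object, which works like reduce but returns the accumulator for you; the contact data below are made-up stand-ins for the $UUID_0 / $EMAIL_0 placeholders, and index_by_type is a hypothetical helper name:

```ruby
# Made-up contacts standing in for the API response.
CONTACTS = [
  { "id" => "uuid-1", "name" => nil, "email" => "email-1", "phone" => "phone-1", "type" => "bar" },
  { "id" => "uuid-0", "name" => nil, "email" => "email-0", "phone" => "phone-0", "type" => "foo" }
].freeze

# Key each contact by its "type" and drop "type" from the value itself.
def index_by_type(contacts)
  contacts.each_with_object({}) do |contact, acc|
    acc[contact["type"]] = contact.reject { |key, _| key == "type" }
  end
end

by_type = index_by_type(CONTACTS)
p by_type["foo"]

# The response order does not matter: both orders give equal hashes.
p index_by_type(CONTACTS.reverse) == by_type # => true
```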

Multiple limit condition in mongodb

I have a collection in which one of the fields is "type". I want to get a fixed number of documents of each type, using a condition that is the same for all types: for example, 2 documents of type A, 2 of type B, and so on.
How can I do this in a single query? I am using Ruby Active Record.
Generally, what you are describing is a relatively common question in the MongoDB community, which we could call the "top n results problem": given some input that is likely sorted in some way, how do you get the top n results without relying on arbitrary index values in the data?
MongoDB has the $first operator, available in the aggregation framework, which handles the "top 1" part of the problem, since it takes the "first" item found on a grouping boundary such as your "type". Getting more than one result is of course a little more involved. There are some JIRA issues about modifying other operators to deal with n results, or to "restrict" or "slice", notably SERVER-6074. But the problem can be handled in a few ways.
Popular implementations of the Rails Active Record pattern for MongoDB storage are Mongoid and MongoMapper. Both allow access to the "native" MongoDB collection functions via a .collection accessor, which is basically what you need in order to use native methods such as .aggregate(), which supports more functionality than general Active Record aggregation.
Here is an aggregation approach with mongoid, though the general code does not alter once you have access to the native collection object:
require "mongoid"
require "pp";
Mongoid.configure.connect_to("test");
class Item
include Mongoid::Document
store_in collection: "item"
field :type, type: String
field :pos, type: String
end
Item.collection.drop
Item.collection.insert( :type => "A", :pos => "First" )
Item.collection.insert( :type => "A", :pos => "Second" )
Item.collection.insert( :type => "A", :pos => "Third" )
Item.collection.insert( :type => "A", :pos => "Fourth" )
Item.collection.insert( :type => "B", :pos => "First" )
Item.collection.insert( :type => "B", :pos => "Second" )
Item.collection.insert( :type => "B", :pos => "Third" )
Item.collection.insert( :type => "B", :pos => "Fourth" )
res = Item.collection.aggregate([
{ "$group" => {
"_id" => "$type",
"docs" => {
"$push" => {
"pos" => "$pos", "type" => "$type"
}
},
"one" => {
"$first" => {
"pos" => "$pos", "type" => "$type"
}
}
}},
{ "$unwind" => "$docs" },
{ "$project" => {
"docs" => {
"pos" => "$docs.pos",
"type" => "$docs.type",
"seen" => {
"$eq" => [ "$one", "$docs" ]
},
},
"one" => 1
}},
{ "$match" => {
"docs.seen" => false
}},
{ "$group" => {
"_id" => "$_id",
"one" => { "$first" => "$one" },
"two" => {
"$first" => {
"pos" => "$docs.pos",
"type" => "$docs.type"
}
},
"splitter" => {
"$first" => {
"$literal" => ["one","two"]
}
}
}},
{ "$unwind" => "$splitter" },
{ "$project" => {
"_id" => 0,
"type" => {
"$cond" => [
{ "$eq" => [ "$splitter", "one" ] },
"$one.type",
"$two.type"
]
},
"pos" => {
"$cond" => [
{ "$eq" => [ "$splitter", "one" ] },
"$one.pos",
"$two.pos"
]
}
}}
])
pp res
The naming in the documents is not actually used by the code; the "First", "Second", etc. values in the data are just there to show that you are indeed getting the "top 2" documents from the listing.
So the approach here is essentially to create a "stack" of the documents, grouped by your key (such as "type"). The very first step is to take the "first" document from that stack using the $first operator.
The subsequent steps mark the "seen" element in the stack and filter it out, then take the "next" document off the stack, again using the $first operator. The final steps just return the documents to the original form found in the input, which is generally what is expected from such a query.
So the result is just the top 2 documents for each type:
{ "type"=>"A", "pos"=>"First" }
{ "type"=>"A", "pos"=>"Second" }
{ "type"=>"B", "pos"=>"First" }
{ "type"=>"B", "pos"=>"Second" }
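For comparison, the same "top 2 per type" result can be computed client-side in plain Ruby, which is only sensible when the collection is small enough to load. The sample data mirrors the inserts above, and top_n_per_type is a hypothetical helper. Note that "first" here means array order; in the pipeline, $first depends on the order in which documents reach $group, so in practice a $sort stage would normally come before it.

```ruby
# Sample data mirroring the inserts above, in insertion order.
ITEMS = %w[A B].flat_map do |type|
  %w[First Second Third Fourth].map { |pos| { "type" => type, "pos" => pos } }
end

# Group by type (group_by keeps first-seen order) and keep the first n of each.
def top_n_per_type(items, n)
  items.group_by { |item| item["type"] }.flat_map { |_type, group| group.first(n) }
end

top_n_per_type(ITEMS, 2).each { |doc| p doc }
```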
There is a longer discussion of this, along with other solutions, in this recent answer:
Mongodb aggregation $group, restrict length of array
It is essentially the same problem despite the title; that case was looking to match the top 10 entries or more. There is also some pipeline generation code there for dealing with larger matches, as well as some alternate approaches that may be worth considering depending on your data.
You will not be able to do this directly with only the type column and the constraint that it must be one query. However there is (as always) a way to accomplish this.
To find documents of different types, you would need some additional field whose values, on average, distribute the types the way you want the data back.
db.users.insert({type: 'A', index: 1})
db.users.insert({type: 'B', index: 2})
db.users.insert({type: 'A', index: 3})
db.users.insert({type: 'B', index: 4})
db.users.insert({type: 'A', index: 5})
db.users.insert({type: 'B', index: 6})
Then, when querying for items with db.users.find({index: {$gt: 2, $lt: 7}}), you will get the right distribution of items.
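A plain-Ruby check of why that range returns two of each type on this sample data: the index values interleave A and B, so any window covering an even number of consecutive values is balanced. The data mirrors the inserts above; with real data you would have to maintain such a field yourself.

```ruby
# Sample users mirroring the shell inserts above.
USERS = [
  { "type" => "A", "index" => 1 },
  { "type" => "B", "index" => 2 },
  { "type" => "A", "index" => 3 },
  { "type" => "B", "index" => 4 },
  { "type" => "A", "index" => 5 },
  { "type" => "B", "index" => 6 }
].freeze

# Same condition as find({index: {$gt: 2, $lt: 7}}), evaluated in Ruby.
picked = USERS.select { |u| u["index"] > 2 && u["index"] < 7 }

# Count how many of each type the range returned.
counts = picked.each_with_object(Hash.new(0)) { |u, acc| acc[u["type"]] += 1 }
p counts
```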
Though I'm not sure this is what you were looking for.

Tire search return terms by first letter

I'm using Tire/Elasticsearch to create an alphabetical browse of all the tags in my database. However, the Tire search returns the tag I want as well as all the other tags associated with the same item. For example, if my letter is "A" and an item has the tags 'aardvark' and 'biscuit', both 'aardvark' and 'biscuit' show up as results for the 'A' query. How can I construct this so that I only get 'aardvark'?
def explore
#get alphabetical tire results with term and count only
my_letter = "A"
self.search_result = Tire.search index_name, 'tags' => 'count' do
query {string 'tags:' + my_letter + '*'}
facet 'tags' do
terms 'tags', :order => 'term'
end
end.results
end
Mapping:
{
items: {
item: {
properties: {
tags: {
type: "string",
index_name: "tag",
index: "not_analyzed",
omit_norms: true,
index_options: "docs"
},
}
}
}
}
You'll need to change the following things:
Mapping
You need to map the tags properly in order to search through them. Since your tags are inside your item document, you need to map tags as nested, so that your search query can also be applied in the facets. Each tag then becomes an object with a value field (for example {value: 'aardvark'}) rather than a plain string. Here is the mapping you need to set:
{
  items: {
    item: {
      properties: {
        tags: {
          type: "nested",
          properties: {
            value: {
              type: "string",
              index: "not_analyzed"
            }
          }
        }
      }
    }
  }
}
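With the nested mapping, each tag has to be indexed as an object with a value field instead of a plain string. A minimal sketch of that transformation in Ruby; nested_tags is a hypothetical helper, and how you hook it into your model's serialization for Tire is left as an assumption:

```ruby
require "json"

# Hypothetical helper: wrap each plain tag string in an object with a "value"
# field, matching the nested mapping for tags.
def nested_tags(tags)
  tags.map { |tag| { "value" => tag } }
end

doc = { "tags" => nested_tags(%w[aardvark biscuit]) }
puts JSON.generate(doc) # => {"tags":[{"value":"aardvark"},{"value":"biscuit"}]}
```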
Query
Now you can use a prefix query to find the tags that start with a given letter and get the facets. Here is the complete query:
{
  query: {
    nested: {
      path: "tags",
      query: {
        prefix: {
          'tags.value': 'A'
        }
      }
    }
  },
  facets: {
    words: {
      terms: { field: 'tags.value' },
      nested: 'tags',
      facet_filter: {
        prefix: {
          'tags.value': 'A'
        }
      }
    }
  }
}
The facet filter is applied while computing the facets, so you only get the facet terms that match your criteria. I preferred a prefix query over a regexp query for performance reasons. Note that prefix queries are not analyzed, so on a not_analyzed field the letter's case must match the stored tags ('a' for 'aardvark'). I am not quite sure whether the prefix query works for your problem; let me know if it doesn't.
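What the facet_filter changes can be sketched in plain Ruby: the query selects items that have at least one tag with the prefix, but an unfiltered terms facet then counts every tag on those items, which is how 'biscuit' showed up. The sample items and the tag_counts helper are illustrative; start_with? is case-sensitive, like a prefix query on a not_analyzed field.

```ruby
# Made-up items, each with a list of tags.
ITEMS = [
  { "tags" => %w[aardvark biscuit] },
  { "tags" => %w[apple aardvark] },
  { "tags" => %w[banana] }
].freeze

# Hypothetical helper: select items with a tag matching the prefix (the query),
# then count their tags, optionally keeping only matching tags (the facet_filter).
def tag_counts(items, prefix, filter_tags:)
  hits = items.select { |item| item["tags"].any? { |t| t.start_with?(prefix) } }
  counts = Hash.new(0)
  hits.each do |item|
    item["tags"].each do |tag|
      next if filter_tags && !tag.start_with?(prefix)
      counts[tag] += 1
    end
  end
  counts.sort.to_h
end

unfiltered = tag_counts(ITEMS, "a", filter_tags: false)
filtered = tag_counts(ITEMS, "a", filter_tags: true)
p unfiltered # => {"aardvark"=>2, "apple"=>1, "biscuit"=>1} (Ruby < 3.4 inspect format)
p filtered
```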
