elasticsearch only show where nested object has no values

I have the following structure (simplified):
{
  "id": 100,
  "vendorStatuses": [
    {
      "id": 200,
      "status": "Open"
    }
  ]
}
What I want to find is records where there are no vendor statuses. We recently upgraded from Elasticsearch 1.x to 5.x, and I'm having trouble converting my query to get this functionality back.
My old NEST query looked like this:
!Filter<PurchaseOrder>.Nested(nfd => nfd
    .Path(x => x.VendorStatuses.First())
    .Filter(f2 => f2.Missing(y => y.Id)));
The new query (now that Missing isn't available) looks like this so far:
Query<PurchaseOrder>
    .Bool(z => z
        .MustNot(a => a
            .Exists(t => t
                .Field(f => f.VendorStatuses)
            )
        )
    );
Which generates this:
GET purchaseorder/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "vendorStatuses"
          }
        }
      ]
    }
  }
}
But I'm still seeing results that have vendorStatuses records.
What am I doing wrong? I've tried searching on vendorStatuses.id and other fields, but it's not working. When I reverse the logic and use must, I see no results. I also tried doing it as a nested query but couldn't get any closer with that.

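Your must_not/exists query is not wrapped in a nested query the way the 1.x query was. Because vendorStatuses is mapped as nested, its objects are indexed as separate hidden documents, so an exists check against the root purchase order never sees them; every root document passes the must_not, and the reversed must matches nothing. I think you're looking for something like: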
var query = Query<PurchaseOrder>
    .Bool(z => z
        .MustNot(a => a
            .Nested(n => n
                .Path(p => p.VendorStatuses)
                .Query(nq => nq
                    .Exists(t => t
                        .Field(f => f.VendorStatuses)
                    )
                )
            )
        )
    );
client.Search<PurchaseOrder>(s => s.Query(_ => query));
which yields
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "query": {
              "exists": {
                "field": "vendorStatuses"
              }
            },
            "path": "vendorStatuses"
          }
        }
      ]
    }
  }
}
You can use operator overloading to make the query more succinct, too:
var query = !Query<PurchaseOrder>
    .Nested(n => n
        .Path(p => p.VendorStatuses)
        .Query(nq => nq
            .Exists(t => t
                .Field(f => f.VendorStatuses)
            )
        )
    );
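To make the nested semantics concrete, here is a small Python sketch (an illustration, not NEST code). It builds a request body of the same shape as a plain dict, then mimics in memory what the query does: a purchase order is kept only when none of its vendorStatuses objects has the field. Using the leaf field vendorStatuses.id rather than vendorStatuses is an assumption made for the sketch, and the helper names are hypothetical.

```python
import json


def missing_nested_query(path, field):
    """Build a bool/must_not/nested/exists body shaped like the JSON above."""
    return {
        "query": {
            "bool": {
                "must_not": [
                    {"nested": {"path": path, "query": {"exists": {"field": field}}}}
                ]
            }
        }
    }


def has_no_nested_values(doc, path, subfield):
    """In-memory approximation: True when no object under `path` has `subfield`.

    A missing path, None, or an empty list all count as "no values",
    similar to how exists treats null and empty arrays as missing.
    """
    nested = doc.get(path) or []
    return not any(obj.get(subfield) is not None for obj in nested)


docs = [
    {"id": 100, "vendorStatuses": [{"id": 200, "status": "Open"}]},
    {"id": 101, "vendorStatuses": []},
    {"id": 102},
]

body = missing_nested_query("vendorStatuses", "vendorStatuses.id")
print(json.dumps(body, indent=2))
print([d["id"] for d in docs if has_no_nested_values(d, "vendorStatuses", "id")])  # [101, 102]
```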

I found a workaround that is far from ideal in my opinion. I created a new property on my PurchaseOrder model for NumberOfStatuses, then I just do a term search on that for value of 0.
public int NumberOfStatuses => VendorStatuses?.Count() ?? 0; // same as VendorStatuses.OrEmptyIfNull().Count(), without a custom extension method
Query<PurchaseOrder>.Term(t => t.Field(po => po.NumberOfStatuses).Value(0));
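This workaround is a form of denormalization: the count is computed when the document is indexed, so the search becomes a cheap term query. A minimal Python sketch of the same idea, assuming the property ends up indexed as numberOfStatuses (NEST camel-cases property names by default) and using a hypothetical helper name:

```python
def with_status_count(doc):
    """Return a copy of `doc` with a denormalized numberOfStatuses field."""
    enriched = dict(doc)
    # Treat a missing or null vendorStatuses list as empty, like OrEmptyIfNull().
    enriched["numberOfStatuses"] = len(doc.get("vendorStatuses") or [])
    return enriched


# Request body equivalent to the Term query on NumberOfStatuses with value 0.
term_query = {"query": {"term": {"numberOfStatuses": 0}}}

print(with_status_count({"id": 101, "vendorStatuses": None})["numberOfStatuses"])  # 0
```

The trade-off is freshness: the count is only correct as of the last time the whole document was indexed from the model.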

Related

How to export Elasticsearch Index as CSV file to Google Cloud Storage Using Logstash

I am using Elasticsearch, and we create a daily index into which a huge amount of data is ingested every minute. I want to export a few fields from each day's index to Google Cloud Storage. I am able to do this with JSON output files, as shown below:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test"
    query => '
    {
      "_source": ["field1", "field2"],
      "query": {
        "match_all": {}
      }
    }
    '
  }
}
filter {
  mutate {
    rename => {
      "field1" => "test1"
      "field2" => "test2"
    }
  }
}
output {
  google_cloud_storage {
    codec => csv {
      include_headers => true
      columns => [ "test1", "test2" ]
    }
    bucket => "bucketName"
    json_key_file => "creds.json"
    temp_directory => "/tmp"
    log_file_prefix => "logstash_gcs"
    max_file_size_kbytes => 1024
    date_pattern => "%Y-%m-%dT%H:00"
    flush_interval_secs => 600
    gzip => false
    uploader_interval_secs => 600
    include_uuid => true
    include_hostname => true
  }
}
However, how can I export it as a CSV file and send it to Google Cloud Storage?

You could change output_format to plain, but that setting is going to be deprecated.
Instead, remove output_format and use the codec setting, which supports CSV output:
google_cloud_storage {
  ...
  codec => csv {
    include_headers => true
    columns => [ "field1", "field2" ]
  }
}
If you want to rename your fields, you can add a filter section and mutate/rename the fields however you like. Make sure to also change the columns setting of your csv codec to the new names:
filter {
  mutate {
    rename => {
      "field1" => "renamed1"
      "field2" => "renamed2"
    }
  }
}
output {
  ...
}
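Why the columns setting must follow the rename can be shown with a rough Python model. This is not Logstash code, and the header handling only approximates include_headers: each event's fields are renamed first, and then only the names listed in columns are written, so columns that still use the old names come out empty. The function name and parameters are hypothetical.

```python
import csv
import io


def to_csv(events, columns, renames):
    """Rename each event's fields, then write only `columns`, with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()  # roughly what include_headers => true adds
    for event in events:
        # Equivalent of the mutate/rename filter applied before the codec.
        renamed = {renames.get(key, key): value for key, value in event.items()}
        writer.writerow(renamed)  # names not in `columns` are dropped
    return buf.getvalue()


RENAMES = {"field1": "renamed1", "field2": "renamed2"}
events = [{"field1": "a", "field2": "b", "other": "x"}]

print(to_csv(events, ["renamed1", "renamed2"], RENAMES))  # header plus "a,b"
print(to_csv(events, ["field1", "field2"], RENAMES))      # header plus an empty row
```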

The $bucket 'default' field must be less than the lowest boundary or greater than or equal to the highest boundary

What does this error mean? I have pasted the code below; kindly have a look. Can anyone tell me what's wrong with the boundary values here? Thanks in advance.
db.match_list.aggregate([
  {
    $bucket: {
      groupBy: "$competition_id",
      boundaries: ["9dn1m1gh41emoep", "9dn1m1ghew6moep", "d23xmvkh4g8qg8n", "gy0or5jhj6qwzv3"],
      default: "Other",
      output: {
        "data": {
          $push: {
            "season_id": "$season_id",
            "status_id": "$status_id",
            "venue_id": "$venue_id",
            "referee_id": "$referee_id",
            "neutral": "$neutral",
            "note": "$note",
            "home_scores": "$home_scores",
            "away_scores": "$away_scores"
          }
        }
      }
    }
  },
  {
    $sort: { competition_id: 1 }
  }
])
Here is the same query in Laravel with the MongoDB driver, using raw. I'm not sure what's going wrong here. The code that builds $new_array is included as well:
$contents = $query->orderBy('competition_id')->pluck('competition_id')->toArray();
$contents = array_unique($contents);
$new_array = array_values($contents);
$data = $query->raw(function ($collection) use ($new_array) {
    return $collection->aggregate([
        [
            '$bucket' => [
                'groupBy' => '$competition_id',
                'boundaries' => $new_array,
                'default' => 'zzzzzzzzzzzzzzzzzzzzzzzzz',
                'output' => [
                    "data" => [
                        '$push' => [
                            "id" => '$id',
                            "season_id" => '$season_id',
                            "status_id" => '$status_id',
                            "venue_id" => '$venue_id',
                            "referee_id" => '$referee_id',
                            "neutral" => '$neutral',
                            "note" => '$note',
                            "home_scores" => '$home_scores',
                        ]
                    ]
                ]
            ]
        ]
    ]);
});

You set the default parameter of the $bucket stage to "Other", which falls between the lowest and highest boundaries you provided: "9dn1m1gh41emoep" < "Other" < "gy0or5jhj6qwzv3", because strings compare by character code and uppercase letters sort before lowercase ones.
I would suggest using a string that sorts before the lowest boundary or at/after the highest one,
or, better, a value of a different data type, such as the integer 1.
db.match_list.aggregate([
  {
    $bucket: {
      groupBy: "$competition_id",
      boundaries: ["9dn1m1gh41emoep", "9dn1m1ghew6moep", "d23xmvkh4g8qg8n", "gy0or5jhj6qwzv3"],
      default: 1, // or "zzzzzzzzzzzzzzzzzzzzzzzzz" (any string below the lowest or at/above the highest boundary)
      output: {
        "data": {
          $push: {
            "season_id": "$season_id",
            "status_id": "$status_id",
            "venue_id": "$venue_id",
            "referee_id": "$referee_id",
            "neutral": "$neutral",
            "note": "$note",
            "home_scores": "$home_scores",
            "away_scores": "$away_scores"
          }
        }
      }
    }
  },
  { $sort: { competition_id: 1 } }
])
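The rule in the error message can be checked, and the bucketing itself modelled, with a short Python sketch. It relies on Python's default string comparison, which orders ASCII strings by character code much like MongoDB's default (non-collated) comparison. Mixed-type BSON ordering is not modelled beyond "a default of a different type is accepted", and the function names are hypothetical.

```python
from bisect import bisect_right


def default_is_valid(default, boundaries):
    """Check the $bucket rule for `default` against ascending `boundaries`.

    A default of another type is accepted; a same-type default must be
    less than the lowest boundary or >= the highest boundary.
    """
    if type(default) is not type(boundaries[0]):
        return True
    return default < boundaries[0] or default >= boundaries[-1]


def bucket_for(value, boundaries, default):
    """Return the bucket _id for `value`: the lower bound b[i] of [b[i], b[i+1])."""
    same_type = type(value) is type(boundaries[0])
    if same_type and boundaries[0] <= value < boundaries[-1]:
        return boundaries[bisect_right(boundaries, value) - 1]
    return default  # outside the range; the last boundary is exclusive


BOUNDARIES = ["9dn1m1gh41emoep", "9dn1m1ghew6moep", "d23xmvkh4g8qg8n", "gy0or5jhj6qwzv3"]

print(default_is_valid("Other", BOUNDARIES))                      # False
print(default_is_valid("zzzzzzzzzzzzzzzzzzzzzzzzz", BOUNDARIES))  # True
print(default_is_valid(1, BOUNDARIES))                            # True
print(bucket_for("gy0or5jhj6qwzv3", BOUNDARIES, 1))               # 1
```

The last call hints at a side effect in the Laravel version: because $new_array contains every distinct competition_id and the highest boundary is exclusive, documents with the largest competition_id land in the default bucket rather than in a bucket of their own.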

Elasticsearch Aggregation on objects by query on other documents

Let's say I have an index containing documents that each represent a message in a discussion.
Each document has a discussionId property
(and also its own ID, which represents the MessageId).
Now I need to find all discussionIds that have no documents (messages) matching a query.
For example:
"Find all discussionIds that have no message containing the text 'YO YO'."
How can I do that?
The class is similar to this:
public class Message
{
    public string Id { get; set; }
    public string DiscussionId { get; set; }
    public string Text { get; set; }
}

You just need to wrap the query that would find matches for the phrase "YO YO" in a bool query must_not clause.
With NEST:
client.Search<Message>(s => s
    .Query(q => q
        .Bool(b => b
            .MustNot(mn => mn
                .MatchPhrase(m => m
                    .Field(f => f.Text)
                    .Query("YO YO")
                )
            )
        )
    )
);
which, with operator overloading, can be shortened to
client.Search<Message>(s => s
    .Query(q => !q
        .MatchPhrase(m => m
            .Field(f => f.Text)
            .Query("YO YO")
        )
    )
);
Both produce the query
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "text": {
              "type": "phrase",
              "query": "YO YO"
            }
          }
        }
      ]
    }
  }
}
To only return DiscussionId values, you can use source filtering:
client.Search<Message>(s => s
    .Source(sf => sf
        .Includes(f => f
            .Field(ff => ff.DiscussionId)
        )
    )
    .Query(q => !q
        .MatchPhrase(m => m
            .Field(f => f.Text)
            .Query("YO YO")
        )
    )
);
And, if you want to get them all, you can use the scroll API:
var searchResponse = client.Search<Message>(s => s
    .Scroll("1m")
    .Source(sf => sf
        .Includes(f => f
            .Field(ff => ff.DiscussionId)
        )
    )
    .Query(q => !q
        .MatchPhrase(m => m
            .Field(f => f.Text)
            .Query("YO YO")
        )
    )
);
// Process each batch, then fetch the next one using the scroll id returned
// from the previous call, until no more documents are returned.
while (searchResponse.Documents.Count > 0)
{
    // ... collect the DiscussionId values from searchResponse.Documents here ...
    searchResponse = client.Scroll<Message>("1m", searchResponse.ScrollId);
}
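One caveat worth spelling out: must_not filters individual messages, so the scroll returns the DiscussionId of every message that lacks the phrase. A discussion with one "YO YO" message and one other message therefore still shows up. If the goal is discussions with no matching message at all, one option is a set difference: the distinct discussion IDs of all messages minus those of the messages that do match. A minimal in-memory Python sketch, where a list of dicts stands in for the scrolled hits and a case-insensitive substring check stands in for match_phrase (both simplifying assumptions; the function names are hypothetical):

```python
def contains_phrase(message, phrase):
    """Stand-in for match_phrase: case-insensitive substring test."""
    return phrase.lower() in message["text"].lower()


def discussions_from_must_not(messages, phrase):
    """What collecting DiscussionId from the must_not hits would return."""
    return {m["discussionId"] for m in messages if not contains_phrase(m, phrase)}


def discussions_without_phrase(messages, phrase):
    """Discussion IDs where no message contains `phrase` (set difference)."""
    all_ids = {m["discussionId"] for m in messages}
    matching_ids = {m["discussionId"] for m in messages if contains_phrase(m, phrase)}
    return all_ids - matching_ids


messages = [
    {"id": "m1", "discussionId": "d1", "text": "YO YO everyone"},
    {"id": "m2", "discussionId": "d1", "text": "hello"},
    {"id": "m3", "discussionId": "d2", "text": "hi there"},
]

print(sorted(discussions_from_must_not(messages, "YO YO")))   # ['d1', 'd2']
print(sorted(discussions_without_phrase(messages, "YO YO")))  # ['d2']
```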

Mongoid: Query based on size of embedded document array

This is similar to this question here but I can't figure out how to convert it to Mongoid syntax:
MongoDB query based on count of embedded document
Let's say I have Customer: {_id: ..., orders: [...]}
I want to be able to find all Customers that have existing orders, i.e. orders.size > 0. I've tried queries like Customer.where(:orders.size.gt => 0) to no avail. Can it be done with an exists? operator?

A nicer way is to use MongoDB's native query syntax rather than resorting to Rails-like methods or JavaScript evaluation, as the accepted answer to the question you linked does, especially since evaluating a JavaScript condition is much slower.
The logical extension of $exists for an array with a length greater than zero is to use "dot notation" and test for the presence of the "zero index", i.e. the first element of the array:
Customer.collection.find({ "orders.0" => { "$exists" => true } })
The same works for any index: testing for the presence of index n-1 matches arrays with at least n elements.
It is also worth noting that, to exclude "zero length" arrays, the $size operator is a valid alternative when combined with $not to negate the match (be aware that $not also matches documents that have no orders field at all):
Customer.collection.find({ "orders" => { "$not" => { "$size" => 0 } } })
But this does not scale well to larger "size" tests, as you would need to list every size to be excluded:
Customer.collection.find({
  "$and" => [
    { "orders" => { "$not" => { "$size" => 4 } } },
    { "orders" => { "$not" => { "$size" => 3 } } },
    { "orders" => { "$not" => { "$size" => 2 } } },
    { "orders" => { "$not" => { "$size" => 1 } } },
    { "orders" => { "$not" => { "$size" => 0 } } }
  ]
})
So the other syntax is clearer:
Customer.collection.find({ "orders.4" => { "$exists" => true } })
which concisely means 5 or more members.
Please also note that none of these conditions alone can use an index, so if you have another filtering condition that can, it is best to include that condition too.
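The array-length semantics of these operators can be modelled in a few lines of Python. This is only an in-memory approximation for plain arrays (it ignores edge cases such as an orders field that is an embedded document with a "0" key), and the helper names are hypothetical.

```python
def index_exists(doc, field, n):
    """Approximates {"<field>.<n>": {"$exists": true}} for an array field.

    True when the array has more than n elements, i.e. at least n + 1.
    """
    value = doc.get(field)
    return isinstance(value, list) and len(value) > n


def not_size(doc, field, size):
    """Approximates {"<field>": {"$not": {"$size": size}}}.

    $not also matches documents where the field is missing entirely.
    """
    value = doc.get(field)
    return not (isinstance(value, list) and len(value) == size)


customers = [
    {"_id": 1, "orders": []},
    {"_id": 2, "orders": ["a"]},
    {"_id": 3},
    {"_id": 4, "orders": ["a", "b", "c", "d", "e"]},
]

print([c["_id"] for c in customers if index_exists(c, "orders", 0)])  # [2, 4]
print([c["_id"] for c in customers if not_size(c, "orders", 0)])      # [2, 3, 4]
print([c["_id"] for c in customers if index_exists(c, "orders", 4)])  # [4]
```

Customer 3 shows why combining $not/$size with an $exists check is useful: on its own, the $not/$size form also returns documents that have no orders field.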

Just adding my solution which might be helpful for someone:
scope :with_orders, -> { where(orders: { "$exists" => true }, :orders.not => { "$size" => 0 }) }

Elasticsearch sum total values for specific hours within a month

I have an Elasticsearch server with the fields timestamp, user, and bytes_down (among others).
I would like to total the bytes_down value for a user for a month, BUT only where the hour is between 8am and 8pm.
I'm able to get the daily totals with a date histogram using the following query (I'm using the Perl API here), but I can't figure out a way of reducing this to the hour range for each day:
my $query = {
    index => 'cm',
    body  => {
        query => {
            filtered => {
                query => {
                    term => { user => $user }
                },
                filter => {
                    and => [
                        {
                            range => {
                                timestamp => {
                                    gte => '2014-01-01',
                                    lte => '2014-01-31'
                                }
                            }
                        },
                        {
                            bool => {
                                must => {
                                    term => { zone => $zone }
                                }
                            }
                        }
                    ]
                }
            }
        },
        facets => {
            bytes_down => {
                date_histogram => {
                    field       => 'timestamp',
                    interval    => 'day',
                    value_field => 'downstream'
                }
            }
        },
        size => 0
    }
};
Thanks
Dale

I think you need to use a script filter instead of a range filter, and put it in the facet_filter section of your facet:
"facet_filter" => {
    "script" => {
        "script" => "doc['timestamp'].date.getHourOfDay() >= 8 && doc['timestamp'].date.getHourOfDay() < 20"
    }
}
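What that script filter does to the sum can be checked with a small Python model. It keeps a record when its hour is in [8, 20), which matches getHourOfDay() >= 8 && getHourOfDay() < 20, and it converts to UTC because, as far as I know, date doc values are exposed to scripts in UTC. The records are assumed to be already limited to the month by the range filter, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def sum_bytes_in_hours(records, user, start_hour=8, end_hour=20):
    """Sum bytes_down for `user` where the UTC hour is in [start_hour, end_hour)."""
    total = 0
    for record in records:
        # Timestamps carry an explicit offset so the UTC conversion is well defined.
        ts = datetime.fromisoformat(record["timestamp"]).astimezone(timezone.utc)
        if record["user"] == user and start_hour <= ts.hour < end_hour:
            total += record["bytes_down"]
    return total


records = [
    {"user": "alice", "timestamp": "2014-01-05T07:59:00+00:00", "bytes_down": 100},
    {"user": "alice", "timestamp": "2014-01-05T08:00:00+00:00", "bytes_down": 200},
    {"user": "alice", "timestamp": "2014-01-05T19:59:59+00:00", "bytes_down": 300},
    {"user": "alice", "timestamp": "2014-01-05T20:00:00+00:00", "bytes_down": 400},
    {"user": "bob", "timestamp": "2014-01-05T10:00:00+00:00", "bytes_down": 500},
    {"user": "alice", "timestamp": "2014-01-06T13:00:00+05:00", "bytes_down": 50},
]

print(sum_bytes_in_hours(records, "alice"))  # 550
```

The last record illustrates the time-zone point: 13:00 at +05:00 is 08:00 UTC, so it counts, which may not be what a user in that zone expects from "8am to 8pm".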

Add a bool must range filter for every hour. I'm not sure whether you're looking to do this on an ongoing basis or for a specific day, but this slide deck from Zachary Tong is a good way to understand what you could be doing, especially with filters in general:
https://speakerdeck.com/polyfractal/elasticsearch-query-optimization?slide=28
