Using upsert with push to an array option on Ruby driver - ruby

I'm trying to do an upsert with ruby driver to mongodb.
If the row exist I wish to push new data to and array, else create new document with one item in the array.
When I run it on mongodb it looks like that:
db.events.update( { "_id" : ObjectId("4f0ef9171d41c85a1b000001")},
{ $push : { "events" : { "field_a" : 1 , "field_b" : "2"}}}, true)
And it works.
When I run it on ruby it looks like that:
#col_events.update( { "_id" => BSON::ObjectId.from_string("4f0ef9171d41c85a1b000001")},
{ :$push => { "events" => { "field_a" => 1 , "field_b" => "2"}}}, :$upsert=>true)
And it doesn't work. I don't get an error but I don't see new rows either.
Will appreciate the help in understanding what am I doing wrong.

So a couple of issues.
In Ruby, the command should be :upsert=>true. Note that there is not $. The docs for this are here.
You are not running the query with :safe=>true. This means that some exceptions will not fire. So you could be causing an exception on the server, but you are not waiting for the server to acknowledge the write.

Just adding some code for Gates VP's excellent answer:
require 'rubygems'
require 'mongo'
#col_events = Mongo::Connection.new()['test']['events']
#safemode enabled
#col_events.update(
{ "_id" => BSON::ObjectId.from_string("4f0ef9171d41c85a1b000001")},
{ "$push" => { "events" => { "field_a" => 1, "field_b" => "2"}}},
:upsert => true, :safe => true
)

Related

Elasticsearch, upsert a document with script when the index does not exist

I'm receiving some payloads in a logstash, that I push in Elastic in a monthly rolling index with a script that allows me to override the fields depending on the order of the status of those payloads.
Example :
{
"id" : "abc",
"status" : "OPEN",
"field1" : "foo",
"opening_ts" : 1234567
}
{
"id" : "abc",
"status" : "CLOSED",
"field1" : "bar",
"closing_ts": 7654321
}
I want that, even if i receive the payload OPEN after the CLOSE for the id "abc", my elastic document to be :
{
"_id" : "abc",
"status": "CLOSED",
"field1" : "bar",
"closing_ts": 7654321,
"opening_ts" : 1234567
}
I order to guarantee that, i have added a script in my elastic output plugin in logstash
script => "
if (ctx._source['status'] == 'CLOSED') {
for (key in params.event.keySet()) {
if (ctx._source[key] == null) {
ctx._source[key] = params.event[key]
}
}
} else {
for (key in params.event.keySet()) {
ctx._source[key] = params.event[key]
}
}
"
Buuuuut, adding this script also added an extra step between the implicit "PUT" on the index, and if the target index does not exist, the script will fail and the whole document will never be created. (Nor the index)
Do you know how could i handle an error in this scripts ?
You need to resort to scripted upsert:
output {
elasticsearch {
index => "your-index"
document_id => "%{id}"
action => "update"
scripted_upsert => true
script => "... your script..."
}
}

update elastic-search document with the same ID

everyone. I'm new in elk and I have a question about logstash.
I have some services and each one has 4 or 6 logs; it means a doc in elastic may has 4 or 6 logs.
I want to read these logs and if they have the same id, put them in one elastic doc.
I must specify that all of the logs have a unique "id" and each request and every log that refers to that request has the same id. each log has a specific type.
I want to put together every log that has the same id and type; like this:
{
"_id":"123",
"Type1":{},
"Type2":[{},{}],
"Type3":[{},{}],
"Type4":{}
}
Every log for the same requset:
Some of them must be in the same group. because their type are the same. look example above. Type2 is Json Array and has 2 jsons. I want to use logstash to read every log and have them classified.
Imagine that our doc is like bellow JSON at the moment:
{
"_id": "123",
"Type1":{},
"Type2":[{},{}],
"Type3":{}
}
now a new log arrives, with id 123 and it's type is Type4. The doc must update like this:
{
"_id": "123",
"Type1":{},
"Type2":[{},{}],
"Type3":{},
"Type4":{}
}
again, I have new log with id, 123 and type, Type3. the doc update like this:
{
"_id": "123",
"Type1":{},
"Type2":[{},{}],
"Type3":[{},{}],
"Type4":{}
}
I tried with script, but I didn't succeed. :
{
"id": 1,
"Type2": {}
}
The script is:
input {
stdin {
codec => json_lines
}
}
output {
elasticsearch {
hosts => ["XXX.XXX.XXX.XXX:9200"]
index => "ss"
document_id => "%{requestId}"
action => "update" # update if possible instead of overwriting
document_type => "_doc"
script_lang => "painless"
scripted_upsert => true
script_type => "inline"
script => 'if (ctx._source.Type3 == null) { ctx._source.Type3 = new ArrayList() } if(!ctx._source.Type3.contains("%{Type3}")) { ctx._source.Type3.add("%{Type3}")}'
}
}
now my problem is this script format just one type; if it works for multiple types, what would it look like?
there is one more problem. I have some logs that they don't have an id, or they have an id, but don't have a type. I want to have these logs in the elastic, what should I do?
You can have a look on aggregate filter plugin for logstash. Or as you mentioned if some of the logs don't have an id, then you can use fingerprint filter plugin to create an id, which you can use to update document in elasticsearch.
E.g:
input {
stdin {
codec => json_lines
}
}
filter {
fingerprint {
source => "message"
target => "[#metadata][id]"
method => "MURMUR3"
}
}
output {
elasticsearch {
hosts => ["XXX.XXX.XXX.XXX:9200"]
index => "ss"
document_id => "%{[#metadata][id]}"
action => "update" # update if possible instead of overwriting
}
}

Automatically parse logs fields with Logstash

Let's say I have this kind of log :
Jun 2 00:00:00 192.168.14.4 date=2016-06-01 time=23:56:05
devname=POPB-FW-01 devid=FG1K2D3I14800220 logid=1059028704 type=utm
subtype=app-ctrl eventtype=app-ctrl-all level=information vd="root"
appid=40568 user="" srcip=10.20.4.35 srcport=52438
srcintf="VRF-PUBLIC" dstip=125.209.230.238 dstport=443 dstintf="OUT"
proto=6 service="HTTPS" sessionid=424666004 applist="Monitor-all"
appcat="Web.Others" app="HTTPS.BROWSER" action=pass
hostname="lcs.naver.com" url="/" msg="Web.Others: HTTPS.BROWSER,"
apprisk=medium
So with this code below, I can regex the timestamp and the ip in future elastic fields :
filter {
grok {
match => {"message" => "%{SYSLOGTIMESTAMP:timestamp} %{client}" }
}
}
Now, how do I automatically get fields for the rest of the log ? Is there a simple way to say :
The thing before the "=" is the field name and the thing after is the value.
So I can obtain a JSON for elastic index with many fields for each log line :
{
"path" => "C:/Users/yoyo/Documents/yuyu/temp.txt",
"#timestamp" => 2017-11-29T10:50:18.947Z,
"#version" => "1",
"client" => "192.168.14.4",
"timestamp" => "Jun 2 00:00:00",
"date" => "2016-06-01",
"time" => "23:56:05",
"devname" => "POPB-FW-01 ",
"devid" => "FG1K2D3I14800220",
etc,...
}
Thanks in advance
Okay, I am really dumb
It was easy, rather than search on google, how to match equals, I just had to search key value matching with logstash.
So I just have to write :
filter {
kv {
}
}
And it's done !
Sorry

Query Mongo Embedded Documents with a size

I have a ruby on rails app using Mongoid and MongoDB v2.4.6.
I have the following MongoDB structure, a record which embeds_many fragments:
{
"_id" : "76561198045636214",
"fragments" : [
{
"id" : 76561198045636215,
"source_id" : "source1"
},
{
"id" : 76561198045636216,
"source_id" : "source2"
},
{
"id" : 76561198045636217,
"source_id" : "source2"
}
]
}
I am trying to find all records in the database that contain fragments with duplicate source_ids.
I'm pretty sure I need to use $elemMatch as I need to query embedded documents.
I have tried
Record.elem_match(fragments: {source_id: 'source2'})
which works but doesn't restrict to duplicates.
I then tried
Record.elem_match(fragments: {source_id: 'source2', :source_id.with_size => 2})
which returns no results (but is a valid query). The query Mongoid produces is:
selector: {"fragments"=>{"$elemMatch"=>{:source_id=>"source2", "source_id"=>{"$size"=>2}}}}
Once that works I need to update it to $size is >1.
Is this possible? It feels like I'm very close. This is a one-off cleanup operation so query performance isn't too much of an issue (however we do have millions of records to update!)
Any help is much appreciated!
I have been able to achieve desired outcome but in testing it's far too slow (will take many weeks to run across our production system). The problem is double query per record (we have ~30 million records in production).
Record.where('fragments.source_id' => 'source2').each do |record|
query = record.fragments.where(source_id: 'source2')
if query.count > 1
# contains duplicates, delete all but latest
query.desc(:updated_at).skip(1).delete_all
end
# needed to trigger after_save filters
record.save!
end
The problem with the current approach in here is that the standard MongoDB query forms do not actually "filter" the nested array documents in any way. This is essentially what you need in order to "find the duplicates" within your documents here.
For this, MongoDB provides the aggregation framework as probably the best approach to finding this. There is no direct "mongoid" style approach to the queries as those are geared towards the existing "rails" style of dealing with relational documents.
You can access the "moped" form though through the .collection accessor on your class model:
Record.collection.aggregate([
# Find arrays two elements or more as possibles
{ "$match" => {
"$and" => [
{ "fragments" => { "$not" => { "$size" => 0 } } },
{ "fragments" => { "$not" => { "$size" => 1 } } }
]
}},
# Unwind the arrays to "de-normalize" as documents
{ "$unwind" => "$fragments" },
# Group back and get counts of the "key" values
{ "$group" => {
"_id" => { "_id" => "$_id", "source_id" => "$fragments.source_id" },
"fragments" => { "$push" => "$fragments.id" },
"count" => { "$sum" => 1 }
}},
# Match the keys found more than once
{ "$match" => { "count" => { "$gte" => 2 } } }
])
That would return you results like this:
{
"_id" : { "_id": "76561198045636214", "source_id": "source2" },
"fragments": ["76561198045636216","76561198045636217"],
"count": 2
}
That at least gives you something to work with on how to deal with the "duplicates" here

Difference with count result in Mongo group by query with Ruby/Javascript

I'm using Mongoid to get a count of certain types of records in a Mongo database. When running the query with the javascript method:
db.tags.group({
cond : { tag: {$ne:'donotwant'} },
key: { tag: true },
reduce: function(doc, out) { out.count += 1; },
initial: { count: 0 }
});
I get the following results:
[
{"tag" : "thing", "count" : 4},
{"tag" : "something", "count" : 1},
{"tag" : "test", "count" : 1}
]
Does exactly what I want it to do. However, when I utilize the corresponding Mongoid code to perform the same query:
Tag.collection.group(
:cond => {:tag => {:$ne => 'donotwant'}},
:key => [:tag],
:reduce => "function(doc, out) { out.count += 1 }",
:initial => { :count => 0 },
)
the count parameters are (seemingly) selected as floats instead of integers:
[
{"tag"=>"thing", "count"=>4.0},
{"tag"=>"something", "count"=>1.0},
{"tag"=>"test", "count"=>1.0}
]
Am I misunderstanding what's going on behind the scenes? Do I need to (can I?) cast those counts or is the javascript result just showing it without the .0?
JavaScript doesn't distinguish between floats and ints. It has one Number type that is implemented as a double. So what you are seeing in Ruby is correct, the mongo shell output follows javascript printing conventions and displays Numbers that don't have a decimal component without the '.0'

Resources