capped collection mongodb - ruby

I have an issue with MongoDB.
I'm currently working with the Ruby MongoDB driver and something strange is going on:
I need to insert 20 documents into a capped collection, but with the following code only 3 docs get inserted and I can't figure out what's going on:
coll = db.create_collection("test", :capped => true, :max => 20)

@pad_string = ""                       # start from an empty string so += works
1024.times { @pad_string += " " }

20.times do
  coll.insert({
    :HostName    => @hostname,
    :CommandLine => @cmdline,
    :Pid         => "1111",
    :BlockName   => @blockname,
    :ExitCode    => 0,
    :StartTime   => Time.now,
    :EndTime     => Time.utc(2000, "jan", 1, 00, 00, 00),
    :StdErr      => @pad_string,
    :Stdout      => @pad_string
  })
end
The point is that I pre-pad @pad_string with 1024 spaces before inserting, using the 1024.times { @pad_string += " " } line. As soon as I do that, it inserts only 3 docs at most.

When you cap a collection by the number of objects, you also have to cap it by size; I wonder what size the Ruby driver is sending down.
Try this:
coll = db.create_collection("test",:capped => true, :size=>100000, :max=>20)
Then tweak the size to whatever works for you (it's in bytes).
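As a rough sizing sketch (the bytes-per-document figure below is my own assumption based on the two ~1 KB padded strings plus some headroom, not something measured):

max_docs      = 20
bytes_per_doc = 2_560                     # two 1024-byte pads plus small fields and overhead
cap_bytes     = max_docs * bytes_per_doc  # 51_200 bytes

coll = db.create_collection("test",
                            :capped => true,
                            :size   => cap_bytes,  # byte cap large enough for all 20 docs
                            :max    => 20)

If the driver sends only a small default :size along with :max, the byte cap fills up after a few of these ~2.5 KB documents and the oldest ones are evicted, which would match seeing only 3 of them survive.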

Related

Logstash pagination quits early

I'm having a problem that I could not crack by googling. We are doing a load with the JDBC input plugin, using explicit pagination. When the pipeline runs, it loads about 3.2 million records and then quits without errors, as if it had finished successfully, but it should load around 6.4 million records. Here is our configuration:
input {
  jdbc {
    id => "NightlyRun"
    jdbc_connection_string => "*******"
    jdbc_driver_class => "Driver"
    jdbc_user => "${USER}"
    jdbc_password => "${PASS}"
    lowercase_column_names => "false"
    jdbc_paging_enabled => true
    jdbc_page_size => 50000
    jdbc_paging_mode => "explicit"
    schedule => "5 2 * * *"
    statement_filepath => "/usr/share/logstash/sql-files/sqlQuery1.sql"
  }
}
output {
  elasticsearch {
    hosts => ["${ELASTIC_HOST}:9200"]
    index => "index"
    user => "logstash"
    password => "${PASSWORD}"
    document_id => "%{NUMBER}-%{value}"
  }
}
And the SQL query we use:
declare @PageSize int
declare @Offset integer
set @PageSize = :size
set @Offset = :offset;

WITH cte AS
(
    SELECT id
    FROM entry
    ORDER BY CREATE_TIMESTAMP
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY
)
select * from entry
join cte on entry.id = cte.id
Running select count(*) from entry returns the expected 6.4 million records, but Logstash loads only 3.2 million before quitting. How can I ensure Logstash loads all the records?
I tried running the query in the database with the offset set to 3,200,000 and the page size to 50,000; the database returns results, so it is not likely a database issue.

How to hash by choosing the key and the string value with Ruby

I'm trying to build a hash from URLs with Ruby, but I ran into a problem: the number of path segments differs from one URL to another, so my hash keys don't end up paired with the right values.
2 examples of my urls
url1="Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/wan/16170515?startDate=2021-01-18T23:00: 00.000Z&endDate=2021-01-19T08:22:42.000Z& timeProfile=1&tz=CET"
url2="Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/BBBB/wan/1617051?startDate=2021-01-18T23:00: 00.000Z&endDate=2021-01-19T08:22:42.000Z& timeProfile=1&tz=CET"
Example of my code to hash url1:
url1="Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/wan/16170515?startDate=2021-01-18T23:00: 00.000Z&endDate=2021-01-19T08:22:42.000Z& timeProfile=1&tz=CET"
spliturl=my_url.gsub("?","/")
url=spliturl.split("/")
if !url.count.even?
url.push(nil)
h=Hash[*url]
puts h
end
My result:
{"Services"=>"name_services", "prise"=>"name_prise", "tech"=>"sites", "xxxx"=>"yyyy", "devices"=>"AAAA", "wan"=>"16170515", "startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"=>nil}
The "sites" has become a value and the "sites" value has become a key !!
{"tech"=>"sites", "xxxx"=>"yyyy", "devices"=>"AAAA", "wan"=>"16170515"}
But the result I would like to have from url1:
{"sites" => "xxxx/yyyy", "devices" => "AAAA", "wan" => "16170515"}
and from url2:
{"sites" => "xxxx/yyyy", "devices" => "AAAA/BBBB", "wan" => "1617051"}
I have an idea for how you could solve the problem:
result = url1.match /\/sites\/(?<sites>.*)\/devices\/(?<devices>.*)\/wan\/(?<wan>.*)\?/
Then, to get the values from the result:
result[:sites] => "xxxx/yyyy"
result[:devices] => "AAAA"
result[:wan] => "16170515"
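A minimal end-to-end sketch of that approach (the url_to_hash helper name is mine; it assumes every URL ends with a "?query" part, as both examples do):

def url_to_hash(url)
  m = url.match(/\/sites\/(?<sites>.*)\/devices\/(?<devices>.*)\/wan\/(?<wan>.*)\?/)
  { "sites" => m[:sites], "devices" => m[:devices], "wan" => m[:wan] }
end

url2 = "Services/tech_name/prise/name_Prise/service_name/sites/xxxx/yyyy/devices/AAAA/BBBB/wan/1617051?startDate=2021-01-18T23:00:00.000Z&endDate=2021-01-19T08:22:42.000Z&timeProfile=1&tz=CET"
p url_to_hash(url2)
# => {"sites"=>"xxxx/yyyy", "devices"=>"AAAA/BBBB", "wan"=>"1617051"}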

Get the first and the last "gold_id" of the previous day with mongo

I have this kind of data:
{"_id"=>BSON::ObjectId('560b5c5d80ec9700030035dc'), "active"=>true, "user_id"=>nil, "action"=>"connection", "shop_id"=>245929, "gold_id"=>23452349, "indexed"=>true, "created_at"=>2015-09-30 03:51:57 UTC}
I'm trying to get the first and the last gold_id of the previous day. I'm getting around 10_000 logs per day.
I'm using the Ruby driver:
first_gold_in  = Time.utc(Date.today.year, Date.today.month, Date.today.day - 1, 00, 00)
first_gold_out = first_gold_in + 5 * 60          # 5-minute window at the start of the day
first_gold_id  = logs
  .find("action" => "connection", "created_at" => { "$gte" => first_gold_in, "$lte" => first_gold_out })
  .first
  .fetch("gold_id")

last_gold_in  = Time.utc(Date.today.year, Date.today.month, Date.today.day - 1, 23, 55)
last_gold_out = last_gold_in + 5 * 60 - 1        # up to 23:59:59
last_gold_id  = logs
  .find("action" => "connection", "created_at" => { "$gte" => last_gold_in, "$lte" => last_gold_out })
  .first
  .fetch("gold_id")
But it's very slow, even with a shorter date range. Is there a better way to do it?
Also, is it possible to get the first and the last of the day in the same request?
Thanks
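One possible approach, as a rough sketch only (it assumes the 2.x mongo driver's find/sort/limit view API and an index covering action and created_at, neither of which is stated in the question): sort yesterday's connections by created_at in both directions and take one document from each end, instead of guessing fixed five-minute windows.

# Sketch, not the poster's code: needs the mongo 2.x driver and ideally an
# index on { action: 1, created_at: 1 } so the sorted lookups stay fast.
day_start = Time.utc(Date.today.year, Date.today.month, Date.today.day) - 24 * 60 * 60
day_end   = day_start + 24 * 60 * 60
filter    = {
  "action"     => "connection",
  "created_at" => { "$gte" => day_start, "$lt" => day_end }
}

first_gold_id = logs.find(filter).sort("created_at" =>  1).limit(1).first&.fetch("gold_id")
last_gold_id  = logs.find(filter).sort("created_at" => -1).limit(1).first&.fetch("gold_id")

It is still two queries, but each one only has to return a single document, and with a suitable index it avoids scanning arbitrary time windows.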

How to save and display Dashing historical values?

Currently, to set up a graph widget, the job should pass all the values to be displayed:
data = [
  { "x" => 1980, "y" => 1323 },
  { "x" => 1981, "y" => 53234 },
  { "x" => 1982, "y" => 2344 }
]
I would like to read just the current (latest) value from my server, but the previous values should also be displayed.
It looks like I could create a job that reads the current value from the server, while the remaining values are read from Redis (or an SQLite database, but I would prefer Redis). The current value should then be saved to the database.
I have never worked with Ruby or Dashing before, so my first question is: is this possible at all? If I use Redis, the next question is how to store the data, since it is a key-value store. I could keep keys like widget-id-1, widget-id-2, widget-id-3 ... widget-id-N, but then I would also have to store N itself (like widget-id=N). Or is there a better way?
I came to the following solution:
require 'redis' # https://github.com/redis/redis-rb

redis_uri = URI.parse(ENV["REDISTOGO_URL"])
redis = Redis.new(:host => redis_uri.host, :port => redis_uri.port, :password => redis_uri.password)

if redis.exists('values_x') && redis.exists('values_y')
  values_x = redis.lrange('values_x', 0, 9) # get latest 10 records
  values_y = redis.lrange('values_y', 0, 9) # get latest 10 records
else
  values_x = []
  values_y = []
end

SCHEDULER.every '10s', :first_in => 0 do |job|
  rand_data  = (Date.today - rand(10000)).strftime("%d-%b") # replace this line with the code to get your data
  rand_value = rand(50)                                     # replace this line with the code to get your data
  values_x << rand_data
  values_y << rand_value

  redis.multi do # execute as a single transaction
    redis.lpush('values_x', rand_data)
    redis.lpush('values_y', rand_value)
    # feel free to add more dataset values here, if required
  end

  data = [
    {
      label: 'dataset-label',
      fillColor: 'rgba(220,220,220,0.5)',
      strokeColor: 'rgba(220,220,220,0.8)',
      highlightFill: 'rgba(220,220,220,0.75)',
      highlightStroke: 'rgba(220,220,220,1)',
      data: values_y.last(10) # display last 10 values only
    }
  ]
  options = { scaleFontColor: '#fff' }

  send_event('barchart', { labels: values_x.last(10), datasets: data, options: options })
end
Not sure if everything is implemented correctly here, but it works.
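One possible refinement (my own suggestion, not part of the job above): since only the last 10 values are ever displayed, the Redis lists could be trimmed inside the same transaction so the stored history does not grow without bound.

# Hypothetical variant of the transaction above: LPUSH adds the newest value
# at the head of each list, and LTRIM then keeps only the 10 newest entries.
redis.multi do
  redis.lpush('values_x', rand_data)
  redis.lpush('values_y', rand_value)
  redis.ltrim('values_x', 0, 9)
  redis.ltrim('values_y', 0, 9)
end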

clusteredPoints of cluster result disappear [mahout]

I got CSV and TEXT format results like the following from clusterdump.
CSV:
0,Sports_38.txt
1,Sports_23.txt
2,Sports_36.txt
3,Sports_13.txt
4,Sports_31.txt,Sports_32.txt
5,Sports_28.txt,Sports_29.txt
6,Sports_2.txt
9,Sports_15.txt
TEXT:
{"identifier":"VL-1","r":[],"c":[...,"n":7}
Top Terms:
什 => 15.829998016357422
利物浦 => 13.629814147949219
克 => 11.317766189575195
格 => 10.938775062561035
特 => 10.842317581176758
尔 => 10.447234153747559
切尔西 => 9.742402076721191
比赛 => 8.247735023498535
表现 => 7.909337520599365
批评 => 7.462332725524902
I noticed that there is just one point for VL-1 in the CSV file, but 7 points for VL-1 in the TEXT file (VL-1's "n" equals 7).
Why did some points disappear? And how can I get every point's cluster?
Thanks a lot.
I also got empty clusteredPoints when the data was a little bigger.
I finally found the reason myself:
clusterClassificationThreshold (the 8th parameter of Kmeans.run) should be 0 (Mahout 1.0).
Check this: http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3C50B62629.5020700@windwardsolutions.com%3E
