I run MongoDB on Windows, so there is no logrotate or anything similar. The log consumes 175 GB per week! I need to cut this down a lot.
Currently db.getProfilingLevel() returns 0 and db.getLogComponents() returns -1 for all components, and still I get almost 2000 of these bad boys a minute:
2018-06-25T15:44:59.653+0200 I COMMAND [conn2355] command mydb.LimitStubs command: find { find: "LimitStubs", filter: { Limit: "asdl;" }, skip: 0, noCursorTimeout: false } planSummary: IXSCAN { Limit: 1, Holder: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:0 reslen:129 locks:{ Global: { acquireCount: { r: 2 } }, MMAPV1Journal: { acquireCount: { r: 1 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { R: 1 } } } protocol:op_query 0ms
Any suggestions?
Here's the logic I am trying to accomplish:
I am using Elasticsearch to display top-selling products and to randomly insert newly created products into the results, using the function_score query DSL.
The issue I am facing is that I use the random_score function for newly created products, and the query does insert new products up to page 2 or 3, but all the remaining newly created products are then pushed towards the end of the search results.
Here's the logic written for function_score:
function_score: {
query: query,
functions: [
{
filter: [
{ terms: { product_type: 'sponsored' } },
{ range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } }
],
random_score: {
seed: Time.current.to_i / (60 * 10), # new seed every 10 minutes
field: '_seq_no'
},
weight: 0.975
},
{
filter: { range: { live_at: { lt: 'CURRENT_DATE - 1.MONTH' } } },
linear: {
weighted_sales_rate: {
decay: 0.9,
origin: 0.5520974289580515,
scale: 0.5520974289580515
}
},
weight: 1
}
],
score_mode: 'sum',
boost_mode: 'replace'
}
And then I am sorting based on {"_score" => { "order" => "desc" } }
Let's say 100 sponsored products were created in the last month. The above Elasticsearch query then displays 8-10 of them at random (3 to 4 per page) as I scroll through the first 2 or 3 pages, but the other 90-92 products are displayed in the last few pages of the results. This is because the score calculated by random_score for those 90-92 products comes out lower than the score calculated by the linear decay function.
Kindly suggest how I can modify this query so that I continue to see newly created products as I navigate through the pages, instead of having them pushed towards the end of the results.
[UPDATE]
I tried adding a gauss decay function to this query (so that I can somehow modify the score of the products appearing towards the end of the results), like below:
{
filter: [
{ terms: { product_type: 'sponsored' } },
{ range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } },
{ range: { "_score" => { lt: 0.9 } } }
],
gauss: {
views_per_age_and_sales: {
origin: 1563.77,
scale: 1563.77,
decay: 0.95
}
},
weight: 0.95
}
But this too is not working.
Links I have referred to:
https://intellipaat.com/community/12391/how-to-get-3-random-search-results-in-elasticserch-query
Query to get random n items from top 100 items in Elastic Search
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html
I am not sure if this is the best solution, but I was able to accomplish this by wrapping the original query in a function_score query with a script_score function, and by adding a new indexed Elasticsearch field called sort_by_views_per_year. Link I referred to: https://github.com/elastic/elasticsearch/issues/7783
Here's how the solution looks:
attribute(:sort_by_views_per_year) do
object.live_age&.positive? ? object.views_per_year.to_f / object.live_age : 0.0
end
Then while querying ElasticSearch:
def search
#...preparation of query...#
query = original_query(query)
query = rearrange_low_scoring_docs(query)
sort = apply_sort opts[:sort]
Product.search(query: query, sort: sort)
end
I have not changed anything in original_query (i.e. it still applies random_score to products that went live within the last month and the linear decay function to older ones).
def rearrange_low_scoring_docs query
{
function_score: {
query: query,
functions: [
{
script_score: {
script: "if (_score.doubleValue() < 0.9) {return 0.9;} else {return _score;}"
}
}
],
#score_mode: 'sum',
boost_mode: 'replace'
}
}
end
Then finally my sorting looks like this:
def apply_sort(_requested_sort = nil) # accepts the opts[:sort] argument passed by search; unused here
[
{ '_score' => { 'order' => 'desc' } },
{ 'sort_by_views_per_year' => { 'order' => 'desc' } }
]
end
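To illustrate what the clamp plus the secondary sort does to the ordering, here is a small in-memory sketch in plain JavaScript. It is not Elasticsearch code: the sample documents and the helper name rankWithFloor are made up for illustration, and only the 0.9 threshold and the sort_by_views_per_year field are taken from the snippets above.

```javascript
// Hypothetical sample data: `score` stands in for the _score produced by the
// original function_score query.
const docs = [
  { id: 'a', score: 0.95, sort_by_views_per_year: 10 },
  { id: 'b', score: 0.40, sort_by_views_per_year: 50 },
  { id: 'c', score: 0.10, sort_by_views_per_year: 80 },
  { id: 'd', score: 0.92, sort_by_views_per_year: 5 },
];

// Raise every score below `floor` to `floor` (what the script_score does),
// then sort by score and break ties with sort_by_views_per_year.
function rankWithFloor(items, floor = 0.9) {
  return items
    .map(d => ({ ...d, score: Math.max(d.score, floor) }))
    .sort((x, y) =>
      y.score - x.score ||
      y.sort_by_views_per_year - x.sort_by_views_per_year);
}

const ranked = rankWithFloor(docs);
console.log(ranked.map(d => d.id)); // [ 'a', 'd', 'c', 'b' ]
```

Documents b and c both end up with a score of 0.9, so their relative order is decided only by sort_by_views_per_year, which is what keeps low-scoring new products from sinking to the last pages.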
It would be very helpful if the Elasticsearch random_score function supported something like max_doc_to_include and min_score attributes, so that I could use it like this:
{
filter: [
{ terms: { product_type: 'sponsored' } },
{ range: { live_at: { gte: 'CURRENT_DATE - 1.MONTH' } } }
],
random_score: {
seed: 123456, # new seed every 10 minutes
field: '_seq_no',
max_doc_to_include: 10,
min_score: 0.9
},
weight: 0.975
},
So basically I have a collection that looks like this (other fields omitted):
[{
user: "mail1@test.com"
},
{
user: "mail1@test.com"
},
{
user: "mail1@test.com"
},
{
user: "mail2@test.com"
},
{
user: "mail2@test.com"
},
{
user: "mail3@test.com"
}
]
I'm looking for a way to query MongoDB to get the top 10 active users (those with the most records in the DB). Is there an easy way to get this, perhaps just using the interface?
Perhaps a simple group aggregation will give you the needed result?
db.Users.aggregate(
[
{
$group: {
_id: "$user",
count: { $sum: 1 }
}
},
{
$sort: { count: -1 }
},
{
$limit: 10
},
{
$project: {
user: "$_id",
_id: 0
}
}
])
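As a rough illustration of what the $group, $sort and $limit stages compute, here is the same logic on an in-memory array in plain JavaScript. This is only a sketch for checking the expected result, not MongoDB code; the function name topUsers is made up, the sample data is modelled on the question, and unlike the $project stage above it keeps the count in the output.

```javascript
// Sample records modelled on the collection in the question.
const records = [
  { user: 'mail1@test.com' }, { user: 'mail1@test.com' }, { user: 'mail1@test.com' },
  { user: 'mail2@test.com' }, { user: 'mail2@test.com' },
  { user: 'mail3@test.com' },
];

function topUsers(rows, limit = 10) {
  // $group: count the records per user
  const counts = new Map();
  for (const { user } of rows) {
    counts.set(user, (counts.get(user) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])               // $sort: { count: -1 }
    .slice(0, limit)                           // $limit
    .map(([user, count]) => ({ user, count }));
}

const top = topUsers(records);
console.log(top[0]); // { user: 'mail1@test.com', count: 3 }
```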
There is an aggregation stage called $sortByCount for this. With Spring Data MongoDB:
List<UserCount> getTop10UserCount() {
return mongoTemplate.aggregate(
newAggregation(
User.class,
sortByCount("user"),
limit(10),
project("_id", "count")
),
UserCount.class
).getMappedResults(); // aggregate() returns AggregationResults<UserCount>, not a List
}
static class UserCount {
String _id;
Integer count;
// constructors, getters or setters..
}
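For comparison, the same stage can be written as a shell pipeline. $sortByCount groups by the given expression and emits documents of the form { _id, count } sorted by count in descending order. The collection name Users is taken from the other answer and is an assumption about your setup.

```javascript
// Pipeline definition; $sortByCount replaces the separate $group and $sort stages.
const pipeline = [
  { $sortByCount: '$user' }, // -> { _id: <user>, count: <n> }, highest count first
  { $limit: 10 },
];

// In the mongo shell this would be run as:
//   db.Users.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```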
I can't find any docs or anything on this issue. I'm using VS Code and discord.js, and I'm building a leaderboard system. The goal is that when a user types !top, an embedded leaderboard pops up with the top 10 people with the most money. This works the first time after a reload (and only sometimes); after that the .sort doesn't work, the leaderboard entries are reshuffled, and someone with $0 ends up at the top. Here is my code:
if (msg.content.startsWith("!top")) {
let moneyC = [];
let embedT = new Discord.MessageEmbed();
let membersCurrent = 0;
msg.guild.members.cache.forEach(element => {
money.fetchBal(element.id).then((i) => {
membersCurrent++;
moneyC.push({ name: element.user.username, moneyT: i.money });
if (membersCurrent >= msg.guild.memberCount) {
moneyC.sort((a, b) => b.money - a.money);
for (i = 0; i < 10; i++) {
embedT.addField("---", moneyC[i].name + " = " + moneyC[i].moneyT);
}
}
if (membersCurrent == msg.guild.memberCount) {
membersCurrent++;
sendEmbed();
}
})
});
function sendEmbed() {
console.log(moneyC);
embedT.setDescription("here are the top 10 people with the highest balance!")
msg.channel.send(embedT);
moneyC = [];
}
}
And here is the console output and the Discord output for the 1st and 2nd time:
CONSOLE:
{ name: 'TheBigCringeMaster', moneyT: 100 },
{ name: 'PogchampInRealLife', moneyT: 0 },
{ name: 'bluestone', moneyT: 0 },
{ name: 'Birdie_YT', moneyT: 0 },
{ name: 'iDopeyScope', moneyT: 0 },
{ name: 'Lewcyる', moneyT: 0 },
{ name: 'Aretimis', moneyT: 0 },
{ name: 'Cam S', moneyT: 0 },
{ name: 'IAmABoomer', moneyT: 0 },
{ name: '$HOO!ER_SavgE', moneyT: 0 },
{ name: 'AwokenYt', moneyT: 0 },
{ name: 'Wingyman2019', moneyT: 0 },
{ name: 'I LOVE EINAR - owo', moneyT: 0 },
{ name: 'Lilly', moneyT: 0 },
{ name: 'Reaction Roles', moneyT: 0 },
{ name: 'boobieman123', moneyT: 0 },
{ name: 'Jamelfarm', moneyT: 0 }
(these are some of my discord members)
And here is the Discord output (@TheBigCringeMaster is me and always has the most money):
[image]
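The issue is that you are mixing up moneyT and money. The objects you push only have a moneyT property:
moneyC.push({ name: element.user.username, moneyT: i.money })
so a.money and b.money are both undefined and the comparator returns NaN, which means the array is never ordered by balance. Change
moneyC.sort((a, b) => b.money - a.money);
to
moneyC.sort((a, b) => b.moneyT - a.moneyT);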
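A minimal standalone sketch of the difference between the two comparators, using a few of the names from the console output (the array here is hand-written sample data, not fetched from Discord):

```javascript
const moneyC = [
  { name: 'PogchampInRealLife', moneyT: 0 },
  { name: 'TheBigCringeMaster', moneyT: 100 },
  { name: 'bluestone', moneyT: 0 },
];

// `money` does not exist on these objects, so this yields undefined - undefined.
const broken = (a, b) => b.money - a.money;
// Compares the property that was actually pushed.
const fixed = (a, b) => b.moneyT - a.moneyT;

console.log(Number.isNaN(broken(moneyC[0], moneyC[1]))); // true

// Sort a copy so the original array is left untouched.
const sorted = [...moneyC].sort(fixed);
console.log(sorted[0].name); // TheBigCringeMaster
```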
I have applied a limit to my Elasticsearch results, but it's not working for me. I have gone through the ES guide. Below is my code:
module Invoices
class RestaurantBuilder < Base
def query(options = {})
buckets = {}
aggregations = {
orders_count: { sum: { field: :orders_count } },
orders_tip: { sum: { field: :orders_tip } },
orders_tax: { sum: { field: :orders_tax } },
monthly_fee: { sum: { field: :monthly_fee } },
gateway_fee: { sum: { field: :gateway_fee } },
service_fee: { sum: { field: :service_fee } },
total_due: { sum: { field: :total_due } },
total: { sum: { field: :total } }
}
buckets_for_restaurant_invoices buckets, aggregations, options[:restaurant_id]
filters = []
filters << time_filter(options)
query = {
query: { bool: { filter: filters } },
aggregations: buckets,
from: 0,
size: 5
}
query
end
def buckets_for_restaurant_invoices(buckets, aggregations, restaurant_id)
restaurant_ids(restaurant_id).each do |id|
buckets[id] = {
filter: { term: { restaurant_id: id } },
aggregations: aggregations
}
end
end
def restaurant_ids(restaurant_id)
if restaurant_id
[restaurant_id]
else
::Restaurant.all.pluck :id
end
end
end
end
The restaurant_ids function returns approximately 5.5k restaurants, so in this case I get the error "circuit_breaking_exception","reason":"[request] Data too large, data for [] would be [622777920/593.9mb], which is larger than the limit of [622775500/593.9mb]". That's why I want to apply a limit so that I only get a few hundred records at a time.
Could anyone point out what I am doing wrong?
To avoid this error you can adjust the indices.breaker.request.limit setting, which caps the memory that per-request data structures (such as those used for aggregations) may consume.
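Note that from and size only page the search hits, not the aggregations, so the request in the question still builds one filter aggregation per restaurant. An alternative to raising the breaker limit is to send the ids in batches of a few hundred. Below is a rough sketch of that idea in JavaScript (the question's code is Ruby, where each_slice would do the chunking). The helper names chunk and buildQuery, the batch size of 500, the generated ids, and size: 0 are all assumptions for illustration, and the sum aggregations are abbreviated to two.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Abbreviated version of the per-restaurant sums from the question.
const aggregations = {
  orders_count: { sum: { field: 'orders_count' } },
  total: { sum: { field: 'total' } },
};

// Build one request body containing a filter aggregation per id in the batch.
function buildQuery(ids, filters) {
  const buckets = {};
  for (const id of ids) {
    buckets[id] = { filter: { term: { restaurant_id: id } }, aggregations };
  }
  // size: 0 assumes only the aggregation results are needed, not the hits.
  return { query: { bool: { filter: filters } }, aggregations: buckets, size: 0 };
}

// Hypothetical ids standing in for ::Restaurant.all.pluck :id
const restaurantIds = Array.from({ length: 5500 }, (_, i) => i + 1);
const bodies = chunk(restaurantIds, 500).map(ids => buildQuery(ids, []));

console.log(bodies.length);                              // 11
console.log(Object.keys(bodies[0].aggregations).length); // 500
```

Each body would then be sent as its own search request and the per-restaurant results collected on the client side.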
Suppose I have a record like this:
{
id: 1,
statistics: {
stat1: 1,
global: {
stat2: 3
},
stat111: 99
}
}
I want to update the record with this object:
{
statistics: {
stat1: 8,
global: {
stat2: 6
},
stat4: 3
}
}
It should be applied to the current record as a delta, so the resulting record should look like this:
{
id: 1,
statistics: {
stat1: 9,
global: {
stat2: 9
},
stat4: 3,
stat111: 99
}
}
Is it possible to do this with one query?
Do you want something generic or something specific?
Specific is easy, this is the generic case:
const updateValExpr = r.expr(updateVal);
const updateStats = (stats, val) => val
.keys()
.map(key => r.branch(
stats.hasFields(key),
[key, stats(key).add(val(key))],
[key, val(key)]
))
.coerceTo('object')
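r.table(...)
.update(row => ({
// both the row and updateVal keep their counters under `statistics`
statistics: updateStats(
row('statistics').without('global'),
updateValExpr('statistics').without('global')
).merge({
global: updateStats(row('statistics')('global'), updateValExpr('statistics')('global'))
})
}))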
There might be some bugs here since it's untested, but the key point of the solution is the updateStats function: you can get all the keys with .keys(), and coerceTo('object') transforms an array such as [['a',1],['b',2]] into the object { a: 1, b: 2 }.
Edit:
You can do it recursively, although with a limited stack depth (you can't send recursive functions directly; the recursion is resolved when the query is actually built):
function updateStats(stats, val, stack = 10) {
return stack === 0
? {}
: val
.keys()
.map(key => r.branch(
stats.hasFields(key).not(),
[key, val(key)],
stats(key).typeOf().eq('OBJECT'),
[key, updateStats(stats(key), val(key), stack - 1)],
[key, stats(key).add(val(key))]
)).coerceTo('object')
}
r.table(...).update(row => updateStats(row, r(updateVal))).run(conn)
// test in admin panel
updateStats(r({
id: 1,
statistics: {
stat1: 1,
global: {
stat2: 3
},
stat111: 99
}
}), r({
statistics: {
stat1: 8,
global: {
stat2: 6
},
stat4: 3
}
}))
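To check what the recursive query is expected to produce, the same delta merge can be written in plain JavaScript. This is only a local reference sketch, not ReQL, and the function name addDelta is made up. Unlike updateStats, which returns only the keys present in the delta and relies on update to merge them into the row, this version returns the full merged record.

```javascript
// Recursively add the numbers in `delta` onto `current`.
function addDelta(current, delta) {
  const result = { ...current };
  for (const [key, value] of Object.entries(delta)) {
    if (!(key in current)) {
      result[key] = value;                        // new counter: take it as-is
    } else if (typeof current[key] === 'object' && current[key] !== null) {
      result[key] = addDelta(current[key], value); // nested object: recurse
    } else {
      result[key] = current[key] + value;          // existing counter: add
    }
  }
  return result;
}

const record = {
  id: 1,
  statistics: { stat1: 1, global: { stat2: 3 }, stat111: 99 },
};
const delta = {
  statistics: { stat1: 8, global: { stat2: 6 }, stat4: 3 },
};

const updated = addDelta(record, delta);
console.log(JSON.stringify(updated));
// {"id":1,"statistics":{"stat1":9,"global":{"stat2":9},"stat111":99,"stat4":3}}
```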