Maximum concurrent connected-client count in Elasticsearch?

How do I find the maximum concurrent connected-client count, grouped by the networkId field, over the last hour in Elasticsearch?
I have tried this query:
GET /sample_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "range": { "@timestamp": { "gte": "now-1h", "lt": "now" } } }
      ]
    }
  },
  "aggregations": {
    "maxconnectclient": {
      "terms": { "field": "networkId.keyword", "size": 10 },
      "aggregations": {
        "wlan0clintCount": { "sum": { "field": "wlan0.clients" } },
        "wlan1clintCount": { "sum": { "field": "wlan1.clients" } },
        "totalClientCount": {
          "bucket_script": {
            "buckets_path": {
              "wlan0clintCount": "wlan0clintCount",
              "wlan1clintCount": "wlan1clintCount"
            },
            "script": {
              "source": "double sum = params.wlan0clintCount + params.wlan1clintCount; return sum;",
              "lang": "painless"
            }
          }
        }
      }
    }
  }
}
but it only gives me one aggregated total per network for the whole hour, not the maximum concurrent count.
Or should I use this query?
GET /sample_index/_search
{"size":0,"query":{"bool":{"filter":[{"range":{"#timestamp":{"gte":"now-1h","lt":"now"}}}]}},"aggregations":{"maxconnectclient":{"terms":{"field":"networkId.keyword","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]},"aggregations":{"wlan0clintCount":{"sum":{"field":"wlan0.clients"}},"wlan1clintCount":{"sum":{"field":"wlan1.clients"}},"wlan2clintCount":{"sum":{"field":"wlan2.clients"}},"wlan2_6clintCount":{"sum":{"field":"wlan2_6.clients"}},"totalClientCount":{"bucket_script":{"buckets_path":{"wlan0clintCount":"wlan0clintCount","wlan1clintCount":"wlan1clintCount","wlan2clintCount":"wlan2clintCount","wlan2_6clintCount":"wlan2_6clintCount"},"script":{"source":"double sum = 0.0; sum = params.wlan0clintCount + params.wlan1clintCount+params.wlan2clintCount+params.wlan2_6clintCount; return (sum);","lang":"painless"},"gap_policy":"skip"}},"sum_bucket_sort":{"bucket_sort":{"sort":[{"totalClientCount":{"order":"desc"}}],"from":0,"gap_policy":"SKIP"}}}}}}

Related

Jqgrid - Calculation of noOfPages value

I was referring to the Instant jqGrid book to set up the grid. The noOfPages attribute is calculated as follows.
//Prepare the response
$numberOfPages = ceil( $numberOfRows / $rowsPerPage );
I could see that for 581 records with rowsPerPage=25, noOfPages came out as 23.
System.out.println((int) Math.ceil(581 / 25)); // prints 23
I was expecting 24, with the last page containing records [576-581]. As it stands, we are missing those 6 records.
It seems you are using Java, where dividing two integers yields an integer, so the ceil is applied after the fraction has already been truncated. I suggest you look at this thread.
To summarize, the possible solutions are:
int n = (int) Math.ceil((double) a / b);
int n = a / b + (a % b == 0 ? 0 : 1);
int n = (a + b - 1) / b;
Select the one that best meets your requirements.
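To see all three in action on the question's numbers (a short sketch; the class name and variables are mine):
public class CeilDiv {
    public static void main(String[] args) {
        int a = 581, b = 25; // the question's 581 records, 25 rows per page
        System.out.println((int) Math.ceil(a / b));          // 23: broken, integer division runs first
        System.out.println((int) Math.ceil((double) a / b)); // 24
        System.out.println(a / b + (a % b == 0 ? 0 : 1));    // 24
        System.out.println((a + b - 1) / b);                 // 24
    }
}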

RX LINQ partition the input stream

I have an input stream whose elements consist of Date, Depth and Area.
I want to plot Area against Depth, and therefore want to take out a window of Depth, e.g. between 1.0-100.0 m.
The problem is that I want to downsample the input stream, since there can be many inputs with close Depth values.
I want to partition the input into x bins; e.g. with 2 bins, all depth values between 1-50 are averaged into the first bin and 51-100 into the second.
I was thinking something like this:
var q = from e in input
        where (e.Depth > 1) && (e.Depth <= 100)
        // here I need some way of partitioning the sequence into bins
        // and averaging the elements.
Split a collection into `n` parts with LINQ? asks for something similar, without Rx.
Modified answer as per your comment; steps = number of buckets.
int min = 1, max = 100;
var steps = 10;
var f = (max - min + 1) / steps; // The extra 1 is really an epsilon. #hack
var q = from e in input
        where e.Depth > 1 && e.Depth <= 100
        let x = e.Depth - min
        group e by (x < max ? x - (x % f) : max - f); // clamp the boundary value into the last bucket
This is the function we're grouping by for the given e.Depth.
This probably won't work so great with floating point values (due to precision), unless you floor/ceil the selection, but then you might run out of integers, so you may need to scale a bit... something like group e by Math.Floor((x - (x % f)) * scaleFactor).
This should do what you want:
static int GetBucket(double value, double min, double max, int bucketCount)
{
    return (int)((value - min) / (max - min) * bucketCount + 0.5);
}
var grouped = input.GroupBy(e => GetBucket(e.Depth, 1, 100, 50));

Algorithm: Determine if a combination of min/max values fall within a given range

Imagine you have 3 buckets, but each of them has a hole in it, and I'm trying to fill a bathtub. The bathtub needs a minimum level of water and has a maximum level it can hold. By the time you reach the tub with a bucket it is not clear how much water will be left in it, but you have a range of possible values.
Is it possible to adequately fill the tub with water?
Essentially: you have 3 ranges (min, max); is there some sum of them that falls within a 4th range?
For example:
Bucket 1 : 5-10L
Bucket 2 : 15-25L
Bucket 3 : 10-50L
Bathtub 100-150L
Is there some guaranteed combination of 1, 2 and 3 that will fill the bathtub within the required range? Multiples of each bucket can be used.
EDIT: Now imagine there are 50 different buckets?
If the capacity of the tub is not very large (say, not greater than 10^6), we can solve this using dynamic programming.
Approach:
Initialization: memo[X][Y] is an array used to memoize results, where X = number of buckets and Y = maximum capacity of the tub. Initialize memo[][] with -1.
Code:
// memo[bucketNum][volume]: -1 = unvisited, otherwise the memoized 0/1 result
int memo[4][1000001];
int minC[4], maxC[4]; // per-bucket min/max volume
int minCap, maxCap;   // tub's min/max level

bool dp(int bucketNum, int curVolume){
    if(curVolume > maxCap) return false; // pruning extra branches
    if(curVolume >= minCap && curVolume <= maxCap){ // base case on success
        return true;
    }
    int &ret = memo[bucketNum][curVolume];
    if(ret != -1){ // this state has been visited earlier
        return ret;
    }
    ret = false;
    for(int i = minC[bucketNum]; i <= maxC[bucketNum]; i++){
        int newVolume = curVolume + i;
        for(int j = bucketNum; j <= 3; j++){
            ret |= dp(j, newVolume);
            if(ret == true) return ret;
        }
    }
    return ret;
}
Warning: Code not tested
Here's a naïve recursive solution in Python that works fine (although it doesn't find an optimal solution):
def match_helper(lower, upper, units, least_difference, fail = dict()):
    if upper < lower + least_difference:
        return None
    if fail.get((lower, upper)):
        return None
    exact_match = [ u for u in units if u['lower'] >= lower and u['upper'] <= upper ]
    if exact_match:
        return [ exact_match[0] ]
    for unit in units:
        if unit['upper'] > upper:
            continue
        recursive_match = match_helper(lower - unit['lower'], upper - unit['upper'], units, least_difference)
        if recursive_match:
            return [unit] + recursive_match
    else:
        fail[(lower, upper)] = 1
        return None

def match(lower, upper):
    units = [
        { 'name': 'Bucket 1', 'lower': 5,  'upper': 10 },
        { 'name': 'Bucket 2', 'lower': 15, 'upper': 25 },
        { 'name': 'Bucket 3', 'lower': 10, 'upper': 50 },
    ]
    least_difference = min([ u['upper'] - u['lower'] for u in units ])
    return match_helper(
        lower = lower,
        upper = upper,
        units = sorted(units, key = lambda u: u['upper']),
        least_difference = least_difference,
    )

result = match(100, 175)
if result:
    lower = sum([ u['lower'] for u in result ])
    upper = sum([ u['upper'] for u in result ])
    names = [ u['name'] for u in result ]
    print(lower, "-", upper)
    print(names)
else:
    print("No solution")
It prints "No solution" for 100-150, but for 100-175 it comes up with a solution of 5x bucket 1, 5x bucket 2.
Assuming you are saying that the "range" for each bucket is the amount of water it may contain when it reaches the tub, and all you care about is whether they could possibly fill the tub...
Just take the "max" of each bucket and sum them. If that sum is in the range of what you consider the tub to be "filled", then it can.
Updated:
Given that buckets can be used multiple times, this seems to me like we're looking for solutions to a pair of equations.
Given buckets x, y and z, we want to find a, b and c such that:
a*x.min + b*y.min + c*z.min >= bathtub.min
and
a*x.max + b*y.max + c*z.max <= bathtub.max
Re: http://en.wikipedia.org/wiki/Diophantine_equation
If bathtub.min and bathtub.max are both multiples of the greatest common divisor of x, y and z, then there are infinitely many solutions (i.e. we can fill the tub); otherwise there are no solutions (i.e. we can never fill the tub).
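As a rough sketch of that criterion in Java (my own illustration, using the question's bucket minimums and ignoring the max side of the system):
public class GcdCheck {
    static int gcd(int a, int b) { return b == 0 ? a : gcd(b, a % b); }

    public static void main(String[] args) {
        int g = gcd(gcd(5, 15), 10); // gcd of the bucket minimums = 5
        int tubMin = 100, tubMax = 150;
        // Some combination of bucket minimums can land in the range
        // only if a multiple of g falls between tubMin and tubMax.
        boolean reachable = (tubMax / g) * g >= tubMin;
        System.out.println("gcd = " + g + ", multiple of gcd in range: " + reachable);
    }
}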
This can be solved with multiple applications of the change-making problem.
Each Bucket.Min value is a currency denomination, and Bathtub.Min is the target value.
When you find a solution via a change-making algorithm, apply one more constraint:
sum(each Bucket.Max in your solution) <= Bathtub.Max
If this constraint is not met, throw out the solution and look for another. This will probably require modifying a standard change-making algorithm so that it can try other solutions when one turns out to be unsuitable.
Initially, your target range is Bathtub.Range.
Each time you add an instance of a bucket to the solution, you reduce the target range for the remaining buckets.
For example, using your example buckets and tub:
Target Range = 100..150
Let's say we want to add a Bucket1 to the candidate solution. That then gives us
Target Range = 95..140
because if the rest of the buckets in the solution total less than 95, this Bucket1 might not be enough to fill the tub to 100, and if they total more than 140, this Bucket1 might fill the tub past 150.
So, this gives you a quick way to check if a candidate solution is valid:
TargetRange = Bathtub.Range
foreach Bucket in CandidateSolution
    TargetRange.Min -= Bucket.Min
    TargetRange.Max -= Bucket.Max
if TargetRange.Min == 0 AND TargetRange.Max >= 0 then solution found
if TargetRange.Min < 0 or TargetRange.Max < 0 then solution is invalid
This still leaves the question - How do you come up with the set of candidate solutions?
Brute force would try all possible combinations of buckets.
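A brute-force sketch along those lines (my own illustration, not the answerer's code): cap each bucket's count by how many of its minimum volumes fit under Bathtub.Max, then test every combination with the subtraction rule above.
public class BucketSearch {
    // {min, max} per bucket, and the tub's range, from the question
    static int[][] buckets = { {5, 10}, {15, 25}, {10, 50} };
    static int tubMin = 100, tubMax = 150;

    public static void main(String[] args) {
        // No bucket can appear more than tubMax / itsMin times in a valid solution.
        for (int a = 0; a <= tubMax / buckets[0][0]; a++)
            for (int b = 0; b <= tubMax / buckets[1][0]; b++)
                for (int c = 0; c <= tubMax / buckets[2][0]; c++) {
                    int minSum = a * buckets[0][0] + b * buckets[1][0] + c * buckets[2][0];
                    int maxSum = a * buckets[0][1] + b * buckets[1][1] + c * buckets[2][1];
                    // Guaranteed fill: the worst case still reaches tubMin,
                    // and the best case still does not overflow tubMax.
                    if (minSum >= tubMin && maxSum <= tubMax) {
                        System.out.println(a + "x Bucket1, " + b + "x Bucket2, " + c + "x Bucket3");
                        return;
                    }
                }
        System.out.println("No guaranteed combination");
    }
}
For the 100-150 tub this prints no guaranteed combination, which agrees with the Python answer above; widening the tub to 100-175 turns up a solution (e.g. 7x Bucket 2).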
Here is my solution for finding the optimal solution (least number of buckets). It compares the ratio of the maximums to the ratio of the minimums to figure out the optimal number of fills from each bucket.
private static void BucketProblem()
{
    Range bathTub = new Range(100, 175);
    List<Range> buckets = new List<Range> { new Range(5, 10), new Range(15, 25), new Range(10, 50) };
    Dictionary<Range, int> result;
    bool canBeFilled = SolveBuckets(bathTub, buckets, out result);
}

private static bool BucketHelper(Range tub, List<Range> buckets, Dictionary<Range, int> results)
{
    Range bucket;
    int startBucket = -1;
    int fills = -1;
    for (int i = buckets.Count - 1; i >= 0; i--)
    {
        bucket = buckets[i];
        double maxRatio = (double)tub.Maximum / bucket.Maximum;
        double minRatio = (double)tub.Minimum / bucket.Minimum;
        if (maxRatio >= minRatio)
        {
            startBucket = i;
            if (maxRatio - minRatio > 1)
                fills = (int)minRatio + 1;
            else
                fills = (int)maxRatio;
            break;
        }
    }
    if (startBucket < 0)
        return false;
    bucket = buckets[startBucket];
    tub.Maximum -= bucket.Maximum * fills;
    tub.Minimum -= bucket.Minimum * fills;
    results.Add(bucket, fills);
    return tub.Maximum == 0 || tub.Minimum <= 0 || startBucket == 0 || BucketHelper(tub, buckets.GetRange(0, startBucket), results);
}

public static bool SolveBuckets(Range tub, List<Range> buckets, out Dictionary<Range, int> results)
{
    results = new Dictionary<Range, int>();
    buckets = buckets.OrderBy(b => b.Minimum).ToList();
    return BucketHelper(new Range(tub.Minimum, tub.Maximum), buckets, results);
}

Finding the mean using Pig or Hadoop

I have huge text files of the following form, saved in the directory as data/data1.txt, data/data2.txt, and so on:
merchant_id, user_id, amount
1234, 9123, 299.2
1233, 9199, 203.2
1234, 0124, 230
and so on...
What I want to do is, for each merchant, find the average amount.
In the end I want to save the output to a file, something like:
merchant_id, average_amount
1234, avg_amt_1234
and so on.
How do I calculate the standard deviation as well?
Sorry for asking such a basic question. :(
Any help would be appreciated. :)
Apache Pig is well suited to such tasks. See this example:
inpt = load '~/pig_data/pig_fun/input/group.txt' as (amnt:double, id:chararray, c2:chararray);
grp = group inpt by id;
mean = foreach grp {
    sum = SUM(inpt.amnt);
    count = COUNT(inpt);
    generate group as id, sum/count as mean, sum as sum, count as count;
};
Pay special attention to the data type of the amnt column, as it influences which implementation of the SUM function Pig will invoke.
Pig can also do something SQL cannot: it can attach the mean to each input row without using any inner joins. That is useful if you are calculating z-scores using the standard deviation.
mean = foreach grp {
    sum = SUM(inpt.amnt);
    count = COUNT(inpt);
    generate FLATTEN(inpt), sum/count as mean, sum as sum, count as count;
};
FLATTEN(inpt) does the trick; now you have access to the original amounts that contributed to the group's average, sum and count.
UPDATE 1:
Calculating variance and standard deviation:
inpt = load '~/pig_data/pig_fun/input/group.txt' as (amnt:double, id:chararray, c2:chararray);
grp = group inpt by id;
mean = foreach grp {
    sum = SUM(inpt.amnt);
    count = COUNT(inpt);
    generate flatten(inpt), sum/count as avg, count as count;
};
tmp = foreach mean {
    dif = (amnt - avg) * (amnt - avg);
    generate *, dif as dif;
};
grp = group tmp by id;
standard_tmp = foreach grp generate flatten(tmp), SUM(tmp.dif) as sqr_sum;
standard = foreach standard_tmp generate *, sqr_sum / count as variance, SQRT(sqr_sum / count) as standard;
It will use 2 jobs. I have not figured out how to do it in one; I need to spend more time on it.
So what do you want: running Java code, or the abstract map-reduce process? For the latter:
The map step:
record -> (merchant_id as key, amount as value)
The reduce step:
(merchant_id, amounts) -> (merchant_id, the aggregate you want)
In the reduce step you are provided with a stream of records having the same key, so you can compute almost anything you want, including the average and the variance.
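A minimal runnable sketch of those two steps with the classic Hadoop MapReduce API (class names and paths are mine; it assumes the input lines carry no header row):
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MerchantAverage {

    // map: record -> (merchant_id, amount)
    public static class M extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");
            ctx.write(new Text(f[0].trim()), new DoubleWritable(Double.parseDouble(f[2].trim())));
        }
    }

    // reduce: (merchant_id, [amounts]) -> (merchant_id, mean)
    public static class R extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            double sum = 0; long n = 0;
            for (DoubleWritable v : vals) { sum += v.get(); n++; }
            ctx.write(key, new DoubleWritable(sum / n));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "merchant average");
        job.setJarByClass(MerchantAverage.class);
        job.setMapperClass(M.class);
        job.setReducerClass(R.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path("data/"));
        FileOutputFormat.setOutputPath(job, new Path("out/"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}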
You can calculate the standard deviation in just one pass, using the formula var = E(x^2) - (E(x))^2:
inpt = load '~/pig_data/pig_fun/input/group.txt' as (amnt:double, id:chararray, c2:chararray);
grp = group inpt by id;
mean = foreach grp {
    sq = foreach inpt generate amnt * amnt as amnt_sq; -- Pig has no ** operator
    sum = SUM(inpt.amnt);
    sum2 = SUM(sq.amnt_sq);
    count = COUNT(inpt);
    generate flatten(inpt), sum/count as avg, count as count,
             SQRT(sum2/count - (sum/count) * (sum/count)) as std;
};
That's it!
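A quick sanity check of that identity (plain Java, my own illustration) comparing the one-pass formula with the two-pass definition:
public class VarianceCheck {
    public static void main(String[] args) {
        double[] x = { 299.2, 203.2, 230.0 }; // sample amounts from the question
        double sum = 0, sum2 = 0;
        for (double v : x) { sum += v; sum2 += v * v; }
        double n = x.length;
        double mean = sum / n;
        double onePass = sum2 / n - mean * mean; // E(x^2) - (E(x))^2
        double twoPass = 0;                      // definition: E[(x - mean)^2]
        for (double v : x) twoPass += (v - mean) * (v - mean);
        twoPass /= n;
        System.out.println(onePass + " ~= " + twoPass);
    }
}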
I calculated all the stats (min, max, mean and standard deviation) in just one loop. FILTER_DATA contains the data set.
GROUP_SYMBOL_YEAR = GROUP FILTER_DATA BY (SYMBOL, SUBSTRING(TIMESTAMP,0,4));
STATS_ALL = FOREACH GROUP_SYMBOL_YEAR {
    MINIMUM = MIN(FILTER_DATA.CLOSE);
    MAXIMUM = MAX(FILTER_DATA.CLOSE);
    MEAN = AVG(FILTER_DATA.CLOSE);
    CNT = COUNT(FILTER_DATA.CLOSE);
    CSQ = FOREACH FILTER_DATA GENERATE CLOSE * CLOSE AS (CC:DOUBLE);
    GENERATE group.$0 AS (SYMBOL:CHARARRAY), MINIMUM AS (MIN:DOUBLE), MAXIMUM AS (MAX:DOUBLE),
             ROUND_TO(MEAN,6) AS (MEAN:DOUBLE),
             ROUND_TO(SQRT(SUM(CSQ.CC) / (CNT * 1.0) - (MEAN * MEAN)),6) AS (STDDEV:DOUBLE),
             group.$1 AS (YEAR:INT);
};

Flooding Bayesian rating creates values out of range

I'm trying to apply the Bayesian rating formula, but if I flood an item with hundreds of thousands of 1-star (out of 5) votes, the final rating goes above 5.
For example, a given item has no votes, and after voting 170,000 times with 1 star its final rating is 5.23. If I only rate it 100 times, the value stays normal.
Here is what I have in PHP.
<?php
// these values came from the DB
$total_votes = 2936;     // total votes for all items
$total_rating = 582.955; // sum of all ratings
$total_items = 202;

// now the specific item; it has no votes yet
$this_num_votes = 0;
$this_score = 0;
$this_rating = 0;

// simulating a lot of votes with 1 star
for ($i = 0; $i < 170000; $i++) {
    $rating_sent = 1; // the new rating, always 1
    $total_votes++;   // adding 1 to the total
    $total_rating = $total_rating + $rating_sent; // adding 1 to the total
    $avg_num_votes = ($total_votes / $total_items); // average number of votes across all items
    $avg_rating = ($total_rating / $total_items);   // average rating across all items
    $this_num_votes = $this_num_votes + 1;          // number of votes for this item
    $this_score = $this_score + $rating_sent;       // sum of all votes for this item
    $this_rating = $this_score / $this_num_votes;   // rating for this item
    $bayesian_rating = (($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating)) / ($avg_num_votes + $this_num_votes);
}
echo $bayesian_rating;
?>
Even if I flood it with 1s and 2s:
$rating_sent = rand(1,2)
the final rating after 100,000 votes is still over 5.
I just did a new test using
$rating_sent = rand(1,5)
and after 100,000 votes I got a value completely out of range (10.53). I know that in a normal situation no item will get 170,000 votes while all the other items get none, but I wonder whether there is something wrong with my code or whether this is expected behavior of the Bayesian formula under massive vote counts.
Edit
Just to make it clear, here is a better explanation for some variables.
$avg_num_votes // SUM(votes given to all items)/COUNT(all items)
$avg_rating // SUM(rating of all items)/COUNT(all items)
$this_num_votes // COUNT(votes given for this item)
$this_score // SUM(rating for this item)
$bayesian_rating // is the formula itself
The formula is: ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes). Taken from here
You need to divide by total_votes rather than total_items when calculating avg_rating.
I made the changes and got something that behaves much better here.
http://codepad.org/gSdrUhZ2
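A quick check of that fix (a Java transliteration of the question's loop, my own sketch, seeded with the question's starting numbers): with avg_rating computed as total_rating / total_votes, the flooded rating stays near the 1-star mark instead of blowing past 5.
public class BayesianCheck {
    public static void main(String[] args) {
        double totalVotes = 2936, totalRating = 582.955, totalItems = 202;
        double thisNumVotes = 0, thisScore = 0, bayesian = 0;
        for (int i = 0; i < 170000; i++) {
            double sent = 1; // always rate 1 star
            totalVotes++;
            totalRating += sent;
            double avgNumVotes = totalVotes / totalItems;
            double avgRating = totalRating / totalVotes; // the fix: divide by votes, not items
            thisNumVotes++;
            thisScore += sent;
            double thisRating = thisScore / thisNumVotes;
            bayesian = (avgNumVotes * avgRating + thisNumVotes * thisRating)
                     / (avgNumVotes + thisNumVotes);
        }
        System.out.println(bayesian); // stays around 1 instead of blowing past 5
    }
}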
