Elasticsearch percolate performance

I use the percolator (Elasticsearch 2.3.3) and I have ~100 term queries. When I percolate 1 document in 1 thread, it takes ~500 ms:
{u'total': 0, u'took': 452, u'_shards': {u'successful': 12, u'failed': 0, u'total': 12}} TIME 0.467885982513
There are 4 CPUs, so I want to percolate in 4 processes. But when I launch them, each one takes ~2000 ms:
{u'total': 0, u'took': 1837, u'_shards': {u'successful': 12, u'failed': 0, u'total': 12}} TIME 1.890885982513
Why?
I use the Python elasticsearch module, version 2.3.0.
I have tried varying the number of shards (from 1 to 12), but the result is the same.
When I try to percolate in 20 threads, Elasticsearch crashes with this error:
RemoteTransportException[[test_node01][192.168.69.142:9300][indices:data/read/percolate[s]]];
nested: EsRejectedExecutionException[rejected execution of
org.elasticsearch.transport.TransportService$4#7906da8a on
EsThreadPoolExecutor[percolate, queue capacity = 1000,
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#31a1c278[Running,
pool size = 16, active threads = 16, queued tasks = 1000, completed tasks = 156823]]];
Caused by: EsRejectedExecutionException[rejected execution of
org.elasticsearch.transport.TransportService$4#7906da8a on
EsThreadPoolExecutor[percolate, queue capacity = 1000,
org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor#31a1c278[Running,
pool size = 16, active threads = 16, queued tasks = 1000, completed tasks = 156823]]]
The server has 16 CPUs and 32 GB of RAM.

Related

Heapify Down Method in Min Heap

I'm currently trying to grasp the min heap with this Repl: https://repl.it/#Stylebender/Minheap#index.js
The min heap has a capacity of 5.
Once I insert the 6th element (50), we swap the positions of the elements at index 0 and index 5 and eject the smallest element from the heap, leaving the heap as:
[ 50, 10, 20, 40, 30]
My specific query has to deal with Lines 39-40.
As you can see from the console log, the first time we call the trickleDown() method, min (0), which represents the index position of 50, becomes leftChild and ends up swapping positions with index 1, with the following result:
[ 50, 10, 20, 40, 30]
However, on the second call of the trickleDown() method, 50, which is at index 1, assumes the position of rightChild and swaps positions with index 4 to form the final heap as below:
[ 10, 30, 20, 40, 50]
Maybe I'm just missing something, but I'm not sure why min decided to become leftChild in the first run and rightChild in the second run, since wouldn't 50, as the largest element within the min heap, satisfy both for loops every time the method is invoked?
In the first call, we compare 50, 10, and 20.
min begins at 0, indicating the 50.
10 is less than 50, so min becomes 1.
20 is not less than 10, so min does not change.
We have found the minimum: 10.
In the second call, we compare 50, 40 and 30.
min begins at 1, indicating the 50.
40 is less than 50, so min becomes 3.
30 is less than 40, so min becomes 4.
We have found the minimum: 30.
It is not sufficient to find an element less than 50; we must find the minimum. To swap 50 and 20 would not produce a valid min-heap.
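To make that concrete, here is a minimal Python sketch of a trickle-down (sift-down) step; the function name and layout are illustrative rather than a copy of the Repl's code, but the min-selection logic is the same:

def trickle_down(heap, i):
    """Restore the min-heap property starting from index i."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        # Compare against BOTH children: it is not enough to find
        # just any child smaller than the parent.
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return  # the heap property holds again
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

heap = [50, 10, 20, 40, 30]
trickle_down(heap, 0)
print(heap)  # [10, 30, 20, 40, 50]

The first swap moves 50 down to index 1 (toward the smaller child, 10), and the second moves it to index 4 (toward the smaller child, 30), matching the final heap above.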

Finding largest difference in array of compass headings

I'm trying to get the "range" of compass headings over the last X seconds. Example: over the last minute, my heading has been between 120deg and 140deg on the compass. Easy enough, right? I have an array with the compass headings over the time period, say 1 reading every second.
[ 125, 122, 120, 125, 130, 139, 140, 138 ]
I can take the minimum and maximum values and there you go. My range is from 120 to 140.
Except it's not that simple. Take, for example, if my heading has shifted from 10 degrees to 350 degrees (i.e. it "passed" through north, changing -20deg).
Now my array might look something like this:
[ 9, 10, 6, 3, 358, 355, 350, 353 ]
Now the min is 3 and the max is 358, which is not what I need :( I'm looking for the most "right-hand" (clockwise) value and the most "left-hand" (counter-clockwise) value.
The only way I can think of is finding the largest arc along the circle that includes none of the values in my array, but I don't even know how I would do that.
Would really appreciate any help!
Problem Analysis
To summarize the problem, it sounds like you want to find the pair of readings that:
are closest together (for simplicity: in a clockwise direction) AND
contain all of the other readings between them.
So in your second example, 9 and 10 are only 1° apart, but they do not contain all the other readings. Conversely, traveling clockwise from 10 to 9 would contain all of the other readings, but they are 359° apart in that direction, so they are not closest.
In this case, I'm not sure if using the minimum and maximum readings will help. Instead, I'd recommend sorting all of the readings. Then you can more easily check the two criteria specified above.
Here's the second example you provided, sorted in ascending order:
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
If we start from the beginning, we know that traveling from reading 3 to reading 358 will encompass all of the other readings, but they are 358 - 3 = 355° apart. We can continue scanning the results progressively. Note that once we circle around, we have to add 360 to properly calculate the degrees of separation.
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
*--------------------------> 358 - 3 = 355° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
-> *----------------------------- (360 + 3) - 6 = 357° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
----> *-------------------------- (360 + 6) - 9 = 357° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
-------> *----------------------- (360 + 9) - 10 = 359° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
----------> *------------------- (360 + 10) - 350 = 20° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
--------------> *-------------- (360 + 350) - 353 = 357° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
-------------------> *--------- (360 + 353) - 355 = 358° separation
[ 3, 6, 9, 10, 350, 353, 355, 358 ]
------------------------> *---- (360 + 355) - 358 = 357° separation
Pseudocode Solution
Here's a pseudocode algorithm for determining the minimum degree range of reading values. There are definitely ways it could be optimized if performance is a concern.
// Somehow, we need to get our reading data into the program, sorted
// in ascending order.
// If readings are always whole numbers, you can use an int[] array
// instead of a double[] array. If we use an int[] array here, change
// the "minimumInclusiveReadingRange" variable below to be an int too.
double[] readings = populateAndSortReadingsArray();

if (readings.length == 0)
{
    // Handle the case where no readings are provided. Show a warning,
    // throw an error, or whatever the requirement is.
}
else
{
    // We want to track the endpoints of the smallest inclusive range.
    // These values will be overwritten each time a better range is found.
    int minimumInclusiveEndpointIndex1;
    int minimumInclusiveEndpointIndex2;
    double minimumInclusiveReadingRange; // This is convenient, but not necessary.
                                         // We could determine it using the
                                         // endpoint indices instead.

    // Check the range of the greatest and least readings first. Since
    // the readings are sorted, the greatest reading is the last element
    // and the least reading is the first element.
    minimumInclusiveReadingRange = readings[readings.length - 1] - readings[0];
    minimumInclusiveEndpointIndex1 = 0;
    minimumInclusiveEndpointIndex2 = readings.length - 1;

    // Potential to skip some processing: if the ends are 180 or fewer
    // degrees apart, they already represent the minimum inclusive reading
    // range, and the for loop below could be skipped.
    for (int i = 1; i < readings.length; i++)
    {
        if ((360.0 + readings[i - 1]) - readings[i] < minimumInclusiveReadingRange)
        {
            minimumInclusiveReadingRange = (360.0 + readings[i - 1]) - readings[i];
            minimumInclusiveEndpointIndex1 = i;
            minimumInclusiveEndpointIndex2 = i - 1;
        }
    }

    // Most likely, there will be some different readings, but there is an
    // edge case of all readings being the same:
    if (minimumInclusiveReadingRange == 0.0)
    {
        print("All readings were the same: " + readings[0]);
    }
    else
    {
        print("The range of compass readings was: " + minimumInclusiveReadingRange +
              " spanning from " + readings[minimumInclusiveEndpointIndex1] +
              " to " + readings[minimumInclusiveEndpointIndex2]);
    }
}
There is one additional edge case that this pseudocode algorithm does not cover, and that is the case where there are multiple minimum inclusive ranges...
Example 1: [0, 90, 180, 270] which has a range of 270 (90 to 0/360, 180 to 90, 270 to 180, and 0 to 270).
Example 2: [85, 95, 265, 275] which has a range of 190 (85 to 275 and 265 to 95)
If it's necessary to report each possible pair of endpoints that create the minimum inclusive range, this edge case would increase the complexity of the logic a bit. If all that matters is determining the value of the minimum inclusive range or it is sufficient to report just one pair that represents the minimum inclusive range, the provided algorithm should suffice.
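As a cross-check of the pseudocode, here is a compact sketch of the same scan in Python (the function name and sample calls are illustrative):

def min_inclusive_range(readings):
    """Return (range_in_degrees, start, end) of the smallest clockwise
    arc that contains every reading."""
    r = sorted(readings)
    # Start with the non-wrapping candidate: least to greatest reading.
    best = (r[-1] - r[0], r[0], r[-1])
    # Try each wrap-around candidate: clockwise from r[i] back to r[i-1].
    for i in range(1, len(r)):
        span = (360 + r[i - 1]) - r[i]
        if span < best[0]:
            best = (span, r[i], r[i - 1])
    return best

print(min_inclusive_range([125, 122, 120, 125, 130, 139, 140, 138]))  # (20, 120, 140)
print(min_inclusive_range([9, 10, 6, 3, 358, 355, 350, 353]))         # (20, 350, 10)

The second call reproduces the worked example above: the smallest inclusive arc is 20°, running clockwise from 350 to 10.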

Need timer countdown resetting to 60 when ends with Observables or in any way in Angular2

I am developing a bidding website in Angular2. I need a timer for every product, which starts at 60 seconds, and when it gets to 0 it resets again to 60.
countDown;
counter = 60;

constructor() {
  this.countDown = Observable.timer(0, 1000)
    .take(this.counter)
    .map(() => --this.counter);
}
The counter starts at 60 seconds and ends up at 0. But I can't implement the reset part, which reassigns 60 to the counter when its value gets to 0.
I know little about Observables in RxJS. Can anyone help me?
Welcome to Stack Overflow!
We can use some math to achieve this. Please read the bonus part below for an alternative.
this.countDown = Observable
  .timer(0, 1000)                 // emits 0, 1, 2, 3, 4, 5, 6, 7, 8...
  .map(tick => 60 - (tick % 61)); // emits 60, 59, 58... 1, 0, 60, 59...
We're taking the remainder of the division, tick % 61:
1 % 61 = 1
2 % 61 = 2
...
60 % 61 = 60
61 % 61 = 0
...
Since we want to count down from 60 to 0, we subtract the remainder from 60. Note that one full cycle contains 61 values (60 down to 0), which is why the divisor is 61 rather than 60; with % 60 the sequence would jump from 1 straight back to 60 and never show 0.
Bonus (answer to the question in the comment)
If you want to start with a random number, you have to keep some kind of state in the observable. The scan operator serves this purpose. In this scenario, using the timer operator is overkill and we can use interval instead.
function getRandomNumber() { // returns a random integer from 1 to 60
  return Math.floor(Math.random() * 60) + 1;
}

Observable
  .interval(1000) // emits every second
  // `scan` works like the `reduce` method of an array:
  // it keeps its own internal state in `acc`.
  // `acc` starts at a random integer, and when it reaches 1, a new
  // random integer is assigned to it again.
  .scan(acc => acc === 1 ? getRandomNumber() : acc - 1, getRandomNumber());
  // emits e.g. 4, 3, 2, 1, 10, 9, 8, 7, 6...
  // (the seed itself is not emitted; scan's first output is seed - 1)
Technically, you can solve the original problem by replacing getRandomNumber with the desired number.

Algorithm to gradually slow down polling requests

I have a requirement to do polling that gradually gets slower over time. Is there a mathematical formula that is commonly used in such a scenario?
For example, I might want to poll 10 seconds after the first try and then gradually slow down to around every 1-5 minutes.
I think geometric series are a common choice. This is what 10 * 1.2**N looks like:
irb(main):009:0> (0..20).map{|i| 10 * 1.2**i }
=> [10.0, 12.0, 14.399999999999999, 17.279999999999998, 20.735999999999997, 24.883199999999995,
29.85983999999999, 35.831807999999995, 42.99816959999998, 51.597803519999985, 61.917364223999975,
74.30083706879996, 89.16100448255996, 106.99320537907195, 128.39184645488632, 154.0702157458636,
184.8842588950363, 221.86111067404354, 266.23333280885225, 319.4799993706227, 383.3759992447472]
You may also want to check the cumulative time before declaring a "timeout".
irb(main):010:0> (0..20).map{|i| 10 * 1.2**i }.inject(:+)
=> 2250.255995468484
FYI, the Linux TCP SYN retry employs a more aggressive slowdown factor, 3 * 2**N:
irb(main):011:0> (0..5).map{|i| 3 * 2**i }
=> [3, 6, 12, 24, 48, 96]
irb(main):012:0> (0..5).map{|i| 3 * 2**i }.inject(:+)
=> 189
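If the interval should level off around every 1-5 minutes instead of growing without bound, a common variation is to cap the geometric delay. Here's a minimal Python sketch, assuming a hypothetical poll() callable that returns true when polling should stop:

import time

def poll_with_backoff(poll, base=10.0, factor=1.2, cap=300.0):
    # Wait base * factor**n seconds between attempts, but never
    # longer than cap seconds (here, 5 minutes).
    delay = base
    while not poll():  # poll() is a stand-in for your actual request
        time.sleep(delay)
        delay = min(delay * factor, cap)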

disk scheduling algorithm

Assume the disk head starts at track 1; there are 230 tracks (from 1 to 230); a seek takes 34 + 0.1*T milliseconds, where T is the number of tracks to move; latency is 12 milliseconds; and I/O transfer takes 3 milliseconds. Here are the requests, all in the disk queue already, and the time they arrive (starting at time 0):
arrival time(ms): 0, 21, 23, 28, 32, 45, 58, 83, 89, 109
for track: 43, 132, 34, 23, 202, 175, 219, 87, 75, 182
Compute the average time to service a request for each of the following disk scheduling algorithms: SCAN, FCFS, SSTF. Also show the order of service for each algorithm.
Answer for SCAN:
1>23>34>43>75>87>132>175>182>202>219>230
average time = 10*49 + 0.1*218 = 51.18 ms
I don't understand how they calculated the average time.
The above is the only work they showed.
Where did they get the 10 and 218 from in the average time formula?
Answer for FCFS
1>43>132>34>23>202>175>219>87>75>182
average time = 490 + (42+89+98+11+179+27+44+132+12+107)*0.1 = 56.4ms
I understand where they got (42+89+98+11+179+27+44+132+12+107)*0.1 from, but how did they get 490?
For SCAN, the total number of tracks of movement is just the difference between 1, where the head starts, and 219, the most distant requested track, so the time spent moving past tracks is 0.1*(219-1) = 0.1*218.
Each of the 10 requests also incurs a fixed seek overhead of 34, latency of 12, and transfer of 3, a total of 34+12+3 = 49 ms per request; that is where the 10 and the 49 come from.
Thus the total time is 10*49 + 0.1*218 = 490 + 21.8 = 511.8 ms, and the average is 51.18 ms.
The 490 ms of non-move time is the same for FCFS; only the track-move time differs.
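To see where the numbers come from, here is a quick Python check of both answers; the service orders and per-request constants are taken straight from the problem statement:

# Fixed per-request cost: 34 ms seek overhead + 12 ms latency + 3 ms transfer.
FIXED = 34 + 12 + 3  # 49 ms
REQUESTS = 10

def average_time(order, start=1):
    """Average service time: fixed costs plus 0.1 ms per track moved
    along the given service order, divided by the number of requests."""
    moves = sum(abs(b - a) for a, b in zip([start] + order[:-1], order))
    return round((REQUESTS * FIXED + 0.1 * moves) / REQUESTS, 2)

scan = [23, 34, 43, 75, 87, 132, 175, 182, 202, 219]
fcfs = [43, 132, 34, 23, 202, 175, 219, 87, 75, 182]

print(average_time(scan))  # 51.18 ms
print(average_time(fcfs))  # 56.41 ms (56.4 ms when rounded to one decimal)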
