Highstock custom min max approximation works properly only at some ranges - max

I am feeding a Highstock chart with data from MySQL. Variables are stored every minute, so if you look at data from the past 3 months the points are grouped. Highstock's default approximation functions give only low, high, average, and sum values. The min and max values are the most important for me, so I made my own approximation function:
approximation: function (arr) {
// first time or point precalculated
if ( !gInfo || gInfo.nextPoint) {
// first time return first value (arr[0])
var point = gInfo ? gInfo.nextPoint : arr[0];
// save current data to the next iteration
gInfo = {prev : arr, nextPoint : null};
return point;
} else {
var prev = gInfo.prev,
// concat current group with the previous one
data = prev.concat(arr),
// get min, max and their positions
min = Math.min.apply(null, data),
max = Math.max.apply(null, data),
minIdx = data.indexOf(min),
maxIdx = data.indexOf(max),
// order min and max
aprox = minIdx < maxIdx ? [min, max] : [max, min];
// save next aproximation and return current
gInfo.nextPoint = aprox[1];
return aprox[0];
}
},
Actually I didn't make it but I found it here in the forum.
The problem is it gives me right results only at some ranges as shown in the pictures below:
First picture at max range - not ok:
Max range - you can't see every min value
As I am changing range to smaller I can see every min value:
This is how it should look at max range
It also happens when I zoom in so that the data is grouped into two-minute intervals and I just scroll to the left or to the right.
At first I thought it had something to do with the way groups are made, but changing groupPixelWidth to any value did not help.
Having min or max is really important for me and this is something I can solve in highstock.

There seems to be an error in the approximation function.
In the third line:
if ( !gInfo || gInfo.nextPoint) {
The condition should evaluate to false only when gInfo exists (which is always the case after the first call) and gInfo.nextPoint is null. However, gInfo.nextPoint is falsy not only when it is null (as set in the function) but also when it is zero, so a precalculated point with value 0 is silently dropped. Change the condition to:
if (!gInfo || gInfo.nextPoint !== null) {
Example with error (before the fix): http://jsfiddle.net/p2qvx24a/1/
Example with fix: http://jsfiddle.net/p2qvx24a/
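To see the effect outside the chart, here is a rough Python port of the approximation callback (the names make_approximation, strict and groups are mine, not from Highstock). It is only a sketch of the logic above: with strict=False the "pending point" check uses truthiness like the original condition, with strict=True it compares against None like the fixed one.

```python
def make_approximation(strict):
    """Return a stateful callback mimicking the approximation function.

    strict=False: pending point tested by truthiness (original condition)
    strict=True:  pending point tested against None (fixed condition)
    """
    state = {"info": None}

    def approximation(arr):
        info = state["info"]
        if info is None:
            pending = False
        elif strict:
            pending = info["next_point"] is not None
        else:
            pending = bool(info["next_point"])

        if info is None or pending:
            # first call, or a point precalculated by the previous call
            point = info["next_point"] if info is not None else arr[0]
            state["info"] = {"prev": arr, "next_point": None}
            return point

        # merge with the previous group, emit min and max in data order
        data = info["prev"] + arr
        lo, hi = min(data), max(data)
        first, second = (lo, hi) if data.index(lo) < data.index(hi) else (hi, lo)
        info["next_point"] = second
        return first

    return approximation


groups = [[3, 4], [5, 0], [2, 2], [1, 6]]  # made-up sample groups
buggy = make_approximation(strict=False)
fixed = make_approximation(strict=True)
buggy_points = [buggy(g) for g in groups]
fixed_points = [fixed(g) for g in groups]
print(buggy_points)
print(fixed_points)
```

With these sample groups the minimum 0 survives only in the strict variant, which is the same symptom as the missing minimum values in the chart.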


Calculating sensing range from sensing sensitivity of the device in Castalia?

I am implementing a WSN algorithm in Castalia. I need to calculate sensing range of the sensing device. I know I will need to use the sensing sensitivity parameter but what will be the exact equation?
The answer will vary depending on the behaviour specified by the PhysicalProcess module used. Since you say in your comment that you may be using the CarsPhysicalProcess let's use that as an example.
A sensor reading request initiated by the application is first sent to the SensorManager via a SensorReadingMessage message. In SensorManager.cc you can see how this is processed in its handleMessage function:
...
case SENSOR_READING_MESSAGE: {
SensorReadingMessage *rcvPacket =check_and_cast<SensorReadingMessage*>(msg);
int sensorIndex = rcvPacket->getSensorIndex();
simtime_t currentTime = simTime();
simtime_t interval = currentTime - sensorlastSampleTime[sensorIndex];
int getNewSample = (interval < minSamplingIntervals[sensorIndex]) ? 0 : 1;
if (getNewSample) { //the last request for sample was more than minSamplingIntervals[sensorIndex] time ago
PhysicalProcessMessage *requestMsg =
new PhysicalProcessMessage("sample request", PHYSICAL_PROCESS_SAMPLING);
requestMsg->setSrcID(self); //insert information about the ID of the node
requestMsg->setSensorIndex(sensorIndex); //insert information about the index of the sensor
requestMsg->setXCoor(nodeMobilityModule->getLocation().x);
requestMsg->setYCoor(nodeMobilityModule->getLocation().y);
// send the request to the physical process (using the appropriate
// gate index for the respective sensor device )
send(requestMsg, "toNodeContainerModule", corrPhyProcess[sensorIndex]);
// update the most recent sample times in sensorlastSampleTime[]
sensorlastSampleTime[sensorIndex] = currentTime;
} else { // send back the old sample value
rcvPacket->setSensorType(sensorTypes[sensorIndex].c_str());
rcvPacket->setSensedValue(sensorLastValue[sensorIndex]);
send(rcvPacket, "toApplicationModule");
return;
}
break;
}
....
As you can see, what it's doing is first working out how much time has elapsed since the last sensor reading request for this sensor. If it's less time than specified by the minSamplingInterval possible for this sensor (this is determined by the maxSampleRates NED parameter of the SensorManager), it just returns the last sensor reading given. If it's greater, a new sensor reading is made.
A new sensor reading is made by sending a PhysicalProcessMessage message to the PhysicalProcess module (via the toNodeContainerModule gate). In the message we pass the X and Y coordinates of the node.
Now, if we have specified CarsPhysicalProcess as the physical process to be used in our omnetpp.ini file, the CarsPhysicalProcess module will receive this message. You can see this in CarsPhysicalProcess.cc:
....
case PHYSICAL_PROCESS_SAMPLING: {
PhysicalProcessMessage *phyMsg = check_and_cast < PhysicalProcessMessage * >(msg);
// get the sensed value based on node location
phyMsg->setValue(calculateScenarioReturnValue(
phyMsg->getXCoor(), phyMsg->getYCoor(), phyMsg->getSendingTime()));
// Send reply back to the node who made the request
send(phyMsg, "toNode", phyMsg->getSrcID());
return;
}
...
You can see that we calculate a sensor value based on the X and Y coordinates of the node, and the time at which the sensor reading was made. The response is sent back to the SensorManager via the toNode gate. So we need to look at the calculateScenarioReturnValue function to understand what's going on:
double CarsPhysicalProcess::calculateScenarioReturnValue(const double &x_coo,
const double &y_coo, const simtime_t &stime)
{
double retVal = 0.0f;
int i;
double linear_coeff, distance, x, y;
for (i = 0; i < max_num_cars; i++) {
if (sources_snapshots[i][1].time >= stime) {
linear_coeff = (stime - sources_snapshots[i][0].time) /
(sources_snapshots[i][1].time - sources_snapshots[i][0].time);
x = sources_snapshots[i][0].x + linear_coeff *
(sources_snapshots[i][1].x - sources_snapshots[i][0].x);
y = sources_snapshots[i][0].y + linear_coeff *
(sources_snapshots[i][1].y - sources_snapshots[i][0].y);
distance = sqrt((x_coo - x) * (x_coo - x) +
(y_coo - y) * (y_coo - y));
retVal += pow(K_PARAM * distance + 1, -A_PARAM) * car_value;
}
}
return retVal;
}
We start with a sensor return value of 0. Then we loop over every car that is on the road (if you look at the TIMER_SERVICE case statement in the handleMessage function, you will see that CarsPhysicalProcess puts cars on the road randomly according to the car_interarrival rate, up to a maximum of max_num_cars number of cars). For every car, we calculate how far the car has travelled down the road, and then calculate the distance between the car and the node. Then for each car we add to the return value based on the formula:
pow(K_PARAM * distance + 1, -A_PARAM) * car_value
Where distance is the distance we have calculated between the car and the node, K_PARAM = 0.1, A_PARAM = 1 (defined at the top of CarsPhysicalProcess.cc) and car_value is a number specified in the CarsPhysicalProcess.ned parameter file (default is 30).
This value is passed back to the SensorManager. The SensorManager then may change this value depending on the sensitivity, resolution, noise and bias of the sensor (defined as SensorManager parameters):
....
case PHYSICAL_PROCESS_SAMPLING:
{
PhysicalProcessMessage *phyReply = check_and_cast<PhysicalProcessMessage*>(msg);
int sensorIndex = phyReply->getSensorIndex();
double theValue = phyReply->getValue();
// add the sensor's Bias and the random noise
theValue += sensorBias[sensorIndex];
theValue += normal(0, sensorNoiseSigma[sensorIndex], 1);
// process the limitations of the sensing device (sensitivity, resoultion and saturation)
if (theValue < sensorSensitivity[sensorIndex])
theValue = sensorSensitivity[sensorIndex];
if (theValue > sensorSaturation[sensorIndex])
theValue = sensorSaturation[sensorIndex];
theValue = sensorResolution[sensorIndex] * lrint(theValue / sensorResolution[sensorIndex]);
....
So you can see that if the value is below the sensitivity of the sensor, it is clamped to the sensitivity, which therefore acts as a floor on the returned value.
So basically you can see that there is no specific 'sensing range' in Castalia - it all depends on how the specific PhysicalProcess handles the message. In the case of CarsPhysicalProcess, as long as there is a car on the road, it will always return a value, regardless of the distance - it just might be very small if the car is a long distance away from the node. If the value is very small, you may receive the lowest sensor sensitivity instead. You could increase or decrease the car_value parameter to get a stronger response from the sensor (so this is kind of like a sensor range)
EDIT---
The default sensitivity (which you can find in SensorManager.ned) is 0. Therefore for CarsPhysicalProcess, any car on the road at any distance should be detected and returned as a value greater than 0. In other words, there is an unlimited range. If the car is very, very far away it may return a number so small it becomes truncated to zero (this depends on the limits in precision of a double value in the implementation of c++)
If you wanted to implement a sensing range, you would have to set a value for devicesSensitivity in SensorManager.ned. Then in your application, you would test to see if the returned value is greater than the sensitivity value - if it is, the car is 'in range', if it is (almost) equal to the sensitivity it is out of range. I say almost because (as we have seen earlier) the SensorManager adds noise to the value returned, so for example if you have a sensitivity value of 5, and no cars, you will get values which will hover slightly around 5 (e.g. 5.0001, 4.99)
With a sensitivity value set, to calculate the sensing range (assuming only 1 car on the road), this means simply solving the equation above for distance, using the minimum sensitivity value as the returned value. i.e. if we use a sensitivity value of 5:
5 = pow(K_PARAM * distance + 1, -A_PARAM) * car_value
Substitute values for the parameters, and use algebra to solve for distance.
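For a single car, and ignoring the bias and noise that the SensorManager adds, the algebra gives distance = ((car_value / sensitivity)^(1/A_PARAM) - 1) / K_PARAM. A small Python sketch of that calculation, using the constants quoted above (the function names sensed_value and sensing_range are just for illustration and are not part of Castalia):

```python
K_PARAM = 0.1   # as defined at the top of CarsPhysicalProcess.cc
A_PARAM = 1.0   # as defined at the top of CarsPhysicalProcess.cc


def sensed_value(distance, car_value=30.0):
    """Contribution of one car at the given distance (no noise, no bias)."""
    return (K_PARAM * distance + 1.0) ** (-A_PARAM) * car_value


def sensing_range(sensitivity, car_value=30.0):
    """Distance at which one car's contribution drops to the sensitivity."""
    if sensitivity <= 0:
        return float("inf")   # sensitivity 0 means unlimited range
    if sensitivity >= car_value:
        return 0.0            # even a car at distance 0 is not above the floor
    return ((car_value / sensitivity) ** (1.0 / A_PARAM) - 1.0) / K_PARAM


print(sensing_range(5.0))   # sensitivity 5, default car_value 30
print(sensed_value(50.0))
```

With a sensitivity of 5 and the default car_value of 30 this works out to a range of about 50 distance units; several cars on the road at once add up, so the effective range is then larger.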

dc.js and crossfilter second level aggregation to average count per hour

I am trying to slightly extend the problem described in this question:
dc.js and crossfilter reduce average counts per day of week
I would like to chart average counts per hour of the day. I have followed the solution above, counting the values by day in the custom reduce with the only change being to dimension by hour of day. This seems to work well and can be seen in the following fiddle:
http://jsfiddle.net/dolomite/6eeahs6z/73/
The top bar chart shows the average counts by hour, the lower chart the total counts by hour. So hour 22 has a total count of 47 and average count of 4.2727... There are 11 days in the data so this is correct.
However, when I click on the weekday row chart and filter for Sunday I get a total count for hour 22 of 4 and an average of 0.3636... The denominator in calculating the average values is still including all weekdays in the data, irrespective of the weekday I filter by. So while the total count has filtered to just show 4 for Sunday it is being divided by the total number of days in the data, whereas the requirement is just to divide by the number of whichever day/s have been selected in the filter.
I know the solution lies in modifying the custom reduce but I am stuck! Any pointers on where I am going wrong would be gratefully received.
hourAvgGroup = hourDim.group().reduce(
function (p, v) { // add
var day = d3.time.day(v.EventDate).getTime();
p.map.set(day, p.map.has(day) ? p.map.get(day) + 1 : 1);
p.avg = average_map(p.map);
return p;
},
function (p, v) { // remove
var day = d3.time.day(v.EventDate).getTime();
p.map.set(day, p.map.has(day) ? p.map.get(day) - 1 : 0);
p.avg = average_map(p.map);
return p;
},
function () { // init
return { map: d3.map(), avg: 0 };
}
)
function average_map(m) {
var sum = 0;
m.forEach(function(k, v) {
sum += v;
});
return m.size() ? sum / m.size() : 0;
}
m.size() counts up the number of keys in the map. The problem is that even if a day has 0 records assigned to it, the key is still there, so m.size() counts it in the denominator. The solution is to remove the key when the count gets to 0. There are probably more efficient ways to do this, but the simplest solution is to add one line to your remove function in the custom reducer so that the function looks like this:
function (p, v) { // remove
var day = d3.time.day(v.EventDate).getTime();
p.map.set(day, p.map.has(day) ? p.map.get(day) - 1 : 0);
// If the day has 0 records, remove the key
if(p.map.has(day) && p.map.get(day) == 0) p.map.remove(day);
p.avg = average_map(p.map);
return p;
},
By the way, I would also recommend not including the actual average and average calculation in your group. Calculate it in the dc.js chart valueAccessor instead. The reducer is run once for every record added or removed. The valueAccessor is only run once per filter operation.
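To make the bookkeeping explicit, here is a language-neutral sketch in Python of the same idea (plain dicts instead of d3.map; the function names are made up for the example). The remove step deletes the day key once its count reaches zero, and the average is computed on demand rather than inside the reducer:

```python
def reduce_add(counts, day):
    """Record one more event for the given day."""
    counts[day] = counts.get(day, 0) + 1


def reduce_remove(counts, day):
    """Remove one event; drop the day entirely when nothing is left."""
    remaining = counts.get(day, 0) - 1
    if remaining > 0:
        counts[day] = remaining
    else:
        counts.pop(day, None)


def average_per_day(counts):
    """Average count over the days that still have at least one record."""
    return sum(counts.values()) / len(counts) if counts else 0


counts = {}
for day in ["mon", "mon", "sun"]:
    reduce_add(counts, day)
print(average_per_day(counts))   # 3 events over 2 days

# filtering down to Sunday removes the Monday records
reduce_remove(counts, "mon")
reduce_remove(counts, "mon")
print(average_per_day(counts))   # 1 event over 1 day
```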

Add water between in a bar chart

I recently came across an interview question on a Glassdoor-like site and I can't find an optimized solution to this problem:
This is nothing like the trapping water problem. Please read through the examples.
Given an input array in which each element represents the height of a tower, an amount of water to pour, and an index indicating the pouring position, print the graph after pouring the water. The width of every tower is 1.
Notes:
Use * to indicate the tower, w to represent 1 amount water.
The pouring position will never be at a peak position. No need to consider the divide water case.
(A bonus point if you give a solution for this case; you may assume that if pouring N water at a peak position, N/2 water goes to the left and N/2 water goes to the right.
The definition of a peak: the height at the peak position is greater than the heights at both the left and right indices next to it.)
Assume there are 2 extremely high walls sitting right next to the histogram.
So if the water amount is over the capacity of the histogram,
you should indicate the capacity number and keep going. See Example 2.
Assume the water would go left first, see Example 1
Example 1:
int[] heights = {4,2,1,2,3,2,1,0,4,2,1}
It looks like:
*       *
*   *   **
** ***  **
******* ***
+++++++++++ <- there'll always be a base layer
42123210431
Given this heights array, water amount 3, position 2:
Print:
*       *
*ww *   **
**w***  **
******* ***
+++++++++++
Example 2:
int[] heights = {4,2,1,2,3,2,1,0,4,2,1}, water amount 32, position 2
Print:
capacity:21
wwwwwwwwwww
*wwwwwww*ww
*www*www**w
**w***ww**w
*******w***
+++++++++++
At first I thought it was like the trapping water problem, but I was wrong. Does anyone have an algorithm to solve this problem?
An explanation or comments in the code would be welcome.
Note:
The trapping water problem asks for the capacity, but this question introduces two variables: the water amount and the pouring index. Besides, the water has a flowing preference. So it is not like the trapping water problem.
I found a Python solution to this question. However, I'm not familiar with Python, so I quote the code here. Hopefully someone who knows Python can help.
Code by #z026
def pour_water(terrains, location, water):
    print('location', location)
    print('len terrains', len(terrains))
    waters = [0] * len(terrains)
    while water > 0:
        # walk left while the surface does not rise
        left = location - 1
        while left >= 0:
            if terrains[left] + waters[left] > terrains[left + 1] + waters[left + 1]:
                break
            left -= 1
        if terrains[left + 1] + waters[left + 1] < terrains[location] + waters[location]:
            location_to_pour = left + 1
            print('set by left', location_to_pour)
        else:
            # otherwise walk right while the surface does not rise
            right = location + 1
            while right < len(terrains):
                if terrains[right] + waters[right] > terrains[right - 1] + waters[right - 1]:
                    print('break, right: {}, right - 1:{}'.format(right, right - 1))
                    break
                right += 1
            if terrains[right - 1] + waters[right - 1] < terrains[location] + waters[location]:
                location_to_pour = right - 1
                print('set by right', location_to_pour)
            else:
                location_to_pour = location
                print('set to location', location_to_pour)
        waters[location_to_pour] += 1
        print(location_to_pour)
        water -= 1
    max_height = max(terrains)
    for height in range(max_height, -1, -1):
        for i in range(len(terrains)):
            if terrains[i] + waters[i] < height:
                print(' ', end=' ')
            elif terrains[i] < height <= terrains[i] + waters[i]:
                print('w', end=' ')
            else:
                print('+', end=' ')
        print('')
Since you have to generate and print out the array anyway, I'd probably opt for a recursive approach keeping to the O(rows*columns) complexity. Note each cell can be "visited" at most twice.
On a high level: first recurse down, then left, then right, then fill the current cell.
However, this runs into a little problem: (assuming this is a problem)
*w * * *
**ww* * instead of **ww*w*
This can be fixed by updating the algorithm to go left and right first to fill cells below the current row, then to go both left and right again to fill the current row. Let's say state = v means we came from above, state = h1 means it's the first horizontal pass, state = h2 means it's the second horizontal pass.
You might be able to avoid this repeated visiting of cells by using a stack, but it's more complex.
Pseudo-code:
array[][] // populated with towers, as shown in the question
visited[][] // starts with all false
// call at the position you're inserting water (at the very top)
define fill(x, y, state):
    if x or y out of bounds
            or array[x][y] == '*'
            or waterCount == 0
        return
    visited[x][y] = true
    // we came from above
    if state == v
        fill(x, y+1, v) // down
        fill(x-1, y, h1) // left , 1st pass
        fill(x+1, y, h1) // right, 1st pass
        fill(x-1, y, h2) // left , 2nd pass
        fill(x+1, y, h2) // right, 2nd pass
    // this is a 1st horizontal pass
    if state == h1
        fill(x, y+1, v) // down
        fill(x-1, y, h1) // left , 1st pass
        fill(x+1, y, h1) // right, 1st pass
        visited[x][y] = false // need to revisit cell later
        return // skip filling the current cell
    // this is a 2nd horizontal pass
    if state == h2
        fill(x-1, y, h2) // left , 2nd pass
        fill(x+1, y, h2) // right, 2nd pass
    // fill current cell
    if waterCount > 0
        array[x][y] = 'w'
        waterCount--
You have an array height with the height of the terrain in each column, so I would create a copy of this array (let's call it w for water) to indicate how high the water is in each column. Like this you also get rid of the problem not knowing how many rows to initialize when transforming into a grid and you can skip that step entirely.
The algorithm in Java code would look something like this:
public int[] getWaterHeight(int index, int drops, int[] heights) {
int[] w = Arrays.copyOf(heights, heights.length);
for (; drops > 0; drops--) {
int idx = index;
// go left first
while (idx > 0 && w[idx - 1] <= w[idx])
idx--;
// go right
for (;;) {
int t = idx + 1;
while (t < w.length && w[t] == w[idx])
t++;
if (t >= w.length || w[t] >= w[idx]) {
w[idx]++;
break;
} else { // we can go down to the right side here
idx = t;
}
}
}
return w;
}
Even though there are many loops, the complexity is only O(drops * columns). If you expect huge amount of drops then it could be wise to count the number of empty spaces in regard to the highest terrain point O(columns), then if the number of drops exceeds the free spaces, the calculation of the column heights becomes trivial O(1), however setting them all still takes O(columns).
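As a sketch of the two cheap parts mentioned here (counting the free space and turning per-column levels into the printed graph), something like the following Python would do. The function names are mine, levels plays the role of the w array (terrain plus water per column), and the heights follow the digit row printed under Example 1 (42123210431), which is an assumption on my part since it differs from the array literal in one column.

```python
def capacity(heights):
    """Free cells below the highest terrain point, O(columns)."""
    top = max(heights)
    return sum(top - h for h in heights)


def render(heights, levels):
    """Draw '*' for terrain, 'w' for water, then the '+' base layer."""
    rows = []
    for y in range(max(levels), 0, -1):
        row = "".join(
            "*" if h >= y else ("w" if lv >= y else " ")
            for h, lv in zip(heights, levels)
        )
        rows.append(row.rstrip())
    rows.append("+" * len(heights))
    return "\n".join(rows)


heights = [4, 2, 1, 2, 3, 2, 1, 0, 4, 3, 1]
# surface level per column after the 3 drops of Example 1 (taken from its picture)
levels = [4, 3, 3, 2, 3, 2, 1, 0, 4, 3, 1]
print(capacity(heights))
print(render(heights, levels))
```

For these heights the capacity comes out as 21, which agrees with the "capacity:21" line in Example 2.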
You can iterate over the 2D grid from bottom to top, create a node for every horizontal run of connected cells, and then string these nodes together into a linked list that represents the order in which the cells are filled.
After row one, you have one horizontal run, with a volume of 1:
1(1)
In row two, you find three runs, one of which is connected to node 1:
1(1)->2(1) 3(1) 4(1)
In row three, you find three runs, one of which connects runs 2 and 3; run 3 is closest to the column where the water is added, so it comes first:
3(1)->1(1)->2(1)->5(3) 6(1) 4(1)->7(1)
In row four you find two runs, one of which connects runs 6 and 7; run 6 is closest to the column where the water is added, so it comes first:
3(1)->1(1)->2(1)->5(3)->8(4) 6(1)->4(1)->7(1)->9(3)
In row five you find a run which connects runs 8 and 9; they are on opposite sides of the column where the water is added, so the run on the left goes first:
3(1)->1(1)->2(1)->5(3)->8(4)->6(1)->4(1)->7(1)->9(3)->A(8)
Run A combines all the columns, so it becomes the last node and is given infinite volume; any excess drops will simply be stacked up:
3(1)->1(1)->2(1)->5(3)->8(4)->6(1)->4(1)->7(1)->9(3)->A(infinite)
then we fill the runs in the order in which they are listed, until we run out of drops.
That's my 20-minute solution. Each drop tells the client where it will stay, so the difficult part is done. (Copy and paste it into your IDE.) Only the printing remains to be done, but the drops already take their positions. Take a look:
class Test2{
private static int[] heights = {3,4,4,4,3,2,1,0,4,2,1};
public static void main(String args[]){
int wAmount = 10;
int position = 2;
for(int i=0; i<wAmount; i++){
System.out.println(i+"#drop");
aDropLeft(position);
}
}
private static void aDropLeft(int position){
getHight(position);
int canFallTo = getFallPositionLeft(position);
if(canFallTo==-1){canFallTo = getFallPositionRight(position);}
if(canFallTo==-1){
stayThere(position);
return;
}
aDropLeft(canFallTo);
}
private static void stayThere(int position) {
System.out.print("Staying at: ");log(position);
heights[position]++;
}
//the position or -1 if it cant fall
private static int getFallPositionLeft(int position) {
int tempHeight = getHight(position);
int tempPosition = position;
//check left , if no, then check right
while(tempPosition>0){
if(tempHeight>getHight(tempPosition-1)){
return tempPosition-1;
}else tempPosition--;
}
return -1;
}
private static int getFallPositionRight(int position) {
int tempHeight = getHight(position);
int tempPosition = position;
while(tempPosition<heights.length-1){
if(tempHeight>getHight(tempPosition+1)){
return tempPosition+1;
}else if(tempHeight<getHight(tempPosition+1)){
return -1;
}else tempPosition++;
}
return -1;
}
private static int getHight(int position) {
return heights[position];
}
private static void log(int position) {
System.out.println("I am at position: " + position + " height: " + getHight(position));
}
}
Of course the code can be optimized, but that's my straightforward solution.
l=[0,1,0,2,1,0,1,3,2,1,2,1]
def findwater(l):
    w=0
    for i in range(0,len(l)-1):
        if i==0:
            pass
        else:
            num = min(max(l[:i]),max(l[i:]))-l[i]
            if num>0:
                w+=num
    return w
import pandas as pd # plotting also requires matplotlib
col_names=[1,2,3,4,5,6,7,8,9,10,11,12,13] #for visualization
bars=[4,0,2,0,1,0,4,0,5,0,3,0,1]
pd.DataFrame(dict(zip(col_names,bars)),index=range(1)).plot(kind='bar') # Plotting bars
def measure_water(l):
    water=0
    for i in range(len(l)-1): # iterate over bars (list)
        if i==0: # case to avoid max(:i) situation in case no item on left
            pass
        else:
            vol_at_curr_bar=min(max(l[:i]),max(l[i:]))-l[i] #select min of max heighted bar on both side and minus current height
            if vol_at_curr_bar>0: # case to avoid any negative sum
                water+=vol_at_curr_bar
    return water
measure_water(bars)

Efficient way to generate a seemingly random permutation from a very large set without repeating?

I have a very large set (billions or more, it's expected to grow exponentially to some level), and I want to generate seemingly random elements from it without repeating. I know I can pick a random number and repeat and record the elements I have generated, but that takes more and more memory as numbers are generated, and wouldn't be practical after couple millions elements out.
I mean, I could say 1, 2, 3 up to billions and each would be constant time without remembering all the previous, or I can say 1,3,5,7,9 and on then 2,4,6,8,10, but is there a more sophisticated way to do that and eventually get a seemingly random permutation of that set?
Update
1, The set does not change size in the generation process. I meant when the user's input increases linearly, the size of the set increases exponentially.
2, In short, the set is like the set of every integer from 1 to 10 billions or more.
3, In long, it goes up to 10 billion because each element carries the information of many independent choices, for example. Imagine an RPG character that have 10 attributes, each can go from 1 to 100 (for my problem different choices can have different ranges), thus there's 10^20 possible characters, number "10873456879326587345" would correspond to a character that have "11, 88, 35...", and I would like an algorithm to generate them one by one without repeating, but makes it looks random.
Thanks for the interesting question. You can create a "pseudorandom"* (cyclic) permutation using only a few bytes of state with modular exponentiation. Say we have n elements. Search for a prime p that's bigger than n+1. Then find a primitive root g modulo p. Basically by definition of a primitive root, the map x --> (g * x) % p is a cyclic permutation of {1, ..., p-1}. And so x --> ((g * (x+1))%p) - 1 is a cyclic permutation of {0, ..., p-2}. We can get a cyclic permutation of {0, ..., n-1} by reapplying the previous permutation whenever it gives a value greater than or equal to n.
I implemented this idea as a Go package. https://github.com/bwesterb/powercycle
package main
import (
"fmt"
"github.com/bwesterb/powercycle"
)
func main() {
var x uint64
cycle := powercycle.New(10)
for i := 0; i < 10; i++ {
fmt.Println(x)
x = cycle.Apply(x)
}
}
This outputs something like
0
6
4
1
2
9
3
5
8
7
but that might of course vary depending on the generator chosen.
It's fast, but not super-fast: on my five year old i7 it takes less than 210ns to compute one application of a cycle on 1000000000000000 elements. More details:
BenchmarkNew10-8 1000000 1328 ns/op
BenchmarkNew1000-8 500000 2566 ns/op
BenchmarkNew1000000-8 50000 25893 ns/op
BenchmarkNew1000000000-8 200000 7589 ns/op
BenchmarkNew1000000000000-8 2000 648785 ns/op
BenchmarkApply10-8 10000000 170 ns/op
BenchmarkApply1000-8 10000000 173 ns/op
BenchmarkApply1000000-8 10000000 172 ns/op
BenchmarkApply1000000000-8 10000000 169 ns/op
BenchmarkApply1000000000000-8 10000000 201 ns/op
BenchmarkApply1000000000000000-8 10000000 204 ns/op
Why did I say "pseudorandom"? Well, we are always creating a very specific kind of cycle: namely one that uses modular exponentiation. It looks pretty pseudorandom though.
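For readers who don't use Go, the construction can be sketched in a few lines of Python. This is not the powercycle implementation, just a naive illustration of the idea: it uses trial division (fine for small n only), picks the smallest prime p with p - 1 >= n, and the helper names are made up.

```python
def is_prime(n):
    """Trial division; adequate for a small demonstration only."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_factors(n):
    """Set of distinct prime factors of n."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def make_cycle(n):
    """Return a function x -> next x that cycles through 0..n-1."""
    p = n + 1
    while not is_prime(p):
        p += 1
    # g is a primitive root iff g^((p-1)/q) != 1 (mod p) for every prime q | p-1
    factors = prime_factors(p - 1)
    g = next(c for c in range(1, p)
             if all(pow(c, (p - 1) // q, p) != 1 for q in factors))

    def step(x):
        while True:
            x = (g * (x + 1)) % p - 1
            if x < n:          # skip values outside 0..n-1
                return x

    return step


step = make_cycle(10)
x, seen = 0, []
for _ in range(10):
    seen.append(x)
    x = step(x)
print(seen)
```

Every element of 0..9 shows up exactly once before the sequence returns to its starting point; the only state to keep is the current x plus the two constants g and p.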
I would use a random number and swap it with an element at the beginning of the set.
Here's some pseudo code
set = [1, 2, 3, 4, 5, 6]
picked = 0
Function PickNext(set, picked)
If picked > Len(set) - 1 Then
Return Nothing
End If
// random number between picked (inclusive) and length (exclusive)
r = RandomInt(picked, Len(set))
// swap the picked element to the beginning of the set
result = set[r]
set[r] = set[picked]
set[picked] = result
// update picked
picked++
// return your next random element
Return result
End Function
Every time you pick an element there is one swap and the only extra memory being used is the picked variable. The swap can happen if the elements are in a database or in memory.
EDIT Here's a jsfiddle of a working implementation http://jsfiddle.net/sun8rw4d/
JavaScript
var set = [];
set.picked = 0;
function pickNext(set) {
if(set.picked > set.length - 1) { return null; }
var r = set.picked + Math.floor(Math.random() * (set.length - set.picked));
var result = set[r];
set[r] = set[set.picked];
set[set.picked] = result;
set.picked++;
return result;
}
// testing
for(var i=0; i<100; i++) {
set.push(i);
}
while(pickNext(set) !== null) { }
document.body.innerHTML += set.toString();
EDIT 2 Finally, a random binary walk of the set. This can be accomplished with O(log2(N)) stack space (memory), which for 10 billion is only about 33 levels. There's no shuffling or swapping involved. Using ternary instead of binary might yield even better pseudo-random results.
// on the fly set generator
var count = 0;
var maxValue = 64;
function nextElement() {
// restart the generation
if(count == maxValue) {
count = 0;
}
return count++;
}
// code to pseudo randomly select elements
var current = 0;
var stack = [0, maxValue - 1];
function randomBinaryWalk() {
if(stack.length == 0) { return null; }
var high = stack.pop();
var low = stack.pop();
var mid = ((high + low) / 2) | 0;
// pseudo randomly choose the next path
if(Math.random() > 0.5) {
if(low <= mid - 1) {
stack.push(low);
stack.push(mid - 1);
}
if(mid + 1 <= high) {
stack.push(mid + 1);
stack.push(high);
}
} else {
if(mid + 1 <= high) {
stack.push(mid + 1);
stack.push(high);
}
if(low <= mid - 1) {
stack.push(low);
stack.push(mid - 1);
}
}
// how many elements to skip
var toMid = (current < mid ? mid - current : (maxValue - current) + mid);
// skip elements
for(var i = 0; i < toMid - 1; i++) {
nextElement();
}
current = mid;
// get result
return nextElement();
}
// test
var result;
var list = [];
do {
result = randomBinaryWalk();
list.push(result);
} while(result !== null);
document.body.innerHTML += '<br/>' + list.toString();
Here's the results from a couple of runs with a small set of 64 elements. JSFiddle http://jsfiddle.net/yooLjtgu/
30,46,38,34,36,35,37,32,33,31,42,40,41,39,44,45,43,54,50,52,53,51,48,47,49,58,60,59,61,62,56,57,55,14,22,18,20,19,21,16,15,17,26,28,29,27,24,25,23,6,2,4,5,3,0,1,63,10,8,7,9,12,11,13
30,14,22,18,16,15,17,20,19,21,26,28,29,27,24,23,25,6,10,8,7,9,12,13,11,2,0,63,1,4,5,3,46,38,42,44,45,43,40,41,39,34,36,35,37,32,31,33,54,58,56,55,57,60,59,61,62,50,48,49,47,52,51,53
As I mentioned in my comment, unless you have an efficient way to skip to a specific point in your "on the fly" generation of the set this will not be very efficient.
If it is enumerable, use a pseudo-random integer generator whose period covers 0 .. 2^n - 1, where the upper bound is just greater than the size of your set, and generate pseudo-random integers, discarding those that fall outside your set. Use those integers to index items from your set.
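One generator with this property is a linear congruential generator modulo a power of two: with an odd increment and a multiplier congruent to 1 mod 4 it has full period, so every value in 0 .. 2^k - 1 appears exactly once per cycle. A minimal Python sketch (the constants 5 and 3 and the function name are my own choice; the output is a valid permutation but not a statistically strong one):

```python
def lcg_permutation(size, seed=0):
    """Yield each index in 0..size-1 exactly once, in scrambled order."""
    k = max(1, (size - 1).bit_length())   # smallest k with 2**k >= size
    m = 1 << k
    a, c = 5, 3        # a % 4 == 1 and c odd give full period modulo 2**k
    x = seed % m
    for _ in range(m):
        if x < size:   # discard values that fall outside the set
            yield x
        x = (a * x + c) % m


order = list(lcg_permutation(10))
print(order)
```

Only the current x has to be remembered between calls, and because 2^k is less than twice the set size, fewer than half of the generated values are discarded.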
Precompute a series of indices (e.g. in a file) which has the properties you need, then randomly choose a start index for your enumeration and use the series in a round-robin manner.
The length of your pre-computed series should be > the maximum size of the set.
If you combine this (depending on your programming language etc.) with file mappings, your final nextIndex(INOUT state) function is (nearly) as simple as return mappedIndices[state++ % PERIOD];, if you have a fixed size of each entry (e.g. 8 bytes -> uint64_t).
Of course, the returned value could be > your current set size. Simply draw indices until you get one which is <= your set's current size.
Update (In response to question-update):
There is another option to achieve your goal if it is about creating 10 billion unique characters in your RPG: generate a GUID and write yourself a function which computes your number from the GUID. See man uuid if you are on a Unix system; otherwise, google it. Some parts of the uuid are not random but contain meta-info, and some parts are either systematic (such as your network card's MAC address) or random, depending on the generator algorithm. But they are very likely unique. So, whenever you need a new unique number, generate a uuid and transform it to your number by means of some algorithm which basically maps the uuid bytes to your number in a non-trivial way (e.g. use hash functions).

Algorithm: Determine if a combination of min/max values fall within a given range

Imagine you have 3 buckets, but each of them has a hole in it. I'm trying to fill a bath tub. The bath tub has a minimum level of water it needs and a maximum level of water it can contain. By the time you reach the tub with the bucket it is not clear how much water will be in the bucket, but you have a range of possible values.
Is it possible to adequately fill the tub with water?
Put simply: you have 3 ranges (min, max); is there some sum of them that will fall within a 4th range?
For example:
Bucket 1 : 5-10L
Bucket 2 : 15-25L
Bucket 3 : 10-50L
Bathtub 100-150L
Is there some guaranteed combination of 1 2 and 3 that will fill the bathtub within the requisite range? Multiples of each bucket can be used.
EDIT: Now imagine there are 50 different buckets?
If the capacity of the tub is not very large (not greater than 10^6, for example), we can solve it using dynamic programming.
Approach:
Initialization: memo[X][Y] is an array used to memoize results. X = number of buckets, Y = maximum capacity of the tub. Initialize memo[][] with -1.
Code:
// Assumed globals: minCap and maxCap (tub limits), numBuckets, minC[] and maxC[]
// (per-bucket limits, indexed 0 .. numBuckets-1), and int memo[numBuckets][maxCap + 1].
bool dp(int bucketNum, int curVolume) {
    if (curVolume > maxCap) return false;              // prune branches that overflow the tub
    if (curVolume >= minCap && curVolume <= maxCap) {  // base case on success
        return true;
    }
    int &ret = memo[bucketNum][curVolume];
    if (ret != -1) {                                   // this state has been visited earlier
        return ret;
    }
    ret = 0;
    for (int i = minC[bucketNum]; i <= maxC[bucketNum]; i++) {
        int newVolume = curVolume + i;
        for (int j = bucketNum; j < numBuckets; j++) {
            if (dp(j, newVolume)) return ret = 1;
        }
    }
    return ret;
}
Warning: Code not tested
Here's a naïve recursive solution in python that works just fine (although it doesn't find an optimal solution):
def match_helper(lower, upper, units, least_difference, fail=None):
    # fail remembers (lower, upper) targets already known to be unreachable
    if fail is None:
        fail = {}
    # the remaining range is narrower than the narrowest bucket, so nothing fits
    if upper < lower + least_difference:
        return None
    if (lower, upper) in fail:
        return None
    # a bucket whose whole range lies inside the remaining range completes the match
    exact_match = [u for u in units if u['lower'] >= lower and u['upper'] <= upper]
    if exact_match:
        return [exact_match[0]]
    for unit in units:
        if unit['upper'] > upper:
            continue
        recursive_match = match_helper(lower - unit['lower'], upper - unit['upper'], units, least_difference, fail)
        if recursive_match:
            return [unit] + recursive_match
    # no bucket led to a solution from this target
    fail[(lower, upper)] = 1
    return None
def match(lower, upper):
    units = [
        {'name': 'Bucket 1', 'lower': 5, 'upper': 10},
        {'name': 'Bucket 2', 'lower': 15, 'upper': 25},
        {'name': 'Bucket 3', 'lower': 10, 'upper': 50},
    ]
    return match_helper(
        lower=lower,
        upper=upper,
        units=sorted(units, key=lambda u: u['upper']),
        least_difference=min(u['upper'] - u['lower'] for u in units),
    )
result = match(100, 175)
if result:
    lower = sum(u['lower'] for u in result)
    upper = sum(u['upper'] for u in result)
    names = [u['name'] for u in result]
    print(lower, "-", upper)
    print(names)
else:
    print("No solution")
It prints "No solution" for 100-150, but for 100-175 it comes up with a solution of 5x bucket 1, 5x bucket 2.
Assuming you are saying that the "range" for each bucket is the amount of water that it may have when it reaches the tub, and all you care about is if they could possibly fill the tub...
Just take the "max" of each bucket and sum them. If that is in the range of what you consider the tub to be "filled" then it can.
Updated:
Given that buckets can be used multiple times, this seems to me like we're looking for solutions to a pair of inequalities.
Given buckets x, y and z we want to find non-negative integers a, b and c such that:
a*x.min + b*y.min + c*z.min >= bathtub.min
and
a*x.max + b*y.max + c*z.max <= bathtub.max
Re: http://en.wikipedia.org/wiki/Diophantine_equation
For the corresponding equalities, integer solutions exist only if the target (bathtub.min or bathtub.max) is a multiple of the greatest common divisor of the bucket values x, y and z, and in that case there are infinitely many. That test allows negative counts and ignores the inequalities, though, so on its own it does not settle whether the tub can be filled.
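The two constraints above can be checked directly by trying every combination of counts. This is only a brute-force sketch under the assumption that every bucket maximum is positive; `find_counts` is a hypothetical helper, and each count is capped at the number of times that bucket's maximum fits into the tub's maximum.

```python
from itertools import product


def find_counts(buckets, tub_min, tub_max):
    """buckets: list of (min, max) pairs. Return a tuple of counts, or None."""
    # Using a bucket more than tub_max // bucket_max times would risk overflow.
    limits = [tub_max // b_max for _, b_max in buckets]
    for counts in product(*(range(limit + 1) for limit in limits)):
        total_min = sum(n * b_min for n, (b_min, _) in zip(counts, buckets))
        total_max = sum(n * b_max for n, (_, b_max) in zip(counts, buckets))
        if total_min >= tub_min and total_max <= tub_max:
            return counts
    return None


if __name__ == "__main__":
    buckets = [(5, 10), (15, 25), (10, 50)]
    print(find_counts(buckets, 100, 150))
    print(find_counts(buckets, 100, 175))
```

For the example buckets, every bucket's maximum is at least 5/3 of its minimum, so a guaranteed minimum of 100 implies a possible maximum above 150; the 100..150 tub therefore has no guaranteed fill, while 100..175 does (seven of Bucket 2 gives 105..175).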
This can be solved with multiple applications of the change making problem.
Each Bucket.Min value is a currency denomination, and Bathtub.Min is the target value.
When you find a solution via a change-making algorithm, then apply one more constraint:
sum(each Bucket.Max in your solution) <= Bathtub.max
If this constraint is not met, throw out this solution and look for another. This will probably require a change to a standard change-making algorithm that allows you to try other solutions when one is found to not be suitable.
Initially, your target range is Bathtub.Range.
Each time you add an instance of a bucket to the solution, you reduce the target range for the remaining buckets.
For example, using your example buckets and tub:
Target Range = 100..150
Let's say we want to add a Bucket1 to the candidate solution. That then gives us
Target Range = 95..140
because if the rest of the buckets in the solution total < 95, then this Bucket1 might not be sufficient to fill the tub to 100, and if the rest of the buckets in the solution total > 140, then this Bucket1 might fill the tub over 150.
So, this gives you a quick way to check if a candidate solution is valid:
TargetRange = Bathtub.Range
foreach Bucket in CandidateSolution
    TargetRange.Min -= Bucket.Min
    TargetRange.Max -= Bucket.Max
if TargetRange.Max < 0 then solution is invalid (the tub might overflow)
else if TargetRange.Min <= 0 then solution found
else solution is incomplete (the tub might not reach its minimum)
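The candidate check can be sketched as a small function. It treats a candidate as a solution when the remaining minimum has dropped to zero or below while the remaining maximum is still non-negative, which is an assumption drawn from the 95..140 reasoning above; the name `check_candidate` and the string results are illustrative.

```python
def check_candidate(tub_min, tub_max, candidate):
    """candidate: list of (bucket_min, bucket_max) pairs; repeats are allowed."""
    target_min, target_max = tub_min, tub_max
    for bucket_min, bucket_max in candidate:
        # Each bucket shrinks the range the remaining buckets still have to cover.
        target_min -= bucket_min
        target_max -= bucket_max
    if target_max < 0:
        return "invalid"      # the buckets' maximums could overflow the tub
    if target_min <= 0:
        return "solution"     # the buckets' minimums already reach the tub minimum
    return "incomplete"       # more buckets are needed


if __name__ == "__main__":
    print(check_candidate(100, 150, [(5, 10)]))
    print(check_candidate(100, 175, [(15, 25)] * 7))
```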
This still leaves the question - How do you come up with the set of candidate solutions?
Brute force would try all possible combinations of buckets.
Here is my solution for finding the optimal solution (least number of buckets). It compares the ratio of the maximums to the ratio of the minimums, to figure out the optimal number of buckets to fill the tub.
// Requires: using System.Collections.Generic; using System.Linq;
// Range is assumed to be a user-defined class with settable int Minimum and Maximum
// properties and a (minimum, maximum) constructor; it is not System.Range.
private static void BucketProblem()
{
    Range bathTub = new Range(100, 175);
    List<Range> buckets = new List<Range> { new Range(5, 10), new Range(15, 25), new Range(10, 50) };
    Dictionary<Range, int> result;
    bool canBeFilled = SolveBuckets(bathTub, buckets, out result);
}
private static bool BucketHelper(Range tub, List<Range> buckets, Dictionary<Range, int> results)
{
    Range bucket;
    int startBucket = -1;
    int fills = -1;
    // Walk from the bucket with the largest minimum down to the smallest.
    for (int i = buckets.Count - 1; i >= 0; i--)
    {
        bucket = buckets[i];
        double maxRatio = (double)tub.Maximum / bucket.Maximum;
        double minRatio = (double)tub.Minimum / bucket.Minimum;
        // The bucket is usable if it can reach the minimum before exceeding the maximum.
        if (maxRatio >= minRatio)
        {
            startBucket = i;
            if (maxRatio - minRatio > 1)
                fills = (int) minRatio + 1;
            else
                fills = (int) maxRatio;
            break;
        }
    }
    if (startBucket < 0)
        return false;
    bucket = buckets[startBucket];
    // Shrink the remaining target range by the chosen number of fills.
    tub.Maximum -= bucket.Maximum * fills;
    tub.Minimum -= bucket.Minimum * fills;
    results.Add(bucket, fills);
    return tub.Maximum == 0 || tub.Minimum <= 0 || startBucket == 0 || BucketHelper(tub, buckets.GetRange(0, startBucket), results);
}
public static bool SolveBuckets(Range tub, List<Range> buckets, out Dictionary<Range, int> results)
{
    results = new Dictionary<Range, int>();
    buckets = buckets.OrderBy(b => b.Minimum).ToList();
    // Work on a copy so the caller's tub range is left unchanged.
    return BucketHelper(new Range(tub.Minimum, tub.Maximum), buckets, results);
}

Resources