Using the first row in bin (instead of average) to calculate percentage gain - dc.js

In the dc.js Nasdaq example, percentageGain is calculated as:
(p.absGain / p.avgIndex) * 100
Here avgIndex is the average of all the day-averages.
I'm more familiar with the equation:
A. (Price - Prev period's Close) / Prev period's Close * 100
I'm not sure whether this is possible (with filters set and so on) the way crossfilter/dc works. Therefore, an alternative equation that might fit crossfilter/dc better and would still be meaningful could be:
B. absGain of group / open of first day of group * 100
B would also mean that if a filter is set on, for example, Q1, then only the absGain of Q1 is taken into account. The first day in this group is the oldest Q1 date in the oldest year. Charts other than "yearly", with groups like quarter, month or day of the week, should also be able to display the value of this equation. For example, in a month chart, the value of the month "June" is calculated by taking the open of the first day in the first June, while the absGain is taken from all June months (of course working with all current filters in place).
Question: Can A and/or B be solved the crossfilter/dc way, and how (example)?
Even if only B could be solved (in a way natural to crossfilter/dc), that would already be great. I want to use the dc.js example for other stocks that have the same underlying data structure (open, close, high, low, volume).
thanks!

I agree that Equation B is easier to define using crossfilter, so I figured out one way to do it.
Equation A could probably work but it's unclear which day's close should be used under filtering - the last day which is not in the current bin? The day before the first day in the current bin?
Equation B needs the earliest row for the current bin, and that requires maintaining the array of all rows for each bin. This is not built into crossfilter but it's a feature which we have talked about adding.
The complex reduce example does this, and we can reuse some of its code. It calculates the median/mode/min/max value from the arrays of rows which fall in each bin, using these functions to generate those arrays:
function groupArrayAdd(keyfn) {
    var bisect = d3.bisector(keyfn);
    return function(elements, item) {
        // insert item into elements, keeping elements sorted by key
        var pos = bisect.right(elements, keyfn(item));
        elements.splice(pos, 0, item);
        return elements;
    };
}

function groupArrayRemove(keyfn) {
    var bisect = d3.bisector(keyfn);
    return function(elements, item) {
        // find the item by key and remove it
        var pos = bisect.left(elements, keyfn(item));
        if (keyfn(elements[pos]) === keyfn(item))
            elements.splice(pos, 1);
        return elements;
    };
}
It's somewhat inefficient to maintain all these arrays, so you might test if it has an impact on your application. JS is pretty fast so it probably doesn't matter unless you have a lot of data.
Unfortunately there is no way to compute the minimum for a bin other than to keep an array of all the items in it. (If you tried to keep track of just the lowest item, or the lowest N items, what would you do when they were removed?)
Using these arrays inside the group reduce-add function:
(p, v) => {
    ++p.count;
    p.rowsByDate = rbdAdd(p.rowsByDate, v);
    p.absGain += v.close - v.open;
    // ...
    p.percentageGain = p.rowsByDate.length ? (p.absGain / p.rowsByDate[0].open) * 100 : 0;
    return p;
},
In the reduce-remove function it's
p.rowsByDate = rbdRemove(p.rowsByDate, v);
and the same percentageGain change.
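For context, here is a minimal sketch of how these pieces might be wired into the group, assuming the Nasdaq example's conventions (a crossfilter instance ndx, rows with dd/open/close fields) and d3 v4+ (d3.timeYear); treat the names as assumptions rather than the exact demo code:

var rbdAdd = groupArrayAdd(function(d) { return d.dd; }),
    rbdRemove = groupArrayRemove(function(d) { return d.dd; });

var yearlyDimension = ndx.dimension(function(d) { return d3.timeYear(d.dd); }),
    yearlyPerformanceGroup = yearlyDimension.group().reduce(
        function(p, v) { // reduce-add
            ++p.count;
            p.rowsByDate = rbdAdd(p.rowsByDate, v);
            p.absGain += v.close - v.open;
            p.percentageGain = p.rowsByDate.length ? (p.absGain / p.rowsByDate[0].open) * 100 : 0;
            return p;
        },
        function(p, v) { // reduce-remove
            --p.count;
            p.rowsByDate = rbdRemove(p.rowsByDate, v);
            p.absGain -= v.close - v.open;
            p.percentageGain = p.rowsByDate.length ? (p.absGain / p.rowsByDate[0].open) * 100 : 0;
            return p;
        },
        function() { // reduce-initial
            return { count: 0, absGain: 0, percentageGain: 0, rowsByDate: [] };
        });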
Here is a demo in a fiddle: https://jsfiddle.net/gordonwoodhull/08bzcd4y/17/
I only see slight changes in the Y positions of the bubbles; the changes are more apparent in the values printed in the tooltip.


Changing dead cells to alive with rand

void initArea(Area* a, unsigned int n)
{
    unsigned int living_cells, max_living_cells, y, x;
    living_cells = 0;
    max_living_cells = n;
    srand(time(NULL));
    while (living_cells <= max_living_cells)
    {
        x = (rand() % (a->xsize));
        y = (rand() % (a->ysize));
        a->cells[y][x] = ALIVE;
        living_cells++;
    }
}
I'm trying to make some of my dead cells alive with rand(), but when I have to make, for example, 50 alive cells, this code always gives a little bit less. Why?
Your problem
Your code selects a random cell at each iteration. However you don't check if this cell already exists. So from time to time, you create a new cell on top of an existing cell.
Solution
You should only create a new cell if there is no living cell at the chosen position, like this:
if (a->cells[y][x] != ALIVE)
{
a->cells[y][x] = ALIVE;
living_cells++;
}
As HolyBlackCow points out, you can write to a cell more than once because rand may return the same random value more than once. Change your loop to:
while (living_cells <= max_living_cells) {
    x = (rand() % (a->xsize));
    y = (rand() % (a->ysize));
    if (a->cells[y][x] != ALIVE) {
        a->cells[y][x] = ALIVE;
        living_cells++;
    }
}
Simply doing this would solve the issue to some extent, but it is not an ideal, performance-centric solution, because it will loop until it gets the desired number of living cells:
if (a->cells[y][x] != ALIVE) {
    living_cells++;
    a->cells[y][x] = ALIVE;
}
This would make sure that you will increment the counter only when a new position is made alive.
What is the better solution? For a 5x5 matrix, take a single array holding the indices 0..24 and run a Fisher-Yates shuffle over it. That gives you a randomized order, and then you simply take indices from the array and make the corresponding cells alive (suppose you get 12: with 0-based indices in a 5x5 grid you can treat it as row 12/5 = 2, column 12%5 = 2 in row-major order, or the transpose in column-major order). Yes, it requires more space than the approach above; for higher values of N you can look for a solution that considers only the locations of dead cells. A sketch follows.
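A minimal sketch of the shuffle approach (in JavaScript for brevity, although the question is C; grid, xsize, ysize and the value 1 for ALIVE are stand-ins for the Area fields above):

function makeAlive(grid, xsize, ysize, n) {
    // collect all cell indices 0 .. xsize*ysize-1
    var indices = [];
    for (var i = 0; i < xsize * ysize; i++) indices.push(i);
    // Fisher-Yates shuffle
    for (var j = indices.length - 1; j > 0; j--) {
        var k = Math.floor(Math.random() * (j + 1));
        var tmp = indices[j]; indices[j] = indices[k]; indices[k] = tmp;
    }
    // the first n shuffled indices are n distinct cells - no retry loop needed
    for (var m = 0; m < n; m++) {
        var row = Math.floor(indices[m] / xsize);
        var col = indices[m] % xsize;
        grid[row][col] = 1; // ALIVE
    }
}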

Detecting Conflicts on a Timeline, Part 2: Isolate "True" Overlaps

This is a continuation of my original question about a Timeline-Scheduler Algorithm for plotting overlapping time conflicts: PART 1: Detecting Conflicts on a Scheduler Timeline (Algorithm)
I was given the correct algorithm, shown below, to split up "conflicting" events on a 24-hr timeline such that each item in the conflict group occupies N% of the window.
My current problem (PART 2) is that conflicting events are treated as a group and always divided equally, but the real goal is to only isolate "true conflicts" which are not necessarily the whole group.
Consider the following picture.
Here, the original algorithm from Part 1 gave a 3-way split for the events
12:30am - 1:30am
1:00am - 2:30am
2:00am - 4:00am
But this result is slightly incorrect. There are only 2 overlaps, and there should be 2 columns shown. Event #3 can be brought over to Column 1 since it doesn't conflict with Event #1. The only conflict (a max 2-way split) is that #1 conflicts with #2, and #3 also conflicts with #2. As the gray arrow shows, there should be 2 columns for this case.
Original Conflict-Detection Algorithm from Part 1:
* 1) First sort all events by StartTime
* 2) Initialize "lastMaxEndTime" to EndTime of First Event (#1)
* 3) LOOP: For each Event: look at Current Event and Next Event (n+1)
*    If Next Event Exists
*      if (lastMaxEndTime > NextEvent StartTime) --> CONFLICT!
*        - set Overlap mode
*        - push conflicting Current Event's StartTime into conflict array
*        - UPDATE: lastMaxEndTime = MAX(lastMaxEndTime, NextEvent EndTime)
*      else --> NO CONFLICT
*        - if we are in Overlap Mode, this is the last overlap
*          - push this final conflicting Current Event's StartTime into conflict array
*          - draw overlaps now
*          - reset Overlap Mode and clear conflict array
*        - else
*          - this is a normal event, draw at 100%
*        - UPDATE: lastMaxEndTime = endTimeNext
*
*    Else (No Next Event, this is the last event)
*      - if we are in Overlap Mode, this is the last overlap
*        - push this final conflicting Current Event's StartTime into conflict array
*        - draw overlaps now
*        - reset Overlap Mode and clear conflict array
*      - else
*        - this is a normal event, draw at 100%
Or, a slightly different view of this pseudocode from Patrick's answer,
// first event is the current event
lastMaxEndTime = CurrentEvent EndTime
if NextEvent exists {
    // if the maximum end time considered in
    // the conflicting component currently
    // under consideration extends beyond the
    // next event's start time, then this
    // and everything that "conflicts" with it
    // is also defined to "conflict" with NextEvent
    if (lastMaxEndTime > NextEvent StartTime) { // CONFLICT!
        overlappingMode = true;
        overlappingEvents.add(currentEvent); // Add to array
        lastMaxEndTime = max(lastMaxEndTime, NextEvent EndTime)
    }
    else { // NO CONFLICT
        if (overlappingMode is TRUE) {
            // Resolve Now
            redrawOverlappingEvents(overlappingEvents);
            // Reset
            overlappingMode = false;
            EMPTY overlappingEvents;
        }
        // everything that starts earlier than me,
        // ends before I start. so start over
        lastMaxEndTime = NextEvent EndTime
    }
}
You need to partition the events into "lanes", sequences of non-overlapping events. This is generally easy with a "greedy" algorithm. Consider the events in order. For each event, place that event in the first "lane" (vertical column on your chart) where there is no overlap. If the current event overlaps with all columns, then place it into a new column.
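A minimal sketch of that greedy placement (assuming the events are already sorted by start time; the start/end field names are illustrative):

function assignLanes(events) {
    var lanes = [];      // lanes[i] holds the end time of the last event placed in lane i
    var placement = [];
    events.forEach(function(ev) {
        // first lane whose last event ends before this one starts
        var lane = lanes.findIndex(function(end) { return end <= ev.start; });
        if (lane === -1) {        // overlaps every existing lane: open a new one
            lane = lanes.length;
            lanes.push(ev.end);
        } else {
            lanes[lane] = ev.end; // extend the existing lane
        }
        placement.push({ event: ev, lane: lane });
    });
    return placement;
}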
Prune's answer is correct. Here is a proof.
In the base case of one event, the algorithm obviously gives an optimal solution of one lane with no overlaps.
Assume the algorithm gives an optimal solution for all numbers of events up to and including k.
We must show that the algorithm gives a correct result for k + 1 events. After k of these k + 1 events, the algorithm has built a schedule with optimally many lanes and no overlaps. It must now place the (k + 1)st event in some lane. Suppose that this event fits into some lane with no overlaps. In that case, place the event there, and the number of lanes must still be optimal (adding more events cannot result in needing fewer lanes). What if the (k + 1)st event overlaps with events in every existing lane?
The only way the (k + 1)st event can overlap with events in all existing lanes is if the latest running events in all existing lanes overlap with each other. To see that this must be true, consider that the start times are in ascending sorted order: if any two of the lanes' latest running events didn't overlap with each other, the (k + 1)st event couldn't overlap with whichever of the two finished earlier. But if we have a set of L + 1 events which all overlap with each other, we need at least L + 1 lanes, one more than L (the optimal number of lanes for the first k events), and that is exactly what the algorithm guarantees by placing the (k + 1)st event in a new lane.
Here is an alternative idea - you could fill up the lanes backwards using iterations of optimal event scheduling run in reverse; that is, add events with the latest start time to each lane while avoiding conflicts. This will give you as many non-overlapping events as possible in the first lane. Repeat the process iteratively on new lanes until you run out of events.
(Optimal event scheduling adds events to the schedule by choosing earliest stop time first, and then eliminating remaining events whose start times occur before the stop time chosen during the round. We can imagine time flowing in reverse and using latest start time while eliminating events with stop times after the chosen start time, taking the already sorted list in reverse order. Also, this iterated application of the optimal method should really be proven optimal in its own right, if it even is, but I digress.)
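For illustration, a sketch of that iterated idea, run forwards (earliest stop time first) rather than in reverse; each pass pulls a maximal set of non-overlapping events into a new lane:

function fillLanes(events) {
    // classic interval scheduling order: earliest stop time first
    var remaining = events.slice().sort(function(a, b) { return a.end - b.end; });
    var lanes = [];
    while (remaining.length > 0) {
        var lane = [], lastEnd = -Infinity, rest = [];
        remaining.forEach(function(ev) {
            if (ev.start >= lastEnd) { // compatible with this lane
                lane.push(ev);
                lastEnd = ev.end;
            } else {
                rest.push(ev);         // try again in a later lane
            }
        });
        lanes.push(lane);
        remaining = rest;
    }
    return lanes;
}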
I tried to implement the greedy "lanes" algorithm here.
I am considering a double-array lanes[x][y] where e.g.
lanes[0] = ["event1", "event4", "event7"]
lanes[1] = ["event2"]
etc.
Algorithm:
// Organize overlapping events into lanes where each lane has non-overlapping events from the conflict group
var lanes = [];
for (var i = 0; i < overlappingEventIDs.length; i++) {
    var currlane = 0;
    var laneFound = false;
    for (var j = 0; j < lanes.length; j++) {
        if (!laneFound) {
            var conflictInLaneFound = false;
            for (var k = 0; k < lanes[j].length; k++) {
                if (!laneFound) {
                    var testEventID = lanes[j][k];
                    var testEventStartTime = getTime(testEventID.startTime);
                    var testEventEndTime = getTime(testEventID.endTime);
                    var thisStartTime = getTime(overlappingEventIDs[i].startTime);
                    var thisEndTime = getTime(overlappingEventIDs[i].endTime);
                    if (thisStartTime < testEventEndTime) {
                        conflictInLaneFound = true;
                    }
                }
            }
            if (!conflictInLaneFound) {
                // Found a lane for this event, Lane #j
                lanes[j].push(overlappingEventIDs[i]);
                laneFound = true;
            }
            else {
                // Increment currlane
                currlane++;
            }
        }
    }
    if (!laneFound) { // Need to put this in a new lane
        if (lanes[currlane] == undefined) {
            lanes.push([]);
        }
        lanes[currlane].push(overlappingEventIDs[i]);
    }
}

Algorithm to find matching real values in a list

I have a complex algorithm which calculates the result of a function f(x). In the real world f(x) is a continuous function. However due to rounding errors in the algorithm this is not the case in the computer program. The following diagram gives an example:
Furthermore, I have a list of several thousand values Fi.
I am looking for all the x values which meet an Fi value, i.e. f(xi) = Fi.
I can solve this problem by simply iterating through the x values, as in the following pseudocode:
for i=0 to NumberOfChecks-1 do
begin
    //calculate the function result with the algorithm
    x=i*(xmax-xmin)/NumberOfChecks;
    FunctionResult=CalculateFunctionResultWithAlgorithm(x);
    //loop through the value list to see if the function result matches a value in the list
    for j=0 to NumberOfValuesInTheList-1 do
    begin
        if Abs(FunctionResult-ListValues[j])<Epsilon then
        begin
            //mark that element j of the list matches
            //and store the corresponding x value in the list
        end
    end
end
Of course it is necessary to use a high number of checks; otherwise I will miss some x values. The higher the number of checks, the more complete and accurate the result. It is acceptable if the list is 90% or 95% complete.
The problem is that this brute force approach takes too much time. As I mentioned before the algorithm for f(x) is quite complex and with a high number of checks it takes too much time.
What would be a better solution for this problem?
Another way to do this is in three steps: generate all of the results, sort them, and then merge with the sorted list of target values.
First step is to compute all of the results and save them along with the x value that generated them. That is:
results = list of <x, result>
for i = 0 to NumberOfChecks
    //calculate the function result with the algorithm
    x = i*(xmax-xmin)/NumberOfChecks;
    FunctionResult = CalculateFunctionResultWithAlgorithm(x);
    results.Add(x, FunctionResult)
end for
Now, sort the results list by FunctionResult, and sort the ListValues array as well.
You now have two sorted lists that you can move through linearly:
i = 0, j = 0;
while (i < results.length && j < ListValues.length)
{
    diff = ListValues[j] - results[i];
    if (Abs(diff) < Epsilon)
    {
        // mark this one with the x value
        // and move to the next result
        i = i + 1
    }
    else if (diff > 0)
    {
        // list value is much larger than result. Move to next result.
        i = i + 1
    }
    else
    {
        // list value is much smaller than result. Move to next list value.
        j = j + 1
    }
}
Sort the list, producing an array SortedListValues that contains the sorted ListValues and an array SortedListValueIndices that contains the index in the original array of each entry in SortedListValues. You only actually need the second of these, and you can create both of them with a single sort by sorting an array of tuples of (value, index) using value as the sort key.
Then iterate over your range in 0..NumberOfChecks-1, compute the value of the function at each step, and use a binary chop to search for it in the sorted list.
Pseudo-code:
// sort as described above
SortedListValueIndices = sortIndices(ListValues);
for i=0 to NumberOfChecks-1 do
begin
    //calculate the function result with the algorithm
    x=i*(xmax-xmin)/NumberOfChecks;
    FunctionResult=CalculateFunctionResultWithAlgorithm(x);
    // do a binary chop to find the closest element in the list
    highIndex = NumberOfValuesInTheList-1;
    lowIndex = 0;
    while true do
    begin
        if Abs(FunctionResult-ListValues[SortedListValueIndices[lowIndex]])<Epsilon then
        begin
            // find all elements in the range that match, breaking out
            // of the loop as soon as one doesn't
            for j=lowIndex to NumberOfValuesInTheList-1 do
            begin
                if Abs(FunctionResult-ListValues[SortedListValueIndices[j]])>=Epsilon then
                    break
                //mark that element SortedListValueIndices[j] of the list matches
                //and store the corresponding x value in the list
            end
            // break out of the binary chop loop
            break
        end
        // break out of the loop once the indices match
        if highIndex <= lowIndex then
            break
        // do the binary chop searching, adjusting the indices:
        middleIndex = (lowIndex + 1 + highIndex) / 2;
        if ListValues[SortedListValueIndices[middleIndex]] < FunctionResult then
            lowIndex = middleIndex;
        else
        begin
            highIndex = middleIndex;
            lowIndex = lowIndex + 1;
        end
    end
end
Possible complications:
The binary chop isn't taking the epsilon into account. Depending on your data this may or may not be an issue. If it is acceptable that the list is only 90 or 95% complete this might be OK. If not, then you'll need to widen the range to take it into account.
I've assumed you want to be able to match multiple x values for each FunctionResult. If that's not necessary you can simplify the code.
Naturally this depends very much on the data, and especially on the numeric distribution of Fi. Another problem is that f(x) looks very jumpy, which rules out any assumption that nearby x give nearby values.
But one could optimise the search.
Picture below.
Walking through f(x) at sufficient granularity, define a rough min (red line) and max (green line), using a suitable tolerance (the "air" or "gap" in between). The area between min and max is "AREA".
See where each Fi value hits AREA, and make a stacked marking ("MARKING") on the X axis accordingly (there can be multiple segments of X).
Where many MARKINGs pile on top of each other (a higher sum - the vertical black "sum" arrows), do dense hit tests, increasing the overall chance to get as many hits as possible. Elsewhere do sparser tests.
Tighten this scheme (decrease the tolerance) as much as you dare.
EDIT: Fi is a bit confusing. Is it an ordered array, or does it have random order (as I assumed)?
Jim Mischel's solution would run in O(i+j) instead of the O(i*j) of the solution that you currently have. But there is a (very) minor bug in his code. The correct code would be:
diff = ListValues[j] - results[i];  // no abs() here
if (abs(diff) < Epsilon)            // add abs() here
{
    // mark this one with the x value
    // and move to the next result
    i = i + 1
}
The best methods will rely on the nature of your function f(x).
1. The best solution is if you can construct the inverse of f(x) and use it.
2. As you said, f(x) is continuous: therefore you can start by evaluating a small number of widely spaced points, then find the ranges that make sense, and refine your "assumption" for the x where f(x)=Fi. It is not bulletproof, but it is an option. E.g. Fi=5.7; f(1)=1.4, f(4)=4, f(16)=12.6, f(10)=10.1, f(7)=6.5, f(5)=5.1, f(6)=5.8; you can take 5 < x < 7.
3. Along the same lines as #1, and if f(x) is hard to calculate, you can use interpolation, and then evaluate f(x) only at the values that are probable.
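For instance, once continuity gives you a bracket like 5 < x < 7 around a target Fi, plain bisection can refine it. A minimal sketch, assuming f is monotonic inside the bracket:

function refine(f, Fi, lo, hi, eps) {
    var rising = f(lo) <= f(hi); // direction of f inside the bracket
    while (hi - lo > eps) {
        var mid = (lo + hi) / 2;
        if ((f(mid) < Fi) === rising)
            lo = mid; // target lies above f(mid): raise the low end
        else
            hi = mid; // target lies below f(mid): lower the high end
    }
    return (lo + hi) / 2; // x with f(x) close to Fi
}
// e.g. refine(f, 5.7, 5, 7, 1e-6) for the example above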

Algorithm problem- with the picture attached

I am attaching a picture where I have shown the diagram for which I need to check the good/bad blocks. Basically, I have the information of the size of each block and the number of rows and columns. I also know whether a row has an even or odd number of blocks.
I need to make a cluster of 2 blocks and check if the resultant block (the combination of the 2) is good or bad. If both blocks are good, then the resultant is a good block; otherwise it is bad.
I need to know the algorithm for it.
If the row has an odd number of blocks, I am ignoring the middle block and considering the last blocks.
The diagram is in the shape of a circle, but the blocks on the circumference are ignored. So I have to consider only the middle blocks, as shown in the picture.
I need to iterate over each row, make groups of 2, and find the result. But if the row has an odd number of blocks, ignore the middle one, and make a group of the last two blocks at the corner.
The shape inside the circle, as shown in the picture, is the real figure.
I guess I have given enough information this time.
NOTE: In this example I am making groups of two, but in the general case I need to make groups of 2, 3 or 4 blocks in a row. If any block in the group is bad, the whole group is bad, whether it is a group of 2, 3, or 4. I need to write the code in Visual Basic. The size and number of blocks in the row shown in the picture are not the real data; it is just an example.
I have some type of solution that checks each block and its surrounding blocks, which is not right. But can it be done in this way?
Here's a solution:
If you are adding two, then one badBlock means both blocks on either side are also bad, leading to 3 bad ones.
1) Set up an NxN array of struct {bool inCircle, badBlock, badGroup;} where inCircle is true if the block is in the circle, badBlock is true if the block is a bad one, and badGroup is initially false.
int length = 2;
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        if (array[i,j].badBlock) {
            for (int x = -length; x <= length; x++)
                if (i+x >= 0 and i+x < N and array[i+x,j].inCircle) then array[i+x,j].badGroup = true;
            for (int y = -length; y <= length; y++)
                if (j+y >= 0 and j+y < N and array[i,j+y].inCircle) then array[i,j+y].badGroup = true;
        }
I also know the x and y coordinates of each block.
Simple recursion will do; pseudo-code:
GroupSize = 2;
bool Calc(row, start, end)
{
    if (end - start <= GroupSize - 1) return true;
    if (end - start < GroupSize*2) // Single group in the middle, but smaller than 2 groups (calculate only the first group)
    {
        bool result = true;
        for (i = start; i < start + GroupSize; i++)
        {
            result = result && row[i];
        }
        return result;
    }
    else
    {
        return Calc(row, start, start + GroupSize) && Calc(row, end - GroupSize, end) && Calc(row, start + GroupSize, end - GroupSize);
    }
}
Something like that.
The idea is to recursively calculate both sides of the row and then send the middle for some more calculating.
Recursion might be the simplest way (or not, for everyone), but any recursion can be turned into a loop.

Sort items with minimal renumber

I need to quickly save a re-ordered sequence back to my items' integer sortOrder columns.
The simple renumber-by-one approach can be slow - if the last item is moved to first, all N rows are modified. A multi-row update statement would let the database do the work, but I'd like to explore smarter ways, like making sortOrder floating point, except I don't have that option :(
The solution I imagine would take a renumbered list like this: (100,200,1700,300,400...1600,1800,...) and produce (100,200,250,300,400...1600,1800,...) (by modifying one row in this example). It seems simple at first, but I'm having a hard time expressing it in code...
Can someone help me with this logic? There could be sequences of adjacent items that need to be shifted for a new one to fit - I was hoping someone might have this already written? It has to be faster than what I have now, but still readable and simple to understand/maintain.
OK, after the answer, posting back with the resulting code I came up with (comments welcome):
/**
 * Renumber list with minimal changes
 *
 * Given a list of SortOrderNumbers in the 'new' sequence they are to be saved in, determine the
 * minimal set of changes (described by Change(i, newSortOrderNumber)) that can be saved that
 * results in a properly ordered sequence.
 *
 * A simple answer would always be List(change<1,1>, change<2,2>, ...change<n,n>) which is of length N.
 * This method returns a set of changes no larger than N (often much smaller for simple moves).
 *
 * @author Jim Pinkham
 * @param s
 * @return Set<Change>
 */
private Set<Change> renumber(int[] s) {
    Set<Change> ret = new HashSet<Change>();
    // pass1 goes from start forwards looking for decrease in numbering
    for (int i = 1; i < s.length; i++) {
        // if predecessor of item is larger and it's large enough to renumber from start of list
        if (s[i-1] > s[i] && s[i] > i) {
            int beforeStart = 0;
            int beforeIndex = -1;
            // go back towards start looking for anchor
            for (int j = i-2; j >= 0; --j) {
                int diff = s[i] - s[j];
                if (diff > (i-j)) {
                    beforeIndex = j;
                    beforeStart = s[beforeIndex];
                    break;
                }
            }
            int diff = s[i] - beforeStart;
            int stepsToDiff = i - beforeIndex;
            int delta = diff / stepsToDiff;
            // renumber from start of list or anchor thru decrease
            int fixCnt = 0;
            for (int j = beforeIndex+1; j < i; ++j) {
                s[j] = beforeStart + (delta * ++fixCnt);
                System.out.println("s[" + j + "]=" + s[j]);
                ret.add(new Change(j, s[j]));
            }
        }
    }
    // pass1 could leave some decreases in sequence
    // pass2 goes from end back to start
    for (int i = s.length-1; i > 0; i--) {
        // if predecessor of item is larger
        if (s[i-1] > s[i]) {
            int afterIndex = s.length;
            int delta = DEFAULT_RENUMBER_GAP;
            // go back towards end looking for anchor
            for (int j = i; j < s.length; ++j) {
                int diff = s[j] - s[i-1];
                if (diff > (j-(i-1))) {
                    afterIndex = j;
                    int afterEnd = s[afterIndex];
                    int stepsToDiff = afterIndex - (i-1);
                    int gap = afterEnd - s[i-1];
                    delta = gap / stepsToDiff;
                    break;
                }
            }
            // renumber from decrease thru anchor or end of list
            int fixCnt = 0;
            for (int j = i; j < afterIndex; ++j) {
                s[j] = s[i-1] + (delta * ++fixCnt);
                System.out.println("s[" + j + "]=" + s[j]);
                ret.add(new Change(j, s[j]));
            }
        }
    }
    return ret;
}
class Change {
    int i;
    int sortOrder;
    Change(int i, int sortOrder) {
        this.i = i;
        this.sortOrder = sortOrder;
    }
    // must override equals(Object), not overload it, for HashSet to work
    @Override
    public boolean equals(Object other) {
        return other instanceof Change && this.i == ((Change) other).i;
    }
    @Override
    public int hashCode() {
        return Integer.valueOf(i).hashCode();
    }
}
I'd like to explore smarter ways, like making sortOrder floating point except I don't have that option
If you find it easier to think of it in terms of floating point, why not imagine the number as fixed point.
e.g. for the purposes of your algorithm, interpret 1000000 as 100.0000. You'll need to choose the point position so that there are as many decimal (or binary) places as you can fit, given the maximum number of items in your array (+2) vs the integer size. So let's say the maximum number of entries is 998: you'd need 3 digits before the point, and the rest would be available for 'gaps'.
A move operation can then be as simple as setting the moved item's new sort number to half the sum of the sort numbers of the items on either side, i.e. slotting the moved item between its new neighbours. Use 0 and size(array)+1 as the end cases. Again, I'm assuming that your UI can record the moves done by the user - regardless, I think it should be fairly straightforward to work them out, and a standard sort algorithm could probably be used; just redefine 'swap'. (A code sketch follows the worked examples below.)
So for example moving last to first in this array (with imaginary decimal point):
1.0000
2.0000
3.0000
4.0000
5.0000
becomes
1.0000
2.0000
3.0000
4.0000
0.5000 = (0.0000 + 1.0000)/2
giving a sort order of
0.5000
1.0000
2.0000
3.0000
4.0000
Which changes just one record, the last one in the array.
Moving last to second would do this:
1.0000
2.0000
3.0000
4.0000
5.0000
Becomes
1.0000
2.0000
3.0000
4.0000
1.5000 = (1.0000+2.0000)/2
resulting in a sort order of
1.0000
1.5000
2.0000
3.0000
4.0000
Again, just one record changed.
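To make the arithmetic concrete, a minimal sketch of midpoint insertion over integer sort orders (GAP and sortOrderBetween are illustrative names, not from the question):

var GAP = 10000; // plays the role of the imaginary decimal point
// sort order for an item dropped between two neighbours;
// use 0 for "before the first item" and (count + 1) * GAP for "after the last"
function sortOrderBetween(prevOrder, nextOrder) {
    var mid = Math.floor((prevOrder + nextOrder) / 2);
    if (mid === prevOrder)
        return null; // no room left between these neighbours: a local renumber is needed
    return mid;
}
// moving the last of 5 items (spaced GAP apart) to the front:
// sortOrderBetween(0, 1 * GAP) === 5000, so only that one row changes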
You will still need to cater for the case where you run out of room 'between' two numbers, which you will eventually. I think this is true regardless of algorithm. This will require 'swap' to renumber more entries to make more room. Again, regardless of algorithm, I don't think you can rule out the case where everything has to be renumbered; it will just be very unlikely. I also suspect that extensive renumbers become more likely over time, no matter what you do - the available space will fragment. However, by choosing the position of the point to give as much room as possible, you postpone that as long as possible.
To avoid having to do a more extensive renumber at an inconvenient time, it would probably be advisable to regularly do some kind of batch renumber during quiet periods - basically stretching the gaps again to make room for further user driven sorts. Again, I think you probably need this no matter what method you use.
This is just a sketch, and I think it is probably equivalent to any other way of doing it, though perhaps a more intuitive/maintainable way of thinking about it, and a way of maximising the room for expansion. Also, if you're really worried about poor performance in degenerate cases - and from your description it sounds like you should be - I'd suggest running whatever algorithm you go with in a test harness with a lot of random data (no database) over a long period, to see how many renumbers it really performs in practice and especially to see if it degrades with use over time. I suspect any algorithm for this will.
Following your example you could do something like this:
Walk your numbers array. If the successor of an item x is smaller than x itself, walk the array backwards until you find the item y with the minimum difference between y and x+1. Count the steps you walked backwards, take that minimum difference, walk forwards from y, and set the items to y+((x+1)-y)/count.
An additional level of indirection may help, e.g. implement a relocatable handle to each row in place of a row index. So instead of dealing with row[x], deal with row[handle[x]]
Edit: OK, so this is not possible in your situation... can you clarify then how much reordering you expect?
I gather from the phrasing of the question that you expect only M of N items to move, where M is significantly less than N. So you want to avoid N updates - you'd rather have something like M updates.
If M is less than N/2 then it should be faster to define the reordering in terms of swap operations. You don't say what your UI is, but the user is probably effectively doing swap operations there anyhow. So by recording those, or using a standard sort algorithm to get from the original state to the desired state, you should be able to generate the set of M swap operations needed to reorder the elements. That should only require M*2 row updates - i.e. if only two items trade places you need update only 2 rows.
There may be some degenerate cases where this is actually slower than just rewriting everything - but that seems unlikely if, as implied by the question, it is just the user reordering stuff.
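To make the swap idea concrete, here is a sketch that derives a set of swaps from the old and new orderings by cycle decomposition (illustrative names, assuming both arrays contain the same ids; each swap would trade the sortOrder values of two rows):

function swapsFor(oldIds, newIds) {
    var target = {}; // id -> desired index
    newIds.forEach(function(id, idx) { target[id] = idx; });
    var perm = oldIds.map(function(id) { return target[id]; });
    var swaps = [];
    for (var i = 0; i < perm.length; i++) {
        while (perm[i] !== i) {      // rotate each cycle into place
            var j = perm[i];
            swaps.push([i, j]);      // rows i and j trade sortOrders
            var t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
    }
    return swaps; // N minus the number of cycles; cheap when most items keep their positions
}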
