Algorithm to bucket a set of numbers based on bucket sums

I have a set of items that I would like to bucket into N different buckets. Each item has a property associated with it (size), and I would like the sum of this property in each bucket to be roughly equal. What is the best way to do this? Note that the range of sizes is fairly large: in the data set I'm using, the smallest size is 1 and the largest is 325,220.
Example:
Item A - size 5
Item B - size 10
Item C - size 8
Item D - size 16
Item E - size 7
If I wanted to group these into 3 buckets I would want
Bucket 1: A, B
Bucket 2: C, E
Bucket 3: D

I ended up implementing the complete greedy algorithm described in the paper linked by Joe Farrel. The full C# code I used is below:
public class Item
{
public int Id { get; }
public int Size { get; }
public Item(int id, int size)
{
Id = id;
Size = size;
}
}
public class Partition
{
public int Index { get; }
public ImmutableList<Item> Items { get; } = ImmutableList<Item>.Empty;
public int Sum { get; }
public Partition(int index)
{
Index = index;
}
private Partition(int index, ImmutableList<Item> items, int sum)
{
Index = index;
Items = items;
Sum = sum;
}
public Partition Add(Item item) => new Partition(Index, Items.Add(item), Sum + item.Size);
public static double AverageDifference(ImmutableList<Partition> partitions)
{
var differences = new List<int>();
for (var i = 0; i < partitions.Count; i++)
{
var partition = partitions[i];
var otherPartitions = partitions.RemoveAt(i);
foreach (var otherPartition in otherPartitions)
{
differences.Add(Math.Abs(partition.Sum - otherPartition.Sum));
}
}
return differences.Average();
}
}
public class Node
{
public Item Item { get; set; }
public int Partition { get; set; }
public Node[] Children { get; set; }
}
private (Node tree, int totalSum) InitTree(IEnumerable<Item> items)
{
var root = new Node();
var totalSum = 0;
Node[] previousLevel = {root};
foreach (var item in items.OrderByDescending(i => i.Size))
{
totalSum += item.Size;
var currentLevel = new Node[_numPartitions];
for (var i = 0; i < _numPartitions; i++)
{
currentLevel[i] = new Node
{
Item = item,
Partition = i
};
}
foreach (var node in previousLevel)
{
node.Children = currentLevel;
}
previousLevel = currentLevel;
}
return (root, totalSum);
}
private ImmutableList<Partition> GetPartitions(Node tree, int totalSum)
{
var partitions = ImmutableList<Partition>.Empty;
for (var i = 0; i < _numPartitions; i++)
{
partitions = partitions.Add(new Partition(i));
}
return TraverseTree(tree, partitions, totalSum, double.MaxValue, ImmutableList<Partition>.Empty);
}
private ImmutableList<Partition> TraverseTree(Node node, ImmutableList<Partition> partitions, int totalSum, double bestDifference, ImmutableList<Partition> bestPartitions)
{
var currentPartitions = partitions;
if (node.Item != null) // skip root
{
// place item into its partition
var updatedPartition = currentPartitions[node.Partition].Add(node.Item);
currentPartitions = currentPartitions.SetItem(node.Partition, updatedPartition);
}
// if this is a leaf, partition is complete
if (node.Children == null)
{
return currentPartitions;
}
// terminate path if partition is sufficiently bad
var largestSum = currentPartitions.Max(p => p.Sum);
if (largestSum - (totalSum - largestSum) / (_numPartitions - 1) >= bestDifference)
{
return null;
}
// continue to traverse tree in ascending partition size order
foreach (var partition in currentPartitions.OrderBy(p => p.Sum))
{
var nextNode = node.Children[partition.Index];
var nextPartitions = TraverseTree(nextNode, currentPartitions, totalSum, bestDifference, bestPartitions);
if (nextPartitions == null) // path was terminated
{
continue;
}
// if we hit a perfect partition set, return it
var nextDifference = Partition.AverageDifference(nextPartitions);
if (nextDifference <= 1)
{
return nextPartitions;
}
// hold on to the best partition
if (nextDifference < bestDifference)
{
bestDifference = nextDifference;
bestPartitions = nextPartitions;
}
}
return bestPartitions;
}
_numPartitions = 4;
var items = GetItems();
var (tree, totalSum) = InitTree(items);
var partitions = GetPartitions(tree, totalSum);

My answer to this question on laying out pictures might be adaptable.
Pictures with a height become items with a size, and columns per page become buckets.
The algorithm has three parts: first-fit, greedy swapping, and reverse sorting. Some may be of more use than others with your data.
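For comparison, here is a minimal sketch of a common baseline heuristic: sort the items largest first and always drop the next item into the bucket with the smallest running sum. It is written in Python only for brevity. It is not the algorithm from the linked picture-layout answer, nor the complete greedy search shown earlier, and the function name `greedy_buckets` is made up for this illustration.

```python
import heapq

def greedy_buckets(sizes, n_buckets):
    """sizes: dict of item name -> size. Returns (buckets, sums)."""
    # Heap entries are (running_sum, bucket_index); the lightest bucket pops first.
    heap = [(0, i) for i in range(n_buckets)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_buckets)]
    # Largest items first; ties are broken by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    sums = [sum(sizes[name] for name in bucket) for bucket in buckets]
    return buckets, sums

# The five items from the question, split into 3 buckets.
items = {"A": 5, "B": 10, "C": 8, "D": 16, "E": 7}
buckets, sums = greedy_buckets(items, 3)
print(buckets, sums)
```

On the question's example this gives the grouping D / B, A / C, E with sums 16, 15 and 15. It is a heuristic, so it can miss the best split on other inputs, but it is cheap and may be a reasonable starting point when sizes range as widely as 1 to 325,220.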

Related

keeping track of previous elements in foreach loop

Let's say I have a list of asteroid objects like so:
9_Amphitrite
24_Themis
259_Aletheia
31_Euphrosyne
511_Davida
87_Sylvia
9_Metis
41_Daphne
Each asteroid has a Title, a StartRoationPeriod, and an EndRoationPeriod.
I need to concatenate their names based on how close the current asteroid StartRoationPeriod and previous asteroid EndRoationPeriod are to an orbital constant and then spit out the concatenated title.
So with the above list, the final objects may look like this:
9_Amphitrite
24_Themis;259_Aletheia
31_Euphrosyne;511_Davida;87_Sylvia
9_Metis
41_Daphne
This requires me to keep track of both the current and previous asteroids.
I started to write the loop, but I'm unsure of where or even how to check the current asteroid's start rotation period against the previous asteroid's end rotation period. Basically, it just gets messy fast:
string asteroid_title = string.Empty;
Asteroid prev_asteroid = null;
foreach (var asteroid in SolarSystem)
{
if (prev_asteroid != null)
{
if (asteroid.StartRoationPeriod + OrbitalConstant >= prev_asteroid.EndRoationPeriod)
{
asteroid_title = asteroid_title + asteroid.Title;
} else {
asteroid_title = asteroid.Title;
yield return CreateTitle();
}
}
prev_asteroid = asteroid;
}
I think this should work for you. (If Aggregate looks too complex, try converting it to a foreach; it's easy.)
using System;
using System.Collections.Generic;
using System.Linq;
namespace Program
{
class Asteroid
{
public int EndRoationPeriod { get; internal set; }
public string Name { get; internal set; }
public int StartRoationPeriod { get; internal set; }
}
class AsteroidGroup
{
public int EndRoationPeriod { get; internal set; }
public string Names { get; internal set; }
}
internal class Program
{
private static void Main(string[] args)
{
int OrbitalConstant = 10;
List<Asteroid> SolarSystem = new List<Asteroid>()
{
new Asteroid() { Name= "9_Amphitrite" ,StartRoationPeriod=10 ,EndRoationPeriod=50},
new Asteroid() { Name= "24_Themis" ,StartRoationPeriod=45,EndRoationPeriod=100},
new Asteroid() { Name= "259_Aletheia",StartRoationPeriod=40 ,EndRoationPeriod=150},
new Asteroid() { Name= "31_Euphrosyne" ,StartRoationPeriod=60,EndRoationPeriod=200},
new Asteroid() { Name= "511_Davida" ,StartRoationPeriod=195,EndRoationPeriod=250},
new Asteroid() { Name= "87_Sylvia" ,StartRoationPeriod=90,EndRoationPeriod=300},
new Asteroid() { Name= "9_Metis" ,StartRoationPeriod=100,EndRoationPeriod=350},
new Asteroid() { Name= "41_Daphne" ,StartRoationPeriod=110,EndRoationPeriod=400},
};
var result = //I skip the first element because I initialize a new list with that element in the next step
SolarSystem.Skip(1)
//The first argument of Aggregate is a new List with your first element
.Aggregate(new List<AsteroidGroup>() { new AsteroidGroup { Names = SolarSystem[0].Name, EndRoationPeriod = SolarSystem[0].EndRoationPeriod } },
//foreach item in your list this method is called,l=your list and a=the current element
//the method must return a list
(l, a) =>
{
//Now this is your algorithm
//Should be easy to understand
var last = l.LastOrDefault();
if (a.StartRoationPeriod + OrbitalConstant >= last.EndRoationPeriod)
{
last.Names += " " + a.Name;
last.EndRoationPeriod = a.EndRoationPeriod;
}
else
l.Add(new AsteroidGroup { Names = a.Name, EndRoationPeriod = a.EndRoationPeriod });
//Return the updated list so it can be used in the next iteration
return l;
});
}
}
}
A more compact solution
var result = SolarSystem
.Skip(1)
.Aggregate( SolarSystem.Take(1).ToList(),
(l, a) => (a.StartRoationPeriod + OrbitalConstant >= l[l.Count - 1].EndRoationPeriod) ?
(l.Take(l.Count - 1)).Concat(new List<Asteroid> { new Asteroid() { Name = l[l.Count - 1].Name + " " + a.Name, EndRoationPeriod = a.EndRoationPeriod } }).ToList() :
l.Concat(new List<Asteroid> { a }).ToList()
);
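The same grouping rule can also be written as a plain loop that keeps only the most recent group around, which may be easier to follow than Aggregate. The sketch below is in Python for brevity and reuses the sample periods and the constant 10 from the answer above. The `Asteroid` dataclass, the `group_titles` function and the shortened field names are made up for the illustration, and the names are joined with ";" as in the question's expected output.

```python
from dataclasses import dataclass

@dataclass
class Asteroid:
    name: str
    start: int  # stands in for StartRoationPeriod
    end: int    # stands in for EndRoationPeriod

def group_titles(asteroids, orbital_constant, sep=";"):
    groups = []  # each entry is [names, end_of_last_member]
    for a in asteroids:
        # Same test as the answer: compare against the previous member's end.
        if groups and a.start + orbital_constant >= groups[-1][1]:
            groups[-1][0].append(a.name)
            groups[-1][1] = a.end
        else:
            groups.append([[a.name], a.end])
    return [sep.join(names) for names, _ in groups]

solar_system = [
    Asteroid("9_Amphitrite", 10, 50),
    Asteroid("24_Themis", 45, 100),
    Asteroid("259_Aletheia", 40, 150),
    Asteroid("31_Euphrosyne", 60, 200),
    Asteroid("511_Davida", 195, 250),
    Asteroid("87_Sylvia", 90, 300),
    Asteroid("9_Metis", 100, 350),
    Asteroid("41_Daphne", 110, 400),
]
titles = group_titles(solar_system, 10)
print(titles)
```

With this sample data only two pairs merge (9_Amphitrite with 24_Themis, and 31_Euphrosyne with 511_Davida), because the test looks at the end of the previous asteroid only.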

Ascending/descending sort of a table column with combo boxes in SWT?

I am creating an Eclipse RCP application that uses an SWT table, and I am trying to sort a column's values in ascending and descending order, but it is not working: the values in the column's cells do not change. Please help.
The cell values in my column are combo boxes. How can I sort the table by the combo values in that column?
public static void main(String[] args) {
int size = 5;
Random random = new Random();
final int[][] data = new int[size][];
for (int i = 0; i < data.length; i++) {
data[i] = new int[] { i, random.nextInt() };
}
// create a virtual table to display data
Display display = new Display();
Shell shell = new Shell(display);
shell.setLayout(new FillLayout());
final Table table = new Table(shell, SWT.VIRTUAL);
table.setHeaderVisible(true);
table.setLinesVisible(true);
table.setItemCount(size);
final TableColumn column1 = new TableColumn(table, SWT.NONE);
column1.setText("Key");
column1.setWidth(200);
final TableColumn column2 = new TableColumn(table, SWT.NONE);
column2.setText("Value");
column2.setWidth(200);
table.addListener(SWT.SetData, new Listener() {
public void handleEvent(Event e) {
TableEditor fd_editor = new TableEditor(table);
fd_editor.grabHorizontal = true;
CCombo combo = new CCombo(table, SWT.CHECK);
combo.add("ABC");
combo.add("XYZ");
combo.add("PQR");
combo.add("BABA");
combo.add("PAVAN");
combo.add("RAJA");
combo.select(1);
TableItem item = (TableItem) e.item;
int index = table.indexOf(item);
int[] datum = data[index];
item.setText(0, Integer.toString(datum[0]));
fd_editor.setEditor(combo, item, 1);
}
});
// Add sort indicator and sort data when column selected
Listener sortListener = new Listener() {
public void handleEvent(Event e) {
// determine new sort column and direction
TableColumn sortColumn = table.getSortColumn();
TableColumn currentColumn = (TableColumn) e.widget;
int dir = table.getSortDirection();
if (sortColumn == currentColumn) {
dir = dir == SWT.UP ? SWT.DOWN : SWT.UP;
} else {
table.setSortColumn(currentColumn);
dir = SWT.UP;
}
// sort the data based on column and direction
final int index = currentColumn == column1 ? 0 : 1;
final int direction = dir;
Arrays.sort(data, new Comparator() {
public int compare(Object arg0, Object arg1) {
int[] a = (int[]) arg0;
int[] b = (int[]) arg1;
if (a[index] == b[index])
return 0;
if (direction == SWT.UP) {
return a[index] < b[index] ? -1 : 1;
}
return a[index] < b[index] ? 1 : -1;
}
});
// update data displayed in table
table.setSortDirection(dir);
table.clearAll();
}
};
column1.addListener(SWT.Selection, sortListener);
column2.addListener(SWT.Selection, sortListener);
table.setSortColumn(column1);
table.setSortColumn(column2);
table.setSortDirection(SWT.DOWN);
shell.setSize(shell.computeSize(SWT.DEFAULT, SWT.DEFAULT).x, 500);
shell.open();
while (!shell.isDisposed()) {
if (!display.readAndDispatch())
display.sleep();
}
display.dispose();
}

Flatten LINQ collection object with nested object collections

This is a tricky one. I am trying to flatten a LINQ object collection. Each item in the collection has the potential of having two collections of other objects. See the example below.
public class DemoClass
{
public string Name {get; set;}
public string Address {get; set;}
public List<Foo> Foos = new List<Foo>();
public List<Bar> Bars = new List<Bar>();
}
What I had been doing is using this code block to flatten the object:
var output = from d in DemoClassCollection
from f in d.Foos
from b in d.Bars
select new {
d.Name,
d.Address,
f.FooField1,
f.FooField2,
b.BarField1,
b.BarField2
};
But the problem I'm having is that the result I get is only those DemoClass objects that have objects in the Foos and Bars collections. I need to get all objects in the DemoClass regardless if there are objects in the Foos and Bars collections.
Any help would be greatly appreciated.
Thanks!
Sounds like you might want to use DefaultIfEmpty:
var output = from d in DemoClassCollection
from f in d.Foos.DefaultIfEmpty()
from b in d.Bars.DefaultIfEmpty()
select new {
d.Name,
d.Address,
FooField1 = f == null ? null : f.FooField1,
FooField2 = f == null ? null : f.FooField2,
BarField1 = b == null ? null : b.BarField1,
BarField2 = b == null ? null : b.BarField2
};
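To make the effect of DefaultIfEmpty concrete, here is a small sketch of the same idea outside LINQ, written in Python for brevity: an empty child collection is replaced by a single placeholder, so the nested loops still emit one row for the parent. The helper `default_if_empty`, the function `flatten` and the dictionary layout are made up for the illustration.

```python
def default_if_empty(seq, default=None):
    """Return the items of seq, or a one-element list holding `default` if seq is empty."""
    items = list(seq)
    return items if items else [default]

def flatten(demos):
    rows = []
    for d in demos:
        # Each nested loop runs at least once thanks to the placeholder.
        for f in default_if_empty(d["foos"]):
            for b in default_if_empty(d["bars"]):
                rows.append((d["name"], f, b))
    return rows

demos = [
    {"name": "no children", "foos": [], "bars": []},
    {"name": "foos only", "foos": ["f1", "f2"], "bars": []},
    {"name": "both", "foos": ["f1"], "bars": ["b1", "b2"]},
]
rows = flatten(demos)
for row in rows:
    print(row)
```

Note that a parent with several Foos and several Bars still yields every Foo/Bar combination, exactly as in the original query; the placeholder only guarantees that parents with an empty collection are not dropped.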
Looks like a left outer join in LINQ will work (http://msdn.microsoft.com/en-us/library/bb397895.aspx):
var output = from d in DemoClassCollection
from f in d.Foos.DefaultIfEmpty()
from b in d.Bars.DefaultIfEmpty()
select new {
d.Name,
d.Address,
f.FooField1,
f.FooField2,
b.BarField1,
b.BarField2
};
I believe you can implement IComparable to perform custom joins or unions in LINQ, depending on how you implement the CompareTo() method.
From MSDN: http://msdn.microsoft.com/en-us/library/system.icomparable.aspx
using System;
using System.Collections;
public class Temperature : IComparable
{
// The temperature value
protected double temperatureF;
public int CompareTo(object obj) {
if (obj == null) return 1;
Temperature otherTemperature = obj as Temperature;
if (otherTemperature != null)
return this.temperatureF.CompareTo(otherTemperature.temperatureF);
else
throw new ArgumentException("Object is not a Temperature");
}
public double Fahrenheit
{
get
{
return this.temperatureF;
}
set {
this.temperatureF = value;
}
}
public double Celsius
{
get
{
return (this.temperatureF - 32) * (5.0/9);
}
set
{
this.temperatureF = (value * 9.0/5) + 32;
}
}
}
public class CompareTemperatures
{
public static void Main()
{
ArrayList temperatures = new ArrayList();
// Initialize random number generator.
Random rnd = new Random();
// Generate 10 temperatures between 0 and 100 randomly.
for (int ctr = 1; ctr <= 10; ctr++)
{
int degrees = rnd.Next(0, 100);
Temperature temp = new Temperature();
temp.Fahrenheit = degrees;
temperatures.Add(temp);
}
// Sort ArrayList.
temperatures.Sort();
foreach (Temperature temp in temperatures)
Console.WriteLine(temp.Fahrenheit);
}
}
// The example displays the following output to the console (individual
// values may vary because they are randomly generated):
// 2
// 7
// 16
// 17
// 31
// 37
// 58
// 66
// 72
// 95

Lossless hierarchical run length encoding

I want to summarize, rather than compress, in a manner similar to run-length encoding, but in a nested sense.
For instance, I want ABCBCABCBCDEEF to become (2A(2BC))D(2E)F.
I am not concerned about which option is picked between two equivalent nestings. E.g.
ABBABBABBABA could be (3ABB)ABA or A(3BBA)BA, which have the same compressed length despite having different structures.
However, I do want the choice to be the MOST greedy. For instance:
ABCDABCDCDCDCD would pick (2ABCD)(3CD), of length six in original symbols, which is less than ABCDAB(4CD), which is of length eight in original symbols.
As background, I have some repeating patterns that I want to summarize so that the data is more digestible. I don't want to disrupt the logical order of the data, as it is important, but I do want to summarize it by saying, for example, symbol A for 3 occurrences, followed by symbols XYZ for 20 occurrences, and so on, and this can be displayed visually in a nested sense.
Ideas are welcome.
I'm pretty sure this isn't the best approach, and depending on the length of the patterns, might have a running time and memory usage that won't work, but here's some code.
You can paste the following code into LINQPad and run it, and it should produce the following output:
ABCBCABCBCDEEF = (2A(2BC))D(2E)F
ABBABBABBABA = (3A(2B))ABA
ABCDABCDCDCDCD = (2ABCD)(3CD)
As you can see, the middle example encoded ABB as A(2B) instead of ABB. You would have to judge for yourself whether single-symbol sequences like that should be encoded as a repeated symbol or not, or whether a specific threshold (like 3 or more) should be used.
Basically, the code runs like this:
1. For each position in the sequence, try to find the longest match (actually, it doesn't; it takes the first match of two or more repeats that it finds. I left the rest as an exercise for you, since I have to leave my computer for a few hours now).
2. It then tries to encode that repeating sequence recursively, and spits out an X*seq type of object.
3. If it can't find a repeating sequence, it spits out the single symbol at that location.
4. It then skips what it encoded, and continues from #1.
Anyway, here's the code:
void Main()
{
string[] examples = new[]
{
"ABCBCABCBCDEEF",
"ABBABBABBABA",
"ABCDABCDCDCDCD",
};
foreach (string example in examples)
{
StringBuilder sb = new StringBuilder();
foreach (var r in Encode(example))
sb.Append(r.ToString());
Debug.WriteLine(example + " = " + sb.ToString());
}
}
public static IEnumerable<Repeat<T>> Encode<T>(IEnumerable<T> values)
{
return Encode<T>(values, EqualityComparer<T>.Default);
}
public static IEnumerable<Repeat<T>> Encode<T>(IEnumerable<T> values, IEqualityComparer<T> comparer)
{
List<T> sequence = new List<T>(values);
int index = 0;
while (index < sequence.Count)
{
var bestSequence = FindBestSequence<T>(sequence, index, comparer);
if (bestSequence == null || bestSequence.Length < 1)
throw new InvalidOperationException("Unable to find sequence at position " + index);
yield return bestSequence;
index += bestSequence.Length;
}
}
private static Repeat<T> FindBestSequence<T>(IList<T> sequence, int startIndex, IEqualityComparer<T> comparer)
{
int sequenceLength = 1;
while (startIndex + sequenceLength * 2 <= sequence.Count)
{
if (comparer.Equals(sequence[startIndex], sequence[startIndex + sequenceLength]))
{
bool atLeast2Repeats = true;
for (int index = 0; index < sequenceLength; index++)
{
if (!comparer.Equals(sequence[startIndex + index], sequence[startIndex + sequenceLength + index]))
{
atLeast2Repeats = false;
break;
}
}
if (atLeast2Repeats)
{
int count = 2;
while (startIndex + sequenceLength * (count + 1) <= sequence.Count)
{
bool anotherRepeat = true;
for (int index = 0; index < sequenceLength; index++)
{
if (!comparer.Equals(sequence[startIndex + index], sequence[startIndex + sequenceLength * count + index]))
{
anotherRepeat = false;
break;
}
}
if (anotherRepeat)
count++;
else
break;
}
List<T> oneSequence = Enumerable.Range(0, sequenceLength).Select(i => sequence[startIndex + i]).ToList();
var repeatedSequence = Encode<T>(oneSequence, comparer).ToArray();
return new SequenceRepeat<T>(count, repeatedSequence);
}
}
sequenceLength++;
}
// fall back, we could not find anything that repeated at all
return new SingleSymbol<T>(sequence[startIndex]);
}
public abstract class Repeat<T>
{
public int Count { get; private set; }
protected Repeat(int count)
{
Count = count;
}
public abstract int Length
{
get;
}
}
public class SingleSymbol<T> : Repeat<T>
{
public T Value { get; private set; }
public SingleSymbol(T value)
: base(1)
{
Value = value;
}
public override string ToString()
{
return string.Format("{0}", Value);
}
public override int Length
{
get
{
return Count;
}
}
}
public class SequenceRepeat<T> : Repeat<T>
{
public Repeat<T>[] Values { get; private set; }
public SequenceRepeat(int count, Repeat<T>[] values)
: base(count)
{
Values = values;
}
public override string ToString()
{
return string.Format("({0}{1})", Count, string.Join("", Values.Select(v => v.ToString())));
}
public override int Length
{
get
{
int oneLength = 0;
foreach (var value in Values)
oneLength += value.Length;
return Count * oneLength;
}
}
}
public class GroupRepeat<T> : Repeat<T>
{
public Repeat<T> Group { get; private set; }
public GroupRepeat(int count, Repeat<T> group)
: base(count)
{
Group = group;
}
public override string ToString()
{
return string.Format("({0}{1})", Count, Group);
}
public override int Length
{
get
{
return Count * Group.Length;
}
}
}
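The code above stops at the first period that repeats at least twice. One possible way to fill in the part left as an exercise is to try every period at the current position and keep the candidate that covers the most symbols. The sketch below does that on plain strings, in Python for brevity; the function names are made up for the illustration, and the choice is only locally greedy, so it is not guaranteed to give the shortest possible encoding.

```python
def count_repeats(seq, start, period):
    """Number of back-to-back copies of seq[start:start+period] beginning at start."""
    unit = seq[start:start + period]
    count = 1
    pos = start + period
    while seq[pos:pos + period] == unit:
        count += 1
        pos += period
    return count

def encode(seq):
    out = []
    i = 0
    while i < len(seq):
        best_period, best_count = 1, 1  # fallback: emit a single symbol
        # A period can repeat only if two copies still fit in the remainder.
        for period in range(1, (len(seq) - i) // 2 + 1):
            count = count_repeats(seq, i, period)
            # Keep the candidate covering the most symbols; ties keep the shorter period.
            if count >= 2 and period * count > best_period * best_count:
                best_period, best_count = period, count
        if best_count >= 2:
            unit = seq[i:i + best_period]
            out.append("(%d%s)" % (best_count, encode(unit)))  # encode the unit recursively
        else:
            out.append(seq[i])
        i += best_period * best_count
    return "".join(out)

for example in ("ABCBCABCBCDEEF", "ABBABBABBABA", "ABCDABCDCDCDCD"):
    print(example, "=", encode(example))
```

On the three examples from the question this produces the same strings as listed above, including A(2B) for ABB, so the threshold question for single-symbol runs still applies.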
Looking at the problem theoretically, it seems similar to the problem of finding the smallest context free grammar which generates (only) the string, except in this case the non-terminals can only be used in direct sequence after each other, so e.g.
ABCBCABCBCDEEF
s->ttDuuF
t->Avv
v->BC
u->E
ABABCDABCD
s->ABtt
t->ABCD
Of course, this depends on how you define "smallest", but if you count terminals on the right side of rules, it should be the same as the "length in original symbols" after doing the nested run-length encoding.
The problem of the smallest grammar is known to be hard, and is a well-studied problem. I don't know how much the "direct sequence" part adds to or subtracts from the complexity.

Aggregate function over an aggregate result set using linq

I have the following linq query:
var totalAmountsPerMonth =
from s in Reports()
where s.ReportDate.Value.Year == year
group s by s.ReportDate.Value.Month into g
orderby g.Key
select new
{
month = g.Key,
totalRecaudacion = g.Sum(rec => rec.RECAUDACION),
totalServicios = g.Sum(ser => ser.SERVICIOS)
};
var final = new ResultSet
{
Recaudacion = totalAmountsPerMonth.Average(q => q.totalRecaudacion),
Servicios = totalAmountsPerMonth.Average(o => o.totalServicios)
};
I need to obtain the average of the monthly totals of “RECAUDACION” and “SERVICIOS”. I wrote the query above, but I don't think it is the best solution. Could you please suggest a better and more efficient approach (in a single query if possible) to get this data?
I have created a simple extension method, and it turns out to be twice as efficient in a simple stopwatch benchmark.
public class Report
{
public DateTime? Date { get; set; }
public int RECAUDACION { get; set; }
public int SERVICIOS { get; set; }
}
static class EnumerableEx
{
public static Tuple<double, double> AveragePerMonth(this IEnumerable<Report> reports)
{
var months = new HashSet<int>();
double RECAUDACION = 0d;
double SERVICIOS = 0d;
foreach (Report rep in reports)
{
if (!months.Contains(rep.Date.Value.Month))
{
months.Add(rep.Date.Value.Month);
}
RECAUDACION += rep.RECAUDACION;
SERVICIOS += rep.SERVICIOS;
}
var totalMonth = months.Count;
if (months.Count > 0)
{
RECAUDACION /= totalMonth;
SERVICIOS /= totalMonth;
}
return Tuple.Create<double, double>(RECAUDACION, SERVICIOS);
}
}
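The extension method relies on the fact that the average of the monthly totals equals the grand total divided by the number of distinct months. As a cross-check, here is a small sketch, in Python for brevity, that builds the per-month totals explicitly in one pass and then averages them. Unlike the extension method, it also filters by year, as the original query does. The tuple layout and the name `average_monthly_totals` are assumptions made for the illustration.

```python
from collections import defaultdict
from datetime import date

def average_monthly_totals(reports, year):
    """reports: iterable of (date, recaudacion, servicios) tuples."""
    totals = defaultdict(lambda: [0, 0])  # month -> [recaudacion, servicios]
    for day, recaudacion, servicios in reports:
        if day.year != year:
            continue  # same filter as the `where` clause in the question
        totals[day.month][0] += recaudacion
        totals[day.month][1] += servicios
    if not totals:
        return (0.0, 0.0)
    n = len(totals)  # number of distinct months that have data
    return (sum(t[0] for t in totals.values()) / n,
            sum(t[1] for t in totals.values()) / n)

reports = [
    (date(2013, 1, 5), 100, 10),
    (date(2013, 1, 20), 50, 20),
    (date(2013, 3, 1), 30, 30),
    (date(2012, 3, 1), 999, 999),  # different year, ignored
]
avg = average_monthly_totals(reports, 2013)
print(avg)
```

Here January totals (150, 30) and March totals (30, 30), so the averages are 90 and 30, the same as 180 / 2 and 60 / 2 computed from the grand totals.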

Resources