This is a variation on finding all the combinations that add to a target, with two constraints:
We have a limited set of numbers to work with.
The numbers must result in the target number when fed into a separate function.
In this case, the limited set of numbers includes 25, 50, 100, 200, 450, 700, 1100, 1800, 2300, 2900, 3900, 5000, 5900, 7200, 8400, etc.
And the function is to add the values together and then multiply by a number based on how many numbers we had:
If 1 number, multiply by 1.
If 2 numbers, multiply by 1.5
If 3-6 numbers, multiply by 2
If 7-10 numbers, multiply by 2.5
If >10 numbers, multiply by 3
Examples:
[50, 50, 50] => 300
[100, 100] => 300
Target numbers include 300, 600, 900, 1500, 3000, 3600, 4400, 5600, 6400, 7600, 9600, etc.
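The scoring rule described above can be written down directly. A minimal sketch in Python; the names `multiplier` and `score` are illustrative and do not come from the original post:

```python
def multiplier(count):
    """Return the factor for a combination of `count` numbers (rule from the question)."""
    if count < 1:
        raise ValueError("need at least one number")
    if count == 1:
        return 1.0
    if count == 2:
        return 1.5
    if count <= 6:
        return 2.0
    if count <= 10:
        return 2.5
    return 3.0


def score(values):
    """Add the values together, then scale by the count-based multiplier."""
    return sum(values) * multiplier(len(values))


print(score([50, 50, 50]))  # 300.0
print(score([100, 100]))    # 300.0
```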
My intuition is that this can't be done recursively, because each step doesn't know the multiplier that will eventually be applied.
Here's a recursive example in JavaScript that seems to answer the requirements:
function getNextM(m, n){
  if (n == 1)
    return 1.5;
  if (n == 2)
    return 2;
  if (n == 6)
    return 2.5;
  if (n == 10)
    return 3;
  return m;
}
function g(A, t, i, sum, m, comb){
  if (sum * m == t)
    return [comb];
  if (sum * m > t || i == A.length)
    return [];
  let n = comb.length;
  let result = g(A, t, i + 1, sum, m, comb);
  const max = Math.ceil((t - sum) / A[i]);
  let _comb = comb;
  for (let j=1; j<=max; j++){
    _comb = _comb.slice().concat(A[i]);
    sum = sum + A[i];
    m = getNextM(m, n);
    n = n + 1;
    result = result.concat(g(
      A, t, i + 1, sum, m, _comb));
  }
  return result;
}
function f(A, t){
  return g(A, t, 0, 0, 1, []);
}
var A = [25, 50, 100, 200, 450, 700, 1100, 1800, 2300, 2900, 3900, 5000, 5900, 7200, 8400];
var t = 300;
console.log(JSON.stringify(f(A, t)));
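Another way to sidestep the unknown multiplier is to fix the count first: for a given count k the multiplier is known, so the search reduces to finding k numbers that add up to target / multiplier(k). A rough brute-force sketch in Python under that assumption; the names `combos_for_target` and `max_count` are made up for illustration, and the enumeration is only practical for small counts:

```python
from fractions import Fraction
from itertools import combinations_with_replacement


def multiplier(count):
    """Count-based factor from the question, as an exact fraction."""
    if count == 1:
        return Fraction(1)
    if count == 2:
        return Fraction(3, 2)
    if count <= 6:
        return Fraction(2)
    if count <= 10:
        return Fraction(5, 2)
    return Fraction(3)


def combos_for_target(numbers, target, max_count):
    """List every multiset of 1..max_count numbers whose scaled sum equals target."""
    results = []
    for k in range(1, max_count + 1):
        needed = Fraction(target) / multiplier(k)
        if needed.denominator != 1:
            continue  # integer inputs cannot reach a fractional sum
        needed = int(needed)
        usable = sorted(n for n in numbers if n <= needed)
        for combo in combinations_with_replacement(usable, k):
            if sum(combo) == needed:
                results.append(list(combo))
    return results


A = [25, 50, 100, 200, 450, 700, 1100, 1800, 2300, 2900, 3900, 5000, 5900, 7200, 8400]
for combo in combos_for_target(A, 300, 6):
    print(combo)
```

Because each count is handled separately, no step has to guess which multiplier will eventually apply.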
I wrote a small script in Python3 that may solve this problem.
multiply_factor = [0, 1, 1.5, 2, 2, 2, 2, 2.5, 2.5, 2.5, 2.5, 3]
def get_multiply_factor(x):
    if x < len(multiply_factor):
        return multiply_factor[x]
    else:
        return multiply_factor[-1]
numbers = [25, 50, 100, 200, 450, 700, 1100, 1800, 2300, 2900, 3900, 5000, 5900, 7200, 8400]
count_of_numbers = len(numbers)
# dp[Count_of_Numbers]
dp = [[] for j in range(count_of_numbers+1)]
# Stores multiplying_factor * sum of numbers for each unique Count, see further
sum_found = [set() for j in range(count_of_numbers+1)]
# Stores results in a dict for answering queries
master_record = {}
# Initializing memoization array
for num in numbers:
    dp[1].append(([num], num*get_multiply_factor(1)))
for count in range(2, count_of_numbers+1):  # Count of numbers
    for num in numbers:
        for previous_val in dp[count-1]:
            old_factor = get_multiply_factor(count-1)  # Old factor for count of numbers = count-1
            new_factor = get_multiply_factor(count)  # New factor for count of numbers = count
            # Multiplying factor does not change
            if old_factor == new_factor:
                # Scale current number and add
                new_sum = num*new_factor + previous_val[1]
            else:
                # Otherwise, we rescale the entire sum
                new_sum = (num + previous_val[1]//old_factor)*new_factor
            # Check if the new sum has already been found for this count of numbers
            if new_sum not in sum_found[count]:
                # Add to current count array
                dp[count].append(([num]+previous_val[0], new_sum))
                # Mark new sum as found for count of numbers = count
                sum_found[count].add(new_sum)
                if new_sum not in master_record:
                    # Store selected numbers in master record for answering queries
                    master_record[new_sum] = dp[count][-1][0]
# for i in dp:
#     print(i)
print(master_record[1300])
print(master_record[300])
print(master_record[2300])
print(master_record[7950])
print(master_record[350700.0])
Output:
[100, 100, 450]
[100, 100]
[25, 25, 1100]
[25, 50, 3900]
[1800, 5900, 8400, 8400, 8400, 8400, 8400, 8400, 8400, 8400, 8400, 8400, 8400, 8400, 8400]
[Finished in 0.3s]
My algorithm in a nutshell:
Iterate over the count in [2, limit]; I've taken limit = the number of elements.
Iterate over the list of numbers.
Iterate over the sums found for the previous count.
Calculate the new sum.
If it does not exist yet for the current count, record it.
I am assuming that the number of queries will be large, so that memoization pays off. Raising the upper limit for count may break my code, as the possibilities may grow exponentially.
I'm looking to find an algorithm that successfully generalizes the following problem to n number of sets, but for simplicity assume that there are 4 different sets, each containing 4 elements. Also we can assume that each set always contains an equal number of elements, however there can be any number of elements. So if there are 37 elements in the first set, we can assume there are also 37 elements contained in each of the other sets.
A combination of elements is formed by taking 1 element from the first set and putting it into first place, 1 element from the second set and putting it in the second place, and so on. For example say the first set contains {A0,A1,A2,A3}, the second set contains {B0,B1,B2,B3}, third is {C0,C1,C2,C3} and fourth is {D0,D1,D2,D3}. One possible combination would be [A0, B2, C1, D3].
The goal is to find the path that maximizes the distance when cycling through all the possible combinations, avoiding repetition as much as possible. And avoiding repetition applies to contiguous groups as well as individual columns. For example:
Individual columns
[A0, B0, C0, D0]
[A1, B1, C1, D1]
[A2, B0, C2, D2]
This is incorrect because B0 is repeated sooner than it had to be.
Contiguous groups
[A0, B0, C0, D0]
[A1, B1, C1, D1]
[A2, B2, C2, D2]
[A3, B3, C3, D3]
[A0, B0, C1, D2]
This is incorrect because the contiguous pair (A0, B0) was repeated sooner than it had to be. However if the last one was instead [A0, B1, C0, D1] then this would be alright.
When cycling through all possible combinations the contiguous groups will have to be repeated, but the goal is to maximize the distance between them. So for example if (A0, B0) is used, then ideally all the other first pairs would be used before it's used again.
I was able to find a solution for when there are 3 sets, but I'm having trouble generalizing it to n sets and even solving for 4 sets. Any ideas?
Can you post your solution for three sets?
Sure, first I wrote down all possible combinations. Then I made three 3x3 matrices of entries by grouping the entries where the non-contiguous (first and third) elements were repeated:
(A0,B0,C0)^1, (A1,B0,C1)^4, (A2,B0,C2)^7 | (A0,B0,C1)^13, (A1,B0,C2)^16, (A2,B0,C0)^10 | (A0,B0,C2)^25, (A1,B0,C0)^19, (A2,B0,C1)^22
(A0,B1,C0)^8, (A1,B1,C1)^2, (A2,B1,C2)^5 | (A0,B1,C1)^11, (A1,B1,C2)^14, (A2,B1,C0)^17 | (A0,B1,C2)^23, (A1,B1,C0)^26, (A2,B1,C1)^20
(A0,B2,C0)^6, (A1,B2,C1)^9, (A2,B2,C2)^3 | (A0,B2,C1)^18, (A1,B2,C2)^12, (A2,B2,C0)^15 | (A0,B2,C2)^21, (A1,B2,C0)^24, (A2,B2,C1)^27
Then I realized if I traversed in a diagonal pattern (order indicated by the superscript index) that it would obey the rules. I then wrote the following code to take advantage of this visual pattern:
@Test
public void run() {
List<String> A = new ArrayList<String>();
A.add("0");
A.add("1");
A.add("2");
List<String> B = new ArrayList<String>();
B.add("0");
B.add("1");
B.add("2");
List<String> C = new ArrayList<String>();
C.add("0");
C.add("1");
C.add("2");
int numElements = A.size();
List<String> output = new ArrayList<String>();
int offset = 0;
int nextOffset = 0;
for (int i = 0; i < A.size()*B.size()*C.size(); i++) {
int j = i % numElements;
int k = i / numElements;
if (j == 0 && k%numElements == numElements-1) {
nextOffset = (j+k+offset) % numElements;
}
if (j == 0 && k%numElements == 0) {
offset = nextOffset;
}
String first = A.get((j+k+offset) % numElements);
String second = B.get(j);
String third = C.get((j+k) % numElements);
System.out.println(first + " " + second + " " + third);
output.add(first + second + third);
}
}
However, I just realized that this isn't ideal either, since the pair (A0,B1) is repeated too soon, at indices 8 and 11 :( Maybe this is unavoidable when crossing over from one group to another? This is a difficult problem, harder than it looks!
If you can think about and revise your actual requirements
Okay so I decided to remove the restriction of traversing through all possible combinations, and instead reduce the yield a little bit to improve the quality of the results.
The whole point of this is to take elements belonging to a particular set and combine them to form a combination of elements that appear unique. So if I start out with 3 combinations and there are 3 sets, I can break each combination into 3 elements and place the elements into their respective sets. I can then use the algorithm to mix and match the elements and produce 27 seemingly unique combinations -- of course they're formed from derivative elements so they only appear unique as long as you don't look too closely!
So the 3 combinations formed by hand can be turned into 3^3 = 27 combinations, saving a lot of time and energy. Of course this scales up pretty nicely too: if I form 10 combinations by hand then the algorithm can generate 1000 combinations. I probably don't need quite that many combinations anyway, so I can sacrifice some entries to better avoid repetition. In particular with 3 sets I noticed that while my solution was decent, there was some bunching that occurred every numElements^2 entries. Here is an example of 3 sets of 5 elements, with an obvious repetition after 25 combinations:
19) A1 B3 C1
20) A2 B4 C2
21) A4 B0 C4 <--
22) A0 B1 C0
23) A1 B2 C1
24) A2 B3 C2
25) A3 B4 C3
26) A0 B0 C4 <--
27) A1 B1 C0
28) A2 B2 C1
29) A3 B3 C2
30) A4 B4 C3
31) A1 B0 C0
32) A2 B1 C1
To fix this we can introduce the following statement and get rid of this bad block:
if (k % numElements == 0) continue;
However this only works when numElements > numSets, otherwise the Individual Columns rule will be broken. (In case you were wondering I also switched the ordering of the first and third sets in this example, did this initially so I wasn't opening with the bad repetition)
Aaannd I'm still completely stuck on how to form an approach for n or even 4 sets. It certainly gets trickier because there are now different sizes of contiguous groups to avoid, contiguous trios as well as pairs.. Any thoughts? Am I crazy for even trying to do this?
Even after the modifications in your question, I'm still not sure exactly what you want. It seems that what you would really like is impossible, but I'm not sure exactly how much relaxation in the conditions is acceptable. Nevertheless, I'll give it a crack.
Oddly there seems to be little literature (that I can find, anyway) covering the subject of your problem, so I had to invent something myself. This is the idea: you are looking for a sequence of points on a multidimensional torus such that elements of the sequence are as far apart as possible in a complicated metric. What this reminds me of is something I learned years ago in a mechanics class, strangely enough. If you have a line on a flat torus with rational slope, the line will loop back onto itself after a few cycles, but if you have a line with irrational slope, the line will densely cover the entire torus.
I don't expect that to mean a lot to many people, but it did give me an idea. The index for each set could step by an irrational amount. You would have to take the floor, of course, and then modulo whatever, but it does seem to cover the bases well, so to speak. The irrational step for each set could be different (and mutually irrational, to use rather loose language).
To make the idea more precise, I wrote a short program. Please check it out.
class Equidistributed {
static final double IRRATIONAL1 = Math.sqrt(2);
static final double IRRATIONAL2 = Math.sqrt(3);
static final double IRRATIONAL3 = Math.sqrt(5)-1;
// four sets of 7 elements each
static int setSize = 7;
public static void main(String[] args) {
for (int i = 0; i < Math.pow(setSize,4); i++) {
String tuple = "";
int j = i % setSize;
tuple += j + ",";
j = ((int)Math.floor(i*IRRATIONAL1)) % setSize;
tuple += j + ",";
j = ((int)Math.floor(i*IRRATIONAL2)) % setSize;
tuple += j + ",";
j = ((int)Math.floor(i*IRRATIONAL3)) % setSize;
tuple += j;
System.out.println(tuple);
}
}
}
I "eyeballed" the results, and they aren't perfect, but they're pretty nice. Plus the program runs quickly. It's for four sets with a variable number of elements (I chose 7 for the example). The irrational numbers I'm using are based on square roots of prime numbers; I subtracted 1 from sqrt(5) so that the result would be in the range between 1 and 2. Each tuple is basically
(i, floor(i*irrational1), floor(i*irrational2), floor(i*irrational3)) mod 7
Statistically that should make the sequence evenly distributed, which is a consequence of what you want. Whether that translates into the right "distance" properties, I can't really be sure. You should probably write a program to test whether a sequence has the property you want, and then pipe the output from my program into the test.
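A test of the kind suggested above could look roughly like this in Python. It is only a sketch under stated assumptions: rows are sequences of labels, "distance" means the number of rows between two occurrences of the same contiguous group in the same columns, and the name `min_repeat_distance` is hypothetical.

```python
def min_repeat_distance(rows, width=1):
    """Smallest gap (in rows) between two occurrences of the same contiguous
    group of `width` columns at the same position; None if nothing repeats."""
    last_seen = {}
    best = None
    for index, row in enumerate(rows):
        for start in range(len(row) - width + 1):
            key = (start, tuple(row[start:start + width]))
            if key in last_seen:
                gap = index - last_seen[key]
                if best is None or gap < best:
                    best = gap
            last_seen[key] = index
    return best


rows = [
    ["A0", "B0", "C0", "D0"],
    ["A1", "B1", "C1", "D1"],
    ["A2", "B2", "C2", "D2"],
    ["A3", "B3", "C3", "D3"],
    ["A0", "B0", "C1", "D2"],
]
print(min_repeat_distance(rows, width=1))  # 2 (D2 recurs two rows later)
print(min_repeat_distance(rows, width=2))  # 4 (the pair A0, B0)
```

Running it with width=1 checks individual columns and width=2 checks contiguous pairs; a larger minimum means repeats are spread further apart.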
Define an array of all possible combinations.
For each possible order of the array, compute your distance score. If greater than the previous best (default start = 0), then copy the array to your output, overwriting the previous best array.
Assuming values are 1 dimensional, you do not need to compare the distance between every single element. Instead, you can find the maximum and minimum value within each set before comparing it with other sets.
Step 1: Find the element with maximum value and the minimum value within each set (eg A1, A34, B4, B32, C5, C40, with the smaller number with smaller values in this example)
Step 2: Compare A1 with the maximum values of all other sets, and repeat the process for all minimum values.
I generalized the algorithm and wrote code to do performance testing:
import java.util.*;
public class Solution {
public static void main(String[] args) throws Exception {
List<String> A = new ArrayList<>();
A.add("A0"); A.add("A1"); A.add("A2");
A.add("A3"); A.add("A4"); A.add("A5"); A.add("A6");
List<String> B = new ArrayList<>();
B.add("B0"); B.add("B1"); B.add("B2");
B.add("B3"); B.add("B4"); B.add("B5"); B.add("B6");
List<String> C = new ArrayList<>();
C.add("C0"); C.add("C1"); C.add("C2");
C.add("C3"); C.add("C4"); C.add("C5"); C.add("C6");
List<String> D = new ArrayList<>();
D.add("D0"); D.add("D1"); D.add("D2");
D.add("D3"); D.add("D4"); D.add("D5"); D.add("D6");
List<List<String>> columns = new ArrayList<>();
columns.add(A); columns.add(B); columns.add(C); columns.add(D);
List<String> output = equidistribute(columns);
// for (String row : output) {
// System.out.println(row);
// }
// new Solution().test(output, columns.size(), A.size());
new Solution().testAllTheThings();
}
public static List<String> equidistribute(List<List<String>> columns) {
List<String> output = new ArrayList<>();
int[] primeNumbers = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
199, 211, 223, 227, 229, 233, 239, 241, 251, 257,
263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
317, 331, 337, 347, 349, 353, 359, 367, 373, 379,
383, 389, 397, 401, 409, 419, 421, 431, 433, 439,
443, 449, 457, 461, 463, 467, 479, 487, 491, 499,
503, 509, 521, 523, 541};
int numberOfColumns = columns.size();
int numberOfElements = columns.get(0).size();
for (int i = 0; i < Math.pow(numberOfElements, numberOfColumns); i++) {
String row = "";
for (int j = 0; j < numberOfColumns; j++) {
if (j==0) {
row += columns.get(0).get(i % numberOfElements);
} else {
int index = ((int) Math.floor(i * Math.sqrt(primeNumbers[j-1]))) % numberOfElements;
row += " " + columns.get(j).get(index);
}
}
output.add(row);
}
return output;
}
class MutableInt {
int value = 0;
public void increment() { value++; }
public int get() { return value; }
public String toString() { return String.valueOf(value); }
}
public void test(List<String> columns, int numberOfColumns, int numberOfElements) throws Exception {
List<HashMap<String, MutableInt>> pairMaps = new ArrayList<>();
List<HashMap<String, MutableInt>> individualElementMaps = new ArrayList<>();
// initialize structures for calculating min distance
for (int i = 0; i < numberOfColumns; i++) {
if (i != numberOfColumns-1) {
HashMap<String, MutableInt> pairMap = new HashMap<>();
pairMaps.add(pairMap);
}
HashMap<String, MutableInt> individualElementMap = new HashMap<>();
individualElementMaps.add(individualElementMap);
}
int minDistancePair = Integer.MAX_VALUE;
int minDistanceElement = Integer.MAX_VALUE;
String pairOutputMessage = "";
String pairOutputDebugMessage = "";
String elementOutputMessage = "";
String elementOutputDebugMessage = "";
String outputMessage = numberOfColumns + " columns, " + numberOfElements + " elements";
for (int i = 0; i < columns.size(); i++) {
String[] elements = columns.get(i).split(" ");
for (int j = 0; j < numberOfColumns; j++) {
// pair stuff
if (j != numberOfColumns-1) {
String pairEntry = elements[j] + " " + elements[j+1];
MutableInt count = pairMaps.get(j).get(pairEntry);
if (pairMaps.get(j).containsKey(pairEntry)) {
if (count.get() <= minDistancePair) {
minDistancePair = count.get();
pairOutputMessage = "minDistancePair = " + minDistancePair;
pairOutputDebugMessage += "(" + pairEntry + " at line " + (i+1) + ") min = " + minDistancePair + "\n";
}
count = null;
}
if (count == null) {
pairMaps.get(j).put(pairEntry, new MutableInt());
}
}
// element stuff
String elementEntry = elements[j];
MutableInt count = individualElementMaps.get(j).get(elementEntry);
if (individualElementMaps.get(j).containsKey(elementEntry)) {
if (count.get() <= minDistanceElement) {
minDistanceElement = count.get();
elementOutputMessage = "minDistanceElement = " + minDistanceElement;
elementOutputDebugMessage += "(" + elementEntry + " at line " + (i+1) + ") min = " + minDistanceElement + "\n";
}
count = null;
}
if (count == null) {
individualElementMaps.get(j).put(elementEntry, new MutableInt());
}
}
// increment counters
for (HashMap<String, MutableInt> pairMap : pairMaps) {
Iterator it = pairMap.entrySet().iterator();
while (it.hasNext()) {
Map.Entry mapEntry = (Map.Entry) it.next();
((MutableInt) mapEntry.getValue()).increment();
}
}
for (HashMap<String, MutableInt> elementMap : individualElementMaps) {
Iterator it = elementMap.entrySet().iterator();
while (it.hasNext()) {
Map.Entry mapEntry = (Map.Entry) it.next();
((MutableInt) mapEntry.getValue()).increment();
}
}
}
System.out.println(outputMessage + " -- " + pairOutputMessage + ", " + elementOutputMessage);
// System.out.print(elementOutputDebugMessage);
// System.out.print(pairOutputDebugMessage);
}
public void testAllTheThings() throws Exception {
char[] columnPrefix = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".toCharArray();
int maxNumberOfColumns = columnPrefix.length;
int maxNumberOfElements = 30;
for (int i = 2; i < maxNumberOfColumns; i++) {
for (int j = i; j < maxNumberOfElements; j++) {
List<List<String>> columns = new ArrayList<>();
for (int k = 0; k < i; k++) {
List<String> column = new ArrayList<>();
for (int l = 0; l < j; l++) {
column.add(String.valueOf(columnPrefix[k]) + l);
}
columns.add(column);
}
List<String> output = equidistribute(columns);
test(output, i, j);
}
}
}
}
edit: removed restriction that each set must have same number of elements
public List<String> equidistribute(List<List<String>> columns) {
List<String> output = new ArrayList<>();
int[] primeNumbers = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97,
101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
151, 157, 163, 167, 173, 179, 181, 191, 193, 197,
199, 211, 223, 227, 229, 233, 239, 241, 251, 257,
263, 269, 271, 277, 281, 283, 293, 307, 311, 313,
317, 331, 337, 347, 349, 353, 359, 367, 373, 379,
383, 389, 397, 401, 409, 419, 421, 431, 433, 439,
443, 449, 457, 461, 463, 467, 479, 487, 491, 499,
503, 509, 521, 523, 541};
int numberOfColumns = columns.size();
int numberOfCombinations = 1;
for (List<String> column : columns) {
numberOfCombinations *= column.size();
}
for (int i = 0; i < numberOfCombinations; i++) {
String row = "";
for (int j = 0; j < numberOfColumns; j++) {
int numberOfElementsInColumn = columns.get(j).size();
if (j==0) {
row += columns.get(0).get(i % numberOfElementsInColumn);
} else {
int index = ((int) Math.floor(i * Math.sqrt(primeNumbers[j-1]))) % numberOfElementsInColumn;
row += "|" + columns.get(j).get(index);
}
}
output.add(row);
}
return output;
}
Defined before this block of code:
dataset can be a Vector or List
numberOfSlices is an Int denoting how many "times" to slice dataset
I want to split the dataset into numberOfSlices slices, distributed as evenly as possible. By "split" I guess I mean "partition" (intersection of all should be empty, union of all should be the original) to use the set theory term, though this is not necessarily a set, just an arbitrary collection.
e.g.
dataset = List(1, 2, 3, 4, 5, 6, 7)
numberOfSlices = 3
slices == ListBuffer(Vector(1, 2), Vector(3, 4), Vector(5, 6, 7))
Is there a better way to do it than what I have below? (which I'm not even sure is optimal...)
Or perhaps this is not an algorithmically feasible endeavor, in which case any known good heuristics?
val slices = new ListBuffer[Vector[Int]]
val stepSize = dataset.length / numberOfSlices
var currentStep = 0
var looper = 0
while (looper != numberOfSlices) {
if (looper != numberOfSlices - 1) {
slices += dataset.slice(currentStep, currentStep + stepSize)
currentStep += stepSize
} else {
slices += dataset.slice(currentStep, dataset.length)
}
looper += 1
}
If the behavior of xs.grouped(xs.size / n) doesn't work for you, it's pretty easy to define exactly what you want. The quotient is the size of the smaller pieces, and the remainder is the number of the bigger pieces:
def cut[A](xs: Seq[A], n: Int) = {
val (quot, rem) = (xs.size / n, xs.size % n)
val (smaller, bigger) = xs.splitAt(xs.size - rem * (quot + 1))
smaller.grouped(quot) ++ bigger.grouped(quot + 1)
}
The typical "optimal" partition calculates an exact fractional length after cutting and then rounds to find the actual number to take:
def cut[A](xs: Seq[A], n: Int):Vector[Seq[A]] = {
val m = xs.length
val targets = (0 to n).map{x => math.round((x.toDouble*m)/n).toInt}
def snip(xs: Seq[A], ns: Seq[Int], got: Vector[Seq[A]]): Vector[Seq[A]] = {
if (ns.length<2) got
else {
val (i,j) = (ns.head, ns.tail.head)
snip(xs.drop(j-i), ns.tail, got :+ xs.take(j-i))
}
}
snip(xs, targets, Vector.empty)
}
This way your longer and shorter blocks will be interspersed, which is often more desirable for evenness:
scala> cut(List(1,2,3,4,5,6,7,8,9,10),4)
res5: Vector[Seq[Int]] =
Vector(List(1, 2, 3), List(4, 5), List(6, 7, 8), List(9, 10))
You can even cut more times than you have elements:
scala> cut(List(1,2,3),5)
res6: Vector[Seq[Int]] =
Vector(List(1), List(), List(2), List(), List(3))
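For comparison, the same rounding idea in Python. This is a sketch rather than a port of the Scala above: it computes the rounded boundaries with integer arithmetic, because Python's built-in `round` rounds halves to the nearest even number, whereas the results shown above rely on halves rounding up.

```python
def cut(xs, n):
    """Split xs into n consecutive slices whose boundaries are the rounded
    values of k * len(xs) / n (halves round up)."""
    m = len(xs)
    # floor(k*m/n + 1/2), computed exactly with integers
    bounds = [(2 * k * m + n) // (2 * n) for k in range(n + 1)]
    return [xs[bounds[k]:bounds[k + 1]] for k in range(n)]


print(cut(list(range(1, 11)), 4))  # [[1, 2, 3], [4, 5], [6, 7, 8], [9, 10]]
print(cut([1, 2, 3], 5))           # [[1], [], [2], [], [3]]
```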
Here's a one-liner that does the job for me, using the familiar Scala trick of a recursive function that returns a Stream. Notice the use of (x+k/2)/k to round the chunk sizes, intercalating the smaller and larger chunks in the final list, all with sizes with at most one element of difference. If you round up instead, with (x+k-1)/k, you move the smaller blocks to the end, and x/k moves them to the beginning.
def k_folds(k: Int, vv: Seq[Int]): Stream[Seq[Int]] =
if (k > 1)
vv.take((vv.size+k/2)/k) +: k_folds(k-1, vv.drop((vv.size+k/2)/k))
else
Stream(vv)
Demo:
scala> val indices = scala.util.Random.shuffle(1 to 39)
scala> for (ff <- k_folds(7, indices)) println(ff)
Vector(29, 8, 24, 14, 22, 2)
Vector(28, 36, 27, 7, 25, 4)
Vector(6, 26, 17, 13, 23)
Vector(3, 35, 34, 9, 37, 32)
Vector(33, 20, 31, 11, 16)
Vector(19, 30, 21, 39, 5, 15)
Vector(1, 38, 18, 10, 12)
scala> for (ff <- k_folds(7, indices)) println(ff.size)
6
6
5
6
5
6
5
scala> for (ff <- indices.grouped((indices.size+7-1)/7)) println(ff)
Vector(29, 8, 24, 14, 22, 2)
Vector(28, 36, 27, 7, 25, 4)
Vector(6, 26, 17, 13, 23, 3)
Vector(35, 34, 9, 37, 32, 33)
Vector(20, 31, 11, 16, 19, 30)
Vector(21, 39, 5, 15, 1, 38)
Vector(18, 10, 12)
scala> for (ff <- indices.grouped((indices.size+7-1)/7)) println(ff.size)
6
6
6
6
6
6
3
Notice how grouped does not try to even out the size of all the sub-lists.
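The effect of the three rounding choices mentioned above is easy to see side by side. A small Python sketch; the `rounding` parameter and its values are invented for this illustration:

```python
def k_folds(k, items, rounding="nearest"):
    """Split items into k chunks whose sizes differ by at most one."""
    chunks = []
    rest = list(items)
    while k > 1:
        n = len(rest)
        if rounding == "nearest":
            size = (n + k // 2) // k   # intersperses small and large chunks
        elif rounding == "up":
            size = (n + k - 1) // k    # smaller chunks end up last
        elif rounding == "down":
            size = n // k              # smaller chunks come first
        else:
            raise ValueError("unknown rounding mode")
        chunks.append(rest[:size])
        rest = rest[size:]
        k -= 1
    chunks.append(rest)
    return chunks


for mode in ("nearest", "up", "down"):
    print(mode, [len(c) for c in k_folds(7, range(1, 40), mode)])
# nearest [6, 6, 5, 6, 5, 6, 5]
# up [6, 6, 6, 6, 5, 5, 5]
# down [5, 5, 5, 6, 6, 6, 6]
```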
Here is my take on the problem:
def partition[T](items: Seq[T], partitionsCount: Int): List[Seq[T]] = {
val minPartitionSize = items.size / partitionsCount
val extraItemsCount = items.size % partitionsCount
def loop(unpartitioned: Seq[T], acc: List[Seq[T]], extra: Int): List[Seq[T]] =
if (unpartitioned.nonEmpty) {
val (splitIndex, newExtra) = if (extra > 0) (minPartitionSize + 1, extra - 1) else (minPartitionSize, extra)
val (newPartition, remaining) = unpartitioned.splitAt(splitIndex)
loop(remaining, newPartition :: acc, newExtra)
} else acc
loop(items, List.empty, extraItemsCount).reverse
}
It's more verbose than some of the other solutions but hopefully more clear as well. reverse is only necessary if you want the order to be preserved.
As Kaito mentions grouped is exactly what you are looking for. But if you just want to know how to implement such a method, there are many ways ;-). You could for example do it like this:
def grouped[A](xs: List[A], size: Int) = {
def grouped[A](xs: List[A], size: Int, result: List[List[A]]): List[List[A]] = {
if(xs.isEmpty) {
result
} else {
val (slice, rest) = xs.splitAt(size)
grouped(rest, size, result :+ slice)
}
}
grouped(xs, size, Nil)
}
I'd approach it this way: given n elements and m partitions (n > m), either n mod m == 0, in which case each partition has n/m elements, or n mod m == y, in which case each partition gets n/m elements (integer division) and the remaining y elements have to be distributed over some of the m partitions.
You'll have y slots with n/m+1 elements and (m-y) slots with n/m. How you distribute them is your choice.
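As a sketch of that counting argument in Python, here with the arbitrary choice of giving the extra elements to the first y partitions (the function name `partition_evenly` is made up):

```python
def partition_evenly(items, m):
    """Split items into m consecutive parts: y parts of n//m + 1 elements
    followed by m - y parts of n//m elements, where y = n % m."""
    quotient, remainder = divmod(len(items), m)
    parts = []
    start = 0
    for k in range(m):
        size = quotient + (1 if k < remainder else 0)
        parts.append(items[start:start + size])
        start += size
    return parts


print(partition_evenly([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5], [6, 7]]
```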
I'm currently developing a web application and using JSON for ajax requests and responses. I have an area where I return a very large dataset to the client in the form of an array of over 10000 objects. Here's part of the example (it's been simplified somewhat):
"schedules": [
{
"codePractice": 35,
"codeScheduleObject": 576,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 12,
"name": "Dr. 1"
},
{
"codePractice": 35,
"codeScheduleObject": 169,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 43,
"name": "Dr. 2"
},
{
"codePractice": 35,
"codeScheduleObject": 959,
"codeScheduleObjectType": "",
"defaultCodeScheduleObject": 76,
"name": "Dr. 3"
}
]
As you can imagine, with a very large number of objects in this array, the size of the JSON response can be quite large.
My question is, is there a JSON stringifier/parser that would convert the "schedules" array to look something like this as a JSON string:
"schedules": [
["codePractice", "codeScheduleObject", "codeScheduleObjectType", "defaultCodeScheduleObject", "name"],
[35, 576, "", 12, "Dr. 1"],
[35, 169, "", 43, "Dr. 2"],
[35, 959, "", 76, "Dr. 3"]
]
i.e., there would be an array at the beginning of the "schedules" array holding the keys of the objects in this array, and all of the other container arrays would hold the values.
I could, if I wanted, do the conversion on the server and parse it on the client, but I'm wondering if there's a standard library for parsing/stringifying large JSON?
I could also run it through a minifier, but I'd like to keep the keys I have currently as they give some context within the application.
I'm also hoping you might critique my approach here or suggest alternatives?
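To make the proposed layout concrete, here is a minimal sketch of the conversion in Python. It is not a standard library feature; `pack` and `unpack` are hypothetical helper names, and the sketch assumes every object has the same keys as the first one.

```python
import json


def pack(records):
    """Turn a list of same-shaped dicts into [keys, row, row, ...]."""
    if not records:
        return []
    keys = list(records[0].keys())
    return [keys] + [[record[key] for key in keys] for record in records]


def unpack(packed):
    """Inverse of pack: rebuild the list of dicts from [keys, row, ...]."""
    if not packed:
        return []
    keys, *rows = packed
    return [dict(zip(keys, row)) for row in rows]


schedules = [
    {"codePractice": 35, "codeScheduleObject": 576, "codeScheduleObjectType": "",
     "defaultCodeScheduleObject": 12, "name": "Dr. 1"},
    {"codePractice": 35, "codeScheduleObject": 169, "codeScheduleObjectType": "",
     "defaultCodeScheduleObject": 43, "name": "Dr. 2"},
]
packed = pack(schedules)
print(json.dumps(packed))
print(unpack(packed) == schedules)  # True
```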
HTTP compression (i.e. gzip or deflate) already does exactly that. Repeated patterns, like your JSON keys, are replaced with tokens so that the verbose pattern only has to occur once per transmission.
Not an answer, but to give a rough estimate of "savings" based on 10k entries and some bogus data :-) This is in response to a comment I posted. Will the added complexity make the schema'ized approach worth it?
"It depends."
This C# runs in LINQPad and is ready to go for testing/modifying:
string LongTemplate (int n1, int n2, int n3, string name) {
return string.Format(@"
{{
""codePractice"": {0},
""codeScheduleObject"": {1},
""codeScheduleObjectType"": """",
""defaultCodeScheduleObject"": {2},
""name"": ""Dr. {3}""
}}," + "\n", n1, n2, n3, name);
}
string ShortTemplate (int n1, int n2, int n3, string name) {
return string.Format("[{0}, {1}, \"\", {2}, \"Dr. {3}\"],\n",
n1, n2, n3, name);
}
string MinTemplate (int n1, int n2, int n3, string name) {
return string.Format("[{0},{1},\"\",{2},\"Dr. {3}\"],",
n1, n2, n3, name);
}
long GZippedSize (string s) {
var ms = new MemoryStream();
using (var gzip = new System.IO.Compression.GZipStream(ms, System.IO.Compression.CompressionMode.Compress, true))
using (var sw = new StreamWriter(gzip)) {
sw.Write(s);
}
return ms.Position;
}
void Main()
{
var r = new Random();
var l = new StringBuilder();
var s = new StringBuilder();
var m = new StringBuilder();
for (int i = 0; i < 10000; i++) {
var n1 = r.Next(10000);
var n2 = r.Next(10000);
var n3 = r.Next(10000);
var name = "bogus" + r.Next(50);
l.Append(LongTemplate(n1, n2, n3, name));
s.Append(ShortTemplate(n1, n2, n3, name));
m.Append(MinTemplate(n1, n2, n3, name));
}
var lc = GZippedSize(l.ToString());
var sc = GZippedSize(s.ToString());
var mc = GZippedSize(m.ToString()); // was s.ToString(), which is why the Min GZip figure in the results below equals the Short one
Console.WriteLine(string.Format("Long:\tNormal={0}\tGZip={1}\tCompressed={2:P}", l.Length, lc, (float)lc / l.Length));
Console.WriteLine(string.Format("Short:\tNormal={0}\tGZip={1}\tCompressed={2:P}", s.Length, sc, (float)sc / s.Length));
Console.WriteLine(string.Format("Min:\tNormal={0}\tGZip={1}\tCompressed={2:P}", m.Length, mc, (float)mc / m.Length));
Console.WriteLine(string.Format("Short/Long\tRegular={0:P}\tGZip={1:P}",
(float)s.Length / l.Length, (float)sc / lc));
Console.WriteLine(string.Format("Min/Long\tRegular={0:P}\tGZip={1:P}",
(float)m.Length / l.Length, (float)mc / lc));
}
My results:
Long: Normal=1754614 GZip=197053 Compressed=11.23 %
Short: Normal=384614 GZip=128252 Compressed=33.35 %
Min: Normal=334614 GZip=128252 Compressed=38.33 %
Short/Long Regular=21.92 % GZip=65.09 %
Min/Long Regular=19.07 % GZip=65.09 %
Conclusion:
The single biggest savings is to use GZIP (better than just using schema'ize).
GZIP + schema'ized will be the smallest overall.
With GZIP there is no point to use a normal JavaScript minimizer (in this scenario).
Use GZIP (or plain DEFLATE); it performs very well on repetitive structured text (the long form shrinks to about 11% of its size, roughly a 9x reduction!).
Happy coding.
Here's an article that does pretty much what you're looking to do:
http://stevehanov.ca/blog/index.php?id=104
At first glance, it looks like your example would be compressed down to the following after the first step of the algorithm (which will actually do more work on it in subsequent steps):
{
"templates": [
["codePractice", "codeScheduleObject", "codeScheduleObjectType", "defaultCodeScheduleObject", "name"]
],
"values": [
{ "type": 1, "values": [ 35, 576, "", 12, "Dr. 1" ] },
{ "type": 1, "values": [ 35, 169, "", 43, "Dr. 2" ] },
{ "type": 1, "values": [ 35, 959, "", 76, "Dr. 3" ] }
]
}
You can start to see the benefit of the algorithm already. Here's the final output after running it through the compressor:
{
"f" : "cjson",
"t" : [
[0,"schedules"],
[0,"codePractice","codeScheduleObject","codeScheduleObjectType","defaultCodeScheduleObject","name"]
],
"v" : {
"" : [ 1, [
{ "" : [2, 35, 576, "", 12, "Dr. 1"] },
{ "" : [2, 35, 169, "", 43, "Dr. 2"] },
{ "" : [2, 35, 959, "", 76, "Dr. 3"] }
]
]
}
}
One can obviously see the improvement if you have several thousands of records. The output is still readable, but I think the other guys are right too: a good compression algorithm is going to remove the blocks of text that are repeated anyway...
Before you change your JSON schema give this a shot
http://httpd.apache.org/docs/2.0/mod/mod_deflate.html
For the record, i am doing exactly it in php. Its a list of objects from a database.
$comp=base64_encode(gzcompress(json_encode($json)));
json: string(22501 length)
gz compressed = string(711) but its a binary format.
gz compressed + base64 = string(948) its a text format.
So it's considerably smaller, at the cost of a fraction of a second.
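For anyone doing the same from Node.js, a rough equivalent of the PHP one-liner might look like the sketch below. The helper names `pack` and `unpack` and the sample data are made up for illustration; `zlib.deflateSync` produces the zlib format, which is what I understand `gzcompress` to emit.

```javascript
const zlib = require("zlib");

// Hypothetical helpers mirroring base64_encode(gzcompress(json_encode(...))).
function pack(data) {
  const json = JSON.stringify(data);
  return zlib.deflateSync(Buffer.from(json, "utf8")).toString("base64");
}

function unpack(text) {
  const json = zlib.inflateSync(Buffer.from(text, "base64")).toString("utf8");
  return JSON.parse(json);
}

// Made-up sample data: many records that share the same keys.
const rows = Array.from({ length: 200 }, (_, i) => ({
  codePractice: 35,
  codeScheduleObject: i,
  name: "Dr. " + i,
}));

const encoded = pack(rows);
// Print the plain JSON length next to the compressed base64 length.
console.log(JSON.stringify(rows).length, encoded.length);
```

The base64 step adds roughly a third on top of the compressed size, so it is only worth it when the transport needs text.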
I feel that it should be something very simple and obvious, but I've been stuck on this for the last half an hour and can't move on.
All I need is to split an array of elements into N groups based on element index.
For example we have an array of 30 elements [e1,e2,...e30], that has to be divided into N=3 groups like this:
group1: [e1, ..., e10]
group2: [e11, ..., e20]
group3: [e21, ..., e30]
I came up with a nasty mess like this for N=3 (pseudo language; I left the multiplication by 0 and 1 in just for clarification):
for(i=0;i<array_size;i++) {
if(i>=0*(array_size/3) && i<1*(array_size/3)) {
print "group1";
} else if(i>=1*(array_size/3) && i<2*(array_size/3)) {
print "group2";
} else if(i>=2*(array_size/3) && i<3*(array_size/3)) {
print "group3";
}
}
But what would be the proper general solution?
Thanks.
What about something like this?
for(i=0;i<array_size;i++) {
print "group" + (Math.floor(i/(array_size/N)) + 1)
}
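If you want the actual groups rather than printed labels, the same index formula can fill an array of arrays. This is only a sketch; `groupByIndex` is a name I made up, and it uses `i * n / length`, which is algebraically the same as `i / (length / n)` but keeps the arithmetic on whole numbers until the final division.

```javascript
// Hypothetical helper: element i goes into group floor(i * n / length).
function groupByIndex(items, n) {
  const groups = Array.from({ length: n }, () => []);
  items.forEach((item, i) => {
    const g = Math.floor((i * n) / items.length); // always in 0 .. n-1
    groups[g].push(item);
  });
  return groups;
}

console.log(JSON.stringify(groupByIndex([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3)));
// prints [[1,2,3,4],[5,6,7],[8,9,10]]
```

When the length is not a multiple of N, the group sizes differ by at most one.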
Here's a little function which will do what you want - it presumes you know the number of groups you want to make:
function arrayToGroups(source, groups) {
//This is the array of groups to return:
var grouped = [];
//work out the size of the group
var groupSize = Math.ceil(source.length/groups);
//clone the source array so we can safely splice it (splicing modifies the array)
var queue = source.slice(0);
for (var r=0;r<groups;r++) {
//Grab the next groupful from the queue, and append it to the array of groups
grouped.push(queue.splice(0, groupSize));
}
return grouped;
}
And you use it like:
var herbs = ['basil', 'marjoram', 'aniseed', 'parsely', 'chives', 'sage', 'fennel', 'oregano', 'thyme', 'tarragon', 'rosemary'];
var herbGroups = arrayToGroups(herbs, 3);
which returns:
herbGroups[0] = ['basil', 'marjoram', 'aniseed', 'parsely']
herbGroups[1] = ['chives', 'sage', 'fennel', 'oregano']
herbGroups[2] = ['thyme', 'tarragon', 'rosemary']
It doesn't do any sanity checking to make sure you pass in an array and a number, but you could add that easily enough. You could probably add it to JavaScript's Array prototype, too, which would give you a handy 'toGroups' method on arrays.
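A sketch of what that sanity checking could look like is below. `arrayToGroupsChecked` is a hypothetical variant, not the function above; it keeps the same `Math.ceil` group size but uses `slice` instead of cloning and splicing.

```javascript
// Hypothetical variant of arrayToGroups with argument checks added.
function arrayToGroupsChecked(source, groups) {
  if (!Array.isArray(source)) {
    throw new TypeError("source must be an array");
  }
  if (!Number.isInteger(groups) || groups < 1) {
    throw new RangeError("groups must be a positive integer");
  }
  const groupSize = Math.ceil(source.length / groups);
  const grouped = [];
  for (let r = 0; r < groups; r++) {
    // slice() copies a range and leaves the source array untouched
    grouped.push(source.slice(r * groupSize, (r + 1) * groupSize));
  }
  return grouped;
}

const sizes = arrayToGroupsChecked(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"], 3)
  .map((group) => group.length);
console.log(sizes.join(","));
```

As with the original, trailing groups can come back empty when the rounded-up group size uses the items up early (for example 9 items into 4 groups).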
Using a vector language makes this task simple, right tool and all that. Just thought I'd throw this out there to let folks check out an alternative methodology.
The explained version in K (an APL descendant):
split:{[values;n] / define function split with two parameters
enum:!n / ! does enumerate from 0 through n exclusive, : is assign
floor:_(#values)%n / 10 for this sample, % is divide, _ floor, # count
cut:floor*enum / 0 10 20 for this sample data, * multiplies atom * vector
:cut _ values / cut the values at the given cutpoints, yielding #cut lists
}
values:1+!30 / generate values 1 through 30
n:3 / how many groups to split into
groups:split[values;n] / set the groups
yields the expected output:
(1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30)
The short version in K :
split:{((_(#x)%y)*!y)_ x}
groups:split[1+!30;3]
yields the same output:
(1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30)
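For readers who don't know K, here is roughly the same cut-point idea written out in JavaScript. This is my own transliteration, not an exact port; `splitAtCutPoints` is a made-up name, and it assumes the last piece runs to the end of the list, which is how I understand K's cut to behave.

```javascript
// Hypothetical transliteration of the K `split`, one step per line.
function splitAtCutPoints(values, n) {
  const size = Math.floor(values.length / n);                 // _(#values)%n
  const cuts = Array.from({ length: n }, (_, k) => k * size); // floor*!n
  // Each piece runs from its cut point up to the next one;
  // cuts[k + 1] is undefined for the last piece, so slice() runs to the end.
  return cuts.map((start, k) => values.slice(start, cuts[k + 1]));
}

const values = Array.from({ length: 30 }, (_, i) => i + 1); // 1 through 30
const groups = splitAtCutPoints(values, 3);
console.log(groups.map((group) => group.join(" ")).join("\n"));
```

Because the size is rounded down, any leftover elements end up in the last group.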
I modified Beejamin's function above and just wanted to share it.
function arrayToGroups($source, $pergroup) {
$grouped = array();
$groupCount = ceil(count($source)/$pergroup);
$queue = $source;
for ($r=0; $r<$groupCount; $r++) {
array_push($grouped, array_splice($queue, 0, $pergroup));
}
return $grouped;
}
This asks how many items to have per group instead of how many groups total. PHP.
const int g = 3; // number of groups
const int n = (array_size + g - 1)/g; // elements per group, rounded up
for (int i=0,j=1; i<array_size; ++i) {
if (i >= j*n) // index has crossed into the next group
++j;
printf("Group %d\n", j);
}
int group[3][10];
int groupIndex = 0;
int itemIndex = 0;
for(i = 0; i < array_size; i++)
{
group[groupIndex][itemIndex] = big_array[i];
itemIndex++;
if (itemIndex == 10)
{
itemIndex = 0;
groupIndex++;
}
}
There's probably an infinite number of ways to do this.
I'd suggest: for each group, create a base pointer and count.
struct group {foo * ptr; size_t count; };
group * pgroups = new group [ngroups];
// round up so that no trailing elements are dropped
size_t objects_per_group = (array_size + ngroups - 1) / ngroups;
for (unsigned u = 0; u < ngroups; ++u ) {
group & g = pgroups[u];
// clamp the start so trailing groups are empty rather than out of range
size_t index = min (u * objects_per_group, array_size);
g.ptr = & array [index];
g.count = min (objects_per_group, array_size - index); // last group may have less!
}
...
for (unsigned u = 0; u < ngroups; ++u) {
// group "u" is an array at pgroups[u].ptr, dimension pgroups[u].count
group & g = pgroups[u];
// enumerate the group:
for (unsigned v = 0; v < g.count; ++v) {
fprintf (stdout, "group %u, item %u, %s\n",
(unsigned) u, (unsigned) v, (const char *) g.ptr[v].somestring);
}
}
delete[] pgroups;
I think the problem is a little more complicated; and considering that you only look at a group as a one-dimensional problem, you're going to get a very odd view of what groups actually are.
Firstly, the problem's dimensionality depends on the number of group primes and group combinations you are dealing with. In mathematics, the number of possibilities is on the order of n to the power of n (n^n) or n! (n factorial), depending on what you count.
If I have 5 groups arrayed as (1, 2, 3, 4, 5) and I want to represent them as certain groups or combinations of groups according to a factorial expression, then the combinations get bigger:
Group 1x1 = 1,2,3,4,5
Group 2x1 = 12, 13, 14, 15, 21, 23, 24, 25, 31, 32, 34, 35, 41, 42, 43, 45, 51, 52, 53, 54
so the strategy creates a systematic branch (easy enough)
12, 13, 14, 15
21, 23, 24, 25
31, 32, 34, 35
41, 42, 43, 45
51, 52, 53, 54
Group 1 + 2x2x1 = (1, 23, 45), (2, 13, 45), (3, 12, 45), (4, 12, 35), (1, 24, 35), (1, 25, 35), (1, 32, 45), (1, 34, 25), (1, 35, 24), ... etc
As you can see, when you begin to add factorial sets, it becomes hard to create a mathematical reference that expresses the combinations. It gets worse when you get up to a base set of length > 3 or 4.
If I am understanding your question: you want to express in generic terms an algorithm which allows you to create grouping strategies programmatically?
This is a complicated set, and is best represented mathematically, as set theory. Otherwise all you're doing is two-dimensional array handling:
the first array expresses the grouping strategy;
the second array expresses the grouping elements.
I don't think this is what you're being asked to do, because the term "group" in mathematics has a very specific meaning. You should not use the term group; rather, express it as a set (set1, set2) if that is what you are doing.
Set1 contains elements of set2, and therefore this is handled with the same mathematics as sets and unions. Look up "Venn diagrams" and "union"; avoid using the term group unless you are representing the factor of a set.
http://en.wikipedia.org/wiki/Group_(mathematics)
I think what you are trying to express is the groups within a known set or table; this is the D2 example on wikipedia.org.
In which case you have to look at the problem like a Rubik's cube, and it gets complicated.
I'm working on the same problem in JavaScript; when I am done I might publish it ;). It's very complicated.