Pig MapReduce to count letters in a row - Hadoop

Instead of counting words I need to count letters.
But I have problems implementing this using Apache Pig version 0.8.1-cdh3u1.
Given the following input:
989;850;abcccc
29;395;aabbcc
The output should be:
989;850;a;1
989;850;b;1
989;850;c;4
29;395;a;2
29;395;b;2
29;395;c;2
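For reference, the transformation described above can be sketched outside Pig in a few lines of Python. This sketch only pins down the expected semantics. It assumes one semicolon-separated record per line, the function name count_letters is hypothetical, and letters are emitted in the order they first appear.

```python
from collections import Counter

def count_letters(line):
    """Turn 'x;y;content' into 'x;y;letter;count' strings, one per distinct letter."""
    x, y, content = line.split(";")
    # Counter remembers first-seen order (Python 3.7+), so letters come out in input order.
    return [f"{x};{y};{letter};{n}" for letter, n in Counter(content).items()]

for record in ["989;850;abcccc", "29;395;aabbcc"]:
    for out in count_letters(record):
        print(out)
```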
Here is what I tried:
A = LOAD 'input' using PigStorage(';') as (x:int, y:int, content:chararray);
B = foreach A generate x, y, FLATTEN(STRSPLIT(content, '(?<=.)(?=.)', 6)) as letters;
C = foreach B generate x, y, FLATTEN(TOBAG(*)) as letters;
D = foreach C generate x, y, letters.letters as letter;
E = GROUP D BY (x,y,letter);
F = foreach E generate group.x as x, group.y as y, group.letter as letter, COUNT(D.letter) as count;
A, B and C can be dumped, but "dump D" results in "ERROR 2997: Unable to recreate exception from backed error: java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.pig.data.Tuple"
dump C displays (despite the third value being a weird tuple):
(989,850,a)
(989,850,b)
(989,850,c)
(989,850,c)
(989,850,c)
(989,850,c)
(29,395,a)
(29,395,a)
(29,395,b)
(29,395,b)
(29,395,c)
(29,395,c)
Here are the schemas:
grunt> describe A; describe B; describe C; describe D; describe E; describe F;
A: {x: int,y: int,content: chararray}
B: {x: int,y: int,letters: bytearray}
C: {x: int,y: int,letters: (x: int,y: int,letters: bytearray)}
D: {x: int,y: int,letter: bytearray}
E: {group: (x: int,y: int,letter: bytearray),D: {x: int,y: int,letter: bytearray}}
F: {x: int,y: int,letter: bytearray,count: long}
This Pig version doesn't seem to support TOBAG($2..$8), hence the TOBAG(*), which also includes x and y, but that could be sorted out syntactically later...
I'd like to avoid writing a UDF, otherwise I'd simply use the Java API directly.
But I don't really understand the cast error. Can someone please explain it?

I'd propose writing a custom UDF instead. A quick, raw implementation would look like this:
package com.example;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.logicalLayer.schema.Schema;
public class CharacterCount extends EvalFunc<DataBag> {
private static final BagFactory bagFactory = BagFactory.getInstance();
private static final TupleFactory tupleFactory = TupleFactory.getInstance();
@Override
public DataBag exec(Tuple input) throws IOException {
try {
Map<Character, Integer> charMap = new HashMap<Character, Integer>();
DataBag result = bagFactory.newDefaultBag();
int x = (Integer) input.get(0);
int y = (Integer) input.get(1);
String content = (String) input.get(2);
for (int i = 0; i < content.length(); i++){
char c = content.charAt(i);
Integer count = charMap.get(c);
count = (count == null) ? 1 : count + 1;
charMap.put(c, count);
}
for (Map.Entry<Character, Integer> entry : charMap.entrySet()) {
Tuple res = tupleFactory.newTuple(4);
res.set(0, x);
res.set(1, y);
res.set(2, String.valueOf(entry.getKey()));
res.set(3, entry.getValue());
result.add(res);
}
return result;
} catch (Exception e) {
throw new RuntimeException("CharacterCount error", e);
}
}
}
Pack it in a jar and execute it:
register '/home/user/test/myjar.jar';
A = LOAD '/user/hadoop/store/sample/charcount.txt' using PigStorage(';')
as (x:int, y:int, content:chararray);
B = foreach A generate flatten(com.example.CharacterCount(x,y,content))
as (x:int, y:int, letter:chararray, count:int);
dump B;
(989,850,b,1)
(989,850,c,4)
(989,850,a,1)
(29,395,b,2)
(29,395,c,2)
(29,395,a,2)

I do not have version 0.8, but could you try this one:
A = LOAD 'input' using PigStorage(';') as (x:int, y:int, content:chararray);
B = foreach A generate x, y, FLATTEN(STRSPLIT(content, '(?<=.)(?=.)', 6));
C = foreach B generate $0 as x, $1 as y, FLATTEN(TOBAG(*)) as letter;
E = GROUP C BY (x,y,letter);
F = foreach E generate group.x as x, group.y as y, group.letter as letter, COUNT(C.letter) as count;

You can try this:
grunt> a = load 'inputfile.txt' using PigStorage(';') as (c1:chararray, c2:chararray, c3:chararray);
grunt> b = foreach a generate c1,c2,FLATTEN(TOKENIZE(REPLACE(c3,'','^'),'^')) as split_char;
grunt> c = group b by (c1,c2,split_char);
grunt> d = foreach c generate group, COUNT(b);
grunt> dump d;
Output looks like this:
((29,395,a),2)
((29,395,b),2)
((29,395,c),2)
((989,850,a),1)
((989,850,b),1)
((989,850,c),4)
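The REPLACE/TOKENIZE trick above may be easier to follow in isolation. The Python sketch below emulates the three steps on the sample rows. The helper names are hypothetical, and the assumption that TOKENIZE drops empty tokens (as java.util.StringTokenizer does) is not stated in the answer.

```python
import re
from collections import Counter

def replace_then_tokenize(text, sep="^"):
    # REPLACE(c3, '', '^'): an empty regex matches before every character and at the end,
    # so 'abc' becomes '^a^b^c^' (Python's re.sub and Java's replaceAll agree here).
    marked = re.sub("", sep, text)
    # TOKENIZE(..., '^'): split on the separator; empty tokens are assumed to be dropped.
    return [tok for tok in marked.split(sep) if tok]

def count_by_row_and_letter(rows):
    """Emulate GROUP b BY (c1, c2, split_char) followed by COUNT(b)."""
    counts = Counter()
    for c1, c2, c3 in rows:
        for ch in replace_then_tokenize(c3):
            counts[(c1, c2, ch)] += 1
    # Sorting the chararray keys puts "29" before "989", like the output shown above.
    return sorted(counts.items())

print(count_by_row_and_letter([("989", "850", "abcccc"), ("29", "395", "aabbcc")]))
```

Because c1 and c2 are loaded as chararray, the grouping keys compare as strings; the sorted order in the sketch is a property of the emulation, not a guarantee about Pig's output order.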

Related

Scala how to define an ordering for Rationals

I have to implement compareRationals as something like
(a, b) => {
// the body goes here
}
To compare two fractions, transform them so they both have the same denominator, then order the two results by their numerator. To make sure they have the same denominator, I need to find the least common denominator. My code currently works for all the println statements except println(insertionSort2(List(rationals))). I really need help defining compareRationals so that println(insertionSort2(List(rationals))) shouldBe List(fourth, third, half).
Object {
def insertionSort2[A](xs: List[A])(implicit ord: Ordering[A]): List[A] = {
def insert2(y: A, ys: List[A]): List[A] =
ys match {
case List() => y :: List()
case z :: zs =>
if (ord.lt(y, z)) y :: z :: zs
else z :: insert2(y, zs)
}
xs match {
case List() => List()
case y :: ys => insert2(y, insertionSort2(ys))
}
}
class Rational(x: Int, y: Int) {
private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)
private val g = gcd(x, y)
lazy val numer: Int = x / g
lazy val denom: Int = y / g
}
val compareRationals: (Rational, Rational) => Int =
implicit val rationalOrder: Ordering[Rational] =
new Ordering[Rational] {
def compare(x: Rational, y: Rational): Int = compareRationals(x, y)
}
def main(args: Array[String]): Unit = {
val half = new Rational(1, 2)
val third = new Rational(1, 3)
val fourth = new Rational(1, 4)
val rationals = List(third, half, fourth)
println(insertionSort2(List(4,2,9,5,8))(Ordering.Int))
println(insertionSort2(List(4,2,9,5,8)))
println(insertionSort2(List(rationals)))
}
}
}
I think this is all you need.
val compareRationals: (Rational, Rational) => Int =
(x,y) => x.numer * y.denom - y.numer * x.denom
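The one-liner above is cross-multiplication. For positive denominators, n1/d1 < n2/d2 exactly when n1*d2 < n2*d1, which is the least-common-denominator comparison from the question with both sides scaled by the same positive factor. A Python sketch comparing the two, with (numer, denom) tuples standing in for the Rational class (an assumption for illustration):

```python
from functools import cmp_to_key
from math import gcd

def compare_rationals(a, b):
    """Cross-multiplication: negative if a < b, zero if equal, positive if a > b.

    Assumes both denominators are positive.
    """
    (n1, d1), (n2, d2) = a, b
    return n1 * d2 - n2 * d1

def compare_via_lcd(a, b):
    """The question's approach: rewrite both fractions over their least common denominator."""
    (n1, d1), (n2, d2) = a, b
    lcd = d1 * d2 // gcd(d1, d2)
    return n1 * (lcd // d1) - n2 * (lcd // d2)

half, third, fourth = (1, 2), (1, 3), (1, 4)
print(sorted([third, half, fourth], key=cmp_to_key(compare_rationals)))
# [(1, 4), (1, 3), (1, 2)], i.e. fourth, third, half
```

In Scala the Int products can overflow for large numerators or denominators; Python integers do not, so the sketch sidesteps that issue.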

Tail recursive solution in Scala for Linked-List chaining

I wanted to write a tail-recursive solution for the following problem on LeetCode:
You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order and each of their nodes contains a single digit. Add the two numbers and return it as a linked list.
You may assume the two numbers do not contain any leading zero, except the number 0 itself.
Example:
*Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)*
*Output: 7 -> 0 -> 8*
*Explanation: 342 + 465 = 807.*
Link to the problem on Leetcode
I was not able to figure out a way to call the recursive function in the last line.
What I am trying to achieve here is the recursive calling of the add function that adds the heads of the two lists with a carry and returns a node. The returned node is chained with the node in the calling stack.
I am pretty new to Scala; I am guessing I may have missed some useful constructs.
/**
* Definition for singly-linked list.
* class ListNode(_x: Int = 0, _next: ListNode = null) {
* var next: ListNode = _next
* var x: Int = _x
* }
*/
import scala.annotation.tailrec
object Solution {
def addTwoNumbers(l1: ListNode, l2: ListNode): ListNode = {
add(l1, l2, 0)
}
//@tailrec
def add(l1: ListNode, l2: ListNode, carry: Int): ListNode = {
var sum = 0;
sum = (if(l1!=null) l1.x else 0) + (if(l2!=null) l2.x else 0) + carry;
if(l1 != null || l2 != null || sum > 0)
new ListNode(sum%10,add(if(l1!=null) l1.next else null, if(l2!=null) l2.next else null,sum/10))
else null;
}
}
You have a couple of problems, most of which come down to the code not being idiomatic.
Things like var and null are uncommon in Scala; usually you would use a tail-recursive algorithm to avoid that kind of thing.
Finally, remember that a tail-recursive algorithm requires that the last expression is either a plain value or a recursive call. For doing that, you usually keep track of the remaining job as well as an accumulator.
Here is a possible solution:
type Digit = Int // Refined [0..9]
type Number = List[Digit] // Refined NonEmpty.
def sum(n1: Number, n2: Number): Number = {
def aux(d1: Digit, d2: Digit, carry: Digit): (Digit, Digit) = {
val tmp = d1 + d2 + carry
val d = tmp % 10
val c = tmp / 10
d -> c
}
@annotation.tailrec
def loop(r1: Number, r2: Number, acc: Number, carry: Digit): Number =
(r1, r2) match {
case (d1 :: tail1, d2 :: tail2) =>
val (d, c) = aux(d1, d2, carry)
loop(r1 = tail1, r2 = tail2, d :: acc, carry = c)
case (Nil, d2 :: tail2) =>
val (d, c) = aux(d1 = 0, d2, carry)
loop(r1 = Nil, r2 = tail2, d :: acc, carry = c)
case (d1 :: tail1, Nil) =>
val (d, c) = aux(d1, d2 = 0, carry)
loop(r1 = tail1, r2 = Nil, d :: acc, carry = c)
case (Nil, Nil) =>
if (carry > 0) carry :: acc else acc // keep a final carry, e.g. 5 + 5 = 10
}
loop(r1 = n1, r2 = n2, acc = List.empty, carry = 0).reverse
}
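The same accumulator idea can be sketched in Python. Python does not eliminate tail calls, so the tail call becomes a loop over the same state (remaining digits, accumulator, carry). This illustrates the technique only; it uses plain lists, least significant digit first, rather than LeetCode's ListNode, and the function name is hypothetical.

```python
def add_digits(n1, n2):
    """Add two numbers stored least-significant digit first, e.g. [2, 4, 3] for 342.

    The state (i, acc, carry) plays the role of the tail-recursive loop's
    parameters (r1, r2, acc, carry): n1[i:] and n2[i:] are the remaining digits.
    """
    acc, carry, i = [], 0, 0
    while i < len(n1) or i < len(n2):
        d1 = n1[i] if i < len(n1) else 0  # a missing digit counts as 0
        d2 = n2[i] if i < len(n2) else 0
        total = d1 + d2 + carry
        acc.append(total % 10)  # appending keeps least-significant-first order
        carry = total // 10
        i += 1
    if carry > 0:
        acc.append(carry)  # the final carry becomes an extra most-significant digit
    return acc

print(add_digits([2, 4, 3], [5, 6, 4]))  # [7, 0, 8], i.e. 342 + 465 = 807
```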
Now, this kind of recursion tends to be very verbose.
Usually, the stdlib provides ways to make the same algorithm more concise:
// This solution does not require the numbers to be already reversed, and the output is also in the correct order.
def sum(n1: Number, n2: Number): Number = {
val (result, carry) = n1.reverseIterator.zipAll(n2.reverseIterator, 0, 0).foldLeft(List.empty[Digit] -> 0) {
case ((acc, carry), (d1, d2)) =>
val tmp = d1 + d2 + carry
val d = tmp % 10
val c = tmp / 10
(d :: acc) -> c
}
if (carry > 0) carry :: result else result
}
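The zipAll/foldLeft version maps fairly directly onto Python's itertools.zip_longest and functools.reduce. A sketch under the same convention as the comment above (digits most significant first); the function name is hypothetical:

```python
from functools import reduce
from itertools import zip_longest

def add_msd_first(n1, n2):
    """Add two numbers given most-significant digit first, e.g. [3, 4, 2] for 342."""
    def step(state, pair):
        acc, carry = state
        d1, d2 = pair
        total = d1 + d2 + carry
        acc.append(total % 10)  # collected least-significant first
        return acc, total // 10

    # Walk both numbers from the least significant end, padding the shorter with 0s,
    # like reverseIterator.zipAll(..., 0, 0).
    pairs = zip_longest(reversed(n1), reversed(n2), fillvalue=0)
    result, carry = reduce(step, pairs, ([], 0))
    if carry > 0:
        result.append(carry)
    return result[::-1]  # back to most-significant-first order

print(add_msd_first([3, 4, 2], [4, 6, 5]))  # [8, 0, 7]
```

Appending and reversing once at the end avoids the quadratic cost that prepending to a Python list would have.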
Scala is less popular on LeetCode, but this solution (which is not the best) would be accepted by LeetCode's online judge:
import scala.collection.mutable._
object Solution {
def addTwoNumbers(listA: ListNode, listB: ListNode): ListNode = {
var tempBufferA: ListBuffer[Int] = ListBuffer.empty
var tempBufferB: ListBuffer[Int] = ListBuffer.empty
tempBufferA.clear()
tempBufferB.clear()
def listTraversalA(listA: ListNode): ListBuffer[Int] = {
if (listA == null) {
return tempBufferA
} else {
tempBufferA += listA.x
listTraversalA(listA.next)
}
}
def listTraversalB(listB: ListNode): ListBuffer[Int] = {
if (listB == null) {
return tempBufferB
} else {
tempBufferB += listB.x
listTraversalB(listB.next)
}
}
val resultA: ListBuffer[Int] = listTraversalA(listA)
val resultB: ListBuffer[Int] = listTraversalB(listB)
val resultSum: BigInt = BigInt(resultA.reverse.mkString) + BigInt(resultB.reverse.mkString)
var listNodeResult: ListBuffer[ListNode] = ListBuffer.empty
val resultList = resultSum.toString.toList
var lastListNode: ListNode = null
for (i <-0 until resultList.size) {
if (i == 0) {
lastListNode = new ListNode(resultList(i).toString.toInt)
listNodeResult += lastListNode
} else {
lastListNode = new ListNode(resultList(i).toString.toInt, lastListNode)
listNodeResult += lastListNode
}
}
return listNodeResult.reverse(0)
}
}
References
For additional details, you can see the Discussion Board. There are plenty of accepted solutions, explanations, efficient algorithms with a variety of languages, and time/space complexity analysis in there.

Android Kotlin replace while with for next loop

We have a HashMap of Integer/String, and in Java we would iterate over the HashMap and display 3 key-value pairs at a time with each click of a button. Java code below:
hm.put(1, "1");
hm.put(2, "Dwight");
hm.put(3, "Lakeside");
hm.put(4, "2");
hm.put(5, "Billy");
hm.put(6, "Georgia");
hm.put(7, "3");
hm.put(8, "Sam");
hm.put(9, "Canton");
hm.put(10, "4");
hm.put(11, "Linda");
hm.put(12, "North Canton");
hm.put(13, "5");
hm.put(14, "Lisa");
hm.put(15, "Phoenix");
onNEXT(null);
public void onNEXT(View view){
etCity.setText("");
etName.setText("");
etID.setText("");
X = X + 3;
for(int L = 1; L <= X; L++ ){
String id = hm.get(L);
String name = hm.get(L = L + 1);
String city = hm.get(L = L + 1);
etID.setText(id);
etName.setText(name);
etCity.setText(city);
}
if(X == hm.size()){
X = 0;
}
}
We decided to let Android Studio convert the above Java code to Kotlin.
The converter changed the for(int L = 1; L <= X; L++) loop to a while loop, which seemed OK at first; then we realized the while loop was running for 3 loops with each button click. Kotlin also complained a lot about these lines of code: String name = hm.get(L = L + 1); String city = hm.get(L = L + 1);
We post the Kotlin code below, followed by our questions:
fun onNEXT(view: View?) {
etCity.setText("")
etName.setText("")
etID.setText("")
X = X + 3
var L = 0
while (L <= X) {
val id = hm[L - 2]
val name = hm.get(L - 1)
val city = hm.get(L)
etID.setText(id)
etName.setText(name)
etCity.setText(city)
L++
}
if (X == hm.size) {
X = 0
}
}
We tried to write a For Next loop like this: for (L in 15 downTo 0 step 1)
It seems you cannot count up with an upTo, so we thought we would use hm.size for the value 15 and just use downTo.
So the questions are:
How do we use the Kotlin For Next loop syntax and include hm.size in the construct?
We have L declared as an integer, but Kotlin will not let us use
L = L + 1 in the while loop or in the For Next loop. Why?
Here is the strange part: notice we can increment X by using X = X + 3.
Yes, X was declared above as internal var X = 0, and L was declared the same way.
Okay, I'll bite.
The following code will print your triples:
val hm = HashMap<Int, String>()
hm[1] = "1"
hm[2] = "Dwight"
hm[3] = "Lakeside"
hm[4] = "2"
hm[5] = "Billy"
hm[6] = "Georgia"
hm[7] = "3"
hm[8] = "Sam"
hm[9] = "Canton"
hm[10] = "4"
hm[11] = "Linda"
hm[12] = "North Canton"
hm[13] = "5"
hm[14] = "Lisa"
hm[15] = "Phoenix"
for (i in 1..hm.size step 3) {
println(Triple(hm[i], hm[i + 1], hm[i + 2]))
}
Now let's convert the same idea into a function:
var count = 0
fun nextTriplet(hm: HashMap<Int, String>): Triple<String?, String?, String?> {
val result = mutableListOf<String?>()
for (i in 1..3) {
result += hm[(count++ % hm.size) + 1]
}
return Triple(result[0], result[1], result[2])
}
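For comparison, the same two ideas, stepping upward through the keys three at a time and keeping a wrap-around position between clicks, look like this in Python. This is an illustration only: it assumes, as in the question, that keys run from 1 to a multiple of 3, it keeps the position in a closure rather than a global, and all names are hypothetical.

```python
def all_triples(hm):
    # range(1, len(hm) + 1, 3) counts upward, like Kotlin's `1..hm.size step 3`.
    return [tuple(hm.get(k) for k in range(i, i + 3)) for i in range(1, len(hm) + 1, 3)]

def make_triple_stepper(hm):
    """Return a function that yields the next (id, name, city) triple on each call."""
    position = 0  # number of keys already shown, modulo len(hm)

    def next_triple():
        nonlocal position
        keys = [position + i + 1 for i in range(3)]
        position = (position + 3) % len(hm)  # wrap around after the last triple
        return tuple(hm.get(k) for k in keys)

    return next_triple

values = ["1", "Dwight", "Lakeside", "2", "Billy", "Georgia", "3", "Sam", "Canton",
          "4", "Linda", "North Canton", "5", "Lisa", "Phoenix"]
hm = {i + 1: v for i, v in enumerate(values)}
on_next = make_triple_stepper(hm)
print(on_next())  # ('1', 'Dwight', 'Lakeside')
```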
We used a far-from-elegant piece of code to answer the question.
We used a CharArray since Grendel seemed OK with the concept of an Array:
internal var YY = 0
val CharArray = arrayOf(1, "Dwight", "Lakeside",2,"Billy","Georgia",3,"Sam","Canton")
In the onCreate method, we loaded the first set of data with a call to onCO(null).
Here is the working code to iterate over the CharArray that was used:
fun onCO(view: View?){
etCity.setText("")
etName.setText("")
etID.setText("")
if(CharArray.size > YY){
val id = CharArray[YY]
val name = CharArray[YY + 1]
val city = CharArray[YY + 2]
etID.setText(id.toString())
etName.setText(name.toString())
etCity.setText(city.toString())
YY = YY + 3
}else{
YY = 0
val id = CharArray[YY]
val name = CharArray[YY + 1]
val city = CharArray[YY + 2]
etID.setText(id.toString())
etName.setText(name.toString())
etCity.setText(city.toString())
YY = YY + 3
}
}
Simple but not elegant. The code seems to be a better example of a counter than of iteration.
Controlling the For Next loop may involve fewer lines of code, but controlling the loop seemed like the wrong direction. We might try to use the keyword "when" to apply logic to this question, but we are busy at the moment.
After some further research, here is a partial answer to our question.
This code only shows how to traverse a hash map; stepping the traversal every 3 records still needs to be added to make the code complete. This answer is for anyone who stumbles upon the question. The code and a link to the resource are provided below:
fun main(args: Array<String>) {
val map = hashMapOf<String, Int>()
map.put("one", 1)
map.put("two", 2)
for ((key, value) in map) {
println("key = $key, value = $value")
}
}
The link will let you try Kotlin code examples in your browser:
LINK
We only did moderate research before asking this question; our apologies. If anyone is starting anew with Kotlin, this second link may be of greater value. We seldom find understandable answers in the Android Developers pages; the Kotlin and Android pages are more beginner-friendly and less technical in scope. Enjoy the link:
Kotlin and Android

what is the purpose of FLATTEN operator in PIG Latin

A = load 'data' as (x, y);
B = load 'data' as (x, z);
C = cogroup A by x, B by x;
D = foreach C generate flatten(A), flatten(B);
E = group D by A::x;
What exactly is done in the above statements, and where would we use flatten in a real-world scenario?
A = load 'input1' USING PigStorage(',') as (x, y);
(x,y) --> (1,2)(1,3)(2,3)
B = load 'input2' USING PigStorage(',') as (x, z);
(x,z) --> (1,4)(1,2)(3,2)
C = cogroup A by x, B by x;
result:
(1,{(1,2),(1,3)},{(1,4),(1,2)})
(2,{(2,3)},{})
(3,{},{(3,2)})
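D = foreach C generate group, flatten(A), flatten(B);
When both bags are flattened, the cross product of their tuples is returned. Flattening an empty bag produces no output rows, so keys 2 and 3 (each with one empty bag) disappear.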
result:
(1,1,2,1,4)
(1,1,2,1,2)
(1,1,3,1,4)
(1,1,3,1,2)
E = group D by A::x;
Here you are grouping by the x column of relation A. All four rows of D have A::x = 1, so they form a single group:
(1,{(1,1,2,1,4),(1,1,2,1,2),(1,1,3,1,4),(1,1,3,1,2)})
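The cogroup, flatten and group steps can be emulated on the same sample data in Python to see where each row comes from. This models only the semantics described above (bags as lists, tuples as Python tuples), not how Pig executes them; the function names are hypothetical.

```python
from itertools import product

def cogroup(a, b):
    """Emulate `cogroup A by x, B by x` for tuples whose first field is x."""
    keys = sorted({t[0] for t in a} | {t[0] for t in b})
    return [(k, [t for t in a if t[0] == k], [t for t in b if t[0] == k]) for k in keys]

def flatten_both(groups):
    """Emulate `generate group, flatten(A), flatten(B)`: one row per (a, b) pair."""
    rows = []
    for key, bag_a, bag_b in groups:
        # product() with an empty bag yields nothing, so keys 2 and 3 produce no rows.
        for ta, tb in product(bag_a, bag_b):
            rows.append((key,) + ta + tb)
    return rows

def group_by(rows, index):
    """Emulate `group D by field`: map each key to the bag (list) of rows sharing it."""
    groups = {}
    for row in rows:
        groups.setdefault(row[index], []).append(row)
    return groups

A = [(1, 2), (1, 3), (2, 3)]
B = [(1, 4), (1, 2), (3, 2)]
D = flatten_both(cogroup(A, B))
E = group_by(D, 1)  # index 1 is A::x in (group, A::x, A::y, B::x, B::z)
print(E)
```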

Algorithm to generate a sequence proportional to specified percentage

Given a Map of objects and designated proportions (let's say they add up to 100 to make it easy):
val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
How can I generate a sequence such that for a subset of size n there are ~42% "A"s, ~32% "B"s and ~26% "C"s? (Obviously, small n will have larger errors).
(Work language is Scala, but I'm just asking for the algorithm.)
UPDATE: I resisted a random approach since, for instance, there's a ~18% chance (0.42²) that the sequence would start with AA and a ~10% chance (0.32²) that it would start with BB, and there would be very low odds that for n precisely == (sum of proportions) the distribution would be perfect. So, following @MvG's answer, I implemented it as follows:
/**
Returns the key whose achieved proportions are most below desired proportions
*/
def next[T](proportions : Map[T, Double], achievedToDate : Map[T,Double]) : T = {
val proportionsSum = proportions.values.sum
val desiredPercentages = proportions.mapValues(v => v / proportionsSum)
//Initially no achieved percentages, so avoid / 0
val toDateTotal = if(achievedToDate.values.sum == 0.0){
1
}else{
achievedToDate.values.sum
}
val achievedPercentages = achievedToDate.mapValues(v => v / toDateTotal)
val gaps = achievedPercentages.map{ case (k, v) =>
val gap = desiredPercentages(k) - v
(k -> gap)
}
val maxUnder = gaps.values.toList.sortWith(_ > _).head
//println("Max gap is " + maxUnder)
val gapsForMaxUnder = gaps.mapValues{v => Math.abs(v - maxUnder) < Double.Epsilon }
val keysByHasMaxUnder = gapsForMaxUnder.map(_.swap)
keysByHasMaxUnder(true)
}
/**
Stream of most-fair next element
*/
def proportionalStream[T](proportions : Map[T, Double], toDate : Map[T, Double]) : Stream[T] = {
val nextS = next(proportions, toDate)
val tailToDate = toDate + (nextS -> (toDate(nextS) + 1.0))
Stream.cons(
nextS,
proportionalStream(proportions, tailToDate)
)
}
When used, e.g.:
val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
val none : Map[String,Double] = ss.mapValues(_ => 0.0)
val mySequence = (proportionalStream(ss, none) take 100).toList
println("Desired : " + ss)
println("Achieved : " + mySequence.groupBy(identity).mapValues(_.size))
mySequence.map(s => print(s))
println
produces:
Desired : Map(A -> 42.0, B -> 32.0, C -> 26.0)
Achieved : Map(C -> 26, A -> 42, B -> 32)
ABCABCABACBACABACBABACABCABACBACABABCABACABCABACBA
CABABCABACBACABACBABACABCABACBACABABCABACABCABACBA
For a deterministic approach, the most obvious solution would probably be this:
Keep track of the number of occurrences of each item in the sequence so far.
For the next item, choose that item for which the difference between intended and actual count (or proportion, if you prefer that) is maximal, but only if the intended count (resp. proportion) is greater than the actual one.
If there is a tie, break it in an arbitrary but deterministic way, e.g. choosing the alphabetically lowest item.
This approach would ensure an optimal adherence to the prescribed ratio for every prefix of the infinite sequence generated in this way.
Quick & dirty Python proof of concept (don't expect any of the variable “names” to make any sense):
import sys
p = [0.42, 0.32, 0.26]
c = [0, 0, 0]
a = ['A', 'B', 'C']
n = 0
while n < 70*5:
    n += 1
    # pick the item whose intended count n*p[i] exceeds its actual count c[i] the most
    x = 0
    s = n*p[0] - c[0]
    for i in [1, 2]:
        si = n*p[i] - c[i]
        if si > s:
            x = i
            s = si
    sys.stdout.write(a[x])
    if n % 70 == 0:
        sys.stdout.write('\n')
    c[x] += 1
Generates:
ABCABCABACABACBABCAABCABACBACABACBABCABACABACBACBAABCABCABACABACBABCAB
ACABACBACABACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACABACBABCABA
CABACBACBAABCABCABACABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABAC
ABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABACABACBABCABACABACBACB
AACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACBAACBABCABACABACBACBA
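The greedy rule described in the steps above (largest gap between intended and actual count, alphabetical tie-break) can also be wrapped as a reusable function over an arbitrary weight map. A Python sketch, using exact fractions so that ties are real ties rather than floating-point accidents; the name proportional_sequence is hypothetical.

```python
from fractions import Fraction

def proportional_sequence(weights, length):
    """Deterministic sequence: at step n, pick the key whose intended count
    n * share exceeds its actual count by the most; ties go to the lowest key."""
    total = sum(Fraction(w) for w in weights.values())
    shares = {k: Fraction(w) / total for k, w in weights.items()}
    counts = {k: 0 for k in weights}
    seq = []
    for n in range(1, length + 1):
        # Largest deficit first, then alphabetical order as the tie-break.
        best = min(shares, key=lambda k: (-(n * shares[k] - counts[k]), k))
        counts[best] += 1
        seq.append(best)
    return seq

print("".join(proportional_sequence({"A": 42, "B": 32, "C": 26}, 20)))
```

Before each pick the deficits sum to exactly 1, so the largest one is always positive and the "only if the intended count is greater" condition holds automatically.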
For every item of the sequence, compute a (pseudo-)random number r equidistributed between 0 (inclusive) and 100 (exclusive).
If 0 ≤ r < 42, take A
If 42 ≤ r < (42+32), take B
If (42+32) ≤ r < (42+32+26)=100, take C
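The interval lookup above is inverse-CDF sampling over cumulative weights. A Python sketch, assuming integer weights that sum to 100 as in the question; the names are hypothetical, and the standard library's random.choices(keys, weights=..., k=n) does the same job in one call.

```python
import bisect
import itertools
import random

KEYS = ["A", "B", "C"]
WEIGHTS = [42, 32, 26]
# Running totals [42, 74, 100]: each is the exclusive upper edge of one key's interval.
CUMULATIVE = list(itertools.accumulate(WEIGHTS))

def pick(r):
    """Map r in [0, 100) to a key: [0, 42) -> A, [42, 74) -> B, [74, 100) -> C."""
    return KEYS[bisect.bisect_right(CUMULATIVE, r)]

def random_sequence(length, rng=random):
    """Draw each element independently; rng.random() is uniform on [0, 1)."""
    return [pick(rng.random() * CUMULATIVE[-1]) for _ in range(length)]

print("".join(random_sequence(30, random.Random(42))))
```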
The number of each entry in your subset is going to be the same as in your map, but with a scaling factor applied.
The scaling factor is n/100.
So if n was 50, you would have { Ax21, Bx16, Cx13 }.
Randomize the order to your liking.
The simplest solution that is "deterministic" (in terms of the number of elements of each category), in my opinion, is: add elements in a predefined order, and then shuffle the resulting list.
First, add map(x)/100 * n copies of each element x (choose how you handle integer arithmetic to avoid being off by one element), and then shuffle the resulting list.
Shuffling a list is simple with the Fisher-Yates shuffle, which is implemented in most languages: for example, Java has Collections.shuffle(), and C++ has random_shuffle() (deprecated in C++14 and removed in C++17 in favor of std::shuffle).
In Java, it will be as simple as:
int N = 107;
List<String> res = new ArrayList<String>();
for (Entry<String,Integer> e : map.entrySet()) { //map is predefined Map<String,Integer> for frequencies
for (int i = 0; i < Math.round(e.getValue()/100.0 * N); i++) {
res.add(e.getKey());
}
}
Collections.shuffle(res);
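One way to handle the integer rounding mentioned above so that the counts always add up to n is the largest-remainder (Hamilton) method: floor every quota, then hand the leftover slots to the largest fractional parts. That method is a swapped-in choice, not something the answer specifies. The Python sketch below then shuffles with random.shuffle, which is a Fisher-Yates shuffle; the function names are hypothetical.

```python
import math
import random
from fractions import Fraction

def apportion(weights, n):
    """Largest-remainder apportionment: integer counts that sum exactly to n."""
    total = sum(Fraction(w) for w in weights.values())
    quotas = {k: Fraction(w) * n / total for k, w in weights.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = n - sum(counts.values())
    # Largest fractional parts get the remaining slots; sorted() is stable,
    # so equal remainders keep the map's insertion order.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

def shuffled_sequence(weights, n, rng=random):
    seq = [k for k, c in apportion(weights, n).items() for _ in range(c)]
    rng.shuffle(seq)  # in-place Fisher-Yates shuffle
    return seq

print(apportion({"A": 42, "B": 32, "C": 26}, 107))  # {'A': 45, 'B': 34, 'C': 28}
```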
This is nondeterministic, but gives a distribution of values close to MvG's. It suffers from the problem that it could give AAA right at the start. I post it here for completeness' sake given how it proves my dissent with MvG was misplaced (and I don't expect any upvotes).
Now, if someone has an idea for an expand function that is deterministic and won't just duplicate MvG's method (rendering the calc function useless), I'm all ears!
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>ErikE's answer</title>
</head>
<body>
<div id="output"></div>
<script type="text/javascript">
if (!Array.prototype.each) {
Array.prototype.each = function(callback) {
var i, l = this.length;
for (i = 0; i < l; i += 1) {
callback(i, this[i]);
}
};
}
if (!Array.prototype.sum) {
Array.prototype.sum = function() {
var sum = 0;
this.each(function(i, val) {
sum += val;
});
return sum;
};
}
function expand(counts) {
var
result = "",
charlist = [],
l,
index;
counts.each(function(i, val) {
var ch = String.fromCharCode(i + 65); // index 0 -> "A", 1 -> "B", ...
for ( ; val > 0; val -= 1) {
charlist.push(ch);
}
});
l = charlist.length;
for ( ; l > 0; l -= 1) {
index = Math.floor(Math.random() * l);
result += charlist[index];
charlist.splice(index, 1);
}
return result;
}
function calc(n, proportions) {
var percents = [],
counts = [],
errors = [],
fnmap = [],
errorSum,
adjust,
worstIndex;
fnmap[1] = "min";
fnmap[-1] = "max";
proportions.each(function(i, val) {
percents[i] = val / proportions.sum() * n;
counts[i] = Math.round(percents[i]);
errors[i] = counts[i] - percents[i];
});
errorSum = counts.sum() - n;
while (errorSum != 0) {
adjust = errorSum < 0 ? 1 : -1;
worstIndex = errors.indexOf(Math[fnmap[adjust]].apply(0, errors));
counts[worstIndex] += adjust;
errors[worstIndex] = counts[worstIndex] - percents[worstIndex];
errorSum += adjust;
}
return expand(counts);
}
document.body.onload = function() {
document.getElementById('output').innerHTML = calc(99, [25.1, 24.9, 25.9, 24.1]);
};
</script>
</body>
</html>

Resources