I have a vector with a couple of elements and I want to write a function that returns me all combinations of x items from this vector.
The following code produces the right output for the case x=2 or x=3 or x=4.
However, I can not implement a solution for every possible x following this idea.
values = {'A','B','C','D','E'};
n = length(values);
data2 = {}; % case x=2
for i = 1:n
for j = i+1:n
data2{end+1} = {values{i}, values{j}};
fprintf('%s %s\n',values{i}, values{j})
end
end
data3 = {}; % case x=3
for i = 1:n
for j = i+1:n
for k = j+1:n
data3{end+1} = {values{i}, values{j}, values{k}};
fprintf('%s %s %s\n',values{i}, values{j}, values{k})
end
end
end
data4 = {}; % case x=4
for i = 1:n
for j = i+1:n
for k = j+1:n
for l = k+1:n
data4{end+1} = {values{i}, values{j}, values{k}, values{l}};
fprintf('%s %s %s %s\n',values{i}, values{j}, values{k}, values{l})
end
end
end
end
What would a function look like that returns my data variable for any x?
data = getCombinations(values, x) %values is vector with elements, x is integer value
EDIT
The following code comes pretty close:
data = perms(values)
data = data(:,1:x)
data = unique(data,'rows')
but it still produces output like A,B and B,A
EDIT2
This more or less fixed it, but it is not very nice to look at, and it only works for numbers, not for text entries in cells.
data = perms(values)
data = data(:,1:x)
data = sort(data,2)
data = unique(data,'rows')
EDIT3
This did it but it is not very nice to look at... Maybe there is a better solution?
function [data] = getCombinations(values,x)
i = 1:length(values);
d = perms(i);
d = d(:,1:x);
d = sort(d,2);
d = unique(d,'rows');
data = values(d); % index the input with the combination indices (v was undefined)
end
If you don't want repetitions (and your example suggests you don't) then try nchoosek as nchoosek(1:n, x) to give indices:
values = {'A','B','C','D','E'};
n = length(values);
x = 3;
C = nchoosek(1:n, x);
data = values(C)
In the above, each row is a unique combination of 3 of the 5 elements of values.
Alternatively pass in the values directly:
data = nchoosek(values, x);
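For readers who want to see how the nested loops from the question generalize to any x, here is a minimal Python sketch (not MATLAB). The names get_combinations and get_combinations_recursive are made up for this illustration; the first relies on the standard-library itertools.combinations, the second replaces the hand-written loops with one recursion level per chosen element.

```python
from itertools import combinations


def get_combinations(values, x):
    """All x-element combinations of values, in index order (library version)."""
    return [list(c) for c in combinations(values, x)]


def get_combinations_recursive(values, x, start=0):
    """Same result, written as the generalization of the nested for loops.

    Each recursion level plays the role of one loop (i, j, k, ...):
    it picks an index >= start and lets the next level pick a larger one.
    """
    if x == 0:
        return [[]]  # exactly one way to choose nothing
    result = []
    # Stop early enough that x - 1 further elements still fit after index i.
    for i in range(start, len(values) - x + 1):
        for tail in get_combinations_recursive(values, x - 1, i + 1):
            result.append([values[i]] + tail)
    return result


if __name__ == "__main__":
    for combo in get_combinations_recursive(["A", "B", "C", "D", "E"], 3):
        print(" ".join(combo))
```

Both functions return the combinations in the same order as the nested loops, so A,B appears but B,A does not.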
I would like to implement Bernoulli number generation in Scala. I selected this algorithm and wrote the following code:
def bernoulli(n: Int) = {
val a: Array[Fraction] = Array.fill(n + 1) {
new Fraction()
}
for (m <- 0 to n) {
a(m) = new Fraction(1, m + 1, 1 / (m + 1F))
for (j <- m to 1 by -1) {
val vc = a(j - 1)
vc.value= j * (vc.value - a(j).value)
}
}
a.head
}
class Fraction(var numerator: Int = 0, var denominator: Int = 0, var value: Float = 0)
But when I print the value of bernoulli(5), it is not 0.
The code that you are using seems to work fine. The reason that you don't get 0 is because of floating point errors. If you want more accurate results, you can use BigDecimal rather than floats.
To use BigDecimal, write a(m) = new Fraction(1, m + 1, 1 / BigDecimal(m + 1)) and change the type of value in the Fraction class from Float to BigDecimal.
By using BigDecimal, you will get
bernoulli(5).value //> res0: BigDecimal = -1.00E-32
which is close to 0.
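Another way to avoid the rounding error entirely is exact rational arithmetic. The following is a minimal Python sketch of the same recurrence as the code in the question (it matches the Akiyama-Tanigawa scheme), using the standard-library fractions.Fraction. The function name mirrors the question; this is an illustration, not the original author's code.

```python
from fractions import Fraction


def bernoulli(n):
    """n-th Bernoulli number as an exact fraction (convention B_1 = +1/2)."""
    a = [Fraction(0)] * (n + 1)
    for m in range(n + 1):
        a[m] = Fraction(1, m + 1)
        # Same inner loop as the Scala code: j runs from m down to 1.
        for j in range(m, 0, -1):
            a[j - 1] = j * (a[j - 1] - a[j])
    return a[0]


if __name__ == "__main__":
    for k in range(6):
        print(k, bernoulli(k))
```

Because every intermediate value is stored as a ratio of integers, bernoulli(5) comes out as exactly 0 rather than a small nonzero number.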
I have written a function that calculates a hash:
var hash = function (string) {
var h = 7;
var i = 0;
var letters = "acdegilmnoprstuw";
while (i < string.length) {
h = (h * 37 + letters.indexOf( string[i++] ));
}
return h;
};
Here string = "agdpeew" and the result is 664804774844. Now I don't know how to reverse the hash: if my input is 664804774844, the answer should be agdpeew.
What algorithm can I use for this?
Maybe I can start by dividing 664804774844 by 37, but how do I get the letter indexes?
For short strings, you can start by expressing the number in base 37 - but why are you trying to do this? Most of the use cases for hash functions don't require you to invert the function, and many hash functions are designed for it to be difficult or impossible to invert the function, except by evaluating on input after input until you find one that produces the hash value you are looking for.
Below is code written in Swift; it both computes the hash and reverses it.
var letters = "acdegilmnoprstuw";
Hashing / Encrypt
func hash(s:String) -> Int{
var h = 7 as Int;
for (var i = 0; i < s.characters.count; i++) {
// Getting the character at index
let s2: Character = s[s.startIndex.advancedBy(i)];
// Getting index of string 'acdegilmnoprstuw'
let l : Int = letters.startIndex.distanceTo(letters.characters.indexOf(s2)!);
h = (h * 37 + l);
}
return h;
}
Unhashing / Decrypt
func unhash(hashValue:Int) -> String{
var h = hashValue
var unhashedString : String = ""
while(h > 37){
unhashedString.append(letters[letters.startIndex.advancedBy(h % 37)])
h = h / 37
}
return String(unhashedString.characters.reverse())
}
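The base-37 idea can also be sketched in Python. Each step of the hash multiplies by 37 and adds a letter index below 37, so the remainder of a division by 37 gives back the last letter's index and the quotient gives the hash of the remaining prefix. The names hash_string and unhash are chosen for this illustration only. Python integers do not overflow, which this sketch relies on; for long strings a fixed-width integer type would lose information.

```python
LETTERS = "acdegilmnoprstuw"


def hash_string(s):
    """Same computation as the JavaScript hash in the question."""
    h = 7
    for ch in s:
        h = h * 37 + LETTERS.index(ch)
    return h


def unhash(value):
    """Recover the string by peeling off base-37 digits, last letter first."""
    chars = []
    h = value
    while h > 7:  # 7 is the seed, i.e. the hash of the empty string
        h, idx = divmod(h, 37)
        if idx >= len(LETTERS):
            raise ValueError("not a valid hash for this alphabet")
        chars.append(LETTERS[idx])
    if h != 7:
        raise ValueError("not a valid hash for this alphabet")
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(unhash(664804774844))
```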
Given a Map of objects and designated proportions (let's say they add up to 100 to make it easy):
val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
How can I generate a sequence such that for a subset of size n there are ~42% "A"s, ~32% "B"s and ~26% "C"s? (Obviously, small n will have larger errors).
(Work language is Scala, but I'm just asking for the algorithm.)
UPDATE: I resisted a random approach since, for instance, there's ~16% chance that the sequence would start with AA and ~11% chance it would start with BB and there would be very low odds that for n precisely == (sum of proportions) the distribution would be perfect. So, following @MvG's answer, I implemented as follows:
/**
Returns the key whose achieved proportions are most below desired proportions
*/
def next[T](proportions : Map[T, Double], achievedToDate : Map[T,Double]) : T = {
val proportionsSum = proportions.values.sum
val desiredPercentages = proportions.mapValues(v => v / proportionsSum)
//Initially no achieved percentages, so avoid / 0
val toDateTotal = if(achievedToDate.values.sum == 0.0){
1
}else{
achievedToDate.values.sum
}
val achievedPercentages = achievedToDate.mapValues(v => v / toDateTotal)
val gaps = achievedPercentages.map{ case (k, v) =>
val gap = desiredPercentages(k) - v
(k -> gap)
}
val maxUnder = gaps.values.toList.sortWith(_ > _).head
//println("Max gap is " + maxUnder)
val gapsForMaxUnder = gaps.mapValues{v => Math.abs(v - maxUnder) < Double.Epsilon }
val keysByHasMaxUnder = gapsForMaxUnder.map(_.swap)
keysByHasMaxUnder(true)
}
/**
Stream of most-fair next element
*/
def proportionalStream[T](proportions : Map[T, Double], toDate : Map[T, Double]) : Stream[T] = {
val nextS = next(proportions, toDate)
val tailToDate = toDate + (nextS -> (toDate(nextS) + 1.0))
Stream.cons(
nextS,
proportionalStream(proportions, tailToDate)
)
}
When used, e.g.:
val ss : Map[String,Double] = Map("A"->42, "B"->32, "C"->26)
val none : Map[String,Double] = ss.mapValues(_ => 0.0)
val mySequence = (proportionalStream(ss, none) take 100).toList
println("Desired : " + ss)
println("Achieved : " + mySequence.groupBy(identity).mapValues(_.size))
mySequence.map(s => print(s))
println
produces :
Desired : Map(A -> 42.0, B -> 32.0, C -> 26.0)
Achieved : Map(C -> 26, A -> 42, B -> 32)
ABCABCABACBACABACBABACABCABACBACABABCABACABCABACBA
CABABCABACBACABACBABACABCABACBACABABCABACABCABACBA
For a deterministic approach, the most obvious solution would probably be this:
Keep track of the number of occurrences of each item in the sequence so far.
For the next item, choose that item for which the difference between intended and actual count (or proportion, if you prefer that) is maximal, but only if the intended count (resp. proportion) is greater than the actual one.
If there is a tie, break it in an arbitrary but deterministic way, e.g. choosing the alphabetically lowest item.
This approach would ensure an optimal adherence to the prescribed ratio for every prefix of the infinite sequence generated in this way.
Quick & dirty python proof of concept (don't expect any of the variable “names” to make any sense):
import sys
p = [0.42, 0.32, 0.26]
c = [0, 0, 0]
a = ['A', 'B', 'C']
n = 0
while n < 70*5:
n += 1
x = 0
s = n*p[0] - c[0]
for i in [1, 2]:
si = n*p[i] - c[i]
if si > s:
x = i
s = si
sys.stdout.write(a[x])
if n % 70 == 0:
sys.stdout.write('\n')
c[x] += 1
Generates
ABCABCABACABACBABCAABCABACBACABACBABCABACABACBACBAABCABCABACABACBABCAB
ACABACBACABACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACABACBABCABA
CABACBACBAABCABCABACABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABAC
ABACBABCABACABACBACBAACBABCABACABACBACBAABCABCABACABACBABCABACABACBACB
AACBABCABACABACBACBAABCABCABACABACBABCAABCABACBACBAACBABCABACABACBACBA
For every item of the sequence, compute a (pseudo-)random number r equidistributed between 0 (inclusive) and 100 (exclusive).
If 0 ≤ r < 42, take A
If 42 ≤ r < (42+32), take B
If (42+32) ≤ r < (42+32+26)=100, take C
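The thresholds above can be sketched in Python as follows. The function names pick and random_sequence and the seed parameter are introduced here for illustration only; random.Random.random() returns a value in [0, 1), so multiplying by 100 gives r in [0, 100) as described.

```python
import random


def pick(r):
    """Map r in [0, 100) to an item using the cumulative thresholds 42 and 74."""
    if r < 42:
        return "A"
    if r < 42 + 32:
        return "B"
    return "C"


def random_sequence(n, seed=None):
    """n independent draws; the seed is optional and only makes runs repeatable."""
    rng = random.Random(seed)
    return [pick(rng.random() * 100) for _ in range(n)]


if __name__ == "__main__":
    print("".join(random_sequence(70)))
```

The proportions are only met approximately, and short runs such as AA at the start remain possible, which is the concern raised in the question's update.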
The number of each entry in your subset is going to be the same as in your map, but with a scaling factor applied.
The scaling factor is n/100.
So if n was 50, you would have { Ax21, Bx16, Cx13 }.
Randomize the order to your liking.
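A minimal Python sketch of this scale-then-shuffle idea, under the assumption that plain rounding is acceptable. The names scaled_counts and shuffled_sequence are hypothetical, and for some n the rounded counts may not add up to exactly n.

```python
import random


def scaled_counts(proportions, n):
    """Scale each proportion by n / (sum of proportions) and round.

    Note: Python's round() rounds halves to the nearest even integer, and
    the rounded counts are not guaranteed to sum to n.
    """
    total = sum(proportions.values())
    return {key: round(value * n / total) for key, value in proportions.items()}


def shuffled_sequence(proportions, n, seed=None):
    """Build the items in a fixed order, then randomize the order."""
    counts = scaled_counts(proportions, n)
    items = [key for key, count in counts.items() for _ in range(count)]
    random.Random(seed).shuffle(items)
    return items


if __name__ == "__main__":
    ss = {"A": 42, "B": 32, "C": 26}
    print(scaled_counts(ss, 50))
    print("".join(shuffled_sequence(ss, 50)))
```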
The simplest "deterministic" [in terms of #elements of each category] solution [IMO] is to add elements in a predefined order and then shuffle the resulting list.
First, add map(x)/100 * n elements for each element x [choose how you handle integer arithmetic to avoid being off by one element], and then shuffle the resulting list.
Shuffling a list is simple with the Fisher-Yates shuffle, which is implemented in most languages: for example, Java has Collections.shuffle(), and C++ has random_shuffle() (std::shuffle since C++11).
In Java, it will be as simple as:
int N = 107;
List<String> res = new ArrayList<String>();
for (Entry<String,Integer> e : map.entrySet()) { //map is predefined Map<String,Integer> for frequencies
for (int i = 0; i < Math.round(e.getValue()/100.0 * N); i++) {
res.add(e.getKey());
}
}
Collections.shuffle(res);
This is nondeterministic, but gives a distribution of values close to MvG's. It suffers from the problem that it could give AAA right at the start. I post it here for completeness' sake given how it proves my dissent with MvG was misplaced (and I don't expect any upvotes).
Now, if someone has an idea for an expand function that is deterministic and won't just duplicate MvG's method (rendering the calc function useless), I'm all ears!
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>ErikE's answer</title>
</head>
<body>
<div id="output"></div>
<script type="text/javascript">
if (!Array.prototype.each) {
Array.prototype.each = function(callback) {
var i, l = this.length;
for (i = 0; i < l; i += 1) {
callback(i, this[i]);
}
};
}
if (!Array.prototype.sum) {
Array.prototype.sum = function() {
var sum = 0;
this.each(function(i, val) {
sum += val;
});
return sum;
};
}
function expand(counts) {
var
result = "",
charlist = [],
l,
index;
counts.each(function(i, val) {
var char = String.fromCharCode(i + 65); // declared locally to avoid an implicit global
for ( ; val > 0; val -= 1) {
charlist.push(char);
}
});
l = charlist.length;
for ( ; l > 0; l -= 1) {
index = Math.floor(Math.random() * l);
result += charlist[index];
charlist.splice(index, 1);
}
return result;
}
function calc(n, proportions) {
var percents = [],
counts = [],
errors = [],
fnmap = [],
errorSum,
worstIndex;
fnmap[1] = "min";
fnmap[-1] = "max";
proportions.each(function(i, val) {
percents[i] = val / proportions.sum() * n;
counts[i] = Math.round(percents[i]);
errors[i] = counts[i] - percents[i];
});
errorSum = counts.sum() - n;
while (errorSum != 0) {
var adjust = errorSum < 0 ? 1 : -1; // declared locally to avoid an implicit global
worstIndex = errors.indexOf(Math[fnmap[adjust]].apply(0, errors));
counts[worstIndex] += adjust;
errors[worstIndex] = counts[worstIndex] - percents[worstIndex];
errorSum += adjust;
}
return expand(counts);
}
document.body.onload = function() {
document.getElementById('output').innerHTML = calc(99, [25.1, 24.9, 25.9, 24.1]);
};
</script>
</body>
</html>
It is known that MATLAB is slow with for loops. I have tried to vectorize the following code without success. Perhaps my implementation is wrong.
for I = NS2:-1:1
A = 0;
for J=1:8
A = A + KS2(J,I)*FA(J);
end
S2 = S2 + ( SS2(1,I)*sin(A) + SS2(2,I)*cos(A) );
end
where:
FA = matrix 1x8
KS2 = matrix 8x25
SS2 = matrix 2x25
A = scalar
S2 = scalar
I tried to improve it this way:
A = 0;
J = 1:8;
for I = NS2:-1:1
A = FA(1,J)*KS2(J,I);
S2 = S2 + ( SS2(1,I)*sin(A) + SS2(2,I)*cos(A) );
end
However, the runtime for this improvement is similar to the original code.
Try this instead (no loops):
A = (FA*KS2).'; % A is now 25-by-1
S2 = S2 + SS2(1,:)*sin(A) + SS2(2,:)*cos(A); % keeps the accumulation onto S2 from the original loop
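To see why the loop-free version computes the same thing, the two loops can be written as sums. This is a short derivation under the assumption that NS2 = 25, so the loop covers every column of KS2 and SS2; the subscripted symbols are just the matrix entries from the question.

```latex
A_I = \sum_{J=1}^{8} FA_J \, KS2_{J,I} = (FA \cdot KS2)_I ,
\qquad I = 1, \dots, NS2

S2_{\text{new}} = S2_{\text{old}}
  + \sum_{I=1}^{NS2} \bigl( SS2_{1,I} \sin A_I + SS2_{2,I} \cos A_I \bigr)
  = S2_{\text{old}} + SS2_{1,:} \, \sin(A) + SS2_{2,:} \, \cos(A)
```

The inner loop is a row vector times a matrix, and the outer loop is a row vector (1-by-25) times a column vector (25-by-1), which is why both collapse into matrix products.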