Probability data structure - algorithm

The idea is a data structure whose elements can only be accessed randomly, but weighted by a probability factor that the user defines for each element. So if a structure containing 100 elements has a probability of 0.5 of yielding x, then, in theory, retrieving a random element a hundred times should return x about 50 times.
I couldn't find a ready-made solution that does this, so this is my take on it:
import kotlin.math.absoluteValue

/**
 * @author mhashim6 on 13/10/2019
 */
class ProbabilitySet<T>(private val items: Array<out Pair<T, Float>>) {

    private var probabilityIndices: List<Int>

    private fun calcFutureSize(count: Int, probability: Float) =
        ((count / (1f - probability)) - count).toInt().absoluteValue

    init {
        // Build a flat list of indices, repeating element i's index according to its weight;
        // next() then samples uniformly from this list.
        probabilityIndices = items.withIndex().flatMap { (i, item) ->
            item.act { (_, probability) ->
                calcFutureSize(items.size, probability).minus(items.size).act { delta ->
                    Iterable { ConstIterator(delta, i) }
                }
            }
        }
    }

    fun next(): T = items[probabilityIndices.random()].first
}

// Yields `const` exactly `size` times (or not at all if `size` <= 0).
class ConstIterator(private var size: Int, private val const: Int) : IntIterator() {
    override fun nextInt(): Int {
        size--
        return const
    }

    override fun hasNext(): Boolean = size > 0
}

fun <E> probabilitySetOf(vararg items: Pair<E, Float>) = ProbabilitySet(items)

inline fun <T, R> T.act(action: (T) -> R) = action(this)
I tried to make it mutable, but I ran into a lot of complexity around time and memory, so it's immutable for now.
Is this a viable implementation?
Is there an existing implementation of this already?
How could I make it mutable?

I assume that if the sum of the elements' probabilities is not equal to 1, each element's actual probability must be calculated by dividing its original probability by the sum of all the elements' probabilities. For example, a ProbabilitySet consisting of "A" to 0.1F and "B" to 0.3F returns "A" in 25% of cases and "B" in 75% of cases.
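As a quick check of that normalization rule, a minimal sketch in plain Kotlin (using the hypothetical 0.1/0.3 weights from the example; it does not depend on the class below):
fun main() {
    val weights = listOf("A" to 0.1f, "B" to 0.3f)
    val total = weights.sumOf { it.second.toDouble() }                 // 0.4
    weights.forEach { (name, w) -> println("$name -> ${w / total}") }  // ~0.25 and ~0.75
}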
Here is my implementation of a mutable ProbabilitySet, with add running in O(1) and next running in O(log N):
import kotlin.random.Random

class ProbabilitySet<E>(
    private val random: Random = Random.Default
) {
    // Each node owns a contiguous slice of [0, sum]; the slice's width equals the element's weight.
    private val nodes = mutableListOf<Node>()
    private var sum = 0F

    fun add(element: E, probability: Float) {
        require(probability >= 0) { "[$element]'s probability ($probability) is less than 0" }
        val oldSum = sum
        sum += probability
        nodes += Node(oldSum..sum, element)
    }

    fun isEmpty() = sum == 0F

    fun next(): E {
        if (isEmpty()) throw NoSuchElementException("ProbabilitySet is empty")
        val index = random.nextFloat() * sum
        // The ranges are contiguous and sorted, so a binary search finds the node whose range contains `index`.
        return nodes[nodes.binarySearch {
            when {
                it.range.start > index -> 1
                it.range.endInclusive < index -> -1
                else -> 0
            }
        }].element
    }

    private inner class Node(
        val range: ClosedRange<Float>,
        val element: E
    )
}

Factory method:

fun <E> probabilitySetOf(vararg items: Pair<E, Float>, random: Random = Random.Default) =
    ProbabilitySet<E>(random).apply {
        items.forEach { (element, probability) -> add(element, probability) }
    }
Use case:
val set = probabilitySetOf("A" to 0.4F, "B" to 0.3F)
println(set.next())
set.add("C", 0.9F)
println(set.next())
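A rough way to check the distribution empirically, using the class and factory above (a sketch; counts will only approximate the normalized probabilities):
fun main() {
    val set = probabilitySetOf("A" to 0.1F, "B" to 0.3F)
    val counts = List(100_000) { set.next() }.groupingBy { it }.eachCount()
    println(counts) // roughly a 25% / 75% split between "A" and "B"
}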

Related

Kotlin A* Algorithm

I am trying to write a "simple" A* search algorithm for my game. So far it finds its way, but something's not quite right and I can't figure out what I am doing wrong.
The maze the algorithm moves through consists of nodes, which have coordinates. There is also a class "squareField", which holds further information that is mostly irrelevant for the algorithm, except for the "blocked" state, which indicates that the field (node) is not traversable.
Nodes:
//node represents a single coordinate on the playground holding important information for the algorithm to work
data class Node(val x: Int, val y: Int) : Comparable<Node> {
// parent is the node that came previous to the current one
var parent: Node? = null
// g = distance from selected node to start node
var g: Int = 0
// h = distance from selected node to end node -> optimistic assignment: either equal or less than real distance
var h: Int = 0
// f = g + h -> the lower the value the more attractive it is as a path option
var f: Int = g + h
fun getNeighbors(maxRows: Int, maxCols: Int): MutableSet<Node> {
val neighbors = mutableSetOf<Node>()
if (x - 1 > 0) neighbors.add(Node(x - 1, y))
if (x + 1 < maxRows) neighbors.add(Node(x + 1, y))
if (y - 1 > 0) neighbors.add(Node(x, y - 1))
if (y + 1 < maxCols) neighbors.add(Node(x, y + 1))
return neighbors
}
override fun compareTo(other: Node): Int = this.f.compareTo(other.f)
}
Algorithm:
fun findPath(startNode: Node, targetNode: Node, playGroundRows: Int, playGroundCols: Int): MutableSet<Node>? {
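// openSet is the frontier (a PriorityQueue ordered by f); closedSet holds nodes that have already been expanded.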
val openSet = PriorityQueue<Node>()
val closedSet = mutableSetOf<Node>()
openSet.add(startNode)
Log.i("Path: ", "openSet at Start: $openSet")
while (openSet.any()) {
var currentNode = openSet.first()
for(i in openSet) {
if(i.f > currentNode.f) {
currentNode = i
}
}
openSet.remove(currentNode)
closedSet.add(currentNode)
if (currentNode == targetNode) {
val path = mutableSetOf<Node>()
var tempNode: Node? = currentNode // nullable, so the walk up the parent chain can stop at the start node
while (tempNode != null) {
path.add(tempNode)
Log.i("Path: ", "node: $tempNode, f = ${tempNode.f}")
tempNode = tempNode.parent
}
return path
}
val neighbors = currentNode.getNeighbors(playGroundRows, playGroundCols)
neighbors.forEach neighbors@{ node ->
if (GameView.playGround.squareArray[node.x][node.y].isBlocked) {
return@neighbors
}
for(closed_child in closedSet){
if(closed_child == node) {
return@neighbors
}
}
node.parent = currentNode
node.g = currentNode.g + 1
node.h = ((node.x - targetNode.x)) + ((node.y - targetNode.y))
node.f = node.g + node.h
openSet.forEach { openNode ->
if(node == openNode && node.g > openNode.g) {
return@neighbors
}
}
openSet.add(node)
}
}
return null
}
The problem: the resulting route is not the direct one; it makes detours and walks wiggly lines before finally arriving at its destination.
I've tried for several days now and can't seem to find a correct solution. Does anyone have an idea what I'm doing wrong?
I'd appreciate any help with this issue.

Kotlin Quick sort algorithm does not do any sorting and returns the original array

I have this simple quicksort algorithm written in Kotlin. It returns the original array without any sorting done. Can anyone point out the error in my code? It would be appreciated.
class QuickSort {
fun sort(low: Int, high: Int, array: Array<Int>) {
if (low < high) {
val partitionIndex = partition(low, high, array)
sort(low, partitionIndex - 1, array)
sort(partitionIndex + 1, high, array)
}
}
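// Lomuto partition around array[high]; returns the pivot's final index.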
private fun partition(low: Int, high: Int, array: Array<Int>): Int {
var leftPointer = low - 1
val pivot = array[high]
for (i in low until high) {
if (array[i] <= pivot) {
leftPointer++
swap(array, i, leftPointer)
}
}
swap(array, leftPointer + 1, high)
return leftPointer + 1
}
private fun swap(array: Array<Int>, firstIndex: Int, secondIndex: Int) {
val temp = array[firstIndex]
array[firstIndex] = array[secondIndex]
array[secondIndex] = temp
}
}
fun main(args: Array<String>) {
val quickSort = QuickSort()
val array = mutableListOf(9, 8, 7, 6, 5)
//Should return 5,6,7,8,9
quickSort.sort(0, array.size - 1, array.toTypedArray())
for (i in 0 until array.size) {
println(array[i])
}
}
You have defined array as a mutable list and are passing array.toTypedArray() to the sort function. toTypedArray() creates a new array, so it is that separate copy that gets sorted, while the original array object remains as it is.
To fix it, define the variable as an array with the arrayOf function and pass that same variable to the sort function:
fun main(args: Array<String>) {
val quickSort = QuickSort()
val array = arrayOf(9, 8, 7, 6, 5)
quickSort.sort(0, array.size - 1, array)
for (element in array) {
println(element)
}
}
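To see the copy semantics in isolation, a minimal sketch (the names are mine, not from the question):
fun main() {
    val list = mutableListOf(3, 1, 2)
    val copy = list.toTypedArray() // independent array holding the same elements
    copy.sort()                    // sorts the copy in place
    println(copy.joinToString())   // 1, 2, 3
    println(list)                  // [3, 1, 2]: the original list is untouched
}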

Contextual type for closure argument list expects 1 argument, but 2 were specified

I'm trying to get this algorithm working with Swift 2.1: http://users.eecs.northwestern.edu/~wkliao/Kmeans/
However, I am getting the error on this line:
return map(Zip2Sequence(centroids, clusterSizes)) { Cluster(centroid: $0, size: $1) }
Here's the full function:
func kmeans<T : ClusteredType>(
points: [T],
k: Int,
seed: UInt32,
distance: ((T, T) -> Float),
threshold: Float = 0.0001
) -> [Cluster<T>] {
let n = points.count
assert(k <= n, "k cannot be larger than the total number of points")
var centroids = points.randomValues(seed, num: k)
var memberships = [Int](count: n, repeatedValue: -1)
var clusterSizes = [Int](count: k, repeatedValue: 0)
var error: Float = 0
var previousError: Float = 0
repeat {
error = 0
var newCentroids = [T](count: k, repeatedValue: T.identity)
var newClusterSizes = [Int](count: k, repeatedValue: 0)
for i in 0..<n {
let point = points[i]
let clusterIndex = findNearestCluster(point, centroids: centroids, k: k, distance: distance)
if memberships[i] != clusterIndex {
error += 1
memberships[i] = clusterIndex
}
newClusterSizes[clusterIndex]++
newCentroids[clusterIndex] = newCentroids[clusterIndex] + point
}
for i in 0..<k {
let size = newClusterSizes[i]
if size > 0 {
centroids[i] = newCentroids[i] / size
}
}
clusterSizes = newClusterSizes
previousError = error
} while abs(error - previousError) > threshold
return map(Zip2Sequence(centroids, clusterSizes)) { Cluster(centroid: $0, size: $1) }
}
How would I change this to remove this error?
As I understand it, you are trying to do the following:
return (0..<k).map { Cluster(centroid: centroids[$0], size: clusterSizes[$0]) }
From Swift's Zip2Sequence<Sequence1, Sequence2> documentation:
A sequence of pairs built out of two underlying sequences, where the
elements of the ith pair are the ith elements of each underlying
sequence.
The element type of a Zip2Sequence<[T], [Int]> generator is a (T, Int) tuple.
You can access the individual elements of this tuple by index ($0.0 and $0.1).
So, the following code should work for you:
return Zip2Sequence(centroids, clusterSizes).map { Cluster(centroid: $0.0, size: $0.1) }

Efficient tuple search algorithm

Given a store of 3-tuples where:
All elements are numeric, e.g. (1, 3, 4), (1300, 3, 15), (1300, 3, 15), …
Tuples are removed and added frequently
At any time the store is typically under 100,000 elements
All Tuples are available in memory
The application is interactive, requiring hundreds of searches per second.
What are the most efficient algorithms/data structures to perform wild card (*) searches such as:
(1, *, 6) (3601, *, *) (*, 1935, *)
The aim is to have a Linda-like tuple space, but at the application level.
Well, there are only 8 possible arrangements of wildcards, so you can easily construct 6 multi-maps and a set to serve as indices: one for each arrangement of wildcards in the query. You don't need an 8th index because the query (*,*,*) trivially returns all tuples. The set is for tuples with no wildcards; only a membership test is needed in this case.
A multimap takes a key to a set. In your example, e.g., the query (1,*,6) would consult the multimap for queries of the form (X,*,Y), which takes key <X,Y> to the set of all tuples with X in the first position and Y in third. In this case, X=1 and Y=6.
With any reasonable hash-based multimap implementation, lookups ought to be very fast. Several hundred per second ought to be easy, and several thousand per second doable (with e.g. a contemporary x86 CPU).
Insertions and deletions require updating the maps and set. Again this ought to be reasonably fast, though not as fast as lookups of course. Again several hundred per second ought to be doable.
With only ~10^5 tuples, this approach ought to be fine for memory as well. You can save a bit of space with tricks, e.g. keeping a single copy of each tuple in an array and storing indices in the map/set to represent both key and value. Manage array slots with a free list.
To make this concrete, here is pseudocode. I'm going to use angle brackets <a,b,c> for tuples to avoid too many parens:
# Definitions
For a query Q <k0,k1,k2> where each of k_i is either * or an integer,
Let I(Q) be a 3-digit binary number b2|b1|b0 where
b_i=0 if k_i is * and 1 if k_i is an integer.
Let N(i) be the number of 1's in the binary representation of i
Let M(i) be a multimap taking a tuple with N(i) elements to a set
of tuples with 3 elements.
Let t be a 3 element tuple. Then T(t,i) returns a new tuple with
only the elements of t in positions where i has a 1. For example
T(<1,2,3>,0) = <> and T(<1,2,3>,6) = <2,3>
Note that function T works fine on query tuples with wildcards.
# Algorithm to insert tuple T into the database:
fun insert(t)
for i = 0 to 7
add the entry T(t,i)->t to M(i)
# Algorithm to delete tuple T from the database:
fun delete(t)
for i = 0 to 7
delete the entry T(t,i)->t from M(i)
# Query algorithm
fun query(Q)
let i = I(Q)
return M(i).lookup(T(Q, i)) # lookup failure returns empty set
Note that for simplicity, I've not shown the "optimizations" for M(0) and M(7). For M(0), the algorithm above would create a multimap taking the empty tuple to the set of all 3-tuples in the database. You can avoid this merely by treating i=0 as a special case. Similarly M(7) would take each tuple to a set containing only itself.
An "optimized" version:
fun insert(t)
for i = 1 to 6
add the entry T(t,i)->t to M(i)
add t to set S
fun delete(t)
for i = 1 to 6
delete the entry T(t,i)->t from M(i)
remove t from set S
fun query(Q)
let i = I(Q)
if i = 0, return S
elsif i = 7 return if Q\in S { Q } else {}
else return M(i).lookup(T(Q, i))
Addition
For fun, a Java implementation:
package hacking;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Random;
import java.util.Scanner;
import java.util.Set;
public class Hacking {
public static void main(String [] args) {
TupleDatabase db = new TupleDatabase();
int n = 200000;
long start = System.nanoTime();
for (int i = 0; i < n; ++i) {
db.insert(db.randomTriple());
}
long stop = System.nanoTime();
double elapsedSec = (stop - start) * 1e-9;
System.out.println("Inserted " + n + " tuples in " + elapsedSec
+ " seconds (" + (elapsedSec / n * 1000.0) + "ms per insert).");
Scanner in = new Scanner(System.in);
for (;;) {
System.out.print("Query: ");
int a = in.nextInt();
int b = in.nextInt();
int c = in.nextInt();
System.out.println(db.query(new Tuple(a, b, c)));
}
}
}
class Tuple {
static final int [] N_ONES = new int[] { 0, 1, 1, 2, 1, 2, 2, 3 };
static final int STAR = -1;
final int [] vals;
Tuple(int a, int b, int c) {
vals = new int[] { a, b, c };
}
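// Projects t onto the positions where `code` has a 1 bit (the T(t,i) from the pseudocode above).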
Tuple(Tuple t, int code) {
vals = new int[N_ONES[code]];
int m = 0;
for (int k = 0; k < 3; ++k) {
if (((1 << k) & code) > 0) {
vals[m++] = t.vals[k];
}
}
}
@Override
public boolean equals(Object other) {
if (other instanceof Tuple) {
Tuple triple = (Tuple) other;
return Arrays.equals(this.vals, triple.vals);
}
return false;
}
@Override
public int hashCode() {
return Arrays.hashCode(this.vals);
}
@Override
public String toString() {
return Arrays.toString(vals);
}
int code() {
int c = 0;
for (int k = 0; k < 3; k++) {
if (vals[k] != STAR) {
c |= (1 << k);
}
}
return c;
}
Set<Tuple> setOf() {
Set<Tuple> s = new HashSet<>();
s.add(this);
return s;
}
}
class Multimap extends HashMap<Tuple, Set<Tuple>> {
@Override
public Set<Tuple> get(Object key) {
Set<Tuple> r = super.get(key);
return r == null ? Collections.<Tuple>emptySet() : r;
}
void put(Tuple key, Tuple value) {
if (containsKey(key)) {
super.get(key).add(value);
} else {
super.put(key, value.setOf());
}
}
void remove(Tuple key, Tuple value) {
Set<Tuple> set = super.get(key);
set.remove(value);
if (set.isEmpty()) {
super.remove(key);
}
}
}
class TupleDatabase {
final Set<Tuple> set;
final Multimap [] maps;
TupleDatabase() {
set = new HashSet<>();
maps = new Multimap[7];
for (int i = 1; i < 7; i++) {
maps[i] = new Multimap();
}
}
void insert(Tuple t) {
set.add(t);
for (int i = 1; i < 7; i++) {
maps[i].put(new Tuple(t, i), t);
}
}
void delete(Tuple t) {
set.remove(t);
for (int i = 1; i < 7; i++) {
maps[i].remove(new Tuple(t, i), t);
}
}
Set<Tuple> query(Tuple q) {
int c = q.code();
switch (c) {
case 0: return set;
case 7: return set.contains(q) ? q.setOf() : Collections.<Tuple>emptySet();
default: return maps[c].get(new Tuple(q, c));
}
}
Random gen = new Random();
int randPositive() {
return gen.nextInt(1000);
}
Tuple randomTriple() {
return new Tuple(randPositive(), randPositive(), randPositive());
}
}
Some output:
Inserted 200000 tuples in 2.981607358 seconds (0.014908036790000002ms per insert).
Query: -1 -1 -1
[[504, 296, 987], [500, 446, 184], [499, 482, 16], [488, 823, 40], ...
Query: 500 446 -1
[[500, 446, 184], [500, 446, 762]]
Query: -1 -1 500
[[297, 56, 500], [848, 185, 500], [556, 351, 500], [779, 986, 500], [935, 279, 500], ...
If you think of the tuples like an IP address, then a radix tree (trie) type structure might work; radix trees are used for IP route lookups.
Another approach may be to use bit operations: compute a bit hash for each tuple and use bitwise OR/AND during the search for quick discovery.
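To make the trie idea concrete, here is a minimal sketch in Kotlin (my own names and layout, not taken from either answer): one map level per tuple position, where a concrete value narrows to one child and a wildcard fans out over all children of that level.
class TupleTrie {
    // first element -> second element -> set of third elements
    private val root = HashMap<Int, HashMap<Int, MutableSet<Int>>>()

    fun insert(a: Int, b: Int, c: Int) {
        root.getOrPut(a) { HashMap() }.getOrPut(b) { mutableSetOf() }.add(c)
    }

    fun remove(a: Int, b: Int, c: Int) {
        root[a]?.get(b)?.remove(c)
    }

    // null plays the role of '*': a concrete value narrows to one child in O(1),
    // a wildcard iterates over every child at that level.
    fun query(a: Int?, b: Int?, c: Int?): List<Triple<Int, Int, Int>> {
        val result = mutableListOf<Triple<Int, Int, Int>>()
        val firsts = if (a == null) root else root[a]?.let { mapOf(a to it) } ?: emptyMap()
        for ((x, byB) in firsts) {
            val seconds = if (b == null) byB else byB[b]?.let { mapOf(b to it) } ?: emptyMap()
            for ((y, cs) in seconds) {
                for (z in cs) if (c == null || z == c) result += Triple(x, y, z)
            }
        }
        return result
    }
}
Note that a query with a wildcard in the first position still has to walk every first-level child, so for shapes like (*, 1935, *) the per-pattern multimaps above stay cheaper; this layout mainly pays off when the leading positions are fixed.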

search tree in scala

I'm trying to take my first steps in Scala, and to practice I took a look at the Google Code Jam StoreCredit exercise. I tried it in Java first, which went well enough, and now I'm trying to port it to Scala. With the Java collections framework I could do a straight syntax conversion, but I'd end up writing Java in Scala, and that kind of defeats the purpose. In my Java implementation, I have a PriorityQueue that I empty into a Deque, and I pop the ends off until we have bingo. This all uses mutable collections, which feels very un-Scala. What I think would be a more functional approach is to construct a data structure that can be traversed both from highest to lowest and from lowest to highest. Am I on the right path? Are there any suitable data structures in the Scala libraries, or should I roll my own here?
EDIT: here is the full code of the much simpler version in Java. It should run in O(max(credit, inputchars)) and has become:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
public class StoreCredit {
private static BufferedReader in;
public static void main(String[] args) {
in = new BufferedReader(new InputStreamReader(System.in));
try {
int numCases = Integer.parseInt(in.readLine());
for (int i = 0; i < numCases; i++) {
solveCase(i);
}
} catch (IOException e) {
e.printStackTrace();
}
}
private static void solveCase(int casenum) throws NumberFormatException,
IOException {
int credit = Integer.parseInt(in.readLine());
int numItems = Integer.parseInt(in.readLine());
int itemnumber = 0;
int[] item_numbers_by_price = new int[credit];
Arrays.fill(item_numbers_by_price, -1); // makes this O(max(credit,
// items)) instead of O(items)
int[] read_prices = readItems();
while (itemnumber < numItems) {
int next_price = read_prices[itemnumber];
if (next_price <= credit) {
if (item_numbers_by_price[credit - next_price] >= 0) {
// Bingo! DinoDNA!
printResult(new int[] {
item_numbers_by_price[credit - next_price],
itemnumber }, casenum);
break;
}
item_numbers_by_price[next_price] = itemnumber;
}
itemnumber++;
}
}
private static int[] readItems() throws IOException {
String line = in.readLine();
String[] items = line.split(" "); // uh-oh, now it's O(max(credit,
// inputchars))
int[] result = new int[items.length];
for (int i = 0; i < items.length; i++) {
result[i] = Integer.parseInt(items[i]);
}
return result;
}
private static void printResult(int[] result, int casenum) {
int one;
int two;
if (result[0] > result[1]) {
one = result[1];
two = result[0];
} else {
one = result[0];
two = result[1];
}
one++;
two++;
System.out.println(String.format("Case #%d: %d %d", casenum + 1, one,
two));
}
}
I'm wondering what you are trying to accomplish using sophisticated data structures such as PriorityQueue and Deque for a problem such as this. It can be solved with a pair of nested loops:
for {
i <- 2 to I
j <- 1 until i
if i != j && P(i-1) + P(j - 1) == C
} println("Case #%d: %d %d" format (n, j, i))
Worse than linear, better than quadratic. Since the items are not sorted, and sorting them would require O(n log n), you can't do much better than this -- as far as I can see.
Actually, having said all that, I now have figured a way to do it in linear time. The trick is that, for every number p you find, you know what its complement is: C - p. I expect there are a few ways to explore that -- I have so far thought of two.
One way is to build a map with O(n) characteristics, such as a bitmap or a hash map. For each element, make it point to its index. One then only has to find an element for which its complement also has an entry in the map. Trivially, this could be as easily as this:
val PM = P.zipWithIndex.toMap
val (p, i) = PM find { case (p, i) => PM isDefinedAt C - p }
val j = PM(C - p)
However, that won't work if the number is equal to its complement. In other words, if there are two p such that p + p == C. There are quite a few such cases in the examples. One could then test for that condition, and then just use indexOf and lastIndexOf -- except that it is possible that there is only one p such that p + p == C, in which case that wouldn't be the answer either.
So I ended with something more complex, that tests the existence of the complement at the same time the map is being built. Here's the full solution:
import scala.io.Source
object StoreCredit3 extends App {
val source = if (args.size > 0) Source fromFile args(0) else Source.stdin
val input = source getLines ()
val N = input.next.toInt
1 to N foreach { n =>
val C = input.next.toInt
val I = input.next.toInt
val Ps = input.next split ' ' map (_.toInt)
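// Fold over the (price, index) pairs, carrying (prices seen so far -> 1-based index, answer found so far);
// once a complementary pair is found, the accumulator is passed through unchanged.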
val (_, Some((p1, p2))) = Ps.zipWithIndex.foldLeft((Map[Int, Int](), None: Option[(Int, Int)])) {
case ((map, None), (p, i)) =>
if (map isDefinedAt C - p) map -> Some(map(C - p) -> (i + 1))
else (map updated (p, i + 1), None)
case (answer, _) => answer
}
println("Case #%d: %d %d" format (n, p1, p2))
}
}
