Randomly assign a value from a list to each agent entering the process

In my AnyLogic model I have an option list "Issue". In "Issue" I put twelve values ("a", "b" and so on...). How can I randomly assign one of these values to each agent (in my model, agents are customers) and be sure that the assigned value is different for each agent that enters?

If you want a random option list value (say for each agent you create from a Source block), use a Custom Distribution which returns each option list value with a given probability. (In the Custom Distribution properties interface, it talks about "Number of observations" but these can also just be probabilities; it just uses the relative values of these settings to determine how likely each outcome is.)
If you want to uniquely assign the option list values (but in a random order) you'll need to use some Java to do so:
Store the option list values (which you can get an array of via the option list's values() function) in a list (AnyLogic Collection). Use a LinkedList because it is more efficient to remove elements from.
Store the number of alternatives remaining in a variable. (Let's call this n for simplicity.)
Each time you want to allocate a value, sample a random index from 0 to n-1 (sample from a discrete uniform distribution using uniform_discr(0, n-1)), remove that entry from the list (via the list's remove(int index) function), and assign it to the agent. Decrement the num-alternatives-remaining variable.
Obviously you have to ensure that the number of agents created does not exceed the number of option list values (or have some scheme to 'reset' the situation in some way at that point).
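For concreteness, here is a minimal plain-Java sketch of those steps; the enum, class and field names are invented for illustration, and in AnyLogic itself you would use your Issue option list's values() together with uniform_discr(0, n-1), typically from the Source block's exit action:

import java.util.LinkedList;
import java.util.List;
import java.util.Random;

// Stand-in for the twelve-value option list from the question.
enum Issue { A, B, C, D }

class UniqueIssueAssigner {
    // Remaining, not-yet-assigned values (the collection from step 1 above).
    private final List<Issue> remaining = new LinkedList<>(List.of(Issue.values()));
    private final Random rng = new Random();

    // Returns a value no other agent has received yet, or null when all are used.
    Issue next() {
        if (remaining.isEmpty()) return null;        // see the caveat above
        int index = rng.nextInt(remaining.size());   // uniform_discr(0, n-1) in AnyLogic
        return remaining.remove(index);              // remove so it cannot be assigned twice
    }
}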

Related

Can't distribute items between arrays

Imagine you have a list of objects. Each object looks like:
{'itemName':'name',
'totalItemAppearance':100,
'appearancePerList': 20}
and some number X which stands for the number of lists that can contain such items.
What I need to do is randomly pick items and put them into lists while respecting the item parameters.
In the end I expect X lists in which the item is used (across all lists) exactly 'totalItemAppearance' times, but in each individual list it appears at most 'appearancePerList' times.
It looks simple, but I don't know how to build an algorithm properly, and I can't classify the type of "distribution problem" I need for this issue so I could properly ask Google.
Thank you for replies!
First of all, you need not consider all different types of objects at the same time: There are no relations between different kinds of objects. So I will only consider the case where there is only one type of object.
What you want to do is to pick a uniform random sample from a set of objects satisfying some condition. The objects here are all possible distributions of the objects to the lists, and the condition is that the total number of objects should be 'totalItemAppearance' and that no list contains more than 'appearancePerList' objects.
If 'appearancePerList' is not too small then you can apply the following algorithm (and not wait for an eternity):
--> Pick a uniform random distribution of 'totalItemAppearance' items to lists (much easier to do)
--> If there are at most 'appearancePerList' objects in each list accept
--> Otherwise repeat
This algorithm will produce the uniform samples you wanted. This sampling technique is a simple case of rejection sampling.
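A rough Java sketch of this accept/reject loop follows. Only totalItemAppearance, appearancePerList and the number of lists (X in the question, numLists here) come from the question; "pick a uniform random distribution" is read here as dropping each item into an independently, uniformly chosen list, and the method assumes totalItemAppearance <= appearancePerList * numLists so that a valid assignment exists:

import java.util.Random;

class ItemDistributor {
    // counts[i] = how many copies of the item end up in list i.
    static int[] distribute(int totalItemAppearance, int appearancePerList,
                            int numLists, Random rng) {
        while (true) {
            int[] counts = new int[numLists];
            // Step 1: drop each item into a uniformly random list.
            for (int i = 0; i < totalItemAppearance; i++) {
                counts[rng.nextInt(numLists)]++;
            }
            // Step 2: accept only if no list exceeds appearancePerList; otherwise repeat.
            boolean ok = true;
            for (int c : counts) {
                if (c > appearancePerList) { ok = false; break; }
            }
            if (ok) return counts;
        }
    }
}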

Designing of the "mapper" and "reducer" functions' functionality for hadoop?

I am trying to design a mapper and reducer for Hadoop. I am new to Hadoop, and I'm a bit confused about how the mapper and reducer are supposed to work for my specific application.
The input to my mapper is a large directed graph's connectivity. It is a 2-column input where each row represents an individual edge: the first column is the start node id and the second column is the end node id. I'm trying to output the number of neighbors for each start node id into a 2-column text file, where the first column is sorted in order of increasing start node id.
My questions are:
(1) The input is already set up such that each line is a key-value pair, where the key is the start node id, and the value is the end node id. Would the mapper simply just read in each line and write it out? That seems redundant.
(2) Does the sorting take place in between the mapper and reducer or could the sorting actually be done with the reducer itself?
If my understanding is correct, you want to count how many distinct values a key will have.
Simply emitting the input key-value pairs in the mapper and then counting the distinct values per key in the reducer (e.g., by adding them to a set and emitting the set's size as the reducer's output value) is one way of doing it, but, as you say, a bit redundant.
In general, you want to reduce the network traffic, so you may want to do some more computation before the shuffle phase (and yes, the sorting between mapper and reducer is done by Hadoop for you).
Two easy ways to improve the efficiency are:
1) Use a combiner, which will output sets of values instead of single values. This way you will send fewer key-value pairs to the reducers, and some values may be skipped, since they are already in the local value set for the same key.
2) Use map-side aggregation. Instead of emitting the input key-value pairs right away, store them locally in the mapper (in memory) in a data structure (e.g., a hashmap or multimap). The key can be the map input key and the value can be the set of values seen so far for this key. Each time you meet a new value for this key, you append it to this structure. At the end of each mapper, you emit this structure (or convert the values to an array) from the cleanup() method (called close() in the old API); a rough sketch is given below.
You can look up both methods using the keywords "combiner" and "map-side aggregation".
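As an illustration of option 2, here is a rough sketch using the new-API Mapper/Reducer classes. The class names, the whitespace-separated input format and the comma-joined intermediate value are assumptions, not part of the question:

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: aggregates end-node ids per start-node id in memory and emits them
// once, from cleanup(), instead of emitting one pair per input line.
class NeighborSetMapper extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, Set<String>> seen = new HashMap<>();

    @Override
    protected void map(LongWritable key, Text value, Context ctx) {
        String[] cols = value.toString().split("\\s+");
        seen.computeIfAbsent(cols[0], k -> new HashSet<>()).add(cols[1]);
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            ctx.write(new Text(e.getKey()), new Text(String.join(",", e.getValue())));
        }
    }
}

// Reducer: merges the partial sets and emits the neighbour count per start node.
class NeighborCountReducer extends Reducer<Text, Text, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        Set<String> all = new HashSet<>();
        for (Text v : values) Collections.addAll(all, v.toString().split(","));
        ctx.write(key, new IntWritable(all.size()));
    }
}

Note that holding the per-key sets in mapper memory assumes the number of distinct start nodes per input split is manageable; for very skewed graphs a plain combiner is the safer choice.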
A global sorting on the key is a bit trickier. Again, there are two basic options, though neither is really good:
1) you use a single reducer, but then you don't gain anything from parallelism,
2) you use a total order partitioner, which needs some extra coding.
Other than that, you may want to move to Spark for a more intuitive and efficient solution.

list-of-list vs. hash-of-hashes

Setup:
I need to store feature vectors associated with string-string pairs. The string-string pairs encode an input-output relationship. There will be a relatively small number of inputs X (e.g. 5), and for each input x, there will be a relatively small number of outputs Y|x (e.g. 10).
The question is, what data structure is fastest?
Additional relevant information:
The outputs are generally different for each input, and it cannot be assumed that each X has the same number of outputs.
Lookup will be done "many" times (perhaps 1000).
Inputs will be sampled equally frequently, but for each input, usually one or 2 outputs will be accessed frequently, and the remainder will be accessed infrequently or not at all.
At present, I am considering three possibilities:
list-of-lists: access outer list with index (representing input X[i]), access inner list with index (representing output Y[i][j]).
hash-of-hashes: same as above.
flat hash: key = (input,output).
If you have strings, it's unclear how you would look up the index to use a list of lists efficiently without utilizing hashing anyway. If you can pass around something that keeps the reference to the index (e.g. if the set of outputs is fixed and you can define an enumeration of them) instead of the string, then a list of lists would be faster (assuming you mean 'list' in the not-necessarily-linked-list sense, with O(1) element access). Otherwise you may as well just hash directly and save yourself the effort.
If not, that leaves hash of hashes vs. flat hash. What's your access pattern like? Are you always going to ask for (X, Y), or would you ever need to access all outputs for X? hash(X+Y) is likely roughly equivalent to hash(X) + hash(Y) (both will generally walk over all the letters to generate the hash), so individual hashes are more flexible, at a slight (almost certainly negligible) overhead. From point 3 of your additional information, it sounds like you might need the hash of hashes anyhow.
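To make the trade-off concrete, here is a small Java sketch of both layouts; the class name, the double[] feature-vector type and the method names are made up for illustration:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FeatureStore {
    // Hash of hashes: one lookup per level, and you can grab all outputs for an input.
    private final Map<String, Map<String, double[]>> nested = new HashMap<>();

    // Flat hash: a single lookup keyed on the (input, output) pair.
    private final Map<List<String>, double[]> flat = new HashMap<>();

    void put(String input, String output, double[] features) {
        nested.computeIfAbsent(input, k -> new HashMap<>()).put(output, features);
        flat.put(List.of(input, output), features);  // List.of gives value-based equals/hashCode
    }

    double[] getNested(String input, String output) {
        Map<String, double[]> inner = nested.get(input);
        return inner == null ? null : inner.get(output);
    }

    double[] getFlat(String input, String output) {
        return flat.get(List.of(input, output));
    }
}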

Sorting application difficulty

Currently I am reading a book on algorithms and found this usage of sorting.
Reconstructing the original order - How can we restore the original arrangement of a set of items after we permute them for some application? Add an extra field to the data record for the item, such that the i-th record sets this field to i. Carry this field along whenever you move the record, and later sort on it when you want the initial order back.
I've been trying hard to understand what it means, and I've failed miserably. Please, can somebody help?
Suppose you have a list of items in random order:
itemC, itemB, itemA, itemD
you sorted them:
itemA, itemB, itemC, itemD
and you didn't have enough memory to store them in a separate location, so the original sequence is lost. Moreover, the original order is random, so it will be problematic or impossible to restore it.
This article gives a solution to this problem.
Add an extra field to the data record for the item, such that the i-th record sets this field to i
So, we add an extra field for each of the items:
(itemC,1), (itemB,2), (itemA,3), (itemD, 4)
And after sort we have:
(itemA,3), (itemB,2), (itemC,1), (itemD, 4)
So we can easily restore the initial order by sorting on the additional field.
Let's say you have the data in an array, because it's the simplest structure that I can use to exemplify.
So, your node (i.e., element of the array) may look like this:
(some data type) data
The algorithm suggests that you add an integer field, so it looks like this:
(some data type) data,
int position
And then, you fill the positions with the actual index. Something like this pseudocode:
for current: 0 to lastElement
array[current].position = current
(that's not written in any language I know of, but it should be readable)
After doing that, you shuffle it (resort it) for whatever you need to.
When you want to restore the original ordering, all you need to do is sort by the position field.
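The same idea in Java, with invented names (Tagged, position), in case the pseudocode above is unclear:

import java.util.Arrays;
import java.util.Comparator;

class Tagged {
    String data;
    int position;  // original index, filled in before permuting
    Tagged(String data, int position) { this.data = data; this.position = position; }
}

class RestoreOrder {
    public static void main(String[] args) {
        String[] items = {"itemC", "itemB", "itemA", "itemD"};
        Tagged[] tagged = new Tagged[items.length];
        for (int i = 0; i < items.length; i++) {
            tagged[i] = new Tagged(items[i], i);   // remember the original slot
        }

        // Permute for whatever the application needs (here: sort by name).
        Arrays.sort(tagged, Comparator.comparing((Tagged t) -> t.data));

        // Later: restore the initial order by sorting on the extra field.
        Arrays.sort(tagged, Comparator.comparingInt((Tagged t) -> t.position));
    }
}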
Well, basically it's saying that you need some sort of thingy to keep track of the original order (which is destroyed by the permutation). One option would be to simply reverse the permutation (check out Steve Jessop's informative answer here).
Another option to invert the permutation would require fewer processing steps, but more memory. More specifically, each node in your input set would have an extra ID field, and all the elements in this input set are sorted based on this field. Once you apply the permutation, it's obvious that the IDs are no longer in a sorted order. If you wish to invert the permutation, all you have to do is sort the list again based on this field.

Algorithm / Data Structure for Finding Which of Many Sets are Subsets of another Set

Abstract Description:
I have a set of strings, call it the "active set", and a set of sets of strings - call that the "possible set". When a new string is added to the active set, sets from the possible set may suddenly be subsets of the active set because the active set lacked only that string to be a superset of one of the possibles. I need an algorithm to efficiently find these when I add a new string to the active set. Bonus points if the same data structure allows me to efficiently find which of these possible sets are invalidated (no longer a subset) when a string is removed from the active set.
(The reason I framed the problem described below in terms of sets and subsets of strings in the abstract above is that the language I'm writing this in (Io) is dynamically typed. Objects do have a "type" field but it is a string with the name of the object type in it.)
Background:
In my game engine I have GameObjects which can have several types of Representation objects added to them. For instance if a GameObject has physical presence it might have a PhysicsRepresentation added to it (or not if it's not a solid object). It might have various kinds of GraphicsRepresentations added to it, such as a mesh or particle effect (and you can have more than one if you have multiple visual effects attached to the same game object).
The point of this is to separate subsystems, but you can't completely separate everything: for instance when a GameObject has both a PhysicsRepresentation and a GraphicsRepresentation, something needs to create a 3rd object which connects the position of the GraphicsRepresentation to the location of the PhysicsRepresentation. To serve this purpose while still keeping all the components separate, I have Interaction objects. The Interaction object encapsulates the cross-cutting knowledge about how two system components have to interact.
But in order to protect GameObject from having to know too much about Representations and Interactions, GameObject just provides a generic registry where Interaction prototype objects can register to be called when a particular combination of Representations is present in the GameObject. When a new Representation is added to the GameObject, GameObject should look in its registry and activate just those Interaction objects which are newly enabled by the presence of the new Representation, plus the existing Representations.
I'm just stuck on what data structure should be used for this registry and how to search it.
Errata:
The sets of strings are not necessarily sorted, but I can choose to store them sorted.
Although an Interaction most commonly will be between two Representations, I do not want to limit it to that; I should be able to have Interactions that trigger with 3 or more different representations, or even interactions that trigger based on just 1 representation.
I want to optimize this for the case of making it as fast as possible to add/remove representations.
I will have many active sets (each game object has an active set), but I have only one possible set (the set of all registered interaction types). So I don't care how long it takes to build the data structure that represents the possible set, because it only needs to be done once provided the algorithm for comparing different active sets is non-destructive of the possible set data structure.
If your sets are really small, the best representation is using bit sets. First, you build a map from strings to consecutive integers 0..N-1, where N is the number of distinct strings. Then you build your sets by bitwise OR-ing 1<<k into a number for each string with index k in the set. This lets you turn your set operations into bitwise operations, which are extremely fast (an intersection is an &; a union is an |, and so on).
Here is an example: Let's say you have two sets, A={quick, brown, fox} and B={brown, lazy, dog}. First, you build a string-to-number map, like this:
quick - 0
brown - 1
fox - 2
lazy - 3
dog - 4
Then your sets would become A=00111b and B=11010b. Their intersection is A&B = 00010b, and their union is A|B = 11111b. You know a set X is a subset of set Y if X == X&Y.
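A small Java sketch of that example, using a long as the bit set (the names are illustrative; for more than 64 distinct strings you would switch to java.util.BitSet or several longs):

import java.util.HashMap;
import java.util.Map;

class BitSetExample {
    public static void main(String[] args) {
        // Map each distinct string to a bit position (quick=0, brown=1, ...).
        Map<String, Integer> bit = new HashMap<>();
        for (String s : new String[] {"quick", "brown", "fox", "lazy", "dog"}) {
            bit.put(s, bit.size());
        }

        long a = toMask(bit, "quick", "brown", "fox");   // A = 00111b
        long b = toMask(bit, "brown", "lazy", "dog");    // B = 11010b

        long intersection = a & b;                       // 00010b
        long union = a | b;                              // 11111b
        boolean aSubsetOfB = (a & b) == a;               // X is a subset of Y iff X == X & Y (false here)

        System.out.printf("A&B=%s A|B=%s subset=%b%n",
                Long.toBinaryString(intersection), Long.toBinaryString(union), aSubsetOfB);
    }

    static long toMask(Map<String, Integer> bit, String... strings) {
        long mask = 0;
        for (String s : strings) mask |= 1L << bit.get(s);
        return mask;
    }
}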
One way to do this would be to keep, for each possible set, a count of how many of its strings are not yet in the active set, plus a map from each string to the list of possible sets containing that string. Then you can update the counts when you add or remove a string from the active set, and notice when a count drops to zero.
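A hedged sketch of that bookkeeping (all names are invented; indices into possibleSets stand in for the Interaction prototypes):

import java.util.*;

class SubsetWatcher {
    private final List<Set<String>> possibleSets;          // registered requirement sets
    private final int[] missing;                           // how many strings each set still lacks
    private final Map<String, List<Integer>> byString = new HashMap<>();
    private final Set<String> active = new HashSet<>();

    SubsetWatcher(List<Set<String>> possibleSets) {
        this.possibleSets = possibleSets;
        this.missing = new int[possibleSets.size()];
        for (int i = 0; i < possibleSets.size(); i++) {
            missing[i] = possibleSets.get(i).size();
            for (String s : possibleSets.get(i)) {
                byString.computeIfAbsent(s, k -> new ArrayList<>()).add(i);
            }
        }
    }

    // Returns the indices of possible sets that just became subsets of the active set.
    List<Integer> add(String s) {
        List<Integer> newlySatisfied = new ArrayList<>();
        if (!active.add(s)) return newlySatisfied;          // already present, nothing changes
        for (int i : byString.getOrDefault(s, List.of())) {
            if (--missing[i] == 0) newlySatisfied.add(i);   // count hit zero: now a subset
        }
        return newlySatisfied;
    }

    // Returns the indices of possible sets that are no longer subsets after removal.
    List<Integer> remove(String s) {
        List<Integer> invalidated = new ArrayList<>();
        if (!active.remove(s)) return invalidated;
        for (int i : byString.getOrDefault(s, List.of())) {
            if (missing[i]++ == 0) invalidated.add(i);      // was complete, now missing one
        }
        return invalidated;
    }
}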
This problem reminds me of firing rules in a rule-based system when a fact becomes true, which corresponds to a new string being added to the active set. Many of these systems use the Rete algorithm (http://en.wikipedia.org/wiki/Rete_algorithm). Drools Expert (http://www.jboss.org/drools/drools-expert.html) is an open source rule-based system, although it looks like there is a lot of enterprise-system wrapping around it these days.
