organization find all children algorithm - algorithm

I am creating a system where users can build their own organization structure, so every organization will most likely be different.
My setup is that an organization consists of divisions. In my division table I have a column called parent_id that points to the division that is the current division's parent.
A setup might look something like this (Paint drawing).
As you can see from the drawing, divisions 2 and 3 are children of division 1, so they both have parent_id = 1.
Division 4 is a child of division 2 and has two children (5 and 6).
Now to the tricky part: because of the structure of my system, I need access to all children, and the children's children, of a given root node.
For example, if I want to know all of the children of division 1, the result should be [2,3,4,5,6].
So my question is: how do I find all connected children?
At first I thought of something like this:
root = 1;
while(getChildren(root) != null)
{
}
function getChildren(root)
{
var result = 'select * from division where parent_id = '+root;
if(result != null)
{
root = result;
}
return result;
}
Please note this is only an example of using a while loop to walk through the list.
However, this does not work when the statement returns two children.
So my question is: how do I find all children of any root id with the above setup?

You could use a recursive function. Be careful to keep track of the children you have already found, so that if you run into one of them again you stop with an error; otherwise you will end up in an infinite loop.
I don't know what language you are using, so here's some pseudocode:
create dictionaryOfDivisions
dictionaryOfDivisions.Add(currentDivision)
GetChildren(currentDivision)
Function GetChildren(thisDivision) {
theseChildren = GetChildrenFromDB(thisDivision)
For each child in theseChildren
If dictionaryOfDivisions.Exists(child)
'Oops, here's a loop! Error
Exit
Else
dictionaryOfDivisions.Add(child)
GetChildren(child)
End If
Next
}
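For comparison, here is a minimal Python sketch of the same idea, written iteratively with a queue instead of recursion. It assumes the whole division table has already been loaded into a dict mapping each division id to its parent_id; the names find_descendants and parent_of are made up for this illustration and are not from the question.

```python
from collections import deque


def find_descendants(root_id, parent_of):
    """Return the ids of all divisions below root_id, breadth-first.

    parent_of maps division id -> parent_id (None for a top-level division).
    """
    # Build a parent -> children index once instead of querying per node.
    children = {}
    for div_id, parent_id in parent_of.items():
        children.setdefault(parent_id, []).append(div_id)

    seen = {root_id}
    result = []
    queue = deque([root_id])
    while queue:
        current = queue.popleft()
        for child in sorted(children.get(current, [])):
            if child in seen:
                # Same guard as in the pseudocode: a repeated id means a loop.
                raise ValueError(f"cycle detected at division {child}")
            seen.add(child)
            result.append(child)
            queue.append(child)
    return result


# The structure from the question: 2 and 3 under 1, 4 under 2, 5 and 6 under 4.
divisions = {1: None, 2: 1, 3: 1, 4: 2, 5: 4, 6: 4}
print(find_descendants(1, divisions))  # prints [2, 3, 4, 5, 6]
```

Loading the table once avoids one query per division; if the table is too large for that, the same loop works with a per-node query in place of the children lookup.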


Firebase: How to match opponents in a game?

I'm implementing a social chess game. Every user can create a new game, and they'll wait until the system will find an opponent for them.
When user creates a game, they specify constraints: color they'd like to play, and opponent's minimal chess rating.
Opponents can either match or not match. For example, the following two opponents will match:
// User 1 with rating 1700
// creates this game
game: {
  color: 'white',
  minRating: 1600
}

// User 2 with rating 1800
// creates this game
game: {
  minRating: 1650
}
// User did not specify a preferred color,
// meaning they do not care which color to play
So, if User 1 is the first user in the system, and created their game, they'll wait. Once User 2 creates their game, they should be matched immediately with User 1.
On the other side, the following two opponents won't match, because they both want to play white. In this case, both should wait until someone else creates a game with color: 'black' (or color not specified), and minRating that would match the requirements.
// User 1 with rating 1700
// creates this game
game: {
  color: 'white',
  minRating: 1600
}

// User 2 with rating 1800
// creates this game
game: {
  color: 'white',
  minRating: 1650
}
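To make the matching rule concrete, here is a small Python sketch of the pairwise check implied by the two examples. It is only an illustration under stated assumptions: games are plain dicts with the same color and minRating keys as above, a missing color means "no preference", and the function names are invented for this sketch.

```python
def colors_compatible(color_a, color_b):
    """Preferences clash only when both players ask for the same color."""
    return color_a is None or color_b is None or color_a != color_b


def is_match(game_a, rating_a, game_b, rating_b):
    """True if the two games can be paired under both players' constraints."""
    return (
        colors_compatible(game_a.get("color"), game_b.get("color"))
        # Each player's rating must meet the other game's minimum.
        and rating_b >= game_a.get("minRating", 0)
        and rating_a >= game_b.get("minRating", 0)
    )


# First example: User 2 has no color preference, both minimums are met.
print(is_match({"color": "white", "minRating": 1600}, 1700,
               {"minRating": 1650}, 1800))  # prints True

# Second example: both users want white.
print(is_match({"color": "white", "minRating": 1600}, 1700,
               {"color": "white", "minRating": 1650}, 1800))  # prints False
```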
My concern is scenarios where thousands of users create new games at the same time. How do I make sure that I match opponents without creating deadlocks? That is, how do I prevent scenarios where User 1, User 2, and User 3 are trying to find an opponent at the same time, and their matching algorithms all return User 99? How do I recover from this scenario, assigning User 99 to only one of them?
How would you use the power of Firebase to implement such a matching system?
The obvious choice for a starting point would be the color, since this is an exclusive requirement. The others seem more like weighted results, so those could simply increment or decrement the weight.
Utilize priorities for min/max ranges, and keep each in a separate "index". Then grab the matches for each and create a union. Consider this structure:
/matches
/matches/colors/white/$user_id
/matches/ranking/$user_id (with a priority equal to ranking)
/matches/timezones/$user_id (with a priority of the GMT relationship)
Now to query, I would simply grab the matches in each category and rank them by the number of matches. I can start with colors, because this presumably isn't an optional or relative rating:
var rootRef = new Firebase('.../matches');
var VALUE = {
"rank": 10, "timezone": 5, "color": 0
}
var matches = []; // a list of ids sorted by weight
var weights = {}; // an index of ids to weights
var colorRef = rootRef.child('colors/black');
colorRef.on('child_added', addMatch);
colorRef.on('child_removed', removeMatch); // colorRef already points at colors/black
var rankRef = rootRef.child('ranking').startAt(minRank).endAt(maxRank);
rankRef.on('child_added', addWeight.bind(null, VALUE['rank']));
rankRef.on('child_removed', removeWeight.bind(null, VALUE['rank']));
var tzRef = rootRef.child('timezones').startAt(minTz).endAt(maxTz);
tzRef.on('child_added', addWeight.bind(null, VALUE['timezone']));
tzRef.on('child_removed', removeWeight.bind(null, VALUE['timezone']));
function addMatch(snap) {
var key = snap.name();
weights[key] = VALUE['color'];
matches.push(key);
matches.sort(sortcmp);
}
function removeMatch(snap) {
var key = snap.name();
var i = matches.indexOf(key);
if( i > -1 ) { matches.splice(i, 1); }
delete weights[key];
}
function addWeight(amt, snap) {
var key = snap.name();
if( weights.hasOwnProperty(key) ) {
weights[key] += amt;
matches.sort(sortcmp);
}
}
function removeWeight(amt, snap) {
var key = snap.name();
if( weights.hasOwnProperty(key) ) {
weights[key] -= amt;
matches.sort(sortcmp);
}
}
function sortcmp(a,b) {
var x = weights[a];
var y = weights[b];
if( x === y ) { return 0; }
return x > y? 1 : -1;
}
Okay, now I've given what everyone asks for in this use case--how to create a rudimentary where clause. However, the appropriate answer here is that searches should be performed by a search engine. This is no simple where condition. This is a weighted search for the best matches, because fields like color are not optional or simply the best match, while others--ranking maybe--are the closest match in either direction, while some simply affect the quality of the match.
Check out flashlight for a simple ElasticSearch integration. With this approach, you should be able to take advantage of ES's great weighting tools, dynamic sorting, and everything else you need to conduct a proper matching algorithm.
Regarding deadlocks. I would not put too much focus here until you have hundreds of transactions per second (i.e. hundreds of thousands of users competing for matches). Split out the path where we will write to accept a join and do a transaction to ensure only one person succeeds in obtaining it. Keep it separate from the read data so that the lock on that path won't slow down processing. Keep the transaction to a minimal size (a single field if possible).
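The "only one person succeeds" part can be pictured with a small in-memory sketch. This is not Firebase code: the JoinSlot class below is a hypothetical stand-in for the single field that the transaction would guard, and a threading.Lock plays the role of the server-side atomic update. It only illustrates the check-then-set logic a transaction callback would perform.

```python
import threading


class JoinSlot:
    """In-memory stand-in for the one field a join transaction guards."""

    def __init__(self):
        self._lock = threading.Lock()
        self.claimed_by = None

    def try_claim(self, user_id):
        """Atomically claim the slot; only the first caller gets True."""
        with self._lock:
            if self.claimed_by is not None:
                return False  # somebody else already joined this game
            self.claimed_by = user_id
            return True


slot = JoinSlot()  # think of this as User 99's open game
results = {}


def worker(user_id):
    results[user_id] = slot.try_claim(user_id)


threads = [threading.Thread(target=worker, args=(u,))
           for u in ("user1", "user2", "user3")]
for t in threads:
    t.start()
for t in threads:
    t.join()

winners = [u for u, ok in results.items() if ok]
print(len(winners))  # prints 1
```

The two users whose claim fails simply go back to their match lists and try the next candidate.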
This is a challenging task in a NoSQL environment, especially if you want to match on multiple fields.
In your case, I would set up a simple index by color, and within each color I would store a reference to the game with its priority set to minRating.
That way you can query the games by the preferred color, ordered by the minRating priority.
indexes: {
color:{
white:{
REF_WITH_PRIORITY_TO_RATING: true
},
black:{
REF_WITH_PRIORITY_TO_RATING: true
}
}
}
If you want to be notified whenever a matching game is opened:
ref = new Firebase('URL');
query = ref.child('color_index/white/').startAt(minPriority);
query.on('child_added',function(snapshot){
//here is your new game matching the filter
});
This, however, gets more complex if you introduce multiple fields for filtering the games, for example dropRate, timeZone, gamesPlayed, etc. In this case, you can nest the indexes deeper:
indexes: {
  GMT0: {
    color: {
      white: {
        REF_WITH_PRIORITY_TO_RATING: true
      },
      black: {
        REF_WITH_PRIORITY_TO_RATING: true
      }
    }
  },
  GMT1: {
    // etc
  }
}

Amazon interview: Timestamp sorting: Find the three page subset sequence repeated maximum number of times [closed]

The Amazon interview question is:
Given a log file containing (User_Id, URL, Timestamp) user can navigate page from one to the other. Find the three page subset sequence repeated maximum number of times. Records are sorted by Timestamp.
I found this question from this reddit thread.
The poster wrote:
"Given a log file containing (User_Id, URL, Timestamp) user can navigate page from one to the other. Find the three page subset sequence repeated maximum number of times. Records are sorted by Timestamp."
(Although, I wasn't told until late in the interview the file is sorted by timestamp. One of the first things I had asked was if the log was sorted, and my interviewer said no.)
I do think I gave it my all -- I seemed to have been on the right track using a hashmap. I always let my interview know what I was thinking and gave possible outcomes, time complexities, etc.
I am not sure how to approach this problem. What does "Find the three page subset sequence repeated maximum number of times" mean? And if the question didn't say "Records are sorted by Timestamp" (as happened to the poster), then how would that affect the problem?
With "three page subset sequence" I am guessing they mean the three pages must be next to each other, but their internal order does not matter. (A B C = C A B)
public Tuple<string,string,string> GetMostFrequentTriplet(
IEnumerable<LogEntry> entries,
TimeSpan? maxSeparation = null)
{
// Assuming 'entries' is already ordered by timestamp
// Store the last two URLs for each user
var lastTwoUrls = new Dictionary<int,Tuple<string,string,DateTime>>();
// Count the number of occurences of each triplet of URLs
var counters = new Dictionary<Tuple<string,string,string>,int>();
foreach (var entry in entries)
{
Tuple<string,string,DateTime> lastTwo;
if (!lastTwoUrls.TryGetValue(entry.UserId, out lastTwo))
{
// No previous URLs
lastTwoUrls[entry.UserId] = Tuple.Create((string) null, entry.Url, entry.Timestamp);
}
// (comparison with null => false)
else if (entry.Timestamp - lastTwo.Item3 > maxSeparation) {
// Treat a longer separation than maxSeparation as two different sessions.
lastTwoUrls[entry.UserId] = Tuple.Create((string) null, entry.Url, entry.Timestamp);
}
else
{
// One or two previous URLs
if (lastTwo.Item1 != null)
{
// Two previous URLs; Three with this one.
// Sort the three URLs, so that their internal order won't matter
var urls = new List<string> { lastTwo.Item1, lastTwo.Item2, entry.Url };
urls.Sort();
var key = Tuple.Create(urls[0], urls[1], urls[2]);
// Increment count
int count;
counters.TryGetValue(key, out count); // sets to 0 if not found
counters[key] = count + 1;
}
// Shift in the new value, discarding the oldest one.
lastTwoUrls[entry.UserId] = Tuple.Create(lastTwo.Item2, entry.Url, entry.Timestamp);
}
}
Tuple<string,string,string> maxKey = null;
int maxCount = 0;
// Find the key with the maximum count
foreach (var pair in counters)
{
if (maxKey == null || pair.Value > maxCount)
{
maxKey = pair.Key;
maxCount = pair.Value;
}
}
return maxKey;
}
The code goes over the log entries and separates the stream for each user. For each three consecutive URLs for any user, we increment the count for that triplet. Since the order of the three pages is not important, we reorder them in a consistent way, by sorting. In the end, we return the triplet that has the highest count.
Since we only need the last three URLs for each user, we only store the previous two. Combined with the current URL, that makes the triplet we need.
For n URLs, m unique URLs, u users, and s single-visit users, the method will do 2n - 2u + s (= O(n)) dictionary lookups, and store up to C(m,3) + u (= O(m^3 + u)) tuples.
Edit:
Infer sessions by the duration between requests. If they differ by more than maxSeparation, the new request is treated as the first from that user.
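As a worked example of the counting described above, here is a compact Python sketch of the same per-user sliding window. It is a simplified illustration, not a port of the full method: it leaves out the maxSeparation session handling, assumes the entries are (user_id, url, timestamp) tuples already sorted by timestamp, and the sample log is invented.

```python
from collections import Counter


def most_frequent_triplet(entries):
    """Return (triplet, count) for the most common unordered 3-page run."""
    last_two = {}       # user_id -> up to two most recent URLs
    counts = Counter()  # sorted triplet -> number of occurrences
    for user_id, url, _timestamp in entries:
        recent = last_two.get(user_id, ())
        if len(recent) == 2:
            # Sort so that A B C and C A B count as the same triplet.
            counts[tuple(sorted(recent + (url,)))] += 1
        # Shift in the new URL, keeping only the latest two.
        last_two[user_id] = (recent + (url,))[-2:]
    return counts.most_common(1)[0] if counts else None


log = [
    (1, "A", 1), (2, "B", 2), (1, "B", 3), (2, "C", 4),
    (1, "C", 5), (2, "A", 6), (1, "D", 7),
]
# User 1 visits A B C D (triplets ABC, BCD); user 2 visits B C A (triplet ABC).
print(most_frequent_triplet(log))  # prints (('A', 'B', 'C'), 2)
```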

How to evaluate a complex expression tree against incremental data?

I have a collection of data and a collection of search filters I want to run against that data. The filters follow the LDAP search filter format and are parsed into an expression tree. The data is read one item at a time and processed through all the filters. Intermediate match results are stored in each leaf node of the tree until all the data has been processed. Then the final results are obtained by traversing the tree and applying the logical operators to each leaf node's intermediate result. For example, if I have the filter (&(a=b)(c=d)) then my tree will look like this:
root = "&"
  left = "a=b"
  right = "c=d"
So if a=b and c=d then both the left and right child nodes are a match and thus the filter is a match.
The data is a collection of different types of objects, each with their own fields. For example, assume the collection represents a class at a school:
class { name = "math" room = "12A" }
teacher { name = "John" age = "35" }
student { name = "Billy" age = "6" grade = "A" }
student { name = "Jane" age = "7" grade = "B" }
So a filter might look like (&(teacher.name=John)(student.age>6)(student.grade=A)) and be parsed like so:
root = "&"
  left = "teacher.name=John"
  right = "&"
    left = "student.age>6"
    right = "student.grade=A"
I run the class object against it; no matches. I run the teacher object against it; root.left is a match. I run the first student node against it; root.right.right is a match. I run the second student node against it; root.right.left is a match. Then I traverse the tree and determine that all nodes matched and thus the final result is a match.
The problem is the intermediate matches need to be constrained based upon commonality: the student.age and student.grade filters need to somehow be tied together in order to store an intermediate match only if they match for the same object. I can't for the life of me figure out how to do this.
My filter node abstract base class:
class FilterNode
{
public:
virtual void Evaluate(string ObjectName, map<string, string> Attributes) = 0;
virtual bool IsMatch() = 0;
};
I have a LogicalFilterNode class that handles logical AND, OR, and NOT operations; its implementation is pretty straightforward:
void LogicalFilterNode::Evaluate(string ObjectName, map<string, string> Attributes)
{
m_Left->Evaluate(ObjectName, Attributes);
m_Right->Evaluate(ObjectName, Attributes);
}
bool LogicalFilterNode::IsMatch()
{
switch(m_Operator)
{
case AND:
return m_Left->IsMatch() && m_Right->IsMatch();
case OR:
return m_Left->IsMatch() || m_Right->IsMatch();
case NOT:
return !m_Left->IsMatch();
}
return false;
}
Then I have a ComparisonFilterNode class that handles the leaf nodes:
void ComparisonFilterNode::Evaluate(string ObjectName, map<string, string> Attributes)
{
if(ObjectName == m_ObjectName) // e.g. "teacher", "student", etc.
{
for(const auto& Attribute : Attributes) // each map entry is a (name, value) pair
{
Evaluate(Attribute.first, Attribute.second);
}
}
}
void ComparisonFilterNode::Evaluate(string AttributeName, string AttributeValue)
{
if(AttributeName == m_AttributeName) // e.g. "age", "grade", etc.
{
if(Compare(AttributeValue, m_AttributeValue)) // e.g. "6", "A", etc.
{
m_IsMatch = true;
}
}
}
bool ComparisonFilterNode::IsMatch() { return m_IsMatch; }
How it's used:
FilterNode* Root = Parse(...);
for(const auto& item : Data)
{
Root->Evaluate(item.Name, item.Attributes);
}
bool Match = Root->IsMatch();
Essentially what I need is for AND statements where the children have the same object name, the AND statement should only match if the children match for the same object.
Create a new unary "operator", let's call it thereExists, which (1) does have state, and (2) declares that its child subexpression must be satisfied by a single input record.
Specifically, for each instance of a thereExists operator in an expression tree you should store a single bit indicating whether or not the subexpression below this tree node has been satisfied by any of the input records seen so far. These flags will initially be set to false.
To continue processing your dataset efficiently (i.e. input record by input record, without having to load the entire dataset into memory), you should first preprocess the query expression tree to pull out a list of all instances of the thereExists operator. Then as you read in each input record, test it against the child subexpression of each of these operators that still has its satisfied flag set to false. Any subexpression that is now satisfied should toggle its parent thereExists node's satisfied flag to true -- and it would be a good idea to also attach a copy of the satisfying record to the newly-satisfied thereExists node, if you want to actually see more than a "yes" or "no" answer to the overall query.
You only need to evaluate tree nodes above a thereExists node once, after all input records have been processed as described above. Notice that anything referring to properties of an individual record must appear somewhere beneath a thereExists node in the tree. Everything above a thereExists node in the tree is only allowed to test "global" properties of the collection, or combine the results of thereExists nodes using logical operators (AND, OR, XOR, NOT, etc.). Logical operators themselves can appear anywhere in the tree.
Using this, you can now evaluate expressions like
root = "&"
  left = thereExists
    child = "teacher.name=John"
  right = "|"
    left = thereExists
      child = "&"
        left = "student.age>6"
        right = "student.grade=A"
    right = thereExists
      child = "student.name = Billy"
This will report "yes" if the collection of records contains both a teacher whose name is "John" and either a student named "Billy" or an A student aged over 6, or "no" otherwise. If you track satisfying records as I suggested, you'll also be able to dump these out in the case of a "yes" answer.
You could also add a second operator type, forAll, which checks that its subexpression is true for every input record. But this is probably not as useful, and in any case you can simulate forAll(expr) with not(thereExists(not(expr))).
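A rough Python sketch of the thereExists idea, using the school data from the question. The class and function names (Leaf, And, ThereExists, build_filter, evaluate) are invented for this illustration, the filter is the simpler (&(teacher.name=John)(student.age>6)(student.grade=A)) with the two student tests grouped under one thereExists, and the extra student "Ann" is a made-up record added only to show a positive result.

```python
class Leaf:
    """Comparison on one attribute of one object type."""

    def __init__(self, object_name, attribute, predicate):
        self.object_name = object_name
        self.attribute = attribute
        self.predicate = predicate

    def test(self, name, attrs):
        return (name == self.object_name
                and self.attribute in attrs
                and self.predicate(attrs[self.attribute]))


class And:
    """Logical AND; usable both below and above a ThereExists node."""

    def __init__(self, *children):
        self.children = children

    def test(self, name, attrs):   # per-record check (below ThereExists)
        return all(c.test(name, attrs) for c in self.children)

    def is_match(self):            # final check (above ThereExists)
        return all(c.is_match() for c in self.children)


class ThereExists:
    """Satisfied once a single record passes the whole child subexpression."""

    def __init__(self, child):
        self.child = child
        self.satisfied = False
        self.witness = None        # copy of the satisfying record

    def feed(self, name, attrs):
        if not self.satisfied and self.child.test(name, attrs):
            self.satisfied = True
            self.witness = (name, dict(attrs))

    def is_match(self):
        return self.satisfied


def build_filter():
    teacher = ThereExists(Leaf("teacher", "name", lambda v: v == "John"))
    student = ThereExists(And(
        Leaf("student", "age", lambda v: int(v) > 6),
        Leaf("student", "grade", lambda v: v == "A")))
    return And(teacher, student), [teacher, student]


def evaluate(records):
    root, exists_nodes = build_filter()
    for name, attrs in records:    # one pass, record by record
        for node in exists_nodes:
            node.feed(name, attrs)
    return root.is_match()         # nodes above thereExists run once


school = [
    ("class", {"name": "math", "room": "12A"}),
    ("teacher", {"name": "John", "age": "35"}),
    ("student", {"name": "Billy", "age": "6", "grade": "A"}),
    ("student", {"name": "Jane", "age": "7", "grade": "B"}),
]
print(evaluate(school))  # prints False: no single student is both >6 and grade A
ann = ("student", {"name": "Ann", "age": "8", "grade": "A"})
print(evaluate(school + [ann]))  # prints True
```

With the original per-leaf flags the first call would have reported a match, because Billy satisfies the grade test and Jane the age test; grouping both tests under one thereExists is what ties them to the same record.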

Data comparing in dataset

I had to write a method that does the following:
There is a DataSet, let's say CarDataSet, with one table Car that contains a primary key Id and one more column ColorId. There is also a string of Ids separated by commas, for example "5,6,7,8" (of arbitrary length). The task is to check whether the ColorIds are identical for all the given Car Ids.
For example:
String ids = "5,6,7,8"
If the ColorIds of the Cars with Ids 5,6,7,8 are, for example, 3,3,3,3, then return true;
In other words, check whether all cars with the given Ids have the same color. I don't have my code anymore, but I did this using 3 foreach loops and 3 LINQ expressions. Is there a simpler way to do this?
If you want all cars to have the same color, all of them must have the same color as the first one:
// first find the cars with the given ids
// (split the string so that id 5 does not match inside "15")
var selectedCars = Cars.Where(x=>ids.Split(',').Contains(x.ID.ToString()));
// select one of them as the reference:
var firstCar = selectedCars.FirstOrDefault();
if (firstCar == null)
return true;
// check that all of them have the same color as the first one:
return selectedCars.All(x=>x.ColorID == firstCar.ColorID);
Edit: Or you can use two queries in lambda syntax. (All returns true for an empty sequence without evaluating its predicate, so First() is not reached when no car has the given ids.)
var selectedCars = Cars.Where(x=>ids.Split(',').Contains(x.ID.ToString()));
return selectedCars.All(x=>x.ColorID == selectedCars.First().ColorID);
You could do this by performing a Distinct and asserting that the count is 1.
var count = Cars.Where(x=>ids.Split(',').Contains(x.ID.ToString()))
.Select(x=>x.ColorID)
.Distinct().Count();
return count == 1;
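The distinct-count idea in a few lines of Python, as a language-neutral sketch. It assumes the Car table has been read into a dict mapping car Id to ColorId, and it treats "no matching cars" as true, like the first answer does; both the function name and the sample data are made up for the example.

```python
def all_same_color(cars, ids):
    """cars maps car Id -> ColorId; ids is a comma-separated string of Ids."""
    # Parse the ids into a set so that 5 is never confused with 15.
    wanted = {int(token) for token in ids.split(",")}
    colors = {color_id for car_id, color_id in cars.items() if car_id in wanted}
    # Zero or one distinct color means every selected car agrees.
    return len(colors) <= 1


cars = {5: 3, 6: 3, 7: 3, 8: 3, 15: 4}
print(all_same_color(cars, "5,6,7,8"))  # prints True
print(all_same_color(cars, "5,15"))     # prints False
```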

Groovy tree-like sort

I have a problem with sorting rows from db having tree-like hierarchy. Each row contains three columns meaningful to this problem: id, parent, lp. Id is String, parent is another row and lp is a number used to sort rows having no parent-child relationship. Each row can have any number of children and only one parent (null on top level)
There are three situations I see:
when the first row is the parent of another: -1 is returned
when the first row is a child of a parent with a lower lp than another row: -1 is returned
when none of those relations exist (also when rows have the same parent and are on the same level): the lps of the rows are compared
I managed to write this code, which I think should solve the problem, but it doesn't work for rows that are deep in the hierarchy, and it messes up the order:
dane = dane.sort {it1, it2 ->
it1 == it2.parent ? -1 :
it1.parent && it1.parent.lp < it2.lp ? -1 :
it1.lp - it2.key.lp
}
I'd appreciate any suggestions. Thx in advance!
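Your comparison should be consistent regardless of the order of the arguments. If the arguments are a = it1 and b = it2, the result should be the negation of the result for b = it1 and a = it2. It doesn't look like that's the case here. For example, take the case where it1.parent == it2: the closure returns -1 when it1 == it2.parent, but for the swapped arguments it has no matching branch that returns 1, so it falls through to the lp comparison.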
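One way to get an ordering that is consistent by construction is to avoid a pairwise comparator and sort by a key instead. The Python sketch below is not the asker's code and is a different technique from patching the closure: each row is keyed by the list of lp values on the path from its top-level ancestor down to itself, so a parent always sorts directly before its subtree. The Row class and the sample rows are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    id: str
    lp: int
    parent: Optional["Row"] = None


def path_key(row):
    """Return (lp, id) pairs from the top-level ancestor down to row."""
    key = []
    node = row
    while node is not None:
        key.append((node.lp, node.id))  # id breaks ties between equal lps
        node = node.parent
    key.reverse()
    return key


def tree_sort(rows):
    # Lists compare element by element and a prefix sorts first,
    # so every parent lands immediately before its descendants.
    return sorted(rows, key=path_key)


a = Row("a", 2)
b = Row("b", 1)
a1 = Row("a1", 5, a)
a2 = Row("a2", 3, a)
a2x = Row("a2x", 1, a2)
b1 = Row("b1", 9, b)

ordered = tree_sort([a1, a2x, b1, a, a2, b])
print([row.id for row in ordered])  # prints ['b', 'b1', 'a', 'a2', 'a2x', 'a1']
```

Groovy's sort also accepts a one-argument closure as a key, so the same approach should carry over, provided the key values are comparable.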
