I am trying to implement a simulator for the Routing Information Protocol. I think I have most of the implementation done but I am not too sure about one part of the algorithm used when a node receives a table from a different node.
One of the if statements is as follows:
if address is known by p1 with a link of p2 then:
if the cost for p2 is not exactly one less than p1's cost:
act as if this address was unknown to p1
Where p1 is receiving a table from p2. Does this mean that if p1 has the address in its table, and the link associated with that address in p1's table is the link to p2, then check whether p2's advertised cost is not exactly one less than p1's cost?
Thanks
Yes, that's what it means.
When p1 gets routing information about some network n from p2, it must decide whether or not to use the new route. Normally, it will use the new route (setting the link to p2) only if it is better than the route it currently knows for n. However, in the case that p2 was the gateway it already had for n -- in other words, it got the information from p2 earlier -- then it accepts the new information, even if the new cost is higher than the old cost.
This allows p2 to inform its neighbours that it has lost connectivity to some other network (by setting its cost to the RIP equivalent of infinity). If the neighbours previously relied on p2 to reach that network, they will now invalidate their routes to that network and await information from some other gateway which does have connectivity. It also allows p2 to inform its neighbours that the cost of reaching n has increased and they should use a cheaper route if they can find one.
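To make the rule concrete, here is a minimal sketch of the table update in Python. The names (Route, handle_entry) are mine, not from any RFC; the logic is just the rule described above:

RIP_INFINITY = 16

class Route:
    def __init__(self, next_hop, cost):
        self.next_hop = next_hop   # the gateway this route was learned from
        self.cost = cost

def handle_entry(table, network, sender, advertised_cost, link_cost):
    """p1's handling of one advertised entry from p2 (the sender)."""
    new_cost = min(advertised_cost + link_cost, RIP_INFINITY)
    current = table.get(network)
    if current is None or new_cost < current.cost:
        # Unknown network, or a strictly better route: adopt it.
        table[network] = Route(sender, new_cost)
    elif current.next_hop == sender:
        # The sender is already our gateway for this network, so accept the
        # new cost even if it is worse (e.g. the route has gone to infinity).
        current.cost = new_cost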
I have a problem where I have a road that has multiple entry points and exits. I am trying to model it so that traffic can flow in at an entry and out at an exit. The entry points also act as exits. All the entry points are labelled 1 to 10 (i.e. we have 10 entries and exits).
A car is allowed to enter and exit at any point; however, the entry is always a lower number than the exit. For example, a car enters at 3 and goes to 8; it cannot go from 3 to 3 or from 8 to 3.
After every second the car moves one unit along the road, so in the above example the car goes from 3 to 4 after one second. I want to continuously accept cars at different entry points and update their positions after each second. However, I cannot accept a car at an entry if there is already one present at that location.
All cars travel at the same speed of 1 unit per second, and all are the same size and occupy just the space at the point they are at. Once a car reaches its destination, it is removed from the road.
For all new cars that arrive at an entry point and are waiting, we need to assign a waiting time. How would that work? For example, it needs to account for when the car is able to find a free slot where it can be put on the road.
Is there an algorithm that this problem fits into?
What data structure would I model this with? For example, for each entry point I was thinking of something like a queue or an ordered map, and for the road, maybe a linked list?
Outside of a top-down master algorithm that decides what each car does and when, there is another approach that uses agents which interact with their environment and amongst themselves, following a limited set of simple rules. This often gives rise to complex behaviors: you could maybe code simple rules into car objects to define these interactions.
Maybe something like this:
emergent behavior algorithm:
a car moves forward if there is no car just in front of it.
a car merges into a lane if there is no car right beside it (and maybe just behind that slot too).
a car progresses towards its destination, and removes itself when the destination is reached.
proposed data structure
The data structure could be an indexed collection of "slots" along which a car moves towards a destination.
Two such data structures could intersect at a tuple of index values, one for each. Roads with 2 or more lanes could be modeled with coupled data structures. A sketch of the single-lane case follows below.
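Here is a minimal single-lane sketch in Python combining the rules and the slot structure (all names, and the one-queue-per-entry-point choice, are assumptions of mine):

from collections import deque

class Car:
    def __init__(self, entry, exit, now):
        assert entry < exit              # the entry is always below the exit
        self.pos, self.exit = entry, exit
        self.arrived = now               # used to compute the waiting time

class Road:
    def __init__(self, length=10):
        self.slots = [None] * (length + 1)   # indices 1..length; None = free
        self.queues = {i: deque() for i in range(1, length + 1)}
        self.time = 0

    def tick(self):
        self.time += 1
        # Sweep from the far end so a car may take a slot freed this second.
        for i in range(len(self.slots) - 1, 0, -1):
            car = self.slots[i]
            if car is None:
                continue
            if car.pos == car.exit:
                self.slots[i] = None         # destination reached: remove
            elif i + 1 < len(self.slots) and self.slots[i + 1] is None:
                self.slots[i], self.slots[i + 1] = None, car
                car.pos += 1
        # Merge a waiting car wherever its entry slot is free.
        for entry, q in self.queues.items():
            if q and self.slots[entry] is None:
                car = q.popleft()
                car.waited = self.time - car.arrived
                self.slots[entry] = car

The waiting time you asked about falls out naturally here: it is just the number of ticks a car spends in its entry queue before a free slot appears.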
optimal numbers
Determining the maximum road use and the minimum time to destination would require running the simulation several times, varying the number of cars and maybe the rules themselves.
A more elaborate approach would use continuous space on the road instead of discrete slots.
I can suggest a Directed Acyclic Graph (DAG) which stores each entry point as a node. Since a car always travels from a lower-numbered point to a higher-numbered one, every edge points forward and the graph is acyclic by construction.
The problem of moving from one point to another can then be treated as a graph-flow problem, for which there are a number of algorithms for determining movement through a graph.
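For illustration, under the assumption that the road is the chain of numbered points, the DAG is trivial to build (names are mine):

N = 10
successors = {i: [i + 1] for i in range(1, N)}  # edge i -> i+1, one second each
successors[N] = []                              # point 10 is exit-only

# A trip is a pair (entry, exit) with entry < exit; its route is the unique
# path entry -> entry+1 -> ... -> exit, so no cycle can ever occur.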
I have set up a Q-learning problem in R, and would like some help with the theoretical correctness of my approach in framing the problem.
Problem structure
For this problem, the environment consists of 10 possible states. In each state, the agent can choose from 11 potential actions (these actions are the same regardless of the state the agent is in). Depending on the particular state the agent is in and the action it then takes, there is a unique distribution for the transition to a next state, i.e. the transition probabilities to any next state depend only on the previous state and the action taken.
Each episode has 9 iterations i.e. the agent can take 9 actions and make 9 transitions before a new episode begins. In each episode, the agent will begin in state 1.
In each episode, after each of the agent's 9 actions, the agent gets a reward which depends on the agent's (immediately) previous state, its (immediately) previous action, and the state it landed on, i.e. the reward structure depends on a state-action-state triplet (of which there will be 9 in an episode).
The transition probability matrix of the agent is static, and so is the reward matrix.
I have set up two learning algorithms. In the first, the Q-matrix update happens after each action in each episode. In the second, the Q-matrix is updated after each episode. The algorithm uses epsilon-greedy action selection.
The big problem is that in my Q-learning, my agent is not learning: it gets less and less reward over time. I have looked into other potential problems, such as simple calculation errors or bugs in the code, but I think the problem lies with the conceptual structure of my Q-learning problem.
Questions
I have set up my Q-matrix as being a 10 row by 11 column matrix i.e. all the 10 states are the rows and the 11 actions are the columns. Would this be the best way to do so? This means that an agent is learning a policy which says that "whenever you are in state x, do action y"
Given this unique structure of my problem, would the standard Q-update still apply? i.e. Q[cs,act]<<-Q[cs,act]+alpha*(Reward+gamma*max(Q[ns,])-Q[cs,act])
Where cs is the current state; act is the action chosen; Reward is the reward given the current state, the action chosen, and the next state transitioned to; and ns is the next state given the last state and last action (note that the transition to this state is stochastic).
Is there an OpenAI Gym in R? Are there Q-learning packages for problems of this structure?
Thanks and cheers
There is a problem in your formulation of the problem.
Q(s,a) is the expected utility of taking action a in state s and following the optimal policy afterwards.
Expected rewards are different after taking 1, 2 or 9 steps. That means that the expected reward for being in state s_0 and taking action a_0 is different at step 0 from what you get at step 9.
The "state" as you have defined does not ensure you any reward, it is the combination of "state+step" what does it.
To model the problem adequately, you should reframe it and consider the state to be the combination of position and step. You will then have 90 states (10 positions x 9 steps).
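As an illustration of that reframing (Python here rather than R, and all names are mine), the Q-table and update might look like:

import numpy as np

N_POS, N_STEPS, N_ACTIONS = 10, 9, 11
Q = np.zeros((N_POS * N_STEPS, N_ACTIONS))   # 90 rows: one per (position, step)

def state_index(pos, step):
    """Flatten a 0-based (position, step) pair into a single row index."""
    return step * N_POS + pos

def q_update(pos, step, action, reward, next_pos, alpha=0.1, gamma=1.0):
    s = state_index(pos, step)
    if step + 1 < N_STEPS:
        target = reward + gamma * Q[state_index(next_pos, step + 1)].max()
    else:
        target = reward          # the episode ends after the 9th transition
    Q[s, action] += alpha * (target - Q[s, action])

Note the terminal case: at the last step there is no next state to bootstrap from, which is exactly the step-dependence the plain 10-state formulation loses.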
I have been playing around with algorithms and ILP for the single depot vehicle scheduling problem (SDVSP) and now want to extend my knowledge towards the multiple depot vehicle scheduling problem (MDVSP), as I would like to use this knowledge in a project of mine.
As for the question: I've found and implemented several algorithms for the MDVSP. However, one thing I am very curious about is how to go about determining the number of needed depots (and their locations, to an extent). Sadly, I haven't been able to find any resources that do not assume/require that the depots are fixed. Thus my question would be: how would I be able to approach an MDVSP in which I can determine the number and locations of the depots?
(Edit) To clarify:
Assume we are given a set of trips T1, T2, ..., Tn as usual in an SDVSP or MDVSP. Multiple trips can be driven in succession before returning to a depot. Leaving and returning to depots usually only happens at the start and end of a day. But as an extension to the normal problems, we can now determine the number and locations of our depots, as opposed to having fixed depots.
The objective is to find a solution in which all trips are driven at minimal cost. The cost consists of the amount of deadheading (the distance the car has to travel between trips, and from and to the depots), a fixed cost K per car, and a fixed cost C per depot.
I hope this clears up the question somewhat.
The standard approach involves adding |V| binary variables to the ILP, one for each node, where x_i = 1 if v_i is a depot and 0 otherwise.
However, the way the question is currently articulated, all x_i values will come out as zero, since there is no "advantage" to making a node a depot and the total cost = (other cost factors) + sum_i x_i * FIXED_COST_PER_DEPOT.
Perhaps the question needs to be updated with some other constraint on the range of the car. For example, a car can only travel so many miles before having to return to a depot.
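As a sketch of what those binary variables look like in code, here is a deliberately simplified model in Python with PuLP. It is a facility-location-style toy, not a full MDVSP: cost[d][t] stands in for the deadhead cost of serving trip t from candidate depot d, and all data is made up:

from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

depots, trips = range(3), range(5)
cost = [[4, 7, 2, 6, 5],         # deadhead cost of trip t from depot d
        [3, 2, 6, 4, 7],
        [8, 3, 4, 2, 3]]
DEPOT_COST = 10                  # fixed cost C per opened depot

prob = LpProblem("depot_selection", LpMinimize)
x = [LpVariable(f"open_{d}", cat=LpBinary) for d in depots]
y = [[LpVariable(f"serve_{d}_{t}", cat=LpBinary) for t in trips] for d in depots]

prob += (lpSum(DEPOT_COST * x[d] for d in depots)
         + lpSum(cost[d][t] * y[d][t] for d in depots for t in trips))

for t in trips:                  # every trip must be served from one depot...
    prob += lpSum(y[d][t] for d in depots) == 1
for d in depots:
    for t in trips:              # ...and only from a depot that is opened.
        prob += y[d][t] <= x[d]

prob.solve()
print("opened depots:", [d for d in depots if x[d].value() == 1])

Because every trip must be served from an opened depot, the x_i no longer collapse to zero; in a full MDVSP model the same effect would come from the constraints tying vehicle blocks to depots.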
Source: Google Interview Question
Given a large network of computers, each keeping log files of visited urls, find the top ten most visited URLs.
Have many large <string (url) -> int (visits)> maps.
Calculate a <string (url) -> int (sum of visits among all distributed maps)> map, and get the top ten in the combined map.
Main constraint: The maps are too large to transmit over the network. Also can't use MapReduce directly.
I have now come across quite a few questions of this type, where processing needs to be done over large distributed systems, and I can't think of or find a suitable answer.
All I could think of is brute force, which one way or another violates the given constraint.
It says you can't use map-reduce directly, which is a hint that the author of the question wants you to think about how map-reduce works, so we will just mimic the actions of map-reduce:
pre-processing: let R be the number of servers in the cluster, and give each server a unique id from 0, 1, 2, ..., R-1.
(map) For each (string, id), send the tuple to the server whose id is hash(string) % R.
(reduce) Once step 2 is done (simple control communication), produce the (string, count) pairs of the top 10 strings per server. Note that the tuples considered are those sent to this particular server in step 2.
(map) Each server then sends its top 10 to one server (let it be server 0). This should be fine, since there are only 10*R such records.
(reduce) Server 0 will yield the top 10 across the network.
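As an in-process sketch of those steps (all names are mine; each "server" is just a url -> count dict and the network shuffle becomes a loop):

from collections import Counter
import heapq

def top10_distributed(server_maps):
    R = len(server_maps)
    shards = [Counter() for _ in range(R)]
    # (map) route every (url, count) to the shard owning hash(url) % R, so
    # all counts for a given url land in exactly one place.
    for m in server_maps:
        for url, count in m.items():
            shards[hash(url) % R][url] += count
    # (reduce) each shard emits its local top 10; 10*R candidates in total.
    candidates = []
    for shard in shards:
        candidates.extend(heapq.nlargest(10, shard.items(), key=lambda kv: kv[1]))
    # (final reduce) "server 0" takes the top 10 of the candidates.
    return heapq.nlargest(10, candidates, key=lambda kv: kv[1])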
Notes:
The problem with this algorithm, as with most big-data algorithms that don't use frameworks, is handling failing servers. MapReduce takes care of that for you.
The above algorithm can be translated into a two-phase map-reduce algorithm fairly straightforwardly.
In the worst case, any algorithm which does not require transmitting the whole frequency table is going to fail: we can create a trivial case where the global top 10 are all at the bottom of every individual machine's list.
If we assume that the frequencies of the URIs follow Zipf's law, we can come up with effective solutions. One such solution follows.
Each machine sends its top-K elements; K depends solely on the bandwidth available. One master machine aggregates the frequencies and finds the 10th-largest aggregated frequency value, "V10" (note that this is a lower bound: since the global top 10 may not be in the top K of every machine, the sums are incomplete).
In the next step, every machine sends a list of the URIs whose local frequency is at least V10/M (where M is the number of machines). The union of all such lists is sent back to every machine. Each machine, in turn, sends back its frequencies for this particular list. The master aggregates these into the top-10 list.
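A rough in-process sketch of the two rounds (all names, and the choice of K, are mine; each "machine" is a url -> count dict):

import heapq
from collections import Counter

def two_round_top10(machines, K=100):
    M = len(machines)
    partial = Counter()
    for m in machines:                       # round 1: ship local top-K only
        for url, c in heapq.nlargest(K, m.items(), key=lambda kv: kv[1]):
            partial[url] += c
    v10 = heapq.nlargest(10, partial.values())[-1]   # 10th max: a lower bound
    # Round 2: any url with global count >= v10 must have a local count of at
    # least v10 / M on some machine, so this candidate set cannot miss it.
    candidates = {u for m in machines for u, c in m.items() if c >= v10 / M}
    exact = Counter()
    for m in machines:                       # every machine reports candidates
        for u in candidates:
            exact[u] += m.get(u, 0)
    return heapq.nlargest(10, exact.items(), key=lambda kv: kv[1])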
I'm trying to code a program for a class that simulates a router, and so far I have the basics set up (a "router" can send and receive packets through an emulated server to other "routers" connected to the server). Each packet contains only the distance vector for that router. When a router receives a packet, it is supposed to update its own distance vector accordingly using the Bellman-Ford algorithm. The problem I'm having is that I find myself unable to implement the actual algorithm without cheating and using an adjacency matrix.
For example, say I have 3 routers connected as follows:
A ---1--- B ---2--- C
That is, A and B are connected with a link cost of 1, and B and C are connected with a link cost of 2. So when the routers are all started, they will send a packet to each of their directly connected neighbors containing their distance vector info. So A would send router B (0, 1, INF), B would send A and C (1, 0, 2) and C would send B (INF, 2, 0) where INF means the 2 routers are not directly connected.
So let's look at router A receiving a packet from router B. Calculating the minimum cost to each other router with the Bellman-Ford algorithm goes as follows:
Mincost(a,b) = min(cost(a,b) + distance(b,b), cost(a,c) + distance(c,b))
Mincost(a,c) = min(cost(a,b) + distance(b,c), cost(a,c) + distance(c,c))
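(In code form, the equations above amount to the following per-destination update; the map-based representation and all names are my own illustration, not part of the assignment.)

INF = float("inf")

def on_packet(distance_vector, next_hop, neighbours, sender, sender_vector):
    """Relax our routes with cost(a, sender) + distance(sender, dest)."""
    link = neighbours[sender]                  # cost(a, sender)
    changed = False
    for dest, cost in sender_vector.items():   # distance(sender, dest)
        new_cost = link + cost
        if new_cost < distance_vector.get(dest, INF):
            distance_vector[dest] = new_cost
            next_hop[dest] = sender
            changed = True
    return changed    # if True, advertise the updated vector to neighbours

Since unknown destinations are simply absent from the map and get added as they are learned, nothing here fixes the number of routers in advance.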
So the problem I am running into is that I cannot for the life of me figure out how to implement an algorithm that will calculate the minimum path for a router to every other router. It's easy enough to make one if you know exactly how many routers there are going to be, but how would you do it when the number of routers can be arbitrarily big?
You can never be sure of the shortest paths with DVMRP.
You don't have a global view of the network, for one thing. Each router operates on only as much as it sees, and what it sees is restricted and can be misleading. Look up the looping (count-to-infinity) problem of DVMRP; it can never have the full network information to work with.
It isn't scalable either. Its performance gets increasingly worse as the number of routers increases, because of the distance-vector update messages flooded around and the limited accuracy of those messages.
It is one of the earliest multicast protocols, and its performance matches that of RIP on the unicast side.