Scheduling algorithm: shortest job first

I am trying to understand how the shortest-job-first algorithm works. Am I doing this the right way? Please help.
+------+--------+--------+
| Proc | Burst1 | Burst2 |
+------+--------+--------+
| A    | 10     | 5      |
| B    | 3      | 9      |
| C    | 8      | 11     |
+------+--------+--------+
B1->3->C1->11->B2->20->A1->30->A2->35->C2->46

"Shortest job first" is not really an algorithm but a strategy: among the jobs that are ready to execute, always choose the one with the shortest execution time. Your sequence looks OK. At the beginning, the following jobs are ready for execution (execution time in parentheses):
A1(10), B1(3), C1(8)
So B1 is chosen, after which job B2 also becomes ready to execute, so here is the updated list of ready jobs:
A1(10), B2(9), C1(8)
Now C1 is chosen, and so on.
There are variants of the "shortest job first" strategy in which the total time over all bursts of a process, i.e. A1 + A2, B1 + B2, ..., is taken into account. Then the chosen sequence would be:
B1, B2, A1, A2, C1, C2
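The selection rule can be checked with a small simulation. This is a minimal Java sketch, not a reference scheduler: it assumes scheduling is non-preemptive, that a process's second burst becomes ready the moment its first burst ends (no I/O wait in between), and that ties go to the process listed first. `SjfSketch`, `shortestBurstFirst` and `shortestTotalFirst` are hypothetical names; the second method models the total-time variant by ranking whole processes by their summed bursts and running each process's bursts back to back.

```java
import java.util.*;

final class SjfSketch {
    /**
     * Non-preemptive shortest-job-first over processes with several CPU bursts.
     * Assumption: a process's next burst is ready as soon as its previous burst ends.
     * Ties go to the process inserted first into the LinkedHashMap.
     * Returns entries like "B1->3": burst name and its completion time.
     */
    static List<String> shortestBurstFirst(LinkedHashMap<String, int[]> procs) {
        List<String> trace = new ArrayList<>();
        Map<String, Integer> next = new LinkedHashMap<>(); // index of each process's next burst
        for (String p : procs.keySet()) next.put(p, 0);
        int time = 0;
        while (true) {
            String best = null;
            int bestLen = Integer.MAX_VALUE;
            for (String p : next.keySet()) {                // scan the ready bursts
                int i = next.get(p);
                int[] bursts = procs.get(p);
                if (i < bursts.length && bursts[i] < bestLen) { best = p; bestLen = bursts[i]; }
            }
            if (best == null) return trace;                 // every burst has run
            time += bestLen;
            int i = next.get(best);
            trace.add(best + (i + 1) + "->" + time);
            next.put(best, i + 1);
        }
    }

    /** Variant: rank whole processes by total burst time and run all their bursts back to back. */
    static List<String> shortestTotalFirst(LinkedHashMap<String, int[]> procs) {
        List<String> order = new ArrayList<>(procs.keySet());
        order.sort(Comparator.comparingInt(p -> Arrays.stream(procs.get(p)).sum())); // stable sort
        List<String> trace = new ArrayList<>();
        for (String p : order)
            for (int i = 1; i <= procs.get(p).length; i++) trace.add(p + i);
        return trace;
    }
}
```

For the table in the question, `shortestBurstFirst` yields B1->3, C1->11, B2->20, A1->30, A2->35, C2->46, the same as the asker's sequence, and `shortestTotalFirst` yields B1, B2, A1, A2, C1, C2.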

Related

How to optimize the algorithm to find the max_depth_contact_series in a time varying graph?

Assume there is a time-varying graph with N nodes named a1, a2, ..., an, and a contact series given as lines "t node1 node2", meaning node1 contacts node2 at time t.
Assume node a1 carries a message (there is only one copy of the message in the graph). Starting from time 0, how many nodes can the message contact at most within time T? The message can be transferred to another node freely at any time. For example, a1 can choose to transfer it to a2 at time 2, or keep the message until a1 contacts a3 and then transfer it to a3.
Here is an example to make it more clear. For a graph with 6 nodes and contact series:
1 a1 a2
2 a1 a3
3 a1 a4
4 a3 a5
6 a3 a6
10 a4 a3
During time 0~10 the message can contact at most 4 nodes: a2, a3, a5, a6, with the message transferred from a1 to a3 at time 2.
Keep the time order in mind. Here a1 carries the message but transfers it to a3 at time 2. Then at time 3 node a1 no longer has the message, so the message can't contact a4. If a1 keeps the message at time 2 instead of transferring it to a3, the message contacts the list a2, a3, a4, a3. The contact set is then {a2, a3, a4}, of size 3, which is smaller than 4.
How can I get the largest contact nodes set? Or just the number?
At present I compute it with a recursive algorithm, but the cost is unbearable when T is large.
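To pin down the rules, an exhaustive search helps. This Java sketch is the kind of exponential recursion the question already describes, not the requested optimization. It assumes contacts are processed in time order, that only contacts with a time strictly below the limit count (reading "0~10" as times before 10), and that the starting node itself is not counted. `MessageContactSketch`, `Contact` and `maxContacts` are hypothetical names. At every contact involving the current holder it branches on keeping the message or handing it over.

```java
import java.util.*;

final class MessageContactSketch {
    /** One contact event: nodes u and v meet at time t. */
    static final class Contact {
        final int t;
        final String u, v;
        Contact(int t, String u, String v) { this.t = t; this.u = u; this.v = v; }
    }

    /** Max number of distinct nodes (other than start) the message meets using contacts with t < limit. */
    static int maxContacts(List<Contact> contacts, String start, int limit) {
        List<Contact> sorted = new ArrayList<>(contacts);
        sorted.sort(Comparator.comparingInt(c -> c.t));
        return search(sorted, 0, start, start, new HashSet<>(), limit);
    }

    private static int search(List<Contact> cs, int i, String holder, String start,
                              Set<String> met, int limit) {
        // Skip contacts the current holder is not part of; they cannot move the message.
        while (i < cs.size() && cs.get(i).t < limit
                && !cs.get(i).u.equals(holder) && !cs.get(i).v.equals(holder)) i++;
        if (i >= cs.size() || cs.get(i).t >= limit) return met.size();
        Contact c = cs.get(i);
        String other = c.u.equals(holder) ? c.v : c.u;
        boolean added = !other.equals(start) && met.add(other);
        // Branch: the holder keeps the message, or hands it to the node it just met.
        int best = Math.max(search(cs, i + 1, holder, start, met, limit),
                            search(cs, i + 1, other, start, met, limit));
        if (added) met.remove(other); // undo before returning to the caller
        return best;
    }
}
```

On the six contacts above with a limit of 10 it returns 4, matching the example. Its running time doubles with every contact the holder takes part in, which is why this approach cannot scale to large T.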

Minimum cost within limited time for a timetable?

I have a timetable like this:
+-----------+-------------+------------+------------+------------+------------+-------+----+
| transport | trainnumber | departcity | arrivecity | departtime | arrivetime | price | id |
+-----------+-------------+------------+------------+------------+------------+-------+----+
| Q | Q00 | BJ | TJ | 13:00:00 | 15:00:00 | 10 | 1 |
| Q | Q01 | BJ | TJ | 18:00:00 | 20:00:00 | 10 | 2 |
| Q | Q02 | TJ | BJ | 16:00:00 | 18:00:00 | 10 | 3 |
| Q | Q03 | TJ | BJ | 21:00:00 | 23:00:00 | 10 | 4 |
| Q | Q04 | HA | DL | 06:00:00 | 11:00:00 | 50 | 5 |
| Q | Q05 | HA | DL | 14:00:00 | 19:00:00 | 50 | 6 |
| Q | Q06 | HA | DL | 18:00:00 | 23:00:00 | 50 | 7 |
| Q | Q07 | DL | HA | 07:00:00 | 12:00:00 | 50 | 8 |
| Q | Q08 | DL | HA | 15:00:00 | 20:00:00 | 50 | 9 |
| ... | ... | ... | ... | ... | ... | ... | ...|
+-----------+-------------+------------+------------+------------+------------+-------+----+
In this table there are 13 cities and 116 routes altogether, and the smallest unit of time is half an hour.
There are different transports, which doesn't matter. As you can see, there can be multiple edges with the same departcity and arrivecity but different times and different prices. The timetable is the same every day.
Now, here is the problem.
A user wants to know how to travel from city A to city B (A and B may be the same city), passing through zero or more cities C, D, ... (whether they must be visited in order depends on what the user wants, so there are really two problems), within X hours and at the least cost under the above conditions.
Before this, I solved a simpler problem:
A user wants to know how to travel from city A to city B (A and B may be the same city), passing through zero or more cities C, D, ... (in order or not, depending on the user), at the least cost under the above conditions.
Here is how I solve it (taking the unordered case as an example):
1. Sort the must-pass cities: C1, C2, C3, ..., Cn. Let C0 = A, C(n+1) = B, minCost.cost = INFINITE;
2. i = 0, j = 1, W = {};
3. Find a least-cost way S from Ci to Cj using Dijkstra's algorithm with price as the weight of edges. W = W ∪ S;
4. i = i + 1, j = j + 1;
5. If j <= n + 1, goto 3;
6. If W.cost < minCost.cost, minCost = W;
7. If the next permutation of C1...Cn exists, rearrange C1...Cn into that permutation and goto 2;
8. Return minCost;
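The steps above translate fairly directly into Java. This is a sketch under assumptions: each timetable row is reduced to a hypothetical `Edge(from, to, price)` with departure and arrival times ignored (as in the simpler, cost-only problem), `Integer.MAX_VALUE` stands for "no route", and the Dijkstra result for each source city is cached so it runs once per city rather than once per permutation. Java has no built-in next-permutation, so `nextPermutation` is a hand-written helper; `MustPassRoute` and `minCost` are hypothetical names too.

```java
import java.util.*;

final class MustPassRoute {
    /** Hypothetical edge: one timetable row reduced to from, to and price (times ignored). */
    static final class Edge {
        final String from, to;
        final int price;
        Edge(String from, String to, int price) { this.from = from; this.to = to; this.price = price; }
    }

    /** Cheapest price from src to every reachable city (Dijkstra with price as the weight). */
    static Map<String, Integer> dijkstra(Map<String, List<Edge>> adj, String src) {
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> pq =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        dist.put(src, 0);
        pq.add(Map.entry(src, 0));
        while (!pq.isEmpty()) {
            Map.Entry<String, Integer> top = pq.poll();
            String u = top.getKey();
            int d = top.getValue();
            if (d > dist.get(u)) continue;                      // stale queue entry
            for (Edge e : adj.getOrDefault(u, List.of())) {
                int nd = d + e.price;
                if (nd < dist.getOrDefault(e.to, Integer.MAX_VALUE)) {
                    dist.put(e.to, nd);
                    pq.add(Map.entry(e.to, nd));
                }
            }
        }
        return dist;
    }

    /** Steps 1-8: try every order of the must-pass cities; MAX_VALUE if no route exists. */
    static int minCost(List<Edge> edges, String a, String b, List<String> mustPass) {
        Map<String, List<Edge>> adj = new HashMap<>();
        for (Edge e : edges) adj.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
        Map<String, Map<String, Integer>> cache = new HashMap<>(); // one Dijkstra run per source
        List<String> cs = new ArrayList<>(mustPass);
        Collections.sort(cs);                                   // step 1
        int best = Integer.MAX_VALUE;
        do {
            List<String> route = new ArrayList<>();
            route.add(a);
            route.addAll(cs);
            route.add(b);
            int total = 0;                                      // steps 2-5: sum the legs
            for (int i = 0; i + 1 < route.size() && total != Integer.MAX_VALUE; i++) {
                Map<String, Integer> dist = cache.computeIfAbsent(route.get(i), s -> dijkstra(adj, s));
                Integer leg = dist.get(route.get(i + 1));
                total = (leg == null) ? Integer.MAX_VALUE : total + leg;
            }
            best = Math.min(best, total);                       // step 6
        } while (nextPermutation(cs));                          // step 7
        return best;                                            // step 8
    }

    /** Rearranges xs into the next lexicographic permutation; false once the last one is reached. */
    static boolean nextPermutation(List<String> xs) {
        int i = xs.size() - 2;
        while (i >= 0 && xs.get(i).compareTo(xs.get(i + 1)) >= 0) i--;
        if (i < 0) return false;
        int j = xs.size() - 1;
        while (xs.get(j).compareTo(xs.get(i)) <= 0) j--;
        Collections.swap(xs, i, j);
        Collections.reverse(xs.subList(i + 1, xs.size()));
        return true;
    }
}
```

The number of orders grows as n!, so this stays practical only for a handful of must-pass cities.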
However, I cannot come up with an efficient solution to the first problem. Please help me, thanks.
I would also appreciate it if anyone could solve another problem:
A user wants to know how to travel from city A to city B (A and B may be the same city), passing through zero or more cities C, D, ... (in order or not, depending on the user), in the least time under the above conditions.
It's quite a big problem, so I will just sketch a solution.
First, remodel your graph as follows. Instead of each vertex representing a city, let a vertex represent a tuple (city, time). This is feasible because there are only 13 cities and only (time_limit - current_time) * 2 possible time slots, as the smallest unit of time is half an hour. Now connect vertices according to the given timetable, with prices as their weights as before. Don't forget that the user can stay in any city for any amount of time for free, i.e. add a 0-weight edge from (city, t) to (city, t + half an hour). All nodes with city A are start nodes, and all nodes with city B are target nodes. Take the minimum distance over all (B, time) vertices to get the least-cost solution. If there are several, take the one with the smallest time.
Now on to forcing the user to pass through certain cities in order. If there are n cities to pass through (plus the start and target cities), you need n+1 copies of the same graph, which act as different levels. The level represents how many cities of your list you have already passed. So you start in level 0 on vertex A. Once you get to C1 in level 0, you move to the vertex C1 in level 1 of the graph (connect the vertices by 0-weight edges). This means that when you are in level k, you have already passed cities C1 to Ck, and you can get to the next level only by going through C(k+1). The vertices of city B in the last level are your target nodes.
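A sketch of the time-expanded graph in Java, assuming half-hour slots, a timetable that repeats every day (48 slots), trips shorter than a day, and cities given as integer indices. `TimeExpandedRoute`, `Trip` and `cheapestWithin` are hypothetical names. Because every edge (a trip or a free wait) points forward in time, the graph is acyclic, so this sketch swaps Dijkstra for a single forward sweep over the slots, which yields the same least costs on such a graph. It returns the least cost together with the earliest slot at which that cost is reached.

```java
import java.util.*;

final class TimeExpandedRoute {
    static final int SLOTS_PER_DAY = 48;   // half-hour slots; assumed to repeat daily

    /** Hypothetical trip: city indices, depart/arrive slots within a day (0..47), price. */
    static final class Trip {
        final int from, to, depart, arrive, price;
        Trip(int from, int to, int depart, int arrive, int price) {
            this.from = from; this.to = to; this.depart = depart; this.arrive = arrive; this.price = price;
        }
    }

    /**
     * Cheapest way from city a (at absolute slot `start`) to city b within `limit` slots.
     * Returns {cost, earliest arrival slot achieving that cost}, or null if impossible.
     */
    static int[] cheapestWithin(List<Trip> trips, int nCities, int a, int b, int start, int limit) {
        final long INF = Long.MAX_VALUE / 4;
        long[][] cost = new long[limit + 1][nCities];      // cost[s][c]: vertex (city c, slot start + s)
        for (long[] row : cost) Arrays.fill(row, INF);
        cost[0][a] = 0;
        for (int s = 0; s <= limit; s++) {                 // edges only point forward in time
            for (int c = 0; c < nCities; c++)              // waiting in a city is free
                if (s < limit) cost[s + 1][c] = Math.min(cost[s + 1][c], cost[s][c]);
            for (Trip t : trips) {
                if (t.depart != (start + s) % SLOTS_PER_DAY || cost[s][t.from] >= INF) continue;
                int arr = s + Math.floorMod(t.arrive - t.depart, SLOTS_PER_DAY); // overnight trips wrap
                if (arr <= limit)
                    cost[arr][t.to] = Math.min(cost[arr][t.to], cost[s][t.from] + t.price);
            }
        }
        long best = cost[limit][b];                        // waiting is free, so the last slot is minimal
        if (best >= INF) return null;
        for (int s = 0; ; s++)                             // earliest slot that already reaches that cost
            if (cost[s][b] == best) return new int[]{(int) best, start + s};
    }
}
```

For instance, with BJ = 0, TJ = 1 and Q00 (13:00 to 15:00) encoded as slots 26 to 30, starting at slot 24 (12:00) with a 4-hour (8-slot) limit gives cost 10, arriving at slot 30.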
Note: I said copies of the same graph, but that is not exactly true. You can't allow the user to reach C(k+2), ..., B in level k, that would violate the required order.
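The levelled graph can be sketched by adding the level as a third coordinate of each vertex. This Java sketch assumes half-hour slots, a timetable that repeats daily (48 slots), trips shorter than a day, and integer city indices; `OrderedMustPass`, `Trip`, `blocked` and `cheapest` are hypothetical names. It implements the note above by refusing arrivals at C(k+2), ..., or B while on level k, uses 0-weight edges between levels, and returns -1 when no ordered route fits within the limit.

```java
import java.util.*;

final class OrderedMustPass {
    static final int SLOTS_PER_DAY = 48;

    /** Hypothetical trip: city indices, depart/arrive slots within a day (0..47), price. */
    static final class Trip {
        final int from, to, depart, arrive, price;
        Trip(int from, int to, int depart, int arrive, int price) {
            this.from = from; this.to = to; this.depart = depart; this.arrive = arrive; this.price = price;
        }
    }

    /** True if arriving in `city` on level k would break the required order. */
    static boolean blocked(int k, int city, int[] must, int b) {
        if (k < must.length && city == must[k]) return false;   // C(k+1) is the next required city
        for (int j = k + 1; j < must.length; j++) if (must[j] == city) return true;
        return k < must.length && city == b;                    // B only on the last level
    }

    /** Cheapest cost a -> b visiting must[0..] in order within `limit` slots of `start`, or -1. */
    static long cheapest(List<Trip> trips, int nCities, int a, int b, int[] must, int start, int limit) {
        int m = must.length;
        final long INF = Long.MAX_VALUE / 4;
        long[][][] cost = new long[limit + 1][m + 1][nCities];   // cost[slot][level][city]
        for (long[][] level : cost) for (long[] row : level) Arrays.fill(row, INF);
        cost[0][0][a] = 0;
        for (int s = 0; s <= limit; s++) {
            long[][] now = cost[s];
            for (int k = 0; k < m; k++)                          // 0-weight edge to the next level
                now[k + 1][must[k]] = Math.min(now[k + 1][must[k]], now[k][must[k]]);
            for (int k = 0; k <= m; k++) {
                if (s < limit)                                   // free waiting edges
                    for (int c = 0; c < nCities; c++)
                        cost[s + 1][k][c] = Math.min(cost[s + 1][k][c], now[k][c]);
                for (Trip t : trips) {
                    if (t.depart != (start + s) % SLOTS_PER_DAY || now[k][t.from] >= INF) continue;
                    int arr = s + Math.floorMod(t.arrive - t.depart, SLOTS_PER_DAY);
                    if (arr <= limit && !blocked(k, t.to, must, b))
                        cost[arr][k][t.to] = Math.min(cost[arr][k][t.to], now[k][t.from] + t.price);
                }
            }
        }
        return cost[limit][m][b] >= INF ? -1 : cost[limit][m][b];
    }
}
```

For example, a BJ -> TJ -> BJ round trip starting at 12:00 with TJ as the only must-pass city costs 20 (Q00, then Q02) with a 6-hour limit and is impossible with 5.5 hours.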
To enforce passing cities in any order, a different scheme of connecting the levels (and modifying them during runtime) is required. I'll leave this to you.

Finding the "expanded to factored" algorithm

This question is about algorithms and thus language-independent.
Given the following rows:
A1, B1, C1, D1 (1)
A1, B2, C1, D1 (2)
A2, B1, C1, D1 (3)
A2, B2, C1, D1 (4)
A3, B1, C1, D1 (5)
A3, B2, C1, D1 (6)
A1, B1, C2, D1 (7)
They can be factored as follows:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| A2 | B2 | | |
| A3 | | | |
+----+----+----+----+
| A1 | B1 | C2 | D1 |
+----+----+----+----+
The following objects can store those data:
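import java.util.ArrayList;
import java.util.List;

class ExpandedRow {
    String a;
    String b;
    String c;
    String d;
    ExpandedRow(String a, String b, String c, String d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }
}
class FactoredRow {
    List<String> as;
    List<String> bs;
    List<String> cs;
    List<String> ds;
}
Of the two transformation algorithms, factored --> expanded is quite easy:
List<FactoredRow> factoredRows = fill(); // fill() is the question's placeholder for loading the input
List<ExpandedRow> expandedRows = new ArrayList<>();
for (FactoredRow factoredRow : factoredRows) {
    // Each factored row expands to the Cartesian product of its four value lists.
    for (String a : factoredRow.as) {
        for (String b : factoredRow.bs) {
            for (String c : factoredRow.cs) {
                for (String d : factoredRow.ds) {
                    expandedRows.add(new ExpandedRow(a, b, c, d));
                }
            }
        }
    }
}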
But I'm lost concerning the expanded --> factored one. How can I factorize a List<ExpandedRow> into a List<FactoredRow>?
In other words, I have the factored table as input. I expand it using the provided algorithm and store it in its expanded state. The question is: how do I retrieve the initial factored state after expanding it?
I thought that if two expanded rows differ in only one attribute, they can be factored, for example A1, B1, C1, D1 (1) and A1, B1, C2, D1 (7). But if we factorize those two rows together, we end up with:
+----+----+----+----+
| A1 | B1 | C1 | D1 |
| | | C2 | |
+----+----+----+----+
| A1 | B2 | C1 | D1 |
| A2 | | | |
| A3 | | | |
+----+----+----+----+
| A2 | B1 | C1 | D1 |
| A3 | | | |
+----+----+----+----+
Which is less factored than the initial table.
It seems that there are many factored solutions, and the main issue is to define, and then find, the most factored one.
This problem seems something like a graph partitioning problem. I suspect it's NP-hard but I haven't been able to prove it yet.
Let's take a simpler example to see what's going on. Consider the pairs (A1,B1), (A2,B1), (A3,B1), (A2,B2). We represent the pairs as points in 2D space, and connect two points if it is possible to move from one to the other by a translation parallel to the x- or y-axis:
           (A2,B2)
              |
(A1,B1) -- (A2,B1) -- (A3,B1)
The idea is to partition the graph by lines parallel to the axes, and repartition each partition, and so on, until we get pieces that are complete rectangles, line segments, or points.
There are two essentially different ways of partitioning the graph above. We can draw a vertical line at position x=1.5:
           (A2,B2)
              |
(A1,B1)    (A2,B1) -- (A3,B1)
after which the right-side piece needs to be further partitioned (by a vertical or horizontal line; let's take horizontal):
           (A2,B2)
(A1,B1)    (A2,B1) -- (A3,B1)
We have now factored the original list into
A1 B1
-----
A2 B2
-----
A2 B1
A3
On the other hand, if we had made our initial partition with a horizontal line at position y=1.5, we would have
           (A2,B2)
(A1,B1) -- (A2,B1) -- (A3,B1)
which is already nicely factored into a point and a line segment:
A2 B2
-----
A1 B1
A2
A3
In higher dimensions (4D for letters A, B, C, D) we have a similar problem, except that there are correspondingly more choices for initial cuts, and the allowed final pieces are higher-dimensional (not just points, line segments, and rectangles, but also 3D and 4D boxes).
The problem feels NP-hard to me, just like many other graph partitioning problems, but there are probably reasonably fast approximation algorithms.
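For small inputs the cutting idea can be tried exhaustively. This Java sketch treats each row as a point, orders the values of each column lexicographically (an assumption; it matches the A1 < A2 < A3 order used above), tries every axis-parallel cut recursively, and reports the fewest complete boxes the rows can be split into. It is exponential and only meant to make "most factored" concrete; because cuts follow one fixed value order, it may miss factorizations that need a different ordering. `AxisCutFactoring`, `isBox` and `minPieces` are hypothetical names.

```java
import java.util.*;

final class AxisCutFactoring {
    /** True if the rows are exactly the Cartesian product of their per-column value sets. */
    static boolean isBox(Set<List<String>> rows) {
        long product = 1;
        for (int d = 0; d < rows.iterator().next().size(); d++) {
            Set<String> values = new HashSet<>();
            for (List<String> r : rows) values.add(r.get(d));
            product *= values.size();
        }
        return product == rows.size();   // rows lie inside that product, so equal sizes mean equal sets
    }

    /** Fewest boxes reachable by recursive axis-parallel cuts (rows must be non-empty). */
    static int minPieces(Set<List<String>> rows) { return minPieces(rows, new HashMap<>()); }

    private static int minPieces(Set<List<String>> rows, Map<Set<List<String>>, Integer> memo) {
        if (isBox(rows)) return 1;
        Integer cached = memo.get(rows);
        if (cached != null) return cached;
        int best = Integer.MAX_VALUE;
        for (int d = 0; d < rows.iterator().next().size(); d++) {
            TreeSet<String> values = new TreeSet<>();            // values on axis d, sorted
            for (List<String> r : rows) values.add(r.get(d));
            for (String cut : values.headSet(values.last())) {   // a "line" just after `cut`
                Set<List<String>> left = new HashSet<>(), right = new HashSet<>();
                for (List<String> r : rows) (r.get(d).compareTo(cut) <= 0 ? left : right).add(r);
                best = Math.min(best, minPieces(left, memo) + minPieces(right, memo));
            }
        }
        memo.put(rows, best);
        return best;
    }
}
```

It returns 2 both for the four pairs above and for the seven rows in the question, whereas the pairwise merge described in the question ends with 3 pieces.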

Find a node in the tree based on some selection criteria

         [BASE]
        /   \   \
      C1     C2   C3
     /  \           \
   C4    C5          C6
I have a tree like the one above. It is an N-ary tree that is not balanced. The problem is that I need to select one of the nodes based on some conditions, like:
Select C1 when k1 = a
Select C4 when k1 = a and k2 = b and k3 = c
Select C5 when k1 = a and k' = z
Select C2 when k'' = b
Select C3 when k5 = 9
Select C6 when k5 = 9 and k6 = 10
The input to the program is an arbitrary-length list of key-value pairs. For example, if the input is k1=a,k2=b,k3=c,k8=10, I should select C4, as that is the best match.
Ideally, I was thinking of traversing the tree and matching each node's selection criteria against the input set. But I soon figured out that this tree can be huge: the base node can have tens of thousands of child nodes under it, so going node by node might not be a good idea. If there is a more efficient way to select the nodes, I would love to know.
It looks like your k's describe a directory-like path, and the leaf at the end of that path (exactly one leaf for each directory) is the node you are looking for. You could store this path string in each node as another value. What is not clear from the question is how the k's relate to the tree,
for example:
a -> C1
a/b/c -> C4
I have found a workable solution like this one:
----------------------------------------
|rowId|param1|param2|param3|param4|node|
----------------------------------------
|10 | a | | | | C1 |
----------------------------------------
|14 | a | b | c | | C4 |
----------------------------------------
|18 | a | b | | | C5 |
----------------------------------------
Let's call it a condition table. Each column represents one input key (k), and each combination of values maps to the node to be selected. This table can be thought of as an in-memory data structure or as a real table in an RDBMS.
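One way to avoid scanning every row of the condition table is an inverted index, a technique swapped in here rather than taken from the answer: each key=value pair maps to the rows that require it, so only rows sharing at least one pair with the input are touched. This Java sketch assumes "best match" means the fully satisfied row with the most conditions (ties broken arbitrarily), that param1, param2, param3 stand for keys k1, k2, k3, and that keys contain no '='. `ConditionTable`, `addRow` and `select` are hypothetical names.

```java
import java.util.*;

final class ConditionTable {
    private final List<Map<String, String>> conditions = new ArrayList<>();
    private final List<String> nodes = new ArrayList<>();
    // Inverted index: "key=value" -> ids of the rows that require that pair.
    private final Map<String, List<Integer>> index = new HashMap<>();

    void addRow(Map<String, String> condition, String node) {
        int id = nodes.size();
        conditions.add(condition);
        nodes.add(node);
        for (Map.Entry<String, String> e : condition.entrySet())
            index.computeIfAbsent(e.getKey() + "=" + e.getValue(), k -> new ArrayList<>()).add(id);
    }

    /** Returns the node whose condition is fully satisfied and has the most pairs, or null. */
    String select(Map<String, String> input) {
        Map<Integer, Integer> hits = new HashMap<>();       // row id -> number of matched pairs
        for (Map.Entry<String, String> e : input.entrySet())
            for (int id : index.getOrDefault(e.getKey() + "=" + e.getValue(), List.of()))
                hits.merge(id, 1, Integer::sum);
        String best = null;
        int bestSize = -1;
        for (Map.Entry<Integer, Integer> h : hits.entrySet()) {
            int size = conditions.get(h.getKey()).size();
            if (h.getValue() == size && size > bestSize) {  // every pair of the row matched
                best = nodes.get(h.getKey());
                bestSize = size;
            }
        }
        return best;
    }
}
```

Loaded with the three rows of the table above, the input k1=a, k2=b, k3=c, k8=10 selects C4, and an input that satisfies no row returns null.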

Apriori Algorithm

I've heard about the Apriori algorithm several times before but never had the time or the opportunity to dig into it. Can anyone explain to me, in a simple way, how this algorithm works? A basic example would also make it a lot easier for me to understand.
Apriori Algorithm
It is a candidate-generation-and-test approach for frequent pattern mining in datasets. There are two things you have to remember.
Apriori Pruning Principle - If an itemset is infrequent, then its supersets should not be generated/tested.
Apriori Property - A given (k+1)-itemset is a candidate (k+1)-itemset only if every one of its k-itemset subsets is frequent.
Now, here is the Apriori algorithm in 4 steps.
1. Initially, scan the database/dataset once to get the frequent 1-itemsets.
2. Generate length-(k+1) candidate itemsets from length-k frequent itemsets.
3. Test the candidates against the database/dataset.
4. Terminate when no frequent or candidate set can be generated.
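The four steps can be sketched in Java as follows. It is a straightforward sketch rather than an optimized implementation: each candidate is counted with a full pass over the transactions, and candidates are built by joining pairs of frequent k-itemsets whose union has k+1 items, then pruned with the Apriori property. `AprioriSketch` and `frequentItemsets` are hypothetical names; transactions are assumed to be sets of item strings and min_sup an absolute count.

```java
import java.util.*;

final class AprioriSketch {
    /** Returns every frequent itemset (as a sorted set) mapped to its support count. */
    static Map<SortedSet<String>, Integer> frequentItemsets(List<Set<String>> db, int minSup) {
        Map<SortedSet<String>, Integer> result = new LinkedHashMap<>();
        // Step 1: every single item is a candidate 1-itemset.
        Set<SortedSet<String>> candidates = new LinkedHashSet<>();
        for (Set<String> t : db) for (String item : t) candidates.add(new TreeSet<>(Set.of(item)));
        while (!candidates.isEmpty()) {
            // Step 3: test the candidates against the database (one scan per level).
            Map<SortedSet<String>, Integer> frequent = new LinkedHashMap<>();
            for (SortedSet<String> c : candidates) {
                int sup = 0;
                for (Set<String> t : db) if (t.containsAll(c)) sup++;
                if (sup >= minSup) frequent.put(c, sup);
            }
            result.putAll(frequent);
            // Step 2: join frequent k-itemsets into (k+1)-candidates, pruning by the Apriori property.
            candidates = new LinkedHashSet<>();
            List<SortedSet<String>> level = new ArrayList<>(frequent.keySet());
            for (int i = 0; i < level.size(); i++)
                for (int j = i + 1; j < level.size(); j++) {
                    SortedSet<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() != level.get(i).size() + 1) continue;
                    boolean allSubsetsFrequent = true;
                    for (String item : union) {
                        SortedSet<String> sub = new TreeSet<>(union);
                        sub.remove(item);
                        if (!frequent.containsKey(sub)) { allSubsetsFrequent = false; break; }
                    }
                    if (allSubsetsFrequent) candidates.add(union);
                }
        }
        return result; // Step 4: the loop ends when no candidates remain.
    }
}
```

On the four-transaction database of the solved example with min_sup = 2, it returns nine frequent itemsets, the members of L_1, L_2 and L_3, including {B,C,E} with support 2.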
Solved Example
Suppose there is a transaction database with 4 transactions, listing each transaction ID and the items bought in it. Assume the minimum support, min_sup, is 2. The support of an itemset is the number of transactions that contain it.
Transaction DB
tid | items
-------------
10 | A,C,D
20 | B,C,E
30 | A,B,C,E
40 | B,E
Now, let's create the candidate 1-itemsets with the 1st scan of the DB. We call this set C_1:
itemset | sup
-------------
{A} | 2
{B} | 3
{C} | 3
{D} | 1
{E} | 3
If we test these against min_sup, we can see that {D} does not satisfy the min_sup of 2. So it is not included in the frequent 1-itemsets, which we call L_1:
itemset | sup
-------------
{A} | 2
{B} | 3
{C} | 3
{E} | 3
Now, let's generate candidate 2-itemsets from L_1 and scan the DB a 2nd time to count their support. We call this set C_2:
itemset | sup
-------------
{A,B} | 1
{A,C} | 2
{A,E} | 1
{B,C} | 2
{B,E} | 3
{C,E} | 2
As you can see, the itemsets {A,B} and {A,E} do not satisfy the min_sup of 2, so they are not included in the frequent 2-itemsets, L_2:
itemset | sup
-------------
{A,C} | 2
{B,C} | 2
{B,E} | 3
{C,E} | 2
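Now let's do a 3rd scan of the DB for the 3-itemsets. The table below shows the supports of four 3-itemsets; note, however, that by the Apriori property {A,B,C}, {A,B,E} and {A,C,E} should not even be generated as candidates, because each contains an infrequent 2-itemset ({A,B} or {A,E}), so the only true member of C_3 is {B,C,E}: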
itemset | sup
-------------
{A,B,C} | 1
{A,B,E} | 1
{A,C,E} | 1
{B,C,E} | 2
You can see that {A,B,C}, {A,B,E} and {A,C,E} do not satisfy the min_sup of 2, so they are not included in the frequent 3-itemsets, L_3:
itemset | sup
-------------
{B,C,E} | 2
Finally, we can calculate the support (supp), confidence (conf) and lift (interestingness value) of the association/correlation rules that can be generated from the itemset {B,C,E}. Here supp is the fraction of all 4 transactions that contain {B,C,E}, conf(X -> Y) = supp(X ∪ Y) / supp(X), and lift(X -> Y) = conf(X -> Y) / supp(Y).
Rule        | supp | conf   | lift
-------------------------------------
B -> C & E  | 50%  | 66.67% | 1.33
E -> B & C  | 50%  | 66.67% | 1.33
C -> E & B  | 50%  | 66.67% | 0.89
B & C -> E  | 50%  | 100%   | 1.33
E & B -> C  | 50%  | 66.67% | 0.89
C & E -> B  | 50%  | 100%   | 1.33
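The rule metrics follow directly from the support counts. This minimal Java sketch computes support as a fraction of all transactions, confidence as supp(X ∪ Y) / supp(X) and lift as conf / supp(Y); `RuleMetrics` and its method names are hypothetical.

```java
import java.util.*;

final class RuleMetrics {
    /** Fraction of transactions that contain every item in `items`. */
    static double support(List<Set<String>> db, Set<String> items) {
        long hits = db.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / db.size();
    }

    /** conf(X -> Y) = supp(X u Y) / supp(X) */
    static double confidence(List<Set<String>> db, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(db, both) / support(db, lhs);
    }

    /** lift(X -> Y) = conf(X -> Y) / supp(Y); above 1 means positive correlation. */
    static double lift(List<Set<String>> db, Set<String> lhs, Set<String> rhs) {
        return confidence(db, lhs, rhs) / support(db, rhs);
    }
}
```

On the example database this gives lift 4/3 ≈ 1.33 for B -> C & E and 8/9 ≈ 0.89 for C -> E & B, since supp({E,B}) = 3/4.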
See Top 10 algorithms in data mining (free access) or The Top Ten Algorithms in Data Mining. The latter gives a detailed description of the algorithm, together with details on how to get optimized implementations.
Well, I would assume you've read the Wikipedia entry, but you said "a basic example would make it a lot easier for me to understand". Wikipedia has just that, so I'll assume you haven't read it and suggest that you do.
Read the Wikipedia article.
The best introduction to Apriori can be downloaded from this book:
http://www-users.cs.umn.edu/~kumar/dmbook/index.php
You can download chapter 6 for free, which explains Apriori very clearly.
Moreover, if you want to download a Java version of Apriori and other algorithms for frequent itemset mining, you can check my website:
http://www.philippe-fournier-viger.com/spmf/

Resources