timetable scheduling using multidimensional priority table - algorithm

I need to schedule tasks on given dates for a timetable, based on users' votes. I have a table that contains the number of votes each task received for each date:
Tasks: {"A","B","C","D","E"},
Dates: {"1 Jan","2 Jan","5 Jan","7 Jan","10 Jan"},
Total users: 16
        |  A  B  C  D  E
=========================
1-Jan   |  6  3  1  4  2
2-Jan   |  1  3  4  6  2
5-Jan   |  2  3  3  1  7
7-Jan   |  6  1  2  3  4
10-Jan  |  1  6  5  3  1
For example, on 1-Jan, task A has 6 votes, task B has 3 votes, etc.
I need to schedule exactly one task per date.
How can I schedule the tasks? Can anybody help?
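One possible reading of the problem, which the question does not state explicitly, is the classic assignment problem: give each date a different task so that the total number of votes honoured is as large as possible. Under that assumption, a minimal Python sketch for a 5x5 table can simply try every permutation. The function name best_schedule is made up for illustration; for larger tables a dedicated method such as the Hungarian algorithm would be the usual choice.

```python
from itertools import permutations

tasks = ["A", "B", "C", "D", "E"]
dates = ["1-Jan", "2-Jan", "5-Jan", "7-Jan", "10-Jan"]
# votes[d][t] = number of votes for task t on date d (table from the question)
votes = [
    [6, 3, 1, 4, 2],
    [1, 3, 4, 6, 2],
    [2, 3, 3, 1, 7],
    [6, 1, 2, 3, 4],
    [1, 6, 5, 3, 1],
]

def best_schedule(votes):
    """Brute force: try every one-task-per-date assignment, keep the best total."""
    n = len(votes)
    best_total, best_perm = -1, None
    for perm in permutations(range(n)):  # perm[d] = task index given to date d
        total = sum(votes[d][t] for d, t in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm

total, perm = best_schedule(votes)
for d, t in enumerate(perm):
    print(dates[d], "->", tasks[t], f"({votes[d][t]} votes)")
print("total votes:", total)
```

For the table above the best achievable total is 27 votes; several different schedules reach it, and the sketch reports the first one it finds.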

Related

Finding tuples if it only exists in all occurrences of a constraint

Database (all entries are integers):
ID | BUDGET
1 | 20
8 | 20
10 | 20
5 | 4
9 | 4
10 | 4
1 | 11
9 | 11
Suppose my constraint is having a budget of >= 10.
I would want to return ID of 1 only in this case. How do I go about it?
I've tried taking the cross product of the relation with itself after selecting budget >= 10, and returning rows where id1 = id2 and budget1 <> budget2, but that does not work when there is only one budget that is >= 10 (example below).
ID | BUDGET
1 | 20
8 | 20
10 | 20
1 | 4
5 | 4
9 | 4
10 | 4
9 | 4
If I were to do what I did for the first example, nothing would be returned, as budget1 <> budget2 results in an empty table.
EDIT1: I can only use relational algebra to solve the problem, so SQL's EXISTS, WHERE and COUNT keywords can't be used.
EDIT2: Only project, select, rename, set difference, set union, left join, right join, full inner join, natural join, set intersection and cross product are allowed.
The question is not completely clear to me. If you want to return all the IDs for which there is a budget greater than or equal to 10 and no budget less than 10, the expression is simply the following:
π(ID)(σ(BUDGET>=10)(R)) - π(ID)(σ(BUDGET<10)(R))
If, on the other hand, you want all the IDs that are paired with every budget in the relation that is greater than or equal to 10, then we must use the ÷ operator:
R ÷ π(BUDGET)(σ(BUDGET>=10)(R))
From your comment, the second case is the correct one. Let’s see how to compute the division from its definition (applied to two generic relations R(A) and S(B)):
R ÷ S = π(A-B)(R) - π(A-B)((π(A-B)(R) × S) - R)
where R is the original relation, and
S = π(BUDGET)(σ(BUDGET>=10)(R)),
that is:
BUDGET
------
20
11
Starting from the inner expression:
π(A-B)(R) is equal to π(ID)(R) =
ID
--
1
5
8
9
10
then (π(A-B)(R) × S) is:
ID BUDGET
---------
1 20
1 11
5 20
5 11
8 20
8 11
9 20
9 11
10 20
10 11
then ((π(A-B)(R) × S) - R) is:
ID BUDGET
---------
5 20
5 11
8 11
9 20
10 11
then π(A-B)((π(A-B)(R) × S) - R) is:
ID
--
5
8
9
10
and, finally, subtracting this relation from π(A-B)(R) we obtain the result:
ID
--
1
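The same derivation can be checked mechanically. The following is only a sketch in Python, using sets of tuples as a stand-in for relations; the function name divide is made up for illustration, and each step uses nothing beyond projection, cross product and set difference.

```python
def divide(r, s):
    """R ÷ S for R(ID, BUDGET) given as a set of pairs and S(BUDGET) as a set of values."""
    ids = {i for i, _ in r}                        # projection of R on ID
    candidates = {(i, b) for i in ids for b in s}  # (projection of R on ID) x S
    missing = candidates - r                       # pairs that R lacks
    disqualified = {i for i, _ in missing}         # IDs missing at least one budget
    return ids - disqualified

# First example relation from the question
R = {(1, 20), (8, 20), (10, 20), (5, 4), (9, 4), (10, 4), (1, 11), (9, 11)}
S = {b for _, b in R if b >= 10}                   # {20, 11}
print(divide(R, S))                                # {1}

# Second example: only one budget is >= 10, so S = {20}
R2 = {(1, 20), (8, 20), (10, 20), (1, 4), (5, 4), (9, 4), (10, 4)}
S2 = {b for _, b in R2 if b >= 10}
print(sorted(divide(R2, S2)))                      # [1, 8, 10]
```

Because the definition never compares two different budgets with each other, it also behaves sensibly in the second example, where the self-join attempt returned nothing.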

How to find max level in each path in an Oracle hierarchical SQL?

I would like to know how to find the maximum level number within a given path in an Oracle hierarchical query.
For example, suppose the CONNECT BY clause starts at root 1 with the relation below.
parent_id  node_id  votes
NULL       1        -
1          2        10
2          3        12
3          4        11
1          20       5
20         30       20
20         40       4
40         50       22
Here the first 3 records belong to one path with max level 3.
The next 2 records belong to another path with max level 2.
The last two records belong to another path with max level 3.
I need output with the max level and the minimum votes within each distinct path:
parent_id  node_id  LEVEL  MAX_LEVL  MIN_VOTE
1          2        1      3         10
2          3        2      3         10
3          4        3      3         10
1          20       1      2         5
20         30       2      2         5
1          20       1      3         4
20         40       2      3         4
40         50       3      3         4
            1
            |
     ---------------
     |             |
     2             20
     |             |
     3       -------------
     |       |           |
     4       30          40
                         |
                         50
Thanks,
Guru
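To make the requested output concrete, here is a small Python sketch (not Oracle SQL) of one reading of the requirement: enumerate every root-to-leaf path, then report each edge on the path with its level, the path length as the max level, and the smallest vote on the path. The function name path_rows is made up for illustration.

```python
# (parent_id, node_id, votes) rows from the question, without the root row
edges = [(1, 2, 10), (2, 3, 12), (3, 4, 11),
         (1, 20, 5), (20, 30, 20), (20, 40, 4), (40, 50, 22)]

def path_rows(edges, root):
    """Return (parent_id, node_id, level, max_level, min_vote) per root-to-leaf path."""
    children = {}
    for parent, node, votes in edges:
        children.setdefault(parent, []).append((node, votes))
    out = []

    def walk(node, path):
        kids = children.get(node, [])
        if not kids and path:  # reached a leaf: emit the whole path
            max_level = len(path)
            min_vote = min(v for _, _, v in path)
            for level, (p, n, _) in enumerate(path, start=1):
                out.append((p, n, level, max_level, min_vote))
        for child, votes in kids:
            walk(child, path + [(node, child, votes)])

    walk(root, [])
    return out

for row in path_rows(edges, 1):
    print(row)
```

With the sample data this reproduces the eight rows of the desired output, including the repeated (1, 20) edge, which appears once for each of the two paths that pass through node 20.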

aggregate ordered rows in hive table

I have a table in Hive with 4 columns like this:
row_id | user_id | product_id | duration
1      | 1       | product1   | 3
2      | 1       | product1   | 1
3      | 1       | product2   | 6
4      | 1       | product3   | 2
5      | 1       | product1   | 4
6      | 1       | product4   | 3
7      | 1       | product4   | 5
8      | 1       | product4   | 7
9      | 2       | product4   | 3
10     | 2       | product4   | 6
I want to aggregate rows of the same product for each user, summing the duration and counting the clicks, but only when the rows are consecutive:
row_id | user_id | product_id | duration_per_product | clicks_per_product
1      | 1       | product1   | 4                    | 2
2      | 1       | product2   | 6                    | 1
3      | 1       | product3   | 2                    | 1
4      | 1       | product1   | 4                    | 1
5      | 1       | product4   | 15                   | 3
6      | 2       | product4   | 9                    | 2
Any ideas how to do that in Hive 1.1.0?
GROUP BY obviously doesn't work, because I only want to group rows of the same product when they are consecutive. I have tried CASE, LAG and LEAD, but without success.
Thanks!
First off, this is something you would normally want to do in a loop; Hive is not well suited to this kind of problem.
That being said, here is an approach that should work:
Suppose this is our dataset
1 1 product1 3
2 1 product1 1
3 1 product2 6
4 1 product1 4
Identify starter rows: 1,3,4
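This can be done with a left join of the table to itself on id = id + 1, checking whether user and product match the previous row; rows without a matching previous row are starters.
Join everything onto these starters by user and product (columns: starter id, row id):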
1 1
1 2
1 4
3 3
4 1
4 2
4 4
Filter out things that are in the wrong order (starter after row), remaining are:
1 1
1 2
1 4
3 3
4 4
Group to find the maximum valid starter for each row, remaining will be:
1 1
1 2
3 3
4 4
Now join to reattach the relevant dimensions
1 1 3
1 2 1
3 3 6
4 4 4
Now you can get the results by grouping on the starter id.
1 4
3 6
4 4
Of course you can now choose to use another join to attach the name of the product.
1 product1 4
3 product2 6
4 product1 4
And that is all!
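The steps above can be sketched outside Hive to check the logic. This is a minimal Python illustration, not HiveQL, and it assumes that row ids are consecutive integers, as the id = id + 1 join does; the function name aggregate_runs is made up for illustration.

```python
from collections import defaultdict

# (row_id, user_id, product_id, duration) rows from the question
rows = [
    (1, 1, "product1", 3), (2, 1, "product1", 1), (3, 1, "product2", 6),
    (4, 1, "product3", 2), (5, 1, "product1", 4), (6, 1, "product4", 3),
    (7, 1, "product4", 5), (8, 1, "product4", 7), (9, 2, "product4", 3),
    (10, 2, "product4", 6),
]

def aggregate_runs(rows):
    by_id = {rid: (user, product) for rid, user, product, _ in rows}
    # Step 1: a starter is a row whose previous row has a different user or product
    starters = [(rid, u, p) for rid, u, p, _ in rows if by_id.get(rid - 1) != (u, p)]
    totals = defaultdict(lambda: [0, 0])  # starter id -> [duration, clicks]
    for rid, u, p, dur in rows:
        # Steps 2-4: join to starters on user and product, drop starters that
        # come after the row, and keep the latest remaining starter
        starter = max(s for s, su, sp in starters if (su, sp) == (u, p) and s <= rid)
        # Steps 5-6: reattach the duration and group by starter id
        totals[starter][0] += dur
        totals[starter][1] += 1
    return [(by_id[s][0], by_id[s][1], d, c) for s, (d, c) in sorted(totals.items())]

for result in aggregate_runs(rows):
    print(result)
```

Run on the ten rows from the question, this yields the six groups of the desired output, with product1 for user 1 appearing twice because its rows are interrupted by product2 and product3.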

From edge or arc list to clusters in Stata

I have a Stata dataset that represents connections between users that looks like this:
src_user linked_user
1 2
2 3
3 5
1 4
6 7
I would like to get something like this:
user cluster
1 1
2 1
3 1
4 1
5 1
6 2
7 2
where isid user evaluates to TRUE and I have grouped all users into disjoint clusters. I have tried thinking of this as a reshape problem, but without much success. None of the user-written SNA commands seem to accomplish this as far as I can tell.
What is the most efficient way of doing this in Stata, other than looping, which I am eager to avoid?
If you reshape the data to long form, you can use group_id (from SSC) to get what you want.
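clear
input user1 user2
1 2
2 3
3 5
1 4
6 7
end
* give each edge an id, then stack both endpoints into a single user column
gen id = _n
reshape long user, i(id) j(n)
* start with one cluster per edge
clonevar cluster = id
list, sepby(cluster)
* merge clusters that share a user (group_id comes from SSC)
group_id cluster, match(user)
* keep one row per user within each cluster
bysort cluster user (id): keep if _n == 1
list, sepby(cluster)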
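For comparison, what the merging step computes is the set of connected components of the graph. The following is a minimal Python sketch of that idea using a union-find structure; it is not Stata and says nothing about how group_id is implemented, and the function name clusters is made up for illustration.

```python
def clusters(edges):
    """Label each user with a cluster number, one per connected component."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # smaller id becomes the root

    labels, result = {}, {}
    for user in sorted(parent):
        root = find(user)
        labels.setdefault(root, len(labels) + 1)  # number clusters 1, 2, ...
        result[user] = labels[root]
    return result

edges = [(1, 2), (2, 3), (3, 5), (1, 4), (6, 7)]
print(clusters(edges))
# {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2}
```

Each user appears exactly once in the result, which corresponds to the isid user condition in the question.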

Sort on specific columns, output only one of those identical but having the highest number in another column

I have records like these:
1 4 6 4 2 4 8
2 3 5 4 6 7 1
5 4 6 4 3 8 4
1 4 6 4 5 7 1
5 7 3 3 3 6 3
6 7 3 3 4 8 4
I want to sort them on columns 2, 3, 4 and 6, and from each group of records that are identical in columns 2, 3 and 4 keep only the one with the biggest number in column 6, such as:
1 4 6 4 5 7 1
2 3 5 4 6 7 1
5 4 6 4 3 8 4
5 7 3 3 3 6 3
6 7 3 3 4 8 4
I have tried all kinds of combinations of sort and uniq, but everything fails because uniq cannot be applied to a specific column. The only thing I came up with is to sort as above, then reorder the columns so that columns 2, 3 and 4 come last, and then run uniq with -w so that it focuses only on those 3 columns. This seems quite inefficient to me.
Thanks for help!
You can achieve this with two passes of sort (assuming I understand your requirement correctly, since the desired output posted above does not match your description of it). The first pass sorts by fields 2 through 4 ascending and field 6 descending. The second pass sorts on fields 2 through 4 only, adding the stable-sort (-s) and unique (-u) flags, so that for each combination of fields 2-4 only the first row, the one with the highest value in field 6, is kept. Each field gets its own numeric key, because a numeric key that spans several fields, such as -k2,4n, is compared only on its leading number.
sort -k2,2n -k3,3n -k4,4n -k6,6nr file.txt | sort -k2,2n -k3,3n -k4,4n -s -u
2 3 5 4 6 7 1
5 4 6 4 3 8 4
6 7 3 3 4 8 4
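The same selection can be cross-checked with a short Python sketch, assuming whitespace-separated integer fields as in the sample; the function name keep_best is made up for illustration. It keeps, for every combination of fields 2-4, the line with the largest field 6.

```python
records = """\
1 4 6 4 2 4 8
2 3 5 4 6 7 1
5 4 6 4 3 8 4
1 4 6 4 5 7 1
5 7 3 3 3 6 3
6 7 3 3 4 8 4
""".splitlines()

def keep_best(lines):
    """For each (field 2, field 3, field 4) key keep the line with the largest field 6."""
    best = {}
    for line in lines:
        f = [int(x) for x in line.split()]
        key = (f[1], f[2], f[3])  # fields 2-4, zero-based indices 1-3
        if key not in best or f[5] > best[key][0]:
            best[key] = (f[5], line)
    return [best[k][1] for k in sorted(best)]  # output ordered by fields 2-4

for line in keep_best(records):
    print(line)
```

On the sample records this prints the same three lines as the sort pipeline.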
