Fast algorithm for simple data grouping

There are several billion rows like this:
id | type | groupId
---+------+--------
1 | a |
1 | b |
2 | a |
2 | c |
1 | a |
2 | d |
2 | a |
1 | e |
5 | a |
1 | f |
4 | a |
1 | b |
4 | a |
1 | t |
8 | a |
3 | c |
6 | a |
I need to add a groupId to this data. If two rows have the same id or the same type, they belong to the same group. The result should look like this:
id | type | group
---+------+--------
1 | a | 1
1 | b | 1
2 | a | 1
2 | c | 1
1 | a | 1
2 | d | 1
2 | a | 1
1 | e | 1
5 | a | 1
1 | f | 1
4 | a | 1
1 | b | 1
4 | a | 1
7 | t | 2
8 | g | 3
3 | c | 1
6 | a | 1
I tried to do this with a loop, but it is very inefficient: the server would need weeks to finish.

This is a classic case for the Quick-Union (union-find) algorithm, in its weighted form with path compression.
Computational limits
Time complexity for grouping N rows: O(N log* N), where log* N is the number of times lg must be applied to N before the result drops to 1 or below. For example, log* of 10^100 is 5.
Space complexity: O(N)
Read more on this algorithm:
https://www.youtube.com/watch?v=MaNCMWhYIHo
https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
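As a rough illustration of how union-find could be applied to this problem, the sketch below treats every distinct id and every distinct type as a node and unions the two nodes of each row. The class and function names (`DisjointSet`, `assign_groups`) are made up for this example, and it assumes the distinct ids and types fit in memory; it is not taken from the linked material.

```python
class DisjointSet:
    """Weighted quick-union with path compression over integer node indices."""

    def __init__(self):
        self.parent = []
        self.size = []

    def add(self):
        """Create a new singleton set and return its index."""
        self.parent.append(len(self.parent))
        self.size.append(1)
        return len(self.parent) - 1

    def find(self, x):
        # First locate the root, then point every node on the path at it.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one (weighting).
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def assign_groups(rows):
    """rows: iterable of (id, type). Returns one group number per row."""
    ds = DisjointSet()
    node_of = {}  # ("id", value) or ("type", value) -> node index

    def node(key):
        if key not in node_of:
            node_of[key] = ds.add()
        return node_of[key]

    rows = list(rows)
    # Pass 1: each row links its id node with its type node.
    for row_id, row_type in rows:
        ds.union(node(("id", row_id)), node(("type", row_type)))

    # Pass 2: number the groups in order of first appearance.
    labels = {}
    groups = []
    for row_id, _ in rows:
        root = ds.find(node_of[("id", row_id)])
        groups.append(labels.setdefault(root, len(labels) + 1))
    return groups


# Rows taken from the expected-result table in the question.
sample = [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (1, "a"), (2, "d"),
          (2, "a"), (1, "e"), (5, "a"), (1, "f"), (4, "a"), (1, "b"),
          (4, "a"), (7, "t"), (8, "g"), (3, "c"), (6, "a")]
sample_groups = assign_groups(sample)
```

Because the nodes are distinct ids and types rather than rows, memory grows with the number of distinct values, and the rows themselves only need to be streamed twice.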


Laravel leftJoin returns null from 2nd table

I have two tables. The first is duty_sheets:
centerId | centerName | p1 | p2 | p3 | p4 | ...p22 | examId
1 | xyz | 1 | 5 | 8 | 7 | 1 | 1
2 | abc | 9 | 1 | 6 | 6 | 1 | 1
The second is feedback:
id | centerId | inspectorId | A | B | C | examId
1 | 1 | 1 | 1 | 5 | 8 | 1
2 | 2 | 9 | 9 | 1 | 6 | 1
Here is my code:
$center = DutySheet::select('duty_sheets.centerId', 'duty_sheets.centerName', 'feedback.id')
    ->leftJoin('feedback', function ($leftJoin) {
        $leftJoin->on('duty_sheets.examId', 'feedback.examId')
            ->where("duty_sheets.centerId", 'feedback.centerId')
            ->where("feedback.inspectorId", 1);
    })
    ->where("duty_sheets.examId", 1)
    ->where("p20", 1)
    ->get();
dd($center);
It should retrieve all rows from DutySheet where p20 = 1 and duty_sheets.examId = 1, plus the relevant rows from feedback matched on centerId, inspectorId and examId.
The problem is that the query returns feedback.id as null even though matching records with those ids exist in the feedback table.
Laravel version = 9
The problem is in the left join:
->where("duty_sheets.centerId", 'feedback.centerId')
This builds a where clause that compares the column against the string value 'feedback.centerId':
duty_sheets.centerId='feedback.centerId'
You need to use
->on("duty_sheets.centerId",'=', 'feedback.centerId')
or
->whereColumn("duty_sheets.centerId", 'feedback.centerId')
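The difference between comparing against a string value and comparing against a column can be reproduced outside Laravel. The sketch below uses Python's built-in sqlite3 module, not Eloquent, with a cut-down copy of the two tables; the p20 values are an assumption, since the question does not show that column. It runs the same join twice, once with the quoted string and once with the real column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE duty_sheets (centerId INTEGER, centerName TEXT, p20 INTEGER, examId INTEGER);
    CREATE TABLE feedback (id INTEGER, centerId INTEGER, inspectorId INTEGER, examId INTEGER);
    -- p20 = 1 is assumed for both rows; the question elides this column.
    INSERT INTO duty_sheets VALUES (1, 'xyz', 1, 1), (2, 'abc', 1, 1);
    INSERT INTO feedback VALUES (1, 1, 1, 1), (2, 2, 9, 1);
""")

QUERY = """
    SELECT d.centerId, d.centerName, f.id
    FROM duty_sheets AS d
    LEFT JOIN feedback AS f
      ON d.examId = f.examId
     AND d.centerId = {rhs}
     AND f.inspectorId = 1
    WHERE d.examId = 1 AND d.p20 = 1
    ORDER BY d.centerId
"""

# Right-hand side is a string literal: no centerId ever equals it.
literal_rows = conn.execute(QUERY.format(rhs="'feedback.centerId'")).fetchall()
# Right-hand side is the actual column: the join can match.
column_rows = conn.execute(QUERY.format(rhs="f.centerId")).fetchall()

print(literal_rows)
print(column_rows)
```

With the string literal every feedback.id comes back as NULL, which is the symptom described in the question. With the column comparison, center 1 picks up its feedback row, while center 2 stays NULL only because its feedback row has inspectorId 9.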

Combinations of values in a table to get the sum of each combination

I have a table with numeric data, and I need to build the different combinations of its values.
For example:
| A |
|---|
| 1 |
| 2 |
| 3 |
| 4 |
I need to combine this single column to get the following result:
| A | B | C | D |
| - | - | - | - |
| 1 | | | |
| 1 | 2 | | |
| 1 | 2 | 3 | |
| 1 | 2 | 3 | 4 |
| 1 | 2 | | 4 |
| 1 | | 3 | |
| 1 | | 3 | 4 |
| 1 | | | 4 |
| | 2 | | |
| | 2 | 3 | |
| | 2 | 3 | 4 |
| | 2 | | 4 |
| | | 3 | |
| | | 3 | 4 |
| | | | 4 |
At the end of the table, I have to add one column with the count of columns that contain data in each row, and another column with the sum of the numbers in that row.
Maybe it sounds very difficult or impossible, but I haven't found a way to make it work.
I tried a "Cross Join" in SQL but didn't get the expected result.
Help!
You can solve this by counting in binary, using one binary digit per number in the set and stopping when every digit is 1. For example, the starting set 2, 5, 6, 8 has four numbers, so the count ends at 1111. Each digit of the binary number decides whether the corresponding number is shown in that row. Here's a table of how it would work:
| A |
|---|
| 2 |
| 5 |
| 6 |
| 8 |
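| A | B | C | D | Binary | Row number |
| - | - | - | - | ------ | ---------- |
|   |   |   | 8 | 0001   | 1          |
|   |   | 6 |   | 0010   | 2          |
|   |   | 6 | 8 | 0011   | 3          |
|   | 5 |   |   | 0100   | 4          |
|   | 5 |   | 8 | 0101   | 5          |
|   | 5 | 6 |   | 0110   | 6          |
|   | 5 | 6 | 8 | 0111   | 7          |
| 2 |   |   |   | 1000   | 8          |
| 2 |   |   | 8 | 1001   | 9          |
| 2 |   | 6 |   | 1010   | 10         |
| 2 |   | 6 | 8 | 1011   | 11         |
| 2 | 5 |   |   | 1100   | 12         |
| 2 | 5 |   | 8 | 1101   | 13         |
| 2 | 5 | 6 |   | 1110   | 14         |
| 2 | 5 | 6 | 8 | 1111   | 15         |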
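The binary-counting idea can be sketched in a few lines. The function name `combinations_by_binary` is invented for this example, and the count and sum columns are the two extra columns the question asks for.

```python
def combinations_by_binary(values):
    """Yield (row_number, bits, picked, count, total) for every non-empty subset."""
    n = len(values)
    for row in range(1, 2 ** n):
        # Zero-padded binary form of the row number, one digit per value.
        bits = format(row, "0{}b".format(n))
        # Keep a value where its digit is 1, leave the cell empty otherwise.
        picked = [v if bit == "1" else None for v, bit in zip(values, bits)]
        chosen = [v for v in picked if v is not None]
        yield row, bits, picked, len(chosen), sum(chosen)


rows = list(combinations_by_binary([2, 5, 6, 8]))
for row, bits, picked, count, total in rows:
    cells = " | ".join(" " if v is None else str(v) for v in picked)
    print("| {} | {} | {} | {} | {} |".format(cells, bits, row, count, total))
```

For n values this produces 2^n - 1 rows, so it is only practical for small sets.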

LINQ code that counts employees by gender in each position, grouped by department, and places the result in a matrix table

I want to ask how to write LINQ code that can fill my HTML table.
Please look at my tables below.
Table EMP (note: "Male" is a boolean):
+----+------+--------+---------+--------+
| id | Male | JS_REF | DEPT_ID | POS_ID |
+----+------+--------+---------+--------+
| 1  | 1    | 1      | 2       | 3      |
| 2  | 0    | 2      | 2       | 3      |
| 3  | 1    | 3      | 1       | 2      |
| 4  | 1    | 2      | 4       | 2      |
| 5  | 1    | 1      | 5       | 5      |
| 6  | 0    | 4      | 6       | 1      |
| 7  | 1    | 1      | 1       | 1      |
| 8  | 0    | 2      | 2       | 3      |
+----+------+--------+---------+--------+
Table: JOB_STATUS
+----+--------+--------------+
| id | JS_REF | JS_TITLE     |
+----+--------+--------------+
| 1  | 1      | Undefined    |
| 2  | 2      | Regular      |
| 3  | 3      | Contructual  |
| 4  | 4      | Probationary |
+----+--------+--------------+
Table: DEPTS
+----+---------+------------+
| id | DEPT_ID | DEPT_NAME  |
+----+---------+------------+
| 1  | 1       | Admin      |
| 2  | 2       | Accounting |
| 3  | 3       | Eginnering |
| 4  | 4       | HR         |
+----+---------+------------+
Table: POSITIONS
+----+--------+------------+
| id | POS_ID | DEPT_NAME  |
+----+--------+------------+
| 1  | 1      | Clerk      |
| 2  | 2      | Accountant |
| 3  | 3      | Bookeeper  |
| 4  | 4      | Assistant  |
| 5  | 5      | Mechanic   |
| 6  | 6      | Staff      |
+----+--------+------------+
I made a static table showing what the outcome of the LINQ code should be.
Here's what I've tried so far:
SELECT TB.DEPT_NAME, TB.JS_TITLE, TB.Male, TB.Female, (TB.Male + TB.Female) AS 'Total Employees'
FROM
(
    SELECT JS_TITLE, DEPT_NAME,
        SUM(CASE WHEN MALE = 1 THEN 1 ELSE 0 END) AS Male,
        SUM(CASE WHEN MALE = 0 THEN 1 ELSE 0 END) AS Female
    FROM EMP
    LEFT JOIN JOB_STATUS ON JOB_STATUS.JS_REF = EMP.JS_REF
    LEFT JOIN DEPTS ON DEPTS.DEPT_ID = EMP.DEPT_ID
    GROUP BY JS_TITLE, DEPT_NAME
) AS TB
ORDER BY CASE WHEN TB.MALE IS NULL THEN 1 ELSE 0 END
I'm stuck on this part, so any help or tips on how to implement it would be appreciated.
101 is the total count for male and 23 for female. (The values were just copied and pasted, which is why they are the same.)

How to sort data by hierarchy

Let's say I have some data like this:
Name | ID | ParentID | Level
------------+-----+----------+-------
Fruits | 1 | 0 | 1
Vegetables | 2 | 0 | 1
Apple | 3 | 1 | 2
Banana!! | 4 | 1 | 2
Tomato | 5 | 2 | 2
Potato | 6 | 2 | 2
red | 7 | 5 | 3
green | 8 | 5 | 3
How can I sort (compare) this data to get a result like this:
Name | ID | ParentID | Level
------------+-----+----------+---------
Fruits | 1 | 0 | 1
Apple | 3 | 1 | 2
Banana!! | 4 | 1 | 2
Vegetables | 2 | 0 | 1
Tomato | 5 | 2 | 2
red | 7 | 5 | 3
green | 8 | 5 | 3
Potato | 6 | 2 | 2
The background is that I have a model collection containing models, and I want to add them in the order of the hierarchy given by ID/ParentID.
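One possible way to get this ordering, offered only as a sketch since the thread gives no answer, is a depth-first walk: group the rows by ParentID, then emit each row followed immediately by its children. The function name `sort_by_hierarchy` is made up, and the sketch assumes root rows have ParentID 0 and that siblings should keep their input order.

```python
from collections import defaultdict


def sort_by_hierarchy(rows, root_parent=0):
    """rows: list of dicts with 'ID' and 'ParentID'. Returns rows in depth-first order."""
    # Index the rows by parent so children can be looked up directly.
    children = defaultdict(list)
    for row in rows:
        children[row["ParentID"]].append(row)

    ordered = []
    # Explicit stack instead of recursion; reversed so the first sibling is popped first.
    stack = list(reversed(children[root_parent]))
    while stack:
        row = stack.pop()
        ordered.append(row)
        stack.extend(reversed(children[row["ID"]]))
    return ordered


data = [
    {"Name": "Fruits", "ID": 1, "ParentID": 0, "Level": 1},
    {"Name": "Vegetables", "ID": 2, "ParentID": 0, "Level": 1},
    {"Name": "Apple", "ID": 3, "ParentID": 1, "Level": 2},
    {"Name": "Banana!!", "ID": 4, "ParentID": 1, "Level": 2},
    {"Name": "Tomato", "ID": 5, "ParentID": 2, "Level": 2},
    {"Name": "Potato", "ID": 6, "ParentID": 2, "Level": 2},
    {"Name": "red", "ID": 7, "ParentID": 5, "Level": 3},
    {"Name": "green", "ID": 8, "ParentID": 5, "Level": 3},
]
sorted_names = [row["Name"] for row in sort_by_hierarchy(data)]
```

Rows whose parent never appears in the data are silently dropped by this sketch, so orphaned rows would need separate handling.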

Confused: why would correlation be "--" in Statsample?

I am very new to statsample and have some basic questions. With this sample data:
[[1, 2, 3, 3],[2, 3, 3, 5],[4, 1, 3, 4]]
I create a 4x4 statsample dataset called ds and get the following output for each call:
puts ds.summary
prints:
= Dataset 1
Cases: 3
Element:[actuals]
== Vector 3
n :3
n valid:3
factors:3
mode: 3
Distribution
+---+---+---------+
| 3 | 3 | 100.00% |
+---+---+---------+
Element:[mids]
== Vector 2
n :3
n valid:3
factors:1,2,3
mode: 2
Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 3 | 1 | 33.33% |
+---+---+--------+
Element:[predicteds]
== Vector 4
n :3
n valid:3
factors:3,4,5
mode: 3
Distribution
+---+---+--------+
| 3 | 1 | 33.33% |
| 4 | 1 | 33.33% |
| 5 | 1 | 33.33% |
+---+---+--------+
Element:[prediction_error]
== Vector 5
n :3
n valid:3
factors:0,1,2
mode: 0
Distribution
+---+---+--------+
| 0 | 1 | 33.33% |
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
+---+---+--------+
Element:[uids]
== Vector 1
n :3
n valid:3
factors:1,2,4
mode: 1
Distribution
+---+---+--------+
| 1 | 1 | 33.33% |
| 2 | 1 | 33.33% |
| 4 | 1 | 33.33% |
+---+---+--------+
Which seems reasonable but then:
cm = ds.correlation_matrix
puts cm.summary
prints this, which is confusing:
Correlation Matrix
+------------------+---------+-------+------------+------------------+-------+
| | actuals | mids | predicteds | prediction_error | uids |
+------------------+---------+-------+------------+------------------+-------+
| actuals | 1.000 | -- | -- | -- | -- |
| mids | -- | 1.000 | -- | -- | -- |
| predicteds | -- | -- | 1.000 | -- | -- |
| prediction_error | -- | -- | -- | 1.000 | -- |
| uids | -- | -- | -- | -- | 1.000 |
+------------------+---------+-------+------------+------------------+-------+
You created a dataset with nominal vectors, not scalar ones, so the correlation between these non-numeric vectors is always 0.
