I have a Student list with the following data:
/*-------------------------
| Student                 |
---------------------------
| ID  | Name   | Dept     |
---------------------------
| 101 | Peter  | IT       |
| 102 | John   | IT       |
| 103 | Ronald | Mech     |
| 104 | Sam    | Comp     |
-------------------------*/
Another list, say Extra, has the following data:
/*----------------------
| StudentId | Dept     |
------------------------
| 101       | Civil    |
| 103       | Chemical |
----------------------*/
Now I want the following result:
/*---------------------------
| Student                   |
-----------------------------
| ID  | Name   | Dept       |
-----------------------------
| 101 | Peter  | Civil      |
| 102 | John   | IT         |
| 103 | Ronald | Chemical   |
| 104 | Sam    | Comp       |
---------------------------*/
Currently I have written the following logic:
foreach (var item in Extra)
{
    // Search for item.StudentId in the Student list
    // Update that student's Dept with item.Dept
}
I need a more efficient way (I don't want to write the iteration myself) using LINQ.
Try something like this; I didn't test it, though. Note that it needs a left outer join (join ... into with DefaultIfEmpty) so that students without a matching Extra row keep their original Dept:
var query = from s in student
            join e in extra on s.ID equals e.StudentId into g
            from e in g.DefaultIfEmpty()
            select new { s.ID, s.Name, Dept = e != null ? e.Dept : s.Dept };
LINQ uses iteration internally too, so there is no performance benefit to using LINQ. Because LINQ has very general implementations, it might even be slower: it can't make assumptions about your data, which you can make in your custom loop.
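To illustrate: if the lists are large, the usual fast plain-loop approach is a single pass over Student with a dictionary built from Extra, which avoids searching the list once per update. A minimal sketch, assuming StudentId is unique within Extra and Dept is a settable property:
using System.Linq;

// Build an O(1) lookup of the overrides, then patch the students in one pass.
var deptById = extra.ToDictionary(e => e.StudentId, e => e.Dept);

foreach (var s in student)
{
    if (deptById.TryGetValue(s.ID, out var newDept))
        s.Dept = newDept; // overwrite only when an Extra row exists
}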
First, I've checked other topics on the subject, like this one: How to transpose/pivot data in hive?, but that doesn't match what I want.
So this is the table I have:
| ID | Day | Status |
| -- | --- | ------ |
| 1  | 1   | A      |
| 2  | 10  | B      |
| 3  | 101 | A      |
| 3  | 322 | B      |
| 3  | 102 | C      |
| 3  | 354 | D      |
And I'd like to concatenate the different Status values for each ID, ordered by Day, in order to have this:
| ID | Status  |
| -- | ------- |
| 1  | A       |
| 2  | B       |
| 3  | A,C,B,D |
The thing is that I don't know how many statuses there can be, so I can't just create one column per day: I don't know how many day/status pairs I'll have, which is why I can't see how to adapt the group_map answers (or the others) from those topics to my problem.
Thanks for helping me ^^
Use collect_set (for distinct values) or collect_list to aggregate into an array, and concatenate it using concat_ws:
select ID, concat_ws(',',collect_list(Status)) as Status
from table
group by ID;
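One caveat: collect_list does not guarantee any particular order, and here the statuses need to be ordered by Day. A commonly used workaround is to sort each ID's rows before aggregating; the following is only a sketch of that pattern (Hive does not strictly promise that collect_list preserves the reducer-side sort order, but in practice this works):
select ID, concat_ws(',', collect_list(Status)) as Status
from
(   select ID, Day, Status
    from table
    distribute by ID   -- send all rows for an ID to the same reducer
    sort by ID, Day    -- sort them by Day within that reducer
) t
group by ID;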
I want to join two tables in Oracle and list the results concatenated if they have the same parent id, in this way:
Table All
| id  | other fields | service_id |
| --- | ------------ | ---------- |
| 827 | xxxxxxxx     | null       |
| 828 | xxxxxxxx     | 327        |
| 829 | xxxxxxxx     | 328        |
| 860 | xxxxxxxx     | null       |
| 861 | xxxxxxxx     | 326        |
Table Services
| id  | parent_id |
| --- | --------- |
| 326 | 860       |
| 327 | 827       |
| 328 | 827       |
I want a query that returns this
| id  | sub_id  |
| --- | ------- |
| 827 | 828,829 |
| 828 | null    |
| 829 | null    |
| 860 | 861     |
| 861 | null    |
Thanks a lot!
By joining the All table to Services and then back to All under an alias, you can get the ID and sub-IDs in a list that then just needs to be combined from multiple rows into one column per ID.
This should get you the "raw data" that then needs to be aggregated...
TESTED: Rextester working example
NOTE: Since you have different "LEVELS" of depth for your sub_ID, I'm not really sure what you want, so 860 and 123 aren't included, because they come from a completely different field than the source of 827.
SELECT A.ID as ID, A2.ID as Sub_ID
FROM ALL A
LEFT JOIN SERVICES S
on S.Parent_ID = A.ID
LEFT JOIN ALL A2
on A2.Service_ID = S.ID
Now... if we assume that you have a version of Oracle which supports LISTAGG:
SELECT A.ID as ID, ListAgg(A2.ID,',') within group (Order by A2.ID) as Sub_ID
FROM ALL A
LEFT JOIN SERVICES S
on S.Parent_ID = A.ID
LEFT JOIN ALL A2
on A2.Service_ID = S.ID
GROUP BY A.ID
Giving Us:
+----+-----+---------+
|    | ID  | SUB_ID  |
+----+-----+---------+
| 1  | 827 | 828,829 | -- These are all.id's...
| 2  | 828 | NULL    |
| 3  | 829 | NULL    |
| 4  | 860 | NULL    | --> Now why is 123 present as it's a service.id
| 5  | 861 | NULL    |
+----+-----+---------+
**Note: ALL is a reserved word and either needs to be escaped or, if your table name really isn't ALL, adjust accordingly.
LISTAGG Docs
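If the table really is named ALL, the escaped form would look like the sketch below. Quoted identifiers are case-sensitive in Oracle, so "ALL" assumes the table was created with the default uppercase name:
SELECT A.ID as ID, ListAgg(A2.ID,',') within group (Order by A2.ID) as Sub_ID
FROM "ALL" A
LEFT JOIN SERVICES S
on S.Parent_ID = A.ID
LEFT JOIN "ALL" A2
on A2.Service_ID = S.ID
GROUP BY A.ID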
I'm considering whether I should format a table in my SQLite database in "wide" or "long" format. Examples of these formats are included at the end of the question.
I anticipate that the majority of my requests will be of the form:
SELECT * FROM table
WHERE
    series IN ('series1', 'series100');
or the analogue that selects by columns in the wide format.
I also anticipate that there will be a large number of columns, possibly even enough to need to increase SQLite's default column limit.
Are there any general guidelines for selecting a table layout that will optimize query performance for this sort of case?
(Examples of each)
"Wide" format:
| date       | series1 | series2 | ...  | seriesN |
| ---------- | ------- | ------- | ---- | ------- |
| "1/1/1900" | 15      | 24      | 43   | 23      |
| "1/2/1900" | 15      | null    | null | 23      |
| ...        | 15      | null    | null | 23      |
| "1/2/2019" | 12      | 12      | 4    | null    |
"Long" format:
| date       | series  | value |
| ---------- | ------- | ----- |
| "1/1/1900" | series1 | 15    |
| "1/2/1900" | series1 | 15    |
| ...        | series1 | 43    |
| "1/2/2019" | series1 | 12    |
| "1/1/1900" | series2 | 15    |
| "1/2/1900" | series2 | 15    |
| ...        | series2 | 43    |
| "1/2/2019" | series2 | 12    |
| ...        | ...     | ...   |
| "1/1/1900" | seriesN | 15    |
| "1/2/1900" | seriesN | 15    |
| ...        | seriesN | 43    |
| "1/2/2019" | seriesN | 12    |
The "long" format is the preferred way to go here, for so many reasons. First, if you use the "wide" format and there is ever a need to add more series, then you would have to add new columns to the database table. While this is not too much of a hassle, in general once you put a schema into production, you want to avoid further schema changes.
Second, the "long" format makes reporting and querying much easier. For example, suppose you wanted to get a count of rows/data points for each series. Then you would only need something like:
SELECT series, COUNT(*) AS cnt
FROM yourTable
GROUP BY series;
To get this report with the "wide" format, you would need a lot more code, and it would be as verbose as your sample data above.
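For example, a sketch of that wide-format count (assuming columns series1 through seriesN; COUNT(column) counts only the non-null cells) needs one UNION ALL branch per series column:
SELECT 'series1' AS series, COUNT(series1) AS cnt FROM yourTable
UNION ALL
SELECT 'series2', COUNT(series2) FROM yourTable
UNION ALL
-- ...one branch for every remaining series column...
SELECT 'seriesN', COUNT(seriesN) FROM yourTable;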
The thing to keep in mind here is that SQL databases are built to operate on sets of records (read: across rows). They can also process things column-wise, but they are generally not set up to do this.
Originally I have a structure like this:
+-------+-------+----+----+----+-----+
| time  | type  | s1 | s2 | id | p1  |
+-------+-------+----+----+----+-----+
| 10:30 | send  | a  | b  | 1  | 110 |
| 10:35 | send  | c  | d  | 1  | 120 |
| 10:31 | reply | e  | f  | 3  | 221 |
| 10:33 | reply | a  | c  | 1  | 210 |
| 10:34 | send  | a  | a  | 3  | 113 |
| 10:32 | reply | c  | d  | 3  | 157 |
+-------+-------+----+----+----+-----+
I want to normalize the table:
group the entries by id,
inside each group, find the oldest send type entry,
replace the s1, s2 of the other entries with the values from that oldest send type entry, so that it looks like this:
+-------+-------+----+----+----+-----+
| time  | type  | s1 | s2 | id | p1  |
+-------+-------+----+----+----+-----+
| 10:30 | send  | a  | b  | 1  | 110 |
| 10:35 | send  | a  | b  | 1  | 120 |
| 10:33 | reply | a  | b  | 1  | 210 |
| 10:31 | reply | a  | a  | 3  | 221 |
| 10:34 | send  | a  | a  | 3  | 113 |
| 10:32 | reply | a  | a  | 3  | 157 |
+-------+-------+----+----+----+-----+
This is how I tried to tackle the problem:
events_groupby_id = GROUP events BY id;
events_normalized = FOREACH events_groupby_id {
    f_reqs = FILTER events BY type matches 'send';
    o_reqs = ORDER f_reqs BY time ASC;  -- order the send entries only, not the whole group
    req = LIMIT o_reqs 1;
    GENERATE req, events;
};
I am stuck here, because events_normalized has become a complicated structure with nested bags and I don't know how to flatten it correctly:
events_normalized | req:bag{:tuple()} | events:bag{:tuple()}
From here, what should I do to achieve the data structure that I want? I would really appreciate it if anyone could help me out. Thank you.
You can unnest the bags in events_normalized using FLATTEN:
events_flattened = FOREACH events_normalized GENERATE
FLATTEN(req),
FLATTEN(events);
This creates a cross product between req and events, but since there is only one tuple in req, you end up with exactly one record for each of your original entries. The schema of events_flattened is:
req::time | req::type | req::s1 | req::s2 | req::id | req::p1 | events::time | events::type | events::s1 | events::s2 | events::id | events::p1
So now you can refer to the fields you wish to keep, using events:: for the original entries and req:: for the replacements from the oldest send type entry:
final = FOREACH events_flattened GENERATE
events::time AS time,
events::type AS type,
req::s1 AS s1,
req::s2 AS s2,
events::id AS id,
events::p1 AS p1;
Oracle has two functions, rank() and dense_rank(), which I've found very useful for some applications. I am doing something in MySQL now and was wondering whether it has anything equivalent to those?
Nothing directly equivalent, but you can fake it with some (not terribly efficient) self-joins. Some sample code from a collection of MySQL query how-tos:
SELECT v1.name, v1.votes, COUNT(v2.votes) AS Rank
FROM votes v1
JOIN votes v2 ON v1.votes < v2.votes OR (v1.votes=v2.votes and v1.name = v2.name)
GROUP BY v1.name, v1.votes
ORDER BY v1.votes DESC, v1.name DESC;
+-------+-------+------+
| name  | votes | Rank |
+-------+-------+------+
| Green |    50 |    1 |
| Black |    40 |    2 |
| White |    20 |    3 |
| Brown |    20 |    3 |
| Jones |    15 |    5 |
| Smith |    10 |    6 |
+-------+-------+------+
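To fake dense_rank() instead, count distinct higher vote values; this variant (just a sketch, and just as inefficient on large tables) would give Jones rank 4 rather than 5:
SELECT v1.name, v1.votes, COUNT(DISTINCT v2.votes) + 1 AS dense_rank
FROM votes v1
LEFT JOIN votes v2 ON v2.votes > v1.votes
GROUP BY v1.name, v1.votes
ORDER BY v1.votes DESC, v1.name DESC;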
How about this dense_rank() implementation in MySQL?
CREATE TABLE `person` (
`id` int(11) DEFAULT NULL,
`first_name` varchar(20) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`gender` char(1) DEFAULT NULL);
INSERT INTO `person` VALUES
(1,'Bob',25,'M'),
(2,'Jane',20,'F'),
(3,'Jack',30,'M'),
(4,'Bill',32,'M'),
(5,'Nick',22,'M'),
(6,'Kathy',18,'F'),
(7,'Steve',36,'M'),
(8,'Anne',25,'F'),
(9,'Mike',25,'M');
The data before dense_rank() looks like this:
mysql> select * from person;
+------+------------+------+--------+
| id   | first_name | age  | gender |
+------+------------+------+--------+
|    1 | Bob        |   25 | M      |
|    2 | Jane       |   20 | F      |
|    3 | Jack       |   30 | M      |
|    4 | Bill       |   32 | M      |
|    5 | Nick       |   22 | M      |
|    6 | Kathy      |   18 | F      |
|    7 | Steve      |   36 | M      |
|    8 | Anne       |   25 | F      |
|    9 | Mike       |   25 | M      |
+------+------------+------+--------+
9 rows in set (0.00 sec)
The data after dense_rank() looks like this, including the "partition by" behavior:
+------------+--------+------+------+
| first_name | gender | age  | rank |
+------------+--------+------+------+
| Anne       | F      |   25 |    1 |
| Jane       | F      |   20 |    2 |
| Kathy      | F      |   18 |    3 |
| Steve      | M      |   36 |    1 |
| Bill       | M      |   32 |    2 |
| Jack       | M      |   30 |    3 |
| Mike       | M      |   25 |    4 |
| Bob        | M      |   25 |    4 |
| Nick       | M      |   22 |    5 |
+------------+--------+------+------+
9 rows in set (0.00 sec)
The query statement is (note the DISTINCT inside group_concat: without it, tied ages would occupy two positions in the list and the value after a tie would get rank() rather than dense_rank() semantics, i.e. Nick would get 6 instead of 5):
select first_name, t1.gender, age,
       FIND_IN_SET(age, t1.age_set) as rank
from person t2,
     (select gender,
             group_concat(distinct age order by age desc) as age_set
      from person
      group by gender) t1
where t1.gender = t2.gender
order by t1.gender, rank;
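For anyone finding this now: MySQL 8.0 added native window functions, so on a current version both functions exist directly (RANK became a reserved word in 8.0, hence the backticks):
SELECT first_name, gender, age,
       DENSE_RANK() OVER (PARTITION BY gender ORDER BY age DESC) AS `rank`
FROM person
ORDER BY gender, `rank`;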