How to flatten and get the expected output shown below from Pig after GROUP BY - hadoop

Sample data:
ID marks date
12345 12 20210204
12345 13 20210204
12345 2 20210204
Input:
(12345,{(12345,12,20210204),(12345,13,20210204),(12345,2,20210204)})
Output needed:
(12345,27,20210204)
The second element is the aggregated value.
Any help is appreciated.

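-- "input" and "output" are reserved keywords in Pig Latin, so other alias names are used here.
-- "grouped" stands for the relation produced by GROUP sample BY ID.
result = FOREACH grouped GENERATE
    group AS ID,
    SUM(sample.marks) AS mark_sum,
    MIN(sample.date) AS first_date;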
You may need to adjust this to match your relation and field names. If the dates are always the same within an ID, you could also group by the date field.
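For readers who want to check the expected result outside of Pig, the same aggregation can be sketched in plain Python. This is only an illustration, assuming each row is an (ID, marks, date) tuple as in the sample data; the function name `aggregate_by_id` is hypothetical and has nothing to do with Pig itself.

```python
from collections import defaultdict

def aggregate_by_id(rows):
    """Group (id, marks, date) rows by id, sum the marks, keep the earliest date."""
    groups = defaultdict(list)
    for row_id, marks, date in rows:
        groups[row_id].append((marks, date))
    return [
        (row_id, sum(m for m, _ in items), min(d for _, d in items))
        for row_id, items in groups.items()
    ]

# Sample rows from the question
rows = [(12345, 12, 20210204), (12345, 13, 20210204), (12345, 2, 20210204)]
print(aggregate_by_id(rows))  # [(12345, 27, 20210204)]
```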

Related

Oracle: Retrieving specific group of records based by date

I have a table in oracle that I'm trying to write a query for but having a problem writing it correctly. The data of the table looks like this:
Name   | ID | DATE
------ | -- | ---------
Shane  | 1  | 01JAN2023
Angie  | 2  | 02JAN2023
Shane  | 1  | 02JAN2023
Austin | 3  | 03JAN2023
Shane  | 1  | 03JAN2023
Angie  | 2  | 03JAN2023
Tony   | 4  | 05JAN2023
What I was trying to come up with was a way to iterate over each day, look at all the records for that day and compare with the rest of the records in the table that came before it and only pull back the first instance of the record based on the ID & Date. The expected output would be:
Name   | ID | DATE
------ | -- | ---------
Shane  | 1  | 01JAN2023
Angie  | 2  | 02JAN2023
Austin | 3  | 03JAN2023
Tony   | 4  | 05JAN2023
Can anyone tell me what the query should be to accomplish this?
Thank you in advance.
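You'll need to convert your date field to a real date so that it orders correctly. Note that DATE and TABLE are reserved words in Oracle, so the names below are placeholders for your own column and table names:
SELECT name, id, MIN(TO_DATE(date_column, 'DDMONYYYY')) AS first_date
FROM your_table
GROUP BY name, id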
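To see why the conversion matters, here is a small Python sketch of the same idea: parse the 'DDMONYYYY' strings into real dates, then keep the earliest one per (name, id). It is an illustration only; the function name `first_dates` is hypothetical, and the parsing assumes English month abbreviations.

```python
from datetime import datetime

def first_dates(records):
    """Return the earliest date per (name, id), parsing 'DDMONYYYY' strings first."""
    earliest = {}
    for name, rec_id, raw in records:
        parsed = datetime.strptime(raw, "%d%b%Y").date()
        key = (name, rec_id)
        if key not in earliest or parsed < earliest[key]:
            earliest[key] = parsed
    return earliest

# Rows from the question
records = [
    ("Shane", 1, "01JAN2023"), ("Angie", 2, "02JAN2023"), ("Shane", 1, "02JAN2023"),
    ("Austin", 3, "03JAN2023"), ("Shane", 1, "03JAN2023"), ("Angie", 2, "03JAN2023"),
    ("Tony", 4, "05JAN2023"),
]
result = first_dates(records)

# Comparing the raw strings would order by day first, which is wrong across months:
print("01FEB2023" < "02JAN2023")  # True, although February is later than January
```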
Isn't that just
select name, id, min(date_column)
from your_table
group by name, id;
If you don't want to use aggregation, you can use FETCH NEXT ROWS WITH TIES:
SELECT tab.*
FROM tab
ORDER BY ROW_NUMBER() OVER(PARTITION BY Name, Id ORDER BY DATE_)
FETCH NEXT 1 ROWS WITH TIES
Output:
NAME   | ID | DATE_
------ | -- | ---------
Angie  | 2  | 02-JAN-23
Austin | 3  | 03-JAN-23
Shane  | 1  | 01-JAN-23
Tony   | 4  | 05-JAN-23

How to sort the answers by timestamp after flattening the matrix in Google Sheets?

Hi everyone,
My goal is to flatten the Answers in C4:E7 into one column and then sort it based on the ascending order of the Submission Timestamp, then sort it again from Answer1 to Answer3.
For example, in the screenshot above, Student B submitted the answers at 2:49:27pm, which is the earliest among the 4 students, so his answers should be at the top of the column, running from Answer 1 to Answer 3, followed by the answers from Student A and Student D.
I'm using =QUERY(FLATTEN(C4:E7),"Select * where Col1 is not null") now. I'm not sure how to sort it based on timestamp first in this case.
Column I is the expected output.
Hope to get some help on this issue, any help will be greatly appreciated!
Try:
=arrayformula(query(iferror(split(flatten(if(A4:A<>"",B4:B&char(9999)&C4:E,)),char(9999)),),"select Col2 where Col2 !='' order by Col1,Col2",0))
NOTES:
The starting point is:
=arrayformula(if(A4:A<>"",B4:B&char(9999)&C4:E,))
This repeats your 'Submission time' column with each of the 3 answer columns, separated by char(9999) (✏), a character that is unlikely to appear in your data set.
Then flatten() puts them in 1 column.
split() is then used on ✏ to get the results into 2 columns, but you'll need iferror() to suppress the errors that split() returns for the empty rows further down the sheet.
Then the query() wraps around the result to select col2 (where it's not empty), and sort by Col1,Col2.
Alternative with filter() so you don't need the iferror():
=arrayformula(query(split(flatten(filter(B4:B&char(9999)&C4:E,B4:B<>"")),char(9999)),"select Col2 where Col2 !='' order by Col1,Col2",0))
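The pair, flatten, and sort idea behind these formulas can also be sketched in Python. This is an illustration under stated assumptions: only Student B's 2:49:27pm timestamp comes from the question, while the other timestamps and all answer values are made up because the screenshot is not available. The sketch sorts on the timestamp only and relies on Python's stable sort to keep each student's answers in their original column order.

```python
from datetime import time

def flatten_by_timestamp(rows):
    """rows: (timestamp, [answer1, answer2, answer3]) -> answers ordered by timestamp."""
    # Pair every non-empty answer with its row's timestamp, skipping rows without one.
    pairs = [
        (ts, ans)
        for ts, answers in rows
        if ts is not None
        for ans in answers
        if ans
    ]
    pairs.sort(key=lambda p: p[0])  # stable: answer order within a row is preserved
    return [ans for _, ans in pairs]

rows = [
    (time(14, 55, 0), ["A1", "A2", "A3"]),   # Student A (hypothetical time)
    (time(14, 49, 27), ["B1", "B2", "B3"]),  # Student B, earliest submission
    (None, ["", "", ""]),                    # Student C, nothing submitted (hypothetical)
    (time(15, 1, 0), ["D1", "D2", ""]),      # Student D (hypothetical time)
]
flat = flatten_by_timestamp(rows)
print(flat)  # ['B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'D1', 'D2']
```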

Get distinct values using FOR - ABAP

How can I retrieve the distinct values from an internal table?
I am using SORT and DELETE ADJACENT DUPLICATES to get what I need, but I would like to improve this kind of selection.
The point is: imagine you have an internal table with information on two purchase orders, where each one has two items. How can I get the distinct purchase order numbers?
For instance: I've selected the following information from EKPO:
ebeln | ebelp
---------- | -----
1234567890 | 00010
1234567890 | 00020
1234567891 | 00010
1234567891 | 00020
To get distinct ebeln values:
ebeln
----------
1234567890
1234567891
For that, I need to sort the table and apply the DELETE ADJACENT DUPLICATES. I would like to know if there is any trick to replace these commands.
COLLECT also yields distinct values:
DATA: lt_collect like table of ls_source-some_field.
LOOP AT lt_source INTO ls_source.
COLLECT ls_source-some_field INTO lt_collect.
ENDLOOP.
* lt_collect has distinct values of lt_source-some_field
To get distinct EBELN what you need to do is simply
SELECT DISTINCT ebeln
FROM ekpo
INTO TABLE lt_distinct_ebeln
WHERE (your_where_condition).
That's all it takes.
An option would be to create a loop and select when the values change. For this to work as you mention, the table must be sorted by the field you are looking for.
loop at GT_TABLE into WA_TABLE.
on change of WA_TABLE-FIELD.
*Operation
endon.
endloop.
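Another option is the same approach with AT NEW. For AT NEW to work as intended, the table must again be sorted, and keep in mind that the block fires when the named field or any field to its left in the row structure changes, so the values in those leftmost fields must stay the same within a group.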
loop at GT_TABLE into WA_TABLE.
at new WA_TABLE-FIELD.
*Operation
endat.
endloop.
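The control-break idea behind both loops (act only when the value differs from the previous row) can be sketched in Python as follows. It is an illustration, not ABAP: the function name `distinct_sorted` is hypothetical, and like the loops above it assumes the rows are already sorted by the field in question.

```python
def distinct_sorted(rows, field):
    """Yield each value of `field` once, assuming rows are already sorted by it."""
    previous = object()  # sentinel that never equals a real field value
    for row in rows:
        if row[field] != previous:  # the value changed: start of a new group
            yield row[field]
            previous = row[field]

# EKPO sample from the question, already sorted by ebeln
ekpo = [
    {"ebeln": "1234567890", "ebelp": "00010"},
    {"ebeln": "1234567890", "ebelp": "00020"},
    {"ebeln": "1234567891", "ebelp": "00010"},
    {"ebeln": "1234567891", "ebelp": "00020"},
]
distinct = list(distinct_sorted(ekpo, "ebeln"))
print(distinct)  # ['1234567890', '1234567891']
```

If the rows were not sorted, the same value could be reported more than once, which is why the sort requirement matters.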

Coldfusion query of queries count by date

I'm trying to get a count based on two dates and I'm not sure how it should look in a query. I have two date fields; I want to get a count based on those dates.
<cfquery>
SELECT COUNT(*)
FROM Table1
Where month of date1 is one month less than month of date2
</cfquery>
Assuming Table1 is your original query, you can accomplish your goal as follows.
Step 1 - Use QueryAddColumn twice to add two empty columns.
Step 2 - Loop through your query and populate these two columns with numbers. One will represent date1 and the other will represent date2. It's not quite as simple as putting in the month numbers because you have to account for the year as well.
Step 3 - Write your Q of Q with a filter resembling this:
where NewColumn1 - NewColumn2 = 1
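One way to build the numbers described in Step 2 is to use year * 12 + month, so that consecutive months always differ by exactly 1, even across a year boundary. The answer does not spell out a formula, so this numbering is an assumption, and the function names in the Python sketch below are hypothetical. The sketch counts rows where date1 falls in the month immediately before date2.

```python
from datetime import date

def month_index(d):
    """Map a date to a running month number that accounts for the year."""
    return d.year * 12 + d.month

def count_one_month_apart(rows):
    """Count rows where date1 is in the month immediately before date2."""
    return sum(1 for d1, d2 in rows if month_index(d2) - month_index(d1) == 1)

# Made-up (date1, date2) pairs for illustration
rows = [
    (date(2022, 12, 15), date(2023, 1, 3)),  # counts: December to January
    (date(2023, 3, 1), date(2023, 4, 30)),   # counts: March to April
    (date(2023, 1, 10), date(2024, 2, 10)),  # month numbers differ by 1, but 13 months apart
    (date(2023, 5, 5), date(2023, 5, 20)),   # same month
]
count = count_one_month_apart(rows)
print(count)  # 2
```

Comparing plain month numbers would miss the first pair and wrongly include the third, which is the reason for including the year.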

Count Length and then Count those records.

I am trying to create a view that displays size (char) of LastName and the total number of records whose last name has that size. So far I have:
SELECT LENGTH(LastName) AS Name_Size
FROM Table
ORDER BY Name_Size;
I need to add something like
COUNT(LENGTH(LastName)) AS Students
This is giving me an error. Do I need to add a GROUP BY command? I need the view:
Name_Size Students
3 11
4 24
5 42
SELECT LENGTH(LastName) as Name_Size, COUNT(*) as Students
FROM Table
GROUP BY Name_Size
ORDER BY Name_Size;
You may have to change the group by and order by to LENGTH(LastName) as not all SQL engines let you reference an alias from the select statement in a clause on that same statement.
HTH,
Eric
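As a quick cross-check of what the grouped query computes, the same count per name length can be sketched in Python. The names below are made up for illustration, and `name_size_counts` is a hypothetical helper.

```python
from collections import Counter

def name_size_counts(last_names):
    """Return (name length, number of names with that length), sorted by length."""
    counts = Counter(len(name) for name in last_names)
    return sorted(counts.items())

names = ["Lee", "Kim", "Shaw", "Jones", "Smith", "Ng"]
sizes = name_size_counts(names)
print(sizes)  # [(2, 1), (3, 2), (4, 1), (5, 2)]
```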
