Data transformation in PowerQuery

I have a CSV source with 3 columns (Dates, Product and Sales) that I have imported into PowerQuery and need to reorganize:
Dates | Product | Sales
2020 | Italy | 30000
2020 | Milano | 10000
2020 | EUR | 10000
2020 | Cotton | 3000
2020 | Synthetics | 3000
2020 | Linen | 2500
2020 | Silk | 1500
2020 | Rome | 20000
2020 | EUR | 20000
2020 | Cotton | 5000
2020 | Synthetics | 5000
2020 | Linen | 5000
2020 | Silk | 5000
2020 | Spain | 35000
2020 | Madrid | 10000
2020 | EUR | 10000
2020 | Cotton | 1000
2020 | Synthetics | 6000
2020 | Linen | 1500
2020 | Silk | 1500
2020 | Barcelona | 25000
2020 | EUR | 25000
2020 | Cotton | 10000
2020 | Synthetics | 8000
2020 | Linen | 3000
2020 | Silk | 4000
The problem is that Country, City, Currency and Product are all contained in the Product column, while I need 6 columns: Dates, Country, City, Currency, Product and Sales. My data should look like this:
Dates | Country | City | Currency | Product | Sales
2020 | Italy | Milano | EUR | Cotton | 3000
2020 | Italy | Milano | EUR | Synthetics | 3000
2020 | Italy | Milano | EUR | Linen | 2500
2020 | Italy | Milano | EUR | Silk | 1500
2020 | Italy | Rome | EUR | Cotton | 5000
2020 | Italy | Rome | EUR | Synthetics | 5000
2020 | Italy | Rome | EUR | Linen | 5000
2020 | Italy | Rome | EUR | Silk | 5000
2020 | Spain | Madrid | EUR | Cotton | 1000
2020 | Spain | Madrid | EUR | Synthetics | 6000
2020 | Spain | Madrid | EUR | Linen | 1500
2020 | Spain | Madrid | EUR | Silk | 1500
2020 | Spain | Barcelona | EUR | Cotton | 10000
2020 | Spain | Barcelona | EUR | Synthetics | 8000
2020 | Spain | Barcelona | EUR | Linen | 3000
2020 | Spain | Barcelona | EUR | Silk | 4000
In my original table, in the Product column there are 3 spaces before each Country, 6 spaces before each City, 9 spaces before each Currency and 12 spaces before each product. How can I achieve this in PowerQuery?

There are three key steps here.
Split by " ". This uses the spaces to separate into distinct columns.
Fill down. This uses the last available value from the previous rows. (Empty strings are first replaced with null, because Fill Down only fills null cells.)
Filter subtotal rows. We only want the lowest level of granularity.
Try pasting this into your Advanced Editor and walking through the steps:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjIwMlDSUVJQUPAsScypBDKNDYBAKVYHWQ4IfDNzEvPygVxD7PJA4BoahFceCJzzS0ry86C24FIUXJlXkpFakplcTEChT2ZeKsgwI1M8hmXmZINchVVJUH5uKkg/AS/hlkf2kimxXsKnEOYlvIZBvIRFSXBBYiY4dLFr901MKcpMoVocGhLrYTMiPIw9goiIQ6fEouTUnPy8RGhCwBuR+IIVxV/EecyCCI/hTekQj5mAlcQCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Dates = _t, Product = _t, Sales = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Dates", Int64.Type}, {"Product", type text}, {"Sales", Int64.Type}}),
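// Splits on a single space; if each level is indented by 3 spaces, as described in the question, use "   " (three spaces) as the delimiter instead
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Product", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"blank", "Country", "City", "Currency", "Product"}),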
#"Replaced Value" = Table.ReplaceValue(#"Split Column by Delimiter","",null,Replacer.ReplaceValue,{"Country", "City", "Currency"}),
#"Filled Down" = Table.FillDown(#"Replaced Value",{"Country", "City", "Currency"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Product] <> null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"blank"})
in
#"Removed Columns"
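M code cannot be run outside Power Query, so here is a minimal Python sketch of the same logic (split by indentation, fill down, keep only the lowest-level rows), useful for checking what the steps should produce. It assumes, as the question states, that each hierarchy level adds exactly 3 leading spaces; `flatten`, `Row`, `ind` and the sample rows are hypothetical names for illustration only.

```python
from typing import NamedTuple


class Row(NamedTuple):
    dates: int
    country: str
    city: str
    currency: str
    product: str
    sales: int


# Level n of the hierarchy is indented by 3 * n spaces (assumption from the question).
LEVELS = ["country", "city", "currency", "product"]


def flatten(rows, indent=3):
    """Turn (dates, indented_label, sales) rows into one Row per product."""
    current = {}  # last value seen for each level: this plays the role of "fill down"
    out = []
    for dates, label, sales in rows:
        depth = (len(label) - len(label.lstrip(" "))) // indent
        if not 1 <= depth <= len(LEVELS):
            raise ValueError(f"unexpected indentation in {label!r}")
        level = LEVELS[depth - 1]
        current[level] = label.strip()
        # Country/city/currency rows are subtotals; only product rows are kept.
        if level == "product":
            out.append(Row(dates, current["country"], current["city"],
                           current["currency"], current["product"], sales))
    return out


def ind(depth, text):
    """Build a label indented like the question's CSV (3 spaces per level)."""
    return " " * (3 * depth) + text


sample = [
    (2020, ind(1, "Italy"), 30000),
    (2020, ind(2, "Milano"), 10000),
    (2020, ind(3, "EUR"), 10000),
    (2020, ind(4, "Cotton"), 3000),
    (2020, ind(4, "Silk"), 1500),
    (2020, ind(1, "Spain"), 35000),
    (2020, ind(2, "Barcelona"), 25000),
    (2020, ind(3, "EUR"), 25000),
    (2020, ind(4, "Silk"), 4000),
]

result = flatten(sample)
for row in result:
    print(row)
```

Keeping a dictionary of "last seen" values per level is the procedural equivalent of Table.FillDown, and the `level == "product"` check corresponds to filtering out rows whose Product is null.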

Related

Oracle: How to get the birthday list from the last 15 days to the next 15 days?

I have a table with employees and their birth dates, stored as strings in one column.
I cannot modify the table, so I created a view that converts the birth date to a real date (TO_DATE).
Now I would like to get the list of employees whose birthday was in the last 15 days and of employees whose birthday is in the next 15 days,
based only on the day and the month.
I can already get, for example, all employees born in April with EXTRACT, but, as you have probably guessed, when I run the query on 25 April I'd also like to see the upcoming birthdays in May.
How can I do this in Oracle 12c?
Thank you 🙂
Using the hiredate column in table scott.emp for testing:
select empno, ename, hiredate
from scott.emp
where add_months(trunc(hiredate),
12 * round(months_between(sysdate, hiredate) / 12))
between trunc(sysdate) - 15 and trunc(sysdate) + 15
;
EMPNO ENAME HIREDATE
---------- ---------- ----------
7566 JONES 04/02/1981
7698 BLAKE 05/01/1981
7788 SCOTT 04/19/1987
This will produce the wrong result in the following situation: if someone's birthday is Feb. 28 in a non-leap year, their birthday in a leap year (as calculated by the ADD_MONTHS function in the query) will be considered to be Feb. 29. So they will be excluded if the query runs on, say, Feb. 13, 2024 (even though they should be included), and they will be included if it runs on March 15, 2024 (even though they should be excluded). If you can live with this (those people will be recognized in the wrong window once every four years), then this may be all you need. Otherwise, that situation will require further tweaking.
For people born on Feb. 29 (in a leap year, obviously), their birthday in a non-leap-year is considered to be Feb. 28. With this convention, the query will always work correctly for them. Whether this convention is appropriate in your locale, only your business users can tell you. (Local laws and regulations may matter, too - depending on what you are using this for.)
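To see the leap-year caveat concretely, the sketch below re-implements the query's logic in Python for a Feb. 28, 2023 birthday. The helpers `add_months`, `months_between` and `oracle_round` are hypothetical emulations of Oracle's ADD_MONTHS, MONTHS_BETWEEN and ROUND, written from their documented behaviour for dates without a time component; they are not an Oracle API, and `in_window` is an illustrative name.

```python
import calendar
import math
from datetime import date, timedelta


def is_last_day(d):
    return d.day == calendar.monthrange(d.year, d.month)[1]


def add_months(d, n):
    """Emulates ADD_MONTHS: a month-end date stays at month end."""
    y, m = divmod(d.year * 12 + d.month - 1 + n, 12)
    m += 1
    last = calendar.monthrange(y, m)[1]
    day = last if is_last_day(d) or d.day > last else d.day
    return date(y, m, day)


def months_between(d1, d2):
    """Emulates MONTHS_BETWEEN for dates without a time part (31-day fraction)."""
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    if d1.day == d2.day or (is_last_day(d1) and is_last_day(d2)):
        return float(whole)
    return whole + (d1.day - d2.day) / 31


def oracle_round(x):
    """Oracle ROUND rounds halves away from zero (Python's round() does not)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def in_window(birth, today, days=15):
    """Move the birthday to the nearest year, then compare it with today +/- days."""
    years = oracle_round(months_between(today, birth) / 12)
    anniversary = add_months(birth, 12 * years)
    return today - timedelta(days=days) <= anniversary <= today + timedelta(days=days)


print(in_window(date(2023, 2, 28), date(2024, 2, 13)))  # False, although it should be True
print(in_window(date(2023, 2, 28), date(2024, 3, 15)))  # True, although it should be False
```

Both wrong answers come from the same place: because Feb. 28, 2023 is the last day of its month, ADD_MONTHS moves it to Feb. 29, 2024.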
You can use the DDD format model:
DDD - Day of year (1-366).
For example:
SQL> with v(dt) as (
  select date'2020-01-01'+level-1 from dual connect by date'2020-01-01'+level-1<date'2021-01-01'
)
select *
from v
where
  not abs(
    to_number(to_char(date'&dt','ddd'))
    -to_number(to_char(dt ,'ddd'))
  ) between 15 and 350;
Enter value for dt: 2022-01-03
DT
-------------------
2020-01-01 00:00:00
2020-01-02 00:00:00
2020-01-03 00:00:00
2020-01-04 00:00:00
2020-01-05 00:00:00
2020-01-06 00:00:00
2020-01-07 00:00:00
2020-01-08 00:00:00
2020-01-09 00:00:00
2020-01-10 00:00:00
2020-01-11 00:00:00
2020-01-12 00:00:00
2020-01-13 00:00:00
2020-01-14 00:00:00
2020-01-15 00:00:00
2020-01-16 00:00:00
2020-01-17 00:00:00
2020-12-19 00:00:00
2020-12-20 00:00:00
2020-12-21 00:00:00
2020-12-22 00:00:00
2020-12-23 00:00:00
2020-12-24 00:00:00
2020-12-25 00:00:00
2020-12-26 00:00:00
2020-12-27 00:00:00
2020-12-28 00:00:00
2020-12-29 00:00:00
2020-12-30 00:00:00
2020-12-31 00:00:00
30 rows selected.
NB: This example doesn't account for leap years.
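The DDD comparison can be checked outside the database with a short Python sketch. Python's `timetuple().tm_yday` gives the same 1 to 366 day-of-year number as the DDD format; `near_by_ddd` is a hypothetical name, and the hard-coded 15/350 bounds are taken from the query, so the leap-year caveat above applies here too.

```python
from datetime import date, timedelta


def day_of_year(d):
    """Same number as TO_CHAR(d, 'DDD'): 1..366."""
    return d.timetuple().tm_yday


def near_by_ddd(d, ref):
    """Mirror of: NOT ABS(ddd(ref) - ddd(d)) BETWEEN 15 AND 350."""
    diff = abs(day_of_year(ref) - day_of_year(d))
    return not (15 <= diff <= 350)


# Every day of 2020, like the WITH v(dt) clause in the query
days_2020 = [date(2020, 1, 1) + timedelta(days=i) for i in range(366)]
hits = [d for d in days_2020 if near_by_ddd(d, date(2022, 1, 3))]
print(len(hits), hits[0], hits[-1])  # 30 2020-01-01 2020-12-31
```

A difference above 350 is what catches the dates at the end of the previous year, which is why the window wraps around from January back into December.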
Similar to mathguy's answer, but translating the current date back to the birth year (rather than translating the birth year forwards):
SELECT *
FROM employees
WHERE birth_date BETWEEN ADD_MONTHS(
TRUNC(SYSDATE),
ROUND(MONTHS_BETWEEN(birth_date, SYSDATE)/12)*12
) - INTERVAL '15' DAY
AND ADD_MONTHS(
TRUNC(SYSDATE),
ROUND(MONTHS_BETWEEN(birth_date, SYSDATE)/12)*12
) + INTERVAL '15' DAY;
Then, for the sample data:
CREATE TABLE employees (name, birth_date) AS
SELECT 'Alice', DATE '2020-02-28' FROM DUAL UNION ALL
SELECT 'Betty', DATE '2020-02-29' FROM DUAL UNION ALL
SELECT 'Carol', DATE '2021-02-28' FROM DUAL UNION ALL
SELECT 'Debra', DATE '2022-04-28' FROM DUAL UNION ALL
SELECT 'Emily', DATE '2021-03-30' FROM DUAL UNION ALL
SELECT 'Fiona', DATE '2021-03-31' FROM DUAL;
If today's date is 2022-04-16 then the output is:
NAME | BIRTH_DATE
Debra | 28-APR-22
If today's date is 2022-03-15 then the output is:
NAME | BIRTH_DATE
Betty | 29-FEB-20
Carol | 28-FEB-21
Emily | 30-MAR-21
When run on 2022-03-15, the query therefore matches birthdays from 28th February to 30th March in a non-leap year, and from 29th February to 30th March in a leap year.
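The "translate today back to the birth year" approach can be reproduced in Python against the answer's sample data. As before, `add_months`, `months_between` and `oracle_round` are hypothetical emulations of the Oracle functions for dates without a time part, and `in_window` and `matches` are illustrative names; the sketch is meant for checking the logic, not as a replacement for the query.

```python
import calendar
import math
from datetime import date, timedelta


def is_last_day(d):
    return d.day == calendar.monthrange(d.year, d.month)[1]


def add_months(d, n):
    """Emulates ADD_MONTHS: a month-end date stays at month end."""
    y, m = divmod(d.year * 12 + d.month - 1 + n, 12)
    m += 1
    last = calendar.monthrange(y, m)[1]
    day = last if is_last_day(d) or d.day > last else d.day
    return date(y, m, day)


def months_between(d1, d2):
    """Emulates MONTHS_BETWEEN for dates without a time part (31-day fraction)."""
    whole = (d1.year - d2.year) * 12 + (d1.month - d2.month)
    if d1.day == d2.day or (is_last_day(d1) and is_last_day(d2)):
        return float(whole)
    return whole + (d1.day - d2.day) / 31


def oracle_round(x):
    """Oracle ROUND rounds halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def in_window(birth, today, days=15):
    """Shift today into the birth year, then test birth_date against today +/- days."""
    shifted = add_months(today, 12 * oracle_round(months_between(birth, today) / 12))
    return shifted - timedelta(days=days) <= birth <= shifted + timedelta(days=days)


employees = {
    "Alice": date(2020, 2, 28),
    "Betty": date(2020, 2, 29),
    "Carol": date(2021, 2, 28),
    "Debra": date(2022, 4, 28),
    "Emily": date(2021, 3, 30),
    "Fiona": date(2021, 3, 31),
}


def matches(today):
    return [name for name, born in employees.items() if in_window(born, today)]


print(matches(date(2022, 4, 16)))  # ['Debra']
print(matches(date(2022, 3, 15)))  # ['Betty', 'Carol', 'Emily']
```

Shifting the current date (rather than the birthday) means the window itself lands on 29 February in a leap birth year, which is why Alice is excluded and Betty included on 2022-03-15.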

Window functions: How to partition over nothing?

I am extracting a table, but I would also like the sum of a column.
I can write SUM(column) OVER (PARTITION BY other_column) to get a new column with the sum of the column for every group given by other_column.
But I don't want any grouping! I want SUM(column) to give me a column whose value in every row is the sum of the entire column, with no partitioning.
So how do I partition over nothing?
Exactly as you said: over "nothing", i.e. an empty OVER () clause. For example:
SQL> select deptno, ename, sal, sum(sal) over () sumsal
  from emp;
DEPTNO ENAME SAL SUMSAL
---------- ---------- ---------- ----------
20 SMITH 920 34145
30 ALLEN 1600 34145
30 WARD 1250 34145
20 JONES 2975 34145
30 MARTIN 1250 34145
30 BLAKE 2850 34145
10 CLARK 2450 34145
20 SCOTT 3000 34145
10 KING 10000 34145
30 TURNER 1500 34145
20 ADAMS 1100 34145
30 JAMES 950 34145
20 FORD 3000 34145
10 MILLER 1300 34145
14 rows selected.
SQL>
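The difference between an empty OVER () and OVER (PARTITION BY ...) can be shown with a small Python sketch over the rows above. `sum_over_all` and `sum_over_partition` are hypothetical helpers that mimic the two window specifications; they are not a database API.

```python
from collections import defaultdict

# (deptno, ename, sal) rows from the output above
emp = [
    (20, "SMITH", 920), (30, "ALLEN", 1600), (30, "WARD", 1250),
    (20, "JONES", 2975), (30, "MARTIN", 1250), (30, "BLAKE", 2850),
    (10, "CLARK", 2450), (20, "SCOTT", 3000), (10, "KING", 10000),
    (30, "TURNER", 1500), (20, "ADAMS", 1100), (30, "JAMES", 950),
    (20, "FORD", 3000), (10, "MILLER", 1300),
]


def sum_over_all(rows):
    """SUM(sal) OVER (): every row gets the total of the whole result set."""
    total = sum(sal for _, _, sal in rows)
    return [(*row, total) for row in rows]


def sum_over_partition(rows):
    """SUM(sal) OVER (PARTITION BY deptno): every row gets its department's total."""
    totals = defaultdict(int)
    for deptno, _, sal in rows:
        totals[deptno] += sal
    return [(*row, totals[row[0]]) for row in rows]


for row in sum_over_all(emp)[:3]:
    print(row)
```

In both cases the number of rows is unchanged, unlike GROUP BY; only the set of rows each sum is computed over differs.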
Consider the table orders, created as follows:
Schema (MySQL v8.0)
CREATE TABLE orders (
`trade_date` DATETIME,
`ticker` VARCHAR(4),
`trans_type` VARCHAR(4),
`quantity` INTEGER
);
INSERT INTO orders
(`trade_date`, `ticker`, `trans_type`, `quantity`)
VALUES
('2020-12-10', 'FB', 'BUY', '100'),
('2020-12-28', 'FB', 'BUY', '50'),
('2020-12-29', 'FB', 'SELL', '80'),
('2020-12-30', 'FB', 'SELL', '30'),
('2020-12-31', 'FB', 'BUY', '40'),
('2020-11-16', 'AAPL', 'BUY', '30'),
('2020-11-17', 'AAPL', 'SELL', '70'),
('2020-11-20', 'AAPL', 'BUY', '50'),
('2020-11-24', 'AAPL', 'BUY', '40');
Suppose we want the net quantity over the whole table, counting BUY quantities as positive and SELL quantities as negative:
Query #1
SELECT
trade_date,
ticker,
trans_type,
quantity,
SUM(CASE WHEN trans_type='SELL' THEN -quantity ELSE quantity END) OVER () AS net_quantity
FROM
orders;
We will get this table:
trade_date | ticker | trans_type | quantity | net_quantity
2020-12-10 00:00:00 | FB | BUY | 100 | 130
2020-12-28 00:00:00 | FB | BUY | 50 | 130
2020-12-29 00:00:00 | FB | SELL | 80 | 130
2020-12-30 00:00:00 | FB | SELL | 30 | 130
2020-12-31 00:00:00 | FB | BUY | 40 | 130
2020-11-16 00:00:00 | AAPL | BUY | 30 | 130
2020-11-17 00:00:00 | AAPL | SELL | 70 | 130
2020-11-20 00:00:00 | AAPL | BUY | 50 | 130
2020-11-24 00:00:00 | AAPL | BUY | 40 | 130
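The CASE expression inside the window sum simply flips the sign of SELL rows before adding. A minimal Python sketch of the same calculation on the sample rows (`net_quantity` and `with_net` are illustrative names):

```python
# (trade_date, ticker, trans_type, quantity) rows from the INSERT above
orders = [
    ("2020-12-10", "FB", "BUY", 100),
    ("2020-12-28", "FB", "BUY", 50),
    ("2020-12-29", "FB", "SELL", 80),
    ("2020-12-30", "FB", "SELL", 30),
    ("2020-12-31", "FB", "BUY", 40),
    ("2020-11-16", "AAPL", "BUY", 30),
    ("2020-11-17", "AAPL", "SELL", 70),
    ("2020-11-20", "AAPL", "BUY", 50),
    ("2020-11-24", "AAPL", "BUY", 40),
]


def net_quantity(rows):
    """SUM(CASE WHEN trans_type = 'SELL' THEN -quantity ELSE quantity END)."""
    return sum(-qty if trans_type == "SELL" else qty for _, _, trans_type, qty in rows)


# OVER (): the same grand total is attached to every row
net = net_quantity(orders)
with_net = [(*row, net) for row in orders]
print(with_net[0])  # ('2020-12-10', 'FB', 'BUY', 100, 130)
```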
To learn more about window functions, this article may help: An Intro to SQL Window Functions.
Reference:
mysql window function with case

How to convert two rows in one using Oracle

I have a table named COMPANY_DETAILS which has 5 columns in total:
ID NAME REVENUE_YEAR TOTAL_REVENUE TOTAL_ASSET
100 ABC CORP 2015 50000 30000
100 ABC CORP 2016 60000 40000
200 XYZ CORP 2015 85000 60000
200 XYZ CORP 2016 75000 55000
How can I get the table layout below with a SQL query?
ID NAME REVENUE_YEAR15 TOTAL_REVENUE15 TOTAL_ASSET15 REVENUE_YEAR16 TOTAL_REVENUE16 TOTAL_ASSET16
100 ABC CORP 2015 50000 30000 2016 60000 40000
200 XYZ CORP 2015 85000 60000 2016 75000 55000
Please help me with that. I am using Oracle 11g.
Try this.
select ID, NAME,
       2015 REVENUE_YEAR15,
       SUM(CASE WHEN REVENUE_YEAR = 2015 THEN TOTAL_REVENUE ELSE 0 END) TOTAL_REVENUE15,
       SUM(CASE WHEN REVENUE_YEAR = 2015 THEN TOTAL_ASSET ELSE 0 END) TOTAL_ASSET15,
       2016 REVENUE_YEAR16,
       SUM(CASE WHEN REVENUE_YEAR = 2016 THEN TOTAL_REVENUE ELSE 0 END) TOTAL_REVENUE16,
       SUM(CASE WHEN REVENUE_YEAR = 2016 THEN TOTAL_ASSET ELSE 0 END) TOTAL_ASSET16
FROM COMPANY_DETAILS
GROUP BY ID, NAME;

Select value from date before 9am every day

I have a table in an Oracle database that tracks the mileage of 10 vehicles every hour of every day. For example:
Car | Mileage | Timestamp
Honda | 23.4 | 11-Jan-17 08.00.00.000000 AM
Honda | 22 | 11-Jan-17 09.00.00.000000 AM
Honda | 21.3 | 11-Jan-17 10.00.00.000000 AM
Honda | 24.4 | 11-Jan-17 11.00.00.000000 AM
Honda | 23.2 | 12-Jan-17 08.00.00.000000 AM
Honda | 25 | 12-Jan-17 09.00.00.000000 AM
Honda | 26 | 12-Jan-17 10.00.00.000000 AM
I don't understand how to write a query that I can run every day against the last year's worth of data and that selects, for every car, the mileage recorded before 9am each day.
Assuming you mean readings taken after midnight and before 09:00, the following query returns the latest such reading for each car and day, and also copes with other car makes.
WITH base_data AS
(SELECT car, mileage, read_date , ROW_NUMBER() OVER(PARTITION BY car,TRUNC(READ_DATE) ORDER BY read_date DESC) as ranking
FROM wg_test
WHERE EXTRACT (HOUR FROM read_date) BETWEEN 0 AND 8
AND read_date > SYSDATE - 365)
SELECT car, mileage, read_date
FROM base_data
WHERE ranking = 1
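The ROW_NUMBER() pattern above ("rank readings within each car and day, newest first, keep rank 1") can be sketched in Python on the question's sample rows. The `now` parameter stands in for SYSDATE and, like `last_reading_before_9`, is a hypothetical name for illustration.

```python
from datetime import datetime, timedelta

# (car, mileage, timestamp) rows from the question
readings = [
    ("Honda", 23.4, datetime(2017, 1, 11, 8)),
    ("Honda", 22.0, datetime(2017, 1, 11, 9)),
    ("Honda", 21.3, datetime(2017, 1, 11, 10)),
    ("Honda", 24.4, datetime(2017, 1, 11, 11)),
    ("Honda", 23.2, datetime(2017, 1, 12, 8)),
    ("Honda", 25.0, datetime(2017, 1, 12, 9)),
    ("Honda", 26.0, datetime(2017, 1, 12, 10)),
]


def last_reading_before_9(rows, now):
    """Latest reading between 00:00 and 08:59 per car and calendar day, last 365 days."""
    best = {}
    for car, mileage, ts in rows:
        # WHERE EXTRACT(HOUR FROM read_date) BETWEEN 0 AND 8 AND read_date > SYSDATE - 365
        if ts.hour > 8 or ts <= now - timedelta(days=365):
            continue
        key = (car, ts.date())  # PARTITION BY car, TRUNC(read_date)
        # ORDER BY read_date DESC ... WHERE ranking = 1: keep the newest reading
        if key not in best or ts > best[key][2]:
            best[key] = (car, mileage, ts)
    return sorted(best.values(), key=lambda r: (r[0], r[2]))


for car, mileage, ts in last_reading_before_9(readings, datetime(2017, 1, 13)):
    print(car, mileage, ts)
```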

Insert using subquery by adding dates between two dates based on days [oracle 11g]

This is the PRESENTATION table:
ID PRESENTATIONDAY PRESENTATIONSTART PRESENTATIONEND PRESENTATIONSTARTDATE PRESENTATIONENDDATE
622 Monday 12:00:00 02:00:00 01-05-2016 04-06-2016
623 Tuesday 12:00:00 02:00:00 01-05-2016 04-06-2016
624 Wednesday 08:00:00 10:00:00 01-05-2016 04-06-2016
625 Thursday 10:00:00 12:00:00 01-05-2016 04-06-2016
I would like to insert availabledate into the SCHEDULE table. This is my current query:
insert into SCHEDULE (studentID,studentName,projectTitle,supervisorID,
supervisorName,examinerID,examinerName,exavailableID,
availableday,availablestart,availableend,
availabledate) -- PROBLEM STARTS HERE
values (?,?,?,?,?,?,?,?,?,?,?,?);
The availabledate value is retrieved based on exavailableID. For example, if exavailableID = 2, then availableday = Monday, availablestart = 12pm and availableend = 2pm.
The dates may only be chosen between PRESENTATIONSTARTDATE and PRESENTATIONENDDATE from the PRESENTATION table.
In the PRESENTATION table, PRESENTATIONDAY, PRESENTATIONDATESTART and PRESENTATIONDATEEND are matched against availableday, availablestart and availableend to get a list of all possible dates.
This is the query that gets a list of all possible dates for a particular day of the week:
select
A.PRESENTATIONID,
A.PRESENTATIONDAY,
A.PRESENTATIONDATESTART+delta LIST_DATE
from
PRESENTATION A,
(
select level-1 as delta
from dual
connect by level-1 <= (
select max(PRESENTATIONDATEEND- PRESENTATIONDATESTART) from PRESENTATION
)
)
where A.PRESENTATIONDATESTART+delta <= A.PRESENTATIONDATEEND
and
a.presentationday = trim(to_char(A.PRESENTATIONDATESTART+delta, 'Day'))
order by 1,2,3;
This query result is:
622 Monday 02-05-2016 12:00:00
...
622 Monday 30-05-2016 12:00:00
623 Tuesday 03-05-2016 12:00:00
...
623 Tuesday 31-05-2016 12:00:00
624 Wednesday 04-05-2016 12:00:00
...
624 Wednesday 01-06-2016 12:00:00
625 Thursday 05-05-2016 12:00:00
...
625 Thursday 02-06-2016 12:00:00
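The CONNECT BY LEVEL query above enumerates every offset from the start date up to the end date and keeps the dates whose day name matches. A minimal Python sketch of the same enumeration, using the 01-05-2016 to 04-06-2016 range from the table (`dates_for_day` and `DAY_NAMES` are hypothetical names):

```python
from datetime import date, timedelta

# English day names indexed by date.weekday() (Monday == 0). A fixed list keeps the
# result locale-independent, unlike Oracle's 'Day' format, which follows NLS_DATE_LANGUAGE.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def dates_for_day(day_name, start, end):
    """Every date in [start, end] that falls on day_name."""
    result = []
    for delta in range((end - start).days + 1):  # like level - 1 <= end - start
        d = start + timedelta(days=delta)
        if DAY_NAMES[d.weekday()] == day_name:
            result.append(d)
    return result


start, end = date(2016, 5, 1), date(2016, 6, 4)
for day in ("Monday", "Tuesday", "Wednesday", "Thursday"):
    found = dates_for_day(day, start, end)
    print(day, found[0], found[-1])
```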
The dates from the SELECT query should be assigned automatically when inserting into the SCHEDULE table. However, each date can be used only 4 times; once a date has been used 4 times, the next date is used. For example, for Monday, '02-05-2016' is used 4 times before moving on to '09-05-2016'.
How can I combine these two queries (INSERT and SELECT) to get a result like this:
StudentName projectTitle SupervisorID ExaminerID availableday availablestart availableend availabledate
abc Hello 1024 1001 MONDAY 12.00pm 2.00pm 02-05-2016
def Hi 1024 1001 MONDAY 12.00pm 2.00pm 02-05-2016
ghi Hey 1002 1004 MONDAY 12.00pm 2.00pm 02-05-2016
xxx hhh 1020 1011 MONDAY 12.00pm 2.00pm 02-05-2016
jkl hhh 1027 1010 MONDAY 12.00pm 2.00pm 09-05-2016
try ttt 1001 1011 MONDAY 12.00pm 2.00pm 09-05-2016
654 bbb 1007 1012 MONDAY 12.00pm 2.00pm 09-05-2016
gyg 888 1027 1051 MONDAY 12.00pm 2.00pm 09-05-2016
yyi 333 1004 1022 TUESDAY 12.00pm 2.00pm 03-05-2016
fff 111 1027 1041 TUESDAY ..
ggg 222 1032 1007 TUESDAY .. .. .. ..
hhh 444 1007 1001 TUESDAY 12.00pm 2.00pm 03-05-2016
and so on :)
In short, I would like to feed the list of dates from the PRESENTATION table (matched on day, start time and end time) into the insert query, with each date used at most 4 times. Thank you!
I am not sure this kind of syntax works in Oracle (and have no good way to check), but changing the SELECT part of the insert like this may or may not work:
select
A.PRESENTATIONID,
A.PRESENTATIONDAY,
A.PRESENTATIONDATESTART+delta LIST_DATE
from
PRESENTATION A,
(
select level-1 as delta
from dual
connect by level-1 <= (
select max(PRESENTATIONDATEEND - PRESENTATIONDATESTART) from PRESENTATION
)
),
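--MIGHT NEED ADDITIONAL LOGIC FOR THE EXAVAILABLEID COMPARISON
--NOTE: Oracle 11g does not let an inline view in FROM reference A (LATERAL needs 12c);
--on 11g, move this count into the WHERE clause as a correlated subquery
(SELECT count(*) as counter FROM SCHEDULE S WHERE S.EXAVAILABLEID=A.ID) C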
where A.PRESENTATIONDATESTART+delta <= A.PRESENTATIONDATEEND
and
a.presentationday = trim(to_char(A.PRESENTATIONDATESTART+delta, 'Day'))
and
C.counter<4
order by 1,2,3;
EDIT: Changed the operator (it was >= before). Placed the WHERE check in the right place. Deleted aliases.
EDIT2: changed the syntax to where the counter select statement is a part of the from clause.
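The capacity rule from the question (use a date at most 4 times, then move on to the next date) is easiest to see procedurally. A minimal Python sketch, assuming the candidate dates come from the date-list query in chronological order and students are processed in the order given; `assign_dates`, `mondays` and `students` are hypothetical, with names borrowed from the desired output above.

```python
from datetime import date


def assign_dates(students, candidate_dates, capacity=4):
    """Give each student the earliest candidate date used fewer than `capacity` times."""
    usage = {d: 0 for d in candidate_dates}
    assigned = []
    for student in students:
        slot = next((d for d in candidate_dates if usage[d] < capacity), None)
        if slot is None:
            raise ValueError("not enough presentation dates for all students")
        usage[slot] += 1
        assigned.append((student, slot))
    return assigned


mondays = [date(2016, 5, 2), date(2016, 5, 9), date(2016, 5, 16)]
students = ["abc", "def", "ghi", "xxx", "jkl", "try", "654", "gyg"]
result = assign_dates(students, mondays)
for name, slot in result:
    print(name, slot.strftime("%d-%m-%Y"))
```

Counting usage per date, rather than per exavailableID alone, is what makes the "4 times, then the next date" rule hold.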
