Power Query Transform/Lookup Data From Rows To Columns - powerquery

I have a set of data ("My Data"), shown below. How can I shift the data from rows to columns in Power Query?
"My Preferred Answer" is the final output I want.
My Data:
| FruitName | Price | Quantity |
| --------- | ----- | -------- |
| Apple | 1 | 1 |
| Banana | 2 | 1 |
| Orange | 3 | 1 |
| Colour | *null* | *null* |
| Apple | Red | *null* |
| Banana | Yellow | *null* |
| Orange | Orange | *null* |
My Preferred Answer:
| FruitName | Price | Quantity | Colour |
| --------- | ----- | -------- | ------ |
| Apple | 1 | 1 | Red |
| Banana | 2 | 1 | Yellow |
| Orange | 3 | 1 | Orange |

Read the code comments to better understand the algorithm:
Split the table at the "Colour" row.
Join the colours with the inventory table.
M Code
let
    //Change Table name in next line to the actual table name in your workbook
    Source = Excel.CurrentWorkbook(){[Name="Table28"]}[Content],
    //Separate table for Color and Inventory
    splitTable = Table.SplitAt(Source, List.PositionOf(Source[FruitName],"Colour")),
    //Remove the row with the table split word "colour"
    //Remove the Quantity column
    //Rename the Price column
    //Set the data types
    colourTable = Table.TransformColumnTypes(
        Table.RenameColumns(
            Table.RemoveColumns(
                Table.RemoveFirstN(splitTable{1},1),"Quantity"),{"Price","Colour"}),
        {{"FruitName",type text},{"Colour",type text}}),
    //Set the data types
    inventoryTable = Table.TransformColumnTypes(splitTable{0},{
        {"FruitName", type text},
        {"Price",Currency.Type},
        {"Quantity", Int64.Type}
    }),
    //Join the colour column
    joined = Table.NestedJoin(inventoryTable,"FruitName",colourTable,"FruitName","Joined",JoinKind.LeftOuter),
    #"Expanded Joined" = Table.ExpandTableColumn(joined, "Joined", {"Colour"}, {"Colour"})
in
    #"Expanded Joined"
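For readers who want to check the logic outside Power Query, the same split-and-join idea can be sketched in plain Python. This is only an illustration, not a translation of the M functions: `split_and_join` and `my_data` are hypothetical names, and the sketch assumes the marker row "Colour" appears exactly once in the FruitName column.

```python
def split_and_join(rows, marker="Colour"):
    """Split rows at the marker row, then left-join the colours onto the inventory rows."""
    # Position of the marker row in the FruitName column
    split_at = next(i for i, row in enumerate(rows) if row["FruitName"] == marker)
    inventory = rows[:split_at]
    # Rows after the marker carry the colour in the Price column
    colours = {row["FruitName"]: row["Price"] for row in rows[split_at + 1:]}
    # Left outer join: a fruit without a colour row gets None
    return [
        {
            "FruitName": row["FruitName"],
            "Price": row["Price"],
            "Quantity": row["Quantity"],
            "Colour": colours.get(row["FruitName"]),
        }
        for row in inventory
    ]


my_data = [
    {"FruitName": "Apple", "Price": 1, "Quantity": 1},
    {"FruitName": "Banana", "Price": 2, "Quantity": 1},
    {"FruitName": "Orange", "Price": 3, "Quantity": 1},
    {"FruitName": "Colour", "Price": None, "Quantity": None},
    {"FruitName": "Apple", "Price": "Red", "Quantity": None},
    {"FruitName": "Banana", "Price": "Yellow", "Quantity": None},
    {"FruitName": "Orange", "Price": "Orange", "Quantity": None},
]

result = split_and_join(my_data)
```

With the sample data from the question, `result` holds the three fruit rows, each with its Colour filled in.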

Related

Efficient way to execute udf based on group

I am new to pyspark and have a performance issue. Given a dataframe
| Group | Id | Selected |
| ----- | ---- | --------- |
| A | id1 | 0 |
| A | id2 | 0 |
| A | id3 | 0 |
| B | id4 | 0 |
| B | id5 | 0 |
And a sampling dictionary
sample_dict = {'A': 2, 'B': 1}
I want to randomly select rows (by updating the Selected column) so that the number selected in each group follows the sampling dictionary. For example:
| Group | Id | Selected |
| ----- | ---- | --------- |
| A | id1 | *1* |
| A | id2 | 0 |
| A | id3 | *1* |
| B | id4 | 0 |
| B | id5 | *1* |
Currently, I can only iterate through each group, get the subgroup, randomly select Ids, and update the original dataframe according to those Ids:
from pyspark.sql.functions import col

for group in sample_dict:
    sub_df = df.filter(col('Group') == group)
    # id_list = a_udf_randomly_sample_n_Id_from_sub_df
    # update df['Selected'] if df['Id'].isin(id_list)
The problem with this approach is that processing is sequential (each group is handled one by one). As the number of groups and the total number of rows scale up, the pyspark code runs slower than the simple pandas version (on Databricks).
Could you share better approaches for this problem in pyspark (e.g. processing the selection in each group in parallel)?
Thank you very much
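One way to avoid looping over groups is to give every row a random key, rank rows within their group by that key, and select the rows whose rank falls inside the group's quota. The sketch below shows that idea in plain Python only; `select_per_group` and the `seed` parameter are hypothetical, and it is not PySpark code. In PySpark the same idea is usually expressed with a window partitioned by Group and ordered by a random column, but that translation is an assumption that is neither shown nor tested here.

```python
import random
from collections import defaultdict


def select_per_group(rows, sample_dict, seed=None):
    """Mark sample_dict[group] randomly chosen rows per group with Selected = 1."""
    rng = random.Random(seed)
    # Attach a random key to every row index, bucketed by group
    by_group = defaultdict(list)
    for index, row in enumerate(rows):
        by_group[row["Group"]].append((rng.random(), index))
    # Within each group, the rows with the smallest keys fill the quota
    selected = set()
    for group, members in by_group.items():
        quota = sample_dict.get(group, 0)
        members.sort()
        selected.update(index for _, index in members[:quota])
    return [
        {**row, "Selected": 1 if index in selected else 0}
        for index, row in enumerate(rows)
    ]


df_rows = [
    {"Group": "A", "Id": "id1", "Selected": 0},
    {"Group": "A", "Id": "id2", "Selected": 0},
    {"Group": "A", "Id": "id3", "Selected": 0},
    {"Group": "B", "Id": "id4", "Selected": 0},
    {"Group": "B", "Id": "id5", "Selected": 0},
]
sample_dict = {"A": 2, "B": 1}
sampled = select_per_group(df_rows, sample_dict, seed=0)
```

Every group is handled by the same ranking rule, so nothing in the logic depends on processing groups one after another.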

How to find and group records which have same conditions subsequently?

We have a table which we record our students' attendances in. We need to find out which students missed n number of contacts in a row.
Here is the attendance table structure with example records
----------------------------------------------------
| id | student_id | class_id | checkedin_time |
----------------------------------------------------
| 1 | 1 | 1 | null |
| 2 | 1 | 2 | 2019-07-09 10:30 |
| 3 | 1 | 3 | null |
| 4 | 1 | 4 | null |
| 5 | 1 | 5 | 2019-07-12 12:00 |
----------------------------------------------------
What I'm looking for is code that shows me the records with id 3 and 4 (two subsequent missed contacts), grouped by student_id.
This was my starting point:
$attendances = \App\Attendance::whereNull('checkedin_time')->get();
$attendances->groupBy('student_id');
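The underlying logic, finding runs of at least n consecutive missed check-ins per student, can be sketched in plain Python before porting it to Eloquent collections or SQL. This is a minimal sketch, not Laravel code: `missed_runs` is a hypothetical helper, and it assumes that ordering records by `id` reflects the order of the classes.

```python
from itertools import groupby


def missed_runs(attendances, n):
    """Return {student_id: [[record ids], ...]} for runs of at least n consecutive misses."""
    result = {}
    ordered = sorted(attendances, key=lambda a: (a["student_id"], a["id"]))
    for student_id, records in groupby(ordered, key=lambda a: a["student_id"]):
        # Split the student's records into consecutive runs of missed / attended
        for missed, run in groupby(records, key=lambda a: a["checkedin_time"] is None):
            ids = [a["id"] for a in run]
            if missed and len(ids) >= n:
                result.setdefault(student_id, []).append(ids)
    return result


attendances = [
    {"id": 1, "student_id": 1, "class_id": 1, "checkedin_time": None},
    {"id": 2, "student_id": 1, "class_id": 2, "checkedin_time": "2019-07-09 10:30"},
    {"id": 3, "student_id": 1, "class_id": 3, "checkedin_time": None},
    {"id": 4, "student_id": 1, "class_id": 4, "checkedin_time": None},
    {"id": 5, "student_id": 1, "class_id": 5, "checkedin_time": "2019-07-12 12:00"},
]

runs = missed_runs(attendances, 2)
```

For the example table, `runs` contains only the run made of records 3 and 4 for student 1; record 1 is a miss too, but it is a run of length one.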

Determine unique values from an Oracle join?

I need a way to avoid duplicate values from an Oracle join. I have this scenario.
The first table contains general information about a person.
+-----------+-------+-------------+
| ID | Name | Birtday_date|
+-----------+-------+-------------+
| 1 | Byron | 12/10/1998 |
| 2 | Peter | 01/11/1973 |
| 4 | Jose | 05/02/2008 |
+-----------+-------+-------------+
The second table contains the telephone numbers of the people in the first table.
+-------+----------+----------+----------+
| ID |ID_Person |CELL_TYPE | NUMBER |
+-------+----------+----------+----------+
| 1221 | 1 | 3 | 099141021|
| 2221 | 1 | 2 | 099091925|
| 3222 | 1 | 1 | 098041013|
| 4321 | 2 | 1 | 088043153|
| 4561 | 2 | 2 | 090044313|
| 5678 | 4 | 1 | 092049013|
| 8990 | 4 | 2 | 098090233|
+-------+----------+----------+----------+
The third table contains the email addresses of the people in the first table.
+------+----------+----------+---------------+
| ID |ID_Person |EMAIL_TYPE| Email |
+------+----------+----------+---------------+
| 221 | 1 | 1 |jdoe#aol.com |
| 222 | 1 | 2 |jdoe1#aol.com |
| 421 | 2 | 1 |xx12#yahoo.com |
| 451 | 2 | 2 |dsdsa#gmail.com|
| 578 | 4 | 1 |sasaw1#sdas.com|
| 899 | 4 | 2 |cvcvsd#wew.es |
| 899 | 4 | 2 |cvsd#www.es |
+------+----------+----------+---------------+
I was able to produce a result like this; you can check it at this link: http://sqlfiddle.com/#!4/8e326/1
+-----+-------+-------------+----------+----------+----------+----------------+
| ID | Name | Birtday_date| CELL_TYPE| NUMBER |EMAIL_TYPE|EMAIL|
+-----+-------+-------------+----------+----------+----------+----------------+
| 1 | Byron | 12/10/1998 | 3 | 099141021|1 |jdoe#aol.com |
| 1 | Byron | 12/10/1998 | 2 | 099091925|2 |jdoe1#aol.com |
| 1 | Byron | 12/10/1998 | 1 | 099091925| | |
| 2 | Peter | 01/11/1973 | 1 | 088043153|1 |xx12#yahoo.com |
| 2 | Peter | 01/11/1973 | 2 | 090044313|2 |dsdsa#gmail.com |
| 4 | Jose | 05/02/2008 | 1 | 092049013|1 |sasaw1#sdas.com |
| 4 | Jose | 05/02/2008 | 2 | 098090233|2 |cvcvsd#wew.es |
+-----+-------+-------------+----------+----------+----------+----------------+
If you check the data in the Email table for the user with ID_Person = 4, the result presents only two of that person's three emails. The problem in this case is that the person has more emails than cellphone numbers, so only as many emails as there are cellphone numbers are shown.
The result I expect is something like this.
+-----+-------+-------------+----------+----------+----------+----------------+
| ID | Name | Birtday_date| CELL_TYPE| NUMBER |EMAIL_TYPE|EMAIL|
+-----+-------+-------------+----------+----------+----------+----------------+
| 1 | Byron | 12/10/1998 | 3 | 099141021|1 |jdoe#aol.com |
| 1 | Byron | 12/10/1998 | 2 | 099091925|2 |jdoe1#aol.com |
| 1 | Byron | 12/10/1998 | 1 | 099091925| | |
| 2 | Peter | 01/11/1973 | 1 | 088043153|1 |xx12#yahoo.com |
| 2 | Peter | 01/11/1973 | 2 | 090044313|2 |dsdsa#gmail.com |
| 4 | Jose | 05/02/2008 | 1 | 092049013|1 |sasaw1#sdas.com |
| 4 | Jose | 05/02/2008 | 2 | 098090233|2 |cvcvsd#wew.es |
| 4 | Jose | 05/02/2008 | | |2 |cvsd#www.es |
+-----+-------+-------------+----------+----------+----------+----------------+
This is the way that I need to present the data.
I could not understand why your query was so complex, so I used a simple full outer join instead, and it seems to be working:
select distinct p.id, p.name,
       case when Lag(CELL) over(partition by p.id order by p.id,pe.id) = CELL then null else cell_type end as cell_type,
       case when Lag(CELL) over(partition by p.id order by p.id,pe.id) = CELL then null else CELL end as CELL,
       EMAIL_TYPE as EMAIL_TYPE, EMAIL as EMAIL
from person p
full outer join phones pe on p.id = pe.id
full outer join emails e
     on p.id = e.id and pe.cell_type = e.email_type;
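The expected output in the question pairs each person's n-th phone with their n-th email and pads the shorter side with blanks. The Python sketch below illustrates that positional pairing; it is a different technique from the full outer join on type used in the query above, and `pair_contacts` plus the tuple layouts are assumptions made for the example. The phone and email values are copied from the question's tables.

```python
from itertools import zip_longest


def pair_contacts(people, phones, emails):
    """Pair each person's phones and emails by position, padding the shorter list with None."""
    rows = []
    for person_id, name in people:
        person_phones = [p for p in phones if p[0] == person_id]
        person_emails = [e for e in emails if e[0] == person_id]
        for phone, email in zip_longest(person_phones, person_emails):
            cell_type, number = phone[1:] if phone else (None, None)
            email_type, address = email[1:] if email else (None, None)
            rows.append((person_id, name, cell_type, number, email_type, address))
    return rows


people = [(1, "Byron"), (2, "Peter"), (4, "Jose")]
# (ID_Person, CELL_TYPE, NUMBER)
phones = [
    (1, 3, "099141021"), (1, 2, "099091925"), (1, 1, "098041013"),
    (2, 1, "088043153"), (2, 2, "090044313"),
    (4, 1, "092049013"), (4, 2, "098090233"),
]
# (ID_Person, EMAIL_TYPE, Email)
emails = [
    (1, 1, "jdoe#aol.com"), (1, 2, "jdoe1#aol.com"),
    (2, 1, "xx12#yahoo.com"), (2, 2, "dsdsa#gmail.com"),
    (4, 1, "sasaw1#sdas.com"), (4, 2, "cvcvsd#wew.es"), (4, 2, "cvsd#www.es"),
]

paired = pair_contacts(people, phones, emails)
```

Byron's third phone gets an empty email, and Jose's third email gets an empty phone, which is the shape the question asks for.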

Birt-Crosstab with empty columns

I'm a BIRT beginner, and I just tried to get a really simple report from one of the tables in my Postgres DB.
So I defined a flat table as the data source, which looks like:
+----------------+--------+----------+-------+--------+
| date | store | product | value | color |
+----------------+--------+----------+-------+--------+
| 20160101000000 | store1 | productA | 5231 | red |
| 20160101000000 | store1 | productB | 3213 | green |
| 20160101000000 | store2 | productX | 4231 | red |
| 20160101000000 | store3 | productY | 3213 | green |
| 20160101000000 | store4 | productZ | 1223 | green |
| 20160101000000 | store4 | productK | 3113 | yellow |
| 20160101000000 | store4 | productE | 213 | green |
| .... | | | | |
| 20160109000000 | store1 | productA | 512 | green |
+----------------+--------+----------+-------+--------+
So I would like to add a table / crosstab to my BIRT report which creates a table (followed by a page break) for EVERY store, looking like this:
**Store 1**
+----------------+----------+----------+----------+-----+
| | productA | productB | productC | ... |
+----------------+----------+----------+----------+-----+
| 20160101000000 | 3120 | 1231 | 6433 | ... |
| 20160102000000 | 6120 | 1341 | 2121 | ... |
| 20160103000000 | 1120 | 5331 | 1231 | ... |
+----------------+----------+----------+----------+-----+
--- PAGE BREAK ---
....
What I tried first was getting the standard crosstab tutorial template of BIRT to work.
I defined the data source and created a data cube with a dimension group of 'store' and 'product', with the SUM of 'value' as the detail data. For this example I selected just ONE day.
But the result looks like this:
+--------+----------+----------+----------+----------+-----+----------+
| | productA | productC | productD | productE | ... | productZ |
+--------+----------+----------+----------+----------+-----+----------+
| Store1 | 213 | | 3234 | 897 | ... | 6767 |
| Store2 | 513 | 2213 | 1233 | | ... | 845 |
| Store3 | 21 | | | 32 | ... | |
| Store4 | 123 | 222 | 142 | | ... | |
+--------+----------+----------+----------+----------+-----+----------+
That's because not every product is sold in every store, but the crosstab creates the columns by selecting ALL available products.
So I have no idea how to dynamically generate different tables with a different (but also dynamic) number of columns.
The second step would then be to get the dates (days) to work.
Thanks in advance for every hint or tutorial link for question one ;-)
You can just add a table with the complete data source. Select the table and add a group, grouped by StoreID. You can set the page break options for each grouping. Set the property for after to "always excluding last".
BIRT will add a group header. You can add multiple group header rows to get the layout you're after.
For crosstabs it works in a similar way. After you have added the crosstab to your page, set the groups on rows and columns, and added summaries, you can view the data. Select the crosstab and view the Row Area properties, open the page break settings, and add a new page break. You can select which group to break on: choose your StoreID group and set after to "always excluding last".
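To make the target layout concrete, the per-store grouping can be sketched outside BIRT in plain Python: one table per store, with columns limited to the products that store actually sells. This is an illustration of the desired result only, not BIRT configuration or scripting; `tables_per_store` and the tuple layout are assumptions, and the records are the rows shown in the question without the color column.

```python
from collections import defaultdict


def tables_per_store(records):
    """Build one date-by-product table per store, using only that store's products as columns."""
    cells = defaultdict(lambda: defaultdict(int))  # store -> (date, product) -> summed value
    for date, store, product, value in records:
        cells[store][(date, product)] += value
    tables = {}
    for store, values in cells.items():
        products = sorted({product for _, product in values})
        dates = sorted({date for date, _ in values})
        # A missing (date, product) combination stays None instead of creating a column elsewhere
        rows = [[date] + [values.get((date, p)) for p in products] for date in dates]
        tables[store] = {"columns": products, "rows": rows}
    return tables


records = [
    ("20160101000000", "store1", "productA", 5231),
    ("20160101000000", "store1", "productB", 3213),
    ("20160101000000", "store2", "productX", 4231),
    ("20160101000000", "store3", "productY", 3213),
    ("20160101000000", "store4", "productZ", 1223),
    ("20160101000000", "store4", "productK", 3113),
    ("20160101000000", "store4", "productE", 213),
    ("20160109000000", "store1", "productA", 512),
]

tables = tables_per_store(records)
```

Each entry in `tables` corresponds to one page of the report, which is what grouping by store with a page break after each group produces.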

Auto increment without sequence

I have a select statement that generates a set of values, which I then want to insert into another table. Inside that select I use one more select clause, (select max(org_id)+1 from org), to get the max value and increment it by one. However, I am not getting an incremented value for each row; I get the same value every time, as you can see in the column id_limit.
select abc,abc1,abc3,abc4,(select max(org_id)+1 from org) as id_limit from xyz
Current output:
-----------------------------------------------------------------
| abc | abc1 | abc3 | abc4 | id_limit |
-----------------------------------------------------------------
| BUSINESS_UNIT | 0 | 100 | London | 6 |
| BUSINESS_UNIT | 0 | 200 | Sydney | 6 |
| BUSINESS_UNIT | 0 | 300 | Kiev | 6 |
-----------------------------------------------------------------
This is the output I expect:
-----------------------------------------------------------------
| abc | abc1 | abc3 | abc4 | id_limit |
-----------------------------------------------------------------
| BUSINESS_UNIT | 0 | 100 | London | 6 |
| BUSINESS_UNIT | 0 | 200 | Sydney | 7 |
| BUSINESS_UNIT | 0 | 300 | Kiev | 8 |
-----------------------------------------------------------------
Yes, in Oracle 12 you can use an identity column:
create table foo (
    id number generated by default on null as identity
);
https://oracle-base.com/articles/12c/identity-columns-in-oracle-12cr1
In previous versions you use a sequence and a trigger, as explained here:
How to create id with AUTO_INCREMENT on Oracle?
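The reason every row in the question gets the same id_limit is that the scalar subquery computes max(org_id)+1 once against the unchanged org table, so nothing varies per row. The expected output needs that starting value plus a per-row offset. The Python sketch below shows only that arithmetic; `assign_ids` and `current_max` are hypothetical names, and the value 5 is inferred from the question's output, where max(org_id)+1 is 6.

```python
def assign_ids(rows, current_max):
    """Give each row its own id, continuing after current_max."""
    start = current_max + 1  # the single value the scalar subquery yields for every row
    # Adding the row's position makes the value differ from row to row
    return [{**row, "id_limit": start + offset} for offset, row in enumerate(rows)]


xyz_rows = [
    {"abc": "BUSINESS_UNIT", "abc1": 0, "abc3": 100, "abc4": "London"},
    {"abc": "BUSINESS_UNIT", "abc1": 0, "abc3": 200, "abc4": "Sydney"},
    {"abc": "BUSINESS_UNIT", "abc1": 0, "abc3": 300, "abc4": "Kiev"},
]

numbered = assign_ids(xyz_rows, current_max=5)
```

Computing ids from the current maximum is not safe when several sessions insert at the same time, which is one reason to prefer an identity column or a sequence.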
