Power Query: split text from different columns into the same rows

I have:
column1 | column2 | column3
a;b;c | x;y;z | door;house;tree
Desired result using Excel Power Query:
a | x | door
b | y | house
c | z | tree
I tried Text.Split([column1],";") and expanding it to new rows, obtaining:
a
b
c
However, when I tried the same with the other columns, new rows were created instead of the existing ones being reused.

You may use this code:
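let
// Load the table named "Table" from the current workbook
Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
// The custom replacer ignores the old/new values (0, 0) and turns every cell into the list of its ";"-separated parts;
// {0} then takes the first row as a record
rec = Table.ReplaceValue(Source,0,0,(a,b,c)=>Text.Split(a,";"),{"column1", "column2", "column3"}){0},
// Zip the lists so the i-th part of every column lands in the same output row
table = #table(Record.FieldNames(rec),List.Zip(Record.FieldValues(rec)))
in
table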
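The query above reads only the first row of the table ({0}). The split-then-zip idea itself can be sketched in Python and applied to every row; the function name split_and_zip and the list-of-dicts row format are hypothetical, chosen only for illustration, and uneven splits are assumed to be padded with None rather than dropped.

```python
from itertools import zip_longest


def split_and_zip(rows, columns, sep=";"):
    """Split each cell on `sep` and line the parts up by position, so the
    i-th part of every column ends up in the same output row."""
    out = []
    for row in rows:
        parts = [row[col].split(sep) for col in columns]
        # zip_longest pads uneven splits with None instead of dropping parts
        for values in zip_longest(*parts):
            out.append(dict(zip(columns, values)))
    return out


# Sample data from the question, one dict per source row (hypothetical format)
rows = [{"column1": "a;b;c", "column2": "x;y;z", "column3": "door;house;tree"}]
result = split_and_zip(rows, ["column1", "column2", "column3"])
for r in result:
    print(" | ".join(str(v) for v in r.values()))
```

Because each source row is split and zipped on its own, a table with several rows yields the expanded rows of each source row in turn.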

Related

Cannot sort a range with variables (GAS script on Google Sheets)

My full script offers a sidebar to users, for them to choose the four columns they wish to sort and the sort order of each in a Sheets table called "sortTemp". Their responses are stored in a sheet called "tempVar". All is well until I get to the actual sorting. I gather the variables from tempVar, but when it gets to the sort command, it does nothing: no error, no sorting. Here is the portion of the code that is failing:
function testSort() {
// Sort tempVar
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getRange('A1').activate();
var currentCell = spreadsheet.getCurrentCell();
spreadsheet.getActiveRange().getDataRegion().activate();
currentCell.activateAsCurrentCell();
// Map tempVar to sortTempColumns
var col1 = spreadsheet.getRange('tempVar!B5').getValue();
var col1dir = spreadsheet.getRange('tempVar!B6').getValue();
var col2 = spreadsheet.getRange('tempVar!B7').getValue();
var col2dir = spreadsheet.getRange('tempVar!B8').getValue();
var col3 = spreadsheet.getRange('tempVar!B10').getValue();
var col3dir = spreadsheet.getRange('tempVar!B11').getValue();
var col4 = spreadsheet.getRange('tempVar!B12').getValue();
var col4dir = spreadsheet.getRange('tempVar!B13').getValue();
spreadsheet.getActiveRange().sort([{column: col1, ascending: col1dir}, {column: col2, ascending: col2dir}, {column: col3, ascending: col3dir}, {column: col4, ascending: col4dir}]);
spreadsheet.getRange('A1').activate();
}
Here is a sample of the data being sorted: see [1].
Here is the sheet that the variables are being drawn from: see [2].
The user's sort choices are stored in column A and reconfigured for use by the sort script in column B.
The column numbers are being calculated like this: =VLOOKUP(A5,C$1:D$8,2,false)
The booleans are being calculated with this: =lower(if(A6<="Ascending",true,false))
What I've Tried
I've tested replacing the variables with literal numbers and booleans for the ascending part, to determine whether I had written the code incorrectly. It works without variables and breaks with them.
I've tested populating the variables with numbers and booleans inside the script itself, rather than gathering them from another sheet in the file (for example, setting col1 = 4). It did nothing.
I've used UI alerts to ensure the variables are coming over, and they are. I've even done math with the variables in the alert to be sure they're real numbers, and they are.
I've tried wrapping the calls that populate the variables in Number(), to ensure they're converted to numbers. It didn't help.
I've let a macro record the sort steps and then plugged in the variables, to ensure I have the code that is supposed to work. No help.
I am new to GAS and I'm hoping my error will be obvious to someone more experienced. What am I doing wrong?
[1]: sortTemp table
|12|1|Thomas |Hannah |Jr. | |Spartanburg District|SC|1790|
|13|1|Tom |Hannah |M.D. |Smithville |Spartanburg County |SC|1800|
|14|1|J. T. |Hannah |Junior| |Renfroe |SC|1810|
|15|1|Robert |Hanna |Jr | |Spartanburg District|SC|1820|
|16|1|D. C. |Baker | | |Tuscaloosa County |AL|1830|
|17|1|Donna Cox|Baker | |Birmingham |Jefferson |AL|1860|
|18|1|John |Maloney| |Eastern Valley|Taylor County |FL|1860|
[2]: tempVar table
| Census Worksheet | GAS Value | Key | |
| Copy of Census Worksheet 5 | | Given Name(s) | 3 |
| 6 | | Surname | 4 |
| 19 | | Suffix | 5 |
| Surname | 4 | Community | 6 |
| | true | County | 7 |
| Given Name(s) | 3 | State | 8 |
| | true | Census Year | 9 |
| | | | |
| State | 8 | | |
| | true | | |
| County | 7 | | |
| | true | | |
| Copy to new sheet | | |
The problem turned out to be the boolean variables: they were coming through as strings, which prevented the sort from happening. I changed the script to convert the "true" and "false" strings to booleans using JSON.parse(), and now the script runs fine.
Here is the new code:
function testSort() {
// Sort temp
var spreadsheet = SpreadsheetApp.getActive();
spreadsheet.getRange('A1').activate();
var currentCell = spreadsheet.getCurrentCell();
spreadsheet.getActiveRange().getDataRegion().activate();
currentCell.activateAsCurrentCell();
// Map tempVar to sortTempColumns
var col1 = spreadsheet.getRange('tempVar!B5').getValue();
var col1dir = JSON.parse(spreadsheet.getRange('tempVar!B6').getValue());
var col2 = spreadsheet.getRange('tempVar!B7').getValue();
var col2dir = JSON.parse(spreadsheet.getRange('tempVar!B8').getValue());
var col3 = spreadsheet.getRange('tempVar!B10').getValue();
var col3dir = JSON.parse(spreadsheet.getRange('tempVar!B11').getValue());
var col4 = spreadsheet.getRange('tempVar!B12').getValue();
var col4dir = JSON.parse(spreadsheet.getRange('tempVar!B13').getValue());
spreadsheet.getActiveRange().sort([{column: col1, ascending: col1dir}, {column: col2, ascending: col2dir}, {column: col3, ascending: col3dir}, {column: col4, ascending: col4dir}]);
spreadsheet.getRange('A1').activate();
}
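JSON.parse() works here because the lower() formula always yields exactly "true" or "false"; it would throw on other text, such as "TRUE" or an empty cell. A stricter conversion can be sketched as follows, written in Python for illustration (in Apps Script the same check would be written in JavaScript). The helper to_bool and the sample values are hypothetical, standing in for the tempVar!B5:B13 cells.

```python
def to_bool(value):
    """Convert a cell value to a real boolean, accepting actual booleans or
    the strings "true"/"false" (any case, surrounding whitespace ignored)."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # Fail loudly instead of silently passing a string to the sort call
    raise ValueError(f"not a boolean: {value!r}")


# Hypothetical (column number, direction) pairs as read from the tempVar sheet
pairs = [(4, "true"), (3, "true"), (8, "TRUE"), (7, "false")]
sort_spec = [{"column": int(col), "ascending": to_bool(direction)} for col, direction in pairs]
print(sort_spec)
```

Raising an error on unexpected text makes a bad cell visible immediately, which avoids the silent "no error, no sorting" symptom described in the question.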

BigQuery: Sample a varying number of rows per group

I have two tables. One has a list of items, and for each item, a number n.
item | n
--------
a | 1
b | 2
c | 3
The second one has a list of rows containing item, uid, and other columns.
item | uid | data
------------------
a | x | foo
a | x | baz
a | x | bar
a | z | arm
a | z | leg
b | x | eye
b | x | eye
b | x | eye
b | x | eye
b | z | tap
c | y | tip
c | z | top
I would like to sample n rows for each (item, uid) pair. Which rows are kept is arbitrary; uniformly random would be better, but it doesn't have to be. In the example above, I want to keep at most one row per user for item a, two rows per user for item b, and three rows per user for item c:
item | uid | data
------------------
a | x | baz
a | z | arm
b | x | eye
b | x | eye
b | z | tap
c | y | tip
c | z | top
ARRAY_AGG with LIMIT n doesn't work for two reasons. First, since n can be large (on the order of 100,000), I suspect this won't scale. Second, and more fundamentally, n needs to be a constant.
Table sampling also doesn't seem to solve my problem, since it's per-table, and also only supports sampling a fixed percentage of rows, rather than a fixed number of rows.
Are there any other options?
Consider the solution below:
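select * except(n)
from rows_list
join items_list
using(item)
where true -- no-op filter; BigQuery has required a WHERE, GROUP BY, or HAVING clause alongside QUALIFY
qualify row_number() over win <= n -- keep at most n rows per (item, uid) pair
window win as (partition by item, uid order by rand()) -- random order within each pair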
If applied to the sample data in your question, this returns at most n randomly chosen rows for each (item, uid) pair.
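The QUALIFY query numbers the rows of each (item, uid) partition in random order and keeps those whose number is at most n. The same logic can be sketched in memory in Python; the function sample_per_group and the dict-based rows are illustrative only and are not part of BigQuery.

```python
import random
from collections import Counter, defaultdict


def sample_per_group(rows, n_by_item, seed=None):
    """Keep at most n_by_item[item] randomly chosen rows for each (item, uid) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        if row["item"] in n_by_item:  # inner join on item, like JOIN ... USING(item)
            groups[(row["item"], row["uid"])].append(row)
    result = []
    for (item, _uid), group in groups.items():
        k = min(n_by_item[item], len(group))
        result.extend(rng.sample(group, k))  # uniform sample without replacement
    return result


items = {"a": 1, "b": 2, "c": 3}
raw = [("a", "x", "foo"), ("a", "x", "baz"), ("a", "x", "bar"), ("a", "z", "arm"),
       ("a", "z", "leg"), ("b", "x", "eye"), ("b", "x", "eye"), ("b", "x", "eye"),
       ("b", "x", "eye"), ("b", "z", "tap"), ("c", "y", "tip"), ("c", "z", "top")]
rows = [dict(zip(("item", "uid", "data"), r)) for r in raw]
sampled = sample_per_group(rows, items, seed=42)
print(Counter((r["item"], r["uid"]) for r in sampled))
```

Taking min(n, group size) mirrors the SQL behaviour: a pair with fewer than n rows simply keeps all of them.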

How To Parse a String (From a different Table) in Hive (Hadoop) And Load It To a Different Table

I have this Table as an Input:
Table Name:Deals
Columns: Doc_id(BIGINT),Nv_Pairs_Feed(STRING),Nv_Pairs_Category(STRING)
For Example:
Doc_id: 4997143658422483637
Nv_Pairs_Feed: "TYPE:Wiper Blade;CONDITION:New;CATEGORY:Auto Parts and Accessories;STOCK_AVAILABILITY:Y;ORIGINAL_PRICE:0.00"
Nv_Pairs_Category: "Condition:New;Store:PartsGeek.com;"
I am trying to parse the fields Nv_Pairs_Feed and Nv_Pairs_Category and extract their N:V pairs (pairs are separated by ';', and each name and value are separated by ':').
My goal is to insert each N:V as a Row in this table:
Doc_id | Name | Value | Source_Field
Example for desired Result:
4997143658422483637 | Condition | New | Nv_Pairs_Category
4997143658422483637 | Store | PartsGeek.com | Nv_Pairs_Category
4997143658422483637 | TYPE | Wiper Blade | Nv_Pairs_Feed
4997143658422483637 | CONDITION | New | Nv_Pairs_Feed
4997143658422483637 | CATEGORY | Auto Parts and Accessories | Nv_Pairs_Feed
4997143658422483637 | STOCK_AVAILABILITY | Y | Nv_Pairs_Feed
4997143658422483637 | ORIGINAL_PRICE | 0.00 | Nv_Pairs_Feed
You can convert the strings to maps using the standard Hive UDF str_to_map, and then use the Brickhouse UDFs (http://github.com/klout/brickhouse) map_key_values, combine, and numeric_range to explode those maps. Something like the following:
-- Parse both N:V strings into maps (built-in str_to_map) and merge them (Brickhouse combine)
create view deals_map_view as
select doc_id,
map_key_values(
combine( str_to_map( nv_pairs_feed, ';', ':'),
str_to_map( nv_pairs_category, ';', ':'))) as deals_map_key_values
from deals;
-- Emit one row per key/value pair; numeric_range generates the array indexes
select
doc_id,
array_index( deals_map_key_values, i ).key as name,
array_index( deals_map_key_values, i ).value as value
from deals_map_view
lateral view numeric_range( size( deals_map_key_values ) ) i1 as i;
You can probably do something similar with an explode_map UDF.
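Because the Hive answer merges both maps into one, the Source_Field column from the desired output is not produced. The parsing itself, including Source_Field, can be sketched in Python by handling each column separately; the function explode_nv_pairs is hypothetical, and splitting each pair on its first ':' only is an assumption about how values containing ':' should be treated.

```python
def explode_nv_pairs(doc_id, fields):
    """Turn each "NAME:VALUE;NAME:VALUE" string into (doc_id, name, value, source_field) rows."""
    rows = []
    for source_field, text in fields.items():
        for pair in text.split(";"):
            if not pair:  # skip empty pieces, e.g. from a trailing ';'
                continue
            name, _, value = pair.partition(":")  # split on the first ':' only
            rows.append((doc_id, name, value, source_field))
    return rows


rows = explode_nv_pairs(4997143658422483637, {
    "Nv_Pairs_Feed": "TYPE:Wiper Blade;CONDITION:New;CATEGORY:Auto Parts and Accessories;"
                     "STOCK_AVAILABILITY:Y;ORIGINAL_PRICE:0.00",
    "Nv_Pairs_Category": "Condition:New;Store:PartsGeek.com;",
})
for r in rows:
    print(" | ".join(str(x) for x in r))
```

In Hive, the equivalent would be to explode each column on its own and UNION ALL the results, tagging each branch with its column name.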

Joining tables with same column names - ORACLE

I am using Oracle.
I am currently working on two tables which both have the same column names. Is there any way I can combine the two tables as they are?
Simple example to show what I mean:
TABLE 1:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
TABLE 2:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| c | 3 | y |
| d | 4 | z |
RESULT THAT I WANT:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
| c | 3 | y |
| d | 4 | z |
Any help would be greatly appreciated. Thank you in advance!
You can use the union set operator to get the result of two queries as a single result set:
select column1, column2, column3
from table1
union all
select column1, column2, column3
from table2
union on its own implicitly removes duplicates; union all preserves them.
The column names don't need to be the same; you just need the same number of columns, with the same data types, in the same order.
(This is not what is usually meant by a join, so the title of your question is a bit misleading; I'm basing this on the example data and output you showed.)
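The difference between union and union all can be illustrated with a small Python sketch over lists of row tuples. The function names mirror the SQL operators only for readability, and the duplicate row in the second table is hypothetical, added to show the difference.

```python
def union_all(*tables):
    """UNION ALL: concatenate the rows of every input, keeping duplicates."""
    return [row for table in tables for row in table]


def union(*tables):
    """UNION: like UNION ALL, but each distinct row is kept only once."""
    seen = set()
    result = []
    for row in union_all(*tables):
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


table1 = [("a", 1, "w"), ("b", 2, "x")]
# Hypothetical duplicate of a table1 row added to table2 to show the difference
table2 = [("c", 3, "y"), ("d", 4, "z"), ("a", 1, "w")]
print(len(union_all(table1, table2)))  # 5
print(len(union(table1, table2)))      # 4
```

Unlike this sketch, SQL does not guarantee the row order of either operator unless the query adds an order by.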

Data management with several variables

Currently I am facing the following problem, which I'm working in Stata to solve. I have added the algorithm tag, because it's mainly the steps that I'm interested in rather than the Stata code.
I have some variables, say, var1 - var20 that can possibly contain a string. I am only interested in some of these strings, let us call them A,B,C,D,E,F, but other strings can occur also (all of these will be denoted X). Also I have a unique identifier ID. A part of the data could look like this:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | | X
1 | | A | | | C
2 | X | F | A | |
8 | | | | | E
Now I want to create an entry for every ID and for every occurrence of one of the strings A, B, C, D, E, F in any of the variables. The above data should then look like this:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | .. |
1 | | A | | |
1 | | | | | C
2 | | F | | |
2 | | | A | |
8 | | | | | E
Here we ignore any string X that is not A, B, C, D, E, or F. My attempt so far was to create a variable that counts, for each entry, the number N of occurrences of A, B, C, D, E, F. In the original data above, that variable would be N = 1, 2, 2, 1. Then for each entry I create N duplicates of it. This results in the data:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | | X
1 | | A | | | C
1 | | A | | | C
2 | X | F | A | |
2 | X | F | A | |
8 | | | | | E
My problem is: how do I proceed from here? Sorry for the poor title; I couldn't word it more specifically.
Sorry, I thought the final block was your desired output (now I understand that it's what you've accomplished so far). You can get the middle block with two calls to reshape (long, then wide).
First I'll generate data to match yours.
clear
set obs 4
* ids
generate n = _n
generate id = 1 in 1/2
replace id = 2 in 3
replace id = 8 in 4
* generate your variables
forvalues i = 1/20 {
generate var`i' = ""
}
replace var1 = "E" in 1
replace var1 = "X" in 3
replace var2 = "A" in 2
replace var2 = "F" in 3
replace var3 = "A" in 3
replace var20 = "X" in 1
replace var20 = "C" in 2
replace var20 = "E" in 4
Now the two calls to reshape.
* reshape to long, keep only desired obs, then reshape to wide
reshape long var, i(n id) string
keep if inlist(var, "A", "B", "C", "D", "E", "F")
tempvar long_id
generate int `long_id' = _n
reshape wide var, i(`long_id') string
The first reshape converts your data from wide to long. The var specifies that the variables you want to reshape to long all start with var. The i(n id) specifies that each unique combination of n and id is a unique observation. The reshape call provides one observation for each n-id combination for each of your var1 through var20 variables, so there are now 4*20 = 80 observations. Then I keep only the strings you'd like to keep with inlist().
For the second reshape call var specifies that the values you're reshaping are in variable var and that you'll use this as the prefix. You wanted one row per remaining letter, so I made a new index (that has no real meaning in the end) that becomes the i index for the second reshape call (if I used n-id as the unique observation, then we'd end up back where we started, but with only the good strings). The j index remains from the first reshape call (variable _j) so the reshape already knows what suffix to give to each var.
These two reshape calls yield:
. list n id var1 var2 var3 var20
+-------------------------------------+
| n id var1 var2 var3 var20 |
|-------------------------------------|
1. | 1 1 E |
2. | 2 1 A |
3. | 2 1 C |
4. | 3 2 F |
5. | 3 2 A |
|-------------------------------------|
6. | 4 8 E |
+-------------------------------------+
You can easily add back variables that don't survive the two reshapes.
* if you need to add back dropped variables
forvalues i =1/20 {
capture confirm variable var`i'
if _rc {
generate var`i' = ""
}
}
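For readers outside Stata, the same transformation (wide to long, filter, back to wide with one row per kept value) can be sketched in Python. The names KEEP and one_row_per_match are illustrative, and rows are assumed to be dicts in which variables that are not listed count as empty.

```python
KEEP = {"A", "B", "C", "D", "E", "F"}


def one_row_per_match(rows, nvars=20):
    """Emit one output row per kept value, with only that variable filled in."""
    out = []
    for row in rows:
        # like "reshape long": visit each (row, j) pair, one per variable var1..varN
        for j in range(1, nvars + 1):
            value = row.get(f"var{j}", "")
            # like "keep if inlist(...)": drop empty cells and unwanted strings (X)
            if value not in KEEP:
                continue
            # like "reshape wide": put the value back under its original suffix j
            new = {"id": row["id"], **{f"var{k}": "" for k in range(1, nvars + 1)}}
            new[f"var{j}"] = value
            out.append(new)
    return out


# Sample rows from the question; variables not listed are treated as empty
data = [
    {"id": 1, "var1": "E", "var20": "X"},
    {"id": 1, "var2": "A", "var20": "C"},
    {"id": 2, "var1": "X", "var2": "F", "var3": "A"},
    {"id": 8, "var20": "E"},
]
for r in one_row_per_match(data):
    filled = {k: v for k, v in r.items() if k != "id" and v}
    print(r["id"], filled)
```

Each kept value becomes its own row with a fresh index, which plays the same role as the tempvar long_id in the second reshape.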
