Hive: Concat a map - hadoop

I have a small problem in Hive when I try to concatenate maps.
Assume that I have something like this:
var 1 | var 2
x | map(key1:value1)
x | map(key2:value2)
x | map(key3:value3)
y | map(key4:value4)
What I'm trying to get is something like this:
var 1 | var 2
x | map(key1:value1; key2:value2; key3:value3)
y | map(key4:value4)
Something like a map concatenation.
How can I do this with Hive?

Use this query:
select var1,collect_set(CONCAT_WS(',',map_keys(var2),map_values(var2))) as var2 from example group by var1;
This will give you output like this:
var1 | var2
x | ["key1,value1","key2,value2","key3,value3"]
y | ["key4,value4"]

Related

Power Query: split text from different columns into the same rows

I have:
column1 | column2 | colum3
a;b;c | x;y;z | door;house;tree
Desired result using Excel powerquery:
a | x | door
b | y | house
c | z | tree
I tried with:
Text.Split([column1],";") and expand to new lines, obtaining:
a
b
c
However, when I tried the same with the other columns, new rows were created instead of reusing the existing ones.
You may use this code:
let
    // load the source table (named "Table" in this example)
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],
    // split each of the three columns on ";" in place, then take the single resulting row as a record
    rec = Table.ReplaceValue(Source, 0, 0, (a, b, c) => Text.Split(a, ";"), {"column1", "column2", "column3"}){0},
    // zip the three lists back together, producing one output row per split item
    table = #table(Record.FieldNames(rec), List.Zip(Record.FieldValues(rec)))
in
    table

More efficient loop in PL/SQL

I'm looking for more efficient ways to handle this loop in PL/SQL.
Requirements: imagine that I have a BudgetTable and a RuleSet table.
RuleSet looks something like this:
acc | loc | proj || rule1 | tag1 | prio
A1% | L1% | P2% || direct | all | 90
A12% | L12% | P23% || spread | alloc | 50
The first three columns act as the WHERE clause against the BudgetTable keys below, and the remaining columns are what gets stored with each match.
A123 | L123 | P234
A199 | L199 | P299
These are the records picked up on each loop iteration:
loop1: (A1% | L1% | P2% || direct | all | 90)
A123 | L123 | P234 || direct | all | 90
A199 | L199 | P299 || direct | all | 90
loop2: (A12% | L12% | P23% || spread | alloc | 50)
A123 | L123 | P234 || spread | alloc | 50
I've done it the most straightforward way, by iterating through the RuleSet table (pseudocode):
FOR r IN (SELECT acc, loc, proj, rule1, tag1, prio FROM RuleSet)
LOOP
    INSERT INTO ResultTable
    SELECT [columns, rule1, tag1, prio]
    FROM BudgetTable
    WHERE acc LIKE r.acc
      AND loc LIKE r.loc
      AND proj LIKE r.proj;
END LOOP;
I am looking for better ways to do this. The problem is that RuleSet can contain several thousand rules, so iterating through and matching the records one by one can be lengthy. I was wondering if it's possible to break the loop into several parallel streams and run them simultaneously.
Thanks for the input..
Use a simple INSERT + SELECT + JOIN; it should be 30-50 times faster than your loop:
INSERT INTO ResultTable
SELECT b.columns, r.rule1, r.tag1, r.prio
FROM BudgetTable b
JOIN RuleSet r
  ON b.acc LIKE r.acc
 AND b.loc LIKE r.loc
 AND b.proj LIKE r.proj;
Maybe you don't need a loop at all:
INSERT INTO ResultTable
select [columns, rule1, tag1, prio]
from BudgetTable bt
cross join RuleSet rs
where bt.acc like rs.acc
  and bt.loc like rs.loc
  and bt.proj like rs.proj;
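As for the idea of running several parallel streams yourself: with a single set-based statement you can usually just let Oracle parallelize it. A sketch, assuming parallel DML is available in your edition and configuration (the degree of 4 is only an illustrative value):
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(ResultTable, 4) */ INTO ResultTable
SELECT /*+ PARALLEL(b, 4) */
       b.acc, b.loc, b.proj,   -- plus whatever other BudgetTable columns you need
       r.rule1, r.tag1, r.prio
FROM BudgetTable b
JOIN RuleSet r
  ON b.acc LIKE r.acc
 AND b.loc LIKE r.loc
 AND b.proj LIKE r.proj;

COMMIT;
Keep in mind that a direct-path (APPEND) insert locks the target table until you commit, so avoid it if other sessions write to ResultTable concurrently.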

Using Selenium IDE, how to get the value of a dynamic variable

I'm using Selenium IDE 2.8 and I'm trying to store values. Please find my commands below:
store | ayman | val1
store | 1 | n
store | val${n} | e
How do I echo the value of e, which should be ayman? When I try:
echo | ${e}
I get val1 instead.
What is the issue with my commands?
Thanks
From what you've done there, the value of 'e' is not ayman; you have stored ayman as the variable 'val1'. I'm not 100% sure what you're trying to do here, but it looks like you're trying to store two individual variables and then also combine them into one. If that is the case, then what you'd need is this:
store | ayman | val1
store | 1 | n
store | ${val1}${n} | e
in which case:
val1 = ayman
n = 1
e = ayman1
It sounds like you are trying to fake an array-like structure (val[1], val[2])? What you really want e to be is ${val${n}}, except that kind of nested substitution doesn't work. You could do it in JavaScript though, with storeEval:
storeEval | storedVars['val' + storedVars['n']] | final

How To Parse a String (From a different Table) in Hive (Hadoop) And Load It To a Different Table

I have this Table as an Input:
Table Name:Deals
Columns: Doc_id(BIGINT),Nv_Pairs_Feed(STRING),Nv_Pairs_Category(STRING)
For Example:
Doc_id: 4997143658422483637
Nv_Pairs_Feed: "TYPE:Wiper Blade;CONDITION:New;CATEGORY:Auto Parts and Accessories;STOCK_AVAILABILITY:Y;ORIGINAL_PRICE:0.00"
Nv_Pairs_Category: "Condition:New;Store:PartsGeek.com;"
I am trying to parse the fields "Nv_Pairs_Feed" and "Nv_Pairs_Category" and extract their name:value pairs (pairs are separated by ';', and each name and value are separated by ':').
My goal is to insert each N:V as a Row in this table:
Doc_id | Name | Value | Source_Field
Example for desired Result:
4997143658422483637 | Condition | New | Nv_Pairs_Category
4997143658422483637 | Store | PartsGeek.com | Nv_Pairs_Category
4997143658422483637 | TYPE | Wiper Blade | Nv_Pairs_Feed
4997143658422483637 | CONDITION | New | Nv_Pairs_Feed
4997143658422483637 | CATEGORY | Auto Parts and Accessories | Nv_Pairs_Feed
4997143658422483637 | STOCK_AVAILABILITY | Y | Nv_Pairs_Feed
4997143658422483637 | ORIGINAL_PRICE | 0.00 | Nv_Pairs_Feed
You can convert the strings to maps using the standard Hive UDF str_to_map and then use the Brickhouse UDFs ( http://github.com/klout/brickhouse ) map_key_values, combine and numeric_range to explode those maps, i.e. something like the following:
create view deals_map_view as
select doc_id,
map_key_values(
combine( str_to_map( nv_pairs_feed, ';', ':'),
         str_to_map( nv_pairs_category, ';', ':'))) as deals_map_key_values
from deals;
select
doc_id,
array_index( deals_map_key_values, i ).key as name,
array_index( deals_map_key_values, i ).value as value
from deals_map_view
lateral view numeric_range( size( deals_map_key_values ) ) i1 as i;
You can probably do something similar with an explode_map UDF
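If you'd rather avoid the Brickhouse dependency, here is a sketch using only built-in Hive UDFs (str_to_map plus a lateral view explode); table and column names are taken from the question, and on older Hive versions you may need to wrap the UNION ALL in a subquery:
select doc_id, name, value, 'Nv_Pairs_Feed' as source_field
from deals
lateral view explode(str_to_map(nv_pairs_feed, ';', ':')) kv as name, value
union all
select doc_id, name, value, 'Nv_Pairs_Category' as source_field
from deals
lateral view explode(str_to_map(nv_pairs_category, ';', ':')) kv as name, value;
Note that the trailing ';' in Nv_Pairs_Category will produce one empty entry, which you may want to filter out.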

Data management with several variables

Currently I am facing the following problem, which I'm working in Stata to solve. I have added the algorithm tag, because it's mainly the steps that I'm interested in rather than the Stata code.
I have some variables, say, var1 - var20 that can possibly contain a string. I am only interested in some of these strings, let us call them A,B,C,D,E,F, but other strings can occur also (all of these will be denoted X). Also I have a unique identifier ID. A part of the data could look like this:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | | X
1 | | A | | | C
2 | X | F | A | |
8 | | | | | E
Now I want to create an entry for every ID and for every occurrence of one of the strings A,B,C,D,E,F in any of the variables. The above data should look like this:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | .. |
1 | | A | | |
1 | | | | | C
2 | | F | | |
2 | | | A | |
8 | | | | | E
Here we ignore every time there's a string X that is NOT A,B,C,D,E or F. My attempt so far was to create a variable that for each entry counts the number, N, of occurrences of A,B,C,D,E,F. In the original data above that variable would be N=1,2,2,1. Then for each entry I create N duplicates of this. This results in the data:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | | X
1 | | A | | | C
1 | | A | | | C
2 | X | F | A | |
2 | X | F | A | |
8 | | | | | E
My problem is how to attack it from here. Sorry for the poor title, but I couldn't word it any more specifically.
Sorry, I thought the final block was your desired output (now I understand that it's what you've accomplished so far). You can get the middle block with two calls to reshape (long, then wide).
First I'll generate data to match yours.
clear
set obs 4
* ids
generate n = _n
generate id = 1 in 1/2
replace id = 2 in 3
replace id = 8 in 4
* generate your variables
forvalues i = 1/20 {
    generate var`i' = ""
}
replace var1 = "E" in 1
replace var1 = "X" in 3
replace var2 = "A" in 2
replace var2 = "F" in 3
replace var3 = "A" in 3
replace var20 = "X" in 1
replace var20 = "C" in 2
replace var20 = "E" in 4
Now the two calls to reshape.
* reshape to long, keep only desired obs, then reshape to wide
reshape long var, i(n id) string
keep if inlist(var, "A", "B", "C", "D", "E", "F")
tempvar long_id
generate int `long_id' = _n
reshape wide var, i(`long_id') string
The first reshape converts your data from wide to long. The var specifies that the variables you want to reshape to long all start with var. The i(n id) specifies that each unique combination of n and id is a unique observation. The reshape call provides one observation for each n-id combination for each of your var1 through var20 variables, so now there are 4*20=80 observations. Then I keep only the strings that you'd like to keep with inlist().
For the second reshape call, var specifies that the values you're reshaping are in the variable var and that you'll use this as the prefix. You wanted one row per remaining letter, so I made a new index (that has no real meaning in the end) that becomes the i index for the second reshape call (if I used n-id as the unique observation, then we'd end up back where we started, but with only the good strings). The j index remains from the first reshape call (variable _j), so the reshape already knows what suffix to give to each var.
These two reshape calls yield:
. list n id var1 var2 var3 var20
+-------------------------------------+
| n id var1 var2 var3 var20 |
|-------------------------------------|
1. | 1 1 E |
2. | 2 1 A |
3. | 2 1 C |
4. | 3 2 F |
5. | 3 2 A |
|-------------------------------------|
6. | 4 8 E |
+-------------------------------------+
You can easily add back variables that don't survive the two reshapes.
* if you need to add back dropped variables
forvalues i = 1/20 {
    capture confirm variable var`i'
    if _rc {
        generate var`i' = ""
    }
}
