Query, transpose and skip blank cells - google-sheets-formula

I'm completely lost here:
I have a table that looks like this, but it has a variable number of value columns:
+------------+------------+-----------+-----------+
| name1 | value1 | value2 | value3 |
+------------+------------+-----------+-----------+
| name1 | value1 | | value3 |
+------------+------------+-----------+-----------+
| name1 | | value2 | value3 |
+------------+------------+-----------+-----------+
What I need is a table looking like this:
+------------+------------+-----------+-----------+
| name1 | value1 | value2 | value3 |
+------------+------------+-----------+-----------+
| name1 | value1 | value3 | |
+------------+------------+-----------+-----------+
| name1 | value2 | value3 | |
+------------+------------+-----------+-----------+
What I came up with so far is this formula, which only works for the first row of data. The named range is my source table range.
=MTRANS(QUERY(MTRANS({Named Range});"select * where Col1 is not null"))
I cannot just add all the columns to it, as I don't know how many there will be. What secret sauce will I have to add to solve this?
Thank you very much for your help!

#Andii This seems to do what you want:
=ArrayFormula(split(transpose(query(transpose(A5:D7),,9^99))," ",1,1))
The trick is the huge header-row count (9^99): QUERY then joins each column of the transposed range into a single space-separated header cell, so transposing back and SPLITting on spaces (with empty text removed) packs the non-blank values to the left. Note this assumes the values themselves contain no spaces.
I have a sample sheet here:
https://docs.google.com/spreadsheets/d/1Em1V9o5aeAtq0Fo_Yb39xXAZRmIZhwSyAHawa-ExilA/edit?usp=sharing
Let us know if this answers your question.

Related

Apache Drill - Using Multiple Delimiters in File Storage Plugin?

I have logs that resemble the following:
value1 value2 "value 3 with spaces" value4
using:
"formats": {
"csv": {
"type": "text",
"delimiter": " "
}
}
For the storage plugin, delimiting by " " gives me the following columns:
columns[0] | columns[1] | columns[2] | columns[3] | columns[4] | columns[5] | columns[6]
value1 | value2 | value | 3 | with | spaces | value4
what I'd like is:
columns[0] | columns[1] | columns[2] | columns[3]
value1 | value2 | value 3 with spaces | value4
To my knowledge, there is no way to skip delimiters in Drill. However, if the third field is the only one that can contain those " " characters, a workaround I can think of is:
structure your first query so that columns[3] is always last, e.g.
select columns[0], columns[1], columns[2], columns[4], columns[3] from dfs.`default`.`/path/to/your/file`;
use the CONCAT() function to rebuild that field in a separate column (see the sketch below).
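Putting those two steps together, here is a hedged sketch; it assumes the quoted field always splits into exactly four space-separated pieces and that the column indices match the sample log line above (the path is just the placeholder from the question):
SELECT columns[0] AS field1,
       columns[1] AS field2,
       CONCAT(columns[2], ' ', columns[3], ' ', columns[4], ' ', columns[5]) AS field3,  -- rebuild the quoted field
       columns[6] AS field4
FROM dfs.`default`.`/path/to/your/file`;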
Another way around it would be to change the delimiter in the file before Drill reads it. Depending on where you are ingesting your data from, this may or may not be feasible.
Good luck, and if you are looking for more on Drill, be sure to check out MapR's Community page on Drill, which has code examples that might be helpful: https://community.mapr.com/community/products/apache-drill

How to expand array values into rows using Hive SQL

I have a table with 4 columns; one column (items) is of type ARRAY and the others are strings.
ID | items | name | loc
_________________________________________________________________
id1 | ["item1","item2","item3","item4","item5"] | Mike | CT
id2 | ["item3","item7","item4","item9","item8"] | Chris| MN
.
.
Here, I want the output flattened so that each array element gets its own row, like this:
ID | items | name | loc
______________________________________________________
id1 | item1 | Mike | CT
id1 | item2 | Mike | CT
id1 | item3 | Mike | CT
id1 | item4 | Mike | CT
id1 | item5 | Mike | CT
id2 | item3 | Chris | MN
id2 | item7 | Chris | MN
id2 | item4 | Chris | MN
id2 | item9 | Chris | MN
id2 | item8 | Chris | MN
I am not a Hive SQL expert; please help me out with this.
Try this:
SELECT ID, itemsName, name, loc
FROM Table
LATERAL VIEW explode(items) itemTable AS itemsName;
In explode(items), items is the array column stored in your table, and Table is the name of your table.
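For reference, a minimal self-contained sketch of the explode() pattern; the table and column names below are illustrative stand-ins for the ones in the question:
CREATE TABLE user_items (
  id    STRING,
  items ARRAY<STRING>,
  name  STRING,
  loc   STRING
);

-- Each array element becomes its own row, repeated next to id, name and loc.
SELECT id, item, name, loc
FROM user_items
LATERAL VIEW explode(items) item_table AS item;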
We can use the posexplode() function for the scenario you mentioned, i.e. with multiple array columns. posexplode() emits the element position first and the element value second, so two arrays can be joined on position. Something like this should work:
SELECT ID, i1.item, i2.itemName, name, loc
FROM Table
LATERAL VIEW posexplode(items) i1 AS item_pos, item
LATERAL VIEW posexplode(item_Name) i2 AS itemName_pos, itemName
WHERE item_pos = itemName_pos;

What is the best big data solution for interactive queries of rows with up to 200 columns?

We have a simple table such as follows:
------------------------------------------------------------------------
| Name | Attribute1 | Attribute2 | Attribute3 | ... | Attribute200 |
------------------------------------------------------------------------
| Name1 | Value1 | Value2 | null | ... | Value3 |
| Name2 | null | Value4 | null | ... | Value5 |
| Name3 | Value6 | null | Value7 | ... | null |
| ... |
------------------------------------------------------------------------
But there could be up to hundreds of millions of rows/names.
The data will be populated every hour or so.
The goal is to get results for interactive queries on the data within a couple of seconds.
Most queries look like:
select count(*) from table
where Attribute1 = Value1 and Attribute3 = Value3 and Attribute113 = Value113;
The WHERE clause contains an arbitrary number of attribute name-value pairs.
I'm new to big data and wondering what the best option is in terms of data store (MySQL, HBase, Cassandra, etc.) and processing engine (Hadoop, Drill, Storm, etc.) for interactive queries like the above.
A columnar DB like Vertica (closed source) or MonetDB (open source, though I haven't used it) will handle queries like the ones you mentioned efficiently. From a 50,000-foot view, the reason is that they store each column separately and thus don't read any unneeded columns when querying the data - for your example, 3 attributes will be read and the other 197 won't be.
PlayOrm for Cassandra provides decent support for SQL, including joins. Read more at http://buffalosw.com/wiki/SJQL-Support/ and for examples see http://buffalosw.com/wiki/Command-Line-Tool/

ORA-30926: unable to get stable set of rows in the source table

I'd like to insert the data after unpivoting it. The statement needs to be a MERGE statement. However, I am getting an ORA-30926 error, and I can't really figure out how to solve it.
Here is the data table:
------------------------------------------------------------------------------------
|Employee_id | work_experience_1 | work_experience_2 | work_experience_3 | language |
-------------------------------------------------------------------------------------
| 123 | C&S | Deloitte | TCS | FI |
| 211 | E&Y | Microsoft | | FI |
| 213 | C&S | | | FI |
-------------------------------------------------------------------------------------
So first before entering the data, I need to unpivot it.
----------------------------------
|Employee_id | work_experience |
----------------------------------
| 123 | C&S |
| 123 | Deloitte |
| 123 | TCS |
| 211 | E&Y |
| 211 | Microsoft |
| 213 | C&S |
----------------------------------
Here is what I have done. The insert part works OK, but the update part fails.
MERGE INTO arc_hrcs.user_multi_work_exp work_exp
USING (
  SELECT user_id, work_experience_lang, work_exp_fi
  FROM (
    SELECT ext.user_id, tmp_work.employee_id, tmp_work.work_experience_1, tmp_work.work_experience_2,
           tmp_work.work_experience_3, tmp_work.work_experience_4, tmp_work.work_experience_5,
           tmp_work.work_experience_6, tmp_work.work_experience_7, tmp_work.work_experience_8,
           tmp_work.work_experience_9, tmp_work.work_experience_10, tmp_work.work_experience_lang
    FROM arc_hrcs.hr_extension_data ext
    JOIN aa_work_exp_tmp tmp_work ON tmp_work.employee_id = ext.employee_id
  )
  UNPIVOT (work_exp_fi FOR work_code IN (
    work_experience_1 AS 'a', work_experience_2 AS 'b', work_experience_3 AS 'c', work_experience_4 AS 'd',
    work_experience_5 AS 'e', work_experience_6 AS 'f', work_experience_7 AS 'g', work_experience_8 AS 'h',
    work_experience_9 AS 'i', work_experience_10 AS 'j'
  ))
) r
ON (work_exp.user_id = r.user_id AND r.work_experience_lang LIKE '%FI%')
WHEN NOT MATCHED THEN
  INSERT (work_exp.user_id, work_exp.work_experience_fi)
  VALUES (r.user_id, r.work_exp_fi)
WHEN MATCHED THEN
  UPDATE SET work_exp.work_experience_fi = r.work_exp_fi
What can I do to make it work?
Cheers and thanks in advance :-)
AFAIK, the MERGE statement needs UNIQUE or PRIMARY KEY columns specified in the ON clause, both on the source and on the target table. Looking at your data sample, you are probably missing them on the source side.
Essentially, the query in the USING clause returns multiple rows for the same join key, when it needs to return at most one row per key. I would try running the subquery in isolation and fix the logic of its WHERE clause so that it brings back a unique row for each user_id.
http://blog.mclaughlinsoftware.com/2010/03/05/stable-set-of-rows/
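A hypothetical sketch of the usual fix: make the USING subquery return at most one row per join key, e.g. with ROW_NUMBER(). The name unpivoted_src below is a stand-in for the full UNPIVOT subquery in the question, and collapsing to one row per user may or may not match the intended target design:
MERGE INTO arc_hrcs.user_multi_work_exp work_exp
USING (
  SELECT user_id, work_exp_fi
  FROM (
    SELECT user_id,
           work_exp_fi,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY work_exp_fi) AS rn
    FROM unpivoted_src  -- stand-in for the SELECT ... UNPIVOT subquery above
    -- the work_experience_lang LIKE '%FI%' filter would also go here, in a WHERE clause
  )
  WHERE rn = 1          -- at most one source row per user_id avoids ORA-30926
) r
ON (work_exp.user_id = r.user_id)
WHEN MATCHED THEN
  UPDATE SET work_exp.work_experience_fi = r.work_exp_fi
WHEN NOT MATCHED THEN
  INSERT (work_exp.user_id, work_exp.work_experience_fi)
  VALUES (r.user_id, r.work_exp_fi);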

Counting in Hadoop Hive

I want to count values, producing something like a map where the key is the value in the Hive table column and the corresponding value is its count.
For example, for the table below:
+-------+-------+
| Col 1 | Col 2 |
+-------+-------+
| Key1 | Val1 |
| Key1 | Val2 |
| Key2 | Val1 |
+-------+-------+
So the Hive query should return something like:
Key1=2
Key2=1
It looks like you are looking for a simple GROUP BY:
SELECT Col1, COUNT(*) FROM Table GROUP BY Col1;
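If you literally want the key=count strings rather than two columns, here is a small hedged variant (table and column names are placeholders):
SELECT concat(col1, '=', cast(count(*) AS STRING)) AS key_count
FROM my_table
GROUP BY col1;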
