I have a query importrange function to combine hundreds of my sheet into 1 new googlesheet file, but i need an additional column which show sheet/tab source name and the row source.
this is my query so far :
=QUERY(
{IMPORTRANGE("SheetID-1","Sheet1!A5:Y1000");
IMPORTRANGE("SheetID-2","Sheet2!A5:Y1000");
IMPORTRANGE("SheetID-3","Sheet3!A5:Y1000")},
"SELECT * WHERE Col2 IS NOT NULL Order By Col1", 0)
my query is like that (but for hundreds different sheet) and now i hope i can added aditional column in column Z that showing the source tab/sheet name with row :
Example :
Column Z : Sheet1 - Row 7
If i using filter i can add the additional column with the source row & sheet name like that, but im have no idea to add the column using query. Need some help here.
I am trying to divide the data above, that is in the same column within my raw data. The one column name is "Status". I split it under this column into 2 separate columns within Tableau.
How would I create a calculation to divide T & D by each other? For an example, I would like to divide 166/1523, 155/1535, etc.
How do I differentiate between these 2 under the same column "Status".
SUM (IF [Status] = "T" THEN [Your Measure] END)
/
SUM (IF [Status] = "D" THEN [Your Measure] END)
I have table one with 5 columns and table two with 7 columns.
How can I append them into a new table with only the common columns (that is, the columns with common names)?
I tried all sorts of "Append Queries" in the Query Editor but it seems that it only works with exactly identical tables.
In DAX, UNION seems to have the same limitation.
Use Append Queries -> Append as New to create a new table.
The M function, Table.Combine(), will combine the columns where they are named the same and add columns with null values where they only exist in one table
Table.Combine({
Table.FromRecords({[Name = "David", Phone = "01235667886"]}),
Table.FromRecords({[Email= "me#gmail.com", Phone = "01124892522"]})
})
> Name Phone Email
> David 01235667886
> 01124892522 me#gmail.com
I have a Hive table with the following fields:
id STRING , x STRING
where x can have values such as 'c'.
I need a query that display number of rows where column x contains a value 'c' and the number of rows where x has values are other than 'c'.
id | count(x='c') | count(x<>'c')
---|--------------|--------------
1 | 3 | 7
I don't know if it's possible.
You can try :
SELECT sum(if(x='c',1,0)), sum(if(x!='c',1,0)) FROM table_name;
This will print two columns. I didn't understand the id field in your sample output.
I am using the following hive query script for the version 0.13.0
DROP TABLE IF EXISTS movies.movierating;
DROP TABLE IF EXISTS movies.list;
DROP TABLE IF EXISTS movies.rating;
DROP DATABASE IF EXISTS movies;
ADD JAR /usr/local/hadoop/hive/hive/lib/RegexLoader.jar;
CREATE DATABASE IF NOT EXISTS movies;
CREATE EXTERNAL TABLE IF NOT EXISTS movies.list (id STRING, name STRING, genre STRING)
ROW FORMAT SERDE 'com.cisco.hadoop.loaders.RegexSerDe'with SERDEPROPERTIES(
"input.regex"="^(.*)\\:\\:(.*)\\:\\:(.*)$",
"output.format.string"="%1$s %2$s %3$s");
CREATE EXTERNAL TABLE IF NOT EXISTS movies.rating (id STRING, userid STRING, rating STRING, timestamp STRING)
ROW FORMAT SERDE 'com.cisco.hadoop.loaders.RegexSerDe'
with SERDEPROPERTIES(
"input.regex"="^(.*)\\:\\:(.*)\\:\\:(.*)\\:\\:(.*)$",
"output.format.string"="%1$s %2$s %3$s %4$s");
LOAD DATA LOCAL INPATH 'ml-10M100K/movies.dat' into TABLE movies.list;
LOAD DATA LOCAL INPATH 'ml-10M100K/ratings.dat' into TABLE movies.rating;
CREATE TABLE movies.movierating(id STRING, name STRING, genre STRING, rating STRING);
INSERT OVERWRITE TABLE movies.movierating
SELECT list.id, list.name, list.genre, rating.rating from movies.list list LEFT JOIN movies.rating rating ON (list.id=rating.id) GROUP BY list.id;
The issue is when I execute the script without the "GROUP BY" clause it works fine.
But when I execute it with the "GROUP BY" clause, I get the following error
FAILED: SemanticException [Error 10002]: Line 4:21 Invalid column reference 'name'
Any ideas what is happening here?
Appreciate your help
Thanks!
If you group by a column, your select statement can only select a) that column, b) columns derived only from that column, or c) a UDAF applied to other columns.
In this case, you're only grouping by list.id, so when you try to select list.name, that's invalid. Think about it this way: what if your list table contained the following two entries:
id|name |genre
--+-----+------
01|name1|comedy
01|name2|horror
What would you expect this query to return:
select list.id, list.name, list.genre from list group by list.id;
In this case it's nonsensical. I'm guessing that id in reality is a primary key, but note that hive does not know this, so the above data set is perfectly valid.
With all that in mind, it's not clear to me how to fix it because I don't know the desired output. For example, let's say without the group by (just the join), you have as output:
id|name |genre |rating
--+-----+------+-------
01|name1|comedy|'pretty good'
01|name1|comedy|'bad'
02|name2|horror|'9/10'
03|name3|action|NULL
What would you want the output to be with the group by? What are you trying to accomplish by doing the group by?
OK let me see if I can ask this in a better way.
Here are my two tables
Movies list table - Consists of movies information
ID | Movie Name | Genre
1 | Movie 1 | comedy
2 | movie 2 | action
3 | movie 3 | thriller
And I have ratings table
MOVIE_ID | USER ID | RATING on 5 | TIMESTAMP
1 | xyz | 5 | 12345612
1 | abc | 4 | 23232312
2 | zvc | 1 | 12321123
2 | zyx | 2 | 12312312
What I would like to do is get the output in the following way:
Movie ID | Movie Name | Genre | Rating Average
1 | Movie 1 | comedy | 4.5
2 | Movie 2 | action | 1.5
I am not a db expert but I understand this, when you group the data together you need to convert the multiple values to the scalar values or all the values, if string should be same right?
For example in my previous case, I was grouping them together as a string. So which is okay for list.id, list.name and list.genre, but the list.rating, well that is always going to give some problem here (I just learnt PIG along with hive, so grouping works differently there)
So to tackle the problem, I casted the rating and averaged it out and stored it in the float table. Have a look at my code below:
CREATE TABLE movies.movierating(id STRING, name STRING, genre STRING, rating FLOAT);
INSERT OVERWRITE TABLE movies.movierating
SELECT list.id, list.name, list.genre, AVG(cast(rating.rating as FLOAT)) from movies.list list LEFT JOIN movies.rating rating ON (list.id=rating.id) GROUP BY list.id, list.name,list.genre order by list.id DESC;
Thank you for your explanation. I might save the following question for the next thread but here is my observation:
The performance of the Overall job is reduced when performing Grouping and Joining together than to do it in two separate queries. For the same job, I had changed the code a bit to perform the grouping first and then joining the data and the over all time was reduced by 40 seconds. Earlier it was taking 140 seconds and now it is taking 100 seconds. Any reasons to that?
Once again thank you for your explanation.
I came across same issue:
org.apache.hadoop.hive.ql.parse.SemanticException: Invalid column reference "charge_province"
After I put the "charge_province" in the group by, the issue is gone. I don't know why.