How to fix Hive code for counting a column and grouping by another column?

There are 3 columns in my Hive table (user, gender, rating). Now I want to count the number of user_ids per gender. I have written this Hive query:
select user_id, gender, count(*) from u_user group by user_id;
but I get this error:
SemanticException [Error 10025]: Line 1:16 Expression not in GROUP BY key 'gender'
How to fix this?

Well, every non-aggregated column in the SELECT list must also appear in the GROUP BY clause, as below:
select user_id,gender,count(1) from u_user group by user_id,gender;
And if you want to count the distinct user_ids for each gender, you can write:
select gender,count(distinct user_id) from u_user group by gender;
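To see the difference between the two queries, you can run them on a few made-up rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive, and the sample rows are hypothetical. Note that SQLite is more permissive about non-grouped columns than Hive, so it will not reproduce Error 10025 itself; it only shows what the corrected queries return.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u_user (user_id INTEGER, gender TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO u_user VALUES (?, ?, ?)",
    [(1, "M", 4), (1, "M", 5), (2, "F", 3), (3, "F", 5), (4, "M", 2), (5, "M", 1)],
)

# Every non-aggregated column in SELECT (user_id, gender) is also in GROUP BY.
per_user = conn.execute(
    "SELECT user_id, gender, COUNT(1) FROM u_user "
    "GROUP BY user_id, gender ORDER BY user_id"
).fetchall()

# Number of distinct users per gender; user 1 has two rows but counts once.
per_gender = conn.execute(
    "SELECT gender, COUNT(DISTINCT user_id) FROM u_user "
    "GROUP BY gender ORDER BY gender"
).fetchall()

print(per_user)    # [(1, 'M', 2), (2, 'F', 1), (3, 'F', 1), (4, 'M', 1), (5, 'M', 1)]
print(per_gender)  # [('F', 2), ('M', 3)]
```

With COUNT(*) instead of COUNT(DISTINCT user_id), gender M would report 4 (rows) rather than 3 (users), which is why DISTINCT matters here.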

Related

How to retrieve workflow attribute values from workflow table?

I need to take values from a table column whose meaning depends on another column in the same table.
There are two such values that I need to compare with another table.
Scenario:
Column 1 query:
SELECT text_value
FROM WF_ITEM_ATTRIBUTE_VALUES
WHERE name LIKE 'ORDER_ID' --AND number_value IS NOT NULL
AND Item_type LIKE 'ABC'
This query returns 14 unique records.
Column 2 query:
SELECT number_value
FROM WF_ITEM_ATTRIBUTE_VALUES
WHERE name LIKE 'Source_ID' --AND number_value IS NOT NULL
AND Item_type LIKE 'ABC'
This also returns 14 records.
The order_id from the column 1 query is associated with the source_id from the column 2 query. Using these two values, I want to compare the 14 combined (order_id, source_id) records with the columns sal_order_id and sal_source_id of another table, Sales_tbl.
Sample data from WF_ITEM_ATTRIBUTE_VALUES:
Note: sales_tbl contains the same data, but there the columns are named sal_order_id and sal_source_id.
Order_id
204994 205000 205348 198517 198176 196856 204225 205348 203510 206528 196886 198971 194076 197940
Source_id
92262138 92261783 92262005 92262615 92374992 92375051 92374948 92375000 92375011 92336793 92374960 92691360 92695445 92695880
Desired output based on the comparison:
Please help me write the query.
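One way to approach this is to pivot the two attribute rows of each workflow item into a single (order_id, source_id) row with a self-join, then LEFT JOIN that result to Sales_tbl. The sketch below is a minimal illustration under a loudly stated assumption: that, as in Oracle Workflow's schema, each attribute row also carries an ITEM_KEY that ties the ORDER_ID row and the Source_ID row of the same item together. The question does not show that column. It runs on SQLite via Python's sqlite3 module, and the item keys and values are hypothetical. In Oracle the CAST would typically be TO_NUMBER, and equality replaces LIKE because the patterns contain no wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per attribute; ITEM_KEY (assumption) links the rows.
conn.execute(
    "CREATE TABLE wf_item_attribute_values ("
    "item_type TEXT, item_key TEXT, name TEXT, "
    "text_value TEXT, number_value NUMERIC)"
)
conn.execute("CREATE TABLE sales_tbl (sal_order_id INTEGER, sal_source_id INTEGER)")

# Hypothetical rows: two workflow items, only the first has a sales match.
conn.executemany(
    "INSERT INTO wf_item_attribute_values VALUES (?, ?, ?, ?, ?)",
    [
        ("ABC", "K1", "ORDER_ID", "1001", None),
        ("ABC", "K1", "Source_ID", None, 5001),
        ("ABC", "K2", "ORDER_ID", "1002", None),
        ("ABC", "K2", "Source_ID", None, 5002),
    ],
)
conn.execute("INSERT INTO sales_tbl VALUES (1001, 5001)")

# Self-join puts ORDER_ID and Source_ID of the same item on one row,
# then the LEFT JOIN checks whether that pair exists in sales_tbl.
rows = conn.execute(
    """
    SELECT o.item_key,
           CAST(o.text_value AS INTEGER) AS order_id,
           s.number_value                AS source_id,
           CASE WHEN t.sal_order_id IS NULL THEN 'NOT FOUND'
                ELSE 'MATCH' END         AS status
    FROM wf_item_attribute_values o
    JOIN wf_item_attribute_values s
      ON s.item_type = o.item_type
     AND s.item_key  = o.item_key
     AND s.name      = 'Source_ID'
    LEFT JOIN sales_tbl t
      ON t.sal_order_id  = CAST(o.text_value AS INTEGER)
     AND t.sal_source_id = s.number_value
    WHERE o.name = 'ORDER_ID'
      AND o.item_type = 'ABC'
    ORDER BY o.item_key
    """
).fetchall()
print(rows)  # [('K1', 1001, 5001, 'MATCH'), ('K2', 1002, 5002, 'NOT FOUND')]
```

If a different column links the two attribute rows, only the ON clause of the self-join needs to change.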

ORA-00907: missing right parenthesis when creating a table

I have the query below, which works fine and produces the correct result:
select id, sum(item_stock)
from seller
group by id
order by id ASC;
When I try to create a table from the query above like this:
CREATE TABLE total_stock
AS (
select id, sum(item_stock)
from seller
group by id
order by id ASC );
I get the following error:
SQL Error: ORA-00907: missing right parenthesis
Any help on why this isn't working would be greatly appreciated.
The problem is caused by the ORDER BY clause.
You have to:
Add an alias to your "sum" field
Create another subquery in order to "remove" the ORDER BY clause
CREATE TABLE total_stock
AS (
select id, item_stock
from (
select id, sum(item_stock) as item_stock
from seller
group by id
order by id ASC
)
);
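The role of the alias can be checked quickly: with it, the new table gets a plain column name. Oracle requires an alias for expression columns in CREATE TABLE ... AS SELECT, while SQLite would silently name the column "sum(item_stock)". The sketch below uses SQLite through Python's sqlite3 module as a stand-in, with hypothetical seller rows. Because a table has no guaranteed row order, it applies ORDER BY when reading total_stock rather than when creating it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seller (id INTEGER, item_stock INTEGER)")
conn.executemany(
    "INSERT INTO seller VALUES (?, ?)",
    [(2, 5), (1, 3), (2, 7), (1, 1)],  # hypothetical rows
)

# The aggregate is aliased, so the new table has a usable column name.
conn.execute(
    "CREATE TABLE total_stock AS "
    "SELECT id, SUM(item_stock) AS item_stock FROM seller GROUP BY id"
)

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(total_stock)")]

# Row order is requested when querying the table, not when creating it.
totals = conn.execute("SELECT id, item_stock FROM total_stock ORDER BY id").fetchall()

print(columns)  # ['id', 'item_stock']
print(totals)   # [(1, 4), (2, 12)]
```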

Oracle to retrieve data

I have a table with the columns id, name, update_date, etc.
Select distinct id from table1 order by update_date desc;
The query above also returns duplicate values. I need to retrieve each distinct id with its latest update date.
Generally speaking, this might do the job:
select id,
max(update_date) max_update_date
from table1
group by id;
because
MAX returns the latest UPDATE_DATE
GROUP BY returns distinct IDs anyway (so you don't have to specify DISTINCT)
If it does not, provide a test case and explain what output you'd like to get; someone will assist.
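A small test case of the MAX/GROUP BY approach is sketched below. It uses SQLite through Python's sqlite3 module instead of Oracle, and the rows are hypothetical. Dates are stored as ISO-8601 strings, which sort chronologically. An ORDER BY on the aggregate is added to mirror the "order by update_date desc" in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, name TEXT, update_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [  # hypothetical rows: ids 1 and 2 appear twice
        (1, "a", "2020-01-05"),
        (1, "a", "2020-03-01"),
        (2, "b", "2020-02-10"),
        (2, "b", "2019-12-31"),
        (3, "c", "2020-01-20"),
    ],
)

# One row per id, carrying that id's most recent update_date.
rows = conn.execute(
    "SELECT id, MAX(update_date) AS max_update_date "
    "FROM table1 GROUP BY id ORDER BY max_update_date DESC"
).fetchall()
print(rows)  # [(1, '2020-03-01'), (2, '2020-02-10'), (3, '2020-01-20')]
```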

How to insert into a table with SELECT and ORDER BY using FETCH FIRST n ROWS ONLY in DB2

I am using DB2 V7.2 Express-C for Windows and need to insert a certain number of rows into a table, selected from another table.
I am not sure whether this is an issue with my old DB2 version or a general problem with my statement,
so I am looking for an alternative.
The task is to insert a set of ids into the table markierung, which is defined with 2 columns:
IDUSER INTEGER not null
IDBILD INTEGER not null
The straightforward approach would be to insert all corresponding items from the other tables, e.g.:
insert into markierung
select distinct
xuser.id as userid, xbild.id as bildid
from
user xuser, bild xbild, logbook xlogbook
where
xbild.id = xlogbook.oid and xuser.id = xlogbook.userid
This approach works fine, but I want to insert only a certain number of rows (the newest ones).
Therefore I would like to do something like this:
insert into markierung
select distinct
xuser.id as userid, xbild.id as bildid, xlogbook.date as date
from
user xuser, bild xbild, logbook xlogbook
where
xbild.id = xlogbook.oid and xuser.id = xlogbook.userid
order by date desc fetch first 5 rows only;
First problem: I need to add the date to the select list so that I can use it in the ORDER BY,
because I want to insert the 5 newest rows. But markierung has only 2 columns, so
I changed the query to:
select userid, bildid from
(
select distinct
xuser.id as userid, xbild.id as bildid, xlogbook.date as date
from
user xuser, bild xbild, logbook xlogbook
where
xbild.id = xlogbook.oid and xuser.id = xlogbook.userid
) as xxx order by date desc fetch first 5 rows only;
The SELECT statement works fine.
But when I try to insert the selected results into the table markierung, I get an error.
This is the statement I used:
insert into markierung
select userid, bildid from
(
select distinct
xuser.id as userid, xbild.id as bildid, xlogbook.date as date
from
user xuser, bild xbild, logbook xlogbook
where
xbild.id = xlogbook.oid and xuser.id = xlogbook.userid
) as xxx order by date desc fetch first 5 rows only;
This is the error:
SQL0104N At "BEGIN-OF-STATEMENT" unexpected Token "insert into markierung select userid, ". Possible Tokens are: "". SQLSTATE=42601
Do you have an idea how I can insert only 5 rows into markierung?
Is there an alternative way or query for inserting?
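A common alternative to FETCH FIRST inside an INSERT ... SELECT is to number the rows with ROW_NUMBER() OVER (ORDER BY date DESC) in a subquery and keep only rows with a number of 5 or less. The sketch below assumes your DB2 version supports ROW_NUMBER() OVER. It demonstrates the idea on SQLite (3.25 or later, for window functions) through Python's sqlite3 module. The logbook table is a simplified, hypothetical version: the joins to user and bild are omitted, and the date column is named logdate for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE markierung (iduser INTEGER NOT NULL, idbild INTEGER NOT NULL)"
)
# Hypothetical, simplified logbook; the joins to user and bild are omitted.
conn.execute("CREATE TABLE logbook (oid INTEGER, userid INTEGER, logdate TEXT)")
conn.executemany(
    "INSERT INTO logbook VALUES (?, ?, ?)",
    [
        (10, 1, "2020-01-01"),
        (11, 1, "2020-01-02"),
        (12, 2, "2020-01-03"),
        (13, 2, "2020-01-04"),
        (14, 3, "2020-01-05"),
        (15, 3, "2020-01-06"),
        (16, 4, "2020-01-07"),
    ],
)

# Number the distinct rows newest-first, then insert only the first five.
# The date is used for ordering but is not part of the inserted columns.
conn.execute(
    """
    INSERT INTO markierung (iduser, idbild)
    SELECT userid, bildid
    FROM (
        SELECT userid, bildid,
               ROW_NUMBER() OVER (ORDER BY logdate DESC) AS rn
        FROM (SELECT DISTINCT userid, oid AS bildid, logdate FROM logbook) AS x
    ) AS numbered
    WHERE rn <= 5
    """
)

inserted = conn.execute(
    "SELECT iduser, idbild FROM markierung ORDER BY idbild"
).fetchall()
print(inserted)  # [(2, 12), (2, 13), (3, 14), (3, 15), (4, 16)]
```

Because the row limit is applied by a WHERE on the numbered subquery, the INSERT itself contains no ORDER BY or FETCH FIRST clause.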

INSERT INTO statement with SELECT... IN error

Well, I need to add a row to user_badges for each person who responded correctly to a poll. The
"select user_id from room_poll_results........" part works fine on its own, but as soon as I use it in my INSERT INTO statement, I get an error:
"[Err] 1054 - Unknown column 'user_id' in 'IN/ALL/ANY subquery'"
I don't know where it's coming from...
INSERT INTO user_badges (user_id,PPO) SELECT user_id IN
(SELECT user_id FROM room_poll_results
WHERE user_id in (select user_id from room_poll_results
where answer_text='3' AND question_id='3') AND user_id in
(select user_id from room_poll_results where answer_text='2' AND question_id='4'));
It's telling you that there's no column called user_id in room_poll_results. Change that column name (in the subselects) to whatever is the appropriate field in the table. (You'd want to post the full schema for a more specific response.)
Whenever you get an error such as "[Err] 1054 - Unknown column 'user_id' in 'IN/ALL/ANY subquery'", read it to mean that a column with the name given in quotes does not exist in your query.
To debug this, since you have subqueries, run each query on its own and look at the results. That will tell you which table is missing the column. Once you've found it, you can create the column and then combine your subqueries again.
I assume the user_badges table has a column user_id. Your first IN should be replaced by FROM, so I've slightly modified your query as below:
-- PPO is left out because the SELECT returns only user_id; add it back
-- together with a matching value in the SELECT list if it is required.
INSERT INTO user_badges (user_id)
SELECT DISTINCT user_id FROM
(
SELECT user_id FROM room_poll_results
WHERE user_id in
(
select user_id from room_poll_results
where answer_text='3' AND question_id='3'
)
AND user_id in
(
select user_id from room_poll_results
where answer_text='2' AND question_id='4'
)
) AS U;
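The corrected INSERT ... SELECT can be tried on a tiny data set. The sketch below uses SQLite through Python's sqlite3 module instead of MySQL, with hypothetical minimal schemas: only user_id, question_id and answer_text in room_poll_results, and only user_id in user_badges. DISTINCT matters because a user has one row per answered question, so without it the same user could be inserted more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas; the real tables have more columns.
conn.execute(
    "CREATE TABLE room_poll_results "
    "(user_id INTEGER, question_id TEXT, answer_text TEXT)"
)
conn.execute("CREATE TABLE user_badges (user_id INTEGER)")
conn.executemany(
    "INSERT INTO room_poll_results VALUES (?, ?, ?)",
    [
        (1, "3", "3"), (1, "4", "2"),  # both answers correct
        (2, "3", "3"), (2, "4", "1"),  # second answer wrong
        (3, "3", "1"), (3, "4", "2"),  # first answer wrong
    ],
)

# A user qualifies only if they appear in both "correct answer" subqueries.
conn.execute(
    """
    INSERT INTO user_badges (user_id)
    SELECT DISTINCT user_id
    FROM room_poll_results
    WHERE user_id IN (SELECT user_id FROM room_poll_results
                      WHERE answer_text = '3' AND question_id = '3')
      AND user_id IN (SELECT user_id FROM room_poll_results
                      WHERE answer_text = '2' AND question_id = '4')
    """
)

badges = conn.execute("SELECT user_id FROM user_badges").fetchall()
print(badges)  # [(1,)]
```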
