order by added column not working in CLOUDERA - hadoop

I've created a table in CLOUDERA and then added a column to it with:
ALTER TABLE table1 ADD COLUMNS (`new_col` VARCHAR(40));
Then I'm trying to select with ORDER BY:
select col1,col2,new_col from table1 order by 1,2,3
However, this fails.
It works without the ORDER BY clause.
It also works if new_col is left out of the SELECT list.
Any ideas what is causing the failure?
Edit:
I think it happens because the new column contains NULLs. How can I overcome this issue?
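If the NULLs really are the culprit, one possible workaround (a minimal sketch, assuming Hive/Impala syntax and an arbitrary fallback value) is to give the sort a concrete value to work with:
-- COALESCE replaces NULLs in new_col for sorting purposes only;
-- '' is an illustrative fallback, pick whatever should sort where you want it
select col1, col2, new_col
from table1
order by col1, col2, coalesce(new_col, '');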

Related

How to make insert statement re-runnable?

I need to add the following two insert statements:
insert into table1(schema, table_name, table_alias)
values ('ref_owner','test_table_1','tb1');
insert into table1(schema, table_name, table_alias)
values ('dba_owner','test_table_2','tb2');
The question is: how can I make these two insert statements re-runnable, meaning that if they are run again, they should throw a "row exists" error or something along those lines?
Additional notes:
1. I've seen examples of MERGE in Oracle; however, that only applies when you're matching records between two tables. In this case I'm only using a single table.
2. The table does not have any primary, unique or foreign keys - only check constraints on one of the columns.
Any help is highly appreciated.
You can use a MERGE statement, as follows:
MERGE INTO table1 t1
USING (SELECT 'ref_owner' AS SCHEMA_NAME, 'test_table_1' AS TABLE_NAME, 'tb1' AS ALIAS_NAME FROM DUAL
       UNION ALL
       SELECT 'dba_owner', 'test_table_2', 'tb2' FROM DUAL) d
ON (t1.SCHEMA = d.SCHEMA_NAME AND t1.TABLE_NAME = d.TABLE_NAME)
WHEN NOT MATCHED THEN
  INSERT (SCHEMA, TABLE_NAME, TABLE_ALIAS)
  VALUES (d.SCHEMA_NAME, d.TABLE_NAME, d.ALIAS_NAME)
Best of luck.
You should have a primary key, especially when you want to check for duplicate records and data integrity.
Provide a primary key for your table, or, if you somehow do not want to do that, create a unique constraint for all of the columns in the table, so no duplicate rows are possible.
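For example, a unique constraint covering the matching columns might look like this (a sketch in Oracle syntax; the constraint name is illustrative, and the column list is chosen to match the MERGE condition above):
-- hypothetical constraint name; columns taken from the MERGE ON clause
ALTER TABLE table1
  ADD CONSTRAINT table1_uk UNIQUE (schema, table_name);
With that in place, re-running the plain INSERTs raises ORA-00001 (unique constraint violated), which is the "row exists" error the question asks for.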

sybase insert from another database in same server

I am trying to get all the extra data from one database and insert it into another.
I want to omit the column names and keep only the table name hard-coded. However, some fields in the table are system-generated, like an id, which is not really needed as data but would still create an integrity issue. How can I insert just the wanted details while omitting those columns, given that the names of the columns to omit also change? I can't do a total insert, just the addition of some extra data.
So far I have come up with this:
while 1 = 1
begin
    -- (columns) and key_columns are placeholders; the PK comparison is intentionally left out, see below
    if exists (select 1 from db1..table1
               where key_columns not in (select key_columns from db2..table1))
    begin
        insert into db2..table1 (columns)
        select columns from db1..table1
        where key_columns not in (select key_columns from db2..table1)
    end
    if @@rowcount = 0
        break
end
Please advise how I can optimize this with the least possible hard-coding.
I have left out the PK part intentionally, as the query is big.
If you want to do something like:
insert into TAB
select * from TAB2
or
insert into TAB
select col1,col2 from TAB2
or
insert into TAB (col1,col2)
select * from TAB2
where TAB and TAB2 have a different number of columns or different column types, it is not possible, because it will generate an error.
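What does work (a minimal sketch with placeholder names) is listing the same compatible columns explicitly on both sides, so that identity or other system-generated columns are simply left out:
-- col1, col2 are placeholders for the columns you actually want to copy;
-- the system-generated id column is omitted from both lists
insert into TAB (col1, col2)
select col1, col2 from TAB2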

append not working with hive

I am trying to insert data from table a to table b (both are external tables), basically relying on the append feature of the environment. I have tried the same with managed tables as well, but the behaviour was the same.
The append somehow is not working for me. On the other hand, the overwrite works just fine.
E.g. the following fails:
hive> insert table page_view select viewtime, userid, page_url, country from page_view1;
FAILED: Parse Error: line 1:0 cannot recognize input near 'insert' 'table' 'page_view' in insert clause
but, the following works just fine...
hive> insert overwrite table page_view select viewtime, userid, page_url, country from page_view1;
I am on Hadoop 1.0.2 and Hive 0.8.1.
Help needed...
insert table page_view select viewtime, userid, page_url, country from page_view1;
I believe, according to what I saw in the comments here (https://issues.apache.org/jira/browse/HIVE-306), that you are missing an INTO keyword. I think something like this might work:
insert INTO table page_view select viewtime, userid, page_url, country from page_view1;

Hive: Create New Table from Existing Partitioned Table

I'm using Amazon's Elastic MapReduce and I have a hive table created based on a series of log files stored in Amazon S3 and split in folders by day like so:
data/day=2011-09-01/log_file.tsv
data/day=2011-09-02/log_file.tsv
I am currently trying to create an additional table which filters out some unwanted activity in these log files but I can't figure out how to do this and keep getting errors such as:
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.
If my initial table create statement looks something like this:
CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
... fields ...
)
PARTITIONED BY ( DAY STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bucketname/data/';
That initial table works fine and I've been able to query it with no problems.
How then should I create a new table that shares the structure of the previous one but simply filters out data? This doesn't seem to work.
CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
FROM table1
INSERT OVERWRITE TABLE table2
SELECT * WHERE
col1 = '%somecriteria%' AND
more criteria...
;
As I've stated above, this returns:
FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.
Thanks!
This always works for me:
CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
INSERT OVERWRITE TABLE table2 PARTITION (day) SELECT col1, col2, ..., day FROM table1;
ALTER TABLE table2 RECOVER PARTITIONS;
Notice that I've added 'day' as a column in the SELECT statement. Also notice that there is an ALTER TABLE line which is necessary for Hive to become aware of the partitions that were newly created in table2.
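Note that ALTER TABLE ... RECOVER PARTITIONS is an Amazon EMR extension; on a plain Hive installation the equivalent step would, as far as I know, be:
-- standard Hive equivalent of the EMR-specific RECOVER PARTITIONS
MSCK REPAIR TABLE table2;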
I have never used the LIKE option, so thanks for showing me that. Will that actually create all of the partitions that the first table has as well? If not, that could be the issue. You could try using dynamic partitions:
create external table if not exists table2 like table1;
insert overwrite table table2 partition (day) select col1, col2, day from table1;
Might not be the best solution, as I think you have to specify your columns in the select clause (as well as the partition column in the partition clause).
And, you must turn on dynamic partitioning.
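For reference, dynamic partitioning is typically switched on with settings like these (exact limits depend on your cluster):
-- allow dynamic partitions, including without any static partition column
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;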
I hope this helps.

Updating a SQL table where items to change are identified in another table that is linked

Everywhere I look I can find how to update a table from data in another table but I am not looking for that. I have two tables TABLE1 and TABLE2. TABLE1 has a column PULLDATE and a column JOBNMBR. TABLE2 has a column JOBNMBR and a column PROJECT. The two tables link at the JOBNMBR column. I need to do a bulk update to TABLE1.PULLDATE per a project number, but that project number is stored in TABLE2.PROJECT.
Using Visual Studio 2005 and VB code (not C+), does anyone know the code (if there is any) that links the tables and allows me to update all TABLE1.PULLDATE records grouped by TABLE2.PROJECT? I will be providing the trigger to update using a textbox [TxtBox_Pulldate] and a nearby button [Button_UpdatePulldate].
Thanks a bunch
Chuck Vensel
I think I understand that you want to update Table1 given a matching column in Table2?
You write the SQL UPDATE just as you would the SELECT, except that you replace the SELECT clause with the UPDATE clause.
UPDATE Table1
SET
[PULLDATE] = your_value
FROM
Table1
JOIN Table2
ON Table2.[JOBNMBR] = Table1.[JOBNMBR]
WHERE
Table2.[PROJECT] = your_project_ID
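As a usage sketch (the parameter names are illustrative, not from the question), the same statement with the textbox value bound as a parameter from the VB side might look like:
-- @PullDate would carry the value of TxtBox_Pulldate, @ProjectID the chosen project
UPDATE Table1
SET [PULLDATE] = @PullDate
FROM Table1
JOIN Table2 ON Table2.[JOBNMBR] = Table1.[JOBNMBR]
WHERE Table2.[PROJECT] = @ProjectID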
