Exclude one or more columns in Impala - hadoop

Can I exclude one or more columns in Impala without specifying all the columns in the table?
SELECT * [except columnA] FROM tableA

Why not create a view of the data without columnA? You would have to list all the remaining columns once, in the view definition, but after that you can query the view with SELECT * and the column will not be included.
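A minimal sketch of the view approach. The view name and the columns columnB and columnC are hypothetical; the question only names tableA and columnA, so substitute the real column list.

```sql
-- Hypothetical: tableA is assumed to have columns columnA, columnB, columnC.
-- List every column except columnA once, in the view definition.
CREATE VIEW tableA_without_columnA AS
SELECT columnB, columnC
FROM tableA;

-- Afterwards SELECT * on the view returns everything except columnA.
SELECT * FROM tableA_without_columnA;
```

If columns are later added to tableA, the view does not pick them up automatically; its column list would have to be updated.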

Related

How can I merge two tables using ROWID in Oracle?

I know that ROWID is distinct for each row in different tables. But I have seen somewhere that two tables were merged using rowid, so I tried it myself, and I am getting blank output.
I have a person table which looks like this:
scrowid is the column which contains the rowid. I added it as:
alter table ot.person
add scrowid VARCHAR2(200) PRIMARY KEY;
I populated this person table as:
insert into ot.person(id,name,age,scrowid)
select id,name, age,a.rowid from ot.per a;
After this I created another table, ot.temp_person, by the same steps. Both tables have the same structure and data types, so I wanted to match them using an inner join, and I tried:
select * from ot.person p inner join ot.temp_person tp ON p.scrowid=tp.scrowid
I got an empty table as output:
Is there any possible way to merge two tables using rowid, or have I forgotten some steps? If there is a way to join these two tables using rowid, please suggest it.
Define scrowid with datatype ROWID or UROWID; then it may work.
However, in general the ROWID may change at any time unless you lock the record, so it would be a poor key to join your tables.
I think perhaps you misunderstood the merging of two tables via rowid, unless what you actually saw was a Union, Cross Join, or Full Outer Join. Any attempt to match on rowid, regardless of how you define it, is doomed to fail, because rowid is an internal definition. Rowid is not just a data type; it is an internal structure. (That is an older version of the description, but Oracle doesn't link documentation versions.) Its fields are basically:
- The data object number of the object
- The data block in the datafile in which the row resides
- The position of the row in the data block (first row is 0)
- The datafile in which the row resides (first file is 1). The file number is relative to the tablespace.
So while it's possible for rows in different tables to have the same rowid, it would be extremely unlikely. An inner join on rowid will therefore almost always return no rows.
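Since rowid is a poor join key, one alternative is to join on a column that holds the same business value in both tables. A minimal sketch, assuming the id column identifies the same person in ot.person and ot.temp_person (the question does not confirm this):

```sql
-- Assumption: id carries the same value for the same person in both tables.
-- Unlike rowid, it does not depend on where the row is physically stored.
SELECT p.id, p.name, p.age
FROM ot.person p
INNER JOIN ot.temp_person tp
  ON p.id = tp.id;
```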

Add conditional field to table in Hive or Impala

I have a massive table stored as parquet and I need to add columns based on conditions.
Is there a way to do that without having to recreate a new table in Hive or Impala?
Something like this?
ALTER TABLE xyz
ADD COLUMN flag AS (CASE WHEN ... END)
Thank you
I don't believe that Hive or Impala support computed columns. This type of calculation is often done using a view:
CREATE VIEW v_xyz AS
SELECT xyz.*,
(CASE WHEN ... END) as flag
FROM xyz;
You can then update the view at any time to adjust the logic or add new columns.
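As a rough illustration of the view approach with a concrete condition. The view name, the column amount, the threshold, and the flag values are hypothetical; the question elides the actual CASE logic.

```sql
-- Hypothetical: xyz is assumed to have a numeric column named amount.
CREATE VIEW v_xyz_flagged AS
SELECT xyz.*,
       CASE WHEN amount >= 1000 THEN 'high' ELSE 'low' END AS flag
FROM xyz;

-- Changing the logic later only redefines the view;
-- the underlying Parquet data is not rewritten.
ALTER VIEW v_xyz_flagged AS
SELECT xyz.*,
       CASE WHEN amount >= 5000 THEN 'high' ELSE 'low' END AS flag
FROM xyz;
```

The flag is computed each time the view is queried, so nothing is stored and the table itself stays unchanged.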

How to drop hive column?

I have two columns Id and Name in Hive table, and I want to delete the Name column. I have used following command:
ALTER TABLE TableName REPLACE COLUMNS(id string);
The result was that the Name column values were assigned to the Id column.
How can I drop a specific column of the table and is there any other command in Hive to achieve my goal?
In addition to the existing answers to the question: Alter hive table add or drop column
As per Hive documentation,
REPLACE COLUMNS removes all existing columns and adds the new set of columns.
REPLACE COLUMNS can also be used to drop columns. For example, ALTER TABLE test_change REPLACE COLUMNS (a int, b int); will remove column c from test_change's schema.
The query you are using is right, but it modifies only the schema, i.e. the metastore. It does not modify anything on the data side.
So before dropping the column, you should make sure that you have the correct data file.
In your case, the data file should not contain the name values.
If you don't want to modify the file, then create another table with only the specific column that you need:
CREATE TABLE tablename AS SELECT id FROM already_existing_table;
let me know if this helps.
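A slightly fuller sketch of the create-table-as-select route. The table names people and people_id_only are hypothetical stand-ins for the original table and the new one.

```sql
-- Hypothetical names: people is the original table with columns id and name.
-- The new table is built from the data itself, so it contains only id values.
CREATE TABLE people_id_only AS
SELECT id
FROM people;

-- Check the new schema and a few rows before relying on it.
DESCRIBE people_id_only;
SELECT * FROM people_id_only LIMIT 10;
```

Because the new table is written from query results, its data files and its schema agree, which is the mismatch that REPLACE COLUMNS alone leaves behind.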

comparing data in two tables taking time

I need to query table 1 to find all orders and their created dates (the key is order number and date).
In table 2 (the key is also order number and date), I need to check whether the order exists for a given date.
For this I am scanning table 1 and, for each record, checking whether it exists in table 2. Is there a better way to do this?
In this situation, where the key is identical for both tables, it makes sense to have a single table in which you store the data for both Table 1 and Table 2. That way you can do a single scan on your data and know straight away whether the data exists for both criteria.
Even more so, if you want to use this data in MapReduce, you would simply scan that single table. If you only want to get the relevant rows, you could define a filter on the Scan. For example, in the case where you will not be populating rows at all in Table 2, you would simply use a ColumnPrefixFilter.
If, however, you do need to keep this data separately in two tables, you could pre-split the tables with the same region boundaries for both. This will help with the query you are aiming for: load all rows in Table 1 when the row exists in Table 2. Essentially this would be a map-side join. You could define multiple inputs in your MapReduce job, and since the region borders are the same, the splits will be such that each mapper gets corresponding rows from both tables. You would probably need to implement your own MultipleInput format for that (the MultiTableInputFormat class recently introduced in 0.96 does not seem to do that map-side join).

Informatica target table populating

I am new to Informatica. I am trying to populate my target table by joining two tables that have 5649 and 2611 rows respectively, so my output should have 8260 rows. But the number of rows in the target table is around 108860 (approx.).
Why is this happening, and how should I fix it?
It seems to me that you are confusing the join operation with union.
You need to merge two sets of rows into one, so use a Union transformation, not a Joiner.
If the table structures aren't the same, select only the common columns and then perform a union in the SQ Override.
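A minimal sketch of what such an override query could look like. The table and column names are hypothetical, since the question does not give them.

```sql
-- Hypothetical names: source_table_1 (5649 rows) and source_table_2 (2611 rows),
-- with order_id and order_date as the columns common to both.
SELECT order_id, order_date FROM source_table_1
UNION ALL
SELECT order_id, order_date FROM source_table_2;
```

UNION ALL keeps every row from both inputs, giving 5649 + 2611 = 8260 rows. A plain UNION would also remove duplicate rows, so the count could come out lower.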

Resources