I'm trying to change a column's type from DateTime(TZ) to DateTime, but it is a key column and can't be changed. Dropping and re-creating the table has no effect; it looks like the metadata is stored in ZooKeeper.
Can I change the table structure (I can drop/create the table) without changing the ZK records? Or is it required to remove the metadata from ZK?
You need to drop the table on all replicas. If you lost a replica and did not drop its table, you need to clean up ZK manually.
Or you can just use another ZK path; it is the ZK path, not the table name, that identifies the replica metadata.
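For example, a minimal sketch of re-creating the table under a fresh ZK path (table, column, and path names are illustrative):

-- New DDL with the plain DateTime column and a new ZK path,
-- so the stale metadata under the old path is never consulted.
CREATE TABLE db.events
(
    id UInt64,
    event_time DateTime   -- was DateTime with a timezone before
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_v2', '{replica}')
ORDER BY (id, event_time);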
I am using Talend for ETL and I don't have much experience with it. I have two tables, for example account and account_roles. The account table has id, name, password, etc., and account_roles has account_id, which is a foreign key to the account table's primary key, plus one more field.
Both fields in account_roles contain duplicates. I want to write account_roles to the destination with update-and-insert logic using Talend.
But I am getting an error because account_roles has no column that can be treated as a primary key, so Talend can't update or insert into it.
How do I deal with this situation? I tried the tDBOutput advanced option "Use field options", but it still requires unique entries.
Is there any solution to this issue? I also want to know: if I create a foreign key in the account_roles table, will it work then? If yes, how to create a foreign key in Talend Open Studio is my second question.
Attaching snapshots of my tables and tMap below.
I want to know how I can load my tables into the database when I don't have a primary key. Kindly help me.
First question
I think you should place the primary key in the physical account_roles table. Talend will use both the key indication on the tDBOutput component and the physical key of the table.
To remove duplicate rows, you can also use a tUniqRow before the tDBOutput. The key you indicate in tUniqRow is not directly linked to the database; it is only the key on which tUniqRow deduplicates.
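A minimal sketch of that physical key, assuming the second field is called role_id (adjust to your actual columns):

-- Add a composite key once tUniqRow has removed the duplicates.
ALTER TABLE account_roles ADD PRIMARY KEY (account_id, role_id);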
Second question
It's not possible to delegate the foreign-key verification to Talend, but you can do this verification in your database by placing foreign keys on your table. If an id is not present in the referenced table, the database returns an error, which Talend surfaces.
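For example, a sketch of such a constraint (column names come from the question; the constraint name is illustrative):

ALTER TABLE account_roles
    ADD CONSTRAINT fk_account_roles_account
    FOREIGN KEY (account_id) REFERENCES account (id);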
I'm working on a Hive table where I need to rename a column as below. The rename works as expected, but the underlying values of the column become NULL.
ALTER TABLE db.tbl CHANGE hdfs_loaddate hdfs_load_date String;
Here the renamed column is hdfs_load_date, and its values come back NULL after the rename.
Does anyone have an idea how to fix this? Thanks in advance!
@Ajay_SK Referencing this article: Hive Alter table change Column Name
There is a comment:
Note that the column change will not change any underlying data if it is a parquet table. That is, if you have data in the table already, renaming a column will not make the data in that column accessible under the new name:

select a from test_change;    -- returns 1
alter table test_change change a a1 int;
select a1 from test_change;   -- returns null
That comment is specific to Parquet, but the scenario you describe is similar: you have successfully changed the name, but Hive still thinks the original data lives under the original column name.
A better approach to solve your issue would be to create a new table with the schema you want, including the renamed column, and then perform an INSERT INTO the new table with a SELECT * FROM the old table.
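A minimal sketch, assuming a simple two-column schema and Parquet storage; only the hdfs_loaddate/hdfs_load_date names come from the question:

CREATE TABLE db.tbl_new (
    id INT,
    hdfs_load_date STRING   -- the renamed column
)
STORED AS PARQUET;

-- Rewriting the data binds the values to the new column name.
INSERT INTO TABLE db.tbl_new
SELECT id, hdfs_loaddate
FROM db.tbl;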
I'm looking for a way to modify a Parquet data table in Hive to remove some fields. The table is managed, but it doesn't matter because I can convert it to external.
The problem is that I cannot use ALTER TABLE ... REPLACE COLUMNS with partitioned Parquet tables.
It works well for the textfile format (partitioned or not), but only for non-partitioned Parquet tables.
I've tried to replace the columns, but this is the result:
hive> ALTER TABLE db_test.mytable REPLACE COLUMNS(name String);
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Replacing columns cannot drop columns for table db_test.mytable.
SerDe may be incompatible
I've thought about some solutions, but none of them fits my scenario:
First
- [Optional] Convert the table to external.
- Delete the table.
- Re-create the table with the fields that I want.
- Run MSCK REPAIR TABLE to add the HDFS partitions.
- [Optional] Convert it back to a managed table (a sketch of this option follows the list).
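A minimal sketch of this first option, where the kept column list, the partition column dt, and the LOCATION path are assumptions to adapt to your table:

-- Make the table external so DROP TABLE keeps the data files.
ALTER TABLE db_test.mytable SET TBLPROPERTIES ('EXTERNAL'='TRUE');
-- Drop only the metadata; the Parquet files remain on HDFS.
DROP TABLE db_test.mytable;
-- Re-create with the reduced column set, pointing at the same location.
CREATE EXTERNAL TABLE db_test.mytable (name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/warehouse/db_test/mytable';
-- Re-register the partitions from HDFS.
MSCK REPAIR TABLE db_test.mytable;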
Second
- Create a temporary table as a selection of the original table with the fields that I choose.
- Delete the original table.
- Rename the temporary table to the original name (a sketch follows).
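A sketch of this second option; note that Hive CTAS cannot create a partitioned table, which is one more drawback here (names are illustrative):

-- Keep only the wanted fields in a temporary copy.
CREATE TABLE db_test.mytable_tmp STORED AS PARQUET AS
SELECT name FROM db_test.mytable;
-- Swap the tables.
DROP TABLE db_test.mytable;
ALTER TABLE db_test.mytable_tmp RENAME TO db_test.mytable;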
Both options affect my process because I would lose the table statistics. This table is consumed by MicroStrategy through Impala, and I need to maintain the statistics.
In addition, the second solution performs badly with very large tables.
Any suggestions?
Thanks in advance.
You can use the first method and then run

hive> ANALYZE TABLE <db_name>.<table_name> COMPUTE STATISTICS;

to compute all the statistics of the table.
I have a table which needs to be ingested from an Oracle source to a Greenplum target using the ETL tool Talend. The table is huge, hence we want to load the data incrementally on a daily basis. The table doesn't have any primary or unique key.
The table has a date column, so I am able to get both inserted and updated records since the last update date, but to upsert that data we need a primary key.
Any solution for how to load the data without a primary key?
You need to define your key in Talend, in the schema of the component that inserts into your target table, like this:
Then you can use this key to update your table: in the advanced settings of the same component, activate the check box "Use field options" and select your key:
This is tested and works fine against an Oracle table that does not have a primary key, and it should work for you.
I have one table, TableA, which is both the source and the target. The table doesn't have any primary key. I fetch data from TableA, do some calculations on a few fields, and update them back in the same TableA. How can I update the data when the table has no primary key or composite key?
Second question: if a combination of two columns makes a record unique, how can I use that in Informatica? Please help.
You can define the update statement in the target; there is a property for that.
Still, you have to make Informatica perform an update, not an insert. To do that you need to use the Update Strategy transformation.
I think you don't need to create any PK on that table for this solution, because you will use your own update statement, but please verify this.
To set the fields and build the proper WHERE condition for the update, you need to use the :TU alias in the code. :TU refers to the Update Strategy before the target.
Example:
update t_table set field1 = :TU.f1 where key_field = :TU.f5
If you don't want to (or can't) create a primary key on your table in the database, you can just define one in the Informatica source.
If a record is unique as a combination of two columns, just mark both of them as primary key in the Informatica source.
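For the two-column case, the update override from the example above would look something like this (column and port names are illustrative):

update t_table set field1 = :TU.f1 where key_col1 = :TU.f5 and key_col2 = :TU.f6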