Why does Phoenix always add an extra column (named _0) to HBase when I execute an UPSERT command? - hadoop

When I execute an UPSERT command in Apache Phoenix, Phoenix always adds an extra column (named _0) with an empty value to the HBase table. This column is generated automatically by Phoenix, but I don't need it. For example:
ROW COLUMN+CELL
abc column=F:A,timestamp=1451305685300,value=123
abc column=F:_0, timestamp=1451305685300, value=  # I want to avoid generating this cell
Could you tell me how to avoid that? Thank you very much!

"At create time, to improve query performance, an empty key value is
added to the first column family of any existing rows or the default
column family if no column families are explicitly defined. Upserts will also add this empty key value. This improves query performance by having a key value column we can guarantee always being there and thus minimizing the amount of data that must be projected and subsequently returned back to the client."
Apache Phoenix Documentation

As for whether this can be avoided:
You could work around the problem by adding the following statements at the end of your SQL:
ALTER TABLE "<your-table>" ADD "<your-cf>"."_0" VARCHAR(1);
ALTER TABLE "<your-table>" DROP COLUMN "<your-cf>"."_0";
You should only do this if you use the table through Phoenix but then access it with another system that is not aware of this Phoenix-specific dummy value.

Related

How to insert a dynamic value from one database table into another database table with a SQL query in Katalon

Hi, can anyone help me with this issue? I need to insert a dynamic value, taken from one database table, into another table in Katalon.
Please find the details below:
The first screenshot shows the ab.dbo.DOCUMENT table, where DOCUMENT_ID is auto-populated, which means the number is generated by the database itself.
The second screenshot shows the bc.dbo.DOCUMENT_IC table, where I need to enter DOCUMENT_ID manually, based on the DOCUMENT_ID value generated in ab.dbo.DOCUMENT.
In Katalon I am using a keyword to connect to my database, run the insert query, and close the connection. I know how to do this step and can connect to the database from Katalon. What I am not sure about is how to pass the dynamic DOCUMENT_ID value, which is generated automatically in ab.dbo.DOCUMENT, to the DOCUMENT_ID column of bc.dbo.DOCUMENT_IC, where I currently have to key in the value by hand.
Below is my Katalon script:
Hopefully someone can help me with this.
Thank you.
If I have a table with an auto-incrementing ID and I need that value in another table, I would typically write SQL like this:
insert into firsttable (Document_Type) values ('PDF');
insert into secondtable (Document_ID, App_ref_Num) values (@@IDENTITY, 'somenumber');
In the databases I have worked with, @@IDENTITY gives you the ID of the last inserted row. If you can't run multiple statements, most connection libraries have something like $conn->insert_id, which does the same thing as running select @@IDENTITY.
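For SQL Server (which the ab.dbo.DOCUMENT naming suggests, though that is an assumption), the same pattern can be sketched with a variable and SCOPE_IDENTITY(), which returns the last identity value generated in the current scope and so is not affected by triggers that insert into other tables. The column names Document_Type and App_ref_Num are taken from the example above and are placeholders; the sketch also assumes DOCUMENT_ID is an IDENTITY column.

```sql
-- Minimal T-SQL sketch; run both inserts in the same batch and connection.
DECLARE @new_id INT;

-- Insert the parent row; DOCUMENT_ID is assumed to be generated by the database.
INSERT INTO ab.dbo.DOCUMENT (Document_Type)
VALUES ('PDF');

-- Capture the identity value generated by the insert above.
SET @new_id = SCOPE_IDENTITY();

-- Reuse the captured value in the second table.
INSERT INTO bc.dbo.DOCUMENT_IC (DOCUMENT_ID, App_ref_Num)
VALUES (@new_id, 'somenumber');
```

If the two inserts have to be sent as separate calls, the captured value would need to be read back into the test script (for example with SELECT SCOPE_IDENTITY() in the same batch as the first insert) and passed as a parameter to the second query.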

Oracle 12c - refreshing the data in my tables based on the data from warehouse tables

I need to update some tables in my application from warehouse tables that are refreshed weekly or biweekly. My tables are referenced by foreign keys in other tables, so I cannot simply truncate them and reinsert all the data every time. Instead I have to take the delta and update accordingly, matching on a few primary key columns that do not change. I need some input on how to implement this approach.
My approach:
Check the last updated time of those tables and views.
If it is more recent, compare each row in my table with the warehouse table based on the primary key.
Update each column that differs.
Do nothing if no columns have changed.
Insert the row if it is a new record.
My question:
How do I implement this? Is writing PL/SQL code a good and efficient way, given that the expected number of records is around 800K?
Please provide any sample code or links.
I would go for PL/SQL with the BULK COLLECT and FORALL method. You can use MINUS in your cursor to compute the difference and reduce the amount of data to process.
You can check this site for more information about BULK COLLECT, FORALL, and the PL/SQL and SQL engines: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
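A minimal sketch of that idea, assuming Oracle 11g or later and two hypothetical tables, wh_customers (the warehouse source) and app_customers (the application table), each with columns customer_id, name, and city, where customer_id is the stable key. The MINUS cursor returns only rows that are new or differ in at least one column, and FORALL sends each batch to the SQL engine in one round trip.

```sql
DECLARE
  -- Rows present in the warehouse that are missing from, or different in, the target.
  CURSOR c_delta IS
    SELECT customer_id, name, city FROM wh_customers
    MINUS
    SELECT customer_id, name, city FROM app_customers;

  TYPE t_rows IS TABLE OF c_delta%ROWTYPE;
  l_rows t_rows;
BEGIN
  OPEN c_delta;
  LOOP
    -- Fetch in batches to keep memory use bounded; 10000 is an arbitrary batch size.
    FETCH c_delta BULK COLLECT INTO l_rows LIMIT 10000;
    EXIT WHEN l_rows.COUNT = 0;

    -- Update rows whose key already exists, insert the rest.
    FORALL i IN 1 .. l_rows.COUNT
      MERGE INTO app_customers t
      USING (SELECT l_rows(i).customer_id AS customer_id,
                    l_rows(i).name        AS name,
                    l_rows(i).city        AS city
               FROM dual) s
      ON (t.customer_id = s.customer_id)
      WHEN MATCHED THEN
        UPDATE SET t.name = s.name, t.city = s.city
      WHEN NOT MATCHED THEN
        INSERT (customer_id, name, city)
        VALUES (s.customer_id, s.name, s.city);
  END LOOP;
  CLOSE c_delta;

  -- Single commit at the end; split into several transactions if undo space is a concern.
  COMMIT;
END;
/
```

Because the key column is never updated, foreign keys that reference app_customers stay valid. Rows deleted from the warehouse are not handled by this sketch.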
There are many parts to your question above and I will answer as best I can:
- While it is possible to disable the referencing foreign keys, truncate the table, repopulate the table with the updated data and then re-enable the foreign keys, given your requirements described above I don't believe truncating the table each time to be optimal.
- Yes, in principle PL/SQL is a good way to achieve what you are wanting to achieve, as this is too complex to deal with in native SQL and PL/SQL is an efficient alternative.
Conceptually, the approach I would take is something like the following:
Initial set up:
1. Create a sequence called activity_seq.
2. Add an "activity_id" column of type number to your source tables with a unique constraint.
3. Add a trigger to the source table/s setting activity_id = activity_seq.nextval for each insert / update of a table row.
4. Create some kind of master table to hold the "last processed activity id" value.
Then bi/weekly:
1. Retrieve the value of "last processed activity id" from the master table.
2. Select all rows in the source table/s having an activity_id value > the "last processed activity id" value.
3. Iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is, or if no match is found then insert a new row into the target (I assume there is no delete as you do not mention it).
4. On completion, update the master table "last processed activity id" to the greatest value of activity_id for the source rows processed in step 3 above.
(Please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions.)
I hope this proves helpful.
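The set-up steps and the bookkeeping queries described above could be sketched roughly as follows for Oracle 11g or later. The names source_table, sync_master, source_table_activity_uq, source_table_activity_trg, and the bind variable :max_processed_id are hypothetical; only activity_seq and activity_id come from the answer. The sketch also assumes you are allowed to alter the source tables.

```sql
-- Initial set-up
CREATE SEQUENCE activity_seq;

ALTER TABLE source_table ADD (activity_id NUMBER);
ALTER TABLE source_table
  ADD CONSTRAINT source_table_activity_uq UNIQUE (activity_id);

-- Stamp every inserted or updated row with the next sequence value.
CREATE OR REPLACE TRIGGER source_table_activity_trg
  BEFORE INSERT OR UPDATE ON source_table
  FOR EACH ROW
BEGIN
  :NEW.activity_id := activity_seq.NEXTVAL;
END;
/

-- Master table holding the high-water mark per source table.
CREATE TABLE sync_master (
  source_name                VARCHAR2(30) PRIMARY KEY,
  last_processed_activity_id NUMBER NOT NULL
);
INSERT INTO sync_master VALUES ('SOURCE_TABLE', 0);

-- Periodic run, steps 1 and 2: rows changed since the last run.
-- Rows that existed before the trigger was added have a NULL activity_id
-- and are not returned until they are next updated.
SELECT s.*
  FROM source_table s
 WHERE s.activity_id > (SELECT m.last_processed_activity_id
                          FROM sync_master m
                         WHERE m.source_name = 'SOURCE_TABLE');

-- Periodic run, step 4: record the greatest activity_id actually processed.
UPDATE sync_master
   SET last_processed_activity_id = :max_processed_id
 WHERE source_name = 'SOURCE_TABLE';
```

Recording the greatest activity_id that was actually processed, rather than the current maximum in the table, avoids skipping rows that change while the run is in progress.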

Populate indexed table in Oracle using Informatica

I'm new to both Oracle and Informatica.
I'm currently working on a small task where I need to select all records from the source table, filter them to keep only records where field1='Y', and finally insert new rows into the target table containing only the src.field2 and src.field3 values.
These two fields are used for the PK and for the index of the target table.
I get this error in Informatica:
"ORA-26002: Table has index defined upon it"
I'd rather not drop the index. Is there a workaround?
I've tried altering the index to "unusable", but I got the same error.
Please advise.
Thanks.
Try using Normal load mode instead of Bulk. You can set this in the session properties for the target.

Adding column with default value

I have a table A (3 columns) in production with around 10 million records. I want to add one more column to that table and give it a default value of 1. Will adding a column with a default value of 1 (or some other value) impact production DB performance? What would be the best approach to avoid any kind of performance impact on the DB? Your thoughts are much appreciated!
In Oracle 11g the process of adding a new column with a default value has been considerably optimized. If a newly added column is specified as NOT NULL, the default value for that column is maintained in the data dictionary, so it no longer has to be stored in every record and each record no longer has to be updated with the default value. This optimization considerably reduces the amount of time the table is exclusively locked during the operation.
alter table <tab_name> add(<col_name> <data_type> default <def_val> not null)
Moreover, a column with a default value added that way will not consume space until you deliberately start to update that column or insert a record with a non-default value for it. So the operation of adding a new column with a default value and a NOT NULL constraint completes quickly.
I think it is better to first create a backup table with this syntax:
create table BackUpTable as SELECT * FROM YourTable;
alter table BackUpTable add (newColumn number(5,0) default 1);

How do I prevent the loading of duplicate rows in to an Oracle table?

I have some large tables (millions of rows). I constantly receive files containing new rows to add to those tables, up to 50 million rows per day. Around 0.1% of the rows I receive are duplicates of rows I have already loaded (or are duplicates within the files). I would like to prevent those rows from being loaded into the table.
I currently use SQL*Loader in order to have sufficient performance to cope with my large data volume. If I take the obvious step and add a unique index on the columns which govern whether or not a row is a duplicate, SQL*Loader will start to fail the entire file which contains the duplicate row, whereas I only want to prevent the duplicate row itself from being loaded.
I know that in SQL Server and Sybase I can create a unique index with the 'Ignore Duplicates' property and that if I then use BCP the duplicate rows (as defined by that index) will simply not be loaded.
Is there some way to achieve the same effect in Oracle?
I do not want to remove the duplicate rows once they have been loaded - it's important to me that they should never be loaded in the first place.
What do you mean by "duplicate"? If you have a column which defines a unique row, you should set up a unique constraint against that column. One typically creates a unique index on this column, which enforces the uniqueness.
EDIT:
Yes, as commented below, you should set up a "bad" file for SQL*Loader to capture invalid rows. But I think that establishing the unique index is probably a good idea from a data-integrity standpoint.
Use the Oracle MERGE statement. Some explanations here.
You didn't say which release of Oracle you have. Have a look there for the MERGE command.
Basically, it looks like this:
-- Process all the rows in temp_emp_rec
MERGE INTO hr.employees e
USING temp_emp_rec t
ON (e.emp_id = t.emp_id)
-- Row already exists: update it (omit this clause to leave existing rows untouched)
WHEN MATCHED THEN
UPDATE
SET e.first_name = t.first_name,
e.last_name = t.last_name
-- Row does not exist yet: insert it
WHEN NOT MATCHED THEN
INSERT (emp_id, first_name, last_name)
VALUES (t.emp_id, t.first_name, t.last_name);
I would use integrity constraints defined on the appropriate table columns.
This page from the Oracle concepts manual gives an overview, if you also scroll down you will see what types of constraints are available.
Use the options below. sqlldr terminates only after it has hit this many errors (9999999), so individual rejected rows do not stop the load.
OPTIONS (ERRORS=9999999, DIRECT=FALSE)
LOAD DATA
The duplicate records will be written to the bad file.
sqlldr user/password@schema CONTROL=file.ctl, LOG=file.log, BAD=file.bad
