Kudu table column containing created timestamp - ddl

We are trying to create a Kudu table that should contain a column holding the timestamp at which records are inserted.
We tried the below:
create table clcs.table_a (
  store_nbr string,
  load_dttm timestamp default now(),
  primary key (store_nbr)
)
But load_dttm is always the time the table was created, NOT the time the records are inserted.
Any directions would be highly appreciated. Thanks in advance!

You are thinking of Kudu as a database, which it is not; it is a storage layer. Drop the DEFAULT from your Kudu DDL and instead use whatever function call is available in the SQL engine performing the insert, such as now(), current_timestamp(), or CURRENT_TIMESTAMP (Impala, Impala, and Hive, respectively). Take note of whether the function call is deterministic (repeatable for the lifetime of the INSERT transaction) or not, depending on whether you want to record one timestamp per row or one per set of rows inserted.
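For example, a minimal Impala INSERT along those lines, reusing the table and column names from the DDL above (the store number is illustrative):

INSERT INTO clcs.table_a (store_nbr, load_dttm)
VALUES ('1001', now());  -- now() is evaluated at insert time, not table-creation time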

Related

Change the datatype of a column in a partitioned table with billion rows

We have a table with 120 partitions on a date range, each of which is subpartitioned again by range.
Each partition has around 200 million records, so the conventional way of changing the datatype would make our production system unresponsive for hours. Is there a better way to change the datatype of such a huge table?
We have already tried the following options:
Exchange partition. This does not work.
Create a new table with the same structure as the existing one but with the altered column, and insert the data using /*+ append */. It again takes hours.
Currently the column size is varchar2(30). We need to change it to:
ALTER TABLE ORDERS MODIFY (INFO VARCHAR2(50) );
Changing varchar2(30) to varchar2(50) should work instantly and should not cause any trouble.
It modifies only metadata; the actual table data is not touched.
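As a sketch, using the statement from the question plus a data-dictionary check to confirm that only the column definition changed (no rows are rewritten):

ALTER TABLE orders MODIFY (info VARCHAR2(50));

-- the new length is visible immediately in the dictionary
SELECT column_name, data_length
FROM user_tab_columns
WHERE table_name = 'ORDERS'
AND column_name = 'INFO';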

Rules to be followed before creating a Hive partitioned table

As part of my requirement, I have to create a new Hive table and insert into it programmatically. To do that, I have the following DDL to create a Hive table:
CREATE EXTERNAL TABLE IF NOT EXISTS countData (
tableName String,
ssn String,
hiveCount String,
sapCount String,
countDifference String,
percentDifference String,
sap_UpdTms String,
hive_UpdTms String)
COMMENT 'This table contains record count of corresponding tables of all the source systems present on Hive & SAP'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '';
Inserting data into a partition of a Hive table I can handle with an insert query from the program. In the above DDL I haven't added a "PARTITIONED BY" column, as I am not totally clear on the rules for partitioning a Hive table. A couple of rules I know are:
While inserting the data from a query, partition column should be the last one.
PARTITIONED BY column shouldn't be an existing column in the table.
Could anyone let me know if there are any other rules for partitioning a Hive table?
Also in my case, we run the program twice a day to insert data into the table, and each run inserts around 8k to 10k records. I am thinking of adding a PARTITIONED BY column for the current date (just "mm/dd/yyyy") and populating it from the code.
Is there a better way to implement partitioning for my requirement, if adding a date as a String is not recommended?
What you mentioned is fine, but I would recommend the yyyyMMdd format because it sorts better and is less ambiguous than seeing 03/05 and not knowing which is the day and which is the month.
If you run the job twice a day and care about the time it runs, then use PARTITIONED BY (dt STRING, hour STRING).
Also, don't use STORED AS TEXTFILE. Use Parquet or ORC instead.
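A sketch of the revised DDL along those lines; PARTITIONED BY (dt, hour) and Parquet are the suggestions above, and the LOCATION clause (left empty in the question) is omitted here so Hive uses its default path:

CREATE EXTERNAL TABLE IF NOT EXISTS countData (
  tableName String,
  ssn String,
  hiveCount String,
  sapCount String,
  countDifference String,
  percentDifference String,
  sap_UpdTms String,
  hive_UpdTms String)
COMMENT 'Record counts of corresponding tables on Hive & SAP'
PARTITIONED BY (dt STRING, hour STRING)
STORED AS PARQUET;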

Adding column with default value

I have a table A (3 columns) in production with around 10 million records. I want to add one more column to that table, with a default value of 1. Will adding a column with a default value of 1 (or anything else) impact production DB performance? What would be the best approach to avoid any performance impact on the DB? Your thoughts are much appreciated!
In Oracle 11g the process of adding a new column with a default value was considerably optimized. If a newly added column is specified as NOT NULL, its default value is maintained in the data dictionary rather than stored in every record of the table, so each existing record no longer has to be updated with the default. This optimization considerably reduces the amount of time the table is exclusively locked during the operation.
alter table <tab_name> add(<col_name> <data_type> default <def_val> not null)
Moreover, a column added that way will not consume space until you deliberately start updating it or insert a record with a non-default value for it. So adding a new column with a default value and a NOT NULL constraint completes pretty quickly.
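For the scenario in the question, a minimal sketch (table A is from the question; the column name FLAG is illustrative):

ALTER TABLE a ADD (flag NUMBER DEFAULT 1 NOT NULL);
-- completes quickly on 11g+: the default lives in the data dictionary
-- instead of being written into each of the ~10 million rows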
I think it is better to first create a backup table with this syntax:
create table BackUpTable as SELECT * FROM YourTable;
alter table BackUpTable add (newColumn number(5,0) default 1);

HSQLDB: how to insert 1 million records

I am developing a GWT application.
In order to test my DataGrid I created a button which makes calls to my server.
When I click it, 1 million records should be inserted into the database.
I created an alias
CREATE FUNCTION PUBLIC.GENERATENAME() RETURNS VARCHAR(32768)
  SPECIFIC GENERATENAME_10073
  LANGUAGE JAVA NOT DETERMINISTIC NO SQL CALLED ON NULL INPUT
  EXTERNAL NAME 'CLASSPATH:com.package.sql.Helper.generateName'
And created a stored procedure
CREATE PROCEDURE PUBLIC.GENERATE()
  SPECIFIC GENERATE_10073
  LANGUAGE SQL NOT DETERMINISTIC MODIFIES SQL DATA NEW SAVEPOINT LEVEL
BEGIN ATOMIC
  DECLARE VAL_P BIGINT;
  TRUNCATE TABLE PUBLIC.CONTACT;
  SET VAL_P = 1;
  LOOP_LABEL:
  WHILE VAL_P <= 1000 DO
    INSERT INTO PUBLIC.CONTACT VALUES VAL_P, PUBLIC.GENERATENAME(), PUBLIC.GENERATENAME();
    SET VAL_P = VAL_P + 1;
  END WHILE LOOP_LABEL;
END
My table is a simple one
CREATE MEMORY TABLE PUBLIC.CONTACT(
  CONTACT_ID BIGINT GENERATED BY DEFAULT AS IDENTITY(START WITH 1) NOT NULL PRIMARY KEY,
  FIRST_NAME VARCHAR(10) NOT NULL,
  SECOND_NAME VARCHAR(10) NOT NULL
)
I tested and realized I can't insert 1M rows at once, or can I?
What is the best way to insert such a huge amount of data?
I am using HSQLDB version 2.2.4
Because you use CREATE MEMORY TABLE, all the data is stored in memory. You may have to increase your Java heap allocation to store the data.
With file databases, you can use CREATE CACHED TABLE to reduce memory use.
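A sketch of the cached variant, keeping the same columns as the table above (requires a file-based database, not an in-memory one):

-- CACHED rows are stored on disk and paged into memory as needed
CREATE CACHED TABLE PUBLIC.CONTACT(
  CONTACT_ID BIGINT GENERATED BY DEFAULT AS IDENTITY(START WITH 1) NOT NULL PRIMARY KEY,
  FIRST_NAME VARCHAR(10) NOT NULL,
  SECOND_NAME VARCHAR(10) NOT NULL
)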

ODBC with Oracle Trigger Key Column

I'm trying to update some existing code that is supposed to write data to a variety of databases (SQL, Access, Oracle) via ODBC, but I'm having a few problems with Oracle and am looking for suggestions.
I've set my Oracle database up using a trigger (from a basic online tutorial), which I'd like to keep supporting.
CREATE TABLE TABLE1 (
  RECORDID NUMBER NOT NULL PRIMARY KEY,
  ID VARCHAR(40) NULL,
  COUNT NUMBER NULL
);

CREATE SEQUENCE TABLE1_SEQ;

CREATE OR REPLACE TRIGGER TABLE1_TRG
BEFORE INSERT ON TABLE1
FOR EACH ROW
WHEN (new.RECORDID IS NULL)
BEGIN
  SELECT TABLE1_SEQ.nextval
  INTO :new.RECORDID
  FROM dual;
END;
/
I then populate a DataTable using a SELECT * FROM TABLE1. The first problem is that this DataTable doesn't know that the RECORDID column is auto-generated. If I already have data in the table then I can't alter the column, because I get an error:
Cannot change AutoIncrement of a DataColumn with type 'Double' once it
has data.
If I continue, ignoring this, then I quickly get stuck. If I create a new DataRow and try to insert it, I can't set RECORDID to DBNull.Value because it complains that the column has to be non-null (NoNullAllowedException). I can't generate a value myself either, because I don't know what value I should use, and I don't want to interfere with the trigger by consuming the next available value.
Any suggestions on how I should insert data without ODBC complaining?
It does not appear that your first problem is with the Oracle database. There is no such thing as an "AutoIncrement" column in Oracle. Are you sure that message is coming from the Oracle side?
With Oracle, you should be able to provide a dummy value for the primary key on insert and have the trigger overwrite it. Note, though, that as written your trigger only fires WHEN (new.RECORDID IS NULL), so either insert an explicit NULL or drop the WHEN clause so that any supplied value is overwritten.
There is also nothing in your description that would prevent you from updating this value in Oracle (since your trigger is on insert only), unless you have foreign key references to the key.
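For instance, with the trigger exactly as defined above, an insert that passes an explicit NULL lets the sequence supply the key (the ID and COUNT values are illustrative):

-- the BEFORE INSERT trigger runs before the NOT NULL constraint is checked,
-- so :new.RECORDID is populated from TABLE1_SEQ and the insert succeeds
INSERT INTO TABLE1 (RECORDID, ID, COUNT)
VALUES (NULL, 'widget-42', 7);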
