Vertica COPY command with NULL value for an integer column

Is there an empty character I can put into the CSV in order to load a NULL value
into an integer column,
without using the ",X" pattern
(where X is a value and the first field is null)?

Suppose you have a file /tmp/file.csv like this:
2016-01-10,100,abc
2016-02-21,,def
2017-01-01,300,ghi
and a target table defined as follows:
create table t1 ( dt date, id integer, txt char(10));
Then, the following command will insert NULL into "id" for the second row (the one with dt='2016-02-21'):
copy t1 from '/tmp/file.csv' delimiter ',' direct abort on error;
Now, if you want to use a special string to identify NULL values in your input file, let's say 'MYNULL':
2016-01-10,100,abc
2016-02-21,MYNULL,def
2017-01-01,300,ghi
Then... you have to run COPY this way:
copy t1 from '/tmp/file.csv' delimiter ',' null 'MYNULL' direct abort on error;
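Either way, a quick query should confirm that the NULL landed in the right row; a minimal check:
select dt, id, txt from t1 where id is null;
This should return only the 2016-02-21 row.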

Related

How to insert a blank value instead of NULL in Columns other than String datatype in hive

I have a create statement like
CREATE TABLE temp_tbl (EmpId String,Salary int);
I would like to insert an employee id and a blank salary into the table.
So what I have done is:
insert overwrite table temp_tbl select '013' as EmpId,'' as Salary from tbl;
hive> select * from temp_tbl;
OK
013 NULL
But the expected result is:
hive> select * from temp_tbl;
OK
013      ---> blank instead of NULL
I also tried with "". Still I get NULL instead of blank.
I also tried to create the table with a serialization property:
CREATE TABLE temp_tbl (EmpId String,Salary int) TBLPROPERTIES ('serialization.null.format' = '');
That too didn't change the NULL value to blank.
What can be the workaround for this?
Use a CASE expression while selecting the data:
SELECT
(CASE
WHEN columnName IS NULL THEN ''
ELSE columnName
END) AS Result FROM temp_tbl;
In Hive, all types except string/varchar/char (and some complex types like array) cannot be blank; only NULL is possible. The empty string '' is a perfectly normal value of type string, and you can produce an empty array() as well (an array of zero size).
As a workaround, you can use some predefined value that does not normally occur in your data to represent special numeric values, like -99999. Alternatively, you can store your numeric values in a string column, in which case you will be able to have empty values in it. But it's not possible to assign (cast) empty strings to numeric types, because such an empty value is not allowed.
If you try to assign an empty string to a numeric column, or cast it to a numeric type, the result is the same as converting any other non-numeric string to a numeric type: NULL (when Hive cannot cast, it returns NULL), or a java.lang.NumberFormatException in Java.
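For example, a minimal check (newer Hive versions allow SELECT without a FROM clause):
select cast('' as int);    -- NULL: the empty string is not a number
select cast('abc' as int); -- NULL: non-numeric string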
Knowing that an INT column can hold either NULL or an integer, I'd think about how to work around the problem (a sketch of option 3 follows this list):
1. I have the impression that 0 can do the job. Why can it not?
2. If 1 is not ideal, why not create a new temp_employees_with_no_salary table?
3. If 2 is not ideal, can you afford to change the datatype of temp_tbl.Salary from INT to STRING, then use CAST(Salary AS INT) to work with it?
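A minimal sketch of option 3, reusing the names from the question (temp_tbl redefined with a STRING salary):
-- Salary as STRING so it can hold '' (blank)
CREATE TABLE temp_tbl (EmpId STRING, Salary STRING);
INSERT OVERWRITE TABLE temp_tbl SELECT '013' AS EmpId, '' AS Salary FROM tbl;
-- the blank is preserved on select:
SELECT * FROM temp_tbl;
-- and you can still compute with it; the empty string simply casts to NULL:
SELECT EmpId, CAST(Salary AS INT) FROM temp_tbl;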

Import a CSV in which every cell is terminated by a newline

I have a CSV file. The data looks like this:
PRICE_a
123
PRICE_b
500
PRICE_c
1000
PRICE_d
506
My XYZ Table is :
CREATE TABLE XYZ (
DESCRIPTION_1 VARCHAR2(25),
VALUE NUMBER
)
Can a CSV like the one above be imported into Oracle?
How do I create a control.ctl file?
Here's how to do it without having to do any pre-processing. Use the CONCATENATE 2 clause to tell SQL*Loader to join every 2 lines together. This builds logical records, with no separator between the 2 fields. That's no problem, but first understand how the data file is read and processed. SQL*Loader reads the data file a record at a time and tries to map each field, in order from left to right, to the fields listed in the control file (see the control file below). Since the concatenated record it reads matches TEMP in the control file, and TEMP does not match a column in the table, it will not try to insert it. Instead, since TEMP is defined as a BOUNDFILLER, SQL*Loader does nothing with it except save it for later use. There are no more data file fields to match, but the control file next lists a field name that does match a column name, DESCRIPTION_1, so the expression is applied and the result inserted.
The expression applies the regexp_substr function to the saved string :TEMP (which we know is the entire record from the file) and returns the substring consisting of zero or more non-numeric characters from the start of the string, followed by zero or more numeric characters to the end of the string; that substring is inserted into the DESCRIPTION_1 column.
The same is then done for the VALUE column, only returning the numeric part at the end of the string and skipping the non-numeric part at the beginning.
load data
infile 'xyz.dat'
CONCATENATE 2
into table XYZ
truncate
TRAILING NULLCOLS
(
TEMP BOUNDFILLER CHAR(30),
DESCRIPTION_1 EXPRESSION "REGEXP_SUBSTR(:TEMP, '^([^0-9]*)[0-9]*$', 1, 1, NULL, 1)",
VALUE EXPRESSION "REGEXP_SUBSTR(:TEMP, '^[^0-9]*([0-9]*)$', 1, 1, NULL, 1)"
)
Bada-boom, bada-bing:
SQL> select *
from XYZ
/
DESCRIPTION_1 VALUE
------------------------- ----------
PRICE_a 123
PRICE_b 500
PRICE_c 1000
PRICE_d 506
SQL>
Note that this is pretty dependent on the data following your example, and you should do some analysis of the data to make sure the regular expressions will work before putting this into production. Some tweaking will be required if the descriptions could contain numbers. If you can get the data to be properly formatted with a separator in a true CSV format, that would be much better.
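One way to do that analysis is to test the expressions against a sample concatenated record before wiring them into the control file; a hypothetical check in SQL*Plus:
-- 'PRICE_a123' simulates two joined lines: 'PRICE_a' + '123'
SELECT REGEXP_SUBSTR('PRICE_a123', '^([^0-9]*)[0-9]*$', 1, 1, NULL, 1) AS description_1,
       REGEXP_SUBSTR('PRICE_a123', '^[^0-9]*([0-9]*)$', 1, 1, NULL, 1) AS value
FROM dual;
This should return PRICE_a and 123 in the two columns.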

How to pass parameters to an Oracle update statement from a CSV file, excluding null values from the CSV

I have a situation where I have the following CSV file (say file.csv) with the following data:
AcctId,Name,OpenBal,closingbal
1,abc,1000,
2,,0,
3,xyz,,
4,,,
How can I loop through this file using a Unix shell so that, for example, for column $2 (Name), I get all occurrences of the Name column except null values and pass them, in quoted 'a','b' format, to the following Oracle query?
select * from account
where name in (collection of values from CSV column Name, excluding null values)
and openbal in (collection of values from CSV column OpenBal, excluding null values)
and closingbal in (collection of values from CSV column closingbal, excluding null values)
In short, what I want is to pass the CSV column values as input parameters to an Oracle SQL query (and an update query too), but I don't want to include null values. If a column is entirely null for all rows, I want to exclude it as well.
Not sure why you'd want to loop through this file in a unix shell script: perhaps because you can't think of any better approach? Anyway, I'm going to skip that and offer a pure Oracle solution.
We can expose data in CSV files to the database using external tables. These are like regular tables, except that their data comes from files in OS directories on the database server rather than from the database's storage.
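A minimal sketch of such an external table, assuming a directory object csv_dir already points at the folder holding file.csv and that the first line is the header:
CREATE TABLE your_external_tab (
  acctid      NUMBER,
  name        VARCHAR2(30),
  openbal     NUMBER,
  closingbal  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY csv_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    SKIP 1
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('file.csv')
)
REJECT LIMIT UNLIMITED;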
Given this approach it is easy to write the query you want. I suggest using sub-query factoring to select from the external table once.
with cte as ( select name, openbal, closingbal
from your_external_tab )
select *
from account a
where a.name in ( select cte.name from cte )
and a.openbal in ( select cte.openbal from cte )
and a.closingbal in ( select cte.closingbal from cte )
The behaviour of the IN clause is to exclude NULL from consideration.
Incidentally, that will return a different (larger) result set from this:
select a.*
from account a
, your_external_table e
where a.name = e.name
and a.openbal= e.openbal
and a.closingbal = e.closingbal
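To see the NULL-exclusion behaviour of IN in isolation:
-- NULL never matches an IN list: the comparison is UNKNOWN, so no row comes back
SELECT 'matched' FROM dual WHERE NULL IN (1, 2, NULL);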

Hive select columns to do a case statement on

This will export the data from DynamoDB dynamically to S3.
-- Load S3 Table with data from DynamoDB
INSERT OVERWRITE TABLE s3_table SELECT * FROM dynamodb_table;
The problem is that it leaves in a bunch of \N. If I write it by hand, it will look something like:
-- Load S3 Table with data from DynamoDB
INSERT OVERWRITE TABLE s3_table SELECT DCS_ID, CASE WHEN MAKE IS NULL THEN "" ELSE MAKE END, CASE WHEN MODEL IS NULL THEN "" ELSE MODEL END FROM dynamodb_table;
The problem is selecting all the columns to say "WHEN column IS NULL THEN "" ELSE column END".
The current output looks like this
PORTAL 1.5.1.25.2 2013-08-09 13:45:20.126 2013-08-09 13:45:20.282 \N \N \N \N \N \N
The desired output looks like this:
PORTAL 1.5.1.25.2 2013-08-13 18:18:24.667 2013-08-13 18:18:24.832
The Hive output contains the string "\N" for null values (to distinguish them from blanks), so you either have to prepare each column or post-process the output (you could use a streaming job for large amounts of data).
I often use the coalesce function for this: coalesce takes multiple arguments and returns the first non-null one (or NULL if all are null). In your example, to avoid the nulls in the output, you could do the following:
INSERT OVERWRITE TABLE s3_table
SELECT coalesce(DCS_ID,''), coalesce(MAKE,''), coalesce(MODEL,'')
FROM dynamodb_table;

getting null values while loading the data from flat files into hive tables

I am getting null values while loading the data from flat files into Hive tables.
My table structure is like this:
hive> create table test_hive (id int,value string);
and my flat file is like this:
input.txt
1 a
2 b
3 c
4 d
5 e
6 F
7 G
8 j
When I run the commands below, I get null values:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Screenshot:
hive> create table test_hive (id int,value string);
OK
Time taken: 4.97 seconds
hive> show tables;
OK
test_hive
Time taken: 0.124 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/input2.txt' OVERWRITE INTO TABLE test_hive;
Copying data from file:/home/hduser/input2.txt
Copying file: file:/home/hduser/input2.txt
Loading data to table default.test_hive
Deleted hdfs://hydhtc227141d:54310/app/hive/warehouse/test_hive
OK
Time taken: 0.572 seconds
hive> select * from test_hive;
OK
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
NULL NULL
Time taken: 0.182 seconds
The default field terminator in Hive is ^A. You need to mention explicitly in your create table statement that you are using a different field separator.
Similar to what Lorand Bending pointed out in the comment, use:
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
You don't need to specify a location since you are creating a managed table (and not an external table).
The problem you are facing is that in your data the fields are separated by ' ', but while creating the table you did not mention a field delimiter. If you don't mention a field delimiter while creating a Hive table, Hive considers ^A the delimiter by default.
So to resolve your problem, recreate the table with the syntax below and it will work.
CREATE TABLE test_hive(id INT, value STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';
The solution is quite simple: the table wasn't created in the right way.
The simple solution for your problem, and for similar problems later, is knowing how to load the data.
CREATE TABLE [IF NOT EXISTS] myTableName (id INT, value STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Now let me explain the code:
First line
Creates your table. The [IF NOT EXISTS] is optional and tells Hive not to fail if the table already exists. It's more of a safety measure.
Second line
Specifies a delimiter at the table level for structured fields.
Third line
You can include any single character, but the default is '\001'.
'\t' is for a tab: your case.
'|' is for data separated by | characters.
' ' is for a single space. And so on...
Fourth line
Specifies the type of file in which data is to be stored. The file can be a TEXTFILE, SEQUENCEFILE, RCFILE, or BINARY SEQUENCEFILE. Alternatively, how the data is stored can be specified as Java input and output classes.
When loading locally:
LOAD DATA LOCAL INPATH '/your/data/path.csv' [OVERWRITE] INTO TABLE myTableName;
Always try checking your data with a simple select * statement.
Hope it helps.
Hive's default record and field delimiters:
\n (record delimiter)
^A (field delimiter)
^B (collection item delimiter)
^C (map key delimiter)
Pressing ^V^A inserts a ^A in Vim.
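Spelled out explicitly, those defaults correspond to this DDL (a sketch with a hypothetical table name; the collection and map delimiters only matter for complex types):
CREATE TABLE defaults_demo (id INT, value STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;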
Are the elements separated by a space or a tab? If it's a tab, follow these steps; if they are separated by a space, use ' ' instead of '\t'.
hive> CREATE TABLE test_hive(id INT, value STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
Then you have to enter:
hive> LOAD DATA LOCAL INPATH '/home/hduser/input.txt' OVERWRITE INTO TABLE test_hive;
hive> select * from test_hive;
Now you will get exactly the output you expected.
Please check the date column of the dataset; it should follow the date format yyyy-mm-dd.
If the string is in the form 'yyyy-mm-dd', then a date value corresponding to that year/month/day is returned. If the string value does not match this format, then NULL is returned.
See the official Hive documentation.
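A minimal illustration, assuming a Hive version with the DATE type (0.12+):
SELECT CAST('2016-01-10' AS DATE);  -- matches yyyy-mm-dd: returns the date
SELECT CAST('10/01/2016' AS DATE);  -- wrong format: returns NULL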
