I have a fixed-length data file a.dat with the following data in it:
1234544550002200011000330006600000
My focus is on two specific positions:
POSITION(1:4)
POSITION(5:8)
I want to add the values at these two positions and insert the sum into a field named Qty in XYZ_Table.
I am trying the following in my CTL file, but it fails and I don't know how to pursue it further.
LOAD DATA
INFILE '$SOME_DATA/a.dat'
APPEND
PRESERVE BLANKS
INTO TABLE XYZ_Table
(QTY POSITION(1:4)+POSITION(5:8) "to_number(:QTY)")
I need to achieve this addition in SQL*Loader itself.
If the above approach is not possible, it would be great if you could help me with a different one.
P.S.: What I am trying to achieve here is just one part of a bigger CTL file.
Identify the positions you want to add together, but do not load them into their own columns: declare them as BOUNDFILLER, which means "don't load this field, but remember it for use in an expression later". Then use them like this:
LOAD DATA
infile test.dat
append
preserve blanks
INTO TABLE X_test
TRAILING NULLCOLS
(val_1 BOUNDFILLER position(1:4)
,val_2 BOUNDFILLER position(5:8)
,qty ":val_1 + :val_2"
)
I am a newbie in Hadoop and I have to add data to a table in Hive.
I have data from the FIX 4.4 protocol, something like this...
8=FIX.4.4<SHO>9=85<SHO>35=A<SHO>34=524<SHO>49=SSGMdemo<SHO>52=20150410-15:25:55.795<SHO>56=Trumid<SHO>98=0<SHO>108=30<SHO>554=TruMid456<SHO>10=154<SHO>
8=FIX.4.4<SHO>9=69<SHO>35=A<SHO>34=1<SHO>49=Trumid<SHO>52=20150410-15:25:58.148<SHO>56=SSGMdemo<SHO>98=0<SHO>108=30<SHO>10=093<SHO>
8=FIX.4.4<SHO>9=66<SHO>35=2<SHO>34=2<SHO>49=Trumid<SHO>52=20150410-15:25:58.148<SHO>56=SSGMdemo<SHO>7=1<SHO>16=0<SHO>10=174<SHO>
8=FIX.4.4<SHO>9=110<SHO>35=5<SHO>34=525<SHO>49=SSGMdemo<SHO>52=20150410-15:25:58.164<SHO>56=Trumid<SHO>58=MsgSeqNum too low, expecting 361 but received 1<SHO>10=195<SHO>
Firstly, what I want is: in 8=FIX.4.4, 8 should be the column name and FIX.4.4 the value of that column; in 9=66, 9 should be the column name and 66 the value; and so on. There are many rows like this in the raw file.
Secondly, the same should happen for the next input row, with its data appended as the next row of the Hive table.
Now I cannot figure out what to do.
Any help would be appreciated.
I would first create a tab-separated file containing this data. I suggested using a regex in the comments, but if that is not your strong suit you can just split on the <SHO> tag and on =. Since you did not specify the language you want to use, I will sketch a solution in Python.
The code below shows how to write one of your input lines to a tab-separated file.
It can easily be extended to handle several such lines, or to append lines to the file once it has been created.
import csv

line = "8=FIX.4.4<SHO>9=85<SHO>35=A<SHO>34=524<SHO>49=SSGMdemo<SHO>52=20150410-15:25:55.795<SHO>56=Trumid<SHO>98=0<SHO>108=30<SHO>554=TruMid456<SHO>10=154<SHO>"

l = line.split('<SHO>')[:-1]  # Don't include the last element, since it's empty
# Each element is now a 'col=val' string; turn it into a (col, val) tuple.
# Split on the first '=' only, in case a value itself contains '='.
list_of_pairs = [tuple(x.split('=', 1)) for x in l]
d = dict(list_of_pairs)

with open('test.tsv', 'w', newline='') as c:  # text mode, as the csv module expects in Python 3
    cw = csv.writer(c, delimiter='\t')
    cw.writerow(d.keys())    # Comment this out if you don't want a header
    cw.writerow(d.values())
What this code does is first split the input line on <SHO>, giving a list of col=val strings. It then builds a list of tuples, where each tuple is (col, val).
From that it creates a dictionary, which is not strictly necessary but may help if you extend the code to handle more lines.
Next it writes a tab-separated file test.tsv containing a header line and the values on the following line.
You now have a file which Hive can understand.
I am sure you can find plenty of articles on importing CSV or tab-separated files, but here is a generic Hive query you can use to import this file once it is in HDFS.
CREATE TABLE IF NOT EXISTS [database].[table]
([Col1] INT, [Col2] INT, [Col3] STRING, ...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
TBLPROPERTIES('skip.header.line.count'='1');

LOAD DATA INPATH '[HDFS path]'
OVERWRITE INTO TABLE [database].[table];
Hope this gives you a better idea on how to proceed.
Copy the file to HDFS and create an external table with a single column (c8), then use the select statement below to extract each column:
create external table tablename(
c8 string )
STORED AS TEXTFILE
location 'HDFS path';
select regexp_extract(c8,'8=(.*?)<SHO>',1) as c8,
regexp_extract(c8,'9=(.*?)<SHO>',1) as c9,
regexp_extract(c8,'35=(.*?)<SHO>',1) as c35,
regexp_extract(c8,'34=(.*?)<SHO>',1) as c34,
regexp_extract(c8,'49=(.*?)<SHO>',1) as c49,
regexp_extract(c8,'52=(.*?)<SHO>',1) as c52,
regexp_extract(c8,'56=(.*?)<SHO>',1) as c56,
regexp_extract(c8,'98=(.*?)<SHO>',1) as c98,
regexp_extract(c8,'108=(.*?)<SHO>',1) as c108,
regexp_extract(c8,'554=(.*?)<SHO>',1) as c554,
regexp_extract(c8,'10=(.*?)<SHO>',1) as c10
from tablename
I have a fixed-length file in HDFS, on top of which I have to create an external table using a regex.
My file is something like this:
12piyush34stack10
13pankaj21abcde41
I want to convert it into a table like:
key_column value_column
---------- -----------------
1234stack 12piyush34stack10
1321stack 13pankaj21abcde41
I even tried substr with an insert, but I am unable to build the key_column.
Please help me solve this problem.
I am not sure why you used a plain external table with regexp-style extraction; that way you would still need another substring operation on top.
If it were me, I would create a RegexSerDe table with two columns (key_column, value_column) and specify the serde options as follows:
SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" ="(\d\d)\w{6}(\d\d).*",
"output.format.string" = "%1$s%2$sstack %0$s"
)
The output format option writes the space-separated data to the corresponding columns in order.
I haven't tested it yet; mind that the backslashes may not be interpreted correctly in Java, which is why they are doubled above.
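For completeness, here is an untested sketch of the same RegexSerDe idea with assumed table and column names. It captures the whole line plus the two digit pairs as separate groups and builds key_column in a select, rather than relying on output.format.string:

CREATE EXTERNAL TABLE fixed_raw (
  value_column STRING,  -- group 1: the whole line
  k1 STRING,            -- group 2: the first digit pair
  k2 STRING             -- group 3: the second digit pair
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "((\\d\\d)\\w{6}(\\d\\d).*)"
)
STORED AS TEXTFILE
LOCATION 'HDFS path';

SELECT concat(k1, k2, 'stack') AS key_column,
       value_column
FROM fixed_raw;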
I am currently using DB2 and do not know much about the LOAD statement.
I am using this query to load data:
LOAD FROM "IXAC.CSV" OF DEL
  METHOD P ('IX',1,2,3,4,)
  MESSAGES "SYAC.MSG"
  INSERT INTO SYNC.AC_COUNT ("TYPE", AC1, AC2, AC3, AC4);
COMMIT;
In "IXAC.CSV" there are 4 int values separated with comma. My problem is that how can i insert 'IX' with load statement as a constant with each row insert.
I tried this but not found any success. I am newer in database.
Help me ..
Thanks in advance ...
Change your table definition in the database so that the target column (it looks like you want "TYPE"?) has a default value of 'IX'.
Then do the load as normal, leaving out the IX column.
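For instance, a minimal sketch, untested, using the table and column names from your question:

ALTER TABLE SYNC.AC_COUNT
  ALTER COLUMN "TYPE" SET DEFAULT 'IX';

LOAD FROM "IXAC.CSV" OF DEL
  METHOD P (1, 2, 3, 4)
  MESSAGES "SYAC.MSG"
  INSERT INTO SYNC.AC_COUNT (AC1, AC2, AC3, AC4);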
If you are able to edit the .csv file, a workaround is to use a text editor (such as UltraEdit) that supports wildcards or regular expressions in its find/replace feature, and replace each carriage return/line feed with a CR/LF followed by "IX," (quotes optional, depending on whether you specify a text delimiter on insert). Then your .csv file will have all your data.
I have a table with the following structure:
create table my_table (
id integer,
point Point -- UDT made of two integers (x, y)
)
and I have a CSV file with the following data:
#id, point
1|(3, 5)
2|(7, 2)
3|(6, 2)
Now I want to bulk load this CSV into my table, but I can't find any information about how to handle the UDT with the Oracle sqlldr utility. Is it possible to use the bulk-load utility with UDT columns?
I don't know if sqlldr can do this, but personally I would use an external table.
Attach the file as an external table (the file must be on the database server), then insert the contents of the external table into the destination table, splitting the text into the UDT's two values as you go. The following select from dual should help with the translation:
select
regexp_substr('(5, 678)', '[[:digit:]]+', 1, 1) x_point,
regexp_substr('(5, 678)', '[[:digit:]]+', 1, 2) y_point
from dual;
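Putting it together, here is a minimal sketch of that route; the directory object, file name, and the Point(x, y) constructor are assumptions:

CREATE TABLE my_table_ext (
  id        NUMBER,
  point_txt VARCHAR2(20)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY '|'
  )
  LOCATION ('points.csv')
);

INSERT INTO my_table (id, point)
SELECT id,
       Point(to_number(regexp_substr(point_txt, '[[:digit:]]+', 1, 1)),
             to_number(regexp_substr(point_txt, '[[:digit:]]+', 1, 2)))
FROM my_table_ext;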
UPDATE
In sqlldr, you can transform fields using standard SQL expressions:
LOAD DATA
INFILE 'data.dat'
BADFILE 'bad_orders.txt'
APPEND
INTO TABLE test_tab
FIELDS TERMINATED BY "|"
( info,
x_cord "regexp_substr(:x_cord, '[[:digit:]]+', 1, 1)",
)
The control file above extracts the first number from fields like (3, 4), but I cannot find a way to extract the second one; i.e., I am not sure whether the same field in the input file can be inserted into two columns.
If external tables are not an option for you, I would suggest either (1) transforming the file before loading, using sed, awk, Perl etc., or (2) SQL*Loader-ing the file into a temporary table and then having a second process transform the data and insert it into your final table. Another option is to look at how the file is generated: could you generate it so that the field you need to transform is repeated in two fields, e.g.:
data|(1, 2)|(1, 2)
Maybe someone else will chip in with a way to get sqlldr to do what you want.
Solved the problem after more research: Oracle SQL*Loader does have this feature, used by specifying a column object. The following was the solution:
LOAD DATA
INFILE *
INTO TABLE my_table
FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
id,
point column object
(
x,
y
)
)
BEGINDATA
1,3,5
2,7,2
3,6,2
If I have a CSV file in the following format:
"fd!","sdf","dsfds","dsfd"
"fd!","asdf","dsfds","dsfd"
"fd","sdf","rdsfds","dsfd"
"fdd!","sdf","dsfds","fdsfd"
"fd!","sdf","dsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
"fd!","sdf","dsfds","dsfd"
Is it possible to exclude any row where the first column ends with an exclamation mark?
I.e., it should only load the following rows:
"fd","sdf","rdsfds","dsfd"
"fd","sdf","tdsfds","dsfd"
Thanks
According to the Loading Records Based on a Condition section of the SQL*Loader Control File Reference (11g):
"You can choose to load or discard a logical record by using the WHEN clause to test a condition in the record."
So you'd need something like this:
LOAD DATA ... INSERT INTO TABLE mytable WHEN mycol1 NOT LIKE '%!'
(mycol1.. ,mycol2 ..)
But the LIKE operator is not available in the WHEN clause! You only have = and !=.
Maybe you could try an External Table instead.
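For example, a minimal sketch of that route; the directory object, file, and column names are assumptions:

CREATE TABLE mytable_ext (
  mycol1 VARCHAR2(20),
  mycol2 VARCHAR2(20),
  mycol3 VARCHAR2(20),
  mycol4 VARCHAR2(20)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  )
  LOCATION ('data.csv')
);

INSERT INTO mytable
SELECT * FROM mytable_ext
WHERE mycol1 NOT LIKE '%!';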
I'd stick a CONSTRAINT on the table and just let those rows be rejected, perhaps deleting them after the load. Or use a Unix grep -v to clear them out of the file first.
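Something like the sketch below, with assumed constraint and table names; with a conventional-path load, rows failing the check end up in the bad file:

ALTER TABLE mytable
  ADD CONSTRAINT mycol1_no_bang CHECK (mycol1 NOT LIKE '%!');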