Insert unstructured records into database - Oracle

I have a flat file with about half a million records in this format:
last_login=2014022
BPN=1234567890
first_last_names=portal admin
username=portal_admin
email=portal_admin@gmail.com
last_login=2010092
username=UCES1005
BPN=1001117643
email=deepak.prakash@pse
first_last_names=1026 BROAD ASSOCIATES
last_login=2014040
email=rgomes1#optonline.net
username=rgomes1
first_last_names=Robert Gomes
BPN=1001928140
I need to populate a table with these records. The text before the = is the column name and the text after it is the value. Each record is separated by a new line.
What is the best way to import this data into a database (Oracle or Access DB)?

I would use PL/SQL and read the file line by line, using:
UTL_FILE.FOPEN()
UTL_FILE.GET_LINE()
etc.
Process each line as needed.
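A minimal sketch of that approach, assuming a directory object DATA_DIR, a staging table users_staging, and that every record carries all five keys (none of these names come from the question):
DECLARE
  l_file  UTL_FILE.FILE_TYPE;
  l_line  VARCHAR2(4000);
  l_key   VARCHAR2(100);
  l_val   VARCHAR2(4000);
  l_login VARCHAR2(20);
  l_bpn   VARCHAR2(20);
  l_name  VARCHAR2(200);
  l_user  VARCHAR2(100);
  l_email VARCHAR2(200);
BEGIN
  l_file := UTL_FILE.FOPEN('DATA_DIR', 'records.txt', 'r');
  LOOP
    BEGIN
      UTL_FILE.GET_LINE(l_file, l_line);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN EXIT;  -- end of file
    END;
    -- split "key=value" on the first '='
    l_key := SUBSTR(l_line, 1, INSTR(l_line, '=') - 1);
    l_val := SUBSTR(l_line, INSTR(l_line, '=') + 1);
    CASE l_key
      WHEN 'last_login'       THEN l_login := l_val;
      WHEN 'BPN'              THEN l_bpn   := l_val;
      WHEN 'first_last_names' THEN l_name  := l_val;
      WHEN 'username'         THEN l_user  := l_val;
      WHEN 'email'            THEN l_email := l_val;
      ELSE NULL;
    END CASE;
    -- once all five fields of a record are present, insert and reset
    IF l_login IS NOT NULL AND l_bpn IS NOT NULL AND l_name IS NOT NULL
       AND l_user IS NOT NULL AND l_email IS NOT NULL THEN
      INSERT INTO users_staging (last_login, bpn, first_last_names, username, email)
      VALUES (l_login, l_bpn, l_name, l_user, l_email);
      l_login := NULL; l_bpn := NULL; l_name := NULL;
      l_user  := NULL; l_email := NULL;
    END IF;
  END LOOP;
  UTL_FILE.FCLOSE(l_file);
  COMMIT;
END;
/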

Is it possible to remove empty lines in an Oracle DB?

Our problem probably concerns values already stored in one of the tables in an Oracle 12.1.0 database.
In the CMS we see the values without empty rows, but when we copy this text and paste it into Notepad, for example, we get an empty row (a blank line) after every line.
So we seem to have a problem with the data imported into the database. The data type of that field is VARCHAR2(2000), we have around 10,000 records already in that table, and half of them show these empty rows after pasting. Is there any chance we can remove these empty lines in that column?
You can see in your dump that there is the sequence 13,13,10, which is a Carriage Return followed by a Carriage Return + Line Feed pair.
https://www.petefreitag.com/item/863.cfm
If you replace 13,13,10 with 13,10 you should get the desired result:
replace(column_name, chr(13)||chr(13)||chr(10), chr(13)||chr(10))
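Applied to the whole table, that would look something like this (table and column names are placeholders):
UPDATE your_table
   SET your_column = replace(your_column, chr(13)||chr(13)||chr(10), chr(13)||chr(10))
 WHERE instr(your_column, chr(13)||chr(13)||chr(10)) > 0;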

How to insert data into a table in the following scenario?

I am a newbie in Hadoop and I have to add data to a table in Hive.
I have data from the FIX 4.4 protocol, something like this...
8=FIX.4.4<SHO>9=85<SHO>35=A<SHO>34=524<SHO>49=SSGMdemo<SHO>52=20150410-15:25:55.795<SHO>56=Trumid<SHO>98=0<SHO>108=30<SHO>554=TruMid456<SHO>10=154<SHO>
8=FIX.4.4<SHO>9=69<SHO>35=A<SHO>34=1<SHO>49=Trumid<SHO>52=20150410-15:25:58.148<SHO>56=SSGMdemo<SHO>98=0<SHO>108=30<SHO>10=093<SHO>
8=FIX.4.4<SHO>9=66<SHO>35=2<SHO>34=2<SHO>49=Trumid<SHO>52=20150410-15:25:58.148<SHO>56=SSGMdemo<SHO>7=1<SHO>16=0<SHO>10=174<SHO>
8=FIX.4.4<SHO>9=110<SHO>35=5<SHO>34=525<SHO>49=SSGMdemo<SHO>52=20150410-15:25:58.164<SHO>56=Trumid<SHO>58=MsgSeqNum too low, expecting 361 but received 1<SHO>10=195<SHO>
First, what I want is: in 8=FIX.4.4, 8 should be the column name and FIX.4.4 the value of that column; in 9=66, 9 should be the column name and 66 the value; and so on. There are many rows like this in the raw file.
Second, the same should happen for every other row, with its data appended as the next row of the table in Hive.
Now I cannot figure out what I should do.
Any help would be appreciated.
I would first create a tab-separated file containing this data. I suggested using a regex in the comments, but if that is not your strong suit you can just split on the <SHO> tag and =. Since you did not specify the language you want to use, I will suggest a 'solution' in Python.
The code below shows you how to write one of your input lines to a TSV file.
It can easily be extended to handle multiple such lines, or to append lines to the TSV file once it has been created.
import csv

line = "8=FIX.4.4<SHO>9=85<SHO>35=A<SHO>34=524<SHO>49=SSGMdemo<SHO>52=20150410-15:25:55.795<SHO>56=Trumid<SHO>98=0<SHO>108=30<SHO>554=TruMid456<SHO>10=154<SHO>"
fields = line.split('<SHO>')[:-1]  # Don't include the last element since it's empty
list_of_pairs = [tuple(f.split('=', 1)) for f in fields]  # (col, val) tuples
d = dict(list_of_pairs)
with open('test.tsv', 'w', newline='') as c:
    cw = csv.writer(c, delimiter='\t')
    cw.writerow(d.keys())    # Comment this out if you don't want a header
    cw.writerow(d.values())
What this code does first is split the input line on <SHO>, creating a list of col=val strings. It then creates a list of tuples where each tuple is (col, val).
Then it builds a dictionary from this, which is not strictly necessary but might help if you want to extend the code to more lines.
Next, it creates a tab-separated file test.tsv containing a header line and the values on the following line.
You now have a file that Hive can understand.
I am sure you can find a lot of articles on importing CSV or tab-separated files, but here is an example of a generic Hive query you can use to import this file once it is in HDFS.
CREATE TABLE IF NOT EXISTS [database].[table]
([col1] INT, [col2] INT, [col3] STRING, ...)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
TBLPROPERTIES('skip.header.line.count'='1');
LOAD DATA INPATH '[HDFS path]'
OVERWRITE INTO TABLE [database].[table];
Hope this gives you a better idea on how to proceed.
Copy the file to HDFS and create an external table with a single column (c8), then use the select statement below to extract each column:
CREATE EXTERNAL TABLE tablename (
  c8 STRING)
STORED AS TEXTFILE
LOCATION 'HDFS path';
select regexp_extract(c8,'8=(.*?)<SHO>',1) as c8,
regexp_extract(c8,'9=(.*?)<SHO>',1) as c9,
regexp_extract(c8,'35=(.*?)<SHO>',1) as c35,
regexp_extract(c8,'34=(.*?)<SHO>',1) as c34,
regexp_extract(c8,'49=(.*?)<SHO>',1) as c49,
regexp_extract(c8,'52=(.*?)<SHO>',1) as c52,
regexp_extract(c8,'56=(.*?)<SHO>',1) as c56,
regexp_extract(c8,'98=(.*?)<SHO>',1) as c98,
regexp_extract(c8,'108=(.*?)<SHO>',1) as c108,
regexp_extract(c8,'554=(.*?)<SHO>',1) as c554,
regexp_extract(c8,'10=(.*?)<SHO>',1) as c10
from tablename;
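If you want to materialize the parsed columns rather than re-run the extraction on every query, one option is a CTAS over that select (the table name fix_parsed is hypothetical):
CREATE TABLE fix_parsed AS
SELECT regexp_extract(c8,'8=(.*?)<SHO>',1) as c8,
       -- ...the other regexp_extract columns from the select above...
       regexp_extract(c8,'10=(.*?)<SHO>',1) as c10
FROM tablename;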

How to skip the last records using SQL*Loader?

We are using flat files, but how do we skip the last records in the flat files?
Use the following in your ctl file if your tail record always starts with T. I haven't tested it in a while, but it worked for me the last time I loaded a staging table.
LOAD DATA
INFILE 'file.dat' BADFILE 'file.bad' DISCARDFILE 'file.dis'
APPEND
INTO TABLE tablename
-- This will skip the tail record
WHEN (1:1) <> 'T'
(
column names);
You can skip the header rows using the SKIP clause, but to skip the last records you will have to use the WHEN clause. Typically, your trailer records (last records) will not be identical to the other records in the file, and there should be an indicator marking them as trailer records. You need to construct a condition in your control file that the trailer records do not satisfy.
Here is the Oracle documentation on the WHEN clause.
http://docs.oracle.com/cd/B14117_01/server.101/b10825/ldr_control_file.htm#i1005657
Here are some examples of conditional loading.
http://www.orafaq.com/wiki/SQL*Loader#Conditional_Load
Your original post needs more detail, but for completeness: the control file OPTIONS clause has a LOAD=n option that tells sqlldr how many rows to load. If you have 100 rows and don't want to load the last 5, specify LOAD=95.
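A minimal control file sketch (file and table names are placeholders; LOAD=95 matches the hypothetical 100-row file above):
OPTIONS (LOAD=95)
LOAD DATA
INFILE 'file.dat'
APPEND
INTO TABLE tablename
FIELDS TERMINATED BY ','
(
column names);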

How to insert the name of a file and its modified time using a batch/shell script and SQL*Loader

I have a requirement to insert bulk data into an Oracle database from a CSV file. The table's columns match the CSV file's header, except for three additional fields in the database:
A Primary Key field (for which a simple SEQUENCE.NEXTVAL is called)
A field for the name of the CSV file
A field for the last modified date+time of the file
The following Stack Overflow question addresses the extra-column issue, but the solution there is easy because it uses Oracle's sysdate, which is available internally. I need to pass a parameter from a batch script/shell script.
Insert actual date time in a row with SQL*loader
Can PARFILE help here somehow?
My other alternative would be to do the whole task in two steps by writing a small Java program:
Use SQL*Loader for the bulk upload, leaving out the filename and modified time
Then run a separate update statement to populate the newly created rows
But I'm looking for something that will get the job done in one shot. Any advice?
I'm afraid it's not possible with sqlldr alone; sqlldr has no tool for this.
You'd need some sort of script or program to dynamically create a .ctl file for each load.
Here is a bash script to help you get started:
#!/bin/bash -xv
readonly MY_FILENAME=$1
readonly DB_BUF_TABLE=$2
# Last-modified timestamp of the data file (GNU date; adjust for other platforms)
readonly MY_MODTIME=$(date -r "$MY_FILENAME" '+%Y-%m-%d %H:%M:%S')
# filename is loaded as a constant; file_modified is computed by a SQL expression
readonly SQLLDR_CTL="LOAD DATA
CHARACTERSET UTF8
APPEND INTO TABLE $DB_BUF_TABLE
FIELDS TERMINATED BY ';'(
filename CONSTANT '$MY_FILENAME',
file_modified EXPRESSION \"TO_DATE('$MY_MODTIME','YYYY-MM-DD HH24:MI:SS')\",
col_foo,
col_bar
)"
echo "$SQLLDR_CTL" > "loader.ctl"
sqlldr control=loader.ctl parfile=loader.par data="$MY_FILENAME"
sqlldrReturnValue=$?
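A hypothetical invocation (the script name load_csv.sh is made up; loader.par is assumed to hold the connection credentials):
./load_csv.sh /data/incoming/users.csv STG_USERS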
You'd need some locking with this, or separate paths for concurrent loads, to be sure sqlldr starts with the proper ctl file.

Oracle SQL*Loader: split a source field into several columns?

I have a source file I want to load through SQL*Loader into my Oracle 10g database.
The problem is that one of the source fields can be longer than 4000 characters. Is it possible to tell Oracle to split a source field across several columns?
Let's say one column would get the first 4000 characters and the second one the next 4000.
Thanks
I'd load it into a CLOB and then do the splitting (if necessary) using DBMS_LOB.SUBSTR on the database side. But is there a critical business reason to get it into multiple VARCHAR2 columns, or could it just stay in the CLOB?
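A minimal sketch of the database-side split, assuming the data was loaded into a CLOB column big_text and the two targets are part1 and part2 (all names hypothetical):
-- DBMS_LOB.SUBSTR(lob, amount, offset): first 4000 characters, then the next 4000
UPDATE mytable
   SET part1 = DBMS_LOB.SUBSTR(big_text, 4000, 1),
       part2 = DBMS_LOB.SUBSTR(big_text, 4000, 4001);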
