I have to insert multiple rows into a table from a file structured like this:
BANAC2C100017701007_X75 _CA 4X2 CT MLCR DR SX EP 160 E4
where 4X2, MLCR and 160 E4 have to be inserted into the same column, as separate rows with the same code BANAC2C100017701007. For example, the table should end up with three rows, each pairing that code with one of those values.
After splitting the elements from the file, how can I put them into the table? Any suggestions?
It can be done with sqlldr. I have made some assumptions, but if the data is one record per line as you describe above, with the same number of elements on each line, a properly constructed control file with multiple "into table" statements can write different parts of one record as multiple rows in the same table.
The control file:
LOAD DATA
infile "file.dat"
TRUNCATE
INTO TABLE data_table
(entirerow BOUNDFILLER char(4000)
,code expression "regexp_substr(:entirerow, '(.*?)(_)', 1, 1, NULL, 1)"
,desc expression "regexp_substr(:entirerow, '(.*?)( +)', 1, 3, NULL, 1)"
)
INTO TABLE data_table
(entirerow BOUNDFILLER position(1) char(4000)
,code expression "regexp_substr(:entirerow, '(.*?)(_)', 1, 1, NULL, 1)"
,desc expression "regexp_substr(:entirerow, '(.*?)( +)', 1, 5, NULL, 1)"
)
INTO TABLE data_table
(entirerow BOUNDFILLER position(1) char(4000)
,code expression "regexp_substr(:entirerow, '(.*?)(_)', 1, 1, NULL, 1)"
,desc expression "regexp_substr(:entirerow, '(.*?)( +)', 1, 9, NULL, 1) || ' ' ||
regexp_substr(:entirerow, '(.*?)( +|$)', 1, 10, NULL, 1)"
)
A couple of things to note:
Since there are no delimiters, and none are specified, the entire row will be read into the first field "entirerow". Since it is not a column in the table, and it is defined as BOUNDFILLER, it is "remembered" for use later.
The next field, "code", appears only in the control file. There is no data field to match it with, but sqlldr sees that it matches a column in the table and is defined as an expression, so it evaluates the expression and puts the result into that column. The expression uses REGEXP_SUBSTR against the remembered BOUNDFILLER to cut out the part we need. For code, take the characters up to but not including the first underscore. For desc, take the 3rd run of characters that is followed by one or more spaces (without the spaces).
For the second logical row, we need to reposition the logical pointer back to the beginning of the row that was read, so sqlldr can process it again; otherwise the pointer is at the end of the data and nothing will be returned. This is done with the "position(1)" parameter in the "entirerow" definition of the 2nd and 3rd "into table" statements. The last "into table" follows the same pattern, getting the 9th and 10th fields and concatenating them with a space in between. I chose this rather than writing another regex because it stays consistent with the other fields, and it will be easier to follow if you want to change it in the future.
As you can see it works and is reusable:
SQL> select code, desc
from data_table;
CODE DESC
------------------------- -------------
BANAC2C100017701007 4X2
BANAC2C100017701007 MLCR
BANAC2C100017701007 160 E4
Possible caveat: each row is being scanned 3 times, and the regexp calls are expensive so depending on the amount of data you need to load this may not be a feasible solution for your situation.
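If you want to check the regular expressions before running sqlldr, a minimal Python sketch can mimic the three "into table" clauses on the sample line. It assumes Python's `re` finds the same successive non-overlapping matches as the occurrence argument of REGEXP_SUBSTR, which holds for these simple lazy patterns but not for every regex; `regexp_substr` and `split_line` are hypothetical helper names, not part of any Oracle tool.

```python
import re

def regexp_substr(text, pattern, occurrence, group):
    """Rough stand-in for Oracle REGEXP_SUBSTR(text, pattern, 1, occurrence, NULL, group)."""
    matches = list(re.finditer(pattern, text))
    if occurrence > len(matches):
        return None  # Oracle would return NULL
    return matches[occurrence - 1].group(group)

def split_line(line):
    """Return the (code, desc) rows produced by the three INTO TABLE clauses."""
    code = regexp_substr(line, r"(.*?)(_)", 1, 1)
    # The third clause concatenates fields 9 and 10; `or ""` mimics the way
    # Oracle's || operator treats NULL as an empty string.
    last = ((regexp_substr(line, r"(.*?)( +)", 9, 1) or "") + " "
            + (regexp_substr(line, r"(.*?)( +|$)", 10, 1) or ""))
    return [
        (code, regexp_substr(line, r"(.*?)( +)", 3, 1)),
        (code, regexp_substr(line, r"(.*?)( +)", 5, 1)),
        (code, last),
    ]

line = "BANAC2C100017701007_X75 _CA 4X2 CT MLCR DR SX EP 160 E4"
for row in split_line(line):
    print(row)
```

For the sample line this prints the same three code/desc pairs as the query output above.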
I have a column with a value like 200ML and I need to separate the ML from the number.
I assume you mean that you want the leading numeric portion and the trailing alpha portion of the string '200ML' to be returned as separate columns. If that's correct you can use REGEXP_SUBSTR to do this:
SELECT REGEXP_SUBSTR(TEXT_STRING, '^[0-9]+', 1, 1) AS LEAD_NUMERIC,
REGEXP_SUBSTR(TEXT_STRING, '[A-Za-z]+$', 1, 1) AS TRAILING_ALPHA
FROM TABLE_A
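The two anchored patterns can be tried outside the database with a small Python sketch. Python's `re` is used only because these simple patterns behave the same way there as in Oracle; `split_size` is a hypothetical name.

```python
import re

def split_size(text):
    """Mimic the two REGEXP_SUBSTR calls: leading digits and trailing letters."""
    lead = re.search(r"^[0-9]+", text)
    trail = re.search(r"[A-Za-z]+$", text)
    # Oracle returns NULL when a pattern does not match; None plays that role here.
    return (lead.group(0) if lead else None, trail.group(0) if trail else None)

print(split_size("200ML"))  # ('200', 'ML')
print(split_size("ML"))     # (None, 'ML')
```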
My source table looks like this:
id|value|count
Value is a string of values separated by semicolons (;). For example, it may look like this:
A;B;C;D;
Some may not have values at certain positions, like this:
A;;;D;
First, I've selectively moved records to a new table (targettable) based on which positions have values, using regexp. I achieved this by using [^;]+; for positions that must have a value between the semicolons, and [^;]*; for positions I don't care about. For example, if I wanted the 1st and 4th positions to have values, I could combine regexp with insert into like this:
insert into
targettable tt (id, value, count)
SELECT some_seq.nextval,value, count
FROM source table
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
So now my new table has a list of records that have values in the 1st and 4th positions. It may look like this:
1|A;B;C;D;|2
2|B;;;E;|1
3|A;D;;D|3
Next there are two things I want to do: 1. get rid of values other than the 1st and 4th; 2. combine identical values and add up their counts. For example, records 1 and 3 are the same once trimmed, so I want to trim them so they become A;D;, and then add their counts, so 2+3=5. Then my new table would look like this:
1|A;D;|5
2|B;E;|1
As long as I can somehow get from the source table to the final table, I don't care about the steps. The intermediate table is not required, but it may help me achieve the final result. I'm not sure if I can go any further with Oracle, though. If not, I'll have to move and process the records with Java. Bear in mind that I have millions of records, so I would prefer an Oracle approach if one is possible.
You should be able to skip the intermediate table; just extract the 1st and 4th elements, using the regexp_substr() function, while checking that those are not null:
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null;
VALUE COUNT
------------------ ----------
A;D; 2
B;E; 1
A;D; 3
and then aggregate those results:
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value;
VALUE COUNT
------------------ ----------
A;D; 5
B;E; 1
Then for your insert you can use that query, either with an identity column (12c+), or setting an ID from a sequence via a trigger, or possibly wrapped in another level of subquery to get the value explicitly:
insert into target (id, value, count)
select some_seq.nextval, value, count
from (
select value, sum(count) as count
from (
select regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) -- first position
|| ';' || regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) -- fourth position
|| ';' as value, -- if you want trailing semicolon
count
from source
where regexp_substr(value, '(.*?)(;|$)', 1, 1, null, 1) is not null
and regexp_substr(value, '(.*?)(;|$)', 1, 4, null, 1) is not null
)
group by value
);
If you're creating a new sequence to do that, so they start from 1, you can use rownum or row_number() instead.
Incidentally, using a keyword or a function name like count as a column name is confusing (sum(count) !?); those might not be your real names though.
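To see which rows survive the filter and how the counts are summed, here is a hypothetical Python model of the query (`nth_field` and `aggregate` are illustrative names). It maps empty strings to None because Oracle treats '' as NULL, which is what makes the `is not null` checks drop rows with an empty 1st or 4th position.

```python
import re
from collections import Counter

FIELD = r"(.*?)(;|$)"  # same pattern the SQL passes to regexp_substr

def nth_field(value, n):
    """Rough stand-in for regexp_substr(value, '(.*?)(;|$)', 1, n, null, 1)."""
    matches = list(re.finditer(FIELD, value))
    if n > len(matches):
        return None
    return matches[n - 1].group(1) or None  # '' behaves like NULL in Oracle

def aggregate(rows):
    """rows: iterable of (value, count); returns {'A;D;': total, ...}."""
    totals = Counter()
    for value, count in rows:
        first, fourth = nth_field(value, 1), nth_field(value, 4)
        if first is not None and fourth is not None:  # the WHERE clause
            totals[f"{first};{fourth};"] += count      # GROUP BY value, SUM(count)
    return dict(totals)

print(aggregate([("A;B;C;D;", 2), ("B;;;E;", 1), ("A;D;;D", 3)]))
```

With the three sample rows this gives {'A;D;': 5, 'B;E;': 1}, matching the aggregated result above.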
I would use regexp_replace to keep only the 1st and 4th parts of the string (capturing them and writing them back with \1\2), combined with an aggregate query to get the total count, like:
SELECT
regexp_replace(value, '^([^;]+;)[^;]*;[^;]*;([^;]+;).*$', '\1\2') AS value,
SUM(count) AS count
FROM source
WHERE
regexp_like(value, '^[^;]+;[^;]*;[^;]*;[^;]+;')
GROUP BY
regexp_replace(value, '^([^;]+;)[^;]*;[^;]*;([^;]+;).*$', '\1\2')
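The replace-based variant can be modelled the same way. This Python sketch assumes the capture-and-backreference form of the pattern (groups for positions 1 and 4, replaced with \1\2); `KEEP_1_AND_4` and `collapse` are hypothetical names, and the third input row is made-up illustration data.

```python
import re
from collections import Counter

# Capture position 1 and position 4 (each with its trailing ';') and drop the rest.
KEEP_1_AND_4 = re.compile(r"^([^;]+;)[^;]*;[^;]*;([^;]+;).*$")

def collapse(rows):
    """rows: iterable of (value, count); returns {trimmed_value: summed_count}."""
    totals = Counter()
    for value, count in rows:
        if KEEP_1_AND_4.match(value):  # same filter as the regexp_like condition
            totals[KEEP_1_AND_4.sub(r"\1\2", value)] += count
    return dict(totals)

print(collapse([("A;B;C;D;", 2), ("B;;;E;", 1), ("A;X;;D;", 3)]))  # {'A;D;': 5, 'B;E;': 1}
```

Because the pattern ends with `.*$`, anything after the 4th position is dropped as well.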
In my table one of column i have a value like below
Y-1
Y-2
Y-3
Y-4
Y-5
Y-6
Y-7
Y-8
Y-9
Y-10
Y-11
Y-12
Y-13
Y-14
When I order by this column it works fine if the values only go up to Y-9; otherwise the result is wrong, like below:
Y-1
Y-10
Y-11
Y-12
Y-13
Y-14
Y-2
Y-3
Y-4
Y-5
Y-6
Y-7
Y-8
Y-9
But I want the output to look like this:
Y-1
Y-2
Y-3
Y-4
Y-5
Y-6
Y-7
Y-8
Y-9
Y-10
Y-11
Y-12
Y-13
Y-14
How can I achieve the above result? I am using an Oracle database. Any help will be greatly appreciated!
Assuming the data is in a table t with a column col, the structure is an alphabetic string followed by a dash followed by a number, and neither the alphabetic part nor the number is ever NULL, then:
select col from t
order by substr(col, 1, instr(col, '-')), to_number(substr(col, instr(col, '-')+1))
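A quick way to see why the two-part key fixes the order is to compare a plain string sort with a (prefix, number) key in Python. `sort_key` is a hypothetical helper, and like the SQL it assumes every value contains a dash followed by digits (int() raises otherwise).

```python
def sort_key(col):
    """(prefix, number) key, like ORDER BY substr(col, 1, instr(col, '-')), to_number(...)."""
    dash = col.index("-")                         # like instr(col, '-'), but 0-based
    return (col[:dash + 1], int(col[dash + 1:]))  # prefix keeps the dash, as in the SQL

values = ["Y-10", "Y-2", "Y-1", "Y-14", "Y-9"]
print(sorted(values))                # ['Y-1', 'Y-10', 'Y-14', 'Y-2', 'Y-9']
print(sorted(values, key=sort_key))  # ['Y-1', 'Y-2', 'Y-9', 'Y-10', 'Y-14']
```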
You can order by manipulating the column content and casting to a number, e.g.:
order by substr(col1, 1, 2), TO_NUMBER(substr(col1, 3, 10))
I think a good way is to build a fixed-length sort key:
select col from t
order by substr(col, 1, 2)|| lpad(substr(col, 3),5,'0')
This only works correctly when the string starts with exactly two non-digit characters and the number is at most 99999.
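The padded key can be checked the same way. This Python sketch mirrors `substr(col, 1, 2) || lpad(substr(col, 3), 5, '0')`, including the fact that Oracle's LPAD cuts a longer string down to the target width, which is where the 99999 limit comes from; `padded_key` is a hypothetical name.

```python
def padded_key(col, width=5):
    """Mirror substr(col, 1, 2) || lpad(substr(col, 3), width, '0')."""
    # LPAD returns only the first `width` characters of a longer string, hence the slice.
    return col[:2] + col[2:].rjust(width, "0")[:width]

values = ["Y-10", "Y-2", "Y-1", "Y-14", "Y-9"]
print(sorted(values, key=padded_key))  # ['Y-1', 'Y-2', 'Y-9', 'Y-10', 'Y-14']
print(padded_key("Y-123456"))          # Y-12345: a sixth digit is lost
```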
In my table I have essentially 2 columns (there are many more, but there is an obvious left side and right side). One of the fields on the left side, FIELD1, comes from a LookupSet(), and each ID can have two items from FIELD1. Using join(lookupset(), vbcrlf) I am able to get both values for the ID into one cell in the table. This works, but the vbcrlf increases the row height. That is a problem because the data on the right side cannot have additional space between it.
Instead, I did a split(join(lookupset())).getValue(0) for the first row and used value 1 in the row below it. With some iif statements to check for errors etc., this works.
One problem I have solved is that the values for FIELD1 can be longer than the width of the cell, but will never take up more than two lines. I was able to do some substring magic in Oracle like:
SELECT ID, SUBSTR(FIELD1, 1, 70) FROM....
UNION
SELECT ID, SUBSTR(FIELD1, 71) FROM ...
Using sorting, I am able to get up to 4 rows of data per ID, so I can split the lookupset and get each value into the table.
My last problem, which hopefully someone can help with, is that when I get the substring, it can cut off words, and the next row will start with the rest of the word.
Is there a way, possibly using a regex, to keep words intact while also ensuring that the total length returned is less than some number of characters? I am happy to abandon any part of my current approach if I am off track.
I was able to solve this (with some help) by using
SUBSTR(FIELD1,1, regexp_instr(FIELD1, '[ ]', 70))
SELECT ID, SUBSTR(FIELD1,1, regexp_instr(FIELD1, '[ ]', 70)) FROM....
UNION
SELECT ID, SUBSTR(FIELD1, regexp_instr(FIELD1, '[ ]', 70)) FROM ...
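Note that REGEXP_INSTR(FIELD1, '[ ]', 70) returns the position of the first space at or after character 70 (or 0 if there is none), so the first piece keeps whole words but can be longer than 70 characters. The Python sketch below mirrors that behaviour and, for comparison, shows an alternative of my own (not what the answer uses) that breaks at the last space within the limit; in Oracle the same idea could be written as INSTR(SUBSTR(FIELD1, 1, 71), ' ', -1). Both function names are hypothetical.

```python
def split_at_next_space(text, start=70):
    """Mirror SUBSTR(t, 1, p) and SUBSTR(t, p) with p = REGEXP_INSTR(t, '[ ]', start)."""
    p = text.find(" ", start - 1) + 1   # 1-based position of the space; 0 if none
    if p == 0:
        # SUBSTR(t, 1, 0) is NULL and SUBSTR(t, 0) is the whole string in Oracle.
        return None, text
    return text[:p], text[p - 1:]       # both pieces include the space at p

def split_within_limit(text, limit=70):
    """Alternative: break at the last space that keeps the first piece <= limit chars."""
    if len(text) <= limit:
        return text, ""
    cut = text.rfind(" ", 0, limit + 1)  # index of the last space at or before index `limit`
    if cut <= 0:
        # No usable space: fall back to a hard cut, which may split a word.
        return text[:limit], text[limit:]
    return text[:cut], text[cut + 1:]    # drop the space itself

text = " ".join(["word"] * 20)           # 99 characters, a space every 5th character
head, _ = split_at_next_space(text, 71)
print(len(head))                         # 75: longer than the start position
head, tail = split_within_limit(text, 70)
print(len(head), tail[:4])               # 69 word
```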
I am loading some data to Oracle via SQLLDR. The source file is "pipe delimited".
FIELDS TERMINATED BY '|'
But some records contain the pipe character in the data, not as a separator. This breaks the loading of those records, because SQL*Loader treats the in-data pipe characters as field terminators.
Can you point me in a direction to solve this issue?
Data file is about 9 GB, so it is hard to edit manually.
For example,
Loaded row:
ABC|1234567|STR 9 R 25|98734959,32|28.12.2011
Rejected Row:
DE4|2346543|WE| 454|956584,84|28.11.2011
Error:
Rejected - Error on table HSX, column DATE_N.
ORA-01847: day of month must be between 1 and last day of month
DATE_N column is the last one.
You could avoid specifying any separator and do something like:
field BOUNDFILLER CHAR(4000),
col1 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\1')",
col2 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\2')",
col3 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\3')",
col4 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\4')",
col5 EXPRESSION "REGEXP_REPLACE(:field,'^([^|]*)\\|([^|]*)\\|(.*)\\|([^|]*)\\|([^|]*)$', '\\5')"
The filler field is declared as BOUNDFILLER because a plain FILLER field cannot be referenced in SQL expressions. This regexp has five capture groups (inside parentheses), one per field in your sample rows, separated by a vertical bar (I had to escape it because otherwise it means OR in a regexp). No group except the third can contain a vertical bar ([^|]*); the third group may contain anything (.*), and the regexp must span from the beginning to the end of the line (^ and $).
This way we are sure that the third group will absorb all superfluous separators. This only works because you have only one field that may contain separators. If you want to double-check, you can for example require that the fourth group starts with a digit (put \d at the beginning of the fourth parenthesized block).
I have doubled all backslashes because we are inside a double-quoted expression, but I am not really sure that I ought to.
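The capture-group idea can be tried in Python before wiring it into the control file. The sample rows have five fields (DATE_N is the fifth), so this sketch uses five groups with only the third allowed to contain '|'. In a Python raw string a single backslash escapes the bar; the doubled backslashes above only concern quoting inside the control file. `LINE` and `parse` are hypothetical names.

```python
import re

# Five fields; only the third may contain '|', so it alone uses '.*'.
LINE = re.compile(r"^([^|]*)\|([^|]*)\|(.*)\|([^|]*)\|([^|]*)$")

def parse(line):
    """Return the five fields, or None when the line does not have at least four pipes."""
    m = LINE.match(line)
    return m.groups() if m else None

print(parse("ABC|1234567|STR 9 R 25|98734959,32|28.12.2011"))
print(parse("DE4|2346543|WE| 454|956584,84|28.11.2011"))
```

Because the fourth and fifth groups cannot contain '|', the greedy third group backtracks just far enough to leave the last two pipes as separators.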
It looks to me that it's not really possible for SQL*Loader to handle your file because of the third field, which can contain the delimiter, is not surrounded by quotes, and is of variable length. However, if the data you provide is an accurate example, I can suggest a workaround. First, create a table with one VARCHAR2 column as long as the longest line in your file. Then load the entire file into this table. From there you can extract each column with a query such as:
with CTE as
(select 'ABC|1234567|STR 9 R 25|98734959,32|28.12.2011' as CTETXT
from dual
union all
select 'DE4|2346543|WE| 454|956584,84|28.11.2011' from dual)
select substr(CTETXT, 1, instr(CTETXT, '|') - 1) as COL1
,substr(CTETXT
,instr(CTETXT, '|', 1, 1) + 1
,instr(CTETXT, '|', 1, 2) - instr(CTETXT, '|', 1, 1) - 1)
as COL2
,substr(CTETXT
,instr(CTETXT, '|', 1, 2) + 1
,instr(CTETXT, '|', -1, 2) - instr(CTETXT, '|', 1, 2) - 1)
as COL3
,substr(CTETXT
,instr(CTETXT, '|', -1, 2) + 1
,instr(CTETXT, '|', -1, 1) - instr(CTETXT, '|', -1, 2) - 1)
as COL4
,substr(CTETXT, instr(CTETXT, '|', -1, 1) + 1) as COL5
from CTE
It's not perfect (though it may be adaptable to SQL*Loader), and it would need a bit of work if you have more columns or if your third field is not what I think it is. But it's a start.
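The same positional idea, cutting on the first two and the last two pipes into the five fields of the sample rows, can be sketched in Python with find/rfind standing in for INSTR with a positive or negative start. It assumes every line has at least four pipes; otherwise the positions overlap and the slices are meaningless. `split_by_positions` is a hypothetical name.

```python
def split_by_positions(text):
    """Cut on the first two and the last two '|' characters, like the INSTR calls."""
    p1 = text.find("|")             # instr(t, '|', 1, 1), 0-based here
    p2 = text.find("|", p1 + 1)     # instr(t, '|', 1, 2)
    q1 = text.rfind("|")            # instr(t, '|', -1, 1): last pipe
    q2 = text.rfind("|", 0, q1)     # instr(t, '|', -1, 2): second-to-last pipe
    return (text[:p1], text[p1 + 1:p2], text[p2 + 1:q2],
            text[q2 + 1:q1], text[q1 + 1:])

for row in ("ABC|1234567|STR 9 R 25|98734959,32|28.12.2011",
            "DE4|2346543|WE| 454|956584,84|28.11.2011"):
    print(split_by_positions(row))
```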
OK, I recommend parsing the file and replacing the delimiter.
On the Unix/Linux command line you can run:
cat current_file | awk -F'|' '{printf("%s\t%s\t", $1, $2); for(k=3;k<NF-2;k++) printf("%s|", $k); printf("%s\t%s\t%s", $(NF-2),$(NF-1),$NF); print "";}' > new_file
This command does not change your current file.
It creates a new, tab-delimited file with five fields. A tab is used instead of a comma because the amount fields (for example 98734959,32) already contain commas.
It splits each input line on "|" and writes the first chunk, the second chunk, everything from the third chunk up to the third-to-last chunk (rejoined with "|"), the second-to-last chunk and the last chunk.
You can then load new_file with sqlldr using FIELDS TERMINATED BY X'09' (a tab).
UPDATE:
The command can also be put in a script (named parse.awk here):
#!/usr/bin/awk -f
# parse.awk
BEGIN {FS="|"}
{
printf("%s\t%s\t", $1, $2);
for(k=3;k<NF-2;k++)
printf("%s|", $k);
printf("%s\t%s\t%s\n", $(NF-2),$(NF-1),$NF);
}
and run it like this:
cat current_file | awk -f parse.awk > new_file
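If awk is not available, the same re-delimiting can be sketched in Python. This is a hypothetical equivalent, not part of the answer: `redelimit` and `convert` are illustrative names, the output uses a tab by default because the amount fields contain decimal commas, the input is assumed to be UTF-8, and, like the awk version, it assumes every line has at least four pipes.

```python
def redelimit(line, sep="\t"):
    """Fields 1 and 2, chunks 3..NF-2 rejoined with '|', then chunks NF-1 and NF."""
    chunks = line.rstrip("\r\n").split("|")
    middle = "|".join(chunks[2:-2])
    return sep.join([chunks[0], chunks[1], middle, chunks[-2], chunks[-1]])

def convert(src_path, dst_path, sep="\t"):
    """Stream the file line by line so a 9 GB input never has to fit in memory."""
    # Assumption: the file is UTF-8; change the encoding if yours is not.
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(redelimit(line, sep) + "\n")

print(redelimit("DE4|2346543|WE| 454|956584,84|28.11.2011"))
```

For the whole file you would call convert("current_file", "new_file").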