I'm trying to find a good way to check whether data I'm given through a bulk load (SQL*Loader) already exists in my data set, so I don't load it again.
Currently we have a setup like this:
TableA
col1, col2, bulkLoadName
This table would contain data like,
col1, col2, bulkLoadName
"Joe", 35, "Load1"
"Tim", 65, "Load1"
"Ray", 95, "Load1"
"Joe", 35, "Load2"
And I'd like to change it to,
TableA
PK, col1, col2
TableAtoBulkLoadName
PK, TABLEA_PK, BulkLoadName_PK
BulkLoadName
PK, bulkLoadName
Where the data would look like,
PK, col1, col2
1, "Joe", 35
2, "Tim", 65
3, "Ray", 95
PK, TABLEA_PK, BulkLoadName_PK
1, 1, 1
2, 2, 1
3, 3, 1
4, 1, 2
PK, bulkLoadName
1, "Load1"
2, "Load2"
This normalizes the data so I can easily check for a specific load without a string search and, most importantly, prevents me from loading duplicate data into the database just because something is defined again in a later load.
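For concreteness, here is a minimal sketch of that target schema as Oracle DDL (the constraint names and column types are just illustrative); the unique key on (col1, col2) is what would actually block the duplicates:
CREATE TABLE BulkLoadName (
  pk           NUMBER PRIMARY KEY,
  bulkLoadName VARCHAR2(100) NOT NULL UNIQUE
);
CREATE TABLE TableA (
  pk   NUMBER PRIMARY KEY,
  col1 VARCHAR2(100) NOT NULL,
  col2 NUMBER NOT NULL,
  CONSTRAINT tablea_uk UNIQUE (col1, col2)
);
CREATE TABLE TableAtoBulkLoadName (
  pk              NUMBER PRIMARY KEY,
  tablea_pk       NUMBER NOT NULL REFERENCES TableA (pk),
  bulkloadname_pk NUMBER NOT NULL REFERENCES BulkLoadName (pk)
);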
I'm having trouble deciding how I should implement the duplicate checks. I'm not well versed in SQL and need a solution that works in Oracle 11g. I've looked around and come up with two possible solutions...
Solution 1:
Use a temp table to store the bulk load and run a stored procedure once loaded to check.
Solution 2:
Use a MERGE statement on TableA that adds new records to TableA, or creates a new intersection record in TableAtoBulkLoadName if the record already exists.
Now that all of the background info is out there, my questions are: what are the pros and cons of these approaches? Is this kind of normalization normal? Are there standard ways of doing this sort of thing?
Thanks!
Strictly from a performance standpoint, if you can do everything in one statement, that's usually better.
But as soon as you start to transform the data in various ways, I personally find that by using a staging table, the resulting code is a lot easier to read and modify.
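For instance, Solution 2 could look roughly like this (a sketch assuming a SQL*Loader staging table stg_load, sequences tablea_seq and link_seq, and the current load's key in a bind :load_pk, all hypothetical names). Note that MERGE targets a single table, so the intersection rows still need a second statement:
-- Step 1: insert only the rows TableA doesn't already have.
MERGE INTO TableA a
USING (SELECT DISTINCT col1, col2 FROM stg_load) s
ON (a.col1 = s.col1 AND a.col2 = s.col2)
WHEN NOT MATCHED THEN
  INSERT (pk, col1, col2)
  VALUES (tablea_seq.NEXTVAL, s.col1, s.col2);
-- Step 2: link every staged row (new or pre-existing) to this load.
INSERT INTO TableAtoBulkLoadName (pk, tablea_pk, bulkloadname_pk)
SELECT link_seq.NEXTVAL, a.pk, :load_pk
FROM TableA a
JOIN (SELECT DISTINCT col1, col2 FROM stg_load) s
  ON a.col1 = s.col1 AND a.col2 = s.col2;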
Related
I have one Oracle DB with ~40 tables. Some of them have IDs = 1, 2, 3, 4, 5... and constraints.
Now I want to "copy" this data from all tables to another Oracle DB which already has the same tables.
The problem is that the other DB also has records (possibly with the same IDs: 1, 2, 3, 77, 88...) and I don't want to lose them.
Is there some automated way to copy data from one table to another, shifting the IDs while keeping the constraints valid?
1, 2, 3, 77, 88 +
**1, 2, 3, 4, 5**
=
1, 2, 3, 77, 88, **89, 90, 91, 92, 93**
Or do I need to do it myself?
insert into new.table
select new.sequence_id.nextval, t.* from old.table t
and then save the new.id to old.id mapping, and so on, for all 40 tables?
That's a bit of a dirty solution, but if all IDs are numeric you can first update the old IDs to negative numbers (ID = -1 * ID), or just do it in the select statement on the fly, and then do the insert. That way all your IDs stay consistent, the constraints remain valid, and the old rows can live alongside the new data.
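A minimal sketch of the on-the-fly variant, reusing the question's hypothetical old/new schema and table names (repeat the same negation for child tables so the negated foreign keys still match):
insert into new.table
select -1 * t.id, t.col1, t.col2   -- negated IDs cannot collide with the existing positive ones
from old.table t;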
First, you need expdp; second, you need to remap the schema to a new schema name with impdp.
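That is (a sketch with placeholder credentials, directory, and schema names; note that Data Pump copies rows as-is and will not renumber colliding IDs for you):
expdp system/password schemas=OLD_SCHEMA directory=DATA_PUMP_DIR dumpfile=old_schema.dmp
impdp system/password remap_schema=OLD_SCHEMA:NEW_SCHEMA directory=DATA_PUMP_DIR dumpfile=old_schema.dmp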
I'm sorry but I don't know how to explain exactly what I'm asking with words, so here's an example:
http://sqlfiddle.com/#!15/2564a/1/1
create table section(id serial primary key, name text not null);
create table book(id serial primary key, name text not null,
section_id integer not null references section(id));
create table author(id serial primary key, name text not null);
create table author_books(
author_id integer not null references author(id),
book_id integer not null references book(id),
unique(author_id, book_id)
);
create index on book(name);
create index on book(section_id);
create index on author(name);
create index on author_books(author_id, book_id);
insert into section(name) values ('Romance'), ('Terror');
insert into book(name, section_id) values ('Wonderful World', 1), ('Terrible World', 1), ('Simple World', 1), ('Irrelevant', 2);
insert into author(name) values ('Jill'), ('Mark'), ('Tim');
insert into author_books values (1, 1), (2, 1), (3, 1), (1, 2), (3, 2), (3, 3), (3, 4);
select b.section_id, b.name, a.name from book b
join author_books ab on b.id=ab.book_id
join author a on a.id=ab.author_id;
select distinct s.name from section s
join book b on b.section_id=s.id
join author_books ab on b.id=ab.book_id
join author a on a.id=ab.author_id
where a.name in ('Jill', 'Tim')
group by s.id
having count(distinct a.name) >= 2;
This query returns the expected result; however, I'm interested in whether it's possible to change it to perform better somehow. It's not clear to me what PostgreSQL will do in this case. For example, after evaluating the first book in the Romance section that matches the criteria, ideally it would skip processing any other books in the Romance section to speed up query execution. Also, as soon as it finds the authors Jill and Tim, it could probably stop processing the other author checks, since the count(distinct a.name) >= 2 condition is already met.
Is there any way to help PG to apply such optimizations with changes in the query?
Just to be clear, the query's intention is to find all sections containing at least one book written by both Jill and Tim.
The sort of nitpicky optimizations you're talking about are the sort of thing the query engine is meant to keep out of your hands. It's helpful to think of databases as operating on sets of rows rather than as inspecting results row by row: the engine retrieves all section rows, generates the product of section with book, discards all rows that fail to satisfy the JOIN predicate, and so on. The query planner can optimize things ahead of time by switching the order of operations around to minimize the number of rows it has to deal with overall, but it's not going to stop in the middle.
Indexing your foreign keys will help; indexing author.name will help; indexing section.name is probably pointless.
You could also create a materialized view from the query, if performance is more important than the results always being current.
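For example, a materialized view built over the same query (the view name is arbitrary; note that the author names get baked in, so this only helps when that filter is stable):
create materialized view sections_by_jill_and_tim as
select s.name from section s
join book b on b.section_id = s.id
join author_books ab on b.id = ab.book_id
join author a on a.id = ab.author_id
where a.name in ('Jill', 'Tim')
group by s.id, s.name
having count(distinct a.name) >= 2;
-- re-run whenever the underlying tables change:
refresh materialized view sections_by_jill_and_tim;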
Please suppose you have an Oracle table, TABLE A, as follows:
FIELD1  FIELD2  FIELD3
AAA     1       200.03
AAA     1       100.02
BBB     3       300.04
BBB     3       400.05
In this table the main fields are FIELD1 and FIELD2. You can see that:
a) For the couple (AAA, 1) we have two values: 200.03 and 100.02;
b) For the couple (BBB, 3) we have two values: 300.04 and 400.05.
We would like to make a sum aggregation, updating a second table, TABLE B, with the same columns (FIELD1, FIELD2, FIELD3). In FIELD3 of TABLE B, we would like to store the sum of 200.03 and 100.02 for the couple (AAA, 1), and the sum of 300.04 and 400.05 for the couple (BBB, 3).
Please imagine that we could have many different couples:
(ZZZ, 77)
(YYY, 12)
... and so on.
Please suppose that there may be more than two records for a single couple, in which case we should sum the values of all the records for that couple.
In our simple case, the result will be the following:
FIELD1  FIELD2  FIELD3
AAA     1       300.05
BBB     3       700.09
The real case has a table A with about 20 million records, so I would like to write the software in PL/SQL using BULK COLLECT, UPDATE and FORALL.
What would be the best approach? Please provide PL/SQL code in order to explain how to solve the problem.
Thank you very much for considering my request.
Frankly I wouldn't use BULK COLLECT and FORALL here - I'd use a MERGE statement. Try something like
MERGE INTO TABLE_B b
USING (SELECT FIELD1, FIELD2, SUM(FIELD3) AS TOTAL_FIELD3
FROM TABLE_A
GROUP BY FIELD1, FIELD2) a
ON (b.FIELD1 = a.FIELD1 AND
b.FIELD2 = a.FIELD2)
WHEN NOT MATCHED THEN
INSERT (FIELD1, FIELD2, FIELD3)
VALUES (a.FIELD1, a.FIELD2, a.TOTAL_FIELD3)
WHEN MATCHED THEN
UPDATE
SET FIELD3 = a.TOTAL_FIELD3;
Best of luck.
I am trying to create a 360 questionnaire dynamically using a classic report in Oracle Apex. Got the first part to work nicely using the following:
SELECT q.display_text,
apex_item.radiogroup(rownum, 1, a.answer, null, null, null, null) "ineffective",
apex_item.radiogroup(rownum, 2, a.answer, null, null, null, null) "sometimes"
FROM xxpay_360_questions q,
xxpay_360_answers a
where a.question_id (+) = q.question_id
and a.user_name (+) = :APP_USER
order by q.questionnaire_id,
q.display_sequence
This outputs 3 report columns. The first one is the question and the second two are the horizontal radio buttons to select answer 1 or 2. The 360 questionnaire also needs sections and sub-sections, and some textarea questions. For those I would like to merge the 3 report columns into 1 column (akin to colspan=3). I would probably need to output them using a union in the above select, but I'm not sure how to dynamically output a colspan and a single report column value.
Note that I am using theme 20 in order to get the Oracle Applications look and this uses table layout.
Does anyone know how to output a single report column instead of 3 for some rows, and then colspan=3 it? Changing the font for the section and sub-section would be a bonus.
I'm not sure whether CSS can do a colspan when using a table layout.
1. Add a column to your report query that will serve as a flag for the 3-in-1 column rows (a sketch follows this list).
2. Create a new report template and make it a "named column" style.
3. Create the two different column formats you want, using the #COLUMN_NAME# token for columns.
4. Set the condition for each of these two formats using the value of your new flag column.
The added benefit is that you can now use HTML to do whatever formatting you end up needing later.
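A sketch of that flag column, assuming a hypothetical question_type column that distinguishes section headers and textarea questions from regular ones:
select q.display_text,
       apex_item.radiogroup(rownum, 1, a.answer, null, null, null, null) "ineffective",
       apex_item.radiogroup(rownum, 2, a.answer, null, null, null, null) "sometimes",
       case when q.question_type in ('SECTION', 'SUBSECTION', 'TEXTAREA') then 'Y'
            else 'N' end merge_flag
from xxpay_360_questions q,
     xxpay_360_answers a
where a.question_id (+) = q.question_id
and a.user_name (+) = :APP_USER
order by q.questionnaire_id,
         q.display_sequence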
I have made the following.
Test table:
create table tst as
select 1 a, 2 b, 3 c, 4 d, 5 e from dual
union all
select 11, 12, 13, null, null from dual
union all
select 21, 22, 23, 24, 25 from dual;
Region source:
select a, b,
case when d is null and e is null then
'<td colspan="3">' || c || '</td>'
else '<td>' || c || '</td><td>' || d || '</td><td>' || e || '</td>'
end merged_column
from tst
Report properties: Display As - Standard Display Column. Heading of the merged_column column:
<th>C</th><th>D</th><th>E</th>
The result (screenshot omitted here) shows the second row's cells merged into one. Maybe it's not the coolest or most useful example, but the cells in the second row really are merged. Note that it's impossible to sort by columns 4 and 5, and you have to align the text there manually.
Sorry, I can't give a link to the page: apex.oracle.com was upgraded to version 5.0, and version 4.2 is unavailable now.
I wish I had more time to work up a proper example, but you can use Oracle's LISTAGG function to group the answers into a single row per question and add some HTML tags for styling. Generally speaking, I generate something like this:
<SPAN TITLE="Some help text">Some question text?</SPAN>
<UL>
<LI>[RADIO group1 value1] radio_label1</LI>
<LI>[RADIO group1 value2] radio_label2</LI>
</UL>
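A rough sketch of SQL that could emit that markup, assuming a hypothetical options table (xxpay_360_options) holding each question's choices and a hypothetical help_text column on the questions table; question_id doubles as the radiogroup index:
SELECT '<SPAN TITLE="' || q.help_text || '">' || q.display_text || '</SPAN><UL>'
       || LISTAGG('<LI>'
                    || apex_item.radiogroup(q.question_id, o.option_value, a.answer)
                    || ' ' || o.option_label || '</LI>', '')
            WITHIN GROUP (ORDER BY o.display_sequence)
       || '</UL>' AS question_html
  FROM xxpay_360_questions q
  JOIN xxpay_360_options o ON o.question_id = q.question_id
  LEFT JOIN xxpay_360_answers a
    ON a.question_id = q.question_id
   AND a.user_name = :APP_USER
 GROUP BY q.question_id, q.help_text, q.display_text
 ORDER BY q.question_id;  -- question_id must stay within apex_item's 1-50 p_idx range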
Hopefully, you can use this example as a starting point to coding what you want.
As an aside, I should point out one potential issue with what you are trying to do. You may already be aware of it but, via this method, Apex is limited to displaying no more than 50 questions at a time because the value for p_idx has to be a whole number between 1 and 50. (Source: Apex documentation) You can work within that limitation, but being aware of the issue from the start is much easier than discovering it half-way through.
Good Luck!
I have a column that is a varchar(50) with a unique constraint, and I would like to get the next unique number for a new insert, but with a given prefix.
My rows could look like this:
ID (varchar)
00010001
00010002
00010003
00080001
So if I wanted the next unique number for the prefix "0001" it would be "00010004", but for the prefix "0008" it would be "00080002".
There will be more than 1 million entries in this table. Is there a fairly fast way to perform this kind of operation in Oracle 11g?
I know this setup is totally insane, but it's what I have to work with. I can't create any new tables, etc.
You can search for the max value of the specified prefix and increment it:
SQL> WITH DATA AS (
2 SELECT '00010001' id FROM DUAL UNION ALL
3 SELECT '00010002' id FROM DUAL UNION ALL
4 SELECT '00010003' id FROM DUAL UNION ALL
5 SELECT '00080001' id FROM DUAL
6 )
7 SELECT :prefix || to_char(MAX(to_number(substr(id, 5)))+1, 'fm0000') nextval
8 FROM DATA
9 WHERE ID LIKE :prefix || '%';
NEXTVAL
---------
00010004
I'm sure you're aware that this is an inefficient method to generate a primary key. Furthermore it won't play nicely in a multi-user environment and thus won't scale. Concurrent inserts will wait then fail since there is a UNIQUE constraint on the column.
If the prefix is always the same length, you can reduce the workload somewhat: you could create a specialized index that would find the max value in a minimum number of steps:
CREATE INDEX ix_fetch_max ON your_table (substr(id, 1, 4),
substr(id, 5) DESC);
Then the following query could use the index and will stop at the first row retrieved:
SELECT id
FROM (SELECT substr(id, 1, 4) || substr(id, 5) id
FROM your_table
WHERE substr(id, 1, 4) = :prefix
ORDER BY substr(id, 5) DESC)
WHERE rownum = 1
If you need to do simultaneous inserts with the same prefix, I suggest you use DBMS_LOCK to request a lock on the specific new ID. If the call fails because someone is already inserting this value, retry with newID+1. Although this involves more work than a traditional sequence, at least your inserts won't wait on each other (which could otherwise lead to deadlocks).
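A sketch of that locking idea (the table and bind names are hypothetical). DBMS_LOCK.ALLOCATE_UNIQUE turns the candidate ID into a lock handle (be aware that it issues a commit), and a zero timeout makes the request fail fast instead of blocking:
DECLARE
  l_handle VARCHAR2(128);
  l_result INTEGER;
  l_new_id VARCHAR2(50) := :candidate_id;  -- from the MAX+1 query above
BEGIN
  DBMS_LOCK.ALLOCATE_UNIQUE('NEWID_' || l_new_id, l_handle);
  l_result := DBMS_LOCK.REQUEST(lockhandle        => l_handle,
                                lockmode          => DBMS_LOCK.X_MODE,
                                timeout           => 0,
                                release_on_commit => TRUE);
  IF l_result = 0 THEN
    INSERT INTO your_table (id) VALUES (l_new_id);
  ELSE
    NULL;  -- someone else holds this ID: retry with newID+1
  END IF;
END;
/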
This is a very unsatisfactory situation for you. As other posters have pointed out, if you don't use sequences then you will almost certainly have concurrency issues. I mentioned in a comment the possibility of living with big gaps. This is the simplest solution, but you will run out of numbers after 9999 inserts.
Perhaps an alternative would be to create a separate sequence for each prefix. This would only really be practical if the number of prefixes is fairly low but it could be done.
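A sketch of that, assuming the prefixes are known up front and each sequence is started past the current maximum suffix:
CREATE SEQUENCE seq_prefix_0001 START WITH 4;  -- current max suffix for '0001' is 0003
CREATE SEQUENCE seq_prefix_0008 START WITH 2;
-- next ID for prefix '0001', i.e. '00010004':
SELECT '0001' || to_char(seq_prefix_0001.NEXTVAL, 'fm0000') FROM dual;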
PS: your requirement to support more than 1,000,000 records may, in fact, mean you have no choice but to redesign the database.
SELECT to_char(to_number(max(id)) + 1, 'fm00000000')  -- fm suppresses the leading sign blank
FROM mytable
WHERE id LIKE '0001%'
SQLFiddle demo here http://sqlfiddle.com/#!4/4f543/5/0