I am trying to insert 1,000,000 rows into a table in Oracle APEX, but there is a 32k limit and I can only insert about 4,000 rows at a time. Is there a way to get around this limit? Otherwise I'm going to be doing inserts for the rest of my life.
Cheers
Brian
Example script:
INSERT ALL
    INTO staff (staffid, forename, familyName, jobRole) VALUES (staffid_auto_incr.nextval, 'Raymond', 'Mccoy', 'manager')
    INTO staff (staffid, forename, familyName, jobRole) VALUES (staffid_auto_incr.nextval, 'Janice', 'Cunningham', 'admin')
    INTO staff (staffid, forename, familyName, jobRole) VALUES (staffid_auto_incr.nextval, 'Christine', 'Reyes', 'sales')
    -- etc. etc.
SELECT * FROM dual;
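For reference, one commonly suggested workaround, sketched below with the table and sequence names from the script above, is to split the single INSERT ALL into plain single-row INSERTs and run them as a script (for example in APEX SQL Scripts), where each statement executes on its own and no individual statement approaches the 32k limit:

-- Sketch only: same staff table and staffid_auto_incr sequence as above;
-- one INSERT per row keeps every individual statement tiny.
INSERT INTO staff (staffid, forename, familyName, jobRole)
VALUES (staffid_auto_incr.nextval, 'Raymond', 'Mccoy', 'manager');

INSERT INTO staff (staffid, forename, familyName, jobRole)
VALUES (staffid_auto_incr.nextval, 'Janice', 'Cunningham', 'admin');

-- ... repeat for the remaining rows ...
COMMIT;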
Related
I need to pull a humongous amount of data, say 600-700 variables, from different tables in a data warehouse... the dataset in its raw form will easily touch 150 GB (79 MM rows), and for my analysis I need only a million rows. How can I pull the data using proc sql directly from the warehouse, doing simple random sampling on the rows?
The code below won't work, as ranuni is not supported by Oracle:
proc sql outobs=1000000;
  select * from connection to oracle
  (select * from tbl1 order by ranuni(12345));
quit;
How do you propose I do it?
Use the DBMS_RANDOM Package to Sort Records, Then Use a Row-Limiting Clause to Restrict to the Desired Sample Size
The dbms_random.value function generates a random number between 0 and 1 for each row in the table, and we sort in ascending order of that random value.
Here is how to produce the sample set you identified:
SELECT *
FROM
    (
        SELECT *
        FROM tbl1
        ORDER BY dbms_random.value
    )
FETCH FIRST 1000000 ROWS ONLY;
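A side note: the original ranuni call took a seed (12345). If a repeatable ordering matters, dbms_random can be seeded before running the query. A minimal sketch; repeatability is only plausible for serial, single-session execution:

BEGIN
  -- fixed seed, mirroring ranuni(12345) from the question
  dbms_random.seed(12345);
END;
/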
To demonstrate with the sample schema table, emp, we sample 4 records:
SCOTT#DEV> SELECT
  2            empno,
  3            rnd_val
  4        FROM
  5            (
  6                SELECT
  7                    empno,
  8                    dbms_random.value rnd_val
  9                FROM
 10                    emp
 11                ORDER BY rnd_val
 12            )
 13        FETCH FIRST 4 ROWS ONLY;

     EMPNO RND_VAL
---------- ----------------------------------------
      7698 0.06857749035643605682648168347885993709
      7934 0.07529612360785920635181751566833986766
      7902 0.13618520865865754766175030040204331697
      7654 0.14056380246495282237607922497308953768

SCOTT#DEV> SELECT
  2            empno,
  3            rnd_val
  4        FROM
  5            (
  6                SELECT
  7                    empno,
  8                    dbms_random.value rnd_val
  9                FROM
 10                    emp
 11                ORDER BY rnd_val
 12            )
 13        FETCH FIRST 4 ROWS ONLY;

     EMPNO RND_VAL
---------- ----------------------------------------
      7839 0.00430658806761508024693197916281775492
      7499 0.02188116061148367312927392115186317884
      7782 0.10606515700372416131060633064729870016
      7788 0.27865276349549877512032787966777990909
With the example above, notice that the set of empno values returned changes significantly between the two executions of the same SQL*Plus command.
Performance might be an issue with the row counts you are describing.
EDIT:
With a table on the order of 150 GB / 79 MM rows, any sorting would be painful.
If the table had a surrogate key based on a sequence incremented by 1, we could take the approach of selecting every nth record based on the key.
e.g.
--scenario n = 3000
FROM
tbl1
WHERE
mod(table_id, 3000) = 0;
This approach would not use an index (unless a function-based index is created), but at least we are not performing a sort on a data set of this size.
I performed an explain plan against a table with close to 80 million records; it does perform a full table scan (the condition forces this without a function-based index), but it looks tenable.
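For reference, the function-based index alluded to above might look like this (the index name is made up for illustration):

-- Hypothetical name; indexes the same mod() expression used in the predicate
CREATE INDEX tbl1_mod3000_idx ON tbl1 (mod(table_id, 3000));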
None of the answers or comments posted helped my cause; they might have, but we have 87 MM rows.
In the end I got the answer with the help of SAS. Here is what I did, and it works. Thanks, all!
libname dwh oracle path=<path> user=<username> password=<pwd>;

proc sql;
  create table sample as
  select <all the variables>,
         ranuni(<any arbitrary seed>) as rnd
  from dwh.<all the tables>
  <bunch of where conditions go here>;
quit;
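For completeness, a sketch of how the 1,000,000-row sample could then be drawn from the extracted table, reusing outobs from the original attempt (this assumes the ranuni column was aliased as rnd, as in the code above):

/* Order by the random column and let OUTOBS cap the result at the sample size */
proc sql outobs=1000000;
  create table sample_1mm as
  select *
  from sample
  order by rnd;
quit;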
I have a table Student which has names and ratings, year-wise.
Name  Year  Rating
Ram   2016  10
Sam   2016  9
Ram   2014  8
Sam   2012  7
I need to find the previous rating of the student, which could be from last year or from some years before.
The query should return the results below:

Name  Cur_rating_year_2016  Prev_rating
Ram   10                    8
Sam   9                     7
Below are the create and insert scripts:
Create table Student (name varchar2(10), year number, rating number );
insert into student values('Ram' ,2016 ,10);
insert into student values('Sam' ,2016 ,9);
insert into student values('Sam' ,2012 ,7);
insert into student values('Ram' ,2014 ,8);
Is there a way to achieve this using select query?
Use the LAG analytic function: https://docs.oracle.com/database/122/SQLRF/LAG.htm#SQLRF00652
LAG is an analytic function. It provides access to more than one row
of a table at the same time without a self join. Given a series of
rows returned from a query and a position of the cursor, LAG provides
access to a row at a given physical offset prior to that position.
For the optional offset argument, specify an integer that is greater
than zero. If you do not specify offset, then its default is 1. The
optional default value is returned if the offset goes beyond the scope
of the window. If you do not specify default, then its default is
null.
SELECT name,
       year,
       rating,
       LAG(rating, 1, NULL) OVER (PARTITION BY name ORDER BY year) AS prev_rating
FROM   student
ORDER  BY name;
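To get the exact one-row-per-student output asked for in the question, the LAG result can be wrapped and filtered. A minimal sketch against the Student table posted above:

-- Compute each student's previous rating, then keep only the 2016 row.
SELECT name,
       rating AS cur_rating_year_2016,
       prev_rating
FROM   (SELECT name,
               year,
               rating,
               LAG(rating) OVER (PARTITION BY name ORDER BY year) AS prev_rating
        FROM   student)
WHERE  year = 2016;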
Try this (using the Student table from the question; the subquery picks the most recent earlier year, so a student with several old ratings still returns a single row):
SELECT a.name, a.rating, b.rating AS prev_rating
FROM   student a
       INNER JOIN student b ON a.name = b.name
WHERE  a.year = 2016
AND    b.year = (SELECT MAX(c.year)
                 FROM   student c
                 WHERE  c.name = a.name
                 AND    c.year < 2016)
ORDER  BY a.name;
I am currently doing some testing and need a large amount of data (around 1 million rows).
I am using the following table:
CREATE TABLE OrderTable(
  OrderID INTEGER NOT NULL,
  StaffID INTEGER,
  TotalOrderValue DECIMAL(8,2),
  CustomerID INTEGER);

ALTER TABLE OrderTable ADD CONSTRAINT OrderID_PK PRIMARY KEY (OrderID);
CREATE SEQUENCE seq_OrderTable
MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 10000;
and want to randomly insert 1000000 rows into it with the following rules:
OrderID needs to be sequential (1, 2, 3 etc...)
StaffID needs to be a random number between 1 and 1000
CustomerID needs to be a random number between 1 and 10000
TotalOrderValue needs to be a random decimal value between 0.00 and 9999.99
Is this even possible to do? I know I could generate each of these values with an update statement like the one below, but I am not sure how to generate a million rows in one go.
Thanks for any help on this matter
This is how I would randomly generate a number in an update:
UPDATE StaffTable SET DepartmentID = DBMS_RANDOM.value(low => 1, high => 5);
For testing purposes I created the table and populated it in one shot, with this query:
CREATE TABLE OrderTable(OrderID, StaffID, CustomerID, TotalOrderValue)
as (select level, ceil(dbms_random.value(0, 1000)),
ceil(dbms_random.value(0,10000)),
round(dbms_random.value(0,10000),2)
from dual
connect by level <= 1000000)
/
A few notes - it is better to use NUMBER as the data type; NUMBER(8,2) is the Oracle equivalent of DECIMAL(8,2). It is much more efficient to populate this kind of table with the "hierarchical query without PRIOR" trick (the "connect by level <= ..." trick) to generate the order IDs.
If your table is created already, insert into OrderTable (select level...) (the same subquery as in my code) should work just as well, as sketched below. You may be better off adding the PK constraint only after you create the data, though, so as not to slow things down.
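A sketch of that INSERT form, assuming OrderTable already exists as created above:

-- Same row-generating subquery as the CTAS version above.
INSERT INTO OrderTable (OrderID, StaffID, CustomerID, TotalOrderValue)
SELECT level,
       ceil(dbms_random.value(0, 1000)),
       ceil(dbms_random.value(0, 10000)),
       round(dbms_random.value(0, 10000), 2)
FROM   dual
CONNECT BY level <= 1000000;

COMMIT;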
A small sample from the table created (total time to create the table on my cheap laptop - 1,000,000 rows - was 7.6 seconds):
SQL> select * from OrderTable where orderid between 500020 and 500030;
   ORDERID    STAFFID CUSTOMERID TOTALORDERVALUE
---------- ---------- ---------- ---------------
    500020        666        879         6068.63
    500021        189       6444         1323.82
    500022        533       2609         1847.21
    500023        409        895          207.88
    500024         80       2125         1314.13
    500025        247       3772         5081.62
    500026        922       9523         1160.38
    500027        818       5197         5009.02
    500028        393       6870         5067.81
    500029        358       4063          858.44
    500030        316       8134         3479.47
I have a query requirement from ----. I'm trying to solve it with CONNECT BY, but I can't seem to get the results I need.
Table (simplified):
create table CSS.USER_DESC (
  USER_ID      VARCHAR2(30) not null,
  NEW_USER_ID  VARCHAR2(30),
  GLOBAL_HR_ID CHAR(8)
);
-- USER_ID is the primary key
-- NEW_USER_ID is a self-referencing key
-- GLOBAL_HR_ID is an ID field from another system
There are two sources of user data (datafeeds)... I have to watch for mistakes in either of them when updating information.
Scenarios:
A user is given a new User ID... The old record is set accordingly and deactivated (typically a rename for contractors who become full-time)
A user leaves and returns sometime later. HR fails to send us the old user ID so we can connect the accounts.
The system screwed up and didn't set the new User ID on the old record.
The data can be bad in a hundred other ways
I need to know the following are the same user, and I can't rely on name or other fields... they differ among matching records:
ROOTUSER  NUMROOTS  NODELEVEL  ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID  USERTYPE    LAST_NAME        FIRST_NAME
--------  --------  ---------  ------  --------  -----------  ------------  ----------  ---------------  ----------
EX0T1100         2          1       0  EX0T1100  EX000005                   CONTRACTOR  VON DER HAAVEN   VERONICA
EX0T1100         2          2       1  EX000005               00126121      EMPLOYEE    HAAVEN, VON DER  VERONICA
GL110456         1          1       1  GL110456               00126121      EMPLOYEE    VONDERHAAVEN     VERONICA
EX0T1100 and EX000005 are connected properly by the NEW_USER_ID field. The rename occurred before there were global HR IDs, so EX0T1100 doesn't have one. EX000005 was given a new user ID, 'GL110456', and the two are only connected by having the same global HR ID.
Cleaning up the data isn't an option.
The query so far:
select connect_by_root cud.user_id RootUser,
count(connect_by_root cud.user_id) over (partition by connect_by_root cud.user_id) NumRoots,
level NodeLevel, connect_by_isleaf IsLeaf, --connect_by_iscycle IsCycle,
cud.user_id, cud.new_user_id, cud.global_hr_id,
cud.user_type_code UserType, cud.last_name, cud.first_name
from css.user_desc cud
where cud.user_id in ('EX000005','EX0T1100','GL110456')
-- Using this so I don't get sub-users in my list of root users...
-- It complicates the matches with GLOBAL_HR_ID, however
start with cud.user_id not in (select cudsub.new_user_id
from css.user_desc cudsub
where cudsub.new_user_id is not null)
connect by nocycle (prior new_user_id = user_id);
I've tried various CONNECT BY clauses, but none of them are quite right:
-- As a multiple CONNECT BY
connect by nocycle (prior global_hr_id = global_hr_id)
connect by nocycle (prior new_user_id = user_id)
-- As a compound CONNECT BY
connect by nocycle ((prior new_user_id = user_id)
or (prior global_hr_id = global_hr_id
and user_id != prior user_Id))
UNIONing two CONNECT BY queries doesn't work... I don't get the leveling.
Here is what I would like to see... I'm okay with a resultset that I have to distinct and use as a subquery. I'm also okay with any of the three user IDs in the ROOTUSER column... I just need to know they're the same users.
ROOTUSER  NUMROOTS  NODELEVEL  ISLEAF  USER_ID   NEW_USER_ID  GLOBAL_HR_ID  USERTYPE    LAST_NAME        FIRST_NAME
--------  --------  ---------  ------  --------  -----------  ------------  ----------  ---------------  ----------
EX0T1100         3          1       0  EX0T1100  EX000005                   CONTRACTOR  VON DER HAAVEN   VERONICA
EX0T1100         3          2       1  EX000005               00126121      EMPLOYEE    HAAVEN, VON DER  VERONICA
EX0T1100         3   (2 or 3)      1  GL110456               00126121      EMPLOYEE    VONDERHAAVEN     VERONICA
Ideas?
Update
Nicholas, your code looks very much like the right track... at the moment, the lead(user_id) over (partition by global_hr_id) gets false hits when global_hr_id is null. For example:
USER_ID   NEW_USER_ID  CHAINNEWUSER  GLOBAL_HR_ID  LAST_NAME  FIRST_NAME
FP004468               FP004469                    AARON      TIMOTHY
FP004469                                           FOONG      KOK WAH
I've often wanted to treat nulls as separate records in a partition, but I've never found a way to make ignore nulls work. This did what I wanted:
decode(global_hr_id, null, null,
       lead(cud.user_id ignore nulls) over (partition by global_hr_id order by user_id))
... but there's got to be a better way. I haven't been able to get the query to finish yet on the full-blown user data (about 40,000 users). Both global_hr_id and new_user_id are indexed.
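One possible "better way", offered as an untested sketch: since USER_ID is the primary key, falling back to it when GLOBAL_HR_ID is null gives every null-key row its own partition, so LEAD can never pair rows that merely share a null key (this assumes no USER_ID value happens to collide with a GLOBAL_HR_ID value):

select cud.user_id,
       cud.new_user_id,
       cud.global_hr_id,
       -- rows with a null GLOBAL_HR_ID partition by their own (unique) USER_ID,
       -- so LEAD returns null for them instead of a false chain hit
       lead(cud.user_id) over (partition by nvl(cud.global_hr_id, cud.user_id)
                               order by cud.user_id) as usr
from css.user_desc cud;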
Update
The query returns after about 750 seconds... long, but manageable. It returns 93k records, because I don't have a good way of filtering level-2 hits out of the roots - you have start with global_hr_id is null, but unfortunately that isn't always the case. I'll have to think some more about how to filter those out.
I've tried adding more complex start with clauses before, but I find that separately, they run < 1 second... together, they take 90 minutes >.<
Thanks again for your help... plodding away at this.
You have provided sample data for only one user; it would be better to have a little bit more. Anyway, let's look at something like this.
SQL> with user_desc(USER_ID, NEW_USER_ID, GLOBAL_HR_ID) as (
  2    select 'EX0T1100', 'EX000005', null     from dual union all
  3    select 'EX000005', null,       00126121 from dual union all
  4    select 'GL110456', null,       00126121 from dual
  5  )
  6  select connect_by_root(user_id) rootuser
  7       , count(connect_by_root(user_id)) over(partition by connect_by_root(user_id)) numroot
  8       , level nodlevel
  9       , connect_by_isleaf
 10       , user_id
 11       , new_user_id
 12       , global_hr_id
 13    from (select user_id
 14               , coalesce(new_user_id, usr) new_user_id1
 15               , new_user_id
 16               , global_hr_id
 17            from ( select user_id
 18                        , new_user_id
 19                        , global_hr_id
 20                        , decode(global_hr_id, null, null, lead(user_id) over (partition by global_hr_id order by user_id)) usr
 21                     from user_desc
 22                 )
 23         )
 24   start with global_hr_id is null
 25  connect by prior new_user_id1 = user_id
 26  ;
Result:
ROOTUSER    NUMROOT   NODLEVEL CONNECT_BY_ISLEAF USER_ID  NEW_USER_ID GLOBAL_HR_ID
-------- ---------- ---------- ----------------- -------- ----------- ------------
EX0T1100          3          1                 0 EX0T1100 EX000005
EX0T1100          3          2                 0 EX000005                   126121
EX0T1100          3          3                 1 GL110456                   126121
On a new job I have to figure out how some database reporting scripts are working.
There is one table that is giving me some trouble. I see in existing scripts that it is a partitioned table.
My problem is that whatever query I run on this table returns "no rows selected".
Here are some details about my investigation in this table:
Table size estimate
SQL> select sum(bytes)/1024/1024 Megabytes from dba_segments where segment_name = 'PPREC';
MEGABYTES
----------
45.625
Partitions
There are a total of 730 partitions, by date range.
SQL> select min(PARTITION_NAME),max(PARTITION_NAME) from dba_segments where segment_name = 'PPREC';
MIN(PARTITION_NAME) MAX(PARTITION_NAME)
------------------------------ ------------------------------
PART20110201 PART20130130
There are several tablespaces, and the partitions are allocated across them:
SQL> select tablespace_name, count(partition_name) from dba_segments where segment_name = 'PPREC' group by tablespace_name;
TABLESPACE_NAME COUNT(PARTITION_NAME)
------------------------------ ---------------------
REC_DATA_01 281
REC_DATA_02 48
REC_DATA_03 70
REC_DATA_04 26
REC_DATA_05 44
REC_DATA_06 51
REC_DATA_07 13
REC_DATA_08 48
REC_DATA_09 32
REC_DATA_10 52
REC_DATA_11 35
REC_DATA_12 30
Additional query:
SQL> select * from dba_segments where segment_name='PPREC' and partition_name='PART20120912';
OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS EXTENTS
----- ------------ -------------- --------------- --------------- ----------- ------------ ----- ------ -------
HIST PPREC PART20120912 TABLE PARTITION REC_DATA_01 13 475315 65536 8 1
INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS PCT_INCREASE FREELISTS FREELIST_GROUPS RELATIVE_FNO BUFFER_POOL
-------------- ----------- ----------- ----------- ------------ --------- --------------- ------------ -----------
65536 1 2147483645 13 DEFAULT
Tablespace usage
Here is a space summary (composite of dba_tablespaces, dba_data_files, dba_segments, dba_free_space)
TABLESPACE_NAME TOTAL_MEGABYTES USED_MEGABYTES FREE_MEGABYTES
------------------------------ --------------- -------------- --------------
REC_01_INDX 30,700 250 30,449
REC_02_INDX 7,745 7 7,737
REC_03_INDX 22,692 15 22,677
REC_04_INDX 15,768 10 15,758
REC_05_INDX 25,884 16 25,868
REC_06_INDX 27,992 16 27,975
REC_07_INDX 17,600 10 17,590
REC_08_INDX 18,864 11 18,853
REC_09_INDX 19,700 12 19,687
REC_10_INDX 28,716 16 28,699
REC_DATA_01 102,718 561 102,156
REC_DATA_02 24,544 3,140 21,403
REC_DATA_03 72,710 4 72,704
REC_DATA_04 29,191 2 29,188
REC_DATA_05 42,696 3 42,692
REC_DATA_06 52,780 323 52,456
REC_DATA_07 16,536 1 16,534
REC_DATA_08 49,247 3 49,243
REC_DATA_09 30,848 2 30,845
REC_DATA_10 49,620 3 49,616
REC_DATA_11 40,616 2 40,613
REC_DATA_12 184,922 123,435 61,486
The tablespace usage seems to confirm that this table is not empty, in fact its last tablespace (REC_DATA_12) seems pretty busy.
Existing scripts
What I find puzzling is that there are some PL/SQL stored procedures that seem to work on that table and get data out of it.
An example of such a stored procedure is as follows:
procedure FIRST_REC as
vpartition varchar2(12);
begin
select 'PART'||To_char(sysdate,'YYYYMMDD') INTO vpartition FROM DUAL;
execute immediate
'MERGE INTO FIRST_REC_temp a
USING (SELECT bno, min(trdate) mintr,max(trdate) maxtr
FROM PPREC PARTITION ('||vpartition||') WHERE route_id IS NOT NULL AND trunc(trdate) <= trunc(sysdate-1)
GROUP BY bno) b
ON (a.bno=b.bno)
when matched then
update set a.last_tr = b.maxtr
when not matched then
insert (a.bno,a.last_tr,a.first_tr)
values (b.bno,b.maxtr,b.mintr)';
commit;
end FIRST_REC;
However if I try using the same syntax manually on the table, here is what I get:
SQL> select count(*) from PPREC PARTITION (PART20120912);
COUNT(*)
----------
0
I have tried a few random partitions and I always get the same 0 count.
Summary
- I see a table that seems to contain data (space used, tablespaces, data files)
- The table is partitioned (one partition per day over a period of 730 days, ending at the end of January 2013)
- Scripts are extracting data from that table somehow
Question
- My queries using PARTITION all return "no rows selected". What am I doing wrong? How could I find out how to extract data from this table?
I suppose it's possible that some other process might be deleting the data, but without visiting your site there's no way for anyone here to tell if that might be so.
I don't see in your post that you mentioned the name of the partitioning DATE column, but based on the SQL you posted I'll assume it's TRDATE - if this is not correct, change TRDATE in the statement below to be the partitioning column.
That said, give this a try:
SELECT COUNT(*)
FROM   PPREC
WHERE  TRDATE >= TO_DATE('01-SEP-2012 00:00:00', 'DD-MON-YYYY HH24:MI:SS');
This assumes you should have data in this table from September. If you find data, great. If you don't - well, Back In The Day (when men were men, women were women, and computers were water-cooled :-) we had a little saying about memory on IBM mainframes:
1. If you can see it, and it's there, it's Real.
2. If you can't see it, but it's there, it's Protected.
3. If you can see it, but it's not there, it's Virtual.
4. If you can't see it, and it's not there, it's GONE!
:-)
Use of the PARTITION clause should be reserved for situations where you are experiencing a performance problem (note: guessing about what is or is not going to be a performance problem is not allowed; until you've got a performance problem, you don't have a performance problem - over the years I've found that software spends a lot of execution time in the darndest places :-) and the usual fixes (adding indexes, deleting unnecessary data, human sacrifice, etc.) haven't worked. Basically, write your queries normally and trust the database to get it right.
In the general case, always write the simplest code - and do the simplest thing - that could possibly work. 99+ percent of the time it will be fine, and that lets you spend your optimization time on the less-than-one-percent of cases where simple isn't good enough. Most of the software you write or design will then be simple and easy to understand.
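As for finding out where the rows actually live, the data dictionary can also help. A sketch (keep in mind NUM_ROWS is only as fresh as the last statistics gathering):

-- Per-partition row counts as last seen by the optimizer
SELECT partition_name, num_rows, last_analyzed
FROM   dba_tab_partitions
WHERE  table_name = 'PPREC'
ORDER  BY partition_position;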
Share and enjoy.