SQL Server : bulk Insert and ignore duplicates

I'm trying to do a bulk insert (SQL Server 2008) into a table but the insert must ignore any duplicate already in the table.
The simplified table will look like this with existing values.
TBL_STOCK
id | Stock
---------------
1 | S1
2 | S2
3 | S3
Now I want to do a bulk insert that looks like
INSERT INTO TBL_STOCK (Id, Stock)
VALUES
(3, 'S3'),
(4, 'S4'),
(5, 'S5')
This works but will cause duplicate entries.
How do I go about ignoring duplicate entries in the Stock column?

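By "ignoring duplicate entries", you mean avoiding them in TBL_STOCK, right?
I might be a bit late, but have you tried staging the rows in a temporary table and inserting only those whose Stock value is not already in TBL_STOCK?
-- #TempStock is a temporary table; it must already exist with Id and Stock columns
INSERT INTO #TempStock (Id, Stock)
VALUES
(3, 'S3'),
(4, 'S4'),
(5, 'S5')
-- copy only the rows whose Stock is not yet present in the target table
INSERT INTO TBL_STOCK (Id, Stock)
SELECT Id, Stock FROM #TempStock
WHERE NOT EXISTS (SELECT 1 FROM TBL_STOCK WHERE TBL_STOCK.Stock = #TempStock.Stock)
DROP TABLE #TempStock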
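The staging-table idea above can be tried end to end without a SQL Server instance. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in, so the temporary table is called TempStock rather than #TempStock, and the helper name insert_new_stock is made up for this illustration.

```python
import sqlite3


def insert_new_stock(conn, rows):
    """Stage (Id, Stock) rows, then copy only Stock values not yet in TBL_STOCK."""
    cur = conn.cursor()
    # Staging table (SQLite syntax; SQL Server would use a #TempStock table).
    cur.execute("CREATE TEMP TABLE TempStock (Id INTEGER, Stock TEXT)")
    cur.executemany("INSERT INTO TempStock (Id, Stock) VALUES (?, ?)", rows)
    # The anti-join: keep a staged row only if no target row has the same Stock.
    cur.execute(
        """
        INSERT INTO TBL_STOCK (Id, Stock)
        SELECT t.Id, t.Stock
        FROM TempStock AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM TBL_STOCK AS s WHERE s.Stock = t.Stock
        )
        """
    )
    cur.execute("DROP TABLE TempStock")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TBL_STOCK (Id INTEGER, Stock TEXT)")
conn.executemany(
    "INSERT INTO TBL_STOCK (Id, Stock) VALUES (?, ?)",
    [(1, "S1"), (2, "S2"), (3, "S3")],
)

insert_new_stock(conn, [(3, "S3"), (4, "S4"), (5, "S5")])
stocks = [row[0] for row in conn.execute("SELECT Stock FROM TBL_STOCK ORDER BY Id")]
print(stocks)
```

Note that this only filters against rows already in TBL_STOCK. If the incoming batch itself contains the same Stock value twice, both copies would be inserted, so the staged rows would need de-duplicating first.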

Related

Why aren't rows returned in the order of insertion?

Oracle 18c:
I have 1000 rows of test data:
create table lines (id number, shape sdo_geometry);
begin
insert into lines (id, shape) values (1, sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574360, 4767080, 574200, 4766980)));
insert into lines (id, shape) values (2, sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(573650, 4769050, 573580, 4768870)));
insert into lines (id, shape) values (3, sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(574290, 4767090, 574200, 4767070)));
insert into lines (id, shape) values (4, sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(571430, 4768160, 571260, 4768040)));
insert into lines (id, shape) values (5, sdo_geometry(2002, 26917, null, sdo_elem_info_array(1, 2, 1), sdo_ordinate_array(571500, 4769030, 571350, 4768930)));
...
end;
/
Full data here: db<>fiddle
When I select the data:
select
id,
sdo_util.to_wktgeometry(shape)
from
lines
...the data doesn't get returned in the order that I inserted it.
[Screenshot: query result in SQL Developer / on-prem db]
[Screenshot: query result in db<>fiddle]
I would have expected ID #1 to be the first row, and so on.
I know in reality, we would never rely on the order of the rows in the table. We would sort the data using order by if the ordering was important.
But I'm still curious, why wouldn't the data be returned in the order that it was inserted in? What's going on there?

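The short answer is: tables are basically "heaps", and rows are stored wherever there happens to be free space. Without an ORDER BY, the database makes no promise about the order in which rows come back.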
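To make the "heap" idea a little more concrete, here is a deliberately simplified toy model in Python. It is not how Oracle actually manages blocks and extents, and the class ToyHeapTable is invented for this illustration; it only shows one way in which scan order can drift away from insertion order once free space is reused.

```python
class ToyHeapTable:
    """Toy heap: a list of slots; a new row goes into the first free slot."""

    def __init__(self):
        self.slots = []

    def insert(self, row):
        # Reuse a freed slot if one exists, otherwise append at the end.
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = row
                return
        self.slots.append(row)

    def delete(self, row_id):
        # Deleting leaves a hole that a later insert may fill.
        for i, slot in enumerate(self.slots):
            if slot is not None and slot["id"] == row_id:
                self.slots[i] = None

    def full_scan(self):
        # A scan reads slots in physical order, not insertion order.
        return [slot["id"] for slot in self.slots if slot is not None]


table = ToyHeapTable()
for i in (1, 2, 3):
    table.insert({"id": i})
table.delete(1)
table.insert({"id": 4})  # lands in the slot freed by id 1

scan_order = table.full_scan()     # physical order: [4, 2, 3]
sorted_order = sorted(scan_order)  # what an ORDER BY id would give: [2, 3, 4]
print(scan_order, sorted_order)
```

In a real database there are further reasons for the order to vary (multiple blocks, parallel execution, different access paths), which is why only an explicit ORDER BY is reliable.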

How to insert large amount of data into a ClickHouse DB?

I have an instance of a ClickHouse server running and I have successfully connected to it through a client. I'm using Tabix.io to run my queries. I have created a DB and a table called "names". I want to input a lot of randomly generated names inside that table. I know that running multiple commands like this:
insert into names (id, first_name, last_name) values (1, 'Stephana', 'Bromell');
insert into names (id, first_name, last_name) values (2, 'Babita', 'Leroux');
insert into names (id, first_name, last_name) values (3, 'Pace', 'Christofides');
...
insert into names (id, first_name, last_name) values (999, 'Ralph', 'Jackson');
is not supported, and therefore only the first query is executed. In other words, only Stephana Bromell appears in the "names" table.
What is the ClickHouse alternative for inserting larger amounts of data?

Use multiple values in a single insert:
insert into names (id, first_name, last_name) values (1, 'Stephana', 'Bromell'), (2, 'Babita', 'Leroux'), (3, 'Pace', 'Christofides'), (999, 'Ralph', 'Jackson');

How about batch inserting with an HTTP client and CSV?
Create a CSV file (names.csv) with the content:
1,Stephana,Bromell
2,Babita,Leroux
3,Pace,Christofides
...
999,Ralph,Jackson
Then call the HTTP API:
curl -i -X POST \
-T "./names.csv" \
'http://localhost:8123/?query=INSERT%20INTO%20names%20FORMAT%20CSV'
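If the names are generated in a program rather than kept in a file, the same CSV body and URL can be built in code. The sketch below uses only the Python standard library; the helper names build_csv_payload and build_insert_url are made up here, and the host, port and table name are simply taken from the curl call above. It does not contact a server.

```python
import csv
import io
from urllib.parse import quote


def build_csv_payload(rows):
    """Serialize rows to CSV bytes, one line per row, for use as the POST body."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def build_insert_url(base_url, table):
    """Build the URL with the INSERT ... FORMAT CSV query percent-encoded."""
    query = f"INSERT INTO {table} FORMAT CSV"
    return f"{base_url}/?query={quote(query)}"


rows = [
    (1, "Stephana", "Bromell"),
    (2, "Babita", "Leroux"),
    (3, "Pace", "Christofides"),
]
payload = build_csv_payload(rows)
url = build_insert_url("http://localhost:8123", "names")
print(url)

# Sending is left out so the sketch runs without a server. With one running:
#   from urllib.request import Request, urlopen
#   urlopen(Request(url, data=payload, method="POST"))
```

Building the whole batch as one body keeps the insert to a single request, which is the point of both answers above.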

Oracle - Text Classification (adding a secondary column)

I'm following the demo on training on classification from Oracle.
I've attached the script.
In what way can this be used to create multiple sets of rules based on a secondary column? The column wouldn't be an extra predictor, but really a way to completely isolate the rules.
That column might have its own list of data, and I'd hate to create separate sets of tables to handle the scenario.
For example, suppose I added a doc_type field to ml_docs (and to the categories). When it's 1, I might want to use one set of rules, and when it's 2, a different set of rules. I want them to be completely isolated. What implications would that have on the script? Does Oracle support this?
create table ml_docs (
doc_id number primary key,
doc_text clob);
insert into ml_docs values
(1, 'MacTavishes is a fast-food chain specializing in burgers, fries and -
shakes. Burgers are clearly their most important line.');
insert into ml_docs values
(2, 'Burger Prince are an up-market chain of burger shops, who sell burgers -
and fries in competition with the likes of MacTavishes.');
insert into ml_docs values
(3, 'Shakes 2 Go are a new venture in the low-cost restaurant arena,
specializing in semi-liquid frozen fruit-flavored vegetable oil products.');
insert into ml_docs values
(4, 'TCP/IP network engineers generally need to know about routers,
firewalls, hosts, patch cables networking etc');
insert into ml_docs values
(5, 'Firewalls are used to protect a network from attack by remote hosts,
generally across TCP/IP');
create table ml_category_descriptions (
cd_category number,
cd_description varchar2(80));
create table ml_doc_categories (
dc_category number,
dc_doc_id number,
primary key (dc_category, dc_doc_id))
organization index;
-- descriptions for categories
insert into ml_category_descriptions values (1, 'fast food');
insert into ml_category_descriptions values (2, 'computer networking');
insert into ml_doc_categories values (1, 1);
insert into ml_doc_categories values (1, 2);
insert into ml_doc_categories values (1, 3);
insert into ml_doc_categories values (2, 4);
insert into ml_doc_categories values (2, 5);
exec ctx_ddl.create_preference('bid_lex', 'basic_lexer');
exec ctx_ddl.set_attribute ('bid_lex', 'index_themes', 'no');
exec ctx_ddl.set_attribute ('bid_lex', 'index_text', 'yes');
create index ml_docsindex on ml_docs(doc_text) indextype is ctxsys.context
parameters ('lexer bid_lex');
create table ml_rules(
rule_cat_id number,
rule_text varchar2(4000),
rule_confidence number
);
begin
ctx_cls.train(
index_name => 'ml_docsindex',
docid => 'doc_id',
cattab => 'ml_doc_categories',
catdocid => 'dc_doc_id',
catid => 'dc_category',
restab => 'ml_rules',
rescatid => 'rule_cat_id',
resquery => 'rule_text',
resconfid => 'rule_confidence'
);
end;
create index rules_idx on ml_rules (rule_text) indextype is ctxsys.ctxrule;
set serveroutput on;
declare
incoming_doc clob;
begin
incoming_doc
:= 'I have spent my entire life managing restaurants selling burgers';
for c in
( select distinct cd_description from ml_rules, ml_category_descriptions
where cd_category = rule_cat_id
and matches (rule_text, incoming_doc) > 0) loop
dbms_output.put_line('CATEGORY: '||c.cd_description);
end loop;
end;

Invalid memory alloc request size 1610613056 in Greenplum 4.3.14.0

I have table "test" and I create it with this query:
CREATE TABLE test
(
id bigint,
str1 timestamp without time zone,
str2 text,
str3 text
);
After creating the table, I added data:
INSERT INTO test VALUES (1, '2017-08-29 10:51:40.190913', 'gfsdfg1', 'sfgsdhgy1');
INSERT INTO test VALUES (2, '2016-08-29 10:51:40.190913', 'gfsdfg2', 'sfgsdhgy2');
INSERT INTO test VALUES (3, '2015-08-29 10:51:40.190913', 'gfsdfg3', 'sfgsdhgy3');
INSERT INTO test VALUES (4, '2014-08-29 10:51:40.190913', 'gfsdfg4', 'sfgsdhgy4');
INSERT INTO test VALUES (5, '2013-08-29 10:51:40.190913', 'gfsdfg5', 'sfgsdhgy5');
INSERT INTO test VALUES (6, '2012-08-29 10:51:40.190913', 'gfsdfg6', 'sfgsdhgy6');
INSERT INTO test VALUES (7, '2011-08-29 10:51:40.190913', 'gfsdfg7', 'sfgsdhgy7');
INSERT INTO test VALUES (8, '2010-08-29 10:51:40.190913', 'gfsdfg8', 'sfgsdhgy8');
INSERT INTO test VALUES (9, '2009-08-29 10:51:40.190913', 'gfsdfg9', 'sfgsdhgy9');
INSERT INTO test VALUES (10, '2008-08-29 10:51:40.190913', 'gfsdfg10', 'sfgsdhgy10');
INSERT INTO test VALUES (11, '2009-08-29 10:51:40.190913', 'gfsdfg11', 'sfgsdhgy11');
INSERT INTO test VALUES (12, '2015-08-29 10:51:40.190913', 'gfsdfg12', 'sfgsdhgy12');
INSERT INTO test VALUES (13, '2020-08-29 10:51:40.190913', 'gfsdfg13', 'sfgsdhgy13');
Then I tried to UPDATE this table with the following query:
UPDATE test SET
str1 = c.str1,
str2 = c.str2,
str3 = c.str3
FROM (
VALUES
(10, '2017-08-29 11:11:37'::timestamp without time zone, 'str2-10', 'str3-10'),
(11, '2017-08-29 11:11:37'::timestamp without time zone, 'str2-11', 'str3-11'),
(12, '2017-08-29 11:11:37'::timestamp without time zone, 'str2-12', 'str3-12'),
(13, '2017-08-29 11:11:37'::timestamp without time zone, 'str2-13', 'str3-13')
) AS c (id, str1, str2, str3)
WHERE c.id = test.id;
And I got this error:
ERROR: invalid memory alloc request size 1610613056 (context 'ExecutorState') (mcxt.c:1069) (mcxt.c:477) (seg0 node03:40000 pid=113577) (cdbdisp.c:1322)
How can I fix this error?

What version of GPDB are you using?
There are a few known planner errors -- this looks like an older legacy planner issue.
Can you try the same query with set optimizer=on; and then with set optimizer=off;?
Given the large memory allocation size, it's more likely that this is an issue with the table statistics causing the planner to blow up.
As a best practice, follow up every CREATE with an ANALYZE.
ANALYZE the tables, then run the query again.

Oracle query rewrite with virtual columns in the source table

I have a table, demo_fact in Oracle 11g and it has several virtual columns defined as such:
ALTER TABLE demo_fact ADD (demo_measure_from_virtual NUMBER GENERATED ALWAYS AS
(CASE WHEN demo_category_column = 20 THEN demo_numericdata_column ELSE 0 END)
VIRTUAL VISIBLE);
Then I have a materialized view defined as
CREATE MATERIALIZED VIEW demo_agg_mv
REFRESH FORCE ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT
demo_dim_one,
demo_dim_two,
SUM(demo_measure_from_virtual) demo_measure_from_virtual
FROM demo_fact
GROUP BY demo_dim_one, demo_dim_two
Now I want Query Rewrite to kick in on the following query:
SELECT demo_dim_one, SUM(demo_measure_from_virtual)
FROM demo_fact
GROUP BY demo_dim_one
but it doesn't. I ran EXPLAIN_REWRITE on it and here is the output:
QSM-01150: query did not rewrite
QSM-01102: materialized view, DEMO_AGG_MV, requires join back to table,
DEMO_FACT, on column, DEMO_MEASURE_FROM_VIRTUAL
QSM-01082: Joining materialized view, DEMO_AGG_MV, with table, DEMO_FACT,
not possible
QSM-01102: materialized view, DEMO_AGG_MV, requires join back to table,
DEMO_FACT, on column, DEMO_NUMERICDATA_COLUMN
Backstory: I'm doing this with 70M rows and 50 virtual columns (all of them have the same structure, the simple case statement above, but with a different comparison column and a different result column)
This problem seems to manifest only when the fact table has virtual columns, but changing them to non-virtual columns would consume too much disk space. Why isn't Oracle rewriting the query? What can I do to fix it?

I don't know how helpful this is for you, but Oracle requires all columns that the materialized view groups on to be included in the statement to be rewritten. (Edit: at least in conjunction with virtual columns. This is probably "not by design"...)
If you try to explain_rewrite on
select
demo_dim_one,
sum(s)
from (
select
demo_dim_one,
sum(demo_measure_from_virtual) s
from
demo_fact
group by
demo_dim_one,
demo_dim_two
)
group by demo_dim_one
it should tell you that it has rewritten the query.
This can be demonstrated like so:
A table on which the virtual column will be defined:
create table tq84_virt_col (
a varchar2(2),
b varchar2(2),
c number,
d number
);
insert into tq84_virt_col values ('A', 'X', 1, 1);
insert into tq84_virt_col values ('A', 'X', 2, 1);
insert into tq84_virt_col values ('A', 'Y', 3, 0);
insert into tq84_virt_col values ('A', 'Y', 4, 1);
insert into tq84_virt_col values ('B', 'Y', 11, 1);
insert into tq84_virt_col values ('B', 'X', 12, 0);
insert into tq84_virt_col values ('B', 'X', 13, 1);
The definition of the virtual column:
alter table tq84_virt_col add (
virt_col number generated always as (
case when d = 1 then c else 0 end
)
virtual visible
);
The materialized view. Note: it groups on columns a and b:
create materialized view tq84_mat_view
refresh force on demand
enable query rewrite
as
select
a, b,
sum(virt_col) sum_virt_col
from
tq84_virt_col
group by
a, b;
The materialized view will not be used, as you have observed:
begin
dbms_mview.explain_rewrite(
'select a, sum(virt_col) from tq84_virt_col group by a'
);
end;
/
select message
from rewrite_table;
QSM-01150: query did not rewrite
QSM-01102: materialized view, TQ84_MAT_VIEW, requires join back to table, TQ84_VIRT_COL, on column, VIRT_COL
QSM-01082: Joining materialized view, TQ84_MAT_VIEW, with table, TQ84_VIRT_COL, not possible
QSM-01102: materialized view, TQ84_MAT_VIEW, requires join back to table, TQ84_VIRT_COL, on column, C
Now, both columns a and b are selected and grouped on (with an outer query to ensure the same result set):
truncate table rewrite_table;
begin
dbms_mview.explain_rewrite(
'select a, sum(s) from (select a, sum(virt_col) s from tq84_virt_col group by a, b) group by a'
);
end;
/
select message
from rewrite_table;
QSM-01151: query was rewritten
QSM-01209: query rewritten with materialized view, TQ84_MAT_VIEW, using text match algorithm
QSM-01219: no suitable materialized view found to rewrite this query
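The outer query in the rewritten statement only re-aggregates the per-(a, b) sums, so it should return the same rows as the original single-level query. The sketch below checks that equivalence on the sample data, using Python's sqlite3 module as a stand-in. It says nothing about Oracle's rewrite engine itself, and the virtual column is inlined as a CASE expression (held in the made-up constant VIRT_COL) rather than declared on the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tq84_virt_col (a TEXT, b TEXT, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO tq84_virt_col VALUES (?, ?, ?, ?)",
    [
        ("A", "X", 1, 1),
        ("A", "X", 2, 1),
        ("A", "Y", 3, 0),
        ("A", "Y", 4, 1),
        ("B", "Y", 11, 1),
        ("B", "X", 12, 0),
        ("B", "X", 13, 1),
    ],
)

# Stand-in for the virtual column definition used in the answer.
VIRT_COL = "CASE WHEN d = 1 THEN c ELSE 0 END"

# Original query: group on a only.
direct = conn.execute(
    f"SELECT a, SUM({VIRT_COL}) FROM tq84_virt_col GROUP BY a ORDER BY a"
).fetchall()

# Rewritable form: group on a and b first, then sum those sums per a.
two_level = conn.execute(
    f"""
    SELECT a, SUM(s)
    FROM (
        SELECT a, SUM({VIRT_COL}) AS s
        FROM tq84_virt_col
        GROUP BY a, b
    )
    GROUP BY a
    ORDER BY a
    """
).fetchall()

print(direct, two_level)
```

This works because SUM can be computed from partial sums; the same trick would not carry over unchanged to aggregates such as AVG or COUNT(DISTINCT ...).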
