AWS Redshift: randomize data for large tables

I have a table in Redshift with over 1.8 billion rows and I am trying to randomize that data.
Here are the table's attributes:
id bigint,
customer_internal_id bigint,
customer_id VARCHAR(256) NOT NULL,
customer_name VARCHAR(256) NOT NULL,
customer_type_id bigint,
start_date date,
end_date date,
request_id bigint,
entered VARCHAR(256) NOT NULL,
superseded VARCHAR(256) NOT NULL,
customer_latitude double precision,
customer_longitude double precision,
zip_internal_id bigint
How can I achieve this? I tried to look for options, but there is not enough documentation available.
Here is the expected output.
I have some code written for PostgreSQL:
with result as (
    select id, customer_id, customer_name,
           lead(customer_id) over w as first_1,
           lag(customer_name) over w as first_2
    from master.customer_temp_df
    window w as (order by random())
)
update master.customer_temp_df
set customer_id = coalesce(first_1, first_2),
    customer_name = coalesce(first_2, first_1)
from result
where master.customer_temp_df.id = result.id;
but this doesn't work in Redshift, and I am looking for something equivalent.
The final goal is to randomize the entire table.
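In case it is useful, here is a rough, untested sketch of one way to do this in Redshift, which (as far as I know) does not support the named WINDOW clause: instead of updating in place, build a shuffled copy of the table by pairing two independently shuffled row sequences on a row number. The table name customer_temp_df_shuffled is just a placeholder.
create table master.customer_temp_df_shuffled as
select t.id,
       t.customer_internal_id,
       s.customer_id,        -- id and name taken from a randomly ordered copy
       s.customer_name,
       t.customer_type_id,
       t.start_date,
       t.end_date,
       t.request_id,
       t.entered,
       t.superseded,
       t.customer_latitude,
       t.customer_longitude,
       t.zip_internal_id
from (select a.*, row_number() over (order by rnd) as rn
      from (select c.*, random() as rnd from master.customer_temp_df c) a) t
join (select customer_id, customer_name,
             row_number() over (order by rnd) as rn
      from (select customer_id, customer_name, random() as rnd
            from master.customer_temp_df) b) s
  on s.rn = t.rn;
Note that this keeps each (customer_id, customer_name) pair together; if you want the two columns scrambled independently of each other, shuffle them in two separate derived tables. Once the copy is verified, the original table can be dropped and the copy renamed.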

Related

How to select only those rows which are greater than a modified time using Spring Data JPA

For example, I have created a table:
CREATE DATABASE es_db;
USE es_db;
DROP TABLE IF EXISTS es_table;
CREATE TABLE es_table (
id BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY unique_id (id),
client_name VARCHAR(32) NOT NULL,
modification_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
insertion_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Now assume I have to select the rows whose modification time is greater than a time I give as input.
Consider this query, for example:
SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_modifiedvalue AND modification_time < NOW()) ORDER BY modification_time ASC
Is there a way to translate this to a native query? I can achieve the same with JdbcTemplate, but would like to know if it is possible with a native query.

INSERT INTO not working in Oracle when a single column value has too many characters

I have created a table as:
CREATE TABLE SHOP.EMPLOYEES
(
EMPLOYEEID NUMBER(11) NOT NULL,
LASTNAME VARCHAR2(255 BYTE) DEFAULT NULL,
FIRSTNAME VARCHAR2(255 BYTE) DEFAULT NULL,
BIRTHDATE DATE DEFAULT NULL,
PHOTO VARCHAR2(255 BYTE) DEFAULT NULL,
NOTES VARCHAR2(100 BYTE) DEFAULT NULL
)
I have a NOTES column whose value has more than 100 characters. So, what I tried is:
INSERT INTO shop.employees (EmployeeID, LastName, FirstName, BirthDate, Photo, Notes)
VALUES (1, 'Davolio', 'Nancy', '1968-12-08', 'EmpID1.pic', 'Education includes a BA in psychology from Colorado State University. She also completed (The Art of the Cold Call). Nancy is a member of Toastmasters International.')
But I am getting an error:
Error at line 1
ORA-01861: literal does not match format string
What would be the best datatype for such long text in Oracle?
'1968-12-08' is a string, and you need to insert a date into your table.
Conversion of string to date is needed whenever dates are used.
There are two ways to convert your string to a date:
DATE '1968-12-08'
TO_DATE('1968-12-08', 'YYYY-MM-DD')
Cheers!!
BIRTHDATE is a DATE, not a varchar, so you need to convert it:
to_date('1968-12-08', 'yyyy-mm-dd')
Obviously, you can't expect to put a string longer than 100 characters into a column that accepts only 100 characters, can you?
But that's not your problem. The date is. The 4th column is BIRTHDATE, its datatype is DATE, but you are inserting a string into it, because '1968-12-08' is a string. You should have used a date literal instead, i.e. date '1968-12-08'.
Oh, yes - back to your original question (although a wrong one in this context): the best datatype for long text. You can create a column whose datatype is VARCHAR2(4000) and it will happily accept that "long" string you used. Or you can even choose a CLOB, which accepts up to about 4 GB of character data; more than enough for you, I presume.
Finally, your query:
SQL> CREATE TABLE EMPLOYEES
2 (
3 EMPLOYEEID NUMBER(11) NOT NULL,
4 LASTNAME VARCHAR2(255 BYTE) DEFAULT NULL,
5 FIRSTNAME VARCHAR2(255 BYTE) DEFAULT NULL,
6 BIRTHDATE DATE DEFAULT NULL,
7 PHOTO VARCHAR2(255 BYTE) DEFAULT NULL,
8 NOTES VARCHAR2(100 BYTE) DEFAULT null
9 );
Table created.
Note the date literal in line #4, as well as the substr function in line #5 (which restricts the string length to 100):
SQL> INSERT INTO employees
2 (EmployeeID, LastName, FirstName, BirthDate, Photo, Notes)
3 VALUES
4 (1, 'Davolio', 'Nancy', date '1968-12-08', 'EmpID1.pic',
5 substr('Education includes a BA in psychology from Colorado State University. She also completed (The Art of the Cold Call). Nancy is a member of Toastmasters International.', 1, 100))
6 ;
1 row created.
SQL>
In this case I suggest simply making the NOTES column larger:
ALTER TABLE SHOP.EMPLOYEES
MODIFY (NOTES VARCHAR2(4000));
If you need something larger than this you could use the CLOB data type.
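Just to sketch that option (untested): Oracle does not let you MODIFY a VARCHAR2 column straight to CLOB, so the usual route is to add a new column, copy the data, and rename it; NOTES_CLOB here is only a scratch name.
-- add a CLOB column, copy the existing notes, then swap the names
ALTER TABLE SHOP.EMPLOYEES ADD (NOTES_CLOB CLOB);
UPDATE SHOP.EMPLOYEES SET NOTES_CLOB = NOTES;
ALTER TABLE SHOP.EMPLOYEES DROP COLUMN NOTES;
ALTER TABLE SHOP.EMPLOYEES RENAME COLUMN NOTES_CLOB TO NOTES;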

Error converting varchar to numeric (but there's no number)

I have a table with several columns, like this:
CREATE TABLE CRM.INFO_ADICIONAL
(
ID_INFO_ADICIONAL NUMBER(10) NOT NULL,
NOMBRE VARCHAR2(100 BYTE) NOT NULL,
OBLIGATORIO NUMBER(1) NOT NULL,
TIPO_DATO VARCHAR2(2 BYTE) NOT NULL,
ACTIVO NUMBER(1) NOT NULL,
ID_TIPO_REQUERIMIENTO NUMBER(10) NOT NULL,
ID_USUARIO_AUDIT NUMBER(10) NOT NULL,
ORDEN NUMBER(3) DEFAULT 1,
RECHAZO_POR_NO NUMBER(1),
ID_TIPO_ARCHIVO_ADJUNTO NUMBER(10),
SOLICITAR_EN VARCHAR2(30 BYTE),
ID_CONSULTA NUMBER(10),
COMBO_ID VARCHAR2(40 BYTE),
APLICAR_COMO_VENC NUMBER(1),
MODIFICABLE NUMBER(1) DEFAULT 0,
ID_AREA_GESTION NUMBER(10),
ID_TAREA NUMBER(10)
)
The "COMBO_ID" column is the target. It is defined as VARCHAR, but when I'm trying to insert a row, TOAD displays
"ORA-06502: PL/SQL: error : error de conversión de carácter a número
numérico o de valor"
Or a 'numeric conversion error', in english.
This table have some pre-existing data, and I even found some rows including values at COMBO_ID column, all of them being VARCHAR, i.e.:
NACION (Nation), SEXO (Sex), etc
I tried a few simple SELECT statements
SELECT
ID_INFO_ADICIONAL,
NOMBRE,
OBLIGATORIO,
TIPO_DATO,
ACTIVO,
ID_TIPO_REQUERIMIENTO,
ID_USUARIO_AUDIT,
ORDEN,
RECHAZO_POR_NO,
ID_TIPO_ARCHIVO_ADJUNTO,
SOLICITAR_EN,
COMBO_ID,
APLICAR_COMO_VENC,
ID_CONSULTA,
MODIFICABLE,
ID_AREA_GESTION,
ID_TAREA
INTO
pRegistro
FROM
crm.info_adicional
where pRegistro is declared as
pRegistro INFO_ADICIONAL%ROWTYPE;
Again, I'm still getting this 'numeric conversion error'.
But, wait, if I hardcode the SELECT value for the COMBO_ID column with a NUMBER:
SELECT
--other columns
123456 COMBO_ID,
--other columns
INTO
pRegistro
FROM
crm.info_adicional
It works, what the heck, it's defined as VARCHAR.
If I do the same but hardcoding a string, it fails again.
Already tried in my DEV environment, and it's working fine.
I'm not a pro in Oracle, but I feel pretty lost.
Could it be that tables get "confused"?
Any clues?
That error can also be raised if you try to push a character string that is longer than your VARCHAR2's capacity (40 in your case).
Try to check if all the data you are trying to insert is correct:
SELECT
COMBO_ID
FROM
crm.info_adicional
ORDER BY length(COMBO_ID) desc;
That would also explain why it works fine on your DEV environment which, I suppose, has different data.
Okay, I already found the answer.
Quoting Oracle Documentation:
The %ROWTYPE attribute provides a record type that represents a row in a table or view. Columns in a row and corresponding fields in a record have the same names and datatypes.
So, basically, the columns in the SELECT statement needed to be in the same order as in the table definition.
In my case, I had a few columns (including COMBO_ID) in a different order.
Tried, re-ordering, and works like a charm.
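To illustrate, a minimal sketch of the safe pattern (the WHERE value is only a placeholder): either list the columns in table order, or simply use SELECT *, which returns them in table-definition order:
DECLARE
    pRegistro crm.info_adicional%ROWTYPE;
BEGIN
    -- SELECT * returns the columns in table-definition order,
    -- so it always lines up with the %ROWTYPE record
    SELECT *
      INTO pRegistro
      FROM crm.info_adicional
     WHERE id_info_adicional = 1;   -- placeholder key value
END;
/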
Thank you all for the support.

optimize an inner join between two multi-million row tables

I'm new to Postgres and even newer to understanding how EXPLAIN works. I have a query below which is typical; I just replace the date:
explain
select account_id,
security_id,
market_value_date,
sum(market_value) market_value
from market_value_history mvh
inner join holding_cust hc on hc.id = mvh.owning_object_id
where
hc.account_id = 24766
and market_value_date = '2015-07-02'
and mvh.created_by = 'HoldingLoad'
group by account_id, security_id, market_value_date
order by security_id, market_value_date;
Attached is a screenshot of the EXPLAIN output.
The holding_cust table has 2 million rows and the market_value_history table has 163 million rows.
Below are the table definitions and indexes for market_value_history and holding_cust:
I'd appreciate any advice you may be able to give me on tuning this query.
CREATE TABLE public.market_value_history
(
id integer NOT NULL DEFAULT nextval('market_value_id_seq'::regclass),
market_value numeric(18,6) NOT NULL,
market_value_date date,
holding_type character varying(25) NOT NULL,
owning_object_type character varying(25) NOT NULL,
owning_object_id integer NOT NULL,
created_by character varying(50) NOT NULL,
created_dt timestamp without time zone NOT NULL,
last_modified_dt timestamp without time zone NOT NULL,
CONSTRAINT market_value_history_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.market_value_history
OWNER TO postgres;
-- Index: public.ix_market_value_history_id
-- DROP INDEX public.ix_market_value_history_id;
CREATE INDEX ix_market_value_history_id
ON public.market_value_history
USING btree
(owning_object_type COLLATE pg_catalog."default", owning_object_id);
-- Index: public.ix_market_value_history_object_type_date
-- DROP INDEX public.ix_market_value_history_object_type_date;
CREATE UNIQUE INDEX ix_market_value_history_object_type_date
ON public.market_value_history
USING btree
(owning_object_type COLLATE pg_catalog."default", owning_object_id, holding_type COLLATE pg_catalog."default", market_value_date);
CREATE TABLE public.holding_cust
(
id integer NOT NULL DEFAULT nextval('holding_cust_id_seq'::regclass),
account_id integer NOT NULL,
security_id integer NOT NULL,
subaccount_type integer,
trade_date date,
purchase_date date,
quantity numeric(18,6),
net_cost numeric(18,2),
adjusted_net_cost numeric(18,2),
open_date date,
close_date date,
created_by character varying(50) NOT NULL,
created_dt timestamp without time zone NOT NULL,
last_modified_dt timestamp without time zone NOT NULL,
CONSTRAINT holding_cust_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.holding_cust
OWNER TO postgres;
-- Index: public.ix_holding_cust_account_id
-- DROP INDEX public.ix_holding_cust_account_id;
CREATE INDEX ix_holding_cust_account_id
ON public.holding_cust
USING btree
(account_id);
-- Index: public.ix_holding_cust_acctid_secid_asofdt
-- DROP INDEX public.ix_holding_cust_acctid_secid_asofdt;
CREATE INDEX ix_holding_cust_acctid_secid_asofdt
ON public.holding_cust
USING btree
(account_id, security_id, trade_date DESC);
-- Index: public.ix_holding_cust_security_id
-- DROP INDEX public.ix_holding_cust_security_id;
CREATE INDEX ix_holding_cust_security_id
ON public.holding_cust
USING btree
(security_id);
-- Index: public.ix_holding_cust_trade_date
-- DROP INDEX public.ix_holding_cust_trade_date;
CREATE INDEX ix_holding_cust_trade_date
ON public.holding_cust
USING btree
(trade_date);
Two things:
As Dmitry pointed out, you should look at creating an index on the market_value_date field (see the sketch below). It's possible that after that you get a completely different query plan, which may or may not bring up other bottlenecks, but it should certainly remove this seq scan.
Minor (since I doubt it affects performance), but secondly, if you aren't enforcing a field length by design, you may want to change the created_by field to TEXT. As can be seen in the plan, it is casting all created_by values to TEXT for this query.
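A rough sketch of both suggestions, untested against your schema (the index name is only a placeholder):
-- index so the planner can avoid the sequential scan on market_value_date
CREATE INDEX ix_market_value_history_date
    ON public.market_value_history (market_value_date);
-- optional: lift the length limit on created_by so the filter
-- no longer needs to cast it to text
ALTER TABLE public.market_value_history
    ALTER COLUMN created_by TYPE text;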

cx_Oracle.DatabaseError: ORA-00947: not enough values

I have a table:
create table employee (
employee_id NUMBER NOT NULL,
name VARCHAR2(255) NOT NULL,
notes VARCHAR2(4000),
created_by varchar2(255) not null,
created_at date default sysdate not null,
updated_by varchar2(255) not null,
updated_at date default sysdate not null,
PRIMARY KEY(employee_id)
);
so when I insert from SQL Developer:
insert into employee(employee_id, name,notes) values(1,'xyz','test');
it auto-populates created_by, created_at, updated_at and updated_by.
The row gets inserted successfully.
Whereas if I try to insert using the cx_Oracle module in Python:
cursor.execute("INSERT INTO employee VALUES (:employee_id,:name,:notes)",
{
'employee_id' : max_value,
'name' : each_vendor,
'notes' : 'test'
}
)
it throws an error saying "not enough values".
Why do I get this error? How can I solve it?
The answer is very simple, and has nothing to do with Python. Your two INSERT statements are very different.
In the first, you explicitly name the columns you intend to provide values for: (employee_id, name, notes). However, in the INSERT statement used from Python, you don't specify the three columns by name. As a result, your INSERT statement expects you to provide values for all columns in the table.
The fix: explicitly name the three columns:
cursor.execute("INSERT INTO employee (employee_id, name, notes) VALUES (:employee_id,:name,:notes)",
{
'employee_id' : max_value,
'name' : each_vendor,
'notes' : 'test'
}
)
