Spring Batch tables cleanup takes forever

For an application based on Spring Boot and a PostgreSQL 9.6 database, I'm using Spring Batch to schedule a few operations which must take place every n seconds (customizable, but usually ranging between a few seconds and a few minutes); as a result, by the end of the day a lot of jobs have been performed by the system and a lot of information is persisted by Spring Batch.
The fact is that I'm not really interested in keeping a history of those jobs, so at the beginning I used the in-memory version of Spring Batch to avoid persisting such (to me) useless information.
However, for configurations with a small n running in environments with low resources, this approach led to performance issues, so I decided to try the database way.
Unfortunately, those tables grow quite fast and I would like to implement a cleanup procedure to get rid of all data older than, for instance, a day.
Here comes the pain: even though nothing is locking those tables (the main application is down and no one is interacting with the database), it takes forever to clean them, and I really cannot understand why.
Spring Batch (4.0.1) provides the following PG script to generate those tables:
CREATE TABLE BATCH_JOB_INSTANCE (
    JOB_INSTANCE_ID BIGINT NOT NULL PRIMARY KEY,
    VERSION BIGINT,
    JOB_NAME VARCHAR(100) NOT NULL,
    JOB_KEY VARCHAR(32) NOT NULL,
    constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
);

CREATE TABLE BATCH_JOB_EXECUTION (
    JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    VERSION BIGINT,
    JOB_INSTANCE_ID BIGINT NOT NULL,
    CREATE_TIME TIMESTAMP NOT NULL,
    START_TIME TIMESTAMP DEFAULT NULL,
    END_TIME TIMESTAMP DEFAULT NULL,
    STATUS VARCHAR(10),
    EXIT_CODE VARCHAR(2500),
    EXIT_MESSAGE VARCHAR(2500),
    LAST_UPDATED TIMESTAMP,
    JOB_CONFIGURATION_LOCATION VARCHAR(2500) NULL,
    constraint JOB_INST_EXEC_FK foreign key (JOB_INSTANCE_ID)
        references BATCH_JOB_INSTANCE (JOB_INSTANCE_ID)
);

CREATE TABLE BATCH_JOB_EXECUTION_PARAMS (
    JOB_EXECUTION_ID BIGINT NOT NULL,
    TYPE_CD VARCHAR(6) NOT NULL,
    KEY_NAME VARCHAR(100) NOT NULL,
    STRING_VAL VARCHAR(250),
    DATE_VAL TIMESTAMP DEFAULT NULL,
    LONG_VAL BIGINT,
    DOUBLE_VAL DOUBLE PRECISION,
    IDENTIFYING CHAR(1) NOT NULL,
    constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
        references BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE TABLE BATCH_STEP_EXECUTION (
    STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    VERSION BIGINT NOT NULL,
    STEP_NAME VARCHAR(100) NOT NULL,
    JOB_EXECUTION_ID BIGINT NOT NULL,
    START_TIME TIMESTAMP NOT NULL,
    END_TIME TIMESTAMP DEFAULT NULL,
    STATUS VARCHAR(10),
    COMMIT_COUNT BIGINT,
    READ_COUNT BIGINT,
    FILTER_COUNT BIGINT,
    WRITE_COUNT BIGINT,
    READ_SKIP_COUNT BIGINT,
    WRITE_SKIP_COUNT BIGINT,
    PROCESS_SKIP_COUNT BIGINT,
    ROLLBACK_COUNT BIGINT,
    EXIT_CODE VARCHAR(2500),
    EXIT_MESSAGE VARCHAR(2500),
    LAST_UPDATED TIMESTAMP,
    constraint JOB_EXEC_STEP_FK foreign key (JOB_EXECUTION_ID)
        references BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT (
    STEP_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT,
    constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
        references BATCH_STEP_EXECUTION (STEP_EXECUTION_ID)
);

CREATE TABLE BATCH_JOB_EXECUTION_CONTEXT (
    JOB_EXECUTION_ID BIGINT NOT NULL PRIMARY KEY,
    SHORT_CONTEXT VARCHAR(2500) NOT NULL,
    SERIALIZED_CONTEXT TEXT,
    constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
        references BATCH_JOB_EXECUTION (JOB_EXECUTION_ID)
);

CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
CREATE SEQUENCE BATCH_JOB_SEQ MAXVALUE 9223372036854775807 NO CYCLE;
Respecting the foreign-key dependencies between them, I try to clean up those tables by executing the following deletions:
delete from BATCH_STEP_EXECUTION_CONTEXT;
delete from BATCH_STEP_EXECUTION;
delete from BATCH_JOB_EXECUTION_CONTEXT;
delete from BATCH_JOB_EXECUTION_PARAMS;
delete from BATCH_JOB_EXECUTION;
delete from BATCH_JOB_INSTANCE;
Everything is OK with the first 4 tables but, as soon as I reach the BATCH_JOB_EXECUTION one, it takes around 30 minutes to remove a few hundred thousand rows.
Even worse, after deleting everything from the first 5 tables, the last one (which is now referenced by nothing) takes even longer.
Can you see a reason why this simple operation takes so long to complete? Of course it has to check for constraint violations, but it still seems unreasonably slow.
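For what it's worth, one plausible explanation (an assumption on my part, not verified against this database) is that PostgreSQL checks every deleted row with a lookup in each referencing table, and none of the referencing columns in the schema above carries an index, so each check turns into a sequential scan of the (possibly bloated) child table. A minimal sketch of the supporting indexes (index names are made up):
-- PostgreSQL does not index foreign-key columns automatically; the two
-- *_CONTEXT tables are already covered because their FK column is also
-- their primary key.
CREATE INDEX IF NOT EXISTS ix_job_exec_instance
    ON BATCH_JOB_EXECUTION (JOB_INSTANCE_ID);
CREATE INDEX IF NOT EXISTS ix_step_exec_job_exec
    ON BATCH_STEP_EXECUTION (JOB_EXECUTION_ID);
CREATE INDEX IF NOT EXISTS ix_job_exec_params_exec
    ON BATCH_JOB_EXECUTION_PARAMS (JOB_EXECUTION_ID);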
Plus, is there a better way to use Spring Batch without wasting disk space on unnecessary job information?
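For a full wipe, a single multi-table TRUNCATE may be an alternative worth trying; a sketch, assuming no tables outside this set reference them:
-- TRUNCATE checks the foreign keys at statement level only (all the
-- referencing tables are listed together) and reclaims the disk space
-- immediately, which DELETE does not.
TRUNCATE TABLE BATCH_STEP_EXECUTION_CONTEXT,
               BATCH_STEP_EXECUTION,
               BATCH_JOB_EXECUTION_CONTEXT,
               BATCH_JOB_EXECUTION_PARAMS,
               BATCH_JOB_EXECUTION,
               BATCH_JOB_INSTANCE;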

Related

How to select only those rows which are greater than modified time using spring data jpa

For example, I have created a table:
CREATE DATABASE es_db;
USE es_db;
DROP TABLE IF EXISTS es_table;
CREATE TABLE es_table (
id BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (id),
UNIQUE KEY unique_id (id),
client_name VARCHAR(32) NOT NULL,
modification_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
insertion_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Now assume I have to select the rows whose modification time is greater than a time I give as input. Consider this query, for example:
SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs FROM es_table WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_modifiedvalue AND modification_time < NOW()) ORDER BY modification_time ASC
Is there a way to translate the same thing to a native query? I can achieve it with JdbcTemplate, but I would like to know whether this is possible with a native query.
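For reference, Spring Data JPA does accept raw SQL when the repository method is annotated with @Query(value = "...", nativeQuery = true), binding :name placeholders through @Param. A sketch of the query body such a method could carry (the method and binding details are assumptions on my side):
-- Body for a @Query(nativeQuery = true) repository method; the
-- :sql_last_modifiedvalue placeholder would be bound from a
-- @Param("sql_last_modifiedvalue") method argument.
SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs
FROM es_table
WHERE UNIX_TIMESTAMP(modification_time) > :sql_last_modifiedvalue
  AND modification_time < NOW()
ORDER BY modification_time ASC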

Unusable partition Oracle / datastage

I am facing an issue with my DataStage job. I have to fill a table TTPERIODEAS in Oracle from a .csv file. The SQL query in the Oracle connector is shown in this screenshot:
Oracle connector
And here is the Oracle script:
CREATE TABLE TTPERIODEAS
(
    CDPARTITION VARCHAR2(5 BYTE) NOT NULL ENABLE,
    CDCOMPAGNIE NUMBER(4,0) NOT NULL ENABLE,
    CDAPPLI NUMBER(4,0) NOT NULL ENABLE,
    NUCONTRA CHAR(15 BYTE) NOT NULL ENABLE,
    DTDEBAS NUMBER(8,0) NOT NULL ENABLE,
    DTFINAS NUMBER(8,0) NOT NULL ENABLE,
    TAUXAS NUMBER(8,5) NOT NULL ENABLE,
    CONSTRAINT PK_TTPERIODEAS
        PRIMARY KEY (CDPARTITION, CDCOMPAGNIE, CDAPPLI, NUCONTRA, DTDEBAS)
)
PARTITION BY LIST (CDPARTITION)
(PARTITION P_PERIODEAS_13Q VALUES ('13Q'));
When running the job, I get the following error message and the table is not filled:
The index 'USINODSD0.SYS_C00249007' its partition is unusable
Please, I need help. Thanks.
The index is global (i.e. not partitioned) because there is no using index local at the end of the definition. This is also true for the PK index shown above. (I'm assuming they are two different things, because by default the DDL above would create an index named PK_TTPERIODEAS, so I'm not sure what SYS_C00249007 is.) If you can drop and rebuild them as local indexes (i.e. partitioned to match the table) then truncating or dropping a partition will no longer invalidate indexes.
For example, you could rebuild the primary key as:
alter table ttperiodeas
drop primary key;
alter table ttperiodeas
add constraint pk_ttperiodeas primary key (cdpartition,cdcompagnie,cdappli,nucontra,dtdebas)
using index local;
I don't know how SYS_C00249007 is defined, but you could use something similar.
The create table command might be something like:
create table ttperiodeas
( cdpartition varchar2(5 byte) not null
, cdcompagnie number(4,0) not null
, cdappli number(4,0) not null
, nucontra varchar2(15 byte) not null
, dtdebas number(8,0) not null
, dtfinas number(8,0) not null
, tauxas number(8,5) not null
, constraint pk_ttperiodeas
primary key (cdpartition,cdcompagnie,cdappli,nucontra,dtdebas)
using index local
)
partition by list(cdpartition)
( partition p_periodeas_13q values ('13Q') );
Alternatively, you could add the update global indexes clause when dropping the partition:
alter table demo_temp drop partition p_periodeas_14q update global indexes;
(By the way, NUCONTRA should probably be a standard VARCHAR2 and not CHAR, which is intended for cross-platform compatibility and ANSI completeness, and in practice just wastes space and creates bugs.)
The message says that the index partition for the given partition is unusable, so you could try to rebuild the corresponding index partition with
alter index [index_name] rebuild partition [partition_name];
(with the fitting values for [index_name] and [partition_name]).
Before you do that you should check the status of the index partitions in user_ind_partitions, since your error message does not look like Oracle error messages usually do.
But since the index is global, as William Robertson pointed out, this is not applicable for the given situation.
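If the index really is global, the rebuild happens at index level rather than partition level; a sketch, reusing the name reported in the error (the owner prefix is taken from the message and may need adjusting):
-- A global (non-partitioned) index is rebuilt as a whole, which also
-- clears its UNUSABLE status.
ALTER INDEX USINODSD0.SYS_C00249007 REBUILD;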

optimize an inner join between two multi-million row tables

I'm new to Postgres and even newer to understanding how explain works. I have a query below which is typical; I just replace the date:
explain
select account_id,
security_id,
market_value_date,
sum(market_value) market_value
from market_value_history mvh
inner join holding_cust hc on hc.id = mvh.owning_object_id
where
hc.account_id = 24766
and market_value_date = '2015-07-02'
and mvh.created_by = 'HoldingLoad'
group by account_id, security_id, market_value_date
order by security_id, market_value_date;
Attached is a screenshot of the explain output.
The holding_cust table has 2 million rows and the market_value_history table has 163 million rows.
Below are the table definitions and indexes for market_value_history and holding_cust:
I'd appreciate any advice you may be able to give me on tuning this query.
CREATE TABLE public.market_value_history
(
id integer NOT NULL DEFAULT nextval('market_value_id_seq'::regclass),
market_value numeric(18,6) NOT NULL,
market_value_date date,
holding_type character varying(25) NOT NULL,
owning_object_type character varying(25) NOT NULL,
owning_object_id integer NOT NULL,
created_by character varying(50) NOT NULL,
created_dt timestamp without time zone NOT NULL,
last_modified_dt timestamp without time zone NOT NULL,
CONSTRAINT market_value_history_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.market_value_history
OWNER TO postgres;
-- Index: public.ix_market_value_history_id
-- DROP INDEX public.ix_market_value_history_id;
CREATE INDEX ix_market_value_history_id
ON public.market_value_history
USING btree
(owning_object_type COLLATE pg_catalog."default", owning_object_id);
-- Index: public.ix_market_value_history_object_type_date
-- DROP INDEX public.ix_market_value_history_object_type_date;
CREATE UNIQUE INDEX ix_market_value_history_object_type_date
ON public.market_value_history
USING btree
(owning_object_type COLLATE pg_catalog."default", owning_object_id, holding_type COLLATE pg_catalog."default", market_value_date);
CREATE TABLE public.holding_cust
(
id integer NOT NULL DEFAULT nextval('holding_cust_id_seq'::regclass),
account_id integer NOT NULL,
security_id integer NOT NULL,
subaccount_type integer,
trade_date date,
purchase_date date,
quantity numeric(18,6),
net_cost numeric(18,2),
adjusted_net_cost numeric(18,2),
open_date date,
close_date date,
created_by character varying(50) NOT NULL,
created_dt timestamp without time zone NOT NULL,
last_modified_dt timestamp without time zone NOT NULL,
CONSTRAINT holding_cust_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.holding_cust
OWNER TO postgres;
-- Index: public.ix_holding_cust_account_id
-- DROP INDEX public.ix_holding_cust_account_id;
CREATE INDEX ix_holding_cust_account_id
ON public.holding_cust
USING btree
(account_id);
-- Index: public.ix_holding_cust_acctid_secid_asofdt
-- DROP INDEX public.ix_holding_cust_acctid_secid_asofdt;
CREATE INDEX ix_holding_cust_acctid_secid_asofdt
ON public.holding_cust
USING btree
(account_id, security_id, trade_date DESC);
-- Index: public.ix_holding_cust_security_id
-- DROP INDEX public.ix_holding_cust_security_id;
CREATE INDEX ix_holding_cust_security_id
ON public.holding_cust
USING btree
(security_id);
-- Index: public.ix_holding_cust_trade_date
-- DROP INDEX public.ix_holding_cust_trade_date;
CREATE INDEX ix_holding_cust_trade_date
ON public.holding_cust
USING btree
(trade_date);
Two things:
As Dmitry pointed out, you should look at creating an index on the market_value_date field. It's possible that after that you'll get a completely different query plan, which may or may not bring up other bottlenecks, but it should certainly remove this seq scan.
Second, a minor point (since I doubt it affects performance): if you aren't enforcing the field length by design, you may want to change the created_by field to TEXT. As can be seen in the plan, it is casting all created_by values to TEXT for this query.
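For concreteness, a sketch of the suggested index; including created_by is my assumption (an index on market_value_date alone should already remove the seq scan):
-- Hypothetical index matching the two non-join predicates of the query.
CREATE INDEX ix_mvh_date_created_by
    ON public.market_value_history (market_value_date, created_by);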

Spring Batch Repository: JdbcJobInstanceDao

I have gone through the grepcode for JdbcJobInstanceDao and found this code snippet, which I am trying hard to understand.
According to the Spring Batch Repository Schema,
CREATE TABLE BATCH_JOB_INSTANCE (
JOB_INSTANCE_ID BIGINT NOT NULL PRIMARY KEY ,
VERSION BIGINT ,
JOB_NAME VARCHAR(100) NOT NULL,
JOB_KEY VARCHAR(32) NOT NULL,
constraint JOB_INST_UN unique (JOB_NAME, JOB_KEY)
) ENGINE=InnoDB;
JOB_NAME appears to be unique. However, in the JdbcJobInstanceDao#getJobInstances(String jobName, int start, int count) method, it is treated as if multiple entries can exist in the BATCH_JOB_INSTANCE table for the same JOB_NAME.
Is this a possibility? Please explain.
The JOB_NAME is not unique. The combination of JOB_NAME and JOB_KEY (which is a hash of the job parameters) is unique.
So multiple instances of the same job can exist, as long as they have different job parameters.
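To illustrate (all values below are made up), two rows sharing a JOB_NAME coexist as long as their JOB_KEY hashes differ:
-- Two instances of the same job, distinguished by the parameter hash.
INSERT INTO BATCH_JOB_INSTANCE (JOB_INSTANCE_ID, VERSION, JOB_NAME, JOB_KEY)
VALUES (1, 0, 'importJob', 'hashOfParamsA'),
       (2, 0, 'importJob', 'hashOfParamsB');
-- A third row repeating an existing (JOB_NAME, JOB_KEY) pair would
-- violate the JOB_INST_UN unique constraint.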

Auto_increment in oracle

I am a student and I am doing an assignment on auto increment.
How do I create an auto increment in Oracle?
CREATE TABLE mua_thi
(
mamuathi varchar2(10) not null,
check(mamuathi like 'MT%')
)
mamuathi = MT + auto_increment;
create or replace trigger tangmuathi
before insert or update
on mua_thi
begin
set new.mamuathi := MT + muathitang.nextval from Dual;
end;
create sequence muathitang start
with 1 increment by 1;
Don't structure your table like that. That's a smart key (a single string with several components concatenated). Smart keys are dumb. If the "MT" is crucial (why have a key with a hardcoded, unchanging element?) make it a separate column.
CREATE TABLE mua_thi ( mamuathi varchar2(2) not null
, id number (8) not null
, primary key (mamuathi, id )
, check(mamuathi = 'MT')
);
Actually there's still some bad practice there. One, name the constraints - it makes life easier:
, constraint mt_pk primary key (mamuathi, id )
, constraint mt_ck check(mamuathi = 'MT')
Two, if mamuathi is genuinely a constant it's pointless using it in the key:
, constraint mt_pk primary key ( id )
Three, mamuathi may evolve to several values , so think about whether a foreign key to a look-up table might be better.
Obviously the drawback to splitting a smart key is the need to reference multiple columns. In 11g we can use the virtual column feature to avoid that inconvenience:
CREATE TABLE mua_thi ( mamuathi varchar2(2) not null
, id number (8) not null
, mamuathi_disp AS ( mamuathi||lpad(id,8,'0') )
, primary key (mamuathi, id )
, check(mamuathi = 'MT')
);
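For completeness, a sketch of how the id column could then be fed from the muathitang sequence created earlier (from 12c an identity column, id NUMBER GENERATED ALWAYS AS IDENTITY, would avoid the trigger entirely):
-- Hypothetical trigger populating id from the muathitang sequence;
-- direct assignment of NEXTVAL works from 11g onwards.
create or replace trigger mua_thi_bi
before insert on mua_thi
for each row
begin
  :new.id := muathitang.nextval;
end;
/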
