Currently, I have an endpoint that allows updating multiple records. Before saving the changes to the database I need to validate the data sent from the front end.
I am having an issue with the kinds of validations that require checks against all the other records in the database table (e.g., date interval overlap/gap checks, unique-pair checks).
In these cases, when I try to run the validations I have two sets of data:
- The data sent from the front end, held in memory/variables.
- The data in the database.
For the validations to run correctly I need a way to merge the data in memory (the updated records) with the data in the database (the original records plus other data that is not currently being updated).
Is there a good way of doing this that does not require loading everything into memory and merging both datasets there?
Another idea I am thinking of is to open a database transaction, write the new data to the database, and then use a dirty read when executing the gap/overlap check queries. I don't know whether this is a good approach, though.
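To make it concrete, here is a minimal sketch of that transaction idea, under the assumption that an Oracle session always sees its own uncommitted changes on the same connection (so the validation query would not actually need a dirty read); table, column, and bind names are placeholders:

-- Inside one transaction: apply the pending changes, validate, then decide.
UPDATE mytable
SET startdate = :StartDt,
    enddate   = :EndDt
WHERE id = :Id;              -- repeated for each record sent from the front end

-- Now run the gap/overlap validation query (like the one further below) on the
-- SAME connection: it sees the original rows merged with this transaction's
-- uncommitted updates.

-- If the validation reports problems:
ROLLBACK;
-- otherwise:
COMMIT;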
Extra notes:
- I am using Oracle as the database and Dapper to communicate with it.
- The tables that need validation usually hold millions of records (a staging-table idea for keeping the check set-based is sketched below).
- The same issue applies to the create endpoint.
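Another option I am considering, sketched under the assumption that a global temporary table is acceptable (all names are placeholders): stage the incoming rows inside the transaction and run one set-based validation over the union of staged and stored rows.

-- One-time DDL: staged rows are private to the session and vanish on commit.
CREATE GLOBAL TEMPORARY TABLE mytable_stage (
    startdate DATE,
    enddate   DATE,
    fk        NUMBER
) ON COMMIT DELETE ROWS;

-- Per request: bulk-insert the front-end rows into mytable_stage, then run:
-- (a real version would also exclude from mytable the ids being updated)
WITH merged AS (
    SELECT startdate, enddate FROM mytable       WHERE fk = :somefkvalue
    UNION ALL
    SELECT startdate, enddate FROM mytable_stage WHERE fk = :somefkvalue
)
SELECT COUNT(1)
FROM (
    SELECT startdate,
           LAG(enddate, 1) OVER (ORDER BY startdate) AS prev_enddate
    FROM merged
)
WHERE prev_enddate IS NOT NULL
  AND prev_enddate < startdate;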
Another example:
I am trying to create entities. The create endpoint is called with this data in the body (date format dd/mm/yyyy):

StartDate       EndDate
01/01/2022      05/01/2022
10/01/2022      11/01/2022
12/01/2022      15/01/2022
In the database I have these records saved:

Id    StartDate       EndDate
1     06/01/2022      09/01/2022
2     16/01/2022      20/01/2022
I need to check whether there are any gaps between the dates. If there are, I need to send a warning to the user (the data already in the database can be invalid - the application has old data and I can't do anything about that at the moment).
The way I check for this right now is by using the SQL below:
WITH CTE_INNERDATA AS(
SELECT s.STARTDATE, s.ENDDATE
FROM mytable s
WHERE FK = somefkvalue
UNION ALL
SELECT :StartDt AS STARTDATE, :EndDt AS ENDDATE FROM DUAL -- this row contains the data from one of the rows that came from the front-end
),
CTE_DATA AS (
SELECT ctid.STARTDATE, ctid.ENDDATE, LAG(ctid.ENDDATE, 1) OVER (ORDER BY ctid.STARTDATE) AS PREV_ENDDATE FROM CTE_INNERDATA ctid
)
SELECT COUNT(1) FROM cte_data ctd
WHERE PREV_ENDDATE IS NOT NULL
AND PREV_ENDDATE < STARTDATE
Using this SQL query, when validating the third row (12/01/2022 - 15/01/2022) a gap is reported between 09/01/2022 and 12/01/2022, because the other rows sent from the front-end are not visible to the query.
This issue would be fixed if, instead of unioning a single row, I unioned all the rows sent from the front-end, but I can't figure out a way to do something like that.
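What I have in mind is something like the sketch below (the numbered bind names such as :StartDt1/:EndDt1 are hypothetical; there would be one SELECT ... FROM DUAL branch per row sent from the front end, generated when building the SQL string):

WITH CTE_INNERDATA AS (
    SELECT s.STARTDATE, s.ENDDATE
    FROM mytable s
    WHERE FK = :somefkvalue
    -- one branch per incoming row instead of a single one
    UNION ALL SELECT :StartDt1 AS STARTDATE, :EndDt1 AS ENDDATE FROM DUAL
    UNION ALL SELECT :StartDt2 AS STARTDATE, :EndDt2 AS ENDDATE FROM DUAL
    UNION ALL SELECT :StartDt3 AS STARTDATE, :EndDt3 AS ENDDATE FROM DUAL
),
CTE_DATA AS (
    SELECT ctid.STARTDATE, ctid.ENDDATE,
           LAG(ctid.ENDDATE, 1) OVER (ORDER BY ctid.STARTDATE) AS PREV_ENDDATE
    FROM CTE_INNERDATA ctid
)
SELECT COUNT(1) FROM CTE_DATA ctd
WHERE PREV_ENDDATE IS NOT NULL
AND PREV_ENDDATE < STARTDATE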
# Update
I iterate through the records the frontend sent and call this method to check for gaps:
private async Task ValidateIntervalGaps(int someFkValue, DateTime startDate, DateTime endDate)
{
    var connection = _connectionProvider.GetOpenedConnection();
    // the gap-check query shown above goes into the verbatim string below
    var gapsCount = await connection.QueryFirstAsync<int>(@"<<Query from above>>",
        new { StartDt = startDate, EndDt = endDate, somefkvalue = someFkValue });
    if (gapsCount > 0)
    {
        // Add warning message here
    }
}
I have coded the snippet below. The values being stored in the DB are correct, but fetching the carId after inserting always returns 1 instead of the actual value. I can't use order="AFTER", as I have already used one order attribute for generating the sequence value, and I cannot use the annotation-based way because our org's code structure does not allow it. Can someone please identify what I am doing wrong in the XML-based way of inserting and fetching data?
Note: using an Oracle DB.
<insert id="insertIntoCar" parameterType="CarEntity" useGeneratedKeys="true" keyColumn="id" keyProperty="carId">
    <selectKey keyProperty="carId" resultType="long" order="BEFORE">
        SELECT CAR_ID_SEQ.nextval FROM DUAL
    </selectKey>
    INSERT INTO CAR (CAR_ID, CAR_TYPE, CAR_STATUS)
    VALUES (#{carId}, #{carType}, #{carStatus})
</insert>
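For reference, my reading of what this mapper effectively executes against Oracle (the sequence and table names come from the snippet above; :carId etc. stand for the bound CarEntity properties):

SELECT CAR_ID_SEQ.NEXTVAL FROM DUAL;   -- order="BEFORE": the result is bound to carId
INSERT INTO CAR (CAR_ID, CAR_TYPE, CAR_STATUS)
VALUES (:carId, :carType, :carStatus);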
In Oracle (PROD), we will be creating views on tables, and the users will be querying the views to fetch data for each reporting period (a single month, e.g. between '01-DEC-2015' and '31-DEC-2015'). We created a view as
CREATE OR REPLACE VIEW VW_TABLE1 AS
SELECT ACCNT_NBR, BIZ_DATE, MAX(COL1) COL1, MAX(COL2) COL2
FROM TABLE1_D
WHERE BIZ_DATE IN (SELECT BIZ_DATE FROM TABLE2_M GROUP BY BIZ_DATE)
GROUP BY ACCNT_NBR, BIZ_DATE;
The issue here is that TABLE1_D (a daily table, with data from Dec 2015 to Feb 2016) has records with multiple dates per month. For Dec 2015, say, it has records for 01-DEC-2015, 02-DEC-2015, ..., 29-DEC-2015, 30-DEC-2015 (not necessarily continuous, but loaded per business date), with each day holding close to 2,500,000 records.
TABLE2_M is a monthly table that has a single date per month (e.g. for Dec 2015, say, 30-DEC-2015), with around 4,000 records for each date.
When we query the view as
SELECT * FROM VW_TABLE1 WHERE BIZ_DATE BETWEEN '01-DEC-2015' AND '31-DEC-2015'
it returns the aggregated data from TABLE1_D for 30-DEC-2015, as expected. I thought the grouping on BIZ_DATE in TABLE1_D was unnecessary, since only one BIZ_DATE would come out of the inner query.
So I checked by removing BIZ_DATE from the final GROUP BY, assuming there would be data for a single day from the inner query.
Hence I took 2 rows, for the dates 30-DEC-2015 and 30-JAN-2016, from both tables, created them in SIT for testing, and created the view as
CREATE VIEW VW_TABLE1 AS
SELECT ACCNT_NBR, MAX(BIZ_DATE) BIZ_DATE, MAX(COL1) COL1, MAX(COL2) COL2
FROM TABLE1_D
WHERE BIZ_DATE IN (SELECT BIZ_DATE FROM TABLE2_M GROUP BY BIZ_DATE)
GROUP BY ACCNT_NBR;
Selecting with BETWEEN for each month (or = the exact month date) gives correct data in SIT; i.e., when I used BETWEEN for a single month, it produced that month's data.
SELECT * FROM VW_TABLE1 WHERE BIZ_DATE BETWEEN '01-DEC-2015' AND '31-DEC-2015';
SELECT * FROM VW_TABLE1 WHERE BIZ_DATE = '30-DEC-2015';
With this, I modified the view DDL in PROD (to be the same as in SIT). But surprisingly, the same select (the second one, with = '30-DEC-2015'; the first one was taking too long due to the volume of data, so I aborted it) returned no data. My suspicion is that the inner query is returning all the dates from 30-DEC-2015 to 30-JAN-2016, and thereby MAX(BIZ_DATE) is being derived from 30-JAN-2016. (TABLE2_M doesn't have Feb 2016 data.)
I verified whether there were any Oracle version differences between SIT and PROD and found them to be the same per v$version (11.2.0.4.0). Can you please explain this behavior, where the same query on the same view DDL in different environments returns different results with the same data?
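To make that suspicion concrete, here is a self-contained sketch (the account number 42 and the inlined rows are hypothetical) showing how grouping by ACCNT_NBR alone makes a December row disappear once the same account also has a January row:

SELECT ACCNT_NBR, MAX(BIZ_DATE) AS BIZ_DATE
FROM (
    SELECT 42 AS ACCNT_NBR, DATE '2015-12-30' AS BIZ_DATE FROM DUAL
    UNION ALL
    SELECT 42 AS ACCNT_NBR, DATE '2016-01-30' AS BIZ_DATE FROM DUAL
)
GROUP BY ACCNT_NBR;
-- Returns the single row (42, 30-JAN-2016); filtering the result with
-- BIZ_DATE = '30-DEC-2015' then finds nothing, even though a December row exists.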
Using H2 in embedded mode, I am restoring an in-memory database from a script backup that was previously generated by H2 using the SCRIPT command.
I use this URL:
jdbc:h2:mem:main
I am doing it like this:
FileReader script = new FileReader("db.sql");
RunScript.execute(conn,script);
which, according to the doc, should be similar to this SQL:
RUNSCRIPT FROM 'db.sql'
And inside my app they do perform the same. But if I run the load using the web console (via h2.bat), I get a different result.
Following the load of this data in my app, there are rows that I know are loaded but are not accessible to me via a query. These queries demonstrate it:
select count(*) from MY_TABLE yields 96576
select count(*) from MY_TABLE where ID <> 3238396 yields 96575
select count(*) from MY_TABLE where ID = 3238396 yields 0
Loading the web console and using the same RUNSCRIPT command and file to load yields a database where I can find the row with that ID.
My first inclination was that I was dealing with some sort of locking issue. I have tried the following (with no change in results):
manually issuing a conn.commit() after the RunScript.execute()
appending ;LOCK_MODE=3 and then ;LOCK_MODE=0 to my URL
Any pointers in the right direction on how I can identify what is going on? I ended up inserting:
Server.createWebServer("-trace","-webPort","9083").start()
so that I could run these queries through the web console to sanity-check what was coming back through JDBC. The problem happens consistently in my app and consistently doesn't happen via the web console. So there must be something at work.
The table schema is not exotic. This is the SQL column from
select * from INFORMATION_SCHEMA.TABLES where TABLE_NAME='MY_TABLE'
CREATE MEMORY TABLE PUBLIC.MY_TABLE(
ID INTEGER SELECTIVITY 100,
P_ID INTEGER SELECTIVITY 4,
TYPE VARCHAR(10) SELECTIVITY 1,
P_ORDER DECIMAL(8, 0) SELECTIVITY 11,
E_GROUP INTEGER SELECTIVITY 1,
P_USAGE VARCHAR(16) SELECTIVITY 1
)
Any push in the right direction would be really appreciated.
EDIT
So it seems that the database is corrupted in some way just after running the RunScript command to load it. While trying to debug and find out what is going on, I tried executing the following:
delete from MY_TABLE where ID <> 3238396
And I ended up with:
Row not found when trying to delete from index "PUBLIC.MY_TABLE_IX1: 95326", SQL statement:
delete from MY_TABLE where ID <> 3238396 [90112-178] 90112/90112 (Help)
I then tried dropping and recreating all my indexes from within the context, but it had no effect on the overall problem.
Help!
EDIT 2
More information: the problem occurs due to the creation of an index. (I believe I have found a bug in H2, and I am working on creating a minimal case that reproduces it.) The simple code below will reproduce the problem, if you have the right set of data.
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.h2.tools.RunScript;

public class Main
{
    public static void main(String[] args)
    {
        try
        {
            final String DB_H2URL = "jdbc:h2:mem:main;LOCK_MODE=3";
            Class.forName("org.h2.Driver");
            Connection c = DriverManager.getConnection(DB_H2URL, "sa", "");
            // load the backup script into the fresh in-memory database
            FileReader script = new FileReader("db.sql");
            RunScript.execute(c, script);
            script.close();
            Statement st = c.createStatement();
            ResultSet rs = st.executeQuery("select count(*) from MY_TABLE where P_ID = 3238396");
            rs.next();
            if (rs.getLong(1) == 0)
                System.err.println("It happened");        // the row is not visible through the index
            else
                System.err.println("It didn't happen");
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }
}
I have reduced the db.sql script to about 5000 rows and it still happens. When I attempted to go to 2500 rows, it stopped happening. If I remove the last line of the db.sql (which is the index creation), the problem will also stop happening. The last line is this:
CREATE INDEX PUBLIC.MY_TABLE_IX1 ON PUBLIC.MY_TABLE(P_ID);
But the data is an important player in this. It still appears to only ever be the one row and the index somehow makes it inaccessible.
EDIT 3
I have identified a minimal data example that reproduces the problem. I stripped the table schema down to a single column, and I found that the values in that column don't seem to matter - just the number of rows. Here are the contents of my db.sql generated via the SCRIPT command (with the obvious stuff snipped):
;
CREATE USER IF NOT EXISTS SA SALT '8eed806dbbd1ea59' HASH '6d55cf715c56f4ca392aca7389da216a97ae8c9785de5d071b49de5436b0c003' ADMIN;
CREATE MEMORY TABLE PUBLIC.MY_TABLE(
P_ID INTEGER SELECTIVITY 100
);
-- 5132 +/- SELECT COUNT(*) FROM PUBLIC.MY_TABLE;
INSERT INTO PUBLIC.MY_TABLE(P_ID) VALUES
(1),
(2),
(3),
... snipped you obviously have breaks in the bulk insert here ...
(5143),
(3238396);
CREATE INDEX PUBLIC.MY_TABLE_IX1 ON PUBLIC.MY_TABLE(P_ID);
But that will recreate the problem. [Note: my numbering skips a value every time there was a bulk-insert break, so there really are 5132 rows even though the plain values run up to 5143; select count(*) from MY_TABLE yields 5132.] Also, I seem to be able to recreate the problem in the web console directly now by doing:
drop table MY_TABLE
runscript from 'db.sql'
select count(*) from MY_TABLE where P_ID = 3238396
You have recreated the problem if you get 0 back from the select when you know you have a row in there.
Oddly enough, I seem to be able to do
select * from MY_TABLE order by P_ID desc
and I can see the row at that point. But going directly for the row:
select * from MY_TABLE where P_ID = 3238396
yields nothing.
I just realized that I should note that I am using h2-1.4.178.jar.
The H2 folks have apparently already resolved this:
https://code.google.com/p/h2database/issues/detail?id=566
I just need to either get the code from version control or wait for the next release build. Thanks, Thomas.
I'm merging several tables in Oracle 10g into a consolidated table, like this:
table_A (will have all the records)
table_B - part of the data to be merged
table_C - part of the data to be merged
table_D - part of the data to be merged
Now, I run it with error logging like this:
MERGE INTO TABLE_A A USING (SELECT * FROM TABLE_B) B
ON
(
A.NOMBRE=B.NOMBRE AND
A.PRIMER_APELLIDO=B.PRIMER_APELLIDO AND
A.SEGUNDO_APELLIDO=B.SEGUNDO_APELLIDO AND
TO_CHAR(A.FECHA_NACIMIENTO,'DD/MM/YYYY')=TO_CHAR(B.FECHA_NACIMIENTO,'DD/MM/YYYY') AND
A.SEXO=B.SEXO
)
WHEN MATCHED THEN
UPDATE SET DGP2011='1'
WHEN NOT MATCHED THEN
INSERT
(
A.FOLIO_RELACIONADO,
A.CVE_PROGRAMA,
A.FECHA_ALTA,
A.PRIMER_APELLIDO,
A.SEGUNDO_APELLIDO,
A.NOMBRE,
A.FECHA_NACIMIENTO,
A.SEXO,
A.CVE_NACIONALIDAD,
A.CVE_ENTIDAD_NACIMIENTO,
A.CVE_GRADO_ESCOLAR,
A.CVE_GRADO_ESTUDIOS,
A.CURP,
A.CALLE,
A.NUM_EXT,
A.NUM_INT,
A.CODIGO_POSTAL,
A.ENTRE_CALLE,
A.Y_CALLE,
A.OTRA_REFERENCIA,
A.TELEFONO,
A.COLONIA,
A.LOCALIDAD,
A.CVE_MUNICIPIO,
A.CVE_ENTIDAD_FEDERATIVA,
A.CVE_CCT,
A.PRIMER_APELLIDO_C,
A.SEGUNDO_APELLIDO_C,
A.NOMBRE_C,
A.FECHA_NACIMIENTO_C,
A.SEXO_C,
A.CVE_ESTADO_CIVIL_C,
A.CVE_GRADO_ESTUDIOS_C,
A.CVE_PARENTESCO_C,
A.CURP_C,
A.CVE_TIPO_ID_OFCL_C,
A.ID_DOCTO_OFL_C,
A.CVE_NACIONALIDAD_C,
A.CVE_ENTIDAD_NACIMIENTO_C,
A.CALLE_C,
A.NUM_EXT_C,
A.NUM_INT_C,
A.CODIGO_POSTAL_C,
A.ENTRE_CALLE_C,
A.Y_CALLE_C,
A.OTRA_REFERENCIA_C,
A.TELEFONO_C,
A.COLONIA_C,
A.LOCALIDAD_C,
A.CVE_MUNICIPIO_C,
A.CVE_ENTIDAD_FEDERATIVA_C,
A.E_MAIL_C,
A.DGP2011
)
VALUES
(
B.FOLIO_RELACIONADO,
B.CVE_PROGRAMA,
B.FECHA_ALTA,
B.PRIMER_APELLIDO,
B.SEGUNDO_APELLIDO,
B.NOMBRE,
TO_CHAR(B.FECHA_NACIMIENTO,'DD/MM/YYYY'),
B.SEXO,
B.CVE_NACIONALIDAD,
B.CVE_ENTIDAD_NACIMIENTO,
B.CVE_GRADO_ESCOLAR,
B.CVE_GRADO_ESTUDIOS,
B.CURP,
B.CALLE,
B.NUM_EXT,
B.NUM_INT,
B.CODIGO_POSTAL,
B.ENTRE_CALLE,
B.Y_CALLE,
B.OTRA_REFERENCIA,
B.TELEFONO,
B.COLONIA,
B.LOCALIDAD,
B.CVE_MUNICIPIO,
B.CVE_ENTIDAD_FEDERATIVA,
B.CVE_CCT,
B.PRIMER_APELLIDO_C,
B.SEGUNDO_APELLIDO_C,
B.NOMBRE_C,
TO_CHAR(B.FECHA_NACIMIENTO_C,'DD/MM/YYYY'),
B.SEXO_C,
B.CVE_ESTADO_CIVIL_C,
B.CVE_GRADO_ESTUDIOS_C,
B.CVE_PARENTESCO_C,
B.CURP_C,
B.CVE_TIPO_ID_OFCL_C,
B.ID_DOCTO_OFL_C,
B.CVE_NACIONALIDAD_C,
B.CVE_ENTIDAD_NACIMIENTO_C,
B.CALLE_C,
B.NUM_EXT_C,
B.NUM_INT_C,
B.CODIGO_POSTAL_C,
B.ENTRE_CALLE_C,
B.Y_CALLE_C,
B.OTRA_REFERENCIA_C,
B.TELEFONO_C,
B.COLONIA_C,
B.LOCALIDAD_C,
B.CVE_MUNICIPIO_C,
B.CVE_ENTIDAD_FEDERATIVA_C,
B.E_MAIL_C,
'1'
) LOG ERRORS INTO ELOG_SEGURO_ESCOLAR REJECT LIMIT UNLIMITED;
and it just raises the error "ORA-01722: invalid number", and Toad highlights the 'A.' part of the query.
Now, about the tables:
Table A has all of its fields as VARCHAR2(4000).
Tables B to D are typed according to the data they hold (DATE, NUMBER, etc.).
The thing is, even with the error logging clause it raises the error and doesn't merge anything! Plus, I have no idea what I should be looking for to find the 'invalid number' field.
Any advice would be deeply appreciated.
Found it!
It was the TO_CHAR(A.FECHA_NACIMIENTO,'DD/MM/YYYY') line. I just changed it to
A.FECHA_NACIMIENTO=B.FECHA_NACIMIENTO
and it worked. Thanks anyway!
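My understanding of why (an assumption on my part, but it matches the symptom): A.FECHA_NACIMIENTO is a VARCHAR2, and Oracle has no TO_CHAR(varchar2, format) overload, so it resolves the call as TO_CHAR(number, format) and first implicitly converts the string to a number, which raises ORA-01722 for non-numeric text. A tiny reproduction:

-- TO_CHAR with a format mask on a string forces an implicit TO_NUMBER first,
-- which fails for text like a formatted date.
SELECT TO_CHAR('12/05/1998', 'DD/MM/YYYY') FROM DUAL;   -- ORA-01722: invalid number
-- If that side ever needs normalizing again, converting the string to a DATE
-- is the safer direction (hypothetical rewrite of the ON-clause predicate):
-- TO_DATE(A.FECHA_NACIMIENTO, 'DD/MM/YYYY') = TRUNC(B.FECHA_NACIMIENTO)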