How do I check for null column values with assertionMode = DEFAULT in DBUnit? - dbunit

I have a table with 2 rows and 34 columns, 3 of which should be null. With assertionMode = NON_STRICT, I can specify only the 34-3=31 non-null columns in my after.xml, and DBUnit understands that the 3 missing columns are supposed to be null: it checks that the 2 rows and 34 columns in the table match the 2 rows and 31 columns in my after.xml, and any columns I did not specify in after.xml (the 3 null ones) are assumed to be null, which DBUnit verifies as well.
NON_STRICT no longer works, though, if I want to ensure that I have exactly 2 rows, no more and no less (for example, if the table does contain the 2 rows specified in after.xml, the test passes regardless of whether there are additional rows).
So I figured I should use assertionMode = DEFAULT. When I do this, I get an error along the lines of "expected columns is 34, but found 31 in table". It only found 31 because I only specified 31 in my after.xml; the other 3 are null, so I left them out per the DBUnit docs. DEFAULT seems to require all 34 columns to be specified in my after.xml regardless of whether they are null. Am I missing something?
I have tried roundabout methods, like querying only the 31 non-null columns (instead of all 34) and comparing the result of that query to my after.xml, or simply counting the rows and asserting that there are 2. But those feel like workarounds.
Is there a way to specify columns as null in DBUnit other than leaving them out in my after.xml since DEFAULT doesn't seem to like that?
What is the proper way to specify columns as null in DBUnit with assertionMode = DEFAULT?

I have tried roundabout methods,... am I missing something?
To have dbUnit ignore columns from comparison, use an excluded columns table:
ITable filteredTable = DefaultColumnFilter.excludedColumnsTable(actualTable, excludeColumns);
See Ignoring some columns in comparison
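As a rough sketch of how that filtered comparison might be wired into the assertion (the table name, the excluded column names, and the connection/expectedDataSet variables are placeholders, not from the original post; connection is an IDatabaseConnection and expectedDataSet is the dataset built from after.xml):

import org.dbunit.Assertion;
import org.dbunit.dataset.ITable;
import org.dbunit.dataset.filter.DefaultColumnFilter;

// Columns expected to be null, excluded from the comparison (placeholder names)
String[] excludeColumns = {"COL_A", "COL_B", "COL_C"};
ITable actualTable = connection.createDataSet().getTable("MY_TABLE");
ITable filteredTable = DefaultColumnFilter.excludedColumnsTable(actualTable, excludeColumns);
ITable expectedTable = expectedDataSet.getTable("MY_TABLE");
Assertion.assertEquals(expectedTable, filteredTable);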
Is there a way to specify columns as null in DBUnit other than leaving them out in my after.xml since DEFAULT doesn't seem to like that?
To have dbUnit insert nulls into database tables, use the ReplacementDataSet and set up a null replacement object, e.g. [NULL]. See ReplacementDataSet.
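A minimal sketch of that replacement, assuming a flat-XML after.xml and "[NULL]" as the placeholder string (both are conventions you choose, not dbUnit defaults):

import java.io.File;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.ReplacementDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;

// Wrap the expected dataset so every "[NULL]" string is replaced by a real SQL NULL
IDataSet rawDataSet = new FlatXmlDataSetBuilder().build(new File("after.xml"));
ReplacementDataSet expectedDataSet = new ReplacementDataSet(rawDataSet);
expectedDataSet.addReplacementObject("[NULL]", null);

In after.xml the nullable columns can then be written out explicitly, e.g. SOME_COLUMN="[NULL]", so all 34 columns are present even in DEFAULT mode.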

Related

Import massive table from Oracle to PostgreSQL with oracle-fdw return ORA-01406

I am working on a project to transfer data from an Oracle database to a PostgreSQL database to build a data warehouse with bash & SQL scripts. To access the Oracle database, I use the PostgreSQL extension oracle-fdw.
One of my scripts imports data from a massive table (~100,000,000 new rows/day). This table is partitioned and each partition contains 1 day of data. The query I use to import data looks like this:
INSERT INTO postgre_target_table (some_fields)
SELECT some_aggregated_fields -- (~150 fields)
FROM oracle_source_table
WHERE partition_id = :v_partition_id AND some_others_filters
GROUP BY primary_key;
On the DEV server, the query works fine (there is much less data on that server), but on PREPROD it returns the error ORA-01406: fetched column value was truncated.
In some posts, people say that the output fields may be too small, but I get the same error even with a simple SELECT query, without the INSERT or the GROUP BY.
Another idea I found in another post is to create a view on the Oracle side, but my query uses multiple parameters that I cannot use in a view.
The last idea I found is to create an Oracle stored procedure that fills a table with the aggregated data and then import from that table, but the Oracle database is critical and my customer prefers to avoid adding more data to it.
Now, I'm starting to think there's no solution and it's not good...
PostgreSQL version: 12.4 / Oracle version: 11.2
UPDATE
It seems my problem is more complicated than I thought.
After applying the modification suggested by Laurenz Albe, the query runs correctly in pgAdmin, but the problem still appears when I use the psql command.
Moreover, another query seems to have the same problem. This other query does not use the same source table as the first one; it uses 4 joined tables without any partitioning. The common point between these queries is their structure.
The detail I omitted in the original post is that the purpose of both queries is to pivot a table. They look like this:
SELECT osr.id,
       MIN(CASE osr.category WHEN 123 THEN 1 END) AS field1,
       MIN(CASE osr.category WHEN 264 THEN 1 END) AS field2,
       MIN(CASE osr.category WHEN 975 THEN 1 END) AS field3,
       ...
FROM oracle_source_table osr
WHERE osr.category IN (123, 264, 975, ...)
GROUP BY osr.id;
Now that I have detailed what the queries look like, here are some results I got with the second one without changing the value of max_long (this query is lighter than the first one):
Sometimes it works (~10%) and sometimes it fails (~90%) in pgAdmin, but it never works with the psql command.
If I delete the WHERE clause, it always works.
I don't understand why removing the WHERE changes anything: the field used in this clause is a NUMBER(6, 0) between 0 and 2500, and it is still used in the SELECT clause... Also, in the 4 Oracle tables used by this query there is no LONG datatype, only NUMBER.
Among the 20 queries I have, only these two have a problem; their structure is similar and I don't believe in coincidences.
Don't despair!
Set the max_long option on the foreign table big enough that all your oversized data fit.
The documentation has the details:
max_long (optional, defaults to "32767")
The maximal length of any LONG, LONG RAW and XMLTYPE columns in the Oracle table. Possible values are integers between 1 and 1073741823 (the maximal size of a bytea in PostgreSQL). This amount of memory will be allocated at least twice, so large values will consume a lot of memory.
If max_long is less than the length of the longest value retrieved, you will receive the error message
ORA-01406: fetched column value was truncated
Example:
ALTER FOREIGN TABLE my_tab OPTIONS (ADD max_long '1000000');

Oracle - Select Only Columns That Contain Data

We have a database with a vast number of tables and columns that was set up by a 3rd party.
Many of these columns are entirely unused. I am trying to create a query that returns a list of all the columns that are actually used (contain > 0 values).
My current attempt -
SELECT table_name, column_name
FROM ALL_TAB_COLUMNS
WHERE OWNER = 'XUSER'
AND num_nulls < 1
;
Using num_nulls < 1 dramatically reduces the number of returned values, as expected.
However, on inspection of some of the tables, there are columns missing from the results of the query that appear to have values in them.
Could anybody explain why this might be the case?
First of all, statistics are not always 100% accurate. They can be gathered on a subset of the table rows, since they are, after all, statistics. Just like pollsters do not have to ask every American how they feel about a given politician, Oracle can get an accurate-enough sense of the data in a table by reading only a portion of it.
Even if the statistics were gathered on 100% of the rows in a table (and they can be gathered that way, if you want), the statistics will become outdated as soon as there are any inserts, updates, or deletes on the table.
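If you do want full-scan statistics, something along these lines would refresh them (the table name is a placeholder; XUSER is the owner from your query):

-- Gather statistics from every row so NUM_NULLS and NUM_ROWS reflect the current data
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'XUSER',
    tabname          => 'SOME_TABLE',
    estimate_percent => 100);
END;
/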
Second, num_nulls < 1 isn't the right filter for finding columns that contain data. Imagine a table with 100 rows and "column X" having num_nulls equal to 80. That would imply the column has 20 non-null values, yet it would NOT pass your filter. A better approach (if you trust that your statistics are not stale and were based on a 100% sample of the rows) might be to compare DBA_TAB_COLUMNS.NUM_NULLS < DBA_TABLES.NUM_ROWS. For example, a column that has 99 nulls in a 100-row table has data in 1 row.
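A sketch of that comparison, assuming current statistics and access to the DBA views (swap in the ALL_ or USER_ views if you lack the privileges):

-- Columns whose statistics show at least one non-null value
SELECT c.table_name, c.column_name
FROM   dba_tab_columns c
JOIN   dba_tables t
       ON  t.owner = c.owner
       AND t.table_name = c.table_name
WHERE  c.owner = 'XUSER'
AND    c.num_nulls < t.num_rows;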
"there are columns missing from the results of the query that appear to have values in them."
Potentially every non-mandatory column could appear in this set, because any column that has values in some rows but not in all of them will have num_nulls greater than zero and therefore won't pass your num_nulls < 1 test.
So maybe you should search for columns which aren't in use. This query will find columns where every row is null:
select t.table_name
, tc.column_name
from user_tables t
join user_tab_cols tc on t.table_name = tc.table_name
where t.num_rows > 0
and t.num_rows = tc.num_nulls;
Note that if you are using Partitioning you will need to scan user_tab_partitions.num_rows and user_part_col_statistics.num_nulls.
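A sketch of that partitioned variant (untested, and again dependent on the statistics being current):

-- A column is a candidate for "unused" only if it is null in every partition
select tp.table_name
,      pcs.column_name
from   user_tab_partitions tp
join   user_part_col_statistics pcs
       on  pcs.table_name = tp.table_name
       and pcs.partition_name = tp.partition_name
group by tp.table_name, pcs.column_name
having sum(tp.num_rows) > 0
   and sum(tp.num_rows) = sum(pcs.num_nulls);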
Also, I second the advice others have given regarding statistics. The above query may throw out some false positives, so I would treat its results as a list of candidates to be investigated further. For instance, you could generate queries which count the actual number of nulls in each column.
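A sketch of that verification step, reusing the candidates query above: it generates one statement per candidate column, and since COUNT(column) counts only non-null values, a result of 0 confirms the column really is empty.

select 'SELECT COUNT(' || tc.column_name || ') AS non_null_count FROM ' || t.table_name || ';' as check_sql
from   user_tables t
join   user_tab_cols tc on t.table_name = tc.table_name
where  t.num_rows > 0
and    t.num_rows = tc.num_nulls;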

How MAX of a concatenated column in oracle works?

In Oracle, I have a question about concatenating two NUMBER columns and then taking the MAX of the result.
That is, with columns A and B both of the NUMBER data type:
Select MAX(A||B) from table
Table data:
A          B
20150501   95906
20150501   161938
When I run the query Select MAX(A||B) from table:
O/P - 2015050195906
Shouldn't 20150501161938 ideally be the output?
When I format column B with TO_CHAR(B,'FM000000') and execute, I get the expected output:
Select MAX(A || TO_CHAR(B,'FM000000')) FROM table
O/P - 20150501161938
Why is 2015050195906 considered the MAX in the first case?
Presumably, column A is a date and column B is a time.
If that's true, treat them as such:
select max(to_date(to_char(a)||to_char(b,'FM000000'),'YYYYMMDDHH24MISS')) from your_table;
That will add leading zeros to the time component (if necessary), then concatenate the columns into a string, which is passed to the to_date function; the max function then operates on a DATE datatype, which is presumably what you want.
PS: The real solution here is to fix your data model. Don't store dates and times as numbers. In addition to sorting issues like this, the optimizer can get confused. (If you store a date as a number, how can the optimizer know that '20141231' is immediately followed by '20150101'?)
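A rough sketch of that fix (your_table comes from the query above; the event_ts column name is a placeholder):

-- Store the value once, as a real DATE, instead of two NUMBER columns
ALTER TABLE your_table ADD (event_ts DATE);

UPDATE your_table
SET    event_ts = TO_DATE(TO_CHAR(a) || TO_CHAR(b, 'FM000000'), 'YYYYMMDDHH24MISS');

-- MAX then compares real dates, not strings or numbers
SELECT MAX(event_ts) FROM your_table;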
You should convert to a number:
select MAX(TO_NUMBER(A||B)) from table
Concatenation produces character/text output. As such, it sorts alphabetically, so '95906' sorts after '161938'.
In the second case, you are specifying a format that pads the number to six digits. That works well, because '095906' now sorts before '161938'.

Select only Columns without Null values in Oracle

I have 20 columns in my table....
How can I select only the columns that don't have null values?
col1   col2   col3
20     12     null

Desired output:

col1   col2
20     12
The semantics of SQL don't allow this - every SQL query includes a projection, by which you specify what columns you want in the output.
Unless you run the query twice, you can't know ahead of time what the results will be. In fact, even if you run the query twice, the results may change in between (unless you run it in serializable mode).
In other words, the question doesn't make a lot of sense.
On the other hand, if your requirement is to simply hide the column when displayed to the user, that's an entirely different question - one for which the answer does not lie in SQL, but in your presentation logic.
You can go to the table's metadata, check which columns are defined as NOT NULL, and create a select query with only those columns.
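A sketch of that metadata lookup (the table name is a placeholder); note that this finds columns declared NOT NULL, not columns that merely happen to contain no nulls today:

SELECT column_name
FROM   user_tab_columns
WHERE  table_name = 'YOUR_TABLE'
AND    nullable = 'N';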

update dynamic lookup value

How can I update a value in a dynamic lookup in the following manner?
Say I have an integer column in the lookup, and on the next lookup I get a new integer that should be added to the existing value.
How can I achieve this?
For example, the table starts empty (emp_id | value). The incoming rows and expected results are:

row    incoming (emp_id, value)    expected action    resulting emp_id | value
1)     101, 1000                   insert             101 | 1000
2)     101, 1005                   update             101 | 2005
3)     101, -300                   update             101 | 1705
I hope my example is clear
There is an option within the "Properties" tab of the lookup transformation that will give you what you need.
However, if set, this can cause performance degradation within the mapping, and it is something I generally try to avoid. I am usually able to avoid setting this parameter by including more logic in the Source Qualifier SQL override, e.g. in your example, by ensuring I only return the most recent record for each employee.
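As a sketch of the kind of Source Qualifier override being described (the table and column names are hypothetical, and it assumes an ordering column such as a load timestamp exists):

-- Return only the most recent incoming record per employee
SELECT emp_id, value
FROM  (SELECT emp_id,
              value,
              ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY load_ts DESC) AS rn
       FROM   src_emp_values)
WHERE rn = 1;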

Resources