More efficient way of reading data from two tables and writing it to a new one using Spring Batch

I'm trying to write a Spring Batch job that moves data from two tables into a single table. I've thought of several ways to solve the problem below, but I'm wondering whether there is a more efficient solution.
Basically, I have two tables, let's call them table A and table B, with the following structure:
table A
column 1A   column 2A
=========   =========
bmw         123555
nissan      123456777
audi        12888
toyota      9800765
kia         85834945

table B
column 1B   column 2B
=========   =========
12          caraudi
123456      carnissan
123         carbmw
0125        carvvv
88963       carbbn
What I'm trying to do is create table C from the batch's writer. It should hold all the data from table B (column 1B and column 2B) plus column 1A only, without losing any data from either table and without writing duplicate rows based on column 2A and column 1B. Tables A and B have only one column in common (column 1B == column 2A), but every value in column 2A has a 3-digit suffix appended, so a join would have to compare them using a substring function, and that will be very slow because my tables are huge.
The other solution I thought of is to have one reader for table A that writes all its rows to a tempA table without the suffix, and then another reader that compares tables tempA and B and writes the data to table C as follows:
table C
column 1A (nullable, because not every value in column 2A exists in column 1B)
column 1B
column 2B
So the table will look like this:
table C
column 1c   column 2c   column 3c
=========   =========   =========
12          caraudi     audi
123456      carnissan   nissan
123         carbmw      bmw
0125        carvvv
88963       carbbn
9800765                 toyota
85834945                kia
Is this the best way to solve the problem, or is there a more efficient way?
Thanks in advance!

Before giving up on a LEFT OUTER JOIN from tableA to tableB (or a FULL OUTER JOIN if your query conditions require it) consider using db2expln or the Visual Explain utility in IBM Data Studio to determine the cost of some alternative ways to perform a "begins with" match on VARCHAR columns:
ON a.col2a LIKE b.col1b || '___'
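ON a.col2a >= b.col1b || '000' AND a.col2a <= b.col1b || '999' AND LENGTH( a.col2a ) = LENGTH( b.col1b ) + 3
(The LENGTH check matters: without it, the range built from '123' would also match '123456777'.)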
If 1b is a CHAR column, you might need to trim off its trailing spaces before concatenating additional characters to it: RTRIM( b.col1b ) || '000'
Assuming column 1b is indexed, one prefix-based matching predicate or another is bound to make a join between those two tables less expensive than creating, populating, and joining to your own temp table. If I'm wrong (or there are other complicating factors) and a temp table ends up being the best option, be sure to use a declared global temporary table (DGTT) so you can avoid the logging overhead of populating it.
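To check the prefix-matching idea on the question's sample data, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for DB2, so it shows correctness only and says nothing about DB2 costs. The table names table_a and table_b and the lowercase column names are hypothetical, and it assumes every col2a value is exactly a col1b value plus a three-digit suffix. Table C is built as a LEFT OUTER JOIN from B to A plus the A rows that have no match, because SQLite only gained FULL OUTER JOIN in version 3.39. The range variant includes a length check; without it, the range built from '123' would also match '123456777'.

```python
import sqlite3

# Assumption: col2a = col1b followed by exactly three suffix digits.
# The predicates are trusted constants spliced into SQL, never user input.
LIKE_PRED = "a.col2a LIKE b.col1b || '___'"
RANGE_PRED = ("a.col2a >= b.col1b || '000' AND a.col2a <= b.col1b || '999' "
              "AND LENGTH(a.col2a) = LENGTH(b.col1b) + 3")


def load_sample(conn):
    """Create hypothetical table_a/table_b holding the question's sample rows."""
    conn.execute("CREATE TABLE table_a (col1a TEXT, col2a TEXT)")
    conn.execute("CREATE TABLE table_b (col1b TEXT, col2b TEXT)")
    conn.executemany("INSERT INTO table_a VALUES (?, ?)", [
        ("bmw", "123555"), ("nissan", "123456777"), ("audi", "12888"),
        ("toyota", "9800765"), ("kia", "85834945")])
    conn.executemany("INSERT INTO table_b VALUES (?, ?)", [
        ("12", "caraudi"), ("123456", "carnissan"), ("123", "carbmw"),
        ("0125", "carvvv"), ("88963", "carbbn")])


def build_table_c(conn, pred):
    """Return table C rows as (col1c, col2c, col3c) tuples.

    First part: every B row, with the matching A.col1a or NULL.
    Second part: A rows that match no B row (an anti-join).
    """
    sql = f"""
        SELECT b.col1b, b.col2b, a.col1a
        FROM table_b b
        LEFT OUTER JOIN table_a a ON {pred}
        UNION ALL
        SELECT a.col2a, NULL, a.col1a
        FROM table_a a
        WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE {pred})
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in build_table_c(conn, LIKE_PRED):
        print(row)
```

Both predicates produce the same seven rows as the question's table C. Running the range predicate without the LENGTH condition returns extra, wrong matches instead, for example 123 paired with nissan.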

Related

Power BI: count rows for all tables in one measure

In Power BI I would like to count the rows of all my tables and get this output:
Table Name   Row count
Table1       126
Table2       985
Table3       998
...          ...
As long as I have only a few tables I can do:
NEWTABLE = UNION(
ROW("TableName", "Table1", "Rowcount", COUNTROWS(Table1)),
ROW("TableName", "Table2", "Rowcount", COUNTROWS(Table2)),
...
)
But this gets complicated when I have many tables.
Is there a way to do it, like a loop or something?
Thank you
If you only need the metrics, you can use DAX Studio -> View Metrics,
where the cardinality is your row count.
If you need something more, you can get all the table names from a DMV:
select * from $SYSTEM.TMSCHEMA_TABLES
Load this as another table in your model, and use M (Power Query) to loop through it.
Here is a useful example:
https://community.powerbi.com/t5/Power-Query/Power-query-Counting-rows-from-all-table-in-query-editor-but-not/td-p/1198489
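The DMV-then-loop idea can be illustrated outside Power BI. As a rough analogue only (this is not DAX or M code), the Python sketch below uses SQLite's sqlite_master catalog in place of $SYSTEM.TMSCHEMA_TABLES: it reads the table names from the catalog and then counts the rows of each one, producing the same name/row-count pairs as the UNION of ROW calls in the question. The table names Table1 and Table2 are sample data.

```python
import sqlite3


def row_counts(conn):
    """Return [(table_name, row_count), ...] for every user table.

    sqlite_master stands in for Power BI's $SYSTEM.TMSCHEMA_TABLES DMV.
    """
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    result = []
    for name in names:  # the "loop" over the table list
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + name.replace('"', '""') + '"'
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
        result.append((name, count))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (x INTEGER)")
    conn.execute("CREATE TABLE Table2 (x INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (2,), (3,)])
    conn.execute("INSERT INTO Table2 VALUES (1)")
    print(row_counts(conn))  # [('Table1', 3), ('Table2', 1)]
```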

HiveQL: split a column into 2 columns and rank each column

select c1,c2,c3,c4,c5,c6
from table
where c5 in ('a', 'b')
From here, I want to split column c5 into two columns and then rank them based on their values in c6. One column should contain all the a results, and the other all the b results. I have been able to rank them using rank() over, but I haven't been able to split the columns apart, and I haven't managed to get the techniques other people have used to work for me.
select c1,c2,c3,c4,c5,c6, rank() over (partition by ... order by case when c5='a' then 1 when c5='b' then 2 end) as rnk from table where c5 in ('a', 'b');
I don't fully understand what c5 contains. Replace conditions such as when c5='a' in the CASE expression with your own.
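One reading of "split c5 into two columns and rank each" is: put c6 into an a column or a b column depending on c5, and rank rows within their own group with PARTITION BY c5. Below is a sketch of that reading, run through Python's sqlite3 (window functions need SQLite 3.25 or later); the table name t, the ordering by c6 descending, and the sample rows are assumptions. CASE and RANK() OVER are also available in HiveQL, so the query should carry over with little change, but it has not been tested on Hive.

```python
import sqlite3

# Hypothetical table "t"; c2..c4 are left out to keep the example short.
QUERY = """
    SELECT c1, c5,
           CASE WHEN c5 = 'a' THEN c6 END AS a_value,
           CASE WHEN c5 = 'b' THEN c6 END AS b_value,
           RANK() OVER (PARTITION BY c5 ORDER BY c6 DESC) AS rnk
    FROM t
    WHERE c5 IN ('a', 'b')
    ORDER BY c5, rnk
"""


def split_and_rank(conn):
    """Return (c1, c5, a_value, b_value, rnk) rows; rank restarts per c5 group."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (c1 TEXT, c5 TEXT, c6 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
        ("x", "a", 10), ("y", "a", 30), ("z", "b", 5), ("w", "b", 7),
        ("v", "c", 1)])  # the "c" row is filtered out by the WHERE clause
    for row in split_and_rank(conn):
        print(row)
```

Each output row fills only one of a_value and b_value. If you need the two ranked lists side by side on the same row instead, one option would be to join the a rows to the b rows on rnk.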

How to filter rows in a table based on values in another table in Power Query

I have two tables in Power Query.
Price table
Date Company Price
01/01/2000 A 10
01/02/2000 A 12
01/03/2000 A 15
01/01/2000 B 15
01/02/2000 B 85
01/03/2000 B 98
Size table
Date Company Size
01/06/2000 A 10
01/06/2001 A 12
01/06/2002 A 15
01/06/2000 B 15
01/06/2001 B 85
01/06/2002 B 98
In the Price table, I only want companies that also appear in the Size table. In other words, if company C is not in the Size table, I don't need that company's data points in the Price table. The date doesn't need to be considered here.
In Power Query you can use the Merge Queries function to achieve that (in the Home --> Combine section of the ribbon).
Select the Join Kind to determine which rows to keep.
In your example, create a query from the 2nd table and apply the following steps:
Remove the Date and Size columns
Remove duplicates
Afterwards, merge the first table with the newly created query using an inner join (only matching entries are kept).
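The remove-columns, remove-duplicates, inner-join steps amount to a semi-join: keep a Price row only if its Company appears somewhere in the Size table. A minimal Python sketch of that logic (not Power Query M), with rows modeled as (date, company, value) tuples; company C is a hypothetical extra row added to show the filtering.

```python
def filter_prices(prices, sizes):
    """Keep only Price rows whose Company also appears in the Size table.

    Both arguments are lists of (date, company, value) tuples.
    """
    # Like "Remove the Date and Size columns" + "Remove duplicates":
    # a set of distinct company names.
    companies = {company for _date, company, _size in sizes}
    # Like the inner join: a Price row survives only if its company matches.
    return [row for row in prices if row[1] in companies]


if __name__ == "__main__":
    prices = [("01/01/2000", "A", 10), ("01/01/2000", "B", 15),
              ("01/01/2000", "C", 7)]  # company C is hypothetical
    sizes = [("01/06/2000", "A", 10), ("01/06/2001", "A", 12),
             ("01/06/2000", "B", 15)]
    print(filter_prices(prices, sizes))
```

Removing duplicates before the join is what keeps each Price row from being repeated once per matching Size row.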

Talend: ignore a row if all columns except the first have no value

I have the following table:
date c1 c2 ... cn
01/01 2 3 ... 4
01/02 ...
01/03 ...
What is the easiest way to filter out the rows where all except the date column have no value? (in this example, the rows with date 01/02 and 01/03)
The easiest way is to set up an input component and change its schema slightly: declare in the schema definition that a value is mandatory, and specify that records violating this should be ignored.
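Whichever Talend component ends up enforcing it, the rule itself is "drop the row if every column after the date is empty". A minimal Python sketch of that condition, assuming "empty" means None or a blank string (an assumption about how your input represents missing values); the 01/04 row is hypothetical.

```python
def drop_rows_without_values(rows):
    """Drop rows in which every column after the first (the date) is empty."""
    def is_empty(value):
        # Treat None and blank/whitespace-only strings as "no value".
        return value is None or (isinstance(value, str) and value.strip() == "")

    return [row for row in rows if not all(is_empty(v) for v in row[1:])]


if __name__ == "__main__":
    rows = [("01/01", "2", "3", "4"),
            ("01/02", "", "", ""),
            ("01/03", None, None, None),
            ("01/04", "", "5", "")]  # hypothetical partially filled row
    print(drop_rows_without_values(rows))
```

A row where only some columns are empty, such as the 01/04 row, is kept.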

How to know if a record DOESN'T exist in a table in Oracle

I've been dealing with this for a couple of hours and I can't find a way to get the answer.
I have a table with a maximum of 4 records per product (let's call it that) for a given period (a numeric column). I'm trying to return the ones that DO NOT have a particular CONSUMPTION_TYPE_ID, but it doesn't work.
To keep it simple: the table has these fields (there are more, but these are enough):
product_id - CONSUMPTION_TYPE_ID - consumption_period
123 103 1
123 104 1
123 107 1
123 108 1
I need to return the products that don't have one particular type of consumption. Say type 107 is missing (the row doesn't exist): the select query should show the other 3, or whichever ones are present. I don't mind running the same select 4 times; I could also write a cursor and loop to check each one. The point is that queries using "not in" or "not exists" don't work. They give me a result like the one below, but when I query the "consumption_period", the missing "CONSUMPTION_TYPE_ID" still shows up, because the "not in" clause is only hiding the results.
This is what I need:
select * from t1 where CONSUMPTION_TYPE_ID != 108;
product_id - CONSUMPTION_TYPE_ID - consumption_period
123 103 1
123 104 1
123 107 1
I hope you can help me with this. I'm stuck; it may be simple, but I'm having one of those stuck moments. Thanks in advance for any help.
You probably should've posted that NOT EXISTS query that doesn't work, because that is the right way to do this.
If I got your requirements right: all products that do not have a record for a specific consumption_type_id.
SELECT DISTINCT t.product_id
FROM t1 t
WHERE NOT EXISTS
(SELECT 1 FROM t1 s
WHERE s.product_id = t.product_id
AND s.consumption_type_id = ?)
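To see the NOT EXISTS query behave as described, here is a sketch that runs it through Python's sqlite3 instead of Oracle; the sample rows, including the second product 456, are made up for illustration. Product 123 lacks type 108, so binding 108 to the ? placeholder returns only 123.

```python
import sqlite3

QUERY = """
    SELECT DISTINCT t.product_id
    FROM t1 t
    WHERE NOT EXISTS
      (SELECT 1 FROM t1 s
       WHERE s.product_id = t.product_id
         AND s.consumption_type_id = ?)
    ORDER BY t.product_id
"""


def products_missing_type(conn, type_id):
    """Product ids that have no row at all for the given consumption type."""
    return [r[0] for r in conn.execute(QUERY, (type_id,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (product_id INTEGER, "
                 "consumption_type_id INTEGER, consumption_period INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", [
        (123, 103, 1), (123, 104, 1), (123, 107, 1),
        (456, 103, 1), (456, 104, 1), (456, 107, 1), (456, 108, 1)])
    print(products_missing_type(conn, 108))  # [123]
```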
The obvious answer here is to search for CONSUMPTION_TYPE_ID = 108 and have the surrounding code check for a lack of rows, rather than the existence of rows.
If you really need a row returned for each consumption_type_id that isn't in this table, then you should probably select from the lookup table for consumption_type_id:
select *
from consumption_type ct
where not exists (select *
from t1
where t1.consumption_type_id = ct.consumption_type_id)
and ct.consumption_type_id = 108
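The lookup-table variant can be sketched the same way, again with sqlite3 standing in for Oracle and hypothetical sample rows (type 110 is made up). The optional type_id parameter plays the role of the ct.consumption_type_id = 108 filter; leaving it out lists every type that no row in t1 uses.

```python
import sqlite3


def missing_types(conn, type_id=None):
    """Consumption types from the lookup table that no row in t1 uses.

    type_id=None lists every missing type; otherwise only that type is checked.
    """
    sql = """
        SELECT ct.consumption_type_id
        FROM consumption_type ct
        WHERE NOT EXISTS (SELECT * FROM t1
                          WHERE t1.consumption_type_id = ct.consumption_type_id)
    """
    params = ()
    if type_id is not None:
        sql += " AND ct.consumption_type_id = ?"
        params = (type_id,)
    sql += " ORDER BY ct.consumption_type_id"
    return [r[0] for r in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE consumption_type (consumption_type_id INTEGER)")
    conn.execute("CREATE TABLE t1 (product_id INTEGER, consumption_type_id INTEGER)")
    conn.executemany("INSERT INTO consumption_type VALUES (?)",
                     [(103,), (104,), (107,), (108,), (110,)])
    conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                     [(123, 103), (123, 104), (123, 107)])
    print(missing_types(conn, 108))  # [108]
    print(missing_types(conn))       # [108, 110]
```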
