How to merge two tables with the same schema in Talend while avoiding duplicates?

I have two tables, TableA and TableB.
TableA looks similar to the following:
customerId | name | email          | telephone
-----------------------------------------------
00001      | Anne | anne@gmail.com | 123456
00002      | Ben  | ben@gmail.com  |
00003      | Ryan | ryan@yahoo.com |
TableB looks similar to the following:
customerId | name | email          | telephone
-----------------------------------------------
76105      | Anne | anne@gmail.com |
89102      | Ben  | ben@gmail.com  | 567890
23390      | Ryan | ryan@yahoo.com | 756541
43769      | Abby | abby@yahoo.com | 890437
I'm trying to produce the following two tables.
TableC
customerId | name | email          | telephone
-----------------------------------------------
00001      | Anne | anne@gmail.com | 123456
00002      | Ben  | ben@gmail.com  | 567890
00003      | Ryan | ryan@yahoo.com | 756541
TableD
customerId | name | email          | telephone
-----------------------------------------------
43769      | Abby | abby@yahoo.com | 890437
I was using a tMap with TableA as the main flow and TableB as the lookup. In the tMap I created an inner join between TableA and TableB using email as the join key. I wrote the inner-join output to one table and the inner-join rejects to another. However, I find that some of the records are missing from TableC.
What is the correct way to achieve this in Talend DI?

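The choice of main and lookup flow determines which rows can be caught as rejects: a tMap inner-join reject output only receives rows from the main flow that found no match in the lookup, so rows that exist only in the lookup table never show up there. Here is what you need:
tMap: (screenshot of the tMap configuration not preserved)
tFixedFlowInput: to simulate your data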
tLogRow: to display output data
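Because the tMap screenshot is lost, here is a minimal Python sketch (not Talend code) of the join logic that produces TableC and TableD from the sample data. It assumes TableB is the main flow and TableA the lookup, that email is a unique join key, and that TableA's telephone wins over TableB's when both are present. Under this design, customers that exist only in TableA would be dropped.

```python
# Sketch only: emulates a tMap whose MAIN flow is TableB and whose LOOKUP is
# TableA, inner-joined on email, with a reject output for unmatched main rows.
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Sample rows transcribed from the question ("" = empty telephone).
TABLE_A: List[Row] = [
    {"customerId": "00001", "name": "Anne", "email": "anne@gmail.com", "telephone": "123456"},
    {"customerId": "00002", "name": "Ben", "email": "ben@gmail.com", "telephone": ""},
    {"customerId": "00003", "name": "Ryan", "email": "ryan@yahoo.com", "telephone": ""},
]
TABLE_B: List[Row] = [
    {"customerId": "76105", "name": "Anne", "email": "anne@gmail.com", "telephone": ""},
    {"customerId": "89102", "name": "Ben", "email": "ben@gmail.com", "telephone": "567890"},
    {"customerId": "23390", "name": "Ryan", "email": "ryan@yahoo.com", "telephone": "756541"},
    {"customerId": "43769", "name": "Abby", "email": "abby@yahoo.com", "telephone": "890437"},
]


def merge_tables(table_a: List[Row], table_b: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (TableC, TableD): matched rows and main-flow rejects."""
    lookup = {row["email"]: row for row in table_a}  # lookup flow keyed by email
    table_c: List[Row] = []
    table_d: List[Row] = []
    for b in table_b:  # iterate the main flow
        a = lookup.get(b["email"])
        if a is None:
            table_d.append(dict(b))  # inner-join reject: no TableA match
            continue
        table_c.append({
            "customerId": a["customerId"],  # keep TableA's id
            "name": a["name"],
            "email": a["email"],
            # prefer TableA's telephone; fall back to TableB's when it is empty
            "telephone": a["telephone"] or b["telephone"],
        })
    return table_c, table_d
```

In the tMap itself, the fallback would be an expression on the telephone output column, and TableD would be the output configured to catch lookup inner-join rejects.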

Related

Oracle SQL: Display single columns from multiple rows of a single table with Logic

I would like to look up data from two rows of the same column in the same table, together, within an existing long query that has multiple joins.
Current table set-up (single table):
Table: ACCOUNT_DETAILS
| TRX_ID | TYPE | FAC_ID | ACC_ID |
| ------ | ---- | ------ | ------ |
| 1234 | CRDR | ABC123 | AB1234 |
| 1234 | DBTR | XYZ222 | XY9800 |
| 9876 | CRDR | EFG999 | EF7659 |
| 9876 | DBTR | ABC123 | AB9900 |
Expected Result:
Table: REPORT
| TRX_ID | Counterparty FAC_ID | Counterparty ACC_ID |
| ------ | ------------------- | ------------------- |
| 1234 | XYZ222 | XY9800 |
| 9876 | EFG999 | EF7659 |
Logic needed:
If FAC_ID is NOT LIKE 'ABC%', that row is the counterparty, so its FAC_ID and ACC_ID should be returned (e.g. for TRX_ID = 1234, return the DBTR FAC_ID and DBTR ACC_ID; for TRX_ID = 9876, return the CRDR FAC_ID and CRDR ACC_ID).
Example:
SELECT (CASE WHEN TYPE = 'DBTR' AND FAC_ID LIKE 'ABC%' THEN (SELECT FAC_ID FROM ACCOUNT_DETAILS WHERE TYPE = 'CRDR')
ELSE (SELECT FAC_ID FROM ACCOUNT_DETAILS WHERE TYPE = 'DBTR') END)
FROM ACCOUNT_DETAILS
I've tried options such as JOINs, UNIONs and subqueries, but none of them work. I would like to have the Counterparty FAC_ID and Counterparty ACC_ID in separate single lines in the query, because I will include them in a long query that I already have.
We can use CASE in a CTE to flag the rows where FAC_ID does not start with "ABC" and then use the flag in the WHERE clause.
See db<>fiddle here for schema.
with cte as
(
select
TRX_ID,
FAC_ID,
ACC_ID,
CASE WHEN FAC_ID LIKE 'ABC%' THEN 0 ELSE 1 END ordinal
FROM ACCOUNT_DETAILS)
SELECT
TRX_ID,
FAC_ID,
ACC_ID
FROM CTE
WHERE ordinal = 1;
TRX_ID | FAC_ID | ACC_ID
-----: | :----- | :-----
1234 | XYZ222 | XY9800
9876 | EFG999 | EF7659
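To check the CTE against the sample data without an Oracle instance, the same query can be run in an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for Oracle (its LIKE is case-insensitive for ASCII, unlike Oracle's, which makes no difference for this data), and the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3
from typing import List, Tuple

# Sample rows from the question: (TRX_ID, TYPE, FAC_ID, ACC_ID)
ROWS = [
    (1234, "CRDR", "ABC123", "AB1234"),
    (1234, "DBTR", "XYZ222", "XY9800"),
    (9876, "CRDR", "EFG999", "EF7659"),
    (9876, "DBTR", "ABC123", "AB9900"),
]

# The answer's CTE: flag rows whose FAC_ID does not start with ABC, keep those.
QUERY = """
WITH cte AS (
    SELECT TRX_ID, FAC_ID, ACC_ID,
           CASE WHEN FAC_ID LIKE 'ABC%' THEN 0 ELSE 1 END AS ordinal
    FROM ACCOUNT_DETAILS
)
SELECT TRX_ID, FAC_ID, ACC_ID
FROM cte
WHERE ordinal = 1
ORDER BY TRX_ID
"""


def counterparties() -> List[Tuple[int, str, str]]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ACCOUNT_DETAILS (TRX_ID INTEGER, TYPE TEXT, FAC_ID TEXT, ACC_ID TEXT)")
        conn.executemany("INSERT INTO ACCOUNT_DETAILS VALUES (?, ?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Calling counterparties() returns one row per TRX_ID, matching the result table above.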

Convert raw query into Laravel Eloquent

I have this written and working as a raw SQL query, but I am trying to convert it to a more Laravel Eloquent / query builder design instead of just a raw query.
My table structure looks like this:
Table One (Name model)
______________
| id | name |
|------------|
| 1 | bob |
| 2 | jane |
--------------
Table Two (Date Model)
_________________________________
| id | table_1_id | date |
|-------------------------------|
| 1 | 1 | 2000-01-01 |
| 2 | 1 | 2000-01-31 |
| 4 | 1 | 2000-02-28 |
| 5 | 1 | 2000-03-03 |
| 6 | 2 | 2000-01-03 |
| 7 | 2 | 2000-01-05 |
---------------------------------
I am returning only the highest (most recent) date per month from table 2 (Date model) for the user bob from table 1 (Name model).
For instance, with the example data above, my query returns this:
2000-01-31
2000-02-28
2000-03-03
Here is what I am doing now (which works), but I'm not sure how to use YEAR, MONTH and MAX with Laravel.
DB::select(
DB::raw("
SELECT MAX(date) as max_date
FROM table_2
INNER JOIN table_1 ON table_1.id = table_2.table_1_id
WHERE table_1.name = 'bob'
GROUP BY YEAR(date), MONTH(date)
ORDER BY max_date DESC
")
);
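The raw query keeps the latest date within each (year, month) group for one user. A minimal Python sketch of that logic, using an in-memory SQLite database as a stand-in for MySQL (this is not Laravel code): SQLite has no YEAR() or MONTH(), so the assumption here is that strftime('%Y', ...) and strftime('%m', ...) play their role. Table and column names are taken from the question.

```python
import sqlite3
from typing import List

QUERY = """
SELECT MAX(t2.date) AS max_date
FROM table_2 AS t2
INNER JOIN table_1 AS t1 ON t1.id = t2.table_1_id
WHERE t1.name = ?
GROUP BY strftime('%Y', t2.date), strftime('%m', t2.date)  -- YEAR(date), MONTH(date)
ORDER BY max_date DESC
"""


def latest_per_month(name: str) -> List[str]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_1 (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute(
            "CREATE TABLE table_2 (id INTEGER PRIMARY KEY, table_1_id INTEGER, date TEXT)")
        conn.executemany("INSERT INTO table_1 VALUES (?, ?)", [(1, "bob"), (2, "jane")])
        conn.executemany(
            "INSERT INTO table_2 VALUES (?, ?, ?)",
            [(1, 1, "2000-01-01"), (2, 1, "2000-01-31"), (4, 1, "2000-02-28"),
             (5, 1, "2000-03-03"), (6, 2, "2000-01-03"), (7, 2, "2000-01-05")],
        )
        return [row[0] for row in conn.execute(QUERY, (name,))]
    finally:
        conn.close()
```

For "bob" this yields the three dates listed above, newest first because of ORDER BY max_date DESC.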
Try this code:
DB::table('table_1')->join('table_2', 'table_1.id','=','table_2.table_1_id')
->select(DB::raw('MAX(date) as max_date'),DB::raw('YEAR(date) year, MONTH(date) month'),'table_1.name')
->where('name','bob')
->groupBy('year','month')
->orderBy('max_date', 'desc')
->get();
If you have any problem with the above code, feel free to ask.

Automatically generating documentation about the structure of the database

There is a database that contains several views and tables.
I need to create a report (documentation of the database) listing all the fields in these tables with their types and, if possible, the minimum/maximum values and the value from the first row. For example:
.------------.--------.--------.--------------.--------------.--------------.
| Table name | Column | Type | MinValue | MaxValue | FirstRow |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | day | date | ‘2010-09-17’ | ‘2016-12-10’ | ‘2016-12-10’ |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | price | double | 1030.8 | 29485.7 | 6023.8 |
:------------+--------+--------+--------------+--------------+--------------:
| … | | | | | |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | day | date | ‘2014-06-20’ | ‘2016-11-28’ | ‘2016-11-16’ |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | owner | string | NULL | NULL | ‘Joe’ |
'------------'--------'--------'--------------'--------------'--------------'
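The report above can be produced naively with one MIN/MAX query per column. The following Python sketch shows that approach; it uses SQLite purely as a stand-in for Hive (sqlite_master and PRAGMA table_info are SQLite-specific), computes MIN/MAX for every column including strings, and takes "first row" to be whatever the engine returns first without an ORDER BY. On large Hive tables each of these queries would be a full scan, which is the cost the question is concerned about.

```python
import sqlite3
from typing import Any, List, Tuple

# (table, column, type, min, max, first-row value)
ReportRow = Tuple[str, str, str, Any, Any, Any]


def describe_database(conn: sqlite3.Connection) -> List[ReportRow]:
    """Build one report row per column of every table and view."""
    report: List[ReportRow] = []
    objects = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for obj in objects:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{obj}")').fetchall()
        # No ORDER BY, so "first row" is simply the first row the engine returns
        first = conn.execute(f'SELECT * FROM "{obj}" LIMIT 1').fetchone()
        for cid, name, col_type, *_ in columns:
            lo, hi = conn.execute(
                f'SELECT MIN("{name}"), MAX("{name}") FROM "{obj}"').fetchone()
            report.append((obj, name, col_type, lo, hi, first[cid] if first else None))
    return report
```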
I think executing many queries like
SELECT MAX(column_name) as max_value, MIN(column_name) as min_value
FROM table_name
would be inefficient on the huge tables that are stored in Hadoop.
After reading the documentation, I found an article about "Statistics in Hive".
It seems I must use a statement like this:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
But this command failed with an error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
Do I understand correctly that this statement stores the statistics in the table's metadata rather than displaying the result? Will it work with views?
How can I efficiently and automatically create documentation for a database in Hive?

insert id number only in sql

I have a SQL Server table like this:
+----+-----------+------------+
| id | account | date |
+----+-----------+------------+
| | John | 2/6/2016 |
| | John | 2/6/2016 |
| | John | 4/6/2016 |
| | John | 4/6/2016 |
| | Andi | 5/6/2016 |
| | Steve | 4/6/2016 |
+----+-----------+------------+
I want to fill in the id column like this:
+-----------+-----------+------------+
| id | account | date |
+-----------+-----------+------------+
| 020616001 | John | 2/6/2016 |
| 020616002 | John | 2/6/2016 |
| 040616001 | John | 4/6/2016 |
| 040616002 | John | 4/6/2016 |
| 050616001 | Andi | 5/6/2016 |
| 040616003 | Steve | 4/6/2016 |
+-----------+-----------+------------+
I want to generate the id number from the date provided, like this: 02+06+16 (from the date)+001 = 020616001. If rows have the same date, the sequence number goes up by 1.
I have tried, but it still fails.
I want to do it in Oracle SQL Developer.
Can someone help me?
Thanks.
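The numbering rule (a ddmmyy prefix plus a three-digit counter per date) can be stated precisely in a short Python sketch. It assumes the dates are d/m/yyyy strings and that the counter follows input order; it models the rule only and is not SQL for Oracle or SQL Server.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, Iterable, List, Tuple


def generate_ids(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one id per (account, date) row, numbered per date in input order."""
    counters: DefaultDict[str, int] = defaultdict(int)
    ids: List[str] = []
    for _account, date_text in rows:
        day = datetime.strptime(date_text, "%d/%m/%Y")  # d/m/yyyy input
        prefix = day.strftime("%d%m%y")                 # e.g. 2/6/2016 -> "020616"
        counters[prefix] += 1
        # three-digit sequence; a counter above 999 would widen the id
        ids.append(f"{prefix}{counters[prefix]:03d}")
    return ids
```

Fed the six sample rows in order, this reproduces the id column of the desired table, including 040616003 for Steve.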
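Try the SQL below against the given data; it is written for SQL Server 2012. Style 103 parses the dd/mm/yyyy strings, style 3 formats the date as dd/mm/yy, and the sequence is numbered per date and padded to three digits:
select REPLACE(CONVERT(VARCHAR(8), CONVERT(date, t.[date], 103), 3), '/', '')
+ RIGHT('000' + CONVERT(VARCHAR(3), ROW_NUMBER() OVER (PARTITION BY t.[date] ORDER BY t.account)), 3) as ID,
t.account,
t.[date]
from (values ('John','2/6/2016'),
('John','2/6/2016'),
('John','4/6/2016'),
('John','4/6/2016'),
('Andi','5/6/2016'),
('Steve','4/6/2016')) t(account,[date])
To update your table, go through an updatable CTE, because ROW_NUMBER() cannot appear directly in the SET clause of an UPDATE (your_table is a placeholder for your real table name):
;with numbered as
(
select id,
REPLACE(CONVERT(VARCHAR(8), CONVERT(date, [date], 103), 3), '/', '')
+ RIGHT('000' + CONVERT(VARCHAR(3), ROW_NUMBER() OVER (PARTITION BY [date] ORDER BY account)), 3) as new_id
from your_table
)
update numbered set id = new_id;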
MySQL
I can give you the logic for the 020616001 part right now.
For incrementing the number for the same date I still have to work on it; I'll let you know after my work.
insert into table_name(id)
select concat
(
if(length (day(current_date))>1,day(current_date),Concat(0,day(current_date))),
if(length (month(current_date))>1,month(current_date),Concat(0,month(current_date))),
(right(year(current_date),2)),'001'
)as id
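Because your dates column holds dd/mm/yyyy text, a plain conversion to datetime can misread it (the default us_english setting expects the month first), so this answer extracts the day, month and year as strings instead.
Try this: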
declare @t table(account varchar(50),dates varchar(20))
insert into @t values
('John','2/6/2016')
,('John','2/6/2016')
,('John','4/6/2016')
,('John','4/6/2016')
,('Andi','5/6/2016')
,('Steve','4/6/2016')
;With CTE as
(select * , SUBSTRING(dates,0,charindex('/',dates)) dd
,SUBSTRING(stuff(dates,1,charindex('/',dates),''),0, charindex('/',stuff(dates,1,charindex('/',dates),''))) MM
,right(dates,2) yy
from @t
)
,CTE1 as
(
select *
,ROW_NUMBER()over(partition by yy,mm,dd order by yy,mm,dd)rn from cte c
)
select *, REPLICATE('0',2-len(dd))+cast(dd as varchar(2))
+REPLICATE('0',2-len(MM))+cast(MM as varchar(2))
+yy+REPLICATE('0',3-len(rn))+cast(rn as varchar(3)) as id
from cte1

LISTAGG function with two columns

I have a table like this (report):
--------------------------------------------------
| user_id | Department | Position | Record_id |
--------------------------------------------------
| 1 | Science | Professor | 1001 |
| 1 | Maths | | 1002 |
| 1 | History | Teacher | 1003 |
| 2 | Science | Professor | 1004 |
| 2 | Chemistry | Assistant | 1005 |
--------------------------------------------------
I'd like to have the following result:
---------------------------------------------------------
| user_id | Department+Position |
---------------------------------------------------------
| 1 | Science,Professor;Maths, ; History,Teacher |
| 2 | Science, Professor; Chemistry, Assistant |
---------------------------------------------------------
That means an empty position must be preserved as ' ', as you can see in the result table.
I know how to use the LISTAGG function, but only for one column. I can't figure out how to do it for two columns at the same time. Here is my query:
SELECT user_id, LISTAGG(department, ';') WITHIN GROUP (ORDER BY record_id)
FROM report
Thanks in advance :-)
It just requires judicious use of concatenation within the aggregation:
select user_id
, listagg(department || ',' || coalesce(position, ' '), '; ')
within group ( order by record_id )
from report
group by user_id
I.e. aggregate the concatenation of department, a comma and position, replacing position with a space when it is NULL.
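What the LISTAGG query computes can be modelled in Python to check the separators. This sketch assumes the rows are (user_id, department, position, record_id) tuples with None standing for a NULL position; it mirrors GROUP BY user_id, ORDER BY record_id and coalesce(position, ' ').

```python
from itertools import groupby
from typing import Dict, Iterable, Optional, Tuple

# (user_id, department, position, record_id); None models a NULL position
Row = Tuple[int, str, Optional[str], int]


def listagg_department_position(rows: Iterable[Row]) -> Dict[int, str]:
    # GROUP BY user_id ... WITHIN GROUP (ORDER BY record_id)
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    result: Dict[int, str] = {}
    for user_id, group in groupby(ordered, key=lambda r: r[0]):
        result[user_id] = "; ".join(
            # department || ',' || coalesce(position, ' ')
            f"{dept},{pos if pos is not None else ' '}"
            for _uid, dept, pos, _rid in group
        )
    return result
```

In Oracle an empty string is treated as NULL, so the coalesce also covers positions stored as ''.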
