I created a job containing a tMap that joins a date dimension:
id date
date_du_jour
libelle_mois
..
and a tFileInputExcel:
id
mois
..
to load a fact table, but I run into a problem:
0 rows are inserted into my fact table.
Below is how my package looks:
I need your suggestions.
I'm trying to learn NiFi, so this is all new to me. I used to work with Talend and I'm having a hard time translating to NiFi. The main idea: for example, I have two tables in PostgreSQL.
Table CITY:
ID (auto generated), city_name
Table PERSON:
ID (auto generated), first_name, last_name, city_id
and I have a CSV file:
first_name, last_name, city_name
Can you please explain how I can insert into two tables from one flowfile and, in the PERSON table, refer to the ID of the city from the CITY table rather than its name?
Thank you
You could use LookupRecord to enrich each record with the city id and split the input into two flowfiles: matched/unmatched.
For matched records you just execute a simple insert into the PERSON table, because the city id was found.
For unmatched records you generate an insert/upsert into the CITY table and then route those records back to LookupRecord again.
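(A rough, hypothetical sketch of the per-record insert the matched branch could issue, e.g. via PutSQL, assuming LookupRecord added a city_id field to each record:)
-- hypothetical parameterized insert for the "matched" branch;
-- first_name and last_name come from the CSV, city_id was added by LookupRecord
INSERT INTO person (first_name, last_name, city_id)
VALUES (?, ?, ?);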
Or you could insert everything as is into a temp table whose structure matches your CSV,
and then execute 2 simple SQL statements:
populate missing cities from the temp table;
insert into PERSON from the temp table, looking up the city id at the database level (see the sketch below).
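A minimal sketch of those two statements, assuming a staging table named person_staging with the same columns as the CSV (all table and column names here are illustrative):
-- 1) populate cities that are not yet in CITY
INSERT INTO city (city_name)
SELECT DISTINCT s.city_name
FROM person_staging s
LEFT JOIN city c ON c.city_name = s.city_name
WHERE c.id IS NULL;

-- 2) insert persons, resolving city_id by joining on the city name
INSERT INTO person (first_name, last_name, city_id)
SELECT s.first_name, s.last_name, c.id
FROM person_staging s
JOIN city c ON c.city_name = s.city_name;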
Could you please help me with the below query?
Suppose there is a table employee with columns A, B and a Date column.
I have to load data from table employee into another table emp with the below transformations applied.
Transformations on the employee table:
1. Absolute value of column A (column name in emp will be ABS_A)
2. Absolute value of column B (column name in emp will be ABS_B)
3. Find the sum(ABS_A) for a given Date column
4. Find the sum(ABS_B) for a given Date column
5. Find sum(ABS_A)/sum(ABS_B) - column name will be Average.
So the final table emp will have the below columns:
1. A
2. B
3. ABS_A
4. ABS_B
5. Average
How to handle such derived columns in Hive?
I tried the below query but it's not working. Could anyone guide me?
insert overwrite into emp
select
A,
B,
ABS(A) as ABS_A,
ABS(B) as ABS_B,
sum(ABS_A) OVER PARTION BY DATE AS sum_OF_A,
sum(ABS_B) OVER PARTTION BY DATE AS sum_of_b,
avg(sum_of_A,sum_of_b) over partition by date as average
from employee
Hive does not support referencing derived columns (column aliases) at the same subquery level. Use a subquery, or repeat the expression in place of the alias.
insert overwrite table emp
select A, B, ABS_A, ABS_B, sum_OF_A, sum_of_b, `date`, sum_OF_A/sum_of_b as average
from
(
select A, B, ABS(A) as ABS_A, ABS(B) as ABS_B, `date`,
sum(ABS(A)) OVER (PARTITION BY `date`) AS sum_OF_A,
sum(ABS(B)) OVER (PARTITION BY `date`) AS sum_of_b
from employee
)s;
I'm trying to copy data from a table called accounts into an empty table called accounts_by_area_code. I have the following fields in accounts_by_area_code: acct_num INT, first_name STRING, last_name STRING, phone_number STRING. The table is partitioned by areacode (the first 3 digits of phone_number).
I need to use a SELECT statement to extract the area code inside an INSERT INTO TABLE command, to copy the specified columns to the new table while dynamically partitioning by area code.
This is my last attempt:
impala-shell -q "INSERT INTO TABLE accounts_by_areacode (acct_num, first_name, last_name, phone_number, areacode) PARTITION (areacode) SELECT STRLEFT (phone_number,3) AS areacode FROM accounts;"
This generates ERROR: AnalysisException: Column permutation and PARTITION clause mention more columns (5) than the SELECT / VALUES clause and PARTITION clause return (1). I'm not convinced I have even the basic syntax correct so any help would be great as I'm new to Impala.
Impala creates partitions dynamically based on the data, so I'm not sure why you want to create an empty table with partitions; they will be auto-created while inserting new data.
Still, I think you can create an empty table with partitions like this:
impala-shell -q "INSERT INTO TABLE accounts_by_areacode (acct_num) PARTITION (areacode)
SELECT CAST(NULL as STRING), STRLEFT (phone_number,3) AS areacode FROM accounts;"
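For the original goal of copying all of the listed columns, a sketch of the dynamic-partition insert could look like the following, assuming the accounts source table has columns with the same names (the original error was raised because the SELECT returned only one expression while the column list plus PARTITION clause expected five):
impala-shell -q "INSERT INTO TABLE accounts_by_areacode (acct_num, first_name, last_name, phone_number) PARTITION (areacode)
SELECT acct_num, first_name, last_name, phone_number, STRLEFT(phone_number,3) AS areacode FROM accounts;"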
Is it possible to create a partition like 01 from a date like '2017-01-02', where 01 is the month?
I have daily sales records and I need to run queries like select * from sales where month = '01', so it would be better if I could partition my daily sales by month. But my data has dates in the format 2017-01-01, and doing
create table tl (columns ......) partitioned by (date <datatype>) will create partitions on a daily basis, which is the last thing I want.
I need to create the partitions dynamically.
CAUTION: You need to escape the date column (by using `, i.e. a backtick, around the column name) in the create statement, because date is a datatype in Hive.
You can create partitions dynamically
by setting the below parameter in the query:
set hive.exec.dynamic.partition.mode=nonstrict;
Along with that, you need to select only the month part from the source table:
insert into table sales partition(`date`) select columns..., SUBSTR(`date`,6,2) from source_table
This insert statement will create partitions like:
show partitions sales
date=01
date=02
date=03
date=04
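A minimal sketch of what the table definition and the settings could look like (the column names other than `date`, and the source_table layout, are assumptions; hive.exec.dynamic.partition=true is the usual companion setting):
-- enable dynamic partitioning (typical settings)
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- sales table partitioned by the month part only; item/amount/sale_date are illustrative columns
create table sales (
  item string,
  amount double,
  sale_date string   -- keep the full date as a regular column if it is still needed
)
partitioned by (`date` string);

-- load from a hypothetical daily source table, deriving the month as the partition value
insert into table sales partition(`date`)
select item, amount, sale_date, SUBSTR(sale_date,6,2)
from source_table;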
I have a CSV file which has 600 records, 300 each for male and female.
I have created a Table_Temp and filled all these records into that table. Then I created Table_Main with gender as the partition column.
For Temp_Table the query is:
Create table if not exists Temp_Table
(id string, age int, gender string, city string, pin string)
row format delimited
fields terminated by ',';
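(The DDL for Table_Main is not shown in the question; a sketch of what it could look like, assuming the same columns with gender moved to the partition clause:)
-- hypothetical DDL for Table_Main; gender is the partition column,
-- so it is not repeated in the regular column list
Create table if not exists Table_Main
(id string, age int, city string, pin string)
partitioned by (gender string)
row format delimited
fields terminated by ',';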
Then I write the below query:
Insert into Table_Main
partitioned (gender)
select a,b,c,d,gender from Table_Temp
Problem: I am getting a single file at /user/hive/warehouse/mydb.db/Table_Main/gender=Male/000000_0.
In this file I am getting all 600 records. I am not sure what is happening, but what I expected is that I should get only 300 records (only Male) in this file.
Q:1. Where am I mistaken?
Q:2. Should I not get one more folder for all the other values (which are not in the static partition)? If not, what will happen to those records?
With a static partition we need to specify a WHERE condition while inserting data into the partitioned table (which I have not done), so every selected row lands in the one partition named in the insert. Alternatively, we can use dynamic partitioning, which does not need a WHERE condition.
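A sketch of the two options (the Male value and the column list are taken from the question's tables; treat this as illustrative rather than the exact original query):
-- Option 1: static partition - the partition value is fixed, so the select must filter for it
insert into table Table_Main partition (gender='Male')
select id, age, city, pin from Table_Temp where gender='Male';

-- Option 2: dynamic partition - Hive derives the partition value from the last select column
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table Table_Main partition (gender)
select id, age, city, pin, gender from Table_Temp;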