talend - ignore row if all columns except first have no value - filter

I have the following table:
date c1 c2 ... cn
01/01 2 3 ... 4
01/02 ...
01/03 ...
What is the easiest way to filter out the rows where every column except the date column has no value? (In this example, the rows with dates 01/02 and 01/03.)

The easiest way is to set up an input component and adjust its schema a bit: mark the values as mandatory in the schema definition, and records that lack them will be ignored.
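The condition in the question (drop a row only when every column after the date is empty) can be sketched as a plain predicate, independent of any Talend component. This is a minimal Python illustration, assuming that both None and an empty string count as "no value"; the function and variable names are hypothetical.

```python
def has_value_after_date(row):
    """Return True if any column after the first (date) column holds a value.

    Assumption: None and "" both mean "no value"; whitespace-only strings
    count as values here unless you strip them first.
    """
    return any(v not in (None, "") for v in row[1:])


# Rows shaped like the example table: date, c1, c2, cn.
rows = [
    ("01/01", "2", "3", "4"),
    ("01/02", None, None, None),
    ("01/03", "", "", ""),
]

kept = [row for row in rows if has_value_after_date(row)]
print(kept)  # [('01/01', '2', '3', '4')]
```

Note that this predicate drops a row only when all of c1..cn are empty; a row where just some columns are empty is kept.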

Related

Complex count row etl requirement

I have a requirement as follows:
1- If there is an employee record, count the number of rows:
a- if there are four rows, follow layout 1
and populate column 1 and column 2 with the values in the report, trimmed with LTRIM/RTRIM;
b- if there are three rows, follow layout 2
and hardcode column 1 and column 2 to NULL.
Otherwise, look for the employee record.
I couldn't get the logic right. I used a Router that sends rows to layout 2 if column 1 and column 2 are null, and to layout 1 otherwise, but the requirement is different.
Step 1 - Use SRT > AGG > JNR (Sorter, Aggregator, Joiner) to calculate the count: create a new column count_all set to COUNT(*), group by the proper key columns, and join the result back to the detail rows.
Step 2 - Next, use an RTR (Router) to split the data based on your conditions:
group 1 - count_all = 4: follow layout 1 and...
group 2 - count_all = 3: follow layout 2 and...
group 3 - count_all < 3: look up the employee record.
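The two steps above can be sketched outside Informatica as ordinary code: count rows per key, attach the count to every row, then route each row. This is a hypothetical Python illustration, not Informatica code, assuming each row is a dict with an emp_id key and col1/col2 columns (names invented for the example). Counts above four are not covered by the requirement, so those rows go to a separate "default" group, similar in spirit to a Router's default group.

```python
from collections import Counter


def route_rows(rows):
    """Hypothetical sketch of the AGG > JNR > RTR logic on plain dicts
    (no sort step is needed because Counter does not require sorted input)."""
    # AGG: COUNT(*) grouped by the key column (emp_id is an assumed name).
    counts = Counter(row["emp_id"] for row in rows)
    groups = {"layout_1": [], "layout_2": [], "employee_lookup": [], "default": []}
    for row in rows:
        # JNR: attach count_all back onto each detail row (as a new dict).
        out = dict(row, count_all=counts[row["emp_id"]])
        if out["count_all"] == 4:
            # Layout 1: keep the values, trimmed on both sides like LTRIM(RTRIM(...)).
            out["col1"] = out["col1"].strip()
            out["col2"] = out["col2"].strip()
            groups["layout_1"].append(out)
        elif out["count_all"] == 3:
            # Layout 2: hardcode both columns to NULL.
            out["col1"] = None
            out["col2"] = None
            groups["layout_2"].append(out)
        elif out["count_all"] < 3:
            # Fewer than three rows: look up the employee record instead.
            groups["employee_lookup"].append(out)
        else:
            # More than four rows is not covered by the requirement.
            groups["default"].append(out)
    return groups


rows = ([{"emp_id": "E1", "col1": " a ", "col2": "b "}] * 4
        + [{"emp_id": "E2", "col1": "x", "col2": "y"}] * 3
        + [{"emp_id": "E3", "col1": "p", "col2": "q"}])
groups = route_rows(rows)
print({name: len(members) for name, members in groups.items()})
```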

more efficient way of reading data from two tables and writing it to a new one using Spring Batch

I'm trying to write a Spring Batch job to move data from two tables into a single table. I've thought of several ways to solve the problem, but I'm wondering whether there is a more efficient solution.
Basically, I have two tables, call them table A and table B, with the following structure:
table A
column 1A column 2A
======== ========
bmw 123555
nissan 123456777
audi 12888
toyota 9800765
kia 85834945
table B
column 1B column 2B
======== ========
12 caraudi
123456 carnissan
123 carbmw
0125 carvvv
88963 carbbn
What I'm trying to do is create a table C from the batch's writer that holds all the data from table B (column 1B and column 2B) plus column 1A, without losing any data from either table and without writing duplicate data based on column 2A and column 1B. Tables A and B have only one column in common (column 1B == column 2A), but column 2A has a 3-digit suffix added to each id, so a join would have to compare using a substring function, which will be very slow because my tables are huge.
The other solution I thought of is to have one reader for table A that writes all its rows to a tempA table without the suffix, and then another reader that compares tempA with table B and writes the data to table C as follows:
table c
column 1A (nullable, because not every value in column 2A exists in column 1B)
column 1B
column 2B
so the table will look like this:
table C
column 1c column 2c column 3c
========= ========= =========
12 caraudi audi
123456 carnissan nissan
123 carbmw bmw
0125 carvvv
88963 carbbn
9800765 toyota
85834945 kia
Is this the best way to solve the problem, or is there a more efficient way?
Thanks in advance!
Before giving up on a LEFT OUTER JOIN from tableA to tableB (or a FULL OUTER JOIN if your query conditions require it), consider using db2expln or the Visual Explain utility in IBM Data Studio to determine the cost of some alternative ways to perform a "begins with" match on VARCHAR columns:
ON a.col2a LIKE b.col1b || '___'
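ON a.col2a >= b.col1b || '000' AND a.col2a <= b.col1b || '999'
On its own, the range form also matches values that are longer than col1b plus three characters: with the sample data, '123555' falls between '12000' and '12999', so bmw would also match 12. If you use it, add a length check such as LENGTH(a.col2a) = LENGTH(b.col1b) + 3.
If col1b is a CHAR column, you might need to trim off its trailing spaces before concatenating additional characters to it: RTRIM( b.col1b ) || '000'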
Assuming column 1b is indexed, one prefix-based matching predicate or another is bound to make a join between those two tables less expensive than creating, populating, and joining to your own temp table. If I'm wrong (or there are other complicating factors) and a temp table ends up being the best option, be sure to use a declared global temporary table (DGTT) so you can avoid the logging overhead of populating it.
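The difference between the two "begins with" predicates can be checked on the question's sample data without DB2. This is a minimal sketch using Python's built-in sqlite3 module instead of DB2, assuming both id columns are stored as text; SQLite's LIKE and string comparison stand in for DB2's here, so it says nothing about DB2 query costs, which still need db2expln or Visual Explain.

```python
import sqlite3

# Sample rows from the question; ids are stored as TEXT (an assumption) so the
# predicates compare strings, as they would on VARCHAR columns.
TABLE_A = [("bmw", "123555"), ("nissan", "123456777"), ("audi", "12888"),
           ("toyota", "9800765"), ("kia", "85834945")]
TABLE_B = [("12", "caraudi"), ("123456", "carnissan"), ("123", "carbmw"),
           ("0125", "carvvv"), ("88963", "carbbn")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (col1a TEXT, col2a TEXT)")
conn.execute("CREATE TABLE tableB (col1b TEXT, col2b TEXT)")
conn.executemany("INSERT INTO tableA VALUES (?, ?)", TABLE_A)
conn.executemany("INSERT INTO tableB VALUES (?, ?)", TABLE_B)


def matched_pairs(on_clause):
    """Return the set of (col1a, col1b) pairs an inner join on `on_clause` produces."""
    sql = "SELECT a.col1a, b.col1b FROM tableA a JOIN tableB b ON " + on_clause
    return set(conn.execute(sql).fetchall())


# col1b followed by exactly three characters.
like_pairs = matched_pairs("a.col2a LIKE b.col1b || '___'")

# Bare range: also accepts longer values such as '123555' for col1b = '12'.
range_pairs = matched_pairs(
    "a.col2a >= b.col1b || '000' AND a.col2a <= b.col1b || '999'")

# Range plus a length check that requires exactly a three-character suffix.
checked_pairs = matched_pairs(
    "a.col2a >= b.col1b || '000' AND a.col2a <= b.col1b || '999' "
    "AND LENGTH(a.col2a) = LENGTH(b.col1b) + 3")

print(sorted(like_pairs))
print(sorted(range_pairs - like_pairs))
```

On this data the LIKE form and the length-checked range form both return exactly audi/12, bmw/123 and nissan/123456, while the bare range form adds bmw/12, nissan/12 and nissan/123.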

HBase: How to skip rows that contain a specific column?

Is there a filter that can be used to skip rows that contain a specific column?
For example:
name price invalid
r1 a 10
r2 b 5 1
r3 c 20
I only want the rows without the invalid column (r1 and r3).
I tried SingleColumnValueFilter, but it always skips the row when the column is missing.
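You can try this: use the same SingleColumnValueFilter, but compare for equality against a value that can never be stored in that column; for example, if you are planning to store integers, compare against a character.
SingleColumnValueFilter('ColumnFamily','Qualifier',=,'binary:x',false,true)
Rows that have the column fail the equality test and are skipped, while rows without it are returned because the fifth argument (filterIfMissing) is false.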
Let me know if this works for you!
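The reasoning behind comparing against an impossible value can be checked with a simplified pure-Python model of SingleColumnValueFilter's row-level behaviour: a row is kept if the column satisfies the comparison, or if the column is absent and filterIfMissing is false. This model is an assumption-laden sketch, not HBase code; it ignores versions, column families and byte-level comparators, and the function name is hypothetical.

```python
import operator

# Rows from the question: r2 is the only row that has the "invalid" column.
ROWS = {
    "r1": {"name": "a", "price": "10"},
    "r2": {"name": "b", "price": "5", "invalid": "1"},
    "r3": {"name": "c", "price": "20"},
}


def single_column_value_filter(rows, qualifier, compare, value, filter_if_missing):
    """Simplified model: keep a row if the column satisfies `compare`,
    or if the column is absent and filter_if_missing is False."""
    kept = []
    for row_key, columns in rows.items():
        if qualifier not in columns:
            if not filter_if_missing:
                kept.append(row_key)
        elif compare(columns[qualifier], value):
            kept.append(row_key)
    return kept


# EQUAL against a value the column never holds: only rows missing the column survive.
print(single_column_value_filter(ROWS, "invalid", operator.eq, "x", False))  # ['r1', 'r3']
# NOT EQUAL keeps r2 as well, because "1" != "x".
print(single_column_value_filter(ROWS, "invalid", operator.ne, "x", False))  # ['r1', 'r2', 'r3']
```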

how to sort a column by slash in sql server

I can't figure out how to sort (ORDER BY) a column that contains values like the following:
abc/aa
aa
bb/cba
bb/aa
cc
I need the values without a slash to be displayed first and the values containing a slash to be displayed last.
Required output:
aa
cc
abc/aa
bb/aa
bb/cba
Please guide me.
Thanks in advance!
You didn't provide your query, but it will take this form:
DECLARE @Tbl TABLE (CharVal VARCHAR(50))
INSERT INTO @Tbl VALUES ('abc/aa'),('aa'),('bb/cba'),('bb/aa'),('cc')
-- Values without a slash (0) sort before values with one (1), then alphabetically
SELECT CharVal FROM @Tbl
ORDER BY CASE WHEN PATINDEX('%/%',CharVal) > 0 THEN 1 ELSE 0 END, CharVal
Output:
CharVal
aa
cc
abc/aa
bb/aa
bb/cba
EDIT: Corrected a 0/1 reversal in the CASE expression that produced the wrong sort order; thanks @Aaron_Bertrand! Also added code that populates a table variable with the data, and the resulting output.
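The ORDER BY above is a two-part sort key: a 0/1 flag for "contains a slash", then the value itself as a tie-breaker. A minimal Python sketch of the same key (the function name is hypothetical) shows the idea; note that Python compares strings case-sensitively by code point, whereas SQL Server uses the column's collation, which makes no difference for these all-lowercase values.

```python
values = ["abc/aa", "aa", "bb/cba", "bb/aa", "cc"]


def slash_last_key(value):
    # First element mirrors CASE WHEN PATINDEX('%/%', CharVal) > 0 THEN 1 ELSE 0 END;
    # second element mirrors the tie-breaker CharVal.
    return (1 if "/" in value else 0, value)


ordered = sorted(values, key=slash_last_key)
print(ordered)  # ['aa', 'cc', 'abc/aa', 'bb/aa', 'bb/cba']
```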

Stacked column Flash chart counting all values

I am building a stacked column Flash chart from my query. I would like to split the values in each column by location. For the sake of argument, say I have 5 ids in location 41, 3 ids in location 21 and 8 ids in location 1:
select
'' link,
To_Char(ENQUIRED_DATE,'MON-YY') label,
count(decode(location_id,41,id,0)) "location1",
count(decode(location_id,21,id,0)) "location2",
count(decode(location_id,1,id,0)) "location3"
from "my_table"
where
some_conditions = 'Y'
group by To_Char(ENQUIRED_DATE,'MON-YY');
As a result of this query, Apex creates a stacked column with three separate parts (hurray!), but instead of the values 5, 3 and 8, it returns three regions of 16, 16 and 16 (16 = 5 + 3 + 8).
So apparently Apex is evaluating all the decode conditions and adding up all the values.
I am trying to achieve something like what is described in this article.
Apex doesn't appear to be doing anything funky; you'd get the same result running that query through SQL*Plus. When you do:
count(decode(location_id,41,id,0)) "location1",
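... then the count is incremented for every row. COUNT counts every non-null value, and decode returns either id or 0, never null (assuming id itself is never null), so it doesn't matter which column you include; the zero is just another value to count. I think you meant to use sum: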
sum(decode(location_id,41,1,0)) "location1",
Here each row is assigned either zero or one, and summing those gives you the number of rows that got one, which is the number of rows with the specified location_id value.
Personally I'd generally use case over decode, but the result is the same:
sum(case when location_id = 41 then 1 else 0 end) "location1",
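The COUNT versus SUM behaviour can be reproduced outside Apex and Oracle. This sketch uses Python's sqlite3 module with CASE in place of Oracle's decode (SQLite has no decode), and hypothetical data matching the question's 5/3/8 split, all treated as one group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, location_id INTEGER)")
# Hypothetical data: 5 ids in location 41, 3 in location 21, 8 in location 1.
rows = ([(i, 41) for i in range(1, 6)]
        + [(i, 21) for i in range(6, 9)]
        + [(i, 1) for i in range(9, 17)])
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

# COUNT counts every non-NULL value; the CASE returns id or 0, never NULL,
# so each of the three columns counts all 16 rows.
count_result = conn.execute("""
    SELECT COUNT(CASE WHEN location_id = 41 THEN id ELSE 0 END),
           COUNT(CASE WHEN location_id = 21 THEN id ELSE 0 END),
           COUNT(CASE WHEN location_id = 1 THEN id ELSE 0 END)
    FROM my_table""").fetchone()

# SUM of a 0/1 flag adds up only the rows for each location.
sum_result = conn.execute("""
    SELECT SUM(CASE WHEN location_id = 41 THEN 1 ELSE 0 END),
           SUM(CASE WHEN location_id = 21 THEN 1 ELSE 0 END),
           SUM(CASE WHEN location_id = 1 THEN 1 ELSE 0 END)
    FROM my_table""").fetchone()

print(count_result)  # (16, 16, 16)
print(sum_result)    # (5, 3, 8)
```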
