Using MD5 and missing some records in my output in Upsolver SQLake

I'm losing some data in my output and need help identifying the issue. We create a hash key from the three columns below; the rest is a straightforward SELECT from the data source with an upsert on hashkey.
SET hashkey = MD5(advertiser_id || marketplace_id || retailer);
SELECT hashkey,
col1, col2, col3...
REPLACE ON DUPLICATE hashkey

MD5 returns NULL if any of its inputs is NULL. So if any of your three columns is NULL, hashkey is most likely NULL as well, and those records are probably being dropped. I would COALESCE any column that might be nullable to a fixed value such as 'NA' to avoid the NULL case. For example, if marketplace_id can be NULL, the following should solve the issue.
SET hashkey = MD5(advertiser_id || COALESCE(marketplace_id,'NA') || retailer);
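The NULL propagation described above can be reproduced outside SQLake. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine; the MD5 function is registered by hand (SQLite has none), and the table name src and the sample rows are assumptions made for illustration only.

```python
import hashlib
import sqlite3


def md5_or_null(value):
    # Emulates the behavior described above: a NULL input gives a NULL hash.
    if value is None:
        return None
    return hashlib.md5(value.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("MD5", 1, md5_or_null)  # hand-registered, not a SQLite built-in
conn.execute(
    "CREATE TABLE src (advertiser_id TEXT, marketplace_id TEXT, retailer TEXT)"
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [("a1", "m1", "r1"), ("a2", None, "r2")],  # second row has a NULL marketplace_id
)

# In SQLite, || yields NULL as soon as one operand is NULL.
naive = conn.execute(
    "SELECT MD5(advertiser_id || marketplace_id || retailer) "
    "FROM src ORDER BY advertiser_id"
).fetchall()

# COALESCE substitutes a fixed value, so the key is never NULL.
safe = conn.execute(
    "SELECT MD5(advertiser_id || COALESCE(marketplace_id, 'NA') || retailer) "
    "FROM src ORDER BY advertiser_id"
).fetchall()
```

Here the row with the NULL marketplace_id gets a NULL key in naive and a real hash in safe, while the fully populated row hashes identically in both.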

Related

Finding the best way to traverse an Oracle table

I have an Oracle table. Its DDL is below (it has no primary key):
create table CLIENT_ACCOUNT
(
CLIENT_ID VARCHAR2(18) default ' ' not null,
ACCOUNT_ID VARCHAR2(18) default ' ' not null,
......
)
create unique index UK_ACCOUNT
on CLIENT_ACCOUNT (CLIENT_ID, ACCOUNT_ID)
The table is very large, maybe 100M records, and I want to traverse all of its data in batches.
Currently I use the table's index to traverse in batches, but I'm having trouble with the Oracle syntax.
# I want to use this SQL, but it gives a syntax error.
# It tries to use the B-tree index to locate the start position, but it does not work.
select * from CLIENT_ACCOUNT
WHERE (CLIENT_ID, ACCOUNT_ID) > (1,2)
AND ROWNUM < 1000
ORDER BY CLIENT_ID, ACCOUNT_ID
What is the fastest way to read the table data in batches?
Wild guess:
select * from CLIENT_ACCOUNT
WHERE CLIENT_ID > '1'
and ACCOUNT_ID > '2'
AND ROWNUM < 1000;
It would at least compile, although whether it correctly implements your business logic is a different matter. Note that I have cast your filter criteria to strings. This is because your columns have a string datatype and you are defaulting them to spaces, so there's a high probability those columns contain non-numeric values.
If this doesn't solve your problem, please edit your question with more details; sample input data and expected output are always helpful in these situations.
Your data model seems odd.
Your columns are defined as varchar2. So why are your criteria numeric?
Also, why do you default the key columns to space? It would be better to leave unpopulated values as null. (To be clear, NULL is not a good thing in an indexed column, it's just better than a space.)
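Neither answer spells out a batching scheme, so here is one possible approach, offered as a sketch rather than as what the answerers had in mind: keyset pagination over the unique index. The tuple comparison (CLIENT_ID, ACCOUNT_ID) > (x, y) is expanded into an equivalent OR predicate, and each batch starts after the last key of the previous one. The example runs on Python's sqlite3 module, so it uses LIMIT; the sample rows, the function names fetch_batch and traverse, and the empty-string start key are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client_account (client_id TEXT NOT NULL, account_id TEXT NOT NULL)"
)
conn.execute("CREATE UNIQUE INDEX uk_account ON client_account (client_id, account_id)")
rows = [(f"c{i:02d}", f"a{j}") for i in range(5) for j in range(3)]  # made-up data
conn.executemany("INSERT INTO client_account VALUES (?, ?)", rows)


def fetch_batch(conn, last_key, batch_size):
    """Return the next batch of rows strictly after last_key in index order."""
    client_id, account_id = last_key
    return conn.execute(
        """
        SELECT client_id, account_id
        FROM client_account
        WHERE client_id > ?
           OR (client_id = ? AND account_id > ?)
        ORDER BY client_id, account_id
        LIMIT ?
        """,
        (client_id, client_id, account_id, batch_size),
    ).fetchall()


def traverse(conn, batch_size):
    """Walk the whole table batch by batch, resuming from the last key seen."""
    last_key = ("", "")  # sorts before every non-empty key in this sample data
    seen = []
    while True:
        batch = fetch_batch(conn, last_key, batch_size)
        if not batch:
            break
        seen.extend(batch)
        last_key = batch[-1]
    return seen


all_rows = traverse(conn, batch_size=4)
```

Two Oracle-specific caveats apply when porting this. ROWNUM is assigned before ORDER BY is applied, so the row limit has to go in an outer query around the ordered subquery, or be written as FETCH FIRST n ROWS ONLY on 12c and later. Oracle also treats the empty string as NULL, so the start key would need a different sentinel there.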

CockroachDB pg_column_size returning null

I am currently using pg_column_size to calculate the size of a row in the DB.
Something like:
select pg_column_size(col1, col2, col3..... col10) from _table
The problem is that when one or more column values are null, the whole function returns null.
Is there any way to set a default value for each column within the function to avoid getting null?
Try:
select pg_column_size(<table_name>.*) from <table_name>;
Are you specifically trying to select a subset of columns?
If you do select pg_column_size(table) from table;, it should work.

How to get NULL records from an Oracle database when the column is a nullable NUMBER

I have an Oracle database table with three columns: Id, RTOName, and VehicleCode. My table looks like below.
RTOName is of type varchar2, and VehicleCode is NUMBER(2,0) and nullable.
I have data like below, and I want to fetch the records that have a given VehicleCode as well as those with a null value. The table design is already in place, so changing it would have a large impact on my application. I use a JPA native query like this, and I want it to also fetch the records with null values.
Query query = createNativeQuery("select RTOName,VehicleCode from tbl_vehiclecodes WHERE VehicleCode=#vCode");
query.setParameter("vCode", vehicleCode);
From the above query I get only the non-null record, e.g. for vCode parameter 61 I get
Marathalli,61. If vCode is null, I have a problem: I won't get any record.
How can I achieve this in a native query?
I know that we can use IS NULL in the WHERE clause, but since I also have numbers here, how do I handle both cases? Any help is appreciated.
Thanks
We can use OR here.
The following query returns the rows matching the vCode parameter along with the rows where VehicleCode is null; if vCode is null, you get only the rows with null values.
Query query = createNativeQuery("select RTOName,VehicleCode from tbl_vehiclecodes WHERE (VehicleCode is null or VehicleCode=#vCode)");
Edit: considering the doubts from @Ranagal
If, when null is passed as vCode, you want all the records, both those with a value in VehicleCode and those with null, then the query needs to change to:
Query query = createNativeQuery("select RTOName,VehicleCode from tbl_vehiclecodes WHERE (VehicleCode is null or VehicleCode=coalesce(#vCode,VehicleCode))");
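To see how the two WHERE clauses above differ, here is a small sketch that runs them with Python's sqlite3 module instead of Oracle and JPA. The row Marathalli,61 comes from the question; the other two rows, the run helper, and the :vCode placeholder syntax (sqlite3's named-parameter style) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_vehiclecodes (Id INTEGER, RTOName TEXT, VehicleCode INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_vehiclecodes VALUES (?, ?, ?)",
    [(1, "Marathalli", 61), (2, "Hebbal", None), (3, "Jayanagar", 62)],
)

PLAIN = "VehicleCode = :vCode"
WITH_NULLS = "(VehicleCode IS NULL OR VehicleCode = :vCode)"
WITH_COALESCE = "(VehicleCode IS NULL OR VehicleCode = COALESCE(:vCode, VehicleCode))"


def run(where, v_code):
    """Return the RTOName values selected by the given WHERE clause."""
    sql = f"SELECT RTOName FROM tbl_vehiclecodes WHERE {where} ORDER BY Id"
    return [row[0] for row in conn.execute(sql, {"vCode": v_code})]


plain_null = run(PLAIN, None)             # equality with NULL matches nothing
or_61 = run(WITH_NULLS, 61)               # the match plus the NULL rows
or_null = run(WITH_NULLS, None)           # only the NULL rows
coalesce_null = run(WITH_COALESCE, None)  # every row
coalesce_61 = run(WITH_COALESCE, 61)      # same as or_61
```

The plain equality never matches a NULL parameter because a comparison with NULL evaluates to unknown rather than true, which is why the IS NULL branch is needed.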

How to replace NULL values in one column to 0 (of a very large table) without creating a new column of the desired results added to the table in HIVE?

I am trying to replace all of the NULL values with 0 in a column of a big table in HIVE.
However, every time I run some code I end up adding a new column to the result. The column I am trying to modify still exists and still has the NULL values, while the automatically generated new column (i.e. _c1) contains what I want the original column to look like.
I tried COALESCE, but that also generated a new column. I also tried CASE WHEN, with the same result.
Select *,
CASE WHEN columnname IS NULL THEN 0
ELSE columnname
END
from tablename;
Also tried
SELECT coalesce(columnname, CAST(0 AS BIGINT)) FROM tablename
I would just like to update the table so that the other columns stay as they are and the column I want to modify keeps its original name, but with 0 in place of the NULL values.
I don't want to generate a new column but modify an existing one.
How should I do that?
Use the INSERT OVERWRITE option.
insert overwrite table tablename
select c1,c2,...,coalesce(columnname,0) as columnname
from tablename
Note that you have to specify all the other column names required in select.

How to insert a blank value instead of NULL in Columns other than String datatype in hive

I have a create statement like
CREATE TABLE temp_tbl (EmpId String,Salary int);
I would like to insert an employee id and a blank value into table.
So what I have done is
insert overwrite table temp_tbl select '013' as EmpId,'' as Salary from tbl;
hive> select * from temp_tbl;
OK
013 NULL
But expected result is
hive> select * from temp_tbl;
OK
013 NULL ---> Blank instead of NULL
I also tried with "". I still get NULL instead of blank.
I also tried creating the table with a serialization property:
CREATE TABLE temp_tbl (EmpId String,Salary int) TBLPROPERTIES ('serialization.null.format' = '');
That didn't change the NULL value to blank either.
What would be a workaround for this?
Use CASE when selecting the data.
Select
(CASE
WHEN columnName is null THEN ''
ELSE columnName
END) as 'Result' from temp_tbl;
In Hive, all types except string/varchar/char and some complex types such as array cannot be blank; only NULL is possible. An empty string '' is a perfectly normal value of type String. You can also produce an empty array() (an array of size zero).
As a workaround, you can use a predefined value that does not normally occur in your data, such as -99999, to represent the special case. Alternatively, you can store the numeric values in a String column, which does allow empty values. But it is not possible to assign (cast) an empty string to a numeric type, because an empty value is not allowed there.
If you try to assign an empty string to a numeric column or cast it to a numeric type, the result is the same as converting a non-numeric string to a number: NULL in Hive (when a cast is not possible, it returns NULL), or a java.lang.NumberFormatException in Java.
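The casting rule described above can be mimicked in a few lines of Python. This is only an illustration of the stated behavior (an impossible cast gives NULL), not a reimplementation of Hive's parser; hive_like_cast_int and render_blank are hypothetical helpers, and the sample values are made up.

```python
def hive_like_cast_int(value):
    """Hypothetical helper: return an int, or None where the cast is impossible."""
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        # An empty or non-numeric string cannot become a number, so the
        # result is None, standing in for NULL.
        return None


def render_blank(value):
    """Show NULL as an empty string at read time, like the CASE expression."""
    return "" if value is None else str(value)


salaries = [hive_like_cast_int(s) for s in ["500", "", "abc", None]]
rendered = [render_blank(v) for v in salaries]
```

The stored values stay numeric or NULL; the blank exists only in the rendered output, which is the same division of labor as using CASE in the SELECT.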
Knowing that the Int datatype can be either NULL or an integer, I'd think about how to work around the problem:
1. I have the impression that 0 can do the job. Why can it not?
2. If 1 is not ideal, why not create a new temp_employees_with_no_salary table?
3. If 2 is not ideal, can you afford to change the datatype of temp_tbl.Salary from Int to String, then use CAST(Salary AS INT) to work with it?
