Hive: handle negative number in data stored in HDFS - hadoop

I have a situation where I need to handle negative numeric data. In some of the files (stored in HDFS), the negative sign is on the right side of the value (like this: 12345-); normally a negative number has the minus sign on the left (like this: -12345).
I cannot change the data, because it is correct: the source system (SAP) reads these values as negative numbers.
In Hive I have to do some arithmetic, say SUM these values, which contain data like '12345-'. Hive is unable to recognize such a value as a number (the column type is DECIMAL(10,2)) and the result is NULL. Please advise how to handle this situation. Thanks in advance.

Check last character, if it is '-' then use substr and concat to construct correct value:
select case when substr('12345-',-1,1)='-' then cast(concat('-',substr('12345-',1,length('12345-')-1)) as int) else cast('12345-' as int) end as column_name;
OK
-12345
Replace '12345-' with your column_name
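If the values can be cleaned before they reach Hive (or you just want to check the logic outside the cluster), the same "look at the last character, move the sign" rule can be sketched in Python. This is only an illustration of the idea, not Hive code; the function name parse_trailing_minus and the sample values are made up for the example.

```python
from decimal import Decimal

def parse_trailing_minus(text):
    """Parse a numeric string whose minus sign may sit on the right."""
    text = text.strip()
    if text.endswith("-"):
        # Move the sign to the front: '12345-' becomes '-12345'
        text = "-" + text[:-1]
    return Decimal(text)

# Hypothetical sample values, mixing trailing-minus and ordinary numbers
values = ["12345-", "100.50", "0.25-"]
total = sum(parse_trailing_minus(v) for v in values)
print(total)
```

Values without a trailing minus pass through unchanged, which mirrors the else branch of the CASE expression above.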

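Use this syntax for the values that end in '-' (it multiplies by -1 unconditionally, so do not apply it to values that are already positive):
CAST(CAST(-1 AS DECIMAL(1,0)) AS DECIMAL(10,2)) * CAST(regexp_replace(TRIM(column_name), '-', '') AS DECIMAL(10,2))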

Related

Oracle SQL PLSQL large number field strange behavior

I have an existing table called temptable; the column largenumber is a NUMBER field with no precision set:
largenumber NUMBER;
Query:
select largenumber from temptable;
It returns:
-51524845525550100000000000000000000
But if I do
column largenumber format 999999999999999999999999999999999999999
And then
select largenumber from temptable;
It returns:
-51524845525550:100000000000000000000
Why is there a colon?
To test, I took the number, removed the colon, and inserted it into another table, temptable2. With the same column largenumber format, the select returns the number without the colon:
select largenumber from temptable2;
It returns:
-51524845525550100000000000000000000
So the colon is not present here.
So what could possibly be in the original number field to cause that colon?
On the original row, if I do a select and try any TO_CHAR, REPLACE, CAST, or concatenation to text, I get a number conversion error.
For example, trying to generate a csv:
select '"' || largenumber || '",'
FROM temptable;
would result in:
ORA-01722 ("invalid number") error occurs when an attempt is made to convert a character string into a number, and the string cannot be converted into a valid number
In a comment (in response to a question from me), you shared that dump(largenumber) on the offending value returns
Typ=2 Len=8: 45,50,56,53,52,48,46,48
From the outset, that means that the data stored on disk is invalid (it is not a valid representation of a value of number data type). Typ=2 is correct, that is for data type number. The length (8 bytes) is correct (we can all count to eight to see that).
What is wrong is the bytes themselves. And, we only need to inspect the first and the last byte to see that.
The first byte is 45. It encodes the sign and the exponent of your number. The first bit (1 or 0) represents the sign: 1 for positive, 0 for negative. 45 is less than 128, so the first bit in the first byte is 0; so the number is negative. (So far this matches what you know about the intended value.)
But, for negative numbers, the last byte is always the magic value 102. Always. In another comment under your original question, Connor McDonald asks about your platform - but this is platform-independent, it is how Oracle encodes numbers for permanent storage on any platform. So, we already know that the dump value you got tells us the value is invalid.
In fact, Connor, in the same comment, gave the correct representation of that number (according to Oracle's scheme for internal representation of numbers). Indeed, just the last byte is wrong: your dump shows 48, but it should be 102.
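To make the byte-level check concrete, here is a minimal Python sketch of a decoder for the bytes that DUMP() prints. It assumes the commonly described layout of the internal NUMBER format: first byte is sign plus base-100 exponent, mantissa bytes are digit+1 for positive values and 101-digit for negative values, and a negative value ends with 102 unless it uses all 20 mantissa bytes. The function name decode_oracle_number is mine, and this is an illustration, not an Oracle API.

```python
from decimal import Decimal, localcontext

def decode_oracle_number(raw):
    """Decode DUMP() bytes of a NUMBER into a Decimal; raise ValueError if invalid."""
    raw = list(raw)
    if not raw:
        raise ValueError("empty value")
    if raw == [128]:                       # zero is stored as the single byte 128
        return Decimal(0)
    head, body = raw[0], raw[1:]
    negative = head < 128                  # high bit clear means negative
    if negative:
        if body and body[-1] == 102:
            body = body[:-1]               # drop the terminator byte
        elif len(body) < 20:
            raise ValueError("negative NUMBER does not end with byte 102")
        exponent = 62 - head               # base-100 exponent, stored complemented
        digits = [101 - b for b in body]
    else:
        exponent = head - 193
        digits = [b - 1 for b in body]
    if not digits or any(d < 0 or d > 99 for d in digits):
        raise ValueError("mantissa byte out of range")
    with localcontext() as ctx:
        ctx.prec = 60                      # more than enough digits for a NUMBER
        value = sum(Decimal(d).scaleb(2 * (exponent - i))
                    for i, d in enumerate(digits))
        return -value if negative else value

# Bytes from the dump quoted above
CORRUPT = [45, 50, 56, 53, 52, 48, 46, 48]
try:
    decode_oracle_number(CORRUPT)
except ValueError as err:
    print("invalid:", err)

print(decode_oracle_number([62, 100, 102]))   # 62,100,102 is the valid encoding of -1
```

Incidentally, the same eight bytes read as ASCII spell -28540.0, which may hint that raw text was written straight into the column; that is only a guess on my part.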
How can you fix this? If it's a one-off, just use an update statement to replace the value with the correct one and move on. If your table has a primary key, let's call it id, then find the id for this row, and then
update {your_table} set largenumber = -50...... where id = {that_id};
Question is, how many such corrupt values might you have in your table? If it's just one, you can shrug it off; but if it's many (or even "a handful") you may want to figure out how they got there in the first place.
In most cases, the database will reject invalid values; you can't simply insert 'abc' in a number column, for example. But there are ways to get bad data in; even intentionally, and in a repeatable way. So, you would have to investigate how the bad values were inserted (what process was used for insertion).
For a trivial way to insert bad data in a number column, in a repeatable manner, you can see this thread on the Oracle developers forum: https://community.oracle.com/tech/developers/discussion/3903746/detecting-invalid-values-in-the-db
Please be advised that I had just started learning Oracle at that time (I was less than two months in), so I may have said some stupid things in that thread; but the method to insert bad data is described there in full detail, and it was tested. That shows just one possible (and plausible!) way to insert invalid stuff in a table; how it happened in your specific case, you will have to investigate yourself.

How MAX of a concatenated column in oracle works?

In Oracle, I have a question about concatenating two columns of NUMBER type and then taking the MAX of the result.
That is, columns A and B are both of NUMBER data type:
Select MAX(A||B) from table
Table data
A B
20150501 95906
20150501 161938
When I run the query Select MAX(A||B) from table
O/P - 2015050195906
Ideally 20150501161938 should be the output.
If I format column B with TO_CHAR(B,'FM000000'), I get the expected output:
Select MAX(A || TO_CHAR(B,'FM000000')) FROM table
O/P - 20150501161938
Why is 2015050195906 considered the MAX in the first case?
Presumably, column A is a date and column B is a time.
If that's true, treat them as such:
select max(to_date(to_char(a)||to_char(b,'FM000000'),'YYYYMMDDHH24MISS')) from your_table;
That will add a leading zero to the time component (if necessary), then concatenate the columns into a string, which is passed to the to_date function. The max function then works on a DATE datatype, which is presumably what you want.
PS: The real solution here, is to fix your data model. Don't store dates and times as numbers. In addition to sorting issues like this, the optimizer can get confused. (If you store a date as a number, how can the optimizer know that '20141231' will immediately be followed by '20150101'?)
You should convert to a number:
select MAX(TO_NUMBER(A||B)) from table
Concatenation will result in a character/text output. As such, it sorts alphabetically, so 9 appears after 16.
In the second case, you are specifying a format that pads the number to six digits. That works well, because 095906 will now appear before 161938.
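The difference is easy to reproduce outside the database. The sketch below uses Python strings as a stand-in for Oracle's character comparison (for plain digits the ordering is the same); the second data set is hypothetical and only there to show why padding is safer than relying on a numeric conversion of the unpadded string.

```python
# The two rows from the question: (A, B)
rows = [(20150501, 95906), (20150501, 161938)]

as_text = [f"{a}{b}" for a, b in rows]        # like A || B
padded = [f"{a}{b:06d}" for a, b in rows]     # like A || TO_CHAR(B, 'FM000000')

print(max(as_text))   # 2015050195906, because '9' sorts after '1'
print(max(padded))    # 20150501161938

# Hypothetical rows on two different days: the later day has the shorter time.
other = [(20150502, 95906), (20150501, 161938)]
as_number = max(int(f"{a}{b}") for a, b in other)
print(as_number)      # 20150501161938, i.e. the earlier day wins
```

So converting the unpadded concatenation to a number happens to work for the two sample rows, but it can pick the wrong row once the dates differ; padding B to six digits avoids that.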

How to compare Clob column for multiple values?

Can anyone tell me how to compare a column of CLOB datatype in Oracle against multiple values?
For one value we compare like this:
dbms_lob.compare(attr_value,'A')=0
Similarly, I want to know whether attr_value is in ('A','B','C','D'). I tried this:
dbms_lob.compare(attr_value,'A')=0 or dbms_lob.compare(attr_value,'B')=0 or ...
This is not giving me the proper result. Is there any other way?
OR should work fine. Also you may try this:
SELECT * FROM your_tab WHERE CAST(s as VARCHAR2(2)) IN ('A', 'B', 'C', 'D');
but I'm not sure about the performance.
Since it seems you don't really want to compare CLOBs of massive size with a bunch of other massive CLOBs, the fastest way would be to just compare a substring of the CLOB:
WHERE DBMS_LOB.SUBSTR( attr_value, 4000, 1 ) IN ('A','B','C')
Here 4000 can be replaced by the maximum length of all your comparison values.
If you really want to compare massive CLOBs, I don't think a select is the right approach; you should probably rework your application logic...
DBMS_LOB.COMPARE does an exact comparison between two LOB objects. The documentation says:
COMPARE returns zero if the data exactly matches over the range
specified by the offset and amount parameters. Otherwise, a nonzero
INTEGER is returned.
On Oracle 11g, you could use the REGEXP_INSTR function:
SELECT REGEXP_INSTR(attr_value,'A|B|C|D|E') from dual;
I hope it helps.
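One thing to watch with the regular-expression approach: an unanchored alternation matches anywhere inside the value, so it is a "contains" test rather than an "is one of" test. The sketch below shows the difference using Python's re module as a stand-in; for a pattern this simple, alternation and the ^ and $ anchors should behave the same way in Oracle, but treat that as an assumption and test it against your data.

```python
import re

unanchored = re.compile(r"A|B|C|D")      # matches if any of the letters occurs anywhere
anchored = re.compile(r"^(A|B|C|D)$")    # matches only if the whole value is one of them

# Hypothetical attribute values
values = ["A", "D", "XAY", "E"]

contains = [v for v in values if unanchored.search(v)]
exact = [v for v in values if anchored.search(v)]

print(contains)   # ['A', 'D', 'XAY']
print(exact)      # ['A', 'D']
```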

Inserting/Updating numeric string in Sqlite with Ruby (Newbie query)

I have a simple Sqlite table with 2 columns for a telephone number and a counter. I want to update the table on the basis of .csv files that also contain telephone numbers and counters. If the number exists in the database it should be updated by the sum of the existing counter + the counter in the file. If it doesn't exist a new record should be inserted with the value from the file.
My one remaining problem is that the telephone numbers have a zero in the first position.
When I populate the db the zero is retained (I can manually select and find an existing number like 09999), and when I fetch the values from the file the zero is retained too. But when I try to insert/update, something happens in my Ruby code that inserts a new record without the leading zero, so 0999 becomes 999 in the db. Numbers without leading zeros are handled correctly.
My code looks like this:
rowArray=thisFile[k].split(';')
number = rowArray[0]
couplings = rowArray[1]
updString="INSERT OR REPLACE INTO Caller (Telno,count) VALUES (#{number},COALESCE((SELECT count + #{couplings} FROM Caller WHERE Telno=#{number}),#{couplings}))"
db.execute(updString)
Any idea what I'm doing wrong here? The easiest solution would be to drop the leading zero but I would prefer to do it right. Many thanks in advance.
You need to use placeholders in your prepare call and pass the actual values in a call to execute. Like this
insert = db.prepare(<<__SQL__)
INSERT OR REPLACE INTO Caller (Telno, count)
VALUES (:number, COALESCE((SELECT count + :couplings FROM Caller WHERE Telno = :number), :couplings))
__SQL__
insert.execute(number: number, couplings: couplings)
(Note that :number and :couplings in the SQL statement are named placeholders. They can be anything, but I have chosen them to match the corresponding names of the variables that are to be bound.)
The problem is that, using simple interpolation, you end up with a string like
INSERT OR REPLACE INTO Caller (Telno, count) VALUES (0999, ...
and the 0999 appears to be a number rather than a string. If you pass strings to execute then the variables will be bound with the correct type.
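The effect can be reproduced with a few lines against an in-memory SQLite database. The sketch below uses Python's built-in sqlite3 module rather than the Ruby gem, and it assumes a schema (Telno TEXT PRIMARY KEY, count INTEGER) that the question does not show, so treat the table definition as hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumed schema: the question does not show the real table definition.
db.execute("CREATE TABLE Caller (Telno TEXT PRIMARY KEY, count INTEGER)")

UPSERT = """
INSERT OR REPLACE INTO Caller (Telno, count)
VALUES (:number,
        COALESCE((SELECT count + :couplings FROM Caller WHERE Telno = :number),
                 :couplings))
"""

def add_couplings(number, couplings):
    # The phone number is bound as a string, so the leading zero survives.
    db.execute(UPSERT, {"number": number, "couplings": int(couplings)})

add_couplings("0999", "3")   # new record
add_couplings("0999", "4")   # existing record: counter becomes 3 + 4

# Interpolated SQL: 0999 is parsed as the number 999 before it is stored.
db.execute("INSERT OR REPLACE INTO Caller (Telno, count) VALUES (0999, 1)")

rows = db.execute("SELECT Telno, count FROM Caller ORDER BY Telno").fetchall()
print(rows)   # [('0999', 7), ('999', 1)]
```

The bound value keeps its zero and is updated in place, while the interpolated literal creates a second record for 999, which matches the behaviour described in the question.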

make a sequence in oracle database

How can I create a sequence that has two parts, a fixed character part and a variable integer part, like "LTR00001", so that the next value in the sequence is "LTR00002"?
You cannot. Sequences are just integers.
But you can select a formatted string from the sequence:
SELECT 'LTR' || to_char(the_sequence.nextval, 'FM00000') FROM DUAL;
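The idea is simply "integer counter plus fixed prefix plus zero padding". As a rough illustration outside the database, here is the same formatting in Python, with itertools.count standing in for the sequence; the names the_sequence and next_code are made up for the example.

```python
from itertools import count

the_sequence = count(start=1)   # stand-in for the database sequence

def next_code(prefix="LTR", width=5):
    """Return the next value as prefix + zero-padded integer."""
    return f"{prefix}{next(the_sequence):0{width}d}"

first = next_code()
second = next_code()
print(first, second)   # LTR00001 LTR00002
```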
