Sqoop importing zero decimals as 0E-22 - hadoop

When I import a table from my MSSQL database using Hadoop and Sqoop, and that table has decimal columns, any values that are zero (e.g. 0.000000000000...) are saved as "0E-22".
This is quite a pain, as casting the value to a decimal in my Map or Reduce throws an exception. So I either have to export the column as a varchar or do a check before trying to cast it. Neither is ideal.
Has anyone encountered this before and got a workaround?
Thanks

I would suggest trying the soon-to-be-released Sqoop 1.4.3, where we fixed SQOOP-830; that might help you as well.

It is a strange case, but there is a workaround. You can rewrite your_table, i.e.
INSERT OVERWRITE TABLE your_table
SELECT columns_no_need_for_change,
       CASE WHEN possible_bad_column = '0E-22' THEN '0' ELSE possible_bad_column END
FROM your_table
Let us know whether you succeed or not. GL!

Related

How to filter on RAW data type oracle

We are using Oracle for one of our client databases. I am not very well versed with it. There is a column on the basis of which I need to filter records. The column was printing System.Byte before, and when I converted it to VARCHAR(50) it printed as 000B000000000000000000000000000A.
I need to know how to filter the records with this value in the mentioned column.
If the idea of the column is to represent a hex string:
SELECT UTL_I18N.RAW_TO_CHAR ('000B000000000000000000000000000A', 'AL32UTF8')
FROM DUAL;
could work for you; however, more information about the application and the expected results would be needed for a more fitting solution.
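If instead the goal is simply to filter rows on that RAW value, a direct comparison against HEXTORAW is another option. A minimal sketch, with hypothetical table and column names:
SELECT *
FROM my_table
WHERE raw_column = HEXTORAW('000B000000000000000000000000000A');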

Inserting char variables into table oracle sql developer

I have a table CMP with two fields: CMP_CODE varchar2(20) and CMP_NAME varchar2(50).
When I'm trying to insert an entry like '001' to CMP_CODE, every time it is getting inserted as '1'.
My statement was like
Insert into CMP(CMP_CODE ,CMP_NAME) values ('007','test');
Previously the problem was not there, but I've re-installed our XE database recently, is the problem with that?
Your valuable help in this regard is highly appreciated. Thanks in advance.
The field type VARCHAR2 itself is probably NOT responsible for stripping your zeros.
The error seems to be in your application. If you use a numeric variable type (e.g. Integer, Long, Float, Decimal), this behavior is very basic and in most cases desirable.
But with so little information about your situation, it is kind of hard to tell what is really going on.
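As a minimal illustration of the difference, using the CMP table from the question: whether the leading zeros survive depends on whether the value reaches the INSERT as a string or as a number.
-- string literal: stored as '007', leading zeros preserved
INSERT INTO CMP (CMP_CODE, CMP_NAME) VALUES ('007', 'test');
-- numeric literal: 007 is just the number 7, implicitly converted and stored as '7'
INSERT INTO CMP (CMP_CODE, CMP_NAME) VALUES (007, 'test');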

sqoop import fails with numeric overflow

sqoop import job failed caused by: java.sql.SQLException: Numeric Overflow
I have to load an Oracle table; it has a column of type NUMBER in Oracle, without scale, and it is converted to DOUBLE in Hive. These are the biggest possible numeric types for both Oracle and Hive. The question is how to overcome this error.
OK, my first answer assumed that your Oracle data was good, and your Sqoop job needed specific configuration to cope with NUMBER values.
But now I suspect that your Oracle data contains garbage, specifically NaN values, as a result of calculation errors.
See that post for example: When/Why does Oracle adds NaN to a row in a database table
And Oracle even has distinct "Not-a-Number" categories to represent "infinity", to make things even more complicated.
But on Java side, BigDecimal does not support NaN -- from the documentation, in all conversion methods...
Throws:
NumberFormatException - if value is infinite or NaN.
Note that the JDBC driver masks that exception and displays NumericOverflow instead, to make things more complicated to debug...
So your issue looks like this one: Solr Numeric Overflow (from Oracle) -- but unfortunately Solr allows you to skip errors, while Sqoop does not, so you cannot use the same trick.
In the end, you will have to "mask" these NaN values with the Oracle function NANVL, using a free-form query in Sqoop:
$ sqoop import --query 'SELECT x, y, NANVL(z, Null) AS z FROM wtf WHERE $CONDITIONS'
Edit: this answer assumed that your Oracle data was good and your Sqoop job needed specific configuration to cope with NUMBER values. That was not the case; see the alternate answer.
In theory, it can be solved.
From the Oracle documentation about "Copying Oracle tables to Hadoop" (within their Big Data appliance), section "Creating a Hive table" > "About datatype conversion"...
NUMBER
INT when the scale is 0 and the precision is less than 10
BIGINT when the scale is 0 and the precision is less than 19
DECIMAL when the scale is greater than 0 or the precision is greater than 19
So you must find out the actual range of values in your Oracle table; then you will be able to specify the target Hive column as either a BIGINT, a DECIMAL(38,0), a DECIMAL(22,7), or whatever.
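For example, a quick way to check both the declared precision/scale and the actual range of stored values, using the placeholder table wtf and column z that appear in the examples further down (adapt the names to your table):
-- declared precision/scale (both NULL for an unconstrained NUMBER)
SELECT data_precision, data_scale
FROM   all_tab_columns
WHERE  table_name = 'WTF' AND column_name = 'Z';
-- actual range of stored values
SELECT MIN(z) AS min_z, MAX(z) AS max_z FROM wtf;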
Now, from the Sqoop documentation about "sqoop - import" > "Controlling type mapping"...
Sqoop is preconfigured to map most SQL types to appropriate Java or
Hive representatives. However the default mapping might not be
suitable for everyone and might be overridden by --map-column-java
(for changing mapping to Java) or --map-column-hive (for changing
Hive mapping).
Sqoop is expecting comma separated list of mappings (...) for
example $ sqoop import ... --map-column-java id=String,value=Integer
Caveat #1: according to SQOOP-2103, you need Sqoop V1.4.7 or above to use that option with Decimal, and you need to "URL Encode" the comma, e.g. for DECIMAL(22,7):
--map-column-hive "wtf=Decimal(22%2C7)"
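Putting that together with a Hive import, a hedged sketch of a full command might look like this (the connection string, credentials, and the DECIMAL precision are placeholders to adapt):
$ sqoop import \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username scott -P \
    --table WTF \
    --hive-import \
    --map-column-hive "z=DECIMAL(22%2C7)"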
Caveat #2: in your case, it is not clear whether the overflow occurs when reading the Oracle value into a Java variable, or when writing the Java variable into the HDFS file -- or even elsewhere. So maybe --map-column-hive will not be sufficient.
And again, according to that post, which points to SQOOP-1493, --map-column-java does not support the Java type java.math.BigDecimal until at least Sqoop V1.4.7 (and it's not even clear whether it is supported by that specific option, or whether it is expected as BigDecimal or java.math.BigDecimal).
In practice, since Sqoop 1.4.7 is not available in all distros, and since your problem is not well diagnosed, it may not be feasible.
So I would advise just hiding the issue by converting your rogue Oracle column to a String at read time.
Cf. documentation about "sqoop - import" > "Free-form Query Imports"...
Instead of using the --table, --columns and --where arguments, you can
specify a SQL statement with the --query argument (...) Your query must include the token $CONDITIONS (...) For example:
$ sqoop import --query 'SELECT a.*, b.* FROM a JOIN b ON a.id=b.id WHERE $CONDITIONS' ...
In your case, that would be SELECT x, y, TO_CHAR(z) AS z FROM wtf, plus the appropriate format mask inside TO_CHAR so that you don't lose any information due to rounding.
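A hedged sketch of that import, with a made-up format mask that you would widen to match the real precision and scale of z (connection details and target directory are placeholders; note the escaped \$CONDITIONS because the query is wrapped in double quotes):
$ sqoop import \
    --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
    --username scott -P \
    --query "SELECT x, y, TO_CHAR(z, 'FM99999999999999999999999990.9999999') AS z FROM wtf WHERE \$CONDITIONS" \
    --target-dir /user/me/wtf \
    --split-by x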

How does --direct parameter in Sqoop export work with Vertica?

I got a "Too many ROS containers ..." error when exporting a large amount of data from HDFS to Vertica. I know there is a DIRECT option for vsql COPY which will bypass the WOS and load data straight into ROS containers. I also noticed the --direct option in Sqoop export; see this Sqoop User Guide. I'm just wondering if these two "direct" options have the same function.
I have tried modifying Vertica configuration parameters like MoveOutInterval, MergeOutInterval... but this didn't help much.
So does anyone know whether the direct mode of Sqoop export will help to solve the ROS containers issue? Thanks!
--direct is only supported by specific database connectors. Since there isn't one for Vertica, you would be using the Generic JDBC one. I really doubt using --direct does anything... but if you really want to test this, you can look at the statements sent in query_requests.
select *
from query_requests
where request_type = 'LOAD'
and start_timestamp > clock_timestamp() - interval '1 hour'
That will show you all load statements within the last hour. The Sqoop statements should get converted to a COPY, I would really hope, anyhow! If it is a bunch of INSERT ... VALUES statements, then I highly suggest NOT using it. If it is not producing a COPY, then you'll need to change the query above to look for the INSERTs:
select *
from query_requests
where request_type = 'QUERY'
and request ilike 'insert%'
and start_timestamp > clock_timestamp() - interval '1 hour'
Let me know what you find here. If it is doing INSERT...VALUES then I can tell you how to fix it (but it is a bit of work).
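For reference, since the export would be going through the Generic JDBC connector, the command itself would look roughly like this sketch (host, credentials, table, and export directory are all placeholders, not values from the question):
$ sqoop export \
    --connect jdbc:vertica://vertica-host:5433/mydb \
    --driver com.vertica.jdbc.Driver \
    --username dbadmin -P \
    --table target_table \
    --export-dir /user/hive/warehouse/source_table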

Crystal Reports Issue turning a string into a number

I'm having one of those throw-the-computer-out-the-window days.
I am working on a problem involving Crystal Reports (Version 10) and an Oracle Database (11g).
I am taking a view from the database that returns a string (varchar2(50)) which is actually a number. When a basic SELECT * query is run on this view, I get the number back in the format 000000000000100.00.
When this view is then used in Crystal Reports I can view the field data, but I can't sum the data as it is not a number.
I began by attempting to use ToNumber on the field, to which Crystal's response was that the string was not numeric text. OK, fair enough; I went back to the view and ran TO_NUMBER, but when this was then used in Crystal it did not return any results. I also attempted to run TO_CHAR on the view so that I could hopefully import the field as text and then perform a ToNumber, yet the same as with TO_NUMBER, no records were displayed.
I've started new reports, I've started new views. No avail.
This seems to have something to do with how I am retrieving the data for the view.
In simplistic terms, I'm pulling data from a table, looking at two fields: a foreign key and a value field.
SELECT PRIMARY_KEY,
NVL(MAX(DECODE(FOREIGN_KEY, FOREIGN_KEY_OF_VALUE_I_NEED, VALUE_FIELD)), 0)
FROM MY_TABLE
GROUP BY PRIMARY_KEY
When I attempted to modify the result using TO_NUMBER or TO_CHAR, I used it around both the VALUE_FIELD itself and the entire expression; either way works when run as a SQL statement. However, any TO_NUMBER or TO_CHAR modification to the statement returns no results in Crystal Reports when the view is used.
This whole problem smacks of something that is a tick box or equivalent that I have overlooked.
Any suggestions of how to solve this issue or where I could go to look for an answer would be greatly appreciated.
I ran this query in SQL Developer:
SELECT xxx, to_number(xxx) yyy
FROM (
SELECT '000000000000100.00' XXX FROM DUAL
)
Which resulted in:
XXX                  YYY
000000000000100.00   100
If your field is truly numeric, you could create a SQL Expression field to do the conversion:
-- {%NUMBER_FIELD}
TO_NUMBER(TABLE.VALUE_FIELD)
This turned out to be an issue with how Crystal Reports deals with queries from a database. All I needed to do was wrap my SQL statement within another SELECT statement and apply TO_NUMBER to that instance of the column, so that Crystal Reports would recognize the column values as numbers.
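In other words, taking the view query from the question and wrapping it, it ends up looking roughly like this (the inner query is unchanged apart from a column alias):
SELECT PRIMARY_KEY,
       TO_NUMBER(VALUE_FIELD) AS VALUE_FIELD
FROM (
    SELECT PRIMARY_KEY,
           NVL(MAX(DECODE(FOREIGN_KEY, FOREIGN_KEY_OF_VALUE_I_NEED, VALUE_FIELD)), 0) AS VALUE_FIELD
    FROM MY_TABLE
    GROUP BY PRIMARY_KEY
);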
Hopefully this helps someone out, as this was a terrible waste of an afternoon.
