DECIMAL value out of range - hadoop

I am trying to publish data from our SAS environment into a remote Hadoop/Hive database (as sequence files). I'm performing basic tests by taking some source data from our business users and using a data step to write out to the Hadoop library.
I'm getting errors indicating that a value at row X is out of range.
For example:
ERROR: Value out of range for column BUY_RT1, type DECIMAL(5, 5). Disallowed value is: 0.
The source data has a numeric format of 6.5, and the actual value is .00000.
Why is .00000 out of range? Would the format for Hadoop need to be DECIMAL(6, 5)?
I get the same error when the value is 0.09:
ERROR: Value out of range for column INT_RT, type DECIMAL(5, 5). Disallowed value is: 0.09

You may need to check the actual values in SAS. If a numeric value in SAS has a format applied, you will see the formatted (possibly rounded) version of the numeric value wherever you output the value, but the underlying numeric may still have more significant digits that you're not seeing, due to the format.
For example, you say your source data has a format of 6.5 and the 'actual value' is 0.00000; are you sure that's the actual value? To check, you could compare the value to a literal 0, or write the value to the SAS log with a wider format such as BEST32. (e.g. put BUY_RT1 best32.;).
If this is the problem, the solution is to properly round the source numeric values, rather than just applying a format.
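The point about formats hiding digits can be sketched outside SAS. The Python below is only an illustration: the stored value 0.000004 is a made-up example, and fits_decimal is a hypothetical helper that applies a strict reading of DECIMAL(precision, scale), which is not necessarily the exact check SAS or Hive performs.

```python
from decimal import Decimal


def fits_decimal(value: float, precision: int, scale: int) -> bool:
    """Strict check (an assumption, not Hive's documented rule): the value
    must need no more than `scale` fractional digits and no more than
    `precision - scale` integer digits."""
    _sign, digits, exponent = Decimal(repr(value)).as_tuple()
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fractional_digits <= scale and integer_digits <= precision - scale


raw = 0.000004                 # hypothetical stored value
displayed = f"{raw:.5f}"       # what a 5-decimal display format shows: "0.00000"
cleaned = round(raw, 5)        # actually rounded to 5 decimals: 0.0

print(displayed, raw == 0)             # looks like zero, but is not zero
print(fits_decimal(raw, 5, 5))         # False: a sixth decimal digit remains
print(fits_decimal(cleaned, 5, 5))     # True once the value is rounded
```

The displayed string and the stored number disagree until the value itself is rounded, which is the situation the answer suggests checking for.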

Related

Values within the NUMBER data type

I'm working on a project and noticed a column that has a NUMBER data type specified like so:
NUMBER(22,38,18)
While I'm aware of precision and scale, I'm confused about the third number. I can't seem to find any documentation where NUMBER has a third value.
What does that 18 mean? And would a value of 1.5187000000000004 be an issue now that I'm seeing three values within the NUMBER brackets?

How to use INDEX MATCH in Google Sheets when the two values are formatted differently (text versus numbers)?

I'm trying to match a string of numbers (like 370488004) using the typical INDEX MATCH formula. Unfortunately, in one range the numbers are formatted as plain text, and in the other range they are formatted differently, usually as 'Automatic' or 'Number'. I want to avoid having to update the formatting of both ranges whenever the values get updated (usually via a paste from an outside source), especially since it's not always going to be me doing the updating.
Is there a way I can write my INDEX MATCH formula so that it ignores the formatting of the values it's attempting to match?
The value returned by the INDEX formula can be in any format: plain text, number, it doesn't matter. The problem is that the two values I'm matching are in different formats. I need a formula that ignores their formatting.
Here's an example sheet:
https://docs.google.com/spreadsheets/d/1cwO7HGtwR4mRnAqcjxqr1qbhGwJHLjBKkp7-iwzkOqY/edit?usp=sharing
You can use VALUE or INT to force it into a number value, or, if you want to keep it as text, use TEXT. An example would be:
=INDEX(VALUE(D1:E4),MATCH(G1,E1:E4,FALSE),1)
The numbers in column D are in fact text, but applying VALUE to the range first converts them all to numbers. The formula finds the value associated with "Green", which is written in G1. Without seeing a working example sheet this is the best solution I can offer.
UPDATE:
You can use an array VLOOKUP with a bounded lookup range (an open-ended range gives an error), or QUERY if you want the range to be open-ended.
=ARRAYFORMULA(VLOOKUP(VALUE($G3:$G5),$B3:$C,2,FALSE))
=QUERY(FILTER($B3:$C,$B3:$B=VALUE($G3:$G)),"Select Col2")
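The idea behind these formulas, converting both sides to one type before comparing, can be sketched in Python. This is only an illustration of the technique, not something that runs in Sheets; the lookup function and the sample rows are hypothetical.

```python
def lookup(key, rows):
    """Return the value paired with `key`, comparing keys as trimmed text
    so that 370488004 and "370488004" are treated as the same key."""
    table = {str(k).strip(): v for k, v in rows}
    return table.get(str(key).strip())


# Hypothetical data: one key stored as text, one stored as a number.
rows = [("370488004", "Green"), (370488005, "Blue")]

print(lookup(370488004, rows))     # number looked up against a text key
print(lookup("370488005", rows))   # text looked up against a number key
```

Normalizing to text sidesteps the type mismatch, but a key held as a float (for example 370488004.0) would still need converting to an integer first.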

Issue with choice action when running transform map

I'm trying to insert records into a table using a transform map. The target table has a field of type choice, and I have set the choice action for the source table's field to reject if no matching value is found. However, when I ran the transform map with a correct value, one that exists in the choice list of the target field, the record was still rejected and was not inserted.
I have tried searching for possible reasons as to why it still got rejected even with correct value in the source field. Here's the sample link that I have found: https://hi.service-now.com/kb_view.do?sysparm_article=KB0677334
It says that a choice list value longer than 40 characters will be truncated and might then not match any of the choices. But the choices in the target field have 20 characters or fewer.
I have first tried running the transform map in the lower environments before proceeding to production. In the lower environment it works fine and the records got inserted. But, when I tried it in production it got rejected.
There is a difference between choice and choice list. Within a choice list, the values are comma-separated sys_ids. I could imagine that you are importing multiple values and either the maximum character length is reached or the values do not match.
You could use this approach:
Instead of a direct source-to-target field assignment, use a script to set the target. Then you gain the full power of scripting ;)
There you could add some logic, such as a switch statement that maps each source value to a valid choice.

Weka NumericToNominal attributeIndices

I am using the Weka GUI and imported a csv file.
I want to transform a numerical attribute to nominal with the "NumericToNominal"-filter.
There are values between "-1" and "770".
If I set the attributeIndices value to "first-30,31-100,101-150,151-last", I get the error message: "Problem filtering instances: Invalid range list at first-30".
Do you have any idea what is wrong?
Thanks in advance
I have just used the same NumericToNominal filter because I read in a csv file from the UI and it claimed everything was numeric.
You are using the -R switch, so it is looking for a range of column numbers. The values in those columns should not matter. Columns begin at 1, or "first" as you have above. The "Invalid range list" error appears when you reference a column number that does not exist. It therefore seems to indicate that either you have fewer than 30 columns or one of the columns between 1 and 30 has somehow been removed. Did you mix up column numbers with the values contained in those columns? I believe a negative value would not be a problem for this process.

Time value as output

A few columns in the source .csv file have values like 1:52:00 and 14:45:00.
I am supposed to load them into an Oracle table.
Which data type should I choose in the target as well as in the source?
Should I be doing anything in the expression transformation?
Use SQLLDR to load the data into the database with the format described in the link
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm
i.e. 'HH24:MI:SS'
Oracle does not support time-only values; it supports dates (with a time component).
You have a few options:
Store the value as a string, perhaps providing a leading zero for the hour.
Store the value as the number of seconds (or minutes) past midnight.
Store the value as the time component of some arbitrarily defined date, for example 0001-JAN-01 01:52:00 and 0001-JAN-01 14:45:00. Tell your report writers to ignore the date portion of the value.
Your source datatype will be string(8). Use LPAD to add leading zeroes.
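Two of the options above, zero-padding the string and storing seconds past midnight, can be sketched in Python to show the conversions involved. This is a minimal illustration assuming the input is always in H:MM:SS or HH:MM:SS form; the function names are hypothetical and the actual load would use the mapping's own functions such as LPAD.

```python
def pad_time(text: str) -> str:
    """Normalize 'H:MM:SS' to an 8-character 'HH:MM:SS' string."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def seconds_past_midnight(text: str) -> int:
    """Convert 'H:MM:SS' to the number of seconds since 00:00:00."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


for sample in ("1:52:00", "14:45:00"):
    print(pad_time(sample), seconds_past_midnight(sample))
# 01:52:00 6720
# 14:45:00 53100
```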
