Oracle regexp_like failing on FLOAT from view - oracle

I'm trying to use regexp_like to find and remove overly-precise floating point numbers.
select c from t order by c asc;
returns many results like this: 0.0000000012345678
Using regexp_like I can get results for two decimal places (0.25):
select * from t where REGEXP_LIKE(c,'^\d+\.\d{2}');
However, when I try anything more than two places, I get no results:
select * from t where REGEXP_LIKE(c,'^\d+\.\d{3}');
...
select * from t where REGEXP_LIKE(c,'^\d+\.\d{10}');
The only add'l info is that I'm selecting against a view of a second view and the column I'm searching (c, above) is designated as a FLOAT.

You can treat them as numbers. You can truncate the value to a fixed number of decimal places:
The TRUNC (number) function returns n1 truncated to n2 decimal places.
and then see if it matches. For example, to find any values with more than 2 significant digits after the decimal point:
select * from t where c != trunc(c, 2);
or to find those with more than 10 significant digits:
select * from t where c != trunc(c, 10);
I've used != rather than > in case you have negative values.
You can also use that as a filter in a delete/update, or as the set part of an update if you want to reduce the precision - though in that case you might want to use round() instead of trunc().
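A rough illustration of why the comparison uses != rather than >, written in JavaScript rather than Oracle SQL. The truncTo helper is a hypothetical stand-in for TRUNC(n, 2), and the sample values are picked so that binary floating point represents them exactly, a restriction Oracle's decimal numbers do not have:

```javascript
// Hypothetical stand-in for Oracle's TRUNC(n, places): drop digits past
// `places` decimals, truncating toward zero. Not Oracle code.
function truncTo(n, places) {
  const factor = 10 ** places;
  return Math.trunc(n * factor) / factor;
}

// Sample values chosen so binary floating point represents them exactly.
const samples = [0.25, 0.125, -0.125];

const flagged = samples.map((c) => ({
  c,
  notEqual: c !== truncTo(c, 2), // mirrors: c != trunc(c, 2)
  greater: c > truncTo(c, 2),    // mirrors: c > trunc(c, 2)
}));
console.log(flagged);
```

The negative sample is caught by the inequality test but missed by the greater-than test, because truncating toward zero makes a negative value larger, not smaller.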
When you use regexp_like you're doing an implicit conversion of your float value to a string, and as the docs for to_char() note:
If you omit fmt, then n is converted to a VARCHAR2 value exactly long enough to hold its significant digits.
which means that 0.25 becomes the string '.25', with no leading zero; which doesn't match even your first pattern.
You can allow for that leading zero not being there by using * instead of +, e.g. to find values with at least 10 significant digits after the decimal point:
select * from t where REGEXP_LIKE(c,'^\d*\.\d{10}');
or with exactly 10:
select * from t where REGEXP_LIKE(c,'^\d*\.\d{10}$');
etc.; but it seems simpler to treat them just as numbers rather than as strings.
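The effect of the missing leading zero can be sketched outside the database. The snippet below uses JavaScript regular expressions as a stand-in for REGEXP_LIKE, and the strings are hand-written examples of what the answer says the implicit conversion produces, not output captured from Oracle:

```javascript
// Strings written the way the answer describes Oracle's implicit
// conversion (no leading zero). Illustrative literals only.
const rendered = ['.25', '.0000000012345678', '12.5'];

const needsLeadingDigit = /^\d+\.\d{2}/;    // the question's pattern
const leadingDigitOptional = /^\d*\.\d{2}/; // the answer's adjusted pattern

const results = rendered.map((s) => ({
  value: s,
  plus: needsLeadingDigit.test(s),
  star: leadingDigitOptional.test(s),
}));
console.log(results);
```

With + the pattern demands at least one digit before the decimal point, so '.25' cannot match; with * it can.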

Related

how to make sure to never get ora-01438: value larger than specified precision allowed for this column?

I'm doing a division for each record and updating a certain column with the result
so my sql looks something like this
update table1 set frequency = num/denom where id>XXX
my frequency data type is number(10,10)
Based on https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT1838
First, I'm not even sure why I get this data because the answer will always be 0.XXX, so giving 10 before the comma would be plenty. Then the 10 after the comma should be okay too because it will truncate if the answer is bigger.
NUMBER(10, 10) means 10 digits and a scale of 10.
That means you have 10 digits right of the decimal point which means no digit left of it.
So having the table
CREATE TABLE t
(
test NUMBER (10, 10)
);
insert into t values (0.9999999999); will work, while
insert into t values (0.99999999999); will fail because the value is rounded up to 1.
So if num/denom is 1 or even larger you will get ORA-01438: value larger than specified precision allowed for this column.
But you will also get this error if num/denom is larger than 0.99999999995, as Oracle tries to round it to 1.
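The rule described here (round to 10 decimal places, then reject anything that needs a digit left of the decimal point) can be modelled roughly as follows. This is a JavaScript imitation with a hypothetical function name, not Oracle code, and it relies on binary floats, so values sitting exactly on the rounding boundary may behave differently than in the database:

```javascript
// Rough model of how a NUMBER(10,10) column treats an incoming value:
// round to 10 decimal places, then require the result to be below 1 in
// magnitude. Hypothetical helper; it imitates the rule, it is not Oracle.
function fitsNumber10_10(value) {
  const rounded = Number(value.toFixed(10)); // round to scale 10
  return Math.abs(rounded) < 1;              // no digit allowed left of the point
}

console.log(fitsNumber10_10(0.9999999999));  // ten nines: stored as is
console.log(fitsNumber10_10(0.99999999999)); // eleven nines: rounds up to 1
console.log(fitsNumber10_10(1));             // already too large
```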
First of all, let me clear up the confusion around precision and scale. The documentation states:
For numeric columns, you can specify the column as:
column_name NUMBER
Optionally, you can also specify a precision
(total number of digits) and scale (number of digits to the right of
the decimal point):
column_name NUMBER (precision, scale)
In your case:
frequency NUMBER(10,10)
This means, that the total number of digits is 10 and this means that the column can accommodate values from:
0.0000000001
to:
9999999999
This includes Integers up to 9999999999 (10 nines) and floats from 0.0000000001 (9 zeroes and a 1 at the end).
Now that we know this, let's proceed to the problem..
You need this query to never fail with ORA-01438:
update table1 set frequency = num/denom where id>XXX;
You can do the following check, on update time:
update table1
set frequency = CASE WHEN LENGTH(TRUNC(num/denom)) >= 10
THEN TRUNC(num/denom, 10)
ELSE
ROUND(num/denom, 10 - LENGTH(TRUNC(num/denom))) -- or TRUNC
END
where id>XXX;
What this would do is check:
1. If the whole part of the division is 10 or more digits long; in that case, truncate the result to 10 decimal places (TRUNC).
2. If the whole part is shorter than 10 digits; in that case ROUND the result to "10 - LENGTH_OF_WHOLE_PART" decimal places, but still within the precision of 10, which is the one of the column.
*Note: The ROUND above will actually round the result, giving you an inaccurate value. If you need a raw truncation of the result, use TRUNC instead of ROUND above!
Cheers

Enter string repeatedly Ruby Cucumber

I have a test that includes character lengths within fields etc.
I was wondering if I could have a set string of 10 characters like str = 'abcdefghij'
then have it multiply that string by the amount of times needed to fulfil the character length and fill in the field.
I've tried the times method but that just enters the same value over x iterations.
What I want is to take str, increase it ten fold and enter that value as 1 continuous string so abcdefghij becomes abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij etc
I'd parameterize the number of times to increase it depending on the field I'm testing. I want to do this so that I don't have huge amounts of variables stored to satisfy each test.
Can this be done? I hope I've explained clearly.
String#* would do:
'abc' * 10
#⇒ "abcabcabcabcabcabcabcabcabcabc"
To use a floating point parameter:
λ = ->(input, count) do
i, f = *count.divmod(1)
input * i << input[0...(f * input.size).to_i]
end
λ.('abcd', 2.5)
#⇒ 'abcdabcdab'

Hashing a long integer ID into a smaller string

Here is the problem, where I need to transform an ID (defined as a long integer) to a smaller alphanumeric identifier. The details are the following:
Each individual in the problem has a unique ID, a long integer of size 13 (something like 123123412341234).
I need to generate a smaller representation of this unique ID, an alphanumeric string, something like A1CB3X. The problem is that 5 or 6 character length will not be enough to represent such a large integer.
The new ID (eg A1CB3X) should be valid in a context where we know that only a small number of individuals are present (less than 500). The new ID should be unique within that small set of individuals.
The new ID (eg A1CB3X) should be the result of a calculation made over the original ID. This means that taking the original ID elsewhere and applying the same calculation, we should get the same new ID (eg A1CB3X).
This calculation should occur when the individual is added to the set, meaning that not all individuals belonging to that set will be known at that time.
Any directions on how to solve such a problem?
Assuming that you don't need a formula that goes in both directions (which is impossible if you are reducing a 13-digit number to a 5 or 6-character alphanum string):
If you can have up to 6 alphanumeric characters that gives you 36^6 = 2,176,782,336 possibilities, assuming only numbers and uppercase letters.
To map your larger 13-digit number onto this space, you can take a modulo of some prime number slightly smaller than that, for example 2,176,782,317, then encode it with base-36 encoding.
alphanum_id = base36encode(longnumber_id % 2176782317)
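A minimal JavaScript sketch of that one-liner, assuming the IDs are non-negative integers small enough for JavaScript to hold exactly. The name shortId is made up for illustration, and the modulus is taken from the answer without checking that it is prime:

```javascript
// Modulus suggested in the answer; its primality is taken on trust here.
const MODULUS = 2176782317;

// Map a long numeric ID to at most 6 characters drawn from 0-9 and A-Z.
// Assumes a non-negative integer below Number.MAX_SAFE_INTEGER.
function shortId(longId) {
  return (longId % MODULUS).toString(36).toUpperCase();
}

const id = 123123412341234; // sample ID from the question
console.log(shortId(id));
```

Because the remainder is always below 36^6, the encoded string never exceeds six characters, and the same input always yields the same output.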
For a set of 500, this gives you a
1 - (2176782317 P 500) / 2176782317^500 chance of a collision
(P is permutation)
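To put a number on that chance, the calculation can be run as a running product. This sketch assumes each ID lands uniformly at random in one of the available slots, which real IDs may not do, and the function name is hypothetical:

```javascript
// Chance that at least two of `count` IDs share a slot when each ID lands
// uniformly at random in one of `slots` values (an idealising assumption).
function collisionChance(slots, count) {
  let allDistinct = 1;
  for (let i = 0; i < count; i++) {
    allDistinct *= (slots - i) / slots; // i slots are already taken
  }
  return 1 - allDistinct;
}

const chance = collisionChance(2176782317, 500);
console.log(chance); // on the order of 1 in 17,000
```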
Best option is to change the base to 62 using case sensitive characters
If you want it to be shorter, you can add unicode characters. See below.
Here is javascript code for you: https://jsfiddle.net/vewmdt85/1/
function compress(n) {
var symbols = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïð'.split('');
var d = n;
var compressed = '';
while (d >= 1) {
compressed = symbols[(d - (symbols.length * Math.floor(d / symbols.length)))] + compressed;
d = Math.floor(d / symbols.length);
}
return compressed;
}
$('input').keyup(function() {
$('span').html(compress($(this).val()))
})
$('span').html(compress($('input').val()))
How about using some base-X conversion, for example 123123412341234 becomes 17N644R7CI in base-36 and 9999999999999 becomes 3JLXPT2PR?
If you need a mapping that works both directions, you can simply go for a larger base.
Meaning: using base 16, you can reduce 0 to 15 to a single character.
So, base36 is the "maximum" that allows for shorter strings (when 1-1 mapping is required)!
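The two-way base-36 mapping can be tried with JavaScript's built-in conversions. This is only a sketch using the sample ID from the question; it assumes the ID fits in a JavaScript number without loss:

```javascript
const original = 123123412341234; // sample ID from the question

const encoded = original.toString(36).toUpperCase();
const decoded = parseInt(encoded, 36); // parseInt accepts either letter case

console.log(encoded, decoded === original);
```

The 15-digit input shrinks to 10 characters and decodes back to the same number, which shows both that the mapping is reversible and that it is still longer than the 5 or 6 characters the question asks for.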

Oracle NOT BETWEEN for string comparison does not give same result as <= and >=

Using Oracle 11gR2 Express Edition.
My data looks like following
ordertype
---------
ZOCO
ZOSA
ZOST
We are trying to find out records where the column is not between a certain range of values.
If I run a query with <= and >= operators:
SELECT * FROM table where ordertype <= 'ZAAA' OR ordertype >= 'ZZZZ';
then I get 0 results. This is the right answer.
However, if I use NOT BETWEEN:
SELECT * FROM table where ordertype NOT BETWEEN 'ZAAA' AND 'ZZZZ';
, then it gives multiple hits.
My understanding is that both syntax should give the same result but they are not. What am I missing? Reason I want to use NOT BETWEEN because a lot of our existing code already has this syntax and I do not want to change it without understanding the reasons.
Thank you.
Thanks for all those who posted. I ran the queries again and after fixing the "OR" in the first query, the results are the same. I still have the question of why Oracle character sorting is not recognizing it as expected, but my question which is about difference between NOT BETWEEN and <> was a false alarm. I apologize for confusion.
SELECT * FROM table where ordertype <= 'ZAAA' AND ordertype >= 'ZZZZ';
No string can be <= 'ZAAA' and >= 'ZZZZ'.
You need to use a disjunction instead:
SELECT * FROM table where ordertype < 'ZAAA' OR ordertype > 'ZZZZ';
BTW, given that BETWEEN is inclusive, NOT BETWEEN is exclusive.
This is a common pitfall. You have to remember De Morgan's laws:
not (A and B) is the same as (not A) or (not B)
Feel free to experiment with this simple live example to convince yourself that those results are quite coherent: http://sqlfiddle.com/#!4/d41d8/38326
That being said, the only way (I can see) for the string like ZOCO for not being between ZAAA and ZZZZ would be:
having some hidden character just behind the Z (i.e.: 'Z'||CHR(0)||'OCO')
or using a locale such that Z-something is actually considered a different letter, with a collation order outside of the given range. I don't know if such a locale exists, but for example, in Welsh, LL is considered a single letter that should be sorted after the plain L. See http://en.wikipedia.org/wiki/Alphabetical_order#Language-specific_conventions
or having homoglyphs such as 0, 𐒠 or О instead of O in your data.
If it's not between the values, it has to be either < OR >, not AND.
In the first query, you ask for the records that are at the same time less than 'ZAAA' and also greater than 'ZZZZ'. Of course, there is no value that fulfills both requirements, hence zero records are returned.
In the second query, you ask for records, that are either less than 'ZAAA' or greater than 'ZZZZ' (ie not between those boundaries [not between...]). There is a possibility that such records exist, and as your select statement proves, there are indeed such records, that are returned by the statement.
Your understanding that both statements are same is incorrect. NOT BETWEEN is not evaluated the way you're thinking. It simply returns the results which fall outside evaluation of BETWEEN for the parameters.
If you check the Oracle documentation for BETWEEN, it says:
The value of
expr1 NOT BETWEEN expr2 AND expr3
is the value of the expression
NOT (expr1 BETWEEN expr2 AND expr3)
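The logic in these answers can be checked outside the database. The sketch below uses JavaScript string comparison (by UTF-16 code unit) as a stand-in for a binary sort order. Oracle's actual ordering depends on its NLS settings, so this only illustrates the boolean logic and the hidden-character case:

```javascript
// JavaScript compares strings by UTF-16 code unit, used here as a stand-in
// for a binary sort order; Oracle's result depends on its NLS settings.
const lo = 'ZAAA';
const hi = 'ZZZZ';

const notBetween = (x) => !(x >= lo && x <= hi); // NOT (x BETWEEN lo AND hi)
const disjunction = (x) => x < lo || x > hi;     // De Morgan equivalent
const conjunction = (x) => x < lo && x > hi;     // the mistaken AND form

// The last sample hides a NUL character after the Z.
const samples = ['ZOCO', 'ZOSA', 'ZOST', 'YYYY', 'Z\u0000OCO'];
for (const x of samples) {
  console.log(JSON.stringify(x), notBetween(x), disjunction(x), conjunction(x));
}
```

The first two predicates agree on every sample, the AND form never matches anything, and the value with the hidden character falls outside the range even though it looks like ZOCO.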

DataColumn Expression Divide By Zero

I'm using basic .net DataColumns and the associated Expression property.
I have an application which lets users define which columns to select from a database table. They can also add other columns which perform expressions on the data columns resulting in a custom grid of information.
The problem I have is when they have a calculation column along the lines of "(C2/C3)*100" where C2 and C3 are data columns and the value for C3 is zero. The old "divide by zero" issue.
The simple answer would be to convert the expression to "IIF(C3 = 0, 0, (C2/C3)*100)", however we don't expect the user to know to do that and at compile time I don't know what columns are defined. So I would have to programmatically determine which columns are being used in a division in order to construct the IIF clause. That could get quite tricky.
Is there another way to not throw an error and replace the result with 0 if a "Divide By Zero" error occurs?
Ok, I found a way. The key is to use Double and not Decimal for the column type, e.g. in the example above C3 should be a Double. The division then produces Infinity instead of an error, and the expression as a whole can be tested for that value.
E.g.
IIF(CONVERT(([C4] / [C3] )*100, 'System.String') = 'NaN' OR CONVERT(([C4] / [C3] )*100, 'System.String') = 'Infinity' OR CONVERT(([C4] / [C3] )*100, 'System.String') = '-Infinity', 0, ([C4] / [C3] )*100)
Decimal it seems doesn't provide that Infinity option.
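The same floating-point behaviour can be seen in JavaScript, whose numbers are IEEE 754 doubles like System.Double. This is an analogy rather than DataColumn expression syntax, and percentOrZero is a made-up name:

```javascript
// JavaScript numbers are IEEE 754 doubles, so division by zero yields
// Infinity, -Infinity or NaN instead of throwing.
function percentOrZero(numerator, denominator) {
  const result = (numerator / denominator) * 100;
  return Number.isFinite(result) ? result : 0; // NaN and ±Infinity become 0
}

console.log(percentOrZero(1, 4));
console.log(percentOrZero(5, 0));
console.log(percentOrZero(0, 0));
console.log(percentOrZero(-5, 0));
```

The single isFinite check covers the three cases that the IIF expression tests as separate string comparisons.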

Resources