Native Impala UDF (C++) randomly returns NULL for the same inputs across multiple invocations in the same query

I have a native Impala UDF (C++) with two functions that are complementary to each other:
String myUDF(BigInt)
BigInt myUDFReverso(String)
myUDF("myInput") produces some output, and myUDFReverso(myUDF("myInput")) should give back myInput.
When I run an Impala query on a Parquet table like this:
select column1,myUDF(column1),length(myUDF(column1)),myUDFreverso(myUDF(column1)) from my_parquet_table order by column1 LIMIT 10;
the output of myUDFReverso is NULL at random.
For example, on the first run the output is:
+------------+----------------------+------------------------+-------------------------------------+
| column1 | myDB.myUDF(column1) | length(myUDF(column1)) | myDB.myUDFReverso(myUDF(column1)) |
+------------+----------------------+------------------------+-------------------------------------+
| 27011991 | 1.0.128.9 | 9 | 27011991 |
| 27011991 | 1.0.128.9 | 9 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | NULL |
+------------+----------------------+------------------------+-------------------------------------+
and on the second run:
+------------+----------------------+------------------------+-------------------------------------+
| column1 | myDB.myUDF(column1) | length(myUDF(column1)) | myDB.myUDFReverso(myUDF(column1)) |
+------------+----------------------+------------------------+-------------------------------------+
| 27011991 | 1.0.128.9 | 9 | 27011991 |
| 27011991 | 1.0.128.9 | 9 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | NULL |
| 14022013 | 1.0.131.239 | 11 | 14022013 |
| 14022013 | 1.0.131.239 | 11 | NULL |
+------------+----------------------+------------------------+-------------------------------------+
And sometimes it gives the correct value for all rows too.
I have tested this on Impala v1.2.4 as well as v2.1.
What is the cause of this? Some memory issue?
Edit 1:
BigIntVal myUDF(FunctionContext* context, const StringVal& myInput)
{
  if (myInput.is_null) return BigIntVal::null();
  unsigned int temp_op = 0;
  unsigned long result = 0;
  char c = '.';
  // A StringVal is not guaranteed to be NUL-terminated, so iterate over
  // exactly myInput.len bytes instead of scanning for '\0'.
  const uint8_t* p = myInput.ptr;
  const uint8_t* const end = myInput.ptr + myInput.len;
  while (p < end)
  {
    c = *p++;
    int digit = c*2;
    if (digit >= 22 && digit <= 31)
    {
      if ((temp_op = temp_op * 10 - digit) > 493)
      {
        return BigIntVal::null();
      }
    }
    else if (c == '.')
    {
      result = (result << 8) + (unsigned long) temp_op;
      temp_op = 0;
    }
    else
    {
      // Any other byte makes the whole value NULL
      return BigIntVal::null();
    }
  }
  return BigIntVal((result << 8) + (unsigned long) temp_op);
}
In the .h file, the macro lowerbytify is defined as:
#define lowerbytify(T,A) { *(T)= (char)((A));\
*((T)+1)= (char)(((A) >> 8));\
*((T)+2)= (char)(((A) >> 16));\
*((T)+3)= (char)(((A) >> 24)); }
StringVal myUDFReverso(FunctionContext* context, const BigIntVal& origMyInput)
{
  if (origMyInput.is_null)
    return StringVal::null();
  int64_t myInput = origMyInput.val;
  char myInputArr[16];
  unsigned int l = 0;
  unsigned char temp[8];
  lowerbytify(temp, myInput);
  char calc[4];
  calc[3] = '.';
  for (unsigned char *p = temp + 4; p-- > temp;)
  {
    unsigned int c = *p;
    unsigned int n1, n2;
    n1 = c / 100;
    c -= n1 * 100;
    n2 = c / 10;
    c -= n2 * 10;
    calc[0] = (char) n1 + '0';
    calc[1] = (char) n2 + '0';
    calc[2] = (char) c + '0';
    unsigned int length = (n1 ? 4 : (n2 ? 3 : 2));
    unsigned int point = (p <= temp) ? 1 : 0;
    char * begin = &calc[4-length];
    for (int step = length - point; step > 0; step--, l++, begin++)
    {
      myInputArr[l] = *begin;
    }
  }
  myInputArr[l] = '\0';
  StringVal result(context, l);
  memcpy(result.ptr, myInputArr, l);
  return result;
}

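I don't think you can assume the string is null-terminated. You should use StringVal::len to iterate over the chars rather than while (*p != '\0'). Otherwise the loop can run past the end of the value into whatever bytes happen to follow it, which would explain why the result changes from run to run. Also, I'd recommend writing some unit tests using the UDF test framework in the impala-udf-samples GitHub repository; see this example.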
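To see why the NUL-terminator assumption can produce random NULLs, here is a minimal Python simulation (not Impala code). It assumes the bytes following a value in memory are arbitrary; the trailing bytes in the example are made up. It also uses a plain inet_aton-style digit parser rather than the exact arithmetic in the question, so its numbers will not match the tables above. parse_dotted and format_dotted are hypothetical helper names; together they also sketch the kind of round-trip check the suggested unit tests could cover.

```python
from typing import Optional


def parse_dotted(buf: bytes, length: Optional[int] = None) -> Optional[int]:
    """Parse a dotted-quad string into an integer; None stands in for NULL.

    length=None imitates the question's loop: read until a NUL byte (or end of buf).
    length=n imitates iterating over StringVal::len: read exactly n bytes.
    """
    if length is None:
        nul = buf.find(b"\0")
        data = buf if nul == -1 else buf[:nul]
    else:
        data = buf[:length]
    result = 0
    octet = 0
    for byte in data:  # iterating over bytes yields ints
        if ord("0") <= byte <= ord("9"):
            octet = octet * 10 + (byte - ord("0"))
            if octet > 255:
                return None
        elif byte == ord("."):
            result = (result << 8) + octet
            octet = 0
        else:
            return None  # any other byte turns the whole result into NULL
    return (result << 8) + octet


def format_dotted(value: int) -> str:
    """Inverse direction: low 32 bits, most significant byte first."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


value = b"1.0.128.9"
# Hypothetical memory layout: the value is followed by unrelated bytes and no NUL.
buf = value + b"27011991\xff"
print(parse_dotted(buf))              # None: the scan runs past the value
print(parse_dotted(buf, len(value)))  # 16809993
print(format_dotted(16809993))        # 1.0.128.9
```

If a NUL byte happens to sit right after the value, the unbounded scan gives the correct answer, which matches the "sometimes it works" behaviour.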

Related

How can I write a recursive query in Power Query to traverse up a tree

Newbie question: I have a table with ID, ParentID, and Type. I want to create two new columns (StrategyID, SubstrategyID) that contain the row's own ID if its Type is 'Strategy' or 'Substrategy' respectively. Otherwise, I want to look at its parent row and return that ID if it matches the Type sought; if not, repeat with the parent of the parent, and so on. I am struggling with the syntax for functions in general, and recursive functions in particular, in Power Query.
I've looked at many examples and videos, and found some help, but not specifically for what I am trying to do.
------------------------------------------------------------
| Existing columns                        New Columns      |
------------------------------------------------------------
| ID | ParentID | Type | StrategyID | SubstrategyID |
| 1 | 0 | Strategy | 1 | |
| 2 | 1 | Substrategy | 1 | 2 |
| 3 | 2 | Feature | 1 | 2 |
| 4 | 3 | Story | 1 | 2 |
| 5 | 3 | Story | 1 | 2 |
| 6 | 1 | Substrategy | 1 | 6 |
| 7 | 6 | Feature | 1 | 6 |
| 8 | 7 | Story | 1 | 6 |
| 9 | 7 | Story | 1 | 6 |
| 10 | 0 | Strategy | 10 | |
| 11 | 10 | Substrategy | 10 | 11 |
| 12 | 11 | Feature | 10 | 11 |
| 13 | 12 | Story | 10 | 11 |
| 14 | 12 | Story | 10 | 11 |
| 15 | 12 | Story | 10 | 11 |
| 16 | 10 | Substrategy | 10 | 16 |
| 17 | 16 | Feature | 10 | 16 |
| 18 | 17 | Story | 10 | 16 |
| 19 | 17 | Story | 10 | 16 |
------------------------------------------------------------
Give this a try. It assumes the source data is in Table1 with three columns: "ID", "ParentID" and "Type".
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    ChangedType = Table.TransformColumnTypes(Source,{{"ID", type text}, {"ParentID", type text}}),
    // Buffer the columns once so the recursive lookups work on in-memory lists
    ID_List = List.Buffer( ChangedType[ID] ),
    ParentID_List = List.Buffer( ChangedType[ParentID] ),
    Type_List = List.Buffer( ChangedType[Type] ),
    // Walk up the parent chain until a row of the requested Type is found
    Highest = (n as text, searchfor as text) as text =>
        let
            Spot = List.PositionOf( ID_List, n ),
            ThisType = Type_List{Spot},
            Parent_ID = ParentID_List{Spot}
        in
            // @ is M's scoping operator for a recursive reference to Highest
            if Parent_ID = null or ThisType = searchfor then ID_List{Spot} else @Highest(Parent_ID, searchfor),
    FinalTable = Table.AddColumn( ChangedType, "StrategyID", each Highest( [ID],"Strategy" ), type text),
    FinalTable2 = Table.AddColumn( FinalTable, "SubstrategyID", each Highest( [ID],"Substrategy" ), type text),
    // Rows with no Substrategy ancestor walk past the root (ParentID 0 has no row) and raise an error; blank them
    #"Replaced Errors" = Table.ReplaceErrorValues(FinalTable2, {{"SubstrategyID", null}})
in
    #"Replaced Errors"
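The recursive Highest function walks up the parent chain one row at a time. The same idea as a small Python sketch may make the recursion easier to follow before translating it back into M. It uses the sample rows from the question, treats ParentID 0 as "no parent", and returns None where the M version produces an error that is later replaced; nearest_of_type is an illustrative name.

```python
from typing import Dict, Optional, Tuple

# Sample rows from the question: ID -> (ParentID, Type). ParentID 0 has no row.
ROWS: Dict[int, Tuple[int, str]] = {
    1: (0, "Strategy"), 2: (1, "Substrategy"), 3: (2, "Feature"),
    4: (3, "Story"), 5: (3, "Story"), 6: (1, "Substrategy"),
    7: (6, "Feature"), 8: (7, "Story"), 9: (7, "Story"),
    10: (0, "Strategy"), 11: (10, "Substrategy"), 12: (11, "Feature"),
    13: (12, "Story"), 14: (12, "Story"), 15: (12, "Story"),
    16: (10, "Substrategy"), 17: (16, "Feature"), 18: (17, "Story"),
    19: (17, "Story"),
}


def nearest_of_type(rows: Dict[int, Tuple[int, str]], node_id: int, wanted: str) -> Optional[int]:
    """Return node_id if its Type matches, else recurse into its parent."""
    if node_id not in rows:
        return None  # walked past the root without finding the Type
    parent_id, node_type = rows[node_id]
    if node_type == wanted:
        return node_id
    return nearest_of_type(rows, parent_id, wanted)


for row_id in sorted(ROWS):
    print(row_id, nearest_of_type(ROWS, row_id, "Strategy"),
          nearest_of_type(ROWS, row_id, "Substrategy"))
```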
I think you want to use PATH and PATHITEM.
Assuming your table is called 'Table', create a new column:
Path = PATH(Table[ID],Table[ParentID])
Then:
StrategyID = PATHITEM(Table[Path],1,1)
SubstrategyID = PATHITEM(Table[Path],2,1)

Elixir: to assign variable in for generator(variable scope?)

I'm solving Project Euler problem 3: find the largest prime factor of a number.
The following Elixir code throws warnings, and I think the assignment inside the if block does not take effect:
num = 13195

range = num
  |> :math.sqrt
  |> Float.floor
  |> round

for dv <- 2..range do
  if rem(num, dv) == 0 and div(num, dv) != 1 do
    num = div(num, dv)
  end
end

num
|> IO.puts
Warnings are:
$ elixir 3.exs
warning: variable "num" is unused
3.exs:10
warning: the result of the expression is ignored (suppress the warning by assigning the expression to the _ variable)
3.exs:10
13195
$ elixir -v
Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Elixir 1.5.3
How can I update (reassign) num?
(The following Python and JavaScript code works for the same problem):
# 3.py
from math import ceil, sqrt
num = 600851475143
for div in range(2, ceil(sqrt(num)) + 1):
    if num%div == 0 and num/div != 1:
        num /= div
assert int(num) == 6857
// 3.js
var num = 600851475143;
var range = Array.from({length: Math.trunc(Math.sqrt(num))}, (x, i) => i + 2)
for (const div of range) {
    if (num%div === 0 && num/div != 1) {
        num /= div;
    }
}
var assert = require('assert');
assert(num === 6857)
You are actually creating a new variable that shadows the one from the outer scope.
You can rewrite it like this:
num = 13195

range =
  num
  |> :math.sqrt()
  |> Float.floor()
  |> round

num =
  2..range
  |> Enum.reduce(num, fn elem, acc ->
    if rem(acc, elem) == 0 and div(acc, elem) != 1 do
      div(acc, elem)
    else
      acc
    end
  end)

IO.puts num
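The Enum.reduce rewrite is an ordinary left fold: the running quotient is the accumulator, and each step returns the new value instead of rebinding num. For comparison with the Python loop in the question, here is a minimal sketch of the same fold with functools.reduce (largest_factor_by_fold is an illustrative name; math.isqrt needs Python 3.8+). Like the original, it divides by each candidate at most once, so numbers with repeated prime factors are not handled: 12 comes out as 2 rather than 3.

```python
from functools import reduce
from math import isqrt


def largest_factor_by_fold(num: int) -> int:
    """Fold over 2..isqrt(num), threading the running quotient as the accumulator."""

    def step(acc: int, dv: int) -> int:
        # Same test as the Elixir fn: divide only if dv divides acc and the quotient is not 1
        if acc % dv == 0 and acc // dv != 1:
            return acc // dv
        return acc  # otherwise pass the accumulator through unchanged

    # reduce(step, iterable, initial) plays the role of Enum.reduce(range, num, fn)
    return reduce(step, range(2, isqrt(num) + 1), num)


print(largest_factor_by_fold(13195))         # 29
print(largest_factor_by_fold(600851475143))  # 6857
```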
More on shadowing:
+------------------------------------------------------------+
| Top level |
| |
| +------------------------+ +------------------------+ |
| | Module | | Module | |
| | | | | |
| | +--------------------+ | | +--------------------+ | |
| | | Function clause | | | | Function clause | | |
| | | | | | | | | |
| | | +----------------+ | | | | +----------------+ | | |
| | | | Comprehension | | | | | | Comprehension | | | |
| | | +----------------+ | | | | +----------------+ | | |
| | | +----------------+ | | ... | | +----------------+ | | |
| | | | Anon. function | | | | | | Anon. function | | | |
| | | +----------------+ | | | | +----------------+ | | |
| | | +----------------+ | | | | +----------------+ | | |
| | | | Try block | | | | | | Try block | | | |
| | | +----------------+ | | | | +----------------+ | | |
| | +--------------------+ | | +--------------------+ | |
| +------------------------+ +------------------------+ |
| |
+------------------------------------------------------------+
Any variable in a nested scope whose name coincides with a variable from the surrounding scope will shadow that outer variable. In other words, the variable inside the nested scope temporarily hides the variable from the surrounding scope, but does not affect it in any way.
source

Enumerating Cartesian product while minimizing repetition

Given two sets, e.g.:
{A B C}, {1 2 3 4 5 6}
I want to generate the Cartesian product in an order that puts as much space as possible between equal elements. For example, [A1, A2, A3, A4, A5, A6, B1…] is no good because all the As are next to each other. An acceptable solution would be to go "down the diagonals" and then, every time it wraps, offset by one, e.g.:
[A1, B2, C3, A4, B5, C6, A2, B3, C4, A5, B6, C1, A3…]
Expressed visually:
| | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C | A | B | C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | | | | | | | | | | | | | | | | | |
| 2 | | 2 | | | | | | | | | | | | | | | | |
| 3 | | | 3 | | | | | | | | | | | | | | | |
| 4 | | | | 4 | | | | | | | | | | | | | | |
| 5 | | | | | 5 | | | | | | | | | | | | | |
| 6 | | | | | | 6 | | | | | | | | | | | | |
| 1 | | | | | | | | | | | | | | | | | | |
| 2 | | | | | | | 7 | | | | | | | | | | | |
| 3 | | | | | | | | 8 | | | | | | | | | | |
| 4 | | | | | | | | | 9 | | | | | | | | | |
| 5 | | | | | | | | | | 10| | | | | | | | |
| 6 | | | | | | | | | | | 11| | | | | | | |
| 1 | | | | | | | | | | | | 12| | | | | | |
| 2 | | | | | | | | | | | | | | | | | | |
| 3 | | | | | | | | | | | | | 13| | | | | |
| 4 | | | | | | | | | | | | | | 14| | | | |
| 5 | | | | | | | | | | | | | | | 15| | | |
| 6 | | | | | | | | | | | | | | | | 16| | |
| 1 | | | | | | | | | | | | | | | | | 17| |
| 2 | | | | | | | | | | | | | | | | | | 18|
or, equivalently but without repeating the rows/columns:
| | A | B | C |
|---|----|----|----|
| 1 | 1 | 17 | 15 |
| 2 | 4 | 2 | 18 |
| 3 | 7 | 5 | 3 |
| 4 | 10 | 8 | 6 |
| 5 | 13 | 11 | 9 |
| 6 | 16 | 14 | 12 |
I imagine there are other solutions too, but that's the one I found easiest to think about. But I've been banging my head against the wall trying to figure out how to express it generically: it's convenient that the cardinalities of the two sets are multiples of each other, but I want the algorithm to do The Right Thing for sets of, say, size 5 and 7. Or size 12 and 69 (that's a real example!).
Are there any established algorithms for this? I keep getting distracted thinking of how rational numbers are mapped onto the set of natural numbers (to prove that they're countable), but the path it takes through ℕ×ℕ doesn't work for this case.
It so happens the application is being written in Ruby, but I don't care about the language. Pseudocode, Ruby, Python, Java, Clojure, Javascript, CL, a paragraph in English—choose your favorite.
Proof-of-concept solution in Python (soon to be ported to Ruby and hooked up with Rails):
import sys

letters = sys.argv[1]
MAX_NUM = 6
letter_pos = 0
for i in range(MAX_NUM):
    for j in range(len(letters)):
        num = ((i + j) % MAX_NUM) + 1
        symbol = letters[letter_pos % len(letters)]
        print("[%s %s]" % (symbol, num))
        letter_pos += 1
String letters = "ABC";
int MAX_NUM = 6;
int letterPos = 0;
for (int i = 0; i < MAX_NUM; ++i) {
    // One pass over the letters per row, as in the Python version
    for (int j = 0; j < letters.length(); ++j) {
        int num = ((i + j) % MAX_NUM) + 1;
        char symbol = letters.charAt(letterPos % letters.length());
        String output = symbol + "" + num;
        System.out.println(output);
        ++letterPos;
    }
}
What about using something fractal/recursive? This implementation divides a rectangular range into four quadrants then yields points from each quadrant. This means that neighboring points in the sequence differ at least by quadrant.
#python3
import sys
import itertools

def interleave(*iters):
    for elements in itertools.zip_longest(*iters):
        for element in elements:
            if element is not None:
                yield element

def scramblerange(begin, end):
    width = end - begin
    if width == 1:
        yield begin
    else:
        first = scramblerange(begin, int(begin + width/2))
        second = scramblerange(int(begin + width/2), end)
        yield from interleave(first, second)

def scramblerectrange(top=0, left=0, bottom=1, right=1, width=None, height=None):
    if width is not None and height is not None:
        yield from scramblerectrange(bottom=height, right=width)
        # Plain return ends the generator; raising StopIteration here is a RuntimeError on Python 3.7+
        return
    if right - left == 1:
        if bottom - top == 1:
            yield (left, top)
        else:
            for y in scramblerange(top, bottom):
                yield (left, y)
    else:
        if bottom - top == 1:
            for x in scramblerange(left, right):
                yield (x, top)
        else:
            halfx = int(left + (right - left)/2)
            halfy = int(top + (bottom - top)/2)
            quadrants = [
                scramblerectrange(top=top, left=left, bottom=halfy, right=halfx),
                reversed(list(scramblerectrange(top=top, left=halfx, bottom=halfy, right=right))),
                scramblerectrange(top=halfy, left=left, bottom=bottom, right=halfx),
                reversed(list(scramblerectrange(top=halfy, left=halfx, bottom=bottom, right=right)))
            ]
            yield from interleave(*quadrants)

if __name__ == '__main__':
    letters = 'abcdefghijklmnopqrstuvwxyz'
    output = []
    indices = dict()
    for i, pt in enumerate(scramblerectrange(width=11, height=5)):
        indices[pt] = i
        x, y = pt
        output.append(letters[x] + str(y))
    table = [[indices[x,y] for x in range(11)] for y in range(5)]
    print(', '.join(output))
    print()
    pad = lambda i: ' ' * (2 - len(str(i))) + str(i)
    header = ' |' + ' '.join(map(pad, letters[:11]))
    print(header)
    print('-' * len(header))
    for y, row in enumerate(table):
        print(pad(y)+'|', ' '.join(map(pad, row)))
Outputs:
a0, i1, a2, i3, e0, h1, e2, g4, a1, i0, a3, k3, e1,
h0, d4, g3, b0, j1, b2, i4, d0, g1, d2, h4, b1, j0,
b3, k4, d1, g0, d3, f4, c0, k1, c2, i2, c1, f1, a4,
h2, k0, e4, j3, f0, b4, h3, c4, j2, e3, g2, c3, j4,
f3, k2, f2
| a b c d e f g h i j k
-----------------------------------
0| 0 16 32 20 4 43 29 13 9 25 40
1| 8 24 36 28 12 37 21 5 1 17 33
2| 2 18 34 22 6 54 49 39 35 47 53
3| 10 26 50 30 48 52 15 45 3 42 11
4| 38 44 46 14 41 31 7 23 19 51 27
If your sets X and Y have sizes n and m, and Xi is the index of the element from X that's in the ith pair in your Cartesian product (and similarly for Y), then
Xi = i mod n;
Yi = (i mod n + i div n) mod m;
You could get your diagonals a little more spread out by filling out your matrix like this:
for (int i = 0; i < m*n; i++) {
    int xi = i % n;
    int yi = i % m;
    while (matrix[yi][xi] != 0) {
        yi = (yi+1) % m;
    }
    matrix[yi][xi] = i+1;
}
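A quick Python check of the Xi/Yi formula, reading n as the size of X and m as the size of Y (the reading under which Xi = i mod n stays in range). Writing i = n*q + r gives Xi = r and Yi = (r + q) mod m, so each (Xi, Yi) pair recovers a unique (q, r): every pair appears exactly once, even when the sizes are not multiples of each other (5 and 7, or 12 and 69). diagonal_product is a hypothetical name, and this sketch only checks that the enumeration is complete and repetition-free; it does not measure how well equal elements are spread out.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def diagonal_product(xs: Sequence[A], ys: Sequence[B]) -> Iterator[Tuple[A, B]]:
    """Yield all pairs of xs and ys using Xi = i mod n, Yi = (i mod n + i div n) mod m."""
    n, m = len(xs), len(ys)
    for i in range(n * m):
        xi = i % n                # cycles through X
        yi = (i % n + i // n) % m  # same diagonal, shifted by one after each full cycle of X
        yield xs[xi], ys[yi]


print(list(diagonal_product("ABC", [1, 2, 3, 4, 5, 6]))[:6])
```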

Indexes hints in a Subquery

I have a SQL statement that has performance issues.
Adding the following index and a SQL hint to use the index improves the performance tenfold, but I do not understand why.
BUS_ID is part of the primary key (T1.REF is the other part of the key) and of the clustered index on the T1 table.
The T1 table has about 100,000 rows. BUS_ID has only 6 different values. Similarly, the T1.STATUS column can only have a limited number of possible values, and the majority of rows (99%) will have the same value.
If I run the query without the hint (/*+ INDEX ( T1 T1_IDX1) NO_UNNEST */) it takes 5 seconds, and with the hint it takes 0.5 seconds.
I don't understand how the index helps the subquery as T1.STATUS isn't used in any of the 'where' or 'join' clauses in the subquery.
What am I missing?
SELECT
/*+ NO_UNNEST */
t1.bus_id,
t1.ref,
t2.cust,
t3.cust_name,
t2.po_number,
t1.status_old,
t1.status,
t1.an_status
FROM t1
LEFT JOIN t2
ON t1.bus_id = t2.bus_id
AND t1.ref = t2.ref
JOIN t3
ON t3.cust = t2.cust
AND t3.bus_id = t2.bus_id
WHERE (
status IN ('A', 'B', 'C') AND status_old IN ('X', 'Y'))
AND EXISTS
( SELECT /*+ INDEX ( T1 T1_IDX1) NO_UNNEST */
*
FROM t1
WHERE ( EXISTS ( SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq IN ( '0', '2' )
AND t1.bus_id = t6.bus_id)
OR (EXISTS
(SELECT /*+ NO_UNNEST */
*
FROM t6
WHERE seq = '1'
AND (an_status = 'Y'
OR
an_status = 'X')
AND t1.bus_id = t6.bus_id))
AND t2.ref = t1.ref))
AND USER IN ('FRED')
AND ( t2.status != '45'
AND t2.status != '20')
AND NOT EXISTS ( SELECT
/*+ NO_UNNEST */
*
FROM t4
WHERE EXISTS
(
SELECT
/*+ NO_UNNEST */
*
FROM t5
WHERE pd IN ( '1',
'0' )
AND appl = 'RYP'
AND appl_id IN ( 'RL100')
AND t4.id = t5.id)
AND t2.ref = p.ref
AND t2.bus_id = p.bus_id);
Edited to include Explain Plan and index.
Without Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 20 | 1 | 211 | 15534188 | 19 |
view | | 20 | 1 | 211 | 15534188 | 19 |
count | | | | | | |
view | | 20 | 1 | 198 | 15534188 | 19 |
sort | ORDER BY | 20 | 1 | 114 | 15534188 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 2 | 1 | 15 | 7433 | 2 |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
With Index hint
------------------------------------------------------|-------------------------------------
Operation | Options |Cost| # |Bytes | CPU Cost | IO COST
------------------------------------------------------|-------------------------------------
select statement | | 21 | 1 | 211 | 15549142 | 19 |
view | | 21 | 1 | 211 | 15549142 | 19 |
count | | | | | | |
view | | 21 | 1 | 198 | 15549142 | 19 |
sort | ORDER BY | 21 | 1 | 114 | 15549142 | 19 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 7 | 1 | 114 | 62487 | 7 |
nested loops | | 6 | 1 | 84 | 53256 | 6 |
inlist iterator | | | | | | |
TABLE access t1 | INDEX ROWID | 4 | 1 | 29 | 36502 | 4 |
index-t1_idx#3 | RANGE SCAN | 3 | 1 | | 28686 | 3 |
TABLE access - t2 | INDEX ROWID | 2 | 1 | 55 | 16754 | 2 |
index t2_idx#0 | UNIQUE SCAN | 1 | 1 | | 9042 | 1 |
filter | | | | | | |
TABLE access-t1 | INDEX ROWID | 3 | 1 | 15 | 22387 | 2 |
index-t1_idx#1 | FULL SCAN | 2 |97k| | 14643 | |
TABLE access-t6 | INDEX ROWID | 3 | 1 | 4 | 23169 | 3 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7721 | 1 |
filter | | | | | | |
TABLE access-t6 | INDEX ROWID | 2 | 2 | 8 | 15363 | 2 |
index-t6_idx#0 | UNIQUE RANGE SCAN | 1 | 3 | | 7521 | 1 |
index-t4_idx#1 | RANGE SCAN | 3 | 1 | 28 | 21584 | 3 |
inlist iterator | | | | | | |
index-t5_idx#1 | RANGE SCAN | 4 | 1 | 24 | 42929 | 4 |
index-t3_idx#0 | INDEX UNIQUE SCAN | 0 | 1 | | 1900 | 0 |
TABLE access-t3 | INDEX ROWID | 1 | 1 | 30 | 9231 | 1 |
--------------------------------------------------------------------------------------------
Table Index
CREATE INDEX T1_IDX#1 ON T1 (BUS_ID, STATUS)

I want to generate a unique ID in Oracle that is alphanumeric and 9 characters long

I want to generate a unique ID in Oracle that is alphanumeric and 9 characters long.
I tried:
select substr(sys_guid(),5,9) guid from dual;
Will this be unique?
This seems overcomplicated when you could just use a numeric sequence, but you could do:
Oracle 11g R2 Schema Setup:
CREATE OR REPLACE FUNCTION numberToAlnumString(
n IN NUMBER
) RETURN VARCHAR2
AS
i NUMBER := n;
s VARCHAR2(9);
r NUMBER(2,0);
BEGIN
WHILE i > 0 LOOP
r := MOD( i, 36 );
i := ( i - r ) / 36;
IF ( r < 10 ) THEN
s := TO_CHAR(r) || s;
ELSE
s := CHR( 55 + r ) || s;
END IF;
END LOOP;
RETURN LPAD( s, 9, '0' );
END;
/
CREATE SEQUENCE test__id__seq INCREMENT BY 1 START WITH 1
/
CREATE TABLE test (
id CHAR(9) NOT NULL,
name VARCHAR2(20)
)
/
CREATE OR REPLACE TRIGGER test_ins_trig
BEFORE INSERT ON test
FOR EACH ROW
BEGIN
:new.id := numberToAlnumString( test__id__seq.NEXTVAL );
END;
/
INSERT INTO test ( name )
SELECT TO_CHAR( LEVEL )
FROM DUAL
CONNECT BY LEVEL < 100
/
Query 1:
SELECT * FROM test
Results:
| ID | NAME |
|-----------|------|
| 000000001 | 1 |
| 000000002 | 2 |
| 000000003 | 3 |
| 000000004 | 4 |
| 000000005 | 5 |
| 000000006 | 6 |
| 000000007 | 7 |
| 000000008 | 8 |
| 000000009 | 9 |
| 00000000A | 10 |
| 00000000B | 11 |
| 00000000C | 12 |
| 00000000D | 13 |
| 00000000E | 14 |
| 00000000F | 15 |
| 00000000G | 16 |
| 00000000H | 17 |
| 00000000I | 18 |
| 00000000J | 19 |
| 00000000K | 20 |
| 00000000L | 21 |
| 00000000M | 22 |
| 00000000N | 23 |
| 00000000O | 24 |
| 00000000P | 25 |
| 00000000Q | 26 |
| 00000000R | 27 |
| 00000000S | 28 |
| 00000000T | 29 |
| 00000000U | 30 |
| 00000000V | 31 |
| 00000000W | 32 |
| 00000000X | 33 |
| 00000000Y | 34 |
| 00000000Z | 35 |
| 000000010 | 36 |
| 000000011 | 37 |
| 000000012 | 38 |
| 000000013 | 39 |
| 000000014 | 40 |
| 000000015 | 41 |
| 000000016 | 42 |
| 000000017 | 43 |
| 000000018 | 44 |
| 000000019 | 45 |
| 00000001A | 46 |
| 00000001B | 47 |
| 00000001C | 48 |
| 00000001D | 49 |
| 00000001E | 50 |
| 00000001F | 51 |
| 00000001G | 52 |
| 00000001H | 53 |
| 00000001I | 54 |
| 00000001J | 55 |
| 00000001K | 56 |
| 00000001L | 57 |
| 00000001M | 58 |
| 00000001N | 59 |
| 00000001O | 60 |
| 00000001P | 61 |
| 00000001Q | 62 |
| 00000001R | 63 |
| 00000001S | 64 |
| 00000001T | 65 |
| 00000001U | 66 |
| 00000001V | 67 |
| 00000001W | 68 |
| 00000001X | 69 |
| 00000001Y | 70 |
| 00000001Z | 71 |
| 000000020 | 72 |
| 000000021 | 73 |
| 000000022 | 74 |
| 000000023 | 75 |
| 000000024 | 76 |
| 000000025 | 77 |
| 000000026 | 78 |
| 000000027 | 79 |
| 000000028 | 80 |
| 000000029 | 81 |
| 00000002A | 82 |
| 00000002B | 83 |
| 00000002C | 84 |
| 00000002D | 85 |
| 00000002E | 86 |
| 00000002F | 87 |
| 00000002G | 88 |
| 00000002H | 89 |
| 00000002I | 90 |
| 00000002J | 91 |
| 00000002K | 92 |
| 00000002L | 93 |
| 00000002M | 94 |
| 00000002N | 95 |
| 00000002O | 96 |
| 00000002P | 97 |
| 00000002Q | 98 |
| 00000002R | 99 |
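The base-36 encoding done by numberToAlnumString can be checked outside the database with this Python sketch, which mirrors the PL/SQL loop (number_to_alnum_string is an illustrative name). It assumes n >= 1, as produced by a sequence starting at 1, and rejects values that need more than 9 characters, so 36**9 - 1 is the largest ID it can represent.

```python
def number_to_alnum_string(n: int, width: int = 9) -> str:
    """Base-36 encode a positive integer as digits 0-9 then A-Z, left-padded with '0'."""
    if not 1 <= n < 36 ** width:
        # s is VARCHAR2(9) in the PL/SQL version, so larger values do not fit there either
        raise ValueError(f"{n} does not fit in {width} base-36 characters")
    s = ""
    i = n
    while i > 0:
        i, r = divmod(i, 36)
        # r < 10 -> '0'..'9'; otherwise CHR(55 + r), i.e. 'A' (65) .. 'Z' (90)
        s = (str(r) if r < 10 else chr(55 + r)) + s
    return s.rjust(width, "0")


for n in (1, 10, 35, 36, 99):
    print(n, number_to_alnum_string(n))
```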
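No, this approach will not guarantee uniqueness: the substring keeps only 9 of the 32 hex characters of the SYS_GUID() value, so two different GUIDs can share the same substring.
If you want an auto-incrementing value in your column, you can use a sequence for this.
CREATE SEQUENCE dept_seq
  INCREMENT BY 1
  START WITH 100000000
  NOMAXVALUE
  NOCYCLE
  CACHE 10;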
After creating the sequence, you can use a BEFORE INSERT trigger to populate the column with the next value.
Here is an example trigger:
CREATE OR REPLACE TRIGGER dep_ins_trig
BEFORE INSERT ON <table_name>
FOR EACH ROW
BEGIN
SELECT dept_seq.NEXTVAL
INTO :new.emp_id
FROM dual;
END;
/
---------------------------------------------------------------------------------------
A trigger and sequence are a good fit when you want a serial (auto-increment) number that anyone can easily read, remember and understand. But if you don't want to manage the ID column (like emp_id) this way, and the actual value of the column does not matter much, you can use SYS_GUID() as a column default at table creation to get a unique value automatically, like this:
CREATE TABLE <table_name>
(emp_id RAW(16) DEFAULT SYS_GUID() PRIMARY KEY,
name VARCHAR2(30));
Now your emp_id column will receive a globally unique identifier value by default.
You can insert into the table without specifying the emp_id column, like this:
INSERT INTO <table_name> (name) VALUES ('name value');
This inserts a unique value into your emp_id column.
