Input list to Snowflake SQL udf - user-defined-functions

I have created a Snowflake SQL udf I call with the following code:
select *
from table(drill_top_down('12345','XXX')) order by depth,path;
If I need to run the query for multiple items, is it possible to pass a list or similar to the UDF and then loop through that input list?
Or can I call my function in a smarter way so that I get the results for multiple inputs?

You can provide a Snowflake Array, Object or Variant with your argument sets nested within, and use that as input to the table function.
Adapting your example, using array_construct to provide two sets of arguments, the input would look something like:
select *
from table(drill_top_down(
array_construct(
array_construct('12345','XXX'),
array_construct('67890','YYY')
)::array));
Or, my preference is to use parse_json, as I find it easier to read:
select *
from table(drill_top_down(parse_json('
[ ["12345","XXX"],
["67890","YYY"] ]')::array));
You will need to adapt your Table Function to unpack the Argument Sets using a common-table-expression (CTE) to tabularise the input arguments and then unnest them with Lateral Flatten.
Here's a trivial example:
CREATE OR REPLACE FUNCTION array_concat ( arr array)
RETURNS TABLE ( concatenated_string varchar )
AS
$$
With a as (Select arr)
Select listagg(value)
From a, table(flatten(input => arr))
$$
;
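As a usage sketch, the function above could be called as follows. The three input strings are arbitrary sample values, not taken from the question. Note that LISTAGG without a WITHIN GROUP clause does not guarantee the order in which the values are concatenated.

```sql
-- Illustrative call only; the three strings are made-up sample values.
select concatenated_string
from table(array_concat(array_construct('a', 'b', 'c')));
```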
Here is a slightly more sophisticated example that performs an operation with each argument set, using row_number() to group them.
CREATE OR REPLACE FUNCTION array_calcs ( arg_list array)
RETURNS TABLE
( arg_id integer,
array_sz integer,
array_sum integer,
array_mean decimal(12,2) )
AS
$$
With
-- CTE containing the ARGS
arg_input as (select arg_list),
-- CTE un-nest (flatten) first level of args list to each args set
arg_sets as
(Select row_number() over (order by NULL desc) as arg_id, value as arg_set
From arg_input, lateral flatten(input => arg_list))
-- Do something with the Args. e.g. Perform some calculations with the Input arguments
Select arg_id , count(*) array_sz, sum(value)::integer array_sum, array_sum/array_sz::decimal(12,2) array_mean
From arg_sets, table(flatten(input => arg_set))
Where is_decimal( value ) or is_integer( value ) or is_double( value ) -- filter out non-numeric arguments i.e. validate inputs
Group By arg_id
$$;
This works if we provide the following input arguments
Select * from table(array_calcs(parse_json('[ [1],
[1,2],
[1,2,3],
[1,2,3,4],
["A","B"],
["A",1]
]')::array));
Producing the following:
ARG_ID | ARRAY_SZ | ARRAY_SUM | ARRAY_MEAN
-----------------------------------------
1 | 1 | 1 | 1.0
2 | 2 | 3 | 1.5
3 | 3 | 6 | 2.0
4 | 4 | 10 | 2.5
6 | 1 | 1 | 1.0
But a word of caution: if your aim is to build your arguments directly from your data, rather than hard-coding them in the function call, you are more than likely to run into this issue:
Create or replace View V_array_calcs_input as
Select parse_json($1)::array arg_list
from (values ('[[1],[1,2],[1,2,3],[1,2,3,4],["A","B"], ["A",1]]'));
Select *
from V_array_calcs_input,
table(array_calcs(arg_list));
SQL compilation error: Unsupported subquery type cannot be evaluated
A Stored Procedure, or JavaScript UDF/UDTF may be better options to resolve this, if you can build the functional logic you need in either of those.
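A rough sketch of what a JavaScript UDTF version of array_calcs might look like is shown below. The function name array_calcs_js and the FLOAT column types are assumptions made for illustration, and the sketch has not been run against the view above, so treat it as a starting point rather than a verified fix for the compilation error.

```sql
-- Hypothetical JavaScript variant of array_calcs (name and FLOAT types are assumptions).
CREATE OR REPLACE FUNCTION array_calcs_js (arg_list ARRAY)
RETURNS TABLE (arg_id FLOAT, array_sz FLOAT, array_sum FLOAT, array_mean FLOAT)
LANGUAGE JAVASCRIPT
AS
$$
{
  processRow: function (row, rowWriter, context) {
    // Argument names are exposed in upper case inside JavaScript.
    var sets = row.ARG_LIST || [];
    for (var i = 0; i < sets.length; i++) {
      if (!Array.isArray(sets[i])) { continue; }      // ignore anything that is not an argument set
      // Keep only the numeric members of this argument set.
      var nums = sets[i].filter(function (v) { return typeof v === 'number'; });
      if (nums.length === 0) { continue; }            // skip sets with no numeric values
      var total = nums.reduce(function (a, b) { return a + b; }, 0);
      rowWriter.writeRow({
        ARG_ID: i + 1,
        ARRAY_SZ: nums.length,
        ARRAY_SUM: total,
        ARRAY_MEAN: total / nums.length
      });
    }
  }
}
$$;

-- Intended usage: feed the argument list from a column instead of a literal.
SELECT r.*
FROM V_array_calcs_input v,
     TABLE(array_calcs_js(v.arg_list)) r;
```

The per-set loop replaces the nested FLATTEN calls, so no correlated subquery is needed inside the function body.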

Related

PLSQL aggregation function using type object PARALLEL_ENABLE AGGREGATE

I have a question. I already have an avg_new function, which includes nulls (as 0) in the result. I started from a LinkedIn link.
Code:
select avg(a),avg_new(nvl(a,-9999)) from
(select 'test' h, 2 a from dual
union all
select 'test' h, null a from dual
union all
select 'test' h, 2 a from dual
union all
select 'test' h ,2 a from dual)
the results are
2; 1,5
I would like to extend the avg_new function by adding a denominator parameter, e.g.:
avg_new(nvl(a,-9999),10)
The result should then be 0.6.
The default value of the parameter would be null, in which case the function works as in the previous example. If the parameter is > 0, I would divide the sum of 'a' by the value of this parameter. How could I do this? I would like to pass this parameter to the underlying object type and perform further calculations there. Is it possible?
create or replace FUNCTION avg_new (input NUMBER , denominator NUMBER DEFAULT NULL) RETURN NUMBER
PARALLEL_ENABLE AGGREGATE USING T_avg_new;
Right now the type can properly read only the first parameter. After adding the second one I get these errors:
ORA-29925: cannot execute T_avg_new.ODCIAGGREGATEINITIALIZE
ORA-06553: PLS-306: wrong number or types of arguments in call to "ODCIAGGREGATEINITIALIZE"
00000 - "cannot execute %s"
*Cause: The specified function does not exist or does not have an
appropriate signature.
*Action: Implement the function with the appropriate signature.
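The target numbers in the question can be checked with built-in aggregates alone. This is only a sketch of the intended arithmetic, not a replacement for the custom avg_new aggregate; the CTE name and column aliases are made up for illustration.

```sql
-- Sample rows copied from the question: a = 2, NULL, 2, 2
with src as (
  select 2 a from dual union all
  select null a from dual union all
  select 2 a from dual union all
  select 2 a from dual
)
select avg(a)              as avg_ignoring_nulls,    -- 6 / 3  = 2
       avg(nvl(a, 0))      as avg_nulls_as_zero,     -- 6 / 4  = 1.5
       sum(nvl(a, 0)) / 10 as sum_over_denominator   -- 6 / 10 = 0.6
from src;
```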

OUT parameter with multiples values

create or replace PROCEDURE Show_R(A IN VARCHAR2, B OUT VARCHAR2)
IS
BEGIN
select func_w(day),TO_CHAR(hour, 'HH24:MI')INTO B
from task t
inner join mat m
on t.id_p = m.id_a
where m.cod_mod = A;
END;
I have an issue with this code: the select returns two columns whose data types differ, and I don't know how to put two types of data into a single OUT parameter.
You can't put 2 values into 1 OUT parameter. So, use 2 OUT parameters.
Firstly don't store day and hour in separate columns. Just use a single DATE column as, in Oracle, the DATE data type has year, month, day, hour, minute and second components and so can store both the date and time.
Secondly, don't use A, B, show_R or func_w as identifiers; use meaningful names, as it will be far easier to debug your code in 6 months if you can tell what it is intended to do.
Third, your SELECT ... INTO statement will fail as you have two columns but only one variable to select into; you need 2 variables in INTO clause and this means (unless you are going to concatenate the two values) that you need 2 OUT parameters.
CREATE PROCEDURE Show_w_day_and_hour(
i_cod_mod IN mat.cod_mod%TYPE,
o_w_day OUT VARCHAR2,
o_hour OUT VARCHAR2
)
IS
BEGIN
SELECT func_w(day),
TO_CHAR(hour, 'HH24:MI')
INTO o_w_day,
o_hour
FROM task t
INNER JOIN mat m
ON ( t.id_p = m.id_a )
WHERE m.cod_mod = i_cod_mod;
END;
/
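A possible way to call the procedure from an anonymous block is sketched below. The variable names, buffer sizes and the 'ABC' value for cod_mod are placeholders, not taken from the question. Because SELECT ... INTO expects exactly one row, the sketch also handles the two exceptions Oracle raises when that is not the case.

```sql
DECLARE
  v_w_day VARCHAR2(100);  -- buffer size is a guess
  v_hour  VARCHAR2(5);    -- 'HH24:MI' yields five characters
BEGIN
  Show_w_day_and_hour('ABC', v_w_day, v_hour);  -- 'ABC' is a placeholder cod_mod
  DBMS_OUTPUT.PUT_LINE(v_w_day || ' ' || v_hour);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    DBMS_OUTPUT.PUT_LINE('No row matches that cod_mod');
  WHEN TOO_MANY_ROWS THEN
    DBMS_OUTPUT.PUT_LINE('More than one row matches that cod_mod');
END;
/
```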

PL/SQL array manipulation function

I'm new to PL/SQL. I have a matrix stored in the DB as a nested table. Something like,
the matrix is stored as a TABLE of objects (and the objects are t1 number, t2 number, ... t100 number)
To get the matrix it would be select x.* from test t, table(t.matrix) x where... , returning
|T1|T2|T3|...|T100|
I want to create a function that returns the sum over the row to be called using SQL only, something equivalent to
select sum(x.T1),sum(x.T2)...sum(x.T100) from test t, table(t.matrix) x where ...
Something like select bigsum(x.*) from table t, table(t.matrix)
It will be called several times, and I don't want to write the 100 columns every time.
If you want to sum the values from 100 different columns, you're going to have to explicitly list those 100 columns at some point. You can encapsulate that logic for that expression in a view or a function or a pipelined table function or some other construct so that you don't have to repeat the expression many times, you just have to reference the abstraction you've created (i.e. call the function that sums the 100 values).
Although it would likely complicate the problem rather than simplifying it, you could potentially create a solution that uses dynamic SQL to generate the 100 columns names and the expression to add them together if you really, really want to avoid writing out 100 column names. It is highly unlikely, however, that the extra complexity of resorting to dynamic SQL would be beneficial unless there are substantial requirements that you haven't mentioned here that make writing out the column names more than a bit repetitive.
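One lightweight way to avoid typing the column list by hand is to let a query generate the text once and paste it into a view or function. The sketch below assumes the attributes really are named T1 through T100, as in the question; the sum_t aliases are made up.

```sql
-- Generates the text "sum(x.T1) as sum_t1, sum(x.T2) as sum_t2, ..." for T1..T100.
select listagg('sum(x.T' || level || ') as sum_t' || level, ', ')
         within group (order by level) as select_list
from dual
connect by level <= 100;
```

The generated string is roughly 2,300 characters long, so it stays within the 4,000-byte limit of LISTAGG's VARCHAR2 result.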
" it'll be called several times, and don't want to write the 100
columns every time"
Why not create a view? Write it once, call it as many times as you like:
create or replace view bigsum as
select t.whatever
, sum(x.T1) as sum_t1
, sum(x.T2) as sum_t2
...
, sum(x.T100) as sum_t100
from test t
, table(t.matrix) x
group by t.whatever
You would need to include identifying columns from TEST to allow you to join the view to other tables. This approach would give you something close to what you want:
select *
from bigsum
where whatever = 23
You can reduce the amount of typing further by processing a result set from the data dictionary view USER_TYPE_ATTRS (or a SQL*Plus description) in a decent text editor with a regex search'n'replace.
You can create a function in the form given below, depending on your condition. If you require parameters, you can add them when creating the function and use them in the condition as required.
create or replace function bigsum
return number
as
sumall number;
begin
select sum(x.T1) + sum(x.T2) + ... + sum(x.T100) into sumall -- a single NUMBER can only hold one total
from test t, table(t.matrix) x where .(your condition).. ;
return sumall;
end;
/
and call it in the manner
select bigsum from dual;

Collect to a Map in Hive

I have a Hive table such as
id | value
-------------
A 1
A 2
B 3
A 4
B 5
Essentially, I want to mimic Python's defaultdict(list) and create a map with id as the keys and value as the values.
Query:
select COLLECT_TO_A_MAP(id, value)
from table
Output:
{A:[1,2,4], B:[3,5]}
I tried using klout's CollectUDAF() but it appears this will not append the values to an array, it will just update them. Any ideas?
EDIT:
Here is a more detailed description, so I can avoid answers that just point me to functions in the Hive documentation. Suppose I have a table
num |id |value
____________________
1 A 1
1 A 2
1 B 3
2 A 4
2 B 5
2 B 6
What I am looking for is for a UDAF that provides this output
num |new_map
________________________
1 {A:[1,2], B:[3]}
2 {A:[4], B:[5,6]}
To this query
select num
,COLLECT_TO_A_MAP(id, value) as new_map
from table
group by num
There is a workaround to achieve this. It can be mimicked by using Klout's (see above referenced UDAF) CollectUDAF() in a query such as
add jar '~/brickhouse/target/brickhouse-0.6.0.jar';
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select num
,collect(id_array, value_array) as new_map
from (
select collect_list(id) as id_array
,collect_list(value) as value_array
,num
from table
group by num
) A
group by num
However, I would rather not write a nested query.
EDIT #2
(As referenced in my original question) I have already tried using Klout's CollectUDAF(), even in the case where you pass it two parameters and it creates a map. The output from that is (if applied to the dataset in my 1st edit)
1 {A:2, B:3}
2 {A:4, B:6}
As stated in my original question, it doesn't collect the values to an array it just collects the last one (or updates the array).
Use the collect UDF in Brickhouse (http://github.com/klout/brickhouse).
It is exactly what you need. Brickhouse's 'collect' returns a list if one parameter is used, and a map if two parameters are used.
The CollectUDAF in Brickhouse (http://github.com/klout/brickhouse) will get you there.
Regarding your EDIT #2:
First collect the values into a list, then collect the key/value pairs into a map:
select
num,
collectUDAF(id, values) as new_map
from
(
SELECT
num,
id,
collect_set(value) as values
FROM
tbl
GROUP BY
num,
id
) as sub
GROUP BY
num
will return
num | new_map
________________________
1 {A:[1,2], B:[3]}
2 {A:[4], B:[5,6]}
If you don't care about the order in which the values appear, you could use the collect_set() UDAF that comes with Hive.
SELECT id, collect_set(value) FROM table GROUP BY id;
This should solve your issue.
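For comparison, here is a small sketch that puts Hive's two built-in collection aggregates side by side. The table name tbl and the column aliases are stand-ins; collect_list keeps duplicate values, whereas collect_set removes them.

```sql
-- tbl is a stand-in name for the table in the question.
SELECT id,
       collect_list(value) AS all_values,      -- keeps duplicates
       collect_set(value)  AS distinct_values  -- removes duplicates
FROM tbl
GROUP BY id;
```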
Your current query groups by num in both the inner and outer query -- you need to group by id in the inner query to accomplish what you're trying to do.
https://github.com/klout/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CollectUDAF.java#L55
See the Brickhouse UDAF: when more than one argument is passed, MapCollectUDAFEvaluator is used.
add jar */brickhouse.jar ;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select
collect(a,b)
from( select 1232123 a,21 b
union all select 123 a,23 b)a;
result:{1232123:21,123:23}

Sorting by value returned by a function in oracle

I have a function that returns a value representing the similarity between tracks. I want the returned results to be ordered by this value, but I cannot figure out how to do it. Here is what I have already tried:
CREATE OR REPLACE PROCEDURE proc_list_similar_tracks(frstTrack IN tracks.track_id%TYPE)
AS
sim number;
res tracks%rowtype;
chosenTrack tracks%rowtype;
BEGIN
select * into chosenTrack from tracks where track_id = frstTrack;
dbms_output.put_line('similarity between');
FOR res IN (select * from tracks WHERE ROWNUM <= 10)LOOP
SELECT * INTO sim FROM ( SELECT func_similarity(frstTrack, res.track_id)from dual order by sim) order by sim; -- that's where I am getting the value and where I am trying to order
dbms_output.put_line( chosenTrack.track_name || '(' ||frstTrack|| ') and ' || res.track_name || '(' ||res.track_id|| ') ---->' || sim);
END LOOP;
END proc_list_similar_tracks;
/
declare
begin
proc_list_similar_tracks(437830);
end;
/
No errors are given; the list is just presented unsorted. Is it not possible to order by a value that was returned by a function? If so, how do I accomplish something like this? Or am I just doing something horribly wrong?
Any help will be appreciated.
In the interests of (over-)optimisation I would avoid ordering by a function if I could possibly avoid it; especially one that queries other tables. If you're querying a table you should be able to add that part to your current query, which enables you to use it normally.
However, let's look at your function:
There's no point using DBMS_OUTPUT for anything but debugging unless you're going to be there looking at exactly what is output every time the function is run; you could remove these lines.
The following is used only for a DBMS_OUTPUT and is therefore an unnecessary SELECT and can be removed:
select * into chosenTrack from tracks where track_id = frstTrack;
You're selecting a random 10 rows from the table TRACKS; why?
FOR res IN (select * from tracks WHERE ROWNUM <= 10)LOOP
Your ORDER BY, order by sim, is ordering by a non-existent column, as the column SIM hasn't been declared within the scope of the SELECT.
Your ORDER BY is asking for the least similar as the default sort order is ascending (this may be correct but it seems wrong?)
Your function is not a function, it's a procedure (one without an OUT parameter).
Your SELECT INTO is attempting to place multiple rows into a single-row variable.
Assuming your "function" is altered to provide the maximum similarity between the parameter and a random 10 TRACK_IDs it might look as follows:
create or replace function list_similar_tracks (
frstTrack in tracks.track_id%type
) return number is
sim number;
begin
select max(func_similarity(frstTrack, track_id)) into sim
from tracks
where rownum <= 10
;
return sim;
end list_similar_tracks;
/
However, the name of the function suggests that this is not what you're actually attempting to do.
From your comments, your question is actually:
I have the following code; how do I print the top 10 function results? The current results are returned unsorted.
declare
sim number;
begin
for res in ( select * from tracks ) loop
select * into sim
from ( select func_similarity(var1, var2)
from dual
order by sim
)
order by sim;
end loop;
end;
/
The problem with the above is firstly that you're ordering by the variable sim, which is NULL in the first instance but changes thereafter. However, the select from DUAL is only a single row, which means you're randomly ordering by a single row. This brings us back to my point at the top - use SQL where possible.
In this case you can simply SELECT from the table TRACKS and order by the function result. To do this you need to give the column created by your function result an alias (or order by the positional argument as already described in Emmanuel's answer).
For instance:
select func_similarity(var1, var2) as function_result
from dual
Putting this together the code becomes:
begin
for res in ( select *
from ( select func_similarity(variable, track_id) as f
from tracks
order by f desc
)
where rownum <= 10 ) loop
-- do something
end loop;
end;
/
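Filled in for the original procedure, the loop might look like the sketch below. It assumes func_similarity takes the two track ids as in the question and reuses the id 437830 from the question's test call; the column alias sim is made up.

```sql
BEGIN
  FOR res IN (
    SELECT *
    FROM ( SELECT t.track_id,
                  t.track_name,
                  func_similarity(437830, t.track_id) AS sim
           FROM tracks t
           ORDER BY sim DESC      -- most similar first
         )
    WHERE ROWNUM <= 10            -- applied after the sort, so these are the top 10
  ) LOOP
    DBMS_OUTPUT.PUT_LINE(res.track_name || ' (' || res.track_id || ') ----> ' || res.sim);
  END LOOP;
END;
/
```

The ROWNUM filter sits outside the ordered subquery on purpose: placing it next to the ORDER BY would pick ten arbitrary rows first and sort only those.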
You have a query using a function, let's say something like:
select t.field1, t.field2, ..., function1(t.field1), ...
from table1 t
where ...
Oracle supports order by clause with column indexes, i.e. if the field returned by the function is the nth one in the select (here, field1 is in position 1, field2 in position 2), you just have to add:
order by n
For instance:
select t.field1, function1(t.field1) c2
from table1 t
where ...
order by 2 /* 2 being the index of the column computed by the function */

Resources