User-defined function not returning correct results in Azure Cosmos DB

We are trying to use a simple user-defined function (UDF) in the where clause of a query in Azure Cosmos DB, but it's not working correctly. The end goal is to query all results where the timestamp _ts is greater than or equal to yesterday, but our first step is to get a UDF working.
The abbreviated data looks like this:
[
  {
    "_ts": 1500000007
  },
  {
    "_ts": 1500000005
  }
]
Using the Azure Portal's Cosmos DB Data Explorer, a simple query like the following will correctly return one result:
SELECT * FROM c WHERE c._ts >= 1500000006
A simple query using our UDF incorrectly returns zero results. This query is below:
SELECT * FROM c WHERE c._ts >= udf.getHardcodedTime()
The definition of the function is below:
function getHardcodedTime() {
  return 1500000006;
}
As you can see, the only difference between the two queries is that one query uses a hard-coded value while the other query uses a UDF to get a hard-coded value. The problem is that the query using the UDF returns zero results instead of returning one result.
Are we using the UDF correctly?
Update 1
When the UDF is updated to return the number 1, we get a different count of results each time.
New function:
function getHardcodedTime() {
  return 1;
}
New query: SELECT count(1) FROM c WHERE c._ts >= udf.getHardcodedTime()
Results vary with 7240, 7236, 7233, 7264, etc. (This set is the actual order of responses from Cosmos DB.)

Based on your description, the most likely cause is that the UDF version of the query is slow and returns a partial result with a continuation token instead of the final result.
The concept of continuation is explained in the REST API documentation as:
If a query's results cannot fit within a single page of results, then the REST API returns a continuation token through the x-ms-continuation-token response header. Clients can paginate results by including the header in subsequent requests.
The observed variation in counts could be caused by the query stopping for the next page at slightly different points each time. Check the continuation token response header to see if that's the case.
Why is the UDF slow?
A UDF returning a constant is not the same as a constant in the query. Correct me if you know better, but as far as Cosmos DB is concerned, it does not know that this particular UDF is deterministic; it has to assume the UDF could evaluate to any value and therefore must execute it for every document to check for a match. That means the query cannot use an index and has to do a slow full scan.
What can you do?
Option 1: Follow continuations
If you don't care about performance, you could keep using the UDF and simply follow the continuation until all rows are processed.
Some DocumentDB clients can do this for you (e.g., the .NET API), so this may be the fastest fix if you are in a hurry. But beware: this does not scale (it keeps getting slower and costs more and more RUs), and you should not consider it a long-term solution.
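For illustration, here is a minimal sketch of following continuations with the Node.js SDK (@azure/cosmos); the endpoint, key, database, and container names are placeholders, and the same pattern applies to the .NET SDK's FeedIterator. Treat it as a sketch under those assumptions, not a drop-in fix.

const { CosmosClient } = require("@azure/cosmos");

async function queryAllPages() {
  // Placeholders: fill in your account endpoint, key, database and container.
  const client = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" });
  const container = client.database("<database>").container("<container>");

  const iterator = container.items.query(
    "SELECT * FROM c WHERE c._ts >= udf.getHardcodedTime()"
  );

  const all = [];
  // Each fetchNext() call follows the continuation token behind the scenes,
  // so the loop keeps going until the server reports no more results.
  while (iterator.hasMoreResults()) {
    const { resources } = await iterator.fetchNext();
    all.push(...resources);
  }
  return all;
}

Every page still pays for the full scan behind the UDF, so the RU cost keeps growing with the collection size.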
Option 2: Drop UDF, use parameters
You could pass the hardcodedTime as a parameter instead. This way the query execution engine knows the value, can use a matching index, and gives you correct results without the hassle of continuations.
I don't know which API you use, but related reading just in case: Parameterized SQL queries.
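As a minimal sketch (again assuming the Node.js SDK and placeholder connection details), the same query with the value passed as a parameter looks like this:

const { CosmosClient } = require("@azure/cosmos");

async function queryByTime(hardcodedTime) {
  const client = new CosmosClient({ endpoint: "<endpoint>", key: "<key>" });
  const container = client.database("<database>").container("<container>");

  const querySpec = {
    query: "SELECT * FROM c WHERE c._ts >= @hardcodedTime",
    parameters: [{ name: "@hardcodedTime", value: hardcodedTime }]
  };

  // The engine sees a concrete value, so the range index on _ts can be used.
  const { resources } = await container.items.query(querySpec).fetchAll();
  return resources;
}

Computing "yesterday" on the client (e.g. Math.floor(Date.now() / 1000) - 86400) and passing it as @hardcodedTime also covers the original end goal without any UDF.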
Option 3: Wrap in stored proc
If you really, really must control the hardcodedTime through a UDF, then you could implement a server-side stored procedure which would:
query the hardcodedTime via the UDF
query the documents, passing the hardcodedTime as a parameter
return the results as the SP output.
It would use the UDF AND the index, but it brings a hefty overhead in the amount of code required. Do the math on whether keeping the UDF is worth the extra development and maintenance effort.
Related documentation about SPs: Azure Cosmos DB server-side programming: Stored procedures, database triggers, and UDFs
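For illustration, a minimal sketch of such a stored procedure (assumed shape only; it ignores continuations inside the procedure and would need paging added for large result sets):

function getRecentDocuments() {
  var collection = getContext().getCollection();
  var response = getContext().getResponse();

  // Step 1: evaluate the UDF once (TOP 1 keeps the scan to a single document).
  var accepted = collection.queryDocuments(
    collection.getSelfLink(),
    "SELECT TOP 1 VALUE udf.getHardcodedTime() FROM c",
    {},
    function (err, values) {
      if (err) throw err;
      var hardcodedTime = values[0];

      // Step 2: run an indexable, parameterized query with that value.
      collection.queryDocuments(
        collection.getSelfLink(),
        {
          query: "SELECT * FROM c WHERE c._ts >= @hardcodedTime",
          parameters: [{ name: "@hardcodedTime", value: hardcodedTime }]
        },
        {},
        function (err2, documents) {
          if (err2) throw err2;
          response.setBody(documents);
        }
      );
    }
  );

  if (!accepted) throw new Error("The query was not accepted by the server.");
}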

Related

Executing DAX query via .NET Core AdomdConnection returns null

I have a very simple DAX query which I can execute via SSMS with no problems.
As shown in SSMS, the query returns 2 rows: one from the dynamic expression and one hardcoded "123".
When executed via C# with Microsoft.AnalysisServices.AdomdClient, the hardcoded "123" row value is returned (oddly, the ordinal position of the rows does not match the execution in SSMS), but the dynamic expression row value is always null. This issue does seem to be isolated to the tables of this model, as querying a brand-new "AdventureWorks" model programmatically does return values for dynamic expressions.
I am at a loss as to what to explore next, and I am not very familiar with querying Analysis Services programmatically, so any help or hints about what to try would be much appreciated.

User-Defined Function Nested in Macro - Framework Manager

We are trying to create a dynamic query subject filter. To do this, we are trying to nest an Oracle user-defined function inside of a macro.
Example query subject filter:
#strip(ORACLE_USER_DEFINED_FUNCTION())#
We have imported the Oracle function ORACLE_USER_DEFINED_FUNCTION into Framework Manager. The function returns a VARCHAR2 of the desired expression. For testing purposes, this function simply returns the VARCHAR2 value '1=1' (the single quotes are not part of the return value).
The idea is that the query subject filter expression is dynamically generated at run time, so the resulting query contains '...WHERE 1=1'. The strip macro is the mechanism to pre-process and invoke the user-defined function before the query is sent to the database.
However, when attempting to verify/check the query subject filter we receive the following error.
XQE-GEN-0018 Query Service internal error has occurred, please see the log for details.
I'm trying to get hold of the query service log, but don't have it yet.
Perhaps some casting is needed to convert the Oracle VARCHAR2 output of the function into an IBM/Cognos string that is acceptable input for the IBM/Cognos macro.
Your assistance is appreciated. Thanks in advance.
Using Oracle 12c and Cognos 11.1.
With the exception of the queryValue macro function, macros are evaluated before the database is accessed. This means the macro does not know what the Oracle UDF is or what it's supposed to do. If you are able to call the UDF via an FM query subject, you may be able to get away with something similar to the queryValue answer found here:
Cognos 11.1.7 Framework manager change the table for the query subject, type data

Is there a way to tag a Snowflake view as "safe" for result reuse?

Reading the Snowflake documentation for Using Persisted Query Results, one of the conditions that must be met for result reuse is the following:
The query does not include user-defined functions (UDFs) or external functions.
After some experiments with this trivial UDF and view:
create function trivial_function(x number)
returns number
as
$$
x
$$
;
create view redo as select trivial_function($1) as col1, $2 as col2
from values (1,2), (3,4);
I've verified that this condition also applies to queries over views that use UDFs in their definitions.
The thing is, I have a complex view that would benefit much from result reuse, but employs a lot of UDFs for the sake of clarity. Some UDFs could be inlined, but that would make the view much more difficult to read. And some UDFs (those written in Javascript) are impossible to inline.
And yet, I know that all the UDFs are "pure" in the functional programming sense: for the same inputs, they always return the same outputs. They don't check the current timestamp, generate random values, or reference some other table that might change between invocations.
Is there some way to "convince" the query planner that a view is safe for result reuse, despite the presence of UDFs?
There is a parameter called IMMUTABLE:
CREATE FUNCTION
VOLATILE | IMMUTABLE
Specifies the behavior of the UDF when returning results:
VOLATILE: UDF might return different values for different rows, even for the same input (e.g. due to non-determinism and statefulness).
IMMUTABLE: UDF assumes that the function, when called with the same inputs, will always return the same result. This guarantee is not checked. Specifying IMMUTABLE for a UDF that returns different values for the same input will result in undefined behavior.
Default: VOLATILE
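So, as a minimal sketch, the trivial UDF from the question could be recreated with IMMUTABLE. Whether this is enough for the result cache to reuse results in your account is worth verifying, since the documentation only lists the UDF condition:

create or replace function trivial_function(x number)
returns number
immutable
as
$$
x
$$
;

The same keyword can be added to the JavaScript UDFs as long as they really are pure; IMMUTABLE is a promise that Snowflake does not check.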

Oracle In Clause not working when using Parameter

I have a pesky SSRS report problem: the main query of my report has a condition that can have more than 1,000 choices, and when the user selects all of them, the query fails because my backend database is Oracle. I have done some research and found a solution that should work.
The solution is to rewrite the IN clause something like this:
(1,ColumnName) in ((1,Searchitem1),(1,SearchItem2))
This works. However, when I do this:
(1,ColumnName) in ((1,:assignedValue))
and pass just one value, it works; but when I pass more than one value, it fails with ORA-01722: invalid number.
I have tried multiple combinations of the same IN clause, but nothing is working.
any help is appreciated...
Wild guess: your :assignedValue is a comma-separated list of numbers, and Oracle tries to parse it as a single number.
Passing multiple values as a single value for an IN query is (almost) never a good idea - either you have to use string concatenation (prone to SQL injection and terrible performance), or you have to have a fixed number of arguments to IN (which generally is not what you want).
I'd suggest you (see the sketch below):
INSERT your search items into a temporary table
use a JOIN with this search table in your SELECT
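A minimal sketch of that approach, with illustrative table and column names (report_table and search_items are placeholders, not from your report):

-- Session-scoped scratch table for the selected values
CREATE GLOBAL TEMPORARY TABLE search_items (
  item_id NUMBER
) ON COMMIT PRESERVE ROWS;

-- Load one row per selected value (or bulk insert from the application/report layer)
INSERT INTO search_items (item_id) VALUES (:searchItem1);
INSERT INTO search_items (item_id) VALUES (:searchItem2);

-- Join instead of IN, so the number of values no longer matters
SELECT r.*
FROM   report_table r
JOIN   search_items s ON s.item_id = r.ColumnName;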

What is the best way to integrate Solr as an index with Oracle as a storage DB?

I have an Oracle database with all the "data", and a Solr index where all this data is indexed. Ideally, I want to be able to run queries like this:
select * from data_table where id in ([solr query results for 'search string']);
However, a few key issues arise:
Oracle WILL NOT allow more than 1000 items in the "in" clause (a BIG DEAL, as the list of objects I find is very often > 1000 and will usually be around 50-200k items)
I have tried to work around this using a "split" function that takes a string of comma-separated values and breaks it down into array items, but then I hit the 4000-character limit on the function parameter using SQL (PL/SQL allows 32k characters, but that's still WAY too limiting for 80,000+ results in some cases)
I am also hitting performance issues using WHERE IN (...); I am told that this causes a very slow query, even when the referenced field is indexed.
I've tried chaining "OR"s to work around the 1000-item limit (i.e. id in (1...1000) or id in (1001...2000) or id in (2001...3000)); this works, but is very slow.
I am thinking that I should load the Solr client JARs into Oracle and write an Oracle function in Java that will call Solr and pipeline the results back as a list, so that I can do something like:
select * from data_table where id in (select * from table(runSolrQuery('my query text')));
This is proving quite hard, and I am not sure it's even possible.
Things that I can't do:
Store the full data in Solr (security + storage limits)
Use Solr as the controller of pagination and ordering (this is why I am fetching data from the DB)
So I have to cook up a hybrid approach where Solr really acts as the full-text search provider for Oracle. Help! Has anyone faced this?
Check this out:
http://demo.scotas.com/search-sqlconsole.php
This product seems to do exactly what you need.
cheers
I'm not a Solr expert, but I assume that you can get the Solr query results into a Java collection. Once you have that, you should be able to use that collection with JDBC. That avoids the limit of 1000 literal items because your IN list would be the result of a query, not a list of literal values.
Dominic Brooks has an example of using object collections with JDBC. You would do something like
Create a couple of types in Oracle
CREATE TYPE data_table_id_typ AS OBJECT (
id NUMBER
);
CREATE TYPE data_table_id_arr AS TABLE OF data_table_id_typ;
In Java, you can then create an appropriate STRUCT array, populate this array from Solr, and then bind it to the SQL statement
SELECT *
FROM data_table
WHERE id IN (SELECT * FROM TABLE( CAST (? AS data_table_id_arr)))
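For illustration, a minimal sketch of that binding in Java (assuming an open connection to Oracle, an ojdbc driver recent enough to provide OracleConnection.createOracleArray, and a list of ids already fetched from Solr; the type names follow the ones created above):

import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Struct;
import java.util.List;
import oracle.jdbc.OracleConnection;

public class SolrIdLookup {
    public static void fetchRows(Connection conn, List<Long> solrIds) throws Exception {
        // One STRUCT per id, matching DATA_TABLE_ID_TYP
        Struct[] elements = new Struct[solrIds.size()];
        for (int i = 0; i < solrIds.size(); i++) {
            elements[i] = conn.createStruct("DATA_TABLE_ID_TYP", new Object[] { solrIds.get(i) });
        }
        // Wrap the STRUCTs in the collection type DATA_TABLE_ID_ARR
        Array idArray = conn.unwrap(OracleConnection.class)
                            .createOracleArray("DATA_TABLE_ID_ARR", elements);

        String sql = "SELECT * FROM data_table "
                   + "WHERE id IN (SELECT * FROM TABLE(CAST(? AS data_table_id_arr)))";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setArray(1, idArray);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // process each matching row from data_table
                }
            }
        }
    }
}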
Instead of using a long BooleanQuery, you can use TermsFilter (it works like RangeFilter, but the items don't have to be in sequence).
Like this (first fill your TermsFilter with terms):
TermsFilter termsFilter = new TermsFilter();
// Loop through terms and add them to filter
Term term = new Term("<field-name>", "<query>");
termsFilter.addTerm(term);
then search the index like this:
DocList parentsList = null;
parentsList = searcher.getDocList(new MatchAllDocsQuery(), searcher.convertFilter(termsFilter), null, 0, 1000);
where searcher is a SolrIndexSearcher (see the Javadoc for more info on the getDocList method):
http://lucene.apache.org/solr/api/org/apache/solr/search/SolrIndexSearcher.html
Two solutions come to mind.
First, look into using the Oracle-specific Java extensions to JDBC. They allow you to pass in an actual array/list as an argument. You may need to create a stored proc (it has been a while since I had to do this), but if this is a focused use case, it shouldn't be overly burdensome.
Second, if you are still running into a boundary like the 1000-object limit, consider using the "rows" setting when querying Solr and leveraging its inherent pagination feature.
I've used this bulk fetching method with stored procs to fetch large quantities of data which needed to be put into Solr. Involve your DBA. If you have a good one, and use the Oracle specific extensions, I think you should attain very reasonable performance.
