Something about the optimizer

Something about the optimizer - monetdb

I create a database and connect with it. But when I execute
select optimizer;
it returns
SELECT: identifier 'optimizer' unknown
What's the problem with it? And I can't find the sys table in the database using \d.
If I want to add an optimizer myopt, is it enough for the steps below:
write the opt_myopt.h and opt_myopt.c in /monetdb5/optimizer/
Add the code into codes in /monetdb5/optimizer/opt_wrapper.c
Add the function into optimizer_init_funcs in /monetdb5/optimizer/optimizer.c
Add a new pipe in /monetdb5/optimizer/opt_pipes.c

Since Oct2020, variables now have a schema (to keep it other SQL objects). In your session, 'sys' is not the session's schema, that's why it cannot find the 'optimizer' variable, the same for the tables.
In default branch (will be available in the next release) I added a "schema path" property on the user to search SQL objects besides the current session's schema. By default it includes the 'sys' schema.

For your first question: if your current_schema is not sys, you need to use select sys.optimizer;.
For your second question: the best existing example is probably in monetdb5/extras/mal_optimizer_template. Next to that, it's basically checking the source code to see how other optimisers have been implemented. NB, although it doesn't often happen, the internals of MonetDB can change between (major) versions. I'd recomment you to use Oct2020 or newer.

Concerning your second question,
You also have to create and add an optimizer pipeline to opt_pipes.c. Look for the default_pipe and then copy/paste that one to a new pipeline and add your optimizer to it.
There are some more places where you might need to add your optimizer, like in the codes[]array in opt_wrapper.c. Just mimick one of the standard optimizers like "reorder".

Related

Informatica cloud: use field in pre/post sql commands

I am trying to delete a set of data in the target table based on a column (year) from the lookup in IICS (Informatica Cloud).
I want to solve this problem using pre/post sql commands but the constraint is I can't pass year column to my query.
I tried this:
delete from sample_db.tbl_emp where emp_year = {year}
I want to delete all the employees in a specific year i get from lookup return
For Ex:
I got year as '2019', all the records in table sample_db.tbl_emp containing emp_year=2019 must be deleted.
I am not sure how this works in informatica cloud.
Any leads would be helpful.

How are you getting the year value? A pre/post SQL may not be the way to go unless you need to do this as part of another transformation, i.e., before or after the transformation runs. Also, does your org only have ICDI, or also ICAI? ICAI may be a better solution depending on the value is being provided.

The following steps would help you achieve this.
Create an input-output parameter in your mapping.
Assign the result of your lookup in an expression transformation to the parameter using SetMaxVariable
Use the parameter in your target pre SQL as
delete from sample_db.tbl_emp where emp_year = $$parameter
Let me know if you have any further questions

Validate Oracle Column Names

In one scenario we are dynamically creating sql to create temp tables on-fly. There is no issue with table_name as it is decided by us however the column-names are provided by sources not in our control.
Usually we would check the column names using below query:
select ..
where NOT REGEXP_LIKE (Column_Name_String,'^([a-zA-Z])[a-zA-Z0-9_]*$')
OR Column_Name_String is NULL
OR Length(Column_Name_String) > 30
However is there any build in function which can do a more extensive check. Also any input on the above query is welcome as well.
Thanks in advance.
Final query based on below answers:
select ..
where NOT REGEXP_LIKE (Column_Name_String,'^([a-zA-Z])[a-zA-Z0-9_]{0,29}$')
OR Column_Name_String is NULL
OR Upper(Column_Name_String) in (select Upper(RESERVED_WORDS.Keyword) from V$RESERVED_WORDS RESERVED_WORDS)
Particularly not happy with character's like $ in column name either hence won't be using..
dbms_assert.simple_sql_name('VALID_NAME')
Instead with regexp I can decide my own set of character's to allow.

This answer does not necessarily offer either a performance or logical improvement, but you can actually validate the column names using a single regex:
SELECT ...
WHERE NOT
REGEXP_LIKE (COALESCE(Column_Name_String, ''), '^([a-zA-Z])[a-zA-Z0-9_]{0,29}$')
This works because:
It uses the same pattern to match columns, i.e. starting with a letter and afterwards using only alphanumeric characters and underscore
NULL column names are mapped to empty string, which fails the regex
We use a length quantifier {0,29} to check the column length directly in the regex

" is there any build in function which can do a more extensive check."
Oracle has the DBMS_ASSERT.SIMPLE_SQL_NAME() function. This returns the passed name if it meets the Oracle naming rules ...
select dbms_assert.simple_sql_name('VALID_NAME') from dual;
... and hurls ORA-44003 if the name is invalid.
Valid names permit any characters if the name is double-quoted (yuck, but then so is creating "temp tables on-fly"). Also the function doesn't check the length of the name, so you will still need to validate that yourself.
Find out more in the docs.
Also here is a SQL Fiddle.
"creating a table with comment column is not possible as its a invalid identifier"
Fair point. DBMS_ASSERT is primarily aimed at preventing SQL injection. So it verifies that a value conforms to Oracle's naming rules, not that the value is a valid Oracle name. To catch things like comment you will also need to check the value against V$RESERVED_WORDS, probably where reserved != 'Y'. As this is a V$ view select on it is not granted by default; if you don't have access you'll need to ask your friendly DBA to help out.
" For validating column names I believe I should check with the entire list"
Up to you. The distinction is that some keywords can legitimately be used as identifiers. For instance TYPE only became a reserved word in Oracle version 8 when they introduced the object-relational stuff. But there were a lot of tables and views in existing systems which used 'TYPE' as a column name (not least the Oracle data dictionary). If Oracle had made TYPE a properly reserved word it would have broken all those systems. So the list of reserved words which cannot be used as identifiers is a sub-set of all the Oracle keywords.
Opinions on the general task:
"we are getting data from external sources (files) and the job of the program/script is to push that data to oracle tables."
There are two parts to this task.
The first is that you should have agreed a standard format for these files with the third parties. There should be no need for discovery of the files' structure or content. (Or if there is such a need because the files are randomly sourced from a carousel of third parties probably you should not be using a relational database but something else: Endeca? Python Pandas library?)
The second is the creating tables on the fly. If you have an agreed file structure then you should be loading into standard tables, using either SQL*Loader or external tables according to your circumstances. If you're on 12c maybe SQL*Loader Express Mode could be of interest.

Using variables in From part of a task flow source

Is there any way to use a variable in the from part (for example SELECT myColumn1 FROM ?) in a task flow - source without having to give the variable a valid default value first?
To be more exact in my situation it is so that I'm getting the tablenames out of a table and then use a control workflow to foreach over the list of tablenames and then call a workflow from within that then gets data from these tables each. In this workflow I have the before mentioned SELECT statement.
To get it to work properly I had to set the variable to a valid default value (on package level) as else I could not create the workflow itself (as the datasource couldn't be created as the select was invalid without the default value).
So my question here is: Is there any workaround possible in this case where I don't need a valid default value for the variable?
The datatables:
The different tables which are selected in the dataflow have the exact same tables in terms of columns (thus which columns, naming of columns and datatypes of columns). Only the data inside of them is different (thus its data for customer A, customer B,....).

You're in luck as this is a trivial thing to implement with SSIS.
The base problem for most people is that they come at SSIS like it's still DTS where you could do whatever you want inside a data flow. They threw out the extreme flexibility with DTS in favor of raw processing performance.
You cannot parameterize the table in a SQL statement. It's simply not allowed.
Instead, the approach that people take is to use Expressions. In your case, assuming you had two Variables of type String created, #[User::QualifiedTableName] and #[User::QuerySource]
Assume that [dbo].[spt_values] is assigned to QualifiedTableName. As you loop through the table names, you will assign the value into this variable.
The "trick" is to apply an expression to the #[User::QuerySource]. Make the expression
"SELECT T.* FROM " + #[User::QualifiedTableName] + " AS T;"
This allows you to change out your table name whenever the value of the other variable changes.
In your data flow, you will change your OLE DB Source to be driven by a query contained in a variable instead of the traditional table selection.
If you want an example of where I use QuerySource to drive a data flow, there's an example on mixing an integer and string in an ssis derived column

Create a second variable. Set its Expression to create the full
Select statement, using the value of the first variable.
In the Data Source, use "SQL command from variable" option for the
Data Access Mode property.
If you can, set a default value for the variable you created in step
That will make filling out the columns from your data source much easier.
If you can't use a default value for the variable, set the Data
Source's ValidateExternalMetadata property to False.
You may have to open the data source with the Advanced Editor and
create Output columns manually.

Is it possible to traverse rowtype fields in Oracle?

Say i have something like this:
somerecord SOMETABLE%ROWTYPE;
Is it possible to access the fields of somerecord with out knowing the fields names?
Something like somerecord[i] such that the order of fields would be the same as the column order in the table?
I have seen a few examples using dynamic sql but i was wondering if there is a cleaner way of doing this.
What i am trying to do is generate/get the DML (insert query) for a specific row in my table but i havent been able to find anything on this.
If there is another way of doing this i'd be happy to use but would also be very curious in knowing how to do the former part of this question - it's more versatile.
Thanks

This doesn't exactly answer the question you asked, but might get you the result you want...
You can query the USER_TAB_COLUMNS view (or the other similar *_TAB_COLUMN views) to get information like the column name (COLUMN_NAME), position (COLUMN_ID), and data type (DATA_TYPE) on the columns in a table (or a view) that you might use to generate DML.
You would still need to use dynamic SQL to execute the generated DML (or at least generate static SQL separately).
However, this approach won't work for identifying the columns in an arbitrary query (unless you create a view of it). If you need that, you might need to resort to DBMS_SQL (or other tools).
Hope this helps.

As far as I know there is no clean way of referencing record fields by their index.
However, if you have a lot of different kinds of updates of the same table each with its own column set to update, you might want to avoid dynamic sql and look in the direction of statically populating your record with values, and then issuing update someTable set row = someTableRecord where someTable.id = someTableRecord.id;.
This approach has it's own drawbacks, like, issuing an update to every, even unchanged column, and thus creating additional redo log data, but I believe it should be considered.

Cache and SqlCacheDependency (ASP.NET MVC)

We need to return subset of records and for that we use the following command:
using (SqlCommand command = new SqlCommand(
"SELECT ID, Name, Flag, IsDefault FROM (SELECT ROW_NUMBER() OVER (ORDER BY #OrderBy DESC) as Row, ID, Name, Flag, IsDefault FROM dbo.Languages) results WHERE Row BETWEEN ((#Page - 1) * #ItemsPerPage + 1) AND (#Page * #ItemsPerPage)",
connection))
I set a SqlCacheDependency declared like this:
SqlCacheDependency cacheDependency = new SqlCacheDependency(command);
But immediately after I run the command.ExecuteReader() instruction, the hasChanged base property of the SqlCacheDependency object becomes true although I did not change the result of the query in any way! And, because of this, the result of this query is not kept in cache.
HttpRuntime.Cache.Insert( cacheKey, list, cacheDependency, Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(AppConfiguration.CacheExpiration.VeryLowActivity));
Is it because the command has 2 SELECT statements? Is it ROW_NUMBER()? If yes, is there any other way to paginate results?
Please help! After too many hours, a little will be greatly appreciated! Thank you

Running into the same issue and finding the same answers online without any help, I was reasearching the xml invalid subsicription response from profiler.
I found an example on msdn support site that had a slightly different order of code. When I tried it I realized the problem - Don't open your connection object until after you've created the command object and the cache dependency object. Here is the order you must follow and all will be good:
Be sure to enable notifications (SqlCahceDependencyAdmin) and run SqlDependency.Start first
Create the connection object
Create the command object and assign command text, type, and connection object (any combination of constructors, setting properties, or using CreateCommand).
Create the sql cache dependency object
Open the connection object
Execute the query
Add item to cache using dependency.
If you follow this order, and follow all other requirements on your select statement, don't have any permissions issues, this will work!
I believe the issue has to do with how the .NET framework manages the connection, specifically what settings are set. I tried overriding this in my sql command test but it never worked. This is only a guess - what I do know is changing the order immediately solved the issue.
I was able to piece it together from the following to msdn posts.
This post was one of the more common causes of the invalid subscription, and shows how the .Net client sets the properties that are in contrast to what notification requires.
https://social.msdn.microsoft.com/Forums/en-US/cf3853f3-0ea1-41b9-987e-9922e5766066/changing-default-set-options-forced-by-net?forum=adodotnetdataproviders
Then this post was from a user who, like me, had reduced his code to the simplest format. My original code pattern was similar to his.
https://social.technet.microsoft.com/Forums/windows/en-US/5a29d49b-8c2c-4fe8-b8de-d632a3f60f68/subscriptions-always-invalid-usual-suspects-checked-no-joy?forum=sqlservicebroker
Then I found this post, also a very simple reduction of the problem, only his was a simple issue - needing 2 part name for tables. In his case the suggestion resolved the issue. After looking at his code I noticed the main difference was waiting to open the connection object until AFTER the command object AND the dependency object were created. My only assumption is under the hood (I have not yet started reflector to check so only an assumption) the Connection object is opened differently, or order of events and command happen differently, because of this association.
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/bc9ca094-a989-4403-82c6-7f608ed462ce/sql-server-not-creating-subscription-for-simple-select-query-when-using-sqlcachedependency?forum=sqlservicebroker
I hope this helps someone else in a similar issue.

Just a guess, but could it be because your SELECT statement doesn't have an ORDER BY clause?
If you don't specify an explicit ordering then it's possible for the query to return the results in any order each time it is run. Maybe this is causing the SqlCacheDependency object to think that the results have changed.
Try adding an ORDER BY clause:
SELECT ID, Name, Flag, IsDefault
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY #OrderBy DESC) AS Row,
ID, Name, Flag, IsDefault
FROM dbo.Languages
) AS results
WHERE Row BETWEEN ((#Page - 1) * #ItemsPerPage + 1) AND (#Page * #ItemsPerPage)
ORDER BY Row

i'm no expert on SqlCacheDependency, in fact, i found this question whilst looking for answers to my own issues with it! However, i believe the reason your SqlCacheDependency is not working is because your SQL contains a nested sub query.
Take a look at the documentation which lists what you can/can not use in your SQL: Creating a Query for Notification
"....The statement must not contain subqueries, outer joins, or self-joins....."
I also found some invaluable troubleshooting info from a guy at Redgate here: Using and Monitoring SQL 2005 Query Notification that helped me solve my own problem: By using Sql Profiler to trace the QN events he suggests, i was able to spot my connection was incorrectly using the 'SET ARITHABORT OFF' option, causing my notifications to fail.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio