Unit testing of PL/SQL [closed] - oracle

I would like to ask whether you write any unit tests for your database and, if so, what your experiences have been.
Are the tests worth the effort? Do you test only high-level procedures, or functions as well? What are the best practices?

Testing best practices for PL/SQL, or any database for that matter:
Software 101: the earlier you catch a bug, the less expensive it is to fix. By that adage, all code going into production should be tested, and PL/SQL is no exception. Testing is always worth the effort - no ambiguity there.
Database testing should be done at two levels - for the data and about the data.
For the data - this covers metrics about the data loaded and the loading process, e.g. define a sample data set and calculate the expected counts in the target tables after the test case is executed.
Secondly, performance test cases - these test the process itself, e.g. if you load the full production set, how long does it take? Again, you do not want to uncover performance issues in production.
About the data - this is more business-oriented testing: is the loaded data consistent with the expected functionality? E.g. if you are aggregating sales reps up to their parent companies, is the one-to-many relationship between company and sales rep still valid after you run the test case?
Always create a test query that results in a number, e.g. select the count of sales reps that are not associated with any company; if the count is greater than 0, the test fails. For instance:
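A minimal sketch of such a pass/fail query - the table and column names here are hypothetical:

select count(*) as failure_count
from   sales_rep sr
where  sr.company_id is null
   or not exists (select 1 from company c where c.company_id = sr.company_id);
-- failure_count > 0 means the test case fails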
It's a good idea to put the test cases, their test queries, expected results and actual results in a table so that you can review them and slice and dice as required.
You can write a stored procedure to automate running the test queries from that table; this can be repeated very easily and can even be embedded in a batch job or a GUI screen. A rough sketch of both the table and the procedure follows.
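As a rough sketch of such a harness - all names are made up for illustration, and every test query is assumed to return a single number:

create table test_case (
  test_id     number primary key,
  description varchar2(200),
  test_query  varchar2(4000),  -- a query returning a single number
  expected    number,
  actual      number,
  run_date    date,
  status      varchar2(10)
);

create or replace procedure run_test_cases is
  l_actual number;
begin
  for tc in (select test_id, test_query, expected from test_case) loop
    execute immediate tc.test_query into l_actual;
    update test_case
       set actual   = l_actual,
           run_date = sysdate,
           status   = case when l_actual = tc.expected then 'PASS' else 'FAIL' end
     where test_id = tc.test_id;
  end loop;
  commit;
end run_test_cases;
/

Each run records the actual result and status in place, so the same table doubles as the review report mentioned above.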

Related

Copy data from SQL DB into Hadoop [closed]

I am studying a use case where we are going to move data from a SQL database (600 TB, ~100 tables) into a transformed format in Hadoop. We don't have logs enabled in the SQL DB. We decided to copy the data as a datamart view and to refresh this view every week. The copied data will be erased and rewritten every week.
This SQL DB is used for reporting purposes derived from the data lake. The OLTP database is an old system we are replacing progressively. The dataset that is copied is deleted every week and copied again (refreshed).
80% of the data copy is straight through, with no transformation.
20% requires redesign.
We identified 3 options:
Airflow + Beam for the processing
ETL (Informatica) - this was excluded
Kafka (Connect, Streams, sink into Hadoop), optionally with CDC via Debezium
What do you think is the best approach regarding performance, overall time to deliver, and data architecture?
Thanks for the help!
My thoughts - for what they are worth:
I would definitely not be looking to copy 600 TB per week. Given that the majority of this data will not have changed from week to week (I assume), you should be looking to copy across only the data that has changed. As your data in Hadoop will be partitioned, you would mainly be inserting new data into new partitions; for the records that have changed you would just be dropping and reloading a few partitions. (A sketch of such a delta extract is below.)
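A hedged sketch of a weekly delta extract, assuming the source tables carry something like a LAST_UPDATED audit column (a hypothetical name - your schema may track changes differently):

-- extract only rows touched in the last week instead of the full 600 TB
select *
from   source_table
where  last_updated >= trunc(sysdate) - 7;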
I would copy all the necessary data into a staging area in Hadoop as-is (without transformation) and then process it on the Hadoop platform to produce the data you actually need - you can then drop the staging-area data if you want. For example:
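Illustratively, once the as-is data is staged, the transformation can run entirely on the Hadoop side, e.g. a Hive-style insert into a partitioned target table (all table and column names below are invented):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table curated.sales partition (sale_date)
select customer_id,
       amount,
       sale_date
from   staging.sales_raw;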
Data processing tool - if you already have experience of a specific toolset within your company then use that; don't multiply the toolsets in use unless critical functionality is required that is not available within the existing tools. If this one process is all you are going to use the toolset for, it probably doesn't matter which one you pick - choose the one that is quickest to learn and deploy. If the toolset is going to be extended to other use cases, I would definitely use a dedicated ETL/ELT tool rather than a coding solution (why have you discarded Informatica as a solution?).
The following is definitely an opinion...
If you are building a new analytical platform, I am surprised that you are using Hadoop. Hadoop is legacy technology that has been superseded by more modern and capable Cloud data platforms (Snowflake, etc.).
Also, Hadoop is a horrible platform to try and run analytics on (it's OK as just a data lake to hold data while you decide what you want to do with it). Running queries that don't align with how the data is partitioned gives really bad performance for non-trivial dataset sizes. For example, if your transactions are partitioned by date, then a query summing transaction values over the last week will run quickly; a query summing transactions for a specific account (or group of accounts) will perform very badly.
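To illustrate, with a hypothetical transactions table partitioned by txn_date:

-- prunes to roughly seven partitions, so it stays fast
select sum(amount)
from   transactions
where  txn_date >= date_sub(current_date, 7);

-- cannot prune on account_id, so every partition gets scanned
select sum(amount)
from   transactions
where  account_id = 12345;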

Support GET of more than 1000 records from DynamoDB [closed]

Hi, I want to support GET requests returning more than 1000 records from DynamoDB, and in addition add an option to send a list of records to DynamoDB via API Gateway.
(Neither is possible at the moment.)
Is there a way to do that? Is a suitable Lambda function the only option?
DynamoDB does not have a limit of getting up to 1000 items - I don't know which of the other layers you use imposes this specific "1000" limit.
If you want to read all the items in the table, or all the items of a partition, you have the Scan and Query requests, respectively, which can bring back even billions of records - but not in one call, of course (you need to make consecutive requests, in what is known as pagination, and there is also the option of a parallel scan).
But it seems what you are really looking for is to read a bunch of unrelated items given their keys. The request for that is BatchGetItem. This request is actually limited to just 100 item keys (much smaller than the limit you mentioned, 1000), and even that number 100 is only guaranteed to work if the items being read are fairly small - otherwise you go over the response size limit and get back responses for only some of the items. But this is hardly a problem - your application can always split a 10,000-item request into 100 separate batch requests and send them in sequence or even in parallel.

Performance implications of using (DBMS_RLS) Oracle Row Level Security(RLS)? [closed]

If we use Oracle Row Level Security (RLS) to hide some records, are there any performance implications - will it slow down my SQL queries? The Oracle package for this is DBMS_RLS.
I plan to add an IS_HISTORICAL=T/F column to some tables and then use RLS to hide the records that have IS_HISTORICAL=T.
The SQL queries we use in the application are quite complex, with inner/outer joins, subqueries, correlated subqueries, etc.
Of the 200-odd tables, about 50 will have this RLS policy (to hide records with IS_HISTORICAL=T) applied to them. The remaining 150 tables are child tables of these 50, so RLS applies to them implicitly.
Any License implications?
Thanks.
"Are there any Performance Implications - will it slow down my SQL
Queries? "
As with all questions relating to performance the answer is, "it depends". RLS works by wrapping the controlled query in an outer query which applies the policy function as a WHERE clause...
select /*+ rls query */ * from (
select /*+ your query */ ... from t23
where whatever = 42 )
where rls_policy.function_t23 = 'true'
So the performance implications rest entirely on what goes in the function.
The normal way of doing these things is to use context namespaces. These are predefined areas of session memory accessed through the SYS_CONTEXT() function. As such, the cost of retrieving a stored value from a context is negligible. And as we would normally populate the namespaces once per session - say by an after-logon trigger or a similar connection hook - the overall cost per query is trivial. There are different ways of refreshing the namespace which might have performance implications, but again these are trivial in the overall scheme of things (see this other answer).
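A rough sketch of that pattern - the context, package and policy function names below are all hypothetical:

create or replace context app_ctx using app_ctx_pkg;

create or replace package app_ctx_pkg as
  procedure set_show_historical(p_value in varchar2);
end app_ctx_pkg;
/
create or replace package body app_ctx_pkg as
  procedure set_show_historical(p_value in varchar2) is
  begin
    -- populate once per session, e.g. from an after-logon trigger
    dbms_session.set_context('APP_CTX', 'SHOW_HISTORICAL', p_value);
  end;
end app_ctx_pkg;
/

create or replace function rls_hide_historical(
  p_schema in varchar2,
  p_object in varchar2
) return varchar2 is
begin
  -- reading the context is a cheap in-memory lookup
  if sys_context('APP_CTX', 'SHOW_HISTORICAL') = 'Y' then
    return null;                      -- no extra predicate applied
  end if;
  return 'is_historical = ''F''';     -- hide historical rows
end rls_hide_historical;
/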
So the performance impact depends on what your function actually does. Which brings us to a consideration of your actual policy:
"this RLS Policy (to hide records by IS_HISTORICAL=T)"
The good news is the execution of such a function is unlikely to be costly in itself. The bad news is the performance may still be Teh Suck! anyway, if the ratio of live records to historical records is unfavourable. You will probably end up retrieving all the records and then filtering out the historical ones. The optimizer might push the RLS predicate into the main query but I think it's unlikely because of the way RLS works: it avoids revealing the criteria of the policy to the general gaze (which makes debugging RLS operations a real PITN).
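For completeness, registering such a policy on one of the tables would look roughly like this (the schema, table and policy names are hypothetical):

begin
  dbms_rls.add_policy(
    object_schema   => 'APP',
    object_name     => 'ORDERS',
    policy_name     => 'HIDE_HISTORICAL',
    function_schema => 'APP',
    policy_function => 'RLS_HIDE_HISTORICAL',
    statement_types => 'SELECT');
end;
/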
Your users will pay the price of your poor design decision. It is much better to have journalling or history tables to store old records and keep only live data in the real tables. Retaining historical records alongside live ones is rarely a solution which scales.
"Any License implications?"
DBMS_RLS requires an Enterprise Edition license.

using materialised views to fix bugs and reduce code [closed]

The application I'm working on has a legacy problem: two tables, ADULT and CHILD, were created in an Oracle 11g DB.
This has led to a number of related tables that have a field for both ADULT and CHILD, with no FK applied.
Bugs have arisen where poor development has mapped relationships to the wrong field.
Our technical architect plans to merge the ADULT and CHILD tables into a new ADULT_CHILD table and create materialised views in place of the original tables. The plan is also to create a new ID value and replace the ID values in all associated tables, so that even if the PL/SQL/APEX code maps to the wrong field the data mapping will still be correct.
The reasoning behind this solution is that it does not require us to change any other code.
My opinion is that this is a fudge, but my background is more Java/.NET OO.
What arguments can I use to convince the architect that this is wrong and not a real solution? I'm concerned we are creating a more complex solution and that performance will be an issue.
Thanks for any pointers
While it may be a needed solution, it might also create new issues. If you really do need an MV that is up to date at all times, you need ON COMMIT refresh, and that in turn tends to make all updates sequential - meaning that all processes writing to it wait in line for the one updating the table to commit. Note: the table, not the row.
So it is prudent to test the approach with realistic loads. Why does it have to become a single table? Could the tables not stay separate, with an FK added? If you need more control over the updates, rename the tables and put views with INSTEAD OF triggers in their place, as sketched below.
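A rough sketch of that last suggestion, with invented table and column names, just to show the shape of it:

-- keep the physical table, hide it behind an updatable view
alter table adult rename to adult_tab;

create or replace view adult as
  select id, name, dob from adult_tab;

create or replace trigger adult_ioi
  instead of insert on adult
  for each row
begin
  -- full control over how writes reach the underlying table
  insert into adult_tab (id, name, dob)
  values (:new.id, :new.name, :new.dob);
end;
/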

where to get csv sample data? [closed]

As part of my development I need to process some .csv files.
For what it's worth, I am writing a super-fast CSV parser in Java.
I would like to ask if somebody can name some websites where I can find good CSV files so that I can test my app.
Please don't flag this question as inappropriate; I think developers would benefit from a list of good sites where sample data can be found.
The baseball archive can be downloaded in CSV format. The batting statistics file contains a little over 90,000 rows of data which should be helpful in performance testing your app.
You can download the Sample CSV Data Files from this site.
Examples:
Sample Insurance Data
Real Estate Data
Sales Transactions Data
See also this question on sample data.
I've used http://www.fakenamegenerator.com for these purposes in the past.
Another good source is baseball reference. Pick whatever baseball player or manager you can think of.
http://www.baseball-reference.com/managers/coxbo01.shtml
This is a site, currently in beta, that can give you data in JSON, XML or CSV. All lists are customizable. This is a sample call to return data as CSV: http://mysafeinfo.com/api/data?list=dowjonescompanies&format=csv
Documentation on lists, formats and options is under Documentation: http://mysafeinfo.com/content/documentation
Over 80 data sets available - see a full list under Datasets on the main menu
If you're looking for some large CSV files with real-world data, try http://www.baseball-databank.org.
Several very nice test CSV files: http://support.spatialkey.com/spatialkey-sample-csv-data/
Sample insurance portfolio,
Real estate transactions,
Sales transactions,
Company Funding Records,
Crime Records
Thank you for the question!
