Background:
I am doing a self study on an Oracle product (Argus Safety Insight), and I need to understand the database schema of this product. I have installed the database and loaded the schema successfully. I have also generated a data model using SQL Developer Data Modeler.
Issue:
This schema has 500 tables and 700 views, which together contain around 20K columns. I can't navigate the data model because of its huge size; SQL Developer hangs.
Question:
Can you suggest a tool or technique for reading and understanding the logical relationships between tables in such a huge database?
You have two issues.
1: Technical - 'sql dev hangs' - you're asking it to open something so big that it overwhelms the Java Virtual Machine (JVM). For really LARGE models, we recommend you bump the JVM's maximum heap to 2 or even 3 GB.
To increase the memory for the JVM, you need to find the product.conf file for SQL Developer. On Windows, it's under your user's AppData\Roaming folder. On a Mac/*NIX, it's in your $HOME directory, in a hidden .sqldeveloper subdirectory.
The file is documented quite well, but you need to do something like -
AddVMOption -Xmx2048m
Save, then re-open SQLDev and your design.
2: Human - how do you make sense of hundreds or thousands of objects in a diagram? You just can't. So you need to find application-driving MAIN tables, and generate SubViews (a subset of the diagram) for easier digestion.
I talk about how to do this here.
Once your objects are grouped into SubViews, you can view, print, report on, and search them by SubView as well.
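If it helps, one way to spot candidate MAIN tables is to ask the data dictionary which tables have the most inbound foreign keys - heavily referenced tables are usually the ones driving the application. A minimal sketch (the schema name is a placeholder you'd replace with the actual Argus schema):

-- rank tables by the number of foreign keys that reference them
SELECT rc.owner      AS referenced_owner,
       rc.table_name AS referenced_table,
       COUNT(*)      AS inbound_fks
FROM   all_constraints fk
JOIN   all_constraints rc
       ON  rc.owner = fk.r_owner
       AND rc.constraint_name = fk.r_constraint_name
WHERE  fk.constraint_type = 'R'          -- 'R' = referential (foreign key)
AND    fk.owner = 'ARGUS_APP'            -- placeholder schema name
GROUP  BY rc.owner, rc.table_name
ORDER  BY inbound_fks DESC;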
Related
At work, my team accesses and works in a number of different databases using our team login. We have a ton of tables and views in each respective schema and I would guess that only ~10% are used regularly. As such, I would like to clean up these schemas to keep only those tables and views which are actually used and delete all the other ones (or at least archive them).
Is there any way for me to see the last time that a view was run, or the last time that a table was queried? My thinking is that if I can see that a view/table hasn't been used in x amount of time, then I'd feel more comfortable dropping it. My fear is that without such a process, I might drop tables/views that are used in Tableau dashboards and for other purposes.
Please check this Link
The DBA_HIST views can only show you activity for as long as AWR retains its snapshots (the retention window), not beyond that, and even then it won't be conclusive.
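As a hedged starting point (assuming the Diagnostics Pack licence that DBA_HIST requires, and bearing in mind that views are often merged into execution plans, so you may only ever see their underlying tables), something like this shows the most recent AWR snapshot in which a captured plan touched each object; the schema name is a placeholder:

-- last AWR snapshot in which a captured SQL plan referenced each object
SELECT p.object_owner,
       p.object_name,
       MAX(s.begin_interval_time) AS last_seen
FROM   dba_hist_sql_plan p
JOIN   dba_hist_sqlstat  st ON st.sql_id = p.sql_id
                           AND st.dbid   = p.dbid
JOIN   dba_hist_snapshot s  ON s.snap_id = st.snap_id
                           AND s.dbid    = st.dbid
                           AND s.instance_number = st.instance_number
WHERE  p.object_owner = 'MY_SCHEMA'      -- placeholder schema name
GROUP  BY p.object_owner, p.object_name
ORDER  BY last_seen;

Anything the plan capture missed, or anything older than the retention window, simply won't show up, so treat absence as "no evidence", not as proof the object is unused.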
We are facing a data verification issue in our project.
The project is about replicating data from Sybase to Oracle databases.
The table structure for Table A is the same across Sybase and Oracle:
the same columns and primary key combination across all the databases.
e.g. if Sybase has Table A with columns a, b and c,
a table with the same name and the same columns will be available in the different databases.
We are done with the replication part, but we have hit some silent failures, i.e. data discrepancies. I'm just wondering if there is any tool already available for this.
Any information on this would be helpful. Thanks.
Sybase (now SAP) has a couple products that can be used for data comparisons and reconciliation:
rs_subcmp - an older, 32-bit tool that comes with the Sybase Replication Server product and can be used to compare data between source and target; SQL reconciliation scripts can be generated from the differences and then applied to the target to bring it in sync with the source; if your tables are more than 1GB in size you can still use rs_subcmp, but you'll need to create multiple comparison jobs (via where clauses) to work on different subsets of your tables. [I don't recall if rs_subcmp can be used for heterogeneous replication setups, eg, ASE-Oracle.]
Data Assurance (DA) - the newer, 64-bit product ... also from Sybase ... which can also compare data and (re)sync the target(s) from the source (either via SQL reconciliation scripts or directly); DA is capable of handling comparisons between a handful of different RDBMS products (eg, ASE-Oracle); I'm currently working on a project where one of the requirements is to validate (and reconcile where needed) 200+ TB of data being migrated from Oracle to HANA, and I'm using DA for the validation/reconciliation portion of the project.
As @TenG has hinted at in his answer, there's a good bit of effort involved in comparing data and generating code to reconcile the differences. Rolling your own code is doable but will entail a lot of work. If you've got the money, you'll likely find 3rd party tools can get most/all of the work done for you.
If you used a 3rd party product to replicate your data from Sybase to Oracle, you may want to see if the same vendor has a comparison/validation/reconciliation tool you could use.
I've worked on a few migration projects and a key part has always been data reconciliation.
I can only talk about the approaches we took, based on the constraints we had: the tools available, minimising downtime, and the available disk space.
In all cases I took to writing scripts that worked on two levels - a summary view and a "deep dive". We couldn't find any tools readily available that did what we wanted in a timely enough manner. In fact, even the migration tools we found had limitations (Data Pump, SQL*Loader, GoldenGate, etc.), and we hand-coded scripts to handle the bits that we found to be lacking or too slow in the standard tools.
The summary view varied from project to project. It was partly functional (do the accounting figures for transactions match?) for the users to verify, and partly technical. For smaller tables we could just write simple reports and the diff was straightforward.
For larger tables we wrote technical reports that looked at bands of data (e.g. grouping the PK into bands of 1000s), collected all the column data and produced a checksum, generating a report for each table like:
PK ID Range Start    Checksum
-----------------    -----------
100000               22773377829
200000               38938938282
.
.
Corresponding table pairs from each database were then "diff"ed against each other to highlight discrepancies. Any differences that were found could then be looked at in more detail.
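A minimal sketch of the banded-checksum idea, in Oracle syntax (the table, PK and column names are placeholders, and the real scripts may well have used a different hash function and band size; an equivalent query has to run on the source side too):

-- one checksum row per band of 100,000 primary key values
SELECT TRUNC(pk_id / 100000) * 100000 AS band_start,
       SUM(ORA_HASH(col_a || '|' || col_b || '|' || col_c)) AS band_checksum
FROM   table_a
GROUP  BY TRUNC(pk_id / 100000) * 100000
ORDER  BY band_start;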
The scripts were written in such a way as to allow them to run in parallel, looking at discrete bands. The band ranges were tunable as well, to get the best throughput. This obviously sped things up.
The scripts were shell scripts firing off SQL*Plus reports, with a similar set for the source database.
On one project there wasn't enough disk space to do these reports, so I wrote a Java program that queried the two databases side by side, using blocking queues to fetch and compare rowsets. Being in memory meant this was super fast.
For the "deep dive" we looked at the details for key tables, or for tables that reports a checksum difference.
For the user reports, the users would specify what they wanted to see, and we wrote the reports accordingly.
On the last project, the only discrepancies found were caused by character set conversion issues (people names with accents weren't handled correctly).
On projects where the overall dataset was smaller, we extracted the data to XML files and wrote a Java tool to process pairs of files and report differences.
The SAP/Sybase rs_subcmp tool is pretty powerful and also pretty hard to use. For details see:
https://help.sap.com/viewer/075940003f1549159206fcc89d020515/16.0.3.3/en-US/feb58db1bd1c1014b134ef4efef25563.html?q=rs_subcmp
You have to pass it key field information, but once you do that, it can retry/restart the compare streams after transient differences. Pretty fancy.
rs_subcmp expects to work against Sybase data sources, so to compare against Oracle you'd probably have to set up one of those Sybase-to-Oracle gateway products ($$$$$).
Could you install the Oracle ODBC drivers and configure them to allow Sybase clients to access Oracle? I'm guessing not (but that's outside the range of my experience).
Note the "-h" option for rs_subcmp. The docs just say it runs a "fast comparison", but what it's actually doing is running queries using the hashbytes() function. Something like:
select keyfield1,keyfield2, hashbytes("Md5",datacol1,datacol2,datacol3)
from mytable
So this sort of query might be good for the "summary view" type of comparison discussed above, if the Oracle STANDARD_HASH() function output matches up with the Sybase hashbytes() function (again, outside my experience).
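For what it's worth, a hedged sketch of what the Oracle side of such a comparison could look like, reusing the column names from the example above. Whether the MD5 values actually line up depends on how each product concatenates and encodes the columns (and on NULL handling - Oracle treats NULL as an empty string in concatenation), so test it on rows known to be identical first:

-- Oracle-side equivalent of the hashbytes() query (sketch only)
select keyfield1, keyfield2,
       standard_hash(datacol1 || '|' || datacol2 || '|' || datacol3, 'MD5') as row_hash
from mytable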
Note: as of ASE 16, there was a bug in the hash() & hashbytes() functions when running the Md5 option against large varbinary columns, where they could use up all of procedure cache and potentially crash the server (CR 811073).
We are planning a new system for a client in ORACLE 11g. I've been mostly in the Sql Server world for several years, and am not really current on the latest ORACLE updates.
One particular feature I'm wondering if ORACLE has added in by this point is some sort of logical "container" for database objects, akin to Sql Server's SCHEMA.
Trying to use ORACLE's schemas like Sql Server winds up being a disaster for code comparisons when trying to push from dev > test > live.
Packages are sort of similar, except that you can't put tables into a package (so they really only work for logical code grouping).
The only other option I am aware of is the archaic practice of having to prefix object names with a "schema" prefix, i.e. RPT_REPORTS, RPT_PARAMETERS, RPT_LOGS, RPT_USERS, RPT_RUN_REPORT(), with the RPT_ prefix denoting, say, that these are all the objects dealing with our reporting engine. Writing a system like this feels like we never left the 8.3 file-naming age.
Is there by this point in time any cleaner, more direct way of logically grouping related objects together in ORACLE?
Oracle's logical container for database objects IS the schema. I don't know how much "cleaner" and "more direct" you can get! You are going to have to do a paradigm shift here. Don't try to think in SQL Server terms, and force a solution that looks like SQL Server on Oracle. Get familiar with what Oracle does and approach your problems from that perspective. There should be no problem pushing from dev to test to production in Oracle if you know what you're doing.
It seems you have a bit of a chip on your shoulder about Oracle when you use terms like "archaic practice". I would suggest you make friends with Oracle's very rich and powerful feature set by doing some reading, since you're apparently already committed to Oracle for this project. In particular, pick up a copy of "Effective Oracle By Design" by Tom Kyte. Once you've read that, have a look at "Expert Oracle Database Architecture" by the same author for a more in-depth look at how Oracle works. You owe it to your customer to know how to use the tool you've been handed. Who knows? You might even start to like it. Think of it as another tool in your toolchest. You're not married to SQL Server and you're not being unfaithful by using Oracle ;-)
EDIT:
In response to questions by OP:
I'm not sure why that is a logistical problem. They can be thought of as separate databases, but physically they are not. And no, you do not need a separate data file for each schema. A single datafile is often used for all schemas.
If you want a "nice, self-contained database" à la SQL Server, just create one schema to store all your objects. End of problem. You can create other users/schemas; just don't give them the ability to create objects.
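A minimal sketch of that arrangement (user names, passwords, the tablespace and the granted table are all placeholders, and the object grant assumes the table already exists):

-- schema that owns all of the application's objects
CREATE USER rpt_owner IDENTIFIED BY "change_me"
  DEFAULT TABLESPACE users
  QUOTA UNLIMITED ON users;
GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW, CREATE PROCEDURE TO rpt_owner;

-- application login that can connect and use granted objects, but create nothing
CREATE USER rpt_app IDENTIFIED BY "change_me_too";
GRANT CREATE SESSION TO rpt_app;
GRANT SELECT ON rpt_owner.rpt_reports TO rpt_app;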
There are tools to compare objects and data, such as the compare feature in PL/SQL Developer. Typically in Oracle you want to compare schemas, not entire databases. I'm not sure why you want multiple schemas each with their own objects anyway. What does it buy you? Keep your objects (tables, triggers, code, views, etc.) in one schema.
Looking for a bit of advice on how to optimise one of our projects. We have an ASP.NET/C# system that retrieves data from a SQL Server 2008 database and presents it on a DevExpress ASPxGridView. The data that's retrieved can come from one of a number of databases - all of which are slightly different and are added and removed regularly. The user is presented with a list of live "companies", and the data is retrieved from the corresponding database.
At the moment, data is being retrieved using a standard SqlDataSource and a dynamically-created SQL SELECT statement. There are a few JOINs in the statement, as well as optional WHERE constraints, again dynamically-created depending on the database and the user's permission level.
All of this works great (honest!), apart from performance. When it comes to some databases, there are several hundreds of thousands of rows, and retrieving and paging through the data is quite slow (the databases are already properly indexed). I've therefore been looking at ways of speeding the system up, and it seems to boil down to two choices: XPO or LINQ.
LINQ seems to be the popular choice, but I'm not sure how easy it will be to implement with a system that is so dynamic in nature - would I need to create "definitions" for each database that LINQ could access? I'm also a bit unsure about creating the LINQ queries dynamically too, although looking at a few examples that part at least seems doable.
XPO, on the other hand, seems to allow me to create a XPO Data Source on the fly. However, I can't find too much information on how to JOIN to other tables.
Can anyone offer any advice on which method - if any - is the best to try and retro-fit into this project? Or is the dynamic SQL model currently used fundamentally different from LINQ and XPO and best left alone?
Before you go and change the whole way that your app talks to the database, have you had a look at the following:
Run your code through a performance profiler (such as Redgate's performance profiler); the results are often surprising.
If you are constructing the SQL string on the fly, are you using .Net best practices such as String.Concat("str1", "str2") instead of "str1" + "str2"? Remember, multiple small gains add up to big gains.
Have you thought about having a summary table or database that is periodically updated (say every 15 minutes; you might need to run a service to update this data automatically) so that you are only hitting one database? New connections to databases are quite expensive.
Have you looked at the query plans for the SQL that you are running? Today, I moved a dynamically created SQL string to a sproc (only 1 param changed) and shaved 5-10 seconds off the running time (it was being called 100-10000 times depending on some conditions); see the sketch below, after this list.
Just a warning if you do use LINQ. I have seen developers who decided to use LINQ write more inefficient code because they did not know what they were doing (pulling 36,000 records when they needed to check for 1, for example). These things are very easily overlooked.
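A minimal sketch of the parameterised-sproc idea mentioned above, in SQL Server syntax (the table and column names are placeholders, not your actual schema):

-- one parameter, one cached and reusable plan, instead of a fresh ad-hoc string per call
CREATE PROCEDURE dbo.GetCompanyOrders
    @CompanyId INT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT o.OrderId, o.OrderDate, c.CompanyName
    FROM   dbo.Orders    AS o
    JOIN   dbo.Companies AS c ON c.CompanyId = o.CompanyId
    WHERE  o.CompanyId = @CompanyId;
END;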
Just something to get you started on and hopefully there is something there that you haven't thought of.
Cheers,
Stu
As far as I understand, you are talking about so-called server mode, where all data manipulations are done on the DB server instead of passing them to the web server and processing them there. In this mode the grid works very fast with data sources that contain hundreds of thousands of records. If you want to use this mode, you should create either the corresponding LINQ classes or XPO classes. If you decide to use LINQ-based server mode, the LINQServerModeDataSource provides the Selecting event, which can be used to set a custom IQueryable and KeyExpression. I would suggest that you use LINQ in your application. I hope this information will be helpful to you.
I guess there are two points where performance might be tweaked in this case. I'll assume that you're accessing the database directly rather than through some kind of secondary layer.
First, you don't say how you're displaying the data itself. If you're loading thousands of records into a grid, that will take time no matter how fast everything else is. Obviously the trick here is to show a subset of the data and allow the user to page, etc. If you're not doing this then that might be a good place to start.
Second, you say that the tables are properly indexed. If this is the case, and assuming that you're not loading 1,000 records into the page at once but retrieving only subsets at a time, then you should be OK.
But, if you're only doing an ExecuteQuery() against an SQL connection to get a dataset back I don't see how Linq or anything else will help you. I'd say that the problem is obviously on the DB side.
So to solve the problem with the database you need to profile the different SELECT statements you're running against it, examine the query plans and identify the places where things are slowing down. You might want to start by using SQL Server Profiler, but if you have a good DBA, sometimes just looking at the query plan (which you can get from Management Studio) is usually enough.
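If it helps, a quick way to see where an individual statement spends its time before reaching for a full Profiler trace is sketched below (dbo.Orders is just a stand-in for the grid's generated SELECT); in Management Studio you can also toggle "Include Actual Execution Plan" for the same run:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- stand-in for the grid's generated SELECT; TOP keeps the page-sized result set small
SELECT TOP (50) o.OrderId, o.OrderDate
FROM   dbo.Orders AS o
ORDER  BY o.OrderDate DESC;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;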
This question is addressed to a degree in this question on LINQ to SQL .dbml best practices, but I am not sure how to add to a question.
One of our applications uses LINQ to SQL, and we currently have one .dbml file for the entire database, which is becoming difficult to manage. We are looking at refactoring it into separate files that are more module/functionality specific, but one problem is that many of the high-level classes would have to be duplicated in several .dbml files, as the associations can't be used across .dbml files (as far as I know), along with the additional partial class code as well.
Has anyone grappled with this problem and what recommendations would you make?
Take advantage of the namespace settings. You can get to it in properties from clicking in the white space of the ORM.
This allows me to have a Users table and a User class for one set of business rules and a second (but the same data store) Users table and a User class for another set of business rules.
Or, break up the library, which should also have the effect of changing the namespacing, depending on your company's naming conventions. I've never worked on an enterprise app where I needed access to every single table.
Past a certain size it probably becomes easier to work with the xml instead of the dbml designer.
I have written a tool too! Mine is for scripting changes to dbml files using C#, so you can rerun them and not lose changes. See my blog http://www.adverseconditionals.com for more details.
The approach that we've used is to keep 2 .dbml files. One of them holds the stored procs, and all production DB access is done through this. The other is in a unit test folder and holds tables and their relationships, and is used for DB data manipulation and querying in unit tests.
I have written a utility to address exactly that problem. I needed a quick app to let you select only the database objects you need - in my case I often needed a complex view, but no tables.
http://www.codeplex.com/SqlMetalInclude/