typed data set; parent/child select and update with ONE trip to the database (for each op)? - visual-studio

Is it possible, using an ADO.NET typed DataSet containing two tables in a parent/child relationship, to populate the DataSet with ONE trip to the d/b (query could return one or two tables; if one, then result set has columns from both tables, right?), and to update the d/b with ONE trip to the d/b (call to generated stored proc, I guess).
By "is it possible", I mean is it possible to have Visual Studio (2012) automagically generate the classes and SQL code to make this happen?
Or am I kind of on my own? It's looking an awful lot like VS really wants to generate one d/b server round trip for each table involved.
*I guess the update stored proc would have to take table-typed parameters from both parent and child, and perform inserts/updates/deletes appropriately.

Yes, one round trip per table is the way to go.
(- It's certainly possible to use a join query to populate a datatable but VS will then be reluctant to generate update etc SQL. This may or may not be a problem, depending on what you intend to do with the dataset.)
But if you have two tables in a dataset, lets say customers - orders, then you would typically use two queries, and two trips to the db:
SELECT * FROM customers WHERE customers.customerid=#customerid
and
SELECT * FROM orders WHERE orders.customerid=#customerid
Somewhat more counter-intuitive is the situation where you want all customers and orders for one country:
SELECT * FROM customers WHERE customers.countryid=#countryid
and
SELECT orders.* FROM orders INNER JOIN customers ON customers.customerid=orders.customerid WHERE customers.countryid=#countryid
Note how the join query returns data from only one table, but uses the join to identify which rows to return.
Then, once you have the data in your dataset, you can navigate it using the getparentrow and getchildrows methods. This is how ADO.Net manages hierarchical data.
You do need this one-table-at-a-time approach, because, assuming you have foreign key constraints in your db, you need to insert and update in reverse order from delete.
EDIT Yes, this does mean that in some circumstances, depending on the data you want and the structure of your primary keys, you could end up with a humungous set of JOINS that still only pull the data from the table at the end of the hierarchy. This might seem wrong in terms of traditional SQL, but actually it's fine. The time you have lost in the multiple, more complex queries is saved by the reduced amount of data you have to pull back across the wire, compared with one big join query that would be returning multiple copies of the parent data.

Related

Database: Storing multiple Types in single table or multiple intermediate tables for Delta Tables

Using Java and Oracle.
We need to update changes in Email, UserID of employee to third party.
Actual table is Employee and intermediate table we keep which we will use for comparison of changes before sending to third party.
Following are database designs coming in mind for intermediate table:
Only Single table:
EmployeeiD|Value|Type|UpdateDate
Value is userid or email, type will be 'email' or 'userid'. Update date is kept so to figure out that which of email or userid was different and update to third party.
Multiple Table:
Employee_EmailID
EmpId|EmailID|Updatedate
Employee_UserID
EmpId|UserID|Updatedate
Java flow will be:
Pick employee from actual table.
Pick employee from above intermediate table.
Compare differences. Update difference to third party.
Update above table with updated value and last update date.
Which one is consider as best way, single table approach or multiple table or is there any standard way to implement the same? There are 10,000 Employees in system.
Intermediate table is just storing Delta records i.e Records transferred to third party so that it can be compared next day.
Good database design has separate tables for different concepts. Using the same database column to hold different types of data will lead to code which is harder to understand, prone to data corruption and less performative.
You may think it's only two tables and a few tens of thousands of rows, so does it matter? But that is only your current requirement. What you choose now will set the template for what happens when (say) you need to add telephone numbers to the process.
Now in future if we get 5 more entities to update
Do you mean "entities", like say Customers rather than Employees? Or do you really mean "attributes" as in my example of Employee Telephone Number?
Generally speaking we have a separate table for distinct entities, and all the attributes of that entity are grouped at the same cardinality. To take your example, I would expect an Employee to have one UserID and one Email Address so I would design the table like this:
Employee_audit
EmpId|UserID|EmailID|Updatedate
That is, I have one record which stores the complete state of the Employee record at the Updatedate.
If we add a new entity, Customers then we have a new table. Simple. But a new attribute like Employee Phone Number offers a choice, because an employee can have more than one: work landline, mobile, fax, home, etc. So we could represent this in three ways: a child table with a type column, multiple child tables for each type, or as distinct columns on the Employee record.
For the main Employee table I would choose the separate table (or tables, depending on whether I'm shooting for 6NF). But for an audit table I would choose one record per Employee and pivot the phone numbers like this:
Employee_audit
EmpId|UserID|EmailID|Landline|Mobile|Fax|Home|Updatedate
The one thing I would never do is have a single table with type and value columns. It seems attractive because it means we could track additional entities without any further DDL. But in fact it becomes harder to re-assemble the complete state of an Employee at any given time with each attribute we add. Also it means the auditing process itself is more complicated (because it needs to determine which attributes have changed and whether it needs to audit the change) and more expensive (because changing three attributes on the same record entails inserting three audit records).

Data-structure to store database record

I want to store employees record. I don't want to use any external libraries or framework. I am trying to build the data structure from scratch.
There will be three fields,
EmployeeName
Age
Salary
We also want to query like,
Get all the salary where EmployeeName = "Bill"
Get all the EmployeeName where salary > 2000
Get all the Salary where age='50'
I am open to use any language but not any built-in package. What is the recommended data-structure to achieve it ?
I assume that the purpose of this exercise is self-education.
If so, Where to begin reading SQLite source code? is a great place to start reading to understand how this kind of software can be built.
If you really want to roll your own, I would suggest storing your data in an array of structs/objects/dictionaries (what they are called will depend on your language), hidden behind an object so that your insert/update/delete methods on the table go through well-defined access functions. Your operations can be implemented inefficiently with grep, filter, etc depending on your language. In addition to the obvious fields, include deleted as a field. That way you can just update that to delete a record, rather than try to modify the table.
To make them more efficient, read through https://cstack.github.io/db_tutorial/parts/part7.html for how to write a b-tree. Then create a b-tree mapping EmployeeName to the list of indexes of records with that name, ditto for age and salary. Now modify the access methods to update the indexes for those fields when you modify the table. Your searches can now go through the b-tree to find the indexes of the records that you want, and then you can look in the table for them.
This is massively simplified compared to what a database gives you, but you're on your way to understanding how databases work. Both in terms of why they scale, and also why they aren't magically fast.

Does Neo4j Support the concept of views

I'm starting with Neo4J trying to migrate my current system from a Relational DB to Neo4j
and have a peculiar problem to overcome.
I have a table called Orders and has 2 particular columns that are being a pain.
ShipBy is a value for (Train/Air/Truck)
Carrier is the Id of the company carrying the order but this changes, if it ships by Air, it has something like UPS/ALASKA/CONTINENTAL; if it ships by Train, it has something like BNSF/KANSASCITYRAIL/ETC...
these values come from different catalog tables, so this was resolved in my system with something like this
Select Orders.Number, Carrier.Name from Orders, (Select 'T' Type,Id,Name from Truckers union all Select 'R' Type, Id, Name from RailCompanies union all Select 'A' Type, Id, Name from AirLines) Carriers
Where Orders.ShipBy = Carriers.Type and Orders.CarrierId=Carrier.Id
I'd appreciate any pointer on this.
Neo4J doesn't have views in the way that relational DBs have. There are several things you could do as alternates:
Continually re-issue the query that computes the "view" you need, as needed
Create a special "view node", and then link that node via relationships to all of the other nodes that would naturally occur in your "view". Querying your view then becomes as simple as pulling up that one "view node" and traversing your edges to the view results.
Option #1 is easiest, option #2 is probably faster, but comes with it the maintenance burden that as your underlying nodes in the DB change, you need to maintain your view and make sure it points to the right places.
As we can read here "In database theory, a view is the result set of a stored query on the data, which the database users can query just as they would in a persistent database collection object."
Neo4j doesn't host stored queries, but you could think to extend Neo4j Servers as posted here by Stefan: https://stackoverflow.com/a/21780942/3442366
Materialized views are of course different...
Rely on the power of relationship management offered by Neo4j ;-)
Cheers,
Lorenzo

Advantage of splitting a table

My question may seems more general. But only answer I got so far is from the SO itself. My question is, I have a table customer information. I have 47 fields in it. Some of the fields are optional. I would like to split that table into two customer_info and customer_additional_info. One of its column is storing a file in byte format. Is there any advantage by splitting the table. I saw that the JOIN will slow down the query execution. Can I have more PROs and CONs of splitting a table into two?
I don't see much advantage in splitting the table unless some of the columns are very infrequently accessed and fairly large. There's a theoretical advantage to keeping rows small as you're going to get more of them in a cached block, and you improve the efficiency of a full table scan and of the buffer cache. Based on that I'd be wary of storing this file column in the customer table if it was more than a very small size.
Other than that, I'd keep it in a single table.
I can think of only 2 arguments in favor of splitting the table:
If all the columns in Customer_Addition_info are related, you could potentially get the benefit of additional declarative data integrity that you couldn't get with a single table. For instance, lets say your addition table was CustomerAddress. Your business logic may dictate that a customer address is optional, but once you have a customer Zip code, the addressL1, City and State become required fields. You could set these columns to non null if they exist in a customerAddress table. You couldn't do that if they existed directly in the customer table.
If you were doing some Object-relational mapping and your had a customer class with many subclasses and you didn't want to use Single Table Inheritance. Sometimes STI creates problems when you have similar properties of various subclasses that require different storage layout. Being that all subclasses have to use the same table, you might have name clashes. The alternative is Class Table inheritance where you have a table for the superclass, and an addition table for each subclass. This is a similar scenario to the one you described in your question.
As for CONS, The join makes things harder and slower. You also run the risk of accidentally creating a 1 to many relationship. I.E. You create 2 addresses in the CustomerAddress table and now you don't know which one is valid.
EDIT:
Let me explain the declarative ref integrity point further.
If your business rules are such that a customer address is optional, and you embed addressL1, addressL2, City, State, and Zip in your customer table, you would need to make each of these fields Nullable. That would allow someone to insert a customer with a City but no state. You could write a table level check constraint to cover this situation. But that isn't as easy as simply setting the AddressL1, City, State and Zip columns in the CustomerAddress table not nullable. To be clear, I am NOT advocating using the multi-table approach. However you asked for Pros and Cons, and I'm just pointing out this aspect falls on the pro side of the ledger.
I second what David Aldridge said, I'd just like to add a point about the file column (presumably BLOB)...
BLOBs are stored up to approx. 4000 bytes in-line1. If a BLOB is used rarely, you can specify DISABLE STORAGE IN ROW to store it out-of-line, removing the "cache pollution" without the need to split the table.
But whatever you do, measure the effects on realistic amounts of data before you make the final decision.
1 That is, in the row itself.

Very slow search of a simple entity relationship

We use CRM 4.0 at our institution and have no plans to upgrade presently as we've spend the last year and a half customising and extending the CRM to work with our processes.
A tiny part of model is a simply hierarchy, we have a group of learning rooms that has a one-to-many relationship with another entity that describes the courses available for that learning room.
Another entity has a list of all potential and enrolled students who have expressed an interest in whichever course.
That bit's all straightforward and works pretty well and is modelled into 3 custom entities.
Now, we've got an Admin application that reads the rooms and then wants to show the courses for that room, but only where there are enrolled students.
In SQL this is simplified to:
SELECT DISTINCT r.CourseName, r.OtherInformation
FROM Rooms r
INNER JOIN Students S
ON S.CourseId = r.CourseId
WHERE r.RoomId = #RoomId
And this indeed is very close to the eventual SQL that CRM generates.
We use a Crm QueryEntity, a Filter and a LinkEntity to represent this same structure.
The problem now is that the CRM normalizes the a customize entity into a Base Table which has the standard CRM entity data that all share, and then an ExtensionBase Table which has our customisations. To Give a flattened access to this, it creates a view that merges both tables.
This view is what is used by the Generated SQL.
Now the base tables have indices but the view doesn't.
The problem we have is that all we want to do is return Courses where the inner join is satisfied, it's enough to prove there are entries and CRM makes it SELECT DISTINCT, so we only get one item back for Room.
At first this worked perfectly well, but now we have thousands of queries, it takes well over 30 seconds and of course causes a timeout in anything but SMS.
I'm given to believe that we can create and alter indices on tables in CRM and that's not considered to be an unsupported modification; but what about Views ?
I know that if we alter an entity then its views are recreated, which would of course make us redo our indices when this happens.
Is there any way to hint to CRM4.0 that we want a specific index in place ?
Another source recommends that where you get problems like this, then it's best to bring data closer together, but this isn't something I'd feel comfortable in trying to engineer into our solution.
I had considered putting a new entity in that only has RoomId, CourseId and Enrolment Count in to it, but that smacks of being incredibly hacky too; After all, an index would resolve the need to duplicate this data and have some kind of trigger that updates the data after every student operation.
Lastly, whilst I know we're stuck on CRM4 at the moment, is this the kind of thing that we could expect to have resolved in CRM2011 ? It would certainly add more weight to the upgrading this 5 year old product argument.
Since views are "dynamic" (conceptually, their contents are generated on-the-fly from the base tables every time they are used), they typically can't be indexed. However, SQL Server does support something called an "indexed view". You need to create a unique clustered index on the view, and the query analyzer should be able to use it to speed up your join.
Someone asked a similar question here and I see no conclusive answer. The cited concerns from Microsoft are Referential Integrity (a non-issue here) and Upgrade complications. You mention the unsupported option of adding the view and managing it over upgrades and entity changes. That is an option, as unsupported and hackish as it is, it should work.
FetchXml does have aggregation but the query execution plans still uses the views: here is the SQL generated from a simple select count from incident:
'select
top 5000 COUNT(*) as "rowcount"
, MAX("__AggLimitExceededFlag__") as "__AggregateLimitExceeded__" from (select top 50001 case when ROW_NUMBER() over(order by (SELECT 1)) > 50000 then 1 else 0 end as "__AggLimitExceededFlag__" from Incident as "incident0" ...
I dont see a supported solution for your problem.
If you are building an outside admin app and you are hosting CRM 4 on-premise you could go directly to the database for your query bypassing the CRM API. Not supported but would allow you to solve the problem.
I'm going to add this as a potential answer although I don't believe its a sustainable or indeed valid long-term solution.
After analysing the indexes that CRM had defined automatically, I realised that selecting more information in my query would be enough to fulfil the column requirements of an Index and now the query runs in less then a second.

Resources