I am using LINQ to access my database, and thereby get a LINQ-created object which I want to send to the browser (this is a web service) as a JSON object. This has worked well so far, but when I add some test data to the database (about 10-20 entries in each table) it fails miserably. The reason is that the LINQ object contains all the referenced objects, and this becomes huge pretty fast. E.g. each resource type contains all its resources, which contain all reservation lines, which contain their reservations...
Do you have any tips on how I should resolve this? Is there a setting in the serializer I can use? I use json.net for serializing the objects. Or is there some setting in LINQ?
Ideally I don't want to create new objects before I serialize, since it is very convenient to just serialize the LINQ objects directly :)
The best practice, at least for the moment, is to not serialize LINQ to SQL objects, or Entity Framework entities. The main reason for that is that they include implementation-dependent data from the base classes.
Instead, serialize what you want serialized. Use Data Transfer Objects matching exactly what you want to send, and copy from the LINQ to SQL objects into them before sending.
Related
I want to define one schema that will be valid across teams and platforms. This is pretty simple and can be thought of as a kind of ontology. What I need is the ability to define what a field represents and, under it, the name of that field on each platform. I'd also like to be able to generate data objects from the schema for each of the target languages, so I'd like to know whether this need can be met with Protobuf or GraphQL. Note: my naming conventions may differ from the defaults of the generated target language, since the names need to be compatible with the databases. A simple example of my need:
{
  "lastName": {
    "mssqlName": "LastName",
    "oracleName": "FamilyName",
    "elasticName": "lastName",
    "cassandraName": "last_name",
    "rocksDbName": "surname"
  },
  "age": {
    ...
  }
}
As you can see, on some platforms I have totally different names than on others. I'd like to know the usual ways / technologies to solve this problem, and whether it will be possible with codegen-capable technologies like Protobuf and GraphQL.
A single schema as the single point of truth for all object / message definitions across databases, comms links, multiple languages and platforms? It would be nice, wouldn't it?
The closest I can think of is XSD (XML Schema), but I don't think the tooling quite covers it. For example, I know of tools that will take an XSD schema and generate code that will serialise / deserialise objects to / from XML (e.g. Microsoft's xsd.exe). There are even some good ones.
And then there are tools that will create SQL tables from that XSD schema. But a code generator that builds classes that can access those tables isn't also building them to serialise / deserialise objects to and from an XML wireformat.
Basically, I've not come across a schema language that has tooling that does everything. The ASN.1 tools are very good at creating serialisation classes, but I've never found one that also targets SQL interactions. Same with XSD.
My knowledge is of course not exhaustive, and there might be something in JSON-land that works.
Minimum Pain Compromise Approach
What I have settled on in the past is to accept that I'll have to do some manual coding around schema changes, but probably not too much. I'd define messages fully in, say, Google Protocol Buffers, and use that for object exchange between applications / languages. Where I wanted to stash objects in a database, I'd accept having a parallel definition of the object in the table columns, but only for the critical fields I'd want to search on. The last column would be an arbitrary container, able to store the serialised object whole.
For example, suppose a GPB message had an integer ID field and a string Name field, plus a bunch of other fields. My database table would then have an ID column, a Name column, and a Bytes column.
That way I could serialise an object and push it into a row's Bytes column whilst also filling in the ID and Name columns. I could quickly search for objects because of the Name / ID columns. If I then wanted access to the other fields in the object stored in the database, I'd have to retrieve the record from the database and deserialise the Bytes column.
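As a rough Java sketch of that (Person is a hypothetical GPB-generated message, the Objects table with ID / Name / Bytes columns is invented for the example, and connection is an already-open JDBC connection):

// Hypothetical GPB-generated message: only ID and Name get their own columns.
Person person = Person.newBuilder()
        .setId(42)
        .setName("Alice")
        .build();

try (PreparedStatement stmt = connection.prepareStatement(
        "INSERT INTO Objects (ID, Name, Bytes) VALUES (?, ?, ?)")) {
    stmt.setLong(1, person.getId());
    stmt.setString(2, person.getName());
    stmt.setBytes(3, person.toByteArray());   // the whole serialised object
    stmt.executeUpdate();
}

Getting the object back out is the mirror image: select on ID or Name, read the Bytes column, and Person.parseFrom(bytes) rebuilds the full object.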
This way one is essentially taking a bet that those key columns / field names (ID, Name) won't ever be changed in the schema during development. But it's quite likely a safe bet. Generally, one can settle things like that quite easily, early on in a project; it's the rest of the schema that tends to change during development.
One small payoff is that if the reason to hunt out an object in the database is to be able to send it through a communications channel, it is already serialised in the database. No need to serialise it again before dispatch down the comms link.
So this approach can leave one with some duplication of code / points of truth, but can be quite performant in avoiding a serialisation step during parts of runtime.
You can also cheat a little. If the serialisation wireformat is text based (JSON, XML, some ASN.1 formats, etc), then there's a good chance that string searches on the bytes column will yield good results anyway. For instance, suppose a message field was MiddleName, but I'd not created that as a distinct table column in the database. I could find likely records for any given MiddleName by searching for the value in the Bytes column, as it's stored as text somewhere in there.
Reflection Based Approach?
Another potential approach is to accept that the tooling does not exist to satisfy all needs, and to adapt, using language features (reflection), to exploit a common feature of code generators.
For example, consider GPB's proto compiler. In the generated code you end up with classes whose members are named after the fields in messages. And it'll be more or less the same with any code generated to access a database table that has columns by the same name.
So it is possible to use reflection to make an auto-transcriber between generated classes. You iterate down the tree of members in one class, and you can match that up to a member in a different generated class.
This avoids the need for code like:
Protobuf::MyClass myObj_g; // An object built using GPB
JSON::MyClass myObj_j; // equivalent object to be copied from myObj_g;
myObj_j.Field1 = myObj_g.Field1;
myObj_j.Field2 = myObj_g.Field2;
// ... and so on, for every field
Instead:
Protobuf::MyClass myObj_g; // An object built using GPB
JSON::MyClass myObj_j; // equivalent object to be copied from myObj_g;
foreach (Protobuf::MyClass::Reflection::Field field in Protobuf::MyClass.Fields)
{
myObj_j.Reflection.FindByName(field.Name) = myObj_g.Reflection.FindByName(field.Name);
}
There'd be a fair bit of fiddling around to do to get this to work between each database and serialisation technology, per language, but the point is you'd only ever have to write it once. Any subsequent schema changes do not require code changes, at least not as far as exchanging objects between a serialisation technology and a database access technology is concerned.
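For instance, a rough Java sketch of such a transcriber, copying every readable property of one object into the same-named property of another. It's purely illustrative: it assumes conventional getX() / setX(...) pairs on both generated classes (with GPB's Java classes you'd be driving the generated Builder, or its Descriptors API, instead):

import java.lang.reflect.Method;

public class Transcriber {
    // Copy every getX() value from source into the matching setX(...) on target.
    public static void copyMatchingProperties(Object source, Object target) throws Exception {
        for (Method getter : source.getClass().getMethods()) {
            if (getter.getName().startsWith("get")
                    && getter.getParameterCount() == 0
                    && !getter.getName().equals("getClass")) {
                String setterName = "set" + getter.getName().substring(3);
                try {
                    Method setter = target.getClass().getMethod(setterName, getter.getReturnType());
                    setter.invoke(target, getter.invoke(source));
                } catch (NoSuchMethodException ignored) {
                    // No matching member on the target; skip it.
                }
            }
        }
    }
}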
Obviously, reflection is easier (or indeed possible) in some languages and not others.
The Fix It At Runtime Approach?
Apache Avro has the characteristic that serialised data describes its own shape. Basically, the wireformat data comes with its own schema, so a consumer can build a representation of the data automatically. In some languages that's horrid (C, C++), but libraries exist.
Basically, it forces you to write applications so that they work out what to do with the data for themselves.
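In Java, for example, reading Avro container files generically looks roughly like this (the file name and field name are invented; the schema travels with the data, so no generated classes are involved):

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

try (DataFileReader<GenericRecord> reader =
        new DataFileReader<GenericRecord>(new File("objects.avro"), new GenericDatumReader<>())) {
    for (GenericRecord record : reader) {
        // The field names come from the schema embedded in the file, not from compiled code.
        System.out.println(record.getSchema().getFullName() + ": " + record.get("Name"));
    }
}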
I was wondering if it's possible to get generated classes, the way a LINQ to SQL dbml generates classes for every table, without using the actual dbml. My situation is as follows:
I have an API which gets all the data from the database and returns it as JSON.
On the other hand I have an application retrieving that data, but it needs to be cast to the right classes again to work with it properly. Of course I could write all those classes myself with the needed properties, but it would be handy if they could be generated. The application does not need any connection to the database itself; this is why I want to protect the database by not using a dbml (with a dbml, a connection is possible). So I just need those classes to cast the JSON to.
Can I do this or is it impossible?
You can use T4 to generate the classes if that's what you need.
Oleg Sych has a good series of tutorials for this.
Hope it helps!
For application developers, I suppose the traditional paradigm for writing an application with domain objects that can be persisted to an underlying data store (a SQL database for argument's sake) is to write the domain objects and then write (or generate) the table structure. There is a tight coupling between what the domain object looks like and the structure of the underlying data store. So if you want to add a piece of information to your domain object, you add the field to your code and then add a column to the appropriate database table. All familiar?
This is all well and good for data stores that have a well defined structure (I'm mainly talking about SQL databases whereby the tables and columns are pre-defined and fixed), but now a number of alternatives to the ubiquitous SQL database exist and these often do not constrain the data in this way. For instance, MongoDB is a NoSQL database whereby you divide data into collections but aside from that there is no structuring of the data. You don't define new columns when you want to add a new field.
Now to the question: given the flexibility of a data store like MongoDB, how would one go about achieving a similar kind of flexibility in the domain objects that represent this data? For instance, if I'm using Spring and creating my own domain objects, when I add a "middleName" field to my data, how can I avoid having to add a "middleName" field to my domain object? I'm looking for some kind of mechanism/approach/framework to dynamically inspect the data and have access to it in my domain object without having to make a code change every time. All ideas welcome.
I think you have a couple of choices:
You can use a dynamic programming language and not have domain objects (Clojure, for example)
If you're fixed on using Java, the mongo Java driver returns data in a DBObject, which is essentially a Map, so the default behaviour already provides what you want (see the sketch at the end of this answer). It's only when you map the DBObject into domain objects, using a library like morphia (or spring-data), that you even have to worry about domain objects at all.
But if I were using Java, I would stick with the standard convention of domain objects mapped via morphia, because I think adding a field is a very minor inconvenience when compared against the benefits.
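For example, with the plain Java driver (the collection name, field name and the db / userId variables are just placeholders for this sketch):

DBCollection users = db.getCollection("users");
DBObject user = users.findOne(new BasicDBObject("_id", userId));

// Works whether or not "middleName" existed when this code was written.
Object middleName = user.get("middleName");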
I think the question is inherently paradoxical:
On one hand, you want to have domain objects, i.e. objects that represent the data (and behaviour) of your problem domain.
On the other hand, you say that you don't want your domain objects to be explicitly influenced by changes to the data.
But when you have objects that represent your problem domain, you want to do just that: represent your problem domain.
So if, for example, a middle name is added, then your representation of the real-life 'User' entity should change to accommodate this change to the real-life user; perhaps not only by adding this piece of data to your object, but also by adding some related behaviour (validation of the middle name, or some functionality related to it).
In essence, what I'm trying to say here is that when you have (classic OO) domain objects, you may need to change your behaviour / functionality along with your data, and since you don't have any automatic way of changing your behaviour, the question of automatically changing your data becomes irrelevant.
If you don't want behaviour associated with your data, then you essentially have DTOs, and @Kevin's answer is what you're looking for.
Honestly, it sounds more like you're looking for some kind of blackbox DTO where, like you describe, fields are added or removed "arbitrarily" depending on the data. This makes me inclined to suggest a simple Map to do the job. You can't really have a domain-driven design if your domain model is constantly changing.
I am building a data access layer to a DB. What data structure is recommended for passing and returning collections?
I use a list of data access objects mapped to the db tables.
I'm not sure what language you're using, but in general, there are tradeoffs of simplicity vs extensibility.
If you return the DataSet directly, you have now coupled yourself to database specific classes. This leaves little room for extension - what if you allow access to files or to other types of data sources? But, it is also very simple. This is the recordset pattern and C#/VB provide a lot of built-in support for this. The GUI layer can access the recordset and easily manipulate the data. This works well for simple applications.
On the other hand, you can wrap the datasets in a custom object and provide gateway methods (see the Gateway pattern, http://martinfowler.com/eaaCatalog/gateway.html). This method is more complex, but provides a lot more extensibility. In a larger application, when you need to separate the business logic, data logic, and GUI logic, this is a more robust way to go.
For larger enterprise applications, you can look into using Object Relational Mapping tools (ORMs). They help to automatically map Java objects to database tables, and they hide a lot of the painful SQL details. Frameworks such as Spring provide excellent support for ORMs.
I tend to use arrays of objects, so that I can disconnect the DAO from the business logic.
You can store the data in the DAO as a dataset, for example, and give callers an easy way to stage additions and modifications, so that when they want to commit the changes they can do it in one shot.
I prefer that the user can't add/modify the structure themselves, as it makes it harder to determine what must be changed in the database.
By initially returning an array they can then display what is in the database.
Then, as the presentation layer makes changes, the DAO can be updated by the controller. By having a loose coupling the entire system becomes more flexible, as you can change the DAO from a dataset to something else, and the rest of the application doesn't care.
There are two choices that are the most generic.
The first way to look at a ResultSet is as a List of Maps, where each Map represents a row in the ResultSet. The keys are the columns listed in the SELECT clause; the values are the database values.
The second way to look at a ResultSet is as a Map of Lists, where each List represents a column in the ResultSet. The Map keys are the columns listed in the SELECT clause; the values are the Lists of database values.
If you don't want to do full-blown ORM, these can carry you a long way.
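A minimal JDBC sketch of the first shape, the List of Maps (the query and the connection variable are placeholders for the example):

List<Map<String, Object>> rows = new ArrayList<>();
try (Statement stmt = connection.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT id, name FROM resources")) {
    ResultSetMetaData meta = rs.getMetaData();
    while (rs.next()) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            row.put(meta.getColumnLabel(i), rs.getObject(i));   // column label -> value
        }
        rows.add(row);
    }
}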
I have a HashMap that I am serializing and deserializing to an Oracle db, in a BLOB data type field.
I want to perform a query, using this field.
For example, the application will make a new HashMap with some key-value pairs.
I want to query the db to see if a HashMap with this data already exists in the db.
I do not know how to do this; it seems strange if I have to go to every record in the DB, deserialize it, then compare. Does SQL handle comparing BLOBs, so I could have ... select * from PROCESSES where foo = ? ... where foo is a BLOB column and the ? is an instance of the new HashMap?
Thanks
Here's an article for you to read: Pounding a Nail: Old Shoe or Glass Bottle
I haven't heard much about your application's underlying architecture, but I can tell you immediately that there is never a reason why you should need to use a HashMap in this way. It's a bad technique, plain and simple.
The answer to your question is not a clever Oracle query, it's a redesign of your application's architecture.
For a start, you should not serialize a HashMap to a database (more generally, you shouldn't serialize anything that you need to query against). It's much easier to create a table to represent hashmaps in your application as follows:
HashMaps
--------
MapID (pk int)
Key (pk varchar)
Value
Once you have the content of your hashmaps in your database, it's trivial to query the database to see if the data already exists, or to produce any other kind of aggregate data:
SELECT Count(*) FROM HashMaps where MapID = ? AND Key = ?
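On the application side, filling that table from a HashMap is just a batch insert; a rough JDBC sketch (connection, map and mapId are placeholders, column names as in the table above):

try (PreparedStatement stmt = connection.prepareStatement(
        "INSERT INTO HashMaps (MapID, Key, Value) VALUES (?, ?, ?)")) {
    for (Map.Entry<String, String> entry : map.entrySet()) {
        stmt.setInt(1, mapId);                 // one MapID groups the entries of one map
        stmt.setString(2, entry.getKey());
        stmt.setString(3, entry.getValue());
        stmt.addBatch();
    }
    stmt.executeBatch();
}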
Storing serialized objects in a database is almost always a bad idea, unless you know ahead of time that you don't need to query against them.
How are you serializing the HashMap? There are lots of ways to serialize data and an object like a HashMap. Comparing two maps, especially in serialized form, is not trivial, unless your serialization technique guarantees that two equivalent maps always serialize the same way.
One way you can get around this mess is to use XML serialization for some objects that rarely need to be queried. For example, where I work we have a log table where a certain log message is stored as XML in a CLOB field. The XML data represents a serialized Java object. Normally we query against other columns in the record, and only read/write the CLOB in single atomic steps. However, once or twice it was necessary to do some deep inspection of the CLOB, and using XML allowed this to happen (Oracle supports querying XML in varchar2 or CLOB fields as well as native XML objects). It's a useful technique if used sparingly.
Look into dbms_crypto.hash to make a hash of your blob. Store the hash alongside the blob and it will give you something to narrow down the search to something manageable. I'm not recommending storing the hash map, but this is a general technique for searching for an exact match between blobs.
See also SQL - How do you compare a CLOB
I cannot disagree, but I'm being told to do so.
I appreciate your solution, and that's sort of what I had previously.
Thanks
I haven't had the need to compare BLOBs, but it appears that it's supported through the dbms_lob package.
See dbms_lob.compare() at http://www.psoug.org/reference/dbms_lob.html
Oracle can have new data types defined with Java (or .NET on Windows); you could define a data type for your serialized object and define how queries work on it.
Good luck if you try this...
If you serialize your data to XML and store it in an XML column, you can then use XPath expressions within your SQL query. (Sorry, I am more of a SQL Server person; I don't know the details of how to do this in Oracle.)
If you EVER need to update only part of the serialized data, don't do this.
Likewise, if any of the data is pointed to by other data, or points to other data, don't do this.