We have a need coming up in an application where the following is true:
A web page uses AJAX to request data from a server.
The specification of the data (e.g. table name) requested from the server will not be known until run-time.
The configuration of the data view is itself data-driven, and configurable by an administrator.
Data updates and inserts must be supported, not just views.
Prototyping this was very easy - we could pass in the appropriate information (table name, changeset, whatever) to a generic data service that just did what it was told (using JSON as the data storage mechanism). The data service could do basic validation on the parameters to ensure the current user can perform the requested operation (read the data, insert a row, read the row).
The issue now is that we are looking at doing this in a secure, production-ready manner, and the idea of passing table names and column names is frightening. Everything we think of to deal with this devolves into trusting the client in some significant way, or seems to involve substantial bookkeeping on the server. For example:
User requests a viewing page.
The server notes the table name and saves it server-side with a request ID.
The server notes the column names and saves them, replacing them with "col1, col2", etc., and stores the mapping with the request ID data.
The client page sends the request ID to the service, which looks up the server storage by ID.
The service returns col1, col2, etc.
This would work, we think, but feels very messy.
Does anyone have experience with this kind of problem and can offer a solution?
Do you need to give them access to raw tables?
Perhaps you can go meta, and make a meta-table that stores the tabular data in a secure manner (i.e. only the system knows the real table/schema, while the user's concepts of schema/table are just abstractions that all map back to the same schema/table)...
Again, more information is needed as to what can be abstracted. Allowing DDL operations by end users is asking for trouble, as you rightly assessed, and I would just abstract that so that "DDL" becomes DML.
However, mapping actual SQL that is written against this data would be much more difficult to abstract, if that is a requirement.
If I had to expose back-end information to end customers, I'd probably hide the actual physical representation behind metadata that remaps table names and columns to more user-friendly text. That would also let me provide views on the tables that are a bit more advanced than plain table / column names, such as properly modeled associations between tables.
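A minimal sketch of that remapping idea, in Python with SQLite purely for illustration. The registry contents, the `build_select` helper and the table and column names are all hypothetical; in the scenario from the question the registry would itself be admin-configurable data, and a permission check for the current user would sit in front of it.

```python
import sqlite3

# Hypothetical registry: public alias -> physical table and column names.
# The client only ever sees the aliases on the left-hand side.
REGISTRY = {
    "staff": {
        "table": "hr_emp_tbl",
        "columns": {"name": "emp_nm", "dept": "dept_cd"},
    },
}


def build_select(alias, requested_columns):
    """Translate client-facing names into SQL using only registry values."""
    entry = REGISTRY.get(alias)
    if entry is None:
        raise PermissionError(f"unknown data set: {alias!r}")
    unknown = [c for c in requested_columns if c not in entry["columns"]]
    if unknown:
        raise PermissionError(f"unknown columns: {unknown}")
    # Physical identifiers come from the registry, never from the request.
    # The aliases are safe to echo back because they matched registry keys.
    select_list = ", ".join(
        f'"{entry["columns"][c]}" AS "{c}"' for c in requested_columns
    )
    return f'SELECT {select_list} FROM "{entry["table"]}"'


# Demo with a throwaway in-memory table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr_emp_tbl (emp_nm TEXT, dept_cd TEXT, salary INTEGER)")
conn.execute("INSERT INTO hr_emp_tbl VALUES ('Ada', 'ENG', 100)")
rows = conn.execute(build_select("staff", ["name", "dept"])).fetchall()
print(rows)
```

A request for a column that is not in the registry (here `salary`) is rejected before any SQL is built, so the client never gets to name a physical table or column directly.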
We have a database that manages codes, such as a list of valid currencies, a list of country codes, etc. (hereinafter known as CodesDB).
We also have multiple microservices that in a monolithic app + database would have foreign key constraints to rows in tables in the CodesDB.
When a microservice receives a request to modify data, what are my options for ensuring the codes passed in the request are valid?
I am currently leaning towards having the CodesDB microservice post an event onto a service bus announcing when a code is added or modified. Each other microservice interested in that type of code (country / currency / etc.) can then issue an API request to the CodesDB microservice to grab the state it needs and reflect the changes in its own local DB. That way we get referential integrity within each microservice DB.
Is this the correct approach? Are there any other recommended approaches?
Asynchronous event-based notification is a pattern commonly used in the microservices world for ensuring eventual consistency. Depending on how strict your consistency requirements are, you may have to add further checks.
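To make the flow concrete, here is a rough single-process sketch in Python. `CodesService`, `OrderService` and the in-memory subscriber list are hypothetical stand-ins for the CodesDB microservice, a consuming microservice and the service bus; a real bus would deliver events asynchronously, which is where the eventual-consistency window comes from.

```python
import sqlite3


class CodesService:
    """Stand-in for the CodesDB microservice (in-memory, hypothetical)."""

    def __init__(self):
        self._codes = {}
        self._subscribers = []  # stand-in for the service bus

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def get_code(self, code_type, code):
        return self._codes[(code_type, code)]

    def upsert_code(self, code_type, code, label):
        self._codes[(code_type, code)] = {"code": code, "label": label}
        # The event carries identifiers only; consumers fetch current state.
        for handler in self._subscribers:
            handler({"type": code_type, "code": code})


class OrderService:
    """Consumer that keeps a local copy of the currency codes it needs."""

    def __init__(self, codes_service):
        self._codes_service = codes_service
        self.db = sqlite3.connect(":memory:")
        self.db.execute("PRAGMA foreign_keys = ON")
        self.db.execute("CREATE TABLE currency (code TEXT PRIMARY KEY, label TEXT)")
        self.db.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, "
            "currency TEXT NOT NULL REFERENCES currency(code))"
        )
        codes_service.subscribe(self.on_code_changed)

    def on_code_changed(self, event):
        if event["type"] != "currency":
            return  # not a code type this service cares about
        state = self._codes_service.get_code(event["type"], event["code"])
        updated = self.db.execute(
            "UPDATE currency SET label = ? WHERE code = ?",
            (state["label"], state["code"]),
        )
        if updated.rowcount == 0:
            self.db.execute(
                "INSERT INTO currency (code, label) VALUES (?, ?)",
                (state["code"], state["label"]),
            )

    def create_order(self, amount, currency):
        try:
            self.db.execute(
                "INSERT INTO orders (amount, currency) VALUES (?, ?)",
                (amount, currency),
            )
            return True
        except sqlite3.IntegrityError:
            return False  # the local foreign key rejected an unknown code


codes = CodesService()
orders = OrderService(codes)
codes.upsert_code("currency", "EUR", "Euro")
accepted = orders.create_order(10.0, "EUR")
rejected = orders.create_order(10.0, "XXX")
print(accepted, rejected)
```

The local foreign key gives each service its own referential integrity, but a code added in CodesDB is unknown locally until its event has been processed, so requests arriving in that window would be rejected unless you add a fallback check.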
Other possible approaches:
Read-only data stores using materialized views. This is a form of the CQRS pattern where data from multiple services is stored in de-normalized form in a read-only data store. The data is updated asynchronously using the approach mentioned above. Consumers get fast access to the data without having to query multiple services.
Caching - You could also use a distributed or replicated cache, depending on your performance and consistency requirements.
What are the ways in which data can be encrypted? Take a salary column, for example: if possible, even the admin should not be able to see the encrypted column. The data should be visible only through the application, to users whose access is defined in the application. Changes to the application (adding new functionality to encrypt/decrypt at the application level) would be a last resort and should be minimal.
So far I have thought of two ways. Any fresh ideas, or pros and cons of the ones below, would be much appreciated:
1. Using Oracle TDE (transparent data encryption).
- Con : Admin can possibly grant himself rights to see the data
2. Creating a trigger to encrypt before insert and something along the lines of a pipeline to retrieve.
Oracle Database Vault is the only way to prevent a DBA from being able to access data stored in the database. That is an extra cost product, however, and it requires you to have an additional set of security admins whose job it is to grant the DBAs whatever privileges they actually need.
Barring that, you'd be looking at solutions that encrypt and decrypt the data in the application outside the database. That would involve making changes to the database structure (i.e. the salary column would be declared as a raw rather than a number). And it involves application changes to call the encryption and decryption routines. And that requires that you solve the key management problem which is generally where these sorts of solutions fail. Storing the encryption key somewhere that the application can retrieve it but somewhere that no admin can access is generally non-trivial. And then you need to ensure that the key is backed up and restored separately since the encrypted data in the database is useless without the key.
Most of the time, though, I'd tend to suggest that the right approach is to allow the DBA to see the data and audit the queries they run instead. If you see that one particular DBA is running queries for fun rather than occasionally looking at bits of data in the course of doing her job, you can take action at that point. Knowing that their queries are being audited is generally enough to keep the DBA from accessing data that she doesn't really need.
So I was thinking... Imagine you have to write a program that would represent a schedule of a whole college.
That schedule has several dimensions (e.g.):
time
location
individual(s) attending it
lecturer(s)
subject
You would have to be able to display the schedule from several standpoints:
everything held in one location in certain timeframe
everything attended by individual in certain timeframe
everything lectured by a certain lecturer in certain timeframe
etc.
How would you save such data, and yet keep the ability to view it from different angles?
Only way I could think of was to save it in every form you might need it:
E.g. you have a folder "students", and in it each student has a file that contains when, why and where he has to be. However, you also have a folder "locations", and each location has a file that contains who has to be there, why and when. The more angles you have, the more the size-per-info ratio increases.
But that seems highly inefficient, space-wise.
Is there any other way?
My knowledge of Javascript is 0, but I wonder if such things would be possible with it, even in this space inefficient form.
If not that, I wonder if it would work in any other standard (C++, C#, Java, etc.) language, primarily in Java...
EDIT: Could this be done by using MySQL database?
Basically, you are trying to first store data and then present it under different views.
SQL databases were made exactly for that. On one side, you build a schema and instantiate it in a database to store your data (the language for this is called Data Definition Language, DDL); on the other, you run requests against it with the query language (SQL), which is what you call "views". There are even "view" objects in SQL databases that let you build these views inside the database (rather than keeping the code of the request in the user code).
MySQL can certainly do that. Note that it is also possible to compile some SQL engines to JavaScript (SQLite, for example) and use local web storage to store the data.
There is another aspect to your question: optimization of the queries. While SQL can do most of the request work for your views, it is sometimes preferable to store actual copies of the request results in so-called "datamarts" (this is called de-normalizing a request), so that the hard work of selecting or computing aggregates, groups and so on is done once per period of time (imagine that a specific view changes only on Monday); requesters then just have to read these results. In this case it is important to separate, at least semantically, what is primary data from what is secondary data (and for performance or user-rights reasons, physical separation is often a good idea).
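As a small illustration of the datamart idea, here is a sketch in Python with SQLite. The `session` table, the `mart_sessions_per_location` summary table and the `refresh_mart` function are made-up names; in practice the refresh would run on a schedule rather than on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Primary data (hypothetical, heavily simplified).
CREATE TABLE session (id INTEGER PRIMARY KEY, location TEXT, day TEXT);
INSERT INTO session (location, day) VALUES
    ('Room A', 'Mon'), ('Room A', 'Mon'), ('Room B', 'Tue');
-- Secondary data: a precomputed copy of an aggregate request.
CREATE TABLE mart_sessions_per_location (
    location TEXT PRIMARY KEY,
    session_count INTEGER
);
""")


def refresh_mart():
    """Recompute the summary table from the primary data in one transaction."""
    with conn:
        conn.execute("DELETE FROM mart_sessions_per_location")
        conn.execute(
            "INSERT INTO mart_sessions_per_location "
            "SELECT location, COUNT(*) FROM session GROUP BY location"
        )


def read_mart():
    return dict(conn.execute(
        "SELECT location, session_count FROM mart_sessions_per_location"
    ).fetchall())


refresh_mart()
before = read_mart()

# New primary data is not visible in the mart until the next refresh.
conn.execute("INSERT INTO session (location, day) VALUES ('Room B', 'Wed')")
stale = read_mart()
refresh_mart()
after = read_mart()
print(before, stale, after)
```

Readers of the mart only pay for a plain lookup, at the cost of seeing data that is as old as the last refresh.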
Note that since you mentioned MySQL, I wrote about SQL, but almost any database technology (hierarchical, object-oriented, XML...) could do what you are looking for, as long as the particular implementation you use is flexible enough for your data and requests.
So in short:
I would use a SQL database to store the data
make appropriate views / requests
if I need huge request performance, make appropriate de-normalized data available
the language is not important there, any will do
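To show what "store once, view from several angles" can look like, here is a small sketch using SQLite from Python. The table layout, sample rows and function names are invented for illustration, and it simplifies by allowing only one lecturer per session; several lecturers would need a link table like the one used for attendance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (
    id INTEGER PRIMARY KEY,
    subject TEXT NOT NULL,
    location TEXT NOT NULL,
    lecturer TEXT NOT NULL,
    starts_at TEXT NOT NULL  -- ISO 8601 text sorts chronologically
);
CREATE TABLE attendance (
    session_id INTEGER NOT NULL REFERENCES session(id),
    student TEXT NOT NULL,
    PRIMARY KEY (session_id, student)
);
INSERT INTO session VALUES
    (1, 'Algebra', 'Room A', 'Dr. Lee', '2024-03-04T09:00'),
    (2, 'Physics', 'Room A', 'Dr. Kim', '2024-03-04T11:00'),
    (3, 'Algebra', 'Room B', 'Dr. Lee', '2024-03-05T09:00');
INSERT INTO attendance VALUES (1, 'ana'), (2, 'ana'), (3, 'ben');
""")


def by_location(location, start, end):
    """Everything held in one location within [start, end)."""
    return conn.execute(
        "SELECT subject, starts_at FROM session "
        "WHERE location = ? AND starts_at >= ? AND starts_at < ? "
        "ORDER BY starts_at",
        (location, start, end),
    ).fetchall()


def by_student(student, start, end):
    """Everything attended by one individual within [start, end)."""
    return conn.execute(
        "SELECT s.subject, s.location, s.starts_at FROM session s "
        "JOIN attendance a ON a.session_id = s.id "
        "WHERE a.student = ? AND s.starts_at >= ? AND s.starts_at < ? "
        "ORDER BY s.starts_at",
        (student, start, end),
    ).fetchall()


def by_lecturer(lecturer, start, end):
    """Everything taught by one lecturer within [start, end)."""
    return conn.execute(
        "SELECT subject, location, starts_at FROM session "
        "WHERE lecturer = ? AND starts_at >= ? AND starts_at < ? "
        "ORDER BY starts_at",
        (lecturer, start, end),
    ).fetchall()


room_a = by_location("Room A", "2024-03-04", "2024-03-05")
ana = by_student("ana", "2024-03-04", "2024-03-06")
lee = by_lecturer("Dr. Lee", "2024-03-04", "2024-03-06")
print(room_a, ana, lee, sep="\n")
```

Each session is stored exactly once; the different "angles" are just different WHERE clauses and joins over the same rows, so adding a new angle does not multiply the stored data.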
My Database Schema :
table : Terminology (ID (PK), Name, Comments)
table : Content (ID (PK), TerminologyID (FK), Data, LanguageID)
1 - many relationship between Terminology and Content. One Terminology can have any number of content based on different language ID.
Terminology and Contents table may have millions of records.
Now, even though I fetch only a few hundred records at a time (pagination) from my client side using WCF Data Services, after 5-6 attempts I get a timeout exception.
_DataService.Terminologies.Expand("Contents").Skip(index1).Take(count).ToList();
If I don't expand my Contents, query works fine :), but I will not have Content Data.
What is the best way to handle this scenario?
Options...
Is there any performance improvement if I use Include on the server side (I mean, writing a custom WebGet method) over Expand on the client side?
Creating database Views and accessing it over client side.
Creating Stored Procedure, where I can pass my preferred LanguageID and call it from client side.
Is this an ADO.NET Data Services endpoint built by the default wizard?
In any case if your client can access database directly, it will be a lot faster, so if direct db option is available then take it.
If WCF is the only option, then you will have to create your own implementation of a paging web service, perhaps even with a stored procedure that returns multiple recordsets.
On a side note I do not see LanguageId in your service query, and that could slow things down a lot.
_DataService.Terminologies.Expand("Contents").Skip(index1).Take(count).ToList();
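To illustrate the shape of a paged query that also filters by language, here is a rough sketch in Python against SQLite rather than WCF Data Services. The tables follow the schema in the question, but the index name, the sample data and the `fetch_page` helper are assumptions; the same SQL idea could sit behind a stored procedure or a custom service method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Terminology (ID INTEGER PRIMARY KEY, Name TEXT, Comments TEXT);
CREATE TABLE Content (
    ID INTEGER PRIMARY KEY,
    TerminologyID INTEGER REFERENCES Terminology(ID),
    Data TEXT,
    LanguageID INTEGER
);
-- Assumed index so the join does not have to scan all Content rows.
CREATE INDEX IX_Content_Term_Lang ON Content (TerminologyID, LanguageID);
""")

# Made-up sample data: five terms, two languages each.
for term_id in range(1, 6):
    conn.execute(
        "INSERT INTO Terminology VALUES (?, ?, NULL)",
        (term_id, f"term{term_id}"),
    )
    for language_id in (1, 2):
        if (term_id, language_id) == (3, 2):
            continue  # term 3 deliberately has no content in language 2
        conn.execute(
            "INSERT INTO Content (TerminologyID, Data, LanguageID) "
            "VALUES (?, ?, ?)",
            (term_id, f"t{term_id}-lang{language_id}", language_id),
        )


def fetch_page(skip, take, language_id):
    """Page over Terminology first, then attach content in one language only."""
    return conn.execute(
        """
        SELECT t.ID, t.Name, c.Data
        FROM (SELECT ID, Name FROM Terminology
              ORDER BY ID LIMIT ? OFFSET ?) AS t
        LEFT JOIN Content AS c
               ON c.TerminologyID = t.ID AND c.LanguageID = ?
        ORDER BY t.ID
        """,
        (take, skip, language_id),
    ).fetchall()


page = fetch_page(skip=2, take=2, language_id=2)
print(page)
```

Paging the parent table before joining keeps the page size stable, and filtering on LanguageID in the join means only the content for the requested language is read and sent back.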
I am working on an MVC3 and Razor website. The user has to select their way through a few choices before finally working on the data.
For example:
Client List -> Version List (Filtered by client) -> Etc (Filtered by version)
Once a user selects a client, they select a version for the client. So I'm passing the client id on the querystring. For each mode of the controller of version I'm passing around the client id. On views that I want to show the client name, I'm querying the database for the client and stuffing it into the ViewBag. This seems very inefficient. I feel like I could use a cookie to hold the client id & name.
Now that I've got my version controller done, I'm facing the same pattern again with each subsequent controller, but now I need to persist both client and version...
What is a preferred approach for persisting information like this across requests?
This seems very inefficient
That's what databases are made and optimized for: querying data based on fields. If you put indexes on those fields, it will be screamingly fast. Of course Session, Cookies and Cache are common techniques that you could employ to limit the number of queries to the database, but you will have to accept the possible staleness of the data you get this way (if some other thread or process has modified the data in the database, you no longer get correct results).
So before doing any premature optimization, here's what I would recommend: hammer your database until you discover that it is actually a bottleneck for your application. Databases might become a bottleneck in some very high-traffic applications, where you should resort to one of the aforementioned techniques (or in some poorly written applications, of course, but let's exclude this possibility for the moment).
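If measurements do eventually point at the database, the staleness trade-off can at least be bounded. Below is a language-neutral sketch in Python of a time-limited cache; `TtlCache`, `load_client_name` and the fake data are hypothetical, and in ASP.NET MVC you would reach for the framework's own caching or session facilities instead.

```python
import time


class TtlCache:
    """Caches loader results per key for at most ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the demo does not have to sleep
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh enough: no database query
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


# Demo with a fake "database" and a fake clock (made-up data).
fake_db = {1: "Acme"}
queries = []


def load_client_name(client_id):
    queries.append(client_id)  # record each simulated database hit
    return fake_db[client_id]


now = [0.0]
cache = TtlCache(load_client_name, ttl_seconds=30, clock=lambda: now[0])

first = cache.get(1)     # goes to the "database"
fake_db[1] = "Acme Ltd"  # someone else changes the row
stale = cache.get(1)     # served from cache, so the old name comes back
now[0] = 31.0            # the entry has expired
fresh = cache.get(1)     # reloaded from the "database"
print(first, stale, fresh, len(queries))
```

The middle lookup is exactly the staleness described above: the cache saves a query but returns the old name until the entry expires.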
You should use TempData, which allows you to pass data between the current and next HTTP requests. Be sure to keep in mind that it uses the session.
Greg Shackles has a great article all about TempData here
See this similar question: MVC3 multi step form - How to persist model object