What are the ways in which data can be encrypted? Take a salary column, for example: ideally even the admin should not be able to see the encrypted columns. The data should be visible only through the application, to users whose access is defined in the application. Changes to the application (adding new functionality to encrypt/decrypt at the application level) would be a last resort and should be minimal.
So far I have thought of two ways; any fresh ideas, or pros and cons of the ones below, would be much appreciated:
1. Using Oracle TDE (transparent data encryption).
- Con: the admin can possibly grant himself rights to see the data
2. Creating a trigger to encrypt before insert, and something along the lines of a pipelined function to retrieve the data.
Oracle Database Vault is the only way to prevent a DBA from being able to access data stored in the database. That is an extra cost product, however, and it requires you to have an additional set of security admins whose job it is to grant the DBAs whatever privileges they actually need.
Barring that, you'd be looking at solutions that encrypt and decrypt the data in the application, outside the database. That would involve changes to the database structure (e.g. the salary column would be declared as a RAW rather than a NUMBER), and it involves application changes to call the encryption and decryption routines. And it requires that you solve the key management problem, which is generally where these sorts of solutions fail: storing the encryption key somewhere the application can retrieve it but no admin can access is generally non-trivial. You also need to ensure that the key is backed up and restored separately, since the encrypted data in the database is useless without it.
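For what it's worth, a minimal Java sketch of that application-level approach, assuming an AES key fetched from some external key store (the key loading is deliberately left out, since that is exactly the hard part) and a salary column declared as RAW:

import java.nio.ByteBuffer;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class SalaryCrypto {
    private static final int GCM_TAG_BITS = 128;
    private static final int IV_BYTES = 12;

    private final SecretKey key;  // obtained from an external key store,
                                  // never stored alongside the database
    private final SecureRandom random = new SecureRandom();

    public SalaryCrypto(SecretKey key) {
        this.key = key;
    }

    // Encrypts a salary (simplified to a long); the result is what you
    // would bind to the RAW column with PreparedStatement.setBytes().
    public byte[] encrypt(long salary) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        random.nextBytes(iv);  // fresh IV for every value
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(ByteBuffer.allocate(8).putLong(salary).array());
        // Prepend the IV so decrypt() can recover it later.
        return ByteBuffer.allocate(IV_BYTES + ciphertext.length)
                         .put(iv).put(ciphertext).array();
    }

    // Reverses encrypt(): reads the IV prefix, then decrypts the rest.
    public long decrypt(byte[] stored) throws Exception {
        ByteBuffer buf = ByteBuffer.wrap(stored);
        byte[] iv = new byte[IV_BYTES];
        buf.get(iv);
        byte[] ciphertext = new byte[buf.remaining()];
        buf.get(ciphertext);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(GCM_TAG_BITS, iv));
        return ByteBuffer.wrap(cipher.doFinal(ciphertext)).getLong();
    }
}

The key never touches the database; as noted above, it has to be backed up and restored separately or the RAW column is just noise.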
Most of the time, though, I'd tend to suggest that the right approach is to allow the DBA to see the data and audit the queries they run instead. If you see that one particular DBA is running queries for fun rather than occasionally looking at bits of data in the course of doing her job, you can take action at that point. Knowing that their queries are being audited is generally enough to keep a DBA from accessing data they don't really need.
I am giving a presentation about cryptography. My teacher told me to include the advantages and disadvantages of TDE encryption, and especially why you should use it instead of encrypting in C#, for example. I couldn't find the real advantages of database encryption over encryption in a program.
Oracle Transparent Data Encryption specifically protects data at rest, when written into a datafile. It would not stop a database user with select privileges from seeing the data using SQL, and it allows the data to be used in all types of SQL constructs like joins and indexes.
Encrypting data in the application rather than the DB would prevent ad-hoc SQL queries outside of the app from decrypting the data, and would make it impossible to use SQL (in the database or in the app) to search the data, make table joins or indexes, or do anything at all with the encrypted data outside of the hard-coded application. Application-level encryption could also interfere with data compression algorithms in the database or the storage media.
I'm trying to make a database table for every single username. I see that for every username I can add more columns in its row, but I want to attribute a full table to each one. How can I do that?
Thanks,
Eli
First let me say, what you are trying to do sounds like really, really bad database design, and you should rethink your idea of creating a table per user. To get a good answer, you should add far more detail about your reasoning to the question. As far as I know there is also a maximum number of classes you can create on Parse, so sooner or later you will run into problems, either performance-wise or due to technical limitations of the platform.
That being said, you can use the Schema API to programmatically create/delete/update tables of your Parse app. It always requires the master key, so doing this from the client side is not recommended for security reasons. You could put this into a Cloud Code function, for example, and call it from your app/admin tool to create a new table for a user on the fly, or to delete a user's table.
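As an illustration only (the class name and field are invented, and you'd adjust the host for a self-hosted parse-server), calling the REST Schema API from Java might look like this:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateUserTable {
    public static void main(String[] args) throws Exception {
        // Hypothetical per-user class name; again, one table per user
        // is not a design I would recommend.
        String className = "UserTable_Eli";
        URL url = new URL("https://api.parse.com/1/schemas/" + className);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("X-Parse-Application-Id", "YOUR_APP_ID");
        conn.setRequestProperty("X-Parse-Master-Key", "YOUR_MASTER_KEY"); // server side only!
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        String body = "{\"className\":\"" + className + "\","
                    + "\"fields\":{\"note\":{\"type\":\"String\"}}}";
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}

Because the master key is embedded, this belongs in Cloud Code or a trusted server, never in the shipped app.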
Again, better not to do it - think about a better way to design your database; it would be out of scope here to discuss it.
Env: Oracle 11g DB with a Java based application
We are looking to encrypt data in our database, for a few sensitive columns of a table.
We would like these columns to be decrypted and visible to a set of users A.
And we DO NOT want these encrypted columns to be visible to another set of users B.
But, this user set B should be able to see the rest of the non-encrypted columns of the table.
From various articles and posts, I understand TDE does encryption and decryption transparently and at the column level, but I have not been able to find clear information on whether the user/role-based decryption described above, at column-level granularity, is possible.
Can we achieve the above using TDE?
I'm not a DBA, but from my understanding of TDE, the encryption is not noticeable when the data is viewed from any query. It only encrypts the data in the on-disk data file so that it can't be read if dumped directly from the file.
A good DBA may have a better answer but just off the cuff, here is what I would suggest.
Have two fields for the sensitive data. One is clear (though TDE may be a good idea) and the other is obfuscated in some way. These fields may be normalized into a separate table. Don't allow access directly to the table but use a view instead. The view would be defined like:
create view TableName as
select ...,
       -- "ROLE" here is pseudocode for a role check: a view can't test
       -- session roles directly, so in practice you'd compare something
       -- like SYS_CONTEXT('USERENV', 'SESSION_USER') or an application
       -- context value populated at logon.
       case ROLE when 'A' then clear_field else obfuscated_field end as FieldName,
       ...
  from SensitiveTable
  join PossibleNormalizedTable on ... ;
You would also need INSTEAD OF triggers on the view to handle inserts and updates. If only A can see that field in the clear, probably only A should be able to insert and update it.
This is a general design problem - I want to validate a username field for uniqueness when the user enters the value and tabs out. I do an Ajax validation and get a response from the server. This is all very standard. Now, what if I have a HUGE user database? How do I handle this situation? I want to find out whether a username "foozbarz" is present among 150 million usernames.
Database queries are out of the question [EDIT: instead, read the username database once and populate the cache/hash for faster lookups - to clarify Emil Vikström's point]
In-memory databases won't help either
Keep an in-memory hash (or cache/memcache) to store all usernames - usernames can be easily hashed and lookup will be very fast. But there are some problems with this:
a. Size of the hash - can we optimize it to reduce its size?
b. Hash/cache refresh frequencies (users might get added while we are validating)
Shard the username table based on some criteria (e.g.: A-B in table username_1 and so on) - thanks piotrek for this suggestion
Or, any other better approach ?
Why don't you simply partition the data? If you have (or plan to have) 150M+ users, I assume you have (or will have) the budget for this. If you are just starting out (with 2k users), do it the traditional way with a simple indexed search on the database. When you have so many users that you observe performance issues, and you have measured that the database (and not e.g. the web server) is the cause, then you simply add another database. On the first one you keep users with names from a to m, and the rest go on the other one. You may choose another criterion, like a hash, to keep the data balanced. When you need more, you add more databases. But if you don't have that many users right now, I advise against premature optimization; there are many things that may become a bottleneck with this amount of data.
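To make the routing concrete, a hedged Java sketch, assuming two JDBC data sources split the way suggested above (all names invented):

import javax.sql.DataSource;

public class UserShardRouter {
    private final DataSource shardAtoM;  // users whose names start with a..m
    private final DataSource shardNtoZ;  // everyone else

    public UserShardRouter(DataSource shardAtoM, DataSource shardNtoZ) {
        this.shardAtoM = shardAtoM;
        this.shardNtoZ = shardNtoZ;
    }

    // Picks the shard holding a given username; non-letter first
    // characters simply fall through to the second shard here.
    public DataSource shardFor(String username) {
        char first = Character.toLowerCase(username.charAt(0));
        return (first >= 'a' && first <= 'm') ? shardAtoM : shardNtoZ;
    }
}

Swapping the first-letter test for a hash of the whole name gives the balanced variant mentioned above.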
You are most likely right about doing some kind of hashing where you store the taken names; obviously, a name that isn't in the hash is free.
What you shouldn't do is rely on that validation. There can be a lot of time between the user checking whether a name is free and actually pressing Register.
To be fair, you only have one issue here, and that's whether you REALLY need to worry about reaching 150 million users. Scalability is often an issue, but unless this happens overnight, you can probably swap in a better solution before it does.
Secondly, there's your worry about two users both getting a THIS NAME IS FREE and then one of them taking it. First of all, the chances of that happening are pretty damn low. Secondly, the only ways I can think of ‘solving’ this, so that a user never clicks OK on a validated name and gets a USERNAME TAKEN, are to either
a) Remember what each user validated last, store that, and if someone else registers that name in the meantime, use AJAX to change the name field to taken and notify the user. Don't do this. A lot of wasted cycles and really too much effort to implement.
b) Lock usernames as users validate them, for a short period of time. This results in a lot of free usernames showing up as taken when they actually aren't. You probably don't want this either.
The easiest solution is simply to insert the name into the table when the user actually clicks OK, but before doing that, check again whether the name exists. If it does, just send the user back with USERNAME TAKEN. The chances of someone racing someone else for a name are really, really slim, and I doubt anyone will make a big fuss over how your validator (which did its job; the name was free at the point of checking) ‘lied’ to the user.
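In JDBC terms that final check can be the insert itself, letting a UNIQUE constraint arbitrate the race. A sketch (table and column names invented; most drivers surface a unique-constraint violation as the exception subclass caught here):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLIntegrityConstraintViolationException;

public class Registration {
    // Tries to claim a username; returns false if someone raced us to it.
    public boolean register(Connection conn, String username) throws Exception {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO users (username) VALUES (?)")) {
            ps.setString(1, username);
            ps.executeUpdate();
            return true;   // the name was still free, and it is now ours
        } catch (SQLIntegrityConstraintViolationException taken) {
            return false;  // lost the race: show USERNAME TAKEN
        }
    }
}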
Basically your only issue is how you want to store the nicknames.
Your criterion #1 is flawed, because this is exactly what you have a database system for: to store and manage data. Why even have a table with usernames if you're not going to read it?
The first thing to do is improve the database system by adding an index, preferably a HASH index if your database system supports it. You will have a hard time writing anything yourself that comes near this performance.
If this is not enough, you must start scaling your database, for example by building a clustered database or by partitioning the table into multiple sub-tables.
What I think is a fair thing to do is implement caching in front of the database, but for single names. Not all usernames will have a collision attempt, so you may cache the small subset where collisions typically happen. A simple algorithm for checking the collision status of USER (a code sketch follows the steps):
Check if USER exists in your cache. If it does:
- Set a "last checked" timestamp for USER inside the cache
- You are done, and USER is a collision
Check the database for USER. If it exists there:
- Add USER to the cache
- If the cache is full (all X slots are used), remove the least recently used username from the cache (or the Y least recently used usernames, if you want to minimize cache pruning)
- You are done, and USER is a collision
If USER matched neither the cache nor the db, you are done, and USER is NOT a collision.
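A minimal Java sketch of those steps, using a LinkedHashMap in access order as the LRU cache; the capacity and the database lookup are placeholders:

import java.util.LinkedHashMap;
import java.util.Map;

public class CollisionCache {
    private static final int CAPACITY = 100_000;  // the X slots above

    // accessOrder=true keeps least-recently-used entries first, and
    // removeEldestEntry() prunes them automatically past CAPACITY.
    private final Map<String, Boolean> cache =
        new LinkedHashMap<String, Boolean>(CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > CAPACITY;
            }
        };

    // Returns true if USER collides with an existing name.
    public synchronized boolean isCollision(String user) {
        if (cache.containsKey(user)) {
            // A hit also acts as the "last checked" touch: the access
            // moves the entry to the most-recently-used position.
            return true;
        }
        if (userExistsInDb(user)) {  // placeholder for the real query
            cache.put(user, Boolean.TRUE);
            return true;
        }
        return false;  // neither cache nor db: the name is free
    }

    private boolean userExistsInDb(String user) {
        throw new UnsupportedOperationException("query the users table here");
    }
}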
You will of course still need a UNIQUE constraint in your database to avoid race conditions.
If you're going the traditional route you could use an appropriate index to improve the database lookup.
You could also try using something like ElasticSearch which has very low latency lookups on large data sets.
If you have 150M+ users, you will have to have some function in place that:
Checks that the user exists, and signals if not found
Verifies the password is correct, and signals if it is not
Retrieves the user's data
You will have this problem regardless, and will have to solve it - in all likelihood with something akin to a user query. Even if you rely heavily on sessions, you will still have the problem of "finding session X among many in a 150M+ pool", which is structurally identical to "finding user X among many in a 150M+ pool".
Once you solve the bigger problem, the problem you now have is just its step #1.
So I'd check out a scalable database solution (possibly a NoSQL one), and implement the "availability check" using that.
You might end up with a
retrieveUserData(user, password = None)
which returns the user info if the user and password are valid and correct. For the availability check, you would send no password, and expect a UserNotFound exception if the username is available.
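Roughly, in Java (the exception and helper names are invented for illustration):

public class UserService {
    // Thrown when no user matches; for an availability check this is good news.
    public static class UserNotFoundException extends Exception {}

    public UserData retrieveUserData(String user, String password)
            throws UserNotFoundException {
        UserData data = lookup(user);  // hits the scalable user store
        if (data == null) {
            throw new UserNotFoundException();
        }
        // A null password means "availability check only": skip verification.
        if (password != null && !data.passwordMatches(password)) {
            throw new SecurityException("bad credentials");
        }
        return data;
    }

    public boolean isUsernameAvailable(String user) {
        try {
            retrieveUserData(user, null);
            return false;  // the user exists, so the name is taken
        } catch (UserNotFoundException free) {
            return true;
        }
    }

    private UserData lookup(String user) {
        return null;  // placeholder: query the NoSQL store / database here
    }

    public static class UserData {
        boolean passwordMatches(String password) { return false; }  // stub
    }
}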
We have a need coming up in an application where the following is true:
A web page uses AJAX to request data from a server.
The specification of the data (e.g. table name) requested from the server will not be known until run-time.
The configuration of the data view is itself data-driven, and configurable by an administrator.
Data updates and inserts must be supported, not just views.
Prototyping this was very easy - we could pass in the appropriate information (table name, changeset, whatever) to a generic data service that just did what it was told (using JSON as the data storage mechanism). The data service could do basic validation on the parameters to ensure the current user can perform the requested operation (read the data, insert a row, read the row).
The issue we have now that we are looking to do this in a secure production manner is that the idea of passing table names and column names around is frightening. Everything we think of to deal with this devolves into trusting the client in some significant way, or seems to involve substantial bookkeeping on the server. For example:
User requests a viewing page.
The server notes the table name and saves it server side with a request ID
The server notes the column names and saves them, replacing them with "col1, col2", etc., and stores the mapping with the request ID data.
The client page sends the request ID to the service, which looks up the server storage by ID
The service returns col1, col2, etc.
This would work, we think, but feels very messy.
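For what it's worth, a bare-bones Java sketch of that bookkeeping (the names are ours, not any real API): the client only ever sees an opaque request ID, while the real table and column names stay on the server.

import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class RequestRegistry {
    // What the server remembers per request ID; the client never sees it.
    private static final class GridSpec {
        final String table;
        final List<String> columns;
        GridSpec(String table, List<String> columns) {
            this.table = table;
            this.columns = columns;
        }
    }

    private final Map<String, GridSpec> byId = new ConcurrentHashMap<>();

    // Called while building the viewing page; the returned ID goes to the client.
    public String register(String table, List<String> columns) {
        String id = UUID.randomUUID().toString();
        byId.put(id, new GridSpec(table, columns));
        return id;
    }

    // Called by the data service; maps "col1", "col2", ... back to real names.
    public String realColumn(String requestId, String alias) {
        GridSpec spec = byId.get(requestId);
        if (spec == null) {
            throw new IllegalArgumentException("unknown request ID");
        }
        int index = Integer.parseInt(alias.substring(3)) - 1;  // strip "col"
        return spec.columns.get(index);  // out-of-range = bad client input
    }
}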
Does anyone have experience with this kind of problem and can offer a solution?
Do you need to give them access to raw tables?
Perhaps you can go meta and make a meta-table that stores the tabular data in a secure manner (i.e., only the system knows the real table/schema, while the user's concept of schema/table is just an abstraction that maps back to that same schema/table)...
Again, more information is needed as to what can be abstracted. Allowing DDL operations by end users is asking for trouble, as you rightfully assessed, so I would abstract that away so that "DDL" becomes DML.
However, mapping actual SQL that is written against this data would be much more difficult to abstract, if that is a requirement.
If I had to expose back-end information to end customers, I'd probably hide the actual physical representation behind metadata that remaps table and column names to more user-friendly text. That would also let me provide views on the tables that are a bit more advanced than plain table/column names, and properly model associations between tables, and so on.
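A sketch of that remapping in Java (all names invented): friendly names come from the client, physical names only ever come from a server-side map, so nothing client-supplied reaches the SQL text.

import java.util.Map;

public class SchemaMetadata {
    // Server-side whitelist: friendly name -> physical name. Anything
    // not in this map is simply not queryable, so client input can
    // never inject a table name into the SQL.
    private static final Map<String, String> TABLES = Map.of(
        "Customers", "APP_CUSTOMER_V",  // expose a view, not the raw table
        "Orders",    "APP_ORDER_V"
    );

    public static String physicalTable(String friendlyName) {
        String physical = TABLES.get(friendlyName);
        if (physical == null) {
            throw new IllegalArgumentException("Unknown table: " + friendlyName);
        }
        return physical;
    }
}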