Android room insert/update result - android-room

I've been using room for a while. I'm from a mysql background where you have to check the values of queries and stuff. In room, I find this a bit complicated because so far I can either declare the dao insert query as void or as long returning the rowId
If I return a long, I have to write a listener to notify the UI of success/failure
My question is, is this necessary? Do I need the return value of inserts/updates/deletes or are these queries guaranteed to succeed?

My question is, is this necessary?
This depends and is hopefully better explained (a least a little anyway) below.
Do I need the return value of inserts/updates/deletes or are these
queries guaranteed to succeed?
There is no guarantee that they will succeed. However, you may be able to assume they have or use CONFLICT handling.
Much could depend upon how the Entities are coded, for example say you had a simple table (Entity) with an id and a name and for simplicity that you have autogenerate = true and you never allow the id to be specified when inserting. Unless the database is massive (beyond storage device capacity) or tweaked. A unique id will always result.
If the name needs to be UNIQUE then you are introducing a facet that makes it more likely that the insert will not succeed. If you had the onConflictStrategy of the insert as IGNORE, then a duplicate wouldn't fail but you'd may want to know if nothing was inserted (-1 returned).
This is just one facet. The answer is really that you need to consider the design of the database and of the app itself. Personally I'd always go with informing the user at least of the abnormal/unexpected which probably then means yes it is necessary (typically it is easier to supress code than add new code).

Related

Database efficiency: references/pointers from table to table

I am working on learning databases and am unsure about something that doesn't seem to make any sense to me. In the relational model you are able to combine through references but always require a global sort of key in each table to be able to combine this information. That is obviously required in most cases, but I feel like in a perfect tree hierarchy set up of a database this is inefficient.
To explain this better I shall use the example of storing products in a database. Products have main categories and sub categories and these are very clear. (ie. Milk is a subcategory of Dairy which is a subcategory of Food, etc.)
I thought in cases like this the ability to store single or a list of references/pointers to tables in fields would take away a lot of search querying and storage requirements.
Here is a link to a simple pain layout I made to illustrate this:
Image (the table entry could have some command character like '|' after which it knows the following entry is a file directory so when the database initiates it knows to make a pointer there)
Since I am only learning to work with databases now I understand that I may just be missing some knowledge on the subject, but I don't seem to find anything when I try googling this problem. Any help explaining where to start or any confirmation that this may improve efficiency and where I could learn how to write this myself would be great.
The concept of "pointer" is useful only if the object you want to point to has a well-defined address that is at least as permanent as the pointer itself. If the address is less permanent, you could end up with a "dangling" pointer.
A row in the database does not necessarily have a permanent address.1 By referencing the row through a logical value (instead of the physical address), the reference stays valid even when the row physically moves.2 And to ensure that the value identifies exactly one row, it must be unique.3
As for storing the list of values (be it "pointers" or anything else) inside a single field, this violates the principle of atomicity and therefore the 1NF. There are very good reasons to avoid violating the 1NF, including the ability to maintain the referential integrity and utilize indexing. That being said, there are DBMSes that support arrays or even sub-tables within a single field, which may be useful on rare occasions.
1 For example, Oracle ROWID is constant as long as the row is not physically moved on disk, but that can happen in many situations that are part of the normal database operation. So aside from putting severe restrictions on how your database is used, you couldn't rely on the ROWID staying constant over the lifetime of the rows that reference it (which could be as long as the lifetime of the database itself).
2 I suppose it would be theoretically possible for a DBMS to keep track of all the pointers and update them when the row physically moves. However, I'm not aware of any DBMS that actually supports such "updatable" pointers in practice, probably because the underlying mechanism needed for that wouldn't be any more efficient than the standard "value-based" referencing.
3 And must obviously be non-NULL. Saying that the attribute (or combination thereof) is "non-NULL and unique", is synonymous to saying it's a "key". Ideally, the key should also be immutable (so there is no need for a cascading referential action such as ON UPDATE CASCADE).

Cost of time-stamping as a method of concurrency control with Entity Framework

In concurrency, in optimistic concurrency the way to control the concurrency is using a timestamp field. However, in my particular case, not all the fields need to be controlled in respect to concurrency.
For example, I have a products table, holding the amount of stock. This table has fields like description, code... etc. For me, it is not a problem that one user modifies these fields, but I have to control if some other user changes the stock.
So if I use a timestamp and one user changes the description and another changes the amount of stock, the second user will get an exception.
However, if I use the field stock instead of concurrency exception, then the first user can update the information and the second can update the stock without problems.
Is it a good solution to use the stock field to control concucrrency or is it better to always use a timestamp field?
And if in the future I need to add a new important field, then I need to use two fields to control concurrency for stock and the new one? Does it have a high cost in terms of performance?
Consider the definition of optimistic concurrency:
In the field of relational database management systems, optimistic concurrency control (OCC) is a concurrency control method that assumes that multiple transactions can complete without affecting each other, and that therefore transactions can proceed without locking the data resources that they affect. (Wikipedia)
Clearly this definition is abstract and leaves a lot of room for your specific implementation.
Let me give you an example. A few years back I evaluated the same thing with a bunch of colleagues and we realized that in our application, on some of the tables, it was okay for the concurrency to simply be based on the fields the user was updating.
So, in other words, as long as the fields they were updating hadn't changed since they gathered the row, we'd let them update the row because the rest of the fields really didn't matter and and row was going to get refreshed on udpate anyway so they would get the most recent changes by other users.
So, in short, I would say what you're doing is just fine and there aren't really any hard and fast rules. It really depends on what you need. If you need it to be more flexible, like what you're talking about, then make it more flexible -- simple.

Why no primary key

I have inherited a datababase with tables that lack primary keys. It's an OLTP database. One of the tables in question has ~300k records, and has no primary key implemented, even though examining the rest of the schema tells me one column is used AS a primary key, ie being replicated in another table, with identical name, etc. ie. This is not an 'end of line' table
This database also does not implement FKs.
My question is - is there ANY valid reason for a table (in Oracle for that matter) NOT to have a primary key?
I think PK is mandatory for almost all cases. Lots of reasons will exist but I'll treat some of them.
prevent to insert duplicate rows
rows will be referenced, so it must have a key for it
I saw very few cases make tables without PK (e.g. table for logs).
Not specific to Oracle but I recall reading about one such use-case where mysql was highly customized for a dam (electricity generation) project, I think. The input data from sensors were in the order 100-1000 per second or something. They were using timestamps for each record so didn't need a primary key (like with logs/logging mentioned in another answer here).
So good reasons would be:
Overhead, in the case of high frequency transactions
Necessity or Un-necessity in that case
"Uniqueness" maintained or inferred by application, not by db
In a normalized table, if every record needs to be unique and every field is referenced in other tables, then having a PK additionally adds an index overhead and if the PK would never actually be used in any SQL query (imho, I disagree with this but it's possible). But it should still have a unique index encompassing all the fields.
Bad reasons are infinite :-)
The most frequent bad reason which is actually responsible for the lack of a primary key is when DBs are designed by application/code-developers with little or no DB experience, who want to (or think they should) handle all data constraints in the application.
Any valid reason? I'd say "No"--I'm a database guy--but there are places that insist on using the database as a dumb data store. They usually implement all integrity "constraints" in application code.
Putting integrity constraints into application code isn't usually done to improve performance. In fact, if you built one database that enforces all the known constraints, and you built another with functionally identical constraints only in application code, the first one would almost certainly run rings around the second one.
Instead, application-level constraints usually hope to increase flexibility. (And, in the process, some of the known constraints are usually dropped, which appears to improve performance.) If it becomes inconvenient to enforce certain constraints in order to bulk load some scruffy data, an application programmer can just side-step the application-level constraints for a little while, then clean up the data when it's more convenient.
I'm not a db expert but I remember a conversation with a friend who worked in the Oracle apps dept. who told me that this was done to handle emergencies. If there was a problem in some report being generated which you could fix by putting in a row, db level constraints often stand in your way. They generally implemented things like unique primary keys in the application rather than the database. It was inefficient but enough and for them and much more manageable in case of a disaster recovery scenario.
You need a primary key to enforce uniqueness for a subset of its columns (useful if you need to refer to individual rows). It also speeds up certain queries because of the index associated to it.
If you do not need that index, or that uniqueness constraint, then you may not need a primary key (the index does not come free).
An example that comes to mind are logging tables, that just record some data (that is never updated or queried for individual records).
There is a small overhead when inserting to a table with an index and you need an index if you have a primary key. Downside of course is that finding a row is very costly.

Database design: Same table structure but different table

My latest project deals with a lot of "staging" data.
Like when a customer registers, the data is stored in "customer_temp" table, and when he is verified, the data is moved to "customer" table.
Before I start shooting e-mails, go on a rampage on how I think this is wrong and you should just put a flag on the row, there is always a chance that I'm the idiot.
Can anybody explain to me why this is desirable?
Creating 2 tables with the same structure, populating a table (table 1), then moving the whole row to a different table (table 2) when certain events occur.
I can understand if table 2 will store archival, non seldom used data.
But I can't understand if table 2 stores live data that can changes constantly.
To recap:
Can anyone explain how wrong (or right) this seemingly counter-productive approach is?
If there is a significant difference between a "customer" and a "potential customer" in the business logic, separating them out in the database can make sense (you don't need to always remember to query by the flag, for example). In particular if the data stored for the two may diverge in the future.
It makes reporting somewhat easier and reduces the chances of treating both types of entities as the same one.
As you say, however, this does look redundant and would probably not be the way most people design the database.
There seems to be several explanations about why would you want "customer_temp".
As you noted would be for archival purposes. To allow analyzing data but in that case the historical data should be aggregated according to some interesting query. However it using live data does not sound plausible
As oded noted, there could be a certain business logic that differentiates between customer and potential customer.
Or it could be a security feature which requires logging all attempts to register a customer in addition to storing approved customers.
Any time I see a permenant table names "customer_temp" I see a red flag. This typically means that someone was working through a problem as they were going along and didn't think ahead about it.
As for the structure you describe there are some advantages. For example the tables could be indexed differently or placed on different File locations for performance.
But typically these advantages aren't worth the cost cost of keeping the structures in synch for changes (adding a column to different tables searching for two sets of dependencies etc. )
If you really need them to be treated differently then its better to handle that by adding a layer of abstraction with a view rather than creating two separate models.
I would have used a single table design, as you suggest. But I only know what you posted about the case. Before deciding that the designer was an idiot, I would want to know what other consequences, intended or unintended, may have followed from the two table design.
For, example, it may reduce contention between processes that are storing new potential customers and processes accessing the existing customer base. Or it may permit certain columns to be constrained to be not null in the customer table that are permitted to be null in the potential customer table. Or it may permit write access to the customer table to be tightly controlled, and unavailable to operations that originate from the web.
Or the original designer may simply not have seen the benefits you and I see in a single table design.

Thread-safe unique entity instance in Core Data

I have a Message entity that has a messageID property. I'd like to ensure that there's only ever one instance of a Message entity with a given messageID. In SQL, I'd just add a unique constraint to the messageID column, but I don't know how to do this with Core Data. I don't believe it can be done in the data model itself, so how do you go about it?
My initial thought is to use a validation method to do a fetch on the NSManagedObject's context for the ID, see if it finds anything but itself, and if so, fails the validation. I suspect this will work - but I'm worried about the performance of something like that. I went through a lot of effort to minimize the fetch requests needed for the entire import routine, and having it validate by performing a fetch for every single new message entity seems a bit excessive. I can get all pre-existing objects I need and identify all the new objects I need to insert into the store using just two fetch queries before I do the actual work of importing and connecting everything together. This would add a fetch to every single update or insert in addition to those two - which would seem to eliminate any performance advantage I had by pre-processing the import data in the first place!
The main reason this is an issue is that the importer can (potentially) run several batches concurrently on several threads and may include some overlapping/duplicate data that needs to ultimately result in just one object in the store and not duplicate entries. Is there a reasonable way to do this and does what I'm asking for make sense for Core Data?
The only way to guarantee uniqueness is to do a fetch. Fortunately you can just do a -countForFetchRequest:error: and check to see if it is zero or not. That is the least expensive way to guarantee uniqueness at this time.
You can probably accomplish this in the validation or run it in the loop that is processing the data. Personally I would do it above the creation of the NSManagedObject so that you do not have the unnecessary allocs when a record already exists.
I don't think there is a way to easily guarantee an attribute is unique without doing a lot of work on your own. You can, of course use CFUUIDCreate to create a globally unique UUID, which should be unique, even in a multithreaded environment. But...
The objectID (type NSManagedObjectID) of all managed objects is guaranteed to be unique within the persistent store coordinator. Since you can add arbitrarily many persistent stores to the coordinator, this guarantee basically guarantees that the objectIDs are globally unique. Why don't you use the objectID as your messageID? You can't, of course, change the objectID once it's assigned (and it won't get assigned until the context containing the inserted object is saved; until then it will be a temporary but still unique ID).
So you have a NSManagedContext for each thread, backed by the same persistent store, is that correct? And before you save the NSManagedContext, you'd like to make sure the messageID is unique, that is, that you are not updating an existing row, and that it is not in one of the other contexts, correct?
Given that model (correct me if I misunderstand), I think you'd be better served having one object that manages access to the persistent store. That way, all threads would update one context and you can do your validation in there, using Marcus's -countForFetchRequest:error: suggestion. Granted, that places a bottleneck on this operation.
Just to add my 2 cents: I think inconsistencies will occur sooner or later anyway, and the only way to mitigate them seems to be to do it on an application-level with rather complex code.
So in my case I decided to allow duplicate values for what are supposed to be "unique" fields.
I added code, however, that detects these problems later (e.g. when a fetch that should return 1 object returns more than 1) and fixes them when they occur (usually by deleting).
It's a "go ahead, make a mistake, ill fix it later for you"-strategy.
This is not ideal, of course, but a valid way to attack this problen, imho.

Resources