Does Hyperledger's "remove asset" actually remove anything? - immutability

I'm trying Hyperledger Composer and I'm just wondering what happens when we remove an asset. Is it possible to remove / delete anything from the blockchain?
Or do we simply mark an asset as removed, while all the transaction records of that asset still exist in the blockchain?
When I remove an asset I still see the block number increasing, so I have a feeling that the asset is not removed (as in deleted from existence) but just marked as removed in the current state.
I have tried to create an asset with the same ID and it works, though. I can delete and recreate as many times as I want, yet the block number always increases.
Following the above, is restarting the entire Hyperledger network (e.g. reloading the Docker images on all computers in the network) the only true way of deleting the blockchain from existence?
Thank you in advance.

It sounds like you've got it right. No, data on a blockchain won't ever be deleted. A deletion is just another transaction saying certain data is deleted, so that the world state database (the DB with the non-deleted info) can remove that data.
Since a blockchain is a Merkle Tree in the background (or maybe a Hashgraph...), it plays by those rules and is immutable. Data will always be there unless the ledger and transactions are removed from the machines, such as restarting the network and removing all the information from the peers. That's basically a wipe of every machine that was used for the network structure holding the ledger. For Bitcoin that's everyone, for permissioned blockchains that may only be a few machines and could reasonably happen.
However, that's in theory, and it gets a little complicated with different implementations of a blockchain. It sounds like you're using Hyperledger Fabric, so let's take that as an example. If you're upgrading the Business Network Definition for your network dynamically, and your asset definitions change so that they no longer support existing assets in the registry, are those assets actually deleted? I'm not sure, but I know they won't show up in a query, which might be effectively the same. Similarly, if you set an ACL rule or use encryption, then a marked asset might as well be deleted, since there will be many barriers (see the docs on Security and Access Control) for a random participant to view that data. So depending on how sensitive your data is, it may not really matter.
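To make the first point concrete: in Composer, removing an asset is itself a submitted transaction, which is why the block number keeps climbing. A minimal sketch of the composer-client calls involved (the card name and the asset's fully qualified name are placeholders for your own business network):

// Sketch against the composer-client API; 'admin@my-network' and
// 'org.example.SampleAsset' are placeholders, not values from your network.
const { BusinessNetworkConnection } = require('composer-client');

async function removeAsset(assetId: string): Promise<void> {
  const connection = new BusinessNetworkConnection();
  await connection.connect('admin@my-network');                       // placeholder card
  const registry = await connection.getAssetRegistry('org.example.SampleAsset');
  await registry.remove(assetId);   // recorded as a RemoveAsset system transaction:
                                    // a block is still appended to the chain; only the
                                    // world-state entry for the asset goes away
  console.log(await registry.exists(assetId));                        // now false
  await connection.disconnect();
}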

Related

Making sure you don't add the same person data twice using Event Sourcing

I am wondering how you make sure you are not adding the same person twice in your EventStore?
Let's say that in your application you add person data, but you want to make sure that the same person's name and birthday are not added twice in different streams.
Do you ask your read models, or do you do it within your EventStore?
I am wondering how you make sure you are not adding the same person twice in your EventStore?
The generalized form of the problem that you are trying to solve is set validation.
Step #1 is to push back really hard on the requirement to ensure that the data is always unique - if it doesn't have to be unique always, then you can use a detect and correct approach. See Memories, Guesses, and Apologies by Pat Helland. Roughly translated, you do the best you can with the information you have, and back up if it turns out you have to revert an error.
If a uniqueness violation would expose you to unacceptable risk (for instance, getting sued to bankruptcy because the duplication violated government mandated privacy requirements), then you have to work.
To validate set uniqueness you need to lock the entire set; this lock could be pessimistic or optimistic in implementation. That's relatively straightforward when the entire set is stored in one place (which is to say, under a single lock), but something of a nightmare when the set is distributed (aka multiple databases).
If your set is an aggregate (meaning that the members of the set are being treated as a single whole for purposes of update), then the mechanics of DDD are straightforward. Load the set into memory from the "repository", make changes to the set, persist the changes.
This design is fine with event sourcing where each aggregate has a single stream -- you guard against races by locking "the" stream.
Most people don't want this design, because the members of the set are big, and for most data you need only a tiny slice of that data, so loading/storing the entire set in working memory is wasteful.
So what they do instead is move the responsibility for maintaining the uniqueness property from the domain model to the storage. RDBMS solutions are really good at sets. You define the constraint that maintains the property, and the database ensures that no writes which violate the constraint are permitted.
If your event store is a relational database, you can do the same thing -- the event stream and the table maintaining your set invariant are updated together within the same transaction.
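For instance, with PostgreSQL as the event store you might keep a small person_identity table with a UNIQUE constraint and append the event in the same transaction; if the constraint trips, the whole thing rolls back. A rough sketch (the table and column names are made up, and it assumes the node-postgres client):

import { Pool } from 'pg';

const pool = new Pool();  // connection settings come from the usual PG* env vars

// Hypothetical schema:
//   person_identity(name text, birthday date, UNIQUE (name, birthday))
//   events(stream_id text, type text, payload jsonb)
async function registerPerson(name: string, birthday: string): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    // The unique constraint guards the set invariant...
    await client.query(
      'INSERT INTO person_identity (name, birthday) VALUES ($1, $2)',
      [name, birthday]
    );
    // ...and the event is appended in the same transaction.
    await client.query(
      'INSERT INTO events (stream_id, type, payload) VALUES ($1, $2, $3)',
      [`person-${name}`, 'PersonRegistered', JSON.stringify({ name, birthday })]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;  // a 23505 error here means the person already exists
  } finally {
    client.release();
  }
}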
If your event store isn't a relational database? Well, again, you have to look at money -- if the risk is high enough, then you have to discard plumbing that doesn't let you solve the problem with plumbing that does.
In some cases, there is another approach: encoding the information that needs to be unique into the stream identifier. The stream comes to represent "All users named Bob", and then your domain model can make sure that the Bob stream contains at most one active user at a time.
Then you start needing to think about whether the name Bob is stable, and which trade-offs you are willing to make when an unstable name changes.
People's names are a particularly miserable problem, because none of the things we believe about names are true. So you get all of the usual problems with uniqueness, dialed up to eleven.
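If you do go the stream-per-unique-value route, the identifier derivation is the only interesting part; something like the sketch below (purely illustrative), and note that the normalization step is exactly where the "is the name stable?" question bites.

// Derive a deterministic stream id from the value that must be unique.
// Any change to how names are normalized effectively renames every stream.
function uniqueNameStreamId(name: string): string {
  const normalized = name.trim().toLowerCase().replace(/\s+/g, '-');
  return `user-by-name-${normalized}`;
}

// The domain model then enforces "at most one active user" inside that one stream,
// e.g. uniqueNameStreamId('  Bob  Smith ') === 'user-by-name-bob-smith'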
If you are going to validate this kind of thing then it should be done in the aggregate itself IMO, and you'd have to use read models for that like you say. But then you end up with infrastructure code/dependencies being sent into your aggregates/passed into your methods.
In this case I'd suggest creating a read model of Person.Id, Person.Name, Person.Birthday and then, instead of creating a Person directly, create some service which uses the read model table to look up whether or not a row exists and either gives you that aggregate back or creates a new one and gives that back. Then you won't need to validate at all, as long as all Person creation is done via this service.
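A sketch of that service; the read-model and repository interfaces here are hypothetical stand-ins for whatever you already have:

// Hypothetical interfaces; substitute your own read-model and repository types.
interface Person { id: string; name: string; birthday: string; }
interface PersonReadModel {
  findIdByNameAndBirthday(name: string, birthday: string): Promise<string | null>;
}
interface PersonRepository {
  load(id: string): Promise<Person>;
  create(name: string, birthday: string): Promise<Person>;  // emits PersonCreated
}

class PersonService {
  constructor(
    private readonly readModel: PersonReadModel,
    private readonly repository: PersonRepository
  ) {}

  // All Person creation goes through here, so callers never need to validate.
  async getOrCreate(name: string, birthday: string): Promise<Person> {
    const existingId = await this.readModel.findIdByNameAndBirthday(name, birthday);
    return existingId !== null
      ? this.repository.load(existingId)        // hand back the existing aggregate
      : this.repository.create(name, birthday); // or start a new stream
  }
}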

Using Core Data as cache

I am using Core Data for its storage features. At some point I make external API calls that require me to update the local object graph. My current (dumb) plan is to clear out all instances of old NSManagedObjects (regardless if they have been updated) and replace them with their new equivalents -- a trump merge policy of sorts.
I feel like there is a better way to do this. I have unique identifiers from the server, so I should be able to match them to my objects in the store. Is there a way to do this without manually fetching objects from the context by their identifiers and resetting each property? Is there a way for me to just create a completely new context, regenerate the object graph, and just give it to Core Data to merge based on their unique identifiers?
Your strategy of matching, based on the server's unique IDs, is a good approach. Hopefully you can get your server to deliver only the objects that have changed since the time of your last update (which you will keep track of, and provide in the server call).
In order to update the Core Data objects, though, you will have to fetch them, instantiate the NSManagedObjects, make the changes, and save them. You can do this all in a background thread (child context, performBlock:), but you'll still have to round-trip your objects into memory and back to store. Doing it in a child context and its own thread will keep your UI snappy, but you'll still have to do the processing.
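Core Data specifics aside, the work described above is essentially a keyed upsert pass over the delta. A language-agnostic sketch of that shape (the LocalStore interface is a made-up stand-in for a fetch request plus a context save, not Core Data API):

// Generic shape of the sync pass: not Core Data API, just the matching logic.
interface ServerObject { id: string; updatedAt: string; [key: string]: unknown; }
interface LocalStore {
  fetchByIds(ids: string[]): Promise<Map<string, ServerObject>>;  // like a fetch request
  upsert(obj: ServerObject): Promise<void>;                       // insert or update
  save(): Promise<void>;                                          // like saving the context
}

async function syncChanges(changed: ServerObject[], store: LocalStore): Promise<void> {
  // Fetch only the local objects whose server IDs appear in the delta.
  const existing = await store.fetchByIds(changed.map(o => o.id));
  for (const incoming of changed) {
    // Matching on the server's unique ID decides update vs. insert.
    const current = existing.get(incoming.id);
    await store.upsert(current ? { ...current, ...incoming } : incoming);
  }
  await store.save();  // one save at the end keeps the round-trips down
}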
Another idea: In the last day or so I've been reading about AFIncrementalStore, an NSIncrementalStore implementation which uses AFNetworking to provide Core Data properties on demand, caching locally. I haven't built anything with it yet but it looks pretty slick. It sounds like your project might be a good use of this library. Code is on GitHub: https://github.com/AFNetworking/AFIncrementalStore.

How to keep your distributed cache clean?

In a N-Tier architecture, what would be the best patterns to use so that you can keep your cache clean?
I know it's easy to just set an absolute/sliding timeout, but is there a better mechanism available to allow you to mark your cache as dirty after you update the underlying persistence?
The difficulty I'm trying to wrap my head around is that caches are usually stored as key-value pairs, but a query is usually a fair bit more complex than that. So how can the gateway service tell the cache store that, for such and such a query, it needs to refetch from persistence?
I also can't afford to hand-code the cache update per query. I'm looking for a more systematic approach.
Is this just a pipe dream, or is there some way to do this elegantly?
Link/Guide/Post appreciated.
I have worked with AppFabric and I think I tried to do what you are asking about. I was working on an auction site and I wanted to pro-actively invalidate items in the cache.
For example, we had listings (things for sale) and they would be present all over the cache (AppFabric). The data that represented a listing was in 10 different places. What I initially wanted was a way to say, "Ok, my listing has changed. Let me go find everywhere it exists in cache, and then update." (I think you say "mark as dirty" in your question)
I found doing this was incredibly difficult. There are tags in AppFabric that I tried to use, so I would mark a given object (or collection of objects) with a tag and that would let me query the cache and remove items. In other words, if an object had a LISTING tag, I would find it and invalidate it.
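Stripped of AppFabric specifics, the tag idea is just a secondary index from tag to cache keys. A generic sketch of that bookkeeping (this is not AppFabric's API, just the concept):

// Generic tag index over a key-value cache.
class TaggedCache<V> {
  private values = new Map<string, V>();
  private tagIndex = new Map<string, Set<string>>();  // tag -> keys carrying it

  set(key: string, value: V, tags: string[] = []): void {
    this.values.set(key, value);
    for (const tag of tags) {
      if (!this.tagIndex.has(tag)) this.tagIndex.set(tag, new Set());
      this.tagIndex.get(tag)!.add(key);
    }
  }

  get(key: string): V | undefined {
    return this.values.get(key);
  }

  // "My listing changed": drop every entry tagged with it, wherever it was cached.
  invalidateTag(tag: string): void {
    for (const key of this.tagIndex.get(tag) ?? []) this.values.delete(key);
    this.tagIndex.delete(tag);
  }
}

// usage: cache.set('search:page1', results, ['LISTING:42']); cache.invalidateTag('LISTING:42');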
Eventually I settled on a two-pronged attack.
For 95% of the data I let it expire. It was a happy day when I decided this because everything got much easier to develop. I had to make some concessions in the UI etc., but it was well worth it.
For the last 5% of the data I resolved to only ever store it once. For example, a bid on a listing. Whenever a new bid came in, we'd pro-actively invalidate that object, and then everything that needed that information would be updated as well.

What should be stored in cache for web app?

I realize that this might be a vague question that begets a vague answer, but I'm in need of some real-world examples, thoughts, and/or best practices for caching data for a web app. All of the examples I've read are more technical in nature (how to add or remove cache data from the respective cache store), but I've not been able to find a higher-level strategy for caching.
For example, my web app has an inbox/mail feature for each user. What I've been doing to date is storing typical session data in the cache. In this example, when the user logs in I go to the database and retrieve the user's mail messages and store them in cache. I'm beginning to wonder if I should just maintain a copy of all users' messages in the cache, all the time, and just retrieve them from cache when needed, instead of loading from the database upon login. I have a bunch of other data that's loaded on login (product catalogs and related entities) and login is starting to slow down.
So I guess my question to the community, is what would you do/recommend as an approach in this scenario?
Thanks.
This might be better suited to https://softwareengineering.stackexchange.com/, but generally you want to cache:
Metadata/configuration data that does not change frequently. E.g. country/state lists, external resource addresses, logic/branching settings, product/price/tax definitions, etc.
Data that is costly to retrieve or generate and that does not need to frequently change. E.g. historical data sets for reports.
Data that is unique to the current user's session.
The last item above is where you need to be careful, as you can drastically increase your app's memory usage by adding even a few megabytes of data for every active session. It also implies different levels of caching -- application-wide, per user session, etc.
Generally you should NOT cache data that is under active change.
In larger systems you also need to think about where the cache(s) will sit. Is it possible to have one central cache server, or is it good enough for each server/process to handle its own caching?
Also: you should have some method to quickly reset/invalidate the cached data. For a smaller or less mission-critical app, this could be as simple as restarting the web server. For the large system that I work on, we use a 12 hour absolute expiration window for most cached data, but we have a way of forcing immediate expiration if we need it.
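A minimal sketch of that combination: entries carry an absolute expiry, and a forced reset simply bumps a generation counter so everything cached before the bump reads as stale (the 12-hour default and the names are arbitrary):

// Absolute expiration plus a "reset everything now" switch via a generation counter.
class ExpiringCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number; generation: number }>();
  private generation = 0;

  constructor(private readonly ttlMs = 12 * 60 * 60 * 1000) {}  // 12 hour window

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs, generation: this.generation });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    const stale = entry.expiresAt < Date.now() || entry.generation < this.generation;
    if (stale) { this.entries.delete(key); return undefined; }
    return entry.value;
  }

  // Forced, immediate expiration of everything without restarting the process.
  reset(): void { this.generation++; }
}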
This is a really broad question, and the answer depends heavily on the specific application/system you are building. I don't know enough about your specific scenario to say if you should cache all the users' messages, but instinctively it seems like a bad idea since you would seem to be effectively caching your entire data set. This could lead to problems if new messages come in or get deleted. Would you then update them in the cache? Would that not simply duplicate the backing store?
Caching is only a performance optimization technique, and as with any optimization, measure first before making substantial changes, to avoid wasting time optimizing the wrong thing. Maybe you don't need much caching, and it would only complicate your app. Maybe the data you are thinking of caching can be retrieved in a faster way, or less of it can be retrieved at once.
Cache anything that causes duplicate database queries.
Client-side file caching is important as well. Assuming files are marked with an ID in your database, cache them the first time they are fetched so you avoid repeated network requests for the same file. A resource for doing this with IndexedDB can be found here: https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API (the sketch below uses the simpler browser Cache API instead, but the idea is the same). If you don't need to cache whole files, web storage, local storage and cookies are good for smaller pieces of data.
async function getFile(url: string): Promise<Response> {
  const cache = await caches.open('files');    // open (or create) the cache
  const hit = await cache.match(url);          // if the file is in the cache
  if (hit) return hit;                         //   refer to the cache
  const response = await fetch(url);           // else make the network request
  await cache.put(url, response.clone());      //   and push the file to the cache
  return response;
}

Key based caching

I'm reading this article:
http://37signals.com/svn/posts/3113-how-key-based-cache-expiration-works
I'm not using Rails so I don't really understand their example.
It says in #3:
When the key changes, you simply write the new content to this new key. So if you update the todo, the key changes from todos/5-20110218104500 to todos/5-20110218105545, and thus the new content is written based on the updated object.
How does the view know to read from the new todos/5-20110218105545 instead of the old one?
I was confused about that too at first -- how does this save a trip to the database if you have to read from the database anyway to see if the cache is valid? However, see Jesse's comments (1, 2) from Feb 12th:
How do you know what the cache key is? You would have to fetch it from the database to know the mtime right? If you’re pulling the record from the database already, I would expect that to be the greatest hit, no?
Am I missing something?
and then
Please remove my brain-dead comment. I just realized why this doesn’t matter: the caching is cascaded, so yes a full depth regeneration incurs a DB hit. The next cache hit will incur one DB query for the top-level object—all the descendant objects are not queried because the cache for the parent object includes cached versions for the children (thus, no query necessary).
And Paul Leader's comment 2 below that:
Bingo. That’s why it works soooo well. If you do it right it doesn’t just eliminate the need to generate the HTML but any need to hit the db. With this caching system in place, our data-vis app is almost instantaneous, it’s actually useable and the code is much nicer.
So given the models that DHH lists in step 5 of the article and the views he lists in step 6, and given that you've properly set up your relationships to touch the parent objects on update, and given that your partials access your child data as parent.children (or even child.children in nested partials), then this caching system should give a net gain: as long as the parent's cache key is still valid, the parent.children lookup never happens, because the children come along inside the parent's cached fragment, and so on down the tree.
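Outside of Rails, the trick boils down to "put the object's updated-at stamp in the key and never delete anything explicitly". A rough sketch of that idea (the fragment cache and render function are placeholders, and the key format mirrors the article's todos/5-20110218105545 example):

// Key-based expiration in miniature: the key embeds the object's updated-at stamp,
// so an update naturally writes to a fresh key and the stale entry just ages out.
interface Todo { id: number; updatedAt: string; /* e.g. '20110218105545' */ }

const fragmentCache = new Map<string, string>();           // stand-in for memcached etc.

function cacheKey(todo: Todo): string {
  return `todos/${todo.id}-${todo.updatedAt}`;             // todos/5-20110218105545
}

function renderTodo(todo: Todo, render: (t: Todo) => string): string {
  const key = cacheKey(todo);                              // one cheap read of the parent row
  const hit = fragmentCache.get(key);
  if (hit !== undefined) return hit;                       // children never queried on a hit
  const html = render(todo);                               // miss: render (and query) once
  fragmentCache.set(key, html);
  return html;
}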
However, this method may be pointless if your partials reference lots of instance variables from the controller since those queries will already have been performed by the time Rails sees the calls to cache in the view templates. In that case you would probably be better off using other caching patterns.
Or at least this is my understanding of how it works. HTH
