Best strategy for "mutable" records in Erlang - data-structures

I am developing a system that I expect to have many users. Each user has a profile represented inside the application as a record. To store a user's profile I do the following: base64:encode_to_string(term_to_binary(Profile)), so profiles are stored in a serialized manner.
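For reference, the round trip is just this pair of standard calls:
Stored = base64:encode_to_string(term_to_binary(Profile)),
Restored = binary_to_term(base64:decode(Stored)).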
So far everything is just fine. Now comes the question:
From time to time I plan to extend the profile functionality by adding and removing certain fields. My question is: what is the best strategy to handle these changes in the code?
The approach I see at the moment is to do something like this:
Profile = get_profile(UserName),
case is_record(Profile, profile1) of
    true ->
        % do stuff with Profile#profile1
        ok;
    _ ->
        next
end,
case is_record(Profile, profile2) of
    true ->
        % do stuff with Profile#profile2
        ok;
    _ ->
        next
end,
Are there any better solutions for this task?
Additional info: I use a simple KV storage. It cannot store Erlang types; this is why I use State#state.player#player.chips#chips.br.

Perhaps you could use proplists.
Assume you have stored some user profile:
User = [{name,"John"},{surname,"Dow"}].
store_profile(User).
Then, after a couple of years, you decide to extend the user profile with the user's age:
User = [{name,"John"},{surname,"Dow"},{age,23}].
store_profile(User).
Now you need to get a user profile from the DB:
get_val(Key, Profile) ->
    case lists:keyfind(Key, 1, Profile) of
        {_, Val} -> Val;
        _        -> undefined
    end.
User = get_profile().
UserName = get_val(name,User).
UserAge = get_val(age,User).
If you get a user profile of 'version 2', you will get an actual age (23 in this particular case).
If you get a user profile of 'version 1' (an 'old' one), you will get 'undefined' as the age; you can then update the profile and store it with the new value, so it becomes a 'new version' entity.
So, no version conflict.
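A minimal sketch of that upgrade-on-read step (the default of 0 is just a placeholder):
ensure_age(Profile) ->
    case get_val(age, Profile) of
        undefined ->
            Upgraded = [{age, 0} | Profile],  % placeholder default
            store_profile(Upgraded),
            Upgraded;
        _Age ->
            Profile
    end.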
This is probably not the best way to do it, but it might be a solution in some cases.

It strongly depends on the number of records, the frequency of changes, and the acceptable outage. For maintainability I would prefer upgrading all profiles to the newest version first. You can also build a system that upgrades records on the fly, as mnesia does. Finally, there is the possibility of keeping code for all versions, which I would definitely not prefer: it is a maintenance nightmare.
Anyway, since is_record/2 is allowed in guards, I would prefer:
case Profile of
    X when is_record(X, profile1) ->
        % do stuff with Profile#profile1
        ok;
    X when is_record(X, profile2) ->
        % do stuff with Profile#profile2
        ok
end
Notice there is no catch-all clause, because what would you do with an unknown record type? It is an error, so fail fast!
You have many other options, e.g. a hack like:
case element(1, Profile) of
    profile1 ->
        % do stuff with Profile#profile1
        ok;
    profile2 ->
        % do stuff with Profile#profile2
        ok
end
or something like
{_, F} = lists:keyfind({element(1, Profile), tuple_size(Profile)}, 1,
                       [{{profile1, record_info(size, profile1)}, fun foo:bar/1},
                        {{profile2, record_info(size, profile2)}, fun foo:baz/1}]),
F(Profile).
and many other possibilities.
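For the upgrade-to-newest-version strategy, the conversion itself can be a small function with one clause per legacy version. A sketch, with hypothetical record definitions:
-record(profile1, {name, chips}).
-record(profile2, {name, chips, level}).

upgrade(#profile2{} = P) ->
    P;  % already the newest version
upgrade(#profile1{name = Name, chips = Chips}) ->
    #profile2{name = Name, chips = Chips, level = 1}.  % assumed default level
Run every profile through upgrade/1 as it is read (or in one batch job), and the rest of the code only ever sees #profile2{}.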

The best approach is to keep a copy of the serialized profile and also a copy of the same data in record form. Then, each time changes are made to the record-form profile, changes are also made to the serialized profile of the same user ATOMICALLY (within the same transaction!). The code that modifies the user's record-form profile should always recompute the new serialized form, which, to you, is the external representation of the user's record.
-record(record_prof, {name, age, sex}).

-record(myuser, {
    username,
    record_profile = #record_prof{},
    serialized_profile
}).
change_profile(Username, age, NewValue) ->
    %% transaction starts here....
    [MyUser] = mnesia:read({myuser, Username}),
    Rec = MyUser#myuser.record_profile,
    NewRec = Rec#record_prof{age = NewValue},
    NewSerialised = serialise_profile(NewRec),
    NewUser = MyUser#myuser{
        record_profile = NewRec,
        serialized_profile = NewSerialised
    },
    write_back(NewUser),
    %% transaction ends here.....
    ok.
So whatever the serialise function does is up to you; the point is that a profile change never leaves the two representations out of sync. We thereby keep the serialized profile as the correct representation of the record profile at all times: whenever changes occur to the record profile, the serialized form is recomputed in the same transaction, preserving integrity.
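Following the question's own scheme, serialise_profile/1 could be as simple as:
serialise_profile(Profile) ->
    base64:encode_to_string(term_to_binary(Profile)).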

You could use some extensible data serialization format such as JSON or Google Protocol Buffers.
Both of these formats support adding new fields without breaking backwards compatibility. By using them you won't need to introduce explicit versioning to your serialized data structures.
Choosing between the two formats depends on your use case. For instance, using Protocol Buffers is more reliable, whereas JSON is easier to get started with.
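For example, when reading a JSON-encoded profile (a sketch assuming the jsx library; any JSON codec works the same way), a field added in a later version simply falls back to a default on old data:
Profile = jsx:decode(StoredJson, [return_maps]),
Name = maps:get(<<"name">>, Profile),
Age = maps:get(<<"age">>, Profile, undefined).  % `age` was added later; old data yields the default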

Related

What is the convention around derivative information?

I am working on a service that provides information about a few related entities, somewhat like a database. Suppose there are calls to retrieve information about a school:
service MySchool {
  rpc GetClassRoom (ClassRoomRequest) returns (ClassRoom);
  rpc GetStudent (StudentRequest) returns (Student);
}
Now, suppose I want to find out a classroom's information; I'd receive a proto that looks like this:
message ClassRoom {
  string id = 1;
  string address = 2;
  string teacher = 3;
}
Sometimes I also want to know all of the students of the classroom. I am struggling to decide which is the better design pattern.
Option A) Add an extra rpc like so: rpc GetClassRoomStudents (ClassRoomRequest) returns (ClassRoomStudents), where ClassRoomStudents has a single field repeated Student students. This technique requires more than one call to get all the information that we want (and many if we wanted to know information for more than one classroom).
Option B) Add an extra repeated Student students field to the ClassRoom proto, and B') Fill it up only when necessary, or B") Fill it up whenever the server receives a GetClassRoom call. This may sometimes fetch extra information, or lead to ambiguity according to what fields are filled up.
I am not sure what's the best / most conventional way of dealing with this. How have some of you dealt with this?
There is no simple answer. It's a tradeoff between simplicity (option A) and performance (option B), and it depends on the situation which solution is best.
In general, I'd recommend to go with the simple solution first, unless your measurements demonstrate that it leads to performance issues. At that point, it's easy to add repeated Student students to ClassRoom and a field bool fetch_students [default=false] to ClassRoomRequest. Then clients are free to continue using the simple API, or choose to upgrade to the more performant API if they need to.
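Concretely, the upgraded messages could look like this (a sketch in proto3 style, where bools already default to false; the request's id field is an assumption):
message ClassRoomRequest {
  string id = 1;
  bool fetch_students = 2;
}
message ClassRoom {
  string id = 1;
  string address = 2;
  string teacher = 3;
  repeated Student students = 4;  // filled only when fetch_students is set
}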
Note that this isn't specific to gRPC; the same issue is seen in REST APIs, and basically almost any request/response model.

How can the User Interface know which commands it is allowed to perform against an Aggregate Root?

The UI is decoupled from the domain, but the UI should try its best to never allow the user to issue commands that are sure to fail.
Consider the following example (pseudo-code):
DiscussionController
    #Security(is_logged)
    #Method('POST')
    #Route('addPost')
    addPostToDiscussionAction(request)
        discussionService.postToDiscussion(
            new PostToDiscussionCommand(request.discussionId, session.myUserId, request.bodyText)
        )

    #Method('GET')
    #Route('showDiscussion/{discussionId}')
    showDiscussionAction(request)
        discussionWithAllThePosts = discussionFinder.findById(request.discussionId)
        canAddPostToThisDiscussion = ???
        // render the discussion to the user, and use `canAddPostToThisDiscussion` to show/hide the form
        // from which the user can send a request to `addPostToDiscussionAction`.
        renderDiscussion(discussionWithAllThePosts, canAddPostToThisDiscussion)

PostToDiscussionCommand
    constructor(discussionId, authorId, bodyText)

DiscussionApplicationService
    postToDiscussion(command)
        discussion = discussionRepository.get(command.discussionId)
        author = collaboratorService.authorFrom(discussion.Id, command.authorId)
        post = discussion.createPost(postRepository.nextIdentity(), author, command.bodyText)
        postRepository.add(post)

DiscussionAggregate
    // originalPoster is the Author that started the discussion
    constructor(discussionId, originalPoster)

    // if the discussion is closed, you can't create a post,
    // *unless* you're the author (OP) who started the discussion
    createPost(postId, author, bodyText)
        if (this.closed && !this.originalPoster.equals(author))
            throw "Discussion is closed."
        return new Post(this.discussionId, postId, author, bodyText)

    close()
        if (this.closed)
            throw "Discussion already closed."
        this.closed = true

    isClosed()
        return this.closed
The user goes to /showDiscussion/123 and he sees the discussion with the <form> from which he can submit a new post, but only if the discussion is not closed or the current user is the one who started that discussion.
Or, the user goes to /showDiscussion/123 where it's presented as a REST-as-in-HATEOAS API. A hypermedia link to /addPost will be provided, but only if the discussion is not closed or the authenticated user is the one who started that discussion.
How can I provide that knowledge to the UI?
I could code that into the read model,
canAddPostToThisDiscussion = !discussionWithAllThePosts.discussion.isClosed
    && discussionWithAllThePosts.discussion.originalPoster.id == session.currentUserId
but then I need to maintain that logic and keep it in sync with the write model. This is a fairly simple example, but as the state transitions of an aggregate become more complex, it may become really hard to do. I'd like to imagine my aggregates as state machines, with their workflows (like the RESTBucks example). But I don't like the idea of moving that business logic outside my domain model and putting it in a service that both the read side and the write side can use.
Maybe this isn't the best example, but since an aggregate root is basically a consistency boundary, we know that we need to prevent invalid state transitions in its life cycle, and in each transition to a new state some operations may become illegal and vice versa. So, how can the user interface know what is allowed or not? What are my alternatives? How should I approach this problem? Do you have any examples to provide?
How can I provide that knowledge to the UI?
The easiest way is probably to share the domain model's understanding of what is possible with the UI. Ta Da.
Here's a way to think about it -- in the abstract, all of the write model logic has a fairly simple looking shape.
{
    // Notice that these statements are queries
    State currentState = bookOfRecord.getState()
    State nextState = model.computeNextState(currentState, command)

    // This statement is a command
    bookOfRecord.replace(currentState, nextState)
}
Key ideas here: the book of record is the authority of state; everybody else (including the "write model") is working with a stale copy.
What the model represents is a collection of constraints that ensure that the business invariant is satisfied. Over the lifetime of a system, there might be many different sets of constraints, as the understanding of the business changes.
The write model is the authority for which collection of constraints is currently enforced when replacing the state in the book of record. Everybody else is working with a stale copy.
The staleness is something to keep in mind; in a distributed system, any validation you perform is provisional -- unless you have a lock on the state and a lock on the model, either could be changed while your messages are in flight.
This means that your validation is approximate anyway, so you don't need to be too concerned that you have all of the fiddly details right. You assume that your stale copy of the state is approximately right, and your current understanding of the model is approximately right, and if the command is valid given those pre-conditions, then it is checked enough to send.
I don't like the idea to move that business logic outside my domain model, and put it in a service that both the read side and write side can use.
I think the best answer here is "get over it". I get it: having the business logic inside the aggregate root is what the literature tells us to do. But if you continue to refactor, identifying common patterns and separating concerns, you'll see that entities are really just plumbing around a reference to state and a functional core.
AggregateRoot {
    final Reference<State> bookOfRecord;
    final Model<State,Command> theModel;

    onCommand(Command command) {
        State currentState = bookOfRecord.getState()
        State nextState = theModel.computeNextState(currentState, command)
        bookOfRecord.replace(currentState, nextState)
    }
}
All we've done here is take the "construct the next state" logic, which we used to have scattered throughout the AggregateRoot, and encapsulate it into a separate responsibility boundary. Here, it's specific to the root itself, but an equivalent refactoring is to pass it in as an argument:
AggregateRoot {
    final Reference<State> bookOfRecord;

    onCommand(Model<State,Command> theModel, Command command) {
        State currentState = bookOfRecord.getState()
        State nextState = theModel.computeNextState(currentState, command)
        bookOfRecord.replace(currentState, nextState)
    }
}
In other words, the model, teased out from the plumbing of tracking state, is a domain service. The domain logic within the domain service is just as much a part of the domain model as the domain logic within the aggregate -- the two implementations are dual to one another.
And there's no reason that a read model of your domain shouldn't have access to a domain service.
I don't like the idea of sharing domain knowledge (code) between the write and the read models, as you will have to continuously keep them in sync, and that's really a challenge even if you are the only developer in your company.
But the good news is that you don't have to duplicate anything. If you designed your aggregate to be pure, with no side effects, as you should (!), you can simply send it the command but without persisting the changes. If the command throws an exception, the command would not succeed; otherwise it would. In the case of CQRS this is even better, as you have a third outcome: idempotent command detection, in which case the command succeeds but has no effect (no events are raised but no exception is thrown either), and the UI might find this interesting.
So, as an example you could have something like this:
DiscussionController
    #Security(is_logged)
    #Method('POST')
    #Route('addPost')
    addPostToDiscussionAction(request)
        discussionService.postToDiscussion(
            new PostToDiscussionCommand(request.discussionId, session.myUserId, request.bodyText)
        )

    #Method('GET')
    #Route('showDiscussion/{discussionId}')
    showDiscussionAction(request)
        discussionWithAllThePosts = discussionFinder.findById(request.discussionId)
        canAddPostToThisDiscussion = discussionService.canPostToDiscussion(request.discussionId, session.myUserId, "some sample body")
        // render the discussion to the user, and use `canAddPostToThisDiscussion` to show/hide the form
        // from which the user can send a request to `addPostToDiscussionAction`.
        renderDiscussion(discussionWithAllThePosts, canAddPostToThisDiscussion)
DiscussionApplicationService
    postToDiscussion(command)
        discussion = discussionRepository.get(command.discussionId)
        author = collaboratorService.authorFrom(discussion.Id, command.authorId)
        post = discussion.createPost(postRepository.nextIdentity(), author, command.bodyText)
        postRepository.add(post)

    canPostToDiscussion(discussionId, authorId, bodyText)
        discussion = discussionRepository.get(discussionId)
        author = collaboratorService.authorFrom(discussion.Id, authorId)
        try
        {
            post = discussion.createPost(postRepository.nextIdentity(), author, bodyText)
            return true
        }
        catch (exception)
        {
            return false
        }
You could even have a method named whyCantPostToDiscussion that would return the exception or the exception message and display it in the UI.
There is only one issue with the code: the call to postRepository.nextIdentity(), because it would consume a new ID every time. You could replace it with something like postRepository.getBiggestIdentity(), which should have no side effect.
I find it is rare that authorization is actually part of the domain. If it isn't, it makes sense to move that logic out into its own service which the UI and the domain can make use of.
I like to build up a set of rules using the specification pattern. I find it to be a fairly elegant way to build up the rules.
This also plays very well in a CQRS context, as you can run each command through the 'rules engine' before it gets issued to your ARs. If you push queries through a message routing system you can do the same for queries. I've had a lot of success with this approach.
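A minimal sketch of such a specification, in the Java-like style used elsewhere on this page (all names are hypothetical):
interface Specification<T> {
    boolean isSatisfiedBy(T candidate);
}

class DiscussionIsOpen implements Specification<Discussion> {
    public boolean isSatisfiedBy(Discussion d) {
        return !d.isClosed();
    }
}

class UserIsOriginalPoster implements Specification<Discussion> {
    private final UserId userId;
    UserIsOriginalPoster(UserId userId) { this.userId = userId; }
    public boolean isSatisfiedBy(Discussion d) {
        return d.originalPoster().id().equals(userId);
    }
}
Both the command side and the read side can then evaluate the same rule object, optionally composed with and()/or() combinators, without duplicating the logic.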
The response you are looking for is HATEOAS; look no further. You must implement your REST API as truly RESTful (maturity level 3), adhering to hypertext to model the state transitions and returning links to the clients (the UI being one of those). These links represent the actions the user can execute in their context according to the model state. It's simple: if you return a link from the server, you bind it to a button in the UI; if you don't return the link because of business invariants, you don't show the button in the UI. There are many more concepts behind it, such as designing a good API supporting a well-designed domain model, but this is the general idea, and it fits exactly what you want.
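For example, a hypothetical HAL-style response for an open discussion might look like this, with the addPost link omitted once the discussion is closed:
{
  "discussionId": "123",
  "closed": false,
  "_links": {
    "self": { "href": "/showDiscussion/123" },
    "addPost": { "href": "/addPost" }
  }
}
The UI renders the post form only when the addPost link is present.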

Can you data-bind a composite id in Grails such that it (or parts of it) becomes updateable?

I am trying to read through the dataBind documentation, but it's not all that clear:
http://grails.org/doc/2.1.0/ref/Controllers/bindData.html
I have a composite id composed of 4 columns, and I need to update one of those. It refuses to .save() and doesn't even throw an error. Is there some configuration that will allow me to change these values and save the model?
If I delete it and create a new record, it will bump the rowid, which I was using on the browser side with datatables/jeditable, so that's not really an option. However, even if I include all the parameters with an empty include list:
def a = WaiverExemption.find("from WaiverExemption as e where e.exemptionRowId = ?", [params.rowid])
a.properties = params
bindData(a, params, [include: []])
a.save(flush: true, failOnError: true)
This does not seem to work. I've also tried naming the columns/properties explicitly both by themselves and also with "id".
I was confused about what bindData() actually does. I still am.
If you have a composite id in Grails and wish to change one or more of its column values, save() will never execute, as noted in the question. Instead, you'll want to use .executeUpdate(). You can pass it HQL that updates (though most of the examples on the web are for delete) the table in question, with syntax that is nearly identical to proper SQL. Something along the lines of "update domain d set d.propertyName = ?" should work.
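A sketch of what that call might look like from a Grails service (property and parameter names are hypothetical):
WaiverExemption.executeUpdate(
    "update WaiverExemption e set e.someColumn = :newValue where e.exemptionRowId = :rowId",
    [newValue: newValue, rowId: params.rowid])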
I do not know if this is a wise thing to do, or if it violates some philosophical rule of how a Grails app should work, but it will actually do the update. I advise caution and plenty of testing. This crap's all voodoo to me.

Some problems with MapperExtension of sqlalchemy

There are two classes: User and Question
A user may have many questions, and it also contains a question_count to record the count of questions belonging to him.
So, when I add a new question, I want to update the user's question_count. At first, I did it like this:
question = Question(title='aaa', content='bbb')
Session.add(question)
Session.flush()
user = question.user
### user is not None
user.question_count += 1
Session.commit()
Everything goes well.
But I want to use an event callback to do the same thing, as follows:
from sqlalchemy.orm.interfaces import MapperExtension

class Callback(MapperExtension):
    def after_insert(self, mapper, connection, instance):
        user = instance.user
        ### user is None !!!
        user.question_count += 1

class Question(Base):
    __tablename__ = "questions"
    __mapper_args__ = {'extension': Callback()}
    ....
Note in the "after_insert" method:
instance.user # -> Get None!!!
Why?
If I change that line to:
Session.query(User).filter_by(id=instance.user_id).one()
I can get the user successfully, but the user can't be updated!
Look, I have modified the user:
user.question_count += 1
But there is no UPDATE SQL printed in the console, and question_count is not updated.
I tried adding Session.flush() or Session.commit() in the after_insert() method, but both cause errors.
Is there something important I'm missing? Please help me, thank you.
The author of SQLAlchemy gave me a useful answer on a forum; I copy it here:
Additionally, a key concept of the unit of work pattern is that it organizes a full list of all INSERT, UPDATE, and DELETE statements which will be emitted, as well as the order in which they are emitted, before anything happens. When the before_insert() and after_insert() event hooks are called, this structure has been determined and cannot be changed in any way. The documentation for before_insert() and before_update() mentions that the flush plan cannot be affected at this point - only individual attributes on the object at hand, and those which have not been inserted or updated yet, can be affected here. Any scheme which would like to change the flush plan must use SessionExtension.before_flush.
However, there are several ways of accomplishing what you want here without modifying the flush plan.
The simplest is what I already suggested: use MapperExtension.before_insert() on the "User" class, and set user.question_count = len(user.questions). This assumes that you are mutating the user.questions collection, rather than working with Question.user to establish the relationship. If you happened to be using a "dynamic" relationship (which is not the case here), you'd pull the history for user.questions and count up what's been appended and removed.
The next way is to do pretty much what you think you want here, that is, implement after_insert on Question, but emit the UPDATE statement yourself. That's why "connection" is one of the arguments to the mapper extension methods:
def after_insert(self, mapper, connection, instance):
    connection.execute(
        users_table.update().
            values(question_count=users_table.c.question_count + 1).
            where(users_table.c.id == instance.user_id))
I wouldn't prefer that approach since it's quite wasteful for many new Questions being added to a single User. So yet another option, if User.questions cannot be relied upon and you'd like to avoid many ad-hoc UPDATE statements, is to actually affect the flush plan by using SessionExtension.before_flush:
class MySessionExtension(SessionExtension):
    def before_flush(self, session, flush_context):
        for obj in session.new:
            if isinstance(obj, Question):
                obj.user.question_count += 1
        for obj in session.deleted:
            if isinstance(obj, Question):
                obj.user.question_count -= 1
To combine the "aggregate" approach of the "before_flush" method with the "emit the SQL yourself" approach of the after_insert() method, you can also use SessionExtension.after_flush to count everything up and emit a single mass UPDATE statement with many parameters. We're likely well into the realm of overkill for this particular situation, but I presented an example of such a scheme at PyCon last year, which you can see at http://bitbucket.org/zzzeek/pycon2010/src/tip/chap5/sessionextension.py.
And, as I tried it, I found we should update user.question_count in after_flush.
user, being (I assume) a RelationshipProperty, is only populated after the flush, as it is only at this point that the ORM knows how to relate the two rows.
It looks like question_count is actually a derived property, being the number of Question rows for that user. If performance is not a concern, you could use a read-only property and let the mapper do the work:
@property
def question_count(self):
    return len(self.questions)
Otherwise you're looking at implementing a trigger, either at the database-level or in python (which modifies the flush plan so is more complicated).

How granular should data in memcached be?

Something I'm curious about: what would be the "most efficient" thing to cache when generating, say, an RSS feed? Or an API response (like the response to /api/films/info/a12345)?
For example, should I cache the entire feed and try to return that? In pseudo-code:
id = GET_PARAMS['id']
cached = memcache.get("feed_%s" % id)
if cached is not None:
    return cached
else:
    feed = generate_feed(id)
    memcache.put("feed_%s" % id, feed)
    return feed
Or should I cache the query's result and generate the document each time?
id = sanitise(GET_PARMS['id'])
query = query("SELECT title, body FROM posts WHERE id=%%", id)
cached_query_result = memcache.get(query.hash())
if cached_query_result:
    feed = generate_feed(cached_query_result)
    return feed
else:
    query_result = query.execute()
    memcache.put("feed_%s" % id, query_result)
    feed = generate_feed(query_result)
(Or, some other way I'm missing?)
In my experience, you should use multiple levels of cache. Implement both of your solutions (provided that this is not the only code that uses "SELECT title, body FROM posts WHERE id=%%"; if it is, use only the first one).
In the second version of the code, you memcache.get(query.hash()), but memcache.put("feed_%s" % id, query_result). This might not work the way you want it to (unless you have an unusual version of hash() ;) ).
I would avoid query.hash(). It's better to use something like posts-title-body-%id. Try deleting a video while it's stored in the cache under query.hash(): it can hang there for months as a zombie video.
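A sketch of that keying scheme with python-memcached (the client setup and DB helpers are assumptions):
import memcache

mc = memcache.Client(['127.0.0.1:11211'])

def get_post(post_id):
    key = "posts-title-body-%s" % post_id
    row = mc.get(key)
    if row is None:
        row = fetch_post_from_db(post_id)  # hypothetical DB helper
        mc.set(key, row)
    return row

def delete_post(post_id):
    delete_post_from_db(post_id)  # hypothetical DB helper
    mc.delete("posts-title-body-%s" % post_id)  # no zombie entries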
By the way:
id = GET_PARMS['id']
query = query("SELECT title, body FROM posts WHERE id=%%", id)
You take something from GET and put it right into the SQL query? That's bad (it will open you up to SQL injection attacks).
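The usual fix is a parameterized query, so the driver handles escaping (a sketch in Python DB-API style; placeholder syntax varies by driver):
cursor.execute("SELECT title, body FROM posts WHERE id = %s", (post_id,))
row = cursor.fetchone()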
It depends on the usage pattern, but all things being equal I'd vote for the first way, because you'll only do the work of generating the feed one time.
It really depends on what your app does... The only way to answer this is to get some performance numbers from your existing app. Then you can find the code that takes the largest amount of time and work on improving that one.
As others have suggested here, I'd profile your code and work out which part of the operation is the slowest or most expensive.
