Why do we send an id instead of the whole object to workers? - ruby

In Ruby the practice is to send an id instead of the object to workers. Isn't that a CPU-consuming process, since we have to retrieve the object from the database again?

Several reasons:
It saves space on the queue, and transfer time (app => queue, queue => workers).
It is often easier to fetch a fresh object from the database (as opposed to retrieving a cached copy from the queue).
Arguments to Resque.enqueue must be JSON-serializable, and complex objects cannot always be serialized.

If you think about it, the reasons are pretty obvious:
your object may change between the time the action is queued and handled, and in general you don't want an outdated object.
an id is a lot lighter to transport than a whole object, which you would need to serialize to JSON/YAML or some other format.
if you need the associations, the problem just got even worse :)
But in the end it depends on your application; if you only need some information you can send it to your worker directly without even using the full model.
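As an illustration only, here is the enqueue-the-id pattern sketched in Python with Celery (find_user and send_mail are hypothetical stand-ins; the Ruby/Resque equivalent passes user.id to Resque.enqueue in the same way):

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def send_welcome_email(user_id):
    # Re-fetch a fresh copy; the row may have changed since enqueueing.
    user = find_user(user_id)          # hypothetical database lookup
    if user is None:
        return  # the record was deleted while the job sat in the queue
    send_mail(user.email, 'Welcome!')  # hypothetical mailer

# Caller side: only a small, JSON-serializable integer crosses the queue.
send_welcome_email.delay(user.id)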

Related

Some data is lost when I use NSURLConnection to get data asynchronously

I process the data and do some UI work according to it in the -(void)connection:didReceiveData: method (I use a delegate as the callback), and I find that the UI work is never finished completely. Maybe the UI thread is still busy when the data is received, so some data is lost. You may suggest that I process the data in -(void)connectionDidFinishLoading: instead, but that causes other problems.
You've correctly suggested you need to process the received data in connectionDidFinishLoading:.
Before that, you need to collect all the received data (e.g. into an NSMutableData instance). Append the received data each time didReceiveData: is called (it may be called multiple times before loading finishes).
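A minimal Python sketch of why per-chunk processing loses data: a delimiter can arrive split across two didReceiveData: callbacks, so only the accumulated buffer can be searched reliably (the chunks here are invented):

chunks = [b"HEADER\r", b"\nBODY"]   # delimiter split across two callbacks

# Wrong: searching each chunk in isolation misses the split delimiter.
print(any(b"\r\n" in c for c in chunks))   # False

# Right: append each chunk to one buffer (the NSMutableData role),
# then search once everything has arrived.
buffer = bytearray()
for chunk in chunks:
    buffer.extend(chunk)                   # -connection:didReceiveData:
print(b"\r\n" in buffer)                   # True, in -connectionDidFinishLoading: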
The reason some data was lost is all about the method -rangeOfData:options:range: and the fact that I used it wrong. BTW, I think this method is very weird; the options accept only one of two values, NSDataSearchBackwards and NSDataSearchAnchored. Why is there no "NSDataSearchForwards" or something like that?

Plone 4.2: how to make PAS cache external user data

I'm implementing a PAS plugin that handles authentications against mailservers. Actually only DBMail is implemented.
I realized that the enumerateUsers function from the PAS plugin is called numerous times per request and requires my plugin to open/close an SQL connection for every (subsequent) request. Of course, this is very expensive.
The connections themselves are handled in a Plone tool, which is able to handle multiple different mailservers and delegates the enumerateUsers call to wrapper objects that represent registered servers.
My question is now: what sort of cache (OOBTree, Session?) should I use to provide temporary local storage for repeated enumerations and avoid subsequent SQL connections?
Another idea was to hook into the user creation process that takes place on the first login an external user makes, and completely "localize" the users.
A third idea was to store the needed data on the specific member, if possible.
What would be best practice here?
I'd cache the query results, indeed. You need to make a decision on how long to cache the results, and if stored long term, how to invalidate that cache or check for changes.
There are no best practices for these decisions, as they depend entirely on the type of data stored and the APIs of the backends. If they support some kind of freshness query, for example, then you store everything forever and poll the backend to see if the cache needs updating.
You can start with a simple request cache; query once per request, store it on the request object. Your cache will automatically be invalidated at the end of the request as the request object is cleaned up, the next request will be a clean slate.
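A minimal sketch of such a request cache, assuming a Zope-style request object with get/set (the key name and the _queryMailservers helper are hypothetical):

_marker = object()

def enumerateUsers(self, id=None, login=None, **kw):
    # Cache per request: the same enumeration within one request hits
    # the mailservers only once; the cache dies with the request.
    request = self.REQUEST
    key = '_user_enum_%s_%s' % (id, login)
    cached = request.get(key, _marker)
    if cached is _marker:
        cached = self._queryMailservers(id=id, login=login, **kw)
        request.set(key, cached)
    return cached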
If your backend users rarely change, you can cache information for longer, in a local cache. I'd use a volatile attribute on the plugin. Any attribute starting with _v_ is ignored by the persistence machinery. Thus, anything stored in a _v_ volatile attribute is both thread-local and only exists for the lifetime of the process, a restart of the server clears these automatically.
At the very least you should use an _v_ volatile attribute to store your backend SQL connections. That way they can stay open between requests, and can be re-used. Something like the following method would do nicely:
def _connection(self):
    # Return a backend connection
    if getattr(self, '_v_connection', None) is None:
        # Create connection here
        self._v_connection = yourdatabaseconnection
    return self._v_connection
You could also use a persistent attribute on your plugin to store your cache. This cache would be committed to the ZODB and persist across restarts. You then really need to work out how to invalidate the contents; store timestamps and evict data when too old, etc.
Your cache data structure depends entirely on your application needs. If you don't persist information, a dictionary (username -> information) could be more than enough. Persisted caches could benefit from using an OOBTree instead of a dictionary, as they reduce the chance of conflicts between different threads and are more efficient for large sets of data.
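A hedged sketch of the longer-lived variant: a _v_ volatile dictionary with timestamp eviction (_fetch_user_from_backend and the TTL value are hypothetical); to persist it instead, swap the dictionary for an OOBTree from BTrees.OOBTree and store it on a regular attribute.

import time

CACHE_TTL = 300  # seconds; tune to how often backend users change

def _cached_user(self, login):
    # Volatile: per process, never persisted, gone after a restart.
    cache = getattr(self, '_v_user_cache', None)
    if cache is None:
        cache = self._v_user_cache = {}
    entry = cache.get(login)
    if entry is not None:
        timestamp, data = entry
        if time.time() - timestamp < CACHE_TTL:
            return data
    data = self._fetch_user_from_backend(login)  # hypothetical SQL lookup
    cache[login] = (time.time(), data)
    return data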
Whatever you do, you do not need to use a Session. Sessions are prone to conflicts, do not scale well, and are in any case not the place to store a cache of this kind.

ColdFusion: is it better to keep just the user_id in the session, or the whole user object?

I've got a cfc to handle the user object. My question is: is it better to store just the user_id in the session and create the user object anew with each request? Or is it better to store the whole user object in the session?
Here are my thoughts either way:
If I store the whole object in the session:
There will be potentially less processor overhead
There will be potentially more memory overhead
All of the methods/functions are stored in the actual object, and new functions that I update in the cfc will not be available unless users log out and back in, or unless I devise some way to make the object refresh itself.
There could potentially be mutex or lock problems if I'm messing with the object via concurrent ajax calls
If I store just the user_id in the session:
I'll have to create the user object with each page request (potentially more processor overhead)
There will be potentially less memory overhead
There won't be a chance for mutex/lock/race conditions since each request will have its own copy of the user object
Updates to the CFC model itself will be immediately recognized across the system and users wouldn't have to log out and back in
Is there a normal practice for this sort of thing? Am I over-thinking it?
All of the CF apps I've written were targeted at high traffic levels and high availability, so we never had the luxury of being able to think about single-server practices.
So, in my experience, I always had to a) allow for multiple load-balanced servers, and b) avoid sticky-sessions on the load balancer for a number of reasons. Therefore, we needed to, at the very least, have a server become part of a cluster on the fly and pick up mid-session traffic.
So, we always pulled "session" data from a shared datastore on every request.
My suggestion is to implement a session facade.
This affords you the option to change how you persist session data (like the user record) without changing the rest of your app.
You can choose, behind the scenes, to store everything in the session scope, load it up for every request, do a hybrid, use a key-value store, whatever.
You can choose whether to eager-load data, or lazy-load data, or any mix in between, and the rest of the app doesn't need to be aware of what you've done.
On Race Conditions
If you're concerned about race conditions then I would suggest using named locks around data commit and access. This is another bonus of using a facade - your application code doesn't need to know about this, and you can choose to put locks around certain objects, as opposed to locking the whole session.
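To make the idea concrete, here is a rough sketch of such a facade in Python rather than CFML (SessionFacade and its method names are invented for illustration); the app talks only to the facade, which hides both the storage choice and the named locks:

import threading

class SessionFacade(object):
    """Hides where and how session data is stored, and locks per key."""

    def __init__(self, store):
        self._store = store              # session scope, key-value store, ...
        self._locks = {}                 # one named lock per key
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get_user(self, user_id, loader):
        # Lazy-load: only hit the database the first time it is asked for.
        with self._lock_for('user'):
            user = self._store.get('user')
            if user is None:
                user = loader(user_id)   # hypothetical DB loader
                self._store['user'] = user
            return user

    def set(self, key, value):
        with self._lock_for(key):
            self._store[key] = value

Swapping the dict-backed store for a shared datastore later would touch only the facade, which is exactly the point.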
You haven't indicated whether you're using an ORM, so this is a general answer.
For typical applications, I recommend instantiating the user object into the session scope. There's a big downside to creating the object anew with each request that you didn't include in your list: changes to the user object's properties and state will not persist across requests unless you intend to flush the user object's state to your persistence layer (e.g. database) on every hit. That is likely to be a much more expensive operation than object instantiation, and it doesn't necessarily insulate you from the kinds of problems you're thinking about with respect to ajax calls, race conditions, etc -- it just transfers the manifestation of those problems to the persistence layer, where your object's data could be in an unpredictable state.
Since every new request would be an "implicit save", you would also have to design your "ephemeral" object to be able to persist itself regardless of whether it's in a valid state (imagine the case of a multi-page form that modifies some aspect of the user object).
For session-stored objects, your concerns about memory can be mitigated by careful design practices. For instance, if your user has many tasks, and each task has many items, it might be a bad idea to instantiate and compose all those objects into your user object (i.e., lazy loading would be a better approach than eager loading).
If you really must be able to change your CFCs on the fly, you can achieve that goal even with session-stored objects. One way is to store a version flag in both the application and session scopes. With each request, your app would compare those flags. When they differ, the app would run a session-reload routine that snapshots current properties, rebuilds the session-stored objects, and finally updates the session flag to match the application flag.
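A hedged sketch of that handshake, using Python dictionaries as stand-ins for the application and session scopes (User.to_dict and the flag names are hypothetical):

def ensure_fresh_session(application, session):
    # Compare the code version the session objects were built from
    # against the version the application is currently running.
    if session.get('cfc_version') == application['cfc_version']:
        return
    snapshot = session['user'].to_dict()   # snapshot current properties
    session['user'] = User(**snapshot)     # rebuild from the new class
    session['cfc_version'] = application['cfc_version']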
This is piggy-backing partially off Ken Redler's answer but I don't have enough reputation to comment.
The way we do it, and the way I prefer, is to store the user data in Session as a struct. Then on request start, our Auth Model creates the user object in the Request scope and overrides any default values with the Session data. There are a few advantages to this:
Less hits to the database, less CPU
Always run newest code without a complex custom system ensuring that
Clustered environment friendly (complex objects in Session can't be clustered)
Can add or remove properties without corruption (assuming your User object only updates dirty columns)
Also, if you're using CF9, one of the features they were really proud of is how much they optimized object instantiation. If you haven't, test it yourself!
It depends.
If you have a lot of traffic - in the thousands of unique visitors per minute range - the memory overhead of storing your User.cfc in the session will eventually weigh you down. This can be easily overcome by throwing hardware at it (more memory for a while, eventually more servers and a hardware load balancer). Of course popularity is a good problem to have.
If you seem to have a CPU, network or other bottleneck in your database space, you may want to have the object cached in session memory so that you have fewer hits to the database.
Why do I mention these scenarios? You may be prematurely optimizing - don't fix a problem that you don't have. Don't optimize your memory, CPU and database access until those are, or soon will be, problems.
Now from an architectural best practice - not from an optimized "what's best for my processor" - well, I can only say: It depends.
Truthfully, neither way is wrong. If you are going to find yourself needing to check credentials against your database on every request, don't cache it. If you like the feel of an object in the session, then cache it. Because you know your own domain, you can probably go back and forth all day on why you should or should not cache the user object in the session. If it's going to make it easier, do it. If it's going to make it harder, don't.
I would just warn you against doing something incredibly convoluted or anything that is not immediately obvious to a developer looking at your application - the more you write, the more you have to maintain forever, the more your co-workers will associate your name with evil.
Finally, last note, if this is a vote - I say you cache it. It makes sense and always feels good to call session.user.hasRole("xyz") or the like.

Is NSPasteboard thread-safe?

Is it safe to write data to an NSPasteboard object from a background thread? I can't seem to find a definitive answer anywhere. I think the assumption is that the data will be written to the pasteboard before the drag begins.
Background:
I have an application that is fetching data from Evernote. When the application first loads, it gets the metadata for each note, but not the note content. The note stubs are then listed in an outline view. When the user starts to drag a note, the notes are passed to the background thread that handles getting the note content from Evernote. Having the main thread block until the data has been fetched results in a significant delay and a poor user experience, so I have the [outlineView:writeItems:toPasteboard:] function return YES while the background thread processes the data and invokes the main thread to write the data to the pasteboard object. If the note content gets transferred before the user drops the note somewhere, everything works perfectly. If the user drops the note somewhere before the data has been processed... well, everything blocks forever. Is it safe to just have the background thread write the data to the pasteboard?
You can promise the data to the pasteboard without actually having the data yet.
One way is to declare the type of the data on the pasteboard, passing yourself as the pasteboard's owner, and respond to a pasteboard:provideDataForType: message by providing the data (blocking, if necessary, until the data either arrives or fails to arrive). This means that you'll need to remember which objects were copied (by stashing them in an array, for example) so you can extract/generate the data from them when the promise comes due.
The other way, referenced in Harald Scheirich's answer, is to make your model objects conform to the NSPasteboardWriting protocol, ideally in a category (to separate interface-independent logic from Mac-specific logic). This is much cleaner than the old way, but requires Mac OS X 10.6 and later.
With NSPasteboardWriting, you'll implement promises by having the model objects' writingOptionsForType:pasteboard: method return the NSPasteboardWritingPromised option. Their pasteboardPropertyListForType: method will return the data, or at least try to—as before, this method should block until the data either arrives or fails to arrive.
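Here is a rough sketch of the older promise mechanism, written with PyObjC (the Objective-C equivalent is what you would actually use; NoteType and fetch_note_content are hypothetical):

import objc
from Foundation import NSObject

NoteType = "com.example.note-content"   # hypothetical custom pasteboard type

class NotePromiser(NSObject):
    """Owner object: promises note data, provides it when the drop happens."""

    def initWithNotes_(self, notes):
        self = objc.super(NotePromiser, self).init()
        if self is None:
            return None
        self.notes = notes               # stash the dragged stubs for later
        return self

    def declareOnPasteboard_(self, pboard):
        # Promise the type now; no data is written yet, we are the owner.
        pboard.declareTypes_owner_([NoteType], self)

    # AppKit calls this when the drop target actually asks for the data.
    def pasteboard_provideDataForType_(self, pboard, ptype):
        content = fetch_note_content(self.notes)   # blocks until fetched
        pboard.setString_forType_(content, ptype)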
Oh, and to answer the question in the title (“Is NSPasteboard thread-safe?”): There's no specific answer in the Thread Safety Summary, but there is this general statement:
… mutable objects are generally not thread-safe. To use mutable objects in a threaded application, the application must synchronize appropriately.
I would consider an NSPasteboard to be a mutable object, so no.
In practice, this isn't a problem: You typically only work with NSPasteboard in response to an action message (e.g., copy:), a drag, or a service invocation, and those all only happen on the main thread anyway. For them to happen on a secondary thread, you would have to explicitly send such messages yourself from code running on a secondary thread, in which case you are already doing something very wrong.
Conjecture:
I think your problem has nothing to do with threading, but with the fact that by returning YES you told the system that the data is ready. Have you tried moving your data into a custom class supporting NSPasteboardWriting and NSPasteboardReading? This way the accessor to your data can block until the data is ready.
See the Pasteboard Documentation

Core Data is using a lot of memory

I have a data model which is sort of like this simplified drawing:
[Data model diagram: http://dl.dropbox.com/u/545670/thedatamodel.png]
It's a little weird, but the idea is that the app manages multiple accounts/identities a person may have on a single messaging system. Each account is associated with one user on the system, and each message could potentially be seen by/sent to multiple accounts (but they have a globally unique ID, hence the messageID property, which is used on import to fetch message objects that may have already been downloaded and imported by a prior session).
The app is used from a per-account point of view - what I mean is that you choose which account you want to use, then you see the messages and stuff from that account's point of view in your window. So I have the messages attached to the account so that I can easily get the messages that should be shown using a fetch like this:
fetch.predicate = [NSPredicate predicateWithFormat:@"%@ IN accounts", theAccount];
fetch.sortDescriptors = [NSArray arrayWithObject:[[NSSortDescriptor alloc] initWithKey:@"date" ascending:NO]];
fetch.fetchLimit = 20;
This seems like the right way to set this up in that the messages are shared between accounts and if a message is marked as read by one, I want it seen as being read by the other and so on.
Anyway, after all this setup, the big problem is that memory usage gets a little crazy. When I set up a test case where it's importing hundreds of messages into the system, and periodically re-fetching (using the fetch mentioned above) and showing them in a list (only the last 20 are referenced by the list), memory just keeps climbing: 60MB... 70MB... 100MB... etc.
I tracked it down to the many-to-many relation between Account and Message. Even with garbage collection on, the managed objects are still being referenced strongly by the account's messages relationship property. I know this because I put a log in the finalize of my Message instance and never see it - but if I periodically reset the context or do refreshObject:mergeChanges: on the account object, I see the finalize messages and memory usage stays pretty consistent (although still growing somewhat, but considering I'm importing stuff, that's to be expected). The problem is that I can't really reset the context or the account object all the time because that really messes up observers that are observing other attributes of the account object!
I might just be modeling this wrong or thinking about it wrong, but I keep reading over and over that it's important to think of Core Data as an object graph and not a database. I think I've done that here, but it seems to be causing trouble. What should I do?
Use the Object Graph instrument. It'll tell you all of the ownerships keeping an object alive.
Have you read the section of the docs on this topic?
