We have a piece of code with a static field val format = DateTimeFormat.forPattern("yyyy-MM-dd")
This formatter instance will be used by concurrent threads to parse and print dates: format.parseDateTime("2013-09-24") and format.print(instant).
I learnt that in Scala you can write your code without caring about concurrency, provided that you only use immutable fields, but what about performance? Can it become a bottleneck if several threads use the same instance?
Thanks,
Your question is more related to Java than to Scala. If the formatter returned by forPattern is thread-safe, you can share it between many threads without it becoming a bottleneck.
Check the Javadoc to see whether the implementation is thread-safe. In your specific case, I will assume that you are using the Joda-Time library.
Extract from the DateTimeFormat Javadoc:
DateTimeFormat is thread-safe and immutable, and the formatters it returns are as well.
As a counterexample, see the SimpleDateFormat Javadoc:
Date formats are not synchronized. It is recommended to create separate format instances for each thread. If multiple threads access a format concurrently, it must be synchronized externally.
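To make the sharing pattern concrete, here is a minimal sketch. It assumes java.time.format.DateTimeFormatter from the JDK as a stand-in for the Joda-Time formatter, so that the example runs without extra dependencies; its Javadoc likewise documents it as immutable and thread-safe. The class name SharedFormatterDemo and the method roundTrips are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; java.time is used here instead of Joda-Time (assumption).
class SharedFormatterDemo {
    // One immutable formatter instance, shared by every thread without locking.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Parses and re-prints the same date in `tasks` concurrent tasks and
    // returns how many of them produced the original string again.
    static int roundTrips(int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<String> task =
                () -> FORMAT.format(LocalDate.parse("2013-09-24", FORMAT));
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<String> result : results) {
                if ("2013-09-24".equals(result.get())) {
                    ok++;
                }
            }
            return ok;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(4, 1000) + " of 1000 round trips matched");
    }
}
```

With a formatter that is not thread-safe, such as SimpleDateFormat, the same structure would need one instance per thread or external synchronization, as the quoted Javadoc says.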
Using a val just means that the variable's reference will not change after its declaration. See: What is the difference between a var and val definition in Scala?
Well, it looks like you have misinterpreted it: concurrency in Scala is very much like concurrency in Java (or any other language), with nothing particularly special about it. Scala just provides alternative libraries and syntactic sugar that get things done with much less boilerplate and, more importantly, get them done safely (e.g. Akka).
But the other concerns, such as the number of cores, thread-pool sizes and context switches, still have to be handled and taken care of.
Now, as for whether an immutable val accessed by multiple threads degrades performance: I don't think there should be any overhead, though I have no data to support that. If anything, performance might be good, since the processor can cache the object and the same object can be retrieved faster from another core.
Related
I'm familiarizing myself with NSPersistentContainer. I wonder if it's better to spawn an instance of the private context with newBackgroundContext every time I need to insert/fetch some entities in the background or create one private context, keep it and use for all background tasks through the lifetime of the app.
The documentation also offers the convenience method performBackgroundTask. I'm just trying to figure out the best practice here.
I generally recommend one of two approaches. (There are other setups that work, but these are two that I have used, and tested and would recommend.)
The Simple Way
You read from the viewContext, you write to the viewContext, and you only use the main thread. This is the simplest approach and avoids a lot of the multithreading issues that are common with Core Data. The problem is that disk access happens on the main thread, and if you are doing a lot of it, it could slow down your app.
This approach is suitable for small, lightweight applications. Any app that has fewer than a few thousand total entities and no bulk changes at once would be a good candidate. A simple todo list would be a good example.
The Complex Way
The complex way is to only read from the viewContext on the main thread and to do all your writing using performBackgroundTask inside a serial queue. Every block inside performBackgroundTask refetches any managedObjects that it needs (using objectIds), and all managedObjects that it creates are discarded at the end of the block. Each performBackgroundTask is transactional, and saveContext is called at the end of the block. A fuller description can be found here: NSPersistentContainer concurrency for saving to core data
This is a robust and functional core-data setup that can manage data at any reasonable scale.
The problem is that you must always make sure that the managedObjects are from the context you expect and are accessed on the correct thread. You also need a serial queue to make sure you don't get write conflicts. And you often need to use a fetchedResultsController to make sure entities are not deleted while you are holding pointers to them.
I have a question about the marker trait Sync after reading Extensible Concurrency with the Sync and Send Traits.
Java's "synchronized" means blocking, so I was very confused about how a Rust struct that implements Sync could be efficient when its methods are executed on multiple threads.
I searched but found no meaningful answer. I'm thinking about it this way: every thread will get the struct's reference synchronously (blocking), but call the method in parallel, is that true?
Java: Accesses to this object from multiple threads become a synchronized sequence of actions when going through this codepath.
Rust: It is safe to access this type synchronously through a reference from multiple threads.
(The two points above are not canonical definitions; they just demonstrate how similar words can be used in sentences to obtain different meanings.)
synchronized is implemented as a mutual exclusion lock at runtime. Sync is a compile-time promise about runtime properties of a specific type, which allows other types to depend on those properties through trait bounds. A Mutex just happens to be one way to provide Sync behavior. Immutable types usually provide this behavior too, without any runtime cost.
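The Java half of this comparison can be sketched as follows. This is only an illustration under stated assumptions: the classes Counter and Point and the method countWithThreads are hypothetical, and Java has no compile-time counterpart to Rust's Sync, so the sketch only shows that synchronized takes a lock at runtime while an immutable object can be read from many threads without one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class contrasting locked mutable state with immutable state.
class SynchronizedVsImmutable {
    // Mutable state: each synchronized method acquires the object's lock at runtime.
    static final class Counter {
        private int value;
        synchronized void increment() { value++; }
        synchronized int get() { return value; }
    }

    // Immutable state: final fields only, so concurrent reads need no lock.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Every thread reads the shared Point freely and bumps the locked Counter.
    static int countWithThreads(int threads, int perThread) {
        Counter counter = new Counter();
        Point shared = new Point(1, 0);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (shared.x == 1) {      // lock-free read of immutable data
                        counter.increment();  // serialized by the lock
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10000));
    }
}
```

Without the synchronized keyword on increment, updates could be lost; the reads of Point need no such protection because its fields never change after construction.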
Generally you shouldn't rely on words having exactly the same meaning in different contexts. Java IO stream != java collection stream != RxJava reactive stream ~= tokio Stream. C volatile != java volatile. etc. etc.
Ultimately the prose matters a lot more than the keywords, which are just shorthand.
I've been using the SearchScope.fetchObjects() method until now, and then it occurred to me that fetchRows might be the better choice in some cases (when you don't need metadata like class names, object stores etc.). Something tells me it might be faster, but I didn't find any arguments about which method to use in which case, and why.
Here is SearchScope documentation.
The difference in performance between fetchRows() and fetchObjects() is negligible in most cases. If you process a significant volume of data and are still concerned about performance, I suggest running a simple test.
The only reason for the existence of fetchRows() is the possibility of querying disparate object classes using JOIN.
I want to pass an instance of an object between two Ruby processes. Specifically, I want to pass an instance of a Selenium WebDriver from one process to another process. The reason I want to do this is because it takes a lot of time for Ruby to create this object, but I want it to be used by the other process.
I've found some related questions here and here that seem to point towards using DRb, but I've been unable to find any useful examples or sample code.
Is there a tool other than DRb that I should be using? Does anyone have an example similar to this that I could copy from?
It looks like you're going to have to use DRb, although the documentation for it seems to be lacking. There is however an interesting article here. You might also want to consider purchasing The dRuby Book by Masatoshi Seki to get a better idea of how to do this effectively.
Another option to investigate, if you are not looking at simultaneous access but just want to send the object from one process to another, is to serialize the object (that is, encode it in a way that Ruby can read back) with YAML (for a human-readable file) or Marshal (for a binary-encoded file) and send it through a pipe. This was mentioned in another answer that has since been deleted.
Note that either of these solutions requires modifying the Selenium code heavily, since the objects you want to manipulate support neither copying nor simultaneous access natively.
TL;DR
Most queue or distributed processes are going to require some sort of serialization to work properly. If you want to pass objects rather than messages, then this will be a limiting factor in how you approach the problem.
DRb
I don't know if you can marshal a WebDriver object. If you can't, then DRb may be a good choice for your distributed Ruby programs because it supports DRbObject references for things that can't be marshaled. There are some examples provided in the DRb documentation.
Selenium Wire Protocol
Depending on what you're really trying to do, it may be worth taking a closer look at using the remote bindings for the Remote WebDriver client/server, or Selenium's JSON Wire Protocol as an alternative to passing objects between processes.
Other Alternatives: Fixtures, Factories, Stubs, and Mocks
Whether or not these work in your specific case will depend a lot on why you want to pass objects instead of simply driving the remote server. If it's largely an issue of how long it takes to build your object, then the serialization/de-serialization cycle may not necessarily be faster in all cases.
You might want to revisit why your object is so slow to create. If gathering and processing the data for it is what's taking too long, you can use some sort of test fixture or factory to trim that time, either by using a smaller set of fixed data, or using a pre-serialized object that's optimized for speed.
You might also consider whether you actually need real data or objects for your test at all. In many cases, you can speed up your tests a lot by stubbing methods or creating mock objects that will return the values you need for your integration tests without needing to perform expensive calculations or long-running operations.
There are certainly cases where you need to drive the full stack and perform acceptance tests on real data. Even then, you may be able to devise a set of fixture data that will take less time or memory to process. It's certainly worth at least thinking about.
Hello, I want to use the Stanford parser with threads, but I don't know how to do that with a thread pool. I want all threads to do this:
LexicalizedParser.apply(Object in)
but I don't want to create a new LexicalizedParser object every time, because each one has to load the model:
lp = new LexicalizedParser("englishPCFG.ser.gz");
and that takes 2 seconds for each object.
What can I do?
Thanks!
I guess it's too late, but a thread-safe version is available: http://nlp.stanford.edu/software/lex-parser.shtml
You can use ThreadLocal.
It allows you to keep one parser instance per thread, so no parser instance is ever used from more than one thread.
Usually this shouldn't create more instances than the number of CPUs*cores you have.
For me that is ~4-5 instances (if I disable Hyper-Threading on my quad-core).
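A minimal sketch of the ThreadLocal approach combined with a fixed thread pool. SlowParser is a hypothetical stand-in for LexicalizedParser (its constructor is where the expensive model loading would happen), and the names PerThreadParserDemo and parseAll are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of the Stanford parser API.
class PerThreadParserDemo {
    // Stand-in for an expensive parser that is not safe to share between threads.
    static final class SlowParser {
        static final AtomicInteger CREATED = new AtomicInteger();
        SlowParser() { CREATED.incrementAndGet(); } // imagine loading the model here
        String apply(String sentence) { return "(ROOT " + sentence + ")"; }
    }

    // Each thread lazily creates its own parser on first use and then reuses it.
    static final ThreadLocal<SlowParser> PARSER = ThreadLocal.withInitial(SlowParser::new);

    // Parses `sentences` inputs on a pool of `threads` workers and returns
    // how many parser instances had to be created for this run.
    static int parseAll(int threads, int sentences) {
        int before = SlowParser.CREATED.get();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < sentences; i++) {
            final String sentence = "sentence " + i;
            pool.execute(() -> PARSER.get().apply(sentence));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return SlowParser.CREATED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("parsers created: " + parseAll(4, 100));
    }
}
```

Because the pool never has more than the requested number of threads, the number of parsers created is bounded by the pool size rather than by the number of sentences.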
P.S. Not related to StanfordNLP: sometimes poorly implemented classes contain static fields and modify them in a non-thread-safe way. A general, safe approach to parallelizing such implementations would be:
move the computation part into a separate process;
launch (CPUs*cores) such computation processes;
use an IPC technique for communicating between the main and background processes.