I read that it was a bad idea to modify a set within a parallelStream.
I can understand that there are concurrency issues, but is there any workaround?
Can we synchronize or would it lose the interest of the parallelStream?
I'm working on an image uploader and want to concurrently resize the image to different sizes. Once I've read the file as a []byte I'm passing a reference of that buffer to my resize functions that are being run concurrently.
Is this safe? My thinking is that passing a reference to the large file buffer for the resize functions to read will save me memory, and the concurrency will save me time.
Thank you!
Read-only data is usually fine for concurrent access, but you have to be very careful when passing references (pointers, slices, maps and so on) around. Today maybe no one is modifying them while you're also reading, but tomorrow someone may be.
If this is a throwaway script, you'll be fine. But if it's part of a larger program, I'd recommend future-proofing your code by judiciously protecting concurrent access. In your case something like a reader-writer lock could be a good match - all the readers will be able to acquire the lock concurrently, so the performance impact is negligible. And then if you do decide in the future this data could be modified, you already have the proper groundwork laid down w.r.t. safety.
Don't forget to run your code with the race detector enabled.
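A minimal sketch of the reader-writer lock idea, assuming the buffer is wrapped in a small type. `SharedImage`, `Read`, `Replace` and `resizeAll` are hypothetical names made up for illustration, and the "resize" step is a placeholder computation rather than real image processing:

```go
package main

import (
	"fmt"
	"sync"
)

// SharedImage is a hypothetical wrapper that guards the uploaded bytes.
type SharedImage struct {
	mu   sync.RWMutex
	data []byte
}

// Read runs fn while holding the read lock; many readers may hold it at once.
func (s *SharedImage) Read(fn func(data []byte)) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	fn(s.data)
}

// Replace swaps the buffer under the exclusive write lock.
func (s *SharedImage) Replace(data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data = data
}

// resizeAll stands in for the real resize functions: each goroutine only
// reads the shared buffer and writes to its own slot in results.
func resizeAll(img *SharedImage, widths []int) []int {
	results := make([]int, len(widths))
	var wg sync.WaitGroup
	for i, w := range widths {
		wg.Add(1)
		go func(i, w int) {
			defer wg.Done()
			img.Read(func(data []byte) {
				results[i] = len(data) * w / 100 // placeholder for real resizing
			})
		}(i, w)
	}
	wg.Wait()
	return results
}

func main() {
	img := &SharedImage{data: make([]byte, 1000)}
	fmt.Println(resizeAll(img, []int{25, 50, 100}))
}
```

Running this with `go run -race` exercises the race detector. If a writer is added later, it goes through `Replace` and the readers need no changes.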
I'm trying to make a state machine in which I want to build a retry and max attempts feature. Let me explain, so far I have this:
From SAVED, I want to go to VALIDATED, although if there is an error, it has to go to the AWAITING_VALIDATION state. After 3 minutes, it should try again to reach the VALIDATED state.
Have I set up the retry mechanism correctly?
After 3 attempts, I want to go back to the SAVED state (and pause the state machine). Is it possible to do that in a fancy way (e.g. using Spring State Machine) or do I have to do this manually using some kind of cache?
Thanks for your help
There are probably many ways to do these things with different machine configurations, but having said that, this is such a clearly presented question that I wanted to spend some time on it.
You are close, but you missed some things (I'd say tricks) needed to make this happen. The answer is to use extended state variables to add memory to the machine. These variables are usually used to limit the number of states needed to represent what the machine has to do. You need 3 loops, and you could probably create more states to represent each loop and transition (with specific guards) to those as needed. However, this will simply explode the state configuration if you need more loops, like 10 or 20 or 100+.
I created an example in ssm-sample3 which is showing how extended state variables and different guards and actions can be used to drive this specific flow.
Unfortunately there is a bug in the current 1.1.1.RELEASE which prevents you from transitioning directly from AWAITING_VALIDATION into the HAS_ERROR junction and looping until you pause into VALID using an anonymous transition with a guard (that's why the sample has a dummy TMP state, which is not needed with 1.2.x).
This is probably something I'd like to add as an example or faq to our ref docs.
Let me know if this helps.
In an environment where online processing and batch processing run simultaneously, is there a way to work out the right value for the open_cursors parameter?
I am looking into this so that I can optimize the open_cursors parameter in our testing environment. I have already checked the Oracle Performance Tuning guide, but I am still unable to understand how to arrive at this number.
Will running load runner tests help me get to this number? Please let me know if any more info is needed to help.
Do you actually have a problem? open_cursors is a limit on the number of cursors a single session can have open. It is not a system-wide limit. The proper value isn't influenced by load or what happens in some other session.
The default value is almost always more than sufficient for a properly written application. If you have an application that has long-running sessions and cursor leaks, increasing the value may let you run longer before you start to encounter problems while you find and address the cursor leaks, but if you have a leak you'll eventually run out no matter what your setting is. In the vast majority of cases, when people get an error related to open_cursors, the proper solution is to find and fix the bug that is leaking cursors rather than to change open_cursors.
As I understand, MATLAB cannot use pass by reference when sending arguments to other functions. I am doing audio processing, and I frequently have to pass waveforms as arguments into functions, and because MATLAB uses pass by value for these arguments, it really eats up a lot of RAM when I do this.
I was considering using global variables as a method to pass my waveforms into functions, but everywhere I read there seems to be a general opinion that this is a bad idea, for organization of code, and potentially performance issues... but I haven't really read any detailed answers on how this might impact performance...
My question: What are the negative impacts of using global variables (with sizes > 100MB) to pass arguments to other functions in MATLAB, both in terms of 1) performance and 2) general code organization and good practice.
EDIT: From @Justin's answer below, it turns out MATLAB does on occasion use pass by reference when you do not modify the argument within the function! From this, I have a second related question about global variable performance:
Will using global variables be any slower than using pass by reference arguments to functions?
MATLAB does use pass by reference, but also uses copy-on-write. That is to say, your variable will be passed by reference into the function (and so won't double up on RAM), but if you change the variable within the function, then MATLAB will create a copy and change the copy (leaving the original unaffected).
This fact doesn't seem to be too well known, but there's a good post on Loren's blog discussing it.
Bottom line: it sounds like you don't need to use global variables at all (which are a bad idea as @Adriaan says).
While relying on copy-on-write as Justin suggested is typically the best choice, you can easily implement pass by reference. With Matlab OOP being nearly as fast as traditional functions in Matlab 2015b or newer, using a handle class is a reasonable option.
I encountered an interesting use case of a global variable yesterday. I tried to parallelise a piece of code (1200 lines, multiple functions inside the main function, not written by me), using parfor.
Some weird errors came out and it turned out that this piece of code wrote to a log file, but used multiple functions to write to the log file. Rather than opening and closing the relevant log file every time a function wanted to write to it, which is very slow, the file ID was made global, so that all write-functions could access it.
For the serial case this made perfect sense, but when trying to parallelise this, using global apparently breaks the scope of a worker instance as well. So suddenly we had 4 workers all trying to write into the same log file, which resulted in some weird errors.
So all in all, I maintain my position that using global variables is generally a bad idea, although I can see its use in specific cases, provided you know what you're doing.
Using global variables in Matlab may increase performance a lot. This is because you can avoid copying data in some cases.
Before attempting to gain such performance tweaks, think carefully about the cost to your project in terms of the many drawbacks that global variables come with. There are also pitfalls to using globals that have bad consequences for performance, and those may be difficult (although possible) to avoid. Any code that is littered with globals tends to be difficult to comprehend.
If you want to see globals in use for performance, you can look at this real-time toolbox for optical flow that I made. This is the only project in native Matlab that is capable of real-time optical flow that I know of. Using globals was one of the reasons this was doable. It is also a reason why the code is quite difficult to grasp: Globals are evil.
That globals can be used this way is not an argument for their use; rather, it should be a hint that something should be done about Matlab's inflexible notions of workspace and inefficient alternatives to globals such as guidata/getappdata/setappdata.
I've been using ReadDirectoryChangesW to monitor a particular portion of the file system. It rather nicely provides a partial pathname to the file or directory which changed along with a clue about the nature of the change. This may have spoiled me.
I also need to monitor a particular portion of the registry, but it looks as if RegNotifyChangeKeyValue is very coarse. It will tell me that something under the given key changed, but it doesn't seem to want to tell me what that something might have been. Bummer!
The portion of the registry in question is arbitrarily deep, so enumerating all the sub-keys and calling RegNotifyChangeKeyValue for each probably isn't a hot idea because I'll eventually end up having to overcome MAXIMUM_WAIT_OBJECTS. Plus I'd have to adjust the set of keys I'd passed to RegNotifyChangeKeyValue, which would be a fair amount of effort to do without enumerating the sub-keys every time, which would defeat a fair amount of the purpose.
Any ideas?
Unfortunately, yes. You probably have to cache all the values of interest to your code, and update this cache yourself whenever you get a change trigger, or else set up multiple watchers, one on each of the individual data items of interest. As you noted, the second solution gets unwieldy very quickly.
If you can implement the required code in .NET, you can get the same effect more elegantly via RegistryEvent and its subclasses.
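The caching approach boils down to comparing a fresh snapshot with the cached one each time a notification fires. A rough sketch of that comparison, assuming the values of interest have already been read into a map from value path to data; `Snapshot` and `Diff` are hypothetical names, the sample paths are made up, and the registry enumeration itself is not shown:

```go
package main

import (
	"fmt"
	"sort"
)

// Snapshot maps a full value path to its data. How it is filled from the
// registry is outside this sketch.
type Snapshot map[string]string

// Diff reports which paths were added, removed or changed between the
// cached snapshot (old) and a fresh one (cur). Results are sorted.
func Diff(old, cur Snapshot) (added, removed, changed []string) {
	for path, v := range cur {
		prev, ok := old[path]
		switch {
		case !ok:
			added = append(added, path)
		case prev != v:
			changed = append(changed, path)
		}
	}
	for path := range old {
		if _, ok := cur[path]; !ok {
			removed = append(removed, path)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	sort.Strings(changed)
	return added, removed, changed
}

func main() {
	old := Snapshot{`Software\Demo\A`: "1", `Software\Demo\B`: "2"}
	cur := Snapshot{`Software\Demo\A`: "1", `Software\Demo\B`: "3", `Software\Demo\C`: "4"}
	added, removed, changed := Diff(old, cur)
	fmt.Println("added:", added, "removed:", removed, "changed:", changed)
}
```

After each notification the fresh snapshot replaces the cached one, so a single watcher on the top-level key is enough to recover what changed.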