How to know what is NOT thread-safe in Ruby?

Starting from Rails 4, everything has to run in a threaded environment by default. This means that all of the code we write AND ALL the gems we use are required to be thread-safe.
So, I have a few questions on this:
What is NOT thread-safe in Ruby/Rails, and what is thread-safe in Ruby/Rails?
Is there a list of gems that are known to be thread-safe, or vice versa?
Is there a list of common code patterns that are NOT thread-safe, for example @result ||= some_method?
Are the data structures in the Ruby core, such as Hash, thread-safe?
On MRI, where there is a GVL/GIL, which means only one Ruby thread can run at a time except for IO, does the thread-safety change affect us?

None of the core data structures are thread safe. The only thread-safe one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like @n = 0; 3.times { Thread.start { 100.times { @n += 1 } } }, i.e. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single-core system; it does not change the fundamental issues of writing correct concurrent programs.
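One common way to make the shared counter above deterministic is to put the whole read-modify-write step under a Mutex. The following is a minimal sketch; the class name SafeCounter is hypothetical and only used for illustration.

```ruby
# Minimal sketch: a counter whose read-modify-write step is guarded by a Mutex.
# SafeCounter is a hypothetical name, not part of Ruby or Rails.
class SafeCounter
  def initialize
    @n = 0
    @lock = Mutex.new
  end

  # The "read, add one, write back" sequence runs entirely under the lock,
  # so no other thread can interleave between the read and the write.
  def increment
    @lock.synchronize { @n += 1 }
  end

  # Reads also take the lock so they never observe a half-finished update.
  def value
    @lock.synchronize { @n }
  end
end

counter = SafeCounter.new
threads = 3.times.map { Thread.new { 100.times { counter.increment } } }
threads.each(&:join)
puts counter.value   # prints 300
```

The lock makes the result independent of how the scheduler interleaves the threads, at the cost of serializing every increment.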
Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?
In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. @n += 1 is not thread safe, because it is multiple operations. @n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). @n ||= 1 is not, and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless @started; @started = true, which is not thread safe at all.
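The check-then-set mistake mentioned above can be avoided by doing the check and the assignment under the same lock. This is only a sketch under assumptions: the class name OnceStarter and its start_count reader are hypothetical and exist just to make the behaviour observable.

```ruby
# Minimal sketch of a "start only once" guard.
# OnceStarter and start_count are hypothetical names for illustration.
class OnceStarter
  attr_reader :start_count

  def initialize
    @started = false
    @start_count = 0
    @lock = Mutex.new
  end

  # Returns true for the single caller that performed the start,
  # false for every caller that arrived after it.
  def start
    @lock.synchronize do
      # The check and the assignment happen under the same lock, so two
      # threads can never both see @started == false.
      return false if @started
      @started = true
      @start_count += 1
    end
    true
  end
end

starter = OnceStarter.new
results = 20.times.map { Thread.new { starter.start } }.map(&:value)
puts results.count(true)   # prints 1
```

Returning from inside synchronize is fine: the mutex is released on the way out of the block.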
I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); @x = x; end is not side-effect free.
One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:
class Thing
  attr_reader :stuff
  def initialize(initial_stuff)
    @stuff = initial_stuff
    @state_lock = Mutex.new
  end
  def add(item)
    @state_lock.synchronize do
      @stuff << item
    end
  end
end
An instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
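One possible way to plug that leak is to copy the array on the way in and hand out frozen snapshots on the way out. The sketch below assumes a hypothetical class name, SafeThing, and note that dup is a shallow copy, so mutable items inside the array would still be shared.

```ruby
# Minimal sketch: the internal array never escapes the object.
# SafeThing is a hypothetical name; the copies made here are shallow.
class SafeThing
  def initialize(initial_stuff)
    # Copy the argument so the caller's array is not shared with us.
    @stuff = initial_stuff.dup
    @state_lock = Mutex.new
  end

  def add(item)
    @state_lock.synchronize { @stuff << item }
    self
  end

  # Return a frozen snapshot instead of the internal array itself.
  def stuff
    @state_lock.synchronize { @stuff.dup.freeze }
  end
end

original = [1, 2]
thing = SafeThing.new(original)
thing.add(3)
original << 99      # does not affect the object's internal state
p thing.stuff       # prints [1, 2, 3]
```

Callers that try to modify the snapshot get a FrozenError instead of silently corrupting shared state.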
Another classic Ruby example is this:
STANDARD_OPTIONS = {:color => 'red', :count => 10}
def find_stuff
  @some_service.load_things('stuff', STANDARD_OPTIONS)
end
find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
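A hedged sketch of one defence against that problem: freeze the constant so that accidental mutation fails loudly, and pass a copy to code that may want to modify its argument. The constant name FROZEN_OPTIONS and this load_things stand-in are hypothetical; the stand-in only mimics the destructive behaviour described above.

```ruby
# Freezing the hash turns silent corruption into an immediate error.
# FROZEN_OPTIONS is a hypothetical name used for this illustration.
FROZEN_OPTIONS = { color: 'red', count: 10 }.freeze

# Hypothetical stand-in for the service method described above: it treats
# the hash it receives as its own and removes a key from it.
def load_things(kind, options)
  color = options.delete(:color)
  "#{kind} in #{color}"
end

def find_stuff
  # dup returns an unfrozen shallow copy that the callee may mutate freely.
  load_things('stuff', FROZEN_OPTIONS.dup)
end

puts find_stuff   # prints "stuff in red"
puts find_stuff   # prints "stuff in red" again; the constant is unchanged
```

Passing FROZEN_OPTIONS directly to load_things would raise a FrozenError at the delete call, which is much easier to diagnose than a constant that quietly changes.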
If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
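As an illustration of avoiding shared mutable state altogether, each thread can work on its own data and hand back a result through Thread#value, so nothing is mutated from more than one thread. The method name parallel_sum is a hypothetical helper, not something Rails provides.

```ruby
# Minimal sketch: no variable is written by more than one thread.
# parallel_sum is a hypothetical helper name.
def parallel_sum(chunks)
  threads = chunks.map do |chunk|
    # Each thread receives its own chunk and only touches local variables.
    Thread.new(chunk) { |local_chunk| local_chunk.sum }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.sum(&:value)
end

puts parallel_sum([[1, 2, 3], [4, 5], [6]])   # prints 21
```

Because the only communication is the return value of each thread, there is nothing to lock.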
It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes to other gems, assume that they are not thread safe unless they say that they are, and if they say that they are, assume that they are not, and look through their code (but just because you see that they do things like @n ||= 1 does not mean that they are not thread safe, that's a perfectly legitimate thing to do in the right context -- you should instead look for things like mutable state in global variables, how it handles mutable objects passed to its methods, and especially how it handles options hashes).
Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.

In addition to Theo's answer, I'd add a couple of problem areas to look out for in Rails specifically, if you're switching to config.threadsafe!
Class variables:
@@i_exist_across_threads
ENV:
ENV['DONT_CHANGE_ME']
Threads:
Thread.start

starting from Rails 4, everything would have to run in threaded environment by default
This is not 100% correct. Thread-safe Rails is just on by default. If you deploy on a multi-process app server like Passenger (community) or Unicorn there will be no difference at all. This change only concerns you if you deploy on a multi-threaded environment like Puma or Passenger Enterprise > 4.0
In the past, if you wanted to deploy on a multi-threaded app server, you had to turn on config.threadsafe!. It is the default now, because everything it did either had no effect on, or also applied to, a Rails app running in a single process (Prooflink).
But if you do want all the Rails 4 streaming benefits and other real-time stuff of the multi-threaded deployment, then maybe you will find this article interesting. As @Theo said, for a Rails app, you actually just have to avoid mutating static state during a request. While this is a simple practice to follow, unfortunately you cannot be sure about this for every gem you find. As far as I remember, Charles Oliver Nutter from the JRuby project had some tips about it in this podcast.
And if you want to write a purely concurrent Ruby program, where you need some data structures that are accessed by more than one thread, you may find the thread_safe gem useful.

Related

Why a+=1 is a thread-safe operation in ruby?

Code Snippet:
a = 0
Array.new(50){
  Thread.new {
    500_000.times { a += 1 }
  }
}.each(&:join)
p "a: #{a}"
Result: a = 25_000_000.
In my understanding, (MRI) Ruby uses a GIL, so only one Ruby thread can get the CPU at a time, but when a thread switch happens, some data of the Ruby thread is stored so that the thread can be restored later. So, in theory, a += 1 may not be thread-safe.
But the result above suggests I'm wrong. Does Ruby make a += 1 atomic? If so, which operations can be considered thread-safe?
It's Neither Atomic Nor Thread-Safe
In your example, the apparent consistency is largely due to the global interpreter lock, but is also partly due to the way your Ruby engine and your code sequences (theoretically) asynchronous threads. You are getting consistent results because each loop in each thread is simply incrementing the current value of a, which is not a block-local or thread-local variable. With threads on the YARV virtual machine, only one thread at a time is inspecting or setting the current value of a, but I wouldn't really say that it's an atomic operation. It's just a byproduct of the engine’s lack of real-time concurrency between threads, and the underlying implementation of the Ruby virtual machine.
If you're concerned about preserving thread-safety in Ruby without relying on idiosyncratic behaviors that just happen to appear consistent, consider using a thread-safe library like concurrent-ruby. Otherwise, you may be relying on behaviors that aren't guaranteed across Ruby engines or Ruby versions.
For example, three consecutive runs of your code in JRuby (which does have concurrent threads) will generally yield different results on each run. For example:
#=> "a: 3353241"
#=> "a: 3088145"
#=> "a: 2642263"
Ruby doesn't have a well-defined Memory Model, so in some philosophical sense, the question is non-sensical, since without a Memory Model, the term "thread-safe" isn't even defined. For example, the ISO Ruby Language Specification doesn't even document the Thread class.
The way that people write concurrent code in Ruby without a well-defined Memory Model is essentially "guess-and-test". You guess what the implementations will do, then you test as many versions of as many implementations on as many platforms and as many operating systems on as many CPU architectures and as many different system sizes as possible.
As you can see in Todd's answer, even just testing one other implementation already reveals that your conclusion was wrong. (Pro tip: never make a generalization based on a sample size of 1!)
The alternative is to use a library that has already done the above, such as the concurrent-ruby library mentioned in Todd's answer. They do all the testing I mentioned above. They also work closely with the maintainers of the various implementations. E.g. Chris Seaton, the lead developer of TruffleRuby is also one of the maintainers of concurrent-ruby, and Charlie Nutter, the lead developer of JRuby, is one of the contributors.
But the result above turns out I'm wrong.
The results are misleading. In Ruby, a += 1 is a shorthand for:
a = a + 1
With a + 1 being a method call that occurs before the assignment. Since integers are objects in Ruby, we can override that method:
module ThreadTest
  def +(other)
    super
  end
end
Integer.prepend(ThreadTest)
The above code doesn't do anything useful, it just calls super. But merely adding a Ruby implementation on top of the built-in C implementation is enough to break (or fix) your test:
Integer.prepend(ThreadTest)
a = 0
Array.new(50){
  Thread.new {
    500_000.times { a += 1 }
  }
}.each(&:join)
p "a: #{a}"
#=> "a: 11916339"
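The lost update behind a result like that can be reproduced deterministically, without any threads, by spelling out the read and write halves of a = a + 1 by hand. This is only a simulation of one possible interleaving; the method and variable names are hypothetical.

```ruby
# Simulates two threads that each run `a += 1`, interleaved so that both
# read the old value before either one writes. No real threads are used.
def lost_update_demo
  a = 0
  read_by_first  = a        # "thread 1" evaluates a + 1: reads 0
  read_by_second = a        # "thread 2" is scheduled and also reads 0
  a = read_by_first + 1     # "thread 1" assigns 1
  a = read_by_second + 1    # "thread 2" assigns 1, overwriting the update
  a
end

puts lost_update_demo   # prints 1, although two increments were attempted
```

Whether a real runtime ever switches threads between the read and the write is an implementation detail, which is why the original test can pass on one engine and fail on another.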

How do Erlang actors differ from OOP objects?

Suppose I have an Erlang actor defined like this:
counter(Num) ->
    receive
        {From, increment} ->
            From ! {self(), new_value, Num + 1},
            counter(Num + 1)
    end.
And similarly, I have a Ruby class defined like this:
class Counter
  def initialize(num)
    @num = num
  end
  def increment
    @num += 1
  end
end
The Erlang code is written in a functional style, using tail recursion to maintain state. However, what is the meaningful impact of this difference? To my naive eyes, the interfaces to these two things seem much the same: You send a message, the state gets updated, and you get back a representation of the new state.
Functional programming is so often described as being a totally different paradigm than OOP. But the Erlang actor seems to do exactly what objects are supposed to do: Maintain state, encapsulate, and provide a message-based interface.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
I suspect there are bigger consequences to the functional/OOP dichotomy than I'm seeing. Can anyone point them out?
Let's put aside the fact that the Erlang actor will be scheduled by the VM and thus may run concurrently with other code. I realize that this is a major difference between the Erlang and Ruby versions, but that's not what I'm getting at. Concurrency is possible in other languages, including Ruby. And while Erlang's concurrency may perform very differently (sometimes better), I'm not really asking about the performance differences.
Rather, I'm more interested in the functional-vs-OOP side of the question.
In other words, when I am passing messages between Erlang actors, how is it different than when I'm passing messages between Ruby objects?
The difference is that in traditional languages like Ruby there is no message passing, only method calls that are executed in the caller's thread, and this may lead to synchronization problems in a multithreaded application. All threads have access to each other's memory.
In Erlang all actors are independent, and the only way to change the state of another actor is to send it a message. No process has access to the internal state of any other process.
IMHO this is not the best example for FP vs OOP. Differences usually manifest in accessing/iterating and chaining methods/functions on objects. Also, probably, understanding what the "current state" is works better in FP.
Here, you put two very different technologies against each other. One happens to be functional, the other object-oriented.
The first difference I can spot right away is memory isolation. Messages are serialized in Erlang, so it is easier to avoid race conditions.
The second is memory management details. In Erlang, message handling is divided underneath between Sender and Receiver. There are two sets of locks on the process structure held by the Erlang VM. Therefore, while the Sender sends the message, it acquires a lock which does not block the main process operations (which are guarded by the MAIN lock). To sum up, it gives Erlang a more soft real-time nature vs totally random behaviour on the Ruby side.
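To make the contrast concrete, here is a rough Ruby approximation of the Erlang counter, using a Queue as the mailbox so that the state lives in a local variable of a single thread. It is a sketch under assumptions, not how Erlang processes are implemented; CounterActor and its method names are hypothetical.

```ruby
# Minimal sketch of an actor-style counter in Ruby.
# CounterActor is a hypothetical name; the state is a local variable that
# only the actor's own thread can reach.
class CounterActor
  def initialize(num)
    @mailbox = Queue.new
    @thread = Thread.new do
      state = num
      # Process one message at a time until asked to stop.
      while (message = @mailbox.pop) != :stop
        reply_to, command = message
        state += 1 if command == :increment
        reply_to.push(state)
      end
    end
  end

  # Sends a message and blocks until the actor replies with the new value.
  def increment
    reply = Queue.new
    @mailbox.push([reply, :increment])
    reply.pop
  end

  def stop
    @mailbox.push(:stop)
    @thread.join
  end
end

actor = CounterActor.new(0)
4.times.map { Thread.new { 25.times { actor.increment } } }.each(&:join)
puts actor.increment   # prints 101
actor.stop
```

Callers never touch the counter directly; they can only enqueue messages, which the actor handles strictly one after another.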
Looking from the outside, actors resemble objects. They encapsulate state and communicate with the rest of the world via messages to manipulate that state.
To see how FP works, you must look inside an actor and see how it mutates state. Your example, where the state is an integer, is too simple. I don't have the time to provide a full example, but I'll sketch the code. Normally, an actor loop looks like the following:
loop(State) ->
Message = receive
...
end,
NewState = f(State, Message),
loop(NewState).
The most important difference from OOP is that there are no variable mutations i.e. NewState is obtained from the State and may share most of the data with it, but the State variable always remains the same.
This is a nice property, since we never corrupt the current state. Function f will usually perform a series of transformations to turn State into NewState. Only if/when it completely succeeds do we replace the old state with the new one by calling loop(NewState).
So the important benefit is consistency of our state.
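The same idea can be tried in Ruby with frozen values: a transition function builds a new state and the old one is only replaced when the function succeeds. This is a sketch with hypothetical names (next_state, run_loop) and a made-up balance example, not code from the answer.

```ruby
# Hypothetical transition function: it never mutates `state`. It returns a
# new frozen hash, or raises before anything has been replaced.
def next_state(state, message)
  kind, amount = message
  case kind
  when :deposit
    state.merge(balance: state[:balance] + amount).freeze
  when :withdraw
    raise ArgumentError, 'insufficient funds' if amount > state[:balance]
    state.merge(balance: state[:balance] - amount).freeze
  else
    state
  end
end

# Plays the role of loop(State): feeds each message to next_state and keeps
# the previous state whenever a step fails.
def run_loop(initial_state, messages)
  messages.reduce(initial_state) do |state, message|
    begin
      next_state(state, message)
    rescue ArgumentError
      state
    end
  end
end

initial = { balance: 10 }.freeze
final = run_loop(initial, [[:deposit, 5], [:withdraw, 100], [:withdraw, 3]])
puts initial[:balance]   # prints 10
puts final[:balance]     # prints 12
```

The failed withdrawal of 100 leaves the state at 15, so the later withdrawal of 3 yields 12, and the initial hash is never modified.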
The second benefit I found is cleaner code, but it takes some time to get used to. Generally, since you cannot modify a variable, you will have to divide your code into many very small functions. This is actually nice, because your code will be well factored.
Finally, since you cannot modify a variable, it is easier to reason about the code. With mutable objects you can never be sure whether some part of your object will be modified, and it gets progressively worse if using global variables. You should not encounter such problems when doing FP.
To try it out, you should try to manipulate some more complex data in a functional way by using pure erlang structures (not actors, ets, mnesia or proc dict). Alternatively, you might try it in ruby with this
Erlang includes the message passing approach of Alan Kay's OOP (Smalltalk) and the functional programming from Lisp.
What you describe in your example is the message approach of OOP. The Erlang processes sending messages are a concept similar to Alan Kay's objects sending messages. By the way, you can also find this concept implemented in Scratch, where objects running in parallel send messages to each other.
The functional programming is how you code the processes. For instance, variables in Erlang cannot be modified. Once they have been set, you can only read them. You also have a list data structure which works pretty much like Lisp lists, and you have funs, which are inspired by Lisp's lambda.
Message passing on one side, and functional programming on the other, are two quite separate things in Erlang. When coding real-life Erlang applications, you spend 98% of your time doing functional programming and 2% thinking about message passing, which is mainly used for scalability and concurrency. To say it another way, when you tackle a complex programming problem, you will probably use the FP side of Erlang to implement the details of the algorithm, and use message passing for scalability, reliability, etc...
What do you think of this:
thing(0) ->
    exit(this_is_the_end);
thing(Val) when is_integer(Val) ->
    NewVal = receive
        {From, F, Arg} ->
            NV = F(Val, Arg),
            From ! {self(), new_value, NV},
            NV;
        _ ->
            Val div 2
    after 10000 ->
        max(Val - 1, 0)
    end,
    thing(NewVal).
When you spawn the process, it will live on its own, decreasing its value until it reaches 0 and sends the message {'EXIT',this_is_the_end} to any process linked to it, unless you take care of executing something like:
ThingPid ! {self(),fun(X,_) -> X+1 end,[]}.
% which will increment the counter
or
ThingPid ! {self(),fun(X,X) -> 0; (X,_) -> X end,10}.
% which will do nothing, unless the internal value = 10 and in this case will go directly to 0 and exit
In this case you can see that the "object" lives its own life in parallel with the rest of the application, that it can interact with the outside almost without any code, and that the outside can ask it to do things you did not know about when you wrote and compiled the code.
This is silly code, but some of the same principles are used to implement things like mnesia transactions, the behaviors... IMHO the concept is really different, and you have to try to think differently if you want to use it correctly. I am pretty sure that it is possible to write "OOP-like" code in Erlang, but it will be extremely difficult to avoid concurrency :o), and in the end there is no advantage. Have a look at the OTP principles, which give some pointers about application architecture in Erlang (supervision trees, pools of "1 single client servers", linked processes, monitored processes, and of course pattern matching, single assignment, messages, node clusters ...).

Testing concurrency features

How would you test Ruby code that has some concurrency features? For instance, let's assume I have a synchronization mechanism that is expected to prevent deadlocks. Is there a viable way to test what it really does? Could controlled execution in fibers be the way forward?
I had the exact same problem and have implemented a simple gem for synchronizing subprocesses using breakpoints: http://github.com/remen/fork_break
I've also documented an advanced usage scenario for rails3 at http://www.hairoftheyak.com/testing-concurrency-in-rails/
I needed to make sure a gem (redis-native_hash) I authored could handle concurrent writes to the same Redis hash, detect the race condition, and elegantly recover. I found that to test this I didn't need to use threads at all.
it "should respect changes made since last read from redis" do
  concurrent_edit = Redis::NativeHash.find :test => @hash.key
  concurrent_edit["foo"] = "race value"
  concurrent_edit.save
  @hash["yin"] = "yang"
  @hash["foo"] = "bad value"
  @hash.save
  hash = Redis::NativeHash.find :test => @hash.key
  hash["foo"].should == "race value"
  hash["yin"].should == "yang"
end
In this test case I just instantiated another object which represents the concurrent edit of the Redis hash, had it make a change, then make sure saving the already-existing object pointing to the same hash respected those changes.
Not all problems involving concurrency can be tested without actually USING concurrency, but in this case it was possible. You may want to try looking for something similar to test your concurrency solutions. If it's possible, it's definitely the easier route to go.
It's definitely a difficult problem. I started writing my test using threads, and realized that, given the way the code I was testing was implemented, I needed the process IDs (PIDs) to actually be different. Threads run using the same PID as the process that kicked off the Thread. Lesson learned.
It was at that point I started exploring forks, and came across this Stack Overflow thread, and played with fork_break. Pretty cool, and easy to set up. Though I didn't need the breakpoints for what I was doing, I just wanted processes to run through concurrently, using breakpoints could be very useful in the future. The problem I ran into was that I kept getting an EOFError and I didn't know why. So I started implementing forking myself, instead of going through fork_break, and found out it was that an exception was happening in the code under test. Sad that the stack trace was hidden from me by the EOFError, though I understand that the child process ended abruptly and that's kinda how it goes.
The next problem I came across was with the DatabaseCleaner. No matter which strategy it used (truncation, or transaction), the child process's data was truncated/rolled back when the child process finished, so the data that was inserted by child processes was gone and the parent process couldn't select and verify that it was correct.
After banging my head on that and trying many other unsuccessful things, I came across this post http://makandracards.com/makandra/556-test-concurrent-ruby-code which was almost exactly what I was already doing, with one little addition. Calling "Process.exit!" at the end of the fork. My best guess (based on my fairly limited understanding of forking) is that this causes the process to end abruptly enough that it completely bypasses any type of database cleanup when the child process ends. So my parent process, the actual test, can continue and verify the data it needs to verify. Then during the normal after hooks of the test (in this case cucumber, but could easily be rspec too), the database cleaner kicks in and cleans up data as it normally would for a test.
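The effect of Process.exit! can be seen in isolation with a small sketch. It assumes a platform where fork is available (so not Windows or JRuby); the helper name run_in_child and the at_exit hook standing in for a database cleaner are hypothetical.

```ruby
# Runs the given block in a forked child and returns what the child wrote
# to a pipe. run_in_child is a hypothetical helper name.
def run_in_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # Stand-in for a cleanup hook such as a database cleaner (hypothetical).
    at_exit { writer.syswrite(' then cleanup ran') }
    writer.syswrite(yield)
    # exit! ends the child immediately and skips all at_exit handlers.
    Process.exit!(0)
  end
  writer.close
  Process.wait(pid)
  reader.read
ensure
  reader&.close
end

puts run_in_child { 'work done' }   # prints "work done"
```

With a plain exit (or by simply letting the fork block finish), the at_exit hook would run in the child as well, which matches the kind of cleanup-on-child-exit behaviour described above.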
So, I just thought I'd share some of my own lessons learned in this discussion of how to test concurrent features.

Ruby Semaphores?

I'm working on an implementation of the "Fair Barbershop" problem in Ruby. This is for a class assignment, but I'm not looking for any handouts. I've been searching like crazy, but I cannot seem to find a Ruby implementation of Semaphores that mirror those found in C.
I know there is Mutex, and that's great. Single implementation, does exactly what that kind of semaphore should do.
Then there's Condition Variables. I thought that this was going to work out great, but looking at these, they require a Mutex for every wait call, which looks to me like I can't put numerical values to the semaphore (as in, I have seven barbershops, 3 barbers, etc.).
I think I need a Counting Semaphore, but I think it's a little bizarre that Ruby doesn't (from what I can find) contain such a class in its core. Can anyone help point me in the right direction?
If you are using JRuby, you can import semaphores from Java as shown in this article.
require 'java'
java_import 'java.util.concurrent.Semaphore'
SEM = Semaphore.new(limit_of_simultaneous_threads)
SEM.acquire #To decrement the number available
SEM.release #To increment the number available
There's http://sysvipc.rubyforge.org/SysVIPC.html which gives you SysV semaphores. Ruby is perfect for eliminating the API blemishes of SysV semaphores and SysV semaphores are the best around -- they are interprocess semaphores, you can use SEM_UNDO so that even SIGKILLs won't mess up your global state (POSIX interprocess semaphores don't have this), and with SysV semaphores you can perform atomic operations on several semaphores at once as long as they're in the same semaphore set.
As for inter-thread semaphores, those should be perfectly emulatable with Condition Variables and Mutexes. (See Bernanrdo Martinez's link for how it can be done).
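A minimal sketch of such an emulation, built only from Mutex and ConditionVariable, might look like the following. The class name CountingSemaphore and its method names are hypothetical, and the sketch is not taken from any of the linked implementations.

```ruby
# Minimal counting semaphore built from a Mutex and a ConditionVariable.
# CountingSemaphore is a hypothetical name for this illustration.
class CountingSemaphore
  def initialize(permits)
    @permits = permits
    @lock = Mutex.new
    @available = ConditionVariable.new
  end

  # Blocks until a permit is free, then takes it.
  def acquire
    @lock.synchronize do
      # Re-check in a loop: a woken thread may find the permit already taken.
      @available.wait(@lock) while @permits.zero?
      @permits -= 1
    end
  end

  # Returns a permit and wakes one waiting thread, if any.
  def release
    @lock.synchronize do
      @permits += 1
      @available.signal
    end
  end

  # Convenience wrapper that always releases, even if the block raises.
  def synchronize
    acquire
    begin
      yield
    ensure
      release
    end
  end
end

barbers = CountingSemaphore.new(3)
stats = Mutex.new
busy = 0
max_busy = 0
customers = 7.times.map do
  Thread.new do
    barbers.synchronize do
      stats.synchronize { busy += 1; max_busy = [max_busy, busy].max }
      sleep 0.01   # the haircut
      stats.synchronize { busy -= 1 }
    end
  end
end
customers.each(&:join)
puts max_busy <= 3   # prints true
```

The single mutex passed to wait is what lets the permit count act as the numeric value of the semaphore: the condition variable only handles the waiting, the integer holds the count.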
I also found this code:
https://gist.github.com/pettyjamesm/3746457
probably someone might like this other option.
Since concurrent-ruby is stable (beyond 1.0) and widely used, the best solution (and one that is portable across Ruby implementations) is to use its Concurrent::Semaphore class.
Thanks to @x3ro for his link. That pointed me in the right direction. However, with the implementation that Fukumoto gave (at least for rb1.9.2) Thread.critical isn't available. Furthermore, my attempts to replace the Thread.critical calls with Thread.exclusive{} simply resulted in deadlocks. It turns out that there is a proposed Semaphore patch for Ruby (which I've linked below) that has solved the problem by replacing Thread.exclusive{} with a Mutex::synchronize{}, among a few other tweaks. Thanks to @x3ro for pushing me in the right direction.
http://redmine.ruby-lang.org/attachments/1109/final-semaphore.patch
Since the other links here aren't working for me, I decided to quickly hack something together. I have not tested this, so input and corrections are welcome. It's based simply on the idea that a Mutex is a binary Semaphore, thus a Semaphore is a set of Mutexes.
https://gist.github.com/3439373
I think it might be useful to mention the Thread::Queue in this context for others arriving at this question.
The Queue is a thread-safe tool (implemented with some behind-the-scenes synchronization primitives) that can be used like a traditional multi-processing semaphore with just a hint of imagination. And it comes preloaded by default, at least in ruby v3:
#!/usr/bin/ruby
# hold_your_horses.rb
q = Queue.new
wait_thread = Thread.new{
  puts "Wait for it ..."
  q.pop
  puts "... BOOM!"
}
sleep 1
puts "... click, click ..."
q.push nil
wait_thread.join
And can be demonstrated simply enough:
user@host:~/scripts$ ruby hold_your_horses.rb
Wait for it ...
... click, click ...
... BOOM!
The docs for ruby v3.1 say a Queue can be initialized with an enumerable object to set up initial contents but that wasn't available in my v3.0. But if you want a semaphore with, say, 7 permits, it's easy to stuff the box with something like:
q = Queue.new
7.times{ q.push nil }
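With the queue pre-filled like that, taking a permit is a pop and returning it is a push. The wrapper below is only a sketch of that idea; the class name QueueSemaphore and its method names are hypothetical.

```ruby
# Minimal sketch of a semaphore whose permits are tokens in a Queue.
# QueueSemaphore is a hypothetical name for this illustration.
class QueueSemaphore
  def initialize(permits)
    @tokens = Queue.new
    permits.times { @tokens.push(:permit) }
  end

  # Blocks while the queue is empty, i.e. while no permit is available.
  def acquire
    @tokens.pop
  end

  # Non-blocking variant: Queue#pop(true) raises ThreadError when empty.
  def try_acquire
    @tokens.pop(true)
    true
  rescue ThreadError
    false
  end

  def release
    @tokens.push(:permit)
  end
end

sem = QueueSemaphore.new(2)
puts sem.try_acquire   # prints true
puts sem.try_acquire   # prints true
puts sem.try_acquire   # prints false, both permits are taken
sem.release
puts sem.try_acquire   # prints true
```

Nothing stops a caller from releasing more often than it acquired, so the permit count is only as reliable as the code using it.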
I used the Queue to implement baton-passing between some worker-threads:
class WaitForBaton
  def initialize
    @q = Queue.new
  end
  def pass_baton
    @q.push nil
    sleep 0.0
  end
  def wait_for_baton
    @q.pop
  end
end
So that thread task_master could perform steps one and three with thread little_helper stepping in at the appropriate time to handle step two:
baton = WaitForBaton.new
task_master = Thread.new{
  step_one(ARGV[0])
  baton.pass_baton
  baton.wait_for_baton
  step_three(logfile)
}
little_helper = Thread.new{
  baton.wait_for_baton
  step_two(ARGV[1])
  baton.pass_baton
}
task_master.join
little_helper.join
Note that the sleep 0.0 in the .pass_baton method of my WaitForBaton class is necessary to prevent task_master from passing the baton to itself: unless thread scheduling happens to jump away from task_master right after baton.pass_baton, the very next thing that happens is task_master's baton.wait_for_baton - which takes the baton right back again. sleep 0.0 explicitly cedes execution to any other threads that might be waiting to run (and, in this case, blocking on the underlying Queue).
Ceding execution is not the default behavior because this is a somewhat unusual usage of semaphore technology - imagine that task_master could be generating many tasks for little_helpers to do and task_master can efficiently get right back to generating tasks right after passing a task off through a Thread::Queue's .push([object]) method.

Can I be sure that the code I write is always executed in the same thread?

I normally work on single threaded applications and have generally never really bothered with dealing with threads. My understanding of how things work - which certainly, may be wrong - is that as long as we're always dealing with single threaded code (i.e. no forks or anything like that) it will always be executed in the same thread.
Is this assumption correct? I have a fuzzy idea that UI libraries/frameworks may spawn off threads of their own to handle GUI stuff (which accounts for the fact that the Windows task manager tells me that my 'single threaded' application is actually running on 10 threads) but I'm guessing that this shouldn't affect me?
How does this apply to COM? For instance, if I were to create an instance of a COM component in my code; and that COM component writes some information to a thread-based location (using System.Threading.Thread.GetData for instance) will my application be able to get hold of that information?
So in summary:
In single threaded code, can I be sure that whatever I store in a thread-based location can be retrievable from anywhere else in the code?
If that single threaded code were to create an instance of a COM component which stores some information in a thread-based location, can that be similarly retrievable from anywhere else?
UI usually has the opposite constraint (sadly): it's single threaded and everything must happen on that thread.
The easiest way to check if you are always in the same thread (for, say, a function) is to have an integer variable set at -1, and have a check function like (say you are in C#):
void AssertSingleThread()
{
    if (m_ThreadId < 0) m_ThreadId = Thread.CurrentThread.ManagedThreadId;
    Debug.Assert(m_ThreadId == Thread.CurrentThread.ManagedThreadId);
}
That said:
I don't understand the question #1, really. Why store in a thread-based location if your purpose is to have a global scope ?
About the second question, most COM code runs on a single thread and, most often, on the thread where your UI message processing lives - this is because most COM code is designed to be compatible with VB6 which is single-thread.
The reason your program has about 10 threads is that both Windows (if you use some of its features like completion ports, or some kinds of timers) and the CLR (for example for the GC or, again, some types of timers) may create threads in your process space (technically, any program with enough privileges can too).
Consider a model with a single dataStore class living in your main thread, which all threads read their instance variables from and write them to. This avoids a lot of the problems that arise when threads access each other's data all over the shop.
Simple idea, until you reach the fun part of threading. Concurrency and synchronization; simply, if you have two threads that want to read and write to the same variable inside dataStore at the same time, you have a problem.
Java handles this by allowing you to declare a variable or method synchronized, allowing only one thread access at a time.
I believe some .NET objects have Lock and Synchronized methods defined on them, but I know no more than this.
