Is the Hunspell spelling library thread-safe?
The answer is no.
A simple multi-threaded test application revealed that Hunspell uses per-instance resources for the spelling process, so only one thread can use a given instance at any time. Use locks, a work queue, or instantiate one Hunspell instance per thread.
I want to implement (in C++) a feature, using MPI, in an existing (non-MPI) application. I am thinking of using mpich-3.4.1 for this.
I am planning to create a .so file for that feature, which the original application can link to. I initially thought of having a function in the .so file that starts with MPI_Init() and ends with MPI_Finalize() and, in between, calls all the MPI APIs required to do the parallel job. As part of the MPI job, the new feature makes the current application an MPI server by calling APIs like 'MPI_Open_port' and 'MPI_Comm_accept'. Other worker processes (possibly running on different machines) connect to this server, send and receive messages, and complete a heavy computation in parallel. The application then resumes its other non-MPI work.
It seems to me that the Singleton MPI_INIT mechanism will be useful for this. I found the following page on Singleton Init:
https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node254.htm
This page says, "A high-quality implementation will allow any process (including those not started with a ``parallel application'' mechanism) to become an MPI process by calling MPI_INIT. Such a process can then connect to other MPI processes...".
However, the comments in mpich-3.4.1/src/mpi/init/init.c say, "The MPI standard does not say what a program can do before an 'MPI_INIT' or after an 'MPI_FINALIZE'. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input or writing to standard output."
Based on the above comments, it seems we should not have MPI_Init(NULL, NULL) and MPI_Finalize() as part of any implementation in a library. In that case, I am thinking of having the init and finalize calls in the original application's main function, and having the rest of the API calls made from the .so file. My original application is a large, working piece of software, and in some situations it may not need to execute my MPI feature at all.
My questions are:
(1) Does it make sense to have MPI_Init(NULL, NULL) and MPI_Finalize() called in the main function of this application, and the rest of the MPI functionality in a .so file?
(2) Once MPI_Init(NULL, NULL) is called in the main, would it interfere with the normal execution of the software in any way? Would there be any performance impact on the existing application?
(3) Is there an MPI implementation that handles this better?
(4) Is MPI a good approach to handle this requirement, or are other mechanisms like ZeroMQ better? In the comments made by Wesley Bland in the following link, he says that "MPI may not be right for you if you're looking for a client/server model. Yes, it's possible, but it's not really optimized for that use case and you might have better luck using a different communication mechanism". Is that true in 2022?
client relationship within MPI server
The Windows Antimalware Scan Interface (AMSI) contains abstractions which can be used to call the currently active virus scanner in Windows:
https://learn.microsoft.com/en-us/windows/desktop/amsi/antimalware-scan-interface-functions
There are 2 methods related to initialization:
AmsiInitialize
AmsiUninitialize
AmsiInitialize returns "A handle of type HAMSICONTEXT that must be passed to all subsequent calls to the AMSI API.".
After initialization is complete, I can use AmsiScanBuffer to scan a buffer for malware.
My question:
Can I use the same context concurrently from many threads in my application, or do I need to create one per thread from which I'm going to call the methods?
The documentation for AmsiUninitialize says: "When the app is finished with the AMSI API it must call AmsiUninitialize." This tells me that the context can be used for many calls, but it doesn't tell me anything about thread safety or concurrency.
Generally, API calls that are not specifically marked as thread-safe are not (this is usually true for any library). The easiest solution is to open an AMSI handle per thread.
(P.S. This only works with Windows Defender, as far as I've tested.)
As we know, in the Windows NT kernel there are three ways to post a work item to execute in a system thread environment at PASSIVE_LEVEL,
i.e. ExQueueWorkItem, FltQueueGenericWorkItem, and FltQueueDeferredIoWorkItem.
However, I wonder what their differences are and which scenarios each one is meant for.
Any explanations?
From the documentation of each API:
ExQueueWorkItem: Can be used by drivers for which no framework API is provided for such work. The documentation suggests using IoQueueWorkItem instead.
FltQueueGenericWorkItem: For minifilter drivers, which should use it for any work that is not related to I/O, such as periodic cleanup.
FltQueueDeferredIoWorkItem: For minifilter drivers, for work related to an I/O operation. That is, if you are filtering an I/O operation, you can defer some work related to that operation using this function.
As I understand it, Ruby 1.9 uses OS threads, but only one thread will actually be running at any given time (though one thread may be doing blocking IO while another thread is doing processing). The threading examples I've seen just use Thread.new to launch a new thread. Coming from a Java background, I typically use thread pools so as not to launch too many new threads, since they are "heavyweight."
Is there a thread pool construct built into Ruby? I didn't see one in the default language libraries. Or is there a standard gem that is typically used? Since OS-level threading is a newer feature of Ruby, I don't know how mature the libraries are for it.
You are correct that the default C Ruby interpreter only executes one thread at a time (other C-based dynamic languages such as Python have similar restrictions). Because of this restriction, threading is not really that common in Ruby, and as a result there is no default thread pool library. If there are tasks to be done in parallel, people typically use processes, since processes can scale over multiple servers.
If you do need to use threads, I would recommend you use https://github.com/meh/ruby-threadpool on the JRuby platform, which is a Ruby interpreter running on the JVM. That should be right up your alley, and because it runs on the JVM it will have true threading.
The accepted answer is correct, but there are many tasks for which threads are fine. After all, there are reasons why they exist. Even though the interpreter can only run one thread at a time, threads can still be considered parallel in many real-life situations.
For example, suppose we have 100 long-running tasks that each take approximately 10 minutes to complete. By using threads in Ruby, even with all those restrictions, if we define a thread pool that runs 10 tasks at a time, the work will finish much faster than the 100*10 minutes it takes without threads, provided the tasks spend most of their time waiting on IO. Examples include live capturing of file changes and sending a large number of web requests (such as status checks).
You can understand how pooling works by reading https://blog.codeship.com/understanding-fundamental-ruby-abstraction-concurrency/ . In production code, use https://github.com/meh/ruby-thread#pool
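To make the pooling idea above concrete, here is a minimal sketch of a fixed-size pool built only on Thread and Queue from core Ruby. The names SimplePool and run_demo are made up for this illustration and are not the API of the gems linked above, and sleep merely stands in for blocking IO such as a web request. It is meant to show the mechanism, not to replace a maintained library.

```ruby
require "thread" # no-op on modern Ruby, needed for Queue on older versions

# Hypothetical minimal pool: a fixed set of worker threads pulling jobs
# from a shared, thread-safe Queue.
class SimplePool
  def initialize(size)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Each worker keeps taking jobs until it receives the :stop sentinel.
        while (job = @jobs.pop) != :stop
          begin
            job.call
          rescue StandardError => e
            warn "job failed: #{e.message}" # keep the worker alive
          end
        end
      end
    end
  end

  # Queue a block to be run by whichever worker is free next.
  def schedule(&block)
    @jobs << block
  end

  # Let queued jobs finish, then stop and join every worker.
  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end

# Runs task_count fake IO-bound tasks on the pool and returns
# [number of completed tasks, elapsed seconds].
def run_demo(task_count: 20, pool_size: 10, delay: 0.1)
  done = Queue.new
  pool = SimplePool.new(pool_size)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  task_count.times do |i|
    pool.schedule do
      sleep(delay) # stand-in for blocking IO; other threads run meanwhile
      done << i
    end
  end
  pool.shutdown
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [done.size, elapsed]
end
```

Calling run_demo with the defaults should take roughly two rounds of the delay rather than twenty, because sleeping (like blocking IO) lets the other threads proceed. CPU-bound jobs would not see the same gain on the C interpreter.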
I am using the Amazon S3 service to upload different directories (and the files inside them) to different buckets (directory -> bucket).
I am coding in Ruby, and I am using the library from http://amazon.rubyforge.org.
The files are small (about 20KB).
I'd like to upload the directories in parallel (using many threads), but I have to use synchronize around the S3Object.store call:
    mutex.synchronize do
      S3Object.store(s3_obj_name, open(image_name), bucket_name)
    end
If I don't use synchronize, I get a Net::HTTPBadResponse exception!
So, with synchronize, I lose the advantages of parallel programming.
Do you have any ideas about how to make the parallel upload work?
Thank you,
Alessandro DS
It appears that the Ruby S3 library you're using isn't thread safe: http://rubyforge.org/tracker/index.php?func=detail&aid=8162&group_id=2409&atid=9357
So your options include:
Write a patch for that library to make it thread safe (I'm not a Ruby guy, so I'm not sure how difficult that would be to do).
Find another S3 Ruby library that is thread safe (I googled it and didn't see anything obvious).
Write a short Ruby script that does a single S3Object.store call, and exec that from your parent Ruby script. Then each store() call will be in a separate process and the thread safety issue won't bite you.
Those options assume you want to stick with Ruby. Hope that helps.
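The third option could look roughly like the sketch below. It runs each command in its own child process with Process.spawn and caps how many run at once. The helper name run_in_processes and the script name store_one.rb mentioned in the comment are assumptions made for this illustration; the child script itself, which would perform the single S3Object.store call, is not shown.

```ruby
require "rbconfig"

# Hypothetical helper: run each command (an argv array) in its own child
# process, with at most max_procs children alive at a time.
# Returns an array of true/false success flags in the same order as commands.
def run_in_processes(commands, max_procs: 4)
  results = Array.new(commands.size)
  running = {} # pid => index of the command it is running
  next_index = 0

  until next_index >= commands.size && running.empty?
    # Start children until the concurrency limit is reached.
    while running.size < max_procs && next_index < commands.size
      pid = Process.spawn(*commands[next_index])
      running[pid] = next_index
      next_index += 1
    end
    # Block until any child exits, then record whether it succeeded.
    pid, status = Process.wait2
    index = running.delete(pid)
    results[index] = status.success? unless index.nil?
  end
  results
end

# Assumed usage, with a hypothetical store_one.rb that takes a file name and
# a bucket name and performs one S3Object.store call:
#
#   commands = files.map { |f| [RbConfig.ruby, "store_one.rb", f, bucket_name] }
#   ok_flags = run_in_processes(commands, max_procs: 8)
```

Because each upload runs in a separate interpreter, the library's shared state is never touched by two uploads at once. The cost is one process start per file, which may matter for many 20KB files, so a child script that uploads a batch of files per invocation might be a reasonable variation.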