So right now I have a single thread handling all the requests for the database. Let's say I have 400 requests per second for logins/logouts/other stuff, and 400 requests per second related only to items (move them, update them, remove them, etc.).
Obviously, the problem is that if I want to load an item from the database while it is busy processing a login request, there's going to be a delay. And I want it to be instant. That's why I wanted to create another thread, exclusively for item requests, and leave the other thread to process logins/logouts, etc.
Microsoft says this:
1: Have multiple statement handles on a single connection handle, with a single thread for each statement handle.
2: Have multiple connection handles, with a single statement handle and single thread for each connection handle.
What exactly are the differences between the two approaches? I obviously need to fetch and insert/update data in both threads at the same time.
Will two threads instead of one speed things up?
Both threads will work exclusively on different SQL tables (i.e. the item thread will only touch ITEMS_TABLE, never LOGIN_TABLE, and vice versa).
Currently I'm using the following functions (C++):
SQLSetEnvAttr with SQL_OV_ODBC3
SQLConnect
SQLAllocHandle
SQLBindParameter
SQLExecDirect
Answering your questions:
Q1: What exactly are the differences between the two approaches?
Answer:
The first approach shares the same connection handle across multiple threads. Basically, you connect first, then start your threads, and each thread creates its own statement handle.
The second approach uses a different connection handle for each thread. You create your threads, and each thread opens its own connection and creates its own statement handle(s).
I'd avoid the first approach (sharing the connection handle between multiple threads) because it has several restrictions. For example, suppose one of your threads wants to switch autocommit on or off. Since autocommit is a connection attribute (and all threads share the same connection handle), changing this setting will affect ALL other threads.
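Here is a minimal sketch of the second approach in C++: one connection and one statement handle per thread, sharing a single environment handle. Error checking is omitted, and the DSN, credentials, and queries are placeholders:

#include <windows.h>
#include <sql.h>
#include <sqlext.h>
#include <thread>

void worker(SQLHENV env, const char* query)
{
    SQLHDBC dbc = SQL_NULL_HDBC;
    SQLHSTMT stmt = SQL_NULL_HSTMT;

    // Each thread opens its own connection...
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
    SQLConnect(dbc, (SQLCHAR*)"MyDSN", SQL_NTS,
               (SQLCHAR*)"user", SQL_NTS,
               (SQLCHAR*)"pass", SQL_NTS);

    // ...and its own statement handle, so nothing is shared between
    // the threads except the (thread-safe) environment handle.
    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    SQLExecDirect(stmt, (SQLCHAR*)query, SQL_NTS);

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
}

int main()
{
    SQLHENV env = SQL_NULL_HENV;
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);

    std::thread items(worker, env, "SELECT * FROM ITEMS_TABLE");
    std::thread logins(worker, env, "SELECT * FROM LOGIN_TABLE");
    items.join();
    logins.join();

    SQLFreeHandle(SQL_HANDLE_ENV, env);
}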
Q2: Will two threads instead of one speed things up?
Answer:
I don't think you will notice any difference.
In both cases sharing the same Environment handle between multiple threads should be ok.
My Spring Boot application is going to listen to 1 million records an hour from a Kafka broker. The entire processing logic for each message takes 1-1.5 seconds, including a database insert. The broker has 64 partitions, which is also the concurrency of my @KafkaListener.
My current code is only able to process 90 records a minute in a lower environment, where I am listening to around 50k records an hour. Below is the code; all other config parameters such as max.poll.records are at their default values:
@KafkaListener(id = "xyz-listener", concurrency = "64", topics = "my-topic")
public void listener(String record) {
    // processing logic
}
I do get "it is likely that the consumer was kicked out of the group" 7-8 times an hour. I think both of these issues can be solved by isolating the listener method and multithreading the processing of each message, but I am not sure how to do that.
There are a few points to consider here. First, 64 consumers seems like a lot for a single application instance to handle consistently.
Considering each poll fetches up to 500 records per consumer by default, your app might be getting overloaded, causing consumers to be kicked out of the group whenever a single batch takes longer than the 5-minute default of max.poll.interval.ms to process.
So first, I'd consider scaling the application horizontally so that each instance handles a smaller number of partitions/threads.
A second way to increase throughput would be using a batch listener, handling processing and DB insertions in batches, as you can see in this answer and in the sketch below.
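As a rough illustration (not your actual code), a batch listener looks something like this, assuming a batch-enabled container (spring.kafka.listener.type=batch); the topic name, process(), and saveAll() are placeholders for your rule engine and DAO:

import java.util.ArrayList;
import java.util.List;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class BatchListener {

    // Receives up to max.poll.records messages per poll instead of one at a time.
    @KafkaListener(id = "xyz-batch-listener", topics = "my-topic", concurrency = "8")
    public void listen(List<String> records) {
        List<String> results = new ArrayList<>();
        for (String record : records) {
            results.add(process(record)); // keep this CPU-only; no per-record I/O
        }
        saveAll(results); // one batched DB write per poll instead of 500 single inserts
    }

    private String process(String record) { return record; } // stand-in for the rule engine
    private void saveAll(List<String> results) { /* stand-in for a batched insert */ }
}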
Using both, you should be processing a sensible amount of work in parallel per app, and should be able to achieve your desired throughput.
Of course, you should load test each approach with different figures to have proper metrics.
EDIT: Addressing your comment, if you want to achieve this throughput I wouldn't give up on batch processing just yet. If you do the DB operations row by row you'll need a lot more resources for the same performance.
If your rule engine doesn't do any I/O you can iterate each record from the batch through it without losing performance.
About data consistency, you can try some strategies. For example, you can have a lock to ensure that even through a rebalance only one instance will process a given batch of records at a given time - or perhaps there's a more idiomatic way of handling that in Kafka using the rebalance hooks.
With that in place, you can batch load all the information you need to filter out duplicated / outdated records when you receive the records, iterate each record through the rule engine in memory, and then batch persist all results, to then release the lock.
Of course, it's hard to come up with an ideal strategy without knowing more details about the process. The point is that, by doing this, you should be able to handle around 10x more records within each instance, so I'd definitely give it a shot.
I have a function that loops through a list of items by sending them to a server and grabbing the response. The problem I'm having is that the loop runs faster than the server can handle. I need to figure out a way to slow the loop down without freezing the application. Is there a way to delay the loop from moving to the next item for a brief moment? In other languages, I'd use something like sleep(interval).
Don't slow the process down. Add the network calls to an operation queue with a limited number of concurrent operations. You may need to rewrite your network code as an NSOperation subclass but that's fairly straightforward. You can see some examples in this tutorial.
There is a built-in limit to the number of simultaneous network connections that can be made anyway, but it sounds like your server's limit is lower than that, or that you're saturating the network connections and your later calls are timing out before they've been able to start.
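A rough modern-Swift equivalent of the idea (block operations on a throttled OperationQueue rather than an NSOperation subclass); the URLs and the limit of 4 are placeholders:

import Foundation

let queue = OperationQueue()
queue.maxConcurrentOperationCount = 4 // throttle instead of sleeping

for item in ["https://example.com/a", "https://example.com/b"] {
    queue.addOperation {
        // Block this operation (not the main thread) until its response
        // arrives, so at most 4 requests are in flight at once.
        let semaphore = DispatchSemaphore(value: 0)
        URLSession.shared.dataTask(with: URL(string: item)!) { data, _, _ in
            print("Got \(data?.count ?? 0) bytes for \(item)")
            semaphore.signal()
        }.resume()
        semaphore.wait()
    }
}
queue.waitUntilAllOperationsAreFinished()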
Instead of a sleep interval, it sounds like you want a completion block that calls the same code again until the list is empty. So once one request finishes, it moves on to the next.
Also, I don't think you should sleep, since that would block the main thread and result in a poor user experience.
We have several nightly jobs running inside an Oracle 11g R2 instance; not all of these jobs are under our control. Some of them are external data loads run by third parties. The jobs are implemented as PL/SQL packages and run using DBMS_SCHEDULER facilities.
Some of these jobs operate on the same set of data, a table with user entries, e.g. updating personal data, removing retired users, adding newly joined users. Since the jobs mostly use bulk statements to run the updates, we have run into blocking locks several times now and have had to kill individual jobs to allow others to run through.
What are good ways to prevent jobs from colliding with each other?
I am thinking about things like:
setting up a meta-job which knows about dependencies and coordinates the dependent jobs
creating schedules which keep conflicting jobs as separate as possible
coordinating jobs with the third parties to prevent conflicts between "external" and "internal" jobs
not using bulk statements (updating everything at once with a single MERGE or UPDATE) but instead updating rows one by one and committing the intermediate results
Especially the last option seems a plausible approach to me in order to reduce the probability of blocking locks. But I know that performance suffers a lot when I switch our jobs from bulk updates to looping over cursors.
This may be a good use of the DBMS_LOCK package. DBMS_LOCK allows you access to the same enqueue/locking model that Oracle uses internally.
You can establish an enqueue, and then multiple processes may take that enqueue in various lock modes. Locks will show up like any other enqueue, with type 'UL' (for user lock).
For example, suppose you have three processes that can all run concurrently, but then you have a process that needs to wait for all three of those processes to run, and needs to run by itself, and then it's followed by two more processes that can run concurrently once that process completes.
You could have the first three processes take the UL enqueue in 'S' (shared) mode, and they will all be able to run concurrently. Then run the process that needs to run by itself, but at the beginning of its code have it take the UL enqueue in 'X' (exclusive) mode. That process will wait for the three processes holding the enqueue in shared mode to complete. Now you can also run the last two processes, again in shared mode. They will queue behind the process requesting the exclusive lock, and everything runs in the order you want.
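In code, the exclusive case looks roughly like this sketch; the lock name, the timeout, and the error handling are placeholders:

DECLARE
  l_lockhandle VARCHAR2(128);
  l_result     INTEGER;
BEGIN
  -- Map an arbitrary lock name of our choosing to a lock handle.
  DBMS_LOCK.ALLOCATE_UNIQUE(lockname   => 'NIGHTLY_USER_SYNC',
                            lockhandle => l_lockhandle);

  -- S_MODE for the jobs that may run together, X_MODE for the one that
  -- must run alone; X_MODE waits until all shared holders have finished.
  l_result := DBMS_LOCK.REQUEST(lockhandle        => l_lockhandle,
                                lockmode          => DBMS_LOCK.X_MODE,
                                timeout           => 3600,
                                release_on_commit => FALSE);
  IF l_result NOT IN (0, 4) THEN -- 0 = acquired, 4 = we already hold it
    RAISE_APPLICATION_ERROR(-20001, 'Could not acquire job lock: ' || l_result);
  END IF;

  -- ... do the serialized work here ...

  l_result := DBMS_LOCK.RELEASE(l_lockhandle);
END;
/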
That's a simple example. With more than one UL type lock, and multiple modes that locks can be held in, your processes and locking strategy may be arbitrarily complex.
Hope that helps.
It is very hard to give any advice without knowing all the details.
Simplest thing would be to schedule jobs not to overlap (if process permits).
If you cannot do that, then probably there is no easy solution, especially if there are jobs you cannot modify.
Smaller transactions make collisions less likely, but Murphy might (and will) hit you anyway. I would also start the jobs in the 'right' order...
I've been messing around with Ruby and threading a little bit today. I have a list of proxies that I want to check. Assuming a timeout of 10 seconds, going through a very large list of proxies will take many hours if I write something like:
proxies.each do |proxy|
check_proxy(proxy)
end
My first problem with trying to figure out threads is how to START multiple at the same exact time. I found a neat little snippet of code online:
require 'hpricot'
require 'open-uri'

threads = []
for page in pages
  threads << Thread.new(page) { |my_page|
    puts "Fetching: #{my_page}\n"
    doc = Hpricot(open(my_page.to_s)).to_s
    puts "Got #{my_page}: #{doc.size}"
  }
end
threads.each(&:join)
Seems to work nicely as far as starting them all at the same time. So now I can... start checking all 7 thousand records at the same time?
How do I go to a file, take out a line for each thread, run a batch of like 20 and repeat the process?
Can I run a while loop that in turn starts 20 threads at the same time (which remove lines from a file) and keeps going until the file is blank?
I'm a little weak on the logic of what I'm supposed to do.
Thanks guys!
PS.
Another thought: Will there be file access issues if 20 workers are constantly messing with it randomly? What would be a good way around that if this is so?
The keyword you are after is threadpool. You can either try to find one for Ruby (I am sure there are at least a couple on GitHub), or roll your own.
There's a simple implementation here on SO.
Re: the file access, IMO you shouldn't let workers alter the file directly; do that in your main thread. You don't want to allow simultaneous edits there.
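Rolling your own is only a few lines with a Queue. A sketch, assuming your check_proxy method and a proxies.txt file (both placeholders):

require 'thread'

POOL_SIZE = 20
queue = Queue.new

# The main thread is the only one touching the file.
File.foreach('proxies.txt') { |line| queue << line.strip }

workers = POOL_SIZE.times.map do
  Thread.new do
    loop do
      proxy = queue.pop(true) rescue break # non-blocking pop; exit when drained
      check_proxy(proxy)
    end
  end
end
workers.each(&:join)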
Try using the DelayedJob gem:
https://github.com/tobi/delayed_job
You don't need to generate that many threads in order to do this work. In fact, generating a lot of threads can decrease the overall performance of your application. If you handle checking each proxy asynchronously, without blocking, you can get by with far fewer threads.
You'd create a file-manager thread to process the file. Each line gets added as a request to an array (a request queue). On the other end of the request queue you can use EventMachine to send the requests without blocking; EventMachine is also used to receive the responses and handle the timeout. Each response can then be placed on another array (a response queue), which your file-manager thread polls. The file-manager thread pulls responses from the response queue and determines whether each proxy is alive.
This gets you down to just two threads. One issue you will have is limiting the number of in-flight requests, since this model can send out all of the requests in less than a second and flood the nearest router. In my experience you should be able to have around 500 outstanding requests at any one time.
There is more than one way to solve this problem asynchronously but hopefully the above is enough to help get you started with non-blocking I/O.
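Stripped of the queues and the in-flight cap, the non-blocking half might look like this rough sketch, assuming the em-http-request gem; example.com, the 10-second timeout, and the host:port line format are placeholders:

require 'eventmachine'
require 'em-http-request'

proxies = File.readlines('proxies.txt').map(&:strip)

EM.run do
  pending = proxies.size
  proxies.each do |proxy|
    host, port = proxy.split(':')
    req = EventMachine::HttpRequest.new(
      'http://example.com/',
      proxy: { host: host, port: port.to_i },
      connect_timeout: 10
    ).get
    # callback/errback fire inside the single reactor thread; no extra
    # threads are created no matter how many proxies are checked.
    req.callback { puts "#{proxy} is alive"; EM.stop if (pending -= 1).zero? }
    req.errback  { puts "#{proxy} is dead";  EM.stop if (pending -= 1).zero? }
  end
end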
This is kind of a two-part question.
1) Is there a max number of HttpWebRequests that can be run at the same time in WP7?
I'm going to create a ScheduledTaskAgent to run a PeriodicTask. There will be two different REST service calls: the first will get a list of IDs for records that need to be downloaded; the second will be used to download those records one at a time. I don't know how many records there will be; my guesstimate would be ±50.
2) Would making all the individual record requests at once be a bad idea (assuming that it's possible), or should I wait for a request to finish before starting another?
Having just spent a week and a half getting a BackgroundAgent to stay within its memory limits, I would suggest doing them one at a time.
You lose about half your memory to system libraries and the like; your first web request will take nearly another 20%, but it seems to reuse that memory on subsequent requests.
If you need to store the results in a local database, that is going to take a good chunk more. I have found that a CompiledQuery uses less memory, which means holding on to a single instance of your context.
Between each call I would suggest doing a GC.Collect(); I even add a short Thread.Sleep() just to be sure the process has some time to tidy things up.
Another thing I do is track how much memory I am using and attempt to exit gracefully when I get to around 97 or 98%.
You cannot use the debugger to test memory limits, as the debug memory is much higher and the limits are not enforced. However, for comparative testing between versions of your code, the debugger does produce very similar results on subsequent runs over the same code.
You can track your memory usage with Microsoft.Phone.Info.DeviceStatus.ApplicationCurrentMemoryUsage and Microsoft.Phone.Info.DeviceStatus.ApplicationMemoryUsageLimit.
I write a status log into IsolatedStorage so I can see the results of runs on the phone, and I use ScheduledActionService.LaunchForTest() to kick them off. I then use ShellToast notifications to let me know when the task runs and when it completes; that way I can launch my app to read the status log without interrupting it.
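Put together, the one-at-a-time pattern with the memory guard might look like this sketch; the service URL, the hard-coded ID list, and the StoreRecord step are placeholders:

using System;
using System.Collections.Generic;
using System.Net;
using Microsoft.Phone.Info;
using Microsoft.Phone.Scheduler;

public class SyncAgent : ScheduledTaskAgent
{
    private Queue<string> _ids;

    protected override void OnInvoke(ScheduledTask task)
    {
        // In reality the IDs come from the first REST call.
        _ids = new Queue<string>(new[] { "1", "2", "3" });
        DownloadNext();
    }

    private void DownloadNext()
    {
        // Exit gracefully before hitting the agent's memory cap.
        long used = DeviceStatus.ApplicationCurrentMemoryUsage;
        long limit = DeviceStatus.ApplicationMemoryUsageLimit;
        if (_ids.Count == 0 || used > limit * 0.97)
        {
            NotifyComplete();
            return;
        }

        string id = _ids.Dequeue();
        var request = (HttpWebRequest)WebRequest.Create(
            new Uri("http://example.com/records/" + id));
        request.BeginGetResponse(ar =>
        {
            using (var response = request.EndGetResponse(ar))
            {
                // StoreRecord(response.GetResponseStream()); // placeholder
            }
            GC.Collect();   // give the CLR a chance to tidy up between calls
            DownloadNext(); // strictly one request in flight at a time
        }, null);
    }
}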
Tyler,
My 2 cents here.
I don't believe there is any restriction on how many HttpWebRequests you can spin up. These, however, have to be async, of course, and may be served from the browser stack. Most modern browsers, including IE9, handle over 5 concurrent requests to the same domain, but you are not guaranteed a request handle immediately. However, it should not matter if you are willing to wait on a separate thread, dump your content onto the request pipe, and wait for the response on yet another thread. This post (here) has a nice walkthrough of why we need to do this.
Nothing wrong with this approach either, IMO. You're just going to have to wait until all the requests have their respective pipelines and then wait for the responses.
Thanks!
1) Your memory limit in a PeriodicTask or ResourceIntensiveTask is 6 MB, so you definitely should control your requests really carefully. I don't think there is a limit in the code.
2) You have only 6 MB. So if you start all your requests at the same time, the agent will most likely be terminated immediately.
3) I think you would be better off using a ResourceIntensiveTask, because a PeriodicTask only runs for about 25 seconds.
Good guide for Multitasking features in Mango: http://blogs.infosupport.com/blogs/alexb/archive/2011/05/26/multi-tasking-in-windows-phone-7-1.aspx
I seem to remember (but can't find the reference right now) that the maximum number of requests the OS can make at once is 7. You should avoid making that many at once, though, as it will stop other/system apps from being able to make requests.