Resque vs Sidekiq? [closed] - ruby

I am currently using Resque for my background processing, but recently I have been hearing a lot of buzz about Sidekiq. Could anybody compare/differentiate the two?
In particular, I would like to know whether there is a way to programmatically monitor whether a job has completed in Sidekiq.

Resque:
Pros:
- does not require thread safety (works with pretty much any gem out there);
- has no interpreter preference (you can use any Ruby; note that Resque currently supports MRI 2.3.0 or later);
- loads of plugins.
Cons:
- runs a process per worker (uses more memory);
- does not retry jobs (out of the box, anyway).
Sidekiq:
Pros:
- runs a thread per worker (uses much less memory);
- less forking (works faster);
- more options out of the box.
Cons:
- [huge] requires thread safety of your code and of all your dependencies; if you run thread-unsafe code with threads, you're asking for trouble;
- works better on some Rubies than others (JRuby is recommended; efficiency on MRI is reduced by the GVL (global VM lock)).
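To make the difference concrete, here is a minimal sketch of the same job written for each library (the class, queue, and argument names are illustrative, not from the question):

require 'resque'
require 'sidekiq'

# Resque: a plain class with a class-level queue and a self.perform
# method; each worker is a separate OS process.
class ImageResizeJob
  @queue = :images

  def self.perform(image_id)
    # ... do the work ...
  end
end
Resque.enqueue(ImageResizeJob, 42)

# Sidekiq: include Sidekiq::Worker; jobs run on threads inside a single
# process, so this code and every gem it calls must be thread-safe.
class ImageResizeWorker
  include Sidekiq::Worker
  sidekiq_options queue: :images, retry: 5 # retries are built in

  def perform(image_id)
    # ... do the work ...
  end
end
ImageResizeWorker.perform_async(42)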

From the question:
"In particular, I would like to know whether there is a way to programmatically monitor whether a job has completed in Sidekiq."
Here are two solutions for that:
- the Sidekiq::Status gem
- the Batch API (Sidekiq Pro) - usage
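As a rough sketch of the Sidekiq::Status approach (this assumes the sidekiq-status gem is installed and its client/server middleware registered as its README describes; the worker and its arguments are made up for illustration):

require 'sidekiq-status'

class HardWorker
  include Sidekiq::Worker
  include Sidekiq::Status::Worker # enables status tracking for this worker

  def perform(data)
    # ... long-running work ...
  end
end

jid = HardWorker.perform_async('some data')

# Poll the job's state by its jid from anywhere in your app:
Sidekiq::Status.status(jid)    # => :queued, :working, :complete, :failed, ...
Sidekiq::Status.complete?(jid) # => true once the job has finished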

Related

Insist on using Kafka brokers on Windows [closed]

I know Java services are much happier on Linux/Unix hosts.
However, there are scenarios where it is simply not feasible to ask the customer to install a Linux cluster in their environment just to deploy Kafka, i.e. Windows 10 / Windows Server may be their only acceptable choice.
To describe our application briefly: it is not a service running constantly. We just want to introduce Kafka as a reliable communication broker to exchange data among quite a few distributed processes (on different machines in the network, probably including some machines in the cloud) while a certain operation runs for a variable duration, say from 1 hour up to 48 hours. Each run will create many temporary topics.
In such cases, is Kafka on Windows a production option?
BTW, I have encountered quite a few known issues with Kafka on Windows, e.g. this one. For this specific issue, we simply assume that someone in the customer's company, or some scheduled script, will be responsible for cleaning up outdated topics from the logs, say topics from one month ago.
Are there any other unsolvable roadblocks to using Kafka on Windows?
Any thoughts or comments are appreciated.
Is it an option? Yes. Is it a sensible option? … perhaps not.
As you've identified, there are several known issues with running Kafka on Windows. There are workarounds, but do you really want to be dealing with those in production? It's one thing to run a hack to get your sandbox working; it's quite another with production workloads.
Here is one option if you really want to run Kafka on Windows: do so using WSL2.

Tools for scheduler debugging in Linux [closed]

I have an embedded Linux system with two threads that must run in real time (or soft real time). When using SCHED_OTHER, I noticed a lot of jitter, but the two threads always executed within their allocated time.
I have applied the RT patch with PREEMPT_RT enabled, and running those two threads with SCHED_FIFO (at a high priority of ~80) produces a lot less jitter. It's overall a lot better, except that once in a while both threads miss their deadline (instead of executing every 10 ms or so, they may not get scheduled for almost a second!).
I wanted to ask which tool is best for debugging Linux scheduling (under RT) on an embedded Linux OS. ftrace came to mind, but I don't know whether it is the best, or the only, tool. My goal is to find out why the two threads occasionally go unscheduled for such a long time.
UPDATE: I've been running ftrace today with wakeup_rt. The wakeup_rt tracer didn't get the job done: the maximum latency it recorded was 5 ms, while my thread can run up to 1000 ms late. Maybe it is not a scheduler issue? Which other ftrace tracer would you recommend?
Try rt-app, which is used by the scheduler developers.
You might also want to use SCHED_DEADLINE instead of SCHED_FIFO to reduce your jitter.

How to build a powerful crawler like google's? [closed]

I want to build a crawler that can update hundreds of thousands of links in several minutes.
Are there any mature approaches to the scheduling?
Is a distributed system needed?
What is the biggest barrier limiting performance?
Thanks.
For Python you could go with Frontera by Scrapinghub
https://github.com/scrapinghub/frontera
https://github.com/scrapinghub/frontera/blob/distributed/docs/source/topics/distributed-architecture.rst
They're the same guys that make Scrapy.
There's also Apache Nutch, which is a much older project.
http://nutch.apache.org/
You would need a distributed crawler, but don't reinvent the wheel: use Apache Nutch. It was built exactly for that purpose, is mature and stable, and is used by a wide community to handle large-scale crawls.
The amount of processing and memory required will call for distributed processing unless you are willing to compromise on speed. Remember that you'd be dealing with billions of links and terabytes of text and images.

What are my options to deploy different ruby versions to a server? [closed]

The Linux server I'm deploying a web application to has a rather outdated version of Ruby (1.8.7) in its repositories, and it doesn't look like that's going to change any time soon.
What are my options in terms of using Ruby versions other than the distro-sanctioned package in a production environment?
If I were to use something like RVM, how would that affect my deployment process, server management, and stability?
RVM or rbenv are your best bets for managing multiple Ruby versions.
As long as you set up RVM/rbenv for the user you're deploying as, this will work fine. In fact, I've done this myself on AWS with Capistrano.
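As a sketch of what that can look like with Capistrano and the capistrano-rbenv plugin (the version number and settings below are illustrative, not from the question):

# Capfile: load the rbenv integration provided by the capistrano-rbenv gem
require 'capistrano/rbenv'

# config/deploy.rb: pin the Ruby the app runs under, independently of
# whatever the distro ships.
set :rbenv_type, :user   # rbenv installed in the deploy user's home directory
set :rbenv_ruby, '2.3.1' # hypothetical version, for illustration only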

How do I improve app performance on AppHarbor? [closed]

I have an application hosted on AppHarbor that is visited a few times a day with a light load.
The performance experience is a SLOW first page load as the dormant worker process wakes up. Subsequent page loads are fast.
I want to reduce the amount of time for the initial load. Will buying additional instances solve this issue, or should I look towards a dedicated host?
Your app pool will be spun down after 20 minutes of inactivity; this is standard IIS behaviour.
To avoid this, you can upgrade from Canoe to either the Catamaran or Yacht plan. Web apps on those plans don't idle (adding a custom hostname or SSL, or running more than one dyno, on the Canoe plan will still give you an idling app).
You can also circumvent the idling by using services like Pingdom and StillAlive to generate requests for your site. But upgrading from Canoe is fairer to AppHarbor.
The way I do it is to have something like this running locally: https://github.com/haf/Requester
It just queries the web app every nth second and keeps it in memory. It's a hack, but it works, and the problem goes away once the app becomes more popular. ^^
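If you'd rather not run the linked tool, a minimal Ruby stand-in for that kind of requester might look like this (the URL and interval are placeholders, not your actual app):

require 'net/http'
require 'uri'

# Hit the app periodically so IIS never sees 20 minutes of inactivity.
uri      = URI('https://your-app.apphb.com/') # placeholder hostname
interval = 300                                # seconds; well under 20 minutes

loop do
  begin
    Net::HTTP.get_response(uri)
  rescue StandardError => e
    warn "ping failed: #{e.message}"
  end
  sleep interval
end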
