Heroku web worker timeouts caused by background worker - Ruby

I have a problem with a process in an app of mine hosted on Heroku (2x web + 1x worker dynos). The process ends with quite a heavy email being created and sent via SendGrid. This took some time, causing web worker timeouts and bad usability, so it was refactored into a background job, which I thought would solve the problem, but I'm still getting situations like this:
Apr 10 17:12:48 wc heroku/web.2: Processing by DealsController#show as */*
[request is processed]
Apr 10 17:12:50 wc app/worker.1: [worker sending emails]
[a lot of lines with debug data cut]
Apr 10 17:12:53 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#load_avg_1m=0.29 sample#load_avg_5m=0.07 sample#load_avg_15m=0.02
Apr 10 17:12:53 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#memory_total=240.45MB sample#memory_rss=240.34MB sample#memory_cache=0.11MB sample#memory_swap=0.00MB sample#memory_pgpgin=85990pages sample#memory_pgpgout=24436pages
Apr 10 17:12:53 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#load_avg_1m=0.00 sample#load_avg_5m=0.01 sample#load_avg_15m=0.01
Apr 10 17:12:54 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#memory_total=844.16MB sample#memory_rss=511.82MB sample#memory_cache=0.00MB sample#memory_swap=332.34MB sample#memory_pgpgin=223581pages sample#memory_pgpgout=92554pages
Apr 10 17:12:54 wc heroku/web.2: Process running mem=844M(164.9%)
Apr 10 17:12:54 wc heroku/web.2: Error R14 (Memory quota exceeded)
Apr 10 17:12:54 wc app/web.2: ** [NewRelic][04/10/14 15:12:54 +0000 879182ef-13f0-4908-bf35-c487ccab6153 (468)] INFO : Starting Agent shutdown
Apr 10 17:12:55 wc app/worker.1: ** [Bugsnag] Bugsnag exception handler 1.6.2 ready, api_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[8 new relic lines cut]
Apr 10 17:12:56 wc app/heroku-postgres: source=HEROKU_POSTGRESQL_WHITE sample#current_transaction=144542 sample#db_size=189720760bytes sample#tables=92 sample#active-connections=12 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99981 sample#table-cache-hit-rate=0.99868 sample#load-avg-1m=0.36 sample#load-avg-5m=0.3 sample#load-avg-15m=0.285 sample#read-iops=38.367 sample#write-iops=13.221 sample#memory-total=7629464kB sample#memory-free=187884kB sample#memory-cached=6599816kB sample#memory-postgres=689216kB
Apr 10 17:12:56 wc app/worker.1: ** [NewRelic][04/10/14 15:12:55 +0000 858a3455-0b9f-4f75-9052-b419d4653703 (99)] INFO : Installing DelayedJob instrumentation hooks
[11 new relic lines cut]
Apr 10 17:12:59 wc app/worker.1: Delayed::Backend::ActiveRecord::Job Load (1.6ms) UPDATE "delayed_jobs" SET locked_at = '2014-04-10 15:12:59.482734', locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2014-04-10 15:12:59.481969' AND (locked_at IS NULL OR locked_at < '2014-04-10 11:12:59.482002') OR locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
Apr 10 17:12:59 wc app/worker.1: ** [NewRelic][04/10/14 15:12:59 +0000 858a3455-0b9f-4f75-9052-b419d4653703 (99)] INFO : Starting Agent shutdown
Apr 10 17:13:09 wc app/worker.1: Delayed::Backend::ActiveRecord::Job Load (1.6ms) UPDATE "delayed_jobs" SET locked_at = '2014-04-10 15:13:09.486147', locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2014-04-10 15:13:09.485453' AND (locked_at IS NULL OR locked_at < '2014-04-10 11:13:09.485486') OR locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
Apr 10 17:13:13 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#load_avg_1m=0.00 sample#load_avg_5m=0.01 sample#load_avg_15m=0.01
Apr 10 17:13:14 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#memory_total=463.75MB sample#memory_rss=153.29MB sample#memory_cache=0.00MB sample#memory_swap=310.46MB sample#memory_pgpgin=245188pages sample#memory_pgpgout=205946pages
Apr 10 17:13:14 wc app/web.2: Started GET "/user" for [IP.IP.IP.IP] at 2014-04-10 15:13:13 +0000
Apr 10 17:13:14 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#load_avg_1m=0.21 sample#load_avg_5m=0.06 sample#load_avg_15m=0.02
Apr 10 17:13:14 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#memory_total=157.25MB sample#memory_rss=157.14MB sample#memory_cache=0.11MB sample#memory_swap=0.00MB sample#memory_pgpgin=103179pages sample#memory_pgpgout=62923pages
Apr 10 17:13:16 wc app/web.2: Started GET "/user" for [IP.IP.IP.IP] at 2014-04-10 15:13:16 +0000
Apr 10 17:13:17 wc heroku/router: at=error code=H12 desc="Request timeout" method=GET path=/pages/planning host=www.cool-app.com request_id=c62a7ee5-11d8-4286-846a-a55861cc6a0e fwd="[IP.IP.IP.IP]" dyno=web.2 connect=2ms service=30000ms status=503 bytes=0
Apr 10 17:13:19 wc app/web.2: E, [2014-04-10T15:13:18.948990 #2] ERROR -- : worker=1 PID:12 timeout (31s > 30s), killing
Apr 10 17:13:19 wc app/worker.1: Delayed::Backend::ActiveRecord::Job Load (1.5ms) UPDATE "delayed_jobs" SET locked_at = '2014-04-10 15:13:19.489494', locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2014-04-10 15:13:19.488845' AND (locked_at IS NULL OR locked_at < '2014-04-10 11:13:19.488874') OR locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
Apr 10 17:13:20 wc app/web.2: E, [2014-04-10T15:13:19.689336 #2] ERROR -- : reaped #<Process::Status: pid 12 SIGKILL (signal 9)> worker=1
Apr 10 17:13:21 wc app/web.2: Disconnected from ActiveRecord
Apr 10 17:13:27 wc app/web.2: Processing by UsersController#show as HTML
[web worker 2 works properly from now on]
The web.2 worker still crashes, resulting in a ~20 second hang for the user and an "Application Error" screen being shown. What's strange is that this happens on different pages and seems to be linked to the worker in the background.
The line that especially confuses me (and the one probably symptomatic of the crash) is:
Apr 10 17:13:19 wc app/web.2: E, [2014-04-10T15:13:18.948990 #2] ERROR -- : worker=1 PID:12 timeout (31s > 30s), killing
What does it mean? It seems to me that the web.2 dyno is being killed because worker=1 had a timeout, which seems a bit crazy.
The configuration for the dynos is:
Dynos:
web    (1X dyno × 2): bundle exec unicorn -p $PORT -c ./config/unicorn.rb
worker (1X dyno × 1): bundle exec rake jobs:work
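For completeness, config/unicorn.rb looks roughly like this (a sketch; the 30-second timeout matches the "(31s > 30s)" in the log, the other values are assumptions):

# config/unicorn.rb (sketch)
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 2)
timeout 30       # the Unicorn master SIGKILLs any worker process stuck past 30s
preload_app true

before_fork do |server, worker|
  # release the DB connection before forking, as Heroku recommends
  defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection
end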
Any ideas?

Solution: switch from delayed_job to Sidekiq
To whom it may concern: despite several hours of debugging, I was never really able to get to the bottom of this. I decided to try switching this one job over to Sidekiq, and that solved the issue. It was probably just a bug in this particular delayed_job/Heroku interaction; maybe it would work today...
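For anyone doing the same migration, the change amounts to something like the sketch below (the class and method names here are hypothetical, not the app's actual code). The controller enqueues the job and returns immediately, so the request never waits on SendGrid:

require 'sidekiq'

class HeavyMailerWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'mailers', retry: 3

  def perform(deal_id)
    # build and send the heavy SendGrid email outside the request cycle
    deal = Deal.find(deal_id)
    DealMailer.heavy_summary(deal).deliver
  end
end

# in the controller, instead of sending inline:
HeavyMailerWorker.perform_async(deal.id)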

Related

Gearman worker in shell hangs as a zombie

I have a Gearman worker in a shell script started with perp in the following way:
runuid -s gds \
/usr/bin/gearman -h 127.0.0.1 -t 1000 -w -f gds-rel \
-- xargs /home/gds/gds-rel-worker.sh < /dev/null 2>/dev/null
The worker only does some input validation and calls another shell script run.sh that invokes bash, curl, Terragrunt, Terraform, Ansible and gcloud to provision and update resources in GCP like this:
./run.sh --release 1.2.3 2>&1 >> /var/log/gds-release
The script is intended to run unattended. The problem I have is that after the job finishes successfully (that is, both shell scripts, run.sh and gds-rel-worker.sh, have completed) the Gearman job remains executing, because the child process becomes a zombie (see the last line below).
root 144748 1 0 Apr29 ? 00:00:00 perpboot -d /etc/perp
root 144749 144748 0 Apr29 ? 00:00:00 \_ tinylog -k 8 -s 100000 -t -z /var/log/perp/perpd-root
root 144750 144748 0 Apr29 ? 00:00:00 \_ perpd /etc/perp
root 2492482 144750 0 May14 ? 00:00:00 \_ tinylog (gearmand) -k 10 -s 100000000 -t -z /var/log/perp/gearmand
gearmand 2492483 144750 0 May14 ? 00:00:08 \_ /usr/sbin/gearmand -L 127.0.0.1 -p 4730 --verbose INFO --log-file stderr --keepalive --keepalive-idle 120 --keepalive-interval 120 --keepalive-count 3 --round-robin --threads 36 --worker-wakeup 3 --job-retries 1
root 2531800 144750 0 May14 ? 00:00:00 \_ tinylog (gds-rel-worker) -k 10 -s 100000000 -t -z /var/log/perp/gds-rel-worker
gds 2531801 144750 0 May14 ? 00:00:00 \_ /usr/bin/gearman -h 127.0.0.1 -t 1000 -w -f gds-rel -- xargs /home/gds/gds-rel-worker.sh
gds 2531880 2531801 0 May14 ? 00:00:00 \_ [xargs] <defunct>
So far I have traced the problem to run.sh, because if I replace its call with something simpler (e.g. echo "Hello"; sleep 5) the worker does not hang. Unfortunately, I have no clue what is causing the problem. The script run.sh is rather long and complex, but has been working without a problem so far. Tracing the worker process I see this:
getpid() = 2531801
write(2, "gearman: ", 9) = 9
write(2, "gearman_worker_work", 19) = 19
write(2, " : ", 3) = 3
write(2, "gearman_wait(GEARMAN_TIMEOUT) ti"..., 151) = 151
write(2, "\n", 1) = 1
sendto(5, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(5, "\0RES\0\0\0\n\0\0\0\0", 8192, MSG_NOSIGNAL, NULL, NULL) = 12
sendto(5, "\0REQ\0\0\0\4\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
poll([{fd=5, events=POLLIN}, {fd=3, events=POLLIN}], 2, 1000) = 1 ([{fd=5, revents=POLLIN}])
sendto(5, "\0REQ\0\0\0'\0\0\0\0", 12, MSG_NOSIGNAL, NULL, 0) = 12
recvfrom(5, "\0RES\0\0\0\6\0\0\0\0\0RES\0\0\0(\0\0\0QH:terra-"..., 8192, MSG_NOSIGNAL, NULL, NULL) = 105
pipe([6, 7]) = 0
pipe([8, 9]) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fea38480a50) = 2531880
close(6) = 0
close(9) = 0
write(7, "1.2.3\n", 18) = 6
close(7) = 0
read(8, "which: no terraform-0.14 in (/us"..., 1024) = 80
read(8, "Identity added: /home/gds/.ssh/i"..., 1024) = 54
read(8, 0x7fff6251f5b0, 1024) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2531880, si_uid=1006, si_status=0, si_utime=0, si_stime=0} ---
read(8,
So the worker continues reading standard output even though the child has finished successfully and presumably closed it. Any ideas how to catch what causes this problem?
I was able to solve it. The script run.sh was starting ssh-agent, which opens a socket; since Gearman redirects all output, the worker continued reading the open file descriptor even after the script had completed successfully.
I found it by examining the open file descriptors of the Gearman worker process after it hung:
# ls -l /proc/2531801/fd/*
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/0 -> /dev/null
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/1 -> 'pipe:[9356665]'
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/2 -> 'pipe:[9356665]'
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/3 -> 'pipe:[9357481]'
l-wx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/4 -> 'pipe:[9357481]'
lrwx------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/5 -> 'socket:[9357482]'
lr-x------. 1 gds devops 64 May 17 11:26 /proc/2531801/fd/8 -> 'pipe:[9369888]'
Then I identified the processes using the pipe behind file descriptor 8, which the Gearman worker kept reading:
# lsof | grep 9369888
gearman 2531801 gds 8r FIFO 0,13 0t0 9369888 pipe
ssh-agent 2531899 gds 9w FIFO 0,13 0t0 9369888 pipe
And finally I listed the files opened by ssh-agent and found what was behind its file descriptor 3:
# ls -l /proc/2531899/fd/*
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/0 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/1 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/2 -> /dev/null
lrwx------. 1 root root 64 May 17 11:14 /proc/2531899/fd/3 -> 'socket:[9346577]'
# lsof | grep 9346577
ssh-agent 2531899 gds 3u unix 0xffff89016fd34000 0t0 9346577 /tmp/ssh-0b14coFWhy40/agent.2531898 type=STREAM
As a solution I added a kill of the ssh-agent before exiting the run.sh script, and now no jobs hang due to a zombie process.
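In run.sh the fix amounts to something like this (a sketch; everything except the ssh-agent handling is assumed):

#!/bin/bash
# start the agent; this exports SSH_AGENT_PID into the current shell
eval "$(ssh-agent -s)"
# kill the agent on any exit path: otherwise it inherits the worker's
# stdout/stderr pipe and keeps it open, so Gearman never sees EOF
trap 'kill "$SSH_AGENT_PID"' EXIT

ssh-add "$HOME/.ssh/id_rsa"

# ... terragrunt / terraform / ansible / gcloud provisioning steps ...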

Weird output of pg_stat_activity

I have trouble with the output of this simple query:
select
pid,
state
from pg_stat_activity
where datname = 'My_DB_name'
while running it in different ways:
1. In an IDE
2. Via psql in a terminal
3. In a bash script:
QUERY="copy (select pid, state from pg_stat_activity where datname = 'My_DB_name') to stdout with csv"
psql -h host -U user -d database -t -c "$QUERY" >> result
1 and 2 return results as I need them:
1:
pid state
------ -----------------------------
23126 idle
25573 active
2642 active
20420 idle
23391 idle
5339 idle
7710 idle
1558 idle
12506 idle
2862 active
716 active
9834 idle in transaction (aborted)
2:
pid | state
-------+-------------------------------
23126 | idle
25573 | idle
2642 | active
20420 | idle
23391 | idle
5339 | active
7710 | idle
1558 | idle
12506 | idle
2211 | active
716 | active
9834 | idle in transaction (aborted)
Number 3 is weird: it doesn't give me any state name except 'active':
23126,
25573,
2642,
20420,
23391,
5339,
7710,
1558,
12506,
1660,active
716,active
1927,active
9834,
What am I missing? How can I get all the state names via the bash script?
pg_stat_activity is a catalog view that shows different content depending on whether you're logged in as a superuser or as a non-privileged user: a non-privileged user sees the state column only for its own sessions, and it is null for everybody else's.
From your output, it looks like you're logged in as a superuser in #1 and #2, but as a normal user in #3, which is why only that user's own (active) connections show a state.
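If the script has to keep running as a non-privileged user, one option on PostgreSQL 10 and later (the role name below is hypothetical) is to grant it the built-in monitoring role:

-- run once, as a superuser
GRANT pg_read_all_stats TO my_script_user;

-- afterwards the script's session sees every backend's state
select pid, state from pg_stat_activity where datname = 'My_DB_name';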

Updated Ubuntu and pcntl_fork stopped working (php)

Things that changed recently on my server:
I'm almost sure it's because of the dist-upgrade (a few days ago).
I added a new user and set up a library for him in /var/www/html/banana, so it might be from that too (2 weeks ago).
I tried installing FastCGI without any success, but this didn't disrupt any regular processing or flow (2 months ago).
I usually run API queries from my PHP code using forking, and at some point this stopped working for me (it does work, but it stops when it gets to heavy query results).
error.log:
[Sun Aug 28 12:15:03.201994 2016] [:notice] [pid 1882] FastCGI: process manager initialized (pid 1882)
[Sun Aug 28 12:15:03.278176 2016] [mpm_prefork:notice] [pid 1879] AH00163: Apache/2.4.18 (Ubuntu) mod_fastcgi/mod_fastcgi-SNAP-0910052141 configured -- resuming normal operations
Running cat /var/mail/root outputs:
From root@banana Sun Aug 28 12:39:01 2016
Return-Path: <root@banana>
X-Original-To: root
Delivered-To: root@banana
Received: by banana (Postfix, from userid 0)
id ABC281005BA; Sun, 28 Aug 2016 12:39:01 +0300 (IDT)
From: root@banana (Cron Daemon)
To: root@banana
Subject: Cron <root@banana> [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/root>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=root>
Message-Id: <20160828093901.ABC281005BA@YHserver>
Date: Sun, 28 Aug 2016 12:39:01 +0300 (IDT)
Can someone help me debug the problem better and solve it?
Running this script returns the expected results:
<?php
echo "Is fork? <br/>";
var_dump(extension_loaded('pcntl'));
echo "<br><br> more checks: <br>";
$supports = array();
if (function_exists("pcntl_fork")) $supports[] = "ispcntl";
echo implode(",", $supports);
for ($i = 1; $i <= 5; ++$i) {
    $pid = pcntl_fork();
    if (!$pid) {
        sleep(1);
        print "In child $i\n";
        exit;
    }
}
?>
EDIT: I tried running that same script on the server without forking and I got all the results right (after waiting a long time and having my website stuck for a while...).
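One way to narrow this down (a sketch, assuming the forking path is at fault; note the PHP manual also advises against using pcntl functions inside a web server environment) is to check pcntl_fork's return value and have the parent reap every child, so fork failures and child exit codes become visible instead of hanging silently:

<?php
$pids = array();
for ($i = 1; $i <= 5; ++$i) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork $i failed\n");   // fork errors are otherwise silent
    } elseif ($pid === 0) {
        // child: run the API query here
        sleep(1);
        exit(0);
    }
    $pids[] = $pid;                // parent: remember the child
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status);  // reap each child
    echo "child $pid exited with status " . pcntl_wexitstatus($status) . "\n";
}
?>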

cron task wouldn't work, why?

I want to write a cron task to record the ntpdate synchronization info in the system log, but no such info is printed to /var/log/messages after the cron task runs. Where did I go wrong?
The following is what my crontab looks like.
*/1 * * * * ntpdate 192.168.100.97 | logger -t "NTP"
*/1 * * * * echo "log test" | logger -t "TEST"
*/1 * * * * whoami | logger -t "WHO"
When I run tailf /var/log/messages and wait some time, I only get the following lines; the NTP lines are missing.
Oct 29 15:22:01 localhost TEST: log test
Oct 29 15:22:01 localhost WHO: root
Oct 29 15:23:01 localhost TEST: log test
Oct 29 15:23:01 localhost WHO: root
Oct 29 15:24:01 localhost TEST: log test
Oct 29 15:24:01 localhost WHO: root
Oct 29 15:25:01 localhost TEST: log test
Oct 29 15:25:01 localhost WHO: root
Oct 29 15:26:01 localhost TEST: log test
Oct 29 15:26:01 localhost WHO: root
But when I run ntpdate 192.168.100.97 | logger -t "NTP" on the command line, I can see the message Oct 29 15:28:39 localhost NTP: 29 Oct 15:28:39 ntpdate[11101]: adjust time server 192.168.100.97 offset 0.000043 sec printed in the system log. What am I missing here?
Thanks in advance for your kind help.
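The usual culprit here is cron's PATH: an interactive root shell normally has /usr/sbin in its PATH, while cron jobs typically run with just /usr/bin:/bin, and ntpdate usually lives in /usr/sbin. Using the absolute path should fix it (the path below is an assumption; check it with which ntpdate first), and the 2>&1 also captures anything ntpdate writes to stderr:

*/1 * * * * /usr/sbin/ntpdate 192.168.100.97 2>&1 | logger -t "NTP"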

How to properly use tor-privoxy Ruby gem?

I am using the tor-privoxy Ruby gem, following this page: https://github.com/pirj/tor-privoxy
I installed the "tor" and "privoxy" packages on my Arch Linux installation and issued the commands:
sudo systemctl start privoxy.service
sudo systemctl start tor.service
The status of the services, via systemctl status tor.service and systemctl status privoxy.service:
● tor.service - Anonymizing Overlay Network
Loaded: loaded (/usr/lib/systemd/system/tor.service; enabled)
Active: active (running) since Thu 2014-06-26 16:27:44 CEST; 1 weeks 5 days ago
Main PID: 454 (tor)
CGroup: /system.slice/tor.service
└─454 /usr/bin/tor -f /etc/tor/torrc
Jul 08 16:28:28 bridgelinux Tor[454]: Application request when we haven't used client functionality late...gain.
Jul 08 16:28:40 bridgelinux Tor[454]: We now have enough directory information to build circuits.
Jul 08 16:28:41 bridgelinux Tor[454]: Tor has successfully opened a circuit. Looks like client functiona...king.
Jul 08 17:20:05 bridgelinux Tor[454]: Socks version 65 not recognized. (Tor is not an http proxy.)
Jul 08 17:20:05 bridgelinux Tor[454]: Fetching socks handshake failed. Closing.
Jul 08 18:01:25 bridgelinux Tor[454]: Socks version 65 not recognized. (Tor is not an http proxy.)
Jul 08 18:01:25 bridgelinux Tor[454]: Fetching socks handshake failed. Closing.
Jul 08 18:10:04 bridgelinux systemd[1]: Started Anonymizing Overlay Network.
Jul 08 18:10:13 bridgelinux systemd[1]: Started Anonymizing Overlay Network.
Jul 08 18:14:34 bridgelinux systemd[1]: Started Anonymizing Overlay Network.
Hint: Some lines were ellipsized, use -l to show in full.
and
● privoxy.service - Privoxy Web Proxy With Advanced Filtering Capabilities
Loaded: loaded (/usr/lib/systemd/system/privoxy.service; disabled)
Active: active (running) since Tue 2014-07-08 16:09:16 CEST; 2h 8min ago
Process: 8554 ExecStart=/usr/bin/privoxy --pidfile /run/privoxy.pid --user privoxy.privoxy /etc/privoxy/config (code=exited, status=0/SUCCESS)
Main PID: 8555 (privoxy)
CGroup: /system.slice/privoxy.service
└─8555 /usr/bin/privoxy --pidfile /run/privoxy.pid --user privoxy.privoxy /etc/privoxy/config
Jul 08 16:09:16 bridgelinux systemd[1]: Started Privoxy Web Proxy With Advanced Filtering Capabilities.
Jul 08 18:17:55 bridgelinux systemd[1]: Started Privoxy Web Proxy With Advanced Filtering Capabilities.
My Ruby script looks like:
require 'mechanize'
require 'tor-privoxy'
require 'net/telnet'

def tor
  privoxy_agent ||= TorPrivoxy::Agent.new '127.0.0.1', '', {8118 => 9050} do |agent|
    sleep 20
    puts "New IP is #{agent.ip}"
  end
  return privoxy_agent
end

def switch_endpoint
  localhost = Net::Telnet::new("Host" => "localhost", "Port" => "9050", "Timeout" => 10, "Prompt" => /250 OK\n/)
  localhost.cmd('AUTHENTICATE ""') { |c| print c; throw "Cannot authenticate to Tor" if c != "250 OK\n" }
  localhost.cmd('signal NEWNYM') { |c| print c; throw "Cannot switch Tor to new route" if c != "250 OK\n" }
  localhost.close
end

agent = tor
It shows that my IP address remained the original one. When I try to call the switch_endpoint method, I get an error: ArgumentError: uncaught throw "Cannot authenticate to Tor".
However, when I issue this command at the bash prompt:
torify wget -qO- https://check.torproject.org/ | grep -i congratulations
I get no error, and it shows that I was able to connect to Tor network.
What can I do to make Tor-Privoxy work with Ruby and Mechanize?
I ran into the same problem. You can see in the logs that Tor refused your AUTHENTICATE command:
Socks version 65 not recognized. (Tor is not an http proxy.)
That's because the command went to port 9050, which in the default configuration is Tor's SOCKS port rather than its control port, so Tor tried to parse the leading "A" of AUTHENTICATE (ASCII 65) as a SOCKS version byte.
I managed to send telnet commands to Tor using Socksify instead of tor-privoxy. You don't need Privoxy any more if you use Socksify.
Here is a working example that dynamically switches Tor circuits.
First start Tor, specifying the password, control port and SOCKS port:
tor --CookieAuthentication 0 --HashedControlPassword "" --ControlPort 9050 --SocksPort 50001
Then you can try this in Ruby:
require 'net/telnet'
require 'socksify'
require 'mechanize'

original_ip = Mechanize.new.get("http://bot.whatismyipaddress.com").content
puts "original IP is : #{original_ip}"

# socksify will forward traffic to Tor, so you don't need to set a proxy for Mechanize from here on
TCPSocket::socks_server = "127.0.0.1"
TCPSocket::socks_port = "50001"
tor_port = 9050

2.times do
  # switch IP
  localhost = Net::Telnet::new("Host" => "localhost", "Port" => "#{tor_port}", "Timeout" => 10, "Prompt" => /250 OK\n/)
  localhost.cmd('AUTHENTICATE ""') { |c| print c; throw "Cannot authenticate to Tor" if c != "250 OK\n" }
  localhost.cmd('signal NEWNYM') { |c| print c; throw "Cannot switch Tor to new route" if c != "250 OK\n" }
  localhost.close
  sleep 5
  new_ip = Mechanize.new.get("http://bot.whatismyipaddress.com").content
  puts "new IP is #{new_ip}"
end
