Elixir pry session interrupted because database connection timed out - debugging

I was happily following this advice on how to run a pry debugger inside my Phoenix controller tests:
require IEx in the target file
add IEx.pry to the desired line
run the tests inside IEx: iex -S mix test --trace
But after a few seconds this error always appeared:
16:51:08.108 [error] Postgrex.Protocol (#PID<0.250.0>) disconnected:
** (DBConnection.ConnectionError) owner #PID<0.384.0> timed out because
it owned the connection for longer than 15000ms
As the message says, the database connection appears to time out at this point and any commands that invoke the database connection will error out with a DBConnection.OwnershipError. How do I tell my database connection not to time out so I can debug my tests in peace?

The Ecto.Adapters.SQL.Sandbox FAQ mentions this issue and explains that you can add the :ownership_timeout setting to your Repo config to specify how long db connections should stay open before timing out. I set mine to 10 minutes (test environment only) so I never have to think about it again:
# config.test.exs
config :rumbl, Rumbl.Repo,
# ...other settings...
ownership_timeout: 10 * 60 * 1000 # long timeout so pry sessions don't break
As expected, I can now fool around in pry for 10 minutes before seeing that error.

Related

Issue installing openwhisk with incubator-openwhisk-devtools

I have a blocking issue installing openwhisk with docker
I typed make quick-start right after a git pull of the project incubator-openwhisk-devtools. My OS is Fedora 29, docker version is 18.09.0, docker-compose version is 1.22.0. JDk 8 Oracle.
I get the following error:
[...]
adding the function to whisk ...
ok: created action hello
invoking the function ...
error: Unable to invoke action 'hello': The server is currently unavailable (because it is overloaded or down for maintenance). (code ciOZDS8VySDyVuETF14n8QqB9wifUboT)
[...]
[ERROR] [#tid_sid_unknown] [Invoker] failed to ping the controller: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for health-0: 30069 ms has passed since batch creation plus linger time
[ERROR] [#tid_sid_unknown] [KafkaProducerConnector] sending message on topic 'health' failed: Expiring 1 record(s) for health-0: 30009 ms has passed since batch creation plus linger time
Please note that controller-local-logs.log is never created.
If I issue a touch controller-local-logs.log in the right directory the log file is always empty after I try to issue make quick-start again.
http://localhost:8888/ping gives me the right answer: pong.
http://localhost:9222 is not reacheable.
Where am I wrong?
Thank you in advance

TFS2015 Test Agent Aborted - PowerShell script completed with errors

I am running a TFS nightly build that for the last few days has not been able to complete all its tests. It fails after several hours with a "Test run is aborted" message. Previous to this the tests always ran successfully, and no major changes(or even minor) have been made to the system that runs these tests.
Information:
Two MStest runs in the build(unit tests)
Timeout is set to 20 hours
Runs for approx. 15 hours before failure
Tests are set to continue on failure
When I look in the TFS log for the latest run it lists the following(2017-04-11T06:42:47.5500707Z):
[warning]DistributedTests: Test run is aborted. Logging details of the run logs.
[warning]DistributedTests: New test run created.
[warning]Test Run queued for Project Collection Build Service
[warning]DistributedTests: Test discovery started.
[warning]DistributedTests: Test Run Discovery Completed . Test run id: 533
[warning]DistributedTests: 290 test cases discovered.
[warning]DistributedTests: Test execution started. Test run id : 533
[warning]DistributedTests: Test run timed out. Test run id : 533
[warning]DistributedTests: Test run aborted. Test run id: 533
[error]The test run was aborted, failing the task.
When I look at the run log(worker_20170410-234426-utc_864.log) I see:
06:42:47.659516 BaseLogger.LogConsoleMessage(scope.JobId =
7ced7f31-e360-47f3-b334-ef20faeaf000, message = ##[error]The test run
was aborted, failing the task.) 06:42:47.659516
Microsoft.TeamFoundation.DistributedTask.Agent.Common.AgentExecutionTerminationException:
PowerShell script completed with errors. at
Microsoft.TeamFoundation.DistributedTask.Handlers.PowerShellHandler.Execute(ITaskContext
context, CancellationToken cancellationToken) at
Microsoft.TeamFoundation.DistributedTask.Worker.JobRunner.RunTask(ITaskContext
context, TaskWrapper task, CancellationTokenSource tokenSource)
In the test log, I don't see any errors in the VS, just a warning about not able to connect(I see these often):
W, 2060, 5, 2017/04/10, 16:26:03.595, XXXTESTING\QTController.exe,
Test of LoadTestResultConnectString failed: A network-related or
instance-specific error occurred while establishing a connection to
SQL Server. The server was not found or was not accessible. Verify
that the instance name is correct and that SQL Server is configured to
allow remote connections. (provider: SQL Network Interfaces, error: 26
- Error Locating Server/Instance Specified)
I also see an error thrown in the Application Event log at the same time:
The description for Event ID 0 from source Application cannot be
found. Either the component that raises this event is not installed on
your local computer or the installation is corrupted. You can install
or repair the component on the local computer.
If the event originated on another computer, the display information
had to be saved with the event.
The following information was included with the event:
Error Handler Exception: System.ServiceModel.CommunicationException:
There was an error reading from the pipe: The pipe has been ended.
(109, 0x6d). ---> System.IO.IOException: The read operation failed,
see inner exception. ---> System.ServiceModel.CommunicationException:
There was an error reading from the pipe: The pipe has been ended.
(109, 0x6d). ---> System.IO.PipeException: There was an error reading
from the pipe: The pipe has been ended. (109, 0x6d).....
the message resource is present but the message is not found in the
string/message table
The issue is that I really don't know how to interpret these messages, each log just says "test run was aborted, failing the task", I'm not even certain the powershell issue is what caused it. I'm also not sure that the error thrown in the application log is related, though it was thrown at exactly the same time that the run failed.
It's also difficult to research this issue, when you really don't know what's causing the test agent to fail. There are posts related to VS, and to the TFS Test Agent, but these don't strike me as related issues, and of course there is this somewhat unhelpful post about the Powershell message.
Has anyone seen this sort of issue before? I don't think anything on my build server has changed over the last few days(maybe updates...), what do you think would cause an issue like this to occur?
If you look at the failed build(containing tests) after it is aborted in the "Build" section of TFS, its says it was "Aborted", that's it... If you look at results of the build(containing tests) in the "Test" section of TFS it specified that the test run "Exceeded Timeout".
Apparently MSTest was running up against the default value of this little gem. I think it defaults to 8 hours when not specified, but I'm not too sure about this. Anyways I set the following setting in my "Default.testsettings" file:
<?xml version="1.0" encoding="utf-8"?>
<TestSettings name="TestSettings1">
<Execution>
<Timeouts runTimeout="200000000" />
</Execution>
</TestSettings>
Seems to resolve the issue. Tests runs successfully and no longer time out.

Jekyll Error - "serve" only works once

Started working on an update to my website. It was working fine the other day but now I get an error.
I generally type on CMD (i'm on Windows 10) "bundle exec Jekyll serve --watch" and the server goes. I can edit and save and its all reflected in browser upon refresh.
Now I can do this, but if I make one change to any file it works. Do another change and I get an error. I have to terminate and type again.
Below is the error:
D:\Tristen Grant\Documents\GitHub\portfolio>bundle exec jekyll serve --watch
DL is deprecated, please use Fiddle
Configuration file: D:/Tristen Grant/Documents/GitHub/portfolio/_config.yml
Source: D:/Tristen Grant/Documents/GitHub/portfolio
Destination: D:/Tristen Grant/Documents/GitHub/portfolio/_site
Incremental build: disabled. Enable with --incremental
Generating...
done in 0.595 seconds.
Auto-regeneration: enabled for 'D:/Tristen Grant/Documents/GitHub/portfolio'
Configuration file: D:/Tristen Grant/Documents/GitHub/portfolio/_config.yml
Server address: http://127.0.0.1:3000//
Server running... press ctrl-c to stop.
Regenerating: 1 file(s) changed at 2016-09-06 16:16:28 ...done in 0.521498 seconds.
Regenerating: 1 file(s) changed at 2016-09-06 16:16:30 ...error:
Error: No such file or directory - git rev-parse HEAD
Error: Run jekyll build --trace for more information.
[2016-09-06 16:19:35] ERROR Errno::ENOTSOCK: An operation was attempted on something that is not a socket.
C:/Ruby21-x64/lib/ruby/2.1.0/webrick/server.rb:170:in `select'
Terminate batch job (Y/N)? y
Terminate batch job (Y/N)? y
I'm using Ruby 2.1.5 64 bit version. RubyDevKit, Sass, Bourbon.
Any ideas how to fix this? I don't know much about Jekyll or ruby. Just starting out.
I also get this error in CMD. You should have github's desktop app installed; try running the same commands from the GitShell you get with that app.
For me that works, it saves me the trouble of installing git globally on Windows and setting it up.

Unresponsive socket after x time (puma - ruby)

I'm experiencing an unresponsive socket in with my Puma setup after random time. Up to this point I don't have a clue what's causing the issue. I was hoping somebody over here can help we with some answers or point me in the right direction. I'm having the following setup:
I'm using the official docker ruby-2.2.3-slim image together with the latest puma release 2.15.3, I've also installed Nginx as a reverse proxy. But I'm already sure Nginx isn't the problem over here because and I've tried to verify if the socket was working using this script. And the socket wasn't working, I got a timeout over there as well so I could ignore Nginx.
This is a testing environment so the server isn't experiencing any extreme load, I've also check memory consumption it has still several GB's of free space so that couldn't be the issue either.
What triggered me to look at the puma socket was the error message I got in my Nginx error logging:
upstream timed out (110: Connection timed out) while reading response header from upstream
Also I couldn't find anything in the logs of puma indicating what is going wrong, over here are my puma setup:
threads 0, 16
app_dir = ENV.fetch('APP_HOME')
environment ENV['RAILS_ENV']
daemonize
bind "unix://#{app_dir}/sockets/puma.sock"
stdout_redirect "#{app_dir}/log/puma.stdout.log", "#{app_dir}/log/puma.stderr.log", true
pidfile "#{app_dir}/pids/puma.pid"
state_path "#{app_dir}/pids/puma.state"
activate_control_app
on_worker_boot do
require 'active_record'
ActiveRecord::Base.connection.disconnect! rescue ActiveRecord::ConnectionNotEstablished
ActiveRecord::Base.establish_connection(YAML.load_file("#{app_dir}/config/database.yml")[ENV['RAILS_ENV']])
end
And this it the output in my puma state file:
---
pid: 43
config: !ruby/object:Puma::Configuration
cli_options:
conf:
options:
:min_threads: 0
:max_threads: 16
:quiet: false
:debug: false
:binds:
- unix:///APP/sockets/puma.sock
:workers: 1
:daemon: true
:mode: :http
:before_fork: []
:worker_timeout: 60
:worker_boot_timeout: 60
:worker_shutdown_timeout: 30
:environment: staging
:redirect_stdout: "/APP/log/puma.stdout.log"
:redirect_stderr: "/APP/log/puma.stderr.log"
:redirect_append: true
:pidfile: "/APP/pids/puma.pid"
:state: "/APP/pids/puma.state"
:control_url: unix:///tmp/puma-status-1449260516541-37
:config_file: config/puma.rb
:control_url_temp: "/tmp/puma-status-1449260516541-37"
:control_auth_token: cda8879717be7a645ea323d931b88d4b
:tag: APP
The application itself is a Rails app on the latest version 4.2.5, it's deployed on GCE (Google Container Engine).
If somebody could give me some pointer's on how to debug this any further would be very much appreciated. Because now I don't see any output anywhere which could help me any further.
EDIT
I replaced the unix socket with tcp connection to Puma with the same result, still hangs after x time
I'd start with:
How many requests get processed successfully per instance of puma?
Make sure you log the beginning and end of each request with the thread id of the thread executing it, what do you see?
Not knowing more about your application, I'd say it's likely the threads get stuck doing some long/blocking calls without timeouts or spinning on some computation until the whole thread pool gets depleted.
We'll see.
I finally found out why my application was behaving the way it was.
After trying to use a tcp connection and switching to Unicorn I start looking into other possible sources.
That's when I thought maybe my connection to Google Cloud SQL could be the problem. Once I read the faq of Cloud SQL, they mentioned that you have to tweak you Compute instances to ensure they keep open your DB connection. So I performed the next steps they recommend and that solved the problem for me, I added them just in case:
# Display the current tcp_keepalive_time value.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
# Set tcp_keepalive_time to 60 seconds and make it permanent across reboots.
$ echo 'net.ipv4.tcp_keepalive_time = 60' | sudo tee -a /etc/sysctl.conf
# Apply the change.
$ sudo /sbin/sysctl --load=/etc/sysctl.conf
# Display the tcp_keepalive_time value to verify the change was applied.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time

gsoap error: SOAP-ENV:Client [no subcode]

I downloaded gsoap 2.8 and went into the samples folder and ran a make. Everything seems to have built fine. I then navigated into the "ssl" folder and ran the sllserver in one xterm and ran sslclient in a second xterm window. (I am running RHEL 6) The server seems to run fine, it says "Bind successful: socket = 4". But when I run the client I receive the following message:
Error -1 fault: SOAP-ENV:Client [no subcode]
"End of file or no input: Operation interrupted or timed out (30 s receive delay) (30 s send delay)"
Detail: [no detail]
I have not modified any of the sample code, so it seems like it should just work. Can anyone please give me some advice as to what I should look at? I am trying to learn how to set up a soap server that uses ssl. (I have a gsoap server running already) I searched all day for an example on the web and as usual, there is not one.
Thank you so much for any help.
You could rebuild this example with compiler switch -DDEBUG to enable message logging (make 'sslclient_CFLAGS = -DWITH_OPENSSL -DWITH_GZIP -DDEBUG'). The TEST.log will tell what went wrong. I suspect it is a network issue with the server address/port that is set by default to "https://localhost:18081".
You could set the timeout parameter: soap.recv_timeout = 60 (for 60 seconds)

Resources