How to detect an Oracle server crash from an OCI client program

I have written an Oracle client program using the OCI library.
The client sends a request to the server and hangs, because the server crashed without notifying the client.
How can I find out the server status from the client side (using the OCI API)?
Thanks

I think the Oracle DB module for Asterisk had a nice DCD (dead connection detection) implementation. There are various approaches (server side, client side).
In your case the easiest way would be to use TCP keepalive. Use the (ENABLE=BROKEN) directive in tnsnames.ora.
Purpose
The keepalive feature on the supported TCP transports can be enabled
for a net service client by embedding (ENABLE=BROKEN) under the
DESCRIPTION parameter in the connect string. Keepalive allows the
caller to detect a dead remote server, although typically it will take
2 hours or more to notice. Operating system TCP configurables, which
vary by platform, define the actual keepalive timing details.
net_service_name=
 (DESCRIPTION=
   (enable=broken)
   (ADDRESS=(PROTOCOL=tcp)(HOST=sales1-svr)(PORT=1521))
   (ADDRESS=(PROTOCOL=tcp)(HOST=sales2-svr)(PORT=1521))
   (CONNECT_DATA=(SERVICE_NAME=sales.us.example.com)))
Just beware that you will also need root privileges. With the default settings, the Linux kernel starts sending keepalive packets only after 2 hours of inactivity, so you also have to change tcp_keepalive_time and tcp_keepalive_intvl in /etc/sysctl.conf. These are global, machine-wide settings; Oracle cannot yet set the keepalive interval for a single TCP connection.
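To get a feel for how the kernel settings translate into detection time, here is a minimal Python sketch. It assumes the usual Linux behaviour: the first probe is sent after tcp_keepalive_time seconds of idleness, and the connection is declared dead after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl seconds apart. tcp_keepalive_probes is not mentioned above; it is the third related sysctl. The function name and the "tuned" values are illustrative only.

```python
def dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate time (seconds) until an idle connection to a dead peer is reported.

    keepalive_time   -- idle seconds before the first probe (tcp_keepalive_time)
    keepalive_intvl  -- seconds between unanswered probes (tcp_keepalive_intvl)
    keepalive_probes -- unanswered probes before giving up (tcp_keepalive_probes)
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Common Linux defaults: 7200 s idle, 75 s interval, 9 probes.
default_seconds = dead_peer_detection_seconds(7200, 75, 9)

# Hypothetical tuned values, chosen only for illustration.
tuned_seconds = dead_peer_detection_seconds(60, 10, 3)

print(default_seconds, tuned_seconds)
```

With the defaults this comes to a little over two hours, which matches the "2 hours or more" in the quoted documentation.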
One more comment: I recall there is a function called OCIPing.
It can be used for testing too, but I'm not sure how to distinguish a long-running query from a dead server.

Related

Determining if existing connections are using TCP KeepAlive under Windows

In Windows (Vista and later), is there a way, or a tool, that can aid in determining whether an existing, already established, outgoing TCP connection was created with the SO_KEEPALIVE option?
On Unix platforms, this can usually be seen using netstat (i.e. 'netstat -o' will show a separate column for KEEPALIVE).
Netstat on Windows does not have this feature. Nor do the other Microsoft/SysInternals network tools I've tested.
I don't seem to be able to find a tool that can provide this information.
The scenario is: applications running on a Windows 2008 R2 server need to have TCP keepalive enabled on all connections they establish. Some applications do not have an option to enable TCP keepalive, and I need some way of determining whether it is enabled by default for these applications.
To be clear: I need some tool, or suggestions on how to program a tool, that shows whether existing winsock connections have TCP keepalive enabled or not. While there are other ways to determine this (such as sniffing the traffic to see whether keepalive packets are sent), they all come with uncertainties. Also, we're talking about a lot of servers and a lot of applications.
Use Wireshark to see the TCP keepalive packets. If you need to check the loopback interface, as I did, use https://github.com/nmap/npcap
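For reference, the option under discussion can be read back with getsockopt, but only from inside the process that owns the socket. The Python sketch below shows that check on a socket it creates itself; it does not inspect connections belonging to other processes, which is what the question actually asks for. The helper name is made up for illustration.

```python
import socket


def keepalive_enabled(sock):
    """Return True if SO_KEEPALIVE is set on a socket owned by this process."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
before = keepalive_enabled(s)   # new sockets normally start with keepalive off
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
after = keepalive_enabled(s)    # now reported as enabled
s.close()

print(before, after)
```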

Stale connection with Pheanstalk

I'm using beanstalkd to offload some work to other machines. The setup is a bit unusual: the server is on the internet (public IP), but the consumers are behind ADSL lines in some people's homes. So there is a Linux server acting as a client, going out through a dynamic IP and connecting to the server to get a job. It's all PHP and I'm using the pheanstalk library.
Everything runs smoothly for some time, but when the ADSL line changes its IP (every 24 hours the provider forces a disconnect-reconnect), the client just hangs and never comes out of "reserve".
I thought that putting a timeout on the reserve would help, but it didn't. As it seems, the client issues the command and blocks; it never checks the timeout itself. It just issues a reserve-with-timeout (instead of a simple reserve), and it is the server's responsibility to return TIMED_OUT when the timeout occurs. The problem is that the connection is broken (but TCP/IP doesn't know that until one side tries to talk to the other), and if the client is blocked reading, it will never return.
The library seems to have support for some kind of timeouts locally (for example when trying to connect to server), but it does not seem to contemplate this scenario.
How could I detect the stale connection and force a reconnect? Is there some kind of keepalive on the protocol (and on the pheanstalk itself)?
Thanks!
You could try closing each connection right after the request is answered and opening a new connection each time.
There is no close() function, but deleting the Pheanstalk object with unset($pheanstalk) will close the connection.
This explanation is quite helpful:
Pheanstalk (PHP client for beanstalk) - how do connections work?
I haven't tried it yet, but I came up with the idea of connecting to the beanstalk server through an SSH tunnel. We can enable the ServerAliveCountMax and ServerAliveInterval options on the tunnel, so that a network or server failure will cause the tunnel to close. This should then cause the pheanstalk client to report an error.
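The underlying problem in the question, a read that blocks forever on a silently broken connection, can also be handled with a local read timeout on the socket. The sketch below uses Python rather than PHP and is not Pheanstalk's API; it uses a local socket pair to stand in for the client and server, and the helper name is hypothetical. The idea is that a local timeout slightly longer than the reserve-with-timeout value means "no reply at all" and can trigger a reconnect.

```python
import socket


def read_with_timeout(sock, timeout_s, nbytes=4096):
    """Read from sock, returning None if nothing arrives within timeout_s seconds."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # No data at all: treat the connection as stale and reconnect.
        return None


# A connected pair of sockets stands in for client (a) and server (b).
a, b = socket.socketpair()

# The "server" stays silent, as it would on a broken link.
result_silent = read_with_timeout(a, 0.2)

# The "server" answers in time, so the read returns its reply.
b.sendall(b"TIMED_OUT\r\n")
result_reply = read_with_timeout(a, 0.2)

a.close()
b.close()
print(result_silent, result_reply)
```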

How to validate Oracle's ValidateConnection Property is working?

Someone told me if I set the "ValidateConnection" property in Oracle to TRUE, the application will be able to handle the following cases:
Timeouts on network equipment that shut down TCP connections after a certain amount of time and/or inactivity.
Physical connection breaks such as pulled cables, network equipment resets, etc.
Oracle server being restarted, or the DBA logically closing the connection on the server side.
My questions are:
If ValidateConnection is set to TRUE, can Oracle actually handle the above cases?
Do I need to write additional code, or will Oracle's connection pool just wait until the connection times out?
What techniques or tools can I use to test these cases? Sample code or a link to another article would be very useful.
Thanks.
ValidateConnection simply tells Oracle to test a connection from the pool before handing it to the application. This prevents you from getting already disconnected connections from the connection pool. To answer your question about which situations are handled by ValidateConnection, I would need to know what you mean by "handle". If the Oracle server has been disconnected from the network, ValidateConnection cannot do anything about it. However, once the server is back online, ValidateConnection will prevent Oracle from handing your application disconnected connections from the connection pool. The link below gives a little more information, and the author briefly describes how he tested ValidateConnection in his environment.
http://spdeveloper.net/2009/10/disconnected-odp-net-and-system-data-oracleclient-connections/
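The "test before handing out" idea can be illustrated with a generic sketch. This is not ODP.NET code and says nothing about how Oracle implements ValidateConnection internally; ValidatingPool, FakeConn, connect and is_alive are hypothetical names used only to show the pattern.

```python
from collections import deque


class ValidatingPool:
    """Toy pool that checks each idle connection before handing it out."""

    def __init__(self, connect, is_alive):
        self._connect = connect      # callable that opens a new connection
        self._is_alive = is_alive    # callable that tests an existing connection
        self._idle = deque()

    def acquire(self):
        # Discard idle connections that fail validation.
        while self._idle:
            conn = self._idle.popleft()
            if self._is_alive(conn):
                return conn
        # Nothing usable in the pool: open a fresh connection.
        return self._connect()

    def release(self, conn):
        self._idle.append(conn)


class FakeConn:
    """Stand-in for a database connection."""

    def __init__(self):
        self.alive = True


pool = ValidatingPool(connect=FakeConn, is_alive=lambda c: c.alive)

first = pool.acquire()
pool.release(first)
first.alive = False          # simulate a server restart while the connection was idle
second = pool.acquire()      # the dead connection is skipped, a new one is opened

pool.release(second)
third = pool.acquire()       # a healthy idle connection is reused
```

Note that validation only helps at hand-out time; a connection that dies while the application is using it still has to be handled by the application.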

Socket connection rerouting

Most proxy servers perform the job of forwarding data to an appropriate "real" server. However, I am in the process of designing a distributed system in which when the "proxy" receives a TCP/IP socket connection, the remote system actually connects with a real server which the proxy nominates. All subsequent data flows from remote to the real server.
So is it possible to "forward" the socket connection request so that the remote system connects with the real server?
(I am assuming for the moment that nothing further can be done with the remote system, i.e. the proxy can't respond to the connection by sending the IP address of the actual server and having the remote system connect to that.)
This will be under vanilla Windows (not Server), so can't use cunning stuff like TCPCP.
I assume your "remote system" is the one that initiates connection attempts, i.e. client of the proxy.
If I get this right: when the "remote system" wants to connect somewhere, you want the "proxy server" to decide where the connection will really go ("real server"). When the decision is made, you don't want to involve the proxy server any further - the data of the connection should not pass the proxy, but go directly between the "remote system" and the "real server".
The problem is, if you want the connection to be truly direct, the "remote system" must know the IP address of the "real server", and vice versa.
(I am assuming for the moment that nothing further can be done with
the remote system, i.e. the proxy can't respond to the connection by
sending the IP address of the actual server and having the remote
system connect to that.)
Like I said, not possible. Why is it a problem to have the "proxy" send back the actual IP address?
Is it security - you want to make sure the connection really goes where the proxy wanted? If that's the case, you don't have an option - you have to compromise. Either the proxy forwards all the data, and it knows where the data is going, or let the client connect itself, but you don't have control where it connects.
Most networking problems can be solved as long as you have complete control over the entire network. Here, for instance, you could involve routers on the path between the "remote system" and the "real server", to make sure the connection is direct and that it goes where the proxy wanted. But this is complex, and probably not an option in practice (since you may not have control over those routers).
A compromise may be to have several "relay servers" distributed around the network that will forward the connections instead of having the actual proxy server forward them. When a proxy makes a decision, it finds the best (closest) relay server, tells it about the connection, then orders the client to connect to the relay server, which makes sure the connection goes where the proxy intended it to go.
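What such a relay does for a single connection can be sketched in a few lines of Python. This is only an illustration of the forwarding step, with made-up function names and a local echo server standing in for the "real server"; it does not show how the proxy would pick a relay or tell it about the connection.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reports end-of-stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)   # pass the end-of-stream on
        except OSError:
            pass


def relay_once(listener, target_addr):
    """Accept one client on listener and forward it to target_addr."""
    client, _ = listener.accept()
    upstream = socket.create_connection(target_addr)
    back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
    back.start()
    pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()


def demo():
    """Send b"ping" through the relay to a local echo server and return the reply."""
    echo = socket.socket()
    echo.bind(("127.0.0.1", 0))
    echo.listen(1)

    def echo_once():
        conn, _ = echo.accept()
        with conn:
            conn.sendall(conn.recv(4096))

    threading.Thread(target=echo_once, daemon=True).start()

    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    threading.Thread(target=relay_once,
                     args=(listener, echo.getsockname()), daemon=True).start()

    with socket.create_connection(listener.getsockname(), timeout=5) as c:
        c.sendall(b"ping")
        reply = c.recv(4096)
    listener.close()
    echo.close()
    return reply
```

Calling demo() sends a message through the relay and returns what the echo server sent back. The client only ever sees the relay's address, which is the property the compromise relies on.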
There might be a way of doing this but you need to use a Windows driver to achieve it. I've not tried this when the connection comes from an IP other than localhost, but it might work.
Take a look at NetFilter SDK. There's a trial version which is fully functional up to 100000 TCP and UDP connections. The other possibility is to write a Windows driver yourself, but this is non-trivial.
http://www.netfiltersdk.com
Basically it works as follows:
1) You create a class which inherits from NF_EventHandler. In there you can provide your own implementation of methods like tcpConnectRequest to allow you to redirect TCP connections somewhere else.
2) You initialize the library with a call to nf_init. This provides the link between the driver and your proxy, as you provide an instance of your NF_EventHandler implementation to it.
There are also some example programs for you to see the redirection happening. For example, to redirect a connection on port 80 from process id 214 to 127.0.0.1:8081, you can run:
TcpRedirector.exe -p 80 -pid 214 -r 127.0.0.1:8081
For your proxy, this would be used as follows:
1) Connect from your client application to the proxy.
2) The connection request is intercepted by NetFilterSDK (tcpConnectRequest) and the connection endpoint is modified to connect to the server the proxy chooses. This is the crucial bit because your connection is coming from outside and this is the part that may not work.
This sounds like a routing problem, one layer lower than TCP/IP.
You're actually looking for something like an ARP proxy.
I'd say you need to manage ARP packets, checking the ARP requests:
CLIENT -> WHOIS PROXY.MAC
PROXY -> PROXY.IP is SERVER.IP
Then normal socket connection via TCP/IP from client to server.

How can I set the timeout on OCILogon2?

When the Oracle 10 databases are up and running fine, OCILogon2() will connect immediately. When the databases are turned off or inaccessible due to network issues - it will fail immediately.
However, when our DBAs go into emergency maintenance and block incoming connections, it can take 5 to 10 minutes to time out.
This is problematic for me since I've found that OCILogon2 isn't thread safe and we can only use it serially, and I connect to quite a few Oracle DBs. 3 blocked servers x 5-10 minutes = 15 to 30 minutes of lockup time.
Does anyone know how to set the OCILogon2 connection timeout?
Thanks.
I'm currently playing with OCI, and it seems to me that it's impossible.
The only way I can think of is to use non-blocking mode. You'll need OCIServerAttach() and OCISessionBegin() instead of OCILogon() in this case. But when I tried this, OCISessionBegin() constantly returns OCI_ERROR with the following error code:
ORA-03123 operation would block
Cause: The attempted operation cannot complete now.
Action: Retry the operation later.
It looks strange and I don't yet know how to deal with it.
A possible workaround is to run your logon in another process, which you can kill after a timeout...
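The "separate process that gets killed after a timeout" pattern looks roughly like this in Python. It is a sketch of the pattern only: the child commands below are placeholders (a sleep stands in for a hanging logon), not an OCI program, and run_with_timeout is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a child process; return True if it finished in time.

    On timeout, subprocess.run kills the child before raising
    TimeoutExpired, so no stuck process is left behind.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False)
        return True
    except subprocess.TimeoutExpired:
        return False


# Placeholder child commands: one returns at once, one "hangs" for 30 seconds.
QUICK = [sys.executable, "-c", "pass"]
HANG = [sys.executable, "-c", "import time; time.sleep(30)"]

if __name__ == "__main__":
    print(run_with_timeout(QUICK, 10))
    print(run_with_timeout(HANG, 0.5))
```

The cost of this approach is that the session lives in the child process, so the actual database work has to happen there as well, or its results have to be passed back to the parent.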
We think we found the right file setting - but it's one of those problems where we have to wait until something rare and horrible occurs before we can verify it :-/
[sqlnet.ora]
SQLNET.OUTBOUND_CONNECT_TIMEOUT=60
From the Oracle docs..
http://download.oracle.com/docs/cd/B28359_01/network.111/b28317/sqlnet.htm#BIIFGFHI
5.2.35 SQLNET.OUTBOUND_CONNECT_TIMEOUT
Purpose
Use the SQLNET.OUTBOUND_CONNECT_TIMEOUT parameter to specify the time, in seconds, for a client to establish an Oracle Net connection to the database instance.
If an Oracle Net connection is not established in the time specified, the connect attempt is terminated. The client receives an ORA-12170: TNS:Connect timeout occurred error.
The outbound connect timeout interval is a superset of the TCP connect timeout interval, which specifies a limit on the time taken to establish a TCP connection. Additionally, the outbound connect timeout interval includes the time taken to be connected to an Oracle instance providing the requested service.
Without this parameter, a client connection request to the database server may block for the default TCP connect timeout duration (approximately 8 minutes on Linux) when the database server host system is unreachable.
The outbound connect timeout interval is only applicable for TCP, TCP with SSL, and IPC transport connections.
Default
None
Example
SQLNET.OUTBOUND_CONNECT_TIMEOUT=10
