I have built a Linux kernel module that helps migrate a TCP socket from one server to another. The module works perfectly, except that when the importing server tries to close the migrated socket, the whole server hangs and freezes.
I am not able to find the root of the problem; I believe it is something beyond my kernel module code, something I am missing when I recreate the socket on the importing machine and initialize its state. It seems that the system enters an endless loop. When I close the socket from the client side, however, this problem does not appear at all.
So my question is: what is the appropriate way to debug the kernel module and figure out what is going on and why it is freezing? How can I dump error messages, especially since in my case I cannot see anything at all: once I close the file descriptor of the migrated socket on the server side, the machine freezes.
Note: I used printk to print all the values, and I could not find anything wrong in the code.
Considering your system is freezing, have you checked whether it is under heavy load while migrating the socket? Have you looked at any sar reports to confirm this? See if you can take a vmcore (after configuring kdump) and use the crash tool to narrow down the problem. First, install and configure kdump, then add the following lines to /etc/sysctl.conf and run sysctl -p:
kernel.hung_task_panic=1
kernel.hung_task_timeout_secs=300
Next get a vmcore/dump of memory:
echo 'c' > /proc/sysrq-trigger # ===> 1
If you still have access to the terminal, use sysrq-trigger to dump the stack traces of all kernel threads to the syslog:
echo 't' > /proc/sysrq-trigger
If your system is hung, try using the keyboard hotkeys:
Alt+PrintScreen+'c' ====> same as 1
Other things you may want to try (you may already have tried some of these):
1. Add dump_stack() calls in your code (see the sketch after this list).
2. Add printk lines such as printk(KERN_ALERT "Hello msg %ld\n", err); in the code.
3. dmesg -c; dmesg
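For items 1 and 2, here is a minimal sketch of what that instrumentation could look like in your module's socket-close path (my_sock_release and the fields printed are only illustrative; adapt them to wherever your module actually handles the close):

#include <linux/kernel.h>   /* printk, dump_stack */
#include <linux/module.h>
#include <linux/net.h>      /* struct socket */
#include <net/sock.h>       /* struct sock, sk_state */

/* Hypothetical hook called when the imported socket is closed. */
static int my_sock_release(struct socket *sock)
{
    long err = 0;

    printk(KERN_ALERT "my_sock_release: sock=%p state=%d\n",
           sock, sock->sk ? sock->sk->sk_state : -1);

    /* Dump the current call chain so you can see how we got here
       when you read the log back with dmesg. */
    dump_stack();

    /* ... actual release logic ... */

    printk(KERN_ALERT "my_sock_release: done, err=%ld\n", err);
    return 0;
}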
I have a Sinatra app; when I run it in production it acts oddly.
The first request works: assets download and the page loads.
If you refresh the page, however, the request just stalls, and nothing gets logged to the log file.
I'm running with sinatra-asset-pack, and I did precompile the assets before starting it.
I'd post code, but I'm not sure what would be needed to work out the issue.
EDIT: it works fine on my own box, but when I deploy it to a VM using Vagrant it just seizes up in production mode; it's fine in development mode though.
EDIT: I was able to get it to spit out this error:
Errno::EPIPE
Broken pipe # io_write -
And I narrowed it down to an action. However, posting the code of the action is pointless, as it doesn't log anything and the first line of the action is a logging call, so I'm not sure the action gets run at all; the logging was added after the problem hit, so whatever it is, I don't think it's that.
EDIT: the error actually occurs here (base.rb, line 1144, of Sinatra):
1142 def dump_errors!(boom)
1143 msg = ["#{Time.now.strftime("%Y-%m-%d %H:%M:%S")} - #{boom.class} - #{boom.message}:", *boom.backtrace].join("\n\t")
1144 @env['rack.errors'].puts(msg)
1145 end
EDIT: OK, so when I run the deployment command manually it works fine; weirdly, the output from the server is still written to the terminal despite the process being forked, and I wonder if that's the problem. The broken pipe would be the terminal that no longer exists (when deployed via Chef), and as such it's breaking... maybe?
OK, turns out that the broken pipe was indeed the cause: for some reason, even after being forked, the app was still trying to write stdout and stderr to the terminal.
However, because the terminal no longer existed (it's started by Chef), it could no longer write its output and thus locked up. Starting the app manually on the VM allowed it to work, and further evidence for this conclusion is that adding a redirect (>> app.log 2>&1) to the end of the start command also allowed the app to work.
Why Sinatra is still writing logs to the terminal instead of the file I don't know, and I need to work that out, but the main question of why it hangs is solved.
EDIT:
In the end I'm just doing this:
$stderr = @log.file
$stdout = @log.file
to redirect it to the same place as my logs go so it should be fine now... I think?
So I am trying to build an application that uses libtorrent. However, before I start I would like to make sure that I have compiled the lib correctly and that I have a functioning environment for testing.
I am currently running a VM with opentracker and I try to connect using the example client in libtorrent.
First I start by creating a .torrent file using libtorrent (I am currently not sitting in front of a computer with libtorrent available so I might be remembering the exact commands a bit wrong):
maketorrent.exe dummy.txt -t "http://10.XXX.XXX.XXX/announce"
This gives me a .torrent file called a.torrent. Opening the file everything looks ok, the bencoding is correct and the announce address is there.
Next I try to add it to the example client hoping it starts to seed:
client_test.exe a.torrent
Everything starts up OK, but no tracker is found. Then if I press t to show tracker information I see an error (maybe not the exact phrasing):
Alert: {null} unsupported URL protocol
OK, so maybe something is wrong with how I built libtorrent. So I got the Halite client instead, since that is also supposed to be built upon libtorrent. But there I have the same problem.
So I had a look at the code and found where this error message is generated. The code checks whether I am supplying an address using the HTTP or HTTPS protocol, which I am. So could it be that I am not able to use a bare IP address, or am I doing something wrong?
I found the problem. It was not a problem with the IP address or the torrent itself. Instead it was a problem with caching.
The first time I added the torrent I used http:\XXX.XXX.XXX.XXX instead of http://XXX.XXX.XXX.XXX, which didn't work. However, whatever change I made to the torrent file after that did not stick; it kept falling back to that original file until I removed the .resume folder.
I've compiled and trawled through the QuickFIX ( http://www.quickfixengine.org ) source and the examples. I figured a good starting point would be to compile (C++) and run the 'executor' example, then use the 'tradeclient' example to connect to 'executor' and send it order requests.
I created two separate session files, one for the 'executor' as an acceptor and one for the 'tradeclient' as the initiator. They're both running on the same Win7 PC.
'executor' runs, but tradeclient can't connect to it, and I can't figure out why. I downloaded Mini-fix and was able to send messages to executor, so I know that executor is working. I figure the problem is with the tradeclient session settings. I've included both of them below; I was hoping someone could point out what's causing them not to communicate. They're both running on the same computer using port 56156.
---- acceptor session.txt ----
[DEFAULT]
ConnectionType=acceptor
ReconnectInterval=5
SenderCompID=EXEC
DefaultApplVerID=FIX.5.0
[SESSION]
BeginString=FIXT.1.1
TargetCompID=SENDER
HeartBtInt=5
#SocketConnectPort=
SocketAcceptPort=56156
SocketConnectHost=127.0.0.1
TransportDataDictionary=pathToXml/spec/FIX50.xml
StartTime=07:00:00
EndTime=23:00:00
FileStorePath=store
---- initiator session.txt ---
[DEFAULT]
ConnectionType=initiator
ReconnectInterval=5
SenderCompID=SENDER
DefaultApplVerID=FIX.5.0
[SESSION]
BeginString=FIXT.1.1
TargetCompID=EXEC
HeartBtInt=5
SocketConnectPort=56156
#SocketAcceptPort=56156
SocketConnectHost=127.0.0.1
TransportDataDictionary=pathToXml/spec/FIX50.xml
StartTime=07:00:00
EndTime=23:00:00
FileLogPath=log
FileStorePath=store
--------end------
Update: Thanks for the responses... Turns out that my log file directories didn't exist. Once I created them, they both started communicating. Must have been some logging error that didn't throw an exception but disabled proper behavior.
Is there an error condition that I should be checking? I was relying on exceptions, but that's obviously not enough.
It doesn't seem to be the config. Check that your message sequence numbers are in sync, especially since you've been connecting to a different server using the same settings.
Try setting the TargetCompID and SenderCompID on the acceptor to *
I have not worked much with files, and I am wondering about possible issues with accessing remote files on another computer. What if the distant application crashes and doesn't close the file?
My aim is to use this win32 function:
HFILE WINAPI OpenFile(LPCSTR lpFileName, LPOFSTRUCT lpReOpenBuff, UINT uStyle);
Using the flag OF_SHARE_EXCLUSIVE assures me that any concurrent access will be denied (because several machines are writing to this file from time to time).
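Roughly, the call I have in mind looks like this (just a sketch; the UNC path and the error handling are made up for illustration):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    OFSTRUCT of;

    /* Open the shared file for writing and deny any other access
       while we hold it (OF_SHARE_EXCLUSIVE). */
    HFILE hf = OpenFile("\\\\server\\share\\data.dat", &of,
                        OF_READWRITE | OF_SHARE_EXCLUSIVE);
    if (hf == HFILE_ERROR) {
        /* Typically ERROR_SHARING_VIOLATION if someone else holds it open. */
        printf("OpenFile failed, error %lu\n", GetLastError());
        return 1;
    }

    /* ... write to the file, e.g. via _lwrite(hf, buf, len) ... */

    _lclose(hf);
    return 0;
}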
But what if the file is left open (application crash, for example)?
How do I put the file back to normal?
What if the distant application crashes and doesn't close the file ?
Then the O/S should close the file when it cleans up after the "crashed" application.
This won't help with a "hung" application (an application which stays open but does nothing forever).
I don't know what the issues are with network access: for example if the network connection disappears when a client has the file open (or if the client machine switches off or reboots). I'd guess there are timeouts which might eventually close the file on the server machine, but I don't know.
It might be better to use a database engine instead of a file: because database engines are explicitly built to handle concurrent access, locking, timeouts, etc.
I came across the same problem using VMware, which sometimes does not release file handles on the host when files are closed on the guest.
You can close such handles using the handle utility from www.sysinternals.com
First, determine the file handle ID by passing part of the filename; handle will show all open files where the given string matches part of the file name:
D:\sysinternals\>handle myfile
deadhist.exe pid: 744 3C8: D:\myfile.txt
Then close the handle using the parameters -c and -p:
D:\sysinternals\>handle -c 3c8 -p 744
3C8: File (---) D:\myfile.txt
Close handle 3C8 in LOCKFILE.exe (PID 744)? (y/n) y
Handle closed.
handle does not care which application is holding the file handle. You are now able to reopen, remove, rename, etc. the file.
I have a program that was written for Linux, and I am trying to build and run it on my Mac OS X 10.5 machine. The program builds and runs without problems; however, it makes many calls to syslog. I know that syslogd is running on my Mac, but I can't seem to find where my syslog calls are output.
The syslog calls are of the form
syslog (LOG_WARNING, "Log message");
Any idea where I might find my log output?
/var/log/system.log
You can monitor it easily using tail -f /var/log/system.log
See also the "logger" (man logger) and "syslog" (man syslog).
You should probably use the Console.app to view logfiles. It's purdy.
Select your device on the left and filter messages on the right.
Maybe interesting to note: Apple used a real syslogd in the past, but all of this has since switched to ASL (Apple System Log). The syslog command is still available, but it will only access this one log. If you want to access all log messages of ASL across all configured log files, use the log command.
E.g. the following shows all log messages produced by Safari within the last two days (be patient, this can take a while):
log show --predicate 'process == "Safari"' --last 2d
See man log for all the actions you can perform, all the parameters it knows and what attributes you can filter for.
When in doubt, there's always man syslog.
You can find your messages in /var/log/syslog; my machine is set up out of the box to only include high-level messages, so you may need to adjust your settings.
You can also read the messages through syslog(1), or create a test message with a command like
$ syslog -s -l INFO "Hello, world."
Use a severity of P ("panic") and you'll get an exciting message on your console immediately.
Mac OS X implements a superset of syslog's functionality. All of syslog is there, but as part of ASL.
Console, mentioned by Matthew Schinckel in his answer, is the GUI on ASL. It'll show you any messages that exist in the database, as fetched by queries listed in the sidebar. There are two queries by default; one only shows messages sent with the Console facility (as used by NSLog, among other things), whereas the other shows all log messages. Check the all-messages query; you'll probably find your message there.
That “all” does come with an asterisk. If you look in /etc/asl.conf, you'll see this line:
# save everything from emergency to notice
? [<= Level notice] store
Fortunately, in your case, the message will pass this check, since warning outranks (is a lesser number than) notice.
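If you want to double-check that, the numeric values behind the comparison come straight from <syslog.h>; a tiny program makes the ordering obvious:

#include <stdio.h>
#include <syslog.h>

int main(void)
{
    /* Lower numbers are more severe, so LOG_WARNING (4) passes an
       "<= Level notice" filter like the one in /etc/asl.conf. */
    printf("LOG_NOTICE  = %d\n", LOG_NOTICE);   /* 5 */
    printf("LOG_WARNING = %d\n", LOG_WARNING);  /* 4 */
    return 0;
}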
If you need complex syslog analysis (navigating hour by hour in the terminal, regexes, comparing in real time with other files, or even running SQL over the syslog), lnav provides it seamlessly.
Installation:
brew install lnav
Usage:
lnav /var/log/system.log
Building on Charlie's answer, I would like to add that you should take a look at the manpage of syslog.conf(5) and also take a peek at the file /etc/syslog.conf (which is where the syslog configuration is defined by default and also, as I see it, on OS X 10.5.x).
Check for a call to openlog somewhere in the program. openlog sets the ident and, more importantly, the facility for subsequent syslog calls, and the facility determines (via the syslog.conf/asl.conf rules) which log file the messages end up in instead of the default location.
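For example, if the program does something like the following (the "myapp" ident is just a placeholder), the messages get tagged with the local0 facility and are routed by whatever rule matches local0 rather than by the default rule:

#include <syslog.h>

int main(void)
{
    /* Tag messages with ident "myapp" and facility LOG_LOCAL0;
       the syslog.conf/asl.conf rules for local0 decide where they land. */
    openlog("myapp", LOG_PID, LOG_LOCAL0);
    syslog(LOG_WARNING, "Log message");
    closelog();
    return 0;
}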
Big Sur
Unfortunately, none of the stated answers worked for me.
What worked for me:
The system mail, accessed using the mail program from the terminal, had all the /usr/sbin/cron logs as emails.