I am trying to run applications on windows cluster. I am getting random crashes like bellow but most times it works.
I suspected that it was famous forking issue, but cygwin's rebase did not help.
Thank you for suggestions.
2 [main] bash 12840 C:\cygwin\bin\bash.exe: *** fatal error - add_item ("\??\C:\cygwin", "/", ...) failed, errno 1
Stack trace:
Frame Function Args
002868A8 6102F97B (002868A8, 00000000, 00000000, 00000000)
00286B98 6102F97B (6119FE20, 00008000, 00000000, 611A1C8F)
00287BC8 6100652C (611DF498, 00287BF4, 00000000, 60FE000C)
00287BE8 61006568 (611DF498, 00289C10, 00000001, 0003000A)
0028AC28 610917E4 (60FE000C, 20000C08, 0028ACF8, 61083290)
0028AC58 610D40FF (004C46B0, 01D05699, 004657E0, 612729D4)
208979 [main] bash 12840 exception::handle: Exception: STATUS_ACCESS_VIOLATION
cygwin 6.1
windows server 2008 R2 Ent
I have got explanation to the error from cygwin support guys(Thanks to Corinna):
That's not a rebase problem. It's apparently a concurrency problem of
sorts. While pulling up the per-user shared memory region, two or
more processes are trying to set up the same mount points.
This is not supposed to happen. Only the first process actually
creating the per-user shared memory is supposed to create the mount points. The OS tells a process if it created or just opened a shared
memory region, but for some reason both processes seem to think they
created the shmem region and one of them then stumbles of the EPERM
condition trying to create the root mount point twice.
But it is still leave problem the original problem.
Related
Linux kernel version: 4.18.0-17
I am porting some 4.15 kernel customizations to 4.18, but my 4.18 kernel does not boot. A stock 4.18 kernel (i.e. the starting point before merging the 4.15 modifications) boots and runs.
The error message is:
Failed to execute /init (error -7)
Starting init: /bin/sh exists but couldn't execute it (error -7)
"errno 7" is "E2BIG 7 Argument list too long"
What does that mean in the context of the kernel starting the init process?
If the kernel command line and root file systems is exactly the same as the one you are giving to the kernel version that does boot than the most likely cause is that get_user_pages_remote() is failing here: https://elixir.bootlin.com/linux/v4.18/source/fs/exec.c#L194
Which would imply one of your changes broke memory management.
To get here, just track from try_to_run_init_process() which runs init to all the functions called from it which can return E2BIG. This is the only call site that does not depend on init argument list or environment size - https://elixir.bootlin.com/linux/v4.18/source/init/main.c#L1001
Having said that, I would first make VERY sure that the kernel command line and root file system are the same.
How do I build a standalone executable in SBCL? I've tried
; SLIME 2.20
CL-USER> (defun hullo ()
(format t "hullo"))
HULLO
CL-USER> (sb-ext:save-lisp-and-die "hullo" :toplevel #'hullo :executable t)
but that just produces the following error.
Cannot save core with multiple threads running.
Interactive thread (of current session):
#<THREAD "main thread" RUNNING {10019563F3}>
Other threads:
#<THREAD "Swank Sentinel" RUNNING {100329E073}>,
#<THREAD "control-thread" RUNNING {1003423A13}>,
#<THREAD "reader-thread" RUNNING {1003428043}>,
#<THREAD "swank-indentation-cache-thread" RUNNING
{1003428153}>,
#<THREAD "auto-flush-thread" RUNNING {1004047DA3}>,
#<THREAD "repl-thread" RUNNING {1004047FA3}>
[Condition of type SB-IMPL::SAVE-WITH-MULTIPLE-THREADS-ERROR]
What am I doing wrong?
What you are doing wrong is trying to save an image while multiple threads are running. Unlike many errors in Lisp the error message explains exactly what the problem is.
If you look up the function in the sbcl manual here then you find that indeed one may not save an image with multiple threads running. The extra threads come from swank (the CL half of SLIME). The manual says that you may add functions to *save-hooks* which destroy excess threads and functions to *init-hooks* to restore threads.
One way around all this is to not save the image when it is running through slime but instead to start sbcl directly at a terminal (note: no readline support), load your program and save from there.
Working with slime is different. In theory there is a SWANK-BACKEND:SAVE-IMAGE function but I’m not sure if that works. Also as saving an image kills the process you may want to fork (SB-POSIX:FORK) first, unless you are on Windows. But forking causes problems due to not being well specified and file descriptor issues (i.e. if you try fork->close swank connection->save and die then you may find that the connection in the parent process is closed (or worse, corrupted by appearing open but being closed at some lower level)). One can read about such things online. Note that due to the way sbcl threads are implemented, forking clones only the thread that forks and the other threads are not cloned. Thus forking and then saving should work but may cause problems when running the executable due to partial slime state.
You may be interested in buildapp.
If you want to be able to use slime with your saved application you can load swank and start listening on a socket or port (maybe with some command line argument) and then in Emacs you may connect to that swank backend with slime.
You have to run save-lisp-and-die from a new sbcl, not from Slime. Dan Robertson explains more.
It is cumbersome the first time, but you can put it in a Makefile and re-use it. Don't forget to load your dependencies.
build:
sbcl --load cl-torrents.asd \
--eval '(ql:quickload :torrents)' \
--eval '(use-package :torrents)' \ # not mandatory
--eval "(sb-ext:save-lisp-and-die #p\"torrents\" :toplevel #'main :executable t)"
The quickload implies Quicklisp is already loaded, which may be the case if you installed Quicklisp on your machine, because then your ~/.sbclr contains quicklisp loading script ((load quicklisp-init)).
However sb-ext is not portable across implementations. asdf:make is the cross-platform equivalent. Add this in your .asd system definition:
:build-operation "program-op" ;; leave as is
:build-pathname "<binary-name>"
:entry-point "<my-package:main-function>"
and then call asdf:make to build the executable.
You can have a look at buildapp (mentioned above), a still popular app to do just that, for SBCL and CCL. It is in Debian. http://lisp-lang.org/wiki/article/buildapp An example usage looks like
buildapp --output myapp \
--asdf-path . \
--asdf-tree ~/quicklisp/dists \
--load-system my-app \
--entry my-app:main
But see also Roswell, a more general purpose tool, also supposed to build executables, but it is less documented. https://roswell.github.io/
If you want to build an executable on a CI system (like Gitlab CI), you may appreciate a lisp Docker image which has already SBCL, others lisps and Quicklisp installed, and if you want to parse command line arguments, see https://lispcookbook.github.io/cl-cookbook/testing.html#gitlab-ci and (my) tutorial: https://vindarel.github.io/cl-torrents/tutorial.html#org8567d07
I'm trying to serve a big number of small files with G-WAN (version 4.3.14, started with sudo on 64-bit Ubuntu 14.04.3). I start hammering it with requests over a single connection using wget to provide base URL and a file with a list of the URL suffixes. At some point, which is different for different runs, the gwan executable silently exits. There's no trace in the gwan log or in the site error log (I did change '_log' to 'log' to enable logging). The exit status code is 139. What does it mean? When I stop it with Ctrl-C the exit code is 130.
Is there a reference for the exit status codes? I cannot find any with Google.
First, Ubuntu 14.04.3 is very recent while G-WAN v4.3.14 is very old. Almost every new OS release introduces cincompatibilities that require patches and this is why we have to issue more recent releases for registered users. This explains the 'silent exits' that you are experiencing.
Second, process exits codes can be found this way:
./gwan -h
echo $?
0
Zero means no error, and any other value is an error (mixing system flags to be as informative as possible). That's why Ctrl+C returns 130: Control-C is fatal error signal 2, (130 = 128 + 2).
I have a few Windows 7 machines that I am not able to read their memory dumps. I found something that I suspect may be related, but am not positive:
https://twitter.com/aionescu/status/634028737458114560
I also found this: http://support.microsoft.com/kb/2528507
However, the scenario message regarding wow64exts given in the doc is not seen in any of my dumps. I also cannot apply that hotfix at this time to test it. So I'm just looking for some more information or opinions.
I'm able to open any other OS dump as well as my own system's Windows 7 dump, but there are 2 other machines that run Win 7 and it's telling me I have the wrong kernel symbols.
I have tried clearing out my symbol cache, reinstalled the Windows SDK, and also tried to open the dumps on two other machines with the same result. If it matters, the crash is manually created using the scroll lock method.
Symbol path: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols;
Seeing these errors: followed by "Type referenced: nt!_KPRCB"
Does anyone know about the issue mentioned by Alex in the twitter link and if it's possibly related to what I'm seeing?
Update 2015-10-22:
With the Microsoft patch day (2015-10-13) and KB3088195, symbols are available again.
However, symbols for the broken version have not been provided, so below may still be useful.
Microsoft has already published "good" symbols for ntdll in the past, containing type information like _TEB or _KPRCB. Starting from mid of July 2015, Microsoft has still published symbols for ntdll, but not containing that information.
So it depends on the version of ntdll whether you get type information or not. Old dumps referencing an old version of ntdll will download old PDBs containing type information while new dumps reference new versions of ntdll and WinDbg (or any other debugger) downloads PDBs without type information.
Could Microsoft remove type information of "good" symbols retroactively, thus making them "bad"?
Yes. As described in this answer, there is a tool to remove type information from existing PDBs. Doing that and replacing the PDB would result in such an effect.
Can Microsoft publish the "good" version of those PDBs which are currently "bad"?
That's hard to tell, since we don't know whether Microsoft has kept a copy of the "good" version so they could replace the "bad" version on the symbol server with the "good" one. Rebuilding ntdll from the same source code and thus creating new PDBs sounds possible, but the PDB gets a new time stamp and checksum. This can potentially be corrected manually, especially be Microsoft, since they should have the knowledge about the PDB internal format, but IMHO it's unlikely they'll do that. Things may go wrong and MS will hardly have tests to guarantee the correctness of such a thing.
So what can I do?
IMHO you can do nothing to really correct the situation.
You could assume that the types in ntdll have not changed so much. This would allow you to take an older version of wntdll.pdb and the new version of ntdll.dll and apply ChkMatch -m to it. This will copy the timestamp and checksum from the DLL to the PDB. After you did that (in an empty folder), replace the existing wntdll.pdb in your symbols directory with the hacked one.
WinDbg walkthrough (with output shortened to relevant things). I am using the latest version of wntdll.pdb I could find on my PC.
WARNING: doing the following may fix the type information but will likely destroy the correctness of the callstacks. Since any changes in the implementation (which are likely for security fixes) will change the method offsets.
0:005> dt nt!_PEB
*************************************************************************
*** ***
*** Either you specified an unqualified symbol, or your debugger ***
...
*** Type referenced: nt!_PEB ***
*** ***
*************************************************************************
Symbol nt!_PEB not found.
0:005> lm m ntdll
start end module name
773f0000 77570000 ntdll (pdb symbols) e:\debug\symbols\wntdll.pdb\FA9C48F9C11D4E0894B8970DECD92C972\wntdll.pdb
0:005> .shell cmd /c copy C:\Windows\SysWOW64\ntdll.dll e:\debug\temp\ntdllhack\ntdll.dll
1 file(s) copied.
0:005> .shell cmd /c copy "E:\Windows SDk\8.0\Debuggers\x86\sym\wntdll.pdb\B081677DFC724CC4AC53992627BEEA242\wntdll.pdb" e:\debug\temp\ntdllhack\wntdll.pdb
1 file(s) copied.
0:005> .shell cmd /c E:\debug\temp\ntdllhack\chkmatch.exe -m E:\debug\temp\ntdllhack\ntdll.dll E:\debug\temp\ntdllhack\wntdll.pdb
...
Executable: E:\debug\temp\ntdllhack\ntdll.dll
Debug info file: E:\debug\temp\ntdllhack\wntdll.pdb
Executable:
TimeDateStamp: 55a69e20
Debug info: 2 ( CodeView )
TimeStamp: 55a68c18 Characteristics: 0 MajorVer: 0 MinorVer: 0
Size: 35 RVA: 000e63e0 FileOffset: 000d67e0
CodeView format: RSDS
Signature: {fa9c48f9-c11d-4e08-94b8-970decd92c97} Age: 2
PdbFile: wntdll.pdb
Debug info: 10 ( Unknown )
TimeStamp: 55a68c18 Characteristics: 0 MajorVer: 565 MinorVer: 6526
Size: 4 RVA: 000e63dc FileOffset: 000d67dc
Debug information file:
Format: PDB 7.00
Signature: {b081677d-fc72-4cc4-ac53-992627beea24} Age: 4
Writing to the debug information file...
Result: Success.
0:005> .shell cmd /c copy E:\debug\temp\ntdllhack\wntdll.pdb E:\debug\symbols\wntdll.pdb\FA9C48F9C11D4E0894B8970DECD92C972\wntdll.pdb
1 file(s) copied.
0:005> .reload
Reloading current modules
.............................
0:005> dt nt!_PEB
ntdll!_PEB
+0x000 InheritedAddressSpace : UChar
+0x001 ReadImageFileExecOptions : UChar
...
0:005> !heap -s
LFH Key : 0x219ab08b
Termination on corruption : DISABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-----------------------------------------------------------------------------
Virtual block: 00920000 - 00920000 (size 00000000)
Virtual block: 02c60000 - 02c60000 (size 00000000)
Virtual block: 02e10000 - 02e10000 (size 00000000)
...
Note: using ChkMatch like this has the benefit that you do not need to turn on .symopt- 100, since that option would affect all PDB files, and you would not find potential other symbol issues. If you don't mind using .symopt, you could simply copy an old wntdll.PDB over the new one.
The issue is now fixed according to Microsoft and Microsoft told me that you should clear your symbol cache to get the new PDBs, otherwise Windbg would use the old Symbols which miss the information.
eCryptfs is a POSIX-compliant encrypted filesystem that has been part of the mainline Linux Kernel since version 2.6.19.
When I try to insert the module (ecryptfs.ko), I get the following error:
insmod: error inserting 'ecryptfs.ko': -1 Cannot allocate memory
Can some one please help me out?
below is the dmesg
Failed to allocate one or more kmem_cache objects
kmem_cache_create: duplicate cache ecryptfs_auth_tok_list_item
Pid: 3332, comm: insmod Tainted: G O 3.2.2+ #1
Call Trace:
[<c102bfe0>] ? printk+0x15/0x17
[<c10878b6>] kmem_cache_create+0x41c/0x458
[<d0ebd038>] ecryptfs_init+0x38/0x1b1 [ecryptfs]
[<c1001071>] do_one_initcall+0x71/0x118
[<d0ebd000>] ? 0xd0ebcfff
[<c1055703>] sys_init_module+0x60/0x18c
[<c12db9b0>] sysenter_do_call+0x12/0x36
ecryptfs_init_kmem_caches: ecryptfs_auth_tok_list_item: kmem_cache_create
failed
Failed to allocate one or more kmem_cache objects
Start with the error you are seeing in dmesg:
kmem_cache_create: duplicate cache ecryptfs_auth_tok_list_item
When the ecryptfs module is loaded the first thing it does is create a bunch of memory caches for itself. The error suggests that there is already a cache with that name.
You can check if the cache already exists by looking at sysfs:
$ ls -ld /sys/kernel/slab/ecryptfs*
NB. It may not show up in /proc/slabinfo due to slab merging.
If you see any ecryptfs slabs that suggests the ecryptfs module is already loaded, or it is already built into your kernel.
Normally the module loader would not let you load the same module twice, but perhaps you have done something weird to confuse it.
A likely cause of something like this happening is if one recompiles and installs a kernel and its modules but forgets to mount /boot before installing the kernel. After a reboot, one will then run with the old kernel but new modules. In any event, check that the running kernel is current, and reinstall both the kernel and the modules if in doubt:
mount /boot
cd /usr/src/linux
make && make install && make modules_install
I have done the above steps and error was solved