Go application hangs when starting subprocess with another user's credentials

I've been using a simple game server management application on Ubuntu 14.04 for the last 6 months or so. After a recent server update and reboot, the application started to hang when trying to start a subprocess. After some debugging, it seems that any command hangs whenever I try to start a subprocess with another user's credentials (I'm running as root).
Here's a simple application that demonstrates the hang:
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	proc := exec.Command("ls")
	proc.SysProcAttr = &syscall.SysProcAttr{}
	proc.SysProcAttr.Credential = &syscall.Credential{Uid: 1022, Gid: 1023}
	err := proc.Run()
	if err != nil {
		fmt.Printf("err: %v", err)
	}
}
By removing the syscall.Credential part, the application will run without any issues.
My question is: is there some platform/update specific reason that causes this behaviour? Is this no longer a correct way to run a subprocess as another user?
EDIT:
Here are the last few lines of strace -f:
[pid 3994] futex(0xc21000a888, FUTEX_WAKE, 1 <unfinished ...>
[pid 3995] <... futex resumed> ) = 0
[pid 3994] <... futex resumed> ) = 1
[pid 3995] futex(0xc21000a888, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 3994] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 3994] futex(0x7f615c51a4f8, FUTEX_WAIT, 0, NULL
So, if I'm interpreting this right, it's apparently blocking in a FUTEX_WAIT call.

You should run your application under strace: run strace myapp and see where it locks up. It could be that something else is forking before your application executes, which is causing it to hang.

Related

What capabilities required for ioctl() on emmc on systemd?

I want to run my program under systemd as a regular (non-root) user. The program uses the ioctl() syscall to access eMMC registers. I want to learn which capabilities need to be added to my systemd unit file.
I tried the unit file below:
[Unit]
Description=EMMC-LIFETIME UTILITY
[Service]
User=tron
Group=disk
ExecStart=/HARICI/emmc-lifetime /dev/mmcblk0 -v
CapabilityBoundingSet=CAP_SYS_ADMIN
DeviceAllow=/dev/mmcblk0 rw
[Install]
WantedBy=multi-user.target
Here is the code of emmc-lifetime:
int main(int argc, char **argv)
{
    if (argc < 2) {
        printf("Usage: %s <mmcfilename> (-v)\n", argv[0]);
        printf("Example: %s /dev/mmcblk1 -v\n", argv[0]);
        return 1;
    }
    char ext_csd[512], ext_csd_rev;
    int fd, ret;
    fd = open(argv[1], O_RDWR);
    if (fd < 0) {
        printf("Failed to open eMMC device, please check which path you have passed\n");
        return 1;
    }
    /* Build the SEND_EXT_CSD command that reads the 512-byte EXT_CSD register */
    struct mmc_ioc_cmd idata;
    memset(&idata, 0, sizeof(idata));
    memset(ext_csd, 0, sizeof(char) * 512);
    idata.write_flag = 0;
    idata.opcode = MMC_SEND_EXT_CSD;
    idata.arg = 0;
    idata.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
    idata.blksz = 512;
    idata.blocks = 1;
    mmc_ioc_cmd_set_data(idata, ext_csd);
    ret = ioctl(fd, MMC_IOC_CMD, &idata);
    if (ret) {
        printf("ioctl failed, are you sure it is an MMC device???\n");
        return ret;
    }
    ext_csd_rev = ext_csd[EXT_CSD_REV];
    if (ext_csd_rev >= 7) {
        if (argc == 3 && !strcmp(argv[2], "-v")) {
            printf("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A: 0x%02x\n",
                   ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]);
            printf("EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B: 0x%02x\n",
                   ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]);
            printf("EXT_CSD_PRE_EOL_INFO: 0x%02x\n",
                   ext_csd[EXT_CSD_PRE_EOL_INFO]);
        } else {
            printf("%d\n", ext_csd[EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B] * 10);
        }
    }
    if (fd)
        close(fd);
    return ret;
}
If I comment out "User=tron" in my unit file, everything works as expected:
Nov 03 01:17:03 tron systemd[1]: Started EMMC-LIFETIME UTILITY.
Nov 03 01:17:03 tron emmc-lifetime[28294]: EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A: 0x01
Nov 03 01:17:03 tron emmc-lifetime[28294]: EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B: 0x01
Nov 03 01:17:03 tron emmc-lifetime[28294]: EXT_CSD_PRE_EOL_INFO: 0x01
But if I uncomment "User=tron", here is the result:
Nov 03 00:57:17 tron systemd[1]: Started EMMC-LIFETIME UTILITY.
Nov 03 00:57:17 tron emmc-lifetime[27706]: ioctl failed, are you sure it is an MMC device???
Nov 03 00:57:17 tron systemd[1]: emmc-info.service: Main process exited, code=exited, status=255/n/a
Nov 03 00:57:18 tron systemd[1]: emmc-info.service: Unit entered failed state.
Nov 03 00:57:18 tron systemd[1]: emmc-info.service: Failed with result 'exit-code'.
Which capabilities are required in my unit file to run my executable as the "tron" user?
Solved. On the capabilities side, the binary must have CAP_SYS_RAWIO.
We can use:
setcap cap_sys_rawio=+eip /HARICI/emmc-lifetime
This way, the "emmc-lifetime" binary can be run by a non-root user. Please note that this user must be in the "disk" group to be able to open "/dev/mmcblkX".
Unfortunately, this does not carry over to systemd unit files.
In a systemd unit file, writing:
User=some-non-root-user
CapabilityBoundingSet=SOME_CAPABILITY
does not work. CapabilityBoundingSet only limits the capabilities the service may hold; it does not grant any to a non-root user. That's why my unit file above always fails.
I must find another way to run my binary with non-root privileges.

Rsyslog does not write to file

I have a very basic use case: make rsyslog listen on a given TCP port and write each line received to a specified text file. Rsyslog listens correctly on the port, and testing with logger + ngrep shows that everything is fine on the TCP side. However, rsyslog never writes anything to the specified file. I am a bit puzzled, as I have never had this issue before.
My config:
module(load="imtcp")
ruleset(name="rs1") {
# I tested both syntaxes. None of them worked
#*.* /var/log/test.log
action(type="omfile" file="/var/log/test.log")
}
input(type="imtcp" port="10514" ruleset="rs1")
The rest of the configuration is Debian's default rsyslog configuration file. Validating it reports no errors:
sudo /usr/sbin/rsyslogd -f /etc/rsyslog.conf -N 1
rsyslogd: version 8.4.2, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: End of config validation run. Bye.
Running /usr/sbin/rsyslogd -dn shows (as usual) a ton of output and says everything is OK. I triple-checked file permissions and did other basic checks; everything is OK.
Here is the debug output I get when testing
[..]
9533.048681189:main Q:Reg/w0 : strm 0x7f4e64003930: file -1(messages) flush, buflen 142
9533.048698110:main Q:Reg/w0 : strmPhysWrite, stream 0x7f4e64003930, len 142
9533.048720759:main Q:Reg/w0 : file '/var/log/messages' opened as #10 with mode 416
9533.048740602:main Q:Reg/w0 : strm 0x7f4e64003930: opened file '/var/log/messages' for WRITE as 10
9533.048762238:main Q:Reg/w0 : strm 0x7f4e64003930: file 10 write wrote 142 bytes
9533.048788387:main Q:Reg/w0 : Action 15 transitioned to state: rdy
9533.048794753:main Q:Reg/w0 : Action 15 transitioned to state: itx
9533.048810943:main Q:Reg/w0 : Action 15 transitioned to state: rdy
9533.048827085:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
9533.048842385:main Q:Reg/w0 : actionCommitAll: action 17, state 0, nbr to commit 0 isTransactional 0
9533.048848882:main Q:Reg/w0 : processBATCH: batch of 1 elements has been processed
9533.048865523:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0 sz phys 1
9533.048883876:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects
9533.048900724:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
9533.048917314:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 sz phys 0
9533.048923512:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for work.
9537.087044117:imtcp.c : epoll returned 1 entries
9537.087054376:imtcp.c : epoll push ppusr[0]: 0x180e070
9537.087059193:imtcp.c : tcpsrv: ready to process 1 event entries
9537.087062349:imtcp.c : tcpsrv: processing item 1, pUsr 0x180e070, bAbortConn
9537.087065363:imtcp.c : New connect on NSD 0x18219a0.
9537.087078854:imtcp.c : dnscache: entry (nil) found
9537.087174947:imtcp.c : adding nsdpoll entry 0/0x7f4e5c002af0, sock 11
9537.087182220:imtcp.c : New session created with NSD 0x7f4e5c002af0.
9537.087185460:imtcp.c : doing epoll_wait for max 128 events
9537.087612939:imtcp.c : epoll returned 1 entries
9537.087618865:imtcp.c : epoll push ppusr[0]: 0x7f4e5c002af0
9537.087621850:imtcp.c : tcpsrv: ready to process 1 event entries
9537.087624642:imtcp.c : tcpsrv: processing item 0, pUsr 0x7f4e5c002af0, bAbortConn
9537.087636869:imtcp.c : netstream 0x7f4e5c002a20 with new data
9537.087649100:imtcp.c : doing epoll_wait for max 128 events
9537.087705735:imtcp.c : epoll returned 1 entries
9537.087710379:imtcp.c : epoll push ppusr[0]: 0x7f4e5c002af0
9537.087713159:imtcp.c : tcpsrv: ready to process 1 event entries
9537.087715744:imtcp.c : tcpsrv: processing item 0, pUsr 0x7f4e5c002af0, bAbortConn
9537.087718426:imtcp.c : netstream 0x7f4e5c002a20 with new data
9537.087722700:imtcp.c : removing nsdpoll entry 0/0x7f4e5c002af0, sock 11
9537.087742477:imtcp.c : doing epoll_wait for max 128 events
Running strace on the process shows that the only files rsyslog touches are /etc/resolv.conf and /etc/hosts, although it did receive my log line:
iznogoud@haproxylogs-xen02:~$ sudo strace -p $(cat /var/run/rsyslogd.pid) -f
Process 7463 attached with 9 threads
[pid 7471] futex(0x7fead1c25004, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7470] futex(0x7fead1c24f9c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7469] futex(0x7fead1c24f34, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7468] futex(0x7fead1c24ecc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7467] futex(0x84967c, FUTEX_WAIT_PRIVATE, 11, NULL <unfinished ...>
[pid 7466] epoll_wait(8, <unfinished ...>
[pid 7465] read(4, <unfinished ...>
[pid 7464] select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 7463] select(1, NULL, NULL, NULL, {577, 636835}
<unfinished ...>
[pid 7466] <... epoll_wait resumed> {{EPOLLIN, {u32=3288344160, u64=140646287418976}}}, 128, -1) = 1
[pid 7466] accept(6, {sa_family=AF_INET6, sin6_port=htons(37578), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 13
[pid 7466] rt_sigprocmask(SIG_BLOCK, [HUP], ~[KILL STOP TTIN RTMIN RT_1], 8) = 0
[pid 7466] open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 14
[pid 7466] fstat(14, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
[pid 7466] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fead4506000
[pid 7466] read(14, "nameserver 10.75.164.1\n", 4096) = 23
[pid 7466] read(14, "", 4096) = 0
[pid 7466] close(14) = 0
[pid 7466] munmap(0x7fead4506000, 4096) = 0
[pid 7466] uname({sys="Linux", node="haproxylogs-xen02", ...}) = 0
[pid 7466] open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 14
[pid 7466] fstat(14, {st_mode=S_IFREG|0644, st_size=201, ...}) = 0
[pid 7466] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fead4506000
[pid 7466] read(14, "127.0.0.1\tlocalhost\n10.75.164.12"..., 4096) = 201
[pid 7466] close(14) = 0
[pid 7466] munmap(0x7fead4506000, 4096) = 0
[pid 7466] rt_sigprocmask(SIG_SETMASK, ~[KILL STOP TTIN RTMIN RT_1], NULL, 8) = 0
[pid 7466] fcntl(13, F_GETFL) = 0x2 (flags O_RDWR)
[pid 7466] fcntl(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 7466] epoll_ctl(8, EPOLL_CTL_ADD, 13, {EPOLLIN, {u32=3288345072, u64=140646287419888}}) = 0
[pid 7466] epoll_wait(8, {{EPOLLIN, {u32=3288345072, u64=140646287419888}}}, 128, -1) = 1
# Rsyslog received my test logline as shown below (truncated)
[pid 7466] recvfrom(13, "<5>Jul 10 18:02:01 iznogoud: Mon"..., 131072, MSG_DONTWAIT, NULL, NULL) = 58
[pid 7466] gettimeofday({1499709721, 740339}, NULL) = 0
[pid 7466] epoll_wait(8, {{EPOLLIN, {u32=3288345072, u64=140646287419888}}}, 128, -1) = 1
[pid 7466] recvfrom(13, "", 131072, MSG_DONTWAIT, NULL, NULL) = 0
[pid 7466] epoll_ctl(8, EPOLL_CTL_DEL, 13, 7feac40029f0) = 0
[pid 7466] close(13) = 0
[pid 7466] epoll_wait(8, <unfinished ...>
[pid 7464] <... select resumed> ) = 1 (in [3])
Am I missing something obvious?
Thanks :)
Upgrading rsyslog to 8.23 fixed the problem:
rsyslogd 8.23.0, compiled with:
PLATFORM: x86_64-pc-linux-gnu
PLATFORM (lsb_release -d):
FEATURE_REGEXP: Yes
GSSAPI Kerberos 5 support: Yes
FEATURE_DEBUG (debug build, slow code): No
32bit Atomic operations supported: Yes
64bit Atomic operations supported: Yes
memory allocator: system default
Runtime Instrumentation (slow code): No
uuid support: Yes
Number of Bits in RainerScript integers: 64

gdb how to break or step on forked processes of stripped binaries

I have a stripped binary which crashes and I want to reverse it. I tried the 'info file' to get the EntryPoint and set a breakpoint there. However a segmentation fault happens on one of the child processes...
[New process 40472]
process 40472 is executing new program: /usr/bin/dpkg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Inferior 2 (process 40472) exited normally]
E: Method http has died unexpectedly!
E: Sub-process http received a segmentation fault.
From the documentation I found 'show inferior', but I can't find out how to see the specifics of the segfault. I tried setting 'set follow-fork-mode' to child, but it doesn't seem to help.
For example I would like to examine the values of the registers such as RIP etc.
Stracing the process produces this:
[pid 54137] writev(3, [{"\0\37", 2}, {"{\346\1\0\0\1\0\0\0\0\0\0\4http\4example\3org\0\0\1\0\1", 31}, {"\0\37", 2}, {"\357\24\1\0\0\1\0\0\0\0\0\0\4http\4example\3org\0\0\34\0\1", 31}], 4) = 66
[pid 54137] read(3, <unfinished ...>
[pid 54134] <... read resumed> "\10\376", 2) = 2
[pid 54134] read(3, "X\250AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 2302) = 2302
[pid 54134] close(3) = 0
[pid 54134] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
[pid 54134] +++ killed by SIGSEGV +++
[pid 54131] <... select resumed> ) = 1 (in [5], left {0, 425835})
[pid 54131] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=54134, si_uid=0, si_status=SIGSEGV, si_utime=0, si_stime=1} ---
close(4)
....
....
....
close(5) = 0
close(4) = 0
write(2, "E", 1E) = 1
write(2, ": ", 2: ) = 2
write(2, "Method http has died unexpectedl"..., 34Method http has died unexpectedly!) = 34
write(2, "\n", 1
) = 1
write(2, "E", 1E) = 1
write(2, ": ", 2: ) = 2
write(2, "Sub-process http received a segm"..., 47Sub-process http received a segmentation fault.) = 47
write(2, "\n", 1
) = 1
close(3) = 0
exit_group(100) = ?
+++ exited with 100 +++

redis operation with ruby is blocking in ppoll

Our project uses this APNs provider, running on CentOS 6.4, to push offline messages.
The APNs provider just reads from a Redis queue with brpop, then reformats the data and sends the APNs message to the Apple push service.
Recently I ran into a problem where the APNs provider stopped reading messages from the Redis queue, so I ran strace on the process.
The abnormal strace result:
tcp 0 0 ::1:39688 ::1:6379 ESTABLISHED 29452/ruby
[root@server]# strace -p 29452
Process 29452 attached - interrupt to quit
ppoll([{fd=56, events=POLLIN}], 1, NULL, NULL, 8
The normal strace result:
clock_gettime(CLOCK_MONOTONIC, {9266059, 349937955}) = 0
select(9, [8], NULL, NULL, {6, 0}) = 1 (in [8], left {3, 976969})
fcntl64(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
read(8, "*-1\r\n", 1024) = 5
write(8, "*3\r\n$5\r\nbrpop\r\n$9\r\napn_queue\r\n$1"..., 37) = 37
fcntl64(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
read(8, 0x9a0e5d8, 1024) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {9266061, 374086306}) = 0
select(9, [8], NULL, NULL, {6, 0}^C <unfinished ...>
Process 20493 detached
here is the related code:
loop do
  begin
    message = @redis.brpop(self.queue, 1)
    if message
      APN.log(:info, "---------->#{message} ----------->\n")
      @notification = APN::Notification.new(JSON.parse(message.last, :symbolize_names => true))
      send_notification
    end
  rescue Exception => e
    if e.class == Interrupt || e.class == SystemExit
      APN.log(:info, 'Shutting down...')
      exit(0)
    end
    APN.log(:error, "class: #{e.class} Encountered error: #{e}, backtrace #{e.backtrace}")
    APN.log(:info, 'Trying to reconnect...')
    client.connect!
    APN.log(:info, 'Reconnected')
    client.push(@notification)
  end
end
This problem occurs irregularly; the interval between occurrences may be one or two months.
I think the code logic is right, and I guess that the system's network may be affecting the normal running of the program.
When I kill the program with pkill [pid], it returns to normal and starts reading messages from the queue again.
For now I don't know how to analyse the problem, so I have to use cron to reboot or send a kill signal to the program early every morning. :(
Does anyone have an idea how to handle this problem?
Your abnormal strace output shows ppoll being called with a NULL timeout, which means it can block indefinitely.
The correct way is to pass a timeout:
const struct timespec timeout = { .tv_sec = 10, .tv_nsec = 0 };
struct pollfd myfds;
myfds.fd = fd;
myfds.events = POLLIN;
myfds.revents = 0;
int retresult = ppoll(&myfds, 1, &timeout, NULL);
With this timeout, ppoll waits for at most 10 seconds; once the 10 seconds have elapsed, it returns and execution continues with the next statement.

Segmentation fault happens when trying to free OCIEnv structure after failure of OCI environment setup with OCI_THREADED option

Summary
A segmentation fault happens when trying to free the OCIEnv structure after OCI environment setup with the OCI_THREADED option has failed (failure due to e.g. a misconfigured NLS_LANG environment variable).
When OCIEnvCreate is called without the OCI_THREADED option, the example code does not crash; it works as expected.
Example code
#include <oci.h>
#include <stdio.h>
#include <string.h>

int my_connect(const char *username, const char *password, const char *sid)
{
    OCIEnv *env = NULL;
    OCIError *err = NULL;
    OCISvcCtx *svc = NULL;

    if ( OCIEnvCreate(&env,
                      OCI_THREADED,
                      (dvoid *)0,
                      0,
                      0,
                      0,
                      (size_t)0,
                      (dvoid **)0) )
    {
        fprintf(stderr, "unable to initialize environment\n");
        if ( env )
        {
            printf("env:[%p]\n", env);
            OCIHandleFree(env, OCI_HTYPE_ENV); // segfault.
        }
        return -1;
    }
    printf("env:[%p]\n", env);
    if ( OCIHandleAlloc((dvoid *)env,
                        (dvoid **)&err,
                        OCI_HTYPE_ERROR,
                        (size_t)0,
                        (dvoid **)0) )
    {
        fprintf(stderr, "unable to alloc error handlers\n");
        goto error;
    }
    if ( OCIHandleAlloc((dvoid *) env,
                        (dvoid **) &svc,
                        OCI_HTYPE_SVCCTX,
                        (size_t) 0,
                        (dvoid **)0) )
    {
        fprintf(stderr, "unable to allocate service handlers\n");
        goto error;
    }
    if ( OCILogon(env,
                  err,
                  &svc,
                  (CONST OraText *) username,
                  strlen(username),
                  (CONST OraText *) password,
                  strlen(password),
                  sid,
                  strlen(sid)
                 ) )
    {
        fprintf(stderr, "login failed\n");
        goto error;
    }
    printf("logged in\n");
    if ( OCILogoff (svc, err) )
    {
        fprintf(stderr, "logoff failed\n");
        goto error;
    }
    printf("logged out\n");
error:
    if ( err )
        OCIHandleFree(err, OCI_HTYPE_ERROR);
    if ( svc )
        OCIHandleFree(svc, OCI_HTYPE_SVCCTX);
    if ( env )
        OCIHandleFree(env, OCI_HTYPE_ENV);
    return 0;
}

int main()
{
    return my_connect("test_user", "qqq123", "XE");
}
Before running
export NLS_LANG=x
Stack trace
The problem is that __pthread_mutex_destroy is called with a NULL-pointer.
#0 __pthread_mutex_destroy (mutex=0x0) at pthread_mutex_destroy.c:28
#1 0x00007ffff585e6e0 in sltsmxd () from /lib/libclntsh.so.11.1
#2 0x00007ffff56a147c in kpufhndl0 () from /lib/libclntsh.so.11.1
#3 0x00007ffff56a0185 in kpufhndl () from /lib/libclntsh.so.11.1
#4 0x00007ffff567cac1 in OCIHandleFree () from /lib/libclntsh.so.11.1
#5 0x0000000000400a0c in my_connect (username=0x400dd1 "test_user", password=0x400dca "qqq123", sid=0x400dc7 "XE") at test2.c:24
#6 0x0000000000400c27 in main () at test2.c:84
Product details
Basic Lite Package Information
Thu Oct 4 13:00:49 UTC 2007
Client Shared Library 64-bit - 11.1.0.6.0
System name: Linux
Release: 2.6.9-34.0.1.0.11.ELsmp
Version: #1 SMP Mon Dec 4 22:20:39 UTC 2006
Machine: x86_64
OS details
Linux 3.2.0-37-generic #58-Ubuntu SMP Thu Jan 24 15:28:10 UTC 2013 x86_64 GNU/Linux
Distributor ID: Ubuntu
Description: Ubuntu 10.04.4 LTS
Release: 10.04
Codename: lucid
Question
At the moment I simply do not free that memory area, but this is not a good solution.
What would be a good solution?
This looks like it might be a bug in the 'Basic Lite' instant client, though I can't see anything relevant on MOS; but if so then in OCIEnvCreate(), not OCIHandleFree() as I think you suggest.
None of the example code I've seen tries to clean up the OCIEnv when it encounters a failure of OCIEnvCreate(); it seems to always just exit, including Oracle's own example code. It looks like that function is getting as far as creating the OCIEnv structure since you get a pointer to it, but presumably hasn't allocated the internals of that. Since it's in an indeterminate state, trying to clean it up will probably be a thankless task. So it seems like it ought to be OK to just not call OCIHandleFree().
I was able to recreate the problem using the 11.2.0.3.0 Basic Lite client package for Linux x86-64 (from TechNet downloads), under Oracle Enterprise Linux 5.6. When using the Basic (non-Lite) client package the problem was not seen - no memory fault, but also no 'unable to initialize environment' message, so the OCIEnvCreate() call succeeds. It does then fail to log in though, which is probably more reasonable anyway.
That suggests that you shouldn't expect the function to fail because of the NLS_LANG value - that part looks like a bug. If it does fail for some other reason then don't try to clean up. I can't see anything suggesting the SDK isn't expected to work with the Lite package, but you're forcing a failure and then trying to clean up beyond the norm, so the combination might not have been seen often before. Even without the clean-up triggering the memory fault, though, it looks like it's failing at the wrong point.
To get past the apparent bug and get the failure at the correct point, during login, you may need to switch to the Basic (non-Lite) client package.
