I upgraded Consul to 1.8.7 when I moved to Ubuntu 22. I am seeing the following errors:
● consul.service - Consul Agent
Loaded: loaded (/lib/systemd/system/consul.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-01-12 06:00:59 UTC; 22min ago
Main PID: 206769 (consul)
Tasks: 20 (limit: 119416)
Memory: 19.8M
CPU: 3.686s
CGroup: /system.slice/consul.service
└─206769 /usr/bin/consul agent -config-dir /etc/consul
node231 consul[206769]: 2023-01-12T06:22:56.783Z [WARN] agent: error getting server health from server: server=node231 error="rpc error getting client: failed to get conn: dial tcp 10.142.0.39:0->10.142.0.39:8300: i/o timeout"
node231 consul[206769]: 2023-01-12T06:22:56.783Z [WARN] agent: error getting server health from server: server=node231 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
node231 consul[206769]: 2023-01-12T06:22:56.783Z [WARN] agent: error getting server health from server: server=node231 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
node231 consul[206769]: 2023-01-12T06:22:56.783Z [WARN] agent: error getting server health from server: server=node231 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
node231 consul[206769]: 2023-01-12T06:22:56.783Z [WARN] agent: error getting server health from server: server=node231 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
node231 consul[206769]: 2023-01-12T06:22:56.783Z [WARN] agent: error getting server health from server: server=node231 error="rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection"
node231 consul[206769]: 2023-01-12T06:22:57.782Z [WARN] agent: error getting server health from server: server=node231 error="context deadline exceeded"
node231 consul[206769]: 2023-01-12T06:22:59.783Z [WARN] agent: error getting server health from server: server=node231 error="context deadline exceeded"
node231 consul[206769]: 2023-01-12T06:23:01.782Z [WARN] agent: error getting server health from server: server=node231 error="context deadline exceeded"
node231 consul[206769]: 2023-01-12T06:23:03.160Z [ERROR] agent.http: Request error: method=GET url=/v1/agent/check/fail/service:puppet from=127.0.0.1:55702 error="method GET not allowed"
My server info (output of consul info):
agent:
check_monitors = 0
check_ttls = 2
checks = 2
services = 2
build:
prerelease =
revision =
version = 1.8.7
consul:
acl = disabled
bootstrap = true
known_datacenters = 1
leader = true
leader_addr = 10.142.0.39:8300
server = true
raft:
applied_index = 51
commit_index = 51
fsm_pending = 0
last_contact = 0
last_log_index = 51
last_log_term = 2
last_snapshot_index = 0
last_snapshot_term = 0
latest_configuration = [{Suffrage:Voter ID:c3b23645-fbf1-4a29-3497-d6cccb7d0941 Address:10.142.0.39:8300}]
latest_configuration_index = 0
num_peers = 0
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 2
runtime:
arch = amd64
cpu_count = 32
goroutines = 77
max_procs = 32
os = linux
version = go1.17
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 1
event_time = 2
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
Is there anything else I need to take care of for the new version of Consul?
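(For completeness, the i/o timeout in the first warning can be ruled in or out with a plain connectivity check against the advertised RPC address; nc being available is an assumption:)
# verify the server RPC/Raft port is reachable on the advertised address
nc -vz 10.142.0.39 8300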
I have a very basic use case: make rsyslog listen on a given TCP port and write each line received to a specified text file. Rsyslog listens correctly on the port, and testing with logger + ngrep shows that everything is fine on the TCP side. However, rsyslog never writes anything to the specified file. I am a bit puzzled; I have never had this issue before.
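(The TCP-side test I mention is along these lines; the long options are util-linux logger's:)
# send one test line over plain TCP to the listener
logger --tcp --server 127.0.0.1 --port 10514 "test message"
# watch it arrive on the wire
ngrep -d lo '' port 10514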
My config:
module(load="imtcp")
ruleset(name="rs1") {
    # I tested both syntaxes; neither worked
    #*.* /var/log/test.log
    action(type="omfile" file="/var/log/test.log")
}
input(type="imtcp" port="10514" ruleset="rs1")
The rest of the configuration is Debian's stock rsyslog configuration file. Config validation passes:
sudo /usr/sbin/rsyslogd -f /etc/rsyslog.conf -N 1
rsyslogd: version 8.4.2, config validation run (level 1), master config /etc/rsyslog.conf
rsyslogd: End of config validation run. Bye.
Running /usr/sbin/rsyslogd -dn shows (as usual) a ton of output and says everything is OK. I triple-checked file permissions and did the other basic checks; everything is OK.
Here is the debug output I get when testing:
[..]
9533.048681189:main Q:Reg/w0 : strm 0x7f4e64003930: file -1(messages) flush, buflen 142
9533.048698110:main Q:Reg/w0 : strmPhysWrite, stream 0x7f4e64003930, len 142
9533.048720759:main Q:Reg/w0 : file '/var/log/messages' opened as #10 with mode 416
9533.048740602:main Q:Reg/w0 : strm 0x7f4e64003930: opened file '/var/log/messages' for WRITE as 10
9533.048762238:main Q:Reg/w0 : strm 0x7f4e64003930: file 10 write wrote 142 bytes
9533.048788387:main Q:Reg/w0 : Action 15 transitioned to state: rdy
9533.048794753:main Q:Reg/w0 : Action 15 transitioned to state: itx
9533.048810943:main Q:Reg/w0 : Action 15 transitioned to state: rdy
9533.048827085:main Q:Reg/w0 : actionCommit, in retry loop, iRet 0
9533.048842385:main Q:Reg/w0 : actionCommitAll: action 17, state 0, nbr to commit 0 isTransactional 0
9533.048848882:main Q:Reg/w0 : processBATCH: batch of 1 elements has been processed
9533.048865523:main Q:Reg/w0 : regular consumer finished, iret=0, szlog 0 sz phys 1
9533.048883876:main Q:Reg/w0 : DeleteProcessedBatch: we deleted 1 objects and enqueued 0 objects
9533.048900724:main Q:Reg/w0 : doDeleteBatch: delete batch from store, new sizes: log 0, phys 0
9533.048917314:main Q:Reg/w0 : regular consumer finished, iret=4, szlog 0 sz phys 0
9533.048923512:main Q:Reg/w0 : main Q:Reg/w0: worker IDLE, waiting for work.
9537.087044117:imtcp.c : epoll returned 1 entries
9537.087054376:imtcp.c : epoll push ppusr[0]: 0x180e070
9537.087059193:imtcp.c : tcpsrv: ready to process 1 event entries
9537.087062349:imtcp.c : tcpsrv: processing item 1, pUsr 0x180e070, bAbortConn
9537.087065363:imtcp.c : New connect on NSD 0x18219a0.
9537.087078854:imtcp.c : dnscache: entry (nil) found
9537.087174947:imtcp.c : adding nsdpoll entry 0/0x7f4e5c002af0, sock 11
9537.087182220:imtcp.c : New session created with NSD 0x7f4e5c002af0.
9537.087185460:imtcp.c : doing epoll_wait for max 128 events
9537.087612939:imtcp.c : epoll returned 1 entries
9537.087618865:imtcp.c : epoll push ppusr[0]: 0x7f4e5c002af0
9537.087621850:imtcp.c : tcpsrv: ready to process 1 event entries
9537.087624642:imtcp.c : tcpsrv: processing item 0, pUsr 0x7f4e5c002af0, bAbortConn
9537.087636869:imtcp.c : netstream 0x7f4e5c002a20 with new data
9537.087649100:imtcp.c : doing epoll_wait for max 128 events
9537.087705735:imtcp.c : epoll returned 1 entries
9537.087710379:imtcp.c : epoll push ppusr[0]: 0x7f4e5c002af0
9537.087713159:imtcp.c : tcpsrv: ready to process 1 event entries
9537.087715744:imtcp.c : tcpsrv: processing item 0, pUsr 0x7f4e5c002af0, bAbortConn
9537.087718426:imtcp.c : netstream 0x7f4e5c002a20 with new data
9537.087722700:imtcp.c : removing nsdpoll entry 0/0x7f4e5c002af0, sock 11
9537.087742477:imtcp.c : doing epoll_wait for max 128 events
strace-ing the process shows that the only files rsyslog touches are /etc/resolv.conf and /etc/hosts, even though it did receive my log line:
iznogoud@haproxylogs-xen02:~$ sudo strace -p $(cat /var/run/rsyslogd.pid) -f
Process 7463 attached with 9 threads
[pid 7471] futex(0x7fead1c25004, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7470] futex(0x7fead1c24f9c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7469] futex(0x7fead1c24f34, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7468] futex(0x7fead1c24ecc, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 7467] futex(0x84967c, FUTEX_WAIT_PRIVATE, 11, NULL <unfinished ...>
[pid 7466] epoll_wait(8, <unfinished ...>
[pid 7465] read(4, <unfinished ...>
[pid 7464] select(4, [3], NULL, NULL, NULL <unfinished ...>
[pid 7463] select(1, NULL, NULL, NULL, {577, 636835}
<unfinished ...>
[pid 7466] <... epoll_wait resumed> {{EPOLLIN, {u32=3288344160, u64=140646287418976}}}, 128, -1) = 1
[pid 7466] accept(6, {sa_family=AF_INET6, sin6_port=htons(37578), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 13
[pid 7466] rt_sigprocmask(SIG_BLOCK, [HUP], ~[KILL STOP TTIN RTMIN RT_1], 8) = 0
[pid 7466] open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 14
[pid 7466] fstat(14, {st_mode=S_IFREG|0644, st_size=23, ...}) = 0
[pid 7466] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fead4506000
[pid 7466] read(14, "nameserver 10.75.164.1\n", 4096) = 23
[pid 7466] read(14, "", 4096) = 0
[pid 7466] close(14) = 0
[pid 7466] munmap(0x7fead4506000, 4096) = 0
[pid 7466] uname({sys="Linux", node="haproxylogs-xen02", ...}) = 0
[pid 7466] open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 14
[pid 7466] fstat(14, {st_mode=S_IFREG|0644, st_size=201, ...}) = 0
[pid 7466] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fead4506000
[pid 7466] read(14, "127.0.0.1\tlocalhost\n10.75.164.12"..., 4096) = 201
[pid 7466] close(14) = 0
[pid 7466] munmap(0x7fead4506000, 4096) = 0
[pid 7466] rt_sigprocmask(SIG_SETMASK, ~[KILL STOP TTIN RTMIN RT_1], NULL, 8) = 0
[pid 7466] fcntl(13, F_GETFL) = 0x2 (flags O_RDWR)
[pid 7466] fcntl(13, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 7466] epoll_ctl(8, EPOLL_CTL_ADD, 13, {EPOLLIN, {u32=3288345072, u64=140646287419888}}) = 0
[pid 7466] epoll_wait(8, {{EPOLLIN, {u32=3288345072, u64=140646287419888}}}, 128, -1) = 1
# Rsyslog received my test logline as shown below (truncated)
[pid 7466] recvfrom(13, "<5>Jul 10 18:02:01 iznogoud: Mon"..., 131072, MSG_DONTWAIT, NULL, NULL) = 58
[pid 7466] gettimeofday({1499709721, 740339}, NULL) = 0
[pid 7466] epoll_wait(8, {{EPOLLIN, {u32=3288345072, u64=140646287419888}}}, 128, -1) = 1
[pid 7466] recvfrom(13, "", 131072, MSG_DONTWAIT, NULL, NULL) = 0
[pid 7466] epoll_ctl(8, EPOLL_CTL_DEL, 13, 7feac40029f0) = 0
[pid 7466] close(13) = 0
[pid 7466] epoll_wait(8, <unfinished ...>
[pid 7464] <... select resumed> ) = 1 (in [3])
Am I missing something obvious?
Thanks :)
Upgrading rsyslog to 8.23 fixed the problem:
rsyslogd 8.23.0, compiled with:
PLATFORM: x86_64-pc-linux-gnu
PLATFORM (lsb_release -d):
FEATURE_REGEXP: Yes
GSSAPI Kerberos 5 support: Yes
FEATURE_DEBUG (debug build, slow code): No
32bit Atomic operations supported: Yes
64bit Atomic operations supported: Yes
memory allocator: system default
Runtime Instrumentation (slow code): No
uuid support: Yes
Number of Bits in RainerScript integers: 64
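(For reference, on Debian stable the newer package came from backports; the suite name here is an assumption about the setup:)
# replace the stock 8.4.2 with a newer rsyslog from backports
apt-get install -t jessie-backports rsyslog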
I have a stripped binary which crashes, and I want to reverse it. I used 'info file' to get the entry point and set a breakpoint there. However, a segmentation fault happens in one of the child processes...
[New process 40472]
process 40472 is executing new program: /usr/bin/dpkg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Inferior 2 (process 40472) exited normally]
E: Method http has died unexpectedly!
E: Sub-process http received a segmentation fault.
From the documentation I found 'show inferior', but I can't figure out how to see the specifics of the segfault. I tried setting 'follow-fork-mode' to 'child', but it doesn't look like it is helping.
For example, I would like to examine the values of registers such as RIP.
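The direction I have been trying looks roughly like this (a sketch; "inferior 2" assumes the crashing child ends up as the second inferior):
(gdb) set follow-fork-mode child
(gdb) set detach-on-fork off
(gdb) run
... once the child stops with SIGSEGV ...
(gdb) info inferiors
(gdb) inferior 2
(gdb) info registers rip
(gdb) bt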
Stracing the process produces this:
[pid 54137] writev(3, [{"\0\37", 2}, {"{\346\1\0\0\1\0\0\0\0\0\0\4http\4example\3org\0\0\1\0\1", 31}, {"\0\37", 2}, {"\357\24\1\0\0\1\0\0\0\0\0\0\4http\4example\3org\0\0\34\0\1", 31}], 4) = 66
[pid 54137] read(3, <unfinished ...>
[pid 54134] <... read resumed> "\10\376", 2) = 2
[pid 54134] read(3, "X\250AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"..., 2302) = 2302
[pid 54134] close(3) = 0
[pid 54134] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
[pid 54134] +++ killed by SIGSEGV +++
[pid 54131] <... select resumed> ) = 1 (in [5], left {0, 425835})
[pid 54131] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=54134, si_uid=0, si_status=SIGSEGV, si_utime=0, si_stime=1} ---
close(4)
....
....
....
close(5) = 0
close(4) = 0
write(2, "E", 1E) = 1
write(2, ": ", 2: ) = 2
write(2, "Method http has died unexpectedl"..., 34Method http has died unexpectedly!) = 34
write(2, "\n", 1
) = 1
write(2, "E", 1E) = 1
write(2, ": ", 2: ) = 2
write(2, "Sub-process http received a segm"..., 47Sub-process http received a segmentation fault.) = 47
write(2, "\n", 1
) = 1
close(3) = 0
exit_group(100) = ?
+++ exited with 100 +++
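(An alternative that avoids fork-following in GDB entirely: let the child dump core and inspect it post-mortem. The method binary path below is the usual apt location, but that is an assumption about this setup:)
# allow core dumps and give them a predictable name
ulimit -c unlimited
echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern
# reproduce the crash, then examine registers and backtrace offline
gdb /usr/lib/apt/methods/http /tmp/core.http.<pid>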
Our project uses an APNs provider running on CentOS 6.4 to push offline messages.
The provider just reads from a Redis queue with BRPOP, reformats the data, and sends the APNs message to the Apple push service.
Recently I hit a problem where the provider stops reading messages from the Redis queue. I straced the process.
The abnormal strace result:
tcp 0 0 ::1:39688 ::1:6379 ESTABLISHED 29452/ruby
[root@server]# strace -p 29452
Process 29452 attached - interrupt to quit
ppoll([{fd=56, events=POLLIN}], 1, NULL, NULL, 8
The normal strace result:
clock_gettime(CLOCK_MONOTONIC, {9266059, 349937955}) = 0
select(9, [8], NULL, NULL, {6, 0}) = 1 (in [8], left {3, 976969})
fcntl64(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
read(8, "*-1\r\n", 1024) = 5
write(8, "*3\r\n$5\r\nbrpop\r\n$9\r\napn_queue\r\n$1"..., 37) = 37
fcntl64(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
read(8, 0x9a0e5d8, 1024) = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime(CLOCK_MONOTONIC, {9266061, 374086306}) = 0
select(9, [8], NULL, NULL, {6, 0}^C <unfinished ...>
Process 20493 detached
Here is the related code:
loop do
  begin
    message = @redis.brpop(self.queue, 1)
    if message
      APN.log(:info, "---------->#{message} ----------->\n")
      @notification = APN::Notification.new(JSON.parse(message.last, :symbolize_names => true))
      send_notification
    end
  rescue Exception => e
    if e.class == Interrupt || e.class == SystemExit
      APN.log(:info, 'Shutting down...')
      exit(0)
    end
    APN.log(:error, "class: #{e.class} Encountered error: #{e}, backtrace #{e.backtrace}")
    APN.log(:info, 'Trying to reconnect...')
    client.connect!
    APN.log(:info, 'Reconnected')
    client.push(@notification)
  end
end
This problem occurs sporadically; the interval may be one or two months.
I think the code logic is right; my guess is that a network issue is affecting the normal running of the program.
When I kill the program with pkill and it restarts, it returns to normal and starts reading messages from the queue again.
I don't know how to analyse the problem further, so for now I use cron to restart the program (or send it a kill signal) every morning. :(
Does anyone have an idea how to handle this?
Your abnormal strace result shows ppoll being called with a NULL timeout, so it can block forever.
The correct way is to pass a timeout:
/* wait up to 10 seconds for data instead of blocking forever */
const struct timespec timeout = { .tv_sec = 10, .tv_nsec = 0 };
struct pollfd myfds;

myfds.fd      = fd;       /* the socket being read */
myfds.events  = POLLIN;
myfds.revents = 0;

retresult = ppoll(&myfds, 1, &timeout, NULL);
This gives a 10-second timeout; once the 10 seconds elapse, ppoll returns and execution continues with the next code.
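In the program above, that blocking read happens inside the Ruby Redis client, so the equivalent fix is a socket-level timeout on the connection (a sketch using the redis gem's timeout option; host, port, and value are placeholders):
require 'redis'

# a read that never completes now raises Redis::TimeoutError instead of
# blocking in ppoll forever, so the rescue/reconnect path above can run
@redis = Redis.new(host: 'localhost', port: 6379, timeout: 5)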
I've been using a simple game server management application on Ubuntu 14.04 for the last 6 months or so. After a recent server update and reboot, the application would hang when trying to start a subprocess. After some debugging it seems that whenever I try to start a subprocess with another user's credentials (I'm running as root), any command will hang.
Here's a simple application to demonstrate what causes the hang:
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	proc := exec.Command("ls")
	proc.SysProcAttr = &syscall.SysProcAttr{}
	proc.SysProcAttr.Credential = &syscall.Credential{Uid: 1022, Gid: 1023}
	err := proc.Run()
	if err != nil {
		fmt.Printf("err: %v", err)
	}
}
By removing the syscall.Credential part, the application runs without any issues.
My question is: is there some platform- or update-specific reason that causes this behaviour? Is this no longer a correct way to run a subprocess as another user?
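(If it helps to compare behaviour, here is a workaround sketch that performs the user switch outside the Go runtime; the "#uid"/"#gid" syntax is sudo's numeric form and the ids mirror the example above:)
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// let sudo do the uid/gid switch instead of syscall.Credential
	out, err := exec.Command("sudo", "-u", "#1022", "-g", "#1023", "ls").CombinedOutput()
	if err != nil {
		fmt.Printf("err: %v\n", err)
	}
	fmt.Printf("%s", out)
}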
EDIT:
Here are the last few lines of strace -f:
[pid 3994] futex(0xc21000a888, FUTEX_WAKE, 1 <unfinished ...>
[pid 3995] <... futex resumed> ) = 0
[pid 3994] <... futex resumed> ) = 1
[pid 3995] futex(0xc21000a888, FUTEX_WAIT, 0, NULL <unfinished ...>
[pid 3994] select(0, NULL, NULL, NULL, {0, 20}) = 0 (Timeout)
[pid 3994] futex(0x7f615c51a4f8, FUTEX_WAIT, 0, NULL
So, if I'm interpreting this right, it's blocking in a futex wait.
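(One more data point that can be collected: the Go runtime dumps all goroutine stacks to stderr on SIGQUIT, which should show exactly where it is stuck:)
# print goroutine stacks of the hung process (default GOTRACEBACK behaviour)
kill -QUIT <pid>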
You should run your application under strace itself ('strace -f myapp') and see where it locks up. It could be that something else is forking before your application executes, which is causing it to hang.
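A sketch of that (the extra flags are my habit, not required):
# follow forks, timestamp each syscall, and keep the trace for later reading
strace -f -tt -o /tmp/myapp.trace ./myapp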