opendj 3.0 replication failed to start for about 2m entries

opendj 3.0 replication failed to start for about 2m entries - opendj

I'm testing opendj 3.0 replicatoin.
I have two opendj nodes which is a replica. The replication works nice.
But when I added about 2m entries, one opendj node failed to restart. I tried several times, but no luck. According to server.out, looks like some TimedOut, I'm not sure if it's related.
Any idea or workaround. I followed https://forum.forgerock.com/topic/replication-server-timed-out-waiting-for-monitor-data/ , add changed the monitor data timeout from 5 seconds to 60 seconds, and still no luck.
[03/Aug/2017:04:44:20 -0400] category=PLUGGABLE severity=NOTICE msgID=org.opends.messages.backend.513 msg=The database backend userRoot containing 2075308 entries has started
[03/Aug/2017:04:44:21 -0400] category=EXTENSIONS severity=NOTICE msgID=org.opends.messages.extension.221 msg=DIGEST-MD5 SASL mechanism using a server fully qualified domain name of: stg2-n6.nscloud.local
[03/Aug/2017:04:44:22 -0400] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.204 msg=Replication server RS(31748) started listening for new connections on address 0.0.0.0 port 8989
[03/Aug/2017:04:44:23 -0400] category=SYNC severity=NOTICE msgID=org.opends.messages.replication.62 msg=Directory server DS(27712) has connected to replication server RS(31748) for domain "cn=admin data" at stg2-n6.nscloud.local/192.168.30.46:8989 with generation ID 161237
[03/Aug/2017:04:45:23 -0400] category=SYNC severity=WARNING msgID=org.opends.messages.replication.106 msg=Timed out waiting for monitor data for the domain "cn=schema" from replication server RS(19987)
[03/Aug/2017:04:46:23 -0400] category=SYNC severity=WARNING msgID=org.opends.messages.replication.106 msg=Timed out waiting for monitor data for the domain "dc=example,dc=com" from replication server RS(19987)
[03/Aug/2017:04:46:23 -0400] category=SYNC severity=WARNING msgID=org.opends.messages.replication.106 msg=Timed out waiting for monitor data for the domain "cn=admin data" from replication server RS(19987)

Related

Ignite client failover

We have this Ignite cluster configuration: there are several servers + 1 client, which acts as a balancer. For example, IP addresses are:
server1 - 192.168.100.1
server2 - 192.168.100.2
server3 - 192.168.100.3
client - 192.168.100.100
So requests go to the client - for example http://192.168.100.100:8082/request1
Then the client sends the Distributed Computing Task to the cluster - the calculations are performed on one of the servers, so on 192.168.100.1 or 192.168.100.2 or 192.168.100.3. Results of calculations then return to client, then client finally sends response to the request.
There is no problem if one of the servers crashes - client would be knowing about that and wouldn't send Task on that server. But there is a problem if the client crashes - all servers work, but the address http://192.168.100.100:8080/request1 is not available.
What can I do about it? Can client be failover? Does Ignite have something for that? If not, what other technology/software can I use?

Running image with aws ecs throws 504 Gateway Time-out

I dockerized my Application. If i run it with docker run, evertything works fine.
I tried to run it with ecs fargate and put an ALB infront of it.
If i try to access my Application via the ALB dns, i get an 504 Gateway Teme-out back.
While searching a solution, i found an post, which told me to set the Tomcat timeout higher than the ELB timeout, but it doesn't helped.
Dockerfile
FROM tomcat:8.0.20-jre8
RUN sed -i 's/connectionTimeout="20000"/connectionTimeout="70000"/' /usr/local/tomcat/conf/server.xml
CMD ["catalina.sh","run"]
COPY /target/Webshop.war /usr/local/tomcat/webapps/
ELB Log
http 2019-09-11T11:20:50.585293Z app/Doces-Backe-19RQJLVNHYG2P/8fb4f4079bb6ff9f 66.85.6.136:47767 - -1 -1 -1 503 - 18 348 "GET http://:8080/ HTTP/1.0" "-" - - arn:aws:elasticloadbalancing:eu-central-1:573575081005:targetgroup/ecs-Docest-de-webshop/8df4f0978484f8bd "Root=1-5d78d892-58886d3490906f0fa3914563" "-" "-" 0 2019-09-11T11:20:50.462000Z "forward" "-" "-"
http 2019-09-11T11:23:23.535869Z app/Doces-Backe-19RQJLVNHYG2P/8fb4f4079bb6ff9f 66.85.6.136:50950 10.10.11.140:8080 -1 -1 -1 504 - 18 303 "GET http://:8080/ HTTP/1.0" "-" - - arn:aws:elasticloadbalancing:eu-central-1:573575081005:targetgroup/ecs-Docest-de-webshop/8df4f0978484f8bd "Root=1-5d78d921-a236121716bd1bd209625fd8" "-" "-" 0 2019-09-11T11:23:13.415000Z "forward" "-" "-"
http 2019-09-11T11:23:56.286426Z app/Doces-Backe-19RQJLVNHYG2P/8fb4f4079bb6ff9f 66.85.6.136:51658 10.10.11.140:8080 -1 -1 -1 504 - 18 303 "GET http://:8080/ HTTP/1.0" "-" - - arn:aws:elasticloadbalancing:eu-central-1:573575081005:targetgroup/ecs-Docest-de-webshop/8df4f0978484f8bd "Root=1-5d78d942-22a1680464884762e02ec940" "-" "-" 0 2019-09-11T11:23:46.156000Z "forward" "-" "-"
http 2019-09-11T11:23:27.513803Z app/Doces-Backe-19RQJLVNHYG2P/8fb4f4079bb6ff9f 66.85.6.136:51034 10.10.11.140:8080 -1 -1 -1 504 - 18 303 "GET http://:8080/ HTTP/1.0" "-" - - arn:aws:elasticloadbalancing:eu-central-1:573575081005:targetgroup/ecs-Docest-de-webshop/8df4f0978484f8bd "Root=1-5d78d925-b6b5daf0d0f733140aea0f84" "-" "-" 0 2019-09-11T11:23:17.393000Z "forward" "-" "-"
I expected to see my application running at the elb.
Thanks for your help!

Solution:
The problem was that I set the correct port in the security group of the load balancer, but not in that of the ECS service.
So I opened the required port there and now it works.
Procedure:
Go to your cluster
Go to the service with the problem
Click on the Security Group under the item Network Access and open the required port
Thanks!

There can be multiple reasons behind gateway timeout. The only thing that I do not like about fargate is debug-log. #AWS team should enable log configuration for fargate service by default as its hard to debug these issues without logs.
Better to configure log driver and push logs to cloud watch and see the actual issue also double check your desired port in task definition and mapped port in service.
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "awslogs-spring",
"awslogs-region": "us-west-2",
"awslogs-stream-prefix": "awslogs-example"
}
or from AWS console
You need to assign permission or role of cloud watch logs to task definition or service to push the logs to Cloud watch.
Once logs are configured then goto cloudwatch loggroup and search the log group so you will insight to your application.
But still, to troubleshoot the actual issue first, you have to understand the error code and possible reason of Gateway Timeout.
HTTP 504: Gateway Timeout
Description: Indicates that the load balancer closed a connection because a request did not complete within the idle timeout period.
Cause 1: The application takes longer to respond than the configured idle timeout.
Solution 1: Monitor the HTTPCode_ELB_5XX and Latency metrics. If there
is an increase in these metrics, it could be due to the application
not responding within the idle timeout period. For details about the
requests that are timing out, enable access logs on the load balancer
and review the 504 response codes in the logs that are generated by
Elastic Load Balancing. If necessary, you can increase your capacity
or increase the configured idle timeout so that lengthy operations
(such as uploading a large file) can complete. For more information,
see Configure the Idle Connection Timeout for Your Classic Load
Balancer and How do I troubleshoot Elastic Load Balancing high
latency.
Cause 2: Registered instances closing the connection to Elastic Load Balancing.
Solution 2: Enable keep-alive settings on your EC2 instances and make
sure that the keep-alive timeout is greater than the idle timeout
settings of your load balancer.

Some postgress connections timing-out while others don't

I have an AWS EC2 machine running a Laravel 5.2 application that connects to a Postgress 9.6 databse running in RDS. While most of the connections work, some of them are getting rejected when trying to stablish, which causes a Timeout and consequently an error in my API. I don't know what is causing them to be rejected. Also, it is very random when it happens, when it does happen it may be in any API endpoint and inside the endpoint in any query.
When the timeout is handled by PHP, it shows a message like:
SQLSTATE[08006] [7] timeout expired (SQL: ...)
Sometimes the Nginx handles the timeout and replies with a 504 Error. When Nginx handles the timeout I get an error like:
2019/04/24 09:48:18 [error] 20657#20657: *3236 upstream timed out (110: Connection timed out) while reading response header from upstream, client: {client-ip-here}, server: {my-url-here}, request: "GET {my-endpoint-here} HTTP/2.0", upstream: "fastcgi://unix:/var/run/php/php7.0-fpm.sock", host: "{}", referrer: "https://app.cartoriovirtual.com/"
All usage charts on the RDS and EC2 seems ok, I have plenty of RAM, storage, CPU and available connections for RDS. I also checked inner VPC Flows and they seem alright, however I have many IPs (listed as attackers) scanning my network interfaces, most of them been rejected. Some (to port 22) accepted but stoped at authentication, I use a .pem Key File for auth.
The RDS network interface only accepts requests from inner VPC machines. In its logs, every 5 minutes I have a Checkpoint like this:
2019-04-25 01:05:29 UTC::#:[22595]:LOG: checkpoint starting: time
2019-04-25 01:05:34 UTC::#:[22595]:LOG: checkpoint complete: wrote 43 buffers (0.1%); 0 transaction log file(s) added, 0 removed, 1 recycled; write=4.393 s, sync=0.001 s, total=4.404 s; sync files=19, longest=0.001 s, average=0.000 s; distance=16515 kB, estimate=16515 kB
Anyone has tips on how to find a solution? I looked at all possible logs that came in mind, fixed a few little issues but the error persists. I am running out of ideas.

Do I need to open port 8300 for consul servers in different DCs?

I have created a Consul architecture that spans across different consul datacenters.
When I now open the UI on one of the consul servers, and switch via the little dropdown menu
to look at at a different datacenter the request times out. In the log I can see this error message:
2016/05/03 06:26:08 [ERR] http: Request GET /v1/internal/ui/nodes?dc=dc1-live&token=<hidden>, error: rpc error: failed to get conn: dial tcp xx.xxx.xxx.xxx:8300: i/o timeout from=xxx.xxx.xxx.xxx:53174
Does this mean I need to open port 8300 additionally to port 8302 between the servers of the different datacenters?

I ended up having port 8300 being opended and that made the error messages go away. So I conclude that it is necessary.

HAProxy load balancing MySQL servers

I have a database cluster of 3 nodes using Percona XtraDB. The three nodes are configured on three different systems. I have used HAProxy load balancer to pass requests to these nodes.
Two of the 3 nodes are configured as backup in HAProxy. When I fire a request to the load balancer connection URL, I can see the request go to node A by default. If node A is down and I request a new database connection, I see the request being routed to node B. This is as per the desired design.
However, if a connection request is sent to HAProxy using a Java program (jdbc URL), the request is routed to node A, after serving a few requests if node A goes down, I wish node B/ node C to serve the request. In the current scenario I see "Connection Failed".
Is there any configuration which will ensure that in case of failure of a node, the database connection will not fail and future requests will be routed to the next available node?
My current HAProxy configuration file is as follows:
global
stats socket /var/run/haproxy.sock mode 0600 level admin
log 127.0.0.1 local2 debug
#chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
daemon
defaults
mode tcp
log global
option tcplog
timeout connect 10000 # default 10 second time out if a backend is not found
timeout client 300000
timeout server 300000
maxconn 20000
# For Admin GUI
listen stats
bind :8080
mode http
stats enable
stats uri /stats
listen mysql *:3306
mode tcp
balance roundrobin
option mysql-check user haproxyUser
option log-health-checks
server MySQL-NodeA <ip-address>:3306 check
server MySQL-NodeB <ip-address>:3306 check backup
server MySQL-NodeC <ip-address>:3306 check backup

Mode tcp under listen *:3306 cannot be use. Check before post here using this command:
haproxy -f /etc/haproxy.cfg -V

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio