Anthos on VMware: Seesaw load balancer deployment health check fails with 403 Forbidden - google-anthos

We are installing Anthos on the VMware platform and hit an error during the admin cluster deployment procedure for the Seesaw load balancer in HA mode.
The two Seesaw VMs were created successfully, but the health check fails with the following 403 error:
ubuntu@anth-mgt-wksadmin:~$ gkectl create loadbalancer --config admin-cluster.yaml -v5
Reading config with version "v1"
- Validation Category: OS Images
- [SUCCESS] Admin cluster OS images exist
- Validation Category: Admin Cluster VCenter
- [SUCCESS] Credentials
- [SUCCESS] DRS enabled
- [SUCCESS] Hosts for AntiAffinityGroups
- [SUCCESS] vCenter Version
- [SUCCESS] ESXi Version
- [SUCCESS] Datacenter
- [SUCCESS] Datastore
- [SUCCESS] Resource pool
- [SUCCESS] Folder
- [SUCCESS] Network
- Validation Category: Bundled LB
- [FAILURE] Seesaw validation: admin cluster lb health check failed: LB "10.25.94.229" is not healthy: received 403 Forbidden
- Validation Category: Network Configuration
- [SUCCESS] CIDR, VIP and static IP (availability and overlapping)
- Validation Category: GCP
- [SUCCESS] GCP service
- [SUCCESS] GCP service account
Some validation results were FAILURE or UNKNOWN. Check report above.
Preflight check failed with preflight check failed
Exit with error:
This simple test gives the same result:
root@jump-mgm-wks:~# wget http://10.25.94.229
--2021-07-27 13:56:04-- http://10.25.94.229/
Connecting to 10.173.119.123:8080... connected.
Proxy request sent, awaiting response... 403 Forbidden
2021-07-27 13:56:04 ERROR 403: Forbidden.
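Note that the wget output shows the request being answered through the HTTP proxy at 10.173.119.123:8080, so the 403 may be coming from the proxy rather than from the Seesaw VIP itself. A quick check, assuming proxy environment variables are set on the workstation, is to exclude the VIP and retry (a sketch, not a confirmed fix):
# Exclude the Seesaw VIP from proxying for this shell, then retry the test.
export no_proxy="${no_proxy},10.25.94.229"
wget http://10.25.94.229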
We also get this error in the Seesaw logs:
ubuntu@anth-mgt-bigip1:/var/log/seesaw$ cat seesaw_ha.anth-mgt-bigip1.root.log.ERROR.20210727-123208.1738
Log file created at: 2021/07/27 12:32:08
Running on machine: anth-mgt-bigip1
Binary: Built with gc go1.15.11 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0727 12:32:08.331013 1738 main.go:86] config: Failed to retrieve Config: HAConfig: Dial failed: dial unix /var/run/seesaw/engine/engine.sock: connect: no such file or directory
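The HA daemon is failing to dial the engine's control socket, which suggests the engine process never started on that VM. A minimal way to confirm this on the Seesaw VM (generic commands; the exact unit names are an assumption):
# Check whether the engine's control socket and process exist at all.
ls -l /var/run/seesaw/engine/engine.sock
ps aux | grep -i [s]eesaw
# If the VMs run systemd, list any seesaw units and their state.
systemctl list-units 'seesaw*'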

Solved by recreating the admin workstation, as follows. First, delete the load balancer:
gkectl delete loadbalancer --config admin-cluster.yaml --seesaw-group-file seesaw-for-gke-admin.yaml
Now save the following files from the ubuntu home directory of the admin workstation to /backup on jump-mgm-wks:
admin-cluster.yaml
admin-cluster-ipblock.yaml
admin-seesaw-ipblock.yaml
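One way to do the copy, run from jump-mgm-wks and assuming SSH access between the hosts (a sketch; host names as used above):
mkdir -p /backup
scp ubuntu@anth-mgt-wksadmin:~/admin-cluster.yaml \
    ubuntu@anth-mgt-wksadmin:~/admin-cluster-ipblock.yaml \
    ubuntu@anth-mgt-wksadmin:~/admin-seesaw-ipblock.yaml /backup/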
Then recreate the admin workstation and the load balancer:
gkeadm delete admin-workstation
gkeadm create admin-workstation --auto-create-service-accounts
gkectl create loadbalancer --config admin-cluster.yaml
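As an optional sanity check before re-running the load balancer creation, the same tooling can re-validate the cluster config (shown here as a suggestion, not part of the original fix):
gkectl check-config --config admin-cluster.yaml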

Related

ELK Metricbeat - Monitor a remote MySQL DB

I have installed the ELK stack (with Metricbeat) on ServerA and want to monitor MySQL on ServerB. I added the DB host details to the Metricbeat mysql module file on ServerA (/etc/metricbeat/modules.d/mysql.yml):
- module: mysql
  metricsets:
    - status
    - performance
  period: 10s
  hosts: ["tcp(ServerB:3306)/"]
  username: mysql
  password: password
After I start Metricbeat, instead of connecting to ServerB it appears to try to connect to MySQL on localhost (ServerA).
Below is the error:
Error fetching data for metricset mysql.status: Error 1045: Access denied for user 'mysql'@'ServerA...sing password: YES)
Can someone help me with this?
It was an access issue; once I granted the required privileges I was able to connect.
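For reference, the fix amounts to allowing the Metricbeat user to log in from ServerA on ServerB's MySQL. A minimal sketch (user, host, and password mirror the config above; the exact privilege set is an assumption to adjust for your metricsets):
# Run on ServerB (MySQL 5.7+ syntax for CREATE USER IF NOT EXISTS).
mysql -u root -p <<'SQL'
CREATE USER IF NOT EXISTS 'mysql'@'ServerA' IDENTIFIED BY 'password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql'@'ServerA';
FLUSH PRIVILEGES;
SQL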

Error starting distributed load test in non-GUI mode with JMeter

I get the below error on the master machine while running a distributed load test in non-GUI mode with JMeter. How can I resolve this?
Message in master
C:\apache-jmeter-5.4.1\bin>jmeter -Djava.rmi.server.hostname=xx.xx.xx.xx -n -t C:\apache-jmeter-5.4.1\bin\examples\masterslavetest.jmx -l C:\apache-jmeter-5.4.1\bin\examples\result.jtl -R xx.xx.xx.xx
Creating summariser <summary>
Created the tree successfully using C:\apache-jmeter-5.4.1\bin\examples\masterslavetest.jmx
Configuring remote engine: xx.xx.xx.xx
Using local port: 4000
Starting distributed test with remote engines: [xx.xx.xx.xx] @ Thu Mar 04 18:53:43 GMT 2021 (1614884023471)
Error in rconfigure() method java.rmi.MarshalException: error marshalling arguments; nested exception is:
java.io.NotSerializableException: org.apache.jmeter.JMeter$ListenToTest
Remote engines have been started:[]
The following remote engines have not started:[xx.xx.xx.xx]
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
Message in slave
Using local port: 4000
Created remote object: UnicastServerRef2 [liveRef: [endpoint:[xx.xx.xx.xx:4000,SSLRMIServerSocketFactory(host=lhr4-pegajm-03/xx.xx.xx.xx, keyStoreLocation=rmi_keystore.jks, type=JKS, trustStoreLocation=rmi_keystore.jks, type=JKS, alias=rmi),SSLRMIClientSocketFactory(keyStoreLocation=rmi_keystore.jks, type=JKS, trustStoreLocation=rmi_keystore.jks, type=JKS, alias=rmi)](local),objID:[79aa42b8:177fe8fb2b5:-7fff, 5964228045381296735]]]
Below are a few additional details:
JMeter: 5.4.1
Java: 15
Running the tests on Windows 10 VMs
Opened server.rmi.localport, client.rmi.localport, server.rmi.port
The slave doesn't show any logs
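For what it's worth, the RMI ports mentioned above are normally pinned in jmeter.properties on both machines, and can also be overridden per run with -J flags (the port values here are illustrative, not a confirmed fix for the serialization error):
C:\apache-jmeter-5.4.1\bin>jmeter -n -t C:\apache-jmeter-5.4.1\bin\examples\masterslavetest.jmx -R xx.xx.xx.xx ^
  -Jserver.rmi.localport=4000 -Jclient.rmi.localport=4001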

Docker service tasks stuck in preparing state after reboot on Windows

Restarting a Windows server that is a swarm worker causes Windows containers to get stuck in a "Preparing" state indefinitely once the server and the Docker daemon are back online.
Image of tasks/containers stuck in preparing state:
https://user-images.githubusercontent.com/4528753/65180353-4e5d6e80-da22-11e9-8060-451150865177.png
Steps to reproduce the issue:
1. Create a swarm (in my case I have CentOS 7 managers and a few Windows Server 1903 workers).
2. Create a "global" docker service that only runs on the Windows machines. They should start up fine initially and work just fine.
3. Drain one or more of the Windows nodes that are running the Windows container(s) from step 2 (docker node update --availability=drain nodename).
4. Restart one or more of the nodes that were drained in step 3 and wait for them to come back up.
5. Set the Windows node(s) back to active (docker node update --availability=active nodename).
6. Observe that the docker service created in step 2 will be "Preparing" the containers to start up on these nodes, and there it will stay (docker service ps servicename --no-trunc); you can observe this and run these commands from any manager node.
The worker logs show errors such as the following:
memberlist: Refuting a suspect message (from: c9347e85405d)
memberlist: Failed to send ping: write udp 10.60.3.40:7946->10.60.3.110:7946: wsasendto: The requested address is not valid in its context.
grpc: addrConn.createTransport failed to connect to {10.60.3.110:2377 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.60.3.110:2377: connectex: A socket operation was attempted to an unreachable host.". Reconnecting... [module=grpc]
memberlist: Failed to send ping: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its context.
grpc: addrConn.createTransport failed to connect to {10.60.3.110:2377 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 10.60.3.110:2377: connectex: A socket operation was attempted to an unreachable host.". Reconnecting... [module=grpc]
agent: session failed [node.id=wuhifvg9li3v5zuq2xu7c6hxa module=node/agent error=rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 10.60.3.69:2377: connectex: A socket operation was attempted to an unreachable host." backoff=6.3s]
Failed to send gossip to 10.60.3.110: write udp 10.60.3.40:7946->10.60.3.110:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.105: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.186: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.105: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.186: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.105: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.109: write udp 10.60.3.40:7946->10.60.3.109:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.69: write udp 10.60.3.40:7946->10.60.3.69:7946: wsasendto: The requested address is not valid in its context.
Failed to send gossip to 10.60.3.110: write udp 10.60.3.40:7946->10.60.3.110:7946: wsasendto: The requested address is not valid in its context.
memberlist: Failed to send gossip to 10.60.3.105:7946: write udp 10.60.3.40:7946->10.60.3.105:7946: wsasendto: The requested address is not valid in its context.
memberlist: Failed to send gossip to 10.60.3.186:7946: write udp 10.60.3.40:7946->10.60.3.186:7946: wsasendto: The requested address is not valid in its context.
Many of these errors are odd; for example, port 7946 is completely open between the cluster nodes (telnet confirms this).
I expect the docker service containers to start promptly rather than staying stuck in a Preparing state; the image is already pulled, so it should be fast.
docker version output
Client: Docker Engine - Enterprise
Version: 19.03.2
API version: 1.40
Go version: go1.12.8
Git commit: c92ab06ed9
Built: 09/03/2019 16:38:11
OS/Arch: windows/amd64
Experimental: false
Server: Docker Engine - Enterprise
Engine:
Version: 19.03.2
API version: 1.40 (minimum version 1.24)
Go version: go1.12.8
Git commit: c92ab06ed9
Built: 09/03/2019 16:35:47
OS/Arch: windows/amd64
Experimental: false
docker info output
Client:
Debug Mode: false
Plugins:
cluster: Manage Docker clusters (Docker Inc., v1.1.0-8c33de7)
Server:
Containers: 4
Running: 0
Paused: 0
Stopped: 4
Images: 4
Server Version: 19.03.2
Storage Driver: windowsfilter
Windows:
Logging Driver: json-file
Plugins:
Volume: local
Network: ics l2bridge l2tunnel nat null overlay transparent
Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
Swarm: active
NodeID: wuhifvg9li3v5zuq2xu7c6hxa
Is Manager: false
Node Address: 10.60.3.40
Manager Addresses:
10.60.3.110:2377
10.60.3.186:2377
10.60.3.69:2377
Default Isolation: process
Kernel Version: 10.0 18362 (18362.1.amd64fre.19h1_release.190318-1202)
Operating System: Windows Server Datacenter Version 1903 (OS Build 18362.356)
OSType: windows
Architecture: x86_64
CPUs: 4
Total Memory: 8GiB
Name: SWARMWORKER1
ID: V2WJ:OEUM:7TUQ:WPIO:UOK4:IAHA:KWMN:RQFF:CAUO:LUB6:DJIJ:OVBX
Docker Root Dir: E:\docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: this node is not a swarm manager - check license status on a manager node
Additional Details
These nodes are not using Docker Desktop for Windows. I provisioned Docker on each box primarily following the PowerShell instructions here: https://docs.docker.com/install/windows/docker-ee/
Windows firewall is disabled
iptables/firewalld is disabled
Communication is completely open between the cluster nodes
Totally up-to-date on cumulative updates
I posted on the moby repo issues but never heard a peep:
https://github.com/moby/moby/issues/39955
The ONLY way I've found to temporarily fix the issue is to drain the node from the swarm, delete the Docker files, reinstall the Windows "Containers" feature, and then rejoin the swarm. But it happens again on reboot.
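As a sketch, that temporary workaround corresponds roughly to the following (node name and Docker root dir taken from the docker info output above; the feature reinstall and reboot happen between the swarm leave and join):
docker node update --availability=drain SWARMWORKER1    # on a manager
docker swarm leave                                      # on the worker
# Stop the docker service, delete the contents of E:\docker,
# reinstall the Windows "Containers" feature, reboot, then:
docker swarm join --token <worker-join-token> 10.60.3.110:2377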
What's interesting is that when I see a swarm task in a "Preparing" state on the Windows worker, the server doesn't seem to be doing anything at all; it's as if the manager thinks the worker is preparing the container, but it isn't.
Does anyone have any suggestions?

Setting up ELK stack

I'm completely new to ELK and am trying to install the stack with some Beats for our servers.
Elasticsearch, Kibana, and Logstash are all installed on server A. I followed this guide: https://www.elastic.co/guide/en/elastic-stack/current/installing-elastic-stack.html.
The Filebeat template was installed as well.
I also installed Filebeat on another server (server B) and was trying to test the connection:
$ /usr/share/filebeat/bin/filebeat test output -c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat
logstash: my-own-domain:5044...
connection...
parse host... OK
dns lookup... OK
addresses: 163.172.167.147
dial up... OK
TLS...
security: server's certificate chain verification is enabled
handshake... OK
TLS version: TLSv1.2
dial up... OK
talk to server... OK
Things seem to be OK, yet Filebeat on server B doesn't appear to be sending data to Logstash.
Accessing Kibana keeps redirecting me back to the Create Index Pattern page, with the message:
Couldn't find any Elasticsearch data
Any pointers would be really appreciated.
Can you check your filebeat.yml file and see whether the configuration for logs is activated:
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/*.log
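After enabling the log input, two quick checks can confirm the pipeline, assuming Elasticsearch on server A listens on the default port 9200 (the host name here is illustrative):
filebeat test config -c /etc/filebeat/filebeat.yml     # validate the config on server B
curl 'http://serverA:9200/_cat/indices?v'              # look for filebeat-* indices on server A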

Starting Realm Object Server on AWS stalls

I've been trying to use Realm Object Server deployed on an Amazon EC2 instance, using the basic Amazon Ubuntu AMI (since the Realm AMI has ROS v1.8.3).
To use the latest ROS (v2.x) I followed Realm's instructions to run curl -s https://raw.githubusercontent.com/realm/realm-object-server/master/install.sh | bash, which appears to execute successfully. I followed that script's instructions to load nvm and use its latest version.
Then I run ros start. Here's what I get:
info: Loaded feature token capabilities=[Sync], expires=Wed Apr 19 2017 14:15:29 GMT+0000 (UTC)
info: Realm Object Server version 2.0.18 is starting
info: [sync] Realm sync server started ([realm-core-4.0.3], [realm-sync-2.1.4])
info: [sync] Directory holding persistent state: /home/ubuntu/data/sync/user_data
info: [sync] Operating mode: master_with_no_slave
info: [sync] Log level: info
info: [sync] Download log compaction is enabled
info: [sync] Max download size: 131072 bytes
info: [sync] Listening on 127.0.0.1:40134 (sync protocol version 22)
info: [http] 127.0.0.1 - GET /realms/files/%2F__wildcardpermissions HTTP/1.1 200 55 - 56.996 ms
info: [http] 127.0.0.1 - GET /realms/files/%2F__password HTTP/1.1 200 44 - 53.009 ms
info: [http] 127.0.0.1 - GET /realms/files/%2F__perm HTTP/1.1 200 40 - 9.402 ms
info: Autocreated admin user: realm-admin
info: Realm Object Server has started and is listening on http://0.0.0.0:9080
info: [http] 127.0.0.1 - GET /realms/files/%2F__admin HTTP/1.1 200 41 - 4.187 ms
info: [http] 127.0.0.1 - GET /realms/files/%2F__admin HTTP/1.1 200 41 - 29.902 ms
And then...nothing. It doesn't even get me back to my ubuntu@ip-XXX-XX-XX-XX: prompt. (It's possible that this is exactly what you'd expect, but I'm pretty new to these kinds of processes.)
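Not returning to the prompt is expected: ros start runs in the foreground. To keep it running after you disconnect, one generic approach (an assumption, not Realm-specific guidance) is:
nohup ros start > ros.log 2>&1 &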
When I try to access my server in the browser (my DNS:9080) the browser says Cannot GET / and the CLI says info: [http] 96.2xx.xxx.xxx - GET / HTTP/1.1 404 139 - 0.521 ms
The security groups for my ec2 instance are:
HTTP / TCP / 80 / 0.0.0.0/0
SSH / TCP / 22 / 0.0.0.0/0
Custom UDP Rule / UDP / 9080 / 0.0.0.0/0
Custom TCP Rule / TCP / 9080 / 0.0.0.0/0
I'm stuck. What am I doing wrong? Thanks for your help.
The web-based dashboard was part of ROS 1.x, but it was replaced by Realm Studio in ROS 2.0.
