AWS ECS WaitUntilTasksRunningWithContext returning ResourceNotReady - go

I'm not very proficient with Go, but I'm seeing the following error while waiting for an ECS task to start:
error waiting AWS ECS Task "arn:aws:ecs:eu-west-1:123456789012:task/ecs-cluster/22452be490a149e781a596a7847dd27c" to be in "Running" state: ResourceNotReady: failed waiting for successful resource state
The task being started has a launch type of EC2. Sometimes it will start, but fairly regularly it will return that error.
We get that while doing:
input := ecs.DescribeTasksInput{
	Cluster: &cluster,
	Tasks:   []*string{&taskARN},
}
err = a.ecsSvc.WaitUntilTasksRunningWithContext(ctx, &input)
if err != nil {
	return fmt.Errorf(`error waiting AWS ECS Task %q to be in "Running" state: %w`, taskARN, err)
}
What would be the better solution here: passing a custom WaiterOption to WaitUntilTasksRunningWithContext, or simply retrying the method when we get this failure?
For additional context: when checking in the console, I can see the task running, so this is most probably a case of WaitUntilTasksRunningWithContext giving up too quickly.
If a retry is a good option, what would that look like?

Related

`Invalid blockhash` error while deploying smart contract

I am trying to deploy the demo smart contract on Solana for a Chainlink price feed, but I'm getting an error. I followed all the steps from https://docs.chain.link/docs/solana/using-data-feeds-solana/
$ anchor deploy --provider.wallet ./id.json --provider.cluster devnet
Deploying workspace: https://api.devnet.solana.com
Upgrade authority: ./id.json
Deploying program "chainlink_solana_demo"...
Program path: /home/test/solana-starter-kit/target/deploy/chainlink_solana_demo.so...
=============================================================================
Recover the intermediate account's ephemeral keypair file with
`solana-keygen recover` and the following 12-word seed phrase:
=============================================================================
until reason almost can clean wish trend buffalo future auto artefact balcony
=============================================================================
To resume a deploy, pass the recovered keypair as the
[BUFFER_SIGNER] to `solana program deploy` or `solana write-buffer`.
Or to recover the account's lamports, pass it as the
[BUFFER_ACCOUNT_ADDRESS] argument to `solana program close`.
=============================================================================
Error: Custom: Invalid blockhash
There was a problem deploying: Output { status: ExitStatus(unix_wait_status(256)), stdout: "", stderr: "" }.
This is just a timeout error; it happens from time to time depending on network availability.
Follow the steps from the output and run the deploy again. I've also found that you'll want quite a bit more SOL in your wallet than you think you need.
To resume a deploy, pass the recovered keypair as the
[BUFFER_SIGNER] to `solana program deploy` or `solana write-buffer`.
Or to recover the account's lamports, pass it as the
[BUFFER_ACCOUNT_ADDRESS] argument to `solana program close`.

How to propagate a Kubernetes operator error to kubectl command line?

I have a Kubernetes operator that creates a new Deployment based upon the custom resource configuration. There are some error conditions that will cause a failure and the Deployment creation step is skipped. Is it possible to have the error text displayed on the command line?
At the moment I have:
err := validateSettings()
if err != nil {
	// Log the error
	logger.Error(err, "The Deployment settings are invalid")
	// I also record the event in the custom object
	r.recorder.Event(object, "Warning", "Failed", err.Error())
	return reconcile.Result{}, err
}
When a user creates the custom object, the deployment is not created but the command line says that the custom object was created successfully.
# kubectl apply -f myobject.yaml
test.com/my-object created
The logs for the operator show the error, and a describe of the custom object shows the event. I was hoping to have the event text displayed after the kubectl apply command.

process interrupted: signal: killed

I installed a utility called watcher.
https://github.com/canthefason/go-watcher
It works as expected in VS Code.
But when I try to use it in GoLand (from JetBrains) I get the following:
watcher main.go --port 8080
2020/03/04 14:10:42 build started
Building ....
2020/03/04 14:10:43 build completed
Running ...
2020/03/04 14:10:43 process interrupted: signal: killed
Needless to say, go run main.go --port 8080 works.
I'm on a Mac running Catalina.
Any suggestions?
Looks like an error from cmd.Wait():
if err := cmd.Wait(); err != nil {
	log.Printf("process interrupted: %s \n", err)
	...
In a similar report, the OS was killing the process because it was out of memory (OOM), and dmesg had logged the kill.

telegraf output to Elasticsearch: "health check timeout: no Elasticsearch node available"

I'm having trouble connecting to an Elasticsearch instance with a Telegraf output plugin.
I created an Elasticsearch setup via the Elasticsearch service. I created a user and password (connected to a role) in Kibana for it.
Then I set up a Telegraf output for it:
[[outputs.elasticsearch]]
urls = [ "https://hostname:port" ] # required.
timeout = "5s"
enable_sniffer = false
health_check_interval = "10s"
## HTTP basic authentication details.
username = "my_username"
password = "my_password"
index_name = "device_logs" # required.
insecure_skip_verify = true
manage_template = true
template_name = "telegraf"
overwrite_template = false
But when I try to start Telegraf with this, it just gives the error,
[agent] Failed to connect to [outputs.elasticsearch], retrying in 15s, error was 'health check timeout: no Elasticsearch node available'
The connect fail seems to originate deep in the bowels of golang's net/http library, and I don't know how to get some more useful output at this point.
Things I've tried:
Thing #1: I tested cURL:
curl -u my_username:my_password -X POST "https://hostname:port/device_logs/_doc" -H 'Content-Type: application/json' -d'
{
"name": "John Doe"
}'
This works fine.
Thing #2: I created a simple Go program to connect to Elasticsearch:
package main

import (
	"log"

	"gopkg.in/olivere/elastic.v3"
)

func main() {
	// Configure the connection to ES.
	client, err := elastic.NewClient(elastic.SetURL("https://hostname:port"))
	if err != nil {
		panic(err)
	}
	log.Printf("client.running? %v", client.IsRunning())
	if !client.IsRunning() {
		panic("Could not make connection, not running")
	}
}
.. and it hits the first panic with the same "no Elasticsearch node available".
Thing #3: I tried running gdb on that Go program to debug into it.
It jumps down to assembly as soon as I call NewClient, so I can't really learn what is happening in the bowels of net/http.
I've never used Go before, so I'm hoping to avoid hours of learning Go, spelunking, and debugging to get around what hopefully is a simple issue here.
Any ideas on how to get more info here or why this is failing? Are there build or runtime flags for Go that I can use? gdb-with-Go debugging tips so I can step down into the Go library code? Elasticsearch client know-how?
To answer my own question, the problem here turned out to be the role's permissions. The Telegraf output plugin for Elasticsearch needs both the monitor and the manage_index_templates permissions enabled, or else it fails to connect to the Elasticsearch server without printing any information about why.
BTW: to build Go code and be able to debug into the libraries it calls:
go build -gcflags=all="-N -l"

Client timeout exceeded while awaiting headers

I'm getting the error below; I am using Go v1.10.4 on linux/amd64.
I am not behind any firewall or anything similar. A New Relic agent on a Java server we have (same network segment) runs fine.
We have tried:
Increasing the timeout to 60 seconds
Using HTTP/2 on the server
Using Postman, which returns a 503 with the response:
{"exception":{"message":"Server Error","error_type":"RuntimeError"}}
Troubleshooting with ./nrdiag, which says “No Issues Found”
Below is our code:
config := newrelic.NewConfig(os.Getenv("NEW_RELIC_APP_NAME"), os.Getenv("NEW_RELIC_KEY"))
config.Logger = newrelic.NewDebugLogger(os.Stdout)
app, err := newrelic.NewApplication(config)
if err != nil {
	fmt.Println("Failed to create newrelic application", err)
	os.Exit(1)
}
.................
httpListener, err := net.Listen("tcp", *httpAddr)
if err != nil {
	oldlog.Print("Error: ", err)
	logger.Log("transport", "HTTP", "during", "Listen", "err", err)
	os.Exit(1)
}
g.Add(func() error {
	logger.Log("transport", "HTTP", "addr", *httpAddr)
	return http.Serve(httpListener, nrgorilla.InstrumentRoutes(httpHandler, app))
}, func(error) {
	httpListener.Close()
})
}
However, this is what we got (note: some_key was removed):
(28422) 2019/07/29 18:08:50.058559 {"level":"warn","msg":"application connect failure","context":{"error":"Post https://collector-001.eu01.nr-data.net/agent_listener/invoke_raw_method?license_key=some_key\u0026marshal_format=json\u0026method=connect\u0026protocol_version=17: net/http: request canceled (Client.Timeout exceeded while awaiting headers)"}}
I think it is due to a DNS timeout.
You can easily test this with the following steps (in Ubuntu):
Select the IPv4 Settings tab.
Disable the “Automatic” toggle switch and enter the DNS resolvers' IP addresses, separated by a comma. We’ll use the Google DNS nameservers:
8.8.8.8,8.8.4.4
If it works, then you may be able to reset the DNS to "Automatic" later.
On Windows, running Linux containers with WSL2, I followed these steps:
Ran docker logout
Ran docker network prune to remove all the preconfigured network settings
From Docker Settings, enabled the DNS server configuration with 8.8.8.8
Restarted Docker
Then ran the login command against the registry: docker login {registry}
