ibm-cloud-private CE - 2.1.0 Catalog - Error Loading Charts

I have just installed ICP CE edition 2.1.0 on Ubuntu 16.04 (one cluster, one master, one worker node, a very basic installation). When I open the 'Catalog' page (https://..........:8443/catalog/), I get the message 'Error loading Charts'.
On the 'Admin > Repositories' page I can see ibm-charts https://blablabla and local-charts https://blablabla/helm-repo/....
The 'Admin > Metering' dashboard displays the error 'E_DATA_QUERY_ERROR: The query for loginbootstrap failed with the response '500 Internal Server Error''.
I have made only a few modifications to the config.yaml (and hosts) files in the cluster directory (just configured password authentication). Maybe some additional custom configuration is required.
I'm still discovering/learning this product, so maybe there is an obvious explanation for this kind of behavior to an expert.
Thanks

Regarding the "error loading charts", check the following:
Deployments > helm-api > {click the pod name at the bottom} > logs.
Then, in another tab, open the Admin > Repositories page, click Sync Repositories, and watch the log in the first tab. Attempt to open the Catalog as well and watch the same log.
If you see any Cloudant-related errors, one possible way to resolve this is to delete the helm-api pod; it will reinitialize with the view and the error should go away.
There was possibly an issue connecting to Cloudant when the connection to it was first set up, so the helm-api pod needs a restart in order to add some files to Cloudant now that it has been initialized.
My understanding is that a fix will be going in to help automate this recovery step in the next release.
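If you prefer the command line, a rough equivalent of that delete step is the following (the grep pattern and pod name are placeholders; check what kubectl actually lists first):
kubectl -n kube-system get pods | grep helm-api
kubectl -n kube-system delete pod <helm-api-pod-name>
The helm-api deployment will recreate the pod, which should reinitialize the Cloudant view on startup.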
As for the 'E_DATA_QUERY_ERROR: The query for loginbootstrap failed with the response 500 Internal Server Error' message, that was supposedly fixed in the GA release. Are you certain that you installed the latest ICP CE release from Docker Hub?
https://www.ibm.com/support/knowledgecenter/SSBS6K_2.1.0/installing/install_containers_CE.html

The two problems, the chart-loading error and the metering 'loginbootstrap' error, likely have the same root cause: a problem communicating with the Cloudant database at first startup, when the databases would be initialized. Restarting the helm-api pod should help the charts, and restarting the metering-server and then the metering-ui pods should resolve the Metering error.
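The metering pods can be restarted the same way as the helm-api pod above, for example (the pod names are placeholders and the kube-system namespace is an assumption; confirm both with kubectl get pods):
kubectl -n kube-system delete pod <metering-server-pod-name>
kubectl -n kube-system delete pod <metering-ui-pod-name>
Delete metering-server first and metering-ui second, matching the order described above.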

Today I saw the same issue on ICP 2.1.0.1 EE when navigating to the Catalog -> Helm Charts page. The page loaded for a while and then ended with "error loading charts". The weird thing is that I didn't do anything; I just left it, revisited after several hours, and it worked.
Next time, I will first try syncing the repositories (Manage -> Helm Repositories -> Sync repositories), then check the helm-api pod (kubectl is running on Windows):
kubectl -n kube-system get pods |findstr helm-api
then kill the pod if it is not running.

Related

DataHub installation on Minikube failing: "no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"" on elasticsearch setup

I'm following the DataHub deployment guide for Kubernetes from the documentation: https://datahubproject.io/docs/deploy/kubernetes
Setting up the local cluster with Minikube, I started by following the prerequisites section of the guide.
At first I tried changing some of the default values to try it locally (I had already installed it successfully on Google Kubernetes Engine, so I was trying different setups).
But on the first step of the installation I received the error:
Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: resource mapping not found for name: "elasticsearch-master-pdb" namespace: "" from "": no matches for kind "PodDisruptionBudget" in version "policy/v1beta1"
ensure CRDs are installed first
The steps I followed after installing Minikube were the exact steps presented on the page:
helm repo add datahub https://helm.datahubproject.io/
helm install prerequisites datahub/datahub-prerequisites
The error happened on step 2.
At first I changed back to the default configuration to see whether the mistake was in the new values, but the error remained.
I expected that after following the exact default steps the installation would be successful locally, just as it was on GKE.
I got help browsing the DataHub slack community and figured out a way to fix this error.
It was simply a Kubernetes version mismatch; I was able to fix it by forcing Minikube to start with Kubernetes 1.19.0:
minikube start --kubernetes-version=v1.19.0
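If you want to confirm the mismatch before pinning the version, you can check which policy API versions the cluster actually serves (the chart's PodDisruptionBudget asks for policy/v1beta1, which newer Kubernetes releases no longer offer):
kubectl api-versions | grep policy
Starting Minikube with --kubernetes-version=v1.19.0 brings back a cluster that still serves policy/v1beta1.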

Error syncing pod on starting Beam - Dataflow pipeline from docker

We are constantly getting an error while starting our Beam Golang SDK pipeline (driver program) from a Docker image, even though it works when started locally or from a VM instance. We are using the Dataflow runner for our pipeline and Kubernetes to deploy.
LOCAL SETUP:
We have the GOOGLE_APPLICATION_CREDENTIALS variable set to the service account for our GCP cluster. When running the job locally, the job gets submitted to Dataflow and completes successfully.
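For reference, the local setup is just the standard environment variable pointing at a key file (the path is illustrative):
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json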
DOCKER SETUP:
The build image used is FROM golang:1.14-alpine. When we package the same program with a Dockerfile and try to run it, it fails with the error:
User program exited: fork/exec /bin/worker: no such file or directory
On checking Stackdriver logs for more details, we see this:
Error syncing pod 00014c7112b5049966a4242e323b7850 ("dataflow-go-job-1-1611314272307727-
01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"),
skipping: failed to "StartContainer" for "sdk" with CrashLoopBackOff:
"back-off 2m40s restarting failed container=sdk pod=dataflow-go-job-1-
1611314272307727-01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"
We found a reference to this error in the Dataflow common errors doc, but it is too generic to figure out what is failing. After multiple retries, we were able to rule out any permission / access related issues with the pods. Not sure what else could be the problem here.
After multiple attempts, we decided to start the job manually from a new Debian 10 based VM instance, and it worked. This brought to our notice that we were using an Alpine-based golang image in Docker, which may not have all the dependencies required to start the job.
On the golang Docker Hub page, we found golang:1.14-buster, where buster is the codename for Debian 10. Using that base image for the Docker build solved the issue. Self-answering here to help anyone else facing the same issue.
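A minimal sketch of the change, for anyone wanting a starting point (the binary name, paths, and build command are illustrative, not our exact Dockerfile):
# Debian-based Go image instead of Alpine
FROM golang:1.14-buster
WORKDIR /app
COPY . .
# build the pipeline driver binary
RUN go build -o /app/pipeline .
ENTRYPOINT ["/app/pipeline"]
The only essential part is swapping the golang:1.14-alpine base for golang:1.14-buster.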

Microclimate Pod CrashLoopBackOff in IBM Cloud Private

I'm trying to deploy IBM Microclimate to IBM Cloud Private CE 2.1.0.3, as described in the documentation (https://github.com/IBM/charts/blob/master/stable/ibm-microclimate/README.md), but the Microclimate pod status shows CrashLoopBackOff and the Portal is not accessible (it shows a 503 Service Unavailable error in the browser). I tried looking at the logs for the pod, but that is not possible either. Has anyone faced an issue like this one before? Any hints on how to troubleshoot or solve the issue? Thanks!
That's not a lot of information to go on. If you'd like some more interactive help, do please ask in our Slack channel as per https://microclimate-dev2ops.github.io/community. If you want to debug it here, can you please post the results of: kubectl get pods, kubectl get ing, kubectl describe pods, helm list --tls, kubectl get deployments -o yaml. If you installed to a non-default namespace, please add --namespace [your-mc-ns] to each command.
Adding the command "mount --make-rshared /run" to the Vagrantfile for the ICP CE image solves this issue, and Microclimate can then be installed successfully. Reference: https://github.com/IBM/deploy-ibm-cloud-private/issues/139
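In a Vagrantfile this can be wired in with a shell provisioner along these lines (a sketch only; the exact provisioning blocks in the ICP CE Vagrantfile differ):
config.vm.provision "shell", inline: "mount --make-rshared /run"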

Cloudera host with bad health during install

I am trying again and again with all the required steps completed, but during cluster installation, when the selected parcels are installed, every host always shows bad health. The setup never completes fully.
I am installing CM 5.5 on CentOS 6.7 using VirtualBox.
The Error
Host is in bad health cm.feuni.edu
Host is in bad health dn1.feuni.edu
Host is in bad health dn2.feuni.edu
Host is in bad health nn1.feuni.edu
Host is in bad health nn2.feuni.edu
Host is in bad health rm.feuni.edu
The above errors are shown at step 6, where the setup says:
The selected parcels are being downloaded and installed on all the hosts in the cluster
In the previous step (step 5), all hosts completed their heartbeat checks at the end.
Memory distribution:
cm: 8 GB
all others: 1 GB
I could not find a proper answer anywhere else. What could be the reason for the bad health?
I don't know if it will help you...
For me, after struggling with it for a few days,
I found the log files (at )
and they contained a message that there was a mismatch of the GUID,
so I uninstalled everything from both machines (using the script they provide, /usr/share/cmf/uninstall-cloudera-manager.sh, then yum remove 'cloudera-manager-*', and deleting every Cloudera-related directory I found...)
and then removed the guid file:
rm /var/lib/cloudera-scm-agent/cm_guid
Afterwards I re-installed everything, and that fixed that issue for me...
I read online that there can be issues with the hostname and things like that, but I guess that if you get to this part of the installation, you have already fixed all the domain/FQDN/hostname/hosts issues.
It saddens me that there is no real manual/FAQ for this product... :(
Good luck!
I faced the same problem. This is my solution:
First I edited config.ini
$ nano /etc/cloudera-scm-agent/config.ini
so that the hostname was the same as what the command $ hostname returned.
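In practice that means making the agent's hostname entries in config.ini match the output of hostname, roughly like this (the listening_hostname key and the example values are illustrative and may vary by CM version):
# /etc/cloudera-scm-agent/config.ini
server_host=cm.feuni.edu
listening_hostname=dn1.feuni.edu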
Then I restarted the Cloudera agent and server:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
Then, in Cloudera Manager, I deleted the cluster and added it again. The wizard continued to run normally.

Unable to disable Google Analytics via Spring to address DS-2718 (failed GA connections prevent file downloads) when building DSpace 5.3 with Mirage 2

I'm working on a fresh installation of stock DSpace 5.3 (Windows Server 2012, Tomcat 8.0, Maven 3.2.5, Ant 1.9.6). This particular instance will be a dark archive without Google Analytics enabled; we don't currently have a GA account or analytics key, although we plan to register one eventually for a separate public-facing instance.
As per the problem described in JIRA ticket DS-2718, DSpace hangs with the following message in dspace.log when I attempt to download a bitstream:
2015-10-20 09:52:02,324 INFO org.apache.http.impl.execchain.RetryExec
# I/O exception (java.net.SocketException) caught when processing
request to {s}->https://www.google-analytics.com:443: Network is
unreachable: connect
2015-10-20 09:52:02,324 INFO org.apache.http.impl.execchain.RetryExec
# Retrying request to {s}->https://www.google-analytics.com:443
Since we won't be using GA on this instance, disabling it in Spring is a good workaround until the issue is resolved. As per the instructions, I commented out the Google Analytics entry in dspace-5.3-src-release\dspace-xmlui\src\main\webapp\WEB-INF\spring\applicationContext.xml, disabled Tomcat and rebuilt DSpace. An initial attempt running mvn package -Dmirage2.on=true still produced the problem, so I tried a "ground up" rebuild:
cd d:\dspace-5.3-src-release\dspace
mvn clean package -U -Dmirage2.on=true
[successful build]
cd d:\dspace-5.3-src-release\dspace\target\dspace-installer
ant update
[successful update]
[copy webapps to Tomcat 8.0\webapps and start Tomcat]
Even after the rebuild, however, I'm still getting the same error, with the same java.net.SocketException in dspace.log.
Not sure why this isn't working. Have I missed a step or setting in the rebuild process so that the change to applicationContext.xml isn't being applied?
FWIW, I tried grepping for "google" in dspace-5.3-src-release\dspace-xmlui-mirage2 to see if this could be a Mirage 2 problem, but I don't see anything that looks relevant.
This isn't an answer to why you're still seeing the SocketException, but the real fix for the problem you're describing is to remove the default GA key from dspace-services/src/main/resources/config/dspace-defaults.cfg, see https://github.com/DSpace/DSpace/commit/5b84fef1ad789443d06c338558a92f854b20c8ef. Have you tried doing that?
The issue resolved itself after I ran mvn clean -Dmirage2.on=true in both [dspace-src] and [dspace-src]\dspace. I'm guessing that the issue originated on our end due to someone running a maven build from the wrong directory.
I've also removed the default key from dspace-defaults.cfg as suggested. Everything's now working.
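Putting it together, the clean rebuild that finally worked looked roughly like this (paths as in the original post; the sequence is a sketch of what I ran, not an exact transcript):
cd d:\dspace-5.3-src-release
mvn clean -Dmirage2.on=true
cd dspace
mvn clean -Dmirage2.on=true
mvn package -Dmirage2.on=true
cd target\dspace-installer
ant update
[copy webapps to Tomcat 8.0\webapps and restart Tomcat]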
