I got the init script from https://github.com/nmilford/rpm-mesos/blob/master/mesos-master and ran
service mesos-master start
but ended up with this:
Starting Mesos Master daemon (mesos-master):
Password:
su: Authentication failure
How do I start my mesos-master?
This is not a Mesos authentication failure. It is a failure to run su $mesosUser, and $mesosUser may not even be set; with an empty variable, su defaults to switching to root, prompts for the root password, and fails non-interactively. This repo is very old, so I would recommend looking for a newer RPM. If you don't need to build it yourself, you can download one from http://mesosphere.com/downloads/details/index.html#apache-mesos
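A quick way to confirm is to look at how the init script invokes su and make sure the user variable is actually defined before the service starts. A rough sketch (the variable name mesosUser comes from the question; the exact lines in the script may differ):

# See which user the init script tries to switch to
grep -n 'su ' /etc/init.d/mesos-master

# If the variable is empty, 'su $mesosUser -c ...' collapses to 'su -c ...',
# i.e. su to root, which prompts for a password and then fails.
# Define the user the script expects and make sure it exists:
mesosUser=mesos
id "$mesosUser" || sudo useradd -r "$mesosUser"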
I have got an assignment: "Write a shell script to install and configure Docker Swarm (one master/leader and one node) and automate the process using Jenkins." I am new to this technology and am finding it difficult to proceed. Can anyone explain, step by step, how to proceed?
@Rajnish Kumar Singh, have you tried to check resources online? I understand you are very new to this technology, but googling some keywords like
what is docker swarm
what is jenkins, etc. would definitely help.
That said, you basically need the following set of steps to complete your assignment.
Prerequisites
2 or more Ubuntu 20.04 servers
(You can use any Linux distro, such as Ubuntu or Red Hat, but make sure your install and execute commands change accordingly.
We need two nodes here mainly to configure a master/worker node cluster.)
E.g.:
manager --- 132.92.41.4
worker --- 132.92.41.5
You can create these nodes with any public cloud provider, e.g. AWS EC2 instances or GCP VMs.
Next, you need to do the following set of steps:
Configure Hosts
Install Docker-ce
Docker Swarm Initialization
You can refer to this article for more info: https://www.howtoforge.com/tutorial/ubuntu-docker-swarm-cluster/
This completes the first part of your assignment.
Next, you can create one small shell script and include all those install and configuration commands in it. A shell script is basically a collection of Linux commands: instead of running each command separately, you run the script alone and the whole setup is done for you.
You can create the script file using the touch command:
touch docker-swarm-install.sh
Give the script the proper permissions to make it executable:
chmod +x docker-swarm-install.sh
Next, include in the script all the install and configure commands you used earlier for the Docker Swarm setup (you can refer to the link shared above); a sketch follows below.
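For illustration, a minimal sketch of what docker-swarm-install.sh could contain for the manager node, assuming Ubuntu 20.04 and the example manager address 132.92.41.4 from above (the worker node instead runs the join command that the last step prints):

#!/bin/bash
# docker-swarm-install.sh -- sketch for the manager node
set -e

# Install Docker (the linked article installs docker-ce from Docker's
# own repository; Ubuntu's docker.io package is used here for brevity)
sudo apt-get update
sudo apt-get install -y docker.io
sudo systemctl enable --now docker

# Initialise the swarm, advertising the manager's address
sudo docker swarm init --advertise-addr 132.92.41.4

# Print the 'docker swarm join ...' command to run on the worker node
sudo docker swarm join-token worker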
Now, when your script is ready, you can configure it in a Jenkins job; whenever the job runs, the script gets executed and the Docker Swarm cluster is created.
You need a Jenkins server. Jenkins is open-source software; you can install it on any public cloud instance (e.g. AWS EC2).
Reference: https://devopsarticle.com/how-to-install-jenkins-on-aws-ec2-ubuntu-20-04/
Next, once the installation is completed, you need to configure a job in Jenkins.
Reference: https://www.toolsqa.com/jenkins/jenkins-build-jobs/
Add your 'docker-swarm-install.sh' as a build step in the created job.
Reference: https://faun.pub/jenkins-jobs-hands-on-for-the-different-use-cases-devops-b153efb483c7
If all the setup is successful, then when you run your Jenkins job your Docker Swarm cluster should get created.
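For reference, once the script is in the job's workspace, the 'Execute shell' build step of a freestyle job can be as small as this (the path is illustrative):

# Jenkins 'Execute shell' build step
chmod +x docker-swarm-install.sh
./docker-swarm-install.sh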
We are constantly getting an error when starting our Beam Golang SDK pipeline (driver program) from a Docker image, even though it works when started locally or from a VM instance. We are using the Dataflow runner for our pipeline and Kubernetes to deploy.
LOCAL SETUP:
We have the GOOGLE_APPLICATION_CREDENTIALS variable set to a service account for our GCP cluster. When running the job locally, it gets submitted to Dataflow and completes successfully.
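For context, the local run amounts to something like the following (paths, project and bucket names are illustrative; the flags are the usual Beam Go pipeline options):

# Point the Google client libraries at the service-account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Submit the pipeline to Dataflow
go run ./main.go \
  --runner=dataflow \
  --project=my-gcp-project \
  --region=us-central1 \
  --staging_location=gs://my-bucket/staging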
DOCKER SETUP:
The build image used is FROM golang:1.14-alpine. When we pack the same program with a Dockerfile and try to run it, it fails with the error
User program exited: fork/exec /bin/worker: no such file or directory
On checking Stackdriver logs for more details, we see this:
Error syncing pod 00014c7112b5049966a4242e323b7850 ("dataflow-go-job-1-1611314272307727-
01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"),
skipping: failed to "StartContainer" for "sdk" with CrashLoopBackOff:
"back-off 2m40s restarting failed container=sdk pod=dataflow-go-job-1-
1611314272307727-01220317-27at-harness-jv3l_default(00014c7112b5049966a4242e323b7850)"
We found a reference to this error in the Dataflow common errors doc, but it is too generic to figure out what's failing. After multiple retries, we were able to rule out any permission / access related issues with the pods. Not sure what else could be the problem here.
After multiple attempts, we decided to start the job manually from a new Debian 10 based VM instance, and it worked. This brought to our notice that we were using an Alpine-based golang image in Docker, which may not have all the dependencies required to start the job (Alpine ships musl libc instead of glibc, and "fork/exec ...: no such file or directory" is exactly what a missing dynamic-linker dependency looks like).
On the golang Docker Hub page, we found golang:1.14-buster, where buster is the codename for Debian 10. Using that for the docker build solved the issue. Self-answering here to help anyone else facing the same issue.
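For anyone who wants the one-line takeaway: the fix amounted to changing the base image in the Dockerfile, e.g. (everything past the FROM line is illustrative, not our actual Dockerfile):

# Debian 10 (buster) based image instead of alpine, so the compiled
# binary runs against glibc like the Dataflow worker environment
FROM golang:1.14-buster

WORKDIR /app
COPY . .
RUN go build -o /bin/program .
ENTRYPOINT ["/bin/program"]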
I'm a student working on a test cluster consisting of around 25 hosts. We installed using Ambari and have FreeIPA running on one host as a DNS and LDAP server; the rest is typical Hadoop infrastructure. Hive was failing, and I wondered whether the DB connection parameters used during the Ambari installation were incorrect, so I tried to find a way to re-run the DB connection process. I didn't get anywhere, and it was late, so I left it with the Ambari interface still working.
The next morning, the Ambari web UI seems to be down. I thought that maybe the web server needed a restart, so I tried the following:
[akidd@dw ~]$ sudo ambari-server start
Using python /usr/bin/python
Starting ambari-server
ERROR: Exiting with exit code 1.
REASON: Unable to detect a system user for Ambari Server.
- If this is a new setup, then run the "ambari-server setup" command to create the user
- If this is an upgrade of an existing setup, run the "ambari-server upgrade" command.
Refer to the Ambari documentation for more information on setup and upgrade.
Can anyone help me to understand what could have happened?
If I run ambari-server setup, will the existing cluster be OK, assuming I set everything up like for like with how it was originally?
Thanks for your help!
@user3535074 You should try to start it with the user that installed it.
If you do run ambari-server setup as the current user, remember to answer No to the following options:
Customize user account for ambari-server daemon [y/n] (n)? n
Do you want to change Oracle JDK [y/n] (n)? n
Enter advanced database configuration [y/n] (n)? n
More info in the following post, including how to back up the Ambari database before running setup again:
https://community.cloudera.com/t5/Support-Questions/Ambari-server-failed-to-start-after-system-reboot-Below-is/td-p/203806
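For reference, a default install uses Ambari's embedded PostgreSQL database, named ambari, so the backup before re-running setup can be as simple as this (check ambari.properties if your setup uses a different database):

# See which system user Ambari was configured to run as
grep ambari-server.user /etc/ambari-server/conf/ambari.properties

# Back up the embedded PostgreSQL database
sudo -u postgres pg_dump ambari > ambari_backup.sql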
I'm currently creating a build agent for our DevOps setup.
With the agent I want to stop Solr via a shell script on the Unix server. The problem is that I need to enter a password to stop the service.
Is there a simple way to provide the password in the shell script? The current script:
service solr stop
throws the following error:
2020-03-12T12:50:27.0480993Z Successfully connected.
2020-03-12T12:50:27.0483958Z service solr stop
2020-03-12T12:50:29.7903365Z
2020-03-12T12:50:29.8193011Z ##[error]Failed to stop solr.service: Interactive authentication required.
2020-03-12T12:50:29.8208401Z
2020-03-12T12:50:29.8208979Z
2020-03-12T12:50:29.8209589Z ##[error]See system logs and 'systemctl status solr.service' for details.
2020-03-12T12:50:29.8209971Z
2020-03-12T12:50:29.8210587Z ##[error]Command failed with errors on remote machine.
2020-03-12T12:50:29.8341862Z ##[section]Finishing: stop solr
I also tried to provide the password in the script, like:
echo <password> | sudo -S service solr stop
It works locally on the server, but the build server throws:
2020-03-12T12:52:17.9543478Z Successfully connected.
2020-03-12T12:52:17.9545108Z echo *** | sudo -S service solr stop
2020-03-12T12:52:18.9946984Z ##[error][sudo] password for ***:
2020-03-12T12:52:23.4808034Z ##[error]Command failed with errors on remote machine.
2020-03-12T12:52:23.5019910Z ##[section]Finishing: stop solr
Any other ideas on how I could remotely stop Solr with a shell script?
As mentioned, I found a solution and posted it in another post. Here's the answer link: https://stackoverflow.com/a/60666738/5835745
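For readers who don't want to follow the link: a common way to let a non-interactive build agent stop a single service without a password (not necessarily the exact approach in the linked answer) is a NOPASSWD sudoers rule scoped to that one command. A sketch, assuming the agent logs in as a user named buildagent (hypothetical name); always edit sudoers via visudo:

# Create with: sudo visudo -f /etc/sudoers.d/solr-stop
# (verify the command path with 'which service'; it may differ per distro)
buildagent ALL=(root) NOPASSWD: /usr/sbin/service solr stop

# The agent's script can then run the command without a password prompt:
sudo service solr stop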
Using Ambari, “This is comment for WebHcat Service” is the final selection in the “Services Selection” step.
If I don't select this service, then the Customize Services step hangs indefinitely. It doesn't matter which other services are selected.
If I select it, then the Customize Services step functions normally, but the installation will stop on step four with the error message:
“org.apache.ambari.server.controller.spi.SystemException:
An internal system exception occurred:
Configuration with tag version1439256707212 exists for webhcat-site
This is on a clean install, on a single-node SLES 11 SP3 server.
What is the service “This is comment for WebHcat Service”, and why is it a comment instead of a service name?
If this is a fresh install, it's strange that you're getting “configuration already exists” errors. I would try to clean your Ambari Server instance by running:
sudo ambari-server reset
This will reset the PostgreSQL database that ambari-server uses, giving you a clean slate to retry the cluster install.
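For completeness, the usual full sequence is stop, reset, set up again, start; reset prompts for confirmation because it wipes the cluster state:

sudo ambari-server stop
sudo ambari-server reset    # destroys the Ambari database; confirm when prompted
sudo ambari-server setup
sudo ambari-server start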