Can ClearML (formerly Trains) work with a local server?

I am trying to start my way with ClearML (formerly known as Trains).
I see in the documentation that I need to have a server running, either on the ClearML platform itself, or on a remote machine using AWS etc.
I would really like to bypass this restriction and run experiments on my local machine, not connecting to any remote destination.
According to this, I can install the trains-server on any remote machine, so in theory I should also be able to install it on my local machine. However, it still requires Kubernetes or Docker, and I am not using either of them.
Anyone had any luck using ClearML (or Trains, I think it's still quite the same API and all) on a local server?
My OS is Ubuntu 18.04.

Disclaimer: I'm a member of the ClearML team (formerly Trains)
I would really like to bypass this restriction and run experiments on my local machine, not connecting to any remote destination.
A few options:
The ClearML free tier offers free hosting for your experiments. These experiments are only accessible to you, unless you specifically want to share them with your colleagues. This is probably the easiest way to get started.
Install the ClearML-Server. Basically all you need is Docker installed and you should be fine. There are full instructions here; this is the summary:
echo "vm.max_map_count=262144" > /tmp/99-trains.conf
sudo mv /tmp/99-trains.conf /etc/sysctl.d/99-trains.conf
sudo sysctl -w vm.max_map_count=262144
sudo service docker restart
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
sudo mkdir -p /opt/trains/data/elastic_7
sudo mkdir -p /opt/trains/data/mongo/db
sudo mkdir -p /opt/trains/data/mongo/configdb
sudo mkdir -p /opt/trains/data/redis
sudo mkdir -p /opt/trains/logs
sudo mkdir -p /opt/trains/config
sudo mkdir -p /opt/trains/data/fileserver
sudo curl https://raw.githubusercontent.com/allegroai/trains-server/master/docker-compose.yml -o /opt/trains/docker-compose.yml
docker-compose -f /opt/trains/docker-compose.yml up -d
ClearML also supports full offline mode (i.e. no outside connection is made). Once your experiment completes, you can manually import the run to your server (either self-hosted or the free tier server):
from clearml import Task
Task.set_offline(True)
task = Task.init(project_name='examples', task_name='offline mode experiment')
When the process ends you will get a link to a zip file containing the output of the entire offline session:
ClearML Task: Offline session stored in /home/user/.clearml/cache/offline/offline-2d061bb57d9e408a9420c4fe81e26ad0.zip
Later you can import the session with:
from clearml import Task
Task.import_offline_session('/home/user/.clearml/cache/offline/offline-2d061bb57d9e408a9420c4fe81e26ad0.zip')
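If you run several offline experiments, it can be handy to pick up the most recent session archive automatically instead of copying the path from the log. The following is only a minimal sketch and not part of ClearML: `latest_offline_session` is a hypothetical helper, and the default cache directory is an assumption inferred from the log line above, so adjust it if your archives land elsewhere. `Task.import_offline_session` is the same call shown above.

```python
from pathlib import Path
from typing import Optional

# Assumed default location, inferred from the log line above; adjust if yours differs.
DEFAULT_OFFLINE_DIR = Path.home() / ".clearml" / "cache" / "offline"


def latest_offline_session(offline_dir: Path = DEFAULT_OFFLINE_DIR) -> Optional[Path]:
    """Return the most recently modified offline-*.zip in offline_dir, or None."""
    archives = sorted(
        offline_dir.glob("offline-*.zip"),
        key=lambda p: p.stat().st_mtime,
    )
    return archives[-1] if archives else None


if __name__ == "__main__":
    session = latest_offline_session()
    if session is None:
        print("No offline session archives found")
    else:
        # Imported lazily so the helper itself works without clearml installed.
        from clearml import Task

        Task.import_offline_session(str(session))
```

Sorting by modification time rather than by file name is a deliberate choice here, since the archive names are random-looking identifiers and carry no ordering.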

committed image fails to render desktop as a container

I am attempting to run a Windows GUI app in a container on Linux. The intent is to protect an ancient Windows app that is no longer supported. So I got a Red Hat developer subscription, installed RHEL 8.6 with container tools, and ran the universal base image 'UBI-INIT'. Within the container, I installed the GNOME desktop with xrdp, and I can successfully render the GUI desktop in a RHEL container.
Now that the container is working well, I commit it to an image, but when I run that image, the GUI fails to render. The xrdp session times out as if services are not running and/or ports are not accessible.
Within the container that I ran from the committed image:
I verified that all of the services necessary to support xrdp and GNOME are up and running.
journalctl does not seem to show any errors. There are complaints around rtkit, but I see similar errors in the working container.
I see no evidence that an xrdp connection was attempted in the xrdp or xrdp-sesman logs. But I am fairly certain that ports are not the issue because I can ssh to the container.
The commands I used to install and configure the working container are:
podman run -d -v /mnt/share:/share -p 53389:3389 -p 50022:22 --rm --privileged --name ubi-ini registry.access.redhat.com/ubi8/ubi-init;
podman exec -it ubi-ini bash
Within the container I run the following:
timedatectl set-timezone America/New_York
# GNOME desktop GUI
dnf install -y selinux-policy-targeted
dnf groupinstall -y --skip-broken "Server with GUI"
# xrdp
dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
dnf install -y xrdp
echo 'if [ "$DISPLAY" !=' "'\"\"' ]; then xhost +; fi;" >> /etc/profile
sed -i '/^port=3389.*/a port=tcp://:3389' /etc/xrdp/xrdp.ini
useradd -g root -p $(echo "jde" | openssl passwd -1 -stdin) jde
usermod -aG wheel jde
systemctl enable xrdp xrdp-sesman gdm
systemctl unmask systemd-logind.service
systemctl restart sshd xrdp xrdp-sesman dbus gdm systemd-logind.service
I commit the image like this:
podman commit ubi-ini ubi-gui
I run the image with this command:
podman run -d -v /mnt/share:/share -p 63389:3389 -p 60022:22 --rm --privileged --name ubi-gui ubi-gui
xrdp communicates with the desktop manager through systemd, and UBI-INIT is the only Linux base container that supports systemd.
I suspect there is something different about the processes in the derived container, but when I compare the working and non-working containers with ps aux, I don't see significant anomalies.
Any ideas?
I have absolutely no idea about 'XRDP' but I see you use a different host port in your second container instance, is that intentional?
Got this to work by disabling the firewall and SELinux everywhere, meaning the container host and the base container UBI-INIT as well. Now the image based on the modified container (with GNOME desktop and xrdp and disabled security) results in a container that serves the GUI desktop.
It's working fine except that gdm (GNOME Display Manager) does not start even though it is enabled and all the other enabled services are OK. Still working that one out, but the basic question is answered: the problem was not the software stack but the security configuration. I suspect SELinux in the container somehow interfered with inter-process communication, because I am able to ssh on (mapped) port 22 externally.
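For anyone debugging a similar setup, a quick way to separate "port not reachable" from "service not answering" is to probe the published host ports directly. This is only a rough sketch using the standard library; `port_is_open` is a hypothetical helper name, and the ports to try would be the ones from the `podman run` command above (63389 for xrdp, 60022 for ssh).

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling `port_is_open("127.0.0.1", 63389)` on the container host would show whether anything accepts connections on the mapped xrdp port. Note that a successful connection only proves that something accepted the TCP handshake (which may be the port-forwarding layer itself), not that xrdp inside the container is responding.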

Running gsutil on instance using python subprocess - access permissions?

I have a python script that does calculations on google compute engine instances. The code works fine in terms of doing the calculations, but at certain points in the code it needs to add/delete files from a cloud storage bucket and I do this using gsutil. This works well when run from my local computer, but isn't working when the same code is run from a google cloud instance. By "not working" an error message is reported at the offending line, but my code carries on running and just ignores the steps that involve gsutil.
My understanding from Google's documentation is that gcloud instances boot with the "gsutil" utility already installed. My instances boot running a script like this (where <xxxx> is my actual google username):
#! /bin/bash
sudo apt-get update
sudo apt-get -yq install python-pip
sudo pip install --upgrade google-cloud
sudo pip install --upgrade google-cloud-storage
sudo pip install --upgrade google-api-python-client
sudo pip install --upgrade google-auth-httplib2
mkdir -p /home/<xxxx>/code
mkdir -p /home/<xxxx>/rawdata
mkdir -p /home/<xxxx>/processeddata
sudo chown -R <xxxx> /home/<xxxx>
gsutil cp gs://<codestorebucket>/worker-python-code/* /home/<xxxx>/code/
gsutil -m cp gs://<rawdatabucket>/* /home/<xxxx>/rawdata/
I don't run my code from the boot script yet, as I want to SSH into the instance and run it myself from the command line while I am still developing. When I SSH into the instance, the directories have all been created and all of the code and raw data files have been copied. I can run my ".py" file and it runs, but there are lines which use the Python command:
subprocess.call('gsutil -q rm gs://<mybuckname>/<myfilename>', shell=True)
This generates an error which reads:
ERROR: (gsutil) Failed to create the default configuration. Ensure your have the correct permissions on: [/home/<xxxx>/.config/gcloud/configurations].
Could not create directory [/home/<xxxx>/.config/gcloud/configurations]: Permission denied.
If it provides any clues, in the "daemon.log" file there is an error line which reads:
chown: invalid user: ‘<xxxxx>’
which is reported when the sudo chown... command line runs.
The instances have full access to all APIs. If I run
whoami
The response is "xxxxx". If I run
echo $UID
The response is 1000.
I am a Linux novice, as I have only "learnt" about it through needing to do stuff on google instances. There is a link here where a user appears to have a similar problem. He fixes it using a sudo chown type command line, but when I run an equivalent command I am told that it "cannot access '/home/paulgarlick07/.config/': No such file or directory"
I'm really confused, and any help would be very much appreciated. If any additional info is required to help resolve this please let me know!
gsutil is not a program. It is a script. Therefore you need to execute a shell with gsutil as a command line argument. You will need to pass the full pathname for gsutil which might be different on your system.
subprocess.call('/bin/sh /usr/bin/gsutil -q rm gs://<mybuckname>/<myfilename>', shell=True)
If you are running gsutil from a service, then you will need to ensure that the user that the service is running under has gsutil set up. gsutil stores its configuration files relative to the home directory of the user that it is executing under.
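Separately from the permissions problem, the question mentions that the script carries on after gsutil fails. That happens because the return value of `subprocess.call` is ignored. Below is a minimal sketch of one way to surface such failures; `run_checked` and `gsutil_rm` are hypothetical helper names, and the sketch assumes gsutil can be found on the PATH of the user running the script. It does not fix the configuration directory error by itself.

```python
import shutil
import subprocess
from typing import List


def run_checked(argv: List[str]) -> str:
    """Run a command without a shell; return stdout, raise with stderr on failure."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{argv[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


def gsutil_rm(url: str) -> str:
    """Delete one object with `gsutil -q rm`, using whichever gsutil is on PATH."""
    gsutil = shutil.which("gsutil")
    if gsutil is None:
        raise FileNotFoundError("gsutil not found on PATH")
    return run_checked([gsutil, "-q", "rm", url])
```

Passing the arguments as a list avoids `shell=True`, and `shutil.which` resolves the full path so it does not need to be hard-coded. With this in place, a permissions error like the one above would stop the script with the gsutil message attached instead of being silently skipped.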

How to install a GUI on Amazon AWS EC2 or EMR with the Amazon AMI

I have a need to run an application that requires a GUI interface to start and configure. I also need to be able to run this application on Amazon's EC2 service and EMR service. The EMR requirement means it has to run on Amazon's Linux AMI.
After extensive searching I've been unable to find any ready made solutions, in particular the requirement to run on Amazon's AMI. The closest match and most often referenced solution is here. Unfortunately it was developed on a RHEL6 instance which differs enough from Amazon's AMI that the solution does not work.
I'm posting my solution below. Hopefully it will save some others from the many hours of experimentation it took to come up with the right recipe.
Here is my solution to get a GUI running on Amazon's AMI. I used this post as a starting point, but had to make many changes to get it working on Amazon's AMI. I also added additional info to make this work in a reasonably automated way so an individual who needs to bring up this environment more than once could do it without too much hassle.
Note: I include a lot of commentary in this post. I apologize in advance, but I thought it might be helpful to someone needing to make modifications if they could understand why I made the various choices along the way.
The scripts included below install some files along the way. See section 4 for a list of the files and the directory structure used by these scripts.
Step 1. Install the Desktop
After performing a 'yum update', most solutions include a line like
sudo yum groupinstall -y "Desktop"
This deceptively simple step requires significantly more effort on the Amazon AMI. This group is not configured in the Amazon AMI (AAMI from here on out). The AAMI has Amazon's own repositories installed and enabled by default. Also installed is the epel repo, but it is disabled by default. After enabling epel I found the Desktop group but it was not populated with packages. I also found Xfce (another desktop alternative) which was populated. Eventually I decided to install Xfce rather than Desktop. Still, that was not straightforward, but it eventually led to the solution.
Here it's worth noting that the first thing I tried was to install the centos repository and install the Desktop group from there. Initially this seemed promising. The group was fully populated with packages. However, after some effort I eventually decided there were simply too many version conflicts between the dependencies and packages that were already installed on the AAMI.
This led me to choose Xfce from the epel repo. Since the epel repo was already installed on AAMI I figured there would be better dependency version coordination with the Amazon repos. This was generally true. Many dependencies were found either in the epel repo or the Amazon repos. For the ones that weren't, I was able to find them in the centos repo, and in most cases those were leaf dependencies. So most of the trouble came from the few dependencies in the centos repo that had sub-dependencies which conflicted with the amazon or epel repo. In the end a few hacks were required to bypass the dependency conflicts. I tried to minimize those as much as possible. Here is the script for installing Xfce:
installXfce.sh
#!/bin/bash
# echo each command
set -x
# assumes RSRC_DIR and IS_EMR set by parent script
YUM_RSRC_DIR=$RSRC_DIR/yum
sudo yum -y update
# Most info I've found on installing a GUI on AWS suggests to install using
#> sudo yum groupinstall -y "Desktop"
# This group is not available by default on the Amazon Linux AMI. The group
# is listed if the epel repo is enabled, but it is empty. I tried installing
# the centos repo, which does have support for this group, but it simply ended
# up having too many dependency version conflicts with packages already installed
# by the Amazon repos.
#
# I found the path of least resistance to be installing the group Xfce from
# the epel repo. The epel repo is already included in amazon image, just not enabled.
# So I'm guessing there was at least some consideration by Amazon to align
# the dependency versions of this repo with the Amazon repos.
#
# My general approach to this problem was to start with the last command:
#> sudo yum groupinstall -y Xfce
# which will generate a list of missing dependencies. The script below
# essentially works backwards through that list to eliminate all the
# missing dependencies.
#
# In general, many of the dependencies required by Xfce are found in either
# the epel repo or the Amazon repos. Most of the remaining dependencies can be
# found in the centos repo, and either don't have any further dependencies, or if they
# do those dependencies are satisfied with the centos repo with no collisions
# in the epel or amazon repo. Then there are a couple of oddball dependencies
# to clean up.
# if yum-config-manager is not found then install yum-utils
#> sudo yum install yum-utils
sudo yum-config-manager --enable epel
# install centos repo
# place the repo config in /etc/yum.repos.d/centos.repo
sudo cp $YUM_RSRC_DIR/yum.repos.d/centos.repo /etc/yum.repos.d/
# The config centos.repo specifies the key with a URL. If for some reason the key
# must be in a local file, it can be found here: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
# It can be installed to the right location in one step:
#> wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6 https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
# Note, a key file must also be installed in the system key ring. The docs are a bit confusing
# on this, I found that I needed to run both gpg AND then followed by rpm, eg:
#> sudo gpg --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
#> sudo rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
# I found there are a lot of version conflicts between the centos, Amazon and epel repos.
# So I did not enable the centos repo generally. Instead I used the --enablerepo switch
# to enable it explicitly for each yum command that required it. This only works for yum. If
# rpm must be used, then yum-config-manager must be used to enable/disable repos as a
# separate step.
#
# Another problem I ran into was yum installing the 32-bit (*.i686) package rather than
# the 64-bit (*.x86_64) version of the package. I never figured out why. So I had
# to specify the *.x86_64 package explicitly. The search tools (eg. 'whatprovides')
# did not list the 64 bit package either even though a manual search through the
# package showed the 64 bit components were present.
#
# Sometimes it is difficult to determine which package must be installed to satisfy
# a particular dependency. 'whatprovides' is a very useful tool for this
#> yum --enablerepo centos whatprovides libgdk_pixbuf-2.0.so.0
#> rpm -q --whatprovides libgdk_pixbuf
sudo yum --enablerepo centos install -y gdk-pixbuf2.x86_64
sudo yum --enablerepo centos install -y gtk2.x86_64
sudo yum --enablerepo centos install -y libnotify.x86_64
sudo yum --enablerepo centos install -y gnome-icon-theme
sudo yum --enablerepo centos install -y redhat-menus
sudo yum --enablerepo centos install -y gstreamer-plugins-base.x86_64
# problem when we get to libvte, installing libvte requires expat, which conflicts with amazon lib
# the centos package version was older and did not install right lib version
# but … the expat dependency was coming from a dependency on python-libs.
# the easiest workaround was to install python using the amazon repo, that in turn
# installs a version of python libs that is compatible with the version of libexpat on the system.
sudo yum install -y python
sudo yum --enablerepo centos install -y vte.x86_64
sudo yum --enablerepo centos install -y libical.x86_64
sudo yum --enablerepo centos install -y gnome-keyring.x86_64
# another sticky point, xfdesktop requires desktop-backgrounds-basic, but ‘whatprovides’ does not
# provide any packages for this query (not sure why). It turns out this is provided by the centos
# repo, installing ‘desktop-backgrounds-basic’ will try to install the package redhat-logos, but
# unfortunately this is obsoleted by Amazon’s generic-logos package
# The only way I could find to get around this was to erase the generic logos package.
# This doesn't seem too risky since this is just images for the desktop and menus.
#
sudo yum erase -y generic-logos
# Amazon repo must be disabled to prevent interference with the install
# of redhat-logos
sudo yum --disablerepo amzn-main --enablerepo centos install -y redhat-logos
# next problem is a dependency on dbus. The dependency comes from dbus-x11 in
# centos repo. It requires dbus version 1.2.24, the amazon image already has
# version 1.6.12 installed. Since the dbus-x11 is only used by the GUI package,
# easiest way around this is to install dbus-x11 with no dependency checks.
# So it will use the newer version of dbus (should be OK). The main thing that could be a problem
# here is if it skips some other dependency. When doing manually, its possible to run the install until
# the only error left is the dbus dependency. It’s a bit risky running in a script since, basically it’s assuming
# all the dependencies are already in place.
yumdownloader --enablerepo centos dbus-x11.x86_64
sudo rpm -ivh --nodeps dbus-x11-1.2.24-8.el6_6.x86_64.rpm
rm dbus-x11-1.2.24-8.el6_6.x86_64.rpm
sudo yum install -y xfdesktop.x86_64
# We need the version of poppler-glib from centos repo, but it is found in several repos.
# Disable the other repos for this step.
# On EMR systems a newer version of poppler is already installed. So move up 1 level
# in dependency chain and force install of tumbler.
if [ $IS_EMR -eq 1 ]
then
    yumdownloader --enablerepo centos tumbler.x86_64
    sudo rpm -ivh --nodeps tumbler-0.1.21-1.el6.x86_64.rpm
else
    sudo yum --disablerepo amzn-main --disablerepo amzn-updates --disablerepo epel --enablerepo centos install -y poppler-glib
fi
sudo yum install --enablerepo centos -y polkit-gnome.x86_64
sudo yum install --enablerepo centos -y control-center-filesystem.x86_64
sudo yum groupinstall -y Xfce
Here are the contents for the centos repository config file:
centos.repo
[centos]
name=CentOS mirror
baseurl=http://repo1.ash.innoscale.net/centos/6/os/x86_64/
failovermethod=priority
enabled=0
gpgcheck=1
gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
If all you needed was a recipe to get a desktop package installed on the Amazon AMI, then you're done. The rest of this post covers how to configure VNC to access the desktop via an SSH tunnel, and how to package all of this so that the instance can be easily started from a script.
Step 2. Install and Configure VNC
Below is my top level script for installing the GUI. After configuring a few variables the first thing it does is call the script from step 1 above. This script has some extra baggage since I've built it to work on a regular ec2 instance, or emr and as root or as ec2-user. The essential steps are
install libXfont
install tigervnc-server
install the VNC server config file
create a .vnc directory in the user home directory
install the xstartup file in the .vnc directory
install a dummy passwd file in the .vnc directory
start the VNC server
A few key points to note:
This assumes you will access the VNC server through an SSH tunnel. In the end this really seemed like the easiest and most reliably secure way to go. Since you probably have a port for SSH open in your security group specification, you won't have to make any changes to it. Also, the encryption config for VNC clients/servers is not straightforward. It seemed easy to make a mistake and leave your communications unencrypted. The settings for this are in the vncservers file. The '-localhost' switch tells VNC to accept only local connections. The '-nolisten tcp' switch tells the associated X server modules to also not accept connections from the network. Finally, the '-SecurityTypes None' switch allows you to open your VNC session without typing a passwd. Since the only way into the machine is through ssh, the additional password check seems redundant.
The xstartup file determines what will start when your VNC session is initiated the first time. I've noticed many posts on this subject skip this point. If you don't tell it to start the Xfce desktop, you will just get a blank window when you start VNC. The config I have here is very simple.
Even though I mentioned above that the VNC server is configured to not prompt for a password, it nevertheless requires a passwd file in the .vnc directory in order for the server to start. The first time you run the script it will fail when it tries to start the server. Login to the machine via ssh and run 'vncpasswd'. It will create a passwd file in the .vnc directory that you can save to use as part of these scripts during install. Note, I've read that VNC does not do anything sophisticated to protect the passwd file. So I would not recommend using a passwd that you use for other, more important accounts.
installGui.sh
#!/bin/bash
# echo each command
set -x
BIN_DIR="${BASH_SOURCE%/*}"
ROOT_DIR=$(dirname $BIN_DIR)
RSRC_DIR=$ROOT_DIR/rsrc
VNC_DIR=$RSRC_DIR/vnc
# Install user config files into ec2-user home directory
# if it is available. In practice, this should always
# be true
if [ -d "/home/ec2-user" ]
then
    USER_ACCT=ec2-user
else
    USER_ACCT=hadoop
fi
HOME_DIR="/home"
# Use existence of hadoop home directory as proxy to determine if
# this is an EMR system. Can be used later to differentiate
# steps on EC2 system vs EMR.
if [ -d "/home/hadoop" ]
then
    IS_EMR=1
else
    IS_EMR=0
fi
# execute Xfce desktop install
. "$BIN_DIR/installXfce.sh"
# now roughly follow the following from step 3: https://devopscube.com/setup-gui-for-amazon-ec2-linux/
sudo yum install -y pixman pixman-devel libXfont
sudo yum -y install tigervnc-server
# install the user account configuration file.
# This setup assumes the user will always connect to the VNC server
# through an SSH tunnel. This is generally more secure, easier to
# configure and easier to get correct than trying to allow direct
# connections via TCP.
# Therefore, config VNC server to only accept local connections, and
# no password required.
sudo cp $VNC_DIR/vncservers-$USER_ACCT /etc/sysconfig/vncservers
# install the user account, vnc config files
sudo mkdir $HOME_DIR/$USER_ACCT/.vnc
sudo chown $USER_ACCT:$USER_ACCT $HOME_DIR/$USER_ACCT/.vnc
# need xstartup file to tell vncserver to start the window manager
sudo cp $VNC_DIR/xstartup $HOME_DIR/$USER_ACCT/.vnc/
sudo chown $USER_ACCT:$USER_ACCT $HOME_DIR/$USER_ACCT/.vnc/xstartup
# Even though the VNC server is config'd to not require a passwd, the
# server still looks for the passwd file when it starts the session.
# It will fail if the passwd file is not found.
# The first time these scripts are run, the final step will fail.
# Then manually run
#> vncpasswd
# It will create the file ~/.vnc/passwd. Then save this file to persistent
# storage so that it can be installed to the user account during
# server initialization.
sudo cp $VNC_DIR/passwd $HOME_DIR/$USER_ACCT/.vnc/
sudo chown $USER_ACCT:$USER_ACCT $HOME_DIR/$USER_ACCT/.vnc/passwd
# This script will be running as root if called from the EC2 launch
# command. VNC server needs to be started as the user that
# you will connect to the server as (eg. ec2-user, hadoop, etc.)
sudo su -c "sudo service vncserver start" -s /bin/sh $USER_ACCT
# how to stop vncserver
# vncserver -kill :1
# On the remote client
# 1. start the ssh tunnel
#> ssh -i ~/.ssh/<YOUR_KEY_FILE>.pem -L 5901:localhost:5901 -N ec2-user@<YOUR_SERVER_PUBLIC_IP>
# for debugging connection use -vvv switch
# 2. connect to the vnc server using client on the remote machine. When
# prompted for the IP address, use 'localhost:5901'
# This connects to port 5901 on your local machine, which is where the ssh
# tunnel is listening.
vncservers
# The VNCSERVERS variable is a list of display:user pairs.
#
# Uncomment the lines below to start a VNC server on display :2
# as my 'myusername' (adjust this to your own). You will also
# need to set a VNC password; run 'man vncpasswd' to see how
# to do that.
#
# DO NOT RUN THIS SERVICE if your local area network is
# untrusted! For a secure way of using VNC, see this URL:
# http://kbase.redhat.com/faq/docs/DOC-7028
# Use "-nolisten tcp" to prevent X connections to your VNC server via TCP.
# Use "-localhost" to prevent remote VNC clients connecting except when
# doing so through a secure tunnel. See the "-via" option in the
# `man vncviewer' manual page.
# Use "-SecurityTypes None" to allow session login without a password.
# This should only be used in combination with "-localhost"
# Note: VNC server still looks for the passwd file in ~/.vnc directory
# when the session starts regardless of whether the user is
# required to enter a passwd.
# VNCSERVERS="2:myusername"
# VNCSERVERARGS[2]="-geometry 800x600 -nolisten tcp -localhost"
VNCSERVERS="1:ec2-user"
VNCSERVERARGS[1]="-geometry 1280x1024 -nolisten tcp -localhost -SecurityTypes None"
xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
# exec /etc/X11/xinit/xinitrc
/usr/share/vte/termcap/xterm &
/usr/bin/startxfce4 &
Step 3. Connect to Your Instance
Once you've got the VNC server running on EC2 you can try connecting to it. First open an SSH tunnel to your instance. 5901 is the port where the VNC server listens for display 1 from the vncservers file. It will listen for display 2 on port 5902, etc. This command creates a tunnel from port 5901 on your local machine to port 5901 on the instance.
ssh -i ~/.ssh/<YOUR_KEY_FILE>.pem -L 5901:localhost:5901 -N ec2-user@<YOUR_SERVER_PUBLIC_IP>
Now open your preferred VNC client. Where it prompts for the IP address of the server enter:
localhost:5901
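The display-to-port rule described above (display 1 on port 5901, display 2 on port 5902, and so on) can be captured in a few lines. This is just an illustrative sketch: `vnc_port` and `ssh_tunnel_command` are hypothetical helper names, and the key file, user and host are placeholders you would supply yourself.

```python
from typing import List

VNC_BASE_PORT = 5900  # display N is served on port 5900 + N


def vnc_port(display: int) -> int:
    """Return the TCP port the VNC server uses for the given display number."""
    if display < 0:
        raise ValueError("display number must be non-negative")
    return VNC_BASE_PORT + display


def ssh_tunnel_command(key_file: str, user: str, host: str, display: int) -> List[str]:
    """Build the ssh argument list that forwards the local VNC port to the instance."""
    port = vnc_port(display)
    return [
        "ssh", "-i", key_file,
        "-L", f"{port}:localhost:{port}",  # same port number on both ends
        "-N",                              # no remote command, tunnel only
        f"{user}@{host}",
    ]
```

The returned list can be passed to `subprocess.run`, or joined with spaces to print the command for a given display.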
If nothing happens at all, then either there was a problem starting the VNC server, or there is a connectivity problem preventing the client from reaching the server, or possibly a problem in the vncservers config file.
If a window comes up, but it is just blank then check that the Xfce install completed successfully and that the xstartup file is installed.
Step 4. Simplify
If you just need to do this once then sftp'ing the scripts over to your instance and running manually is fine. Otherwise you're going to want to automate this as much as possible to make it faster and less error prone when you do need to fire up an instance with a GUI.
The first step to automating is to create an EFS volume containing the scripts and config files that can be mounted when the instance is started. Amazon has plenty of info on creating a network file system. There are a couple of points to pay attention to when creating the volume. If you don't want your volume to be open to the world, you may want to create a custom security group to use for your EFS volume. I created a security group for my EFS volume (call it NFS_Mount) that only allows inbound TCP traffic on port 2049 coming from one of my other security groups (call it MasterVNC). Then when you create an instance, make sure to associate the MasterVNC security group with that instance. Otherwise the EFS volume won't allow your instance to connect to it.
Now mount the EFS volume:
sudo mkdir /mnt/YOUR_MOUNT_POINT_DIR
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-YOUR_EFS_ID.efs.us-east-1.amazonaws.com:/ /mnt/YOUR_MOUNT_POINT_DIR
Now populate /mnt/YOUR_MOUNT_POINT_DIR with the 6 files mentioned in steps 1 and 2 using the following directory structure. Recall that you must create the passwd file the first time using the command 'vncpasswd'. It will create the file at ~/.vnc/passwd.
/mnt/YOUR_MOUNT_POINT_DIR/bin/installGui.sh
/mnt/YOUR_MOUNT_POINT_DIR/bin/installXfce.sh
/mnt/YOUR_MOUNT_POINT_DIR/rsrc/vnc/vncservers-ec2-user
/mnt/YOUR_MOUNT_POINT_DIR/rsrc/vnc/xstartup
/mnt/YOUR_MOUNT_POINT_DIR/rsrc/vnc/passwd
/mnt/YOUR_MOUNT_POINT_DIR/rsrc/yum/yum.repos.d/centos.repo
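Since a missing file only shows up as a failure partway through the install, it may be worth checking the layout before running anything. The sketch below is one possible way to do that; `missing_files` is a hypothetical helper, and the relative paths are copied from the list above.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory structure listed above.
EXPECTED_FILES = [
    "bin/installGui.sh",
    "bin/installXfce.sh",
    "rsrc/vnc/vncservers-ec2-user",
    "rsrc/vnc/xstartup",
    "rsrc/vnc/passwd",
    "rsrc/yum/yum.repos.d/centos.repo",
]


def missing_files(mount_point: str) -> List[str]:
    """Return the expected files that are not present under mount_point."""
    root = Path(mount_point)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list from `missing_files("/mnt/YOUR_MOUNT_POINT_DIR")` means all six files are in place.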
At this point, setting up an instance with a GUI should be pretty easy. Create your instance as you normally would (make sure to include the MasterVNC security group), ssh to the instance, mount the EFS volume, and run the installGui.sh script.
Step 5. Automate
You can take things a step further and launch your instance in 1 step using the AWS CLI tools on your local machine. To do this you will need to mount the EFS volume and run the installGui.sh script using arguments to the AWS CLI commands. This just requires creating a top level script and passing it to the CLI command.
Of course there are a couple complications. EC2 and EMR use different switches and mechanisms to attach the script. And furthermore, on EMR I only want the GUI to be installed on the master node (not the core or task nodes).
Launching an EC2 instance requires embedding the script in the command with the --user-data switch. This is done easily by specifying the absolute path to the script file on your local machine.
aws ec2 run-instances --user-data file:///PATH_TO_YOUR_SCRIPT/top.sh ... other options
The EMR launch does not support embedding scripts from a local file. Instead you can specify an S3 URI in the bootstrap actions.
aws emr create-cluster --bootstrap-actions '[{"Path":"s3://YOUR_BUCKET/YOUR_DIR/top.sh","Name":"Custom action"}]' ... other options
Finally, you'll see in top.sh below most of the script is a function to determine if the machine is a basic EC2 instance or an EMR master. If not for that the script could be 3 lines. You may wonder why not just use the built in 'run-if' bootstrap action rather than writing my own function. The built in 'run-if' script has a bug and does not properly run scripts located in S3.
Debugging things once you put them in the init sequence can be a challenge. One thing that can help is the log file: /var/log/cloud-init-output.log. This captures all the console output from the scripts run during bootstrap initialization.
top.sh
#!/bin/bash
# note: conditional bootstrap function run-if has a bug, workaround ...
# this function adapted from https://forums.aws.amazon.com/thread.jspa?threadID=222418
# Determine if we are running on the master node.
# 1 - running on master, or non EMR node
# 0 - running on a task or core node
check_if_master_or_non_emr() {
python - <<'__SCRIPT__'
import sys
import json
instance_file = "/mnt/var/lib/info/instance.json"
try:
    with open(instance_file) as f:
        props = json.load(f)
    is_master_or_non_emr = props.get('isMaster', False)
except IOError as ex:
    is_master_or_non_emr = True # file will not exist when testing on a non-emr machine
if is_master_or_non_emr:
    sys.exit(1)
else:
    sys.exit(0)
__SCRIPT__
}
check_if_master_or_non_emr
IS_MASTER_OR_NON_EMR=$?
# If this machine is part of EMR cluster, then ONLY install on the MASTER node
if [ $IS_MASTER_OR_NON_EMR -eq 1 ]
then
sudo mkdir /mnt/YOUR_MOUNT_POINT_DIR
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 fs-YOUR_EFS_ID.efs.us-east-1.amazonaws.com:/ /mnt/YOUR_MOUNT_POINT_DIR
. /mnt/YOUR_MOUNT_POINT_DIR/bin/installGui.sh
fi
exit 0
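If Python is not available on the node, the same check can be sketched in plain shell. This is only a rough sketch, not the method used above: the function name is_master_or_non_emr is made up for illustration, and the grep pattern assumes isMaster is stored as a plain JSON boolean, which is how the Python code reads it. A real JSON parser would be more robust.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) on an EMR master or a non-EMR machine,
# fails (exit 1) on an EMR core or task node.
# Note: this follows the usual shell convention (0 = true), which is the
# opposite of the exit codes used by the Python check in top.sh.
is_master_or_non_emr() {
    local instance_file="${1:-/mnt/var/lib/info/instance.json}"
    # No instance file: not an EMR node, so treat it like the master.
    [ -f "$instance_file" ] || return 0
    # Crude match for "isMaster": true; assumes a plain JSON boolean.
    grep -Eq '"isMaster"[[:space:]]*:[[:space:]]*true' "$instance_file"
}
```

With this variant the install step would be guarded by a plain "if is_master_or_non_emr; then ... fi" instead of comparing $? against 1.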

Why can't I get the postgresql server to run?

Ok. I have been trying to solve this problem for several days. I installed, uninstalled, and reinstalled Postgresql 3 times. I followed precisely the instructions in this forum: https://dba.stackexchange.com/questions/42048/cant-connect-to-the-postgres-server-ls-tmp-s-pgsql-5432-no-such-file-or-dir
I found this solution in many forums, so I tried to run:
$ mkdir /var/pgsql_socket/
$ sudo mkdir /var/pgsql_socket/
$ ln -s /private/tmp/.s.PGSQL.5432 /var/pgsql_socket/
But this didn't work. When I try to start the server it still says there is another one running and then proceeds to fail every time I try to create a database or type "psql"
I then tried to run the following in order to change the path of the commands from OS X's builtin version of postgres to my version and it seemed to work:
$ cd /usr/local/bin
$ rm postgres
$ ln -s /Library/PostgreSQL/9.2/bin/postgres postgres
$ rm psql
$ ln -s /Library/PostgreSQL/9.2/bin/psql psql
$ rm pg_ctl
$ ln -s /Library/PostgreSQL/9.2/bin/pg_ctl pg_ctl
So then I ran the following to create a user for postgres:
$ sudo -u postgres createuser --superuser $Sarah
$ sudo -u postgres createuser --superuser user_sarah
$ sudo -u postgres psql postgres
But it kept saying "unknown user postgres"
I then tried to install the Ruby pg gem, but that also failed, saying there was a problem with necessary libraries.
I have saved a text file of everything I tried to do in the terminal. Let me know if I should post that. Thanks.
update:
When I try to run this:
$ pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
I get this:
-bash: /usr/local/bin/pg_ctl: No such file or directory
Which is different from a lot of other errors that I have seen posted on this problem.
First, you should verify that there is no postmaster running: ps -ef | grep postmaster. Once you've verified the postmaster is not running, you should look into the PostgreSQL command initdb. Depending on your installation, it may or may not be installed. You need to create a database cluster before attempting to start postgres. It sounds to me like you got an installation that didn't install the main PostgreSQL commands to /usr/bin, pg_ctl being one of these.
FYI: the postgres account is not a login account and is automatically created when installing postgresql, so it should be there if you had a good installation. You cannot sudo as postgres if postgres is not in the sudoers file.
The Homebrew install of PostgreSQL does not create or need a postgres account, so all the mentions of doing sudo -u postgres or ps -fu postgres don't apply to your case.
The command brew info postgresql outputs various information about how to start and stop it, which you should read. By contrast, you don't want to put blind faith in what random users say about how they "fixed" their non-working installation. In fact, the web carries a shocking amount of bad advice concerning PostgreSQL on Mac OS X, and to me the answer you linked on dba.se is among them. It's wrong from start to finish, and you should note that it was not accepted as an answer. Certainly the author means well, but he fails to see that his own context can't be generalized to other installations.
The worst part is the suggestion to delete in /usr/local/bin/ the binaries postgres, psql, pg_ctl and soft-linking them into supposed equivalents inside /Library/PostgreSQL/9.2/bin/.
To me it just breaks the homebrew install.
No wonder that after doing this you get this error:
/usr/local/bin/pg_ctl: No such file or directory
So my answer would be to reinstall postgresql with brew to restart from a clean state, then make sure that all postgres commands you launch are from /usr/local/bin, and always first read the server log file passed to pg_ctl if you have any doubt on anything PG-related.
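To check the "all postgres commands are from /usr/local/bin" part, something like the following sketch may help. The function name check_pg_paths is made up for illustration, and the default directory assumes a Homebrew install under /usr/local, as in this question.

```shell
#!/bin/bash
# Hypothetical helper: report where each PostgreSQL command resolves on PATH
# and fail if any is missing or lives outside the expected directory.
check_pg_paths() {
    local expected_dir="${1:-/usr/local/bin}"
    local rc=0 cmd resolved
    for cmd in postgres psql pg_ctl; do
        resolved=$(command -v "$cmd") || { echo "$cmd: not found on PATH"; rc=1; continue; }
        echo "$cmd -> $resolved"
        # Flag binaries picked up from some other installation.
        [ "$(dirname "$resolved")" = "$expected_dir" ] || rc=1
    done
    return "$rc"
}
```

Running check_pg_paths with no argument prints one line per command and returns non-zero if any of them is missing or comes from another location.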

How can I start PostgreSQL server on Mac OS X?

Final update:
I had forgotten to run the initdb command.
By running this command
ps auxwww | grep postgres
I see that postgres is not running
> ps auxwww | grep postgres
remcat 1789 0.0 0.0 2434892 480 s000 R+ 11:28PM 0:00.00 grep postgres
This raises the question:
How do I start the PostgreSQL server?
Update:
> pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
server starting
sh: /usr/local/var/postgres/server.log: No such file or directory
Update 2:
The touch was not successful, so I did this instead:
> mkdir /usr/local/var/postgres
> vi /usr/local/var/postgres/server.log
> ls /usr/local/var/postgres/
server.log
But when I try to start the Ruby on Rails server, I still see this:
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
Update 3:
> pg_ctl -D /usr/local/var/postgres status
pg_ctl: no server running
Update 4:
I found that there wasn't any pg_hba.conf file (only the file pg_hba.conf.sample), so I modified the sample and renamed it (to remove the .sample). Here are the contents:
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
But I don't understand this:
> pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
server starting
> pg_ctl -D /usr/local/var/postgres status
pg_ctl: no server running
Also:
sudo find / -name postgresql.conf
find: /dev/fd/3: Not a directory
find: /dev/fd/4: Not a directory
Update 5:
sudo pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
Password:
pg_ctl: cannot be run as root
Please log in (using, e.g., "su") as the (unprivileged) user that will own the server process.
Update 6:
This seems odd:
> egrep 'listen|port' /usr/local/var/postgres/postgresql.conf
egrep: /usr/local/var/postgres/postgresql.conf: No such file or directory
Though, I did do this:
>sudo find / -name "*postgresql.conf*"
find: /dev/fd/3: Not a directory
find: /dev/fd/4: Not a directory
/usr/local/Cellar/postgresql/9.0.4/share/postgresql/postgresql.conf.sample
/usr/share/postgresql/postgresql.conf.sample
So I did this:
egrep 'listen|port' /usr/local/Cellar/postgresql/9.0.4/share/postgresql/postgresql.conf.sample
#listen_addresses = 'localhost' # what IP address(es) to listen on;
#port = 5432 # (change requires restart)
# supported by the operating system:
# %r = remote host and port
So I tried this:
> cp /usr/local/Cellar/postgresql/9.0.4/share/postgresql/postgresql.conf.sample /usr/local/Cellar/postgresql/9.0.4/share/postgresql/postgresql.conf
> cp /usr/share/postgresql/postgresql.conf.sample /usr/share/postgresql/postgresql.conf
I am still getting the same "Is the server running?" message.
The Homebrew package manager includes launchctl plists to start automatically. For more information, run brew info postgres.
Start manually
pg_ctl -D /usr/local/var/postgres start
Stop manually
pg_ctl -D /usr/local/var/postgres stop
Start automatically
"To have launchd start postgresql now and restart at login:"
brew services start postgresql
What is the result of pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start?
What is the result of pg_ctl -D /usr/local/var/postgres status?
Are there any error messages in the server.log?
Make sure tcp localhost connections are enabled in pg_hba.conf:
# IPv4 local connections:
host all all 127.0.0.1/32 trust
Check the listen_addresses and port in postgresql.conf:
egrep 'listen|port' /usr/local/var/postgres/postgresql.conf
#listen_addresses = 'localhost' # What IP address(es) to listen on;
#port = 5432 # (change requires restart)
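Note that both lines in that output start with #, so they are commented out and the built-in defaults (localhost, port 5432) apply. As a rough sketch, the effective port could be read like this. The function name pg_conf_port is made up for illustration, and it only handles the common unquoted "port = N" form.

```shell
#!/bin/bash
# Hypothetical helper: print the port set in a postgresql.conf-style file,
# or 5432 (PostgreSQL's default) if the setting is absent or commented out.
pg_conf_port() {
    local conf_file="$1" port
    # Take the last uncommented "port = N" line, since later settings win.
    port=$(sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$conf_file" | tail -n 1)
    echo "${port:-5432}"
}
```

For example, pg_conf_port /usr/local/var/postgres/postgresql.conf prints the port that clients should connect to, assuming that is where your configuration file lives.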
Cleaning up
PostgreSQL was most likely installed via Homebrew, Fink, MacPorts or the EnterpriseDB installer.
Check the output of the following commands to determine which package manager it was installed with:
brew && brew list|grep postgres
fink && fink list|grep postgres
port && port installed|grep postgres
If you want to manually start and stop PostgreSQL (installed via Homebrew), the easiest way is:
brew services start postgresql
and
brew services stop postgresql
If you have a specific version, make sure to suffix the version. For example:
brew services start postgresql@10
I had almost the exact same issue, and you cited the initdb command as being the fix. This was also the solution for me, but I didn't see that anyone posted it here, so for those who are looking for it:
initdb /usr/local/var/postgres -E utf8
If your computer was abruptly restarted
You may try to start the PG server and find that it will not start.
First, you have to delete the file /usr/local/var/postgres/postmaster.pid. Then you can restart the service using one of the many other mentioned methods depending on your install.
You can verify this by looking at the logs of Postgres to see what might be going on: tail -f /usr/local/var/postgres/server.log
For a specific version:
tail -f /usr/local/var/postgres@[VERSION_NUM]/server.log
E.g.:
tail -f /usr/local/var/postgres@11/server.log
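Deleting postmaster.pid while a server is actually running can cause trouble, so it may be worth checking first. The following is a minimal sketch, not an official procedure: the function name is made up for illustration, and it relies on the first line of postmaster.pid being the server's PID. If that PID has been reused by an unrelated process, the sketch errs on the side of leaving the file alone.

```shell
#!/bin/bash
# Hypothetical helper: remove postmaster.pid only when the PID recorded on its
# first line no longer belongs to a running process.
remove_stale_postmaster_pid() {
    local data_dir="${1:-/usr/local/var/postgres}"
    local pid_file="$data_dir/postmaster.pid" pid
    [ -f "$pid_file" ] || { echo "no postmaster.pid in $data_dir"; return 0; }
    pid=$(head -n 1 "$pid_file")
    # kill -0 sends no signal and only tests for existence; ps -p is a fallback
    # for processes owned by another user, where kill -0 is not permitted.
    if kill -0 "$pid" 2>/dev/null || ps -p "$pid" >/dev/null 2>&1; then
        echo "process $pid is still running; leaving $pid_file alone"
        return 1
    fi
    rm -f "$pid_file"
    echo "removed stale $pid_file"
}
```

Call it with the data directory of your install, then start the server with whichever method applies to you.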
Another approach is using the lunchy gem (a wrapper for launchctl):
brew install postgresql
initdb /usr/local/var/postgres -E utf8
gem install lunchy
To start PostgreSQL:
lunchy start postgres
To stop PostgreSQL:
lunchy stop postgres
For further information, refer to: "How to Install PostgreSQL on a Mac With Homebrew and Lunchy"
Here are my two cents: I made an alias for postgres pg_ctl and put it in the file .bash_profile (my PostgreSQL version is 9.2.4, and the database path is /Library/PostgreSQL/9.2/data).
alias postgres.server="sudo -u postgres pg_ctl -D /Library/PostgreSQL/9.2/data"
Launch a new terminal.
And then? You can start/stop your PostgreSQL server with this:
postgres.server start
postgres.server stop
The cleanest way by far to start/stop/restart PostgreSQL if you have installed it through brew is to simply unload and/or load the launchd configuration file that comes with the installation:
launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
The first line will stop PostgreSQL and the second line will start it. There isn't any need to specify any data directories, etc. since everything is in that file.
To start the PostgreSQL server:
pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
To end the PostgreSQL server:
pg_ctl -D /usr/local/var/postgres stop -s -m fast
You can also create an alias via CLI to make it easier:
alias pg-start='pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start'
alias pg-stop='pg_ctl -D /usr/local/var/postgres stop -s -m fast'
With these you can just type "pg-start" to start PostgreSQL and "pg-stop" to shut it down.
For test purposes, I think PostgreSQL App is the best option!
Run an app, and the server is up and running.
Close the app, and the server goes down.
http://postgresapp.com/
If you have installed using Homebrew, the below command should be enough.
brew services restart postgresql
This sometimes might not work. In that case, the below two commands should definitely work:
rm /usr/local/var/postgres/postmaster.pid
pg_ctl -D /usr/local/var/postgres start
# Remove old database files (if there was any)
$ rm -rf /usr/local/var/postgres
# Install the binary
$ brew install postgresql
# init it
$ initdb /usr/local/var/postgres
# Start the PostgreSQL server
$ postgres -D /usr/local/var/postgres
# Create your database
$ createdb mydb
# Access the database
$ psql mydb
psql (9.0.1)
Type "help" for help.
Sometimes it's just the version which you are missing, and you are scratching your head unnecessarily.
If you are using a specific version of PostgreSQL, for example, PostgreSQL 10, then simply do
brew services start postgresql@10
brew services stop postgresql@10
The normal brew services start postgresql won't work without a version if you have installed it for a specific version from Homebrew.
When you install PostgreSQL using Homebrew,
brew install postgres
at the end of the output, you will see these methods to start the server:
To have launchd start postgresql at login:
ln -sfv /usr/local/opt/postgresql/*.plist ~/Library/LaunchAgents
Then to load postgresql now:
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
Or, if you don't want/need launchctl, you can just run:
postgres -D /usr/local/var/postgres
I think this is the best way.
You can add an alias into your .profile file for convenience.
I had the same problem and performed all updates from the first post. But after checking the log file,
/usr/local/var/postgres/server.log
I see the true cause:
FATAL: data directory "/usr/local/var/postgres" has group or world access
DETAIL: Permissions should be u=rwx (0700).
After changing permissions on this directory,
chmod 0700 /usr/local/var/postgres
the PostgreSQL server started.
Check the log file every time.
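If you hit the same FATAL message, the check and the fix can be combined. This is just a sketch with made-up function names (dir_mode, ensure_pg_data_mode). It follows the log message quoted above and insists on 0700, and the default path assumes a Homebrew data directory.

```shell
#!/bin/bash
# Hypothetical helper: print a directory's permission bits in octal.
# GNU stat (Linux) and BSD stat (macOS) take different flags, so try both.
dir_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Hypothetical helper: tighten the data directory to 0700 if it is anything else.
ensure_pg_data_mode() {
    local data_dir="${1:-/usr/local/var/postgres}"
    if [ "$(dir_mode "$data_dir")" != "700" ]; then
        chmod 0700 "$data_dir"
    fi
}
```

After running ensure_pg_data_mode, start the server again and re-read server.log to confirm the FATAL line is gone.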
For a quick disposable test database, you can run the server in the foreground.
Initialize a new PostgreSQL database in a new directory:
mkdir db
initdb db -E utf8
Start the server in the foreground (Ctrl + C to stop the server):
postgres -D db
In another shell session, create a database and connect to the server:
createdb public
psql -d public
If you didn't install it with Homebrew and directly from the Mac package, this worked for me for PostgreSQL 12 when using all the default locations, variables, etc.
$ sudo su postgres
bash-3.2$ /Library/PostgreSQL/12/bin/pg_ctl -D /Library/PostgreSQL/12/data/ stop
Variation on this answer: https://stackoverflow.com/a/13103603/2394728
initdb `brew --prefix`/var/postgres/data -E utf8 && pg_ctl -D /usr/local/var/postgres/data -l logfile start
PostgreSQL is integrated in Server.app available through the App Store in Mac OS X v10.8 (Mountain Lion). That means that it is already configured, and you only need to launch it, and then create users and databases.
Tip: Do not start with defining $PGDATA and so on. Take file locations as is.
You would have this file:
/Library/Server/PostgreSQL/Config/org.postgresql.postgres.plist
To start:
sudo serveradmin start postgres
Process started with arguments:
/Applications/Server.app/Contents/ServerRoot/usr/bin/postgres_real -D /Library/Server/PostgreSQL/Data -c listen_addresses=127.0.0.1,::1 -c log_connections=on -c log_directory=/Library/Logs/PostgreSQL -c log_filename=PostgreSQL.log -c log_line_prefix=%t -c log_lock_waits=on -c log_statement=ddl -c logging_collector=on -c unix_socket_directory=/private/var/pgsql_socket -c unix_socket_group=_postgres -c unix_socket_permissions=0770
You can sudo:
sudo -u _postgres psql template1
Or connect:
psql -h localhost -U _postgres postgres
You can find the data directory, version, running status and so forth with
sudo serveradmin fullstatus postgres
For development purposes, one of the simplest ways is to install Postgres.app from the official site. It can be started/stopped from Applications folder or using the following commands in terminal:
# Start
open -a Postgres
# Stop
killall Postgres
killall postgres
This worked for me (macOS v10.13 (High Sierra)):
sudo -u postgres /Library/PostgreSQL/9.6/bin/pg_ctl start -D /Library/PostgreSQL/9.6/data
Or first
cd /Library/PostgreSQL/9.6/bin/
If you installed PostgreSQL using the EnterpriseDB installer, then what Kenial suggested is the way to go:
sudo -u postgres pg_ctl -D /Library/PostgreSQL/{version}/data start
sudo -u postgres pg_ctl -D /Library/PostgreSQL/{version}/data stop
Homebrew is the way!!
To start the service:
brew services start postgresql
To list it:
brew services list | grep postgres
To stop the service:
brew services stop postgresql
If you didn't install the Postgres server with Homebrew, or installed it using a .dmg file, try this:
$ sudo su postgres
bash-3.2$ /Library/PostgreSQL/13/bin/pg_ctl -D /Library/PostgreSQL/13/data/ stop
For MacPorts, just use the load/unload command and the port name of the running server:
sudo port load postgresql96-server
- or -
sudo port unload postgresql96-server
so you don't have to remember where the /Library/LaunchDaemons/org.macports.postgresql96.plist file is located.
Having installed Postgres with Homebrew, this is what I do to start postgres and keep it in the foreground to see the logs:
/opt/homebrew/opt/postgresql/bin/postgres -D /opt/homebrew/var/postgres
Install PostgreSQL using brew: brew install postgresql. You can specify the version using the "@" sign: brew install postgresql@14
Start PostgreSQL: brew services start postgresql, or for a specific version brew services start postgresql@14
Stop PostgreSQL: brew services stop postgresql
$ brew upgrade postgres
fixed it for me.
That, of course, will upgrade your PostgreSQL version and update/install any dependencies.
Warning: Do this knowing that your PostgreSQL version will likely change. For me, that wasn't a big deal.
None of the previous answers fixed the issue for me, despite getting the same error messages.
I was able to get my instance back up and running by deleting the existing postmaster.pid file which was locked and was not allowing connections.
This worked for me every time, inspired by Craig Ringer:
brew install proctools
sudo pkill -u postgres
proctools includes pkill. If you don't have Homebrew: https://brew.sh/
After brew services restart postgresql did not help,
it worked best to:
brew services stop postgresql
brew postgresql-upgrade-database
brew services start postgresql
Then type: psql
It now runs. This was after the error:
psql: error: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
The upgrade may be optional, depending on the other dependencies you're running.
In other words, rather than restarting with brew on macOS, stop postgres completely, then start postgres and connect with psql databaseName.
Hope this was useful.

Resources