I'm trying to debug some issues with a single-node Hadoop cluster on my Mac. All the setup docs say to add:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
to remove this error:
Unable to load realm info from SCDynamicStore
This works, but it only seems to work for STDOUT. When I check my Hadoop logs directory, under "job_###/attempt_###/stderr" the error is still there:
2013-02-08 09:58:23.662 java[2772:1903] Unable to load realm info from SCDynamicStore
I'm having great difficulty loading RVM Rubies into the Hadoop environment to execute Ruby code with Hadoop streaming. STDOUT is printing that RVM is loaded and using the right Ruby/gemset but my STDERR logs:
env: ruby_noexec_wrapper: No such file or directory
Is there some way to find out what path Hadoop is actually using to execute the jobs, or if it's invoking some other environment here?
Further background:
I'm using Hadoop 1.1.1 installed via Homebrew. It's set up in a manner very similar to "INSTALLING HADOOP ON MAC OSX LION", and I'm debugging an implementation of wukong 3.0.0 as the wrapper for executing Hadoop jobs.
To answer my own question so others can find it:
I appeared to be loading RVM in my hadoop-env, but I must not have restarted the cluster after adding it. To make sure your rubies and gemsets are loaded, add the standard RVM clause to hadoop-env.sh. Something like:
[[ -s "/Users/ScotterC/.rvm/scripts/rvm" ]] && source "/Users/ScotterC/.rvm/scripts/rvm"
And make sure to restart the cluster so it picks it up. Oddly enough, without restarting, my logs would show that it was loading RVM, but it clearly wasn't executing that Ruby and its respective gemfiles. After restarting, it worked.
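For reference, the whole sequence was roughly the following; the stop/start scripts are the Hadoop 1.x ones that Homebrew puts on the PATH, and the RVM path assumes your own user, so adjust as needed:

# line added to hadoop-env.sh so streaming tasks can load RVM
[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"

# restart the single-node cluster so the new environment is actually picked up
stop-all.sh
start-all.sh

# sanity check: which Ruby does the shell now resolve?
which ruby && ruby -v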
Related
We started seeing some strange errors in our logs that normally appear when Ruby isn't compiled properly with OpenSSL. But it's inconsistent...
We're getting errors like:
RuntimeError: Unsupported digest algorithm (SHA256). (also with other digests, like sha1). example error trace
Faraday::SSLError (SSL_CTX_new: (null)) example error trace
We managed to reproduce it when starting unicorn using service unicorn start or systemctl start unicorn. But only with some requests... Not all of them. Some requests that use OpenSSL under the hood do work. Others don't.
However, when we start unicorn with /etc/init.d/unicorn start, everything works without a hitch. (to clarify, systemd starts the same /etc/init.d script)
We tried debugging ENV vars, user permissions, file/dir ownership, recompiling Ruby, bootstrapping a new server from scratch... Nothing seems to help.
In case this helps:
unicorn init.d script
unicorn.rb
What are we missing? What can we try that we haven't thought of?
UPDATE 1
output of some debug commands, e.g. OpenSSL, ruby etc
PATH is being set inside the init.d script
unicorn is being executed via su into www-data user
The same problem happens when we use this unicorn.service file in /etc/systemd/system
We're running Ubuntu 16.04 on Gcloud
Ruby was not installed via apt (we explicitly removed it, in case the platform image came with it pre-installed) and was compiled from scratch. We're currently running 2.3.4 and have also tried 2.3.6, compiled either manually or using ruby-build. No rbenv, no RVM.
We install libssl-dev via apt (we're running apt-get install -y autoconf bison build-essential libssl-dev libyaml-dev libreadline6-dev zlib1g-dev libncurses5-dev libffi-dev libgdbm3 libgdbm-dev before building ruby)
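For completeness, the build boils down to roughly the following (the version and tarball URL are illustrative of our 2.3.4 build; in practice this is scripted, or ruby-build does the equivalent):

# rough equivalent of our scripted Ruby build
apt-get install -y autoconf bison build-essential libssl-dev libyaml-dev libreadline6-dev zlib1g-dev libncurses5-dev libffi-dev libgdbm3 libgdbm-dev
wget https://cache.ruby-lang.org/pub/ruby/2.3/ruby-2.3.4.tar.gz
tar xzf ruby-2.3.4.tar.gz && cd ruby-2.3.4
./configure --disable-install-doc   # picks up the headers installed by libssl-dev
make -j"$(nproc)" && make install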
UPDATE 2
We're using a scripted/repeatable build process for the VM (using fabric), and this problem is consistent on multiple VMs we bootstrapped on GCloud. We then tried a VM on DigitalOcean with the same bootstrap scripts, and the problem doesn't seem to appear there.
In both cases we picked Ubuntu 16.04 64bit base image, but obviously there are some differences with kernel versions, base installed packages etc...
UPDATE 3
The problem simply vanished. See my answer below.
@gingerlime I had a similar situation with our Jenkins on GCP. We're using ChefDK 3.1.0 (embedded Ruby 2.5.1p57) -- we tried others as well -- on a Jenkins instance running under systemd (Ubuntu 16.04) and upstart (Ubuntu 14.04); we tried both versions, and right now it runs on 16.04 with the 4.15.0-1023-gcp kernel. We run a few jobs with kitchen-docker, and this problem always emerges in a few situations.
I dug into it and found that this only happens when Etc.getlogin gets called (for me, here). It doesn't return any error; it returns the correct info, with the correct class (String), but once it has been called, the Unsupported digest algorithm error gets raised.
If I start the process manually as root or as the jenkins user, this problem doesn't happen. I tried implementing the Etc.getlogin call in several different ways -- using ENV['USER'], a fixed String, or other Etc methods like getpwuid -- simulating the class and values that Etc.getlogin returns, and the error doesn't get raised.
I'm not sure if this is some bug related to the Ruby version and the custom kernel that GCP instances use, but it happens in a situation similar to yours, and for me Etc.getlogin was the problem. For now, I've fixed it with a custom configuration that doesn't call this function, and it's working normally.
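To illustrate (a minimal stand-in, not our actual Chef/kitchen-docker job; the digest call just represents whatever OpenSSL work the job does), the difference looked like this when run under the systemd-started Jenkins:

# normally fine: OpenSSL digest with no prior Etc.getlogin call
ruby -ropenssl -e 'puts OpenSSL::Digest::SHA256.hexdigest("test")'

# in the failing environment, calling Etc.getlogin first made the same digest
# raise "Unsupported digest algorithm (SHA256)"
ruby -retc -ropenssl -e 'Etc.getlogin; puts OpenSSL::Digest::SHA256.hexdigest("test")'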
One possibility is that this isn't an issue of SysVinit vs. systemd at all, and you just haven't triggered the issue with your SysVinit script yet.
When you run your SysVinit script through the systemctl command, it goes through a compatibility layer, and there may be a problem there. Your problem would be simpler to track down, both for yourself and for us, if you reproduced the issue directly with a systemd service file and shared that file.
You mentioned debugging ENV, but didn't mention exactly what you checked in the ENV. This is definitely one place where systemd could make a difference. As seen in man systemd.exec, systemd sets $PATH in the environment to a fixed value:
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
If this is not exactly the same as when run directly as a SysVinit script, that could be an issue.
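One way to check is to dump the environment the running unicorn master actually received under each start method and compare (the pgrep pattern and unit name below are assumptions; adjust them to your setup):

# environment of the running unicorn master process
tr '\0' '\n' < /proc/"$(pgrep -f 'unicorn master' | head -n1)"/environ | grep -E '^(PATH|GEM_|RUBY|BUNDLE)'

# what systemd says it will hand to the service
systemctl show unicorn.service -p Environment
systemctl show-environment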
I would also check all the copies of SSL on the system. Do you have more than one? Where? Do you have more than one copy of the Ruby openssl module loaded?
locate -r lib/.*libssl.*so
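It can also help to compare what the system and your Ruby each report, and to repeat the check as the service user (www-data here is taken from your update and may need adjusting):

# OpenSSL as seen by the system vs. by Ruby's openssl extension
openssl version
ruby -ropenssl -e 'puts OpenSSL::OPENSSL_VERSION'

# the same check as the user the service runs as, in case it resolves differently
su -s /bin/bash www-data -c "ruby -ropenssl -e 'puts OpenSSL::OPENSSL_VERSION'"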
Also see the answer to the FAQ: Why do things behave differently under systemd?
(also posted on this github issue)
It looks like the problem just vanished. We were testing and reproducing it consistently, across several Compute Engine instances on Google Cloud. Under certain conditions (unicorn / puma started by systemd, etc), it was completely reproducible both with our own rails app, and with a plain vanilla rails app we've set for testing purposes. It was reproducible across several ruby versions as well (we tested 2.3.4, 2.3.6 and 2.5.0).
Suddenly, all instances that were consistently failing started working without exhibiting these problems, like the issue never existed. We didn't even reboot some of those instances, and we saw no evidence of any unattended upgrades taking place... We also had one snapshot of a system that had this problem and on which we could reliably reproduce it. As of a few hours ago, instances created from this snapshot stopped exhibiting the problem as well.
We're totally confused as to what might have caused it, and what might have made it disappear... However, without being able to reproduce it now, I guess there's no point leaving this issue open, so I will close it. Chalk it up to deus ex machina, I suppose. (Perhaps the Google support gods, but they haven't reported anything back to us yet.)
I am trying to install Kylo 0.8.4.
There is a step to install Kylo-specific components after installing NiFi, using the command:
sudo ./install-kylo-components.sh /opt /opt/kylo kylo kylo
but I am getting the following error:
Creating symlinks for NiFi version 1.4.0.jar compatible nars
ERROR: spark-submit not on path. Has spark been installed?
I have spark installed.
I need help.
The script calls which spark-submit to check if Spark is available. If available, it uses spark-submit --version to determine the version of Spark that is installed.
The error indicates that spark-submit is not available on the system path. Can you please execute which spark-submit on the command line and check the result?
If spark-submit is not available on the system path, you can fix it by updating the PATH variable in your .bash_profile to include the location of your Spark installation.
As a next step, you can also verify the installed version of Spark by running spark-submit --version.
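For example, something like this in the .bash_profile of the user running the installer should do it (the SPARK_HOME value is only a placeholder for your actual Spark location):

# make spark-submit visible to the shell that runs install-kylo-components.sh
export SPARK_HOME=/opt/spark          # placeholder -- point at your Spark install
export PATH="$SPARK_HOME/bin:$PATH"

# verify
which spark-submit
spark-submit --version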
I am trying to cache Ruby gems on a Jenkins slave. I have installed gemstash on the Linux VirtualBox VM that runs the slave; however, I am not sure if I am installing it in the right location.
Should I be installing it by logging in as the Jenkins user in the terminal and installing it there? When I created the slave node, I didn't need to install Jenkins on the box itself. The source I use for the Gemfile is localhost:9292.
EDIT:
And how can I check what packages gemstash has cached?
Checking if gemstash has cached packages can be done by following https://github.com/bundler/gemstash#bundling
Any help would be appreciated.
As the README says, have a look in ~/.gemstash:
You might wonder where the gems are stored. After running the commands above, you will find a new directory at ~/.gemstash. This directory holds all the cached and private gems. It also has a server log, the database, and configuration for Gemstash.
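So on the slave you can inspect that directory directly, as whichever user launched Gemstash (the agent user in your case), for example:

# everything Gemstash has cached (gems, server log, DB, config) lives under this user's home
ls -la ~/.gemstash

# rough idea of how much has been cached so far
du -sh ~/.gemstash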
I'm currently working on a build pipeline that uses Jenkins and GitLab to trigger builds for the project. Basically, the build is triggered when someone pushes to the repository. Also, some Ruby scripts are executed as part of the build process. These scripts run some checks on the projects and perform some fixes, like synchronizing an Xcode project with added and deleted files from the source directory - in this case they are not the same.
I'm using several tools to configure the pipeline. The builds run on a physical machine that acts as the build slave, while Jenkins itself is deployed to an AWS machine. For this reason, I used pritunl to connect the two on a virtual network. I can use local IPs to communicate between the machines, and SSH is working fine both ways.
When I push to the remote, the build starts correctly on the slave, but it fails to complete. However, if I access the slave manually over SSH from the terminal, the build runs fine. This is the output I get from Jenkins:
/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- xcodeproj (LoadError)
from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'
from /Users/jenkins/workspace/Core/platform/ios/scripts/pbxsync.rb:58:in `<main>'
As you can see, it fails to require Xcodeproj, causing the build to fail. Still, this only happens if the build is triggered by Jenkins, not manually.
This makes me think that Jenkins is using some different installation of Ruby, or at least a different environment. Basically what I need is to install gems for the same Ruby environment that Jenkins is using, but I don't know which one that is. Any ideas?
Jenkins has a console that runs Groovy scripts on the remote slave. I've been playing with it a bit, but no real conclusions so far. Maybe that helps.
This may be important; this is the shebang I'm using for the Ruby scripts: #!/usr/bin/env ruby
On the terminal, I'm using the same user as Jenkins is to access the slave machine. It's called "jenkins".
One thing I forgot to mention is that the output is telling me the right version: /Users/jenkins/.rvm/rubies/ruby-2.4.0. At least that's the path it's indicating it's trying to load the gem from. So I tried the following:
/Users/jenkins/.rvm/rubies/ruby-2.4.0/bin/ruby
require 'xcodeproj'
Then I press ctrl+D and get no output - that installation of ruby is finding the gem properly.
If you are using the Jenkins Slave plugin to communicate between the Jenkins master and the Jenkins slave, every command you specify will be run in a non-interactive shell. That means Jenkins will only have access to the system Ruby in your case.
So if you want to install something the build needs, you have to install it into the system Ruby. You are using RVM, so run rvm use system and then you can install the gem into the system Ruby.
If you want to use a Ruby version other than the system Ruby, you need to add RVM to $PATH for non-interactive shells. Here is a basic setup that should help: https://rvm.io/rvm/basics
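A minimal sketch of both options, assuming RVM lives under the jenkins user's home:

# option 1: stay on the system Ruby and install what the build needs there
rvm use system
gem install xcodeproj

# option 2: make RVM visible to non-interactive shells, e.g. near the top of ~/.bashrc
export PATH="$HOME/.rvm/bin:$PATH"
[[ -s "$HOME/.rvm/scripts/rvm" ]] && source "$HOME/.rvm/scripts/rvm"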
I finally managed it. As @Cosaquee indicated in another response, it's important to distinguish between interactive and non-interactive shells. The main reason is that it makes a difference how you call SSH. As the man page indicates:
If command is specified, it is executed on the remote host instead of
a login shell.
This is meaningful, because the Launch Command for the node I have set for Jenkins is this one:
ssh jenkins@x.x.x.x java -jar ~/bin/slave.jar
Meanwhile, I was logging in with the standard ssh jenkins@x.x.x.x from the terminal, which starts a login shell. It makes sense that I was getting different results, because the two shells load different initialization scripts. Basically, if you use ssh jenkins@x.x.x.x to log into the machine, ~/.bash_profile is loaded, while if you specify a command, such as ssh jenkins@x.x.x.x whatever, then ~/.bashrc is loaded instead. As such, I added this line to ~/.bashrc:
[[ -s "$HOME/.rvm/scripts/rvm" ]] && . "$HOME/.rvm/scripts/rvm"
Without it I got:
RVM is not a function, selecting rubies with 'rvm use ...' will not work.
The advantage was that I could now use RVM from the same environment Jenkins was using. The rest is easy:
ssh jenkins@x.x.x.x rvm --default use 2.3
And:
ssh jenkins@x.x.x.x
rvm --default use 2.3
And both are now using the same version of ruby.
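A quick way to confirm that, comparing the non-interactive path Jenkins effectively uses with a regular login shell:

# non-interactive, the way the Jenkins launch command runs things
ssh jenkins@x.x.x.x 'which ruby && ruby -v'

# interactive login shell for comparison
ssh -t jenkins@x.x.x.x 'bash -l -c "which ruby && ruby -v"'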
I installed Hadoop and Pig using brew install hadoop and brew install pig.
I read here that you will get the Unable to load realm info from SCDynamicStore error message unless you add:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
to your hadoop-env.sh file, which I have.
However, when I run hadoop namenode -format, I still see:
java[1548:1703] Unable to load realm info from SCDynamicStore
amongst the outputs.
Anyone know why I'm still getting it?
As dturnanski suggests, you need to use an older JDK. You can set this in the hadoop-env.sh file by changing the JAVA_HOME setting to:
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
(Note the backticks here.) This fixed the problem for me.
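To double-check that an older JDK is actually present for java_home to resolve (this assumes Apple's legacy Java 6 package is installed):

# prints the Java 6 home if it is installed, otherwise exits with an error
/usr/libexec/java_home -v 1.6

# what the hadoop-env.sh line ends up exporting
export JAVA_HOME=`/usr/libexec/java_home -v 1.6`
echo "$JAVA_HOME"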
I had the same issue with Java 7; it works with Java 6.