getting started with pig - hadoop

This might be a really stupid question, but I'm not able to install Pig properly on my machine.
Pig's version is 0.9.0.
I have even set my JAVA_HOME to its designated path.
I've set the PATH to
export PATH=/usr/local/pig-0.9.0/bin:$PATH
since my pig dir is in /usr/local/.
Whenever I type pig or pig -help, I get the following message:
su: /usr/local/pig-0.9.0/bin/pig: Permission denied
Please help. Thank you.

Try typing:
chmod +x /usr/local/pig-0.9.0/bin/pig

chmod -R 777 /usr/local/pig-0.9.0/
Use this; it will definitely make Pig run.
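
A quick way to check and confirm the fix, assuming Pig lives in /usr/local/pig-0.9.0 as in the question (just a sketch):
# check the current permission bits on the launcher script
ls -l /usr/local/pig-0.9.0/bin/pig    # you want to see x bits, e.g. -rwxr-xr-x
# make it executable and try again
chmod +x /usr/local/pig-0.9.0/bin/pig
pig -help    # should now print Pig's usage instead of "Permission denied"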

Related

execute aws command in script with sudo

I am running a bash script with sudo and have tried the command below, but I am getting the error shown when using aws cp. I think the problem is that the script is looking for the AWS config in /root, which does not exist. However, doesn't -E preserve the original location? Is there an option that can be used with aws cp to pass the location of the config? Thank you :).
sudo -E bash /path/to/.sh
- inside this script is an `aws cp` call
Error
The config profile (name) could not be found
I have also tried `export`ing the named profile and `source`ing the path to the `config`.
You can run it as the original user, like:
sudo -u $SUDO_USER aws cp ...
You could also run the script using source instead of bash: source runs the script in the same shell as your open terminal window, so it keeps the same environment (such as the user). Honestly, though, Philippe's answer is the better, more correct one.
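
For reference, a rough sketch of both suggestions, using placeholder bucket, key, and home-directory paths (note that passing environment variables through sudo may depend on your sudoers policy):
# Option 1: keep the script under sudo, but run the aws call as the invoking user
# so it picks up that user's ~/.aws/config and credentials
sudo -u "$SUDO_USER" aws s3 cp s3://your-bucket/your-key /local/path
# Option 2: keep sudo -E and point the AWS CLI at the original user's config files
# explicitly via its standard AWS_CONFIG_FILE / AWS_SHARED_CREDENTIALS_FILE variables
sudo AWS_CONFIG_FILE=/home/youruser/.aws/config \
     AWS_SHARED_CREDENTIALS_FILE=/home/youruser/.aws/credentials \
     bash /path/to/.sh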

spark-submit command not found in airflow

I am trying to run my Spark job in Airflow. When I execute this command spark-submit --class dataload.dataload_daily /home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar in the terminal, it works fine without any issue.
However, when I do the same in Airflow, I keep getting the error
/tmp/airflowtmpKQMdzp/spark-submit-scalaWVer4Z: line 1: spark-submit:
command not found
t1 = BashOperator(task_id='spark-submit-scala',
                  bash_command='spark-submit --class dataload.dataload_daily \
                  /home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar',
                  dag=dag,
                  retries=0,
                  start_date=datetime(2018, 4, 14))
I have my Spark path set in bash_profile:
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7
export PATH="$SPARK_HOME/bin/:$PATH"
I sourced this file as well. I'm not sure how to debug this; can anyone help me?
You could start with bash_command = 'echo $PATH' to see if your path is being updated correctly.
This is because you mention editing bash_profile, but as far as I know Airflow is run as another user. Since that user has no changes in its bash_profile, the path to Spark might be missing.
As mentioned here (How do I set an environment variable for airflow to use?), you could try setting the path in .bashrc.
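
If you want to rule out PATH problems quickly, here is a small sketch using the paths from the question (run it on the Airflow worker host, or put it in a BashOperator):
# see what PATH the Airflow user actually gets
echo "$PATH"
which spark-submit || echo "spark-submit is not on PATH for this user"
# or sidestep PATH entirely and call spark-submit by its absolute path
/opt/spark-2.2.0-bin-hadoop2.7/bin/spark-submit \
    --class dataload.dataload_daily \
    /home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar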

File not found exception while starting Flume agent

I have installed Flume for the first time. I am using hadoop-1.2.1 and Flume 1.6.0.
I tried setting up a flume agent by following this guide.
I executed this command: $ bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
It says log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: ./logs/flume.log (No such file or directory)
Isn't the flume.log file generated automatically? If not, how can I rectify this error?
Try this:
mkdir -p ./logs
sudo chown `whoami` ./logs
bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template
The first line creates the logs directory in the current directory if it does not already exist. The second one sets the owner of that directory to the current user (you) so that flume-ng running as your user can write to it.
Finally, please note that this is not the recommended way to run Flume, just a quick hack to try it.
You are probably getting this error because you are running the command directly from the console; you first have to go to Flume's bin directory and run your command from there.
As Botond says, you need to set the right permissions.
However, if you run Flume within a program, like supervisor or with a custom script, you might want to change the default path, as it's relative to the launcher.
This path is defined in your /path/to/apache-flume-1.6.0-bin/conf/log4j.properties. There you can change the line
flume.log.dir=./logs
to use an absolute path that you would like to use - you still need the right permissions, though.
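
For example, a minimal sketch of that change, assuming you want the logs under /var/log/flume (an arbitrary choice) and the install path from the answer above:
# create a writable, absolute log directory
sudo mkdir -p /var/log/flume
sudo chown `whoami` /var/log/flume
# point Flume's log4j config at it instead of the relative ./logs
sed -i 's|^flume.log.dir=.*|flume.log.dir=/var/log/flume|' /path/to/apache-flume-1.6.0-bin/conf/log4j.properties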

Unable to find Namenode class when setting up Hadoop on Windows 8

I'm trying to set up Hadoop 2.4.1 on my machine using Cygwin, and I'm stuck when I try to run
$ hdfs namenode -format
which gives me
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
I think it's due to an undefined environment variable, since I can run
$ hadoop version
without a problem. I've defined the following:
JAVA_HOME
HADOOP_HOME
HADOOP_INSTALL
as well as adding the Hadoop \bin and \sbin (and Cygwin's \bin) to the Path. Am I missing an environment variable that I need to define?
OK, it looks like the file hadoop\bin\hdfs also has to be changed like the hadoop\bin\hadoop file described here.
The end of the file must be changed from:
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
to
exec "$JAVA" -classpath "$(cygpath -pw "$CLASSPATH")" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
I assume I'll have to make similar changes to the hadoop\bin\mapred and hadoop\bin\yarn when I get to using those files.
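
If you want to sanity-check the classpath conversion itself before touching more scripts, a small check from a Cygwin shell (hadoop classpath prints the classpath Hadoop assembles; cygpath -pw converts the Cygwin-style list into a Windows-style one):
# print the classpath Hadoop builds, converted to Windows form
cygpath -pw "$(hadoop classpath)"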

modify file content

I'm installing a lighttpd server on a remote machine using a bash script. After installation, I need to configure the port for the server. The system says I don't have permission to modify the file /etc/lighttpd/lighttpd.conf, even though I run:
sudo echo "server.bind=2000" >> /etc/lighttpd/lighttpd.conf
How shall I modify this?
What you're doing is running echo as root, then trying to append its output to the config file as the normal user.
What you want is sudo sh -c 'echo "server.bind=2000" >> /etc/lighttpd/lighttpd.conf'
Try changing the file permissions using chmod so that the file is writable:
$ sudo chmod a+w /etc/lighttpd/lighttpd.conf
If you don't have the right to change the file /etc/lighttpd/lighttpd.conf, check the man page of lighttpd. If you can start it with a different config file, then create a config file somewhere and start lighttpd with it.
The problem is that the bit on the right of >> is not run under sudo. Either use sudo -i to bring up a root (superuser) shell and run the command there, or just use an editor as mentioned before.
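
For completeness, two more ways to make sure the append itself happens with root privileges (the config line is the one from the question):
# append as root via tee while your shell stays a normal user
echo "server.bind=2000" | sudo tee -a /etc/lighttpd/lighttpd.conf
# or open a root shell first and do the redirect there
sudo -i
echo "server.bind=2000" >> /etc/lighttpd/lighttpd.conf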
