Databricks error while reading GTiff file using RasterIO - azure-databricks

While reading a raster file in Databricks, I get the error below:
ConnectException: Connection refused (Connection refused)
Error while obtaining a new communication channel
Code:
import rasterio
import rasterio.plot
import pyproj
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import rasterio.features
import rasterio.warp
raster = rasterio.open('/dbfs/mnt/Firescar/cvmsre_201909_afka2.tif')
raster.read(1)

Related

SSLCertVerificationError in Jupyter notebook

from urllib.request import urlopen
html = urlopen('http://pythonscraping.com/pages/page1.html')
print(html.read())
I have run this code but I am still getting the error. I have tried installing the OpenSSL packages and upgrading certifi.
You get the error because that website's certificate has expired. The workaround below disables certificate verification for the request:
from urllib.request import urlopen
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
html=urlopen("https://pythonscraping.com/pages/page1.html", context=ctx)
print(html.read())
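The workaround above can be wrapped so that verification stays on by default and is only switched off explicitly. This is a minimal sketch, not the answerer's code: the helper names build_ssl_context and fetch are hypothetical, and the optional cafile parameter is an assumption for cases where a custom CA bundle (for example the path returned by certifi.where()) is wanted. A CA bundle does not help when the server certificate itself has expired.

```python
import ssl
from urllib.request import urlopen


def build_ssl_context(verify=True, cafile=None):
    """Return an SSLContext; verification stays on unless verify=False."""
    ctx = ssl.create_default_context(cafile=cafile)
    if not verify:
        # Order matters: hostname checking must be disabled first,
        # otherwise setting CERT_NONE raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, verify=True, cafile=None, timeout=10):
    """Fetch a URL and return the raw response body as bytes."""
    ctx = build_ssl_context(verify=verify, cafile=cafile)
    with urlopen(url, context=ctx, timeout=timeout) as response:
        return response.read()


# Hypothetical usage (performs a network request, so it is left commented out):
# print(fetch("https://pythonscraping.com/pages/page1.html", verify=False))
```

Keeping the unverified path behind an explicit verify=False argument makes it harder to disable checks by accident for other sites.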

ImportError: cannot import name 'MultiObjectiveDisplay' from 'pymoo.util.display'

Does anybody know how to solve this ImportError while running the example code?
ImportError: cannot import name 'MultiObjectiveDisplay' from 'pymoo.util.display' (C:\Users\mycomputer\anaconda3\lib\site-packages\pymoo\util\display\__init__.py)
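An ImportError of this form generally means the installed package does not export the requested name, which often happens when example code was written for a different release than the one installed. The sketch below is a generic diagnostic, not a pymoo-specific fix: the helper name describe_import is hypothetical, and it only reports the installed version and whether the name exists.

```python
import importlib
from importlib import metadata


def describe_import(module_name, attr_name, dist_name=None):
    """Report the installed version and whether module_name exposes attr_name."""
    try:
        # Assumption: the distribution name equals the top-level package name
        # unless dist_name is given explicitly.
        version = metadata.version(dist_name or module_name.split(".")[0])
    except metadata.PackageNotFoundError:
        version = None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {"version": version, "module_found": False, "has_attr": False}
    return {
        "version": version,
        "module_found": True,
        "has_attr": hasattr(module, attr_name),
    }


# Hypothetical usage for the case above:
# print(describe_import("pymoo.util.display", "MultiObjectiveDisplay"))
```

If has_attr is False, comparing the reported version with the version the example code targets is a reasonable next step.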

cannot import name 'spawn' from 'pexpect' on windows

I am trying to open an SSH tunnel from a Windows system through a Linux jump host with IP (xx.xx.xx.xx) and connect to the target config Windows system with IP 127.0.0.1.
import sys
import paramiko
import subprocess
import pexpect
from pexpect.popen_spawn import PopenSpawn
import winpexpect
from winpexpect.winspawn import winspawn
child = winpexpect.winspawn('ssh -L 22:xx.xx.xx.xx:4022 Administrator@127.0.0.1 -o StrictHostKeyChecking=no')
child.expect('127.0.0.1')
child.sendline('password')
The above program throws the error below when run on a Windows system.
File "C:\Python38\lib\site-packages\winpexpect.py", line 18, in <module>
from pexpect import spawn, ExceptionPexpect, EOF, TIMEOUT
ImportError: cannot import name 'spawn' from 'pexpect' (C:\Python38\lib\site-packages\pexpect\__init__.py)
pexpect.spawn is not available on Windows. Use pexpect.popen_spawn.PopenSpawn on Windows instead; see https://pexpect.readthedocs.io/en/stable/overview.html#pexpect-on-windows
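A minimal sketch of the PopenSpawn approach follows. The helper names build_ssh_command and open_tunnel are hypothetical, and the prompt text "password:" is an assumption about what ssh prints. PopenSpawn talks to the child over pipes rather than a terminal, and ssh commonly reads passwords from the terminal, so the password step may not work as written; key-based authentication may be needed instead.

```python
def build_ssh_command(user, host, local_port, remote_host, remote_port):
    """Build an ssh command line with a local port forward (-L)."""
    return (
        f"ssh -o StrictHostKeyChecking=no "
        f"-L {local_port}:{remote_host}:{remote_port} {user}@{host}"
    )


def open_tunnel(command, prompt="password:", password=None, timeout=30):
    """Start the command with PopenSpawn, which is usable on Windows."""
    # Imported lazily so build_ssh_command works even without pexpect installed.
    from pexpect.popen_spawn import PopenSpawn

    child = PopenSpawn(command, timeout=timeout, encoding="utf-8")
    if password is not None:
        child.expect(prompt)      # wait for the (assumed) password prompt
        child.sendline(password)  # send the password followed by a newline
    return child


# Hypothetical usage (starts a real ssh process, so it is left commented out):
# cmd = build_ssh_command("Administrator", "127.0.0.1", 22, "xx.xx.xx.xx", 4022)
# child = open_tunnel(cmd, password="password")
```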

ClassNotFoundException while running MissingPokerCards on ec2 Instance

I'm getting the following error when I try to run the jar file:
Exception in thread "main" java.lang.ClassNotFoundException: finalPoker.MissingPokerCards
at java.net.URLClassLoader$1.run(URLClassLoader.java:360)
at java.net.URLClassLoader$1.run(URLClassLoader.java:349)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:348)
at java.lang.ClassLoader.loadClass(ClassLoader.java:430)
at java.lang.ClassLoader.loadClass(ClassLoader.java:363)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
The following code is my MissingPokerCards program, which counts the number of cards missing from a deck of 52 cards.
package MissingPokerCards;
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class PokerCardsProgramme {
//Mapper function
//Reducer function
//Main function
public static void main(String[] args) throws Exception {
Configuration config = new Configuration();
Job job = new Job(config, "Search for list of missing Cards");
job.setJarByClass(PokerCardsProgramme.class);
job.setMapperClass(mapper.class);
job.setReducerClass(reducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
The code is compiled using - javac -classpath /home/ec2-user/hadoop_home/hadoop-1.2.1/hadoop-core-1.2.1.jar PokerCardsProgramme.java
The jar is created using the following command - jar cvf MissingPokerCards.jar PokerCardsProgramme*.class
The jar file is run using - hadoop jar MissingPokerCards.jar MissingPokerCards.PokerCardsProgramme \input\inputcards.txt output
My Hadoop version is 1.2.1 and my Java version is 1.7.0_241.
I even tried a different Hadoop version, 2.7.3:
hadoop jar /home/ec2-user/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.3.jar MissingPokerCards.PokerCardsProgramme inputcards.txt /output
I am still facing the same issue. I think I am missing the jar file related to the PokerCards function.
Can anybody please help me with this problem? Am I using the correct commands to compile and run the program, or is there another way to execute the MissingPokerCards program on an EC2 instance?
I am able to run the same code in Eclipse, but when I try to execute it on EC2 it shows this issue.
The error has nothing to do with Hadoop or EC2; it is a regular Java error. If you really want to run Hadoop code in AWS, use EMR rather than EC2 instances.
Your package is MissingPokerCards, but the error says finalPoker.
Your class is PokerCardsProgramme, but the error says MissingPokerCards.
FWIW, not many people actually write MapReduce nowadays, but you should definitely be using Hadoop 2 or 3 with Java 8, not 1.2.1 with Java 7.
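The name mismatch can also be checked against the jar itself. A jar is a zip archive, and Java looks up a class named pkg.Class at the entry pkg/Class.class. The jar command in the question adds PokerCardsProgramme*.class at the archive root, so the entry MissingPokerCards/PokerCardsProgramme.class would likely be absent as well. The sketch below, with hypothetical helper names class_entry and jar_contains_class, is one way to verify this and is written in Python only for convenience.

```python
import zipfile


def class_entry(qualified_name):
    """Map 'pkg.Class' to the jar entry 'pkg/Class.class'."""
    return qualified_name.replace(".", "/") + ".class"


def jar_contains_class(jar_path, qualified_name):
    """Return True if the jar holds the class at the path its package implies."""
    with zipfile.ZipFile(jar_path) as jar:
        return class_entry(qualified_name) in jar.namelist()


# Hypothetical usage with the names from the question:
# print(jar_contains_class("MissingPokerCards.jar",
#                          "MissingPokerCards.PokerCardsProgramme"))
```

If the check returns False, building the jar from the directory that contains the MissingPokerCards package folder, so that the folder is preserved inside the archive, is the usual remedy.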

Pig UDF running on AWS EMR with java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc

I am developing an application that tries to read log files stored in S3 buckets and parse them using Elastic MapReduce. Currently the log file has the following format:
-------------------------------
COLOR=Black
Date=1349719200
PID=23898
Program=Java
EOE
-------------------------------
COLOR=White
Date=1349719234
PID=23828
Program=Python
EOE
So I tried to load the file into my Pig script, but the built-in Pig loaders don't seem to be able to load my data, so I have to create my own UDF. Since I am pretty new to Pig and Hadoop, I want to try a script written by others before I write my own, just to get a taste of how UDFs work. I found one at http://pig.apache.org/docs/r0.10.0/udf.html, where there is a SimpleTextLoader. In order to compile this SimpleTextLoader, I had to add a few imports:
import java.io.IOException;
import java.util.ArrayList;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.PigException;
import org.apache.pig.LoadFunc;
Then I found out I need to compile this file. I had to install svn, check out Pig, and build it by running
sudo apt-get install subversion
svn co http://svn.apache.org/repos/asf/pig/trunk
ant
Now I have a pig.jar file, so I try to compile this file.
javac -cp ./trunk/pig.jar SimpleTextLoader.java
jar -cf SimpleTextLoader.jar SimpleTextLoader.class
It compiles successfully. I then start Pig to enter grunt, and in grunt I try to load the file using
grunt> register file:/home/hadoop/myudfs.jar
grunt> raw = LOAD 's3://mys3bucket/samplelogs/applog.log' USING myudfs.SimpleTextLoader('=') AS (key:chararray, value:chararray);
2012-12-05 00:08:26,737 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc Details at logfile: /home/hadoop/pig_1354666051892.log
Inside the pig_1354666051892.log, it has
Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org/apache/pig/LoadFunc
java.lang.NoClassDefFoundError: org/apache/pig/LoadFunc
I also tried another UDF (UPPER.java) from http://wiki.apache.org/pig/UDFManual, and I still get the same error when trying to use the UPPER method. Can you please help me out? What's the problem here? Many thanks!
UPDATE: I did try the EMR built-in pig.jar at /home/hadoop/lib/pig/pig.jar, and got the same problem.
Put the UDF jar in the /home/hadoop/lib/pig directory or copy the pig-*-amzn.jar file to /home/hadoop/lib and it will work.
You would probably use a bootstrap action to do either of these.
Most of the Hadoop ecosystem tools like pig and hive look up $HADOOP_HOME/conf/hadoop-env.sh for environment variables.
I was able to resolve this issue by adding pig-0.13.0-h1.jar (it contains all the classes required by the UDF) to the HADOOP_CLASSPATH:
export HADOOP_CLASSPATH=/home/hadoop/pig-0.13.0/pig-0.13.0-h1.jar:$HADOOP_CLASSPATH
pig-0.13.0-h1.jar is available in the Pig home directory.
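Separately from the classpath fix, the log format shown in the question (KEY=VALUE lines, dashed separator lines, and an EOE line closing each record) is what a custom loader would need to handle. As a rough illustration of that parsing logic only, here is a sketch in Python rather than a Pig LoadFunc; the function name parse_records is hypothetical.

```python
def parse_records(lines):
    """Parse KEY=VALUE lines into dicts; an 'EOE' line closes a record."""
    records, current = [], {}
    for raw in lines:
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and dashed separator lines
        if line == "EOE":
            records.append(current)
            current = {}
        elif "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    if current:
        records.append(current)  # tolerate a final record without EOE
    return records
```

Values are kept as strings; converting Date or PID to integers would be a separate step.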
