Error occurred when loading CSV file into Spark DataFrame in RStudio - rstudio

I used the following code to read a CSV file into a Spark DataFrame in RStudio. An error occurred and I could not resolve it.

How can this error be resolved when reading a CSV file into a SparkDataFrame in RStudio, without using HDFS, Amazon Web Services, or file path protocols?
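The code from the question is not shown, so as a reference point here is a minimal sketch of the same operation written against Spark's Java API, assuming a Spark 2.x installation; the class name, application name, and CSV path are placeholders. Pointing at a file:// path keeps HDFS and S3 out of the picture entirely. (In SparkR the analogous call would be something like read.df("file:///home/user/data/sample.csv", source = "csv", header = "true").)

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvIntoDataFrame {
    public static void main(String[] args) {
        // Local mode: no cluster, HDFS, or S3 is required.
        SparkSession spark = SparkSession.builder()
                .appName("csv-into-dataframe")
                .master("local[*]")
                .getOrCreate();

        // "file://" forces the local file system; the path is a placeholder.
        Dataset<Row> df = spark.read()
                .option("header", "true")       // first line holds column names
                .option("inferSchema", "true")  // guess column types instead of reading all as strings
                .csv("file:///home/user/data/sample.csv");

        df.printSchema();
        df.show(5);

        spark.stop();
    }
}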

Related

Jar file not found exception when running a MapReduce job copying data from HBase

I tried to execute the following command to copy data from HBase to another cluster in an HBase client environment. The command I ran is:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=[destination zk]:/hbase [source table name]
I got this error:
Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://servername:8020/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
The /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar is on my local path, but it does not exist in HDFS. It seems the CopyTable utility is submitting a MapReduce job without the dependency jars. I read a few articles, and it seems the only solution is to upload the jar to HDFS at the same path. This is really an ugly solution.
Please kindly advise. Thanks!
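For what it is worth, the workaround described above (putting the missing jar at the same absolute path in HDFS) amounts to a single copy; the sketch below shows it with the Hadoop FileSystem Java API, using the namenode address from the error message, and is equivalent to running hadoop fs -put on the command line. This is only an illustration of that workaround, not a nicer alternative.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadJarToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Namenode address as it appears in the FileNotFoundException above.
        FileSystem fs = FileSystem.get(URI.create("hdfs://servername:8020"), conf);

        // Copy the local jar to the same absolute path in HDFS,
        // which is where the CopyTable MapReduce job is looking for it.
        fs.copyFromLocalFile(
                new Path("/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar"),
                new Path("/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar"));

        fs.close();
    }
}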

Pentaho job "Hadoop File Output" writing onto pentaho server instead of HDFS path

I am using the "Hadoop File output" component. Sometimes it works correctly but sometimes it tries to write the file on pentaho server itself. Has anyone faced this issue?

Issue with load data into HIVE

We have launched two EMR clusters in AWS, installing Hadoop with hive-0.11.0 on one and hive-0.13.1 on the other.
Everything seems to be working fine, but when trying to load data into a table it gives the error below, and this happens on both Hive servers.
ERROR MESSAGE:
An error occurred when executing the SQL command: load data inpath 's3://buckername/export/employee_1/' into table employee_2
Query returned non-zero code: 10028, cause: FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''s3://buckername/export/employee_1/'': Move from: s3://buckername/export/employee_1 to: hdfs://XXX.XX.XXX.XX:X000/mnt/hive_0110/warehouse/employee_2 is not valid. Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict. [SQL State=42000, DB Errorcode=10028]
I searched for the reason and meaning of this message and found this link, but when I tried to execute the command suggested there, it also gives the error below.
Command:
--service metatool -updateLocation hdfs://XXX.XX.XXX.XX:X000 hdfs://XXX.XX.XXX.XX:X000
Initializing HiveMetaTool..
HiveMetaTool: Parsing failed. Reason: Unrecognized option: -hiveconf
Any help in this will be really appreciated.
LOAD does not support S3. It is best practice to leave data in S3 and just use it as a Hive external table instead of copying the data to HDFS. Some references: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive-additional-features.html and "When you create an external table in Hive with an S3 location, is the data transferred?"
If you have installed Hive on your Hadoop cluster, the default storage for Hive data is HDFS (hive.metastore.warehouse.dir=/user/hive/warehouse).
As a workaround, you can copy the file from the S3 file system to HDFS and then load it into Hive from HDFS.
Most probably you will also need to modify the parameter "hive.exim.uri.scheme.whitelist=hdfs,pfile" to load data from the S3 file system.
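To make the external-table recommendation concrete, here is a minimal sketch issued through the HiveServer2 JDBC driver, assuming HiveServer2 is reachable on the EMR master node; the host, column list, and row format are placeholders, while the table name and bucket path are taken from the error message above. The point is that LOCATION refers to S3 directly, so nothing is moved into the HDFS warehouse directory.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ExternalTableOverS3 {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://emr-master-host:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // The DDL only records the S3 location in the metastore;
            // the files stay in S3 and no LOAD / move into HDFS is needed.
            stmt.execute(
                "CREATE EXTERNAL TABLE IF NOT EXISTS employee_2 ("
                + " id INT, name STRING, salary DOUBLE)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                + " LOCATION 's3://buckername/export/employee_1/'");
        }
    }
}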

HDInsight VM initialization error when loading data

I am trying to run this getting-started sample for loading data into my single-node HDInsight Hadoop cluster. When I run the sample, I get the error shown below:
c:\Hadoop\GettingStarted>powershell -ExecutionPolicy unrestricted -F importdata.
ps1 w3c
Attempting to import scenario w3c
Path
----
C:\Hadoop\GettingStarted\w3c
Error occurred during initialization of VM
java.nio.charset.IllegalCharsetNameException:
at java.nio.charset.Charset.checkName(Charset.java:273)
at java.nio.charset.Charset.lookup2(Charset.java:458)
at java.nio.charset.Charset.lookup(Charset.java:437)
at java.nio.charset.Charset.defaultCharset(Charset.java:579)
at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:37)
at java.io.OutputStreamWriter.<init>(OutputStreamWriter.java:94)
at java.io.PrintStream.<init>(PrintStream.java:100)
at java.lang.System.initializeSystemClass(System.java:1092)
It seems this issue is related to file write permissions when creating the data file on your machine.

Loading files into hive through JDBC

I'm getting this error when trying to load a file into Hive through its JDBC driver. The Hive instance is running on a VM. The file loads perfectly fine when I load it through the Hive command line. The file is located on the same instance as Hive. I hope JDBC supports the LOAD command.
java.sql.SQLException: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Line 1:23 Invalid path ''/home/cloudera/Desktop/test.csv'': No files matching path file:/home/cloudera/Desktop/test.csv
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:189)
at Main.main(Main.java:55)
Since Hive in turn runs in a MapReduce environment, you need to provide an HDFS path for the CSV file (not a local path) when using Hive JDBC. When running through the Hive CLI, a local path works because the CLI takes care of uploading the file to HDFS before loading it into the table.
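As a minimal sketch of that suggestion, the snippet below uses the same HiveServer1-style JDBC driver that appears in the stack trace (with HiveServer2 the class would be org.apache.hive.jdbc.HiveDriver and the URL scheme jdbc:hive2://). The host, table name, and HDFS path are placeholders; the substantive change is that INPATH points at a location in HDFS rather than at /home/cloudera/Desktop/test.csv.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadCsvViaHiveJdbc {
    public static void main(String[] args) throws Exception {
        // Driver matching the org.apache.hadoop.hive.jdbc classes in the stack trace.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive://hive-vm-host:10000/default", "", "");
             Statement stmt = conn.createStatement()) {

            // Upload the file to HDFS first, e.g.:
            //   hadoop fs -put /home/cloudera/Desktop/test.csv /user/cloudera/test.csv
            // and then reference the HDFS path, not the local one.
            stmt.execute(
                "LOAD DATA INPATH '/user/cloudera/test.csv' INTO TABLE test_table");
        }
    }
}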
