Unable to create Hive table, flaky metastore connections - hadoop

I'm following this blog post, which partitions S3 access logs by date using Hive and EMR. I was able to run the script against a small bucket of access logs without issue, but table creation on top of a large bucket (~1.5 TB) fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.transport.TTransportException
I looked through the Hive logs under /mnt/var/log/hive, but nothing stands out. I'm not sure what the problem is, as this error is pretty generic. I'm following the blog post pretty much verbatim, and the script errors out after 10 or 15 minutes on the following statement:
CREATE EXTERNAL TABLE IF NOT EXISTS Accesslogs(....
Update: I found more log info and also ran Hive in debug mode. EMR gets intermittent connection failures to the metastore and then finally fails:
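(For reference, the verbose trace below came from running Hive with debug logging switched on, along the lines of the standard CLI override shown here; I'm not certain the EMR wrapper passes it through identically:)
hive --hiveconf hive.root.logger=DEBUG,console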
.........
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [hadoop-common-2.7.3-amzn-5.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [hadoop-common-2.7.3-amzn-5.jar:?]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_151]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_151]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_151]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_151]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_151]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_151]
at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
... 33 more
2017-12-10T15:18:18,718 INFO [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(506)) - Waiting 1 seconds before next connection attempt.
2017-12-10T15:18:19,719 INFO [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(392)) - Trying to connect to metastore with URI thrift://ip-172-50-31-107.ec2.internal:9083
2017-12-10T15:18:19,719 WARN [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(472)) - Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:226) ~[hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:465) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.reconnect(HiveMetaStoreClient.java:335) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:163) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at com.sun.proxy.$Proxy37.createTable(Unknown Source) [?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2303) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at com.sun.proxy.$Proxy37.createTable(Unknown Source) [?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:854) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) [hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) [hive-cli-2.3.1-amzn-0.jar:2.3.1-amzn-0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [hadoop-common-2.7.3-amzn-5.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [hadoop-common-2.7.3-amzn-5.jar:?]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_151]
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_151]
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_151]
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_151]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_151]
at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_151]
at org.apache.thrift.transport.TSocket.open(TSocket.java:221) ~[hive-exec-2.3.1-amzn-0.jar:2.3.1-amzn-0]
... 33 more
2017-12-10T15:18:19,720 INFO [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(506)) - Waiting 1 seconds before next connection attempt.
2017-12-10T15:18:20,721 INFO [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(392)) - Trying to connect to metastore with URI thrift://ip-172-50-31-107.ec2.internal:9083
2017-12-10T15:18:20,721 INFO [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(466)) - Opened a connection to metastore, current connections: 1
2017-12-10T15:18:20,795 INFO [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: hive.metastore (HiveMetaStoreClient.java:open(519)) - Connected to metastore.
2017-12-10T15:18:28,013 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:18:28,014 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:19:28,014 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:19:28,014 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:20:28,014 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:20:28,014 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:21:28,015 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:21:28,015 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:22:28,015 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:22:28,015 DEBUG [java-sdk-http-connection-reaper([])]: conn.PoolingHttpClientConnectionManager (PoolingHttpClientConnectionManager.java:closeIdleConnections(401)) - Closing connections idle longer than 60000 MILLISECONDS
2017-12-10T15:22:44,472 ERROR [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: exec.DDLTask (DDLTask.java:failed(639)) - org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:864)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4356)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:354)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1199)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1185)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2372)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:93)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:737)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:725)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy37.createTable(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2303)
at com.sun.proxy.$Proxy37.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:854)
... 22 more
2017-12-10T15:22:44,472 ERROR [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: ql.Driver (SessionState.java:printError(1126)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.thrift.transport.TTransportException
2017-12-10T15:22:44,472 DEBUG [e74af478-3227-4bf9-9fde-74d8babcf5f0 main([])]: ql.Driver (DriverContext.java:shutdown(132)) - Shutting down query CREATE EXTERNAL TABLE IF NOT EXISTS Accesslogs(
BucketOwner string,
Bucket string,
RequestDateTime string,
RemoteIP string,
Requester string,
RequestID string,
Operation string,
Key string,
RequestURI_operation string,
RequestURI_key string,
RequestURI_httpProtoversion string,
HTTPstatus string,
ErrorCode string,
BytesSent string,
ObjectSize string,
TotalTime string,
TurnAroundTime string,
Referrer string,
UserAgent string,
VersionId string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1',

At a guess, Hive is trying to do something that is fast on a real filesystem (a recursive tree walk, a rename) but keels over when it gets to S3, where all of those operations are faked in the client.
Consider filing a JIRA against Hive on this; include any server-side logs you can, and try differently sized buckets to see at what scale things start to fail.
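In the meantime, one thing that may be worth trying (a guess at a mitigation, not a confirmed fix) is giving the metastore client more headroom while the long CREATE TABLE runs, e.g. in hive-site.xml. The property names are standard Hive settings; the values below are arbitrary examples:
<!-- allow long metastore calls to finish instead of timing out -->
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>1200s</value>
</property>
<!-- retry the thrift connection a few more times before giving up -->
<property>
  <name>hive.metastore.connect.retries</name>
  <value>10</value>
</property>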

Related

Team site connection issue from my PC only; others are able to connect

com.interwoven.cssdk.common.CSException: (java.net.SocketException: Connection reset)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.interwoven.cssdk.client.axis.common.AxisExceptionTranslator.translateException(AxisExceptionTranslator.java:72)
at com.interwoven.cssdk.client.axis.access.AccessServiceAdapterImpl.beginSessionUsingPassword(AccessServiceAdapterImpl.java:228)
at com.interwoven.cssdk.client.axis.common.AxisFactory.getClient(AxisFactory.java:273)
Root cause:
AxisFault
faultCode: {http://schemas.xmlsoap.org/soap/envelope/}Server.userException
faultSubcode:
faultString: java.net.SocketException: Connection reset
faultActor:
faultNode:
faultDetail:
{http://xml.apache.org/axis/}stackTrace:java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read(BufferedInputStream.java:265)

Cleaning staging area error: Hadoop

I am running a MapReduce job on Hadoop and am getting the following error:
INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/home/dataflair/hdata/mapred/staging/intern124288574/.staging/job_local124288574_0001
Exception in thread "main" ExitCodeException exitCode=1: chmod: cannot access '/home/dataflair/hdata/mapred/staging/intern124288574/.staging/job_local124288574_0001': No such file or directory
at org.apache.hadoop.util.Shell.runCommand(Shell.java:998)
at org.apache.hadoop.util.Shell.run(Shell.java:884)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1216)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1310)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1292)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
at org.apache.hadoop.fs.ChecksumFileSystem$1.apply(ChecksumFileSystem.java:506)
at org.apache.hadoop.fs.ChecksumFileSystem$FsOperation.run(ChecksumFileSystem.java:487)
at org.apache.hadoop.fs.ChecksumFileSystem.setPermission(ChecksumFileSystem.java:503)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:720)
at org.apache.hadoop.mapreduce.JobResourceUploader.mkdirs(JobResourceUploader.java:648)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResourcesInternal(JobResourceUploader.java:167)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:128)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:101)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
at wordCount.WordCount.main(WordCount.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

Why does DataFrame.saveAsTable("df") save the table to a different HDFS host?

I have configured Hive (1.13.1) with Spark (1.4.0). I am able to access all the databases and tables from Hive, and my warehouse directory is hdfs://192.168.1.17:8020/user/hive/warehouse.
But when I try to save a DataFrame into Hive through the spark-shell (using the master) with the df.saveAsTable("df") function, I get this error:
15/07/03 14:48:59 INFO audit: ugi=user ip=unknown-ip-addr cmd=get_database: default
15/07/03 14:48:59 INFO HiveMetaStore: 0: get_table : db=default tbl=df
15/07/03 14:48:59 INFO audit: ugi=user ip=unknown-ip-addr cmd=get_table : db=default tbl=df
java.net.ConnectException: Call From bdiuser-Vostro-3800/127.0.1.1 to 192.168.1.19:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1762)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.run(commands.scala:78)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:332)
at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:239)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:939)
at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:939)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1517)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC.<init>(<console>:37)
at $iwC.<init>(<console>:39)
at <init>(<console>:41)
at .<init>(<console>:45)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:604)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:699)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
at org.apache.hadoop.ipc.Client.call(Client.java:1381)
... 86 more
Going through this error, I found that the program tried a different host for the HDFS connection when saving the table.
I also tried the spark-shell from a different worker and got the same error.
Please find the example below:
val options = Map("path" -> hiveTablePath)
result.write.format("orc").partitionBy("partitiondate").options(options).mode(SaveMode.Append).saveAsTable(hiveTable)
I have explained this a little bit more in my blog.
With saveAsTable the default location that Spark saves to is controlled by the HiveMetastore (based on the docs). Another option would be to use saveAsParquetFile and specify the path and then later register that path with your hive metastore OR use the new DataFrameWriter interface and specify the path option write.format(source).mode(mode).options(options).saveAsTable(tableName).
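A rough sketch of that last option, reusing the warehouse URI from the question as an explicitly pinned path (the path and table name here are only illustrative):
import org.apache.spark.sql.SaveMode

// Pin the table data to an explicit namenode/path instead of whatever
// fs.defaultFS the shell resolves, then register it in the metastore.
val opts = Map("path" -> "hdfs://192.168.1.17:8020/user/hive/warehouse/df")
df.write.format("parquet").mode(SaveMode.Overwrite).options(opts).saveAsTable("df")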
You can write a Spark DataFrame to an existing Spark table.
Please find the example below:
df.write.mode("overwrite").saveAsTable("database.tableName")

apache thrift transport TTransportException

Hive Version : 0.13.1
Pig Version : 0.13.0
I was trying to read Hive tables from Pig with the command below:
grunt> DATA = LOAD 'dev.profile' USING org.apache.hcatalog.pig.HCatLoader();
I get the following piece of log output:
2014-07-16 22:44:58,986 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
2014-07-16 22:44:59,037 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://localhost:10000
2014-07-16 22:44:59,057 [main] INFO hive.metastore - Connected to metastore.
2014-07-16 22:45:02,019 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
2014-07-16 22:45:02,166 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
When I describe the relation, the results come back as expected:
grunt> describe DATA
2014-07-16 22:46:42,189 [main] WARN org.apache.hadoop.hive.conf.HiveConf - DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
DATA: {name: chararray,age: int,salary: int}
But when I dump the data, I get a SocketTimeoutException:
2014-07-16 22:47:25,146 [main] ERROR hive.log - Got exception: org.apache.thrift.transport.TTransportException java.net.SocketTimeoutException: Read timed out
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_databases(ThriftHiveMetastore.java:600)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_databases(ThriftHiveMetastore.java:587)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:826)
at org.apache.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:276)
at org.apache.hcatalog.common.HiveClientCache.get(HiveClientCache.java:146)
at org.apache.hcatalog.common.HCatUtil.getHiveClient(HCatUtil.java:548)
at org.apache.hcatalog.pig.PigHCatUtil.getHiveMetaClient(PigHCatUtil.java:158)
at org.apache.hcatalog.pig.PigHCatUtil.getTable(PigHCatUtil.java:200)
at org.apache.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:195)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:885)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1712)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1420)
at org.apache.pig.PigServer.storeEx(PigServer.java:1004)
at org.apache.pig.PigServer.store(PigServer.java:974)
at org.apache.pig.PigServer.openIterator(PigServer.java:887)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 40 more
2014-07-16 22:47:25,148 [main] ERROR hive.log - Converting exception to MetaException
2014-07-16 22:47:25,151 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://localhost:10000
2014-07-16 22:47:25,152 [main] INFO hive.metastore - Connected to metastore.
2014-07-16 22:47:45,173 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
Failed to parse: Can not retrieve schema from loader org.apache.hcatalog.pig.HCatLoader#1342464f
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1712)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1420)
at org.apache.pig.PigServer.storeEx(PigServer.java:1004)
at org.apache.pig.PigServer.store(PigServer.java:974)
at org.apache.pig.PigServer.openIterator(PigServer.java:887)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:752)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:228)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:203)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:542)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.RuntimeException: Can not retrieve schema from loader org.apache.hcatalog.pig.HCatLoader#1342464f
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:91)
at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:885)
at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3568)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1625)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 17 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2245: Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:179)
at org.apache.pig.newplan.logical.relational.LOLoad.<init>(LOLoad.java:89)
... 24 more
Caused by: java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.hcatalog.pig.PigHCatUtil.getTable(PigHCatUtil.java:205)
at org.apache.hcatalog.pig.HCatLoader.getSchema(HCatLoader.java:195)
at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:175)
... 25 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table(ThriftHiveMetastore.java:1036)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table(ThriftHiveMetastore.java:1022)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:997)
at org.apache.hcatalog.common.HCatUtil.getTable(HCatUtil.java:194)
at org.apache.hcatalog.pig.PigHCatUtil.getTable(PigHCatUtil.java:201)
... 27 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 37 more
2014-07-16 22:47:45,176 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2245: Cannot get schema from loadFunc org.apache.hcatalog.pig.HCatLoader
Even though I am able to connect to the metastore, I am not able to retrieve the data. What could be the reason for the read failing?
At times the process also fails with java.lang.OutOfMemoryError: Java heap space.
Any help would be greatly appreciated.
Edit hive-site.xml and replace hive.metastore.ds.retry with hive.hmshandler.retry:
vim /usr/local/Cellar/hive/0.13.1/libexec/conf/hive-site.xml
:%s/hive.metastore.ds.retry/hive.hmshandler.retry/g
:wq
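After the substitution, the renamed entries would look roughly like this (the attempt/interval values are just placeholders; the interval is in milliseconds in Hive 0.13):
<property>
  <name>hive.hmshandler.retry.attempts</name>
  <value>5</value>
</property>
<property>
  <name>hive.hmshandler.retry.interval</name>
  <value>2000</value>
</property>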

Error while copying files from the local file system to HDFS in Hadoop

I am using Hadoop to process files. Presently I am trying to copy files from the local file system to HDFS using the command below:
hadoop fs -put d:\hadoop\weblogs /so/data/weblogs
I got the error below:
c:\Hadoop\hadoop-1.1.0-SNAPSHOT>hadoop fs -put d:\hadoop\weblogs /so/data/weblogs
12/12/03 19:05:16 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /so/data/weblogs/weblogs/u_ex12110418.log could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1557)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:695)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3518)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3381)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2500(DFSClient.java:2593)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2833)
12/12/03 19:05:16 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
12/12/03 19:05:16 WARN hdfs.DFSClient: Could not get block locations. Source file "/so/data/weblogs/weblogs/u_ex12110418.log" - Aborting...
put: java.io.IOException: File /so/data/weblogs/weblogs/u_ex12110418.log could only be replicated to 0 nodes, instead of 1
12/12/03 19:05:16 ERROR hdfs.DFSClient: Exception closing file /so/data/weblogs/weblogs/u_ex12110418.log : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /so/data/weblogs/weblogs/u_ex12110418.log could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1557)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:695)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /so/data/weblogs/weblogs/u_ex12110418.log could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1557)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:695)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3518)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3381)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2500(DFSClient.java:2593)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2833)
c:\Hadoop\hadoop-1.1.0-SNAPSHOT>
Can anyone please let me know what's wrong with the above command, and what needs to be done to avoid this error?
One of:
The remote system with HDFS (as specified in the config) hasn't been started
There is a network problem preventing connectivity with the remote system
You ran out of disk space on the HDFS file system
You have the wrong remote system configured.
A DataNode of your Hadoop cluster is not running. Check it using the jps command.
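A quick sanity check along those lines (assuming the Hadoop 1.x commands are on the PATH; the restart line is only a typical example and depends on how the cluster was installed):
jps                       # should show a DataNode process on each worker node
hadoop dfsadmin -report   # lists live/dead datanodes and remaining HDFS capacity
hadoop-daemon.sh start datanode   # restart the DataNode on a node where it is down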

Resources