Error while connecting Elastic Map Reduce ruby client - hadoop

I am following the steps mentioned on the AWS to use an interactive Hive session using SSH.
I used the following resources
https://github.com/ucbtwitter/getting-started/wiki/Using-Elastic-Map-Reduce-via-Command-Line
http://docs.amazonwebservices.com/ElasticMapReduce/latest/GettingStartedGuide/SignUp.html
I was getting this error initially "Error: Missing key access-id" and then I fixed my JSON file. The JSON file is in the same format as mentioned in the above links.
When I run this command
./elastic-mapreduce
I am getting the following error :-
Error: Unable to parse credentials.json: can't convert String into Integer.
I checked the values required in JSON at AWS as well.
Does anyone has an idea why am I getting this error?

The region value in the credentials.json must be of int type.
{......
......
"region": 1
}

Related

Can't use 'put'() to add data to hbase with happybase

My python version is 3.7, and after I ran pip3 install happybase, I started the command hbase thrift start and tried to write a brief .py file as following:
import happybase
connection = happybase.Connection('master')
table = connection.table('jmlr') #'jmlr' is a table in hbase
for i in table.scan():
print(i)
table.put('001', {'title':'dasds'}) #error here
connection.close()
When it's about to run table.put(), it reported such an error:
thriftpy2.transport.base.TTransportException: TTransportException(type=4, message='TSocket read 0 bytes')
And at the same time, the thrift reported an error:
ERROR [thrift-worker-1] thrift.TBoundedThreadPoolServer: Error occurred during processing of message. java.lang.IllegalArgumentException: Invalid famAndQf provided.
But just now I ran this python file again, it gave me a different error in thrift:
thrift.TBoundedThreadPoolServer: Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Bad version in readMessageBegin
I have tried to add parameters like protocol='compact', transport='framed', but this didn't work, even the table.scan() failed.
Everything in the hbase shell is OK, so I can't figure out what went wrong, I'm about to collapse.
I ran into the same issue and found this sollution. You need to add even empty Column Qualifier ( ':' symbol as delimiter between Column Family and Column Qualifier) into put() method:
table.put('001:', {'title':'dasds'})
Also, you have a different error message after second run of script because thrift server is already failed.
I hope it will help you.

Error while running yaml from cloudformation

Getting error when I run following code:
Parameters:
Counter:
Type : Number
Default : 5
Description : Maximum number of times to check query execution
Error:
An error occurred (ValidationError) when calling the CreateChangeSet operation: Invalid input for parameter key Counter. Cannot specify usePreviousValue as true for a parameter key not in the previous template
I am writing code in yaml and running via AWS cloudformation.
Are you creating a ChangeSet or updating the stack using the option usePreviousValue? The error mentions you are using the usePreviousValue with a parameter that doesn't exist in the template. You can only use the previous value if this parameter is part of the latest version of the template.

Required field 'uncompressed_page_size' was not found in serialized data! Parquet

I am getting below error while trying to save parquet file from local directory using pyspark.
I tried spark 1.6 and 2.2 both give same error
It display's schema properly but gives error at the time of writing file.
base_path = "file:/Users/xyz/Documents/Temp/parquet"
reg_path = "file:/Users/xyz/Documents/Temp/parquet/ds_id=48"
df = sqlContext.read.option( "basePath",base_path).parquet(reg_path)
out_path = "file:/Users/xyz/Documents/Temp/parquet/out"
df2 = df.coalesce(5)
df2.printSchema()
df2.write.mode('append').parquet(out_path)
org.apache.spark.SparkException: Task failed while writing rows
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
In my own case, I was writing a custom Parquet Parser for Apache Tika and I experienced this error. It turned out that if the file is being used by another process, the ParquetReader will not be able to access uncompressed_page_size. Hence, causing the error.
Verify if other processes are not holding on to the file.
Temporary resolved by the spark config:
"spark.sql.hive.convertMetastoreParquet": "false"
Although it would has extra cost, but a walkaround approach by now.

Reading sas file from blob storage in R

I am trying to read .sas7bdat file from default container. I have tried following till now:
sas_file <- RxSasData("wasbs://container#storageaccount.blob.core.windows.net/abc/xyz.sas7bdat")
sas_df <- rxImport(sas_file)
but I get following error:
The file 'wasbs://container#storageaccount.blob.core.windows.net/abc/xyz.sas7bdat' does not exist.
Could not open data source.
Error in doTryCatch(return(expr), name, parentenv, handler) :
Could not open data source.
File exists at the mentioned location in code. Still it throws error. Can someone please help me this?
According to your code, I think you want to local a SAS data file from HDFS on Azure HDInsight via RxSasData. However, RxSasData seems to be not supported on Hadoop env, as the figure below, please see here.
Please try to copy the file to local filesystem on HDI, then to read.

PIG Cassandra ERROR 2118 Could not get input splits

I started off trying to do simple pig+cassandra integration with this tutorial from datastax: http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/ana/anaPigExRel.html
but when i try to store the result into cql, i get this error:
Message: org.apache.pig.backend.executionengine.ExecException: ERROR
2118: Could not get input splits
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:279)
any ideas whats happening? i read some answers here, referring to changing my PIG_PARTITIONER to Murmur3Partitioner
which i already did and it still happens. is it configuration issue?
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
I found out that after doing:
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
i need to do source ~/.bashrc and do pig from that particular console.
though I get another error, but I think this case is solved.

Resources