Parquet-MR library throws an exception while reading a FIXED_LEN_BYTE_ARRAY / UUID column

I have a Parquet file with a FIXED_LEN_BYTE_ARRAY / UUID column. When I read it with the parquet-mr library, I get this exception:
Caused by: org.apache.parquet.io.ParquetDecodingException: The requested schema is not compatible with the file schema. incompatible types: required binary Identity (STRING) != required fixed_len_byte_array(16) Identity (UUID)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.incompatibleSchema(ColumnIOFactory.java:101)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:93)
at org.apache.parquet.schema.PrimitiveType.accept(PrimitiveType.java:602)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visitChildren(ColumnIOFactory.java:83)
at org.apache.parquet.io.ColumnIOFactory$ColumnIOCreatorVisitor.visit(ColumnIOFactory.java:57)
at org.apache.parquet.schema.MessageType.accept(MessageType.java:55)
at org.apache.parquet.io.ColumnIOFactory.getColumnIO(ColumnIOFactory.java:162)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:135)
***
I am using the latest parquet-mr release, 1.12.0.
When I feed the same file to the parquet-cpp library, it decodes it without problems. Is there a known issue in parquet-mr with UUID columns?
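The exception compares two different Parquet types. The requested schema asks for `binary` annotated as STRING, but the file stores the column as `fixed_len_byte_array(16)` annotated as UUID.

In the Parquet format specification, the UUID logical type is 16 raw bytes in big-endian order, while STRING is variable-length UTF-8 text. A reader therefore cannot treat one as the other without a conversion.

The minimal sketch below illustrates the difference in representation. It uses Python's standard `uuid` module, not parquet-mr. The helper names and the sample UUID are made up.

```python
import uuid


def uuid_to_fixed16(text: str) -> bytes:
    """Encode a UUID string as the 16 raw big-endian bytes used by the
    Parquet UUID logical type (FIXED_LEN_BYTE_ARRAY(16))."""
    return uuid.UUID(text).bytes


def fixed16_to_uuid_text(raw: bytes) -> str:
    """Decode a 16-byte FIXED_LEN_BYTE_ARRAY value into canonical UUID text."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


if __name__ == "__main__":
    sample = "123e4567-e89b-12d3-a456-426614174000"  # made-up example value
    raw = uuid_to_fixed16(sample)
    # Physical sizes differ: 16 bytes for UUID vs 36 bytes of UTF-8 text for STRING.
    print(len(raw), len(sample.encode("utf-8")))  # 16 36
    print(fixed16_to_uuid_text(raw) == sample)     # True
```

Because the stored bytes are not text, one option is a requested schema that keeps the column's type as in the file, with the 16 bytes converted to text afterwards. Whether that is possible depends on how your application builds the requested schema.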
-DevD

Related

Feature type rename fails after 9 renames

I'm uploading a shapefile to GeoServer, and I get an error when I upload the same feature type 10 times into different datastores.
First I get this warning:
WARN [rest.catalog] - Feature type surface_zone_line-line already exists in namespace MyWorkSpace, attempting to rename
and on the next line this error:
ERROR java.lang.RuntimeException: java.lang.IllegalArgumentException: Resource named 'surface_zone_line-line9' already exists in namespace: 'MyWorkSpace'
The renaming works for the first 9 feature types but fails for the 10th.
Please help!
GeoServer Version 2.14.1

Cannot create Hive external table using jdbcStorageHandler

I am running a small cluster on Amazon EMR to experiment with Apache Hive 2.3.5. My understanding is that Apache Hive can import data from a remote database and let the cluster run queries on it. I followed the example in the Apache Hive documentation (https://cwiki.apache.org/confluence/display/Hive/JdbcStorageHandler) and wrote the following code:
CREATE EXTERNAL TABLE hive_table
(
col1 int,
col2 string,
col3 date
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
'hive.sql.database.type'='POSTGRES',
'hive.sql.jdbc.driver'='org.postgresql.Driver',
'hive.sql.jdbc.url'='jdbc:postgresql://<url>/<dbname>',
'hive.sql.dbcp.username'='<username>',
'hive.sql.dbcp.password'='<password>',
'hive.sql.table'='<dbtable>',
'hive.sql.dbcp.maxActive'='1'
);
But I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: Property hive.sql.query is required.)
According to the documentation, I need to specify either “hive.sql.table” or “hive.sql.query” to tell Hive how to get data from the JDBC database. But if I replace hive.sql.table with hive.sql.query, I get the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.POSTGRES)
I searched the web for a solution, but nobody else seems to have run into the same issue. Do I need to modify a config file, or am I missing something critical in my code?
I think you are using a version of the jar that doesn't support POSTGRES.
Download a newer version of the jar (3.1.2) from this link:
http://repo1.maven.org/maven2/org/apache/hive/hive-jdbc-handler/3.1.2/hive-jdbc-handler-3.1.2.jar
Put the downloaded jar in an HDFS location.
Start Hive as usual.
Run the command: add jar ${HDFS_PATH_TO_DOWNLOADED_JAR};
Run your CREATE TABLE statement again.

Required field 'uncompressed_page_size' was not found in serialized data! Parquet

I get the error below when I try to save a Parquet file, read from a local directory, using PySpark.
I tried Spark 1.6 and 2.2, and both give the same error.
The schema is displayed correctly, but writing the file fails.
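from pyspark import SparkContext
from pyspark.sql import SQLContext

# The pyspark shell already defines sc and sqlContext; a standalone script has to create them.
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

base_path = "file:/Users/xyz/Documents/Temp/parquet"
reg_path = "file:/Users/xyz/Documents/Temp/parquet/ds_id=48"
df = sqlContext.read.option("basePath", base_path).parquet(reg_path)
out_path = "file:/Users/xyz/Documents/Temp/parquet/out"
df2 = df.coalesce(5)
df2.printSchema()
df2.write.mode('append').parquet(out_path)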
org.apache.spark.SparkException: Task failed while writing rows
Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
In my case, I hit this error while writing a custom Parquet parser for Apache Tika. It turned out that when the file was in use by another process, the ParquetReader could not read uncompressed_page_size, which caused the error.
Check that no other process is holding on to the file.
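One way to act on this advice is to check that each input file looks complete before running the Spark job. This check is an addition, not part of the answer above.

A Parquet file starts and ends with the 4-byte magic `PAR1`. The 4 bytes before the trailing magic hold the footer length as a little-endian integer. A file that is truncated, or still being written, is likely to fail this check.

The sketch below uses only the Python standard library, and its function names are hypothetical. Passing the check does not prove that the page data is intact; it only rules out the most obvious truncation.

```python
import os
import struct

MAGIC = b"PAR1"


def looks_like_complete_parquet(path: str) -> bool:
    """Cheap structural check: leading magic, trailing magic and a plausible footer length.

    Layout: 'PAR1' | data pages | footer | footer length (4 bytes, little-endian) | 'PAR1'
    """
    size = os.path.getsize(path)
    if size < 12:  # at least 4 (magic) + 4 (footer length) + 4 (magic)
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    (footer_len,) = struct.unpack("<i", tail[:4])
    return head == MAGIC and tail[4:] == MAGIC and 0 < footer_len <= size - 12


def suspicious_files(directory: str) -> list:
    """Return the .parquet files under `directory` that fail the check above."""
    bad = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if name.endswith(".parquet"):
                full = os.path.join(root, name)
                if not looks_like_complete_parquet(full):
                    bad.append(full)
    return bad
```

For the question above, this could be run on the local input directory without the `file:` prefix, e.g. `suspicious_files("/Users/xyz/Documents/Temp/parquet/ds_id=48")`.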
I temporarily resolved it with this Spark config:
"spark.sql.hive.convertMetastoreParquet": "false"
It has some extra cost, but it works as a workaround for now.

CONNECT ERROR: Package file is invalid magento error

I'm using the Everest theme and trying to upload a megamenu extension, and I get this error. Need help.
CONNECT ERROR: Package file is invalid
Invalid package name, allowed: [a-zA-Z0-9_-] chars
Invalid version, should be like: x.x.x
Invalid stability
Invalid date, should be YYYY-DD-MM
Invalid channel URL
Empty authors section
Empty package contents section
All you need to do is paste the extension key into http://freegento.com/ddl-magento-extension.php to download the extension files. Then copy the extension files to the root of the site.
Note: take a backup of the files and database before making any major changes.
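The error list names checks that Magento Connect applies to a package's metadata. The hypothetical sketch below mirrors four of them using Python's `re` module:

- the name pattern `[a-zA-Z0-9_-]`
- the `x.x.x` version shape
- the two "empty section" checks

It is a helper for sanity-checking your own package metadata, not Magento's validator, and the real rules may differ. The stability, date and channel checks are left out. All names and values in the example are made up.

```python
import re

# Patterns paraphrased from the CONNECT ERROR messages; Magento's own
# validator may apply different or additional rules.
NAME_RE = re.compile(r"[a-zA-Z0-9_-]+")
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def check_package_metadata(name, version, authors, contents):
    """Return the error messages a package with these fields would trigger."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("Invalid package name, allowed: [a-zA-Z0-9_-] chars")
    if not VERSION_RE.fullmatch(version):
        problems.append("Invalid version, should be like: x.x.x")
    if not authors:
        problems.append("Empty authors section")
    if not contents:
        problems.append("Empty package contents section")
    return problems


# Metadata that could not be read at all fails every check at once.
print(check_package_metadata("", "", [], []))

# Hypothetical, well-formed metadata passes.
print(check_package_metadata("Example_Megamenu", "1.0.2",
                             ["Jane Doe"], ["app/etc/modules/Example_Megamenu.xml"]))  # []
```

When every check fails together, as in the errors above, one plausible reading is that no usable package metadata was found in the uploaded file. That fits the answer's suggestion to download the extension files by key instead.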

Magento 1.6.2: Invalid package name, allowed: [a-zA-Z0-9_-] chars

I have installed Magento 1.6.2 and am trying to install a paid extension using “Direct Package file Upload”.
I use Magento Connect Manager 1.5.0.
(Installing free extensions with an extension key worked.)
Then I tried uploading one of the paid extensions I purchased using “Direct Package file Upload” and got the following error:
CONNECT ERROR: Package file is invalid
Invalid package name, allowed: [a-zA-Z0-9_-] chars
Invalid version, should be like: x.x.x
Invalid stability
Invalid date, should be YYYY-DD-MM
Invalid channel URL
Empty authors section
Empty package contents section
With another plugin I got:
File upload problem
or
CONNECT ERROR: Package file is invalid
Invalid channel URL
I'd appreciate help if anyone has resolved this issue before.
Thanks
