EMR - Cannot create external Hive table using jdbcstoragehandler - hadoop

I am trying to create an external Hive table on Postgres.
My first error was resolved as per the answer in the topic below:
Cannot create Hive external table using jdbcStorageHandler
But I hit another issue:
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException java.lang.IllegalArgumentException: No enum constant org.apache.hive.storage.jdbc.conf.DatabaseType.postgres)
Surprisingly, I could not find anything about this issue in any forum so far.
Has anyone encountered this error on EMR and resolved it?
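For context, the table is declared roughly like this (a minimal sketch assuming the Hive 3.x JdbcStorageHandler property names; the columns and connection details are placeholders, not my real ones):
CREATE EXTERNAL TABLE pg_orders (
  id INT,
  name STRING
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "POSTGRES",
  "hive.sql.jdbc.driver" = "org.postgresql.Driver",
  "hive.sql.jdbc.url" = "jdbc:postgresql://<host>:5432/<db>",
  "hive.sql.dbcp.username" = "<user>",
  "hive.sql.dbcp.password" = "<password>",
  "hive.sql.table" = "orders"
);
The database type value has to resolve to a constant of the DatabaseType enum (POSTGRES in current Hive versions), which is exactly what the exception above is complaining about.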

I finally resolved it and am posting the answer in case it helps someone.
The root cause was a conflicting old version of the same jar left in the Hive lib directory, so Hive was not picking up the new jar and was referring to the old one instead.
After I deleted the old jar, the problem was resolved.
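For anyone in the same spot, a quick way to check for the duplicate is to list the handler jars in the Hive lib directory (a sketch; /usr/lib/hive/lib is the usual location on EMR, and the jar name below is a placeholder for whatever versions you find):
ls -l /usr/lib/hive/lib/ | grep -i jdbc
# if both an old and a new handler jar show up, remove the stale one, e.g.:
sudo rm /usr/lib/hive/lib/hive-jdbc-handler-<old-version>.jar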

Related

Spark WebUI Application application_xyz not found

When I try to open the history of any Spark job I get this error: "Application_id: Application application_xyz not found".
NOTE:
Previously I figured out that this error occurred because one of the Spark history folders was full, but now I don't remember how I fixed it.
Any help is much appreciated.
To be able to access the Spark UI after an application has finished, you need a separate history server.
Start the server with
$SPARK_HOME/sbin/start-history-server.sh
and follow the configuration notes.
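The history server reads the event logs written by your applications, so both sides need to point at the same directory. A minimal sketch of the relevant spark-defaults.conf entries (the hdfs:///spark-logs path is just an example location):
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-logs
spark.history.fs.logDirectory    hdfs:///spark-logs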

Flyway 3.0 Migration Checksum mismatch

After upgrading the Flyway Maven plugin from 2.3 to 3.0 I get:
[ERROR] Failed to execute goal
org.flywaydb:flyway-maven-plugin:3.0:migrate (default-cli) on project
xxx: org.flywaydb.core.api.FlywayException: Validate failed. Found
differences between applied migrations and available migrations:
Migration Checksum mismatch for migration
V003__data_feed_sources_locations.sql: DB=942424992,
Classpath=1117634405 -> [Help 1]
I got a similar error on another project.
If I downgrade back to 2.3 the migration runs OK. Does this have something to do with different platform encodings for calculating checksums?
Any workaround, or better yet, proper solution?
Flyway 3.0 changed the default of validateOnMigrate to true.
This is, however, a good thing: in the spirit of fail fast, errors are discovered sooner.
In your case some scripts did change since they were applied, which is what Flyway is reporting.
You have two options:
suppress the error by setting validateOnMigrate to false (2.3 default behavior)
invoke Flyway.repair() to realign the checksums
Reference
Flyway Repair
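With the Maven plugin, both options boil down to one-liners (a sketch, assuming the plugin is already configured with your datasource):
mvn -Dflyway.validateOnMigrate=false flyway:migrate
mvn flyway:repair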
To add to Axel Fontaine's answer:
I was able to use mvn flyway:repair but I had to point the flyway.locations config property at the folder that contains my db migration scripts. Otherwise I would get the message "Repair of metadata table xyz.schema_version not necessary. No failed migration detected." like other folks mentioned.
I used mvn -Dflyway.locations=filesystem:<project dir>/src/main/resources/db/migrations flyway:repair and I saw the checksum updated in the metadata table, fixing my problem.
I found the easiest way to resolve this issue was to literally update the checksum in the schema table to the value Flyway expected. I knew for a fact that my migration files had not changed and that the current state of the database was what it needed to be. I also did not want to spend time reading documentation and messing around with Flyway.repair() or other methods that could potentially mess things up even more. A simple SQL update resolved it right away.
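Something along these lines (a sketch: the metadata table is schema_version in Flyway 3.x and flyway_schema_history in later versions; the checksum is the "Classpath" value from the validate error, and you should match the row by whatever your version or script column actually contains):
UPDATE schema_version
SET checksum = 1117634405
WHERE version = '003';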
First, Flyway validation looks for checksum changes. These changes occur if we update migration files which have already been applied to a database instance.
FlywayException: Validate failed: Migration checksum mismatch for migration version 18.2.6
-> Applied to database : 90181454
-> Resolved locally : 717386176
The repair() method fixes the checksum issue by updating the flyway_schema_history table with the local checksum value.
However, it ignores updated statements in the same migration file, so new changes in that file are neglected because there is already an entry for that version in the flyway_schema_history table. The setValidateOnMigrate() method has no effect in this scenario. We should follow an incremental approach: schema changes should be supplied through new files.
The issue happened right after I changed the V1_2__books.sql DDL file. There should be a better way to force Flyway to recognize the new changes!
I tried to run mvn flyway:repair but it did not work, so I ended up changing the schema URL in the application.properties file (datasource.flyway.url) to books2.
I removed the files below as well (books is my old schema name):
rm books.mv.db
rm -r books.trace.db
datasource.flyway.url=jdbc:h2:file:~/books2
datasource.flyway.username=sa
datasource.flyway.password=
datasource.flyway.driver-class-name=org.h2.Driver
I ran the application and BINGO :)
Just wanted to add that, in order for the checksum to be updated by repair, Flyway has to have access to the directory where all the migrations are. Otherwise Flyway just goes about its business and outputs
"Repair of failed migration in metadata table xyz.schema_version not necessary. No failed migration detected."
The checksum needs to be updated using the flyway repair command (run the Flyway command as stated in “Upgrade procedure” but replace “migrate” with “repair”).
I recommend you do not modify the database, SQL scripts, etc. directly; it can be dangerous.
Example:
./flyway repair -user=root -password=changeme -url=jdbc:mysql://localhost/mypath -table=my_flyway_schema_version_table -locations=filesystem:/mypath_sql_scripts
If you are running on a local db you can delete the flyway_schema_history table.
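For example (only on a throwaway local database, since this wipes all of Flyway's migration history):
DROP TABLE flyway_schema_history;
On the next run Flyway will treat the schema as un-migrated, so this really only makes sense for a disposable local setup.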
Invoke Flyway.repair() directly from your configuration.
Add the bean below to your configuration class, or create a new class with the @Configuration annotation and add the code below.
// requires: org.springframework.boot.autoconfigure.flyway.FlywayMigrationStrategy
@Bean
public FlywayMigrationStrategy repairFlyway() {
    return flyway -> {
        // repair each script's checksum in the metadata table
        flyway.repair();
        // before new migrations are executed
        flyway.migrate();
    };
}
There is yet another solution: you can delete your migration's row from the schema_version table.
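A sketch, with the version as a placeholder (the table is flyway_schema_history in recent Flyway versions):
DELETE FROM schema_version WHERE version = '1.2';
Flyway then no longer validates a stored checksum for that version; be aware it will also consider that migration not applied anymore.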

HDInsight Hive not finding SerDe jar in ADD JAR statement

I've uploaded json-serde-1.1.9.2.jar to the blob store with path "/lib/" and added
ADD JAR /lib/json-serde-1.1.9.2.jar
But I am getting
/lib/json-serde-1.1.9.2.jar does not exist
I've tried it without the path and also provided the full url to the ADD JAR statement with the same result.
Would really appreciate some help on this, thanks!
If you don't include the scheme, then Hive is going to look on the local filesystem (you can see the code around line 768 of the source).
When you include the URI, make sure you use the full form:
ADD JAR wasb:///lib/json-serde-1.1.9.2.jar
If that still doesn't work, provide your updated command as well as some details about how you are launching the code. Are you RDP'd in to the cluster running via the Hive shell, or running remote via PowerShell or some other API?

Hadoop avro correct jar files issue

I'm writing my first Avro job, which is meant to take an Avro file and output text. I tried to reverse engineer it from this example:
https://gist.github.com/chriswhite199/6755242
I am getting the error below though.
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
I looked around and found it was likely an issue with what jar files are being used. I'm running CDH4 with MR1 and am using the jar files below:
avro-tools-1.7.5.jar
hadoop-core-2.0.0-mr1-cdh4.4.0.jar
hadoop-mapreduce-client-core-2.0.2-alpha.jar
I can't post code for security reasons, but it shouldn't need anything not used in the example code. I don't have Maven set up yet either, so I can't follow those routes. Is there something else I can try to get around this issue?
Try using Avro 1.7.3; this is the AVRO-1170 bug.

Cloudera Manager failed to format HDFS, topology.py.vm is missing

I encountered an error while adding a new service (Service Type = HDFS) using Cloudera Manager (Free Edition). The error message says as follows:
Could not create process: com.cloudera.cmf.service.config.ConfigFileSpec$GenerateException: Unable to process template:couldn't find the template hadoop/topology.py.vm
I checked /var/log/cloudera-scm-server/cloudera-scm-server.log and found a line like below.
org.apache.velocity.exception.ResourceNotFoundException: Unable to find resource '/WEB-INF/templates/hadoop/topology.py.vm'
I guess that a certain war file does not contain hadoop-metrics.properties.vm (a Velocity template file?) although it should, and that this might be related to WHIRR-370.
Could you help me to solve this problem, please?
May I ask which version of Cloudera Manager is being used? Did this error occur just after you tried to add the service, or some time after the service was added?
Based on the error, it seems some of the configuration is missing, which is why adding the service failed. So I would like to know how you installed Hadoop on this cluster.
If you download the virtual machine, you can compare its folders against your installation for completeness and missing content. That always works for me.
