How to use YARN for running OLAP in JanusGraph

I tried to use YARN to run OLAP and bulk loading. The versions are JanusGraph 0.3.2 and Spark 2.2.1.
To launch Spark on YARN from JanusGraph, I copied the Spark jars into $JANUSGRAPH_HOME/lib.
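Roughly like this (a sketch, assuming SPARK_HOME points at the Spark 2.2.1 installation):
# copy the Spark distribution jars next to the JanusGraph ones
cp $SPARK_HOME/jars/*.jar $JANUSGRAPH_HOME/lib/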
hadoop-graphson.properties:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
gremlin.hadoop.inputLocation=data/tinkerpop-modern.json
gremlin.hadoop.outputLocation=output
gremlin.hadoop.jarsInDistributedCache=true
gremlin.vertexProgram=org.apache.tinkerpop.gremlin.process.computer.ranking.pagerank.PageRankVertexProgram
giraph.minWorkers=2
giraph.maxWorkers=2
spark.master=yarn
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoRegistrator
The commands in the Gremlin Console:
graph = GraphFactory.open('/Users/lwh/dev/janusgraph-0.3.2-SNAPSHOT-hadoop2/conf/hadoop-graph/hadoop-graphson.properties')
blvp = BulkLoaderVertexProgram.build().bulkLoader(OneTimeBulkLoader).writeGraph('/Users/lwh/dev/janusgraph-0.3.2-SNAPSHOT-hadoop2/conf/janusgraph-cql-es.properties').create(graph)
graph.compute(SparkGraphComputer).program(blvp).submit().get()
The result in YARN:
Application application_1547728549460_0008 failed 2 times due to AM Container for appattempt_1547728549460_0008_000002 exited with exitCode: -1000
For more detailed output, check the application tracking page: http://lwhdeMacBook-Pro.local:8088/cluster/app/application_1547728549460_0008, then click on links to logs of each attempt.
Diagnostics: Resource file:/Users/lwh/.sparkStaging/application_1547728549460_0008/__spark_libs__7746382379788050915.zip changed on src filesystem (expected 1547732288000, was 1547732288958)
java.io.IOException: Resource file:/Users/lwh/.sparkStaging/application_1547728549460_0008/__spark_libs__7746382379788050915.zip changed on src filesystem (expected 1547732288000, was 1547732288958)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:255)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Failing this attempt. Failing the application.

Maybe it is caused by a version mismatch; see the document here.
I deployed Spark 2.0.2 together with JanusGraph 0.3.2 and it always threw exceptions; when I changed to Spark 1.3.x, it worked well!
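As for the "changed on src filesystem" error itself: it typically means YARN re-localized the Spark archive from the local filesystem and saw a different timestamp. One possible workaround (an assumption on my side, not something confirmed in this thread) is to stage the Spark jars on HDFS yourself so YARN stops re-uploading them:
# zip the Spark jars and put the archive on HDFS (paths are examples)
zip -r -j spark-libs.zip $SPARK_HOME/jars/*
hdfs dfs -mkdir -p /user/lwh/spark-archive
hdfs dfs -put spark-libs.zip /user/lwh/spark-archive/
Then point Spark at the staged archive in hadoop-graphson.properties:
spark.yarn.archive=hdfs:///user/lwh/spark-archive/spark-libs.zip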

Related

How to set up a Jib container to authenticate with a Docker remote registry to pull images?

Hi, I'm using Quarkus with the Jib extension and setting quarkus.jib.base-jvm-image=azul/zulu-openjdk-alpine:11.
The build fails with the error below.
I'm on Mac OS X, with Docker Desktop.
This seems to have happened after updating Docker Desktop.
Running with sudo ./gradlew clean build --stacktrace -Dquarkus.container-image.build=true -Dquarkus.profile=dev works.
I checked with ./docker-credential-osxkeychain list and my credentials are listed.
Looking at Docker Desktop, I'm also logged in.
io.quarkus.builder.BuildException: Build failure: Build failed due to errors
[error]: Build step io.quarkus.container.image.jib.deployment.JibProcessor#buildFromJar threw an exception: java.lang.RuntimeException: Unable to create container image
at io.quarkus.container.image.jib.deployment.JibProcessor.containerize(JibProcessor.java:240)
at io.quarkus.container.image.jib.deployment.JibProcessor.buildFromJar(JibProcessor.java:166)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at io.quarkus.deployment.ExtensionLoader$3.execute(ExtensionLoader.java:925)
at io.quarkus.builder.BuildContext.run(BuildContext.java:277)
at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
at java.base/java.lang.Thread.run(Thread.java:829)
at org.jboss.threads.JBossThread.run(JBossThread.java:501)
Caused by: com.google.cloud.tools.jib.api.RegistryAuthenticationFailedException: Failed to authenticate with registry registry-1.docker.io/azul/zulu-openjdk-alpine because: 401 Unauthorized
GET https://auth.docker.io/token?service=registry.docker.io&scope=repository:azul/zulu-openjdk-alpine:pull
{"details":"incorrect username or password"}
at com.google.cloud.tools.jib.registry.RegistryAuthenticator.authenticate(RegistryAuthenticator.java:305)
at com.google.cloud.tools.jib.registry.RegistryAuthenticator.authenticate(RegistryAuthenticator.java:257)
at com.google.cloud.tools.jib.registry.RegistryAuthenticator.authenticatePull(RegistryAuthenticator.java:176)
at com.google.cloud.tools.jib.registry.RegistryClient.doBearerAuth(RegistryClient.java:334)
at com.google.cloud.tools.jib.registry.RegistryClient.authPullByWwwAuthenticate(RegistryClient.java:393)
at com.google.cloud.tools.jib.builder.steps.PullBaseImageStep.call(PullBaseImageStep.java:177)
at com.google.cloud.tools.jib.builder.steps.PullBaseImageStep.call(PullBaseImageStep.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.google.cloud.tools.jib.http.ResponseException: 401 Unauthorized
GET https://auth.docker.io/token?service=registry.docker.io&scope=repository:azul/zulu-openjdk-alpine:pull
{"details":"incorrect username or password"}
at com.google.cloud.tools.jib.http.FailoverHttpClient.call(FailoverHttpClient.java:355)
at com.google.cloud.tools.jib.http.FailoverHttpClient.call(FailoverHttpClient.java:266)
at com.google.cloud.tools.jib.registry.RegistryAuthenticator.authenticate(RegistryAuthenticator.java:283)
... 12 more
Caused by: com.google.api.client.http.HttpResponseException: 401 Unauthorized
GET https://auth.docker.io/token?service=registry.docker.io&scope=repository:azul/zulu-openjdk-alpine:pull
{"details":"incorrect username or password"}
at com.google.api.client.http.HttpResponseException$Builder.build(HttpResponseException.java:293)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1118)
at com.google.cloud.tools.jib.http.FailoverHttpClient.call(FailoverHttpClient.java:349)
... 14 more
Not exactly sure how it came about, but I erased my ~/.docker/config.json and it works. I think on older Docker Desktop versions the keychain auth had to be set up manually, and I guess updating made things screwy.
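For anyone hitting the same thing, a minimal reset sequence (assuming the standard Docker client config location) would be:
# back up and remove the Docker client config, then log in again so
# Docker Desktop recreates it with the keychain credential helper
mv ~/.docker/config.json ~/.docker/config.json.bak
docker login
# afterwards, config.json should contain: {"credsStore": "osxkeychain"}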

Getting an error while generating a report in SonarQube 9.4

I was scanning a .NET application using SonarQube. SonarScanner.MSBuild.exe begin, MSBuild, and SonarScanner.MSBuild.exe end all executed successfully in the same project root path, but when trying to access the report on the SonarQube server I get the following error.
java.lang.IllegalStateException: Fail to extract report AYD0z6QfJAqVN7fqeTDx from database
at org.sonar.ce.task.projectanalysis.step.ExtractReportStep.execute(ExtractReportStep.java:73)
at org.sonar.ce.task.step.ComputationStepExecutor.executeStep(ComputationStepExecutor.java:80)
at org.sonar.ce.task.step.ComputationStepExecutor.executeSteps(ComputationStepExecutor.java:71)
at org.sonar.ce.task.step.ComputationStepExecutor.execute(ComputationStepExecutor.java:58)
at org.sonar.ce.task.projectanalysis.taskprocessor.ReportTaskProcessor.process(ReportTaskProcessor.java:75)
at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.executeTask(CeWorkerImpl.java:212)
at org.sonar.ce.taskprocessor.CeWorkerImpl$ExecuteTask.run(CeWorkerImpl.java:194)
at org.sonar.ce.taskprocessor.CeWorkerImpl.findAndProcessTask(CeWorkerImpl.java:160)
at org.sonar.ce.taskprocessor.CeWorkerImpl$TrackRunningState.get(CeWorkerImpl.java:135)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:87)
at org.sonar.ce.taskprocessor.CeWorkerImpl.call(CeWorkerImpl.java:53)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.util.zip.ZipException: invalid entry size (expected 1342191808 but got 14528 bytes)
at java.base/java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:400)
at java.base/java.util.zip.ZipInputStream.read(ZipInputStream.java:199)
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.sonar.api.internal.apachecommons.io.IOUtils.copyLarge(IOUtils.java:1309)
at org.sonar.api.internal.apachecommons.io.IOUtils.copy(IOUtils.java:978)
at org.sonar.api.internal.apachecommons.io.IOUtils.copyLarge(IOUtils.java:1282)
at org.sonar.api.internal.apachecommons.io.IOUtils.copy(IOUtils.java:953)
at org.sonar.api.utils.ZipUtils.copy(ZipUtils.java:152)
at org.sonar.api.utils.ZipUtils.unzipEntry(ZipUtils.java:102)
at org.sonar.api.utils.ZipUtils.unzip(ZipUtils.java:86)
at org.sonar.api.utils.ZipUtils.unzip(ZipUtils.java:63)
at org.sonar.ce.task.projectanalysis.step.ExtractReportStep.execute(ExtractReportStep.java:71)
... 19 more
It seems there was a problem transferring the report from the scanner to the SonarQube server.
Scanning again and regenerating the report solved the problem.
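For reference, the re-scan follows the usual three-step flow (the project key, host URL, and token below are placeholders, not values from this thread):
SonarScanner.MSBuild.exe begin /k:"my-project-key" /d:sonar.host.url="http://localhost:9000" /d:sonar.login="<token>"
MSBuild.exe /t:Rebuild
SonarScanner.MSBuild.exe end /d:sonar.login="<token>"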

Sqoop import using HCatalog failing on EMR

I am using EMR 6.4.0 (Sqoop version 1.4.7)
and an Oozie workflow to import data from Postgres into Hive partitions using HCatalog. Data is getting loaded into the table and the partitions are getting created, but the job fails with the following error:
Job commit failed: java.lang.UnsupportedOperationException: getTokenStrForm is not supported
at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.getTokenStrForm(GlueMetastoreClientDelegate.java:1630)
at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.getTokenStrForm(AWSCatalogMetastoreClient.java:611)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.invoke(HiveClientCache.java:590)
at com.sun.proxy.$Proxy109.getTokenStrForm(Unknown Source)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.cancelDelegationTokens(FileOutputCommitterContainer.java:1012)
at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I don't see any mention of this error or limitation in the EMR docs or on the web.
Importing directly into the table directory works, but I wanted to know why the HCatalog option couldn't be used.
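For context, the failing import was of this general shape (connection details, table, and partition names are placeholders, not from the thread):
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser --password-file /user/me/db.password \
  --table source_table \
  --hcatalog-database mydb \
  --hcatalog-table target_table \
  --hcatalog-partition-keys dt \
  --hcatalog-partition-values 2022-01-01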

IntelliJ IDEA cannot open Local Terminal on Linux: java.util.concurrent.ExecutionException: Failed to start

After upgrading IntelliJ IDEA to the latest version, I'm facing an issue launching the terminal in the IDE.
Cannot open Local Terminal
Failed to start [/bin/bash, --rcfile, <-path-to-intellij->/ideaIC-2020.1.1/idea-IC-201.7223.91/plugins/terminal/jediterm-bash.in, -i] in <-path-to-project->
Below are the IDEA logs.
2021-04-12 17:05:29,428 [3739060] INFO - .plugins.terminal.TerminalView - Activating Terminal tool window
2021-04-12 17:05:29,451 [3739083] INFO - erminal.AbstractTerminalRunner - Cannot open Local Terminal
java.util.concurrent.ExecutionException: Failed to start [/bin/bash, --rcfile, /<-path-to-intellij->/ideaIC-2020.1.1/idea-IC-201.7223.91/plugins/terminal/jediterm-bash.in, -i] in /<-path-to-project->
at org.jetbrains.plugins.terminal.LocalTerminalDirectRunner.createProcess(LocalTerminalDirectRunner.java:193)
at org.jetbrains.plugins.terminal.LocalTerminalDirectRunner.createProcess(LocalTerminalDirectRunner.java:46)
at org.jetbrains.plugins.terminal.AbstractTerminalRunner.lambda$openSessionInDirectory$6(AbstractTerminalRunner.java:243)
at com.intellij.util.RunnableCallable.call(RunnableCallable.java:20)
at com.intellij.util.RunnableCallable.call(RunnableCallable.java:11)
at com.intellij.openapi.application.impl.ApplicationImpl$1.call(ApplicationImpl.java:265)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.util.concurrent.Executors$PrivilegedThreadFactory$1$1.run(Executors.java:668)
at java.base/java.util.concurrent.Executors$PrivilegedThreadFactory$1$1.run(Executors.java:665)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/java.util.concurrent.Executors$PrivilegedThreadFactory$1.run(Executors.java:665)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Exec_tty error:Unknown reason
at com.pty4j.unix.UnixPtyProcess.execInPty(UnixPtyProcess.java:219)
at com.pty4j.unix.UnixPtyProcess.<init>(UnixPtyProcess.java:59)
at com.pty4j.PtyProcessBuilder.start(PtyProcessBuilder.java:127)
at org.jetbrains.plugins.terminal.LocalTerminalDirectRunner.createProcess(LocalTerminalDirectRunner.java:180)
... 13 more
Below are the version details.
IntelliJ IDEA 2021.1 (Community Edition)
Build #IC-211.6693.111, built on April 6, 2021

Error streaming Twitter data to Hadoop using Flume

I am using Hadoop 1.2.1 on Ubuntu 14.04.
I am trying to stream data from Twitter to HDFS using Flume 1.6.0. I have downloaded flume-sources-1.0-SNAPSHOT.jar and included it in the flume/lib folder. I have set the path of flume-sources-1.0-SNAPSHOT.jar as FLUME_CLASSPATH in conf/flume-env.sh. This is my Flume agent conf file:
#setting properties of agent
Twitter-agent.sources=source1
Twitter-agent.channels=channel1
Twitter-agent.sinks=sink1
#configuring sources
Twitter-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
Twitter-agent.sources.source1.channels=channel1
Twitter-agent.sources.source1.consumerKey=<consumer-key>
Twitter-agent.sources.source1.consumerSecret=<consumer Secret>
Twitter-agent.sources.source1.accessToken=<access Toekn>
Twitter-agent.sources.source1.accessTokenSecret=<acess Token Secret>
Twitter-agent.sources.source1.keywords= morning, night, hadoop, bigdata
#configuring channels
Twitter-agent.channels.channel1.type=memory
Twitter-agent.channels.channel1.capacity=10000
Twitter-agent.channels.channel1.transactionCapacity=100
#configuring sinks
Twitter-agent.sinks.sink1.channel=channel1
Twitter-agent.sinks.sink1.type=hdfs
Twitter-agent.sinks.sink1.hdfs.path=flume/twitter/logs
Twitter-agent.sinks.sink1.rollSize=0
Twitter-agent.sinks.sink1.rollCount=1000
Twitter-agent.sinks.sink1.batchSize=100
Twitter-agent.sinks.sink1.fileType=DataStream
Twitter-agent.sinks.sink1.writeFormat=Text
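I run the agent with a command along these lines (the conf file name here is illustrative):
bin/flume-ng agent --conf conf --conf-file conf/twitter-agent.conf \
  --name Twitter-agent -Dflume.root.logger=INFO,console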
When I run this agent, I get an error like this:
15/06/22 14:14:49 INFO source.DefaultSourceFactory: Creating instance of source source1, type com.cloudera.flume.source.TwitterSource
15/06/22 14:14:49 ERROR node.PollingPropertiesFileConfigurationProvider: Unhandled error
java.lang.NoSuchMethodError: twitter4j.conf.Configuration.isStallWarningsEnabled()Z
at twitter4j.TwitterStreamImpl.<init>(TwitterStreamImpl.java:60)
at twitter4j.TwitterStreamFactory.<clinit>(TwitterStreamFactory.java:40)
at com.cloudera.flume.source.TwitterSource.<init>(TwitterSource.java:64)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:44)
at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:322)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:97)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
My flume/lib folder already has twitter4j-core-3.0.3.jar.
How do I rectify this error?
I found a solution to this issue. Since flume-sources-1.0-SNAPSHOT.jar and twitter4j-stream-3.0.3.jar contain the same FilterQuery class, there is a jar conflict. All twitter4j 3.x.x versions use this class, so it is better to download the twitter4j jars of version 2.2.6 (twitter4j-core, twitter4j-stream, twitter4j-media-support) and replace the 3.x.x jars with them under the flume/lib directory.
Run the agent again and the Twitter data will be streamed to HDFS.
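A sketch of the jar swap (the Maven Central URLs below follow the standard layout for the org.twitter4j group; verify them before relying on this):
cd $FLUME_HOME/lib
# remove the conflicting 3.x jars
rm twitter4j-core-3.0.3.jar twitter4j-stream-3.0.3.jar twitter4j-media-support-3.0.3.jar
# fetch the 2.2.6 versions
wget https://repo1.maven.org/maven2/org/twitter4j/twitter4j-core/2.2.6/twitter4j-core-2.2.6.jar
wget https://repo1.maven.org/maven2/org/twitter4j/twitter4j-stream/2.2.6/twitter4j-stream-2.2.6.jar
wget https://repo1.maven.org/maven2/org/twitter4j/twitter4j-media-support/2.2.6/twitter4j-media-support-2.2.6.jar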
Alternatively, change
Twitter-agent.sources.source1.type=com.cloudera.flume.source.TwitterSource
to Flume's built-in Twitter source (keeping the agent and source names consistent with the rest of the config):
Twitter-agent.sources.source1.type=org.apache.flume.source.twitter.TwitterSource
