I cannot solve a GCS bucket permission issue when submitting a job to Dataproc.
Here is what I'm doing:
Created a project
Created a bucket xmitya-test
Created a cluster:
gcloud dataproc clusters create cascade --bucket=xmitya-test \
--master-boot-disk-size=80G --master-boot-disk-type=pd-standard \
--num-master-local-ssds=0 --num-masters=1 \
--num-workers=2 --num-worker-local-ssds=0 \
--worker-boot-disk-size=80G --worker-boot-disk-type=pd-standard \
--master-machine-type=n1-standard-2 \
--worker-machine-type=n1-standard-2 \
--zone=us-west1-a --image-version=1.3 \
--properties 'hadoop-env:HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/etc/tez/conf:/usr/lib/tez/*:/usr/lib/tez/lib/*'
Uploaded the job jar /apps/wordcount.jar and the library /apps/lib/commons-collections-3.2.2.jar
Then submitted the job with the library jar on the classpath:
gcloud dataproc jobs submit hadoop --cluster=cascade \
--jar=gs:/apps/wordcount.jar \
--jars=gs://apps/lib/commons-collections-3.2.2.jar --bucket=xmitya-test \
-- gs:/input/url+page.200.txt gs:/output/wc.out local
Then I get a Forbidden error when accessing the library file:
java.io.IOException: Error accessing: bucket: apps, object: lib/commons-collections-3.2.2.jar
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.wrapException(GoogleCloudStorageImpl.java:1957)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1983)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1870)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfo(GoogleCloudStorageFileSystem.java:1156)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1058)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:363)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:314)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2375)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2344)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.copyToLocalFile(GoogleHadoopFileSystemBase.java:1793)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2320)
at com.google.cloud.hadoop.services.agent.util.HadoopUtil.download(HadoopUtil.java:70)
at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler.downloadResources(AbstractJobHandler.java:448)
at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler$StartDriver.call(AbstractJobHandler.java:579)
at com.google.cloud.hadoop.services.agent.job.AbstractJobHandler$StartDriver.call(AbstractJobHandler.java:568)
at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
at com.google.cloud.hadoop.services.repackaged.com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
"code" : 403,
"errors" : [ {
"domain" : "global",
"message" : "714526773712-compute#developer.gserviceaccount.com does not have storage.objects.get access to apps/lib/commons-collections-3.2.2.jar.",
"reason" : "forbidden"
} ],
"message" : "714526773712-compute#developer.gserviceaccount.com does not have storage.objects.get access to apps/lib/commons-collections-3.2.2.jar."
}
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:150)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:401)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:499)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
at com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1978)
... 23 more
I tried granting read permission from the browser to the 714526773712-compute@developer.gserviceaccount.com user and setting public permissions on all files with gsutil defacl ch -u AllUsers:R gs://xmitya-test and gsutil acl ch -d allUsers:R gs://xmitya-test/**, with no effect.
What could be the reason?
Thanks!
It's complaining about access to the apps, input and output buckets that you specified in the parameters of the job submission command:
gcloud dataproc jobs submit hadoop --cluster=cascade --jar=gs:/apps/wordcount.jar --jars=gs://apps/lib/commons-collections-3.2.2.jar --bucket=xmitya-test gs:/input/url+page.200.txt gs:/output/wc.out local
To fix this issue you need to grant access to these buckets, or, if they are actually folders inside the xmitya-test bucket, you need to specify the bucket explicitly in each path, e.g. gs://xmitya-test/apps/wordcount.jar.
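Assuming apps, input and output are folders inside xmitya-test (my assumption based on your description), the full submission would look like this:
gcloud dataproc jobs submit hadoop --cluster=cascade \
    --jar=gs://xmitya-test/apps/wordcount.jar \
    --jars=gs://xmitya-test/apps/lib/commons-collections-3.2.2.jar \
    --bucket=xmitya-test \
    -- gs://xmitya-test/input/url+page.200.txt gs://xmitya-test/output/wc.out local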
Related
I'm trying to deploy Liferay CE 7.4 in Kubernetes and I can't connect to Elasticsearch 7.14.0. I get the following error:
2022-03-19 20:06:29.375 ERROR [main][ElasticsearchEngineConfigurator:93] bundle com.liferay.portal.search.elasticsearch7.impl:6.0.30 (1134)[com.liferay.portal.search.elasticsearch7.internal.ElasticsearchEngineConfigurator(3789)] : The activate method has thrown an exception
java.lang.RuntimeException: org.elasticsearch.ElasticsearchException: ElasticsearchException[java.util.concurrent.ExecutionException: java.net.ConnectException: Timeout connecting to [search/10.110.10.150:9200]]; nested: ExecutionException[java.net.ConnectException: Timeout connecting to [search/10.110.10.150:9200]]; nested: ConnectException[Timeout connecting to [search/10.110.10.150:9200]];
at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2078) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1732) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1702) ~[?:?]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1672) ~[?:?]
at org.elasticsearch.client.ClusterClient.health(ClusterClient.java:119) ~[?:?]
at com.liferay.portal.search.elasticsearch7.internal.search.engine.adapter.cluster.HealthClusterRequestExecutorImpl._getClusterHealthResponse(HealthClusterRequestExecutorImpl.java:112) ~[?:?]
I have verified that Elasticsearch is correctly deployed by running: kubectl port-forward search-59fcc9c4f6-brhcv 9200
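With the port-forward active, the standard Elasticsearch cluster health endpoint can be checked over the forwarded port, for example:
curl "http://localhost:9200/_cluster/health?pretty"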
My file com.liferay.portal.search.elasticsearch7.configuration.ElasticsearchConfiguration.config:
additionalConfigurations=""
additionalIndexConfigurations=""
additionalTypeMappings=""
authenticationEnabled="false"
bootstrapMlockAll="false"
clusterName="LiferayElasticsearchCluster"
discoveryZenPingUnicastHostsPort="9300-9400"
embeddedHttpPort="9200"
httpCORSAllowOrigin="/https?:\\/\\/localhost(:[0-9]+)?/"
httpCORSConfigurations=""
httpCORSEnabled="true"
httpSSLEnabled="false"
indexNamePrefix="liferay-"
indexNumberOfReplicas=""
indexNumberOfShards=""
logExceptionsOnly="true"
networkBindHost=""
networkHost=""
networkHostAddresses=[ \
"", \
]
networkPublishHost=""
nodeName=""
operationMode="REMOTE"
overrideTypeMappings=""
productionModeEnabled="true"
proxyHost=""
proxyPort="0"
proxyUserName=""
remoteClusterConnectionId="RemoteElasticSearchCluster"
restClientLoggerLevel="ERROR"
sidecarDebug="false"
sidecarDebugSettings="-agentlib:jdwp\=transport\=dt_socket,address\=8001,server\=y,suspend\=y,quiet\=y"
sidecarHeartbeatInterval="10000"
sidecarHome="elasticsearch7"
sidecarHttpPort=""
sidecarJVMOptions=[ \
"-Xms1g", \
"-Xmx1g", \
"-XX:+AlwaysPreTouch", \
]
sidecarShutdownTimeout="10000"
trackTotalHits="true"
transportTcpPort=""
truststorePath="/path/to/localhost.p12"
truststoreType="pkcs12"
username="elastic"
And my file com.liferay.portal.search.elasticsearch7.configuration.ElasticsearchConnectionConfiguration.config:
active="true"
authenticationEnabled="false"
connectionId="RemoteElasticSearchCluster"
httpSSLEnabled="false"
networkHostAddresses=[ \
"search:9200" \
]
proxyHost=""
proxyPort="0"
proxyUserName=""
truststorePath="/path/to/localhost.p12"
truststoreType="pkcs12"
username="elastic"
To configure the Elasticsearch connector I followed these pages: http://www.liferaysavvy.com/2021/07/configure-remote-elasticsearch-cluster.html and https://liferay.dev/blogs/-/blogs/deploying-liferay-7-3-in-kubernetes
Could someone help me?
Thanks in advance.
Below is how I'm creating my Dataproc cluster. While formulating the properties I take care of the network timeout by assigning 3600s, but despite that the executor's heartbeat timed out after 125009 ms. Why is this happening and what can be done to avoid it?
default_parallelism=512
PROPERTIES="\
spark:spark.executor.cores=2,\
spark:spark.executor.memory=8g,\
spark:spark.executor.memoryOverhead=2g,\
spark:spark.driver.memory=6g,\
spark:spark.driver.maxResultSize=6g,\
spark:spark.kryoserializer.buffer=128m,\
spark:spark.kryoserializer.buffer.max=1024m,\
spark:spark.serializer=org.apache.spark.serializer.KryoSerializer,\
spark:spark.default.parallelism=${default_parallelism},\
spark:spark.rdd.compress=true,\
spark:spark.network.timeout=3600s,\
spark:spark.rpc.message.maxSize=256,\
spark:spark.io.compression.codec=snappy,\
spark:spark.shuffle.service.enabled=true,\
spark:spark.sql.shuffle.partitions=256,\
spark:spark.sql.files.ignoreCorruptFiles=true,\
yarn:yarn.nodemanager.resource.cpu-vcores=8,\
yarn:yarn.scheduler.minimum-allocation-vcores=2,\
yarn:yarn.scheduler.maximum-allocation-vcores=4,\
yarn:yarn.nodemanager.vmem-check-enabled=false,\
capacity-scheduler:yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
"
gcloud beta dataproc clusters create $CLUSTER_NAME \
--zone $ZONE \
--region $REGION \
--master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 500 \
--num-workers 3 \
--bucket $GCS_BUCKET \
--image-version 1.4-ubuntu18 \
--optional-components=ANACONDA,JUPYTER \
--subnet=default \
--enable-component-gateway \
--scopes 'https://www.googleapis.com/auth/cloud-platform'
Below is the error I'm getting:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 11, cluster-abc-z-2.c.project_name.internal, executor 5): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 125009 ms
You should be setting spark.executor.heartbeatInterval; its default value is 10s.
https://spark.apache.org/docs/latest/configuration.html
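For example, you could add it next to your existing network timeout entry in the PROPERTIES string. The exact value is your call; the Spark docs only require that spark.executor.heartbeatInterval stay significantly lower than spark.network.timeout (120s below is just an illustrative value):
spark:spark.network.timeout=3600s,\
spark:spark.executor.heartbeatInterval=120s,\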
I am writing a Jenkins Pipeline job for setting up AWS infrastructure using API calls to our in-house AWS CLI wrapper library. Running the raw bash scripts on a CentOS box or as a Jenkins Freestyle job works fine; however, they fail in the context of a Pipeline job. I think the quotes may need to be different for the Pipeline job, but I am not sure how.
After further investigation, I found that the curl command returns the wrong response from the service when running the scripts within a Jenkins Pipeline job.
pipeline {
agent any
stages {
stage('Checkout code from Git'){
steps {
echo "Checkout code from a GitHub repository"
// Checkout code from a GitHub repository
checkout([$class: 'GitSCM', branches: [[name: '*/master']], doGenerateSubmoduleConfigurations: false, extensions: [[$class: 'SubmoduleOption', disableSubmodules: false, parentCredentials: false, recursiveSubmodules: true, reference: '', trackingSubmodules: false]], submoduleCfg: [], userRemoteConfigs: [[credentialsId: 'xxxx', url: 'git@github.com:bbc/repo.git']]])
}
}
stage('Call our internal AWS CLI Wrapper System API to perform an ACTION on a specified ENVIRONMENT') {
steps {
script {
if("${params.ENVIRONMENT}" == 'int' && "${params.ACTION}" == 'create'){
echo "ENVIRONMENT=${params.ENVIRONMENT}, ACTION=${params.ACTION}"
echo ""
sh '''#!/bin/bash
# Create Neptune Cluster for the Int environment
cd blah-db
echo "Current working directory is $PWD"
CLOUD_FORMATION_FILE=$PWD/infrastructure/templates/neptune-cluster.json
echo "The CloudFormation file to operate on is $CLOUD_FORMATION_FILE"
echo "Running jq to transform the source CloudFormation file"
template=$(jq -M '.Parameters.Env.Default="int"' $CLOUD_FORMATION_FILE)
echo "Echoing the transformed CloudFormation file: \n$template"
echo "Running curl to make the http request to our internal AWS CLI Wrapper System"
curl -d "{\"aws_account\": \"1111111111\", \"region\": \"us-east-1\", \"name_suffix\": \"cluster\", \"template\": $template}" \
-H 'Content-Type: application/json' -H 'Accept: application/json' https://base.api.url/v1/services/blah-neptune/int/stacks \
--cert /path/to/client/certificate/client.crt --key /path/to/client/private-key/client.key
cd ..
pwd
# Set a timer to run for 300 seconds or 5 minutes to create a delay to allow for the Neptune Cluster to be fully provisioned first before adding instances to it.
'''
}
}
}
}
}
}
The actual result that I get from making the API call:
{"error": "Invalid JSON. Expecting property name: line 1 column 1 (char 1)"}
Try changing the curl command as follows:
curl -d '{"aws_account": "1111111111", "region": "us-east-1", "name_suffix": "cluster", "template": $template}'
Or assign the whole command to a variable and print it out to see whether it is what you want:
def cmd = '''#!/bin/bash
cd blah-db
...
'''
echo cmd // compare the printed script with what the Freestyle job runs
sh cmd
I want to pull down call data from RingCentral using a shell script and curl. I'm then going to put that into ELK to build a dashboard using Kibana. However, I don't know what I'm doing with the API. Does anyone have a place for me to start or some sample code to do this?
I'm currently struggling with just using curl to authenticate and get a token. At the moment I keep getting "Unsupported grant type". I set the application up in Sandbox as a "Server Only (No UI)" app.
I have run this from a CentOS 7 box using a bash shell.
Here is the code I have tried:
curl -X POST "https://platform.devtest.ringcentral.com/restapi/oauth/token"; \
-H "Accept: application/json" \
-H "Content-Type: application/x-www-form-urlencoded" \
-u "my client id:my client secret" \
-d "username=username&password=password&extension=<extension>&grant_type=password"
I left the username and password blank because I wasn't sure what that was.
The output is as follows:
{
"error" : "invalid_request",
"error_description" : "Unsupported grant type",
"errors" : [ {
"errorCode" : "OAU-250",
"message" : "Unsupported grant type"
} ]
}./rctest1.sh: line 2: -H: command not found
I've been able to reproduce your error and can resolve it by removing the semi-colon (;) after the URL in your command.
Explanation
A semi-colon creates two separate CLI commands instead of one, so in your call, you have two requests.
- Your Request 1
$ curl -X POST "https://platform.devtest.ringcentral.com/restapi/oauth/token"
- Your Request 2
$ -H "Accept: application/json" \
-H "Content-Type: application/x-www-form-urlencoded" \
-u "my client id:my client secret" \
-d "username=username&password=password&extension=&grant_type=password"
- Your Response 1
{
"error" : "invalid_request",
"error_description" : "Unsupported grant type",
"errors" : [ {
"errorCode" : "OAU-250",
"message" : "Unsupported grant type"
} ]
}
- Your Response 2
./rctest1.sh: line 2: -H: command not found
- Test Command
Here's a simple test showing the OS trying to process two commands:
$ hello;world
-bash: hello: command not found
-bash: world: command not found
Solution
- Working Request
Here is a working request without the semi-colon:
$ curl -X POST "https://platform.devtest.ringcentral.com/restapi/oauth/token" \
-H "Accept: application/json" \
-H "Content-Type: application/x-www-form-urlencoded" \
-u "my client id:my client secret" \
-d "username=username&password=password&extension=&grant_type=password"
- Working Response
Here is the working response:
{
"access_token" : "myAccessToken",
"token_type" : "bearer",
"expires_in" : 3600,
"refresh_token" : "myRefreshToken",
"refresh_token_expires_in" : 604800,
"scope" : "Meetings VoipCalling Glip SubscriptionWebhook Faxes Contacts RingOut SMS",
"owner_id" : "11111111",
"endpoint_id" : "22222222"
}
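Once you have the access token, pass it as a bearer token on subsequent requests. As a rough sketch for pulling call data (the call-log path and query parameters below come from RingCentral's REST API reference; adjust the date range and paging to your needs):
$ curl -X GET "https://platform.devtest.ringcentral.com/restapi/v1.0/account/~/call-log?dateFrom=2019-01-01T00:00:00.000Z&perPage=100" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer myAccessToken"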
I tried to set up a persistent data store for the REST server but was unable to do it. I am posting the steps I followed.
Steps I followed to set up a persistent data store for the REST server:
Started an instance of MongoDB:
root@ubuntu:~# docker run -d --name mongo --network composer_default -p 27017:27017 mongo
dda3340e4daf7b36a244c5f30772f50a4ee1e8f81cc7fc5035f1090cdcf46c58
Created a new, empty directory. Created a new file named Dockerfile in the new directory, with the following contents:
FROM hyperledger/composer-rest-server
RUN npm install --production loopback-connector-mongodb passport-github && \
npm cache clean && \
ln -s node_modules .node_modules
Changed into the directory created in step 2, and built the Docker image:
root@ubuntu:~# cd examples/dir/
root@ubuntu:~/examples/dir# ls
Dockerfile ennvars.txt
root@ubuntu:~/examples/dir# docker build -t myorg/my-composer-rest-server .
Sending build context to Docker daemon 4.096 kB
Step 1/2 : FROM hyperledger/composer-rest-server
---> 77cd6a591726
Step 2/2 : RUN npm install --production loopback-connector-couch passport-github && npm cache clean && ln -s node_modules .node_modules
---> Using cache
---> 2ff9537656d1
Successfully built 2ff9537656d1
root@ubuntu:~/examples/dir#
Created a file named ennvars.txt in the same directory.
The contents are as follows:
COMPOSER_CONNECTION_PROFILE=hlfv1
COMPOSER_BUSINESS_NETWORK=blockchainv5
COMPOSER_ENROLLMENT_ID=admin
COMPOSER_ENROLLMENT_SECRET=adminpw
COMPOSER_NAMESPACES=never
COMPOSER_SECURITY=true
COMPOSER_CONFIG='{
"type": "hlfv1",
"orderers": [
{
"url": "grpc://localhost:7050"
}
],
"ca": {
"url": "http://localhost:7054",
"name": "ca.example.com"
},
"peers": [
{
"requestURL": "grpc://localhost:7051",
"eventURL": "grpc://localhost:7053"
}
],
"keyValStore": "/home/ubuntu/.hfc-key-store",
"channel": "mychannel",
"mspID": "Org1MSP",
"timeout": "300"
}'
COMPOSER_DATASOURCES='{
"db": {
"name": "db",
"connector": "mongodb",
"host": "mongo"
}
}'
COMPOSER_PROVIDERS='{
"github": {
"provider": "github",
"module": "passport-github",
"clientID": "a88810855b2bf5d62f97",
"clientSecret": "f63e3c3c65229dc51f1c8964b05e9717bf246279",
"authPath": "/auth/github",
"callbackURL": "/auth/github/callback",
"successRedirect": "/",
"failureRedirect": "/"
}
}'
Loaded the env variables with the following command:
root@ubuntu:~/examples/dir# source ennvars.txt
Started the Docker container with the command below:
root@ubuntu:~/examples/dir# docker run \
-d \
-e COMPOSER_CONNECTION_PROFILE=${COMPOSER_CONNECTION_PROFILE} \
-e COMPOSER_BUSINESS_NETWORK=${COMPOSER_BUSINESS_NETWORK} \
-e COMPOSER_ENROLLMENT_ID=${COMPOSER_ENROLLMENT_ID} \
-e COMPOSER_ENROLLMENT_SECRET=${COMPOSER_ENROLLMENT_SECRET} \
-e COMPOSER_NAMESPACES=${COMPOSER_NAMESPACES} \
-e COMPOSER_SECURITY=${COMPOSER_SECURITY} \
-e COMPOSER_CONFIG="${COMPOSER_CONFIG}" \
-e COMPOSER_DATASOURCES="${COMPOSER_DATASOURCES}" \
-e COMPOSER_PROVIDERS="${COMPOSER_PROVIDERS}" \
--name rest \
--network composer_default \
-p 3000:3000 \
myorg/my-composer-rest-server
942eb1bfdbaf5807b1fe2baa2608ab35691e9b6912fb0d3b5362531b8adbdd3a
It executed successfully, so now I should be able to access the persistent and secured REST server by going to the LoopBack explorer page.
But when I tried to open that URL I got the error below.
Error Image
Have I missed any step or done something wrong?
Two things:
You need to put export in front of each variable in your ennvars.txt file (see the sketch after this list).
Check the version of Composer you are running. The FROM hyperledger/composer-rest-server line will pull down the latest version of the REST server, and if your local Composer version is not up to date, the two will be incompatible.
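For the first point, a minimal sketch of what the top of ennvars.txt would look like (values unchanged from your file; the multi-line COMPOSER_CONFIG, COMPOSER_DATASOURCES and COMPOSER_PROVIDERS values get the same export prefix):
export COMPOSER_CONNECTION_PROFILE=hlfv1
export COMPOSER_BUSINESS_NETWORK=blockchainv5
export COMPOSER_ENROLLMENT_ID=admin
export COMPOSER_ENROLLMENT_SECRET=adminpw
export COMPOSER_NAMESPACES=never
export COMPOSER_SECURITY=true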