Jar not found error while trying to deploy SCDF Stream - spring

I registered the sink first as follows:
app register --name mysink --type sink --uri file:///Users/swatikaushik/Downloads/kafkaStreamDemo/target/kafkaStreamDemo-0.0.1-SNAPSHOT.jar
Then I created a stream
stream create --definition “:myKafkaTopic > mysink" --name myStreamName --deploy
I got the error
Command failed org.springframework.cloud.dataflow.rest.client.DataFlowClientException: File
/Users/swatikaushik/Downloads/kafkaStreamDemo/target/kafkaStreamDemo-0.0.1-SNAPSHOT.jar must exist
While the jar exists!!

I've followed the Maven local repository mounting approach using Docker Compose; hope this helps:
Maven:
mvn clean install
Set up your environment variables:
$Env:DATAFLOW_VERSION="2.5.1.RELEASE"
$Env:SKIPPER_VERSION="2.4.1.RELEASE"
$Env:HOST_MOUNT_PATH="C:\Users\yourUserName\.m2"
$Env:DOCKER_MOUNT_PATH="/root/.m2/"
Restart/start the containers:
docker-compose down
docker-compose up
Register your apps:
app register --type sink --name mysink --uri maven://groupId:artifactId:version
Register Doc
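For example, after mvn clean install puts the artifact in the mounted ~/.m2 repository, the registration could look like this (the groupId is an assumption; the artifactId and version are taken from the jar name in the question):
app register --type sink --name mysink --uri maven://com.example:kafkaStreamDemo:0.0.1-SNAPSHOT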

File permission is one thing - please double check as advised.
A few other ideas:
1) Run app info sink:mysink. If the JAR is actually available, it should return a list of the application's Boot/whitelisted properties.
2) Run the Jar standalone. Make sure it actually starts via java -jar.....
3) The stream definition appears to include a special character (“:myKafkaTopic > mysink" instead of ":myKafkaTopic > mysink" - notice the “ character); it would normally fail in the Shell, but it looks like you were able to deploy it. A full stack trace would help.
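For reference, the same definition with plain double quotes would be:
stream create --definition ":myKafkaTopic > mysink" --name myStreamName --deploy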

We just had the same error as described above.
We had mounted the folder of jar files to Skipper.
The solution was that we had to mount the jars into the Data Flow server Docker container as well.
Skipper deploys the app, but the Data Flow server registers it.
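A minimal docker-compose sketch of that setup, assuming a host folder of jars and the service names from the standard SCDF docker-compose file (the host path is an example):
services:
  dataflow-server:
    volumes:
      - './apps:/root/apps'
  skipper-server:
    volumes:
      - './apps:/root/apps'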

Related

Flink on GCP No FileSystem for scheme: gs

I've been trying to use Flink on GCP (https://github.com/spotify/flink-on-k8s-operator), but there is a problem with Google Cloud Storage access.
I followed the steps explained here (https://github.com/spotify/flink-on-k8s-operator/blob/master/images/flink/README.md) and created a Docker image like this:
ARG GCS_CONNECTOR_VERSION=latest-hadoop2
ARG FLINK_HADOOP_VERSION=2.8.3-10.0
ARG GCS_CONNECTOR_NAME=gcs-connector-${GCS_CONNECTOR_VERSION}.jar
ARG GCS_CONNECTOR_URI=https://storage.googleapis.com/hadoop-lib/gcs/${GCS_CONNECTOR_NAME}
ARG FLINK_HADOOP_JAR_NAME=flink-shaded-hadoop-2-uber-${FLINK_HADOOP_VERSION}.jar
ARG FLINK_HADOOP_JAR_URI=https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/${FLINK_HADOOP_VERSION}/${FLINK_HADOOP_JAR_NAME}
RUN echo "Downloading ${GCS_CONNECTOR_URI}" && \
wget -q -O /opt/flink/lib/${GCS_CONNECTOR_NAME} ${GCS_CONNECTOR_URI}
RUN echo "Downloading ${FLINK_HADOOP_JAR_URI}" && \
wget -q -O /opt/flink/lib/${FLINK_HADOOP_JAR_NAME} ${FLINK_HADOOP_JAR_URI}
I can see the jars in the task manager's and job manager's lib folders after deploying the job, but the task manager throws an error like:
org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'gs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded. For a full list of supported file systems, please see https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/.
The interesting thing is that the task manager throws the error, yet the base path that should be created for the checkpoint does appear on GCS. For example, I set gs://bucket/flink/job/checkpoint as the checkpoint config, and I can see this folder after deploying, but of course there is no data inside.
What can the problem be?
You should check the official GCS connector docs. Basically you need to copy the optional gcs plugin under the plugins directory to make it available to Flink in your container image.
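For example, a minimal Dockerfile sketch, assuming a Flink version (1.15 or later) whose distribution ships the optional flink-gs-fs-hadoop plugin under /opt/flink/opt/ (the image tag is an example):
FROM flink:1.15.2
# copy the optional GCS filesystem plugin into its own folder under plugins/
RUN mkdir -p /opt/flink/plugins/gs-fs-hadoop && \
    cp /opt/flink/opt/flink-gs-fs-hadoop-*.jar /opt/flink/plugins/gs-fs-hadoop/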
In addition to this, I recommend you check out the recently added Flink Kubernetes Operator project, which should provide some benefits over your current setup and improve integration with newer Flink versions.

Netflix Conductor WorkflowStatusListener

I'm using Netflix Conductor over its REST API. I'm able to create a workflow and run it, but I would like to know how to use the workflowStatusListener feature.
I'm running Conductor on my localhost with Docker, and I saw that the server is a simple jar, possibly a Spring Boot app. So, how do I pass my own jar with my listener or simple tasks in this scenario?
I found out how to deploy it using the Docker image.
I copied the /app folder from my Docker container, changed the startup.sh script, and mounted my local folder.
I copied my jar into /app/libs, and the server is started with:
java -cp libs/*.jar com.netflix.conductor.bootstrap.Main $config_file $log4j_file
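For reference, a sketch of mounting the modified folder back into the container (the image name, port, and host path are placeholders):
# ./app is the local copy of the container's /app, with the edited startup.sh
# and the custom jar under app/libs
docker run -d -p 8080:8080 -v "$(pwd)/app:/app" <your-conductor-server-image>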

How to register a Spring batch in Spring Cloud Data Flow

I don't understand how to register an app. I followed a lot of guides and they use this example to explain it:
dataflow:>app register --name fileIngest --type task --uri file:///path/to/target/ingest-X.X.X.jar
My jar is in "C:\Temp", but if I set the URI file:///Temp/myjar-0.0.1-SNAPSHOT.jar
I get this error:
java.lang.IllegalArgumentException: File /Temp/myjar-0.0.1-SNAPSHOT.jar must exist
Can someone explain how to run a batch locally with Spring Cloud Data Flow?
I figured out how to do it. In docker-compose.yml I set the path in skipper-server and dataflow-server like this:
image: springcloud/spring-cloud-dataflow-server:${DATAFLOW_VERSION:?DATAFLOW_VERSION is not set!}
container_name: dataflow-server
volumes:
  - 'C:/Temp:/root/apps'
"Then the right way to register the app is: "
app register --name 'mybatch' --type task --uri file:///root/apps/myjar-0.0.1-SNAPSHOT.jar
What you tried is intended to be used on a Unix box; for Windows, you'd have to point to the file with a different namespace pattern.
Perhaps try this:
app register --name fileIngest --type task --uri file:/C:/Temp/myjar-0.0.1-SNAPSHOT.jar

Docker container with two jar files , run on demand not as entry point

In my use case I would like to have two jar files in the container. In a typical Docker image, I see an entry point that starts the jar file. In my case, I will not know which program to start until the container is used in the K8s services. In my example, one jar file applies the DDLs and the second jar is my application. I want K8s to deploy my DDL application first and, upon completion, deploy my Spring Boot application (a different jar from the same container) next. Therefore I cannot give an entry point for my container; instead I need to run the specific jar file using a command and arguments from my YAML file. In all the examples I have come across, I see an entry point being used to start the Java process.
The difference from the post referred to here is that I want the container to hold two jar files and, when I load the container through K8s, decide from the command line which program to run. One option I am exploring is a parameterized shell script, so I can pass the jar name as a parameter and the script will run java -jar. I will update here once I find something.
Solution update
Add the two jars in the Dockerfile and have a shell script that takes a parameter. Use the sample below to invoke the right jar file from the K8s YAML file:
spec:
  containers:
    - image: URL
      imagePullPolicy: Always
      name: image-name
      command: ["/bin/sh"]
      args: ["-c", "/home/md/javaCommand.sh jarName.jar"]
      ports:
        - containerPort: 8080
          name: http
A Docker image doesn't have to run a Java jar when starting; it just has to run something.
You can simply make that something a shell script that makes these decisions and starts the jar you like.
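A minimal sketch of such a script, assuming the jar name is passed as the first argument and the jars also live under /home/md (the javaCommand.sh name and path come from the YAML above):
#!/bin/sh
# javaCommand.sh - run whichever jar is named by the first argument
set -e
JAR_NAME="$1"
shift
exec java -jar "/home/md/${JAR_NAME}" "$@"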
Try adding the prerequisites as an init container when deploying to Kubernetes, and place your application in the regular container. The DDL container will then run to completion first, and the application container will start afterwards.
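A minimal sketch of that layout, assuming both jars ship in the same image (the image URL, paths, and jar names are placeholders):
spec:
  initContainers:
    - name: ddl
      image: URL
      command: ["java", "-jar", "/home/md/ddl.jar"]   # must exit successfully first
  containers:
    - name: app
      image: URL
      command: ["java", "-jar", "/home/md/app.jar"]   # starts after the init container completes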

Where to set env variables for local Spring Cloud Dataflow?

For development, I'm using the local Spring Cloud Dataflow server on my Mac, though we plan to deploy to a Kubernetes cluster for integration testing and production. The SCDF docs say you can use environment variables to configure various things, like database configuration. I'd like my registered app to use these env variables, but they don't seem to be able to see them. That is, I start the SCDF server by running its jar from a terminal window, which can see a set of environment variables. I then configure a stream using some Spring Cloud stream starter apps and one custom Spring Boot app. I have the custom app logging System.getenv() and it's not showing the env variables I need. I set them in my ~/.bashrc file, which I also source from ~/.bash_profile. That works for my terminal windows and most other things that need environment, but not here. Where should I be defining them?
To the points in the first answer and the comments: they sound good, but nothing works for me. I have an SQS source that gets its connection via:
return AmazonSQSAsyncClientBuilder.standard()
        .withRegion(Regions.US_WEST_2.getName())
        .build();
When I deploy to a Minikube environment, I edit the sqs app's deployment and set the AWS credentials in the env section. Then it works. For a local deployment, I've now tried:
stream deploy --name greg1 --properties "deployer.sqs.AWS_ACCESS_KEY_ID=<id>,deployer.sqs.AWS_SECRET_ACCESS_KEY=<secret>"
stream deploy --name greg1 --properties "deployer.sqs.aws_access_key_id=<id>,deployer.sqs.aws_secret_access_key=<secret>"
stream deploy --name greg1 --properties "app.sqs.AWS_ACCESS_KEY_ID=<id>,app.sqs.AWS_SECRET_ACCESS_KEY=<secret>"
stream deploy --name greg1 --properties "app.sqs.aws_access_key_id=<id>,app.sqs.aws_secret_access_key=<secret>"
All fail with the error message I get when credentials are wrong, which is, "The specified queue does not exist for this wsdl version." I've read the links, and don't really see anything else to try. Where am I going wrong?
You can pass environment variables to the apps that are deployed via SCDF using application properties or deployment properties. Check the docs for a description of each type.
For example:
dataflow:> stream deploy --name ticktock --properties "deployer.time.local.javaOpts=-Xmx2048m -Dtest=foo"
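For the SQS example above, the same mechanism could pass the credentials as JVM system properties, assuming the app resolves them through the default AWS provider chain (aws.accessKeyId and aws.secretKey are the standard AWS SDK for Java system properties):
dataflow:> stream deploy --name greg1 --properties "deployer.sqs.local.javaOpts=-Daws.accessKeyId=<id> -Daws.secretKey=<secret>"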
