How does load balancing in Spring XD get done?


I've got Spring XD running in distributed mode and am now beginning to run a few JMeter tests.
My question is about how the generated load gets distributed across containers in distributed mode.
If I generate 1000 messages each for 100 users, I'd like this traffic to be split between two or more running containers.
Is this possible? Or does one container take on the entire load? In my current setup this is what seems to be happening.

What is the stream definition? And, specifically, what is the source module?
If it's an http source, you will need a load balancer in front of the containers (as with any HTTP application). You can use a hardware balancer or a software one, such as Apache (mod_proxy etc.).
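To make the idea concrete, here is a minimal sketch of what a round-robin balancer does with incoming requests. It is only an illustration in plain Python, not part of Spring XD or Apache; the container addresses are hypothetical placeholders, and a real balancer may use a different policy.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, backends):
    """Pair each request with a backend, rotating through the backends in order."""
    rotation = cycle(backends)
    return [(request, next(rotation)) for request in requests]


# Hypothetical addresses of two containers that each run the http source.
backends = ["xd-container-1:9000", "xd-container-2:9000"]

assignments = distribute(range(1000), backends)
per_backend = Counter(backend for _, backend in assignments)
print(per_backend)
```

With two backends and 1000 requests, each backend ends up with half of the traffic. Without such a component in front, every request goes to whichever single host the test client was pointed at.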
EDIT: I mentioned the deployment manifest in a comment below.
Deployment is different from stream definition; the deployment manifest controls how a stream that has already been defined gets deployed.
See the Reference Guide for information about the deployment manifest. That link is for the M7 document; the current documentation can be found on the Wiki (use the browser's 'find' feature to search for Deployment Manifest).
For this stream...
xd:>stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log"
you can get 3 instances of transform using
xd:>stream deploy --name test1 --properties "module.transform.count=3"

Related

Creating a Simple Hello World app in Kubernetes

Most software tech has a "Hello World" type example to get started on. With Kubernetes this seems to be lacking.
My scenario cannot be simpler. I have a simple hello world app made with Spring-Boot with one Rest controller that just returns: "Hello Hello!"
After I create my docker file, I build an image like this :
docker build -t helloworld:1.0 .
Then I run it in a container like this :
docker run -p 8080:8080 helloworld:1.0
If I open up a browser now, I can access my application here :
http://localhost:8080/hello/
and it returns :
"Hello Hello!"
Great! So far so good.
Next I tag it (my docker-hub is called ollyw123, and the ID of my image is 776...)
docker tag 7769f3792278 ollyw123/helloworld:firsttry
and push :
docker push ollyw123/helloworld
If I log into Docker-Hub I will see
Now I want to connect this to Kubernetes. This is where I have plunged deep into a state of confusion.
My thinking is, I need to create a cluster. Somehow I need to connect this cluster to my image, and as I understand, I just need to use the URL of the image to connect to (ie.
https://hub.docker.com/repository/docker/ollyw123/helloworld)
Next I would have to create a service. This service would then be able to expose my "Hello World!" rest call through some port. This is my logical thinking, and for me this would seem like a very simple thing to do, but the tutorials and documentation on Kubernetes are a minefield of confusion and dead ends.
Following on from the spring-boot kubernetes tutorial (https://spring.io/guides/gs/spring-boot-kubernetes/) I have to create a deployment object, and then a service object, and then I have to "apply" it :
kubectl create deployment hello-world-dep --image=ollyw123/helloworld --dry-run -o=yaml > deployment.yaml
kubectl create service clusterip hello-world-dep --tcp=8080:8080 --dry-run -o=yaml >> deployment.yaml
kubectl apply -f deployment.yaml
OK. Now I see a service :
But now what???
How do I push this to the cloud? (eg. gcloud) Do I need to create a cluster first, or is this already a cluster?
What should my next step be?

There are a couple of concepts that we need to go through regarding your question.
The first is the "Hello World" app in Kubernetes. Even though one exists (as mentioned by Limido in the comments [link]), the app itself is not a Kubernetes app; it is an app written in the language of your choice, which was containerized and is deployed in Kubernetes.
So I would call it (in your case) a Dockerized SpringBoot HelloWorld app.
Okay, now that we have a container, we could simply run it with Docker. But what if your container dies, or you need to scale it up and down, manage volumes, network traffic and a bunch of other things? This starts to become complicated (imagine a real-life scenario, with hundreds or even thousands of containers running at the same time). That's exactly where container orchestration comes into play.
Kubernetes helps you manage this complexity in a single place.
The third concept that I'd like to talk about is the create and apply commands. You can find a more detailed explanation here, but both of them can be used to create resources in Kubernetes.
In your case, the create command is not creating the resources, because you are using --dry-run and writing the output to your deployment file, which you apply later on. The following commands, however, would create the resources directly:
kubectl create deployment hello-world-dep --image=ollyw123/helloworld
kubectl create service clusterip hello-world-dep --tcp=8080:8080
Note that even though this works, if you need to share this deployment or commit it to a repository, you would need to export it:
kubectl get deployment hello-world-dep -o yaml > your-file.yaml
So having the definition file is really helpful and recommended.
Great... Going further...
When you have a deployment you will also have a number of replicas that is expected to be running (even when you don't define it - the default value is 1). In your case your deployment is managing one pod.
If you run:
kubectl get pods -o wide
You will get your pod hello-world-dep-hash and an IP address. This is the IP of your pod, and you can access your application using it. However, pods are ephemeral: if your pod dies, Kubernetes will automatically create a new one for you with a new IP address. So if, for instance, you have a backend whose IP is constantly changing, you would need to update the frontend every time a new backend pod appears.
To solve that, Kubernetes has the Service, which exposes the deployment in a persistent way. If your pod dies and a new one comes back, the address of your service stays the same, and all traffic is automatically routed to your new pod.
When you have more than one replica of your deployment, the service also load-balances traffic across all the available pods.
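The indirection described above can be modelled in a few lines. This is a toy sketch in plain Python, not how Kubernetes is implemented: the Service class, its method names and the IP addresses are all made up for illustration, and the simple rotation used here is an assumption (the actual selection policy depends on how the cluster's proxy is configured).

```python
class Service:
    """Toy model: one stable name in front of a changing set of pod IPs."""

    def __init__(self, name):
        self.name = name
        self.endpoints = []   # IPs of the pods currently backing the service
        self._next = 0        # position in the rotation

    def pod_started(self, ip):
        self.endpoints.append(ip)

    def pod_died(self, ip):
        self.endpoints.remove(ip)

    def route(self):
        """Return the pod IP that receives the next request."""
        if not self.endpoints:
            raise RuntimeError("no ready pods behind " + self.name)
        ip = self.endpoints[self._next % len(self.endpoints)]
        self._next += 1
        return ip


svc = Service("hello-world-dep")
svc.pod_started("10.1.0.5")
svc.pod_started("10.1.0.6")
first_two = [svc.route(), svc.route()]   # both replicas receive traffic

svc.pod_died("10.1.0.5")      # a pod crashes...
svc.pod_started("10.1.0.9")   # ...and its replacement gets a new IP
after_restart = {svc.route() for _ in range(4)}
print(first_two, sorted(after_restart))
```

Callers only ever use the service name; the set of pod IPs behind it changes without them noticing.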
Last but not least, your question!
You have asked, now what?
So basically, once you have your application containerized, you can deploy it almost anywhere. In your case you are running it locally, but you could take your deployment.yaml file and deploy your application to GKE, AKS or EKS, just to name the biggest ones. Most cloud providers have some type of Kubernetes service available, where you can spin up a cluster and start playing around.
Actually, to play around I'd recommend Katacoda, as they have free scenarios and you can use their cluster to experiment.
Wow... That was a long answer...
Just to finish, I'd recommend the Network Introduction in Katacoda, as there are different types of Services, depending on your scenario or what you need, and the tutorial goes through the different types in a hands-on approach.

In the context of Kubernetes, a cluster is the environment where your Pods and Services run. Think of it like a VM environment where you set up your web server and so on (although I don't like my own analogy).
If you want to run the same thing in GCloud, you create a Kubernetes cluster there, and then all you need to do is apply the YAML files that contain the Service and Deployment, via the CLI that Google Cloud provides to interact with your cluster.
In order to interact with a GCloud GKE cluster from your local command prompt, you need to get the credentials for that cluster. This official GCloud document explains how to retrieve your cluster credentials. Once done, you can start interacting with the Kubernetes instance running in GCloud via the kubectl command from your command prompt.

The service that you have is of type ClusterIP, which is only accessible from within the Kubernetes cluster. You need to use either a NodePort or LoadBalancer type service, or an Ingress, to expose the application outside the cluster, whether that is a remote Kubernetes cluster (a set of VMs or bare-metal servers in a public or private cloud with Kubernetes deployed on them) or a local minikube/Docker Desktop. Once you do that, you should be able to access it using a browser or curl.

hazelcast-jet deployment and data ingestion

I have a distributed system running on AWS EC2 instances. My cluster has around 2000 nodes. I want to introduce a stream processing model which can process metadata periodically published by each node (CPU usage, memory usage, IO, etc.). My system only cares about the latest data. It is also OK with missing a couple of data points when the processing model is down. Thus, I picked hazelcast-jet, which is an in-memory processing model with great performance. I have a couple of questions regarding the model:
What is the best way to deploy hazelcast-jet to multiple ec2 instances?
How to ingest data from thousands of sources? The sources push data instead of being pulled.
How to config client so that it knows where to submit the tasks?
It would be super useful if there were a comprehensive example I could learn from.

What is the best way to deploy hazelcast-jet to multiple ec2 instances?
Download and unzip the Hazelcast Jet distribution on each machine:
$ wget https://download.hazelcast.com/jet/hazelcast-jet-3.1.zip
$ unzip hazelcast-jet-3.1.zip
$ cd hazelcast-jet-3.1
Go to the lib directory of the unzipped distribution and download the hazelcast-aws module:
$ cd lib
$ wget https://repo1.maven.org/maven2/com/hazelcast/hazelcast-aws/2.4/hazelcast-aws-2.4.jar
Edit bin/common.sh to add the module to the classpath. Towards the end of the file is a line
CLASSPATH="$JET_HOME/lib/hazelcast-jet-3.1.jar:$CLASSPATH"
You can duplicate this line and replace -jet-3.1 with -aws-2.4.
Edit config/hazelcast.xml to enable the AWS cluster discovery. The details are here. In this step you'll have to deal with IAM roles, EC2 security groups, regions, etc. There's also a best practices guide for AWS deployment.
Start the cluster with jet-start.sh.
How to config client so that it knows where to submit the tasks?
A straightforward approach is to specify the public IPs of the machines where Jet is running, for example:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getGroupConfig().setName("jet");
clientConfig.addAddress("54.224.63.209", "34.239.139.244");
However, depending on your AWS setup, these may not be stable, so you can configure the client to discover them as well. This is explained here.
How to ingest data from thousands of sources? The sources push data instead of being pulled.
I think your best option for this is to put the data into a Hazelcast Map, and use a mapJournal source to get the update events from it.

How to configure Application Logging Service for SCP application

I have created the hello world application from the SAP Cloud SDK archetypes and pushed this to the cloud foundry environment, binding it to an application logging service instance. My understanding is that this should already provide me with the ability to analyze all logs in the Kibana dashboard of the cloud platform and previously it also worked this way.
However, this time the Kibana dashboard remains empty, so I am wondering if I missed a step or configuration. Looking at the documentation of the service and the respective tutorial blog, I was not able to identify any additional required steps. In the Logs view on the SCP cockpit I can definitely see the entries, but they are not replicated to the ELK stack in the background.

Problem was not SDK related, but seems to have been an incident on the SCP - now works correctly without any changes.

RESTAPI Performance (Stress) test using docker

I want to do a performance test on a Perl-based REST API. Is there any Docker container available to do this?
Like I can input:
1000 requests per second
POST request URL and body
Run for 5 mins.
I have monitoring set up on the server side. If the client (Docker image) also provides some monitors, then it's a plus.

You can use e.g. Locust (https://locust.io) from container. Docs can be found here: https://docs.locust.io/en/latest/running-locust-docker.html

Docker doesn't provide any load testing capabilities per se; it's a virtualization option mostly used for keeping environments consistent between DEV/QA/PROD systems. You might need it if you plan to dynamically add load generators using a container orchestration solution like k8s.
Theoretically you can install any load testing tool into a Docker container. Given your question tags:
there is k6 docker image
there are multiple JMeter docker images available at Docker Hub (or you can build your own JMeter docker image)
Monitoring can be done using e.g. the cAdvisor tool.
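For a quick smoke test before setting up one of those tools, the "N requests per second for M minutes" requirement can be sketched with the Python standard library alone. This is a rough illustration, not a replacement for Locust, k6 or JMeter: the function names are made up here, and a single Python process will likely not sustain 1000 requests per second against a real server.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def post_json(url, body, timeout=5.0):
    """Send one POST request with a JSON body and return the HTTP status code."""
    data = json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_load(send, rate_per_sec, duration_sec, workers=50):
    """Call send() at a fixed rate; return (succeeded, failed) counts."""
    total = int(rate_per_sec * duration_sec)
    interval = 1.0 / rate_per_sec
    start = time.monotonic()
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(total):
            # Wait until this request's scheduled start time, then hand it to
            # a worker thread so a slow response does not delay the schedule.
            delay = start + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(send))
    # Leaving the "with" block waits for all in-flight requests to finish.
    failed = sum(1 for future in futures if future.exception() is not None)
    return total - failed, failed
```

A call such as run_load(lambda: post_json("http://localhost:8080/api", {"key": "value"}), 1000, 300) would correspond to 1000 requests per second for 5 minutes; the URL and body there are placeholders. Any exception raised by send(), including an HTTP error status or a timeout, is counted as a failure.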

Spring-XD: Deployment of modules to certain containers

Three questions regarding deployment of modules to Spring XD container:
For certain sources and sinks it's necessary to say to which container a module should be deployed. Let's say we have a lot of containers on different machines, and we want to establish a stream reading a log file from one machine. The source module of type tail has to be deployed to the container running on the machine with the log file. How can you do that?
You may want to restrict the execution of modules to a group of containers. Let's say we have some powerful machines for our batch processing with containers on it, and we have other machines where our container runs parallel to some other processes only for ingesting data (log files etc.). Is that possible?
If we have a custom module, is it possible to add the module xml and the jars just to certain containers, so that those modules are just executed there? Or is it necessary that we have the same module definitions on all containers?
Thanks!

You bring up excellent points. We have been doing some design work around these issues, in particular #1 and #2, and will have some functionality here in our next milestone release in about a month's time.
In terms of #3, the model for resolving the jars that are loaded in the containers requires the local file system or a shared file system to resolve the classpath. This has also come up in our prototypes of using Spring XD on the CloudFoundry PaaS, and we want to provide a more dynamic, runtime ability to locate and load new modules. No estimate on when that will be addressed.
Thanks for the questions!
Cheers,
Mark
