Setting up distributed erasure code using MinIO on a local on-premises network

I was able to set up non-EC mode easily. Now I am trying to set up MinIO on 3 local nodes in distributed erasure code mode before I move to a production setup. I want to replicate this setup on my local network.
Following this guide: https://docs.min.io/docs/distributed-minio-quickstart-guide.html
I am running these commands:
zc#rockpix:/minio$ export MINIO_ROOT_USER=minio
zc#rockpix:/minio$ export MINIO_ROOT_PASSWORD=minio123#
zc#rockpix:/minio$ minio server http://localhost{1...3}/mnt/ssd{1...3}
but running the above command results in:
Invalid command line arguments: lookup localhost1 on 127.0.0.53:53: server misbehaving
> Please provide correct combination of local/remote paths
HINT:
For more information, please refer to https://docs.min.io/docs/minio-erasure-code-quickstart-guide
The guide does not mention whether I need to set up a domain name. If non-EC mode can work without a domain setup, why can't distributed erasure code be set up without a domain name?

The error message points to a DNS issue with those hostnames: localhost{1...3} expands to localhost1, localhost2 and localhost3, which are not resolvable names. Use valid, resolvable hostnames or IP addresses. As a side note, four nodes is the minimum recommended for a production setup.
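For illustration, a minimal sketch of what a working invocation could look like, assuming three hosts named node1, node2 and node3 that every node can resolve; the hostnames and addresses below are placeholders, not values from the question, and the same command is run on each node:

# Make the hostnames resolvable on every node (via DNS or /etc/hosts), e.g.:
#   192.168.1.11 node1
#   192.168.1.12 node2
#   192.168.1.13 node3
export MINIO_ROOT_USER=minio
export MINIO_ROOT_PASSWORD=minio123#
# The brace expansion must produce names every node can resolve;
# each node serves the drives mounted at /mnt/ssd1 ... /mnt/ssd3.
minio server http://node{1...3}/mnt/ssd{1...3}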

Related

Can someone elaborate on the necessary proxy settings in the install-config.yaml file for an OKD installation in an air-gapped environment?

I am attempting an installation of OKD 4.5 in a restricted (i.e. air-gapped) environment. I am running into an issue during the installation process wherein, as far as I can tell, the bootstrap machine is attempting and failing to access the mirrored registry I have running.
Based on my research, I believe this issue stems from a lack of proxy settings within the install-config.yaml file as described in the documentation here; however, I am having trouble wrapping my brain around what functions I am trying to accommodate by adding this proxy information to the configuration and exactly what information I should be adding. I have not been able to find any other segments of the documentation that go into detail about this either (if someone can simply point me in the direction of such documentation, that would be extremely helpful).
Would anyone be willing to explain to me what values should go into the proxy lines in this file and why? Does this information replace, complement, or require changes in any way to the networking segment of the configuration?
As a related question, do I need to change any of the networking subnet values to reflect my local network? In all examples I've seen the clusterNetwork.cidr and serviceNetwork subnets are the same as the documentation (cidr: 10.128.0.0/14, serviceNetwork: - 172.30.0.0/16), and some include an additional machineNetwork field. Is this field something I should be adding and if so, should I just be including my subnet for this field?
As context for my specific scenario, here are my environment specifications as well as the specific errors I am getting:
OKD Release: `4.5.0-0.okd-2020-10-15-235428`
Environment: Virtualized Bootstrap, Master, and Worker nodes in virt-manager, running on CentOS 7 in an air-gapped environment. The host machine contains the install directory and also provides DNS, an Apache server, HAProxy for load balancing, and the mirrored registry.
Errors:
From <log-bundle>/bootstrap/journals/release-image.log:
localhost.localdomain release-image-download.sh[114151]: Error: Error initializing source docker://okd-services.okd.local:5000/okd@sha256<.....>:
error pinging docker registry okd-services.okd.local:5000: Get "https://okd-services.okd.local:5000/v2/":
dial tcp <okd-services.okd.local ip>:5000: connect: connection refused
From systemctl status named (several requests to IPs I don't recognize which seem to be NTP requests):
network unreachable resolving '2.fedora.pool.ntp.org.okd/AAAA..
network unreachable resolving './NS/IN': 199.7.91.13#53
etc
I have ensured that host-node and node-node communication is present, and that the registry is accessible from the nodes (to test, I netcat the certificate pem into a node and update its trusts, then curl -u the registry using https://fqdn:5000/v2/_catalog), so I am fairly certain all the connections are established properly.
To conclude, since I'm fairly sure that the proxy/network settings in the install-config.yaml file are to blame, and since I am unable to find more elaboration on these configurations in the official docs or elsewhere, I would very much appreciate any in-depth explanation of how I should be configuring this for an air-gapped environment. Additionally, if anyone believes that another issue is the cause, any input regarding that would be great.
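For reference only, and strictly as a hedged sketch rather than a verified configuration for this environment, the proxy and networking stanzas under discussion in install-config.yaml generally take a shape like the following; every host name and CIDR below is a placeholder:

# Hedged sketch -- placeholder values, not a tested configuration.
# The proxy stanza is only needed if the nodes must reach something through an
# HTTP(S) proxy; a fully mirrored, air-gapped install can usually omit it.
proxy:
  httpProxy: http://proxy.okd.local:3128
  httpsProxy: http://proxy.okd.local:3128
  noProxy: .okd.local,10.0.0.0/24
networking:
  clusterNetwork:            # internal pod network; the documented default is usually fine
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:            # should match the subnet the nodes actually sit on
  - cidr: 10.0.0.0/24
  networkType: OpenShiftSDN
  serviceNetwork:            # cluster-internal service IPs; the default is usually fine
  - 172.30.0.0/16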

Alibaba ECS Instances do not respond to ping on private network unless I set password

I have the following configuration in Alibaba ECS:
Public Connector and Three Test Nodes
Connector has network connections on the public internet and the default VSwitch in the default VPC. Connector was created using the ECS web interface. The testnode[0-2] machines were created in a script using the Alibaba cli command: aliyun.
When the instances start running, the Connector can ping none of them. If I set a password on any of the test nodes and then restart the test node, ping starts working. The script uses a snapshot of the Connector as the image for the test nodes. The Connector has a randomly generated, long, and forgotten root password. Root access is via ssh with a passphrase-protected key pair, and the same applies for a non-root user used by the test code.
What I have tried is creating test nodes with the following CreateInstance options:
No --Password and no --InheritPassword options (original intent: why set a password? I have the access I need from the Connector image)
--InheritPassword option (I need a root password in order for the private network interfaces to work, the root password in the Connector image is fine)
--Password option (I need to explicitly set a root password on the test nodes)
The result is all the same: until I use the ECS web interface to set a password and restart a test node, the Connector cannot ping the test nodes.
What I know:
This is not a problem with the default security group, VPC, or VSwitch as I touch no settings on these entities in order for ping to work.
This is not a problem with the instance image because as soon as ping works, ssh to the test nodes works as well.
What am I doing wrong, or what am I missing? The whole purpose is to spin up instances without having to type away at the ECS web interface. I figured out what it took to get the private network traffic moving because I wanted to debug the situation on the test nodes, and for that, I had to set a root password and gain access from the ECS web console, which, again, defeats the purpose of scripting.
Aliyun command for creating the test nodes:
aliyun ecs CreateInstance --ImageId m-2vchb2oxldfuloh51wp9 --RegionId=cn-chengdu --InstanceType=ecs.c6.xlarge --SpotStrategy SpotWithPriceLimit --SpotPriceLimit 0.25 --ZoneId cn-chengdu-a --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 99 --InstanceName TEST_NODE-0 --HostName testnode0 --Password 'notgoingtotellyou'
Operating system for all instances is Ubuntu 18.04.
Aliyun command version is 3.0.30.
I got two answers. One from a co-worker. One from Alibaba.
Co-worker's answer:
The configuration fails because the Ubuntu 18.04 image that I created for the non-public test machines used a static address for the internal network interface. I changed the internal network interface (eth0) to use DHCP and everything worked; see netplan configuration examples for how to change the IP address assignment.
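As a hedged sketch of that change (the netplan file name below is a placeholder; the essential parts are dhcp4: true on eth0 and re-applying the configuration):

# Switch eth0 from a static address to DHCP (run as root in the test-node image)
cat > /etc/netplan/01-eth0-dhcp.yaml <<'EOF'
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
EOF
netplan apply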
Alibaba's answer:
Try using aliyun ecs RunInstances instead of three individual aliyun ecs CreateInstance and aliyun ecs StartInstance invocations. I did not try this solution as it would have involved rewriting my scripts. Alibaba could have done more to motivate me by providing an explanation as to why RunInstances would produce a different result than the combination of CreateInstance and StartInstance.
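For completeness, a hedged, untested sketch of what the suggested RunInstances call might look like, assuming the CLI exposes the RunInstances API parameters the same way it does for CreateInstance; the flags simply mirror the original CreateInstance call:

# Untested sketch: create and start three spot instances in one call.
# The [0,3] naming suffix is the batch-naming pattern described in the
# RunInstances API documentation; verify it against the current docs.
aliyun ecs RunInstances --RegionId cn-chengdu --ZoneId cn-chengdu-a \
    --ImageId m-2vchb2oxldfuloh51wp9 --InstanceType ecs.c6.xlarge \
    --SpotStrategy SpotWithPriceLimit --SpotPriceLimit 0.25 \
    --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 99 \
    --InstanceName 'TEST_NODE-[0,3]' --HostName 'testnode[0,3]' \
    --Amount 3 --Password 'notgoingtotellyou'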

Why do I lose all of the graphical configuration I've made in Node-RED when I clone the machine (Ubuntu Amazon Server)?

I'm running an Ubuntu Server on Amazon EC2, and I'm using Node-RED to create an IoT project in the cloud.
I succeeded in configuring one machine in a way that works for my project. My problem is when I clone this machine (creating an Amazon Machine Image of my original server and launching it as a new machine): I don't know why all the nodes that I've created in the Node-RED graphical interface disappear when I clone my Ubuntu Server. On my cloned server I just see a blank page when I access Node-RED, as if I had never created any node on the original server.
I think this is a problem with Node-RED, because I'm also running a Kibana instance on the same server and all of Kibana's graphical configuration is preserved on the cloned server.
Does anyone know why this is happening? Is there a specific configuration in Node-RED that I have to change to allow its graphical interface to be cloned?
Note: I know I could just export everything that I did on the original server to my cloned server using Node-RED's import/export tools, but I'm planning to clone my original server many times, so it'd be better if everything were exactly the same when I clone the machine, without the need for manual work.
Node-RED stores the flow in a file in the ~/.node-red/ directory of the user running that instance; the file name is based on the host name of the machine.
e.g. on a raspberry pi the default flow file is called:
/home/pi/.node-red/flows_raspberrypi.json
So assuming that the host name gets changed when you "clone" the machine, Node-RED will not be able to find a flow file that matches the host name and will therefore start with an empty flow.
There are a few ways to work around this.
if you start Node-RED manually from the command line you can specify the flow file as the last argument: node-red flow.json
if you are running Node-RED as a service, you can edit ~/.node-red/settings.js to include a flowFile key that holds the name of the flow file to use (see the sketch below).
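For example, a minimal sketch of both options (paths and file names are placeholders for whatever your image actually uses):

# Option 1: start Node-RED with an explicit flow file as the last argument
node-red /home/ubuntu/.node-red/flows.json

# Option 2: pin the flow file name in ~/.node-red/settings.js so it no longer
# depends on the host name, e.g. inside module.exports:
#   flowFile: 'flows.json',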

Job tracking URL in Google Compute Engine not working

I am using Google Compute Engine to run Mapreduce jobs on Hadoop (pretty much all default configs). While running the job I get a tracking URL of the form http://PROJECT_NAME:8088/proxy/application_X_Y/ but it fails to open. Did I forget to configure something?
To elaborate on the option Amal mentioned in the other answer of using the "external ip address" of your Google Compute Engine VM, you can obtain the external IP address by running gcloud compute instances describe --zone <your zone> <your master hostname> and looking for natIP.
To open port 8088, you'll have to set up a firewall rule opening that port, likely on your default Google Compute Engine network. You'll want to specify a your.ip.address.here/32 address in --source-ranges to restrict incoming traffic to just your local machine dialing into your VM; otherwise anyone in the IP source ranges would be able to access your Hadoop pages.
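As a hedged sketch of such a rule (the rule name, network, and source address are placeholders):

# Allow only your own public IP to reach the ResourceManager UI on port 8088
gcloud compute firewall-rules create allow-yarn-ui \
    --network default \
    --allow tcp:8088 \
    --source-ranges your.ip.address.here/32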
If you had used bdutil to turn up your cluster, there's an alternative way which is much easier and more secure; simply run
bdutil <your flags used in deployment, like -e hadoop2, --prefix, etc.> socksproxy
to open SSH with dynamic port forwarding to use as a SOCKS5 proxy that your browser can point to. If you're running on Linux or Mac and have Chrome or Firefox installed, bdutil should also print out a copy/paste command for starting a fresh isolated browser pre-configured to use the socks proxy so that you can click through all the useful links.
If bdutil didn't print out a browser command or you didn't use bdutil, you can also run and configure your SSH socks proxy using these instructions. An SSH-based socks proxy is more secure than opening up firewall ports, and also allows the Hadoop page links to work (otherwise you have to keep manually replacing the hostnames with the external IP addresses).
One correction: you are using YARN, so there is no JobTracker. The JobTracker exists in Hadoop 1.x; in YARN, the processing layer became a generic framework and the JobTracker was replaced by the ResourceManager and ApplicationMaster. The UI you mentioned in the question is the ResourceManager's.
For your problem, try the following tips.
Use the public IP address of the ResourceManager instance instead of PROJECT_NAME.
Check whether port 8088 is open for access from outside.
Another (more secure) way to do this is to use gcloud compute to make an ssh tunnel to your deployment, and then launch Chrome though it.
$ gcloud compute ssh clustername --zone=us-central1-a --ssh-flag="-D 1080" --ssh-flag="-N" --ssh-flag="-n"
You will need to replace clustername with the name of your deployment, and change the --zone if necessary.
From there, you can launch Chrome through it and then reach the hadoop job tracking URL.
$ chrome --proxy-server="socks5://localhost:1080" \
--host-resolver-rules="MAP * 0.0.0.0 , \
EXCLUDE localhost" --user-data-dir=/tmp/clustername

Use spark-submit to submit an application to an EC2 cluster

I am new to Spark and I am trying to run it on EC2. I followed the tutorial on the Spark webpage, using spark-ec2 to launch a Spark cluster. Then I tried to use spark-submit to submit the application to the cluster. The command looks like this:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
However, I got the following error:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Please let me know how to fix it. Thanks.
You're seeing this issue because the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your machine). The default mode of spark-submit is client, which runs the driver on the machine that submitted it.
A new cluster mode was added to spark-deploy that submits the job to the master, where the driver is then run inside the cluster, removing the need for a direct connection back to the submitting machine. Unfortunately, this mode is not supported for standalone clusters.
You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260
Tunneling your connection via SSH is possible but latency would be a big issue since the driver would be running locally on your machine.
I'm curious whether you're still having this issue ... but in case anyone is asking, here is a brief answer. As clarified by jhappoldt, the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your local machine). Two workarounds are possible; both were tested and succeeded.
(1) From the EC2 Management Console, create a new security group and add rules to enable TCP traffic back and forth from your PC's public IP (what I did was add inbound and outbound TCP rules). Then add this security group to your master instance (right click --> Networking --> Change security groups). Note: add it and don't remove the already established security groups.
This solution works well, but in your specific scenario (deploying your application from a local machine to an EC2 cluster) you will face further, resource-related problems, so the next option is the better one.
(2) Copy your .jar file (or .egg) to the master node using scp; you can check this link http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html for information about how to do that, then deploy your application from the master node. Note: Spark is already pre-installed there, so you do nothing but run the exact same command you would run on your local machine, from ~/spark/bin. This should work perfectly; a sketch follows below.
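A hedged sketch of workaround (2), reusing the master address from the original command (the key path and home-directory layout are placeholders):

# Copy the application jar to the master node, then log in and submit from there
scp -i ~/mykey.pem ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar \
    root@ec2-54-88-9-74.compute-1.amazonaws.com:~/
ssh -i ~/mykey.pem root@ec2-54-88-9-74.compute-1.amazonaws.com

# On the master node:
~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 \
    --executor-memory 2G --total-executor-cores 1 \
    ~/spark-examples_2.10-1.0.0.jar 100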
Are you executing the command on your local machine, or on the created EC2 node? If you're doing it locally, make sure port 7077 is open in the security settings, as it's closed to the outside by default.
