Hadoop Cluster datanode network error

I have a small cluster running a Cloudera Hadoop installation. After a few days, I noticed constant errors/dropped/frame counts when I run the ifconfig -a command. (From a high-level perspective, MapReduce jobs run smoothly without error and there are no errors from the end-user perspective; I am wondering whether, if I fix this, performance will be much better.)
All the nodes, including the namenode, were installed and configured by the same Red Hat Kickstart server, following the same recipe, so I would say they are the "same". However, I did not notice any network errors on the namenode, while the network errors exist on all the datanodes consistently.
For example, my namenode looks like:
namenode.datafireball.com | success | rc=0 >>
eth4 Link encap:Ethernet HWaddr ...
inet addr:10.0.188.84 Bcast:10.0.191.255 Mask:...
inet6 addr: xxxfe56:5632/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:11711470 errors:0 dropped:0 overruns:0 frame:0
TX packets:6195067 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6548704769 (6.0 GiB) TX bytes:12093046450 (11.2 GiB)
Data node:
datanode1.datafireball.com | success | rc=0 >>
eth4 Link encap:Ethernet HWaddr ...
inet addr:10.0.188.87 Bcast:10.0.191.255 Mask:...
inet6 addr: xxxff24/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:27474152 errors:0 dropped:36072 overruns:36072 frame:36072
TX packets:28905940 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:158509736560 (147.6 GiB) TX bytes:180857576718 (168.4 GiB)
I also did some stress testing following Michael's tutorial, and I can see the errors increasing as the job runs. So it is some error left over from when I first set things up.
FYI, we have two NIC cards in each box. The first 4 ports belong to the embedded NIC card (03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)), which we are not using at all. We are using the 10Gb NIC: 0e:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0).
This is the ethtool output with the firmware version and some general info for the NIC card:
$ ethtool -i eth4
driver: mlx4_en
version: 2.0 (Dec 2011)
firmware-version: 2.8.600
bus-info: 0000:0e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
I am surprised to find that the datanodes have network errors while the namenode doesn't, since they have the same setup and configuration. Can anyone give me some guidance?

B.Mr.W.!
Answering your question: my hypothesis, based on the functionality of each component, is that the namenode only handles metadata, managing just the location of blocks and servers, so its requests and responses use little network bandwidth. The datanode is responsible for the massive data and uses the network bandwidth in its entirety, since it transfers the 'big' data, hence the dropped packets.
I suggest you check the configuration of the switch port connected to this server's network interface, specifically whether jumbo frames are enabled (MTU = 9000).
The same setting must be verified in the server's network interface configuration.
A good way to check whether the configuration is missing at one of the two ends is to look for dropped packets with the command 'ifconfig -a', executed in the server's OS console:
[root@<hostname> ~]# ifconfig -a
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000
inet Ip.Ad.re.ss netmask net.m.as.k broadcast bro.d.ca.st
ether XX:XX:XX:XX:XX:XX txqueuelen 1000 (Ethernet)
RX packets 522849928 bytes 80049415915 (74.5 GiB)
RX errors 274721 dropped 276064 overruns 0 frame 274721
TX packets 520714273 bytes 72697966414 (67.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
In this case, jumbo frames are configured only on the server's network interface.
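To confirm which end is misconfigured, a sketch like the following can help (assumes Linux; the interface name defaults to lo only so it runs anywhere, and the peer hostname in the comment is taken from the question above):

```shell
#!/bin/sh
# Sketch: check the local MTU and compute the Don't-Fragment probe size.
IFACE=${1:-lo}

MTU=$(cat /sys/class/net/"$IFACE"/mtu)
echo "$IFACE mtu=$MTU"

# A full-size frame carries MTU - 28 bytes of ICMP payload
# (20-byte IP header + 8-byte ICMP header).
PAYLOAD=$((MTU - 28))
echo "probe payload: $PAYLOAD bytes"

# Uncomment to probe a peer; if the switch port lacks jumbo-frame support,
# these large DF pings fail while ordinary-sized pings succeed:
# ping -M do -c 3 -s "$PAYLOAD" datanode1.datafireball.com
```

If the large DF ping fails between two hosts whose interfaces both report MTU 9000, the switch in between is the likely culprit.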
Regards,
Caseiro.

Related

How to add "port-mac" on linux embedded device?

I have a device with a two-port switch function. The Ethernet interface is as below:
eth0 Link encap:Ethernet HWaddr 90:59:af:6e:02:43
inet addr:192.168.1.3 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
In order to implement the LLDP protocol, I need two other MAC addresses, one for port1 and another for port2.
This is what I call a "port MAC".
For instance, the interface MAC is called mac_mcu, port1's is called mac_port1, and port2's is mac_port2.
When I plug an Ethernet cable into port1 on the device, I should see the source MAC address filled with mac_port1 in the LLDP multicast frame.
On the other hand, when I plug the cable into port2 on the same device, I should see the source MAC address filled with mac_port2 in the LLDP multicast frame.
That is why I need 3 MAC addresses on the device; maybe it can be done with the DSA (Distributed Switch Architecture) function?
Another question is: how do I assign mac_port1 to physical port1 and mac_port2 to physical port2?
In the end, the result I hope to see is,
Interface mac: 90:59:af:6e:02:43.
Port1 mac: 90:59:af:6e:02:44.
Port2 mac: 90:59:af:6e:02:45
I have tried the commands below:
$ ifconfig eth0:0 192.168.1.10 up
$ ifconfig eth0:1 192.168.1.11 up
But the MAC addresses are all the same as eth0's; that's not what I need.
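For what it's worth, with a DSA driver each front-panel port appears as its own netdev and can be given its own MAC with ip link. A sketch that derives the two port MACs from the interface MAC in the question (lan1/lan2 are assumed DSA port names, untested on this hardware, and the last octet is assumed not to wrap):

```shell
#!/bin/sh
# Sketch: derive sequential per-port MACs from the base interface MAC
# (90:59:af:6e:02:43 -> :44 and :45).
BASE="90:59:af:6e:02:43"
PREFIX=${BASE%:*}                 # everything up to the last octet
LAST=$((0x${BASE##*:}))           # last octet as a decimal number
PORT1="$PREFIX:$(printf '%02x' $((LAST + 1)))"
PORT2="$PREFIX:$(printf '%02x' $((LAST + 2)))"
echo "$PORT1 $PORT2"

# With DSA, the derived addresses could then be assigned per port, e.g.:
#   ip link set dev lan1 address "$PORT1"
#   ip link set dev lan2 address "$PORT2"
```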

Docker on Mac... Can't access anything on local network

Running OS X 10.11.6 (for compatibility reasons with some other software that isn't compatible with anything higher, so upgrading OS X isn't a solution) and the highest version of Docker compatible with 10.11.6, which is 17.12.0-ce-mac55 (23011).
Before I continue, I should say I'm not too knowledgeable when it comes to networks, subnets, netmasks, gateways, etc...
From inside any of my containers, I cannot access any hosts on my local network (apart from the host machine). My network config looks like this on the host:
en4: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ether 00:50:b6:69:51:f3
inet6 fe80::250:b6ff:fe69:51f3%en4 prefixlen 64 scopeid 0x4
inet 172.25.18.19 netmask 0xfffffe00 broadcast 172.25.19.255
nd6 options=1<PERFORMNUD>
media: autoselect (1000baseT <full-duplex>)
status: active
and like this in a container:
eth0 Link encap:Ethernet HWaddr 02:42:C0:A8:10:04
inet addr:192.168.16.4 Bcast:192.168.31.255 Mask:255.255.240.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:24 errors:0 dropped:0 overruns:0 frame:0
TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1683 (1.6 KiB) TX bytes:2551 (2.4 KiB)
For example, from the host, I can telnet to an HP JetDirect box on 172.25.33.51, but not from any container. Similarly, I can get a response from our Navision server's SOAP server from the host, on 172.20.38.62, but not from any container. Same story with a few other machines on slightly different IPs.
Problem is, I need to write an integration bundle that connects to Navision.
Any ideas how I can get access to hosts on the network?

Virtualbox - can't use internet on terminal

I have Linux Mint in a VirtualBox VM and I'm able to use the Internet through the browser. However, when I try the command wget www.google.com, for example, the result is:
$ wget www.google.com
--2018-12-03 16:46:10-- http://www.google.com/
Resolving www.google.com (www.google.com)... 2800:3f0:4001:810::2004,
172.217.28.4
Connecting to www.google.com
(www.google.com)|2800:3f0:4001:810::2004|:80...
I've checked the issue "No internet in terminal". But unfortunately, that appears to be a specific proxy problem, and that's not my case.
My VM network config (I know! Portuguese...):
Basically, the connection type is set to "Bridged",
"promiscuous" mode is set to "Allow All",
and there is no other adapter configuration.
Result of the command ifconfig:
enp0s3 Link encap:Ethernet HWaddr 08:00:27:2b:04:c7
inet addr:192.168.0.39 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: 2804:14d:c092:4057:6d41:5685:4959:c973/64 Scope:Global
inet6 addr: 2804:14d:c092:4057::1005/128 Scope:Global
inet6 addr: fe80::da8c:1d0b:592d:5c90/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:14289 errors:0 dropped:0 overruns:0 frame:0
TX packets:9307 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15589075 (15.5 MB) TX bytes:938043 (938.0 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:776 errors:0 dropped:0 overruns:0 frame:0
TX packets:776 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1
RX bytes:66576 (66.5 KB) TX bytes:66576 (66.5 KB)
Linux Mint network config
Thanks to @darnir, I figured out a workaround for this problem! Basically, I had to add some aliases for wget and apt-get in my .bashrc file and edit /etc/sysctl.conf.
Aliases in ~/.bashrc:
# alias to force wget connections through IPv4
alias wget='wget -4'
# alias to force apt-get connections through IPv4
alias apt-get='apt-get -o Acquire::ForceIPv4=true'
Edits to /etc/sysctl.conf (remember, this solution was implemented on the Linux Mint distro):
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
To reload sysctl:
sudo sysctl -p
Or you can use -w with the sysctl command directly, but this setting will be lost when the system reboots:
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
sysctl -w net.ipv6.conf.lo.disable_ipv6=1
WARNING: this is not a good solution, because it is not a comprehensive, system-wide fix. The problem apparently is that IPv6 resolution is just too slow to perform properly in VMs (at least on common machines). If someone has another idea, please post it! :D
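A less invasive alternative (my own suggestion, not something tested in this setup) is to keep IPv6 enabled but tell glibc to prefer IPv4 results from name resolution, by uncommenting/adding this line in /etc/gai.conf:

```
# Prefer IPv4 (IPv4-mapped addresses) over IPv6 in getaddrinfo()
precedence ::ffff:0:0/96  100
```

This only changes the address-selection order for programs that use getaddrinfo(), so IPv6 keeps working where it actually works.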

HortonWorks Hadoop Sandbox and Tableau

I am attempting to connect Tableau to the HortonWorks Hadoop sandbox as described here: http://hortonworks.com/kb/how-to-connect-tableau-to-hortonworks-sandbox/
Tableau is able to see the virtual server as a data source, and it accurately lists the available Schemas and Tables.
However, when attempting to select any table or preview its data, it displays an error popup: 'An error has occurred while loading the data. No such table [default].[tablename]', where default is the schema and tablename is the name of the table I'm attempting to view.
Here is what comes back when I run ifconfig from the Terminal window in the VM sandbox. Tableau is connecting to the VM via 192.168.50.128.
eth3 Link encap:Ethernet HWaddr 00:0C:29:EB:B9:DC
inet addr:192.168.50.128 Bcast:192.168.50.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:feeb:b9dc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:42011 errors:0 dropped:0 overruns:0 frame:0
TX packets:9750 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15123871 (14.4 MiB) TX bytes:4019795 (3.8 MiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:5185579 errors:0 dropped:0 overruns:0 frame:0
TX packets:5185579 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:2054785522 (1.9 GiB) TX bytes:2054785522 (1.9 GiB)
The guide states: "Enter the IP address of the Sandbox VM (typically 192.168.56.101)", which is different.
Is this IP difference the source of the issue, or is there something else I've overlooked? I'm assuming that since it can see the schemas and tables, this wouldn't matter.
Turns out this was a permissions issue which I was able to resolve by following this guide: http://diveintobigdata.blogspot.com/2015/10/error-occurred-executing-hive-query.html
However, everywhere I was told to input localhost, such as when accessing Ambari, I had to replace localhost with 192.168.50.128, which, as mentioned above, is the IP I saw when executing ifconfig in the terminal.
Also, in step 1 of the guide there should not be any spaces in the file paths that were provided.

How to access a docker container running on MacOSX from another host?

I'm trying to get started with docker and want to run the Ubiquiti video controller. I have installed Docker Toolbox and managed to get the container to run on my Yosemite host and can access it on the same mac by going to the IP returned by docker-machine ip default. But I want to access it on other machines on the network and eventually set up port forwarding on my home router so I can access it outside my home network.
As suggested in boot2docker issue 160, using the VirtualBox GUI I was able to add a bridged network adapter, but after restarting the VM, docker-machine can no longer connect to the VM. docker-machine env default hangs for a long time but eventually returns some environment variables along with the message Maximum number of retries (60) exceeded. When I set up the shell with those variables and try to run docker ps, I get the error: An error occurred trying to connect: Get https://10.0.2.15:2376/v1.20/containers/json: dial tcp 10.0.2.15:2376: network is unreachable.
I suspect that docker-machine has some assumptions about networking configuration in the VM and I'm mucking them up.
docker-machine ssh ifconfig -a returns the following:
docker0 Link encap:Ethernet HWaddr 02:42:86:44:17:1E
inet addr:172.17.42.1 Bcast:0.0.0.0 Mask:255.255.0.0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
dummy0 Link encap:Ethernet HWaddr 96:9F:AA:B8:BB:46
BROADCAST NOARP MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 Link encap:Ethernet HWaddr 08:00:27:37:2C:75
inet addr:192.168.1.142 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe37:2c75/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2996 errors:0 dropped:0 overruns:0 frame:0
TX packets:76 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:278781 (272.2 KiB) TX bytes:6824 (6.6 KiB)
Interrupt:17 Base address:0xd060
eth1 Link encap:Ethernet HWaddr 08:00:27:E8:38:7C
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fee8:387c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:767 errors:0 dropped:0 overruns:0 frame:0
TX packets:495 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:122291 (119.4 KiB) TX bytes:116118 (113.3 KiB)
eth2 Link encap:Ethernet HWaddr 08:00:27:A4:CF:12
inet addr:192.168.99.100 Bcast:192.168.99.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fea4:cf12/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:430 errors:0 dropped:0 overruns:0 frame:0
TX packets:322 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:53351 (52.1 KiB) TX bytes:24000 (23.4 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
eth0 seems to be getting a reasonable DHCP address from my router.
I'm not sure whether this is the right approach or whether I'm barking up the wrong tree. If I can get the bridged network adapter working on the VM, I don't know how to then convince my Docker container to use it. I've tried searching high and low on the internet and found dozens of sites that explain how you need to access the container using the value of docker-machine ip default rather than localhost, but nothing that explains how to access it from a different host. Maybe I need to improve my googling skills.
This worked for me:
with the VM stopped, add a 3rd "bridged" network adapter
start the VM with docker-machine start machine-name
regenerate the certs with docker-machine regenerate-certs machine-name
check that everything is OK with docker-machine ls
OK, so I found a better way to do it than trying to use a bridged network adapter. I found it in the boot2docker docs on port forwarding.
Just use VBoxManage modifyvm default --natpf1 "my_web,tcp,,8080,,80"
or use the VirtualBox GUI to specify your port forwarding for the NAT adapter.
Then, remove the -p option from your docker run command and use --net=host instead. That is, instead of
docker run -d -p 8080:80 --name=web nginx
use
docker run -d --net=host --name=web nginx
And voila! Your web server is available at localhost:8080 on your host, or YOURHOSTIP:8080 elsewhere on your LAN.
Note that using --net=host may mess up communication between containers on the VM, but since this is the only container I plan to run, it works great for me.
On a machine with Docker Toolbox for Mac, I'm solving the problem as follows (using the default machine).
Preparation
Stop the machine if it's running:
docker-machine stop default
VirtualBox Setup
Open VirtualBox, select the default machine, open Settings (Cmd-S), go to Network, and select "Adapter 3".
Check "Enable Network Adapter" (turn it on).
Set "Attached to" to "Bridged Adapter".
Set Name to "en0: Ethernet" (or whatever the primary network interface of your Mac is).
Disclose "Advanced", and make sure "Cable Connected" is checked on.
Note the "MAC Address" of "Adapter 3" (we'll use it later).
Press "OK" to save the settings.
Docker Setup
Now, back in Terminal, start the machine:
docker-machine start default
Just in case, regenerate the certs:
docker-machine regenerate-certs default
Update the environment:
eval $(docker-machine env default)
At this point, the machine should be running (with the default IP address of 192.168.99.100, accessible only from the hosting Mac). However, if you ssh into the docker VM (docker-machine ssh default) and run ifconfig -a, you'll see that one of the VM's interfaces (eth0 in my case) has an IP in the same network as your Mac (e.g. 192.168.0.102), which is accessible from other devices on your LAN.
Router Setup
Now, the last step is to make sure this address is fixed, and not changed from time to time by your router's DHCP. This may differ from router to router, the following applies to my no-frills TP-LINK router, but should be easily adjustable to other makes and models.
Open your router settings, and first check that default is in the router's DHCP Clients List, with the MAC address from step 7 above.
Open "DHCP" > "Address Reservation" in the router settings, and add the "Adapter 3" MAC Address (you may have to insert the missing dashes), and your desired IP there (e.g. 192.168.0.201).
Now my router asks me to reboot it. After the reboot, run docker-machine restart default for the Docker VM to pick up its new IP address.
Final verification: docker-machine ssh default, then ifconfig -a, and find your new IP address in the output (this time the interface was eth1).
Result
From the hosting Mac the machine is accessible via two addresses (192.168.99.100 and 192.168.0.201); from other devices in the LAN it's accessible as 192.168.0.201.
The main use case for this question would be to access applications running in the container from the host (Mac) machine, or from other machines on the host's (Mac's) network.
Once the container application has been started and exposed as below:
docker run -d -p 8080 <<image-name>>
then find the mapping between the host (Mac) port and the container port as below:
docker port <<container-name>>
sample output: 8080/tcp -> 0.0.0.0:32771
Now access the container application as host(Mac IP):32771 from any machine on your host's (Mac's) network.
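The port lookup can be scripted; a minimal sketch, where the sample line stands in for the real `docker port` output shown above (in real use you would set MAPPING from `docker port <<container-name>> 8080/tcp`):

```shell
#!/bin/sh
# Sketch: extract the host-side port from `docker port` output.
# In real use: MAPPING=$(docker port <container-name> 8080/tcp)
MAPPING="0.0.0.0:32771"   # sample output, as in the answer above

HOSTPORT=${MAPPING##*:}   # strip everything up to the last colon
echo "reach the app at <mac-ip>:$HOSTPORT"
```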
If I change the first network card from NAT to bridged, I also can't connect to it.
What I found to work was to add a 3rd network card, set it to bridged mode, and change the adapter type to Intel PRO/1000 MT Desktop (82540EM). The default one is probably not supported by the boot2docker distro.
See my comment on GitHub.
