SSH access for the headnode of FIWARE-Cosmos - hadoop

I am following this guide on Hadoop/FIWARE-Cosmos and I have a question about the Hive part.
I can access the old cluster’s (cosmos.lab.fiware.org) headnode through SSH, but I cannot do the same for the new cluster. I tried both storage.cosmos.lab.fiware.org and computing.cosmos.lab.fiware.org and failed to connect.
My intention in trying to connect via SSH was to test Hive queries on our data through the Hive CLI. After failing to do so, I checked and was able to connect to port 10000 of computing.cosmos.lab.fiware.org with telnet. I guess Hive is served through that port. Is this the only way we can use Hive in the new cluster?

The new pair of clusters does not have SSH access enabled. This is because users tended to install a lot of software (much of it not even related to Big Data) on the “old” cluster, which, as you mention, had SSH access enabled. So, the new pair of clusters is intended to be used only through the exposed APIs: WebHDFS for data I/O and Tidoop for MapReduce.
That being said, a Hive server is running as well, and it should be exposing a remote service on port 10000, as you also mention. I say “it should be” because it runs an experimental authenticator module based on OAuth2, as WebHDFS and Tidoop do. In theory, connecting to that port from a Hive client is as easy as using your Cosmos username and a valid token (the same one you use for WebHDFS and/or Tidoop).
And what about a remote Hive client? Well, that is something your application has to implement. In any case, I have uploaded some implementation examples to the Cosmos repo. For instance:
https://github.com/telefonicaid/fiware-cosmos/tree/develop/resources/java/hiveserver2-client
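For reference, here is a minimal sketch of such a client using the standard HiveServer2 JDBC driver (hive-jdbc). It assumes the server really does accept your Cosmos username with the OAuth2 token as the password, and that the "default" database is used; the placeholder credentials are illustrative only:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CosmosHiveClient {
        public static void main(String[] args) throws Exception {
            // Standard HiveServer2 JDBC driver (hive-jdbc must be on the classpath)
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // "default" database is an assumption; adjust to your own
            String url = "jdbc:hive2://computing.cosmos.lab.fiware.org:10000/default";
            String cosmosUser = "your_cosmos_username"; // your Cosmos username
            String oauth2Token = "your_oauth2_token";   // the same token used for WebHDFS/Tidoop

            try (Connection con = DriverManager.getConnection(url, cosmosUser, oauth2Token);
                 Statement stmt = con.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If this does not work against the experimental OAuth2 authenticator, the hiveserver2-client example linked above is the reference to follow.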

Related

Establishing a connection to a Redshift cluster from a golang app, using ODBC via an SSH tunnel (with the AWS Redshift ODBC Driver)

The Goal
I need to query a Redshift cluster from my golang application. The cluster is not publicly accessible, so I want to use SSH to reach it via a bastion host.
Status Quo
I have an AWS Redshift cluster in a private VPC, with inbound rules that do not allow any traffic from the internet except TCP 22;
There's a bastion (which can connect to the cluster), so forwarding a port and using rsql works perfectly fine from the terminal;
I use ODBC, and the official ODBC driver from AWS;
In the golang application, I use the following ODBC implementation for database/sql: https://github.com/alexbrainman/odbc;
I can't use Devart's Redshift ODBC driver;
I can't use JDBC;
macOS/Unix.
The Problem
The problem is simple to state. When the cluster is publicly accessible from the internet, alexbrainman's library does its job. However, when the cluster is behind the wall, that's when problems kick in.
The library's code drops down to C (system calls), so I can't really debug it. While with MySQL, for example, it's possible to register your own custom dialer, that doesn't seem to be the case with ODBC.
Even when the tunnel is active, providing an ODBC DSN pointing at the local host for some reason doesn't work. The SQLRETURN is always -1 (api/zapi_unix.go).
The Question
Has anyone had this experience? How did you solve the problem of accessing such a cluster from the internet in a Go app?
Thank you!

ElasticSearch and Redis Remote Servers

I deployed a Laravel application on AWS Elastic Beanstalk.
I want to incorporate caching with Redis as my cache driver, as well as Elasticsearch.
I managed to run these two services locally (Redis on port 6379 and Elasticsearch on 9200),
but now I want them to run on remote servers, with their endpoints simply specified in my .env file.
Can anyone let me know how I can obtain remote URLs for Redis and Elasticsearch?
Update:
I found out that Heroku offers the ability to create a Redis instance, from which one can obtain a URL for Redis. I presume something similar exists for Elasticsearch.
If this is not the right way to do it, please let me know how it works.

How to configure JDBC for Cloud Fusion to connect to MySQL installed on localhost:3306

I'm trying to connect my local standalone MySQL instance to Cloud Fusion to create and test a data pipeline. I have deployed the driver successfully.
I have also configured the pipeline properties with the correct values for the JDBC string, user name, and password, but connectivity isn't getting established.
Connection String: jdbc:mysql://localhost:3306/test_database
I have also tried to test the connectivity via the data wrangling option, but that is not succeeding either.
Do I need to bring both environments into the same network by setting up a VPC and tunneling?
In your example, I see that you specified localhost in your connection string. localhost only refers to services running locally on your own machine, so Cloud Data Fusion (running in GCP) will not be able to reach the MySQL instance (running on your machine). Hence the connectivity issue you're seeing.
I highly recommend looking at this answer on SO that will help you setup a quick proof-of-concept.
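As a quick illustration of the point about the connection string, here is a minimal Java sketch of a connectivity check run from a machine that GCP can reach (for example a Compute Engine VM in the peered network). The address 203.0.113.10 and the credentials are placeholders for whatever public IP, Cloud SQL endpoint, or VPN/VPC-peered host you end up exposing instead of localhost:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class MySqlReachabilityCheck {
        public static void main(String[] args) throws Exception {
            // Requires MySQL Connector/J on the classpath (JDBC 4+ loads the driver automatically).
            // Use an address that is reachable from GCP, not localhost,
            // which only resolves on your own machine.
            String url = "jdbc:mysql://203.0.113.10:3306/test_database";
            try (Connection con = DriverManager.getConnection(url, "db_user", "db_password")) {
                System.out.println("Connected: " + !con.isClosed());
            }
        }
    }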
I think your question is really about how to connect an on-premises environment to GCP's networking system, which gathers Google Cloud instances and resources under the VPC connection model.
Given that GCP offers several different connection methods within its hybrid-cloud concepts, I would encourage you to learn the fundamental principles of Cloud VPN, which establishes a secure connection between a particular VPN peer gateway and a Cloud VPN gateway by creating a VPN tunnel between the parties.
There is even a dedicated chapter in the GCP documentation about the Data Fusion VPC peering implementation that might be helpful in your use case.

How are clients for Hortonworks Sandbox properly configured?

Related: How connect to Hortonworks sandbox Hbase using Java Client API
I am currently building a proof of concept using the Hortonworks Sandbox in a VM. However, I am failing to properly configure the client (outside the VM, but on the same computer). I looked for documentation on how a client needs to be configured, but didn't find any.
I need the client configuration for accessing HBase and MapReduce, but most appreciated would be documentation that lists the client configuration for all parts of the sandbox.
It is actually even sillier than I would have expected: it seems that not all necessary ports are forwarded by default, so it is necessary to add them all in the VM configuration.
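For the HBase part specifically, here is a minimal sketch of a Java client configuration against the sandbox, assuming the ZooKeeper port (2181) and the HBase master/region server ports have been added to the VM's port forwarding; the table name is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    public class SandboxHBaseClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // ZooKeeper forwarded from the sandbox VM to the host machine
            conf.set("hbase.zookeeper.quorum", "127.0.0.1");
            conf.set("hbase.zookeeper.property.clientPort", "2181");

            // Region servers advertise themselves by hostname, so you may also
            // need a hosts-file entry mapping the sandbox hostname to 127.0.0.1.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("my_table"))) {
                System.out.println("Connected to table: " + table.getName());
            }
        }
    }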

What are some options for securing redis db?

I'm running Redis locally and have multiple machines communicating with redis on the same port -- any suggestions for good ways to lock down access to Redis? The database is run on Mac OS X. Thank you.
Edit: This is assuming I do not want to use the built-in (non-backwards-compatible) Redis requirepass directive in the config.
On EC2, we lock down which machines can make requests to the Redis port on our Redis box so that only our app box can (we also only use it to store non-sensitive data).
Another option could be to not open up the Redis port externally at all, but instead require port forwarding through an SSH tunnel. Then you could only allow requests coming through the tunnel, and only allow SSH with a known key.
You'd pay the SSH overhead, but maybe that's OK for your scenario.
There is a simple requirepass directive in the configuration file which allows access only to clients that authenticate through the AUTH command. I recommend reading the docs on this command, namely the "note" section.
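To make the last two suggestions concrete, here is a minimal Java sketch using the Jedis client, assuming requirepass is set in redis.conf and that the client reaches Redis through a locally forwarded port (e.g. an SSH tunnel such as ssh -L 6379:localhost:6379 user@redis-box) rather than an externally exposed one; the password is a placeholder:

    import redis.clients.jedis.Jedis;

    public class RedisAuthExample {
        public static void main(String[] args) {
            // 127.0.0.1:6379 is the locally forwarded end of the SSH tunnel
            try (Jedis jedis = new Jedis("127.0.0.1", 6379)) {
                jedis.auth("your_redis_password"); // must match requirepass in redis.conf
                jedis.set("greeting", "hello");
                System.out.println(jedis.get("greeting"));
            }
        }
    }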

Resources