Change HBase WAL location - hadoop

I'm planning to use HBase with the gs:// scheme (Google Cloud Storage buckets), but the gs:// filesystem can't work with WAL files.
Cause:
java.io.IOException: cannot get log writer
Caused by: java.io.IOException: createNonRecursive unsupported for this filesystem class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
From what I've found, there should be a way to store WAL files separately from the HBase root dir, e.g.: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/images/hbase_s3.png
So my question is: how do I separate data files and WAL files, i.e. store data in gs:// and WALs in hdfs://? Unfortunately I still can't find it by myself ...
Many thanks in advance

The current stable version doesn't support storing WALs in a location outside of hbase.rootdir.
It's planned for future releases; see https://issues.apache.org/jira/browse/HBASE-17437
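Once that JIRA ships, the expectation is that a separate WAL directory can be set in hbase-site.xml. A minimal sketch, assuming the hbase.wal.dir property proposed in HBASE-17437 (not available in the current stable release; the bucket and namenode names below are placeholders):

  <property>
    <name>hbase.rootdir</name>
    <value>gs://my-bucket/hbase</value>
  </property>
  <property>
    <name>hbase.wal.dir</name>
    <value>hdfs://namenode:8020/hbase-wal</value>
  </property>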

Related

Jar file not found exception when running a MapReduce job to copy data from HBase

I tried to execute the following command to copy data from HBase to another cluster, from an HBase client environment. The command I ran is:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=[destination zk]:/hbase [source table name]
I got this error:
Exception in thread "main" java.io.FileNotFoundException: File does
not exist:
hdfs://servername:8020/opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1072)
at
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
The /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar is on my local path, but it does not exist in HDFS. It seems the CopyTable utility is submitting a MapReduce job without the dependency jars. I read a few articles, and it seems the only solution is to upload the jar to HDFS under the same path. This is really an ugly solution.
Please kindly advise. Thanks!
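For reference, the workaround described above boils down to mirroring the missing jar into HDFS at the path the job expects, using the location from the stack trace:

  hdfs dfs -mkdir -p /opt/hbase-1.2.10/lib
  hdfs dfs -put /opt/hbase-1.2.10/lib/metrics-core-2.2.0.jar /opt/hbase-1.2.10/lib/

If the tool accepts Hadoop's generic options, shipping the jar with -libjars may be a cleaner alternative, but whether that works depends on the HBase version.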

Snappy Compression not working due to tmp folder privileges

I have a problem whenever I try to store my data in a compressed format with Pig, Sqoop, or Spark. I know the problem is that our tmp folder is mounted noexec, and this causes, for instance, Snappy to give me this error:
java.lang.IllegalArgumentException: java.lang.UnsatisfiedLinkError: /tmp/snappy-1.1.2-fe4e30d0-e4a5-4b1a-ae31-fd1861117288-libsnappyjava.so: /tmp/snappy-1.1.2-fe4e30d0-e4a5-4b1a-ae31-fd1861117288-libsnappyjava.so: failed to map segment from shared object: Operation not permitted
The solutions I found on the internet are either to remount the tmp folder as exec, which is not an option for me as the sysadmin won't allow it due to security concerns, or to change the Java temp/execution path in the JVM options to some path other than /tmp.
I have tried the following approach, but it didn't solve the problem.
I added these lines to hadoop-env.sh and sqoop-env.sh:
export HADOOP_OPTS="$HADOOP_OPTS -Dorg.xerial.snappy.tempdir=/newpath"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.io.tmpdir=/newpath"
I would appreciate if you guys have any other solutions that could solve the issue.
Thanks
For other users with this issue, try starting Hive with
hive --hiveconf org.xerial.snappy.tempdir=/../
and supply a location that allows execution.
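As a hedged example of the same idea (/var/tmp/snappy below is only an illustrative path; any directory on a filesystem not mounted noexec should do):

  mkdir -p /var/tmp/snappy
  export HADOOP_OPTS="$HADOOP_OPTS -Dorg.xerial.snappy.tempdir=/var/tmp/snappy -Djava.io.tmpdir=/var/tmp/snappy"
  hive --hiveconf org.xerial.snappy.tempdir=/var/tmp/snappy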

Why is data COPIED and not MOVED when loading data from the local filesystem (Hive, Hadoop)?

When we use the following command:
LOAD DATA LOCAL INPATH "mypath"
why is the data copied from the local filesystem into HDFS and not moved?
Since you are moving data between two different filesystems (the local filesystem and HDFS), this cannot be a pure metadata operation as it is in a non-local load, where LOAD DATA simply moves the file within HDFS.
The data itself has to be copied.
In theory the command could also delete the source file afterwards, but what for?
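A small shell illustration of the difference (table name and paths are hypothetical):

  # LOCAL load: the source is copied, so it is still on local disk afterwards
  hive -e 'LOAD DATA LOCAL INPATH "/home/me/data.csv" INTO TABLE mytable;'
  ls /home/me/data.csv

  # non-LOCAL load: the HDFS source is moved into the table directory
  hive -e 'LOAD DATA INPATH "/user/me/data.csv" INTO TABLE mytable;'
  hdfs dfs -ls /user/me/data.csv   # no longer at the old path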

Could not find uri with key dfs.encryption.key.provider.uri to create a keyProvider in HDFS encryption for CDH 5.4

CDH Version: CDH5.4.5
Issue: when HDFS encryption is enabled using the KMS available in Hadoop CDH 5.4, I get an error while putting a file into an encryption zone.
The steps I followed to enable encryption were as follows:
1. Creating a key [SUCCESS]
[tester@master ~]$ hadoop key create 'TDEHDP' -provider kms://https@10.1.118.1/key_generator/kms -size 128
tde group has been successfully created with options
Options{cipher='AES/CTR/NoPadding', bitLength=128, description='null', attributes=null}.
KMSClientProvider[https://10.1.118.1/key_generator/kms/v1/] has been updated.
2. Creating a directory [SUCCESS]
[tester@master ~]$ hdfs dfs -mkdir /user/tester/vs_key_testdir
3. Adding an encryption zone [SUCCESS]
[tester@master ~]$ hdfs crypto -createZone -keyName 'TDEHDP' -path /user/tester/vs_key_testdir
Added encryption zone /user/tester/vs_key_testdir
4. Copying a file to the encryption zone [ERROR]
[tdetester@master ~]$ hdfs dfs -copyFromLocal test.txt /user/tester/vs_key_testdir
15/09/04 06:06:33 ERROR hdfs.KeyProviderCache: Could not find uri with
key [dfs.encryption.key.provider.uri] to create a keyProvider !!
copyFromLocal: No KeyProvider is configured, cannot access an
encrypted file 15/09/04 06:06:33 ERROR hdfs.DFSClient: Failed to close
inode 20823
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /user/tester/vs_key_testdir/test.txt.COPYING (inode
20823): File does not exist. Holder
DFSClient_NONMAPREDUCE_1061684229_1 does not have any open files.
Any idea or suggestion would be helpful.
This issue was crossposted here: https://community.cloudera.com/t5/Storage-Random-Access-HDFS/Could-not-find-uri-with-key-dfs-encryption-key-provider-uri-to/td-p/31637
Main conclusion: It is a non-issue
Here is the answer that was provided by the support staff:
CDH's base release versions are just that: base. The fix for the
harmless log print due to HDFS-7931 is present in all CDH5 releases
since CDH 5.4.1.
If you see that error in the context of having configured a KMS, then it's
a worthy one to consider. If you do not use KMS or EZs, then the error
may be ignored. Alternatively, upgrade to the latest CDH5 (5.4.x or
5.5.x) releases to receive a bug fix that makes the error appear only in the context of a KMS being configured over an encrypted path.
Per your log snippet, I don't see a problem (the canary does not
appear to be failing?). If you're trying to report a failure, please
send us more characteristics of the failure, as HDFS-7931 is a minor
issue with an unnecessary log print.
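For completeness, the client-side property named in the error message is configured roughly like this. This is a sketch only; the value reuses the KMS address from the question, and whether it lives in hdfs-site.xml on the client or is pushed by Cloudera Manager depends on the deployment:

  <property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://https@10.1.118.1/key_generator/kms</value>
  </property>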

Does a file need to be in HDFS in order to use it in distributed cache?

I get
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file:/path/to/my.jar, expected: hdfs://ec2-xx-xx-xx-xxx.compute-1.amazonaws.com
if I try to add a local file to the distributed cache in Hadoop. When the file is on HDFS, I don't get this error (obviously, since it's using the expected FS). Is there a way to use a local file in the distributed cache without first copying it to HDFS? Here is a code snippet:
Configuration conf = job.getConfiguration();
// qualify the local path and add it to the distributed cache classpath
FileSystem fs = FileSystem.getLocal(conf);
Path dependency = fs.makeQualified(new Path("/local/path/to/my.jar"));
DistributedCache.addArchiveToClassPath(dependency, conf);
Thanks
It has to be in HDFS first. I'm going to go out on a limb here, but I think it is because the file is "pulled" to the local distributed cache by the slaves, not pushed. Since the files are pulled, the slave nodes have no way to access a path that exists only on the client machine.
No, I don't think you can put anything in the distributed cache without it being in HDFS first. All Hadoop jobs use input/output paths in relation to HDFS.
The file can be in the local filesystem, HDFS, S3, or another cluster. You need to specify it as
-files hdfs:// if the file is in HDFS;
by default it assumes the local filesystem.
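A hedged sketch of that generic-options approach (the jar, driver class, and paths below are hypothetical, and it assumes the driver runs through ToolRunner so generic options are parsed):

  hadoop jar my-job.jar com.example.MyDriver \
    -files hdfs:///user/me/my.jar \
    /input /output

For jar dependencies that should end up on the task classpath, the analogous -libjars option is usually the better fit.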
