Could not extract Cloudera Hadoop VM archive - windows

I am new to Cloudera. I have worked on Hadoop previously, and now I want to try Cloudera Hadoop. For this I started with the Cloudera Hadoop VM.
The downloaded file is a 7zip archive, about 2 GB in size. When I try to extract it, it shows the error:
Can not open file cloudera-quickstart-vm-4.4.0-1-vmware.7z as archive.
All other files extract properly, but this single file does not. I have downloaded the file three times and got the same error each time. Is there any specific way to extract this file?
Any help would be appreciated.

You don't need to do anything special, but I had to download the Standard QuickStart VirtualBox VM three times before the archive was complete. The final file that worked for me was actually ~2.6 GB in size, so your 2 GB file is most likely a truncated download.

On Windows, you need to have WinRar installed. This seems to be a common problem when trying to use/install the Cloudera QuickStart VM.
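If you want to rule out a truncated download before blaming the extractor, compare the size and hash of each download attempt. A minimal sketch for the Windows command line, assuming the stock certutil tool; no official checksum is quoted here, so this only tells you whether two downloads match each other:

    REM Check the size on disk; per the answer above, a complete
    REM archive should be well over 2 GB
    dir cloudera-quickstart-vm-4.4.0-1-vmware.7z

    REM Hash the file; if two separate downloads give different hashes,
    REM at least one of them is corrupt or truncated
    certutil -hashfile cloudera-quickstart-vm-4.4.0-1-vmware.7z SHA256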

Related

Can I use Spark prebuilt without Hadoop on Windows?

I'm using Spark 3.1.3 prebuilt without Hadoop on a production Unix-based server. Spark is running in standalone mode, and I'm using the local filesystem rather than a distributed filesystem such as HDFS.
I'd ideally like to replicate my production environment locally, but unfortunately I'm restricted to using Windows.
Typically, I am able to run Spark on Windows by using Spark 3.1.3 prebuilt for Hadoop Y and the winutils tool provided here: https://github.com/steveloughran/winutils
It's my understanding that winutils simulates Hadoop rather than a Unix FS.
Am I able to use the exact same Spark binaries in production and on my Windows development machine? Or am I restricted to using Spark prebuilt for Hadoop locally?
Can you explain why either solution works?
I tried running my Spark scripts locally using the version prebuilt without Hadoop but I'm unable to start my scripts. (Will provide some logs and edit this when I'm back on my Windows machine)
"Without" only refers to the scripts/libraries in the downloaded tarball. The more correct term would be "bring your own Hadoop". You will still need HADOOP_CONF_DIR + HADOOP_HOME set, as well as HDFS client JAR libraries to use a local FS.
Yes, you can use Spark on Windows by including the correct version of winutils. Or you can use WSL2 and run Spark within a full Unix environment.
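For the winutils route, the setup usually looks something like the sketch below, for cmd; the directory C:\hadoop is an arbitrary choice, and winutils.exe is the binary from the repository linked in the question:

    REM winutils.exe must sit in %HADOOP_HOME%\bin
    set HADOOP_HOME=C:\hadoop
    set PATH=%PATH%;%HADOOP_HOME%\bin

    REM Spark's launch scripts can now find the Hadoop shims they expect
    spark-shell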

How do I run Burrows-Wheeler Aligner on Windows Subsystem for Linux?

Total newbie here with no idea what I'm doing.
I have installed Ubuntu on Windows and can open bash from Windows now.
I have also downloaded the Burrows-Wheeler Aligner from SourceForge: https://sourceforge.net/projects/bio-bwa/files/
From there I extracted the bz2 file and added the extracted folder to PATH,
but when I type bwa in bash, it says bwa: command not found.
I'm a total beginner and want to get started with bioinformatics. I performed the steps above because that's how I set up conda to work on Windows cmd.
What am I doing wrong?
Inside the subsystem, almost everything works the same as on a regular Ubuntu system, so you can simply follow the README of the lh3/bwa repository. Note that the download is a source tarball, not a prebuilt binary: adding the extracted folder to PATH does nothing until you compile it with make, which is why bash reports bwa: command not found. Also, since the SourceForge project appears to have been archived for a long time, you are better off using the newer repository on GitHub.
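Concretely, the build steps from the lh3/bwa README look roughly like this in a WSL Ubuntu shell (a sketch; it assumes network access and sudo rights inside the subsystem):

    # bwa ships as source code, so install a compiler and the zlib headers
    sudo apt-get update
    sudo apt-get install -y build-essential zlib1g-dev git

    # fetch and compile bwa
    git clone https://github.com/lh3/bwa.git
    cd bwa
    make

    # put the freshly built binary on the PATH for this session
    export PATH="$PATH:$(pwd)"
    bwa    # should now print the usage message instead of "command not found"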

How to uninstall Hadoop on Mac completely

I have installed Hadoop 2.5.1 on my MacBook Pro through the terminal, but now I want to uninstall it completely from my MacBook Pro.
Please let me know the process.
Thank you in advance.
If you installed Hadoop by downloading and extracting the tarball, then you just have to delete the extracted directory (the path depends on where you extracted the tarball on the filesystem) using a command-line utility like rm.
Also, if you configured the NameNode and DataNode data directories (in hdfs-site.xml) to locations other than the default, you have to delete those directories as well.
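As a sketch, assuming the tarball was extracted to ~/hadoop-2.5.1 (check where yours actually lives, and look up dfs.namenode.name.dir / dfs.datanode.data.dir in your hdfs-site.xml before deleting anything):

    # remove the extracted Hadoop distribution
    rm -rf ~/hadoop-2.5.1

    # remove any NameNode/DataNode data directories you configured;
    # the paths below are placeholders taken from hdfs-site.xml
    rm -rf /path/to/namenode/dir /path/to/datanode/dir

    # finally, edit ~/.bash_profile and delete any HADOOP_HOME or
    # PATH lines you added for Hadoop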

Installing Chorus's GreenPlum on OSX

I am trying to install Chorus on OSX, so I need to install GreenPlum as described here. The doc says that I have to download the GreenPlum database and extract the greenplum-db-4.2.5.0.tar.gz tar file. So I went to the dedicated Pivotal site. It provides a .bin file, but when I execute it I get the message Installer will only install on RedHat/CentOS x86_64. Running that file is supposed to produce the mentioned tar file.
So I deduce that I must get some OSX-specific file, but the Pivotal documentation says the tar file should be extracted (only in development mode). Perhaps I am running the wrong commands. Could someone help?
Go to https://network.pivotal.io/products/pivotal-gpdb#files to get the actual installers you'll need for OSX. Be aware that the community edition may not support everything.

Running Apache Spark on Windows 7

I am trying to run Apache Spark on Windows 7. First I installed SBT via the MSI installer, then extracted spark-1.0.0 into Program Files with 7-Zip. In the command line, from the Spark directory, I ran:
sbt/sbt assembly
After a few seconds of processing, I got errors like:
- server access error: connection timed out
- could not retrieve jansi 1.1
- error during sbt execution: error retrieving required libraries
- unresolved dependency: jansi 1.1 not found
Could you please give me some advice about running Spark on Windows? I am looking for the right way because I am completely new to this technology. Regards.
You could use the pre-built Spark from here.
The scripts inside the bin folder work on Windows 7.
You need to set the HADOOP_HOME variable in your environment.
See spark on windows for more information.
If you are using the build-with-sbt approach, you'll also need git.
Install Scala, sbt, and git on your machine, download the Spark source code, and run the following command:
sbt assembly
In case you use a prebuilt release, here is the step-by-step process:
How to run Apache Spark on Windows7 in standalone mode
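On the question's failed build: "connection timed out" and "could not retrieve jansi 1.1" mean sbt could not download its dependencies, which on corporate networks is often a proxy problem. A hedged sketch for cmd, assuming the standard sbt launcher (which reads SBT_OPTS) and a placeholder proxy address:

    REM route sbt's dependency resolution through your proxy
    REM (proxy.example.com:8080 is a placeholder; use your own)
    set SBT_OPTS=-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080

    REM then retry the build from the Spark directory
    sbt assembly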
