Spark on Windows - What exactly is winutils and why do we need it? - hadoop

I'm curious! To my knowledge, HDFS needs datanode processes to run, which is why it only works on servers. Spark can run locally, though, but it needs winutils.exe, which is a component of Hadoop. What exactly does it do? How is it that I cannot run Hadoop on Windows, but I can run Spark, which is built on Hadoop?

I know of at least one use: running shell commands on Windows. You can find it in org.apache.hadoop.util.Shell; other modules depend on this class and use its methods, for example getGetPermissionCommand():
static final String WINUTILS_EXE = "winutils.exe";
...
static {
  IOException ioe = null;
  String path = null;
  File file = null;
  // invariant: either there's a valid file and path,
  // or there is a cached IO exception.
  if (WINDOWS) {
    try {
      file = getQualifiedBin(WINUTILS_EXE);
      path = file.getCanonicalPath();
      ioe = null;
    } catch (IOException e) {
      LOG.warn("Did not find {}: {}", WINUTILS_EXE, e);
      // stack trace comes at debug level
      LOG.debug("Failed to find " + WINUTILS_EXE, e);
      file = null;
      path = null;
      ioe = e;
    }
  } else {
    // on a non-windows system, the invariant is kept
    // by adding an explicit exception.
    ioe = new FileNotFoundException(E_NOT_A_WINDOWS_SYSTEM);
  }
  WINUTILS_PATH = path;
  WINUTILS_FILE = file;
  WINUTILS = path;
  WINUTILS_FAILURE = ioe;
}
...
public static String getWinUtilsPath() {
  if (WINUTILS_FAILURE == null) {
    return WINUTILS_PATH;
  } else {
    throw new RuntimeException(WINUTILS_FAILURE.toString(),
        WINUTILS_FAILURE);
  }
}
...
public static String[] getGetPermissionCommand() {
  return (WINDOWS) ? new String[] { getWinUtilsPath(), "ls", "-F" }
                   : new String[] { "/bin/ls", "-ld" };
}

Max's answer covers the actual place where it is referenced, so let me give a brief background on why it is needed on Windows.
From Hadoop's Confluence page itself:
Hadoop requires native libraries on Windows to work properly -that
includes accessing the file:// filesystem, where Hadoop uses some
Windows APIs to implement posix-like file access permissions.
This is implemented in HADOOP.DLL and WINUTILS.EXE.
In particular, %HADOOP_HOME%\BIN\WINUTILS.EXE must be locatable
And I think you should be able to run both Spark and Hadoop on Windows.
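To make the "must be locatable" requirement concrete, here is a minimal sketch of the kind of lookup involved. It assumes the Hadoop home comes from the hadoop.home.dir system property or, failing that, the HADOOP_HOME environment variable. The class and method names (WinutilsLocator, locate, configuredHadoopHome) are made up for illustration and are not part of Hadoop.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop: checks whether winutils.exe
// can be found under <hadoop home>/bin.
final class WinutilsLocator {

    /** Returns the path to bin/winutils.exe under the given home, if that file exists. */
    static Optional<Path> locate(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return Optional.empty();
        }
        Path candidate = Paths.get(hadoopHome, "bin", "winutils.exe");
        return Files.isRegularFile(candidate) ? Optional.of(candidate) : Optional.empty();
    }

    /** Assumed lookup order: hadoop.home.dir system property, then HADOOP_HOME. */
    static String configuredHadoopHome() {
        String home = System.getProperty("hadoop.home.dir");
        return home != null ? home : System.getenv("HADOOP_HOME");
    }

    public static void main(String[] args) {
        Optional<Path> winutils = locate(configuredHadoopHome());
        System.out.println(winutils.map(p -> "Found " + p)
                .orElse("winutils.exe not found under the configured Hadoop home"));
    }
}
```

Running this before starting Spark locally on Windows is a quick way to see whether the binary is where the quoted page says it has to be.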

Spring boot controlling start up of application

How can I programmatically control the startup of an app depending on the config server? I have an sh script that controls this on docker-compose up, but I'm wondering whether I can control it programmatically in the Spring app.
Regards
If you are on Ubuntu, the following solution should work. On a different distro where ps -A doesn't work, find the equivalent command. The strategy is as follows. I wanted my application to be started by a script because the script carries some JVM arguments. While my application is running, the script is therefore also in the active process list. After startup completes, the application checks whether the script is present in the active process list; if it isn't, the application shuts itself down. The following example should give you an idea of how to implement the same thing with the config server.
@Autowired
private ApplicationContext context;

@EventListener(ApplicationReadyEvent.class) // use in production
public void initiateStartup() {
    try {
        String shProcessName = "root-IDEA.sh";
        String line;
        boolean undesiredStart = true;
        Process p = Runtime.getRuntime().exec("ps -A");
        InputStream inputStream = p.getInputStream();
        InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
        BufferedReader bufferedReader = new BufferedReader(inputStreamReader);
        while ((line = bufferedReader.readLine()) != null) {
            if (line.contains(shProcessName)) {
                undesiredStart = false;
                break;
            }
        }
        bufferedReader.close();
        inputStreamReader.close();
        inputStream.close();
        if (undesiredStart) {
            System.out.println("--------------------------------------------------------------------");
            System.out.println("APPLICATION STARTED USING A DIFFERENT CONFIG. PLEASE START USING THE '.sh' FILE.");
            System.out.println("CLOSING APPLICATION");
            System.out.println("--------------------------------------------------------------------");
            // close the application
            int exitCode = SpringApplication.exit(context, (ExitCodeGenerator) () -> 0);
            System.exit(exitCode);
        }
        System.out.println("APPLICATION STARTED CORRECTLY");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
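The process-list scan in the listener above can be pulled out into a small helper that is easier to test on its own. This is only a sketch: LauncherCheck, anyLineContains and launcherIsRunning are hypothetical names, and it still assumes that ps -A is available, as the answer does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: isolates the "is the launcher script in the process list?" check.
final class LauncherCheck {

    /** Returns true if any line read from the source contains the marker text. */
    static boolean anyLineContains(Reader source, String marker) throws IOException {
        // try-with-resources closes the reader even when we return early
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(marker)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Runs "ps -A" and looks for the launcher script name in its output. */
    static boolean launcherIsRunning(String scriptName) throws IOException {
        Process ps = new ProcessBuilder("ps", "-A").redirectErrorStream(true).start();
        return anyLineContains(
                new InputStreamReader(ps.getInputStream(), StandardCharsets.UTF_8), scriptName);
    }
}
```

The listener would then reduce to calling launcherIsRunning("root-IDEA.sh") and exiting when it returns false.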

Is there a way to Alias a dead network path to a local directory in Windows 7?

I have a bunch of old Batch scripts that I may need to revive that have hundreds of references to a dead specific network path. Is there a way to alias \\myNetWorkPath.com\SomeFolder\SomeFolder2 to a specific local Windows 7 directory?
For example, \\myNetWorkPath.com\SomeFolder\SomeFolder2 alias to C:\SomeFolder2.
Again, \\myNetWorkPath.com\SomeFolder\SomeFolder2 is a dead (not working anymore) network path.
Please let me know if that doesn’t make any sense.
Thanks!
Following up on my "Pick a language and write a quick and dirty application that will change your code base." comment... Here's a bit of C# that could get you going.
static void Main(string[] args)
{
    //foreach file you drop onto the compiled EXE
    foreach (string item in args)
    {
        if (System.IO.File.Exists(item))//if the file path actually exists
        {
            ChangeThePath(item);
        }
    }
}

private static void ChangeThePath(string incomingFilePath)
{
    string backupCopy = incomingFilePath + ".bck";
    System.IO.File.Copy(incomingFilePath, backupCopy);//make a backup
    string newPath = "c:\\This\\New\\Path\\Is\\Much\\Better";
    string oldPath = "c:\\Than\\This\\Deprecated\\One";
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(incomingFilePath))
    {
        using (System.IO.StreamReader sr = new System.IO.StreamReader(backupCopy))
        {
            string currentLine = string.Empty;
            while ((currentLine = sr.ReadLine()) != null)
            {
                sw.WriteLine(currentLine.Replace(oldPath, newPath));
            }
        }
    }
}
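Since the suggestion was to pick any language, the same back-up-then-rewrite idea could also be sketched in Java. PathRewriter is a hypothetical name, and the sketch assumes a literal, case-sensitive match on the old path and that each script can be read with the charset you pass in. Neither assumption necessarily holds for old batch files.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical utility: rewrites one path prefix to another inside a script file.
final class PathRewriter {

    static void rewrite(Path script, String oldPath, String newPath, Charset charset)
            throws IOException {
        // Keep a backup next to the original, e.g. build.bat -> build.bat.bck
        Path backup = script.resolveSibling(script.getFileName() + ".bck");
        Files.copy(script, backup, StandardCopyOption.REPLACE_EXISTING);

        // Literal replacement, line by line; lines are written back with the
        // platform line separator.
        List<String> rewritten = Files.readAllLines(backup, charset).stream()
                .map(line -> line.replace(oldPath, newPath))
                .collect(Collectors.toList());
        Files.write(script, rewritten, charset);
    }

    public static void main(String[] args) throws IOException {
        // Paths taken from the question; pass the script files as arguments.
        String dead = "\\\\myNetWorkPath.com\\SomeFolder\\SomeFolder2";
        String local = "C:\\SomeFolder2";
        for (String arg : args) {
            Path script = Paths.get(arg);
            if (Files.isRegularFile(script)) {
                rewrite(script, dead, local, Charset.defaultCharset());
            }
        }
    }
}
```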

java 8 - some error with compiling lambda function

public class GrammarValidityTest {
  private String[] dataPaths = new String[] {"data/", "freebase/", "tables/", "regex/"};

  @Test(groups = {"grammar"})
  public void readGrammars() {
    try {
      List<String> successes = new ArrayList<>(), failures = new ArrayList<>();
      for (String dataPath : dataPaths) {
        // Files.walk(Paths.get(dataPath)).forEach(filePath -> {
          try {
            if (filePath.toString().toLowerCase().endsWith(".grammar")) {
              Grammar test = new Grammar();
              LogInfo.logs("Reading grammar file: %s", filePath.toString());
              test.read(filePath.toString());
              LogInfo.logs("Finished reading", filePath.toString());
              successes.add(filePath.toString());
            }
          }
          catch (Exception ex) {
            failures.add(filePath.toString());
          }
        });
      }
      LogInfo.begin_track("Following grammar tests passed:");
      for (String path : successes)
        LogInfo.logs("%s", path);
      LogInfo.end_track();
      LogInfo.begin_track("Following grammar tests failed:");
      for (String path : failures)
        LogInfo.logs("%s", path);
      LogInfo.end_track();
      assertEquals(0, failures.size());
    }
    catch (Exception ex) {
      LogInfo.logs(ex.toString());
    }
  }
}
The line beginning with // is the one that brings up the error "illegal start of expression", starting at the '>' sign.
I do not program much in Java. I just downloaded some code that is quite popular and is supposed to run, but I got this error. Any help/fixes/explanation would be appreciated.
Run javac -version and verify that you are actually using the compiler from JDK 8. It's possible that even if your java points to the 1.8 release, your javac has a different version.
If you are using Eclipse, remember to set the source type for your project to 1.8.
Edit:
Since you are using ant, verify that your JAVA_HOME environment variable points to your jdk1.8 directory.
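One way to check that the compiler really accepts Java 8 syntax is to compile a tiny file that uses a lambda with Files.walk, as the test in the question does. The sketch below is such a file. GrammarFileLister and findGrammarFiles are names invented for this example and do not come from the downloaded project. If javac rejects it at the -> token, the build is still using a pre-8 compiler or source level.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example: needs a Java 8 (or newer) compiler because of the
// lambda and the method reference.
final class GrammarFileLister {

    /** Returns all regular files under root whose name ends in ".grammar" (case-insensitive). */
    static List<Path> findGrammarFiles(Path root) throws IOException {
        // Files.walk holds open directory handles, so close the stream when done
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().toLowerCase().endsWith(".grammar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```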

Receive partial file(sometimes) when reading from Google Storage using HTTP Response

I am trying to read files from Google Storage and write them to files in our filesystem (HDFS). If I run it for a period of time (let's say 7 days), sometimes I get the full file, with line counts matching what's on the source, and sometimes I get partial files (the discrepancy is quite large). Below is the method that takes a response and writes it to a file.
Any help or suggestions as to how I can troubleshoot this further would be much appreciated.
Thanks,
Before calling this method I do a simple check on the response status code:
if(response.getStatusCode() == 200 &&
        StringUtils.equals(response.getContentType(), "application/zip")) {
    writeHdfsFile(response, path);
}

private void writeHdfsFile(HttpResponse response, String path) throws IOException {
    final GZIPInputStream inputStream = new GZIPInputStream(response.getContent());
    Path filePath = new Path(path);
    final FSDataOutputStream outputStream = fileSystem.create(filePath, true);
    final byte[] buffer = new byte[1024];
    int length;
    try {
        while((length = inputStream.read(buffer)) > 0) {
            outputStream.write(buffer, 0, length);
        }
        outputStream.flush();
    } finally {
        inputStream.close();
        outputStream.close();
    }
}
The way we solved it was to download the file first and then unzip and write it. Splitting the work into two steps solved the issue. Posting this in case someone else runs into the same problem.
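A rough sketch of that two-step approach, using only the JDK so it stays self-contained: the compressed bytes are first copied to a local temporary file, and only then decompressed into the target stream. TwoStepGunzip and downloadThenGunzip are hypothetical names. In the original setting the input would be response.getContent() and the target would be the stream returned by fileSystem.create(...).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper illustrating "download first, then unzip and write".
final class TwoStepGunzip {

    /** Returns the number of decompressed bytes written to target. */
    static long downloadThenGunzip(InputStream remote, OutputStream target) throws IOException {
        Path temp = Files.createTempFile("download-", ".gz");
        try {
            // Step 1: pull the whole compressed payload to local disk.
            Files.copy(remote, temp, StandardCopyOption.REPLACE_EXISTING);

            // Step 2: decompress from the local copy into the target.
            long written = 0;
            byte[] buffer = new byte[8192];
            try (InputStream in = new GZIPInputStream(Files.newInputStream(temp))) {
                int length;
                while ((length = in.read(buffer)) != -1) {
                    target.write(buffer, 0, length);
                    written += length;
                }
            }
            target.flush();
            return written;
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```

Keeping the compressed copy on disk until the second step succeeds also makes it possible to compare its size with the size reported by the source when troubleshooting a short file.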

Cannot read two consecutive files with a Windows Service using StreamReader object

I need to be able to read lines of a file with a StreamReader processed by a FileSystemWatcher in a Windows service.
I've read and tried everything that made sense online, but it still doesn't work. When I'm attached to my Windows service process (local machine using Visual Studio 2010), the whole thing works flawlessly!
When I try to run it (on my local machine) without attaching to it and debugging it, the second file never makes it through and I get the following message:
"The process cannot access the file 'C:\Projects\Data\VendingStats\20121213_AZM_Journey_MIS.txt' because it is being used by another process." I do not have this file open anywhere else on my machine. It is just sitting in a directory. I then copy it into a directory and the FSW takes over (along with the code below).
Can someone please tell me what I need to do to get this to work? I don't know why it works fine when I'm attached and debugging, but doesn't work when I send the files through without being attached. I feel it's definitely something on my local box that I need to disable, but I don't know what.
I noticed that the error occurs even before execution reaches the "using" statement, because the second file is never copied to the temp directory for processing.
I noticed in my StackTrace, I'm getting the following error:
system.io.__error.winioerror(int32 errorcode string maybefullpath)
Here is my code:
protected override void OnStart(string[] args)
{
    FileSystemWatcher Watcher = new FileSystemWatcher(@"C:\Projects\Data\VendingStats");
    Watcher.EnableRaisingEvents = true;
    Watcher.Created += new FileSystemEventHandler(Watcher_Created);
    Watcher.Filter = "*.txt";
    Watcher.IncludeSubdirectories = false;
}

private void Watcher_Created(object sender, FileSystemEventArgs e)
{
    try
    {
        string targetPath = @"C:\Temp\VendorStats";
        // Use Path class to manipulate file and directory paths.
        FileInfo fi = new FileInfo(e.FullPath); // full name of path & file in the FSW directory
        string destFile = Path.Combine(targetPath, fi.Name);
        // To copy a folder's contents to a new location:
        // Create a new target folder, if necessary.
        if (!Directory.Exists(targetPath))
            Directory.CreateDirectory(targetPath);
        // To copy a file to another location and
        File.Copy(e.FullPath, destFile, true);
        // Set attribute to READONLY
        if (fi.IsReadOnly == false)
            fi.Attributes = FileAttributes.ReadOnly;
        GetCruiseLineShipName(destFile, ref cruiseLine, ref shipName);
        using (StreamReader sr = new StreamReader(File.Open(destFile, FileMode.Open, FileAccess.Read, FileShare.Read)))
        {
            filename = e.FullPath;
            //How many lines should be loaded?
            int NumberOfLines = 39;
            //Read the number of lines and put them in the array
            for (int i = 1; i < NumberOfLines; i++)
            {
                ListLines[i] = sr.ReadLine();
                switch (i)
                {
                    case 3:
                        int idx = ListLines[i].IndexOf(":");
                        string timeLine = ListLines[i].Substring(idx + 1);
                        dt = GetDate(Convert.ToDateTime(timeLine.Substring(1)));
                        break;
                    //more code here of the same
                }
            }
            //InsertData into database
        }
    }
    catch (Exception ex)
    {
        EventLog.WriteEntry("VendorStats", "Error in the Main:" + "\r\n\r\n" + ex.Message + "\r\n\r\n" + ex.InnerException);
        return;
    }
}
The bottom line to solving this was to put the method (the one spawned by the FileSystemWatcher) to sleep for "X" seconds until Windows completely releases the resources for the previous and present files as well as the folder.
It was the FileSystemWatcher that actually had a hold on the resources.
Here is some sample code:
private static void Watcher_Created(object sender, FileSystemEventArgs e)
{
    try
    {
        Thread.Sleep(10000);
        GetCruiseLineShipName(e.FullPath, ref cruiseLine, ref shipName);
        using (StreamReader sr = new StreamReader(File.Open(e.FullPath, FileMode.Open, FileAccess.Read, FileShare.Read)))
        {
