hadoop/mapreduce local job directories are not deleted

I just started using hadoop and I noticed that local job directories are not deleted.
I am using Hadoop 2.2.0 on Windows.
Is there any configuration needed so Hadoop cleans up all the directories under “/tmp/hadoop-/mapred/local/”?
Also, after investigating and looking in the code, I found that part of the logic is in the class “org.apache.hadoop.mapred.LocalJobRunner” (hadoop-mapreduce-client-common-2.2.0):
try {
    fs.delete(systemJobFile.getParent(), true); // delete submit dir
    localFs.delete(localJobFile, true);         // delete local copy
    // Cleanup distributed cache
    localDistributedCacheManager.close();
} catch (IOException e) {
    LOG.warn("Error cleaning up "+id+": "+e);
}
Why not just use (as is the case for systemJobFile):
localFs.delete(localJobFile.getParent(), true); // delete local copy
Is it correct to do that?
I tried it and it looks like it fixes the issue, but I am not sure.
Update: I just noticed that a lot of "attempt_local****" directories are still there. Not deleted by Hadoop!
Thank you.

As I had to find a quick solution and I don't like the idea of creating a script to clean up these directories, I made this patch (org.apache.hadoop.mapred.LocalJobRunner):
// line: 114
private Path localCacheJobDir;
// line: 156
this.localCacheJobDir = localFs.makeQualified(new Path(new Path(new Path(conf.getLocalPath(jobDir), user), JOBCACHE), jobid.toString()));
// line: 492
try {
    fs.delete(systemJobFile.getParent(), true); // delete submit dir
    final Path localJobFilePath = localJobFile.getParent();
    localFs.delete(localJobFile, true);         // delete local copy
    // Cleanup distributed cache
    localDistributedCacheManager.close();
    localFs.delete(localJobFilePath, true);     // delete the local job directory
    localFs.delete(localCacheJobDir, true);     // delete the job's cache directory
} catch (IOException e) {
    LOG.warn("Error cleaning up "+id+": "+e);
}
I have never worked with Hadoop before and only started playing with it in the last two days, so I don't know whether my solution will have any side effects on Hadoop. Unfortunately, this is the best solution I have.

There are some configuration keys like
mapreduce.task.files.preserve.failedtasks
in the mapred configuration.
Anyway...
By default Hadoop should clear the temporary job directory.
On success the files are moved to ${mapreduce.output.fileoutputformat.outputdir}
If things go wrong, the files are deleted.
So I'm not sure this fixes what is really happening on your installation.
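For illustration, such a key can be set in mapred-site.xml or on the job's Configuration before submission. A minimal sketch only; the class name and job name are placeholders, and whether this key influences the LocalJobRunner cleanup discussed above is exactly what is in question here:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CleanupConfigExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // false is the default: intermediate files of failed tasks are not kept.
        // Setting it to true preserves them for debugging, i.e. the opposite of cleaning up.
        conf.setBoolean("mapreduce.task.files.preserve.failedtasks", false);
        Job job = Job.getInstance(conf, "cleanup-config-example");
        // ... set mapper, reducer, input and output paths here ...
        // job.waitForCompletion(true);
    }
}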

Related

fs mkdir/copy intermittently fails in protected directory, despite having permissions

I have an Electron app on Mac with full disk permissions. I am using fs to make a directory in a protected folder and to copy files from a temp folder to the new directory.
When using fs.copy, I periodically get two different types of errors:
If the directory already exists and is owned by the user:
EPERM errors (operation not permitted, unlink xxx) when attempting to overwrite the existing directory, specifically when replacing a manifest.json file. This is very intermittent.
If the directory does not exist or is owned by root:
EACCES errors when attempting to make the directory or copy files to the new location.
Code:
const fs = require('fs-extra'); // fs.ensureDir and fs.copy come from fs-extra

[...Array(sourceDirs.length).keys()].map(async (idx) => {
    try {
        await fs.ensureDir(destPaths[idx]);
    } catch (e) {
        console.log('Directory does not exist and could not be created');
    }
    try {
        await fs.copy(sourceDirs[idx], destPaths[idx]);
    } catch (e) {
        console.log('Copy error:', e);
    }
});
After some more research, I determined that the directory's R/W permissions varied based on what entity created the directory. Some elements of the directory and its children were owned by root, and everyone only had read permissions, while other folders were owned by everyone and had write permissions.
Programmatically, the only way to solve this was by spawning a chmod command with sudo to update the permissions. In my case, there isn't any issue with taking ownership of the directory.

Updating Tomcat WAR file on Windows

I have inherited a Maven/Tomcat (8.5) web application that is locally hosted on different machines. I am trying to make the application update more reliably on Windows from a war file that is either stored locally (on a USB drive) or downloaded from AWS. On the Linux version, the software is able to upload the new war file via curl using the manager-script role by doing the following:
String url = "http://admin:secret#localhost:8080/manager/text/deploy?path=/foo&update=true";
CommandRunner.executeCommand(new String[] { "curl", "--upload-file", war, url });
For Windows, the current method tries to copy the war file into the /webapps directory and has Tomcat auto-deploy the new war file after restarting either Tomcat or the host machine. The problem is that the copied war file ends up being 0 KB and there is nothing to deploy. This appears to happen outside of my install function, because the file size of /webapps/foo.war right after FileUtils.copyFile() is correct.
I have tried to implement a PUT request for the manager-script role based on the Tomcat manager docs and another post:
File warfile = new File(request.getWar());
String warpath = warfile.getAbsolutePath().replace("\\", "/");
// closes other threads
App.terminate();
String url = "http://admin:secret@localhost:8080/manager/text/deploy?path=/foo&war=file:" + warpath + "&update=true";
HttpClient client = HttpClientBuilder.create().build();
HttpPut request;
try {
    request = new HttpPut(url);
    HttpResponse response = client.execute(request);
    BufferedReader rd = new BufferedReader(
            new InputStreamReader(response.getEntity().getContent()));
    StringBuffer result = new StringBuffer();
    String line = "";
    while ((line = rd.readLine()) != null) {
        result.append(line);
    }
    System.err.println(result.toString());
} catch (Exception e) {
    LOGGER.info("war install failed");
    throw new ServiceException("Unable to upload war file.", e);
}
However, the process still ends up with a 0 KB war file in /webapps. The only error in my log is when Tomcat tries to deploy the empty war. I have tried relaxing file read/write permissions in my Tomcat installation and it doesn't seem to help. I also don't know if I can guarantee that all Windows installations will have access to curl if I try that implementation. Has anyone run into similar issues? Thank you for the help!
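For reference, a minimal sketch of the Linux curl approach translated to HttpClient, i.e. streaming the war bytes as the PUT body instead of passing a war=file: URL. This assumes Apache HttpClient 4.x is on the classpath and reuses the admin:secret credentials and /foo path from the question; the war location is a placeholder:
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPut;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.FileEntity;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.util.EntityUtils;

public class WarUpload {
    public static void main(String[] args) throws Exception {
        File war = new File("C:/path/to/foo.war"); // placeholder path

        // Same manager-script endpoint the curl command uses; no war=file: parameter,
        // the archive itself is the request body (like curl --upload-file).
        String url = "http://localhost:8080/manager/text/deploy?path=/foo&update=true";

        HttpClient client = HttpClientBuilder.create().build();
        HttpPut put = new HttpPut(url);
        put.setEntity(new FileEntity(war, ContentType.APPLICATION_OCTET_STREAM));

        // Basic auth for the manager-script user (admin:secret from the question).
        String token = Base64.getEncoder()
                .encodeToString("admin:secret".getBytes(StandardCharsets.UTF_8));
        put.setHeader("Authorization", "Basic " + token);

        HttpResponse response = client.execute(put);
        // Tomcat replies with a one-line status, e.g. "OK - Deployed application at context path /foo".
        System.out.println(EntityUtils.toString(response.getEntity()));
    }
}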
Will, you are opening the file for sure, but you are not writing anything to it; the result string is written to stdout. So you have to replace System.err.println(result.toString()); with fr = new FileWriter(file); br = new BufferedWriter(fr); br.write(result.toString());
and don't forget to close your resources at the end (br.close(); fr.close();) :)
PS: check if there is some information to be filtered (headers or stuff like that)

How do I rename file in Heroku running on Spring batch?

I have a Java codebase that is executed on a Heroku dyno. The command executes, and the log afterwards suggests the file was renamed, but it actually doesn't seem to work.
Here is my code:
File file = null;
String fileName = System.getProperty("user.dir") + env.getProperty("filePath");
try {
    if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
        // Get the file and rename it.
        file = new File(fileName);
        if (file != null && file.exists()) {
            String renameFile = System.getProperty("user.dir") + "/wardIssue_" + new SimpleDateFormat("dd_MM_yyyy_HH_mm_ss").format(new Date().getTime()).toString() + "_completed";
            logger.info("File being renamed to {}", renameFile);
            file.renameTo(new File(renameFile));
        }
        logger.info("Batch job completed successfully");
    }
} catch (Exception e) {
    logger.error("Renaming the file failed", e); // catch added so the try block compiles
}
As you can see, the logger.info line actually prints the new file name, but on the server the file name is not changed.
The same code works fine locally, i.e. the file name is changed.
Should I be running the Java command for this Spring Batch job with sudo? Are there any other things that might cause this problem?
I am using a Procfile with following command:
worker: java -Dserver.port=9002 $JAVA_OPTS -jar target/com.cognitive.bbmp.anukula.batch-0.0.1-SNAPSHOT.jar
Heroku's filesystem is ephemeral. Any changes you make to it will be lost the next time your dyno restarts, which happens frequently (at least once per day).
Furthermore, each dyno has its own ephemeral filesystem. Operations you perform on one dyno have no effect on other dynos, so you can't even make temporary filesystem changes with a worker and expect them to affect web (or any other) dynos.
You'll have to approach your problem in a way that doesn't require files to be renamed if you want to run your code on Heroku.
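That said, and independent of Heroku, File.renameTo only reports failure through its boolean return value, so code like the above fails silently. A minimal sketch of a rename that surfaces failures, using java.nio.file; the file names are placeholders, not the actual batch paths:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class RenameCheck {
    public static void main(String[] args) {
        Path source = Paths.get("wardIssue.csv");            // placeholder for the batch input file
        Path target = Paths.get("wardIssue_completed.csv");  // placeholder for the renamed file

        try {
            // Files.move throws an IOException on failure instead of returning false silently.
            Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
            System.out.println("Renamed to " + target);
        } catch (IOException e) {
            System.err.println("Rename failed: " + e);
        }
    }
}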

Cannot install search-guard - "ERROR: `elasticsearch` directory is missing in the plugin zip"

As the title states, I have a problem while trying to install the search-guard plugin for my ELK stack:
[XXX#XXXX bin]$ ./elasticsearch-plugin install -b file:///home/xxxx/search-guard-6-6.2.1-21.0.zip
-> Downloading file:///home/xxxx/search-guard-6-6.2.1-21.0.zip
[=================================================] 100%  
ERROR: `elasticsearch` directory is missing in the plugin zip
I tried to do it from a custom directory and then, following this answer, from the home directory, but it did not help. When I unzip the archive, I can see that there is a directory called "elasticsearch" in it.
Does anyone have any suggestions how to proceed with that?
The error comes from InstallPluginCommand.class within lib\plugin-cli-x.x.x.jar and is exactly what it says. Here's a clipped portion of the code as it reads through the entries in the zip file:
ZipInputStream zipInput = new ZipInputStream(Files.newInputStream(zip));
try {
    ZipEntry entry;
    while ((entry = zipInput.getNextEntry()) != null) {
        if (entry.getName().startsWith("elasticsearch/")) {
            hasEsDir = true;
            ...
        }
    }
    if (!hasEsDir) {
        throw new UserException(2, "`elasticsearch` directory is missing in the plugin zip");
    }
I realize that doesn't help you much, but as a last-ditch effort, if you can't get to the root cause of the issue, one thing I did to get over the hurdle was to just copy the files from the zip into the ES plugins directory (/usr/share/elasticsearch/plugins in our case). They go inside /plugins, under a directory whose name is the name Elasticsearch knows the plugin by.
The only two gotchas are:
You need to know the directory name to create under /plugins.
You need to know the replacement values for the plugin-descriptor.properties file.
If you can get that far, you can start ES and it should load everything fine.
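If it turns out the entries in the zip really are not prefixed with elasticsearch/ (which is what the check above looks for), another option is to repack the archive so that every entry sits under that directory. A rough sketch with java.util.zip; the file names are placeholders:
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class RepackPluginZip {
    public static void main(String[] args) throws Exception {
        Path in = Paths.get("search-guard-6-6.2.1-21.0.zip");        // original archive
        Path out = Paths.get("search-guard-6-6.2.1-21.0-fixed.zip"); // repacked archive

        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(in));
             ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(out))) {
            byte[] buf = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // Prefix every entry with elasticsearch/ unless it already has it,
                // since that prefix is what InstallPluginCommand checks for.
                String name = entry.getName().startsWith("elasticsearch/")
                        ? entry.getName()
                        : "elasticsearch/" + entry.getName();
                zout.putNextEntry(new ZipEntry(name));
                if (!entry.isDirectory()) {
                    int n;
                    while ((n = zin.read(buf)) > 0) {
                        zout.write(buf, 0, n); // copy the entry body unchanged
                    }
                }
                zout.closeEntry();
            }
        }
    }
}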

Manual installation of Magento extension turns every file into a folder

I am completely baffled by this. I packaged up an extension and manually installed it on a fresh Magento instance. (Both the packaging and installing machines were running Magento 1.7.) The installation went smoothly, except every single file I installed was turned into a folder named after the file. Every single file. Has anyone run into this? Could it be a Magento bug?
I used to see this problem when I manually created a tar archive to use as a Magento Connect archive. Unfortunately, I don't have a solution, but here's what I understand about the problem.
While Magento Connect tgz packages are technically gzip-compressed tar archives, the code that creates and extracts these archives is not the standard *nix tar tool. Instead, Magento implemented its own packing and unpacking tar code for Magento Connect:
downloader/lib/Mage/Archive/Tar.php
Unfortunately, this packing and unpacking code hasn't been robustly tested across operating systems or against tar archives created with standard *nix tools. My problem with this code was that archives created on my Mac OS system via tar wouldn't unpack correctly with Magento Connect's code on a system running Linux.
Hard to track down, hard to report, hard to reproduce means hard to fix.
These directories are being created when Magento Connect unpacks the tgz file. I'm 99% sure your directories are being created by this bit of code:
#File: downloader/lib/Mage/Archive/Tar.php
if (in_array($header['type'], array("0", chr(0), ''))) {
    if (!file_exists($dirname)) {
        $mkdirResult = @mkdir($dirname, 0777, true);
        if (false === $mkdirResult) {
            throw new Mage_Exception('Failed to create directory ' . $dirname);
        }
    }
    $this->_extractAndWriteFile($header, $currentFile);
    $list[] = $currentFile;
} elseif ($header['type'] == '5') {
    if (!file_exists($dirname)) {
        $mkdirResult = @mkdir($currentFile, $header['mode'], true);
        if (false === $mkdirResult) {
            throw new Mage_Exception('Failed to create directory ' . $currentFile);
        }
    }
    $list[] = $currentFile . DS;
These are the two locations where Magento unpacks the archive and creates a folder. For some reason, there's a certain condition on your two systems where the data is being packed, or unpacked, incorrectly into/out of the archive file. Try un-archiving the tgz file manually with a command-line tool or your operating system's built-in un-archive program. If weird things happen, then at least you know it's the packing code that's the problem.
It's definitely a bug, and while I'd report it, the only "solution" would be to not create your archive on your local machine (which I realize is an awful solution, but ours is not to question why and all that).
This is a bug that has been present since 1.7, due to an if comparison never evaluating to false when reading the ././@LongLink header. I answered it in more detail in this question:
https://magento.stackexchange.com/questions/5585/long-file-names-and-magento-connect-extension-packager/45187#45187
I found that issue happening when packing a Magento extension on OS X that is linked (via modman) into the Magento folders. The folder creation only occurred on Windows systems.
Might that happen here too?
Rico
I encountered it when, for some reason, my plugin file had been given a .gz suffix, so it was plugin.tgz.gz.
Unzipping it to plugin.tgz solved my issue.
I think the issue is because of the PHP version. I faced the same issue while installing an extension on Magento 1.8.1, but I found a fix by changing the _getFormatParseHeader() function in the /downloader/lib/Mage/Archive/Tar.php file.
Originally the function was:
protected static final function _getFormatParseHeader()
{
    return 'a100name/a8mode/a8uid/a8gid/a12size/a12mtime/a8checksum/a1type/a100symlink/a6magic/a2version/'
        . 'a32uname/a32gname/a8devmajor/a8devminor/a155prefix/a12closer';
}
I changed it to:
protected static final function _getFormatParseHeader()
{
    if (version_compare(phpversion(), '5.5.0', '<') === true) {
        return 'a100name/a8mode/a8uid/a8gid/a12size/a12mtime/a8checksum/a1type/a100symlink/a6magic/a2version/'
            . 'a32uname/a32gname/a8devmajor/a8devminor/a155prefix/a12closer';
    }
    return 'Z100name/Z8mode/Z8uid/Z8gid/Z12size/Z12mtime/Z8checksum/Z1type/Z100symlink/Z6magic/Z2version/'
        . 'Z32uname/Z32gname/Z8devmajor/Z8devminor/Z155prefix/Z12closer';
}
Really nasty bug.
For me, renaming my manually packed file from *.tar.gz to *.tgz solved it.
At least it worked on my Ubuntu 15.04.
Tested with Magento 1.8.
It's more likely that you chose the wrong path when adding content to your extension.
For me, the bug happened when I added (non-existing) files from layout/base instead of from layout/base/default.
