ZlibDecompressor throws incorrect header check exception - hadoop

I'm using ZlibDecompressor in Hadoop, but I'm getting an "incorrect header check" exception.
This is how I instantiate it:
ZlibDecompressor inflater = new ZlibDecompressor(ZlibDecompressor.CompressionHeader.DEFAULT_HEADER, 1024);
inflater.setInput(bytesCompressed, 0, bytesCompressed.length);
And here is how I use it for decompression:
inflater.decompress(bytesDecompressedBuffer, 0, bufferSizeInBytes);
I'm using Hadoop 0.20.2.
What could be the problem and how can I solve it?
Thanks.
Here is the data in hex:
d31efcf42e83e76d3df76d38db5d3c141f76135e7417de41d44dc50b507a07b03a07a03ad40f75db7f00038d7df02177db9dbbd01f02e35ef7eb60f6f77dfaebde3a0b7f75036d41dc3dc00c4e40136e3b044e83ec5d35f01044f050841011000c0df4d3ae40ec1079078101f02dfcd40dfbef9df5ec4db8e45d37d85102d350b8001d79f7de8303ce7a045efdd75e35dfc03b036f3c0f5e43034d78dfadb9e7ad7d0750c10c30bce7a103d04ef4000dbde01dfdf7a0c20b907df7def9d80137ef8

The problem reported is that there is no valid zlib header in the first two bytes. The problem with the data is that it does not appear to have any deflate-compressed data anywhere in it, regardless of whether such data could be zlib wrapped, gzip wrapped, or raw.
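If you want to double-check this yourself before blaming ZlibDecompressor, a quick look at the first two bytes tells you whether the buffer is zlib wrapped or gzip wrapped at all. The sketch below is only a diagnostic, not part of the Hadoop API; the class name and the file-path argument are made up for illustration.

import java.nio.file.Files;
import java.nio.file.Paths;

public class HeaderSniff {
    /** Rough diagnostic: does the buffer start with a plausible zlib or gzip header? */
    static String sniff(byte[] data) {
        if (data.length < 2) {
            return "too short";
        }
        int b0 = data[0] & 0xff;
        int b1 = data[1] & 0xff;
        if (b0 == 0x1f && b1 == 0x8b) {
            return "gzip";                      // gzip magic bytes
        }
        // zlib header: compression method 8 (deflate) in the low nibble of the
        // first byte, and the two-byte header value is a multiple of 31.
        if ((b0 & 0x0f) == 8 && ((b0 << 8) | b1) % 31 == 0) {
            return "zlib";
        }
        return "neither zlib nor gzip";         // raw deflate, or not compressed at all
    }

    public static void main(String[] args) throws Exception {
        byte[] bytesCompressed = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(sniff(bytesCompressed));
    }
}

If the sniff reports "neither zlib nor gzip", DEFAULT_HEADER (which expects a zlib wrapper) will always fail with "incorrect header check", and the fix is to track down how the bytes were actually produced rather than to change the decompressor settings.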

Related

Loading protocol buffer in ruby or java similar to node

I have a .proto file that contains my schema and service definition. I'm looking for a way in Ruby/Java to load and parse it, similar to how Node does it (code below). Looking at the grpc Ruby gem, I don't see anything that can replicate what Node does.
Digging around, I found this issue (https://github.com/grpc/grpc/issues/6708), which states that dynamically loading .proto files is only available in Node. Hopefully, someone can provide me with an alternative.
Use case: loading .proto files dynamically as provided by the client, but I can only use either Ruby or Java to do it.
let grpc = require("grpc");
let loader = require("@grpc/proto-loader");
let packageDefinition = loader.loadSync(file.file, {});
let parsed = grpc.loadPackageDefinition(packageDefinition);
I've been giving this a try for the past few months, and it seems like Node is the only way to read a protobuf file at runtime. Hopefully this helps anyone in the future who needs it.
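If running protoc ahead of time is acceptable, one workaround in Java (not a drop-in equivalent of @grpc/proto-loader, which parses the .proto text itself) is to compile the schema into a descriptor set and load that at runtime with protobuf-java's DynamicMessage. This is only a sketch; the file names and the message name below are hypothetical.

import java.nio.file.Files;
import java.nio.file.Paths;

import com.google.protobuf.DescriptorProtos.FileDescriptorProto;
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;

public class DescriptorLoader {
    public static void main(String[] args) throws Exception {
        // Produced beforehand with:
        //   protoc --include_imports --descriptor_set_out=schema.desc my_service.proto
        FileDescriptorSet set = FileDescriptorSet.parseFrom(
                Files.readAllBytes(Paths.get("schema.desc")));

        // Build a FileDescriptor; for a single file with no imports the
        // dependency array is empty.
        FileDescriptorProto fileProto = set.getFile(0);
        FileDescriptor fileDescriptor =
                FileDescriptor.buildFrom(fileProto, new FileDescriptor[0]);

        // Look up a message type by name and parse raw bytes into it.
        Descriptor messageType = fileDescriptor.findMessageTypeByName("MyRequest");
        byte[] payload = Files.readAllBytes(Paths.get("request.bin"));
        DynamicMessage message = DynamicMessage.parseFrom(messageType, payload);
        System.out.println(message);
    }
}

The trade-off is that the schema has to be compiled to a descriptor set before the program runs, so this does not cover the case where a raw .proto file only shows up at runtime.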

Sequence file reading issue using Spark (Java)

I am trying to read a sequence file generated by Hive using Spark. When I try to access the file, I am facing org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException.
I have tried the workarounds for this issue, like making the class serializable, but I still face it. I am pasting the code snippet here; please let me know what I am missing.
Is it the BytesWritable data type or something else that is causing the issue?
JavaPairRDD<BytesWritable, Text> fileRDD = javaCtx.sequenceFile("hdfs://path_to_the_file", BytesWritable.class, Text.class);
List<String> result = fileRDD.map(new Function<Tuple2<BytesWritable, Text>, String>() {
    public String call(Tuple2<BytesWritable, Text> row) {
        return row._2.toString() + "\n";
    }
}).collect();
Here is what was needed to make it work:
Because we use HBase to store our data and this reducer outputs its result to an HBase table, Hadoop is telling us that it doesn't know how to serialize our data. That is why we need to help it. Inside setUp, set the io.serializations variable.
You can do the same in Spark:
conf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
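For context, here is a minimal sketch of where that setting could go in a Spark job, assuming the HBase mapreduce module is on the classpath; the serialization class names are passed as strings so the sketch does not depend on their visibility, and the HDFS path is a placeholder.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SequenceFileRead {
    public static void main(String[] args) {
        JavaSparkContext javaCtx =
                new JavaSparkContext(new SparkConf().setAppName("SequenceFileRead"));

        // The Hadoop configuration that Spark hands to the SequenceFile input format.
        Configuration hadoopConf = javaCtx.hadoopConfiguration();
        Configuration hbaseConf = HBaseConfiguration.create();

        // Register the HBase serializations in addition to whatever is already
        // configured, as in the answer above.
        hadoopConf.setStrings("io.serializations",
                hbaseConf.get("io.serializations"),
                "org.apache.hadoop.hbase.mapreduce.MutationSerialization",
                "org.apache.hadoop.hbase.mapreduce.ResultSerialization");

        // Convert to String inside the map so only serializable values are collected.
        List<String> result = javaCtx
                .sequenceFile("hdfs://path_to_the_file", BytesWritable.class, Text.class)
                .map(row -> row._2.toString() + "\n")
                .collect();

        result.forEach(System.out::println);
        javaCtx.stop();
    }
}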

appending to existing sequence file

In my use case, I need to find a way to append key/value pairs to an existing sequence file. How can I do it? Any clue would be greatly helpful. I am using Hadoop 2.x.
Also, I came across the documentation below. Can anyone tell me how to use this to append?
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileContext fc,
Configuration conf,
Path name,
Class keyClass,
Class valClass,
org.apache.hadoop.io.SequenceFile.CompressionType compressionType,
CompressionCodec codec,
org.apache.hadoop.io.SequenceFile.Metadata metadata,
EnumSet createFlag,
org.apache.hadoop.fs.Options.CreateOpts... opts)
throws IOException
Construct the preferred type of SequenceFile Writer.
Parameters:
fc - The context for the specified file.
conf - The configuration.
name - The name of the file.
keyClass - The 'key' type.
valClass - The 'value' type.
compressionType - The compression type.
codec - The compression codec.
metadata - The metadata of the file.
createFlag - gives the semantics of create: overwrite, append etc.
opts - file creation options; see Options.CreateOpts.
Returns:
Returns the handle to the constructed SequenceFile Writer.
Throws:
IOException
UPDATE: issue HADOOP-7139 is now closed, and as of versions 2.6.1 / 2.7.2 it is possible to append to an existing SequenceFile :)
(I was using version 2.7.1 and looking to append to a SequenceFile, so I downgraded to 2.6.1, because version 2.7.2 was not out yet.)
Original answer: it's still not possible to append to an existing sequence file. There is an open issue to work on that, but it's still unresolved.
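For anyone landing here on Hadoop 2.6.1 / 2.7.2 or later, appending might look like the following minimal sketch, assuming the appendIfExists writer option added by HADOOP-7139; the path and key/value types are placeholders, and the key and value classes must match those of the existing file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileAppend {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/existing.seq"); // placeholder path

        // appendIfExists(true) reopens the file and appends new records
        // instead of overwriting it.
        SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class),
                SequenceFile.Writer.appendIfExists(true));
        try {
            writer.append(new IntWritable(42), new Text("appended record"));
        } finally {
            writer.close();
        }
    }
}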

DAMAGE: after Normal block in deleting SAXParser xerces

I am working on an old MFC application which uses xerces 2.7 for XML parsing.
In debug mode, while trying to debug a stack corruption, I have been able to narrow down the issue to the following code:
BOOL CXMLHandler::LoadFile(CString fileName)
{
    XMLPlatformUtils::Initialize();
    SAXParser* parser = new SAXParser();
    delete parser;
    XMLPlatformUtils::Terminate();
    return TRUE;
}
While deleting the parser, I get the error
"DAMAGE: after Normal block (#1695) at 0x0795EEA8."
The SAXParser class is from Xerces.
I cannot figure out what is wrong with the code. Can anyone help me find out what is wrong here? Could a memory leak/corruption elsewhere in the code be causing this?
If that #1695 is the same each time you run, add the following at the start of the program:
_CrtSetBreakAlloc(1695);
Allocation number 1695 is the data that has been damaged. The debugger will halt there.

A null reference or invalid value was found [GDI+ status: InvalidParameter]

I am running an MVC3 app on Mono/Linux and everything is working fine, with the exception of an image upload utility. Whenever an image upload is attempted I get the InvalidParameter error from within the method below:
System.Drawing.GDIPlus.CheckStatus(status As Status) (unknown file): N 00339
System.Drawing.Bitmap.SetResolution(xDpi As Single, yDpi As Single)
I've googled this error extensively and have found that the InvalidParameter error can often be misleading, and could fire if, for example, there was an error with the upload itself, or if the image was not fully read. This runs fine on IIS/Windows, but I've not been able to get it to work on Mono.
Apache2
Mono 2.10.8.1
Am I missing something simple, or do I need to find a different way to handle image manipulation for Mono?
After doing quite a bit of testing, I was able to determine the root of my error. I was attempting to pass the Image.HorizontalResolution and Image.VerticalResolution properties to Bitmap.SetResolution. While these properties were set on the initial upload (where the file is read into a stream from the tmp directory), when I posted back with the base64-encoded string of the image itself, it appears these values were somehow lost. Because of this the SetResolution method failed.
For whatever reason I do not have this issue on IIS/Windows; the properties exist in both circumstances there.
I encountered a similar issue. A Bitmap loaded from disk reported bmp.HorizontalResolution == 0 and bmp.VerticalResolution == 0 when they were in fact both 300. This behaviour does not occur on Windows.
Digging a bit more, I found that the following test fails:
[Test]
public void GDI_SetResoltion()
{
    var b1 = new Bitmap(100, 100);
    Assert.That(b1.HorizontalResolution, Is.Not.EqualTo(0));
    Assert.That(b1.VerticalResolution, Is.Not.EqualTo(0));
}
I believe Windows will default resolution to 96 dpi.
