Missing data when decompressing zlib data with Ruby

Well, I have deflated JSON encoded in a request log and I need to decompress it.
I tried to use Zlib, i.e.:
Zlib::Inflate.new(-Zlib::MAX_WBITS).inflate(File.read("PATH_OF_FILE"))
It shows only part of the JSON, something like:
"{\"seq\":53,\"app_id\":\"567067343352427\",\"app_ver\":\"10.3.2\",\"build_num\":\"46395473\",\"device_id\":\"c12f541a-5936-4477-b6fc-653db675d16"
There is a lot of missing data, presumably because the deflated data is too big.
Full deflated data:
Check it here.
After testing, I figured out that only this part is being decompressed:
Check it here.
Well, I'm a bit confused by this. Could someone please help me with it?
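For reference, a minimal sketch of how one might inflate the raw stream incrementally, reading the file in binary mode (File.read in text mode can mangle binary data on some platforms). If the output is still truncated after this, the logged payload itself is probably incomplete or escaped. The path is the placeholder from the question:

require 'zlib'

# Read in binary mode so no bytes are altered, then inflate the raw
# (headerless) deflate stream in chunks.
inflater = Zlib::Inflate.new(-Zlib::MAX_WBITS)
json = +''
File.open('PATH_OF_FILE', 'rb') do |f|   # placeholder path from the question
  json << inflater.inflate(f.read(16_384)) until f.eof?
end
json << inflater.finish unless inflater.finished?   # raises if the stream is truncated
inflater.close
puts json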

Related

How to fix UTF-8 decoded with ISO-8859-1 in Redshift

I assumed a dataset was ISO-8859-1 encoded, while it was actually encoded in UTF-8.
I wrote a Python script where I decoded the data with ISO-8859-1 and wrote it into a Redshift SQL database.
I wrote the garbled characters straight into the Redshift table; the re-decoding did not happen while writing into the table (I used Python and pandas with the wrong encoding).
Now the data source is not available anymore, but the data in the table has a lot of garbled characters.
E.g. 'Hello Günter' -> 'Hello GĂŒnter'
What is the best way to resolve this issue?
Right now I can only think of collecting a complete list of garbled characters and their translations, but maybe there is a way I have not thought of.
So my questions:
First of all, I would like to know whether information was lost when the wrong decoding happened.
I would also like to know whether there might be a way to fix such a decoding issue in Redshift itself. Finally, I have been searching for a complete list of these character mappings, so I do not have to create it myself, but I could not find such a list.
Thank you
EDIT:
I pulled a part of the table and found out I have to do the following (this is Python):
"Ð\x97амÑ\x83ж вÑ\x8bÑ\x85оди".encode('iso-8859-1').decode('utf8')
The table has billions of rows; would it be possible to do that in Redshift?
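For illustration, here is the same repair round-trip as a Ruby sketch (not Redshift SQL), using the classic ISO-8859-1 mojibake of 'Günter' as a stand-in value; if the wrong decode actually went through a different 8-bit charset (the 'GĂŒnter' example above suggests it may have), the same round-trip applies with that charset name instead:

# Sketch: undo a wrong ISO-8859-1 decode of UTF-8 bytes.
mojibake = 'Hello GÃ¼nter'   # stand-in example value
# Re-encode back to the original bytes, then relabel them as UTF-8.
fixed = mojibake.encode('ISO-8859-1').force_encoding('UTF-8')
puts fixed                   # => "Hello Günter"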

I have a telephony log in base64 on a mac that I can't make sense of

I'm digging into a log file of telephony data on a Mac. There are a few entries that are intelligible plaintext, but most of it is base64, and without knowing what it originally represented I haven't been able to figure out how to decode it into anything meaningful. The entries are 108-character blocks that I am completely certain are base64 (all the right characters for base64 and none that aren't, ending in equals signs), but I am at a loss as to how to get anything useful out of them.
Someone was previously able to use this data productively, but how they did it isn't documented. Does anyone have an idea what it would have been before it was base64, or how to get it back into a usable format?
Why don't you try a Python script? There is a post that can help you:
Python base64 data decode
One of the answers there may be exactly what you need.
If you don't know how to use Python, here is the official Beginner's Guide:
https://www.python.org/about/gettingstarted/
You can download Python for macOS here:
https://www.python.org/downloads/mac-osx/
I would write a Python program like this:
import base64

# Decode each base64 line in the log and append the raw bytes to the output.
with open('yourlog.log', 'r') as infile, open('result.log', 'wb') as outfile:
    for line in infile:
        line = line.strip()
        if not line:
            continue                        # skip blank lines
        # b64decode returns bytes, hence the output file is opened in 'wb'
        outfile.write(base64.b64decode(line))

print('Finished!')
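One caveat, since the log reportedly mixes plaintext and base64 entries: base64.b64decode raises binascii.Error on lines whose length or padding is not valid base64, so any plaintext lines may need to be filtered out (or the decode wrapped in a try/except) first.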

Ruby unpack file compressed in snappy framing format *.sz

I need to unpack snappy *.sz files in Ruby.
Format specification is here:
https://github.com/google/snappy/blob/master/framing_format.txt
I have found 2 gems so far.
https://github.com/miyucy/snappy - seems to be completely useless.
https://github.com/willglynn/snappy-ruby - is able to unpack separate snappy chunks but not the whole framing snappy file.
QUESTION:
Is there a working Ruby gem that would allow me to do something like:
framing_snappy.unpack('filename.sz')
or is the only way to write my own code that parses bytes and messes with bitwise shifts?
Just in case someone is facing a similar issue:
I finally came up with the code below, and it seems to be working.
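(The original snippet wasn't preserved here; the following is a minimal reconstruction based on the framing format spec linked above, assuming the snappy gem's Snappy.inflate for decompressing individual chunks. CRC-32C verification is skipped for brevity.)

require 'snappy'

# Read a framed snappy stream: each chunk is a 1-byte type followed by a
# 3-byte little-endian length, then the chunk payload (see the spec above).
def framing_snappy_unpack(filename)
  out = +''
  File.open(filename, 'rb') do |io|
    until io.eof?
      word = io.read(4).unpack1('L<')   # 32-bit little-endian header word
      chunk_type = word & 0xff          # low byte: chunk type
      data = io.read(word >> 8)         # upper 3 bytes: payload length
      case chunk_type
      when 0xff                         # stream identifier ("sNaPpY"), skip
      when 0x00                         # compressed chunk: 4-byte CRC + snappy data
        out << Snappy.inflate(data[4..-1])
      when 0x01                         # uncompressed chunk: 4-byte CRC + raw data
        out << data[4..-1]
      end                               # padding/reserved chunks are skipped
    end
  end
  out
end

puts framing_snappy_unpack('filename.sz')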

How to convert hadoop sequence file to json format?

As the name suggests, I'm looking for a tool that will convert existing data from a Hadoop sequence file to JSON format.
My initial googling has only turned up results related to jaql, which I'm desperately trying to get to work.
Is there any tool from Apache available for this very purpose?
NOTE:
I have a Hadoop sequence file sitting on my local machine and would like to get the data in the corresponding JSON format.
So, in effect, I'm looking for a tool/utility that takes a Hadoop sequence file as input and produces output in JSON format.
Thanks
Apache Hadoop might be a good tool for reading sequence files.
All kidding aside, though, why not write the simplest possible Mapper Java program that uses, say, Jackson to serialize each key/value pair it sees? That would be a pretty easy program to write.
I thought there must be some tool that does this, given that it's such a common requirement. Yes, it should be pretty easy to code, but then again, why do so if something already exists that does just that?
Anyway, I figured out how to do it using jaql. A sample query that worked for me:
read({type: 'hdfs', location: 'some_hdfs_file', inoptions: {converter: 'com.ibm.jaql.io.hadoop.converter.FromJsonTextConverter'}});

How do I get Zlib to compress to a stream in Ruby?

I’m trying to upload files to Amazon S3 using AWS::S3, but I’d like to compress them with Zlib first. AWS::S3 expects its data to be a stream object, i.e. you would usually upload a file with something like
AWS::S3::S3Object.store('remote-filename.txt', open('local-file.txt'), 'bucket')
(Sorry if my terminology is off; I don’t actually know much about Ruby.) I know that I can zlib-compress a file with something like
data = Zlib::Deflate.deflate(File.read('local-file.txt'))
but passing data as the second argument to S3Object.store doesn’t seem to do what I think it does. (The upload goes fine but when I try to access the file from a web browser it doesn’t come back correctly.) How do I get Zlib to deflate to a stream, or whatever kind of object S3Object.store wants?
I think my problem before was not that I was passing the wrong kind of thing to S3Object.store, but that I was generating a zlib-compressed data stream without the header you’d usually find in a .gz file. In any event, the following worked:
require 'zlib'
require 'stringio'

str = StringIO.new                      # in-memory buffer for the gzip output
gz = Zlib::GzipWriter.new(str)          # GzipWriter adds the gzip header/trailer
gz.write File.read('local-file.txt')
gz.close                                # finalizes the gzip stream into str
AWS::S3::S3Object.store('remote-filename.txt', str.string, 'bucket')
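As a quick sanity check (just a sketch), the buffered bytes should decompress back to the original file before being uploaded:

# The gzipped buffer should round-trip to the original file contents.
Zlib::GzipReader.new(StringIO.new(str.string)).read == File.read('local-file.txt')

Note that for a web browser to transparently decompress the object when it is served from S3, the upload generally also needs a Content-Encoding: gzip header set on it.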
