I'm trying to determine the algorithm used to compress a series of bytes, but I have no idea what algorithm it is or how it works. What I do know is the contents of the data, both before and after it's compressed.
Is there a program I can use to determine this, is the answer obvious from these small samples, or can you point me to some good resources for figuring this out?
Input = "\x00\x00"
Output = "\x78\xda\x63\x60\x00\x00\x00\x02\x00\x01"
Input = "\x00\x01\x00\x00\x00\x3C\xEA\x00\x05\x68\x65\x6c\x6c\x6f\x03"
Output = "\x78\xda\x63\x60\x64\x60\x60\xb0\x79\xc5\xc0\x9a\x91\x9a\x93\x93\xcf\x0c\x00\x13\x10\x03\x44"
Input = "\x00\x0A\x00\x02\x1a\xec\xEA\x00\x0A\x62\x61\x73\x69\x6c\x61\x64\x65\x31\x32\x02\x00\x02\xe6\x0f\xEA\x00\x0B\x31\x31\x68\x6f\x74\x70\x69\x6e\x6b\x31\x31\x02\x00\x02\xee\x84\xEA\x00\x08\x73\x78\x79\x63\x61\x69\x74\x79\x02\x00\x02\xf3\x6b\xEA\x00\x09\x52\x6f\x62\x6c\x6f\x78\x31\x30\x31\x02\x00\x03\x13\xd3\xEA\x00\x0D\x62\x6c\x75\x65\x5f\x6d\x61\x66\x69\x61\x31\x32\x33\x02\x00\x03\x4c\x94\xEA\x00\x0D\x45\x76\x65\x72\x74\x6f\x6e\x20\x42\x72\x69\x74\x6f\x02\x00\x03\xb3\x96\xEA\x00\x0D\x69\x48\x65\x61\x72\x74\x43\x6f\x6f\x6b\x69\x65\x73\x02\x00\x04\xbf\x25\xEA\x00\x0B\x6a\x61\x6b\x65\x2e\x2e\x2e\x77\x68\x61\x74\x02\x00\x05\x94\x09\xEA\x00\x07\x7e\x5a\x61\x70\x70\x79\x7e\x02\x00\x06\xa9\x97\xEA\x00\x08\x4c\x75\x63\x79\x4c\x75\x63\x79\x02"
Output = "\x78\xda\x63\xe0\x62\x60\x92\x7a\xf3\x8a\x81\x2b\x29\xb1\x38\x33\x27\x31\x25\xd5\xd0\x88\x89\x81\xe9\x19\xff\x2b\x06\x6e\x43\xc3\x8c\xfc\x92\x82\xcc\xbc\x6c\x43\x43\xa0\xd0\xbb\x96\x57\x0c\x1c\xc5\x15\x95\xc9\x89\x99\x25\x95\x40\xfe\xe7\xec\x57\x0c\x9c\x41\xf9\x49\x39\xf9\x15\x86\x06\x40\x05\xcc\xc2\x97\x5f\x31\xf0\x26\xe5\x94\xa6\xc6\xe7\x26\xa6\x65\x26\x1a\x1a\x19\x03\x05\x7d\xa6\x00\x05\x5d\xcb\x52\x8b\x4a\xf2\xf3\x14\x9c\x8a\x32\x4b\xf2\x81\x82\x9b\xa7\x01\x05\x33\x3d\x52\x13\x8b\x4a\x9c\xf3\xf3\xb3\x33\x53\x8b\x99\x18\x58\xf6\xab\x02\xad\xcc\x4a\xcc\x4e\xd5\xd3\xd3\x2b\xcf\x48\x2c\x61\x62\x60\x9d\xc2\xf9\x8a\x81\xbd\x2e\x2a\xb1\xa0\xa0\xb2\x8e\x89\x81\x6d\xe5\x74\xa0\x0b\x7c\x4a\x93\x2b\x41\x98\x09\x00\x28\x9c\x3b\x2f"
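That is the zlib format, using the deflate compression method, and it is very common. The first two bytes, 0x78 0xDA, are a zlib header, and the last four bytes of each sample are the Adler-32 checksum of the uncompressed input.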
https://www.rfc-editor.org/rfc/rfc1950
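To check the header by hand, here is a minimal Python sketch that decodes the two header bytes using the field layout in RFC 1950. The function name parse_zlib_header is illustrative and not part of any library.

```python
def parse_zlib_header(data: bytes) -> dict:
    """Decode the two-byte zlib header (CMF, FLG) laid out in RFC 1950."""
    if len(data) < 2:
        raise ValueError("a zlib stream needs at least two header bytes")
    cmf, flg = data[0], data[1]
    return {
        "method": cmf & 0x0F,                      # CM: 8 means deflate
        "window_size": 1 << ((cmf >> 4) + 8),      # CINFO: 7 means a 32 KiB window
        "preset_dict": bool(flg & 0x20),           # FDICT: preset dictionary present?
        "level": flg >> 6,                         # FLEVEL: 0 (fastest) to 3 (maximum compression)
        "check_ok": ((cmf << 8) | flg) % 31 == 0,  # FCHECK: CMF*256 + FLG must be a multiple of 31
    }


print(parse_zlib_header(b"\x78\xda"))
# {'method': 8, 'window_size': 32768, 'preset_dict': False, 'level': 3, 'check_ok': True}
```

All three samples start with 0x78 0xDA, so this reports deflate with a 32 KiB window and the maximum-compression flag for each of them.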
Related answer: What does a zlib header look like?
Edit: It could, of course, be something else, but zlib is the best place to start when trying to decompress it.
http://www.zlib.net
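A quick way to confirm the guess is to feed the samples to a zlib decoder. This Python sketch uses the standard-library zlib module on the first two samples from the question, and also checks that the last four bytes are the Adler-32 checksum of the original data, as RFC 1950 specifies.

```python
import zlib

# (original input, compressed output) pairs copied from the question
samples = [
    (b"\x00\x00",
     b"\x78\xda\x63\x60\x00\x00\x00\x02\x00\x01"),
    (b"\x00\x01\x00\x00\x00\x3c\xea\x00\x05hello\x03",
     b"\x78\xda\x63\x60\x64\x60\x60\xb0\x79\xc5\xc0\x9a\x91\x9a\x93\x93\xcf\x0c\x00\x13\x10\x03\x44"),
]

for original, compressed in samples:
    restored = zlib.decompress(compressed)            # raises zlib.error if it is not zlib data
    trailer = int.from_bytes(compressed[-4:], "big")  # Adler-32 is stored big-endian
    print(restored == original, trailer == zlib.adler32(original))  # prints: True True

# Level 9 yields the same 0x78 0xDA header; the deflate body can differ between zlib versions.
print(zlib.compress(b"\x00\x00", 9)[:2] == b"\x78\xda")  # prints: True
```

If decompression succeeds on the large third sample as well, you can compress new data the same way with zlib.compress(data, 9).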
I have a data set that I would like a program to analyse in order to work out the algorithm that produced it. I only have the data; the algorithm is unknown. There are two inputs, and it generates a single output. If possible, I would like to be able to add new inputs and generate their outputs.
(Input1, Input2, Actual Output)
(542, 575, 291)
(797, 424, 202)
(529, 564, 321)
(888, 484, 523)
(789, 095, 497)
I have more data I can provide if necessary. I don't really know how to start, because I cannot see a relation between the inputs and the output. It would be insanely helpful if someone could help find an algorithm, so that I can plug in more data and get the correct outputs.
Edit: What is regression analysis, and how would I use it? Also, the data set is from a challenge that I don't know how to approach.
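Regression analysis fits a chosen formula to data and measures how well it explains the outputs. As a minimal sketch, the Python below assumes the simplest model, output ≈ a·input1 + b·input2 + c, and fits a, b and c to the five rows above by ordinary least squares (solving the normal equations). The linear model and the function names are assumptions for illustration; a low R² would simply mean the hidden rule is not linear, which is quite possible for a challenge problem.

```python
# Rows copied from the question; 095 is written as 95 because Python rejects leading zeros.
DATA = [(542, 575, 291), (797, 424, 202), (529, 564, 321), (888, 484, 523), (789, 95, 497)]


def solve3(m, v):
    """Solve the 3x3 system m x = v by Gaussian elimination with partial pivoting."""
    a = [list(row) + [rhs] for row, rhs in zip(m, v)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_linear(rows):
    """Least-squares fit of out ~ a*in1 + b*in2 + c; returns [a, b, c]."""
    X = [(x1, x2, 1.0) for x1, x2, _ in rows]  # constant 1.0 column gives the intercept c
    y = [out for _, _, out in rows]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(3)]
    return solve3(xtx, xty)


def r_squared(rows, coef):
    """Fraction of the output variance explained by the fitted plane (1.0 = perfect fit)."""
    a, b, c = coef
    y = [out for _, _, out in rows]
    pred = [a * x1 + b * x2 + c for x1, x2, _ in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


coef = fit_linear(DATA)
print("a, b, c =", coef)
print("R^2 =", r_squared(DATA, coef))
```

To predict a new row, compute a*in1 + b*in2 + c with the fitted coefficients. With only five points and three fitted parameters there is little room to judge the fit, so the extra data you mention would help.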
Scenario: I have to generate a QR code that contains some customer information. It will be scanned on an Android phone.
The information has to be transferred through the following process:
----------------------Server side---------|| Image ||------Android-----------------------------
original data-->Encrypt--->> Compress---> ||Qr code|| --> Decompress-->>Decrypt-->original data
Everything works, but the text compression is not efficient.
Is there a more efficient way to do this?
Without having tried it, I would say that Run-Length Encoding (RLE) (http://en.wikipedia.org/wiki/Run-length_encoding) would be a nice candidate.
The main idea is that you replace each run of identical symbols with its length.
So, if you use the symbols 0 and 1 for the pixels (I'm not sure that is what they are called in a QR code), one line of the QR code might look like this:
000000000000111111011111111111111110000000001 would be compressed to this:
12,6,1,16,9,1
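A minimal Python sketch of that idea, assuming each row is given as a string of '0' and '1' characters and that, as in the example, the first count always refers to '0' (a zero-length run is emitted when a row starts with '1'). The function names are illustrative.

```python
from itertools import groupby


def rle_encode(bits):
    """Return alternating run lengths, starting with a run of '0's."""
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits.startswith("1"):
        runs.insert(0, 0)  # zero-length '0' run keeps the 0/1 alternation intact
    return runs


def rle_decode(runs):
    """Rebuild the bit string: even positions are '0' runs, odd positions are '1' runs."""
    return "".join(("0" if i % 2 == 0 else "1") * n for i, n in enumerate(runs))


row = "0" * 12 + "1" * 6 + "0" + "1" * 16 + "0" * 9 + "1"
print(rle_encode(row))                     # [12, 6, 1, 16, 9, 1]
print(rle_decode(rle_encode(row)) == row)  # True
```

RLE only pays off when runs are long; short, alternating runs can make the encoded list longer than the original row.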
I am a novice Go programmer, trying to learn Go's features. I want to split a large CSV file into multiple files in Go, with each file containing the header. How do I do this? I have searched everywhere but couldn't find the right solution. Any help in this regard will be greatly appreciated.
Also, please suggest a good reference book.
Thank you
Depending on your shell-fu, this problem might be better suited to common shell utilities, but you specifically mentioned Go.
Let's think through the problem.
How big is this CSV file? Are we talking 100 lines, or 5 GB?
If it's smallish, I typically use this (since Go 1.16, os.ReadFile does the same job):
http://golang.org/pkg/io/ioutil/#ReadFile
However, this package also exists:
http://golang.org/pkg/encoding/csv/
Regardless - let's return to the abstraction of the problem. You have a header (which is the first line) and then the rest of the document.
So what we probably want to do (if ignoring csv for the moment) is to read in our file.
Then we want to split the file body by all the newlines in it.
You can use this to do so:
http://golang.org/pkg/strings/#Split
You didn't mention it, but do you know how many files you want to split into, or would you rather split by line count or byte count? What's the actual limitation here?
Generally it's not going to be file count, but if we pretend it is, we simply divide our line count by the desired file count (rounding up) to get lines per file.
Now we can take slices of the appropriate size, put the header line back at the top of each one, and write each file out via:
http://golang.org/pkg/io/ioutil/#WriteFile
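Those steps (read the file, split off the header, divide the remaining lines evenly, and write each slice out with the header in front) can be sketched as follows. It is written in Python for brevity rather than Go, and, like the steps above, it splits on newlines, so it assumes no quoted field contains an embedded newline; for that case a real CSV parser such as Go's encoding/csv is needed. The split_csv name and the _partN naming scheme are hypothetical.

```python
import math
from pathlib import Path


def split_csv(path, n_files):
    """Split a CSV file into at most n_files pieces, each starting with the header line."""
    src = Path(path)
    lines = src.read_text().splitlines()  # whole file in memory: fine for "smallish" inputs
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    per_file = max(1, math.ceil(len(body) / n_files))  # lines per file, rounded up
    written = []
    for start in range(0, len(body), per_file):
        out = src.with_name(f"{src.stem}_part{len(written) + 1}.csv")
        out.write_text("\n".join([header] + body[start:start + per_file]) + "\n")
        written.append(out)
    return written


# Example: split_csv("big.csv", 4) writes up to four files named
# big_part1.csv, big_part2.csv, ... next to big.csv.
```

For multi-gigabyte files you would read line by line instead of loading everything, which in Go means a bufio.Scanner or csv.Reader.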
A trick I sometimes use to help me think through these things is to write down a mission statement:
"I want to split a large csv file into multiple files in go"
Then I start breaking that up into pieces, taking a divide-and-conquer approach: don't try to solve the entire problem in one go; just break it up until you can think about each piece.
Also, make gratuitous use of pseudocode until you can comfortably write the real code itself. Sometimes it helps to write a short inline comment describing how you think the code should flow, then get it down to the smallest portion that you can code, and work from there.
By the way, many of the golang.org package docs have examples that you can run right in your browser and then copy and paste into your own local environment.
Also, I know I'll catch some flak for this, but as for books: in my opinion, you are going to learn a lot faster by trying to get things working than by reading. Action always trumps passivity. Don't be afraid to fail.
Here is a package that might help. You can set the chunk size in bytes, and the file will be split into the appropriate number of chunks.
I have over 120 .txt files (named s1.txt, s2.txt, ..., s120.txt) that I need to convert to the .ascii extension to use in MATLAB.
My comma-delimited .txt files look like the following:
20080102,43.0300,3,9.493,569.567,34174.027,34174027
20080102,43.0600,3,9.498,569.897,34193.801,34193801
In MATLAB I wish to use code similar to the following:
for i = svec;
%# where svec = [1 2 13 15] some random number between 1 and 120.
eval(['load %mydirectory', eval(['s',int2str(i)]),'.ascii']);
end;
If I am not mistaken I can't use the above command with .txt files and therefore I must use ASCII files.
Since I have a lot of files to convert and they are large, is there a quick way to convert all of them in MATLAB, or is there good conversion software available for Mac? Would anyone have a better suggestion than the code above?
Adding to nrz's answer:
I'm not sure what you want to do exactly, but note that you can open any file in MATLAB, either as text (ASCII) or in binary mode. The latter can be done with fopen and fread.
As a side note, you also asked for a better suggestion for your code.
Well, what did you try to achieve with the two eval invocations? Why not call the commands directly? Do this instead:
for i = svec
    % filesep is '/' on Mac and Linux and '\' on Windows
    load(['%mydirectory', filesep, 's', int2str(i), '.txt'], '-ascii');
end
I also took the liberty of adding a path separator (filesep) that I think you had omitted.
In most cases, you'd be better off without using eval. Check the alternatives...
Can you show an example file? Not every text file is valid for the load command. If your file is not in a valid format, changing the filename extension from .txt to .ascii doesn't help at all. Instead, the data must either be converted to a format that load accepts or be loaded into MATLAB by some other means, e.g. with fscanf or xlsread. Either way, we need to know the file structure.
See also load command in matlab loading blank file.
A slightly cleaner way:
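data = cell(1, 120);  % one cell per file, so earlier results are not overwritten
for i = 1:120
    fname = fullfile('mydirectory', sprintf('s%d.txt', i));  % e.g. mydirectory/s1.txt on a Mac
    data{i} = load(fname, '-ascii');
end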
Is there a good way to see what format an image is, without having to read the entire file into memory?
Obviously this would vary from format to format (I'm particularly interested in TIFF files) but what sort of procedure would be useful to determine what kind of image format a file is without having to read through the entire file?
BONUS: What if the image is a Base64-encoded string? Any reliable way to infer it before decoding it?
Most image file formats have unique bytes at the start. The Unix file command looks at the start of a file to see what type of data it contains. See the Wikipedia article on magic numbers in files and magicdb.org.
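Checking those leading bytes only requires reading a handful of bytes from the file. Here is a minimal Python sketch with a small, deliberately incomplete signature table (PNG, JPEG, GIF, BMP and both TIFF byte orders); the names SIGNATURES, sniff_bytes and sniff_file are illustrative, and a tool like file uses a far larger database.

```python
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),  # little-endian ("Intel") TIFF
    (b"MM\x00*", "tiff"),  # big-endian ("Motorola") TIFF
]


def sniff_bytes(head):
    """Return a format name if head starts with a known signature, else None."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def sniff_file(path, n=16):
    """Read only the first n bytes of the file, never the whole image."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(n))
```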
Sure there is. As others have mentioned, most images start with some sort of 'magic', which always translates to a predictable Base64 prefix. Here are a couple of examples:
A bitmap will start with Qk (the third character depends on the file-size byte that follows BM, so Qk3 is not guaranteed)
A JPEG will start with /9j/
A GIF will start with R0l (that's a zero as the second character).
And so on. It's not hard to take the different image types and figure out what they encode to. Just be careful, as some have more than one piece of magic, so you need to account for them in your B64 'translation code'.
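The translation can be computed rather than memorized. Base64 turns every 6 bits into one character, so an n-byte magic number fully determines only the first floor(8n/6) characters, and only those should be compared; that is why "BM" pins down just Qk while the three JPEG bytes pin down /9j/. A minimal Python sketch with a small illustrative signature table (the names MAGIC, b64_prefix and sniff_base64 are hypothetical):

```python
import base64

# Small, illustrative subset of image magic numbers
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF8", "gif"),
    (b"BM", "bmp"),
    (b"II*\x00", "tiff"),
    (b"MM\x00*", "tiff"),
]


def b64_prefix(magic):
    """Base64 characters that depend only on the magic bytes themselves."""
    determined = len(magic) * 8 // 6                # characters built entirely from known bits
    padded = base64.b64encode(magic + b"\x00\x00")  # filler bytes only affect later characters
    return padded[:determined].decode("ascii")


PREFIXES = [(b64_prefix(m), name) for m, name in MAGIC]


def sniff_base64(text):
    """Guess the image type of a Base64 string without decoding it."""
    for prefix, name in PREFIXES:
        if text.startswith(prefix):
            return name
    return None


print(b64_prefix(b"BM"), b64_prefix(b"\xff\xd8\xff"), b64_prefix(b"GIF8"))  # Qk /9j/ R0lGO
```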
Use either file on the *nix command line, or read the initial bytes of the file yourself. Most formats have a unique header in the first few bytes. For example, a little-endian TIFF header looks something like this in a hex dump: 0x00000000: 4949 2a00 0800 0000
If you'd like to know what those bytes stand for, go here for more information on the TIFF file format specifically.
TIFFs will begin with either II or MM (Intel or Motorola byte ordering).
The TIFF 6 specification can be downloaded here and isn't too hard to follow
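Beyond the II/MM byte-order mark, the TIFF 6.0 header is only 8 bytes: the byte order, the number 42, and a 4-byte offset to the first image file directory (IFD). A minimal Python sketch using the standard struct module; read_tiff_header is an illustrative name, and it deliberately rejects BigTIFF, which uses 43 and a larger header.

```python
import struct


def read_tiff_header(head):
    """Parse the 8-byte TIFF header; return (byte_order, first_ifd_offset)."""
    if head[:2] == b"II":
        fmt, order = "<HI", "little"  # Intel: least significant byte first
    elif head[:2] == b"MM":
        fmt, order = ">HI", "big"     # Motorola: most significant byte first
    else:
        raise ValueError("not a TIFF file: no II/MM byte-order mark")
    magic, ifd_offset = struct.unpack(fmt, head[2:8])
    if magic != 42:
        raise ValueError(f"unexpected TIFF version number {magic}")
    return order, ifd_offset


print(read_tiff_header(bytes.fromhex("49492a0008000000")))  # ('little', 8)
```

To use it on a file, pass it the first 8 bytes, e.g. open(path, "rb").read(8).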