Are VTK files endian independent when read in visualization software such as ParaView? - endianness

I am working on a file whose endianness is different from my desktop's, and I thought I would need to convert it, but when I visualized the VTK file it worked. So are VTK readers endian independent?

ParaView can read VTK files written using either big- or little-endian byte ordering and will try to work out which format has been used when writing a given file. From the VTK File format documentation:
Binary Files.
Binary files in VTK are portable across different computer systems as long as you observe two conditions.
First, make sure that the byte ordering of the data is correct, and second, make sure that the length of each data type is consistent. Most of the time VTK manages the byte ordering of binary files for you. When you write a binary file on one computer and read it in from another computer, the bytes representing the data will be automatically swapped as necessary. For example, binary files written on a Sun are stored in big endian order, while those on a PC are stored in little endian order. As a result, files written on a Sun workstation require byte swapping when read on a PC. (See the class vtkByteSwap for implementation details.) The VTK data files described here are written in big endian form.
Note that if you are using the newer XML VTK file formats, you can specify the byte ordering explicitly in the file; see, for example, the VTK wiki for more information. Basically, the VTKFile element has the form
<VTKFile type="..." version="version" byte_order="byte-order" ...>
...rest of file...
</VTKFile>
where the byte_order attribute:
may be either "LittleEndian" or "BigEndian" and indicates the byte order used for any binary data in the file.
It should be emphasized that this is only an issue for binary data. Legacy ASCII VTK files do not care about byte ordering.
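To make the byte-ordering point concrete, here is a minimal sketch (in Go, with a hypothetical file name) of what decoding the big-endian binary payload of a legacy VTK file amounts to; the VTK readers perform this swapping for you, so this only illustrates the mechanics:

package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"os"
)

func main() {
	// Hypothetical: the raw binary section of a legacy VTK file,
	// extracted to its own file and assumed to hold only float32 values.
	data, err := os.ReadFile("points.bin")
	if err != nil {
		panic(err)
	}
	for i := 0; i+4 <= len(data); i += 4 {
		// Legacy VTK binary data is big-endian; decode it as such,
		// whatever the host's native byte order happens to be.
		bits := binary.BigEndian.Uint32(data[i : i+4])
		fmt.Println(math.Float32frombits(bits))
	}
}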

Related

Hex - Search by bytes to get Offset & Search by Offset to get Bytes

I'm currently prototyping a small piece of software and am stuck. I'm trying to create a little program that'll edit a .bin file, and for this I will need to do the following:
Get Bytes by Searching for Offset
Get Offset by searching for Bytes
Write/Update .bin file
I usually use the program HxD to do this manually, but want to get a small automated process in place.
Using hex.EncodeToString returns what I want as the output (like HxD), however I can't find a way to search for the values by bytes and offsets.
Could anyone help or have suggestions?
OK, "searching of an offset" is a misnomer because if you have an offset and a medium which supports random access, you just "seek" the known offset there; for files, see os.File.Seek.
Searching is more complex: it consists of converting the user input into something searchable and, well, the searching itself.
Conversion is the process of translating the human operator's input into a slice of bytes: for instance, you'd need to convert the string "00 87" to the slice []byte{0x00, 0x87} (note 0x87, not decimal 87).
Such a conversion can be done using, say, encoding/hex.Decode after removing any whitespace, which itself can be done in a multitude of ways.
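As a concrete illustration, a minimal Go sketch of that conversion step (the input string is just an example):

package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

func main() {
	input := "00 87" // what the operator typed
	// Remove all whitespace, then decode the remaining hex digits.
	cleaned := strings.Join(strings.Fields(input), "")
	pattern, err := hex.DecodeString(cleaned)
	if err != nil {
		panic(err) // the input was not valid hex
	}
	fmt.Println(pattern) // [0 135], i.e. []byte{0x00, 0x87}
}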
Searching the file given a slice of bytes can be either simple or complex.
If a file is small (a couple megabytes, on today's hardware), you can just slurp it into memory (for instance, using io.ReadAll) and do a simple search using bytes.Index.
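For that small-file case the whole search fits in a few lines; a sketch with placeholder names:

package main

import (
	"bytes"
	"fmt"
	"os"
)

func main() {
	data, err := os.ReadFile("file.bin") // hypothetical input file
	if err != nil {
		panic(err)
	}
	pattern := []byte{0x00, 0x87}        // the decoded search pattern
	offset := bytes.Index(data, pattern) // -1 if not found
	fmt.Println("offset:", offset)
}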
If a file is big, the complexity of the task quickly escalates.
For instance, you could read the file from its beginning to its end using chunks of some sensible size and search for your byte slice in each of them.
But you'd need to watch out for two issues: the slice to search for should be smaller than each of the chunks, and two adjacent chunks might contain the sought sequence positioned right across their boundary, so that the Nth chunk contains the first part of the pattern at its end and the (N+1)th chunk contains the rest of it at its beginning (the sketch below handles this).
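Here is one way that chunked search could look in Go; a sketch rather than hardened code, which carries the last len(pattern)-1 bytes of each chunk over so a match straddling two chunks is still found:

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// findInFile returns the offset of the first occurrence of pattern in f,
// or -1 if it is absent. It reads fixed-size chunks and keeps the trailing
// len(pattern)-1 bytes between iterations to catch matches that sit
// across a chunk boundary.
func findInFile(f *os.File, pattern []byte, chunkSize int) (int64, error) {
	if chunkSize < len(pattern) {
		chunkSize = len(pattern) // a chunk must not be smaller than the pattern
	}
	buf := make([]byte, 0, chunkSize+len(pattern)-1)
	chunk := make([]byte, chunkSize)
	var base int64 // file offset corresponding to buf[0]
	for {
		n, err := f.Read(chunk)
		if n > 0 {
			buf = append(buf, chunk[:n]...)
			if i := bytes.Index(buf, pattern); i >= 0 {
				return base + int64(i), nil
			}
			// Keep only the trailing overlap for the next round.
			if keep := len(pattern) - 1; len(buf) > keep {
				base += int64(len(buf) - keep)
				buf = append(buf[:0], buf[len(buf)-keep:]...)
			}
		}
		if err == io.EOF {
			return -1, nil
		}
		if err != nil {
			return 0, err
		}
	}
}

func main() {
	f, err := os.Open("big.bin") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()
	offset, err := findInFile(f, []byte{0x00, 0x87}, 1<<20) // 1 MiB chunks
	if err != nil {
		panic(err)
	}
	fmt.Println("offset:", offset)
}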
There exist more advanced approaches to such searching — for instance, using so-called "memory-mapped files" but I'd speculate it's a bit too early to tread these lands, given your question.

How can I find the encryption/encoding method used on a file given its original version

I have two image files. One is a regular picture [here], while the other is its modified counterpart retrieved from a remote server [here]. I have no idea what the server did to this image or to the rest of the images stored on the server, but they have obviously been modified in some way, because the second file cannot be read. They are both the exact same picture, even the same byte count, but I can't figure out how to reverse whatever was done. What should I be trying?
Note: The modified file was retrieved from a packet capture as an octet stream. I wrote the raw binary to a file and then base64 decoded it.
Unlike the original JPEG file, the encrypted data is very "random": every byte value from 0 to 255 appears with almost exactly the same probability. This rules out a transposition cipher, which would merely reorder the bytes and therefore preserve the original file's byte-value histogram.
Also, the files are exactly the same length (3,018,705 bytes), which makes it unlikely that a block cipher (like DES) was used: block ciphers normally pad the plaintext up to a multiple of the block size, which would change the length.
So that makes a stream cipher (like RC4) the most likely candidate. If this is the case, you can obtain the keystream simply by XORing each byte of the two files together. However, you might find it difficult to figure out the cryptographic key from this data. Good luck with that :-)
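If the stream-cipher guess is right, recovering the keystream really is just an XOR over the two files; a sketch with placeholder file names:

package main

import (
	"fmt"
	"os"
)

func main() {
	plain, err := os.ReadFile("original.jpg") // hypothetical names
	if err != nil {
		panic(err)
	}
	enc, err := os.ReadFile("modified.bin")
	if err != nil {
		panic(err)
	}
	if len(plain) != len(enc) {
		panic("the files should be the same length")
	}
	// plaintext XOR ciphertext == keystream, byte for byte.
	keystream := make([]byte, len(plain))
	for i := range plain {
		keystream[i] = plain[i] ^ enc[i]
	}
	fmt.Printf("first keystream bytes: % x\n", keystream[:16])
}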

How to first check files on equality before doing a byte by byte comparison?

I am writing a program that compares a lot of files.
I first group the files by file size, then check them byte by byte within each group. What parameters or properties can I check before the byte-by-byte comparison to minimize its use?
Upd:
To get a checksum I need to read the entire file. I am looking for some property that can filter out unequal files. I forgot to say that I need the files to be 100% equal. Hash functions have collisions.
If the files are recorded as being the same size by the operating system then there is no way to know if they are different other than checking bytes.
For a group of files, once two files are known to be the same, the comparison only needs to be done against one of the two. It would be wise to sort the files in a group by date for this reason, on the theory that files with similar dates are more likely to be identical. Thus, you should maintain lists of identical files; when a new comparison is done, the new file need only be compared to the head of each list.
You should allocate as much memory as possible up front and keep the list heads in memory.
When the comparison is being done you should not actually compare bytes, but words. For example, on a 32-bit machine you would read data from the disk in 512-byte blocks and then compare each block 4 bytes at a time. x86 processors also have SIMD (vector) instructions, from the old MMX up to the newer SSE and AVX families, and you want to be sure you are using those.
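As an aside, in a higher-level language the word-wise (and, on modern CPUs, SIMD-accelerated) comparison usually comes for free from the standard library. A Go sketch of the block-by-block equality check, assuming the two files were already grouped by equal size:

package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// sameContent reports whether two equal-sized files hold identical bytes.
// bytes.Equal compares whole words (with SIMD where available) internally,
// so there is no need to hand-roll a 4-bytes-at-a-time loop.
func sameContent(pathA, pathB string) (bool, error) {
	a, err := os.Open(pathA)
	if err != nil {
		return false, err
	}
	defer a.Close()
	b, err := os.Open(pathB)
	if err != nil {
		return false, err
	}
	defer b.Close()
	bufA := make([]byte, 64*1024)
	bufB := make([]byte, 64*1024)
	for {
		nA, errA := io.ReadFull(a, bufA)
		nB, _ := io.ReadFull(b, bufB) // equal sizes keep the reads in lockstep
		if nA != nB || !bytes.Equal(bufA[:nA], bufB[:nB]) {
			return false, nil
		}
		if errA == io.EOF || errA == io.ErrUnexpectedEOF {
			return true, nil // reached the end with every block equal
		}
		if errA != nil {
			return false, errA
		}
	}
}

func main() {
	eq, err := sameContent("a.bin", "b.bin") // hypothetical paths
	if err != nil {
		panic(err)
	}
	fmt.Println("identical:", eq)
}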
If you are writing in C for an Intel box, use Intel's compiler, not Microsoft's. Double check the assembly to make sure the compiler is not doing something stupid.
You can also increase the speed of the work by parallelizing it. This is done by creating threads. For example, if the code is running on a quad core machine you create 4 threads and divide the work among the 4 threads.
Check the file's checksum. It was meant for this task.
For Python you can use hashlib. For C you can use, for example, MD5 from OpenSSL. There are similar functions for PHP, MySQL, and probably every other programming language.
Alternatively, you can use the md5sum tool built into Linux.
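For completeness, the same checksum idea sketched in Go (crypto/md5 plus io.Copy hashes the file in a streaming fashion, without loading it whole; the path is a placeholder):

package main

import (
	"crypto/md5"
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("file.bin") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()
	h := md5.New()
	if _, err := io.Copy(h, f); err != nil { // stream the file through the hash
		panic(err)
	}
	fmt.Printf("%x\n", h.Sum(nil))
}

Since, as the questioner notes, hashes can collide, a matching checksum still warrants the byte-by-byte comparison; a mismatching checksum, however, proves inequality immediately.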

endian.h for floats

I am trying to read an unformatted binary file that was written by a big-endian machine. My machine is a 32-bit little-endian one.
I already know how to swap bytes for the different variable types, but it is cumbersome work. I found this set of functions, endian.h, which handles integer swapping very easily.
I was wondering if there is something similar for floats or strings, or if I have to program it from scratch, since they might be handled differently from integers with respect to endianness.
Thanks.
I do not think there is a standard header for swapping floats. You could take a look at http://www.gamedev.net/page/resources/_/technical/game-programming/writing-endian-independent-code-in-c-r2091 which provides some helpful code.
As for strings, there is no need for endian swapping. Endianness determines the order of the bytes within a single variable; a string consists of a series of chars, and each char is only one byte, so there is nothing to swap.
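The usual trick for floats is to reinterpret the float's bytes as a same-width integer, swap that integer, and reinterpret back. Sketched here in Go for brevity; the same idea carries over to C by combining memcpy (or a union) with the endian.h integer swaps:

package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// float32FromBigEndian decodes 4 big-endian bytes into a float32,
// regardless of the host byte order. For float64, use Uint64 and
// math.Float64frombits in exactly the same way.
func float32FromBigEndian(b []byte) float32 {
	bits := binary.BigEndian.Uint32(b) // the integer byte swap
	return math.Float32frombits(bits)  // reinterpret the bits as a float
}

func main() {
	raw := []byte{0x3f, 0x80, 0x00, 0x00}  // big-endian IEEE-754 for 1.0
	fmt.Println(float32FromBigEndian(raw)) // prints 1
}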

Viewing the contents of a file in decimal number format using vi

I am saving audio input to a file called sound.raw using the ALSA API. I think the sound amplitudes are being saved (it is a guess; I am not sure). The format I use is signed 16-bit little-endian (S16_LE). Now, if the amplitudes are being saved, how do I see them in decimal number format? As of now I only see a collection of #s and ^s and various other symbols that don't make sense when I open sound.raw with vi.
What you are seeing is the binary representation of the sound data as interpreted by vi (probably as ASCII). However, it is not meant to be human-readable, or a lot of storage would be wasted.
See Using vi as a hex editor for a way to show the data in hexadecimal format, which is the closest you're going to get to an answer to your question without (writing your own) specific software for displaying ALSA-formatted sound data in a human-readable fashion.
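If the goal is specifically to see the amplitudes as decimal numbers, a tiny program may be more direct than vi. A sketch in Go, assuming the file really does contain raw S16_LE samples (on a little-endian machine, od -An -t d2 sound.raw prints much the same thing):

package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

func main() {
	data, err := os.ReadFile("sound.raw")
	if err != nil {
		panic(err)
	}
	// S16_LE: each sample is a signed 16-bit little-endian integer.
	for i := 0; i+2 <= len(data); i += 2 {
		sample := int16(binary.LittleEndian.Uint16(data[i : i+2]))
		fmt.Println(sample) // one amplitude per line, in decimal
	}
}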
