What are some strategies for opening/reading a .DAT file when it was created by an arbitrary program of which you are unaware?
Use a hex viewer for binary files. Pick up a free one from SourceForge.
Text editors are a bad idea, because they will try to interpret the bytes through some character encoding.
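If you'd rather poke at the bytes yourself, a hex dump is only a few lines of code. A minimal sketch in C++ (the default file name is just a placeholder):

```cpp
#include <cstdio>

// Dump the first 256 bytes of a file as hex plus printable ASCII.
int main(int argc, char* argv[]) {
    const char* path = (argc > 1) ? argv[1] : "unknown.dat";  // placeholder
    FILE* f = std::fopen(path, "rb");
    if (!f) { std::perror(path); return 1; }

    unsigned char buf[16];
    size_t offset = 0, n;
    while (offset < 256 && (n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        std::printf("%08zx  ", offset);
        for (size_t i = 0; i < sizeof buf; ++i) {
            if (i < n) std::printf("%02x ", buf[i]);
            else       std::printf("   ");
        }
        std::printf(" ");
        for (size_t i = 0; i < n; ++i)
            std::putchar((buf[i] >= 32 && buf[i] < 127) ? buf[i] : '.');
        std::putchar('\n');
        offset += n;
    }
    std::fclose(f);
    return 0;
}
```

The first dozen bytes often contain a recognizable magic number you can then search for.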
Are there any rules for file extensions? For example, I wrote some code which reads and writes a byte pattern that is only understood by that specific program. I'm assuming my antivirus program won't be too happy if I give it the name "pleasetrustme.exe"... Is it generally allowed to use those extensions? And what about the lesser-known ones, like ".arw"?
You can use any file extension you want (or none at all). Using standard extensions that reflect the actual type of the file just makes things more convenient. On Windows, file extensions control things like how files are displayed in Windows Explorer and what happens when you double-click them.
I wrote some code which reads and writes a byte pattern that is only understood by that specific program.
A file extension is only an indication of what type of data should be inside; it is never a guarantee that the file actually contains data formatted in that specific way.
For your own specific data structure it is of course best to choose an extension that is not already in use for other file formats (or to use a general extension like .dat or .bin). This also has the advantage that you can use your own icon without it being overwritten by other software using the same extension - or the other way around.
Maybe even more important when creating a custom (binary) file format is to provide a magic number as the first bytes of the file, perhaps followed by a file header containing a version number etc. That way your software can first check the header data to make sure it's the right type and version; anyone could rename any file to your extension, so your program needs a way to run some checks inside the file before reading the remaining data.
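For illustration, a minimal sketch in C++ of such a header check. The magic value "MYFT", the struct layout, and the version number are all made up for this example:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical header for a custom format: 4 magic bytes plus a version.
// A real reader should also pin down endianness and struct padding.
struct FileHeader {
    char     magic[4];   // e.g. "MYFT" - made-up identifier
    uint32_t version;    // bump this when the layout changes
};

bool check_header(const char* path) {
    FileHeader hdr;
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    bool ok = std::fread(&hdr, sizeof hdr, 1, f) == 1
           && std::memcmp(hdr.magic, "MYFT", 4) == 0
           && hdr.version == 1;  // the only version this reader understands
    std::fclose(f);
    return ok;
}
```

If the magic bytes or version don't match, refuse the file instead of trying to parse whatever a user happened to rename to your extension.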
I have a folder of image files which have been compressed into .dat files. Since the .dat files are extremely huge (they are microscopic images of an organ), I don't really know what kind of tools I can use to convert them into JPEG files. So the best case would be that the whole image is split up into pieces, and I can get all the pieces of the image.
The ".dat" file suffix is used broadly, so you'll need to specify more details on what format/source software created the original data. As a guess, from a quick search for ".dat" format microscopy, these tools look like they might be applicable to your domain:
http://gwyddion.net/
or
http://www.openmicroscopy.org/site/products/bio-formats
If you can't find a library for the format/languages you are using, then you'll need to find documentation of the file format and write a converter (at least the reading portion of the converter - you can use something like libjpeg to handle the writing portion).
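If you do end up writing the converter yourself, the JPEG-writing half is the easy part. A rough sketch using libjpeg, assuming you've already decoded the proprietary data into an 8-bit RGB buffer:

```cpp
#include <stdio.h>
#include <jpeglib.h>

// Write a width*height 8-bit RGB buffer (3 bytes per pixel) as a JPEG.
void write_jpeg(const char* path, unsigned char* rgb, int width, int height) {
    jpeg_compress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_compress(&cinfo);

    FILE* f = fopen(path, "wb");
    if (!f) return;
    jpeg_stdio_dest(&cinfo, f);

    cinfo.image_width      = width;
    cinfo.image_height     = height;
    cinfo.input_components = 3;          // RGB
    cinfo.in_color_space   = JCS_RGB;
    jpeg_set_defaults(&cinfo);
    jpeg_set_quality(&cinfo, 90, TRUE);  // quality 90 is an arbitrary choice

    jpeg_start_compress(&cinfo, TRUE);
    while (cinfo.next_scanline < cinfo.image_height) {
        JSAMPROW row = rgb + cinfo.next_scanline * width * 3;
        jpeg_write_scanlines(&cinfo, &row, 1);
    }
    jpeg_finish_compress(&cinfo);
    jpeg_destroy_compress(&cinfo);
    fclose(f);
}
```

The hard part remains the reading side, which depends entirely on the documented (or reverse-engineered) layout of your .dat files.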
We're creating an app that is going to generate some text files on *nix systems with hashed filenames to avoid too-long filenames.
However, it would be nice to tag the files with some metadata that gives a better clue as to what their content is.
Hence my question. Does anyone have any experience with creating files with custom metadata in Ruby?
I've done some searching and there seem to be some (very old) gems that read metadata:
https://github.com/kig/metadata
http://oai.rubyforge.org/
I also found a question about reading, writing, and creating custom file metadata or extended attributes, which seems to suggest that what I need may be at the system level, but dropping down there feels dirty and scary.
Anyone know of libraries that could achieve this? How would one create custom metadata for files generated by Ruby?
A very old but interesting question with no answers!
In order for a file to contain metadata, it has to have a format that has some way (implicitly or explicitly) to describe where and how the metadata is stored.
This can be done by the format, such as having a header that says where the "main" data is stored and where the "metadata" is stored, or perhaps implicitly, such as having a length to the "main" data, and storing metadata as anything beyond the "main" data.
This can also be done by the OS/filesystem by storing information along with the files, such as permission info, modtime, user, and more comprehensive file information like "icon" as you would find with iOS/Windows.
(Note that I am using "quotes" around "main" and "metadata" because the reality is that it's all data, and needs to be stored in some way that tools can retrieve it)
A true text file does not contain any headers or any such file format, and is essentially just a continuous block of characters (disregarding how the OS may store it). This also means that it can be generally opened by any text editor, which will merely read and display all the characters it finds.
So the answer in some sense is that you can't, at least not on a true text file that is truly portable to multiple OS.
A few thoughts on how to get around this:
Put binary data at the end of the text file, hoping (or requiring) that the user's text editor will ignore anything non-ASCII.
Store it in the OS metadata for the file and make it OS-specific (such as storing it in the "comments" section that an OS may keep for a file).
Store it in a separate file that goes "along with" the file (i.e., file.txt and file.meta; see the sketch after this list) and hope that they keep the files together.
Store it in a separate file and zip the text and the meta file together and have your tool be zip aware.
Come up with a new file format that is not just text but has a text section (though then it can no longer be edited with a text editor).
Store the metadata at the end of the text file in a text format, with comments or some indicator to leave the metadata alone. This is similar to the technique the vi/vim text editor uses to embed vim commands (modelines) in a file; it just puts them as comments at the beginning or end of the file.
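As a concrete example of the sidecar approach (the third option above), here is a minimal sketch in C++; the same idea is a couple of lines in Ruby or any other language. The file naming and the key=value layout are arbitrary choices:

```cpp
#include <fstream>
#include <string>

// Write the real content plus a "file.meta"-style sidecar next to it.
void write_with_sidecar(const std::string& path,
                        const std::string& content,
                        const std::string& description) {
    std::ofstream(path) << content;
    std::ofstream(path + ".meta") << "description=" << description << "\n"
                                  << "format=plain-text\n";
}
```

The obvious weakness is the one noted above: nothing forces anyone to keep the two files together.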
I'm not sure there are many other ways to accomplish what you want, but perhaps one of those will work.
I have been reading about the issue with trying to figure out the actual encoding of a file and all its complications.
But I just need to know what the encoding of a file was set to when it was saved. Does Windows store this information somewhere, similar to file type, date modified, etc.?
That's not available. The Windows file system (NTFS) doesn't store any metadata for a file beyond the trivial stuff like name, extension, last written date, etcetera. Nothing that's specific to the file type.
All you have available is the BOM, the bytes at the beginning of the file that indicate the UTF encoding and byte order. It only exists for files stored in a UTF encoding and, unfortunately, is optional. The real troublemakers, however, are text files that were encoded with a particular 8-bit non-Unicode code page, usually created by a legacy application. There's nothing you can do about those but hope the file wasn't created too far away from your machine, so that the default system code page is a match.
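Sniffing the BOM yourself takes only a few lines. A minimal sketch in C++:

```cpp
#include <cstdio>

// Return a description of the BOM at the start of a file, if any.
// (A UTF-32 LE BOM also starts with FF FE; ignored here for brevity.)
const char* detect_bom(const char* path) {
    unsigned char b[3] = {0, 0, 0};
    FILE* f = std::fopen(path, "rb");
    if (!f) return "unreadable";
    size_t n = std::fread(b, 1, 3, f);
    std::fclose(f);
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16 LE";
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16 BE";
    return "no BOM";
}
```

No BOM means you are back to guessing, typically by assuming the system default code page.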
No operating system stores encoding information for a file. The encoding is a property of the text content only. Since some text files do not have a .txt extension, and some .txt files are not really text files, associating an encoding with a file would not make much sense.
Some UTF-8 files store a byte order mark (BOM) at the beginning of the file, which can be used to check whether the file is UTF-8 or not. However, the BOM is not always present; a UTF-8 file does not need one. So the only way to determine the encoding of a text file is to open it with different encodings until you can read it correctly.
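On Windows, one practical form of that trial-and-error is to test whether the bytes are well-formed UTF-8 before falling back to the system code page. MultiByteToWideChar with MB_ERR_INVALID_CHARS does the validation (a sketch):

```cpp
#include <windows.h>

// Returns true if the buffer is well-formed UTF-8.
// MB_ERR_INVALID_CHARS makes the conversion fail on any invalid sequence.
// Note that pure ASCII also passes, which is usually what you want.
bool looks_like_utf8(const char* data, int size) {
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                data, size, nullptr, 0);
    return n > 0 || size == 0;
}
```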
Using WritePrivateProfileString and GetPrivateProfileString results in ??? instead of the real characters.
GetPrivateProfileString() and WritePrivateProfileString() will work with Unicode, sort of.
If the ini file is UTF-16LE encoded, i.e. it has a UTF-16 BOM, then the functions will work in Unicode. However if the functions have to create the file they will create an ANSI file and only work in ANSI.
So to use the functions with Unicode, create your ini file before you first use it and write a UTF-16LE Byte Order Mark to it. Then carry on as normal.
Note that the functions do not work at all with UTF-8.
See Michael Kaplan's blog for more detail than you ever wanted to know about this.
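Put together, the workaround looks roughly like this (the path is a placeholder; note that the profile functions want a full path):

```cpp
#include <windows.h>
#include <stdio.h>

int main() {
    const wchar_t* path = L"C:\\temp\\settings.ini";  // placeholder path

    // If the file doesn't exist yet, create it with a UTF-16LE BOM so the
    // profile functions operate in Unicode instead of ANSI.
    FILE* f = nullptr;
    if (_wfopen_s(&f, path, L"rb") != 0) {
        _wfopen_s(&f, path, L"wb");
        if (f) fwrite("\xFF\xFE", 1, 2, f);
    }
    if (f) fclose(f);

    WritePrivateProfileStringW(L"Section", L"Key", L"caf\u00e9", path);

    wchar_t buf[256];
    GetPrivateProfileStringW(L"Section", L"Key", L"", buf, 256, path);
    return 0;
}
```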
The WritePrivateProfileStringW function will write the INI file in legacy system encoding (e.g. Shift-JIS on a Japanese system) because it is a legacy support function. If you want to have a fully Unicode-enabled INI file, you will need to use an external library.
Try SimpleIni http://code.jellycan.com/simpleini/
It is a C++, single-header-file template library with an MIT licence (i.e. commercial use is OK). Include it in your source file and use it. It is cross-platform, supports UTF-8 and legacy encoded files, and can read and write INI files while largely preserving comments and structure, etc. It's easiest to check out the page.
It's been around for a while and appears to be used by quite a number of people. I wrote it and continue to support it.
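Typical usage looks like this (a minimal sketch; see the project page for the full examples):

```cpp
#include <cstdio>
#include "SimpleIni.h"

int main() {
    CSimpleIniA ini;
    ini.SetUnicode();  // treat the file content as UTF-8

    if (ini.LoadFile("config.ini") < 0) return 1;

    const char* value = ini.GetValue("section", "key", "default");
    std::printf("%s\n", value);

    ini.SetValue("section", "key", "new value");
    ini.SaveFile("config.ini");
    return 0;
}
```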
According to the WritePrivateProfileString documentation, there is a Unicode version: WritePrivateProfileStringW. Use that, and you should be able to use Unicode characters.
It might just be a problem with how you are displaying or handling the strings.
For example, the normal console window can't display Japanese strings with printf.
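For instance, to get Japanese text onto the console at all, you typically need to switch the stream to a Unicode mode and use wide output rather than plain printf (a sketch; the result also depends on the console font):

```cpp
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

int main() {
    // Put stdout into UTF-16 mode so wide output isn't squeezed
    // through the ANSI code page (where it would become "???").
    _setmode(_fileno(stdout), _O_U16TEXT);
    wprintf(L"\u65e5\u672c\u8a9e\n");  // "Japanese" written in Japanese
    return 0;
}
```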
Can you post some of your code?