"descript.ion" file spec? - windows

There appears to be a somewhat standard "descript.ion" file in Windows programs universe which provides meta data for all/some of the files in a given directory.
I know there are various programs which write this file (example: NewsBin, UseNet downloader) and read it (Example: "FAR", a file manager mimicking old Norton Commander).
I'm writing my own file indexer, and would like to add the ability to parse and use the info from "descript.ion" files.
The problem I have is that I have not been able to find an actual spec for the file, despine much googling.
I reverse engineered it as best I could, but I'm not certain whether I captured 100% of the possible details, so I figured I'd ask SO.
Here are example lines from the file:
"Rus Song1.mp3" SovietMus 1/2, rus_song#gmail.com, Fri Aug 08 00:46:27 2008
RusSong2.mp3 SovietMus 2/2, rus_song#gmail.com, Fri Aug 08 01:46:22 2008
As it seems the structure is:
First "token" is a file name.
If the token starts with any letter but double quote, the token ends at the first space character.
If the token starts with the double quote, the end of token is the following double quote
Not sure what happens if filename contains a double quote, IIRC it's illegal in Windows filesystems, so escaping the quote may be a moot question)
Last token (end of line to the very last comma moving backwards) is a timestamp.
Second to last token (the very last comma to second-to-last comma moving backwards) is the name of the poster from the Usenet newsgroup. I'm not quite sure what happens in generic format since the only descript.ion files I saw were from NewsBin that is obviously Usenet centric.
Everything in between is a description, in NewsBin's case coming from post's subject.
QUESTIONs:
Does anyone know of a bit more official "descript.ion" file spec/documentation?
(or, at elast, have your own knowledge of those files and can verify my spec)
Does anyone know of any other programs that read or write this file?
Thanks!

The description files on my system are from Total Commander as well. They follow the basic spec mentioned in the other answers:
Filename Text I typed to describe the file
"Long filename" Some text
Each line ends in a normal Windows line break.
In addition, the program stores multi-line comments as follows:
Filename This is the first line\\nSecond line\\nLast line\x04\xc2
Here, I mean that the descript.ion file contains a backslash and a letter 'n' where I typed a line break, and two special characters 04 C2 at the end of the comment. In addition, the line is ended by a Windows line break 0D 0A.
Apparently, the two extra characters at the end of the line signal the end of a multiline comment. If I remove them, the comment is rendered as a single line in the GUI, and the '\n' sequences are displayed literally.

The original usage of DESCRIPT.ION was to provide longer more descriptive names to 8.3 filenames; all it had was the shortname and a longer description. As you've found, others have co-opted the name with varying formats and usages. Frankly speaking, I don't think you'll find any specific commonality among the various usages.

Format is simple: FileName remainder of the line is a description of the file
https://jpsoft.com/ascii/descfile.txt
(Wayback Machine)

The descript.ion file is extensively used in the file management utility "total commander", a shareware found in www.ghisler.com. From version 7.5 of TC, it can have length of 4096 bytes. I have been using it extensively to annotate my files without any issues. You may look up different user's experience at the total commander users forum.

the answer above looks correct for me, just a addition:
from http://filext.com/file-extension/ION
The ION file type is primarily associated with '4DOS'. Note: Norton Utilities also uses 4DOS.
http://www.optimasc.com/products/fileid/4dos-descext.pdf
Collected links to 4DOS description-aware programs of all kind and 4DOS tools.
http://www.4dos.info/4tools.htm
http://drupal.org/node/289988

Related

unix filename pattern in sftp command prompt

Please help, I am new to unix,
I want to read file which are available before 130000(hhmiss), meaning file from 00 hours to 13 hours.
XYZ_20170504_{00,01,02,03,04,05,06,07,08,09,10,11,12}[0-5][0-9][0-5][0-9].csv
OR
XYZ_20170504_{13,14,15,16,17,18,19,20,21,22,23}[0-5][0-9][0-5][0-9]
Double digit pattern matching is not working for my case.
Examples
File pattern 1: XYZ_DATE_TIME.csv
I want to mget files from sftp server where twice a day.
String_yyyymmdd_hhmiss.csv
XYZ_20170514_035959.csv
OR
XYZ_20170514_165959.csv
First of all in your question, i can see an error in your search regarding the dates.
Your regex is looking for this
XYZ_20170504_
and your file contains this
XYZ_20170514_
So obviously these will never match. I think this is more like what you need
000000 to 125959
XYZ_20170514_(0[0-9]|1[0-2])[0-5][0-9][0-5][0-9].csv
OR
130000 to 235959
XYZ_20170514_(1[3-9]|2[0-3])[0-5][0-9][0-5][0-9].csv
Also this is a relatively quick fix, only looking for matches relating to the time. You should also look to further improve this so that you fully sure you are matching correctly on the dates also.
For your own reference there are many tools available online to help you with your regular expressions and pattern matching. The one i favour most is regex101, however there are many others available too.

Creating files in a *nix environment with an email address as its name

PLEASE don't tell me why you think its a bad idea. Just tell me if its a workable idea.
I want to create files in a folder with names like the following:
asdf#qwerty.com.eml
abc+def#asdf.net.eml
abc_def#sasdf.at.eml
Is there some fundamental incompatibility in the characters allowed in email addresses and those allowed by a unix system?
I will be having a bash script reading the file names, subtracting the ".eml" ending, converting it into the "correct" usable format and sending an email to the address.
A simple test showed that it saved the above as files called
asdf\#qwerty.com.eml
abc+def\#asdf.net.eml
abc_def\#sasdf.at.eml
The only characters not allowed in a *nix filename are \0 and /, neither of which is allowed in an email address anyways. How your shell may handle symbols is another matter.
There are no characters disallowed in UNIX file names except / (directory separator) and ASCII 0 (string terminator), so there is no problem at a fundamental level.
Handling those file names in shell scripts is a different matter; it requires at least quoting every variable reference as "$FILENAME", and forgetting even one quotatino will create a very rare, insidious bug. (Also, many other utilities will fail on strange characters such as | or newline unless you consistently use the -0 option.)
So yes, technically your bad idea is workable :-)
Short answer:
przemek#linux-634b:~/tmp/email> touch john.smith#example.com
przemek#linux-634b:~/tmp/email> ls
john.smith#example.com
Works perfectly;)
Long answer:
It depends on filesystem you're using. See Wikipedia entry which lists allowed characters in file names. Most UNIX file systems support all characters that can be included in e-mail addresses. Non-UNIX filesystems, such as FAT, however, may cause problems.
Note that your problems may come from improper escaping. Check how are you creating your files.
What was your "simple test"?
Typing abc and hitting tab?
The bash autocompletion will add a \ before every special character.
But this does not mean, it is stored with a \ in its name.
Use ls to see the true name.
There is no problem with such file names on systems which treat file names as blobs and allow all byte sequences, i.e. Linux. But they are not portable to systems which treat file names as Unicode strings and disallow certain characters (Windows) or transform file names (Mac OS X, canonical decomposition).

Why should I have to bother putting a linefeed at the end of every file?

I've occasionally encountered software - including compilers - that refuse to accept or properly handle text files that aren't properly terminated with a newline. I've even encountered explicit errors of the form,
no newline at the end of the file
...which would seem to indicate that they're explicitly checking for this case and then rejecting it just to be stubborn.
Am I missing something here? Why would - or should - anything care whether or not a file ends with a seemingly-superfluous bit of whitespace?
Historically, at least in the Unix world, "newline" or rather U+000A Line Feed was a line terminator. This stands in stark contrast to the practice in Windows for example, where CR+LF is a line separator.
A naïve solution of reading every line in a file would be to append characters to a buffer until an LF was encountered. If done really stupid this would ignore the last line in a file if it wasn't terminated by LF.
Another thing to consider are macro systems that allow including files. A line such as
%include "foo.inc"
might be replaced by the contents of the mentioned file where, if the last line wasn't ended with an LF, it would get merged with the next line. And yes, I've seen this behavior with a particular macro assembler for an embedded platform.
Nowadays I'm in the firm belief that (a) it's a relic of ancient times and (b) I haven't seen modern software that can't handle it but yet we still carry around numerous editors on Unix-like systems who helpfully put a byte more than needed at the end of a file.
Generally I would say that a lack of a newline at the end of a source file would mean that something went wrong in the editor or source code control client and not all of the code in the buffer got flushed. While it's likely that this would result in other errors, knowing that something likely went wrong in the editor/SCM and code may be missing is a pretty useful bit of knowledge. Certainly something that I would want to check.

Do standard windows .ini files allow comments?

Are comments allowed in Windows ini files? (...assuming you're using the GetPrivateProfileString api functions to read them...)
[Section]
Name=Value ; comment
; full line comment
And, is there a proper spec of the .INI file format anywhere?
Thanks for the replies - However maybe I wasn't clear enough. It's only the format as read by Windows API Calls that I'm interested in. I know other implementations allow comments, but it's specifically the MS Windows spec and implementation that I need to know about.
Windows INI API support for:
Line comments: yes, using semi-colon ;
Trailing comments: No
The authoritative source is the Windows API function that reads values out of INI files
GetPrivateProfileString
Retrieves a string from the specified section in an initialization file.
The reason "full line comments" work is because the requested value does not exist. For example, when parsing the following ini file contents:
[Application]
UseLiveData=1
;coke=zero
pepsi=diet ;gag
#stackoverflow=splotchy
Reading the values:
UseLiveData: 1
coke: not present
;coke: not present
pepsi: diet ;gag
stackoverflow: not present
#stackoverflow: splotchy
Update: I used to think that the number sign (#) was a pseudo line-comment character. The reason using leading # works to hide stackoverflow is because the name stackoverflow no longer exists. And it turns out that semi-colon (;) is a line-comment.
But there is no support for trailing comments.
I have seen comments in INI files, so yes. Please refer to this Wikipedia article. I could not find an official specification, but that is the correct syntax for comments, as many game INI files had this as I remember.
Edit
The API returns the Value and the Comment (forgot to mention this in my reply), just construct and example INI file and call the API on this (with comments) and you can see how this is returned.
USE A SEMI-COLON AT BEGINING OF LINE --->> ; <<---
Ex.
; last modified 1 April 2001 by John Doe
[owner]
name=John Doe
organization=Acme Widgets Inc.
I like the analysis of #Ian Boyd, because it is based on the official GetPrivateProfileString() method of Microsoft.
In my attempts of writing a Microsoft compatible INI parser, I'm having a closer look at the said Microsoft API and for comments I found out:
you can have line comments using semicolon
the semicolon needn't be the first character of the line; it can be preceded by space, tab or vertical tab
you can have trailing "comments" after a section even without semicolon. It's probably not intended to be a comment, but the parser will ignore it.
values outside a section cannot be accessed (at least I did not find a way), effectively making them useless except for commenting purposes
certainly abuse, but the parser overflows at 65536 characters, so anything after that will not be part of the value either. I would not rely on this, since Microsoft could fix this in later versions of Windows. Also, it's not very useful as a comment when you don't see it.
Example:
this=cannot be accessed
[section]this=is ignored
;this=is a line comment
;this=is a comment preceded by spaces
key=value <... 65530 spaces ...>this=cannot be parsed
Yes, it allows.
The way to comment is to use ; for a new line rather than just after the content you want to comment in the same line, which is allowable for other files where you want to comment.
Let me show you an example:
I use .ini file to pass some parameters for my training file when I use SUMO software. If I write like this:
width_layers = 400 ;the number of neurons per layer in the neural network.
I will get an error message which is
ValueError: invalid literal for int() with base 10: '400 ;the number of neurons per layer in the neural network.'
I have to create a line for that, which is
width_layers = 400
;the number of neurons per layer in the neural network.
Then, it will work. Hope it helps in detail!

Are there any invalid linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
Technically it's not invalid but files with dash(-) at the beginning of their name will put you in a lot of troubles. It's because it has conflicts with command arguments.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take for example Amarok. Recently I noticed that certain artists I had copied from my Windows machine where not appearing in the library. I check and confirmed that the files were there and then I noticed that certain characters in the folder names (Named for the artist) were represented with a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
Overall this a a tricky area and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux wherein certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.

Resources