when using a FTPS connection to transfer a file, what is the difference between a 'Binary mode taransfer' and 'ASCII mode transfer'? - ftp

I am using a FTPS connection to send a text file
[this file will contain EDI(Electronic Data Interchange) information]to a mailbox INOVIS.I have configured the system to open a FTPS connection and using the PUT command I write the file to a folder on the FTP server.
The problem is: what mode of file transfer should I use? How do I switch between modes?
Moreover which mode is the 'best-practice' to use when transferring file over FTPS connection.
If some one can provide me a small ftp script it would be helpful.

Many of the other answers to this question are a collection of nearly correct to outright wrong information.
ASCII mode means that the file should be converted to canonical text form on the wire. Among other things this means:
NVT-ASCII character set. Even if the original file is in some other character set, such as ASCII, EBCDIC or UTF-8. Technically this disallows characters with the 8th bit set, but most implementations won't enforce this.
CRLF line endings.
EBCDIC mode means a similar set of rules, except that the data on the wire should be in EBCDIC.
LOCAL mode allows sending data with a size other than 8 bits per byte.
IMAGE (or BINARY) mode means that the data should be send without any changes. It is up to the user to ensure that the target system can understand the data once it arrives.
Among other things, this means that the recommendation to use BINARY mode to send text data will fail if one of the systems involved doesn't use a ASCII based character set.

ASCII mode changes new line characters between unix and DOS formats. \n to \r\n and viceversa.

Actually, ASCII/BINARY has nothing to do with the 8th bit. It's a convention for translating line endings.
When you are on a windows machine talking to a Unix FTP server (FTPS or FTP - doesn't matter - the protocol is the same), the server will replace any <CR><LF>-Combination with <LF> before storing the file and consequently do the translation in reverse in case you get the file from the unix server.
The idea behind ASCII mode is to convert the line endings to the respective endings of the target platform.
As todays world seems to be converging to the unix convention (<LF>) and as nearly all of todays editors (aside of notepad) can easily handle Unix-Line-Endings, the days of ASCII mode are, indeed, numbered and I would by all means recommend to always use BINARY transfer mode.
The prospect of having data altered in mid-transfer is somewhat frightening anyways.

ASCII mode also makes file sharing of text files across different platforms more straightforward for end users. They won't have to worry about the default line ending (cr/lf versus just lf for example) since the ASCII mode will do that translation for them on the fly.
For most file types you will ALWAYS want to use BINARY mode though.

ACSII mode converts text files between UNIX and Windows formats based on the server and client platform (CR/LF vs LF), Binary doesn't. Of course, if you transfer nearly anything in ASCII mode that isn't text, it will probably be corrupted for that reason.

If you want an exact copy the data use binary mode - using ascii mode will assume the data is 7bit text (chars 0-127) and truncate any data outside of this range. Dates back to arcane 7bit networking days where ascii mode could save you time.
In a globalized environment that we live in - such that it is quite common to find non-ascii characters e.g. foreign languages, currency symbols etc. - you should always use BINARY mode.

For the FTP protocol, the ASCII transfer mode will consider the 8th bit of each of your character as insignificant and will use it for error checking. As for binary transfer mode, your data will be sent as is. Note that sending binary data in ASCII mode will (almost) always end up in data corruption. However, transferring ASCII data in binary mode will work as long as the sending and receiving systems use the 8th bit in the same way (in modern system the 8th bit should stay at 0 to prevent collision with extended ASCII charsets).

Related

Reading a whole file in Ruby (possibly a bug)

In similar question people recommend use File.read to read a whole file. But when I try to read png file (see fig. 1) I get only first line (see fig. 2). What am I doing wrong?
Use File.binread to read binary data.
On certain operating systems (notably Windows), there is a difference between opening a file in "binary mode" (8-bit characters) and "text mode" (7-bit characters). Because of this, these IO implementations can do things like detect end-of-file when there is a zero character, or mangle up characters outside of the ASCII range if you don't tell them to expect binary data.
If you open a file in Ruby, using mode "rb" instead of "r" will tell the OS that you expect binary data, and if it cares about that, it will do the right thing. File.binread() opens the underlying file it will read from with that mode.

In-depth understanding of binary files

I am learning C++ specially about binary file structure/manipulation, and since I am totally new to the subject of binary files, bits, bites & hexadecimal numbers, I decided to take one step backward and establish a solid understanding on the subjects.
In the picture I have included below, I wrote two words (blue thief) in a .txt file.
The reason for this, is when I decode the file using a hexeditor, I wanted to understand how the information is really stored in hex format. Now, don't get me wrong, I am not trying to make a living out of reading hex formats all day, but only to have a minimum level of understanding the basics of a binary file's composition. I also, know all files have different structures, but just for the sake of understanding, I wanted to know, how exactly the words "blue thief" and a single ' ' (space) were converted into those characters.
One more thing, is that, I have heard that binary files contain three types of information:
header, ftm & and the data! is that only concerned with multimedia files like audios, videos? because, I can't seem to see anything, other than what it looks like a the data chunk in this file only.
The characters in your text file are encoded in a Windows extension of ASCII--one byte for each character that you see in Notepad. What you see is what you get.
Generally, a hard distinction is made between text and binary files on Windows systems. On Unix/Linux systems, the distinction is fuzzier... you could argue that there is no distinction, in fact.
On Windows systems, the distinction is enforced by file extensions. All files with the extension ".TXT" are assumed to be text files (i.e., to contain only hex codes that represent visible onscreen characters, where "visible" includes whitespace).
Binary files are a whole different kettle of fish. Most, as you mention, include some sort of header describing how the data that follows is encoded. These headers can vary tremendously in size depending on the type of data (again, assumed to be indicated by the extension on Windows systems as well as Unix). A simple example is the WAV format for uncompressed audio. If you open a WAV file in your hex editing program, you'll see that the first four bytes are "RIFF"--this is a marker, often called a "magic number" even though it is readable as text, indicating that the contents are an audio file. Newer versions of the WAV specification have complicated this somewhat, but originally the WAV header was just the "RIFF" tag plus a dozen or so bytes indicating the sample rate of the following data. (You can see this by comparing the raw data in a track on an audio CD to the WAV file created by ripping an uncompressed copy of that track at 44.1 KHz--the data should be the same, with just a header section added at the start of the WAV file.)
Executable files (compiled programs) are a special type of binary file, but they follow roughly the same scheme of a header followed by data in a prescribed format. In this case, though, the "data" is executable machine code, and the header indicates, among other things, what operating system the file runs on. (For example, most Linux executables begin with the characters "ELF".)

How does line ending effect in coding?

Why do line ending differ from platform to platform? Even why is there term like line ending in programming?
I prefer saving my codes in Unix/Linux format, even if I'm on Windows. Am I missing anything by not saving it in Windows or MacOS format? How does line ending effect in coding.
In the early days, when Typewriters were nearly the only way of getting output from a computer, CR and LF did different things. Unix started the tradition of using a single character to mark the end of a line, probably because it made their pipelining easier; their drivers could easily convert a single LF to CR/LF if need be. Linux is mostly a Unix clone so it keeps that convention. The others hold on to the CR/LF convention for historical reasons, even though it's not strictly necessary.
Some languages such as C, C++, and Python will let you specify the type of file when you open it, either binary or text. For text files a translation is performed so that a single LF is translated into the line ending convention required by the OS.
Basically everyone wanted to be different when creating OS's - Un*x's started with LF, then VMS and DOS wanted CR/LF (like a typewriter) and of course MAC wanted to be different so they went for CR only.
They just wanted to make it harder to transfer between OS's so that you 'bought' into one
Added because of comment
Up to the programmer - if you need to support different line endings then you must code for them. eg you could create a #define for the line ending and then have this change depending on compile options

In what situation should I use ASCII to transfer a file over FTP? (I'm not asking the diff between ascii xfer and bin xfer)

I understand the difference between ASCII mode vs binary when it comes to FTP, but what I don't understand is why there is even a need for ASCII mode at all? Is this just a legacy thing that used to save time by eliminating the most significant bit, therefore causing the overall speed of the transfer to increase by 1/8th? Or is there some hidden use for it that I don't know about?
I've encountered many problems because I would forget to switch the mode to bin when transferring text between different OS's. I don't understand why "bin" isn't just the default for everything, especially with today's much faster internet speeds.
Knowwutimean, Vern?
ASCII mode exists so you can get the right answer when you upload a text file to a remote system without having to know what the line termination or character set conventions are for that system. It was more important when transferring text files was more often done via FTP than, say, email.
To address your practical problem: check the documentation for both your FTP client and server(s) to see if there's a way to set ASCII mode by default. Often this is as simple as some kind of "profile" that sends some FTP commands every time you connect.
To address your philosophical problem: FTP is a 40 year old protocol that has its fair share of historical baggage. One day you'll be very glad that some protocol you depend on was standardized long ago and you can still access some old data.
I, for one, vote to eliminate ascii mode from ftp servers. Any EOL translation can be done by applications consuming the files, and many apps today understand both EOL types anyway. At a minimum, I'd like to see servers switch to using binary by default, and only use ascii if requested.
One scenario of practical use of ASCII mode is to upload PHP or Perl or similar scripts from Windows development machine to Unix server. Use of Binary mode would require separate conversion of line ending sequences, while with ASCII mode conversion is performed "automatically".
Update: there's one more scenario that we have come across - when transferring data to/from mainframes that use EBCDIC encoding, ASCII mode tells the server to perform conversion between encodings.
Here's a practical example of a problem that comes from using a binary FTP connection. In php there are two types of comments:
// a single line comment like this
/* a block comment like this */
The block comment has a start and an end. But the single line comment just ends at the end of the line.
If you upload a php file with single line comments using a binary connection, the php will stop running as soon as it hits the single line comment. It doesn't recognise the end of the line as the end of the comment, so it effectively comments out the rest of your php script.
If however you use FTP in ASCII mode, it will correctly read the end of the line and will run your php code as expected.

difference between text file and binary file

Why should we distinguish between text file and binary files when transmitting them? Why there are some channels designed only for textual data? At the bottom level, they are all bits.
At the bottom level, they are all bits... true. However, some transmission channels have seven bits per byte, and other transmission channels have eight bits per byte. If you transmit ASCII text over a seven-bit channel, then all is fine. Binary data gets mangled.
Additionally, different systems use different conventions for line endings: LF and CRLF are common, but some systems use CR or NEL. A text transmission mode will convert line endings automatically, which will damage binary files.
However, this is all mostly of historical interest these days. Most transmission channels are eight bit (such as HTTP) and most users are fine with whatever line ending they get.
Some examples of 7-bit channels: SMTP (nominally, without extensions), SMS, Telnet, some serial connections. The internet wasn't always built on TCP/IP, and it shows.
Additionally, the HTTP spec states that,
When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body.
All files are saved in one of two file formats - binary or text. The two file types may look the same on the surface, but their internal structures are different.
While both binary and text files contain data stored as a series of (bits (binary values of 1s and 0s), the bits in text files represent characters, while the bits in binary files represent custom data.
Distinguishing between the two is important as different OSs treat text files differently. For example in *nix you end your lines with just \n while in MS OSs you use \r\n and in Macs you use \n\r. Software such as FTP clients try to change the line endings on text files to match the destination OS by adding/removing the characters. This is to make sure that the text file will look properly on the destination OS.
for example, if you create a text file in *nix with line breaks and try to copy it to a windows box as a binary file and open it in notepad, you will not see any of the line endings, but just a clog of text.
Important to add to the answers already provided is that text files and binary files both represent bytes but text files differ from binary files in that the bytes are understood to represent characters. The mapping of bytes to characters is done consistently over the file using a certain code page or Unicode. When using 7 or 8-bit code pages you can spin the dial when reading these files and interpret them with an English alphabet, a German alphabet, Russian alphabet, or others. This spinning the dial doesn't affect the bytes, it does affect which characters are chosen to correspond to the bytes.
As others have stated, there is also the issue of the encoding of line break separators which is unique to text files and which may differ from platform to platform. The "line break" is not a letter in our alphabet or a symbol you can write, so other rules apply to it.
With binary files there is no implicit convention on character encoding or on the definition of a "line".
All machine language files are actually binary files.
For opening a binary file, file mode has to be mentioned as "rb"or "wb"in fopen command. Otherwise all files are opened in default mode, which is text mode.
It may be noted that text files can also be stored and processed as binary files but not viceversa.
The binary files differ from text file in 2 ways:
The storage of newline characters
The EOF character
Eg:
wt-t stands for textfile
Wb-b stands for binaryfile
Binary files do not store any special character at the end either file end is verified by ueing their size itself.

Resources