For anyone who remembers the Avatar protocol (I'm pretty sure that was its name): I'm trying to find information on it. All I've found so far is that it's an ANSI-style compression protocol, which works by compressing common ANSI escape sequences.
But back in the day (the early '90s) I could have sworn it was used to compress ASCII text for modems, like the early 2400 baud BIS modems. (I don't recall all the protocol versions, names, etc. from back then, sorry.)
Anyway, this made reading messages and using remote shells a lot nicer because of the display speed. It didn't do anything for file transfers or the like; it was just a way of compressing ASCII text down as small as possible.
I'm trying to research this topic and figured this is a good place to start looking. I think the protocol used every trick in the book to compress ASCII, like replacing common words with a single byte, or maybe even fewer bits.
I don't remember what ratio you could get out of it, but as I recall it was fairly decent.
Does anyone have any info on this? Compressing ASCII text to fewer than 7 bits per character, protocol information on Avatar, or maybe even an answer to whether it even DID the kind of ASCII compression I'm describing?
Wikipedia has something about the AVATAR protocol:
The AVATAR protocol (Advanced Video Attribute Terminal Assembler and Recreator) is a system of escape sequences occasionally used on Bulletin Board Systems (BBSes). It has largely the same functionality as the more popular ANSI escape codes, but has the advantage that the escape sequences are much shorter. AVATAR can thus render colored text and artwork much faster over slow connections.
The protocol is defined by FidoNet technical standard proposal FSC-0025. Avatar was later extended in late 1989 to AVT/0 (sometimes referred to as AVT/0+), which included facilities to scroll areas of the screen (useful for split-screen chat or full-screen mail writing programs), as well as more advanced pattern compression.
Avatar was originally implemented in the Opus BBS, but was later popularised by RemoteAccess. RemoteAccess came with a utility, AVTCONV, that allowed for easy translation of ANSI documents into Avatar, helping its adoption.
Also:
FSC-0025 - AVATAR proposal at FidoNet Technical Standards Committee.
FSC-0037 - AVT/0 extensions
If I remember correctly, the Avatar compression scheme was some simple kind of RLE (Run-Length Encoding) that would compress repeated strings of the same characters to something smaller. Unfortunately, I don't remember the details either.
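I don't remember the exact AVT/0 encoding either, but to illustrate the general idea, here is a minimal run-length sketch in C. It is not the actual AVATAR wire format; the marker byte and the run-length threshold are assumptions for illustration only (FSC-0025/FSC-0037 would have the real details):

```c
#include <stdio.h>

/* Toy run-length encoder in the spirit of an AVATAR-style repeat
   sequence.  Emits: <marker> <character> <count> for runs of four or
   more identical bytes.  The marker value (0x19) and the threshold are
   illustrative assumptions, not taken from the FSC documents. */
#define RLE_MARK 0x19

static void rle_encode(const unsigned char *in, size_t len, FILE *out)
{
    size_t i = 0;
    while (i < len) {
        size_t run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 255)
            run++;
        if (run >= 4) {                       /* worth compressing */
            fputc(RLE_MARK, out);
            fputc(in[i], out);
            fputc((int)run, out);
        } else {                              /* copy short runs verbatim */
            for (size_t k = 0; k < run; k++)
                fputc(in[i], out);
        }
        i += run;
    }
}

int main(void)
{
    /* A typical BBS box-drawing line: the long run of '-' collapses
       to just 3 bytes on the wire. */
    const unsigned char line[] = "+----------------------------+";
    rle_encode(line, sizeof line - 1, stdout);
    return 0;
}
```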
Did you check out AVATAR on Wikipedia?
Microsoft announced new support for 3D printing in Windows 8.1. They invented a new file format called 3MF for communicating with the printer.
Does anyone have any kind of spec for the 3MF file format, or even any detailed information about it? I'm especially interested in how geometry is represented.
Also, any thoughts on how 3MF might co-exist with the existing AMF format? Are the two formats competing or complementary?
You can request the Microsoft 3D Printing SDK (which includes the 3D Manufacturing Format spec) at ask3dprint at microsoft dot com.
Geometry is represented very similarly to AMF, with a more compact (fewer bytes) format.
3MF is intended to be a native 3D spool file format in the Windows print pipeline, something which AMF could not be (due to flaws in the spec, reported to ASTM). 3MF and AMF are not the same, but they follow a similar design approach. Since 3MF supports custom metadata, it is possible to transform an AMF document into 3MF format (and back).
3MF does not support curved triangles or a scripting language, however. 3D printing makers generally aren't really using those anyway.
The spec can be downloaded here:
http://3mf.io/what-is-3mf/3mf-specification
I read the spec, and it stores individual mesh vertices with XML tags.
And, this is justified as follows: "The variable-precision nature of ASCII encoding is a significant advantage over fixed-width binary formats, and helps make up the difference in storage efficiency."
Anyone who does web development knows that XML tags can eat up more than 60% of the storage space; just compare the number of bytes spent on XML tags versus useful data. Parsing them also costs more CPU time. On top of that, storing digits as variable-width ASCII is still longer, byte for byte, than fixed-width IEEE double precision.
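To put rough numbers on that, compare one vertex written out as XML-style text against the same vertex as raw floats. The element and attribute names below are only meant to illustrate the style of markup, not quoted from the 3MF spec:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* One vertex as an XML mesh format might store it (illustrative
       markup, not the literal 3MF schema). */
    const char xml_vertex[] =
        "<vertex x=\"10.123456\" y=\"-3.141593\" z=\"0.000001\" />";

    /* The same vertex as fixed-width binary. */
    double as_doubles[3] = { 10.123456, -3.141593, 0.000001 };
    float  as_floats[3]  = { 10.123456f, -3.141593f, 0.000001f };

    printf("XML text    : %zu bytes\n", strlen(xml_vertex)); /* ~51 bytes */
    printf("3 x double  : %zu bytes\n", sizeof as_doubles);  /*  24 bytes */
    printf("3 x float   : %zu bytes\n", sizeof as_floats);   /*  12 bytes */
    return 0;
}
```

As far as I know the 3MF package is ZIP-based, which narrows the on-disk gap, but the parsing cost of the text form remains.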
This essentially cripples 3MF for handling large meshes, unless of course the companies involved develop their own proprietary extension that overcomes it.
I call on consumers and users to openly reject the current 3MF spec until the 3MF consortium fixes this particular aspect: modify the public spec to include an alternative binary section for vertex/normal/index streaming.
I'm trying to come up with a platform-independent way to render Unicode text to some platform-specific surface, but assuming most platforms support something at least vaguely similar, maybe we can talk in terms of the Win32 API. I'm most interested in rendering LARGE buffers and supporting rich text, so while I definitely never want to look inside a Unicode buffer myself, I'd like to be told what to draw and hinted where to draw it, so that if the buffer is modified I can properly ask for updates on partial regions of the buffer.
So, the actual questions. GetTextExtentExPointW clearly lets me get the widths of each character, but how do I get the width of a non-breaking extent of text? If there's a long word, it should probably be put on a new line rather than split. How can I tell where to break the text? Will I need to actually look inside the Unicode buffer? That seems very dangerous. Also, how do I figure out how far apart the baselines should be while rendering?
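For reference, here's roughly the approach I'm picturing, as a minimal untested sketch: GetTextExtentExPointW does the measuring, then I scan back to the last space for the break. The helper names are mine, and I know this ignores surrogate pairs, combining marks, bidi, and real line-breaking rules, which is part of what I'm asking about:

```c
#include <windows.h>

/* Naive sketch: how many UTF-16 code units fit in lineWidth pixels,
   backing up to the last space so a word isn't split mid-way. */
static int find_break(HDC hdc, const WCHAR *text, int len, int lineWidth)
{
    SIZE size;
    int fit = 0;

    if (!GetTextExtentExPointW(hdc, text, len, lineWidth, &fit, NULL, &size))
        return -1;
    if (fit >= len)
        return len;                    /* whole run fits on this line    */

    for (int i = fit; i > 0; i--)      /* back up to last breakable spot */
        if (text[i] == L' ')
            return i + 1;              /* break after the space          */
    return fit;                        /* no space: hard-break the word  */
}

/* Baseline-to-baseline distance, as far as I can tell: */
static int line_advance(HDC hdc)
{
    TEXTMETRICW tm;
    GetTextMetricsW(hdc, &tm);
    return tm.tmHeight + tm.tmExternalLeading;
}
```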
This is already looking like it's going to be extremely complicated. Are there alternative strategies for doing what I'm trying to do? I really don't want to re-render a HUGE buffer every time I change a tiny chunk of it. I'm after something between handling individual glyphs and just handing over a box to splat text into.
Finally, I'm not interested in anything like Knuth's line-breaking algorithm, and no hyphenation. Ideally I'd like to render justified text, but that's on me once the word positions are known; a ragged right-hand margin is fine by me.
What you're trying to do is called shaping in Unicode jargon. Don't bother writing your own shaping engine; it's a full-time job that requires continuous updates to keep up with changes in the Unicode and OpenType standards. If you want to spend any time on the rest of your app, you'll need to delegate shaping to a third-party engine (harfbuzz-ng, Uniscribe, ICU, etc.).
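To give a feel for what delegating to a shaping engine looks like, here is a minimal harfbuzz-ng sketch (the font path and pixel size are placeholder assumptions, error handling is omitted, and hb_blob_create_from_file needs a reasonably recent HarfBuzz):

```c
/* Shape a UTF-8 run into positioned glyphs with HarfBuzz.
   Build with: pkg-config --cflags --libs harfbuzz */
#include <stdio.h>
#include <hb.h>

int main(void)
{
    hb_blob_t *blob = hb_blob_create_from_file("/usr/share/fonts/DejaVuSans.ttf");
    hb_face_t *face = hb_face_create(blob, 0);
    hb_font_t *font = hb_font_create(face);
    hb_font_set_scale(font, 16 * 64, 16 * 64);     /* 16 px in 26.6 units */

    hb_buffer_t *buf = hb_buffer_create();
    hb_buffer_add_utf8(buf, "Hello, world", -1, 0, -1);
    hb_buffer_guess_segment_properties(buf);       /* script, language, direction */
    hb_shape(font, buf, NULL, 0);

    unsigned int n;
    hb_glyph_info_t     *info = hb_buffer_get_glyph_infos(buf, &n);
    hb_glyph_position_t *pos  = hb_buffer_get_glyph_positions(buf, &n);
    for (unsigned int i = 0; i < n; i++)
        printf("glyph %u  cluster %u  x_advance %d\n",
               info[i].codepoint, info[i].cluster, pos[i].x_advance);

    hb_buffer_destroy(buf);
    hb_font_destroy(font);
    hb_face_destroy(face);
    hb_blob_destroy(blob);
    return 0;
}
```

The output is a list of glyph indices with advances and cluster indices back into your original buffer, which is exactly the "told what to draw and hinted where" shape of data you asked for.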
As others wrote:
– Unicode font rendering is hellishly complex, much more than you expect
– the Win32 API is not cross-platform at all
The three common strategies for rendering Unicode text are:
1. write one backend per system (plugging into each system's native text stack), or
2. pick one set of cross-platform libraries (for example FriBidi + harfbuzz-ng + FreeType + Fontconfig, or a framework like Qt) and recompile them for each target system, or
3. take compliance shortcuts.
The only strategy I strongly advise against is the last one. You cannot control unicode.org normalization changes (such as adding a capital ß for German), you do not understand worldwide script usage (African languages and Vietnamese are both written in Latin variants, yet they exercise unexpected Unicode properties), and you will underestimate font creators' ingenuity (oh, Indic users requested this OpenType property, but it turns out to be really handy for this English use case…).
The first two strategies have their own drawbacks. It's simpler to maintain a single text backend, but deploying a complete text stack on a foreign system is far from hassle-free. Almost every project that tries to go cross-platform has to get rid of MSVC first, since it targets Windows and its language dialect won't work on other platforms, and cross-platform libraries will typically only compile easily with GCC or LLVM.
I think harfbuzz-ng has reached parity with Uniscribe even on Windows, so that's the library I'd target if I wanted cross-platform today (Chrome, Firefox, and LibreOffice use it on at least some platforms). However, LibreOffice at least uses the multi-backend strategy; I have no idea whether that reflects the current state of the libraries or some past historic assessment. There are not many cross-platform apps with heavy text usage to look at, and most of them carry the burden of legacy choices.
Unicode rendering is surprisingly complicated. Line breaking is just the beginning; there are many other subtleties I don't think you've appreciated (vertical text, right-to-left text, glyph combining, and many more). Microsoft has several teams dedicated to nothing but implementing text rendering, for example.
Sounds like you're interested in DirectWrite. Note that this is NOT platform independent, obviously. It's also possible to do a less accurate job if you don't care about being language independent; many of the more unusual features only occur in rarer languages. (Chinese being a notable exception.)
If you want something perfectly multi-platform, there will be problems. Draw the same sentence with GDI, GDI+, Direct2D, on Linux, and on a Mac, all at the same font size into the same kind of buffer, and you'll see differences: some round positions to integers, others use floats, for example.
There isn't one problem here but at least two: drawing text and computing text positions (line breaks, etc.) are very different jobs. Some libraries do both; some only do the computing or only the rendering part. Very roughly, drawing just renders one glyph at the position you ask for, with transformations such as zoom and rotation plus anti-aliasing; computing does everything else: choosing each character's position within words and sentences, line breaks, paragraphs, and so on.
If you want to be platform-independent, you could use FreeType to read the font files and get all the information about each character. That library gives exactly the same results on every platform, and predictability is a good thing in font handling. The main problem with fonts is the amount of bad, missing, or outright wrong information in the character descriptions. Nobody does text perfectly, because it's very hard (tipping my hat to the Word and Acrobat teams and everyone else who deals directly with fonts).
Once your computation of positions is good, there's still a lot of work to do everything you'd see in a good word processor (spacing between characters, spacing between words, line breaking, alignment, rotation, weight, anti-aliasing...), and only then comes the rendering, which should be the easier part. With the same computation you can drive GDI, Direct2D, PDF, or printer rendering paths.
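Here is a minimal sketch of the "computing" side with FreeType, just pulling per-glyph metrics; the font path and pixel size are placeholder assumptions:

```c
/* Query per-glyph metrics with FreeType; link with -lfreetype. */
#include <stdio.h>
#include <ft2build.h>
#include FT_FREETYPE_H

int main(void)
{
    FT_Library lib;
    FT_Face    face;

    if (FT_Init_FreeType(&lib) ||
        FT_New_Face(lib, "/usr/share/fonts/DejaVuSans.ttf", 0, &face))
        return 1;
    FT_Set_Pixel_Sizes(face, 0, 16);                 /* 16 px tall */

    for (const char *p = "Hello"; *p; p++) {
        if (FT_Load_Char(face, (FT_ULong)*p, FT_LOAD_DEFAULT))
            continue;
        /* metrics are in 26.6 fixed point: divide by 64 for pixels */
        printf("'%c' advance %ld px, size %dx%d px\n", *p,
               face->glyph->advance.x >> 6,
               (int)(face->glyph->metrics.width  >> 6),
               (int)(face->glyph->metrics.height >> 6));
    }

    FT_Done_Face(face);
    FT_Done_FreeType(lib);
    return 0;
}
```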
A little background: a coworker was creating some "glitch art" using this link. He deleted some bytes from a JPEG image and produced this result:
http://jmelvnsn.com/prince_fielder.jpg
The thing that's blowing my mind here is that Chrome renders this image differently on each refresh. I'm not sure I understand how the image-rendering code can be non-deterministic. What's going on?
EDIT: I really wish Stack Overflow would stop redirecting my URL to their Imgur URL.
Actually, it's interesting to know that the JPEG standard is not really a standard about imaging techniques or imaging algorithms; it's more like a standard about a container.
As far as I know, if you respect the JPEG standard you can decode/encode a JPEG with any number of different techniques and algorithms. That's why it's hard to support JPEG/JPG: from a programmer's perspective a JPG can be a million things, and it's really hard to handle that kind of fragmentation. Often you are forced to simply jump on the train offered by some library and hope your users won't run into trouble with it.
There is no single standard way to encode or decode a JPEG image/file (including the algorithms used in those processes), so considering this, the apparently "weird" result offered by your browser is 100% normal.
I have some knowledge of operating systems (really, very little).
I would like to learn a lot specifically about the Windows OS (e.g. Windows 7).
I know it's the most dominant OS out there, and that there is an enormous amount of work I'll have to do.
Where do I start? What are the beginner/intermediate books, articles, and websites I should read?
The first thing I wonder about: the compiler turns my C programs into binary code, yet when I open the resulting (.exe) files I find something other than 0s and 1s.
I can't point you in a direction as far as books go, but I can clarify this:
The first thing I wonder about: the compiler turns my C programs into binary code, yet when I open the resulting (.exe) files I find something other than 0s and 1s.
Your programs are in fact compiled to binary. Everything on your computer is stored in binary.
The reason you do not see ones and zeros comes down to character encodings. A byte is made up of eight bits, each of which can be 0 or 1, and a lot of programs and character encodings represent one byte as one character (with the caveat of multi-byte Unicode characters, but that's not terribly important in this discussion).
So what's going on is that the program you are using to open the file interprets each sequence of eight bits and turns those eight bits into one character. Each character you see when you open the file is, in fact, eight ones and zeros. The most basic mapping between bytes and characters is ASCII: the character "A", for example, is represented in binary as 01000001, so when the program you use to open the file sees that bit sequence, it displays "A" in its place.
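A tiny sketch to make that concrete: this reads a file byte by byte and prints both the bits and the character a text editor would show for them (the file name is just a placeholder):

```c
/* Dump the first few bytes of a file as bits and as characters,
   to show they are the same underlying data. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("program.exe", "rb");   /* any file will do */
    if (!f)
        return 1;

    int c;
    for (int n = 0; n < 16 && (c = fgetc(f)) != EOF; n++) {
        printf("byte %2d: ", n);
        for (int bit = 7; bit >= 0; bit--)   /* most significant bit first */
            putchar((c >> bit) & 1 ? '1' : '0');
        printf("  -> '%c'\n", (c >= 32 && c < 127) ? c : '.');
    }
    fclose(f);
    return 0;
}
```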
A nice book to read if you are interested in the Microsoft Windows operating system is The Old New Thing by Microsoft legend Raymond Chen. It is very easy reading if you are a Win32 programmer, and even if you are not (even if you are not a programmer at all!) many of the chapters are still readily accessible.
Otherwise, to understand the Microsoft Windows OS, you need to understand the Windows API. You learn this by writing programs for the Windows (native) platform, and the official documentation, which is very good, is at MSDN.
There are a series of books titled "Windows Internals" that could probably keep you busy for the better part of a couple years. Also, Microsoft has been known to release source code to universities to study...
Well, if you study the Win32 API you will learn a lot about the high-level side of the OS
(Petzold is the king, and it's not about Windows 7, just Win32...)
If you want to study the low level, study the processor's assembly language.
There are a ton of resources out there for learning operating systems in general, many of which don't really focus on Windows because, as John pointed out, it's very closed and not very useful academically. You may want to look into something like Minix, which is very useful academically. It's small, light, and made pretty much for the sole purpose of education.
From there you can branch out into other OSes (even Windows, as far as not being able to look under the hood will take you), armed with a greater knowledge of what an OS is and does, as well as more knowledge of the inner workings of the computer itself. (For example, opening executable code in what I assume was a text editor such as Notepad to try to see the 1s and 0s, which, as cdhowie eloquently pointed out, is not doing what you think it's doing.)
I would personally look into the ReactOS project, a working Windows clone.
The code can give some idea of how Windows is implemented...
Here is the site:
www.reactos.org
As far as I know, Linux chose UTF-8 for its backward compatibility, whereas Windows added completely new API functions for UTF-16 (the ones ending in "W"). Could these decisions have been different? Which one has proved better?
UTF-16 is pretty much a loss, the worst of both worlds: it is neither compact (for the typical case of ASCII characters), nor does it map each code unit to a character. This hasn't really bitten anyone too badly yet, since characters outside the Basic Multilingual Plane are still rarely used, but it sure is ugly.
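A quick way to see the trade-off is to count bytes for the same characters in each encoding (UTF-16LE byte order, values written out by hand as a quick illustration):

```c
/* Byte counts for the same characters in UTF-8 vs UTF-16. */
#include <stdio.h>

int main(void)
{
    /* 'A' (U+0041)  : ASCII range                                   */
    unsigned char a_utf8[]  = { 0x41 };                    /* 1 byte  */
    unsigned char a_utf16[] = { 0x41, 0x00 };              /* 2 bytes */

    /* '€' (U+20AC)  : inside the BMP, non-ASCII                     */
    unsigned char e_utf8[]  = { 0xE2, 0x82, 0xAC };        /* 3 bytes */
    unsigned char e_utf16[] = { 0xAC, 0x20 };              /* 2 bytes */

    /* U+1F600      : outside the BMP, needs a UTF-16 surrogate pair */
    unsigned char s_utf8[]  = { 0xF0, 0x9F, 0x98, 0x80 };  /* 4 bytes */
    unsigned char s_utf16[] = { 0x3D, 0xD8, 0x00, 0xDE };  /* 4 bytes */

    printf("U+0041  utf8=%zu utf16=%zu\n", sizeof a_utf8, sizeof a_utf16);
    printf("U+20AC  utf8=%zu utf16=%zu\n", sizeof e_utf8, sizeof e_utf16);
    printf("U+1F600 utf8=%zu utf16=%zu\n", sizeof s_utf8, sizeof s_utf16);
    return 0;
}
```

So UTF-16 doubles the size of ASCII-heavy text, only wins by one byte for BMP characters outside ASCII, and still needs the surrogate-pair machinery beyond the BMP.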
POSIX (Linux et al.) has some "w" APIs too, based on the wchar_t type. On platforms other than Windows this typically corresponds to UTF-32 rather than UTF-16, which is nice for easy string manipulation but incredibly bloated.
But in-memory APIs aren't really that important. What causes much more difficulty is file storage and on-the-wire protocols, where data is exchanged between applications with different charset traditions.
Here, compactness beats ease of indexing; UTF-8 has clearly proven to be the best format for this by far, and Windows's poor support of UTF-8 causes real difficulties. Windows is the last modern operating system to still have locale-specific default encodings; everyone else has moved to UTF-8 by default.
Whilst I seriously hope Microsoft will reconsider this for future versions, as it causes huge and unnecessary problems even within the Windows-only world, it's understandable how it happened.
The thinking back in the old days when WinNT was being designed was that UCS-2 was it for Unicode. There wasn't going to be anything outside the 16-bit character range. Everyone would use UCS-2 in-memory and naturally it would be easiest to save this content directly from memory. This is why Windows called that format “Unicode”, and still to this day calls UTF-16LE just “Unicode” in UI like save boxes, despite it being totally misleading.
UTF-8 wasn't even standardised until Unicode 2.0 (along with the extended character range and the surrogates that made UTF-16 what it is today). By then Microsoft were on to WinNT4, at which point it was far too late to change strategy. In short, Microsoft were unlucky to be designing a new OS from scratch around the time when Unicode was in its infancy.
Windows chose to support Unicode with UTF-16 and the attendant ANSI/Unicode ("A"/"W") function pairs way, way, WAAAAAAY back in the early '90s (Windows NT 3.1 came out in 1993), before Linux ever had any notion of Unicode support.
Linux has been able to learn from best practices built on Windows and other Unicode capable platforms.
Many people today would agree that UTF-8 is the better encoding for size reasons, unless you know you'll be dealing almost exclusively with characters that take two bytes or more, where UTF-16 can be more space-efficient.