Why can !, ", #, $, space resync characters in asynchronous serial output? - ascii

Edit:
All ASCII characters whose codes start with 0010 (space, !, ", #, $) can resync the frame, while characters that start with 01 can't, and garbage results after closing and reopening the serial software. Is the reason clearer now?
I tried it directly on the MCU hardware, with a crystal running at 8 MHz. It's an old MCU.
Please try any MCU (even a new one): send "Hello", and after the last letter "o" add a space (or !, #, $), then compare this to using other characters. If you use other characters (I used "|", for example, in the following), the following garbage results after closing and reopening the terminal software many times. All the serial software I tried shows the same effect.
In the following I used a space (!, #, or $ also work), which somehow resyncs the word "Hello":
What kind of terminal emulator/software can report framing errors and buffer overflows? The MCU output comes from an old serial UART, displayed on the PC through a serial-to-USB converter.
Better yet, please share any software that can show how the UART receiver is receiving the patterns (with start and stop bits) so I can understand why some characters can attain frame sync in ANY terminal (I tried many, like Tera Term, Serial Port Monitor, etc.).
Original message:
I'm writing a program for a microcontroller that outputs alphabetic characters to the terminal, like the word Hello, using the ordinary asynchronous serial protocol (UART). I noticed that whenever I closed and reopened the serial PC software while the MCU was running the same firmware, it could print garbage; this is because the receiver can't tell which is the start bit and which is the stop bit when the port opens in the middle of the stream.
But there is something so puzzling: whenever I used the characters space, !, ", #, or $ and added one of them after Hello, the stream could somehow resync itself. Is there any reason why these characters can do that? (Note: the baud rate is fixed at 9600 with no parity, 8 data bits, and the same stop-bit setting for all cases, and the firmware didn't use any delay loop in any case.)

The UART synchronizes on the falling edge at the start of the start bit.
Characters which have fewer other falling edges are less likely to cause mis-synchronization when a device starts listening to a busy line.
More significantly, characters which end with sequences that don't contain a falling edge will be more likely to allow a re-synchronization on the start bit of the next byte, whatever it is.
Since characters are sent least-significant bit first, to get the last wrong falling edge as far away as possible from the right one, you want the longest possible run of zeros followed by ones at the end of the frame.
If you are outputting 7-bit ASCII characters in 8-bit frames then the most significant bit will always be 0, so the best chance of resynchronization will be when there are more high-order zeros.
The printable characters with the most high-order zeros are punctuation and digits (note that digits are somehow missing from the ASCII table you linked).
Control characters have even more zeros, so horizontal tab or newline should resync you even more quickly.
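To make this concrete, here is a small C sketch (illustration only, assuming the 8N1 framing used in the question) that prints the bit sequence each character actually puts on the line: the start bit, eight data bits least-significant bit first, then the stop bit. Comparing where the last 1-to-0 transition falls for 'o' or '|' versus space or '!' shows why the latter give the receiver a better chance to lock onto the real start bit.

    #include <stdio.h>

    /* Print the 8N1 frame for one character as it appears on the line:
     * start bit (0), eight data bits least-significant bit first, stop bit (1). */
    static void print_frame(unsigned char c)
    {
        printf("'%c' (0x%02X): 0 ", c, c);      /* start bit */
        for (int bit = 0; bit < 8; bit++)       /* data bits, LSB first */
            printf("%d", (c >> bit) & 1);
        printf(" 1\n");                         /* stop bit */
    }

    int main(void)
    {
        print_frame('o');   /* 0x6F: a one in bit 6, so a late 1-to-0 transition      */
        print_frame('|');   /* 0x7C: the character that kept producing garbage        */
        print_frame(' ');   /* 0x20: high-order zeros give a long low run before the high stop bit */
        print_frame('!');   /* 0x21: same high-order zeros                            */
        return 0;
    }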

Related

ASCII - What's the point of it?

I always wanted to ask this. I know that ASCII uses numbers to represent characters, like 65 = A.
What's the point? The computer understands that when I press A it is A, so why do we need to convert it to 65?
You have it backwards: computers understand when you press an A because of codes like ASCII. Or rather, one part of the computer is able to tell another part of the computer that you pressed an A because they agree on conventions of binary signals like ASCII.
At its lowest level, each part of the computer "knows" that it is in one of two states - maybe off and on, maybe high voltage and low voltage, maybe two directions of magnetism, and so on. For convenience, we label these two states 0 and 1. We then build elaborate (and microscopic) sequences of machinery that each say "if this thing's a 1, then do this, if it's a 0 do this".
If we string a sequence of 1s and 0s together, we can write a number, like 1010; and we can make machinery that does maths with those numbers, like 1010 + 0001 = 1011. Alternatively, we can string a much longer sequence together to represent the brightness of pixels from the top left to bottom right of a screen, in order - a bitmap image. The computer doesn't "know" which sequences are numbers and which are images, we just tell it "draw the screen based on this sequence" and "calculate my wages based on this sequence".
If we want to represent not numbers or images, but text, we need to come up with a sequence of bits for each letter and symbol. It doesn't really matter what sequence we use, we just need to be consistent - we could say that 000001 is A, and as long as we remember that's what we chose, we can write programs that deal with text. ASCII is simply one of those mappings of sequences of bits to letters and symbols.
Note that A is not defined as "65" in ASCII, it's defined as the 7 bit sequence 1000001; it just happens that that's the same sequence of bits we generally use for the number 65. Note also that ASCII is a very old mapping, and almost never used directly in modern computers; it is however very influential, and a lot of more recent mappings are designed to use the same or similar sequences for the letters and symbols that it covers.
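If it helps, the "same bit pattern" point is easy to check in C (assuming an ASCII-compatible execution character set, which nearly every modern system uses):

    #include <stdio.h>

    int main(void)
    {
        char letter = 'A';
        /* The same 7-bit pattern 1000001, printed three ways. */
        printf("%c %d 0x%X\n", letter, letter, letter);   /* prints: A 65 0x41 */
        printf("%d\n", 'A' == 65);                        /* prints: 1 (they are identical) */
        return 0;
    }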

how data is interpreted in computers

Something that troubles me is how incoming data is interpreted in computers. I searched a lot but could not find an answer, so as a last resort I am asking here. What I mean is: you plug a USB drive into your computer and a data stream starts. Your computer receives ones and zeros from the USB device and interprets them correctly; for example, on the USB drive there are pictures with different names, formats, and resolutions. What I do not understand is how the computer correctly puts them together and the big picture emerges. This could be seen as a stupid question, but it has had me thinking for a while. How does this system work?
I am not a computer scientist, but I am studying electrical and electronics engineering and know some things.
It is all just streams of ones and zeros, which get grouped into bytes. As you probably know, one can multiplex them, but with modern hardware that isn't very necessary (the "S" in USB standing for "serial").
A pure black and white image of an "A" would be a 2d array:
111
101
111
101
101
3x5 font
I would guess that "A" is stored in a font file as 111101111101101, with a known length of 3*5=15 bits.
When displayed in a window, that A would be broken down into lines and inserted on the respective lines of the window, becoming part of a stream which contains perhaps 320x256 pixels.
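To make the "stored as 111101111101101" guess concrete, here is a tiny C sketch (the packed format is hypothetical, just following the guess above) that unpacks those 15 bits back into 3-pixel rows and prints the glyph:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Hypothetical packed glyph for "A": rows top to bottom, 3 pixels per row,
         * '1' meaning a lit pixel, exactly as guessed above. */
        const char *glyph_A = "111101111101101";
        const int width = 3;

        for (size_t i = 0; i < strlen(glyph_A); i++) {
            putchar(glyph_A[i] == '1' ? '#' : ' ');
            if ((i + 1) % width == 0)
                putchar('\n');              /* end of a 3-pixel row */
        }
        return 0;
    }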
When the length of the data is not constant, it can:
be padded to a maximum size, if one exists (integers and other primitive data types do this: a 0 takes 32/64 bits, just as 400123 does);
have a length included somewhere, often in a sort of "header";
be chunked up into constant- or variable-sized chunks with a continue bit (UTF-8 is a good, simple example of constant chunks; some networking protocols, maybe TCP/IP, are a good example of variable chunks).
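As a concrete example of that "continue bit" idea, here is a minimal C sketch (illustration only, not a full validator) that looks at the first byte of a UTF-8 sequence and reports how many bytes belong to that character; continuation bytes are the ones starting with the bit pattern 10.

    #include <stdio.h>

    /* Length of a UTF-8 sequence, judged from its first byte:
     *   0xxxxxxx -> 1 byte (plain ASCII)
     *   110xxxxx -> 2 bytes, 1110xxxx -> 3 bytes, 11110xxx -> 4 bytes
     *   10xxxxxx -> continuation byte, not a valid first byte (return 0). */
    static int utf8_len(unsigned char first)
    {
        if ((first & 0x80) == 0x00) return 1;
        if ((first & 0xE0) == 0xC0) return 2;
        if ((first & 0xF0) == 0xE0) return 3;
        if ((first & 0xF8) == 0xF0) return 4;
        return 0;   /* continuation or invalid */
    }

    int main(void)
    {
        unsigned char samples[] = { 0x41, 0xC3, 0xE2, 0xF0, 0xBF };
        for (size_t i = 0; i < sizeof samples; i++)
            printf("0x%02X -> %d byte(s)\n", samples[i], utf8_len(samples[i]));
        return 0;
    }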
Both sides need to know how to decode the data - in your example, a USB stick with an image on it. The operating system has a driver which understands from the device's ID that it is a storage device, and attempts to read special sectors from it. If it detects a partition type it recognizes (for Windows that would be NTFS or FAT32), it then loads the file tables, using drivers that understand how to decode those. It finds the filenames and allows access via a filename. Then an image-reading program is able to load the byte stream of that file and decode it, using its headers and the installed codecs, into a raster image array. If any of those pieces are not available on your system, you cannot view the image, and it will be just random binary to you (for example, if you format the USB stick with a Linux filesystem, or use an uncommon/old image format).
So it's all various levels of explicit or implicit handshakes to agree on what the data is once you get to the higher levels (a higher level being at least once you agree on the endianness and baud rate of the data transmission).

Typing ALT + 251 and ALT + 0251 at the keyboard produce different character entries

In Windows:
when I press Alt + 251, I get a √ character
when I press Alt + 0251, I get a û character!
A leading zero shouldn't have any value.
Actually, I want to get a check mark (√) from the Chr(251) function in a Client Report Definition (RDLC), but it gives me û!
I think it interprets the four digits as hex, not decimal.
Using a leading zero forces Windows to interpret the code in the Windows-1252 set. Without the 0, the code is interpreted using the OEM set.
Alt+251:
You'll get √, because you'll use OEM 437, where 251 is for square root.
I'll get ¹, because I'll use OEM 850, where 251 is for superscript 1.
Alt+0251:
Both of us will get û, because we'll use Windows-1252, where 251 is for u-circumflex.
This is historical.
From ASCII to Unicode
At the beginning of DOS/Windows, characters were one byte wide and came from the American alphabet; the conversion was defined by the ASCII encoding.
Additional characters were needed as soon as the PC was used outside the US (many languages use accents, for instance). So different codepages were designed, and different encoding tables were used for the conversion.
But a computer in the US wouldn't use the same codepage as one in Spain. This required the user and the programmer to assume the currently active codepage, and this has been a great period in the history of computing...
At the same time it was determined that using only one byte was not going to work: more than 256 characters were needed at the same time. Different encoding systems were designed by a consortium and are collectively known as Unicode.
In Unicode "characters" can be one to four bytes wide, and the number of bytes for one character may vary in the same string.
Other notions have been introduced, such as codepoint and glyph to deal with the complexity of written language.
While Unicode was being adopted as a standard, Windows retained the old one-byte codepages for efficiency, simplicity, and backward compatibility. Windows also added codepages to deal with glyphs found only in Unicode.
Windows has:
a default OEM codepage, which is usually 437 in the US (your case) or 850 in Europe (my case), used with the command line ("DOS"),
the Windows-1252 codepage (also called Latin-1 and ISO 8859-1, but this is a misuse), to ease conversion to/from Unicode. The current tendency is to replace all such extended codepages with Unicode; Java's designers made the drastic decision to use only Unicode to represent strings.
When entering a character with the Alt method, you need to tell Windows which codepage you want to use for its interpretation:
No leading zero: You want the OEM codepage to be used.
Leading zero: You want the Windows codepage to be used.
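In code terms, the Alt-code lookup boils down to choosing which translation table the byte goes through. The little C sketch below is only an illustration: the three mappings for code 251 are the values quoted in the examples above, not full codepage tables.

    #include <stdio.h>

    int main(void)
    {
        /* The same byte, 251 (0xFB), through three different tables. */
        struct { const char *table; const char *result; } map251[] = {
            { "OEM 437 (Alt+251, US console)",       "U+221A SQUARE ROOT" },
            { "OEM 850 (Alt+251, European console)", "U+00B9 SUPERSCRIPT ONE" },
            { "Windows-1252 (Alt+0251)",             "U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX" },
        };

        for (size_t i = 0; i < sizeof map251 / sizeof map251[0]; i++)
            printf("%-36s -> %s\n", map251[i].table, map251[i].result);
        return 0;
    }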
Note on OEM codepages
OEM codepages are so called because on the first PC/PC-compatible computers the display of characters was hard-wired, not done in software. The computer had a character generator with a fixed encoding and graphical definitions in a ROM. The BIOS would send a byte and a position (line, position in line) to the generator, and the generator would draw the corresponding glyph at that position. This was called "text mode" at the time.
A computer sold in the US would have a different character ROM than one sold in Germany. This really depended on the manufacturer, and the BIOS was able to read the value of the installed codepage(s).
Later, glyph generation became software-based, to deal with unlimited fonts, styles, and sizes. It became possible to define a set of glyphs and its corresponding encoding table at the OS level. This combination could be used on any computer, independently of the installed OEM character generator.
Software-generated glyphs started with VGA display adapters; the code required for drawing the glyphs was part of the VGA driver.
As you understood, 0251 entered this way is a character code; it does not represent a number.
You must understand that a 0 written to the left of a number has no value, but here these are character codes, not numbers.

Morse code to ASCII converter using VHDL

I want to implement a Morse code to ASCII converter in VHDL and then transmit each ASCII character to a PC terminal through a UART. I have completed the UART part.
But I don't know how to implement the converter part. The problem is the varying symbol rate of the input Morse code. I want to detect dots, dashes, character spaces, and word spaces.
Please help me with the following parts of the implementation:
detection of Morse code symbols with a variable symbol rate
a binary tree to traverse the Morse code tree and find the corresponding ASCII character.
It seems as if you're planning to do the converter in two stages, which sounds like a decent idea - first stage for decoding dots, dashes and pauses, second stage for assembling these into characters.
For the first stage, you'll need to define some minimum and maximum timings for your input - that is, what is the shortest and longest a dot / dash / etc can last (I'd take it there are some standards for this?).
With these timings it should be possible to use a counter and an edge detector to decode your input morse signal.
For the second stage, you can create a finite state machine or maybe even just a look-up-table to decode the incoming dots and dashes of a single character, and send this along to your UART.
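Here is a rough C model of those two stages (an algorithmic sketch only, not synthesizable VHDL; the thresholds assume standard Morse timing, where a dash and the character gap are three dot lengths and the word gap is seven):

    #include <stdio.h>
    #include <string.h>

    /* Stage 1: classify a mark (key-down) duration as dot or dash, and a pause
     * (key-up) duration as intra-character gap, character gap, or word gap.
     * dot_len would be estimated from the incoming signal; here it is a parameter. */
    static char classify_mark(unsigned duration, unsigned dot_len)
    {
        return (duration < 2 * dot_len) ? '.' : '-';
    }

    static int classify_pause(unsigned duration, unsigned dot_len)
    {
        if (duration < 2 * dot_len) return 0;   /* gap inside a character */
        if (duration < 5 * dot_len) return 1;   /* gap between characters */
        return 2;                               /* gap between words      */
    }

    /* Stage 2: map a finished dot/dash pattern to ASCII (partial table). */
    static char morse_to_ascii(const char *pattern)
    {
        static const struct { const char *p; char c; } table[] = {
            { ".-", 'A' }, { "-...", 'B' }, { "-.-.", 'C' }, { ".", 'E' },
            { "....", 'H' }, { ".-..", 'L' }, { "---", 'O' },
        };
        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(pattern, table[i].p) == 0)
                return table[i].c;
        return '?';   /* unknown pattern */
    }

    int main(void)
    {
        /* Quick check of stage 2: "HELLO" as dot/dash patterns. */
        const char *word[] = { "....", ".", ".-..", ".-..", "---" };
        for (size_t i = 0; i < 5; i++)
            putchar(morse_to_ascii(word[i]));
        putchar('\n');

        /* Stage 1 example: a 30-unit mark with an estimated dot length of 10 units. */
        printf("mark of 30 units -> '%c'\n", classify_mark(30, 10));
        return 0;
    }

In VHDL the same idea becomes a counter plus comparators for stage 1, and a small ROM or case statement for stage 2.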
You need to create a module which samples your input and, from the pattern you record, uses a set of fixed patterns to output the corresponding letter. You need to study the width of your received signal to estimate what your sample rate should be.
You can try to read and understand the code shared at https://github.com/altugkarakurt/Morse-Code-Converter, which does the reverse process of generating Morse code. It can be used as a test generator for your Morse reader.
The ASCII extracted by the Morse detector can be fed into a FIFO, which is a standard circuit, connected to your UART. The Morse detector writes ASCII into the FIFO if the FIFO is not full, while the UART reads ASCII from the FIFO and transmits it. The FIFO acts as a buffer.
If you need some standard UART code, look at the Xilinx website, which has a UART as part of the MicroBlaze design; it is very simple to interface with.
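The FIFO between the Morse detector and the UART would normally be a small dual-port memory with read and write pointers; this C ring-buffer model (a behavioural sketch only, not HDL) shows the idea of the producer writing when not full and the consumer reading when not empty.

    #include <stdio.h>
    #include <stdbool.h>

    #define FIFO_DEPTH 16   /* must be a power of two for the index masking below */

    /* Minimal ring-buffer model: the Morse detector calls fifo_put() when a
     * character is ready; the UART side calls fifo_get() when it is idle. */
    static unsigned char buf[FIFO_DEPTH];
    static unsigned head, tail;   /* head = write count, tail = read count */

    static bool fifo_full(void)  { return head - tail == FIFO_DEPTH; }
    static bool fifo_empty(void) { return head == tail; }

    static bool fifo_put(unsigned char c)
    {
        if (fifo_full()) return false;            /* producer must wait */
        buf[head++ & (FIFO_DEPTH - 1)] = c;
        return true;
    }

    static bool fifo_get(unsigned char *c)
    {
        if (fifo_empty()) return false;           /* nothing for the UART yet */
        *c = buf[tail++ & (FIFO_DEPTH - 1)];
        return true;
    }

    int main(void)
    {
        /* Producer side: decoded characters arrive in a burst... */
        for (const char *s = "HELLO"; *s; s++)
            fifo_put((unsigned char)*s);

        /* ...consumer side: the UART drains them one at a time. */
        unsigned char c;
        while (fifo_get(&c))
            putchar(c);
        putchar('\n');
        return 0;
    }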

SMS Text message - do line breaks count as characters?

I'm developing a system to integrate with an SMS API, and I was wondering whether or not newlines count towards the character limit; I can't find any documentation on this.
Thanks
Yes, it does. ANY character you embed in the message counts against the limit, whether it's "visible" or not. Even if it's not a printable character, you've still sent that character across the cell network and caused the network to use up (as they claim) vast amounts of bandwidth that must be charged at ludicrous rates.
Any character counts toward your SMS limit. Line breaks, included.
I actually can't find a standard or anything, but I do know the message size is limited to 160 7-bit characters, 140 8-bit characters, or 70 16-bit characters, depending on the alphabet used.
Sorry, I can't give you any sources.
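For what it's worth, those three limits are consistent with a single 140-octet user-data field divided by the three character widths (single-segment messages only, no concatenation headers); the arithmetic is easy to check:

    #include <stdio.h>

    int main(void)
    {
        const int payload_bits = 140 * 8;   /* one SMS carries a 140-octet user data field */
        printf("7-bit alphabet : %d characters\n", payload_bits / 7);    /* 160 */
        printf("8-bit data     : %d characters\n", payload_bits / 8);    /* 140 */
        printf("16-bit (UCS-2) : %d characters\n", payload_bits / 16);   /* 70  */
        return 0;
    }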
I just tested it on my phone. Yes, a line break counts.
Yes, each line break, space and carriage return adds to your character count.
1 space = 1 character
1 line break = 2 characters
Yes, it does. Also, as a side note, I noticed that if there are any Unicode characters in the text, the entire message is treated as Unicode and the length of the message is multiplied by three.
