I am trying to use CoreAudio/AudioToolbox to play multiple MIDI files using different MIDISynth nodes. I have the samplers wired into a MultiChanelMixer which is in turn wired into the IO unit. I want to be able to change the different input volumes independently of one another. I'm attempting this with this line:
AudioUnitSetParameter(mixerUnit, kMultiChannelMixerParam_Volume, kAudioUnitScope_Input, UInt32(trackIndex), volume, 0)
The problem is that adjusting trackIndex 0 adjusts every input coming into the mixer, not just the one bus like I'm expecting it to.
Here is the output from CAShow of the master graph
AudioUnitGraph 0xC590003:
Member Nodes:
node 1: 'auou' 'rioc' 'appl', instance 0x60000002d580 O I
node 2: 'aumx' 'mcmx' 'appl', instance 0x60000002d680 O I
node 3: 'aumu' 'msyn' 'appl', instance 0x60000002db60 O I
node 4: 'aumu' 'msyn' 'appl', instance 0x60000002ef20 O I
node 5: 'aumu' 'msyn' 'appl', instance 0x60000002df00 O I
node 6: 'aumu' 'msyn' 'appl', instance 0x60800022d820 O I
Connections:
node 2 bus 0 => node 1 bus 0 [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
node 3 bus 0 => node 2 bus 0 [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
node 4 bus 0 => node 2 bus 1 [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
node 5 bus 0 => node 2 bus 2 [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
node 6 bus 0 => node 2 bus 3 [ 2 ch, 44100 Hz, 'lpcm' (0x00000029) 32-bit little-endian float, deinterleaved]
CurrentState:
mLastUpdateError=0, eventsToProcess=F, isInitialized=T, isRunning=T (2)
Here is the class I wrote to control all of this: https://gist.github.com/jadar/26d9625c875ce91dd2ad0ad63dfd8f80
Mixers are difficult because in a way the channels break the audio stream paradigm that core audio sets up, plus its different for the crosspoints than the input/output masters, and different for the global master.
Based on the code you provided, and assuming you're passing 0 for trackIndex, I would look at the volumes by using the property kAudioUnitProperty_MatrixLevels (which can be tricky to use, so let me know if you need help with that). It's possible that the crosspoint levels are not set correctly and that lowering trackIndex 0 (which is actually bus 0 channel 0) it's affecting everything you hear.
In case it's not clear, the way that the bus/channel paradigm works is it's a contiguous within the mixer itself. So if you have 4 busses with stereo channels and wanted to affect the right channel of bus 0, that would be mixer channel 7.
Related
How does multi-core processors handle interrupts?
I know of how single core processors handle interrupts.
I also know of the different types of interrupts.
I want to know how multi core processors handle hardware, program, CPU time sequence and input/output interrupt
This should be considered as a continuation for or an expansion of the other answer.
Most multiprocessors support programmable interrupt controllers such as Intel's APIC. These are complicated chips that consist of a number components, some of which could be part of the chipset. At boot-time, all I/O interrupts are delivered to core 0 (the bootstrap processor). Then, in an APIC system, the OS can specify for each interrupt which core(s) should handle that interrupt. If more than one core is specified, it means that it's up to the APIC system to decide which of the cores should handle an incoming interrupt request. This is called interrupt affinity. Many scheduling algorithms have been proposed for both the OS and the hardware. One obvious technique is to load-balance the system by scheduling the interrupts in a round-robin fashion. Another is this technique from Intel that attempts to balance performance and power.
On a Linux system, you can open /proc/interrupts to see how many interrupts of each type were handled by each core. The contents of that file may look something like this on a system with 8 logical cores:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 19 0 0 0 0 0 0 0 IR-IO-APIC 2-edge timer
1: 1 1 0 0 0 0 0 0 IR-IO-APIC 1-edge i8042
8: 0 0 1 0 0 0 0 0 IR-IO-APIC 8-edge rtc0
9: 0 0 0 0 1 0 0 2 IR-IO-APIC 9-fasteoi acpi
12: 3 0 0 0 0 0 1 0 IR-IO-APIC 12-edge i8042
16: 84 4187879 7 3 3 14044994 6 5 IR-IO-APIC 16-fasteoi ehci_hcd:usb1
19: 1 0 0 0 6 8 7 0 IR-IO-APIC 19-fasteoi
23: 50 2 0 3 273272 8 1 4 IR-IO-APIC 23-fasteoi ehci_hcd:usb2
24: 0 0 0 0 0 0 0 0 DMAR-MSI 0-edge dmar0
25: 0 0 0 0 0 0 0 0 DMAR-MSI 1-edge dmar1
26: 0 0 0 0 0 0 0 0 IR-PCI-MSI 327680-edge xhci_hcd
27: 11656 381 178 47851679 1170 481 593 104 IR-PCI-MSI 512000-edge 0000:00:1f.2
28: 5 59208205 0 1 3 3 0 1 IR-PCI-MSI 409600-edge eth0
29: 274 8 29 4 15 18 40 64478962 IR-PCI-MSI 32768-edge i915
30: 19 0 0 0 2 2 0 0 IR-PCI-MSI 360448-edge mei_me
31: 96 18 23 11 386 18 40 27 IR-PCI-MSI 442368-edge snd_hda_intel
32: 8 88 17 275 208 301 43 76 IR-PCI-MSI 49152-edge snd_hda_intel
NMI: 4 17 30 17 4 5 17 24 Non-maskable interrupts
LOC: 357688026 372212163 431750501 360923729 188688672 203021824 257050174 203510941 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 4 17 30 17 4 5 17 24 Performance monitoring interrupts
IWI: 2 0 0 0 0 0 0 140 IRQ work interrupts
RTR: 0 0 0 0 0 0 0 0 APIC ICR read retries
RES: 15122413 11566598 15149982 12360156 8538232 12428238 9265882 8192655 Rescheduling interrupts
CAL: 4086842476 4028729722 3961591824 3996615267 4065446828 4033019445 3994553904 4040202886 Function call interrupts
TLB: 2649827127 3201645276 3725606250 3581094963 3028395194 2952606298 3092015503 3024230859 TLB shootdowns
TRM: 169827 169827 169827 169827 169827 169827 169827 169827 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 7194 7194 7194 7194 7194 7194 7194 7194 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
The first column specifies the interrupt request (IRQ) number. All the IRQ numbers that are in use can be found in the list. The file /proc/irq/N/smp_affinity contains a single value that specifies the affinity of IRQ N. This value should be interpreted depending on the current mode of operation of the APIC.
A logical core can receive multiple I/O and IPI interrupts. At that point, local interrupt scheduling takes place, which is also configurable by assigning priorities to interrupts.
Other programmable interrupt controllers are similar.
In general, this depends on the particular system you have under test.
The broader approach is to have a specific chip in each processor1 that is assigned, either statically or dinamically2, a unique ID and that can send and receive interrupts over a shared or dedicated bus.
The IDs allows specific processors to be targets of interrupts.
Code running on the processor A can ask its interrupt chip to raise an interrupt on processor B, when this happens a message is sent along the above-mentioned bus, routed to processor B where the relative interrupt chip picks it up, decode it and raise the corresponding interrupt.
At the system level, one or more, general interrupt controllers are present to route interrupt requests from the IO devices (in any bus) to the processors.
These controllers are programmable, the OS can balance the interrupt load across all the processors (or implement any other convenient policy).
This is the most flexible approach, a wired approach is also possible.
In this case, processor A signals are wired directly to processor B inputs and vice versa; asserting these signals give rise to an interrupt on the target processor.
The general concept is called Inter-processor Interrupt (IPI).
The x86 architecture follows the first approach closely3 (beware of the nomenclature though, processor has a different meaning).
Other architectures may not, like the IBM OS/360 M65MP that uses a wired approach4.
Software generated interrupts are just instructions in a program, each processor executes their own instruction stream and thus if program X generate an exception when running on processor A, it is processor A that handles it.
Task scheduling is distributed across all the processors usually (that's what Linux does.
Time-keeping is usually done by a designated processor that serves a hardware timer interrupt.
This is not always the case, I haven't looked at the precise details of the implementations of the modern OSes.
1 Usually an integrated chip, so we can think of it as a functional unit of the processor.
2 By a power-on protocol.
3 Actually, this is reverse causality.
4 I'm following the Wikipedia examples.
I want to distribute files across multiple servers and have them available with very little overhead. So I was thinking of the following naive algorithm:
Providing that each file has an unique ID number: 120151 I'm thinking of segmenting the files using the modulo (%) operator. This works if I know the number of servers in advance:
Example with 2 servers (stands for n servers):
server 1 : ID % 2 = 0 (contains even IDs)
server 2 : ID % 2 = 1 (contains odd IDs)
However when I need to scale this and add more servers I will have to re-shuffle the files to obey the new algorithm rules and we don't want that.
Example:
Say I add server 3 into the mix because I cannot handle the load. Server 3 will contain files that respect the following criteria:
server 3 : ID%3 = 2
Step 1 is to move the files from server 1 and server 2 where ID%3 = 2.
However, I'll have to move some files between server 1 and server 2 so that the following occurs:
server 1 : ID%3 = 0
server 2 : ID%3 = 1
What's the optimal way to achieve this?
My approach would be to use consistent hashing. From Wikipedia:
Consistent hashing is a special kind of hashing such that when a hash
table is resized and consistent hashing is used, only K/n keys need to
be remapped on average, where K is the number of keys, and n is the
number of slots.
The general idea is this:
Think of your servers as arranged on a ring, ordered by their server_id
Each server is assigned a uniformly distributed (random) id, e.g. server_id = SHA(node_name).
Each file is equally assigned a uniformly distributed id, e.g. file_id = SHA(ID), where ID is as given in your example.
Choose the server that is 'closest' to the file_id, i.e. where server_id > file_id (start choosing with the smallest server_id).
If there is no such node, there is a wrap around on the ring
Note: you can use any hash function that generates uniformly distributed hashes, so long as you use the same hash function for both servers and files.
This way, you get to keep O(1) access, and adding/removing is straight forward and does not require reshuffling all files:
a) adding a new server, the new node gets all the files from the next node on the ring with ids lower than the new server
b) removing a server, all of its files are given to the next node on the ring
Tom White's graphically illustrated overview explains in more detail.
To summarize your requirements:
Each server should store an (almost) equal amount of files.
You should be able to determine which server holds a given file - based only on the file's ID, in O(1).
When adding a file, requirements 1 and 2 should hold.
When adding a server, you want to move some files to it from all existing servers, such that requirements 1 and 2 would hold.
Your strategy when adding a 3rd server (x is the file's ID):
x%6 Old New
0 0 0
1 1 1
2 0 --> 2
3 1 --> 0
4 0 --> 1
5 1 --> 2
Alternative strategy:
x%6 Old New
0 0 0
1 1 1
2 0 0
3 1 1
4 0 --> 2
5 1 --> 2
To locate a server after the change:
0: x%6 in [0,2]
1: x%6 in [1,3]
2: x%6 in [4,5]
Adding a 4th server:
x%12 Old New
0 0 0
1 1 1
2 0 0
3 1 1
4 2 2
5 2 2
6 0 0
7 1 1
8 0 --> 3
9 1 --> 3
10 2 2
11 2 --> 3
To locate a server after the change:
0: x%12 in [0,2, 6]
1: x%12 in [1,3, 7]
2: x%12 in [4,5,10]
3: x%12 in [8,9,11]
When you add server, you can always build a new function (actually several alternative functions). The value of the divisor for n servers equals to lcm(1,2,...,n), so it grows very fast.
Note that you didn't mention if files are removed, and if you plan to handle that.
I'm looking all over the Internet for information in regards to calculating the frame length and it's been hard... I was able to successfully calculate the frame length in ms of MPEG-4, AAC, using:
frameLengthMs = mSamplingRate/1000
This works since there is one sample per frame on AAC. For MPEG-1 or MPEG-2 I'm confused. There are 1152 samples per frame, ok, so what do I do with that? :P
Frame sample:
MPEGDecoder(23069): mSamplesPerFrame: 1152
MPEGDecoder(23069): mBitrateIndex: 7
MPEGDecoder(23069): mFrameLength: 314
MPEGDecoder(23069): mSamplingRate: 44100
MPEGDecoder(23069): mMpegAudioVersion 3
MPEGDecoder(23069): mLayerDesc 1
MPEGDecoder(23069): mProtectionBit 1
MPEGDecoder(23069): mBitrateIndex 7
MPEGDecoder(23069): mSamplingRateFreqIndex 0
MPEGDecoder(23069): mPaddingBit 1
MPEGDecoder(23069): mPrivateBit 0
MPEGDecoder(23069): mChannelMode 1
MPEGDecoder(23069): mModeExtension 2
MPEGDecoder(23069): mCopyright 0
MPEGDecoder(23069): mOriginal 1
MPEGDecoder(23069): mEmphasis 0
MPEGDecoder(23069): mBitrate: 96kbps
The duration of an MPEG audio frame is a function of the sampling rate and the number of samples per frame. The formula is:
frameTimeMs = (1000/SamplingRate) * SamplesPerFrame
In your case this would be
frameTimeMs = (1000/44100) * 1152
Which yields ~26ms per frame. For a different sampling rate you would get a different duration. The key is MPEG audio always represents a fixed number of samples per frame, but the time duration of each sample is dependent on the sampling rate.
I didn't understand what do the Huffman tables of Jpeg contain, could someone explain this to me?
Thanks
Huffman encoding is a variable-length data compression method. It works by assigning the most frequent values in an input stream to the encodings with the smallest bit lengths.
For example, the input Seems every eel eeks elegantly. may encode the letter e as binary 1 and all other letters as various other longer codes, all starting with 0. That way, the resultant bit stream would be smaller than if every letter was a fixed size. By way of example, let's examine the quantities of each character and construct a tree that puts the common ones at the top.
Letter Count
------ -----
e 10
<SPC> 4
l 3
sy 2
Smvrkgant. 1
<EOF> 1
The end of file marker EOF is there since you generally have to have a multiple of eight bits in your file. It's to stop any padding at the end from being treated as a real character.
__________#__________
________________/______________ \
________/________ ____\____ e
__/__ __\__ __/__ \
/ \ / \ / \ / \
/ \ / \ / SPC l s
/ \ / \ / \ / \ / \
y S m v / k g \ n t
/\ / \
r . a EOF
Now this isn't necessarily the most efficient tree but it's enough to establish how the encodings are done. Let's first look at the uncompressed data. Assuming an eight-bit encoding, those thirty-one characters (we don't need the EOF for the uncompressed data) are going to take up 248 bits.
But, if you use the tree above to locate the characters, outputting a zero bit if you take the left sub-tree and a one bit if you take the right, you get the following:
Section Encoding
---------- --------
Seems<SPC> 00001 1 1 00010 0111 0101 (20 bits)
every<SPC> 1 00011 1 001000 00000 0101 (22 bits)
eel<SPC> 1 1 0110 0101 (10 bits)
eeks<SPC> 1 1 00101 0111 0101 (15 bits)
elegantly 1 0110 1 00110 001110 01000 01001 0110 00000 (36 bits)
.<EOF> 001001 001111 (12 bits)
That gives a grand total of 115 bits, rounded up to 120 since it needs to be a multiple of a byte, but that's still about half the size of the uncompressed data.
Now that's usually not worth it for a small file like this, since you have to add the space taken up by the actual tree itself(a), otherwise you cannot decode it at the other end. But certainly, for larger files where the distribution of characters isn't even, it can lead to impressive savings in space.
So, after all that, the Huffman tables in a JPEG are simply the tables that allow you to uncompress the stream into usable information.
The encoding process for JPEG consists of a few different steps (color conversion, chroma resolution reduction, block-based discrete cosine transforms, and so on) but the final step is a lossless Huffman encoding on each block which is what those tables are used to reverse when reading the image.
(a) Probably the best case for minimal storage of this table would be something like:
Size of length section (8-bits) = 3 (longest bit length of 6 takes 3 bits)
Repeated for each byte:
Actual length (3 bits, holding value between 1..6 inclusive)
Encoding (n bits, where n is the actual length)
Byte (8 bits)
End of table marker (3 bits) = 0 to distinguish from actual length above
For the text above, that would be:
00000011 8 bits
n bits byte
--- ------ -----
001 1 'e' 12 bits
100 0101 <SPC> 15 bits
101 00001 'S' 16 bits
101 00010 'm' 16 bits
100 0111 's' 15 bits
101 00011 'v' 16 bits
110 001000 'r' 17 bits
101 00000 'y' 16 bits
101 00101 'k' 16 bits
100 0110 'l' 15 bits
101 00110 'g' 16 bits
110 001110 'a' 17 bits
101 01000 'n' 16 bits
101 01001 't' 16 bits
110 001001 '.' 17 bits
110 001111 <EOF> 17 bits
000 3 bits
That makes the table 264 bits which totally wipes out the savings from compression. However, as stated, the impact of the table becomes far less as the input file becomes larger and there's a way to avoid the table altogether.
That way involves the use of another variant of Huffman, called Adaptive Huffman. This is where the table isn't actually stored in the compressed data.
Instead, during compression, the table starts with just EOF and a special bit sequence meant to introduce a new real byte into the table.
When introducing a new byte into the table, you would output the introducer bit sequence followed by the full eight bits of that byte.
Then, after each byte is output and the counts updated, the table/tree is rebalanced based on the new counts to be the most space-efficient (though the rebalancing may be deferred to improve speed, you just have to ensure the same deferral happens during decompression, an example being every time you add byte for the first 1K of input, then every 10K of input after that, assuming you've added new bytes since the last rebalance).
This means that the table itself can be built in exactly the same way at the other end (decompression), starting with the same minimal table with just the EOF and introducer sequence.
During decompression, when you see the introducer sequence, you can add the byte following it (the next eight bits) to the table with a count of zero, output the byte, then adjust the count and re-balance (or defer as previously mentioned).
That way, you do not have to have the table shipped with the compressed file. This, of course, costs a little more time during compression and decompression in that you're periodically rebalancing the table but, as with most things in life, it's a trade-off.
The DHT marker doesn't specify directly which symbol is associated with a code. It contains a vector with counts of how many codes there are of a given length. After that it contains a vector with symbol values.
So when you want to decode you have to generate the huffman codes from the first vector and then associate every code with a symbol in the second vector.
This is related with microcontrollers but thought to post it here because it is a problem with algorithms and data types and not with any hardware stuff. I'll explain the problem so that someone that doesn't have any hardware knowledge can still participate :)
In Microcontroller there is an Analog to Digital converter with 10
bit resolution. (It will output a
value between 0 and 1023)
I need to send this value to PC using the serial port.
But you can only write 8 bits at once. (You need to write bytes). It is
a limitation in micro controller.
So in the above case at least I need to send 2 bytes.
My PC application just reads a sequence of numbers for plotting. So
it should capture two consecutive
bytes and build the number back. But
here we will need a delimiter
character as well. but still the delimiter character has an ascii value between 0 - 255 then it will mixup the process.
So what is a simplest way to do this? Should I send the values as a sequence of chars?
Ex : 1023 = "1""0""2""3" Vs "Char(255)Char(4)"
In summary I need to send a sequence of 10 bit numbers over Serial in fastest way. :)
You need to send 10 bits, and because you send a byte at a time, you have to send 16 bits. The big question is how much is speed a priority, and how synchronised are the sender and receiver? I can think of 3 answers, depending on these conditions.
Regular sampling, unknown join point
If the device is running all the time, you aren't sure when you are going to connect (you could join at any time in the sequence) but sampling rate is slower than communication speed so you don't care about size I think I'd probably do it as following. Suppose you are trying to send the ten bits abcdefghij (each letter one bit).
I'd send pq0abcde then pq1fghij, where p and q are error checking bits. This way:
no delimiter is needed (you can tell which byte you are reading by the 0 or 1)
you can definitely spot any 1 bit error, so you know about bad data
I'm struggling to find a good two bit error correcting code, so I guess I'd just make p a parity bit for bits 2,3 and 4 (0, a b above) and q a parity bit for 5 6 and 7 (c,d,e above). This might be clearer with an example.
Suppose I want to send 714 = 1011001010.
Split in 2 10110 , 01010
Add bits to indicate first and second byte 010110, 101010
calculate parity for each half: p0=par(010)=1, q0=par(110)=0, p1=par(101)=0, q1=par(010)=1
bytes are then 10010110, 01101010
You then can detect a lot of different error conditions, quickly check which byte you are being sent if you lose synchronisation, and none of the operations take very long in a microcontroller (I'd do the parity with an 8 entry lookup table).
Dense data, known join point
If you know that the reader starts at the same time as the writer, just send the 4 ten bit values as 5 bytes. If you always read 5 bytes at a time then no problems. If you want even more space saving, and have good sample data already, I'd compress using a huffman coding.
Dense data, unknown join point
In 7 bytes you can send 5 ten bit values with 6 spare bits. Send 5 values like this:
byte 0: 0 (7 bits)
byte 1: 1 (7 bits)
byte 2: 1 (7 bits)
byte 3: 1 (7 bits)
byte 4: 0 (7 bits)
byte 5: 0 (7 bits)
byte 6: (8 bits)
Then whenever you see 3 1's in a row for the most significant bit, you know you have bytes 1, 2 and 3. This idea wastes 1 bit in 56, so could be made even more efficient, but you'd have to send more data at a time. Eg (5 consecutive ones, 120 bits sent in 16 bytes):
byte 0: 0 (7 bits) 7
byte 1: 1 (7 bits) 14
byte 2: 1 (7 bits) 21
byte 3: 1 (7 bits) 28
byte 4: 1 (7 bits) 35
byte 5: 1 (7 bits) 42
byte 6: 0 (7 bits) 49
byte 7: (8 bits) 57
byte 8: (8 bits) 65
byte 9: (8 bits) 73
byte 10: (8 bits) 81
byte 11: 0 (7 bits) 88
byte 12: (8 bits) 96
byte 13: (8 bits) 104
byte 14: (8 bits) 112
byte 15: (8 bits) 120
This is quite a fun problem!
The best method is to convert the data to an ASCII string and send it that way - it makes debugging a lot easier and it avoids various communication issues (special meaning of certain control characters etc).
If you really need to use all the available bandwidth though then you can pack 4 10 bit values into 5 consecutive 8 bit bytes. You will need to be careful about synchronization.
Since you specified "the fastest way" I think expanding the numbers to ASCII is ruled out.
In my opinion a good compromise of code simplicity and performance can be obtained by the following encoding:
Two 10bit values will be encoded in 3 bytes like this.
first 10bit value bits := abcdefghij
second 10bit value bits := klmnopqrst
Bytes to encode:
1abcdefg
0hijklmn
0_opqrst
There is one bit more (_) available that could be used for a parity over all 20bits for error checking or just set to a fixed value.
Some example code (puts 0 at the position _):
#include <assert.h>
#include <inttypes.h>
void
write_byte(uint8_t byte); /* writes byte to serial */
void
encode(uint16_t a, uint16_t b)
{
write_byte(((a >> 3) & 0x7f) | 0x80);
write_byte(((a & 3) << 4) | ((b >> 6) & 0x7f));
write_byte(b & 0x3f);
}
uint8_t
read_byte(void); /* read a byte from serial */
void
decode(uint16_t *a, uint16_t *b)
{
uint16_t x;
while (((x = read_byte()) & 0x80) == 0) {} /* sync */
*a = x << 3;
x = read_byte();
assert ((x & 0x80) == 0); /* put better error handling here */
*a |= (x >> 4) & 3;
*b = x << 6;
x = read_byte();
assert ((x & 0xc0) == 0); /* put better error handling here */
*b |= x;
}
I normally use a start byte and checksum and in this case fixed length, so send 4 bytes, the receiver can look for the start byte and if the next three add up to a know quantity then it is a good packet take out the middle two bytes, if not keep looking. The receiver can always re-sync and it doesnt waste the bandwidth of ascii. Ascii is your other option, a start byte that is not a number and perhaps four numbers for decimal. Decimal is definitely not fun in a microcontroller, so start with something non-hex like X for example and then three bytes with the hex ascii values for your number. Search for the x examine the next three bytes, hope for the best.