I am communicating with a servo via RS232 serial. The built-in functions that came with my servo are too slow (25 ms for a simple 54 byte message on a 57,600 baud port), so I am trying to write my own communication functions. However, the built-in functions are not documented, so I used a port monitor to determine what information is being sent to the servo, and I need help deciphering the results.
I used the built-in functions to command the servo to "goto" incrementally increasing steps (1, 2, 3, etc.). This resulted in 5 packets being sent to the servo for each "goto" command. The first 4 packets are identical for each "goto" command. I have attached about 50 hex packets below (1 per line). If you need more, post, and we can work something out.
10 13 04 20 00 01 B6 24 E9 68
10 13 04 20 00 00 AE 24 54 82
10 13 04 20 00 00 B5 24 8B 0B
10 13 04 20 00 01 43 01 71 9B
The 5th packet varies based on the step the motor is being commanded to move to. I have included 1 packet here as an example. I have attached a file with about 1000 of these packets (1 per line).
10 13 08 20 03 01 11 25 0A 00 00 00 81 CF
The first 8 bytes of this packet (10 13 08 20 03 01 11 25) appear to be the actual "goto" command. They remain the same no matter what step is specified.
The last 6 bytes (0A 00 00 00 81 CF) change based upon the step that is requested. In the file I attached, I instructed the servo to initially goto step "0", then "1", "2", etc. The first 4 bytes appear to be a little-endian integer corresponding to the number of steps (i.e. the sample command I showed above instructs the servo to goto step 10 decimal).
My question regards the last 2 bytes of the command. They appear to vary randomly, but whenever the specified step is the same they match. This leads me to believe that these 2 bytes are a checksum of some kind. My question to you is: how is the checksum calculated?
I have already tried XOR'ing all the bytes, both singly and in 2 byte pairs, and I tried Fletcher's checksum and a simple checksum (sum of all bytes). I also checked the 2's complement of each of these methods (though I certainly wouldn't mind someone checking to make sure I didn't make a mistake in the calculations). Does anyone have any ideas?
10 13 08 20 03 01 11 25 00 00 00 00 E9 64
10 13 08 20 03 01 11 25 01 00 00 00 9F D0
10 13 08 20 03 01 11 25 02 00 00 00 04 0C
10 13 08 20 03 01 11 25 04 00 00 00 23 95
10 13 08 20 03 01 11 25 05 00 00 00 55 21
10 13 08 20 03 01 11 25 06 00 00 00 CE FD
10 13 08 20 03 01 11 25 07 00 00 00 B8 49
10 13 08 20 03 01 11 25 08 00 00 00 6C A7
10 13 08 20 03 01 11 25 09 00 00 00 1A 13
10 13 08 20 03 01 11 25 0A 00 00 00 81 CF
10 13 08 20 03 01 11 25 0C 00 00 00 A6 56
10 13 08 20 03 01 11 25 0D 00 00 00 D0 E2
10 13 08 20 03 01 11 25 0F 00 00 00 3D 8A
10 13 08 20 03 01 11 25 10 10 00 00 00 17 FA
10 13 08 20 03 01 11 25 11 00 00 00 84 77
10 13 08 20 03 01 11 25 12 00 00 00 1F AB
10 13 08 20 03 01 11 25 13 00 00 00 69 1F
10 13 08 20 03 01 11 25 14 00 00 00 38 32
10 13 08 20 03 01 11 25 15 00 00 00 4E 86
10 13 08 20 03 01 11 25 16 00 00 00 D5 5A
10 13 08 20 03 01 11 25 17 00 00 00 A3 EE
10 13 08 20 03 01 11 25 18 00 00 00 77 00
10 13 08 20 03 01 11 25 19 00 00 00 01 B4
10 13 08 20 03 01 11 25 1A 00 00 00 9A 68
10 13 08 20 03 01 11 25 1B 00 00 00 EC DC
10 13 08 20 03 01 11 25 1C 00 00 00 BD F1
10 13 08 20 03 01 11 25 1D 00 00 00 CB 45
10 13 08 20 03 01 11 25 1E 00 00 00 50 99
10 13 08 20 03 01 11 25 1F 00 00 00 26 2D
10 13 08 20 03 01 11 25 20 00 00 00 DE 2A
10 13 08 20 03 01 11 25 21 00 00 00 A8 9E
10 13 08 20 03 01 11 25 22 00 00 00 33 42
10 13 08 20 03 01 11 25 24 00 00 00 14 DB
10 13 08 20 03 01 11 25 25 00 00 00 62 6F
10 13 08 20 03 01 11 25 26 00 00 00 F9 B3
10 13 08 20 03 01 11 25 27 00 00 00 8F 07
10 13 08 20 03 01 11 25 28 00 00 00 5B E9
10 13 08 20 03 01 11 25 29 00 00 00 2D 5D
10 13 08 20 03 01 11 25 2A 00 00 00 B6 81
10 13 08 20 03 01 11 25 2B 00 00 00 C0 35
10 13 08 20 03 01 11 25 2C 00 00 00 91 18
10 13 08 20 03 01 11 25 2D 00 00 00 E7 AC
10 13 08 20 03 01 11 25 2E 00 00 00 7C 70
10 13 08 20 03 01 11 25 2F 00 00 00 0A C4
10 13 08 20 03 01 11 25 30 00 00 00 C5 8D
10 13 08 20 03 01 11 25 31 00 00 00 B3 39
10 13 08 20 03 01 11 25 32 00 00 00 28 E5
10 13 08 20 03 01 11 25 33 00 00 00 5E 51
10 13 08 20 03 01 11 25 34 00 00 00 0F 7C
10 13 08 20 03 01 11 25 35 00 00 00 79 C8
10 13 08 20 03 01 11 25 36 00 00 00 E2 14
10 13 08 20 03 01 11 25 37 00 00 00 94 A0
10 13 08 20 03 01 11 25 38 00 00 00 40 4E
10 13 08 20 03 01 11 25 39 00 00 00 36 FA
10 13 08 20 03 01 11 25 3A 00 00 00 AD 26
10 13 08 20 03 01 11 25 3B 00 00 00 DB 92
10 13 08 20 03 01 11 25 3C 00 00 00 8A BF
10 13 08 20 03 01 11 25 3D 00 00 00 FC 0B
10 13 08 20 03 01 11 25 3E 00 00 00 67 D7
10 13 08 20 03 01 11 25 3F 00 00 00 11 63
This is a late answer, but hopefully it can help with other CRC re-engineering tasks:
Your CRC is a derivation of the so-called "16 bit width CRC as designated by CCITT", but with "init value zero".
The CRC is calculated from byte position 3 to byte position 12 of your example data. e.g.
08 20 03 01 11 25 00 00 00 00
The full CRC specification according to our CRC specification overview is:
CRC:16,1021,0000,0000,No,No
The problem was not only finding the right CRC polynomial, but also answering the following questions:
Which part of the data is included in the CRC calculation, and which is not?
Which init value should be used? Is a final XOR value applied?
Does the algorithm expect reflected input or output values?
Again, see our manual description or the Boost CRC library on what this means.
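For readers who want to reproduce the result outside Docklight, the specification CRC:16,1021,0000,0000,No,No can be sketched in a few lines of Python. This is a minimal sketch and not the script used for this answer: the names crc16_ccitt_zero and build_goto_packet are made up for illustration, and the packet layout (a fixed 10 13 prefix excluded from the CRC, CRC appended high byte first) is inferred from the captured packets in the question.

```python
def crc16_ccitt_zero(data: bytes) -> int:
    """CRC-16, polynomial 0x1021, init 0x0000, no final XOR, no reflection."""
    crc = 0x0000
    for byte in data:
        crc ^= byte << 8              # feed the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_goto_packet(step: int) -> bytes:
    """Hypothetical helper: rebuild a 'goto' packet as seen in the capture."""
    prefix = bytes.fromhex("10 13")                # assumed to be outside the CRC
    body = bytes.fromhex("08 20 03 01 11 25") + step.to_bytes(4, "little")
    return prefix + body + crc16_ccitt_zero(body).to_bytes(2, "big")


if __name__ == "__main__":
    print(build_goto_packet(10).hex())
```

The capture for step 0x10 contains a doubled 10 byte, which this sketch does not attempt to model.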
What I did was run a brute-force script that simply tries out several popular 16 bit CRC polynomials with all kinds of combinations of start/end positions, initial values, and reflected versions. Here is how the processing output looked:
Finding CRC for test message (HEX): 10 13 08 20 03 01 11 25 00 00 00 00 E9 64
Trying CRC spec : CRC:16,1021,FFFF,0000,No,No
Trying CRC spec : CRC:16,8005,0000,0000,No,No
Trying CRC spec : CRC:16,8005,FFFF,0000,No,No
Trying CRC spec : CRC:16,1021,FFFF,FFFF,No,No
Trying CRC spec : CRC:16,1021,0000,FFFF,No,No
Trying CRC spec : CRC:16,1021,0000,0000,No,No
Found it!
Relevant sequence for checksum from startpos=3 to endpos=12
08 20 03 01 11 25 00 00 00 00
CRC spec: CRC:16,1021,0000,0000,No,No
CRC result: E9 64 (Integer = 59748)
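The same kind of search can be sketched in Python. This is not the Docklight script itself, only an illustration of the idea under some assumptions: the candidate polynomials, init values and XOR values are the ones that appear in the log above, the CRC is assumed to sit in the last 2 bytes with the high byte first, and the names reflect, crc16 and find_crc_specs are hypothetical. Requiring a match on several packets at once reduces accidental hits.

```python
from itertools import product


def reflect(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of value."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc16(data, poly, init, xorout, refin, refout):
    """Generic bitwise 16 bit CRC with optional input/output reflection."""
    crc = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)
        crc ^= byte << 8
        for _ in range(8):
            crc <<= 1
            if crc & 0x10000:
                crc ^= poly
            crc &= 0xFFFF
    if refout:
        crc = reflect(crc, 16)
    return crc ^ xorout


def find_crc_specs(packets):
    """Try every spec and every byte range; keep those matching all packets."""
    body_len = len(packets[0]) - 2        # assumes equal-length packets
    hits = []
    for poly, init, xorout, refin, refout in product(
            (0x1021, 0x8005), (0x0000, 0xFFFF), (0x0000, 0xFFFF),
            (False, True), (False, True)):
        for start in range(body_len):
            for end in range(start + 1, body_len + 1):
                if all(crc16(p[start:end], poly, init, xorout, refin, refout)
                       == int.from_bytes(p[-2:], "big") for p in packets):
                    hits.append((poly, init, xorout, refin, refout, start, end))
    return hits


if __name__ == "__main__":
    captured = [bytes.fromhex(h) for h in (
        "10 13 08 20 03 01 11 25 00 00 00 00 E9 64",
        "10 13 08 20 03 01 11 25 01 00 00 00 9F D0",
        "10 13 08 20 03 01 11 25 02 00 00 00 04 0C",
    )]
    for hit in find_crc_specs(captured):
        print(hit)
```

The start and end values here are 0-based slice indices, so start=2, end=12 corresponds to "startpos=3 to endpos=12" in the log.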
With this result I could recalculate the checksums of your example telegrams correctly:
19.09.2016 12:18:12.764 [TX] - 10 13 08 20 03 01 11 25 00 00 00 00 E9 64
19.09.2016 12:18:14.606 [TX] - 10 13 08 20 03 01 11 25 01 00 00 00 9F D0
19.09.2016 12:18:16.030 [TX] - 10 13 08 20 03 01 11 25 02 00 00 00 04 0C
I uploaded the documented CRC finder example script which works with the free Docklight Scripting V2.2 evaluation. I assume this can be very useful for other CRC re-engineering puzzles, too.
The example also helped to solve Stackoverflow question 22219796
I am currently using C# to retrieve frames from a borescope (via the FFMPEG library). However, I came across a problem weeks ago and I can't solve it.
The images are returned in JPEG format (since the borescope stream is MJPEG).
Some images come without quality problems, but others come with a strange line in the middle followed by random staining. (At the end of the question there is an example of a normal image and one with problems.)
Analyzing the structure of the files, I realized that there are some differences, but I don't really understand JPEG's binary structure very well, and I can't tell what is corrupted.
Getting to know what is corrupted in the image, which culminates in the quality problem, is very important to me because, through this, I can discard the frame using C#. However, without understanding this problem, I can't even discard the frame, much less fix it.
So, having the image without quality problems as a reference, what is the problem with the binary structure of the image with quality problems?
Examples:
JPEG 1: Image without quality problems
Image's preview (just to see the quality, do not download from here)
JPEG 2: Image with quality problems
Image's preview (just to see the quality, do not download from here)
It's possible to look into the binary structure of the images through online hex editors like: Online hex editor, Hexed or Hex-works.
Thank you for reading and have a nice day.
There are at least 2 issues with the file.
The first I can detect with ImageMagick by running this command:
magick identify -verbose image.jpg
and it tells me that the data segment ends prematurely.
Image: outExemplo0169.jpeg
Format: JPEG (Joint Photographic Experts Group JFIF format)
Mime type: image/jpeg
Class: DirectClass
Geometry: 640x480+0+0
Units: Undefined
Colorspace: sRGB
Type: TrueColor
Base type: Undefined
Endianess: Undefined
Depth: 8-bit
Channel depth:
Red: 8-bit
Green: 8-bit
Blue: 8-bit
Channel statistics:
Pixels: 307200
Red:
min: 0 (0)
max: 255 (1)
mean: 107.234 (0.420527)
standard deviation: 66.7721 (0.261851)
kurtosis: -0.67934
skewness: 0.577494
entropy: 0.92876
Green:
min: 0 (0)
:2020-02-26T18:59:19+00:00 0:00.057 0.070u 7.0.9 Resource identify[80956]: resource.c/RelinquishMagickResource/1067/Resource
Memory: 3686400B/0B/32GiB
identify: Corrupt JPEG data: premature end of data segment `outExemplo0169.jpeg' # warning/jpeg.c/JPEGWarningHandler/399.
The second I can see with exiftool when I run this command:
exiftool -v -v -v outExemplo0169.jpeg
ExifToolVersion = 11.11
FileName = outExemplo0169.jpeg
Directory = .
FileSize = 66214
FileModifyDate = 1582743337
FileAccessDate = 1582743559
FileInodeChangeDate = 1582743337
FilePermissions = 33188
FileType = JPEG
FileTypeExtension = JPG
MIMEType = image/jpeg
JPEG APP0 (14 bytes):
0006: 4a 46 49 46 00 01 01 00 00 01 00 01 00 00 [JFIF..........]
+ [BinaryData directory, 9 bytes]
| JFIFVersion = 1 1
| - Tag 0x0000 (2 bytes, int8u[2]):
| 000b: 01 01 [..]
| ResolutionUnit = 0
| - Tag 0x0002 (1 bytes, int8u[1]):
| 000d: 00 [.]
| XResolution = 1
| - Tag 0x0003 (2 bytes, int16u[1]):
| 000e: 00 01 [..]
| YResolution = 1
| - Tag 0x0005 (2 bytes, int16u[1]):
| 0010: 00 01 [..]
| ThumbnailWidth = 0
| - Tag 0x0007 (1 bytes, int8u[1]):
| 0012: 00 [.]
| ThumbnailHeight = 0
| - Tag 0x0008 (1 bytes, int8u[1]):
| 0013: 00 [.]
JPEG SOF0 (15 bytes):
0018: 08 01 e0 02 80 03 01 21 00 02 11 01 03 11 01 [.......!.......]
ImageWidth = 640
ImageHeight = 480
EncodingProcess = 0
BitsPerSample = 8
ColorComponents = 3
JPEG DQT (130 bytes):
002b: 00 03 03 03 03 03 03 04 03 03 03 04 04 04 05 06 [................]
003b: 09 06 06 05 05 06 0c 08 09 07 09 0e 0c 0e 0e 0d [................]
004b: 0c 0d 0d 0f 11 15 12 0f 10 14 10 0d 0d 13 19 13 [................]
005b: 14 16 17 18 18 18 0f 12 1a 1c 1a 17 1c 15 17 18 [................]
006b: 17 01 04 04 04 06 05 06 0b 06 06 0b 17 0f 0d 0f [................]
007b: 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 [................]
008b: 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 [................]
[snip 18 bytes]
JPEG DHT (416 bytes):
00b1: 00 00 01 05 01 01 01 01 01 01 00 00 00 00 00 00 [................]
00c1: 00 00 01 02 03 04 05 06 07 08 09 0a 0b 10 00 02 [................]
00d1: 01 03 03 02 04 03 05 05 04 04 00 00 01 7d 01 02 [.............}..]
00e1: 03 00 04 11 05 12 21 31 41 06 13 51 61 07 22 71 [......!1A..Qa."q]
00f1: 14 32 81 91 a1 08 23 42 b1 c1 15 52 d1 f0 24 33 [.2....#B...R..$3]
0101: 62 72 82 09 0a 16 17 18 19 1a 25 26 27 28 29 2a [br........%&'()*]
0111: 34 35 36 37 38 39 3a 43 44 45 46 47 48 49 4a 53 [456789:CDEFGHIJS]
[snip 304 bytes]
JPEG SOS
JPEG EOI
Unknown trailer (50 bytes at offset 0x10274):
10274: 42 6f 75 6e 64 61 72 79 45 42 6f 75 6e 64 61 72 [BoundaryEBoundar]
10284: 79 53 00 00 01 00 90 08 01 00 fb 4b db 6a 2a 22 [yS.........K.j*"]
10294: 00 00 2a 22 00 00 01 00 01 00 80 02 00 00 e0 01 [..*"............]
102a4: 00 00
So there are 50 extraneous bytes at the end including the text string "BoundaryEBoundaryS" which may be recognisable to you as coming from somewhere else in your processing chain?
One test you could do for JPEG quality is to check that the last 2 bytes are a valid EOI marker, which means the file should end in FF D9 - see here.
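The question is about C#, but the check is simple enough to show language-neutrally. Below is a minimal Python sketch of it; the function name check_jpeg_tail is made up for illustration. It only looks at the first and last bytes and at how much data follows the last FF D9, so it would flag the 50-byte trailer reported by exiftool but would not by itself detect the "premature end of data segment" problem, which needs an actual decode.

```python
SOI = b"\xff\xd8"   # start-of-image marker
EOI = b"\xff\xd9"   # end-of-image marker


def check_jpeg_tail(data: bytes):
    """Return (starts_with_soi, ends_with_eoi, bytes_after_last_eoi).

    bytes_after_last_eoi is None when no FF D9 is found at all. Note that
    FF D9 could also occur by chance inside trailing junk, so treat the
    count as a hint rather than proof.
    """
    last_eoi = data.rfind(EOI)
    trailer = len(data) - (last_eoi + 2) if last_eoi >= 0 else None
    return data.startswith(SOI), data.endswith(EOI), trailer


if __name__ == "__main__":
    clean = SOI + b"\x00\x01\x02\x03" + EOI
    dirty = clean + b"BoundaryEBoundaryS" + bytes(32)
    print(check_jpeg_tail(clean))
    print(check_jpeg_tail(dirty))
```

A frame for which the second value is False or the third is non-zero could be discarded or trimmed before further processing.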
In a proto stream, when decoding the file size and the data block length as varints, the data block length comes out bigger than the file size. Please help.
the data block length is bigger than the file size
That sounds unlikely, and is most likely the result of (any one of):
a broken encoder (very unlikely if you're using any of the established implementations, but not impossible)
corruption of the data-stream in transit (the most common variant of this being: using a text-encoding such as UTF-8 backwards to try to get a string - when if a string is really needed, something like base-16 (hex) or base-64 should be used)
accidentally truncating a stream prematurely
a broken decoder
It sounds like you're parsing the stream manually, so frankly my guess is the last option. If you can post the bytes you're decoding we can probably advise on whether your interpretation is correct. Alternatively, you can try pushing your data through https://protogen.marcgravell.com/decode which will pull apart the data-stream and show how it has interpreted the bytes.
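Since a broken hand-written decoder is the most likely culprit, here is a minimal Python sketch of base-128 varint decoding as used by protobuf, to compare against. The function name decode_varint is made up for illustration; it handles only unsigned varints (no zig-zag decoding) and makes no assumption about what the surrounding stream format is.

```python
def decode_varint(buf: bytes, pos: int = 0):
    """Decode one unsigned varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not byte & 0x80:                # high bit clear: last byte
            return result, pos
        shift += 7                         # least significant group comes first
        if shift >= 64:
            raise ValueError("varint too long")


if __name__ == "__main__":
    value, next_pos = decode_varint(bytes.fromhex("AC 02"))
    print(value, next_pos)
```

A common mistake is to combine the 7-bit groups most significant first; the groups arrive least significant first, so the two bytes AC 02 decode to 300.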
00 E4 02 00 A5 0A F0 3E 1E 08 C9 3F 12 19 08 05 12 03 01 00 05 18 75 2A 0E AE 44 93 42 AF 44 BF 3F B2 44 B1 44 B0 44 0A 03 08 93 42 22 33 12 31 42 2F 0A 0A 54 72 61 6E 73 69 74 69 6F 6E 12 04 6E 6F 6E 65 19 00 00 01 02 08 F0 3F 29 01 07 01 01 54 30 00 58 EB D4 FF AA 07 80 01 00 2A 03 08 AF 44 32 03 08 AE 44 3A 0D 05 78 AF 44 8A 01 03 08 BF 3F 98 01 01 A2 01 03 08 B0 44 DA 01 03 08 B2 44 A2 02 03 08 B1 44 D2 02 01 26 01 06 28 AF 44 1F 08 AE 44 12 1A 08 07 12 05 94 A0 C3 01 22 08 0A 04 0A 02 01 06 18 01 2A 04 B3 44 DC 3B 0A BE 01 0A AF 01 0A 3E 0A 1F 0A 0A 0D 00 00 C8 42 15 00 00 5A 43 12 01 0C E0 4E 44 15 00 00 E1 43 18 03 25 00 00 00 00 12 03 08 C9 3F 1A 12 08 04 10 02 18 01 25 00 00 40 41 2D 00 00 00 3F 30 00 28 00 38 00 12 03 08 DC 3B 1A 68 08 00 10 00 2A 62 12 1D 3D 18 1A 54 0A 0E 08 01 12 01 12 08 00 00 15 01 48 0C 0A 0E 08 02 05 10 05 5F 00 00 3A 10 00 04 E1 43 15 20 05 30 14 E1 43 0A 02 08 05 3E 44 00 14 12 03 08 B3 44 22 01 05 30 30 01 10 03 15 08 B3 44 12 10 08 D1 0F 09 E4 F0 4A 6C 2A 04 83 42 C7 41 12 03 08 A8 3F 1A 1F 41 66 74 65 72 20 4B 65 79 6E 6F 74 65 20 38 3F 29 0A 62 75 6C 6C 65 74 20 77 6F 6D 62 61 74 2A 0D 0A 07 08 00 12 03 08 83 42 0A 02 08 12 32 08 0A 06 08 00 10 00 18 00 3A 09 0A 07 08 01 69 10 C7 41 50 01 72 15 17 04 9A 01 05 0B 10 12 02 65 6E C2 09 0B 18 10 00 18 00 1F 08 AF 62 65 01 0C C0 41 B5 44 4E 65 01 04 A0 41 39 06 00 40 9E 65 01 08 C0 41 1A 42 65 01 00 40 C2 65 01 00 40 3A 65 01 00 40 5E 65 01 00 B5 21 65 00 B5 21 65 0C 02 15 08 B5 2E 65 01 18 4F 2A 04 AE 3B A2 40 29 65 08 06 57 6F 25 4C 31 33 08 A2 40 32 25 26 21 1B 00 3A 11 15 04 AE 3B 86 48 01 18 12 08 B2 44 12 0D 08 2D C9 0C 05 2A 02 B4 61 35 10 B4 44 15 08 B4 2E 7D 00 00 3E 01 7D 0C AC 3B 08 04 05 7F 15 77 04 AC 3B 82 77 00 3D B4 20 0F 08 B1 44 12 0A 08 E7 17 49 36 0C 00 1F 08 B0 62 C4 01 08 DB 40 B6 3E C4 01 18 00 44 15 00 80 3D 44 25 C4 20 48 42 15 00 00 E8 41 18 00 92 29 03 04 DB 40 32 C4 01 0D 3D 6A 29 03 05 22 7D 29 05 10 04 E8 41 36 29 03 04 E8 41 5A 29 03 00 B6 21 C4 00 
B6 21 C4 18 01 17 08 B6 44 12 12 75 29 20 4C 2A 06 AE 3B C0 40 B7 44 25 47 10 1A 03 EF BF BC 35 4C 00 C0 56 C3 01 00 4A 31 CE 04 B7 44 2E CE 01 32 57 01 00 B7 21 57 00 FB 2D C4 44 11 0A 04 0A 00 10 00 10 00 22 07 64 65 63 69 6D 61 6C
Decoding the first six bytes
I've successfully been able to retrieve the card number and expiry date from a contactless debit/credit card. However, the cardholder name is not being returned in the READ RECORD command response. Am I missing something?
- Select Application
# IN_DATA_EXCHANGE
>> D4 40 01 00 A4 04 00 07 A0 00 00 00 03 10 10 00
<< D5 41 00 6F 43 84 07 A0 00 00 00 03 10 10 A5 38 50 10 56 69 73 61 20 20 20 20 20 20 20 20 20 20 20 20 9F 38 18 9F 66 04 9F 02 06 9F 03 06 9F 1A 02 95 05 5F 2A 02 9A 03 9C 01 9F 37 04 BF 0C 08 9F 5A 05 31 08 26 08 26 90 00
- Read the card
# IN_DATA_EXCHANGE
>> D4 40 01 00 B2 01 0C 00
<< D5 41 00 70 12 57 10 XX XX XX XX XX XX XX XX D1 50 52 01 00 00 00 01 90 00
It's not uncommon for an EMV payment card to not reveal the cardholder name over the contactless interface. In fact, all major brands have introduced this as a privacy feature. On many cards the cardholder name field (tag 5F20) is present but filled with a string like " /" to indicate that the cardholder name is not to be revealed. At least for Visa cards (like the one you have above) the cardholder name field is not mandatory (and if it's not present, its value should be assumed to be " /"). You might also want to check other records/files on the card. Some cards also provide this field only in response to the GET PROCESSING OPTIONS command.
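To check each response for tag 5F20, it helps to split the BER-TLV data into tags. The following is a minimal Python sketch under a few assumptions: the function name parse_tlv is made up for illustration, the reader framing (D5 41 00) and the status word (90 00) have already been stripped, and duplicate tags simply overwrite each other. The sample bytes are the SELECT response from the question.

```python
def parse_tlv(data: bytes) -> dict:
    """Flatten BER-TLV data into {tag_hex: value}, recursing into templates."""
    found = {}
    pos = 0
    while pos < len(data):
        first = data[pos]
        tag_end = pos + 1
        if first & 0x1F == 0x1F:            # tag continues in following bytes
            while data[tag_end] & 0x80:
                tag_end += 1
            tag_end += 1
        tag = data[pos:tag_end].hex().upper()
        length = data[tag_end]
        pos = tag_end + 1
        if length & 0x80:                   # long form: next n bytes hold the length
            n = length & 0x7F
            length = int.from_bytes(data[pos:pos + n], "big")
            pos += n
        value = data[pos:pos + length]
        pos += length
        if first & 0x20:                    # constructed: value holds nested TLVs
            found.update(parse_tlv(value))
        else:
            found[tag] = value
    return found


if __name__ == "__main__":
    fci = bytes.fromhex(
        "6F 43 84 07 A0 00 00 00 03 10 10 A5 38 50 10 56 69 73 61 20 20 20 20"
        " 20 20 20 20 20 20 20 20 9F 38 18 9F 66 04 9F 02 06 9F 03 06 9F 1A 02"
        " 95 05 5F 2A 02 9A 03 9C 01 9F 37 04 BF 0C 08 9F 5A 05 31 08 26 08 26")
    tags = parse_tlv(fci)
    print(sorted(tags))
    print("5F20 present:", "5F20" in tags)
```

Running the same function over every READ RECORD response (and the GET PROCESSING OPTIONS response) shows whether the card exposes 5F20 anywhere.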
I am writing code to parse the MFT of NTFS. I'm trying to analyse the data runs of a non-resident $INDEX_ALLOCATION attribute:
11 01 2C 11 02 FE 11 00
9F 0B 21 01 DB 00 21 01
D9 00 21 01 E0 00 21 01
F6 00 21 01 10 01 00 F1
After regrouping, I see a problem in Data Run No 3:
DataRun 1: 11 01 2C
DataRun 2: 11 02 FE
DataRun 3: 11 00 9F <- what does "00" mean?
I tried analysing it with Active Disk Editor 3, and this software decomposes it into:
DataRun 3: 11 00 9F 0B
In my opinion, the header of DataRun 3 ("11") means 1 length byte and 1 offset byte, so there should be 2 bytes after the header, but there are 3 bytes.
Any idea?
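For reference, the header interpretation described above can be written down as a short Python sketch. It assumes the commonly documented NTFS run-list layout (low nibble of the header = size of the length field, high nibble = size of the offset field, offsets signed and relative to the previous run, a 00 header terminating the list). The function name decode_data_runs is made up for illustration, and the sketch does not explain the third run in the dump above.

```python
def decode_data_runs(raw: bytes):
    """Decode an NTFS run list into (cluster_count, start_lcn) tuples.

    start_lcn is None for a sparse run (offset field of size 0).
    """
    runs = []
    pos = 0
    lcn = 0
    while pos < len(raw) and raw[pos] != 0x00:
        header = raw[pos]
        len_size = header & 0x0F          # bytes used by the length field
        off_size = header >> 4            # bytes used by the offset field
        pos += 1
        length = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        offset = int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
        pos += off_size
        lcn += offset                     # offsets are relative to the previous run
        runs.append((length, None if off_size == 0 else lcn))
    return runs


if __name__ == "__main__":
    # First two runs from the dump above, followed by a terminator.
    print(decode_data_runs(bytes.fromhex("11 01 2C 11 02 FE 00")))
```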
I'd like to screen some jpegs for validity before I send them across the network for more extensive inspection. It is easy enough to check for a valid header and footer, but what is the smallest size (in bytes) a valid jpeg could be?
A 1x1 grey pixel in 125 bytes using arithmetic coding, still in the JPEG standard even if most decoders can't decode it:
ff d8 ; SOI
ff e0 ; APP0
00 10
4a 46 49 46 00 01 01 01 00 48 00 48 00 00
ff db ; DQT
00 43
00
03 02 02 02 02 02 03 02
02 02 03 03 03 03 04 06
04 04 04 04 04 08 06 06
05 06 09 08 0a 0a 09 08
09 09 0a 0c 0f 0c 0a 0b
0e 0b 09 09 0d 11 0d 0e
0f 10 10 11 10 0a 0c 12
13 12 10 13 0f 10 10 10
ff c9 ; SOF
00 0b
08 00 01 00 01 01 01 11 00
ff cc ; DAC
00 06 00 10 10 05
ff da ; SOS
00 08
01 01 00 00 3f 00 d2 cf 20
ff d9 ; EOI
I don't think the mentioned 134 byte example is standard, as it is missing an EOI. All decoders will handle this but the standard says it should end with one.
That file can be generated with:
#!/usr/bin/env bash
printf '\xff\xd8' # SOI
printf '\xff\xe0' # APP0
printf '\x00\x10'
printf '\x4a\x46\x49\x46\x00\x01\x01\x01\x00\x48\x00\x48\x00\x00'
printf '\xff\xdb' # DQT
printf '\x00\x43'
printf '\x00'
printf '\x03\x02\x02\x02\x02\x02\x03\x02'
printf '\x02\x02\x03\x03\x03\x03\x04\x06'
printf '\x04\x04\x04\x04\x04\x08\x06\x06'
printf '\x05\x06\x09\x08\x0a\x0a\x09\x08'
printf '\x09\x09\x0a\x0c\x0f\x0c\x0a\x0b'
printf '\x0e\x0b\x09\x09\x0d\x11\x0d\x0e'
printf '\x0f\x10\x10\x11\x10\x0a\x0c\x12'
printf '\x13\x12\x10\x13\x0f\x10\x10\x10'
printf '\xff\xc9' # SOF
printf '\x00\x0b'
printf '\x08\x00\x01\x00\x01\x01\x01\x11\x00'
printf '\xff\xcc' # DAC
printf '\x00\x06\x00\x10\x10\x05'
printf '\xff\xda' # SOS
printf '\x00\x08'
printf '\x01\x01\x00\x00\x3f\x00\xd2\xcf\x20'
printf '\xff\xd9' # EOI
and opened fine with GNOME Image Viewer 3.38.0 and GIMP 2.10.18 on Ubuntu 20.10.
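The byte count and segment layout of the 125-byte file can be double-checked with a small marker walker. This is a minimal Python sketch, with list_segments being a made-up name; it only walks length-prefixed segments up to SOS and returns whatever follows (entropy-coded data plus EOI) untouched.

```python
def list_segments(data: bytes):
    """Return ([(marker, payload_len), ...] up to SOS, remaining_bytes)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI")
    segments = [(0xD8, 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        length = int.from_bytes(data[pos + 2:pos + 4], "big")  # includes itself
        segments.append((marker, length - 2))
        pos += 2 + length
        if marker == 0xDA:        # SOS: scan data follows, stop walking
            break
    return segments, data[pos:]


TINY_JPEG = bytes.fromhex(
    "ff d8"
    "ff e0 00 10 4a 46 49 46 00 01 01 01 00 48 00 48 00 00"
    "ff db 00 43 00"
    "03 02 02 02 02 02 03 02 02 02 03 03 03 03 04 06"
    "04 04 04 04 04 08 06 06 05 06 09 08 0a 0a 09 08"
    "09 09 0a 0c 0f 0c 0a 0b 0e 0b 09 09 0d 11 0d 0e"
    "0f 10 10 11 10 0a 0c 12 13 12 10 13 0f 10 10 10"
    "ff c9 00 0b 08 00 01 00 01 01 01 11 00"
    "ff cc 00 06 00 10 10 05"
    "ff da 00 08 01 01 00 00 3f 00 d2 cf 20"
    "ff d9"
)

if __name__ == "__main__":
    segs, rest = list_segments(TINY_JPEG)
    print(len(TINY_JPEG), [hex(m) for m, _ in segs], rest.hex())
```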
Here's an upload on Imgur. Note that Imgur processes the file, making it larger, so the copy you download to check will not be byte-identical. As seen below, the width=100 image shows white on Chromium 87:
It occurs to me that you could make a progressive JPEG with only the DC coefficients, so that a single grey pixel could be encoded in 119 bytes. This reads just fine in a few programs I've tried it in (Photoshop, GNOME Image Viewer 3.38.0, GIMP 2.10.18, and others).
ff d8 ; SOI
ff db ; DQT
00 43
00
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01
ff c2 ; SOF
00 0b
08 00 01 00 01 01 01 11 00
ff c4 ; DHT
00 14
00
01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
03
ff da ; SOS
00 08
01 01 00 00 00 01 3F
ff d9 ; EOI
The main space saving comes from having only one Huffman table. Although this is slightly smaller than the 125 byte arithmetic encoding given in another answer, the arithmetic encoding without the JFIF header would be smaller yet (107 bytes), so that should still be considered the smallest known.
The above file can be generated with:
#!/usr/bin/env bash
printf '\xff\xd8' # SOI
printf '\xff\xdb' # DQT
printf '\x00\x43'
printf '\x00'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\x01\x01\x01\x01\x01\x01\x01\x01'
printf '\xff\xc2' # SOF
printf '\x00\x0b'
printf '\x08\x00\x01\x00\x01\x01\x01\x11\x00'
printf '\xff\xc4' # DHT
printf '\x00\x14'
printf '\x00'
printf '\x01\x00\x00\x00\x00\x00\x00\x00'
printf '\x00\x00\x00\x00\x00\x00\x00\x00'
printf '\x03'
printf '\xff\xda' # SOS
printf '\x00\x08'
printf '\x01\x01\x00\x00\x00\x01\x3F'
printf '\xff\xd9' # EOI
Try the following (134 bytes):
FF D8 FF E0 00 10 4A 46 49 46 00 01 01 01 00 48 00 48 00 00
FF DB 00 43 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF C2 00 0B 08 00 01 00 01 01 01
11 00 FF C4 00 14 10 01 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 FF DA 00 08 01 01 00 01 3F 10
Source: Worlds Smallest, Valid JPEG? by Jesse_hz
Found "the tiniest GIF ever" with only 26 bytes.
47 49 46 38 39 61 01 00 01 00
00 ff 00 2c 00 00 00 00 01 00
01 00 00 02 00 3b
Python literal:
b'GIF89a\x01\x00\x01\x00\x00\xff\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x00;'
While I realize this is far from the smallest valid JPEG and has little or nothing to do with your actual question, I felt I should share it, as I had been looking for a very small JPEG that actually looked like something, to do some testing with, when I found your question. I'm sharing it here because it's valid, it's small, and it makes me ROFL.
Here is a 384 byte JPEG image that I made in Photoshop. It is the letters ROFL, hand drawn by me and then saved with max compression settings while still being sort of readable.
Hex sequences:
use URI::Escape; # provides uri_escape
my @image_hex = qw{
FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 64
00 64 00 00 FF EC 00 11 44 75 63 6B 79 00 01 00
04 00 00 00 00 00 00 FF EE 00 0E 41 64 6F 62 65
00 64 C0 00 00 00 01 FF DB 00 84 00 1B 1A 1A 29
1D 29 41 26 26 41 42 2F 2F 2F 42 47 3F 3E 3E 3F
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 01 1D 29 29
34 26 34 3F 28 28 3F 47 3F 35 3F 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 47 47 47
47 47 47 47 47 47 47 47 47 47 47 47 47 FF C0 00
11 08 00 08 00 19 03 01 22 00 02 11 01 03 11 01
FF C4 00 61 00 01 01 01 01 00 00 00 00 00 00 00
00 00 00 00 00 00 04 02 05 01 01 01 01 00 00 00
00 00 00 00 00 00 00 00 00 00 00 02 04 10 00 02
02 02 02 03 01 00 00 00 00 00 00 00 00 00 01 02
11 03 00 41 21 12 F0 13 04 31 11 00 01 04 03 00
00 00 00 00 00 00 00 00 00 00 00 00 21 31 61 71
B1 12 22 FF DA 00 0C 03 01 00 02 11 03 11 00 3F
00 A1 7E 6B AD 4E B6 4B 30 EA E0 19 82 39 91 3A
6E 63 5F 99 8A 68 B6 E3 EA 70 08 A8 00 55 98 EE
48 22 37 1C 63 19 AF A5 68 B8 05 24 9A 7E 99 F5
B3 22 20 55 EA 27 CD 8C EB 4E 31 91 9D 41 FF D9
}; # This is a very tiny JPEG: an image representation of the letters "ROFL", hand drawn by me in Photoshop and then saved at the lowest possible quality settings where the letters could still be made out :)
my $image_data = pack('H2' x scalar(@image_hex), @image_hex);
my $url_escaped_image = uri_escape( $image_data );
URL escaped binary image data (can paste right into a URL)
%FF%D8%FF%E0%00%10JFIF%00%01%02%00%00d%00d%00%00%FF%EC%00%11Ducky%00%01%00%04%00%00%00%00%00%00%FF%EE%00%0EAdobe%00d%C0%00%00%00%01%FF%DB%00%84%00%1B%1A%1A)%1D)A%26%26AB%2F%2F%2FBG%3F%3E%3E%3FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG%01%1D))4%264%3F((%3FG%3F5%3FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG%FF%C0%00%11%08%00%08%00%19%03%01%22%00%02%11%01%03%11%01%FF%C4%00a%00%01%01%01%01%00%00%00%00%00%00%00%00%00%00%00%00%00%04%02%05%01%01%01%01%00%00%00%00%00%00%00%00%00%00%00%00%00%00%02%04%10%00%02%02%02%02%03%01%00%00%00%00%00%00%00%00%00%01%02%11%03%00A!%12%F0%13%041%11%00%01%04%03%00%00%00%00%00%00%00%00%00%00%00%00%00!1aq%B1%12%22%FF%DA%00%0C%03%01%00%02%11%03%11%00%3F%00%A1~k%ADN%B6K0%EA%E0%19%829%91%3Anc_%99%8Ah%B6%E3%EAp%08%A8%00U%98%EEH%227%1Cc%19%AF%A5h%B8%05%24%9A~%99%F5%B3%22%20U%EA'%CD%8C%EBN1%91%9DA%FF%D9
Here's the C++ routine I wrote to do this:
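#include <cstddef>  // std::size_t
#include <cstring>  // std::memcmp

// Quick sanity check: SOI marker followed by a JFIF or Exif identifier.
bool is_jpeg(const unsigned char* img_data, std::size_t size)
{
    return img_data &&
           (size >= 10) &&
           (img_data[0] == 0xFF) &&
           (img_data[1] == 0xD8) &&
           ((std::memcmp(img_data + 6, "JFIF", 4) == 0) ||
            (std::memcmp(img_data + 6, "Exif", 4) == 0));
}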
img_data points to a buffer containing the JPEG data.
I'm sure you need more bytes to have a JPEG that will decode to a useful image, but it's a fair bet that if the first 10 bytes pass this test, the buffer probably contains a JPEG.
EDIT: You can, of course, replace the 10 above with a higher value once you decide on one. 134, as suggested in another answer, for example.
It is not a requirement that JPEGs contain either a JFIF or Exif marker. But they must start with FF D8, and they must have a marker following that, so you can check for FF D8 FF.