What is the relationship between GS1 standard barcodes and generic barcode symbols?

We are implementing GS1 linear barcodes (Code 128, Code 39) and GS1 2D barcodes (Data Matrix, PDF41 [sic]), and also GS1 DataBar barcodes, in our application. Since I am new to this, I have a few questions regarding GS1-type barcodes.
What is the relation between GS1 standard barcodes and generic barcode symbols?
Can any text be made into GS1-type barcode (e.g. GS1 DataMatrix) or does the text have to follow a certain mandatory format?
Thanks in advance.

A quick nit: there is no such format as GS1 Code 39, nor "PDF41" (I presume you meant PDF417, but as with Code 39 it is not a GS1-adopted carrier symbology).
With the exception of the purpose-designed GS1 DataBar family of barcodes, the remaining GS1 barcode symbologies are "application standards" of various general-purpose ISO/IEC barcode standards, adapted for dedicated use within the supply chain industry and specified by the GS1 General Specifications. Each format was created as a specialisation of a pre-existing carrier symbology (Code 128, Data Matrix, QR Code): it works within the limits of the existing specification to produce a more restricted, special-purpose variant optimised for its particular application, for example by reducing the range of available symbol sizes, applying constraints on the data capacity and specifying a particular structure for the encapsulated data.
With regard to having a mandatory data structure: where the carrier symbology supports it, the GS1 specialisations mandate the use of the "FNC1 in first position" mechanism to indicate the presence of data that conforms to the GS1 Application Identifier standard format described by the GS1 specifications. The product data is thus represented within this standard format and encoded within the carrier symbol using a scheme that is broadly similar across the GS1 symbologies.
The "extraction" part of this answer gives details of formatting data according to the GS1 AI standard format structure.


Understanding Barcodes of Variant Products

This is an exceptionally general question that likely has a yes/no answer.
Let's say we have a line of shoes in our retail store.
Size 5 and Size 6 both have different assigned barcodes, as I've learned is the standard.
Great, we can now track them as different products.
Barcodes have a manufacturer identifier on the left, and a product identifier on the right.
My question: if we look at the barcodes for the Size 5 and Size 6 of our shoes, can we ever know that they are both from the same line of product? Just from the barcode?
As far as I can see, there is no such information within barcodes. The two products are simply variants, yet their barcodes make them appear completely different. One could be a shoe, and one could be a pack of birthday balloons.
Or, can we tell, from a barcode, that two products are actually variants (in this case, sizes) of the same product?
We could, of course, do a barcode lookup with an API, but there does not seem to be, in any of the JSON data I've looked at, any way to associate them with each other. Looking at MPN numbers, also, this does not seem to be a thing.
Titles can be similar, but they are rarely exactly the same.
Welcome to SO. I work with barcodes of different kinds.
The question you have does in principle have a yes/no answer, if you think of regular "retail" barcodes such as EAN-13 or UPC. These, as you correctly say, have four parts: [country][company][item][checkdigit]. In this way, each product is unique by itself and unrelated to other products.
But not all barcodes are like this. Other barcodes may contain data identifiers in a structured way; the most common are the GS1 Application Identifiers. Different barcodes ("symbologies") can carry this structure: common variants are GS1-128 (formerly EAN128), GS1 DataMatrix and GS1 QR Code, but others may be used as well. Using this logic, you can create a relation between your products, using things like a LOT code or internal company numbers.
Finally, nothing prevents you from creating your own barcoding strategy using a separate code on the product/box. It could be a code128 with a structure you build yourself, such as product family, type, ...
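As a concrete illustration of reading such structured data back, here is a small Python sketch that parses a scanner's GS1 element string into an {AI: value} dict so two items can be matched on a shared field. The tiny AI table, the parse_gs1 helper and the GTIN/lot values are all made up for the example.

    # Sketch only: parse a GS1 element string (FNC1 transmitted as ASCII GS,
    # 0x1D) into {AI: value}. Handles only a few two-digit AIs; real AIs can
    # be 2-4 digits and the full table lives in the GS1 General Specifications.
    GS = "\x1d"
    FIXED_LENGTH = {"01": 14, "17": 6}   # GTIN, expiry date
    VARIABLE = {"10", "21"}              # lot code, serial number

    def parse_gs1(data):
        result, i = {}, 0
        while i < len(data):
            ai = data[i:i + 2]
            i += 2
            if ai in FIXED_LENGTH:
                result[ai] = data[i:i + FIXED_LENGTH[ai]]
                i += FIXED_LENGTH[ai]
            elif ai in VARIABLE:
                end = data.find(GS, i)
                end = len(data) if end == -1 else end
                result[ai] = data[i:end]
                i = end + 1
            else:
                raise ValueError(f"unknown AI {ai!r}")
        return result

    # Two shoe variants: different (illustrative) GTINs, with a shared
    # internal family code carried in the lot field, as suggested above.
    size5 = parse_gs1("0109506000134352" + "10SHOE-FAM-7")
    size6 = parse_gs1("0109506000134418" + "10SHOE-FAM-7")
    print(size5["01"] != size6["01"], size5["10"] == size6["10"])  # True True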

Accuracy of barcodes

In books (e.g. "Barcodes for Mobile Devices", ISBN 978-0-521-88839-4), papers (e.g. "Bar Codes May Have Poorer Error Rates Than Commonly Believed", DOI: 10.1373/clinchem.2010.153288) and websites, figures for the accuracy or error rates of barcodes are given.
The numbers quoted for e.g. Code 39 vary from 1 error in 1.7 million through 1 error in 3 million to 1 error in 4.5 million.
Where do these numbers come from, and how can one calculate them (e.g. for Code 39)?
I also couldn't find useful information in the definition of Code 39 in ISO/IEC 16388:2007.
The "error rate" these numbers describe is the read error rate, i.e. how often a barcode may be read incorrectly when scanned. In order for barcodes to be useful this needs to be a very low value and so barcode formats that have lower read error rates are potentially better (although there are other factors involved as well).
These numbers are presumably determined by empirical testing. On the website you linked to there is a further link to a study by Ohio University that describes the methodology they used, which is an example of how this can be done:
An automated test apparatus was constructed and used for the test. The apparatus included a robot which loaded carrier sheets onto oscillating stages that were moved under four fixed mounted, “hand held” moving beam, visible laser diode bar code scanners. Scanner output was a series of digital pulses. Decoding of all symbols was performed in a computer using software programs based on standard reference decode algorithms. Each symbol was scanned by each scanner until 283 decodes were obtained. [...] An error occurred and was recorded whenever the decoded data did not match the encoded data for a given symbol.
Go to the barcodefaq site you linked to and click on a barcode type, e.g. UPC, and you will get a PDF that explains the methodology used. The cited article explains the errors encountered and contains links to further information.
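To make the arithmetic behind such figures explicit, here is a small Python sketch (my own illustration, not the study's code): the point estimate is simply errors divided by total decodes, and a Wilson score interval shows how wide the uncertainty remains even after millions of scans.

    import math

    # Toy arithmetic, not the study's code: estimate a read error rate from
    # (errors, decodes) and attach a 95% Wilson score interval, which stays
    # sensible even when the error count is tiny.
    def error_rate_with_interval(errors, decodes, z=1.96):
        p = errors / decodes
        denom = 1 + z * z / decodes
        centre = (p + z * z / (2 * decodes)) / denom
        half = (z / denom) * math.sqrt(p * (1 - p) / decodes
                                       + z * z / (4 * decodes * decodes))
        return p, max(centre - half, 0.0), centre + half

    # E.g. 2 read errors observed in 6 million decodes:
    p, lo, hi = error_rate_with_interval(2, 6_000_000)
    print(f"about 1 error in {1 / p:,.0f} decodes "
          f"(95% CI: 1 in {1 / hi:,.0f} to 1 in {1 / lo:,.0f})")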

How to add redundancy into an OCR-scanned code

This is more of an algorithm question - I am not very mathematical, so I was looking for an engineering solution... If this is off topic for SO let me know and I will delete the question.
I created a mashup of open source goodness to do Optical Character Recognition on difficult backgrounds: https://github.com/metalaureate/tesseract-docker-ocr
I want to use it to scan labels with a pre-defined ID code, e.g., 2826672. The accuracy is about 70% for digits.
Question: how do I add redundancy programmatically to my code to increase accuracy to 99%, and how do I decode it? I can imagine some really kludgy ways, like doubling and inverting the digits, but I don't know how to do this in a way that honors information theory without my having to translate a lot of math.
How do I add and decode digits to correct for OCR errors?
If you have the freedom of actually printing the labels, then there's no real reason to stick with plain ol' numbers. Use QR codes instead. Both the size (information capacity) and the information redundancy are configurable, so you can customize them to fit your specific scenario. Internally, Reed-Solomon error correction is used. There are plenty of libraries for both QR code generation and recognition from a scan.
Further info is available in Wikipedia.
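If the label has to remain a plain string of digits (so a QR code is not an option), the same Reed-Solomon idea can be applied to the characters themselves. Below is a minimal sketch assuming the third-party reedsolo package; rendering the binary parity bytes as printable digits is a gap the comments flag but do not solve.

    # Sketch only, assuming the third-party 'reedsolo' package
    # (pip install reedsolo). nsym=6 parity bytes let the decoder repair up
    # to 3 corrupted bytes (or 6 erasures at known positions).
    from reedsolo import RSCodec

    rsc = RSCodec(6)
    encoded = rsc.encode(b"2826672")     # 7 data bytes + 6 parity bytes

    garbled = bytearray(encoded)
    garbled[1] = ord("8")                # simulate two OCR misreads
    garbled[4] = ord("0")

    result = rsc.decode(bytes(garbled))  # newer reedsolo versions return a tuple
    decoded = result[0] if isinstance(result, tuple) else result
    print(bytes(decoded).decode("ascii"))  # -> 2826672

    # Caveat: the parity bytes are arbitrary binary values; an OCR pipeline
    # would need them re-encoded as printable digits (e.g. two decimal
    # digits per byte) and mapped back before decoding.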

Accuracy of barcode vs qrcode?

I want to develop a supermarket application for checking and billing.
Should I use barcodes or qrcodes? Which will give better accuracy?
The biggest difference here is that a linear barcode (e.g. Code 3 of 9, UPC, EAN, etc.) and a 2-dimensional symbology (e.g. QRCode, DataMatrix, etc.) store data in very different ways. A linear barcode can be read with a simple laser scanner, while most 2-D symbologies require an imager in order to be read. In general, imagers can also read linear barcodes, but they are more expensive than laser scanners.
You will want to consider whether your customers may already have linear scanners only, or whether they would be willing to pay the premium for an imager in order to get the benefit of the extra data that can be encoded in the 2-D symbologies.
Both will be accurate; the question is how much data you need to store. A QR code has much more capacity than something like a 3of9 barcode.
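If you go the QR route, note that the error-correction level is a choice you make when generating the symbol. Here is a minimal sketch assuming the third-party qrcode package (any generator library exposes an equivalent setting); the payload format is made up for the example.

    # Sketch using the third-party 'qrcode' package (pip install qrcode[pil]).
    # Level H tolerates roughly 30% symbol damage; L/M/Q/H trade capacity
    # for robustness.
    import qrcode

    qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
    qr.add_data("ITEM:4006381333931;PRICE:2.49")  # hypothetical payload
    qr.make(fit=True)
    qr.make_image().save("shelf_label.png")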

How does PDF417 barcode decoding recover from damaged labels?

I recently learned about PDF417 barcodes and I was astonished that I can still read the barcode after I ripped it in half and scanned only a fragment of the original label.
How can the barcode decoding be that robust? Which (types of) algorithms are used during encoding and decoding?
EDIT: I understand the general philosophy of introducing redundancy to create robustness, but I'm interested in more details, i.e. how this is done with PDF417.
The PDF417 format allows for varying levels of duplication/redundancy in its content. The level of redundancy used will affect how much of the barcode can be obscured or removed while still leaving the contents readable.
PDF417 by itself does not impose anything here; it is a specification for encoding data.
I think there is some confusion between the barcode format and the data it conveys.
The various barcode formats (PDF417, Aztec, DataMatrix) specify a way to encode data, be it numerical, alphabetic or binary... the exact content, though, is left unspecified.
From what I have seen, Reed-Solomon is often the algorithm used for redundancy. The exact level of redundancy is up to you with this algorithm, and there are libraries for it at least in Java and C, from what I've been dealing with.
Now it is up to you to specify what the exact content of your barcode should be, including the algorithm used for redundancy and the parameters used by that algorithm. And of course you'll need to work hand in hand with those who are going to decode it :)
Note: QR seems slightly different, with explicit zones for redundancy data.
I don't know PDF417 in detail. I know that QR codes use Reed-Solomon correction. It is an oversampling technique. To get the concept: suppose you have a polynomial of degree 6. Technically, you need seven points to describe this polynomial uniquely, so you can perfectly transmit the information about the whole polynomial with just seven points. However, if one of these seven is corrupted, you lose the whole message. To work around this issue, you extract a larger number of points from the polynomial and write them all down. As long as at least seven of them survive, they are enough to reconstruct your original information.
In other words, you trade space for robustness, by introducing more and more redundancy. Nothing new here.
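A toy demonstration of that polynomial idea (my own illustration; real Reed-Solomon codewords live in a finite field, not the rationals): sample a degree-2 polynomial at five points, throw two away, and reconstruct everything from the surviving three by Lagrange interpolation.

    from fractions import Fraction

    # Toy model of the oversampling idea above. A degree-2 polynomial is
    # fixed by any 3 points, so sampling it at 5 points tolerates losing 2.
    coeffs = [7, 3, 5]                                  # p(x) = 7 + 3x + 5x^2

    def p(x):
        return sum(c * x**k for k, c in enumerate(coeffs))

    points = [(x, p(x)) for x in range(5)]              # transmit 5 samples
    survivors = [points[0], points[2], points[4]]       # two were destroyed

    def lagrange_eval(pts, x):
        """Value at x of the unique polynomial through the given points."""
        total = Fraction(0)
        for i, (xi, yi) in enumerate(pts):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(pts):
                if i != j:
                    term *= Fraction(x - xj, xi - xj)
            total += term
        return total

    # Recover the lost samples from the three survivors:
    print([int(lagrange_eval(survivors, x)) for x in range(5)])
    # -> [7, 15, 33, 61, 99], identical to the original 5 samples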
I do not think the trade-off between space and robustness is any different here than anywhere else. Think RAID, say RAID 5: you can yank a disk out of the array and the data is still available. The price? An extra disk. Or, in terms of the barcode, the extra space the label occupies.
