I'm hoping that someone here has worked with UPC/Coupon barcodes before. Part of every UPC/Coupon barcode is a manufacturer code. I need to build a database of every manufacturer code. Does anyone have any idea where to find a database of these codes?
A quick Google search for "upc database" turned up several different ones. Sadly, you are probably not going to find a definitive database. Some stores use their own UPCs internally, and then there are the abbreviated UPCs as well (look at a can of Pepsi to see what I mean).
See https://opendata.stackexchange.com/questions/562/does-there-exist-a-global-database-of-all-products-with-ean-13-barcodes which has a very similar question. Among the options listed in that thread, the Product Open Data project looks the most promising. There you can download their entire catalogue database, which has a column of all the Global Company Prefixes (GCPs) they have captured.
Here are a few resources I found:
http://www.upcdatabase.com/
http://www.ean-search.org/
If you are in the U.S. (I am assuming), you can use this site to search by GTIN, which (in the U.S.) is the same as a 12-digit UPC code (from a barcode scan) prefixed with the digits "00", making it 14 digits in total and GTIN-14 compliant. To see an example, use "00028000133177" for a test search.
That search will give you the GLN for the manufacturer.
I've found Wikipedia's explanation of UPCs to be very helpful as well.
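The "00"-padding and the GS1 check-digit rule are easy to verify in code. Here is a minimal sketch; the function names are my own, not from any library:

```python
def gtin_check_digit(payload: str) -> int:
    """GS1 check digit: starting from the rightmost payload digit,
    weight digits 3, 1, 3, 1, ... and round the sum up to a multiple of 10."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10

def upc12_to_gtin14(upc: str) -> str:
    """A 12-digit UPC-A is already GTIN-compliant; prefix "00" for GTIN-14.
    Leading zeros don't change the check digit, since the weights are
    counted from the right."""
    assert len(upc) == 12 and upc.isdigit()
    return "00" + upc
```

For example, `upc12_to_gtin14("028000133177")` yields the test number mentioned above.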
Related
I'm trying to create a database for my project to lookup mac vendors. I added a UNIQUE key on the prefix column. When inserting rows from an officially published MA-L csv file, I got the duplicate entry error from DB. Then I looked it up in the csv file and found 3 entries for prefix '080030'.
Is the file wrong, or am I misunderstanding how to use the OUI list? If I want to look up the vendor of a MAC with prefix '08:00:30', which of the three is correct?
There are currently two duplicate assignments in the MA data files.
Registry,Assignment,Organization Name,Organization Address
MA-L,080030,NETWORK RESEARCH CORPORATION,2380 N. ROSE AVENUE OXNARD CA US 93010
MA-L,080030,ROYAL MELBOURNE INST OF TECH,GPO BOX 2476V MELBOURNE VIC AU 3001
MA-L,080030,CERN,CH-1211 GENEVE SUISSE/SWITZ CH 023
Registry,Assignment,Organization Name,Organization Address
MA-L,0001C8,THOMAS CONRAD CORP.,1908-R KRAMER LANE AUSTIN TX US 78758
MA-L,0001C8,CONRAD CORP.,
The IEEE provides the following footnote on page 7 of the linked document.
The IEEE Registration Authority makes a concerted effort to avoid
duplicate assignments but does not guarantee that duplicate assignments
have not occurred. Global uniqueness also depends on proper use of
assignments and absence of faults that might result in duplication.
https://standards.ieee.org/content/dam/ieee-standards/standards/web/documents/tutorials/eui.pdf
The Wireshark OUI lookup tool, based on their own compiled list, gives one answer for which of these organizations currently holds the assignment.
Network Research Corporation
https://www.wireshark.org/tools/oui-lookup
And the maclookup website, which seems reasonable, gives a different answer.
CERN
https://maclookup.app/macaddress/080030
There are no timestamps in the data file. There is no explicit row sorting and ordering (and it doesn't look sorted or ordered). Bottom line, there seems to be no way to use the data file and supporting documents alone to determine which assignment is correct.
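Given that the registry itself contains duplicates, one pragmatic option is to drop the UNIQUE constraint and let the lookup table map a prefix to a list of organizations. A rough sketch of that load step, using the CSV layout from the excerpts above:

```python
import csv
from collections import defaultdict
from io import StringIO

# Sample rows in the MA-L CSV layout shown above.
DATA = """Registry,Assignment,Organization Name,Organization Address
MA-L,080030,NETWORK RESEARCH CORPORATION,2380 N. ROSE AVENUE OXNARD CA US 93010
MA-L,080030,ROYAL MELBOURNE INST OF TECH,GPO BOX 2476V MELBOURNE VIC AU 3001
MA-L,080030,CERN,CH-1211 GENEVE SUISSE/SWITZ CH 023
"""

def load_oui(fileobj):
    """Map prefix -> list of organization names (duplicates preserved)."""
    table = defaultdict(list)
    for row in csv.DictReader(fileobj):
        table[row["Assignment"]].append(row["Organization Name"])
    return table

def vendors_of(mac: str, table) -> list:
    """Normalize a MAC address and return every organization on record."""
    prefix = mac.replace(":", "").replace("-", "").upper()[:6]
    return table.get(prefix, [])

table = load_oui(StringIO(DATA))
```

With this approach the application decides how to present a multi-vendor prefix, rather than the database silently rejecting rows.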
This is strange; I've never seen such a thing. Are you sure you are pulling the correct file from the IEEE?
I've reviewed the 3 companies you mentioned, and the file has different prefixes for them:
CERN's MAC prefix is 80D336.
Network Research Corporation has two prefixes, one of which is the one mentioned in the question.
I couldn't find the third company in any DB.
I think your parser is somehow corrupted, especially if you are parsing the text file.
I am looking for a way to generate unique identifiers for different business entities in a project I am working on. Currently we are using GUIDs, but we want to move to numeric identifiers. I have worked with the Facebook Graph API and noticed they have some kind of unique and (I assume) self-contained ids. But I could not find any information on the Internet about how they generate these keys.
I formulated some minimum requirements for the identifiers, based on my own assumptions and my investigation of Facebook ids. Here they are:
It must be representable as a long / Int64 value.
It must be unique across the whole system.
It must contain, at minimum, the type of the requested entity (user, page, etc.), so that it is possible to build a convenient API like GET {host}/{id}.
Does anybody have any experience with or ideas on how this task can be solved? I guess there are existing solutions or specifications for it; if anybody has suggestions, it would be great to hear them.
Ok, the answer for me was Twitter Snowflake. You can also find some interesting, useful information here: Sharding & IDs at Instagram. Hope it helps somebody.
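For illustration, a Snowflake-style generator fits in a few lines. The bit layout below (41-bit millisecond timestamp, 6-bit entity type, 16-bit sequence) is my own assumption, chosen to satisfy the "type embedded in the id" requirement from the question; Twitter's real layout carries a worker id instead of an entity type:

```python
import threading
import time

class SnowflakeStyle:
    """Hypothetical layout: 41-bit ms timestamp | 6-bit entity type | 16-bit sequence.
    41 + 6 + 16 = 63 bits, so every id fits in a signed Int64."""

    EPOCH = 1288834974657  # Twitter's custom epoch (Nov 2010), in milliseconds

    def __init__(self):
        self._lock = threading.Lock()
        self._last_ts = -1
        self._seq = 0

    def next_id(self, entity_type: int) -> int:
        assert 0 <= entity_type < 64, "entity type must fit in 6 bits"
        with self._lock:
            ts = int(time.time() * 1000) - self.EPOCH
            if ts == self._last_ts:
                # Same millisecond: bump the sequence. A real implementation
                # would wait for the next millisecond on wrap-around.
                self._seq = (self._seq + 1) & 0xFFFF
            else:
                self._seq = 0
                self._last_ts = ts
            return (ts << 22) | (entity_type << 16) | self._seq

    @staticmethod
    def entity_type_of(id_: int) -> int:
        """Recover the embedded entity type, enabling GET {host}/{id} routing."""
        return (id_ >> 16) & 0x3F
```

Ids generated this way are roughly time-ordered and self-describing, which is exactly what the question's third requirement asks for.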
I saw an intern opportunity at a bank in Dubai. They have a defined problem statement to be solved in 2 months. They told us just 2 lines -
"Basically the problem is about name matching logic.
There are two fields (variables) – both are employer names, and it’s a free text field. So we need to write a program to match these two variables."
Can anyone help me understand it? Is it just simple pattern matching?
Any help/comments would be appreciated.
I think this is what they are asking for:
They have two sources of related data, for example, one from an internal database, and the other from name card input.
Because the two fields are free text fields, there will be inconsistency. For example, Nitin Garg, or Garg, Nitin, or Mr. Nitin Garg, etc. Here is an extreme case of Gadaffi.
What you are supposed to do is to find a way to match all the names for a specific person together.
In short, match two pieces of data together by employer names, taking possible inconsistency into account.
Once upon a time there was a nice simple answer to the problem of matching up names despite mis-spellings and different transliterations - Soundex. But people have put a lot of work into this problem, so now you should probably use the results of that work, which is built into databases and add-ons - some free. See Fuzzy matching using T-SQL and http://anastasiosyal.com/archive/2009/01/11/18.aspx and http://msdn.microsoft.com/en-us/magazine/cc163731.aspx
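As a toy illustration of the idea (not one of the database add-ons mentioned above): normalizing the names before comparing already resolves the "Garg, Nitin" vs "Mr. Nitin Garg" class of inconsistency, and a fuzzy ratio then handles spelling variants. The title list here is an assumption you would extend for real data:

```python
import difflib
import re

TITLES = {"mr", "mrs", "ms", "dr", "prof"}   # assumed list of honorifics

def normalize(name: str) -> str:
    """Lowercase, drop punctuation and titles, sort tokens so word order
    ("Garg, Nitin" vs "Nitin Garg") no longer matters."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(t for t in tokens if t not in TITLES))

def similarity(a: str, b: str) -> float:
    """Fuzzy ratio in [0, 1] on the normalized forms (1.0 = identical)."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

In practice you would pick a threshold (say 0.85) above which two employer names are treated as the same entity, and review the borderline cases by hand.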
To make the matter more specific:
How to detect people names (seems like simple case of named entity extraction?)
How to detect addresses: my best guess - find the postcode (regexes) and country/town names, then take some text around them.
As for phones and emails - they could probably be caught by various regexes plus preprocessing
Don't care about education/working experience at this point
Reasoning:
In order to build a fulltext index on resumes, all sensitive information should be stripped out of them.
P.S. any 3rd party APIs/services won't do as a solution.
The problem you're interested in is information extraction from semi-structured sources: http://en.wikipedia.org/wiki/Information_extraction
I think you should download a couple of research papers in this area to get a sense of what can be done and what can't.
I feel it can't be done by a machine.
Every other resume will have a different format and layout.
The best you can do is to design an internal format and manually copy every resume content in there. Or ask candidates to fill out your form (not many will bother).
I think that the problem should be broken up into two search domains:
Finding information relating to proper names
Finding information that is formulaic
Firstly, the information relating to proper names could probably best be found by searching for items that are grammatically important or significant, i.e. English capitalizes only the first word of a sentence and proper nouns. For the grammatical rules you could look for all words whose first letter is capitalized and check them against a database that contains the word and its type [e.g. Bob - Name, Elon - Place, England - Place].
Secondly: information that is formulaic. This is more about email addresses, phone numbers, and physical addresses. All of these have specific formats that don't change. Use a regex and an algorithm to gauge the quality of the matches.
Watch out:
The grammatical rules change based on language: German capitalizes EVERY noun. It might be best to detect the language of the document before applying your rules. Also, another issue with this [and my resume sometimes] is how the resume is designed. If it was designed with something other than a text editor [designer tools], the text may not line up, or may even be in a bitmap format.
TL;DR Version: NLP techniques can help you a lot.
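The "formulaic" half of the split above is the easy part to prototype. A rough sketch follows; the patterns are deliberately simple and would need tightening for production, and the UK-style postcode pattern is just one example (every country needs its own):

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# At least 9 characters total, digits at both ends, separators in between.
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
# Example: a UK-style postcode such as "SW1A 1AA".
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")

def extract_contacts(text: str) -> dict:
    """Pull the formulaic contact details out of free text."""
    return {
        "emails": EMAIL.findall(text),
        "phones": PHONE.findall(text),
        "postcodes": UK_POSTCODE.findall(text),
    }
```

For the stripping use case in the question, the same patterns can drive `re.sub` to redact the matches instead of collecting them.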
I am in the middle of designing a web form for German and French users. Within this form, the users would have to type street names several times.
I want to minimize the annoyance to the user and offer an autocomplete feature based on common French and German street names.
Any idea where I can find a royalty-free list?
Would your users have to type the same street name multiple times? Because you could easily prevent this by coding something that prefilled the fields.
Another option could be to use your user database as a resource. Query it for all the available street names entered by your existing users and use that to generate suggestions.
Of course this would only work if you have a considerable number of users.
[EDIT] You could have a look at OpenStreetMap with their Planet.osm dumps (or look here for a dump containing data for just Europe). That is basically the OSM database with all the map information they have, including street names. It's all in an XML format, and streets seem to be stored as Ways. There are tools (e.g. Osmosis) to extract the data and put it into a database, or you could write something to plough through the data and filter out the street names for your database.
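To sketch what mining the OSM XML for street names looks like: the `way`/`tag` element names below follow the OSM format, but treat this as a rough outline rather than a complete Planet.osm pipeline (a real dump is far too large for `fromstring` and would need `iterparse` or Osmosis):

```python
import xml.etree.ElementTree as ET

# A tiny stand-in for a real .osm extract: streets are <way> elements
# carrying a "highway" tag and a "name" tag.
SAMPLE = """<osm>
  <way id="1"><tag k="highway" v="residential"/><tag k="name" v="Hauptstrasse"/></way>
  <way id="2"><tag k="highway" v="primary"/><tag k="name" v="Rue de Rivoli"/></way>
  <way id="3"><tag k="building" v="yes"/></way>
</osm>"""

def street_names(osm_xml: str) -> set:
    """Collect the names of all ways that are tagged as roads."""
    names = set()
    for way in ET.fromstring(osm_xml).iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        if "highway" in tags and "name" in tags:
            names.add(tags["name"])
    return names
```

The resulting set can be dumped straight into whatever table backs the autocomplete field.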
Start with http://en.wikipedia.org/wiki/Category:Streets_in_Germany and http://en.wikipedia.org/wiki/Category:Streets_in_France. You may want to verify the Wikipedia copyright isn't more protective than would be suitable for your needs.
Edit (merged from my own comment): Of course, to answer the "programmatically" part of your question: figure out how to spider and scrape those Wikipedia category pages. The polite thing to do would be to cache it, rather than hitting it every time you need to get the street list; refreshing once every month or so should be sufficient, since the information is unlikely to change significantly.
You could start by pulling names via the Google API (just find, e.g., the lat/long outer bounds of Paris and work toward the center) - but since Google limits API use, it would probably take very long.
I had once contacted City of Bratislava about the street names list and they sent it to me as XLS. Maybe you could try doing that for your preferred cities.
I like Tom van Enckevort's suggestion, but I would be a little more specific than just pointing at the Planet.osm links, because most of them require some tool to deal with the supported formats (pbf, osm xml etc.)
In fact, take a look at the following link
http://download.gisgraphy.com/openstreetmap/
The files there are all in .txt format and if it's only the street names that you want to use, just extract the second field (name) and you are done.
As an FYI, I didn't have any use for the French files in my project, but mining the German files resulted (after normalization) in a little more than 380K unique entries (~6 MB in size).
#dusoft might be onto something - maybe someone at a government level can help? I don't think a simple list of street names can be copyrighted, nor any royalties charged for it. If that is the case, maybe you could even scrape some mapping data from something like a TomTom?
The "Deutsche Post" offers a list with all street names in Germany:
http://www.deutschepost.de/dpag?xmlFile=link1015590_3877
They don't mention the price, but I reckon it's not for free.