I recently bought a barcode scanner online, and when I got it I noticed that the entire user guide is in Chinese (Simplified). I was wondering if there is some sort of OCR software out there that could take a scanned copy (.jpg) and turn it into an English-translated copy (.txt or .doc)? I have tried JOCR.exe, and that works perfectly with every language except Chinese, Japanese, and other non-Latin-script languages. To use those languages I need to acquire the language packs for MS Office's OCR plugin. If OCR is not the best method in this situation, then what would be? Any and all advice would be appreciated!
The book "Mastering the Lightning Network" is accessible through GitHub (https://github.com/lnbook/lnbook), and it is made up of multiple asciidoc files. The license says you are free to generate a PDF for your own use. So I looked into how to do it, and it does not seem too easy. It sounds a bit like LaTeX, but I did not find a quick way to make a PDF from the files.
I found something like Asciidoctor. I am not sure if this is the best approach.
So I would be grateful for hints on how to generate the book in the easiest way.
Asciidoctor is based on Ruby plus many other related dependencies, so it is a massive potential installation of over 32000 files, with a learning curve like LaTeX or any other heavyweight book publishing system.
You asked for the easiest way to "generate" the book, and you can literally read the docs in a few clicks. The license specifically allows you to produce a PDF or similar for private use only, and you don't need to build the full book at once for personal everyday reading/reviewing.
Note that a similar query about combining all the book chapters was "Convert a folder containing asciidocs and pictures to pdf", and that could perhaps be answered by building your own book.asciidoc or .adoc as suggested in "Generate single PDF from multiple Asciidoc files". However, for a few "books for the road", visual or audio, the Ruby plus Asciidoctor installation is overkill and a potential source of frustration, mainly because this book's navigation is still incomplete.
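For what it's worth, the "build your own book.asciidoc" idea can be sketched roughly as below. This is only a minimal sketch: the chapter file names and the title are hypothetical placeholders, not the repository's actual layout, and the only AsciiDoc feature relied on is the standard `include::file[]` directive. Whatever you build from it is for your personal use only, per the license.

```python
# Sketch: generate a master AsciiDoc file that pulls in each chapter
# through the standard include directive.
# The chapter names and the title below are hypothetical examples.

def build_master(chapter_files, title="My personal reading copy"):
    """Return the text of a master document that includes every chapter."""
    lines = [f"= {title}", ":doctype: book", ":toc:", ""]
    for name in chapter_files:
        lines.append(f"include::{name}[]")
        lines.append("")  # blank line so chapters do not run together
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical chapter list; replace with the files you actually have.
    chapters = ["01_intro.asciidoc", "02_getting_started.asciidoc"]
    print(build_master(chapters))
```

Saving the printed text as book.asciidoc next to the chapters gives any AsciiDoc viewer or converter a single entry point.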
For the intervening months this year (2022) "You can't create ebooks in PDF, HTML EPUB or any other format unless it is for personal use only and not shared/distributed." https://github.com/lnbook/lnbook#licensing-change-in-12-months
Thus you can print parts to PDF if you need them, say while traveling with your ebook/audio reader, but you must not do so for your family, friends, or other individuals.
Here I am reading the source in Firefox (on the right) and checking a live PDF compilation in my portable book reader. (But please don't read that, it's my personal copy for tomorrow :-)
The AsciiDoc reader is at https://github.com/asciidoctor/asciidoctor-browser-extension and you simply point the extension at the chapter you wish to read.
The license specifically states:
Mastering the Lightning Network is released under the Creative Commons CC-BY-NC-ND license, which allows sharing the source code for personal use only. You may read this book for free. You may not create derivatives (such as PDF copies), or distribute the book commercially.
The source code is free for personal use, but it explicitly does not permit generation of derivatives such as PDF copies. As such, you won't get help to willfully violate the license.
I recently started working and came to know Adobe Acrobat DC, which I really, really like as a tool to archive and send around information.
Especially loved that you could ...
Convert images (JPEG etc.) to PDF
Combine - Merge multiple pdfs into one document and order them
Compress - Enhance and test around compression
Use the typewriter feature to fill out forms
Redact - blacken (really delete) content so a document can be sent around without exposing its confidential parts
OCR - search a scanned document
Really liking the software, I thought about buying it. Being on macOS, I sadly had to realize I could only get Adobe Acrobat DC Pro, because the Standard edition is not available for my OS.
Then came the real shock. Looking at the price, I could either pay 29,74 € a month on demand, or pay up front: 17,84 € * 12 = 214,08 € per year, or 665,21 € once. I was shocked and thought that's way too much for software that just moves data from one format to another!
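To put those numbers side by side, here is a small arithmetic sketch. It uses only the three prices quoted above and assumes that the 665,21 € option is a one-time purchase with no further fees.

```python
from decimal import Decimal

# Prices quoted in the post (EUR). Treating 665.21 as a one-time
# purchase with no further fees is an assumption.
MONTHLY_ON_DEMAND = Decimal("29.74")
MONTHLY_ON_ANNUAL_PLAN = Decimal("17.84")
ONE_TIME = Decimal("665.21")

on_demand_per_year = MONTHLY_ON_DEMAND * 12         # 356.88
annual_plan_per_year = MONTHLY_ON_ANNUAL_PLAN * 12  # 214.08

# Years of the annual plan that cost as much as the one-time purchase.
break_even_years = ONE_TIME / annual_plan_per_year

print(f"on demand, 12 months: {on_demand_per_year} EUR")
print(f"annual plan:          {annual_plan_per_year} EUR")
print(f"one-time purchase pays off after about {break_even_years:.1f} years")
```

So the one-time price corresponds to a little over three years of the annual plan.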
Then I tried to research and test alternatives using this page and also the Mac's own tools (Automator, Preview). Again I was shocked: either the UI was really bad, or the features were not even close to Adobe Acrobat DC.
That was when I started to ask myself: are there patents protecting Adobe? Why is there no competition worthy of the name? For Windows there are at least Foxit and Nitro ... but for Mac?
Could somebody help out and state whether the technology is really tough to reengineer, or whether there are patents protecting the format?
First and foremost, there is the ISO 32000 standard, and anyone is free to create a viewer which is compliant with that standard. In fact, this is kind of encouraged.
Why Acrobat? Well, Adobe has 25 years of experience and knowledge on how to create a PDF viewer. Adobe also has the necessary resources. And Adobe has the market power… which they do use.
Apple would have the resources to do something useful, but Apple doesn't give a shit about PDF (that's obvious, considering how Preview.app messes up certain PDF documents beyond all repair).
Update, added after comment by OP:
About the features mentioned in the OP… Most of these implementations by Adobe are well done, and kind of mature technology. However, there are other implementations out there, and anyone is free to do something following ISO 32000… In detail:
• Converting to PDF is, of course, a side effect of the PDF-creation library Adobe uses in their other applications (InDesign, Illustrator, PostScript, etc.). This library is considered to be top class, and therefore, using it does make a lot of sense…
• Combining… Again, it is the library Adobe developed (for their own and for licenceable use), which can very well read, understand and write PDF. For this feature, there are other good products out in the market.
• Compressing… Smart use of compression provides an optimum between file size and handling speed. Good use and implementation of the various compression schemes.
• Typewriter feature… A bit of a marketing gimmick; it is essentially an "add Text" annotation. The sole advantage of a form filled with the typewriter tool over a hand-filled form is that the former is more readable. Forms should be fillable, and their data should be retrievable without human interference.
• Redaction… Indeed, Adobe made good progress on this. The industry standard for Redaction is, however, still a plug-in (or server application) for Acrobat (Pro); (Redax by Appligent is the product).
• OCR… Adobe did acquire and license some very good OCR tools; for certain kinds of text, the leading third-party tools are somewhat better, for others, Acrobat would be the choice.
PDF's ubiquity, especially in academia, makes being able to highlight PDFs (and save these annotations) extremely important. Some academic journals (especially in law) allow the user to send journal articles to a Kindle reader, which makes reading and taking notes extremely easy. The question is how to get the underlined text from the MyClippings.txt file into the PDF. About a year ago I found that this is possible through an action in Adobe Acrobat Pro X, which parses a text file that is fed to it and highlights the relevant sections. The action takes advantage of the Search and Redact tool, but instead of redacting, it highlights.
However, I would like to get out of the Adobe environment (for different reasons, one of which is that Adobe's software is resource-hungry and not free). Skim on Mac OS X sounds like a good alternative because of its AppleScript integration. I found two projects on GitHub which attempt to do this, but I failed with both of them.
my-clippings-to-pdf
Skim-AppleScript
Would anyone with knowledge of AppleScript take a look at that code and tell me whether it looks sound? It seems this would be great functionality for integrating PDFs and ePub in a useful and meaningful way, especially for academics.
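For reference, the first half of the task (getting the highlighted passages out of the clippings file) can be sketched in a few lines of Python rather than AppleScript. This is only a sketch under assumptions: the entry layout (title line, a metadata line starting with "- Your Highlight", a blank line, the text, then a "==========" separator) is assumed and may differ between Kindle firmware versions, and the sample titles and text are made up. It does not touch the PDF at all.

```python
from dataclasses import dataclass

SEPARATOR = "=========="  # assumed entry delimiter in the clippings file


@dataclass
class Clipping:
    title: str
    meta: str
    text: str


def parse_clippings(raw):
    """Split the clippings text into highlight entries (notes/bookmarks skipped)."""
    clippings = []
    for block in raw.split(SEPARATOR):
        # Strip whitespace and a possible byte-order mark, drop empty lines.
        lines = [ln.strip("\ufeff \r\t") for ln in block.strip().splitlines()]
        lines = [ln for ln in lines if ln]
        if len(lines) < 3:
            continue  # bookmark or malformed entry: no text body
        title, meta = lines[0], lines[1]
        if "Highlight" not in meta:
            continue  # a note, not a highlight
        clippings.append(Clipping(title, meta, " ".join(lines[2:])))
    return clippings


def highlights_for(raw, title_fragment):
    """Return the highlighted passages whose title contains title_fragment."""
    wanted = title_fragment.lower()
    return [c.text for c in parse_clippings(raw) if wanted in c.title.lower()]


# Made-up sample in the assumed layout.
SAMPLE = (
    "\ufeffSome Article (Doe, Jane)\n"
    "- Your Highlight on page 3 | Location 40-41 | Added on Monday, January 5, 2015\n"
    "\n"
    "First highlighted passage.\n"
    "==========\n"
    "Other Book (Roe, Richard)\n"
    "- Your Note on page 9 | Location 100 | Added on Monday, January 5, 2015\n"
    "\n"
    "my note\n"
    "==========\n"
)

if __name__ == "__main__":
    for passage in highlights_for(SAMPLE, "some article"):
        print(passage)
```

The resulting list of passages is what a Skim or Acrobat script would then have to search for and highlight in the PDF.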
Currently I'm using the Multilingual App Toolkit from Microsoft to manage translations for two different languages. I decided to translate my app into Brazilian Portuguese, so I'd like a friend from Brazil to translate the strings for me. He hasn't got any programming knowledge, Visual Studio tools, and so on.
The question is: how can I export all the strings from VS and pass them to him in a readable format? Something that will be easy for me to import later as well.
I'm not sure about MAT, but I used the GNU gettext toolkit (Windows version), which was very easy to use. There is a web app called Pootle that non-technical users can use to provide translated strings.
Users would update the strings via the website, and I could then download an updated .po file from it. This was added to my deployment, using a little C# helper that read the strings and displayed the correct version depending on the user's language. It was remarkably easy all round, and since gettext uses the English words as the key, if you add a word that is not yet translated for a language you get the English 'default' instead, which is better than "error: word not defined" :)
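The English-fallback behaviour can be illustrated with Python's standard `gettext` module (not the C# helper mentioned above, which isn't shown here). This is a minimal sketch: the domain name "myapp" and the "locale" directory are hypothetical.

```python
import gettext


def load_translator(language, domain="myapp", localedir="locale"):
    """Load a compiled catalog, or fall back to the untranslated keys.

    The domain and localedir defaults are hypothetical names.
    With fallback=True a missing catalog yields a translator that
    simply returns the original (English) string.
    """
    return gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )


if __name__ == "__main__":
    _ = load_translator("pt_BR").gettext
    # Prints the Portuguese text if a catalog with this key exists,
    # otherwise the English key itself.
    print(_("Save changes"))
```

Because the English text is the lookup key, an untranslated string degrades to readable English instead of an error.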
IIRC I got into this because someone on the SVN team enabled the tool as a test. Since then they've moved to Transifex. I'm not sure if it's significantly better, as it's a commercial web tool, but it might work for you.
That said, there's also Google Translator Toolkit. Good luck getting to it, as it's now hidden behind Google's "one account login" guff.
There used to be a utility software, appTranslator (http://www.apptranslator.com/), that helped a lot when trying to translate MFC programs.
It will not only help you translate the strings (STRINGTABLE) but will help you translate dialog content as well (update strings in dialogs, move resources if applicable, ... )
It has not been updated in a couple of years.
(just quickly tried it and it seems to work).
I would say CSV is the most readable format for export/import.
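A CSV round trip could look roughly like this. It is only a sketch: the column names (key, source, translation) and the sample strings are made up for illustration, and getting the strings out of and back into Visual Studio is not covered.

```python
import csv
import io


def export_strings(strings):
    """Write key/source pairs to CSV text with an empty translation column."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["key", "source", "translation"])  # hypothetical columns
    for key, source in strings.items():
        writer.writerow([key, source, ""])
    return buf.getvalue()


def import_strings(csv_text):
    """Read the filled-in CSV back; rows without a translation are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {
        row["key"]: row["translation"]
        for row in reader
        if row["translation"].strip()
    }


if __name__ == "__main__":
    # Made-up sample strings.
    print(export_strings({"greeting": "Hello, world", "quit": "Quit"}))
```

The translator only fills in the third column in a spreadsheet program, and the `csv` module takes care of quoting strings that contain commas.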
I'm doing some software review for an RIA project. I was hoping to use Flex but need to make sure it has full UTF-8 support. I'm talking all fonts for all languages: everything from English, to Finnish, to Russian, to Japanese, to Thai, to Sanskrit...
I haven't worked with Flash/Flex/ActionScript in years, but I seem to remember it's up to the font you embed into the movie. So if you have, say, MS Arial Unicode, which has the full character set, you simply include it in the movie and the support is there to display the characters? Is this right?
Also, how much does including that level of character support (that large a font) bloat the application?
Any insight would be helpful, as I am still in the information-gathering stage.
Other software suggestions would also be appreciated.
Thanks
JD
If you use ActionScript 3 (and you should), all strings are Unicode.
And if you use the newer text components (Flash 10), then the text engine supports complex scripts (including Russian, Japanese, and Indic scripts).
All you would have to do is make sure you have the right fonts. You might embed your own (with the bloat that you can't avoid if you embed a 30 MB Chinese font :-)
In practice you will probably just use the system fonts.
Among other reasons, because there are no free, good-quality Chinese/Japanese fonts. And you have no right to embed a font without the proper licensing (and the prices are not low :-)