Best method of store & address to set of texts - data-structures

I have some texts from different authors with different address system in each text. For example it could be:
chapter & subchapter
page & column
page & starter words
chapter & verse
And so on. What is the best method to store that texts and access its content whits that address system? For example, one text has link to another text, chapter 4 subchapter 3.
As for now, I realized next system:
I have folder for each author, which has folders for each book. If that book have chapters - each chapter stored in separate file (as for now I don't want to keep it in files for simplify store and editing), if not - all book is stored in one file. All other addresses (pages, starter words for paragraphs) are sored as tags (maybe it should be xml file)
When I want to access text by api, I request Link object from Storage, fill it with addresses and send it to Storage back. That system is needed, because only Storage knows how text are stored. And if I want to change storage system to another (for example store each book as one file) I should change only Link realization without changing another code.

Related

How to extract a list of URLs from specific domain?

I'm using Firefox 53, and have Scrapbook X and want to save a lot of pages using the Save Multiple URLs feature, but before I do that, I want to extract a specific list of URLs without having to do so manually.
The site I'm looking at extracting data from is www.address-data.co.uk - namely this page.
What I want to do is extract only the URLs and subpages within that page but not the privacy policy or contact us page and all the sub-pages with the EH postcodes.
Is there a way to do this online, or any tool for Mac OS X that can find all related URLs before I copy them into Scrapbook's Save Multiple URLs (where I save them in a subfolder of Scrapbook)?
I assume that EH45 is typical of those you want to extract from the page you mentioned. Like its siblings it's of the form https://address-data.co.uk/postcode-district-EH<postcode number>.
This means that you can make a complete list of the urls if you have a list of the numbers, or of the postcodes.
My main difficulty in answering is that I don't know what tools (especially programming tools) you might have at your disposal. I will assume only that you have, or can obtain, access to an editor that can do macros or that can edit columns. On Windows I would use Emerald (ow known as Crimson).
Then copy the contents of the table in the EH page (not the table headings) and remove everything except the first column. Finally, prepend every item in the column with 'https://address-data.co.uk/postcode-district-'.
PS: This might also be a good question to put on SuperUser.

Kofax Seperate Main Invoice from Supporting Document without using Seperator sheet

When a batch gets created documents should get separated automatically without using separator sheet or Barcode separator.
How can I classify documents for Invoice and supporting document.
In our project we get many invoices with supporting document so the scanning person has to insert the separator sheets manually, so to avoid this we want to automatically classify the supporting documents.
In general the concept would be that you would enable separation in the project and then train your classes with examples to be used for the layout or content classifiers.
However, as I'm sure you've seen, the obstacle with invoices is that they are different enough between vendors that it would not reliably classify all to an Invoice class. Similarly with "Supporting Documents" which are likely to be very different from each other, so unfortunately there isn't a completely easy answer without separator sheets (or barcode stickers affixed to supporting docs).
What you might want to do is write code in the one of the separation events like Document_AfterSeparate event. Despite the name, the document has not yet been split at this point, but the classifiers have run. See Scripting Help topic "Server Script Events Sequence > Document Separation > Standard Document Separation" for more detail. Setting the SplitPage property on the CDocPage (pXDoc.CDoc.Pages.ItemByIndex(lPage).SplitPage) will allow you to use your own logic to determine which pages to separate.
For example if you know that you will always have single page invoices, you can split on the first page and classify accordingly. Or you can try to search for something that indicates the end of the invoice like "Total" or other characteristics. There is an example of how you can use locators to help separation in the Scripting Help topic "Script Samples > Use Locator Results for Standard Document Separation". The example uses a Barcode Locator, but the same concept works if you wanted to try it with a Format Locator or anything else.
Without Separator sheets you will need a smart classification software like Kofax Transformation Module (KTM). Its kind of expensive. you will need to verify the cost saving and ROI.

Magento Translation

I have Magento 1.8.1.0. Recently I've installed Russian pack, the result wasn't appropriate enough, cause some phrases on frontend remained in English
I know there's handy way to translate Magento using cvs-files.
The question is where I can find proper cvs-file? Does installed theme concerns translation some how? I know I'm asking newbie questions, I've read several posts, but I haven't made up my mind how to translate Magento.
Many thanks in advance.
Hope you are doing well,
As i have gone through your question that you want to translate your websites front end in Russian if user has selected the language Russian.
For this you are required to work out the translate.csv files which will be available in your theme Package.
Example : app/design/frontend/default/SecuareWeb/locale/de_DE
In the locale folder you will find the folder for Russian language open that folder and you will find the file where you are required to add the required translation text in it.
How to add translation text in translate.csv file is given below.
Example:
"This is the demo of translation in Russian","Это демо-трансляции на русском языке"
And one thing i would like add is that make sure your front end .phtml files must contain the text in $this->__("Example");. If you have added all the text like this then only then it will allow you for translation other wise it will not translate a text.
Hope this might be use full to you !!!
Waiting for your valuable comments in regards to your Question !!!
There are different ways to achieve translation in Magento so you can find multiple directory containing static csv files and also a database table.
All the modes have same structure: key/value. For example: "String to translate","String translated".
Inline Translation (database table: core_translate):
following best practices in Magento, you should use inline-translation aka database saved translation in rare cases. It is harder to mantain and can be buggy. It has first precedence, so any translation you do via inline translation will override the other 'modes'.
Theme level Translation (file in app/design/frontend/your_package/your_theme/ru_RU/translate.csv):
you can place any string to be translated in the translate.csv. It has second precedence.
Locale translation (file in app/locale/ru_RU/Module_Name.csv):
the suggested way to do translation as it will keep translation separated by each module and is easier to maintain. For example: Mage_Catalog.csv etc.
Each module in Magento can specify its csv file containing translation and sometimes the same string has different modules trying to translate, so if your translation does not work check between multiple file by a quick editor search. It will be overridden by the two above modes.
Note:
Magento will load all the csv files and build up a giant tree and caches it. So before scratching your head because the string is not translated as you wished in the frontend:
1. clean the cache.
2. check for any same key string which comes after your translated string. For example: in the same csv Line 100 will override Line 1 if the key string are the same.
3. check for any same key string in the mode which has higher precedence. For example: inline translation will override any csv based translated string.
It may be easier for you to go to the admin backend System -> Configuration -> Developer and switch "Translate Inline" "Enabled for Frontend" to "Yes".
Then, refresh the frontend and you can change the translation directly at your web browser.
The translation is saved in the database table core_translate just for the case you want to do it in a test environment and copy the translation later on to the production.
Take care that without client restrictions (System -> Configuration -> Developer) everyone will see the translation options.
btw. You may need to clear the cache and refresh the webpage in order to see your changes.

What is a proper way for naming of message properties in i18n?

We do have a website which should be translate into different languages. Some of the wording is in message properties files ready for translation. I want now add the rest of the text into these files.
What is a good way to name the text blocks?
<view>.<type>.<name>
We mostly have webpages and some of the elements/modules are repeating on some sites.
As far as I know, no "standard" exists. Therefore it is pretty hard to tell what is proper and what is improper way of naming resource keys. However, based on my experience, I could recommend this way:
property file name: <module>.properties
resource keys: <view or dialog>[.<sub-context>].<control-type>.<name>
We may discuss if it is proper way to put every strings from one module into one property files - probably it could be right if updates doesn't happen often and there are not so many messages. Otherwise you might think about one file per view.
As for key naming strategy: it is important for the Translator (sounds like film with honorable governor Arnold S. isn't it?) to have a Context. Translation may actually depend on it, i.e. in Polish you would translate a message in a different way if it is page/dialog/whatever title and in totally different way if it is text on a button.
One example of such resource key could be:
preferences.password_area.label.username=User name
It gives enough hints to the Translator about what it actually is, which could result in correct translation...
We have come up with the following key naming convention (Java, btw) using dot notation and camel case:
Label Keys (form labels, page/form/app titles, etc...i.e., not full sentences; used in multiple UI locations):
If the label represents a Java field (i.e., a form field) and matches the form label: label.nameOfField
Else: label.sameAsValue
Examples:
label.firstName = First Name
label.lastName = Last Name
label.applicationTitle = Application Title
label.editADocument = Edit a Document
Content Keys:
projectName.uiPath.messageOrContentType.n.*
Where:
projectName is the short name of the project (or a derived name from the Java package)
uiPath is the UI navigation path to the content key
messageOrContentType (e.g., added, deleted, updated, info, warning, error, title, content, etc.) should be added based on the type of content. Example messages: (1) The page has been updated. (2) There was an error processing your request.
n.* handles the following cases: When there are multiple content areas on a single page (e.g., when the content is separated by, an image, etc), when content is in multiple paragraphs or when content is in an (un)ordered list - a numeric identifier should be appended. Example: ...content.1, ...content.2
When there are multiple content areas on a page and one or more need to be further broken up (based on the HTML example above), a secondary numeric identifier may be appended to the key. Example: ...content.1.1, ...content.1.2
Examples:
training.mySetup.myInfo.content.1 = This is the first sentence of content 1. This is the second sentence of content 1. This content will be surrounded by paragraph tags.
training.mySetup.myInfo.content.2 = This is the first sentence of content 2. This is the second sentence of content 2. This content will also be surrounded by paragraph tags.
training.mySetup.myInfo.title = My Information
training.mySetup.myInfo.updated = Your personal information has been updated.
Advantages / Disadvantages:
+ Label keys can easily be reused; location is irrelevant.
+ For content keys that are not reused, locating the page on the UI will be simple and logical.
- It may not be clear to translators where label keys reside on the UI. This may be a non-issue for translators who do not navigate the pages, but may still be an issue for developers.
- If content keys must be used in more than one location on the UI (which is highly likely), the key name choice will not make sense in the other location(s). In our case, management is not concerned with a duplication of values for content areas, so we will be using different keys (to demonstrate the location on the UI) in this case.
Feedback on this convention - especially feedback that will improve it - would be much appreciated since we are currently revamping our resource bundles! :)
I'd propose the below convention
functionalcontext.subcontext.key
logicalcontext.subcontext.key
This way you can logically group all the common messages in a super context (id in the below example). There are few things that aren't specific to any functional context (like lastName etc) which you can group into logical-context.
order.id=Order Id
order.submission.submit=Submit Order
name.last=Last Name
the method that I have personally used and that I've liked more so far is using sentence to localisee as the key. For example: (pls replace T with the right syntax dependably on the language)
for example:
print(T("Hello world"))
in this case T will search for a key "Hello world". If it is not found then the key is returned, otherwise the value of the key.
In this way, you do not need to edit the message (in your default language) at least that you need to use parameters.... It saved me a LOT of dev time

Programmatically find common European street names

I am in the middle of designing a web form for German and French users. Within this form, the users would have to type street names several times.
I want to minimize the annoyance to the user, and offer autocomplete feature based on common French and German street names.
Any idea where I can a royalty-free list?
Would your users have to type the same street name multiple times? Because you could easily prevent this by coding something that prefilled the fields.
Another option could be to use your user database as a resource. Query it for all the available street names entered by your existing users and use that to generate suggestions.
Of course this would only work if you have a considerable number of users.
[EDIT] You could have a look at OpenStreetMap with their Planet.osm dumbs (or have a look here for a dump containing data for just Europe). That is basically the OSM database with all the map information they have, including street names. It's all in an XML format and streets seem to be stored as Ways. There are tools (i.e. Osmosis) to extract the data and put it into a database, or you could write something to plough through the data and filter out the street names for your database.
Start with http://en.wikipedia.org/wiki/Category:Streets_in_Germany and http://en.wikipedia.org/wiki/Category:Streets_in_France. You may want to verify the Wikipedia copyright isn't more protective than would be suitable for your needs.
Edit (merged from my own comment): Of course, to answer the "programmatically" part of your question: figure out how to spider and scrape those Wikipedia category pages. The polite thing to do would be to cache it, rather than hitting it every time you need to get the street list; refreshing once every month or so should be sufficient, since the information is unlikely to change significantly.
You could start by pulling names via Google API (just find e.g. lat/long outer bounds - of Paris and go to the center) - but since Google limits API use, it would probably take very long to do it.
I had once contacted City of Bratislava about the street names list and they sent it to me as XLS. Maybe you could try doing that for your preferred cities.
I like Tom van Enckevort's suggestion, but I would be a little more specific that just looking inside the Planet.osm links, because most of them require the usage of some tool to deal with the supported formats (pbf, osm xml etc)
In fact, take a look at the following link
http://download.gisgraphy.com/openstreetmap/
The files there are all in .txt format and if it's only the street names that you want to use, just extract the second field (name) and you are done.
As an fyi, I didn't have any use for the French files in my project, but mining the German files resulted (after normalization) in a little more than 380K unique entries (~6 MB in size)
#dusoft might be onto something - maybe someone at a government level can help? I don't think that a simple list of street names cannot be copyrighted, nor any royalties be charged. If that is the case, maybe you could even scrape some mapping data from something like a TomTom?
The "Deutsche Post" offers a list with all street names in Germany:
http://www.deutschepost.de/dpag?xmlFile=link1015590_3877
They don't mention the price, but I reckon it's not for free.

Resources