I'm planning on making a certain kind of plaintext-based manual/whatever, meaning it will be presented using a fixed-width font and a fixed number of characters per line. Now I'm wondering what a "safe" such "line width" (in number of characters) is these days.
My first thought is "80", but I have two concerns with that:
I vaguely remember that 80 isn't safe for e-mail, but has to be 78 for some reason.
80 seems very narrow when there are extremely high-res screens/desktops everywhere. On the other hand, there are also a lot of small "mobile devices", and a major reason I'm using this "primitive" method of presentation over rich HTML or something else is that it's supposed to be easy to read and concise.
120 is what Windows uses by default for cmd.exe, which seems slightly more sane than 80. However, I'm very torn about this.
I'm trying to maximize the compatibility and easiness of reading this content. I'm trying to avoid any kind of problems that might arise from choosing a too narrow or too wide value.
Also, yes, I know that HTML/CSS have all kinds of features for limiting the width and whatnot. However, even e-mail with HTML seems to be very problematic, and CSS appears to still not exactly be mature in terms of typography features.
To make it clear: I'm not asking specifically about e-mail, specifically about web pages, or specifically about any kind of existing medium. I'm trying to make this "universal" so that it can be applied to "any device/platform", excluding only extremely old/obscure/exotic ones which do something really crazy.
You might say that I should be storing it in a database (which I am) and then generate the appropriate format on an on-demand basis, which I might well do, but there's also the "beauty factor" of the presentation of the text. I wish to retain some kind of control at least for the "main" format (plaintext), where I can be sure that the readers are reading it as I intended it, meaning purposedly limited line widths for improved readability and also for the visual beauty of the text content.
Related
Windows uses some encoding table for non-unicode applications to map characters from unicode table to 1-byte table. There are many predefined character sets, user can choose one in windows settings. I need to create a custom character set. Where can I find some information about that process? I tried to Google it, but didn't have any luck, I guess, few people are doing that.
AFAIK, you can't do that, I don't think there's even a way to write some kernel mode "driver" for it, but, haven't looked into these things for a while, maybe there is some way (now).
In any case, you might be better off using a library you can change/update, such as libiconv.
UPDATE:
Since you don't have the source code, you're in a very unfortunate position.
For all string resources (in EXE or any DLLs or, though unlikely, in some other file(s)), you can "read them out" and figure out what's the code page used in them and change it (and the strings themselves), tweaking it in some way that would achieve your purpose - to have the right glyphs appear (yes, you might actually see different glyphs in Notepad, but, who cares if you application shows the right one(s) - FWIW, for such hacks, it's best to use a hex-editor). Then, of course, "put" the (changed) resources back in (EXE/DLL). But, it's quite possible not all strings are in resources, and that's when the "real" problems start.
There's any number of hacks that could have been done here. Your best option is to use some good debugger (WinDbg or better) and figure out what's going on and how are character sets handled = since you don't have the source code, it's gonna be quite painful. You want to find out:
Are the default charset(s) used (OEM/ANSI), or some specific (via NLS APIs)?
Whatever charset is used, is it a standard one or not? The charset here is the "code" Windows assigns to it. Look at Windows lists of available charsets.
Is the application installing fonts? If it is, use a font tool to examine them - maybe it has a specific (non-standard?) code-page supported in it.
Is the application installing some some drivers. If it is, the only way to gain more insight is to use a kernel debugger (which is very tricky and annoying, but, as already said, you're in an unfortunate situation).
It appears that those tables are located at C:\Windows\system32*.nls. I'm not sure whether there's proper documentation for their structure. There's some information in Russian here. Also you might want to tinker with registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls
Style guides for various languages recommend a maximum line length range of 80–100 characters for readability and comprehension.
My question is, how do people using tabs for indentation comply with the line length limits? Because, if the length of a tab is set to 2 in your editor, the same code could be over 80 characters long in someone else's editor with tab length set to 3/4/6/8.
Some suggest that I mentally see my tab length at 8 characters. This isn't realistic and won't let me focus on programming as I'd have to check it for every other line.
So, how do you (if you use only tabs in your code) do it?
Almost all languages have a convention for tab size. Most projects have a convention for tabs size as well, which is usually (though not always) the same as the language's tab size.
For example, Ruby is 2, Python & PHP is 4, C is 8, etc.
Sticking with the project's tab length, or the language tab size if there is no obvious project tab size, is by far the most sane thing to do, since this what almost all people will use, except perhaps a few.
The ability to set a different tab size is probably the largest advantage of using tabs instead of spaces, and if a user wants to deviate from the standard, then that's fine, but there is no reasonable way to also comply with the maximum line length in every possible situation.
You can compare it to most web sites; they're designed for a 100% zoom level. Zooming in or changing the font size almost always works, but some minor things may break. It's practically impossible to design for every possible zoom level & font size, so artefacts are accepted.
If you want to make sure the code complies to both the line length and tab size, you should probably use spaces, and not "real" tabs. Personally, I feel these guidelines are exactly that: guidelines; and not strict rules, so I don't think it matters that much. Other people's opinion differs on this matter, though.
Personally, I like tabs, for a number of reasons which are not important now, but only deviate from the standard temporarily, usually when editing heavy-nested code where I would like some more clarity which block I'm editing.
You probably don't write very long lines often, so you probably won't have to reformat your code too often.
Therefore, you can use an approach where you periodically check your file for compliance, and fix anything that's out of whack (for example, you could do this before submitting the file to code-review).
Here's a trivially simple Python script that takes a file on stdin and tells you which lines exceed 80 chars:
TABSIZE = 8
MAXLENGTH = 80
for i, line in enumerate(sys.stdin):
if len(line.rstrip('\n').replace('\t', ' '*TABSIZE)) > MAXLENGTH:
print "%d: line exceeds %d characters" % (i+1, MAXLENGTH)
You can extend this in various ways: have it take a filename/directory and automatically scan everything in it, make the tab size or length configurable, etc.
Some editors can be set up to insert a number of spaces equal to your tab length instead of an actual tab (Visual Studio has this enabled by default). This allows you to use the tab key to indent while writing code, but also guarantees that the code will look the same when someone else reads it, regardless of what their tab length may be.
Things get a little trickier if the reader also needs to edit the code, as is the case in pretty much every development team. Teams will usually have tabs vs. spaces and tab length as part of their coding standard. Each developer is responsible for making their own commits align with the standard, generally by setting up their environment to do this automatically. Changes that don't meet the standard can be reverted, and the developer made to fix them before they get integrated.
If you are going to have style guidelines that everyone has to follow then one way to insure that all code conforms to a column width maximum of 100 is to use code formatter such as Clang Format on check-in for example they include a option to set the ColumnLimit:
The column limit.
A column limit of 0 means that there is no column limit. In this case, clang-format will respect the input’s line breaking decisions
within statements unless they contradict other rules.
Once your team agrees to the settings then no one has to think about these details on check-in code formatting will be automatically done. This requires a degree of compromise but once in place you stop thinking about pretty quickly and you just worry about your code rather then formatting.
If you want to ensure you don't go over the limit as prescribed in a team/project coding standard, then it would be irrational to not also have a teamp/project standard for tab width. Modern editors let you set the tab width, so if your team decides and everyone sets their editor to that tab width, everyone will be on the same page.
Think if a line is 100 characters long for you and someone comes by and says it's over because their editor is set to 8 while yours is 2. You could tell them they have the wrong tab width and be making just as valid a point as they could if they told you your tab width was wrong if there's no agreed upon standard. If there's not, the 8-width guy will be wrong when he writes a 100 character line and another developer who prefers 10-width tabs complains.
Where tabs don't inherently have a defined width, it's up to the team to either choose one or choose not to engage in irrational arguments about it.
That said, spaces are shown to be correlated with higher income, so, y'know... ;)
When I thought about resizing images and saving the new sizes parallel on the server, I came to the following question:
// Original size
DSC_18342.jpg
// New size: Use an "x" for "times"
DSC_18342_640x480px.jpg
// New size: Use the real "×" for "times"
DSC_18342_640×480px.jpg
The point is, that it's slightly easier if you got a real × instead of an x in the file name, as the unit px already contains the x, which makes it a little bit harder to read.
Question: What problems could I get in, when using the Html entity in the filename?
Sidenotes: I'm writing an open source, publicly available script, so the targeted server can be anything - therefore I'm also interested (and will vote up) edge cases, that I'm not aware off.
Thank you all!
You may have noticed, that I'm aware, that I could simply avoid it (which I'll do anyway), but I'm interested in this issue and learning about it, so please just take above example as possible case.
There are file systems that simply don't support unicode. This may be less of a problem if you make unicode support a requirement of your application.
Some consideration about different unicode file system are given in File Systems, Unicode, and Normalization.
A concluding remark (from a viewpoint of solaris file systems) is:
Complete compatibility and seamless interoperability with
all other existing Unicode file systems appears not 100%
possible due to inherent differences.
I can imagine that there will be problems especially when migrating the application. Just storing files is probably no problem but if their names are stored in a database there might be a mismatch after migration.
This is not so much a technical question but still part of the development cycle.
I'm having to word all of my dialog boxes in this program I am working on and I was trying to get a good handle on the best practices for making text for the average end user to comprehend.
I have three core principles I could think of
Keep it short - yet long enough to explain thoroughly
Avoid personal remarks such as "keep in mind", "just so you know", etc
Call an apple an apple - If a concept is highly technical do not dumb it down with another word that doesn't fully encapsulate the idea.
Are these good principles to go by and/or is there something better to add.
There are various platform specific guidelines, e.g.
Microsoft WUXI --- Dialog text
Apple
The things I would add:
consistency - keep style and tone consistent throughout the application.
consistency - use the same names for concepts and elements of your app
drop everyting you can live without - explanations belong to online help, don't pack the dialog to tight, leave room.
Use simple words. Not all of your users are native english speakers.
Use present tense, active voice
Avoid exclamation marks
Avoid multiple exclamation marks
Yes, I believe those are great principles to go off of, but one more thing you may want to look for and be encouraged to do, is if it is highly technical and there is a way to get the same point across without using words most people wont get, consider using them, not everybody knows everything
Together with another developer, I have embarked on a journey to create a hosted 'CRM Style' application that will cater to enterprise level businesses. These businesses will be accessing our application remotely and so the hosted nature of the application will require certain features. For example, to guarantee a level of professional service the following things must be true:
internationalization requires multiple languages and presentation of date/time for various timezones and locales
transactional capability for batch processing of tasks and rollback capabilities
security concerns for keeping data safe and remote invocations secure from attack
etcetera, the list goes on and on
Due to these concerns and my role as the developer most responsible for the server side development, I am very interested in the choices I make early on. Regarding timezones and languages for example, are there issues related to my choice of database or data fields? Do I choose to use a UTC timestamp or date field throughout the application and if so is there a standard format for that? Also, regarding different languages, am I supposed to ensure the data is stored in the database as UTF-8 or unicode?
I really want to avoid laying down the infustructure of the system only to discover later that a fundamental decision was incorrect or not big enough, wide enough, smart enough, etc. Can someone point me in the right direction regarding these basic 'early' decisions?
EDIT _ Ok I appreciate the broad responses and now I see my question was a little too non-specific. I'd like to focus on the more specific elements that WERE present in the question, such as how to choose the proper format for storing a UTC Date/Time or how to save my text data (do I specify a UTF format?)
If you are targeting enterprise CRM, then you will need a very high level of customizability and integrations with all kinds of systems. You will make mistakes in the design. Your only hope is to isolate each little piece of the code so that you can have a chance of fixing it later.
In short, basic software engineering principles are your best bet.
What you are discussing is called a multi-tenant application wherein you have the same code base used by multiple customers (tenants) with logical or physical separation of data. Remember the fundamental rule of development: flexibility is relational to complexity. The more flexible you make the system, the more complicated it will be.
RE: UTC
For a CRM application that stores things like when calls were made and when meetings took place, I would definitely store all those in UTC and let the user set their local timezone. However, you might run into dates which are timezone agnostic and for those, I would store whatever date was entered.
RE: Unicode
Yes, I would use Unicode for all user-entered data. However, that will not get you localization. If for a single company for example, you have a user in Hong Kong entering text in Chinese and user in Amsterdam entering text in Dutch, you are not going to get automatic translation. Things like dates and number formats can be localized, but raw text like names used in drop lists and such can be a chore to localize.
As you have not mentioned what you think about the issue, you may find my answer or parts of it rather basic.
If you don't need to, don't use a low-level language. I'd use python usually for the first version of a CRM application (with the hope that it would be good enough for the next versions), but this decision depends also on the domain community.
Try to write the minimal code on your own, instead relying on the third-party libraries. People may disagree on this, but I would write the code myself as the last option. But the next point is important.
When selecting a library/framework to use, make sure the party behind it is going to last, the library is stable and the software license suits you needs.
Other general rules apply: focus on the customer, use continuous integration/testing, etc., use good software practices like logging etc.
Nothing is ever stored as "unicode" because this is an abstract concept. Unicode is always stored in some kind of unicode transformation format (UTF) (well or UCS but I never saw that used somewhere). The most commonly used UTF is UTF-8 but I suggest to use what is native/default to your platform. wikipedia