Issue with algorithm to shorten sentences - algorithm

I have a webpage which displays multiple textual entries which have no restriction on their length. They get automatically cut if they are too long to avoid going to a new line. This is the PHP function to cut them:
function cutSentence($sentence, $maxlen = 16) {
$result = trim(substr($sentence, 0, $maxlen));
$resultarr = array(
'result' => $result,
'islong' => (strlen($sentence) > $maxlen) ? true : false
);
return $resultarr;
}
As you can see in the image below, the result is fine, but there are a few exceptions. A string containing multiple Ms (I have to account for those) will go to a newline.
Right now all strings get cut after just 16 characters, which is already very low and makes them hard to read.
I'd like to know if a way exists to make sure sentences which deserve more spaces get it and those which contain wide characters end up being cut at a lower number of characters (please do not suggest using the CSS property text-overflow: ellipsis because it's not widely supported and it won't allow me to make the "..." click-able to link to the complete entry, and I need this at all costs).
Thanks in advance.

You could use a fixed width font so all characters are equal in width. Or optionally get how many pixels wide every character is and add them together and remove the additional character wont the pixel length is over a certain amount.

If the style of your application isn't too important, you could simply use a font in the monospace family such as Courier.

Do it in Javascript rather than in PHP. Use the DOM property offsetWidth to get the width of the containing element. If it exceeds some maximum width, then truncate accordingly.
Code copied from How can I mimic text-overflow: ellipsis in Firefox? :
function addOverflowEllipsis( containerElement, maxWidth )
{
var contents = containerElement.innerHTML;
var pixelWidth = containerElement.offsetWidth;
if(pixelWidth > maxWidth)
{
contents = contents + "…"; // ellipsis character, not "..." but "…"
}
while(pixelWidth > maxWidth)
{
contents = contents.substring(0,(contents.length - 2)) + "…";
containerElement.innerHTML = contents;
pixelWidth = containerElement.offsetWidth;
}
}

Since you are asking for a web page then you can use CSS text-overflow to do that.
It seems to be supported enough, and for firefox there seems to be css workarounds or jquery workarounds...
Something like this:
span.ellipsis {
white-space:nowrap;
text-overflow:ellipsis;
overflow:hidden;
width:100%;
display:block;
}
If you fill more text than it fits it will add the three dots at the end.
Just cut the text if it is really too long so you don't waste html space.
More info here:
https://developer.mozilla.org/En/CSS/Text-overflow
Adding a 'see more' link at the end is easy enough, as appending another span with fixed width, containing the link to see more. text will be truncated with ellipsis before that.

Related

In InDesign, is there a way to bold a whole word that has one bold character?

I'm working on an index in InDesign. Some of the page numbers are in bold, others are in italics or regular. During editing, somehow the first numbers of some of the bold page numbers got changed. I've figured out how to highlight those page numbers by coloring the bold numbers and recoloring the page numbers that are correct using a GREP search for bold words (\b\w+\b). What I can't figure out is how to select the "bad" page numbers that have only some numbers and make the entire "word" bold. Any ideas? It would be nice not to have to fix them manually.
I just tried this on a document and added a few numbers that were only partially bold.
I was able to fix it by doing a search for only digits with (\b\d+\b), changing all to $1. I left find format blank and change format to regular font. This changed all numbers to regular with no mixed bold and regular.
After that you can run the same find and replace again but switching format to bold. This will change all numbers to be fully bold.
It heavily depends on the text you have. If it's just one first digit that need to change, if you don't use character styles, if you have no digits in your body text, if the font you're using has the common names for styles, if ... there is a lot of 'if's, actually. I'd recommend to share a sample of your file (IDML).
So, here is the script that could do the job (if all of those "if"'s are true):
var doc = app.activeDocument;
var styles = doc.characterStyles;
// STEP 1 -- apply style1 (regular) to all regular numbers \d\d+
var style1 = styles.add();
style1.name = 'digits_regular';
style1.fontStyle = 'Regular';
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = '\\b\\d\\d+'; // two or more digits
app.findGrepPreferences.fontStyle = 'Regular';
app.changeGrepPreferences.changeTo = '$0';
app.changeGrepPreferences.appliedCharacterStyle = style1;
doc.changeGrep();
// STEP 2 -- apply style2 (italic) to all italic numbers \d\d+
var style2 = styles.add();
style2.name = 'digits_italic';
style2.fontStyle = 'Italic';
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = '\\b\\d\\d+';
app.findGrepPreferences.fontStyle = 'Italic';
app.changeGrepPreferences.changeTo = '$0';
app.changeGrepPreferences.appliedCharacterStyle = style2;
doc.changeGrep();
// STEP 3 -- apply style3 (bold) to all unstyled numbers
var style3 = styles.add();
style3.name = 'digits_bold';
style3.fontStyle = 'Bold';
app.findGrepPreferences = NothingEnum.nothing;
app.findGrepPreferences.findWhat = '\\b\\d\\d+';
app.findGrepPreferences.appliedCharacterStyle = styles[0]; // syle '[None]'
app.changeGrepPreferences.changeTo = '$0';
app.changeGrepPreferences.appliedCharacterStyle = style3;
doc.changeGrep();
// clean prefs
app.findGrepPreferences = NothingEnum.nothing;
Input:
Result:
Then you can remove the character styles you don't need them. But I'd recommend to use styles. They make the life easier exactly in such cases.
It's much easier to use the Find/Change interface in Indesign.

MigraDoc Formatting

I am completely new to PDF creation including MigraDoc. I have gotten this far, which is really close to what I want for now. My question is that the text string (myMessage) that I pass to the "bodyParagraph" is up to 100 lines long, which causes three pages to be created, which is good. However the first page's Top margin is slightly greater than the second and third pages. I have no idea of why...
Basically, I am trying to create every page the same. Same header, footer and the body to take the same space regardless of the number of lines in the "bodyParagraph" content. If I have taken the completely wrong approach I would be open to suggestions.
Also, if there is a good tutorial to point me to that would be great. I can't really find anything but samples. I have learned everything from the samples, but sections, paragraph, etc is all new to me and I would like to get a better understanding of what I've done.
public static Document CreateWorkOrderPDF2(Document document, string filename, string WorkOrderHeader, string myMessage)
{
Section section = document.AddSection();
section.PageSetup.PageFormat = PageFormat.Letter;
section.PageSetup.StartingNumber = 1;
section.PageSetup.LeftMargin = 40;
//Sets the height of the top margin
section.PageSetup.TopMargin = 100;
section.PageSetup.RightMargin = 40;
section.PageSetup.BottomMargin = 40;
//MARGIN
HeaderFooter header = section.Headers.Primary;
header.Format.Font.Size = 16;
header.Format.Font.Color = Colors.DarkBlue;
MigraDoc.DocumentObjectModel.Shapes.Image headerImage = header.AddImage("../../Fonts/castorgate.regular.png");
headerImage.Width = "2cm";
Paragraph headerParagraph = section.AddParagraph();
headerParagraph = header.AddParagraph(WorkOrderHeader);
//BODY PARAGRAPH
Paragraph bodyParagraph = section.AddParagraph();
bodyParagraph = section.AddParagraph(myMessage);
bodyParagraph.Format.Font.Size = 10;
bodyParagraph.Format.Font.Color = Colors.DarkRed;
//paragraph.Format.Distancne = "3cm";
Paragraph renderDate = section.AddParagraph();
renderDate = section.AddParagraph("Work Order Generated: ");
renderDate.AddDateField();
return document;
}
The line Paragraph bodyParagraph = section.AddParagraph(); adds an empty paragraph. I assume that is the extra space on the first page.
Same issue with renderDate in the following code block.
Just remove the calls section.AddParagraph() to remove the empty paragraphs if you don't want them.
MigraDoc is much like Word and understanding sections, paragraphs, &c. in Word will also help you with MigraDoc. That knowledge along with the samples and IntelliSense should get you going.
You can use MigraDoc to create an RTF file, open the RTF in Word, and click the pilcrow to show formatting characters in Word.

Capitalize first letter of sentence in CKeditor

I wish to capitalize the first letter of a sentence, on-the-fly, as the user types the text in a CKEditor content instance.
The strategy consists in catching each keystroke and try to replace it when necessary, that is for instance, when the inserted character follows a dot and a space. I'm fine with catching the event, but can't find a way to parse characters surrounding the caret position:
var instance = CKEDITOR.instances.htmlarea
instance.document.getBody().on('keyup', function(event) {
console.log(event);
// Would like to parse here from the event object...
event.data.preventDefault();
});
Any help would be much appreciated including a strategy alternative.
You should use keydown event (close to what you proposed):
var editor = CKEDITOR.instances.editor1;
editor.document.getBody().on('keydown', function(event) {
if (event.data.getKeystroke() === 65 /*a*/ && isFirstLetter()) {
// insert 'A' instead of 'a'
editor.insertText('A');
event.data.preventDefault();
}
});
Now - how should isFirstLetter() look like?
You have to start from editor.getSelection().getRanges() to get caret position.
You're interested only in the first range from the collection.
To extract text content from before the caret use small trick:
move start of the range to the beginning of document: range.setStartAt( editor.document.getBody(), CKEDITOR.POSITION_AFTER_START ),
use CKEDITOR.dom.walker to traverse through DOM tree in source order,
collect text nodes and find out what's before caret (is it /\. $/) - remember that you have to skip inline tags and stop on block tags - hint: return false from guard function to stop traversing.
Example of how you can use walker on range:
var range, walker, node;
range = editor.getSelection().getRanges()[0];
range.setStartAt(editor.document.getBody(), CKEDITOR.POSITION_AFTER_START);
walker = new CKEDITOR.dom.walker(range);
walker.guard = function(node) {
console.log(node);
};
while (node = walker.previous()) {}
And now few sad things.
We assumed that selection is empty when you type - that doesn't have to be true. When selection is not collapsed (empty) then you'll have to manually remove its content before calling insertText. You can use range#deleteContents to do this.
But this is not all - after deleting range's content you have to place caret in correct position - this isn't trivial. Basically you can use range#select (on the range after deleteContents), but in some cases it can place caret in incorrect place - like between paragraphs. Fixing this is... is not doable without deeeeeeeep knowledge about HTML+editables+insertions+other things :).
This solution is not complete - you have to handle paste event, deleting content (one can delete words from the start of sentence), etc, etc.
I guess there are couple of other problems I didn't even thought about :P.
So this approach isn't realistic. If you still want to implement this feature I think that you should set timer and by traversing DOM (you can use walker on range containing entire document, or recently typed text (hard to find out where it is)) find all sentences starting from lower letter and fix them.
This is what worked for me in Ckeditor 4.
var editor = CKEDITOR.instances.editor1;
editor.document.getBody().on('keydown', function(event) {
if (event.data.getKeystroke() >= 65 && event.data.getKeystroke()<=91 && encodeURI(this.getText())=="%0A" && this.getText().length==1 ) {
//uppercase the char
editor.insertText(String.fromCharCode(event.data.getKeystroke()));
event.data.preventDefault();
}
});

ZPL - zebra: print justified text block without overwriting last line

I'm using the following command to print a justified text:
^FB1800,3,0,J^FT100,200^A0B,26,26^FH\^FDLONG TEXT TO BE PRINTED, WHICH DOESNT FIT IN ONLY 3 LINES...^FS
The command ^FB1800,3,0,J prints a field block in a width of 1800 dots, maximum 3 lines, justified.
The problem is that if the text exceeds the maximum number of lines, it overwrites the last line! :( That of course makes the text of the last line unreadable.
How can I avoid that? Does anybody know if is there a way to cut the exceeding text?
The documentation says exactly that this happens:
Text exceeding the maximum number of lines overwrites the last line. Changing the font size automatically increases or decreases the size of the block.
For reference: I'm using printer Zebra 220Xi4.
Any help would be appreciated. Thank you!
Take a look at the ^TB command. It is preferred over the ^FB command and truncates if the text exceeds the size defined in the TB params
I had just about the same problem, what fixed it in my case - although not the most elegant way - is to specify a higher number of maximum lines, and then formatting it in a way that only the first 3 are in the visible area.
In your case it would be for example ^FB1800,7,0,J instead of ^FB1800,3,0,J
This at least fixed it for me right away, because I print this text at the bottom of the label. If you need to have it somewhere in the middle or top, there might be some tricks with putting a (white) box on top of the overflow-area, since the Zebra printers seem to render before printing. Hope it helps.
Depending on the higher-level programming language you're using (assuming that you are), you could accomplish the same thing (truncate the text to be printed to a specified number of characters) with code like this (C# shown here):
public void PrintLabel(string price, string description, string barcode)
{
const int MAX_CAPS_DESC_LEN = 21;
const int MAX_LOWERCASE_DESC_LEN = 32;
try
{
bool descAllUpper = HHSUtils.IsAllUpper(description);
if (descAllUpper)
{
if (description.Length > MAX_CAPS_DESC_LEN)
{
description = description.Substring(0, MAX_CAPS_DESC_LEN);
}
}
else // not all upper
{
if (description.Length > MAX_LOWERCASE_DESC_LEN)
{
description = description.Substring(0, MAX_LOWERCASE_DESC_LEN);
}
}
. . .
This is what I'm using; is there any reason to prefer the "raw" ^TB command over this?

CKEditor: How to stop angle brackets from converting to HTML entities

Our site is using tags like <# TAGNAME #> but CKEditor converts < and > to &lt and &gt which breaks these tags for use in our software.
I've discovered this option: config.protectedSource.push( /<#[\s\S]*##>/g ); which seems to stop the conversion if the data is saved from Source mode, but in WYSIWYG mode I can't find a way to stop the conversion. I've tried many options in their API but none of them seem to have helped, how can I fix this problem?
Were were looking at using CKEDitor to edit Smarty templates. The problem we were hitting was that it was replacing all the angle brackets and ampersands within the curly brackets, which messed everything up. This came up in a Google search so our solution should help anyone with similar issues.
CKEditor rebuilds the HTML every time you switch to Source mode and when you save, so you need to add to the HTML http://docs.cksource.com/CKEditor_3.x/Developers_Guide/Data_Processor htmlFilter.
This worked for us:
//replace Form_content with whatever your editor's id is.
htmlParser = CKEDITOR.instances.Form_content.dataProcessor.htmlFilter;
//We don't want HTML encoding on smarty tags
//so we need to change things in curly brackets
htmlParser.onText = function(text) {
//find all bits in curly brackets
var matches = text.match(/\{([^}]+)\}/g);
//go through each match and replace the encoded characters
if (matches!=null) {
for (match in matches) {
var replacedString=matches[match];
replacedString = matches[match].replace(/>/g,'>');
replacedString = replacedString.replace(/</g,'<');
replacedString = replacedString.replace(/&/g,'&');
text = text.replace(matches[match],replacedString);
}
}
return text;
}
The onText function processes all the bits that aren't in tags or comments.
I'd imagine you can do something similar by altering the code above - I've left it as is as I think our problems and required solutions are very similar.
editor.on( 'mode', function(ev) {
if ( ev.editor.mode == 'source' ) {
var str=ev.editor.getData();
str=str.replace(/&/g, "&").replace(/>/g, ">").replace(/</g, "<").replace(/"/g, "\"");
ev.editor.textarea.setValue(str);
}
});
http://cksource.com/forums/viewtopic.php?f=11&t=20647&start=10
If you type < or > in any WYSIWYG editor, they will be converted to their HTML entities in source mode.

Resources