jqEasyCounter plugin - how to ignore the "(white) spaces" - jquery-plugins

I'm working on a textarea that uses the jqEasyCounter plugin.
The textarea is designed Twitter-style, with a 140-character limit, so I use the plugin to give the user a friendly notification.
My problem is that all the spaces are counted.
Does anyone have a clue? Thanks heaps.

Spaces have to be counted.
If you stop counting the spaces in a textarea, you are counting the wrong number of characters.
As a result, you will have problems saving the field into a database table.
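To see the effect, here is a minimal plain-JavaScript sketch. It does not use jqEasyCounter's own API; storedLength, lengthIgnoringWhitespace and the 140 limit are illustrative assumptions. It compares a count that ignores whitespace with the length that actually has to fit in the database column.

```javascript
// Illustrative helpers only; these are not part of the jqEasyCounter plugin.
const MAX_CHARS = 140;

// The length that will be saved: every character, spaces included.
// Note: String.prototype.length counts UTF-16 code units.
function storedLength(text) {
  return text.length;
}

// The count you would show if whitespace were ignored.
function lengthIgnoringWhitespace(text) {
  return text.replace(/\s/g, "").length;
}

const paddedText = "a ".repeat(100); // 100 letters, each followed by a space

console.log(lengthIgnoringWhitespace(paddedText)); // 100, looks within the limit
console.log(storedLength(paddedText));             // 200, too long to save
console.log(storedLength(paddedText) <= MAX_CHARS); // false
```

The counter can only protect the database column if it counts exactly what will be stored.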

Related

ABCPDF.Net AddText Control hyphenation

I'm using ABCPDF.net to generate PDF pages, and we've got a problem with hyphenation.
For example, if we add text containing long words using
doc.AddText("This is a Verylongwordwhichdoesntfit");
and the Rect is too small, we get:
this is a verylongwo
rdwhichdoesntfit.
My questions are:
Can I control where it starts a new line, so that it breaks between "long" and "word"?
And can I tell it to add a "-" before the break, like this?
this is a verylongwo-
rdwhichdoesntfit.
Thanks a lot.
Details in the documentation here:
http://www.websupergoo.com/helppdfnet/source/3-concepts/b-htmlstyles.htm
Firstly, with .AddText() there is no possibility of hyphenation at all. You'd have to switch to .AddHtml().
Secondly, no, ABCpdf has no intelligence about hyphenation at all; it can be told to break lines after certain characters (the default is a space), but it has no knowledge of English words or syllables.
See http://www.websupergoo.com/helppdfnet/source/3-concepts/b-htmlstyles.htm#stylerun (search for canBreakAfter at that link)
If you're able to edit your text, you can use soft hyphen characters
http://www.websupergoo.com/helppdfnet/source/3-concepts/b-htmlstyles.htm#stylerun, last line of the "Chars" section
If you require fine control over hyphenation you can make use of the soft hyphen character (U+00AD, &shy; in HTML). This character is invisible and indicates a point at which a chunk of text may reasonably be broken.
For example, you'd use this command, and it might break at any of the places where a soft hyphen appears:
doc.AddHtml("This is a Very\u00ADlong\u00ADword\u00ADwhich\u00ADdoesnt\u00ADfit");
But I don't think even this will add a visible hyphen at the break.

XSL-FO: wrapping long words in table cell

I'm using Docbook-XSL and Apache FOP to generate PDF documents containing tables. With the default settings, tables have fixed-width columns and lines wrap at word boundaries. But if a word is longer than the cell width, it overflows the cell. I'd like to break up the words across multiple lines in such a case. How could this be done?
Hyphenation is not a solution since the words need not be in English. (Edit: hyphenation in other languages is not a solution either. It may not be known ahead of time what language the data is in, and there may be "words" that cannot be hyphenated, such as numeric strings.)
I found suggestions to use keep-together.within-column="always" for fo:table-rows, but that didn't seem to have any effect.
(Edit:) Another suggestion was to insert zero-width spaces between all characters. But this also breaks short words mid-word. I would need a solution that breaks at word boundaries whenever possible, and mid-word only when needed.
FOP, like just about every FO processor, can hyphenate languages other than English. See http://xmlgraphics.apache.org/fop/2.1/hyphenation.html
You could try using an FO processor, such as Antenna House AH Formatter, that implements 'auto' table layout and can adjust the widths of the table columns depending on where the text can break (as well as do hyphenation for multiple languages).
Other answers for breaking text in table cells are at:
Force line break after string length
XSL-FO: Force Wrap on Table Entries
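The zero-width-space idea from the question can be restricted to overlong words, so that ordinary words still break only at spaces. Below is a minimal preprocessing sketch, written in JavaScript to match the other examples on this page (in a DocBook pipeline it would more naturally be an XSLT template). breakLongWords and maxRun are hypothetical names, and because characters differ in width, maxRun only approximates the column width. FOP should treat U+200B as a line-break opportunity, but that is worth confirming with your FOP version.

```javascript
// Preprocessing sketch: add break opportunities only inside overlong "words".
// maxRun is a hypothetical tuning parameter (roughly the number of characters
// that fit in the narrowest column); character widths vary, so it is approximate.
const ZWSP = "\u200B"; // ZERO WIDTH SPACE: invisible, but a permitted line break

function breakLongWords(text, maxRun) {
  // Splitting with a capture group keeps the whitespace separators in the array.
  return text
    .split(/(\s+)/)
    .map((token) => {
      const chars = Array.from(token); // code points, so surrogate pairs stay intact
      if (/^\s*$/.test(token) || chars.length <= maxRun) return token;
      const chunks = [];
      for (let i = 0; i < chars.length; i += maxRun) {
        chunks.push(chars.slice(i, i + maxRun).join(""));
      }
      return chunks.join(ZWSP);
    })
    .join("");
}

// Short words are untouched; the 14-digit number gets two break opportunities.
const out = breakLongWords("id 12345678901234", 5);
console.log(out.split(ZWSP)); // [ 'id 12345', '67890', '1234' ]
```

Because only words longer than maxRun get extra break points, a line still breaks at word boundaries whenever the words fit, and mid-word only inside words that could not fit anyway.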

Regex for Git commit message

I'm trying to come up with a regex for enforcing Git commit messages to match a certain format. I've been banging my head against the keyboard modifying the semi-working version I have, but I just can't get it to work exactly as I want. Here's what I have now:
/^([a-z]{2,4}-[\d]{2,5}[, \n]{1,2})+\n{1}^[\w\n\s\*\-\.\:\'\,]+/i
Here's the text I'm trying to enforce:
AB-1432, ABC-435, ABCD-42
Here is the multiline description, following a blank
line after the Jira issue IDs
- Maybe bullet points, with either dashes
* Or asterisks
Currently, it matches that, but it will also match if there's no blank line after the issue IDs, and if there are multiple blank lines after them.
Is there any way to enforce that, or will I just have to live with it?
It's also pretty ugly, I'm sure there's a more succinct way to write that out.
Thanks.
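Your regex allows \n (and \s, which also matches newlines) as possible characters after the required newline, so that's why it matches when there are multiple blank lines.
Here's a cleaned-up regex:
/^([a-z]{2,4}-\d{2,5}(?=[, \n]),? ?\n?)+^\n([-\w \t*.:',]+\n)+/im
Notes:
This requires at least one [-\w \t*.:',] character before each newline. The class uses a literal space and \t instead of \s, because \s also matches \n and would let extra blank lines through.
I changed the issue IDs to have one possible comma, space, and newline, in that order (up to one of each). If your regex engine supports lookaheads, the (?=[, \n]) I added makes sure each issue ID is followed by at least one of those characters.
The m flag lets the ^ in the middle of the pattern match at the start of a line in JavaScript and PCRE. In Ruby, ^ always matches at line starts, and Ruby's /m (which makes . match newlines) is harmless here because the pattern contains no dot.
Also notice that many of the characters don't need to be escaped in a character class.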
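A sketch for checking the pattern in JavaScript, for example from a Node-based commit-msg hook (the hook wiring is not shown, and isValidCommitMessage is a hypothetical name). This version uses a literal space and \t instead of \s in the description class and sets the m flag. Because the pattern is anchored only at the start, anything after the last newline-terminated matching line is not checked, and every description line, including the last, must end with a newline.

```javascript
// Commit-message pattern with [ \t] instead of \s in the description class
// (so blank lines cannot sneak in) and the m flag (so the inner ^ can match
// at the start of a line in JavaScript).
const COMMIT_RE = /^([a-z]{2,4}-\d{2,5}(?=[, \n]),? ?\n?)+^\n([-\w \t*.:',]+\n)+/im;

function isValidCommitMessage(msg) {
  const match = COMMIT_RE.exec(msg);
  // With m, the leading ^ could also match at a later line, so insist on index 0.
  return match !== null && match.index === 0;
}

const good =
  "AB-1432, ABC-435, ABCD-42\n\n" +
  "Here is the multiline description, following a blank\n" +
  "line after the Jira issue IDs\n" +
  "- Maybe bullet points, with either dashes\n" +
  "* Or asterisks\n";

console.log(isValidCommitMessage(good));                             // true
console.log(isValidCommitMessage("AB-1432\nNo blank line\n"));       // false
console.log(isValidCommitMessage("AB-1432\n\n\nTwo blank lines\n")); // false
```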

Processing form input in a Joomla component

I am creating a Joomla component and one of the pages contains a form with a text input for an email address.
When a < character is typed into the input field, that character and everything after it are missing from the submitted value.
I tried $_POST['field'] and JFactory::getApplication()->input->getCmd('field')
I also tried alternatives for getCmd like getVar, getString, etc. but no success.
E.g. John Doe <j.doe#mail.com> returns only John Doe.
When the < is left out, as in John Doe j.doe#mail.com>, the value comes through correctly.
What can I do to also have the < character in the posted variable?
BTW, I had to use &lt; in this question to display it the way I wanted. This form suffers from the same problem!
You actually need to set the filtering that you want when you grab the input. Otherwise, you will get some heavy filtering. (Typically, I will also lose # symbols.)
Replace this line:
JFactory::getApplication()->input->getCmd('field');
with this line:
JFactory::getApplication()->input->getRaw('field');
The part of the method name after get is the filter that will be applied. Cmd strips everything except alphanumeric characters and ., -, and _. String runs the value through Joomla's HTML tag cleaning and, depending on your settings, will strip <>. (That usually doesn't happen for me, but my settings are generally quite open, to the point of no filtering for super admins and such.)
getRaw should definitely work, but note that there is no filtering at all, which can open security holes in your application.
The default text filter strips HTML from the input for your field. You should set the attribute
filter="raw"
on the field in your form's XML definition file, and then use getRaw() to retrieve the value. getCmd removes the non-alphanumeric characters (other than ., - and _).

Putting spaces back into a string of text with unreliable space information

I need to parse some text from PDFs, but the PDF formatting results in extremely unreliable spacing. As a result, I have to ignore the spaces and work with a continuous stream of non-space characters.
Any suggestions on how to parse the string and put spaces back into the string by guessing?
I'm using ruby. Or should I say I'musingruby?
Edit: I've pulled the text out using pdf-reader. Some of the PDF files are nicely formatted and some are not. Here is an example of text mixed with positioning data:
.7aspe-5.5cts-715.1o0.6f-708.5f-0.4aces-721.4that-716.3are-720.0i-1.8mportant-716.3in-713.9soc-5.5i-1.8alcommunica6.6tion6.3.-711.6Althoug6.3h-708.1m-1.9od6.3els-709.3o6.4f-702.8f5.4ace-707.9proc6.6essing-708.2haveproposed-611.2ways-615.5to-614.7deal-613.2with-613.0these-613.9diff10.4erent-613.7tasks,-611.9it-617.1remainsunclear-448.0how-450.7these-443.2mechanisms-451.7might-446.7be-447.7implemented-447.2in-450.3visualOne-418.9model-418.8of-417.3human-416.4face-421.9processing-417.5proposes-422.7that-419.8informa-tion-584.5is-578.0processed-586.1in-583.1specialised-584.7modules-577.0(Breen-584.4et-582.9al.,-582.32002;Bruce-382.1and-384.0Y92.0oung,-380.21986;-379.2Haxby-379.9et-380.5al.,-
And if I print just the string data (I added line breaks at the end of each line to keep it from
messing up the layout here), it looks like this:
'Distinctrepresentationsforfacialidentityandchangeableaspectsoffacesinthehumantemporal
lobeTimothyJ.Andrews*andMichaelP.EwbankDepartmentofPsychology,WolfsonResearchInstitute,
UniversityofDurham,UKReceived23December2003;revised26March2004;accepted27July2004Availab
leonline14October2004Theneuralsystemunderlyingfaceperceptionmustrepresenttheunchanging
featuresofafacethatspecifyidentity,aswellasthechangeableaspectsofafacethatfacilitates
ocialcommunication.However,thewayinformationaboutfacesisrepresentedinthebrainremainsc
ontroversial.Inthisstudy,weusedfMRadaptation(thereductioninfMRIactivitythatfollowsthe
repeatedpresentationofidenticalimages)toaskhowdifferentface-andobject-selectiveregionsofvisualcortexcontributetospecificaspectsoffaceperception'
The data is returned through callbacks, so if I print each string as it is returned, it looks like this:
'The
-571.3
neural
-573.7
system
-577.4
underly
13.9
ing
-577.2
face
-573.0
perc
13.7
eption
-574.9
must
-572.1
repr
20.8
esent
-577.0
the
unchangin
14.4
g
-538.5
featur
16.5
es
-529.5
of
-536.6
a
-531.4
face
'
On examination, it looks like the true spaces are large negative numbers (below -300) and the false spaces are much smaller positive numbers. Thanks, guys. Just getting to the point where I was asking the question clearly helped me answer it!
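The observation above translates into a simple rule. Here is a minimal sketch in JavaScript (to match the other examples on this page; the same loop translates directly to Ruby), assuming the callback output has already been collected into one array of strings and numbers. rebuildText is a hypothetical name, and the -300 threshold comes from the observation above and may need tuning for other files.

```javascript
// items: strings and numeric offsets in the order the callbacks produced them.
// An offset below gapThreshold is treated as a real word gap; anything else
// (small kerning adjustments such as 13.9) is treated as part of the word.
function rebuildText(items, gapThreshold = -300) {
  let out = "";
  for (const item of items) {
    if (typeof item === "number") {
      if (item < gapThreshold) out += " ";
    } else {
      // Strings with no offset between them are joined directly. In the sample
      // above, "the" is followed straight by "unchangin", so a gap like that
      // would still be lost and needs separate handling.
      out += item;
    }
  }
  return out;
}

const streamSample = ["The", -571.3, "neural", -573.7, "system", -577.4, "underly", 13.9, "ing"];
console.log(rebuildText(streamSample)); // The neural system underlying
```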
Hmm... I'd have to say that guessing is never a good idea. Finding the root cause of the problem and solving that is the answer; anything else is a kludge.
If the spacing is unreliable from the PDF, how is it unreliable? The PDF viewer needs to be able to reliably space the text so the data is there somewhere, you just need to find it.
EDIT following comment:
The idea of parsing the file using a dictionary (really your only other option, apart from randomly inserting spaces and hoping for the best) and inserting spaces at identified word boundaries (a real problem when dealing with punctuation, plurals that don't alter the base word, etc.) would, I believe, be a much greater programming challenge than correctly parsing the PDF in the first place. After all, PDF is clearly defined, whereas English is somewhat woolly.
Why not go down the route of existing solutions like ps2ascii on Linux: call it from your Ruby code and pick up the result?
PDF doesn't only store spaces as space characters; it also uses layout commands for spacing (so instead of printing a space, it moves the "pen" to the right). Perhaps you should have a look at the PDF reference (the big PDF at the bottom of the site); Chapter 9, "Text", should be what you're looking for.
EDIT: After reading your comment on Lazarus' answer, this doesn't seem to be what you're looking for. I think you should get a word list from somewhere and try to split your text using it. A good strategy would be to do that using recursion because, for example, given:
"meandyou"
The first word could be "me" or "mean", but if you try "mean", the remainder "dyou" doesn't make sense, so it must be "me". The same goes for the next word, which could be "a", "an" or "and"; only "and" makes sense.
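The recursive idea can be sketched as a backtracking search. This minimal JavaScript version assumes a hypothetical word list supplied as a Set of lowercase words, and lowercase input without punctuation; memoizing on the start position keeps it from re-solving the same suffix. It returns the first split it finds, trying longer words first, which is not necessarily the intended one when several splits are valid.

```javascript
// Split an unspaced string into words from `dictionary` (a Set of lowercase
// words), backtracking when a choice leaves a remainder that cannot be split.
// Returns an array of words, or null if no complete split exists.
function splitWords(text, dictionary) {
  let maxLen = 0;
  for (const word of dictionary) maxLen = Math.max(maxLen, word.length);

  const memo = new Map(); // start index -> words for text.slice(start), or null

  function solve(start) {
    if (start === text.length) return [];
    if (memo.has(start)) return memo.get(start);
    let result = null;
    // Try the longest candidate first and fall back to shorter ones.
    for (let end = Math.min(text.length, start + maxLen); end > start; end--) {
      const word = text.slice(start, end);
      if (!dictionary.has(word)) continue;
      const rest = solve(end);
      if (rest !== null) {
        result = [word, ...rest];
        break;
      }
    }
    memo.set(start, result);
    return result;
  }

  return solve(0);
}

const wordList = new Set(["me", "mean", "a", "an", "and", "you"]);
console.log(splitWords("meandyou", wordList)); // [ 'me', 'and', 'you' ]
```

Here "mean" is tried first, fails because "dyou" cannot be split, and the search falls back to "me", exactly as described above.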
If it were me, I'd go back to the source PDFs and try a different method of extracting the text, such as iText (for Java), or perhaps convert the PDF to HTML and then to text.
