Is there any built-in function in Scintilla.NET to detect whether the cursor is over a comment or a string? I want to prevent autocompletion from triggering while the user is typing comments or strings.
I'm aware I can scan the whole text backwards, searching for //, /* */ and pairs of " " but I'm almost sure there must be a built-in function to do that.
Thanks!
If you're using a lexer, you can get the style number at the current caret position and check whether it corresponds to a string or comment. The Scintilla API for retrieving the style number is:
SCI_GETSTYLEAT(int pos)
The Scintilla.NET documentation states there are already some convenience APIs for detecting comments:
ScintillaNET.Scintilla.PositionIsOnComment(System.Int32)
ScintillaNET.Scintilla.PositionIsOnComment(System.Int32,ScintillaNET.Lexer)
But there does not seem to be anything equivalent for strings - so it looks like you'll have to roll your own by using the above Scintilla message with one of the ScintillaNET.Scintilla.SendMessageDirect() methods.
I'm using ABCPDF.net for generating PDF Pages. We've got a problem with the hyphenation system.
For example if we add a text with long words using
doc.AddText("This is a Verylongwordwhichdoesntfit");
and the Rect is too small, we get:
this is a verylongwo
rdwhichdoesntfit.
My Question now is:
Can I control where it starts a new line, so that it breaks between "long" and "word"?
And can I tell it to put a - before the break, like this?
this is a verylongwo-
rdwhichdoesntfit.
Thanks a lot.
Details in the documentation here:
http://www.websupergoo.com/helppdfnet/source/3-concepts/b-htmlstyles.htm
Firstly, with .AddText() there is no possibility of hyphenation at all. You'd have to switch to .AddHtml().
Secondly, no, abcpdf has no intelligence about hyphenating at all; it can be told to break lines after certain characters (default is space), but it has no knowledge of English words or syllables.
See http://www.websupergoo.com/helppdfnet/source/3-concepts/b-htmlstyles.htm#stylerun (search for canBreakAfter at that link)
If you're able to edit your text, you can use soft hyphen characters
http://www.websupergoo.com/helppdfnet/source/3-concepts/b-htmlstyles.htm#stylerun, last line of the "Chars" section
If you require fine control over hyphenation you can make use of the soft hyphen character: &shy;. This character is invisible and indicates a point at which a chunk of text may reasonably be broken.
For example, you'd use this command, and it might break at any of the places where the &shy; appears:
doc.AddHtml("This is a Very&shy;long&shy;word&shy;which&shy;doesnt&shy;fit");
But even this won't add the visible hyphens at the break, I don't think.
Some characters have ambiguous directionality, like whitespace and punctuation marks. This can lead to text layout situations where there doesn't appear to be a single correct layout without access to additional data to resolve the ambiguity. Consider this text:
\u05e9\u05e0\u05d1\u05d2abcd!
That's four Hebrew characters (unambiguously right-to-left), four English characters (unambiguously left-to-right), and one punctuation mark (ambiguous). If I lay out that string in an IDWriteTextLayout with DWRITE_READING_DIRECTION_RIGHT_TO_LEFT, I get the following:
The punctuation mark appears to be treated as a right-to-left character which is starting a new right-to-left block to the left of the English, which seems perfectly reasonable, especially considering that right-to-left was the specified reading direction. However, it's also entirely reasonable to expect the punctuation mark to be treated as a left-to-right character associated with the embedded left-to-right English text, which would mean it should appear to the right of the 'd'.
My app knows exactly how it wants this character to be treated. How do I pass that data to IDWriteTextLayout to resolve this ambiguity?
I found the SetLocaleName method and thought that it must be the answer, but I can't seem to get it to affect the result at all. I also found the localeName parameter when creating an IDWriteTextFormat (which is then used to create the IDWriteTextLayout).
If my goal is for this to generally be Hebrew text with a string of embedded US English, I would think I'd want to use locale he on the IDWriteTextFormat and then use SetLocaleName to override that with locale en-US on character range [4-9]. However, doing so has no effect. In fact, I can't get any combination of locales used in those places to have any effect on the layout at all, whether I restrict them to a subrange or apply them to the entire string.
Am I wrong in thinking that these APIs should serve this purpose? If so, what APIs should I be using? Or is there really no way to tell IDWriteTextLayout to resolve this ambiguity differently? Am I maybe using the APIs wrong? Here is the test code I'm using to create this IDWriteTextLayout:
TestTextRenderer::TestTextRenderer(const std::shared_ptr<DX::DeviceResources>& deviceResources) :
    m_deviceResources(deviceResources),
    m_text(L"\u05e9\u05e0\u05d1\u05d2abcd!"),
    m_readingDirection(DWRITE_READING_DIRECTION_RIGHT_TO_LEFT),
    m_formatLocale(L"en-US"),
    m_layoutLocale(L"en-US")
{
    // Create the text format with the format-level locale.
    ComPtr<IDWriteTextFormat> textFormat;
    DX::ThrowIfFailed(
        m_deviceResources->GetDWriteFactory()->CreateTextFormat(
            L"Segoe UI",
            nullptr,
            DWRITE_FONT_WEIGHT_MEDIUM,
            DWRITE_FONT_STYLE_NORMAL,
            DWRITE_FONT_STRETCH_NORMAL,
            24.0f,
            m_formatLocale.c_str(),
            &textFormat
            )
        );
    DX::ThrowIfFailed(textFormat->SetReadingDirection(m_readingDirection));
    DX::ThrowIfFailed(
        m_deviceResources->GetDWriteFactory()->CreateTextLayout(
            m_text.c_str(),
            (uint32) m_text.length(),
            textFormat.Get(),
            250.0f,
            100.0f,
            &m_textLayout
            )
        );
    // Apply the layout-level locale to the whole string.
    // The cast avoids a narrowing conversion from size_t inside the braces.
    DWRITE_TEXT_RANGE all{0u, static_cast<UINT32>(m_text.size())};
    DX::ThrowIfFailed(m_textLayout->SetLocaleName(m_layoutLocale.c_str(), all));
    DX::ThrowIfFailed(m_deviceResources->GetD2DFactory()->CreateDrawingStateBlock(&m_stateBlock));
    CreateDeviceDependentResources();
}
I don't think there's any ambiguity from the point of view of the Unicode BiDi algorithm. The initial direction set on the IDWriteTextFormat or IDWriteTextLayout is crucial, but after that, run directions are derived strictly from the code points.
Setting the locale won't change direction, but it can affect shaping; the end result depends on which features the run's font has.
I think you can get the abcd!... output by wrapping that part of the text in LRE/PDF controls (U+202A LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING), e.g. L"\u05e9\u05e0\u05d1\u05d2\u202Aabcd!\u202C".
I was expecting this command
^FO15,240^BY3,2:1^BCN,100,Y,N,Y,^FD>:>842011118888^FS
to generate a
(420) 11118888
interpretation line, instead it generates
~n42011118888
Does anyone have an idea how to generate the expected output?
TIA!
Joey
If the firmware is up to date, D mode can be used.
^BCo,h,f,g,e,m
^XA
^FO15,240
^BY3,2:1
^BCN,100,Y,N,Y,D
^FD(420)11118888^FS
^XZ
D = UCC/EAN Mode (x.11.x and newer firmware)
This allows dealing with UCC/EAN with and without chained
application identifiers. The code starts in the appropriate subset
followed by FNC1 to indicate a UCC/EAN 128 bar code. The printer
automatically strips out parentheses and spaces for encoding, but
prints them in the human-readable section. The printer automatically
determines if a check digit is required, calculate it, and print it.
Automatically sizes the human readable.
The ^BC command's "interpretation line" feature does not support auto-insertion of the parentheses. (I think it's safe to assume this is partly because it has no way of determining what your data identifier is by just looking at the data provided - it could be 420, could be 4, could be any other portion of the data starting from the first character.)
My recommendation is that you create a separate text field which handles the logic for the parentheses, and place it just above or below the barcode itself. This is the way I've always approached these in the past - I prefer this method because I have direct control over the font, font size, and formatting of the interpretation line.
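A minimal Ruby sketch of that approach, assuming the application identifier and the data are already known to the calling code. The helper name `gs1_label`, the field coordinates, and the `^A0N,30,30` font choice are illustrative assumptions; the barcode field reuses the parameters from the question with the built-in interpretation line switched off.

```ruby
# Build a ZPL label whose human-readable line is a separate text field.
# ai   - application identifier, e.g. "420" (assumed known to the caller)
# data - the digits that follow the identifier
def gs1_label(ai, data)
  [
    "^XA",
    # Barcode as in the question, but with the interpretation line disabled (f = N).
    "^FO15,240^BY3,2:1^BCN,100,N,N,Y^FD>:>8#{ai}#{data}^FS",
    # Hand-formatted interpretation line, placed below the 100-dot-high barcode.
    "^FO15,350^A0N,30,30^FD(#{ai}) #{data}^FS",
    "^XZ"
  ].join("\n")
end

puts gs1_label("420", "11118888")
```

Because the parentheses are added by the caller, the font, size, and position of the text line can be adjusted independently of the barcode.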
I am doing some localization testing and I have to test for strings in both English and Japanese. The English string might be 'Waiting time is {0} minutes.' while the Japanese string might be '待ち時間は{0}分です。' where {0} is a number that can change over the course of a test. Both of these strings come from their respective property files. How can I check for the presence of the string as well as the number, which can change depending on the test that's running?
I should have added that I'm checking these strings on a web page, which displays in the relevant language depending on the location from which it is being viewed. I'm using Watir to verify the text.
You can read elsewhere about various theories of the best way to do testing for proper language conversion.
One typical approach is to replace all hard-coded text matches in your code with constants, and then have a file that sets the constants, which can be updated based on the language in use. (I've seen that done by wrapping the require of that file in a case statement based on the language being tested.) Another approach is an array or hash for each value, indexed by a variable with a name like 'language', which lets the tests change the language on the fly. Validations would then look something like this:
b.div(:id => "wait-time-message").text.should == WAIT_TIME_MESSAGE[language]
To match text where part is expected to change but fall within a predictable pattern, use a regular expression. I'd recommend a little reading about regular expressions in Ruby, especially Unicode regular expressions, as well as some experimenting with a tool like Rubular to test regexes.
In the case above, a regex such as:
/Waiting time is \d+ minutes\./ or /待ち時間は\d+分です。/
would match the messages above and expect one or more digits in the middle. (Note that it would fail if no digits appear; if you want zero or more digits, use a * in place of the +.)
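Since the strings come from property files, the pattern could also be built from the template itself rather than typed by hand. This is only a sketch: `pattern_for` is a hypothetical helper, and it assumes each template contains a single `{0}` placeholder that stands for an integer.

```ruby
# Turn a property-file template such as 'Waiting time is {0} minutes.'
# into a regex that captures the number in place of {0}.
def pattern_for(template)
  # Regexp.escape protects literal characters (the '.', the braces, spaces).
  escaped = Regexp.escape(template)
  # After escaping, the placeholder reads \{0\}; swap it for a digit capture.
  # The block form inserts the replacement text literally.
  Regexp.new(escaped.sub('\{0\}') { '(\d+)' })
end

english  = pattern_for("Waiting time is {0} minutes.")
japanese = pattern_for("待ち時間は{0}分です。")

match = english.match("Waiting time is 12 minutes.")
minutes = match && match[1].to_i
```

With Watir, the page text would be matched against `english` or `japanese` depending on the language under test, and the captured group compared with the expected number.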
Don't check for the literal string. Check for some kind of intermediate form that can be used to render the final string.
Sometimes this is done by specifying a message and any placeholder data, like:
[ :waiting_time_in_minutes, 10 ]
Where that would render out as the appropriate localized text.
An alternative is to treat one of the languages as a template, something that's more limited in flexibility but works most of the time. In that case you could use the English version as the string that's returned and use a helper to render it to the final page.
I need to parse some text from pdfs but the pdf formatting results in extremely unreliable spacing. The result is that I have to ignore the spaces and have a continuous stream of non-space characters.
Any suggestions on how to parse the string and put spaces back into the string by guessing?
I'm using ruby. Or should I say I'musingruby?
Edit: I've pulled the text out using pdf-reader. Some of the pdf files are nicely formatted and some are not. An example of text mixed with positioning:
.7aspe-5.5cts-715.1o0.6f-708.5f-0.4aces-721.4that-716.3are-720.0i-1.8mportant-716.3in-713.9soc-5.5i-1.8alcommunica6.6tion6.3.-711.6Althoug6.3h-708.1m-1.9od6.3els-709.3o6.4f-702.8f5.4ace-707.9proc6.6essing-708.2haveproposed-611.2ways-615.5to-614.7deal-613.2with-613.0these-613.9diff10.4erent-613.7tasks,-611.9it-617.1remainsunclear-448.0how-450.7these-443.2mechanisms-451.7might-446.7be-447.7implemented-447.2in-450.3visualOne-418.9model-418.8of-417.3human-416.4face-421.9processing-417.5proposes-422.7that-419.8informa-tion-584.5is-578.0processed-586.1in-583.1specialised-584.7modules-577.0(Breen-584.4et-582.9al.,-582.32002;Bruce-382.1and-384.0Y92.0oung,-380.21986;-379.2Haxby-379.9et-380.5al.,-
and if I print just the string data (I added returns at the end of each line to keep it from messing up the layout here):
'Distinctrepresentationsforfacialidentityandchangeableaspectsoffacesinthehumantemporal
lobeTimothyJ.Andrews*andMichaelP.EwbankDepartmentofPsychology,WolfsonResearchInstitute,
UniversityofDurham,UKReceived23December2003;revised26March2004;accepted27July2004Availab
leonline14October2004Theneuralsystemunderlyingfaceperceptionmustrepresenttheunchanging
featuresofafacethatspecifyidentity,aswellasthechangeableaspectsofafacethatfacilitates
ocialcommunication.However,thewayinformationaboutfacesisrepresentedinthebrainremainsc
ontroversial.Inthisstudy,weusedfMRadaptation(thereductioninfMRIactivitythatfollowsthe
repeatedpresentationofidenticalimages)toaskhowdifferentface-andobject-selectiveregionsofvisualcortexcontributetospecificaspectsoffaceperception'
The data is spit out by callbacks so if I print each string as it is returned it looks like this:
'The
-571.3
neural
-573.7
system
-577.4
underly
13.9
ing
-577.2
face
-573.0
perc
13.7
eption
-574.9
must
-572.1
repr
20.8
esent
-577.0
the
unchangin
14.4
g
-538.5
featur
16.5
es
-529.5
of
-536.6
a
-531.4
face
'
On examination it looks like the true spaces are large negative numbers (< -300) and the false spaces are much smaller positive numbers. Thanks guys. Just getting to the point where I was asking the question clearly helped me answer it!
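That observation can be sketched in Ruby roughly as follows. It assumes the callback data has already been collected into one array of strings and numbers in the order shown above; `rebuild_text` and the -300 cutoff are illustrative and taken only from the sample, so other PDFs may need a different threshold.

```ruby
# Offsets below this value are treated as real word gaps (assumed cutoff).
SPACE_THRESHOLD = -300

# chunks: strings interleaved with numeric positioning offsets,
# e.g. ["underly", 13.9, "ing", -577.2, "face"]
def rebuild_text(chunks, threshold = SPACE_THRESHOLD)
  chunks.each_with_object(String.new) do |chunk, text|
    if chunk.is_a?(Numeric)
      # Large negative offset: a true space. Small offsets are kerning only.
      text << " " if chunk < threshold
    else
      text << chunk.to_s
    end
  end
end

sample = ["The", -571.3, "neural", -573.7, "system", -577.4,
          "underly", 13.9, "ing", -577.2, "face"]
puts rebuild_text(sample)
```

Two strings that arrive with no number between them (as with "the" and "unchangin" in the output above) are joined without a space here, so line or array boundaries would need separate handling.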
Hmmmm... I'd have to say that guessing is never a good idea. Looking at the problem root cause and solving that is the answer, anything else is a kludge.
If the spacing is unreliable from the PDF, how is it unreliable? The PDF viewer needs to be able to reliably space the text so the data is there somewhere, you just need to find it.
EDIT following comment:
The idea of parsing the file using a dictionary (your only other option really, apart from randomly inserting spaces and hoping for the best) and inserting spaces at identified word boundaries (a real problem when dealing with punctuation, plurals that don't alter the base word i.e. plural, etc) would, I believe, be a much greater programming challenge than correctly parsing the PDF in the first place. After all, PDF is clearly defined whereas English is somewhat wooly.
Why not go down the route of existing solutions like ps2ascii on Linux: call it from your Ruby code and pick up the result?
PDF doesn't only store spaces as space characters, but also uses layout commands for spacing (so it doesn't print a space, but moves the "pen" to the right). Perhaps you should have a look at the PDF reference (the big PDF on the bottom of the site), Chapter 9 "Text" should be what you're looking for.
EDIT: After reading your comment to Lazarus' answer, this doesn't seem to be what you're looking for. I think you should try to get a word list from somewhere and try to split your text using it. A good strategy would be to do that using recursion, because for example:
"meandyou"
The first word could be "me" or "mean", but if you try "mean", the remainder "dyou" doesn't make sense, so it must be "me". The same goes for the next word, which could be "a", "an", or "and": only "and" leaves a remainder that makes sense.
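A minimal Ruby sketch of that recursive strategy, using a tiny hand-made word set in place of a real word list. The `segment` helper is hypothetical; it tries shorter prefixes first and backtracks when the remainder cannot be split, with a memo hash so that the same tail is not re-examined.

```ruby
require "set"

# Split text into dictionary words, or return nil if no split exists.
def segment(text, words, memo = {})
  return [] if text.empty?
  return memo[text] if memo.key?(text)

  result = nil
  1.upto(text.length) do |len|
    prefix = text[0, len]
    next unless words.include?(prefix)

    # Recurse on the remainder; a nil result means this prefix is a dead end.
    rest = segment(text[len..-1], words, memo)
    if rest
      result = [prefix] + rest
      break
    end
  end
  memo[text] = result
end

words = Set.new(%w[me mean a an and you])
p segment("meandyou", words)
```

This returns the first split it finds, not necessarily the best one. Real text has many valid splits, so a word list with frequencies and some scoring would likely be needed in practice.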
If it were me, I'd go back to the source PDFs and try a different method of extracting the text, such as iText (for Java), or converting the PDF to HTML and then to text with conversion software.