How do I write a regular expression to find ellipses in a text file using VBScript? The text will look something like this
>…………………………………………………………………………………………………………………<
that I want to find, and replace with something else.
I've tried the following as the search pattern to no avail:
">[\133]*<"
">[…]*<"
">[\133]+<"
">[…]+<"
">[\133]{1,}<"
">[…]{1,}<"
">[\x85]+<"
The first one finds the zero case, but not the case where an ellipsis occurs between the >< characters. Several of them work when using Notepad++ regular expressions. Any help is appreciated.
I think I've found how to do it.
">[\W]{2,}<"
does it in my file, since the ellipses aren't word characters.
In the above context, I can't help but think that a regular expression is a bit overkill, but I had a quick look: \>…+\< will work. It won't capture anything, but you could put some parentheses around it if you wanted.
The ellipsis is a character. From what I can see, it is character 133 (0x85) in the Windows-1252 code page rather than in ASCII proper, which stops at 127. The character used in your question, however, registers as 226. That is 0xE2, the first byte of the UTF-8 encoding of the Unicode ellipsis U+2026 (E2 80 A6), so the text is most likely UTF-8 rather than a single-byte code page. In any event, if it is Chr(133) in your file, it should be easy enough to construct a string pattern in VBScript to accomplish the above.
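As a rough illustration of the encoding point, here is a minimal sketch in Python rather than VBScript. It assumes the text has already been decoded to Unicode, and the function name replace_ellipsis_runs is made up for this example. It replaces a run of ellipsis characters between > and <, and prints the byte values the same character takes in UTF-8 and in Windows-1252.

```python
import re

# U+2026 HORIZONTAL ELLIPSIS, the character pasted in the question.
ELLIPSIS = "\u2026"

# One or more ellipsis characters enclosed by > and <.
ELLIPSIS_RUN = re.compile(">" + ELLIPSIS + "+<")


def replace_ellipsis_runs(text, replacement):
    """Replace every >......< run (delimiters included) with `replacement`."""
    return ELLIPSIS_RUN.sub(replacement, text)


if __name__ == "__main__":
    sample = "before>" + ELLIPSIS * 5 + "<after"
    print(replace_ellipsis_runs(sample, "[gap]"))
    # The same character as raw bytes in two encodings.
    print(list(ELLIPSIS.encode("utf-8")))   # first byte is 226
    print(list(ELLIPSIS.encode("cp1252")))  # single byte 133
```

If a script reads the file as single-byte text while the file is really UTF-8, each ellipsis arrives as three separate bytes, which may be why a pattern built around character 133 never matches.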
Related
I wrote a parser, which recognizes elements of text based on certain pattern.
My program is able to recognize paragraphs, chapters, etc. The problem is that it shouldn't recognize elements that appear inside quotes. For example:
Paragraph 1
Something here...
would be processed as a Paragraph.
And:
Paragraph 1
"Paragraph 2"
shouldn't. But as my program is based on regexp patterns, it looks for the word "Paragraph". I'm going line by line and matching patterns against each line. I don't know how to tell my program: if you see a quotation mark, leave the text alone without doing anything. My mentor told me to use raise, but I'm not sure how to do it.
OK, so I'm still a bit of a beginner, I don't know if there is a way to direct the regex to ignore things inside quotes, but if I wanted to solve this problem, I would first make a copy of the text to be parsed, run a regex over that and delete everything inside quotes, then run the parser over the remaining text.
A bit kludgy and inelegant I admit, and may have performance issues over a large enough text, but it would get the job done.
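The "delete everything inside quotes, then parse" idea could look roughly like the sketch below. It is written in Python rather than Ruby, and the heading pattern and the function name find_paragraph_headings are assumptions made for illustration, not the asker's actual parser. It also only handles straight double quotes that open and close on the same line.

```python
import re

# Anything between a pair of straight double quotes on one line.
QUOTED = re.compile(r'"[^"]*"')

# Hypothetical element pattern: the word "Paragraph" followed by a number.
HEADING = re.compile(r"^\s*Paragraph\s+\d+")


def find_paragraph_headings(lines):
    """Return 1-based numbers of lines that contain a heading outside quotes."""
    hits = []
    for number, line in enumerate(lines, start=1):
        # Work on a copy with quoted text removed; the original is untouched.
        unquoted = QUOTED.sub("", line)
        if HEADING.search(unquoted):
            hits.append(number)
    return hits


if __name__ == "__main__":
    text = ["Paragraph 1", "Something here...", '"Paragraph 2"']
    print(find_paragraph_headings(text))
```

Because the stripping is done per line on a throwaway copy, there is no need to duplicate the whole text up front.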
See HERE for a link to the Ruby regex documentation. About a third of the way down it discusses quotes:
/\p{Pi}/ - 'Punctuation: Initial Quote'
/\p{Pf}/ - 'Punctuation: Final Quote'
You may be able to bake that into the regex, using ^ to negate the class, so that it ignores items in quotes.
One thing that constantly annoys me about VS is that when I do a Find or Find all, it looks in comments, strings, and other places. When I'm trying to find a particular bit of code, like and rent, it finds it all over. Is there a way to limit searches just to code?
Not sure if there is a specific setting to ignore comments, but you could do a regex find. For example, assuming you want to find "text", you could use this:
^(?!\s*?//).*?text
Caveats:
Assumes comments start with // as first non-whitespace characters. E.g. C# comment types
Doesn't work for comments at the end of code lines (only comments on their own lines)
Doesn't work with block comments, for example /* comment */
So overall it isn't perfect by any means, but depending on how many hits you are getting, it might help to cut them down, which can be useful if you have a lot of false positives in one-line comments.
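To see what the pattern above does and does not filter out, here is a small sketch that applies the same negative lookahead with Python's re module instead of the Visual Studio Find dialog. The function name find_outside_line_comments and the sample source are made up for this example.

```python
import re


def find_outside_line_comments(source, needle):
    """Return 1-based line numbers where `needle` occurs on a line that
    does not start (after optional whitespace) with //."""
    pattern = re.compile(
        r"^(?!\s*?//).*?" + re.escape(needle),
        re.MULTILINE,  # let ^ match at the start of every line
    )
    # Each match starts at the beginning of its line, so counting the
    # newlines before it gives the line number.
    return [source.count("\n", 0, m.start()) + 1 for m in pattern.finditer(source)]


if __name__ == "__main__":
    sample = (
        "int text = 1;\n"
        "  // text in a whole-line comment\n"
        "foo(); // text in a trailing comment\n"
        "/* text in a block comment */\n"
    )
    print(find_outside_line_comments(sample, "text"))
```

On this sample the whole-line comment is skipped, while the trailing comment and the block comment are still reported, matching the caveats listed above.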
The 'Find All References' function may suit you: it ignores all commented-out code and text in strings. CTRL+K, R is the keyboard shortcut.
(Note that it's designed for going from a specific instance of a search string to all other instances, so if you haven't already found an instance of what you're searching for, you would have to (temporarily) type one into the editor window, then search. Also, it's not available for all languages; I know it works fine for C#, though.)
I have a script that is throwing this error.
This usually means there is a block (like an if or do) that is not correctly ended, or there are too many end clauses. I can't find the issue. Any good tips on how to identify this kind of syntax error?
It could also be a double-quote issue. Wondering if there is a way (in ultra-edit or text editor) to detect lines of script that have un-even numbers of double quotes.
In answer to: "It could also be a double-quote issue, possibly. Wondering if there is a way to detect any lines of script (in ultra-edit or text editor) where there are an un-even number of double quotes."
Sublime is a great editor that is available for most platforms.
For the first question, comment out blocks of code using =begin ... =end and/or # ... and narrow down the error.
For the second question, use syntax highlighting in the text editor. You can easily see how far a single string literal extends, and find unbalanced quotes.
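If the editor offers no such check, a crude count can also flag suspicious lines. The sketch below is in Python and is only a heuristic under stated assumptions: it ignores escaped quotes, double quotes inside single-quoted strings, and multi-line strings or heredocs, so a flagged line is a candidate to inspect, not a confirmed error. The function name lines_with_odd_quotes is made up for this example.

```python
def lines_with_odd_quotes(text):
    """Return 1-based numbers of lines holding an odd count of double quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count('"') % 2 == 1
    ]


if __name__ == "__main__":
    script = 'puts "ok"\nputs "broken\nx = 1\n'
    print(lines_with_odd_quotes(script))
```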
Never mind, I found the issue. I commented out the newest definition that I had added and it ran. That let me know it was that definition. I then took that out and went through it with a darn comb. Found that I was checking a value, but hadn't allowed for it to be nil or empty. Added that in and now I'm good.
(Using Ruby 1.8)
I only have a brief understanding of encoding and such, but what I want to know is: in any given script handling any given text file, is there some universal library or call I need to make to turn non-standard characters into their nearest printable equivalent? I realize there's no "all-in-one" fix, but this is for an English (U.S. gov't) text file, so I'm wondering if there's something that mitigates what must be a relatively common issue in English text formatting.
For example, in a text file, I have an entry like this:
0-823
That hyphen is just literally a hyphen as I've typed it out. In the file though, it's something that looks like a hyphen (an n-dash?) but when copy and pasting it...for example, into this browser text box, it doesn't show up.
Printing it out via a Ruby script gets this:
08�23
How do I get my script to resolve it into a dash, or at least something other than a gremlin?
It's very common to run into hyphen-like characters and dashes, especially in the output of word-processors. Converting them isn't too hard if you know what the byte is that represents the character, but gets to be a pain when you get a document with several different ones. It gets worse as you throw other accented characters into the mix.
Ruby 1.8 doesn't support multibyte and Unicode character sets as well as 1.9+, but you can work around that somewhat by using the Iconv library.
Iconv lets you convert between various character sets, such as US-ASCII, ISO-8859-1 and WIN-1252. It's smarter than a regex, because it knows how to convert accented characters to similar-looking characters, or ignore them if nothing similar exists, allowing your transliteration to degrade gracefully.
I have some example code in an answer to a related question. Also read James Gray's article linked in the answer. It explains the problem and ways to fix it, and ends up recommending Iconv too.
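For comparison, the same degrade-gracefully idea can be sketched in Python using only the standard library. This is not Iconv and not the code from the linked answer; it assumes the text is already decoded to Unicode, and the DASHES table and the to_ascii name are illustrative choices.

```python
import unicodedata

# Hyphen-like characters that word processors commonly emit, all mapped
# to a plain ASCII hyphen. The list is illustrative, not exhaustive.
DASHES = {
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2212": "-",  # minus sign
}
DASH_TABLE = str.maketrans(DASHES)


def to_ascii(text):
    """Map dashes to '-', strip accents, and drop anything else non-ASCII."""
    text = text.translate(DASH_TABLE)
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding with errors="ignore" then discards the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("0\u2013823"))
    print(to_ascii("caf\u00e9"))
```

Characters with no ASCII decomposition are silently dropped here, which may or may not be acceptable for a given file.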
You could whitelist with gsub:
string.gsub(/[^a-zA-Z0-9]/, '')
Without more information I can't build the perfect regex for you, but the general idea is to replace anything that's not what you're expecting (anything that isn't a letter, a number, or one of the symbols you expect).
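A whitelist that also keeps a few expected symbols might look like the following Python sketch. The allowed set (letters, digits, space, hyphen) and the function name whitelist are assumptions for illustration; the right set depends on the file.

```python
import re

# Everything that is NOT a letter, digit, space, or ASCII hyphen.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 \-]")


def whitelist(text, replacement=""):
    """Replace every character outside the allowed set with `replacement`."""
    return NOT_ALLOWED.sub(replacement, text)


if __name__ == "__main__":
    print(whitelist("0\u2013823"))       # the en dash is removed
    print(whitelist("0\u2013823", "-"))  # or swapped for a hyphen
```

Note that plain deletion turns an unknown dash into nothing at all, so passing a replacement character may preserve more of the original meaning.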
I've written a little program to download images to different folders from the web. I want to create a quick and dirty batch file syntax and was wondering what the best delimiter would be for the different variables.
The variables might include urls, folder paths, filenames and some custom messages.
So are there any characters that cannot be used for the first three? That would be the obvious choice to use as a delimiter. How about the good old comma?
Thanks!
You can use either:
A control character: control characters don't normally appear in URLs, folder paths, or filenames. Tab (\t) is probably the best choice here.
Some combination of characters that is unlikely to occur in your data, for example #s#.
Tab is the generally preferred choice though.
Why not just use something that exists already? There are one or two choices: Perl, Python, Ruby, bash, sh, csh, Groovy, ECMAScript or, heaven forbid, Windows scripting files.
I can't see what you'd gain by writing yet another batch file syntax.
Tabs. And then expand or compress any tabs found in the text.
Choose a delimiter that has the least chance of collision with the names of any variable that you may have (which precludes #, /, : etc). The comma (,) looks good to me (unless your custom message has a few) or < and > (subject to previous condition).
However, you may also need to 'escape' delimiter characters occurring as part of the variables you want to delimit.
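One way to get both a tab delimiter and escaping without inventing rules is to lean on an existing delimited-text library. The sketch below uses Python's csv module with a tab delimiter; the field layout (URL, folder, filename, message), the example values, and the function names are hypothetical.

```python
import csv
import io


def write_batch(rows):
    """Serialize rows as tab-separated lines, quoting fields when needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def read_batch(text):
    """Parse tab-separated lines back into lists of fields."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    rows = [
        # Hypothetical entry: url, folder, filename, custom message.
        ["http://example.com/a,b.png", "C:\\images\\cats", "a b.png",
         'message with a\ttab and "quotes"'],
    ]
    encoded = write_batch(rows)
    print(encoded, end="")
    print(read_batch(encoded) == rows)
```

Commas in URLs need no special treatment with this choice, and a field that happens to contain a tab or a double quote is wrapped in quotes by the writer and unwrapped by the reader.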
This sounds like a really bad idea. There is no need to create yet another (data-representation) language, there are plenty ones which might fit your needs. In addition to Ruby, Perl, etc., you may want to consider YAML.
Designing good syntax for this sort of thing is difficult and fraught with peril. Does reinventing the wheel ring a bell?
I would use '|'
It's one of the rarest characters.
How about String.fromCharCode(1) ?