So I want to split a string in Java on any non-alphanumeric characters.
Currently I have been doing it like this:
words = Str.split("\\W+");
However, I want to keep apostrophes (') in there. Is there a regular expression that preserves apostrophes but still splits on the rest of the junk? Thanks.
words = Str.split("[^\\w']+");
Just add it to the character class. \W is equivalent to [^\w], which you can then add ' to.
Do note, however, that \w also actually includes underscores. If you want to split on underscores as well, you should be using [^a-zA-Z0-9'] instead.
For basic English characters, use
words = Str.split("[^a-zA-Z0-9']+");
If you want to handle English words with accented characters (such as fiancé), or languages that use non-English letters, go with
words = Str.split("[^\\p{L}0-9']+");
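To compare the three patterns side by side, here is a small Java sketch that runs each one on the same sentence. The class and method names are hypothetical. Note that in Java, \w is ASCII-only by default ([a-zA-Z_0-9]), so "é" counts as a delimiter for the first two patterns, while the underscore is kept by them and only split on by the \p{L} version.

```java
import java.util.Arrays;

public class WordSplitter {
    // "\\W+": split on runs of non-word characters; apostrophes are lost.
    static String[] splitOnNonWord(String str) {
        return str.split("\\W+");
    }

    // "[^\\w']+": same idea, but ' is added to the kept set ('_' is still kept too).
    static String[] splitKeepingApostrophes(String str) {
        return str.split("[^\\w']+");
    }

    // "[^\\p{L}0-9']+": keeps letters from any script, ASCII digits and apostrophes.
    static String[] splitUnicodeWords(String str) {
        return str.split("[^\\p{L}0-9']+");
    }

    public static void main(String[] args) {
        String str = "Don't split my_fianc\u00E9's words!";
        System.out.println(Arrays.toString(splitOnNonWord(str)));
        System.out.println(Arrays.toString(splitKeepingApostrophes(str)));
        System.out.println(Arrays.toString(splitUnicodeWords(str)));
    }
}
```

On this input the first pattern breaks "Don't" into "Don" and "t", the second keeps "Don't" but cuts "fiancé's" at the "é", and only the third yields "fiancé's" as one word.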
I want to output sharps (#) instead of the password in my Ruby code:
puts " found password: #{pass.tr('?','#')}"
I need as many sharp '#' characters in the output as there are characters in the password.
How do I do it right?
The method .tr is intended to swap specific characters; you cannot use it for a wildcard match. Even if you extended it to cover many characters, there is a risk that you would miss or forget a special character that is allowed in passwords on your system.
A simple variant of what you have is to use .gsub instead:
pass.gsub(/./,'#')
This uses regular expressions to find groups of characters to swap. The simple Regexp /./ matches any single character. The Ruby core documentation on regular expressions includes a brief introduction, in case you have not used them much before.
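For comparison, the same regex-replacement idea in Java is a one-liner with String.replaceAll. This is a translation sketch, not part of the Ruby answer; PasswordMask and mask are hypothetical names. The (?s) flag makes '.' match line terminators too, which Ruby's /./ does not do by default.

```java
public class PasswordMask {
    // Replaces every character with '#', the Java counterpart of pass.gsub(/./, '#').
    // (?s) enables DOTALL mode so even a newline in the input is masked.
    static String mask(String pass) {
        return pass.replaceAll("(?s).", "#");
    }

    public static void main(String[] args) {
        String pass = "s3cr?t!";
        System.out.println(" found password: " + mask(pass));
    }
}
```

On Java 11 and later, "#".repeat(pass.length()) produces a similar result without a regex.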
I need a regular expression that allows only one character in a textbox. Specifically, I want to validate a text field so that it accepts a single character for an initial (of a name).
In a regular expression, '.' (dot) matches a single character.
If you want to be sure that this single character is alphabetic, use:
[a-zA-Z]
or, using a POSIX bracket expression: [[:alpha:]]
Now, to know exactly how to implement it, we need to know in which language your code is written.
For a start, have a look at
http://en.wikipedia.org/wiki/Regular_expression
You can set the textbox's MaxLength property to 1 and use a regex to validate that the character is a letter.
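Since the question does not name a language, here is one possible Java sketch of the validation step only, assuming the textbox contents have already been read into a String (the UI side, such as MaxLength, is framework-specific and not shown). InitialValidator and its methods are hypothetical. Matcher.matches() must consume the entire input, so [a-zA-Z] here behaves like the anchored ^[a-zA-Z]$; an unanchored find() would also accept "Jo".

```java
import java.util.regex.Pattern;

public class InitialValidator {
    // Exactly one ASCII letter, A-Z or a-z.
    private static final Pattern ASCII_INITIAL = Pattern.compile("[a-zA-Z]");
    // Exactly one letter from any script (e.g. an accented capital), via \p{L}.
    private static final Pattern ANY_INITIAL = Pattern.compile("\\p{L}");

    static boolean isAsciiInitial(String input) {
        return input != null && ASCII_INITIAL.matcher(input).matches();
    }

    static boolean isAnyInitial(String input) {
        return input != null && ANY_INITIAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"J", "j", "Jo", "", "4", "\u00C9"}) {
            System.out.println("'" + s + "' -> " + isAsciiInitial(s) + " / " + isAnyInitial(s));
        }
    }
}
```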
I need to find strings containing * and / using regexes; I am writing in Ruby. The reason I need to find lots of * and / is that I am building a tokenizer for a language that has C-style multi-line comments (/* */). I have already handled single-line comments.
Is there a way to write a regex without the two forward slashes that delimit a regular expression literal? I am finding it impossible to spot my mistakes because of the insane amount of escaping. Or can someone give me advice on how to handle the escaping in a sane manner? I already tried writing the sequence first and then escaping it.
Thank you for your time and advice.
One trick that might help is the %r literal:
%r{http://www\.google\.com}
I like to use pipes myself, when they're not in the regex.
%r|http://www\.google\.com|
You can also create new instances of Regexp via Regexp.new and pass a string.
Finally, you might also look at Regexp.quote (an alias of Regexp.escape):
Escapes any characters that would have special meaning in a regular expression. Returns a new escaped string, or self if no characters are escaped. For any string, Regexp.new(Regexp.escape(str))=~str will be true.
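Java has no regex literal at all, so the escaping problem is even worse there; Pattern.quote plays the same role as Regexp.escape. The sketch below (CommentFinder and blockComments are hypothetical names) matches C-style /* */ comments by quoting the delimiters instead of hand-escaping each '*'. It is a simplification: it does not know about string literals, so a "/*" inside a quoted string would be treated as a comment start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentFinder {
    // Pattern.quote wraps each delimiter in \Q...\E so "/*" and "*/" match literally.
    // DOTALL lets '.' cross newlines; the reluctant ".*?" stops at the first "*/".
    private static final Pattern BLOCK_COMMENT = Pattern.compile(
            Pattern.quote("/*") + ".*?" + Pattern.quote("*/"), Pattern.DOTALL);

    // Returns every /* ... */ comment found in the source text, in order.
    static List<String> blockComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = BLOCK_COMMENT.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "a = 1 /* one\n two */ b = 2 /* three */";
        System.out.println(blockComments(src));
    }
}
```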
I'm looking for a character to use as a filename delimiter (I'm storing multiple filenames in a plaintext string). Windows seems not to allow :, ?, *, <, >, ", |, / and \ in filenames. Obviously, \ and / can't be used, since they mean something within a path. Is there any reason why any of those others shouldn't be used? I'm just thinking that, similar to / or \, those other disallowed characters may have special meaning that I shouldn't assume won't be in path names. Of those other 7 characters, are any definitely safe or definitely unsafe to use for this purpose?
The characters : and " are also used in paths. The colon is the drive letter delimiter, and quotation marks are used when a folder or file name contains spaces.
The characters * and ? are used as wildcards when searching for files.
The characters < and > are used for redirecting an application's input and output to and from a file.
The character | is used for piping output from one application into input of another application.
I would choose the pipe character for separating file names. It's not used in paths, and its shape has a natural separation quality to it.
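A minimal Java sketch of the pipe approach, assuming the list is built and parsed in Java (FileListCodec, encode and decode are hypothetical names). One trap worth knowing: String.split takes a regex, and a bare "|" is an empty alternation that splits between every character, so the delimiter has to be passed through Pattern.quote.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FileListCodec {
    // '|' is forbidden in Windows file names, so it cannot collide with a name.
    private static final String DELIMITER = "|";

    static String encode(List<String> names) {
        return String.join(DELIMITER, names);
    }

    // Quote the delimiter so split() treats '|' literally instead of as regex alternation.
    static List<String> decode(String joined) {
        return Arrays.asList(joined.split(Pattern.quote(DELIMITER)));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("C:\\Data\\report 1.txt", "D:\\notes.txt");
        String joined = encode(names);
        System.out.println(joined);
        System.out.println(decode(joined));
    }
}
```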
An alternative could be to use XML in the string. There is a bit of overhead and some characters need encoding, but the advantage is that it can handle any characters and the format is self-explanatory and well defined.
Windows uses the semicolon (;) as a delimiter between paths: look at the PATH environment variable, whose entries are separated by ;.
(Also, in Python, os.path.pathsep is ";" on Windows, while it is ":" on Unix.)
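Java exposes the same platform-dependent separator as File.pathSeparator. The sketch below (PathListSplitter and splitPathList are hypothetical names) uses it to split the PATH variable; like the Python constant, it is ";" on Windows and ":" on Unix-like systems, so a list joined this way is only portable if it is parsed on the same kind of OS.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PathListSplitter {
    // Splits a PATH-style list on the platform's path separator (";" or ":").
    static List<String> splitPathList(String pathList) {
        return Arrays.asList(pathList.split(Pattern.quote(File.pathSeparator)));
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        if (path != null) {
            for (String entry : splitPathList(path)) {
                System.out.println(entry);
            }
        }
    }
}
```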
I have used * in the past. The reason is portability to Linux/Unix. True, technically it can be used in filenames on those filesystems too, but in practice all common OSes use it as a wildcard, so it is quite uncommon in filenames. Also, people are not surprised if programs break when you put a * in a filename.
Why don't you use a character typed with an Alt key combination, such as ‡ (Alt+0135), as the delimiter?
It is actually possible to create files programmatically with every possible character except \. (At least, this was true at one time and it's possible that Windows has changed its policy since.) Naturally, files containing certain characters will be harder to work with than others.
What were you using to determine which characters Windows allows?
Update: The set of characters allowed by Windows is also determined by the underlying filesystem and other factors. There is a blog entry on MSDN that explains this in more detail.
If all you need is something that looks like a colon, and you will be creating the files programmatically, why not use a Unicode character that just looks like a colon?
My first choice would be the Modifier Letter Colon (U+A789), as it is an ordinary left-to-right (LTR) character and looks a lot like a colon. It is what I use when I need a full date and time in a filename, such as file_2017-05-04_16꞉45꞉22_clientNo.jpg
I would stay away from characters like the Hebrew Punctuation Sof Pasuq (U+05C3), as it is a right-to-left (RTL) character and may mess with how a system lays out the file name itself.
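A sketch of the same idea in Java, assuming the files are named from Java code: it formats a timestamp with U+A789 in place of ':' and uses Character.getDirectionality to check which of the two look-alikes is right-to-left. ColonLookalike, timestampedName and isRightToLeft are hypothetical names; the look-alike character is quoted in the DateTimeFormatter pattern so it is emitted as literal text.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ColonLookalike {
    // U+A789 MODIFIER LETTER COLON: resembles ':' but is allowed in Windows file names.
    static final char LOOKALIKE_COLON = '\uA789';

    // Builds a name like file_2017-05-04_16<U+A789>45<U+A789>22_clientNo.jpg.
    static String timestampedName(LocalDateTime t, String suffix) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(
                "yyyy-MM-dd_HH'" + LOOKALIKE_COLON + "'mm'" + LOOKALIKE_COLON + "'ss");
        return "file_" + fmt.format(t) + "_" + suffix;
    }

    // True if the character has a strong right-to-left bidi class (Hebrew or Arabic).
    static boolean isRightToLeft(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2017, 5, 4, 16, 45, 22);
        System.out.println(timestampedName(t, "clientNo.jpg"));
        System.out.println("U+A789 right-to-left? " + isRightToLeft(LOOKALIKE_COLON));
        System.out.println("U+05C3 right-to-left? " + isRightToLeft('\u05C3'));
    }
}
```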