How to remove a ^M character java - dos

Problem:
If String ends with \r, remove \r
I started with something like this
if (masterValue.endsWith(CARRIAGE_RETURN_STR)) {
masterValue = masterValue.replace(CARRIAGE_RETURN_STR, "");
}
where
public static final String CARRIAGE_RETURN_STR = (Character.toString(Constants.CARRIAGE_RETURN));
public static final char CARRIAGE_RETURN = '\r';
This seems awkward to me.
Is there an easy way to just remove \r character?
I then moved on to this:
if (value.contains(CARRIAGE_RETURN_STR)) {
value = value.substring(0, value.length()-3);
//-3 because we start with 0 (1), line ends with \n (2) and we need to remove 1 char (3)
}
But this too seems awkward.
Can you suggest an easier, more elegant solution?

Regexes can support end-of-string anchoring, you know. (See this Javadoc page for more information)
myString = myString.replaceAll("\\r$", ""); // Strings are immutable, so keep the returned value
This also takes care of fixing \r\n --> \n, I believe.

I'd write it like this:
if (masterValue.endsWith("\r")) {
masterValue = masterValue.substring(0, masterValue.length() - 1);
}
I see no point in creating a named constant for the String "\r".
By the way, your second attempt is incorrect because:
String.contains("\r") tells you if the String contains a carriage return, not if it ends with a carriage return,
the second argument of String.substring(int, int) is the end index, i.e. the position of the first character that should NOT be in the substring, and
the length of "\r" is one.
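For comparison, the same check-then-trim logic can be sketched in Python (the thread itself is about Java; `strip_trailing_cr` and `buggy_strip` are hypothetical helper names used only for illustration). The second function mirrors the question's second attempt to show how the three points above lose data.

```python
def strip_trailing_cr(value: str) -> str:
    """Remove exactly one trailing carriage return, if present."""
    if value.endswith("\r"):
        # "\r" has length 1, so drop exactly one character.
        return value[:-1]
    return value


def buggy_strip(value: str) -> str:
    """Mirror of the question's second attempt, kept only to show the bug."""
    if "\r" in value:                    # true for a "\r" anywhere in the string
        return value[: len(value) - 3]   # cuts three characters instead of one
    return value


print(repr(strip_trailing_cr("abc\r")))  # 'abc'
print(repr(strip_trailing_cr("a\rbc")))  # 'a\rbc' (left untouched)
print(repr(buggy_strip("a\rbc")))        # 'a' (data lost)
```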

Related

Regexp.escape adds weird escapes to a plain space

I stumbled over this problem using the following simplified example:
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was, that for every String stored in searchstring, the gsub! would cause that line is afterwards empty. Indeed, this is the case for many strings, but not for this case:
searchstring = "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line
It turns out, that line is printed as "D " afterwards, i.e. no replacement had been performed.
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string, in the following way:
REPLACEMENTS.each do |from, to|
  line.chomp!
  line.gsub!(Regexp.escape(from)) { to }
end
I'm using Regexp.escape just as a safety measure in case the string being replaced contains a regex metacharacter.
I'm using the Cygwin port of MRI Ruby 2.6.4.
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was, that for every String stored in searchstring, the gsub! would cause that line is afterwards empty.
Your understanding is incorrect. The guarantee in the docs is
For any string, Regexp.new(Regexp.escape(str))=~str will be true.
This does hold for your example
Regexp.new(Regexp.escape("D "))=~"D " # => 0
therefore this is what your code should look like
line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
As for why this is the case, there used to be a bug where Regexp.escape would incorrectly handle space characters:
# in Ruby 1.8.4
Regexp.escape("D ") # => "D\\s"
My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
This looks to be a bug. In my opinion, whitespace is not a Regexp meta character, there is no need to escape it.
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string […]
If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:
line.gsub!(from, to)
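The difference between an escaped pattern used as a regex and the same text used as a plain string can be reproduced outside Ruby. This is a minimal sketch in Python, assuming `re.escape` as a stand-in for `Regexp.escape` and `str.replace` as a stand-in for `gsub!` with a String pattern; the sample text is made up.

```python
import re

searchstring = "a.b c"
escaped = re.escape(searchstring)  # inserts backslashes, e.g. before "."

# Used as a regex, the escaped text matches the original string literally.
as_regex = re.sub(escaped, "", searchstring)

# Used as a plain string, the inserted backslashes are searched for verbatim,
# so nothing is found and the text is left unchanged.
as_literal = searchstring.replace(escaped, "")

# For a literal replacement, no escaping is needed at all.
plain = searchstring.replace(searchstring, "")

print(repr(as_regex), repr(as_literal), repr(plain))
```

The middle case corresponds to what the question ran into: the escaped text was handed to a literal search.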

Sscanf forgets to forget the comma in expression

I am using sscanf, trying to read in an expression before a comma.
sscanf(some_string,
"%s %[ .0-9a-zA-Z!#:/|-_^,],read_other_stuff:%s....]",
&string1, &string2... etc);
sscanf correctly reads in everything up until %[ .0-9a-zA-Z!#:/|-_^,]. This part of the format consumes the rest of the string instead of stopping at a comma as I expected.
How would one make it stop at a comma while still reading in everything else (including spaces, punctuation other than the comma, etc.)?
To parse a string up to a ,, code could use strchr(some_string, ',')
To use sscanf(), use 2 calls
int n = 0;
sscanf(some_string, "%*[^,]%n", &n);
// some_string[n] is either \0 or ,
char string1[100];
string1[0] = '\0';
sscanf(&some_string[n], ",%99s", string1);
puts(string1);
I recommend the first approach (strchr()).
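The strchr() approach amounts to "find the first comma and split there". As a language-neutral sketch of that logic, here it is in Python; `split_at_first_comma` is a hypothetical helper and the sample input is invented.

```python
from typing import Tuple


def split_at_first_comma(text: str) -> Tuple[str, str]:
    """Return (text before the first comma, text after it).

    If there is no comma, the second element is the empty string.
    """
    head, _sep, tail = text.partition(",")
    return head, tail


head, tail = split_at_first_comma("first part!,read_other_stuff:value")
print(repr(head))  # 'first part!'
print(repr(tail))  # 'read_other_stuff:value'
```

Spaces and punctuation other than the comma stay in the first part, which is the behaviour the question asks for.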

start_with not working for backslash in ruby

I have the following string -
abcdefgh;
lmnopqrst;
On doing a string = string.split(";"), I get -
["abcdefgh", "\nlmnopqrst"]
Now when I do -
string[1].start_with?("\\")
The function returns false. Whereas if I do
string[0].start_with?("a")
The function returns true.
I am new to Ruby and just can't understand this behavior. Can anyone tell me what I am doing wrong?
I don't know, but string[1][0] (the first character of the string) returns "\n", so maybe use this:
string[1].start_with?("\n")
This is because "\n" does not actually start with a backslash. It is the line feed character, which is a single character; the escape character \ appears in front of it only in its printed representation.
So:
string[1].start_with?("\n")
Will return true.
You already tried to search with string[1].start_with?("\\") so you seem to realize you need to escape the backslash character by using \\.
If your input string would look like this:
\abcdefgh;
lmnopqrst;
Then after .split(';') your resulting array would look like this:
["\\abcdefgh", "\nlmnopqrst"]
Now string[0].start_with?("\\") would return true because the first string actually starts with a single backslash, which was presented with the escape character in the console.
You can try:
'\nhello world'.start_with?("\\") # => true
"\nhello world".start_with?("\\") # => false
because '\n' is two characters (\ and n), but "\n" is one character (the newline character).
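The same distinction exists in other languages. A minimal sketch in Python, where a raw string (`r"..."`) plays the role of Ruby's single-quoted string; the variable names are illustrative only.

```python
escaped = "\nhello"  # escape sequence: one line-feed character, then "hello"
raw = r"\nhello"     # raw string: a backslash, the letter n, then "hello"

print(len(escaped), len(raw))    # 6 7
print(escaped.startswith("\\"))  # False: the first character is a line feed
print(raw.startswith("\\"))      # True: the first character is a backslash
print(escaped.startswith("\n"))  # True
```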
The first character there is not "\" - it's "\n" in the first example, and "\\" in the second. "\n" and "\\" are effectively single characters in this context, even though they look like two characters.
"\n" != "\\", and so start_with? responds false.

how to document a single space character within a string in reST/Sphinx?

I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following:
Starting character is the blank " " which has the value 0.
I tried writing this as an inline literal in the following ways:
Starting character is the blank `` `` which has the value 0.
or
Starting character is the blank :literal:` ` which has the value 0.
but there are a few problems with how these end up working:
1. reST syntax objects to whitespace immediately inside the literal, and the literal doesn't get recognized.
2. The above can be "fixed"--it looks correct in the HTML () and plaintext (" ") output--with a non-breaking space character inside the literal, but technically this is a lie in our case, and if a user copied this character, they wouldn't be copying what they expect.
3. The space can be wrapped in regular quotes, which allows the literal to be properly recognized, and while the output in HTML is probably fine (" "), in plaintext it ends up double-quoted as "" "".
4. In both 2 and 3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it's at the start/end of the line.
I feel like I'm missing something; is there a good way to handle this?
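The wrapping problem can be reproduced with the standard library alone. This is a minimal sketch, assuming the default `textwrap` settings (Sphinx's own wrapper may differ in details); the sample sentence is shortened and the width of 11 is chosen so that the break lands on the quoted space.

```python
import textwrap

sentence = 'the blank " " which'

# With drop_whitespace left at its default (True), whitespace at the start
# or end of a wrapped line is discarded, including the space being documented.
lines = textwrap.wrap(sentence, width=11)

for line in lines:
    print(repr(line))
```

In the printed lines the two quote characters end up on different lines and the space between them is gone.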
Try using the unicode character codes. If I understand your question, this should work.
Here is a "|space|" and a non-breaking space (|nbspc|)
.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space
You should see:
Here is a “ ” and a non-breaking space ( )
I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (will sort out exactly what it should look like during our review process) but the basics are intact.
There are two main components to the approach:
introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
modify the text-wrapper Sphinx uses so that it won't break at the space.
Here's the code:
import re
import unicodedata

from docutils import nodes
# Assumption: the wrapper being replaced is the one used by Sphinx's text writer.
from sphinx.writers.text import TextWrapper


class TextWrapperDeux(TextWrapper):
    # Word-separator pattern that will not break at whitespace next to a backtick.
    _wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not between backticks
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    @property
    def wordsep_re(self):
        return self._wordsep_re


def char_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
    """Describe a character given by unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        character = unicodedata.lookup(text)
    except KeyError:
        # Unknown character name: report it and mark the markup as problematic.
        msg = inliner.reporter.error(
            ':char: argument %s must be valid unicode name at line %d' % (text, lineno))
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    app = inliner.document.settings.env.app
    describe_char = "(U+%05X %s)" % (ord(character), text)
    # Wrap the character itself in a literal node, followed by its description.
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []


def setup(app):
    app.add_role('char', char_role)
The code above lacks some glue to actually force the use of the new TextWrapper, imports, etc. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
Markup: Starting character is the :char:`SPACE` which has the value 0.
It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.
And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.
The HTML output ends up looking roughly like: Starting character is the char:(U+00020 SPACE) which has the value 0.
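The lookup and formatting step of the role can be tried on its own, without Sphinx or docutils. This is a minimal sketch using only the standard library; `describe_char` here is a hypothetical standalone function that reuses the format string from the role above.

```python
import unicodedata


def describe_char(name: str) -> str:
    """Build the "(U+XXXXX NAME)" description for a Unicode character name."""
    character = unicodedata.lookup(name)  # raises KeyError for unknown names
    return "(U+%05X %s)" % (ord(character), name)


print(describe_char("SPACE"))           # (U+00020 SPACE)
print(describe_char("NO-BREAK SPACE"))  # (U+000A0 NO-BREAK SPACE)
```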

PregMatch . space and #?

Can someone tell me, what's wrong in this code:
if ((!preg_match("[a-zA-Z0-9 \.\s]", $username)) || (!preg_match("[a-zA-Z0-9 \.\s]", $password)));
exit("result_message=Error: invalid characters");
}
??
Several things are wrong. I assume that the code you are looking for is:
if (preg_match('~[^a-z0-9\h.]~i', $username) || preg_match('~[^a-z0-9\h.]~i', $password))
exit('result_message=Error: invalid characters');
What is wrong in your code?
The pattern [a-zA-Z0-9 \.\s] is wrong for several reasons:
a regex pattern in PHP must be enclosed by delimiters. The most common is /, but as you can see, I have chosen ~. Example: /[a-zA-Z0-9 \.\s]/
the character class is strange because it contains a space and also \s, which already includes the space. IMO, to check a username or a password, you only need the space and perhaps the tab, but not the carriage return or the line feed character! You can remove \s and keep the space, or you can use the \h character class, which matches all horizontal whitespace: /[a-zA-Z0-9\h\.]/ (if you don't want to allow tabs, replace the \h with a space)
the dot has no special meaning inside a character class and doesn't need to be escaped: /[a-zA-Z0-9\h.]/
you are trying to verify a whole string, but your pattern matches a single character! In other words, the pattern only checks whether the string contains at least one alphanumeric character, space or dot. If you want to check the whole string, you must use a quantifier + and anchors for the start ^ and the end $ of the string. Example: /^[a-zA-Z0-9\h.]+$/
finally, you can shorten the character class by using the case-insensitive modifier i: /^[a-z0-9\h.]+$/i
But there is a faster way: instead of negating your preg_match assertion with ! and testing whether all characters are in the range you want, you can test only whether the string contains a character you don't want. To do this, negate the character class by inserting a ^ in the first position:
preg_match('/[^a-z0-9\h.]/i', ...
(Note that ^ has a different meaning inside and outside a character class. If ^ isn't at the beginning of a character class, it is a simple literal character.)
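The two strategies (match the whole string against an allowed class, or search for a single forbidden character) can be compared side by side. This is a minimal sketch in Python rather than PHP; since Python's `re` has no `\h`, the class lists space and tab explicitly, and the function names are hypothetical.

```python
import re

FLAGS = re.IGNORECASE | re.ASCII

# Whole-string check: every character must be in the allowed class.
ALLOWED = re.compile(r"[a-z0-9 \t.]+", FLAGS)
# Negated class: a single character outside the allowed set is enough to fail.
FORBIDDEN = re.compile(r"[^a-z0-9 \t.]", FLAGS)


def valid_by_whitelist(text: str) -> bool:
    return ALLOWED.fullmatch(text) is not None


def valid_by_negation(text: str) -> bool:
    return FORBIDDEN.search(text) is None


for sample in ["John Doe.1", "bad#name", ""]:
    print(repr(sample), valid_by_whitelist(sample), valid_by_negation(sample))
```

The two checks agree except on the empty string: the + quantifier rejects it, while the negated-class search finds nothing to object to, so an explicit length check may still be wanted.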
