How does SQLite save a newline on MSWindows? - windows

When a text value contains a newline, does SQLite on a Windows machine save the newline as 0x0D0A or as 0x0A?
Edit: I asked this question because I would like to know if this user defined sqlite function will return the right value if the passed string has a newline in it.
#!/use/bin/env perl
use DBI;
# ...
# ....
$dbh->sqlite_create_function( 'bit_length', 1, sub {
use bytes;
return length( $_[0] );
}
);

SQLite does not change strings by itself; as long as you don't explicitly change them with some function, they are treated similar to blobs.
If you have a string like "Hello, world!\n" in your code, it will keep that newline style.
If you read the text from a file, it depends on how your language handles newline conversions in text files, but there are not other places where Perl would implicitly convert newlines.

Related

Regexp.escape adds weird escapes to a plain space

I stumbled over this problem using the following simplified example:
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was, that for every String stored in searchstring, the gsub! would cause that line is afterwards empty. Indeed, this is the case for many strings, but not for this case:
searchstring = "D "
line = searchstring.dup
line.gsub!(Regexp.escape(searchstring)) { '' }
p line
It turns out, that line is printed as "D " afterwards, i.e. no replacement had been performed.
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string, in the following way:
REPLACEMENTS.each do
|from, to|
line.chomp!
line.gsub!(Regexp.escape(from)) { to }
end
. I'm using Regexp.escape just as a safety measure in the case that the string being replaced contains some regex metacharacter.
I'm using the Cygwin port of MRI Ruby 2.6.4.
line.gsub!(Regexp.escape(searchstring)) { '' }
My understanding was, that for every String stored in searchstring, the gsub! would cause that line is afterwards empty.
Your understanding is incorrect. The guarantee in the docs is
For any string, Regexp.new(Regexp.escape(str))=~str will be true.
This does hold for your example
Regexp.new(Regexp.escape("D "))=~"D " # => 0
therefore this is what your code should look like
line.gsub!(Regexp.new(Regexp.escape(searchstring))) { '' }
As for why this is the case, there used to be a bug where Regex.escape would incorrectly handle space characters:
# in Ruby 1.8.4
Regex.escape("D ") # => "D\\s"
My guess is they tried to keep the fix as simple as possible by replacing 's' with ' '. Technically this does add an unnecessary escape character but, again, that does not break the intended use of the method.
This happens to any searchstring containing a space. Indeed, if I do a
p(Regexp.escape(searchstring))
for my example, I see "D\\ " being printed, while I would expect to get "D " instead. Is this a bug in the Ruby core library, or did I misuse the escape function?
This looks to be a bug. In my opinion, whitespace is not a Regexp meta character, there is no need to escape it.
Some background: In my concrete application, where this simplified example is derived from, I just want to do a literal string replacement inside a long string […]
If you want to do literal string replacement, then don't use a Regexp. Just use a literal string:
line.gsub!(from, to)

how to document a single space character within a string in reST/Sphinx?

I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following: Starting character is the blank " " which has the value 0.
I tried writing this as an inline literal the following ways: Starting character is the blank `` `` which has the value 0. or Starting character is the blank :literal:` ` which has the value 0. but there are a few problems with how these end up working:
reST syntax objects to a whitespace immediately inside of the literal, and it doesn't get recognized.
The above can be "fixed"--it looks correct in the HTML () and plaintext (" ") output--with a non-breaking space character inside the literal, but technically this is a lie in our case, and if a user copied this character, they wouldn't be copying what they expect.
The space can be wrapped in regular quotes, which allows the literal to be properly recognized, and while the output in HTML is probably fine (" "), in plaintext it ends up double-quoted as "" "".
In both 2/3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it's at the start/end of the line.
I feel like I'm missing something; is there a good way to handle this?
Try using the unicode character codes. If I understand your question, this should work.
Here is a "|space|" and a non-breaking space (|nbspc|)
.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space
You should see:
Here is a “ ” and a non-breaking space ( )
I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (will sort out exactly what it should look like during our review process) but the basics are intact.
There are two main components to the approach:
introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
modify the text-wrapper Sphinx uses so that it won't break at the space.
Here's the code:
class TextWrapperDeux(TextWrapper):
_wordsep_re = re.compile(
r'((?<!`)\s+(?!`)|' # whitespace not between backticks
r'(?<=\s)(?::[a-z-]+:)`\S+|' # interpreted text start
r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|' # hyphenated words
r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash
#property
def wordsep_re(self):
return self._wordsep_re
def char_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
"""Describe a character given by unicode name.
e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
"""
try:
character = nodes.unicodedata.lookup(text)
except KeyError:
msg = inliner.reporter.error(
':char: argument %s must be valid unicode name at line %d' % (text, lineno))
prb = inliner.problematic(rawtext, rawtext, msg)
return [prb], [msg]
app = inliner.document.settings.env.app
describe_char = "(U+%05X %s)" % (ord(character), text)
char = nodes.inline("char:", "char:", nodes.literal(character, character))
char += nodes.inline(describe_char, describe_char)
return [char], []
def setup(app):
app.add_role('char', char_role)
The code above lacks some glue to actually force the use of the new TextWrapper, imports, etc. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
Markup: Starting character is the :char:`SPACE` which has the value 0.
It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.
And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.
The HTML output ends up looking roughly like: Starting character is the char:(U+00020 SPACE) which has the value 0.

How do you print a dollar sign $ in Dart

I need to actually print a Dollar sign in Dart, ahead of a variable. For example:
void main()
{
int dollars=42;
print("I have $dollars."); // I have 42.
}
I want the output to be: I have $42. How can I do this? Thanks.
Dart strings can be either raw or ... not raw (normal? cooked? interpreted? There isn't a formal name). I'll go with "interpreted" here, because it describes the problem you have.
In a raw string, "$" and "\" mean nothing special, they are just characters like any other.
In an interpreted string, "$" starts an interpolation and "\" starts an escape.
Since you want the interpolation for "$dollars", you can't use "$" literally, so you need to escape it:
int dollars = 42;
print("I have \$$dollars.");
If you don't want to use an escape, you can combine the string from raw and interpreted parts:
int dollars = 42;
print(r"I have $" "$dollars.");
Two adjacent string literals are combined into one string, even if they are different types of string.
You can use a backslash to escape:
int dollars=42;
print("I have \$$dollars."); // I have $42.
When you are using literals instead of variables you can also use raw strings:
print(r"I have $42."); // I have $42.

How to use ruby regexp to substitute string with a "callback function"-like manipulation

I have the samples lines, and I want to substitute the leading #s with =s. (The first two lines)
But instring.gsub!(/^#+\w/, ""), I can not get the number of # which I want to substitute.
In javascript, I could use a callback function with the replace method, but how could I archive this Ruby?
##Command-line Tool
###Installment
This is a '#'.
The expected result:
==Command-line Tool
===Installment
This is a '#'.
a callback block function to the gsub method, probably. I am not sure what you had in mind but could be something like
s.gsub(/^(#+)\w+/) {|m| m.gsub("#", "=") }

Ruby string encoding problem

I've looked at the other ruby/encoding related posts but haven't been able to figure out why the following is not working. Likely just because I'm dense, but here's the situation.
Using Ruby 1.9 on windows. I have a set of CSV files that need some data appended to the end of each line. Whenever I run my script, the appended characters are gibberish. The input text appears to be IBM437 encoding, whereas my string I'm appending starts as US-ASCII. Nothing I've tried with respect to forcing encoding on the input strings or the append string seems to change the resultant output. I'm stumped. The current encoding version is simply the last that I tried.
def append_salesperson(txt, salesperson)
if txt.length > 2
return txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson}")
end
end
salespeople = Hash[
"fname", "Record Manager"]
outfile = File.open("ActData.csv", "w:US-ASCII")
salespeople.each do | filename, recordManager |
infile = File.open("#{filename}.txt")
infile.each do |line|
outfile.puts append_salesperson(line, recordManager)
end
infile.close
end
outfile.close
One small note that is related to your question is that you have your csv data as such %(, "", "", "#{salesperson}"). Here you have a space char before your double quotes. This can cause the #{salesperson} to be interpreted as multiple fields if there is a comma in this text. To fix this there can't be white space between the comma and the double quotes. Example: "this is a field","Last, First","and so on". This is one little gotcha that I ran into when creating reports meant to be viewed in Excel.
In Common Format and MIME Type for Comma-Separated Values (CSV) Files they describe the grammar of a csv file for reference.
maybe txt.chomp.force_encoding('US-ASCII') + %(, "", "", "#{salesperson.force_encoding('something')}")
?
It sounds like the CSV data is coming in as UTF-16... hence the puts shows as the printable character (the first byte) plus a space (the second byte).
Have you tried encoding your appended data with .force_encoding(Encoding::UTF-16LE) or .force_encoding(Encoding::UTF-16BE)?

Resources