How do I replace this non-printable character in a .docx that behaves like a \n but that Word does not recognize as a \n?

I am parsing various .docx documents, but the part of my code that splits paragraphs when it encounters "\n" is adding a new line when it encounters this weird symbol (circled in yellow):
Could someone tell me what non-printable character this is and how I can replace it with a normal " " space?
(I can't just copy and paste it and use a replace() function, because when I do, the character gets interpreted as a \n. But as you can see when I click the show non-printable characters button in Word, if Word were really interpreting that character as an Enter, it would have added the inverted P character instead of the weird enter sign, and it hasn't. I hope I explained myself; thanks so much for the help!)

I believe you'll find this character is a line break. In python-docx, the str value of paragraph.text represents a line-break with "\n". You can have those mapped to a space (" ") instead using:
paragraph_text = paragraph.text.replace("\n", " ")
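The same replacement can be sketched on plain strings, so it runs without python-docx installed. The helper name flatten_soft_breaks is made up for illustration. Treating "\v" and U+2028 as soft line breaks is an assumption about what other sources might hand you (for example, text copied out of Word rather than read through paragraph.text); the answer above only relies on "\n".

```python
import re

# Characters that may stand in for a soft (manual) line break.
# "\n" is what paragraph.text reports, per the answer above; "\v" and
# U+2028 are included as an assumption about text from other sources.
_SOFT_BREAKS = re.compile("[\n\v\u2028]")


def flatten_soft_breaks(text: str) -> str:
    """Replace each soft line break in `text` with a single space."""
    return _SOFT_BREAKS.sub(" ", text)


if __name__ == "__main__":
    sample = "first line\nsecond line\vthird line"
    print(flatten_soft_breaks(sample))
```

With python-docx this would be called as flatten_soft_breaks(paragraph.text) before any splitting on "\n" takes place.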

Related

VBScript: find the character " in a text

I need to find the character " in a text.
I have used InStr(strLine,""") but it doesn't run and gives me this error:
800a0409 unterminated string constant
Where is my mistake?
What you want to do is use two quote characters in a row, not just one:
InStr(strLine,"""")
This is how it breaks down: the first " character is how you start a string constant; the second and third " characters together are called an "escaped" quote and indicate that you are not ending the string constant but are instead including a literal, single " character; the fourth " character is the final one indicating that you are ending the string constant.
You must always have an even number of quote characters " as a rule to avoid the compiler error you received.
As an alternative, you could also do it like this:
InStr(strLine, Chr(34))
The Chr() function takes an ASCII code and returns the corresponding character. The ASCII code for the double-quote character " is 34.
Which approach you choose is up to you and depends on the circumstances. I usually go with the escaped, double-double-quote "" because it's easier to code and easier to read in longer string constants.

How can I add hair space (&#8202;) to gsub string?

I have the following line in a plugin to display page views on my Jekyll site:
html = pv.to_s.reverse.gsub(/...(?=.)/,'\& ').reverse
It adds space between thousands, for example 23 678.
How can I add a hair space (&#8202;) instead of a regular space in this string?
In HTML, &#8202; is a so-called decimal numeric character reference:
The ampersand must be followed by a "#" (U+0023) character, followed by one or more ASCII digits, representing a base-ten integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a ";" (U+003B) character.
Ruby has the \u escape sequence. However, it expects the characters that follow to represent a hexadecimal (base-sixteen) integer; for decimal 8202 that's 200A. You also have to use a double-quoted string literal, which means the \ character in \& now needs to be escaped with another one:
"\\&\u200A"
Alternatively just use it directly:
'\& '

string has trailing whitespaces that aren't white spaces? (i.e. strip doesn't get rid of it)

I have the following string I got from parsing some html:
"this is my string  "
If I use .strip or .rstrip the string remains the same.
However if I literally type the string "this is my string " and type .strip then the trailing spaces get stripped.
This leads me to believe the string I obtained from parsing html is not containing trailing white spaces. So the question I have is, 1) what is trailing the string if it isn't a white space? and 2) how do I get rid of it?
The Unicode table contains several whitespace characters, and it is possible that not all of them are handled by the strip methods. If you want to use a regular expression with the sub method, you can try a simple pattern such as /\p{Space}+\z/ or /[[:space:]]+\z/ to trim all the blank characters on the right (the replacement string must, of course, be empty).
Note: the \s is equivalent to [ \t\r\n\f] in Ruby and doesn't contain all whitespaces of the unicode table.

MacVim Replace All Issue

I have an HTML file in which I need to replace some characters with HTML entities. Right now I'm trying to replace — with &mdash; but when I use the Replace All button, the result is that all of those instances of — are replaced with —mdash;
I thought maybe escaping the "&" would work, so I changed the Replace with value to \&mdash; but that just results in \—mdash;
The strange thing is that if I go to each, one by one, i.e., click Next, then click Replace, and so on, then it replaces it correctly.
Is this a bug in MacVim? Or am I missing something?
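In the replacement part of a :s command, & stands for the whole matched text, so a literal ampersand has to be written as \&. Enter this on the command line:
:%s/—/\&mdash;/g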
You can also match by character code. Place your cursor on the character and press ga to see its code, then use the decimal, hex or octal code in the search pattern:
\%d match specified decimal character
\%x match specified hex character
\%o match specified octal character
\%u match specified multibyte character
\%U match specified large multibyte character
:%s/\%d8212/\&mdash;/g

How can I strip tab characters from a string in Ruby?

I have a program that loads some tab-separated lines into a MySQL table. One of the values has tabs in it, which is causing some problems. The data is created column by column, so I need to find a way to strip the tab character out of an individual field with gsub. I do not, however, want to get rid of anything else, like spaces.
It's really easy: \t is the tab character.
result = string.gsub(/\t/, '')
or, in place:
string.gsub!(/\t/, '')
\t is the escape sequence for a tab within strings, so you can just search for "\t" and replace it with a space or whatever else you need.
