Delete character in an XML file using Ruby - ruby

I am working with Ruby, and I want to delete all the \ characters from my XML file.
Here is my XML file:
<w:numId w:val=\"2\"/></w:numPr></w:pPr><w:bookmarkStart w:id=\"0\" w:name=\"__DdeLink__0_226207805\"/><w:bookmarkEnd w:id=\"0\"/><w:r><w:rPr></w:rPr><w:t>Serve high quality food</w:t></w:r></w:p>, <w:p><w:pPr><w:pStyle w:val=\"style17\"/><w:numPr><w:ilvl w:val=\"0\"/><w:numId w:val=\"2\"/></w:numPr></w:pPr><w:bookmarkStart w:id=\"0\" w:name=\"__DdeLink__0_226207805\"/><w:bookmarkEnd w:id=\"0\"/>

There's actually no backslash character (\) in your file. The backslash in your example simply escapes the double quote that follows it, so that the quote does not terminate the string early and cause a syntax error.
What you see when you print that string in IRB is actually not the backslash as is, but the backslash in combination with the following double-quote as an indication that the double-quote is escaped. The idea is kind of hard to grasp when you encounter it the first time. Have a look at "Escape sequences".
In short, there is no backslash in your file, so you can't remove it.
Let me explain with an example:
> text = "This is sample text for escape character\""
#=> "This is sample text for escape character\""
Is equivalent to:
> text = 'This is sample text for escape character"'
#=> "This is sample text for escape character\""
To make the backslash (\) disappear from the output, remove the double quote (") itself:
> text.tr!('"', '')
#=> "This is sample text for escape character"
I hope this makes it clear.
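A minimal sketch of the same point, assuming a short hypothetical fragment rather than the full XML from the question: printing the string shows its real characters, while inspect (which is what IRB displays) shows the escaped literal form.

```ruby
# Hypothetical fragment, typed the way IRB would display it.
xml = "<w:numId w:val=\"2\"/>"

puts xml          # the raw characters: <w:numId w:val="2"/>
puts xml.inspect  # the escaped literal form, as IRB echoes it

# "\\" is a one-character string holding a single backslash.
has_backslash = xml.include?("\\")
puts has_backslash  # false
```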

Thank you guys for your answers. Here is what I did, and it worked as I wanted:
path = "#{temp_dir}/plan_report_template/word/document.xml"
text = File.read(path)
# '\"' in single quotes is a literal backslash followed by a double quote
File.write(path, text.gsub('\"', '"'))
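This fix only has an effect when the file on disk really contains backslash characters. Here is a self-contained sketch of that case, assuming a temporary file as a stand-in for document.xml; the helper name unescape_quotes! is hypothetical.

```ruby
require "tempfile"

# Rewrite a file in place, turning each backslash-quote pair into a bare quote.
def unescape_quotes!(path)
  text = File.read(path)
  File.write(path, text.gsub('\"', '"'))  # '\"' is two characters: \ and "
end

sample = Tempfile.new("document")       # stand-in for document.xml
sample.write('<w:numId w:val=\"2\"/>')  # single quotes: the backslashes are really written
sample.close

unescape_quotes!(sample.path)
result = File.read(sample.path)
puts result  # <w:numId w:val="2"/>
```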

Related

Why ruby controller would escape the parameters itself?

I am writing Ruby application for the back end service. There is a controller which would accept request from front-end.
Here is the case, there is a GET request with a parameter containing character "\n".
def register
begin
request = {
id: params[:key]
}
.........
end
end
The "key" parameter is passed from AngularJS as "----BEGIN----- \n abcd \n ----END---- \n", but in the Ruby controller it arrives as "----BEGIN----- \\n abcd \\n ----END---- \\n".
Does anyone have a good solution for this?
Yes, this comes from the way Ruby displays escape characters: the parameter holds a literal backslash followed by n, which is shown as \\n. You can read the explanation here: Escaping characters in Ruby
I ran into this issue once and just used gsub to change the \\n to \n. What you should do is:
def register
begin
request = {
# gsub rather than gsub!: gsub! returns nil when nothing is replaced
id: params[:key].gsub("\\n", "\n")
}
.........
end
end
Remember, you have to use double quotes (") instead of single quotes (') for the replacement. From the link I gave:
The difference between single and double quoted strings in Ruby is the way the string definitions represent escape sequences.
In double quoted strings, you can write escape sequences and Ruby will output their translated meaning. A \n becomes a newline.
In single quoted strings however, escape sequences are escaped and return their literal definition. A \n remains a \n.
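A small sketch of the difference, assuming the parameter arrives with a literal backslash followed by n, as described in the question:

```ruby
# Single quotes keep \n as two characters: a backslash and the letter n.
escaped = '----BEGIN----- \n abcd \n ----END---- \n'

# "\\n" matches backslash + n; "\n" is a single newline character.
decoded = escaped.gsub("\\n", "\n")

puts '\n'.length          # 2
puts "\n".length          # 1
puts decoded.count("\n")  # 3

# gsub! returns nil when nothing was replaced, so gsub is the safer choice inline.
no_match = "abc".dup.gsub!("x", "y")
puts no_match.inspect     # nil
```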

Phoenix string interpolation containing \n and name="abc" in HTML

I want to generate HTML content in phoenix. I'm not able to use interpolation while adding name="abc". I get an error at ".
Using \ in text also shows the \, e.g. text = "This is an name=\"abc\" string" gives text = "This is an name=\"abc\" string".
Can anyone please suggest how I can have a raw string containing name="abc"?
The string does contain only name="abc", the problem is that when you see it in the terminal, Elixir escapes the double quotes, so you can copy and paste it to your code. If in doubt, use IO.puts(text), and it will print the text without doing any changes to it:
iex(1)> text = "This is an name=\"abc\" string"
"This is an name=\"abc\" string"
iex(2)> IO.puts text
This is an name="abc" string
:ok
If you want to interpolate some double quotes into a string, you might try this:
iex(1)> text = "This is a name=#{"abc"} string"
"This is a name=abc string"
That didn't work. You need to do something extra:
iex(16)> text = "This is a name=#{"\"abc\""} string"
"This is a name=\"abc\" string"
When you write something like this:
text = "This is a name="abc" string"
You should wonder how is it that elixir knows that the last quote is the quote that terminates the string. In other words, why doesn't elixir think that this is your string:
text = "This is a name="
and the rest of the line is just garbage that doesn't follow elixir's syntax? In order to tell elixir that the double quote after the = sign is not the end of the string--but that it's just another character within the string--you escape the double quote, like this:
text = "This is a name=\"abc\" string"
Now, elixir will see the double quote after the word string as the termination of the string.
Next, it is a complete hassle to escape double quotes within a string, so elixir provides a means of avoiding that with the ~s sigil:
iex(17)> text = ~s{This is a name="abc" string}
"This is a name=\"abc\" string"
With the ~s sigil, you can use various character pairs to surround your string, e.g. () or <> or | | or / /. You need to use a character pair that is not found within the string--otherwise you will run into the same problem as with interior double quotes.
Finally, you can prove that the string name=\"abc\" has only 10 characters, i.e. the characters name="abc", like this:
iex(13)> s1 = ~s{name="abc"}
"name=\"abc\""
iex(14)> String.length s1
10

Unable to substitute escaped characters in string

I have this string:
str = "no,\"contact_last_name\",\"token\""
=> "no,\"contact_last_name\",\"token\""
I want to remove the escaped double quoted string character \". I use gsub:
result = str.gsub('\\"','')
=> "no,\"contact_last_name\",\"token\""
It appears that gsub has not removed the escaped double quotes from the string.
Why am I trying to do this? I have this csv file:
no,"contact_last_name","token",company,urbanization,sec-"property_address","property_address",city-state-zip,ase,oel,presorttrayid,presortdate,imbno,encodedimbno,fca,"property_city","property_state","property_zip"
1,MARIE A JEANTY,1083123,,,,17 SW 6TH AVE,DANIA BEACH FL 33004-3260,Electronic Service Requested,,T00215,12/14/2016,00-314-901373799-105112-33004-3260-17,TATTTADTATTDDDTTFDDFATFTDDDTTFADTTDFAAADDATDAATTFDTDFTTAFFTTATFFF,017,DANIA BEACH,FL, 33004-3260
When I try to open it with CSV, I get the following error:
CSV.foreach(path, headers: true) do |row|
end
CSV::MalformedCSVError: Illegal quoting in line 1.
Once I removed those double quoted strings in the first row (the header), the error went away. So I am trying to remove those double quoted strings before I run it through CSV:
file = File.open "file.csv"
contents = file.read
"no,\"contact_last_name\",\"token\" ... "
contents.gsub!('\\"','')
So again, my question is: why is gsub not removing the specified characters? Note that this actually does work:
contents.gsub /"/, ""
as if the string is ignoring the \ character.
There is no escaped double quote in this string:
"no,\"contact_last_name\",\"token\""
The interpreter recognizes the text above as a string because it is enclosed in double quotes. And because of the same reason, the double quotes embedded in the string must be escaped; otherwise they signal the end of the string.
The enclosing double quote characters are part of the language, not part of the string. The backslash (\) escape is likewise the language's way of putting characters that otherwise have a special meaning (double quotes, for example) inside a string.
The actual string stored in the str variable is:
no,"contact_last_name","token"
You can check this for yourself if you tell the interpreter to put the string on screen (puts str).
To answer the issue from the question's title: your attempts to substitute the escaped characters had no effect because the string doesn't contain the character sequence you tried to find and replace.
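As a quick check of this, using the string from the question: the pattern '\\"' (a backslash followed by a quote) never occurs, whereas targeting the quote character alone works.

```ruby
str = "no,\"contact_last_name\",\"token\""

puts str  # no,"contact_last_name","token"

unchanged = str.gsub('\\"', '')  # looks for backslash + quote, which is not in the string
stripped  = str.delete('"')      # removes the quote characters themselves

puts unchanged == str  # true
puts stripped          # no,contact_last_name,token
```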
And the actual problem is that the CSV file is malformed. The 6th value on the first row (sec-"property_address") doesn't follow the format of a correctly encoded CSV file.
It should read either sec-property_address or "sec-property_address"; i.e. the value should be either not enclosed in quotes at all or completely enclosed in quotes. Having it partially enclosed in quotes confuses Ruby's CSV parser.
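A sketch of that behaviour with a shortened, hypothetical version of the header row: the partially quoted field is rejected, and the fully quoted one parses.

```ruby
require "csv"

bad_header  = 'no,"contact_last_name",sec-"property_address"'  # quote starts mid-field
good_header = 'no,"contact_last_name","sec-property_address"'  # field fully quoted

begin
  CSV.parse_line(bad_header)
  bad_parsed = true
rescue CSV::MalformedCSVError => e
  bad_parsed = false
  puts "rejected: #{e.message}"
end

fields = CSV.parse_line(good_header)
p fields  # ["no", "contact_last_name", "sec-property_address"]
```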
The string looks fine; you're not understanding what you're seeing. Meditate on this:
"no,\"contact_last_name\",\"token\"" # => "no,\"contact_last_name\",\"token\""
'no,"contact_last_name","token"' # => "no,\"contact_last_name\",\"token\""
%q[no,"contact_last_name","token"] # => "no,\"contact_last_name\",\"token\""
%Q#no,"contact_last_name","token"# # => "no,\"contact_last_name\",\"token\""
When looking at a string that is delimited by double-quotes, it's necessary to escape certain characters, such as embedded double-quotes. Ruby, along with many other languages, has multiple ways of defining a string to remove that need.

How to match any quoted strings containing Cyrillic symbols

I need to parse a lot of text files and replace any quoted strings containing Cyrillic symbols. The strings may contain newlines, non-alphabetic characters, and special symbols (for example '$' or an escaped quote).
Can anyone help with regex?
From comments:
for example php code
function hello($word) {
$word2 = "ха-ха!";
echo "Привет, $word $word2\n";
}
hello('Мир');
I need to match "ха-ха!", "Привет, $word $word2\n" and 'Мир'
This should work:
str = 'The cat is under the "таблица"'
regex = /"\p{Cyrillic}+.*?\.?"/ui
str.match(regex){|s| do_stuff_with_each_matching s}
# or...
str.gsub!(regex){|s| method_that_translates_russian s}
Check it out live at http://rubular.com/r/0Mwbfinjvp.
http://www.ruby-doc.org/core-1.9.3/Regexp.html
".*[^a-zA-Z\d]+.*" matches any quoted character sequence containing at least one non-alphanumeric character.
i.e. it matches "aa$bb" and "a1$b1"
It doesn't match "aabb" or a$b.
Hope that this is what you want (Add required escaping).
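Another possible approach, sketched here under the assumption that string literals are delimited by either single or double quotes and that a backslash escapes the next character: first find every quoted literal, then keep those that contain at least one Cyrillic letter. The method name cyrillic_literals is hypothetical.

```ruby
# Returns the quoted literals (delimiters included) that contain Cyrillic text.
def cyrillic_literals(source)
  # A double- or single-quoted literal; \\. keeps an escaped quote inside it,
  # and the m flag lets that escape also cover a newline.
  quoted = /"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'/m
  source.scan(quoted).grep(/\p{Cyrillic}/)
end

# The PHP sample from the question; the quoted heredoc keeps \n as two characters.
php = <<~'PHP'
  function hello($word) {
      $word2 = "ха-ха!";
      echo "Привет, $word $word2\n";
  }
  hello('Мир');
PHP

found = cyrillic_literals(php)
puts found
```

To replace rather than list the matches, the same pattern could be passed to gsub with a block that checks each match for Cyrillic before changing it.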

ruby regex to remove extra \n

I have a malformed .csv file; the problem is caused by some extra \n characters, e.g.:
Name,Comment
"Peter","Good morning"
"Paul","How are you
"
"Mary","Fine"
The 2nd row ends with an unwanted, extra \n.
How can I remove all trailing \ns which are not followed by a double-quote " (assume the whole file has already been read into a string)?
Don't read the whole thing into a string, use the standard CSV parser in 1.9 to read it. If you have that in, say, pancakes.csv, then:
require 'csv'
data = CSV.open('pancakes.csv').map { |r| r.map(&:strip) }
# or
data = CSV.open('pancakes.csv').map { |r| r.map(&:chomp) }
Then you'll have this in data:
[
["Name", "Comment"],
["Peter", "Good morning"],
["Paul", "How are you"],
["Mary", "Fine"]
]
So you can get your data all clean and nicely parsed quite simply. And if you just need to clean up the CSV for some other program that can't handle embedded newlines, then you can use CSV to write it back out again.
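A rough sketch of that round trip, assuming the data is already in a string (as the question states) rather than in pancakes.csv: parse it, strip each field, and generate clean CSV again.

```ruby
require "csv"

# The sample from the question, including the stray newline inside the quotes.
raw = <<~CSV_TEXT
  Name,Comment
  "Peter","Good morning"
  "Paul","How are you
  "
  "Mary","Fine"
CSV_TEXT

# The embedded newline is part of the quoted field, so strip removes it.
rows = CSV.parse(raw).map { |row| row.map { |field| field&.strip } }

# Write the cleaned rows back out as CSV text.
cleaned = CSV.generate { |csv| rows.each { |row| csv << row } }
puts cleaned
```

The field&.strip call guards against empty (nil) fields, which the sample does not contain but real files might.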
You don't need a Regexp for that. It's basically any double-quote on its own line:
csv_string.gsub("\n\"\n", "\"\n")
Why don't you just add a trailing double quote for lines which don't end in a double quote, and remove empty lines (lines that only have a double quote)?

Resources