Parse quoted-printable encoding content from .mht file - ruby

I am trying to get all the images from an .mht file using the Nokogiri gem. But since the .mht file is quoted-printable encoded, all the image tags I get back have weird characters in them:
<img alt='3D"AFC-Logo' src="3D%22https://upload.=" width='3D"75"' height='3D"75"'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/wikimedia-butto=" width='3D"88"' height='3D"31"' alt='3D"Wikimedia'>
<img src="3D%22https://en.wikipedia.org/static/images/footer/poweredby_mediawiki_8=" alt='3D"Powered' width='3D"88"' height='3D"31"'>
This is the link to that .mht file: https://drive.google.com/file/d/1DtbgrFyCEcggAk1nqpZSluNhRt-k3t95/view?usp=sharing
And below is the code that I am using to get all the images from the .mht file:
require 'nokogiri'

def get_image_links(html)
  html_doc = Nokogiri::HTML(html)
  nodes = html_doc.xpath("//img[@src]")
  raise "No <img .../> tags!" if nodes.empty?
  nodes.inject([]) do |uris, node|
    puts node.to_s
    uris << node.attr('src').strip
  end.uniq
end

html = File.open("1646037951.mht").read
image_links = get_image_links(html)
I have tried to decode it using .unpack('M').first, but it still doesn't work; it just returns the same result as above.
Or maybe Rails has something for this?
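One thing worth checking is where the decoding happens: the attribute values shown above are already mangled by the time the quoted-printable text has been parsed as HTML, so unpacking them afterwards may be too late. Below is a minimal sketch, assuming the HTML part has already been cut out of the .mht file, that its charset is UTF-8, and that decoding before parsing is acceptable. The helper names and the sample markup are made up for illustration, and a regex stands in for Nokogiri only to keep the snippet dependency-free:

```ruby
# Hypothetical helper: decode quoted-printable text ("=3D" becomes "=",
# "=\r\n" soft line breaks are dropped) before any HTML parsing happens.
# Assumes the decoded bytes are UTF-8.
def decode_quoted_printable(raw)
  raw.unpack1('M').force_encoding(Encoding::UTF_8)
end

# Hypothetical helper: collect unique src values from <img> tags.
# With Nokogiri this would be the xpath("//img[@src]") call from the question.
def image_sources(html)
  html.scan(/<img\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/i).flatten.map(&:strip).uniq
end

# Sample input modelled on the broken output in the question (URL is invented).
raw = "<img alt=3D\"AFC-Logo\" src=3D\"https://upload.=\r\nexample.org/logo.png\" width=3D\"75\">"

decoded = decode_quoted_printable(raw)
puts decoded
puts image_sources(decoded)
```

In a real .mht file each MIME part declares its own Content-Transfer-Encoding header, so only the parts marked quoted-printable should be decoded this way.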

Related

How do I properly format code for desired appending output?

I'm writing new code and having a problem getting the desired output. The code reads an HTML file, finds <a> tags, and outputs only the URL. I inserted additional code to complete the link. I'm trying to insert the URL twice within the string.
####### Parse for <a> tags and save ############
with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')
links = []
for link in soup2.findAll('a', attrs={'href': re.compile("^https://")}):
    links.append(''"{link}"'<br>')
    time.sleep(.1)
with open("page-2.html", 'w') as html:
    html.write('{links}\n'.format(links=links))
This should give you the desired html output file:
import re
from bs4 import BeautifulSoup
import html
with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')
with open("page2.html", 'w') as h:
    for link in soup2.find_all('a'):
        h.write("{}<br>".format(link.get('href'),link.get('href')))
This gives me what I want, I guess, but not exactly. I would rather see it written out as "https://whatever.com/text/text/" than as "whatever.com/text/text"
####### Parse for <a> tags and save ############
with open("page1.html", 'r') as htmlb:
    soup2 = BeautifulSoup(htmlb, 'lxml')
links = []
for link in soup2.findAll('a', attrs={'href': re.compile("^https://")}):
    links.append('{0}</a><br>'.format(link,link))
with open("page-2.html", 'w') as html:
    html.write('{links}\n'.format(links=links))

Replace image src in vml markup with globally available images using Nokogiri

Is it possible to find Outlook-specific markup via Capybara/Nokogiri?
Given the following markup (ERB <% %> tags are processed into regular HTML):
...
<div>
<!--[if gte mso 9]>
<v:rect
xmlns:v="urn:schemas-microsoft-com:vml" fill="true" stroke="false"
style="width:<%= card_width %>px;height:<%= card_header_height %>px;"
>
<v:fill type="tile"
src="<%= avatar_background_url.split('?')[0] %>"
color="<%= background_color %>" />
<v:textbox inset="0,0,0,0">
<![endif]-->
<div>
How can I get the list of <v:fill ../> tags? (Or, failing that, how can I get the whole comment, if finding a tag inside a conditional comment is a problem?)
I have tried the following:
doc.xpath('//v:fill')
*** Nokogiri::XML::XPath::SyntaxError Exception: ERROR: Undefined namespace prefix: //v:fill
Do I need to somehow register the VML namespace?
EDIT - following @ThomasWalpole's approach:
doc.xpath('//comment()').each do |comment_node|
  vml_node_match = /<v\:fill.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
  if vml_node_match
    original_image_uri = URI.parse(vml_node_match['url'])
    vml_tag = vml_node_match[0]
    handle_vml_image_replacement(original_image_uri, comment_node, vml_tag)
  end
end
My handle_vml_image_replacement then ends up calling the following replace_comment_image_src
def self.replace_comment_image_src(node:, comment:, old_url:, new_url:)
  new_url = new_url.split('?').first # VML does not support URL with query params
  puts "Replacing comment src URL in #{comment} by #{new_url}"
  node.content = node.content.gsub(old_url, new_url)
end
But then it feels like the comment is no longer actually a comment, and I can sometimes see the HTML as if it were escaped... Am I using the wrong method to change the comment text with Nokogiri?
Here's the final code that I used for my email interceptor, thanks to @Thomas Walpole and @sschmeck for help along the way.
My goal was to replace images (linking to localhost) in VML markup with globally available images for testing with services like MOA or Litmus:
doc.xpath('//comment()').each do |comment_node|
  # Note: cannot capture the beginning of the tag, since it might span several lines
  src_attr_match = /.*src=\"(?<url>http\:[^"]*)"[^>]*\/>/.match(comment_node)
  next unless src_attr_match
  original_image_uri = URI.parse(src_attr_match['url'])
  handle_comment_image_replacement(original_image_uri, comment_node)
end
Which later calls (after picking a URL replacement strategy depending on the source image type):
def self.replace_comment_image_src(node:, old_url:, new_url:)
  new_url = new_url.split('?').first
  node.native_content = node.content.gsub(old_url, new_url)
end
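The string-level part of that replacement can be tried out without Nokogiri. Here is a minimal sketch, assuming the comment body is available as a plain string; the constant, the helper name, and the URLs are hypothetical, and writing the result back to the comment node (for example with native_content= as above) is left out:

```ruby
# Hypothetical pattern: first http(s) URL in a src attribute, wherever the
# surrounding tag starts (it may span several lines).
SRC_URL = /src="(?<url>https?:[^"]*)"/

# Hypothetical helper: return the comment text with that URL swapped for
# new_url, minus any query string (VML does not accept query params).
def swap_comment_src(comment_text, new_url)
  match = SRC_URL.match(comment_text)
  return comment_text unless match

  clean_url = new_url.split('?').first
  # Block form keeps the replacement literal (no backreference expansion).
  comment_text.gsub(match[:url]) { clean_url }
end

# Sample conditional-comment body modelled on the markup in the question.
comment_text = <<~VML
  [if gte mso 9]>
  <v:fill type="tile"
    src="http://localhost:3000/avatar_bg.png"
    color="#ffffff" />
  <![endif]
VML

puts swap_comment_src(comment_text, "https://cdn.example.com/avatar_bg.png?v=2")
```

Text without a matching src attribute is returned unchanged, which mirrors the `next unless src_attr_match` guard in the loop.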

How to delete tags with Nokogiri in SVG file

I have an SVG document that has image tags inside it. I am trying to parse it and remove all image tags with Nokogiri and then rewrite it to another file:
doc = File.open("cartes_des_risques.svg") { |f| Nokogiri::XML(f) }
doc.search('//image').each do |node|
  node.remove
end
file = File.open("formmatted_map.svg", "w")
file.write(doc)
However doc.search('//image') returns an empty array.
The doc I am trying to modify is a big SVG file that embeds image tags that look like this:
<image a:adobe-opacity-share="1" width="338" height="338" xlink:href="
GXRFWHRTb2Z0d2FyZQBBZG9iZSBJbWFnZVJlYWR5ccllPAAAEelJREFUeNrs3dGu5CgShOHi1Lz/
K7Oau92VRprutiEy8wupr/qUDQnxO8GAPx8iIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIi
qqQlBDSov21NQMBK+hIAEzOQ/tJagEuMQvoG4BLzkH4AtMRQpN0JaInBtDMBLTEcaVuQJeYj7UlA
y4ikDQlkiSm1GwEsMShpLwJZRiVtRABLTKttCGSJeUmbEMAyMWkHAlhiaPEngCXGFncigGVwEm8C
WGJ0sSaAJWYXYyKAZXoSWwJY5icxJXAlEBBPIoAFAnEkAlhAIPErG0NgEENgEDsxAg8x0vHFjHrH
A0zEgynES/3BRQwYRazUGWTUm3HESF0JcMCVkcRHHcFHHZlKbNQNgNSNwcREvQiIwJXZRsVD+wKS
+jCeWKgLwKoLA4qDOhAogSsztouBNgRZ5WdKdVd+GgSokXCdas6l7ARSys2k6mwZGENXrtcouE4D
61Jm5W9k9q28Or26Ki8AKO8IuE4B61JW/WYADJSTQdSxUDm9MKsDBnBlGPULLieY1obEFjfmMaRW
HpCdCa+WcLWERxnBtD80wJWp2tdpKQPIAlhvuDomb075wBRAwJXZWtUFUCkJJODKdOXrssTj9fLb
QdULYuXh2gWsoCpzl62Bq47evA7TgVqxX+3BZQFXBgDVsBh0ncOd+BbfRgemGA9Vc7dzDL6H1BNY
lftamczbGq6DK0C1L/dyL5BteC9wHWaeiVBd+gPIXrgPuA4x0jSoAirAgiuwgmqhe4ApwIIrc405
DWoqUNcA8+2i1wbXpmAF1foxXExZHrDg2gisoFovftWmGbrt1Z8AV2AF1fZA7TZf22GNKrgCV3T5
qkHVSoKaBt6u2weuTkICVTDNMnKl7NVXEYoZBVR7PwBA9vy1wRVY277BXkPLCbDgOgqujt2rO0xf
Q/oDyD57za5wBdaB2eoaVL6b5ei8zhJcC7V/GlhBtcdUQmobd/j0yQ6PBbh+fM5jKlQtyTpvxMSM
syNcgTWoLMlQrQTUDnO0VZZTJcN1fNY6ZbvodKhaPZBn0CQwdoMrsDYEaxIMrRzINiu4NoVr94X4
oAqoFUy7G10DXD+zj5LrCtXVqH9MAmxXuALroPsnQjUpS63yUrPjutUdUA5wlTG2yOpSstSuRw1W
+4oquAKrbLUBVCcuw0pfVtUNrqOy1i5wA9Wsdqg4N5u6WH5frlMHuALroCmAVfTenWBaZRgMrsXu
O2mvOKjmxWEaZHfB3z4dB2AF1lZQ7bhioKIxb2WvO6T+I+B62iTTs9XKULWmNScDBFdgbZUhVwNj
daCuUIMmgKo6XFtnrRM+kVx9CmAVK2/aPSp8YXUX+V0HuAIrqF6BavclWKknVlWCq6wVWMtOAawi
5UwH6WmTgaustfVJ8qBaL7PvBNnT0ALXoHv+9aHJULVa4J/rsh+61v7N3+4Dv6HCmeTEbHUCVCct
v6r0cmofrl/VrPXV+3UEa9UpgOTfTAVqAmCnwLXVdEDHU41uD2+7QjX9RWeVowE7wlXW2hysFacA
Tvxm8vKrxOVVqaCUtQJrXLbaDapdl18lfddph/29rLVAZ5et1oTqtNOvbgMBXIFVtloIkqtYDLtA
NnHIDq6H7/fzoRsgWYF//8RqgdWgLZf+QKlgnZStdoDqdKA+Xaf0F5Jpo7d2DwIZ62xI/ik81oD2
XQd/++bf355v7zbqvBK4aoe7JL6wSjPZybZNXcfa4QXVDq3zjfZ87X4dPip3C6zToFqxryR9ebUy
LPel2FfZ9NEOrB2h+iYopy+/Sji9ahf7W1nrb8gca566QXV9cuZjnyzLOhDTlL7Q4Vt1pQtebRqg
8hRAupEqmSL9cJWKWW7FrFXGWsS406CalJ2eyD5PPaAXv9Wq4+SpgKrLTxKh2mXpVfLyqhXQPwD9
AliXYJc1zZ/MF3bMZP4UsLf7ctIyypG7vqZmrKt4Z03IWk4/SNeFqYb1yZqH77A8ckTStoZWvuLb
3JShYAVzpxwNePtFUtKLrFEvsSp+THA1uudUqJ7e574fvOb+xd/si3/rA4PFpwIqzaOkHF4xDao3
Vww8vX41YaRz0xeroW8jwdpZa3h9nliKtMLqf7pOVaaG+Oihsv40h4Js9Vy2lQ7Utx4aFeDaLbuW
scpWy0H6T+s+bYNABRjJWguC1ZPrXpaRBNXJXxCoMNTvsMGgTP+alLF6wfYuVKf3lQpDfTrUp3+m
VLShURIykM5fEagE10ovvEY8LP76UEVDp0A1zXT7pfK8tXb16bJOuG+Jtbk/l43QIdtNHdZVgeob
Z6Suz91dhckZpOmFAmCdnDWmZ6tvAbjS4dcVNgakQtN0wACwdnjbnHi/02+Ab87JPnHvm3Bd+ned
eslYa2UIVeZfk4D6dFmWPgLkwHr/pczkbHuFm3PSF2upCFind4QJmciE3VdJO65WYF+pksDIWDXA
1TidgOqUh+SEvfX8ZyogrhOlZReLCR/PslP7DK8Ca7ngr+b1nLyt9c26p31lw/uEl+okY53zcHjT
/CfXo55a/5oOvA4f72ybedvSWj9juL28ZoXE7o3trBW2slYr0wilZ6zTP6m9hrTPE9noGxnthA0B
5muDwDr5k7mrWB0Tvy7w/zB8c/ogFa6SgcYPBXOstacBbnXQSjuxbmxlTeojVVaUyFipzNM4cYdQ
1c+T+B7UsKwTWHXICsBI2I3VcZncahiX8l4D1l8LXKVhVdIaxcTPX1fJWkFPxmqIUCyDWIPbMWXL
bqU28K2u4Rlr94at8nXMrsvF0gAjqwVWCjGaw1jqlHOJQ61yA6uOoI73zwdwBKeMdew35FOGdSu8
zoal4GUqQAjadV4Gqpm1Jjxw9S9gpcKg8WA0EgRWktEOMtLkT4zIRpuDdQ3tHEvH91Cl+u0nY2Ve
WVud+qym9ZKxAhZdiv/JLwhUeaDpp6HyBQEPiOSHzvoX/+eEfJKxghb9QoZ6M+5LH+EfYNWYXWJi
U4I+Ur79Zax9OvNq0KErrwYBdwJWYGyZBVU80o+AlcgwVfwIWKmCYWSaRjbASuDL4ETASvMyaeAn
YCUiAlYiIgJWIiJgJSICVqJf1i52XSJgpddgBFxEwAqag8q2G7eTBxqwEh2HIfCIH7DqsDr0gzGQ
+RKwUnkw76Br7vD2IWAFKzEpVQbtONs3ce0vYwXp5HjuB/6mQzbOP8XkY4LzOu8qcE0wIlMBnqak
f0RcRz8FVmpk8C2OJcoFvMPBuod2HkuyPPSoQfvJWOeZy9582Tr4mwqgQibbA+sMaiANrDJSuvAg
2QfKt/Wv2mCd+mRN6bg7vM6M56EPrEKgg6vj9Wx18wSwUo+O+namvpm6VBl3474OrJ6AkfXrCte3
62Xe01QAOAaVu5Mhd2gbpTwsKrXBbu5dGeuljrcHmumJ6+0gs53afOLgbmClwdMBJ2HhzNTM0Q14
PxwfYDWlcPqaN7LX/Tn7UACq4bEC1nudZRfu3E9Bfx+oV+UHVFLZPSx+Qd8/+O06UL518PdP/+0T
f7MOluV34r3C2jwhQz8JoNQXg+Oz3fSDrt8+RPnWvSqX6e3y7X8J2sQXgBU3DchETQWYDnj5Polb
WPc//KsK1a7ZIwFrO0Cfzqa6G3hfvnbaHL2jJoF1ZIdIWLO6m/SX9FOu0kZM3UH5W3WSsd7vaCcN
8PaSoT2w7VP7DK/KWHWi0KH+78J1DzHphLWt/HcYrI46y8tadxAQKgD21JkBlfvKW/2kNT8mTAVo
8LtZSyJg9yfvzAD9DljpBTMkZ61P1DUBsKe3tt7obwln8Vby3Wiw+kjdO/e7uZX15AL4W1tbp3xl
oGu2/dv1krFmdZRpc2NvQPaNjQT7cxeqN/rPbuivY/o+cI1TWy7X4d+/cc5A8t7/Fdz21R6y+9Lf
kakACss67LjKjEn6dEKVzwQBa8EGTB32geu5of+JWKfB3DSAjLVUg94cIk7ZFPB0TLs8iOmBeP1o
sBYwfPqenTcFvFH2m1C9ma2aBmiSsU5qyNsG7LYp4K2yVtjWuof695q+D13n5Nvh5NUBv/L36X/3
dPsmrSBIXtMqW20gqwL6dKD081ZPbwx48/6VoCpbvVDW74OFkbXWykaT1q2uIkZOWSlwu24V3hO0
mAroDtY3oVUNrkltnX6K/m1YGa4Dayuw3gZhN7hWHBbe/nrFW1C31vlf6GdYZ0/sHAnDuulrV5+u
X4VzBSpBr9z3vX6GNVBqZ0jYRTVx7eobQE2AalISMJILk1cFVO0kiXD9b6hM3CCQ1J+6LZkq2Z++
D1/v9Lxbp7nWlL99si3XENNWfLEl83yxjt8XClfpJRa4nmvL1c08h7LUpJdQYAysZcGaBte32vRU
P6k835gC1elbcceBVdZ65u9Ptu0KN2HaSoFUAE35BPoxsE7JWqfCNWlYX8V4SRmibPWAfoZ3+E5P
fKdXvdcOicuvkjPb8UsvHcLSC377gPGnALbC8is7pkIfBN8XC2uutdYwf8LyqlMGS4PqyVh0HUEC
64VrdIHrk+1WBbIJnz9JnvaR4Q4HazW4rtB7vNV23Q667gzV3axtjt/v+3KhZa114fp2+60mZj05
ZE6Gqmy1MVjBtR5gn7pXpUwnNYu8Dcc2cAbWDAhVgGvacL6q+Xfwb6YB8bVyfw8UHlzzQAmwNYDa
FcTtpxKsY609tLw1HzZlg8BTRyF2hqps9VLGKmvNHuJPW1510nAV5mEToNruIQ2s4PpG+1SD7A66
nrfyDbSa3msqXNMAmwraHXrtXeR31bPV1+/5PViZqWCtBtdTbbU6GelTax7WFACwguvFet/MNFcR
c+6LvzdtEFr3b6hZUu+5hv02dSjfxaAVoSpbBVZwDcvawbTmtEEXGB4r//dC5UwJ1Lz3ZMhWXoKV
CNX2UxHACq4ge8b8FediS2eNN+/7vVRJcK3/+46Q3aHXvQ1Va2QHAg5cs9siGbQ7/PoJw/cOUwDH
7/u92Km7QL1L5tl5s0jFw5NBtbAmgjUta03LPJMyzFXMrNVXDXQE4ZV6fMONMwmuSdlr+hC+q3F3
o2uMzVYTzLMa3TspY5x+qErFTAhUm2SrCRkruNaBK8BmfYTQ8F/GCq4NrjMVsjv8uslbTEdmq8A6
B64nYtwJtLvI9UE1NPP+BnXm1ezea8C1uoB2F7wPqAYLWGfC9Va8E2Bb/WBla0sL1OkbFhRwrV2+
hLJ03hdfAapegIUO18AVYGVNoFoa6sAKrgA7C1agOnAqoHPWWg2wIHvf0KBqKkDW2nwoD7LnQFLp
Tf0e0B6tzAKuuTGbCNld+B6gaioAXIvFrStkd4N77SbxMRXQtHxVh9lLn4oHBag2hHsFE4Br/Rim
9bPuQ1hQNRUAro2un1SuVONVhp7hf8Nh26QDY7zh76Xd4B6+9Aqs4AqygAqqYAWuAAumoNoCqn/r
W7DjrqFlsYQKUE/fy5zqMAOtoeWxfGouTLvCuyXkgVVGCbQ1zA2qAAWug2LQGbQTt2yC6gP6Fu/4
4JoXBzuvAHW8ljoA7LCypQNki0l92HcZxoFrr3ZdUwwYUAdQZUBwHdy+HTX6E9GdH5TAOqd8IAsc
oMps4KrNAXUYtNq8QOtoMkchgixYgCpjgavygOloYLVb6tXZTEs59YvhcABVBgJXZQPSYbBquylh
gmmcOavPyP6UlUnUU3kZH1QBR12VuW75q5vcA0DHV1/tdLRenU28lZlh1VnZaTacRp2eNdWohtgE
TMrOoOqv/QBV+RlTDNSBpsNo7OHZDNkvDtoUiNSFCcVCfcBUfZhPPNSLAAhUGU5ctDnwqBuTiY06
Ao76MZb4qCuNAw2oMpM4qTO4qDfziJX6A4r6M4uYiQWIiAeBhLiJEWCID0CInxgCAqgCgxgSASoo
kFgSqBIYiCcRoAKBuBKBKgCQ+BKgEuOLMxGgMrxYEwEqs5OYE6ASk4s9ASoxtzYgAlSmJu1BgMrI
pF0IUImBtQ8RmDIuaSsCVGYlbUZgSkyq/QhMiTFJWxKYMiNpVwJTYkDSzmBKDEfaHEiJyUgfICAl
piL9AkiJgUhfAVBiFtKPwJMYgmhKfwNNIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiL6
H/1HgAEArVlNIehstlUAAAAASUVORK5CYII=" transform="matrix(0.24 0 0 0.24 839.2388 2258.6909)">
</image>
How can I modify this SVG file with Nokogiri?
Take a close look at the <image> tag. The image data is part of the value of the xlink:href attribute. If you delete that tag, you delete the image data and end up with a document that doesn't contain the useful information, so that's probably not what you want to do.
Here's untested code showing how I'd read the file, remove the node, then write the content:
require 'nokogiri'
doc = Nokogiri::XML(File.read('cartes_des_risques.svg'))
doc.search('//image').remove
File.write('formmatted_map.svg', doc.to_xml)
But, again, removing the node will result in a file with no image information:
require 'nokogiri'
doc = Nokogiri::XML('<image/>')
doc.search('//image').remove
doc.to_xml # => "<?xml version=\"1.0\"?>\n"

Nokogiri removing xml encoding

I am using Nokogiri to decode some XML. This XML has some HTML as values. I am seeing some strange behavior when parsing it: Nokogiri appears to be removing some of the HTML-encoded tags, so when I parse the HTML I am unable to decode it properly. See the examples below:
doc = Nokogiri::XML '<?xml version="1.0"?><manifest
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
identifier="Manifest-eaf97d26-aa83-4399-8e9b-ae9f6f5fc6a2"
xmlns="http://www.imsglobal.org/xsd/imscp_v1p1"
xmlns:imsmd="http://www.imsglobal.org/xsd/imsmd_v1p2"
xmlns:imsqti="http://www.imsglobal.org/xsd/imsqti_v2p1">
<imsmd:langstring>&lt;p&gt;
 These are the&lt;strong&gt;instructions&lt;/strong&gt; for the pool&lt;/p&gt;</imsmd:langstring>'
this yields the following value:
"<?xml version=\"1.0\"?>\n<manifest xmlns=\"http://www.imsglobal.org/xsd/imscp_v1p1\" xmlns:imsmd=\"http://www.imsglobal.org/xsd/imsmd_v1p2\" xmlns:imsqti=\"http://www.imsglobal.org/xsd/imsqti_v2p1\" identifier=\"Manifest-eaf97d26-aa83-4399-8e9b-ae9f6f5fc6a2\">\n<imsmd:langstring>p
 These are thestrong instructions/strong for the pool/p</imsmd:langstring></manifest>\n"
Notice how the &lt; and &gt; entities are missing. However, the following works as expected:
doc = Nokogiri::XML '<?xml version="1.0"?><imsmd:langstring>&lt;p&gt;
 These are the&lt;strong&gt; instructions&lt;/strong&gt; for the pool&lt;/p&gt;</imsmd:langstring>'
and gives the following result:
"<?xml version=\"1.0\"?>\n<imsmd:langstring>&lt;p&gt;
 These are the&lt;strong&gt; instructions&lt;/strong&gt; for the pool&lt;/p&gt;</imsmd:langstring>\n"
I am sure I am missing something but can't figure out what is causing this.

I always get an UndefinedConversionError in Ruby 2.0 while scraping with Mechanize

When I try to submit a textarea with Mechanize and Ruby 2.0, I always get an
Encoding::UndefinedConversionError: U+0151 from UTF-8 to ISO-8859-1
Then I tried to convert the text with Iconv and got a similar result:
Iconv.iconv("LATIN1", "UTF-8", text)
I get this error message:
Iconv::IllegalSequence: "őzködik, melyet "...
The text contains Eastern European characters. What can I do to avoid this kind of problem, or how can I convert properly between different encodings?
I have found an elegant solution:
replacements = [["À", "À"], ["Á", "Á"], ["Â", "Â"], ["Ã", "Ã"], ["Ä", "Ä"], ["Å", "Å"], ["Æ", "Æ"], ["Ç", "Ç"], ["È", "È"], ["É", "É"], ["Ê", "Ê"], ["Ë", "Ë"], ["Ì", "Ì"], ["Í", "Í"], ["Î", "Î"], ["Ï", "Ï"], ["Ð", "Ð"], ["Ñ", "Ñ"], ["Ò", "Ò"], ["Ó", "Ó"], ["Ô", "Ô"], ["Õ", "Õ"], ["Ö", "Ö"], ["Ø", "Ø"], ["Ù", "Ù"], ["Ú", "Ú"], ["Û", "Û"], ["Ü", "Ü"], ["Ý", "Ý"], ["Þ", "Þ"], ["ß", "ß"], ["à", "à"], ["á", "á"], ["â", "â"], ["ã", "ã"], ["ä", "ä"], ["å", "å"], ["æ", "æ"], ["ç", "ç"], ["è", "è"], ["é", "é"], ["ê", "ê"], ["ë", "ë"], ["ì", "ì"], ["í", "í"], ["î", "î"], ["ï", "ï"], ["ð", "ð"], ["ñ", "ñ"], ["ò", "ò"], ["ó", "ó"], ["ô", "ô"], ["õ", "õ"], ["ö", "ö"], ["ø", "ø"], ["ù", "ù"], ["ú", "ú"], ["û", "û"], ["ü", "ü"], ["ý", "ý"], ["þ", "þ"], ["ÿ", "ÿ"]]
def replace(str, replacements)
  replacements.each { |replacement| str.gsub!(replacement[0], replacement[1]) }
  str
end
my_string = replace(my_string, replacements)
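For comparison, Ruby's built-in String#encode can handle the conversion itself. This is a minimal sketch, assuming either that the receiving side would accept ISO-8859-2 (Latin-2, which contains ő) instead of ISO-8859-1, or that a lossy substitution is acceptable. The sample word comes from the error message above, and mapping ő to a plain o is only an illustrative choice:

```ruby
# U+0151 (ő) has no ISO-8859-1 equivalent, which is what raises
# Encoding::UndefinedConversionError in the question.
text = "\u0151zk\u00F6dik" # "őzködik"

# Option 1: target an encoding that contains the character
# (assumes the receiving side accepts ISO-8859-2).
latin2 = text.encode("ISO-8859-2")

# Option 2: stay with ISO-8859-1 and replace anything unmappable.
lossy = text.encode("ISO-8859-1", undef: :replace, replace: "?")

# Option 3: stay with ISO-8859-1 but choose a substitute per character.
mapped = text.encode("ISO-8859-1", fallback: { "\u0151" => "o" })

puts latin2.bytes.first.to_s(16)
puts lossy.encode("UTF-8")
puts mapped.encode("UTF-8")
```

Which option fits depends on what the target form really expects; if the page is served as ISO-8859-1, only the lossy variants will round-trip.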
