Replace file content in Ruby

I would like to replace some text containing newlines and spaces, in all files, using Ruby.
toReplace = [
'<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"
\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"pl\" xml:lang=\"pl\">
<head>'
]
replacement = [
'<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">'
]
I use gsub for this, but it doesn't work; I think the problem is with the newlines and whitespace.
contents.gsub! toReplace[i], replacement[i]
How can I do that?

You could try escaping the first string and building a Regexp from it, so that none of its characters are treated as special:
REPLACE = Regexp.new(Regexp.escape(%Q[<!DOCTYPE...
]))
WITH = %Q[
...
]
contents.gsub!(REPLACE, WITH)
Note that gsub takes a single string or regular expression as the pattern, not an array as in your code. (A plain String pattern is already matched literally; escaping only matters once you build a Regexp from it, as above.) Also check your quoting: inside single-quoted Ruby strings, \" is a literal backslash followed by a quote, so your search text contains backslashes that probably don't exist in the files.
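For completeness, here is a runnable version of that idea as a sketch; it assumes the whole file fits in memory, and the file name is a placeholder:
to_replace = <<~HTML.chomp
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="pl" xml:lang="pl">
<head>
HTML
replacement = <<~HTML.chomp
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
HTML

contents = File.read("index.html")
# Regexp.escape neutralises metacharacters; Regexp.new turns the escaped
# text into a pattern that matches the literal block, newlines included.
contents.gsub!(Regexp.new(Regexp.escape(to_replace)), replacement)
File.write("index.html", contents)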

Related

Recover data from an XML file with Ruby for manipulation

I have a problem. I need to read data from an XML file on my printer, because I want to track the number of copies made each day, but the machine only shows the total copies since the day it was first started.
My script must read the total copy count from the XML file day by day, and then subtract each day's number from the previous total.
I have already tried a little script like the following:
require 'net/http'
require 'uri'
uri = URI.parse( "http://192.168.1.80/wcd/system_counter.xml" )
params = {'q'=>'cheese'}
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new(uri.path)
request.set_form_data( params )
# instantiate a new Request object
request = Net::HTTP::Get.new( uri.path+ '?' + request.body )
response = http.request(request)
puts response.body
When I try this with other HTML pages I get a correct response and can see the page source, but with this page I get the following HTML as the response:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<HTML lang="en">
<HEAD>
<TITLE></TITLE>
<meta http-equiv="Expires" content="0">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
<meta content="text/javascript" http-equiv="Content-Script-Type">
<noscript>
<meta http-equiv="refresh" content="0; URL=/wcd/js_error.xml">
</noscript>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#000000" ALINK="#ff0000" VLINK="#000000" onload="location.replace('/wcd/index.html?access=SYS_COU');" >
</BODY>
</HTML>
When I visit the page from a browser I can see the copy counts correctly.
In your experience, what is the correct way to bypass this restriction with Ruby code?
Thanks for any help.
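The response you pasted is a JavaScript redirect stub: the real page is loaded by onload="location.replace('/wcd/index.html?access=SYS_COU')". One approach, sketched below and untested against this printer (the cookie handling is an assumption), is to extract the target from the location.replace(...) call and request it in a second step:
require 'net/http'
require 'uri'

base = URI.parse("http://192.168.1.80")
http = Net::HTTP.new(base.host, base.port)

# The first request returns the redirect stub shown above.
first = http.request(Net::HTTP::Get.new("/wcd/system_counter.xml"))
cookie = first["Set-Cookie"]

# Pull the target out of onload="location.replace('...')" and follow it,
# forwarding the session cookie if the device set one.
if first.body =~ /location\.replace\('([^']+)'\)/
  follow = Net::HTTP::Get.new(Regexp.last_match(1))
  follow["Cookie"] = cookie if cookie
  puts http.request(follow).body
end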

Disappearing entities in XML fragment with nokogiri

I'm using Nokogiri to process fragments of XHTML documents, and am running into some behavior I cannot explain or work around. I'm not sure if it's a bug or something I don't understand.
Consider the following two lines, showcasing a reduced version of the problem I'm running into:
puts Nokogiri::XML::DocumentFragment.parse("&nbsp;<pre>&lt;div&gt;foo&lt;/div&gt;</pre>")
puts Nokogiri::XML::DocumentFragment.parse("<pre>&lt;div&gt;foo&lt;/div&gt;</pre>")
This is the output:
<pre>div&gt;foo/div&gt;</pre>
<pre>&lt;div&gt;foo&lt;/div&gt;</pre>
The second line is what I expect, but the first one puzzles me. Where did the &nbsp; go? Why does its presence cause the &lt; entities to disappear?
Based on matt's suggestion, I'm parsing the fragment by wrapping it in a full XHTML file, as that allows Nokogiri to know about the XHTML entities.
require 'nokogiri'

fragment = "&nbsp;<pre>&lt;div&gt;foo&lt;/div&gt;</pre>"
head = <<HERE
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta charset="UTF-8" />
</head>
<body>
HERE
foot = <<HERE
</body>
</html>
HERE
puts Nokogiri::XML.parse(head + fragment + foot).css("body").children.to_xml
Feels a bit heavy-handed, but it works.
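If HTML parsing semantics are acceptable for your fragments (an assumption; HTML parsing differs from strict XML in other ways), Nokogiri's HTML fragment parser already knows the XHTML named entities, so no wrapper document is needed:
require 'nokogiri'

# The HTML parser resolves &nbsp; and friends without needing a DTD.
puts Nokogiri::HTML::DocumentFragment.parse("&nbsp;<pre>&lt;div&gt;foo&lt;/div&gt;</pre>").to_html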

regex to scan html and return the URL from a meta refresh tag

I'm trying to scan html content to find if the source code includes a meta refresh tag in order to get the URL.
Here are some of the cases of meta http-equiv="refresh" tags I've seen
<META HTTP-EQUIV="refresh" CONTENT="0;URL=https://example.de/">
<META HTTP-EQUIV="refresh" CONTENT="0; URL=https://example.com/test">
<meta http-equiv="refresh" content='0;URL=/test' />
<meta http-equiv='refresh' content='0; URL=/test' />
Here is what I have come up with:
$url = response.body.scan(/(CONTENT="0;URL=)(.*?)(">)/)
/(CONTENT="0;URL=)(.*?)(">)/ works for the first case, which has no space between the ; and URL, but not for the others.
Can someone help me with a regex that will work on all 4 scenarios?
Try this out:
$url = response.body.scan(/(CONTENT|content)=["']0;\s?URL=(.*?)(["']\s*\/?>)/)
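As a variant sketch: a case-insensitive pattern covers all four spellings in one rule, and scan's single capture group yields the URLs directly.
html = <<~HTML
<META HTTP-EQUIV="refresh" CONTENT="0;URL=https://example.de/">
<META HTTP-EQUIV="refresh" CONTENT="0; URL=https://example.com/test">
<meta http-equiv="refresh" content='0;URL=/test' />
<meta http-equiv='refresh' content='0; URL=/test' />
HTML

urls = html.scan(/content=["']0;\s*url=(.*?)["']/i).flatten
# => ["https://example.de/", "https://example.com/test", "/test", "/test"]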

Javascript and character encoding

In my ASP.NET MVC 3 project, I have set the character encoding in my master page
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
then, in my view, I have
<script type="text/javascript" charset='UTF-8'>
$(function () {
$('#my-btn').click(function () {
$(this).val('@MyProject.Resources.OrderButton');
});
});
</script>
which gives me the value Zam&#243;w instead of Zamów. The resource file's first line is:
<?xml version="1.0" encoding="utf-8"?>
Any ideas how to fix it?
The correct way to pass server-side values to JavaScript variables is the following:
var value = @Html.Raw(Json.Encode(MyProject.Resources.OrderButton));
$(this).val(value);
This outputs code that is safe and correctly encoded for use in JavaScript. It also properly handles cases where your string contains characters such as ' or newlines, which would otherwise break your JavaScript code.
And you should not care whether some characters end up HTML-encoded along the way; the important thing is that they are correctly encoded for a browser or an HTML-compliant client to consume.
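Put together in the view from the question, the handler would look like this (a sketch using the same resource name):
<script type="text/javascript">
$(function () {
    $('#my-btn').click(function () {
        // Json.Encode emits a quoted, escaped JavaScript string literal,
        // and Html.Raw stops Razor from HTML-encoding it a second time.
        var value = @Html.Raw(Json.Encode(MyProject.Resources.OrderButton));
        $(this).val(value);
    });
});
</script>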

dompdf special characters

I'm having successful HTML-to-PDF conversions, but not with special characters.
Below is the special character I'm trying to display. It renders in browsers on my Mac when I put it in a plain HTML document (but not on my Windows box).
<?php
require_once("../dompdf_config.inc.php");
$html = '€';
$dompdf = new DOMPDF();
$html = iconv('UTF-8', 'Windows-1250', $html);
$dompdf->load_html($html);
$dompdf->render();
$dompdf->stream("contract.pdf");
exit(0);
?>
I keep getting a "?" (question mark) where the character should be when the PDF is rendered.
I know there have been lots of documented issues regarding special characters, but I thought I'd give this a try with the code I'm actually using.
If dompdf isn't a recommended HTML-to-PDF conversion tool, I'll take any other recommendations!
I have experienced problems with DOMPDF when converting a UTF-8 HTML page.
I simply solved the problem by adding
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
inside the <head> tag.
It could be an alternative if you set it with your encoding type.
IMPORTANT NOTE from the comments below: don't call both the stream() and output() methods on the same PDF instance; if you do, this won't work.
After trying all the solutions on the net, I was able to solve this without modifying dompdf. The problem was in the HTML content: I just had to use the correct, complete HTML structure and set the font. Tested on v0.6.0 and v0.6.1. Here is the code:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style type="text/css">
* {
    font-family: "DejaVu Sans Mono", monospace;
}
</style>
</head>
<body>your content ćčžšđ...</body>
</html>
DOMPDF Latin/Turkish (Türkçe) character problem; my solution, 100% working.
First check the server requirements:
Use a font with Turkish support, such as 'DejaVu Sans Mono', OR:
Step 1: edit dompdf_config.inc.php to set:
mb_internal_encoding('UTF-8');
def("DOMPDF_UNICODE_ENABLED", true);
Step 2: edit lib/fonts/dompdf_font_family_cache.dist.php to add:
'futural' =>
array (
'normal' => DOMPDF_FONT_DIR . 'FUTURAL',
'bold' => DOMPDF_FONT_DIR . 'FUTURAL',
'italic' => DOMPDF_FONT_DIR . 'FUTURAL',
'bold_italic' => DOMPDF_FONT_DIR . 'FUTURAL',
),
Step 3: download the font files and copy them into the lib/fonts folder.
Download link: http://www.2shared.com/file/k6hdky_b/fonts.html
Step 4: edit your code, for example:
require_once("dompdf_config.inc.php");
$html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<style>
html { padding: -30px; }
body { font-family: futural; }
</style>
</head><body>';
$html .= 'ı İ Ş ş ç Ç ö Ö ü Ü ğ Ğ ÿ þ ð ê ß ã Ù Ñ È » ¿ İsa Şahintürk';
$html .= '</body></html>';

if (isset($html)) {
    if (get_magic_quotes_gpc())
        $html = stripslashes($html);

    $old_limit = ini_set("memory_limit", "16M");

    $dompdf = new DOMPDF();
    $dompdf->load_html($html, 'UTF-8');
    $dompdf->set_paper('a4', 'portrait'); // or 'landscape'
    $dompdf->render();
    $dompdf->stream("limitless.pdf");
    exit(0);
}
The finished PDF, for example:
http://limitsizbilgi.com/pdf/dompdf-chartest.pdf
Anything prior to 0.6.x has limited support for characters outside the iso-8859-1 encoding. The Euro is supported in 0.5.x by using the appropriate Windows ANSI character code (&#0128;), but otherwise you have to jump through some PDF encoding hoops.
The 0.6.0 release has better support for "special" characters. The default encoding is based on Windows ANSI (one of the few recognized by the PDF 1.3 spec). You can enable better character support by loading a Unicode-based font and enabling Unicode in dompdf and specifying that encoding in your document.
The following should work in dompdf 0.6.0 or greater:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<p>€</p>
</body>
</html>
(or, to be lazy, just use the euro entity &euro; in your test)
There is a document outlining the steps needed to enable Unicode support in DOMPDF.
Plus read this answer for an overview of how to load fonts.
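A minimal sketch combining those steps, assuming dompdf 0.6.x with DOMPDF_UNICODE_ENABLED set to true and the bundled DejaVu fonts installed:
<?php
require_once("dompdf_config.inc.php");

// UTF-8 document rendered with a Unicode-capable font.
$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
      . '<style>body { font-family: "DejaVu Sans"; }</style>'
      . '</head><body><p>€</p></body></html>';

$dompdf = new DOMPDF();
$dompdf->load_html($html, 'UTF-8'); // encoding argument as used elsewhere in this thread
$dompdf->render();
$dompdf->stream("euro-test.pdf");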
You must use another font family that covers these characters, for example DejaVu Sans Mono. Add this to your code:
<style>
* {
    font-family: "DejaVu Sans Mono", monospace;
}
</style>
I also mentioned it in this video.
