Replacing part of text string with a page include in Classic ASP - vbscript

Not 100% sure if this is possible, but I'm hoping there is a workaround. Several hours of searching have turned up nothing. I have a text string written to the page from a DB table. If it contains a specific string, I would like to insert a page include at that point. The example below does write:
<!--#include file="members.asp"-->
into the text, but does not pull the included file content in.
<%=Replace(myQuery("Text"), "123456", "%><!--#include file="mypage.asp"--><% ")%>
The client wants it inside the page text rather than at the top or bottom of the output, which would be easy (and we already do that). The include has to go in at a specific point in the text.
I would appreciate any help, even if it is to confirm that it is not possible to do this.

Here is the main page:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
</head>
<body onload="document.getElementById('placeholder').innerText = document.getElementById('alwaysfillme').innerText">
<p><%=Replace("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "J", "<span id=""placeholder""></span>")%></p>
<span id="alwaysfillme" style="display:none;"><!--#include file="mypage.asp"--></span>
</body>
</html>
And here is what I stuck in "mypage.asp":
<% response.Write("--123--") %>
When a J is in the text, it displays:
ABCDEFGHI--123--KLMNOPQRSTUVWXYZ
When no J is in the text, it displays:
ABCDEFGHIKLMNOPQRSTUVWXYZ

<%=Replace(myQuery("Text"), "123456", ""&Server.Execute("mypage.asp")&"")%>


Target case-insensitive schemes with URI::regexp?

I have this code to remove URIs using these schemes:
htmldoc.gsub(/#{URI::regexp(['http', 'https', 'ftp', 'mailto'])}/, '')
However, it won't detect a capitalized URI like HTTP or Http unless I add them to the array.
I tried adding the case-insensitive flag i to the regex, but it didn't work.
Any idea how I could achieve this?
URI::regexp calls the default parser's make_regexp, which in turn passes the given arguments to Regexp::union. According to its docs:
The patterns can be Regexp objects, in which case their options will be preserved, or Strings.
Applied to your problem:
pattern = URI::regexp([/http/i, /https/i, /ftp/i, /mailto/i])
htmldoc = <<-HTML
<html>
<body>
here
here
</body>
</html>
HTML
puts htmldoc.gsub(pattern, '')
Output:
<html>
<body>
here
here
</body>
</html>
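To see why wrapping the pattern in another regex with the i flag has no effect, here is a minimal sketch using only Regexp.union (the scheme names are just sample values). An interpolated Regexp keeps its own options, so the outer flag never reaches it:

```ruby
# Strings passed to Regexp.union are matched literally and case-sensitively.
from_strings = Regexp.union('http', 'ftp')

# Regexp arguments keep their own options, including /i.
from_regexps = Regexp.union(/http/i, /ftp/i)

# Interpolating a Regexp embeds it with its options spelled out,
# e.g. (?-mix:http|ftp), so the outer /i does not apply inside it.
wrapped = /#{from_strings}/i

puts from_strings.match?('HTTP')  # false
puts from_regexps.match?('HTTP')  # true
puts wrapped.match?('HTTP')       # false
```

The same thing happens with the pattern returned by URI::regexp, which is why the flag has to go on the individual scheme patterns.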

How do I replace tags defining a node?

We're trying to move from a rather small bug tracking system to Redmine. For our old system, there's no ready migration solution script available, so we want to do that ourselves.
I suggested using Nokogiri to move some of the formatting over to the new format (Textile), however, I ran into problems.
This is from the DB field in our old system's DB:
<ul>
<li>list item 1</li>
<li>list item 2</li>
</ul>
This needs to be translated into Textile, and it would look like this:
* list item 1
* list item 2
Now, starting to parse using Nokogiri, I'm here:
def self.handle_ul(page)
  uls = page.css("ul")
  uls.each { |ul|
    lis = ul.css("li")
    lis.each { |li|
      li.inner_html = "*" << li.text << "\n"
    }
  }
end
This works like a charm. However, I need to do two replacements:
<li>
</li>
tags need to be removed from the <li> object, and:
<ul>
</ul>
tags need to be removed from the <ul> object. However, I cannot seem to find the actual tags in the object representing it. inner_html returned only the HTML between the tags I'm looking for:
ul.inner_html
Results in:
<li>list item 1</li>
<li>list item 2</li>
Where can I find the tags I need to replace? I thought about using parent and reassociating the child <li> tags with parent.parent, but that would put them at the end of the grandparent.
Can I somehow access the whole HTML representation of an object, without stripping its defining tags out, so that I can replace them?
EDIT:
As requested, here is a mockup of an old DB entry and the style it should have in textile.
Before transformation:
Fixed for rev. 1.7.92.
<h4>Problems:</h4>
<ul>
<li>fixed.</li>
<li>fixed. New minimum 270x270</li>
<li>fixed.</li>
<li>fixed.</li>
<li>fixed.</li>
<li>fixed. Column types list is growing horizontally now.</li>
</ul>
After transformation:
Fixed for rev. 1.7.92.
h4. Problems:
* fixed.
* fixed. New minimum 270x270
* fixed.
* fixed.
* fixed.
* fixed. Column types list is growing horizontally now.
EDIT 2:
I tried to overwrite parts of the to_s method of the Nokogiri elements:
li.to_s["<li>"]=""
but that doesn't seem to be a valid lvalue (not that there is an error, it just doesn't do anything).
Here's the basis for such a transform:
require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<ul>
<li>list item 1</li>
<li>list item 2</li>
</ul>
EOT

puts doc.to_html

doc.search('ul').each do |ul|
  ul.search('li').each do |li|
    li.replace("* #{ li.text.strip }")
  end
  ul.replace(ul.text)
end

puts doc.to_html
Running that outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><ul>
<li>list item 1</li>
<li>list item 2</li>
</ul></body></html>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>* list item 1
* list item 2
</body></html>
I didn't intend, or attempt, to make the first "item" have a leading carriage-return or line-feed. That's left as an exercise for the reader. Nor did I try to handle the <h4> tags or similar substitutions. From the answer code you should be able to figure out how to do it.
Also, I'm using Nokogiri::HTML to parse the HTML, which turns it into a full HTML document with the appropriate DOCTYPE header, <html> and <body> tags to mimic a full HTML document. That could be changed using Nokogiri::HTML::DocumentFragment.parse instead but wouldn't really make a difference in the output.
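For the headings that the code above leaves out, one possible direction is sketched below. To keep it runnable without extra gems it uses REXML from Ruby's standard distribution instead of Nokogiri, which means it only works on well-formed fragments. fragment_to_textile is a hypothetical helper name, and wrapping the input in a <root> element is an assumption made so that the fragment parses as a single document:

```ruby
require 'rexml/document'

# Hypothetical helper: converts a small, well-formed HTML fragment to Textile.
# Only top-level text, <h1>..<h6> and <ul>/<li> are handled in this sketch.
def fragment_to_textile(html)
  # Wrap the fragment so REXML sees exactly one root element.
  root = REXML::Document.new("<root>#{html}</root>").root
  lines = []

  root.children.each do |node|
    case node
    when REXML::Text
      text = node.value.strip
      lines << text unless text.empty?   # skip whitespace between tags
    when REXML::Element
      case node.name
      when /\Ah([1-6])\z/
        lines << "h#{$1}. #{node.text.to_s.strip}"
      when 'ul'
        node.elements.each('li') { |li| lines << "* #{li.text.to_s.strip}" }
      else
        lines << node.text.to_s.strip    # unknown tag: keep its text only
      end
    end
  end

  lines.join("\n")
end

html = <<~HTML
  Fixed for rev. 1.7.92.
  <h4>Problems:</h4>
  <ul>
  <li>fixed.</li>
  <li>fixed. New minimum 270x270</li>
  </ul>
HTML

puts fragment_to_textile(html)
```

The same dispatch-on-tag-name structure should carry over to Nokogiri if the real data is not well-formed XML.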
You may want to look at ClothRed, which is an HTML to Textile converter in Ruby. It hasn't been updated in a while, but it's simple and may be a good starting point for your own converter.
If you really want to use Nokogiri, you're writing a filter, so you may want to use the SAX interface.
You may want to try McBean (https://github.com/flavorjones/mcbean) [caveat: I'm the author of the gem, and it hasn't been updated in a while].
It's similar to ClothRed in spirit, but uses Nokogiri under the hood and actually transforms the document structure into output text. It supports a substantial subset of Textile; in fact, I've used it successfully to convert wiki pages between wiki systems, as you're trying to do.
If anybody interested finds this later, another alternative is to use Pandoc. I've just done my first tests, and it seems almost sufficient, and it can handle many more formats.

Select the xth element on a page that is a yth child of its parent

There are lots of similar questions, however I wasn't able to find an answer to this.
Imagine you have an HTML page like this:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Page title</title>
</head>
<body>
<div id="content">
<table>
<tr>
<td>A</td>
<td>B</td>
<td>C</td>
</tr>
<tr>
<td>D</td>
<td>E</td>
<td>F</td>
</tr>
</table>
</div>
</body>
</html>
and you want to select the second <td> element on the page that is a first child of its parent. In this case, it's the element <td>D</td>.
Note that this wording should be kept intact, for example it's not the same as selecting the second <tr> and then its first child (results in the same element), because the original page I'm working with is far more complex than this minimal testcase and this approach wouldn't work there.
What I have done so far:
A CSS selector #content td:first-child finds me A and D, now I am able to select the second element either via JS (document.querySelectorAll("query")[1]) or in Java (where I'm working with those elements in the end). However, it's quite inconsistent to use additional code for what could be done via a selector.
Similarly, I can use an XPath expression: id('content')//td[1]. It's the equivalent to the CSS selector above. It returns a node-set, so I thought that id('content')//td[1][2] will work the way I wanted, but no luck.
After some time, I discovered ( id('content')//td[1] )[2] to be working the way I want so I went for that and am quite happy with it.
Still, it's a letdown for me to see that I couldn't do a single query to get my element, and therefore an academic question is in place: Is there any other solution, either with a CSS selector, or an XPath expression to do my query? What did I miss? Can it be done?
CSS selectors currently don't provide any way to select the nth element in a set of globally-matched elements or the nth occurrence of some element in the entire DOM. The structural :nth-*() functional pseudo-classes that are provided by both Selectors 3 and Selectors 4 all count by the nth child of its parent matching the criteria, rather than by the nth element in the entire DOM.
The current Selectors syntax doesn't provide an intuitive way to say "this is the nth of a set of matched elements in the DOM"; even :nth-match() and :nth-last-match() in Selectors 4 have a pretty awkward syntax as they currently stand. So that is indeed a letdown.
As for XPath, the expression to use is (id('content')//td[1])[2], as you have already found. The outer () simply means "this entire subexpression should be evaluated before the [2] predicate" or "the [2] predicate should operate on the result of this entire subexpression, not just //td[1]." Without them, the expression td[1][2] would be treated collectively, with two conflicting predicates that would never work together (you can't have the same element be both first and second!).
Having parentheses around a subexpression doesn't make it an extra query per se; if it were, then you could consider each of id('content'), //td, [1] and [2] a "query" as well in its own right, with implied (or optional) parentheses. And that's a lot of queries :)
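The difference between the two forms can be modelled in plain Ruby. This is only an illustration of how the predicates are scoped, not an XPath engine: each inner array stands in for one <tr> and each string for a <td> from the question's table.

```ruby
# Each inner array models one <tr>; each string models one <td>.
rows = [%w[A B C], %w[D E F]]

# //td[1]: within each parent, keep the td at position 1.
first_cells = rows.map { |tds| tds[0] }

# (//td[1])[2]: the parentheses merge the per-parent results into one list,
# and [2] then indexes into that list (XPath positions are 1-based).
grouped = first_cells[2 - 1]

# //td[1][2]: the second predicate is applied per parent to the one-element
# list left by the first predicate, so position 2 never exists.
chained = rows.flat_map { |tds| [tds[0]].values_at(2 - 1).compact }

puts first_cells.inspect  # ["A", "D"]
puts grouped              # D
puts chained.inspect      # []
```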
Use this simple XPath expression:
(//td[1])[2]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy-of select="(//td[1])[2]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<html lang="en">
<head>
<meta charset="utf-8" />
<title>Page title</title>
</head>
<body>
<div id="content">
<table>
<tr>
<td>A</td>
<td>B</td>
<td>C</td>
</tr>
<tr>
<td>D</td>
<td>E</td>
<td>F</td>
</tr>
</table>
</div>
</body>
</html>
the XPath expression is evaluated and the result of this evaluation is copied to the output:
<td>D</td>

Get HTML structure using Nokogiri

My task is to get the HTML structure of the document without data. From:
<html>
<head>
<title>Hello!</title>
</head>
<body id="uniq">
<h1>Hello World!</h1>
</body>
</html>
I want to get:
<html>
<head>
<title></title>
</head>
<body id="uniq">
<h1></h1>
</body>
</html>
There are a number of ways to extract data with Nokogiri, but I couldn't find a way to perform the reverse task.
UPDATE:
The solution found is the combination of two answers I received:
doc = Nokogiri::HTML(open("test.html"))
doc.at_css("html").traverse do |node|
  node.remove if node.text?
end
puts doc
The output is exactly the one I want.
It sounds like you want to remove all the text nodes. You can do this like so:
doc.xpath('//text()').remove
puts doc
Traverse the document. For each node, delete what you don't want. Then write out the document.
Remember that Nokogiri can modify the document in place.
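As a rough illustration of the traverse-and-delete idea without any extra gems, here is a sketch using REXML from Ruby's standard distribution. It assumes the input is well-formed XML, which real-world HTML often is not, and strip_text! is a hypothetical helper name:

```ruby
require 'rexml/document'

# Hypothetical helper: removes every text node below the given element, in place.
def strip_text!(element)
  # children returns a copy of the child list, so removing while iterating is safe.
  element.children.each do |child|
    if child.is_a?(REXML::Text)
      child.remove
    elsif child.is_a?(REXML::Element)
      strip_text!(child)
    end
  end
  element
end

html = <<~HTML
  <html>
  <head>
  <title>Hello!</title>
  </head>
  <body id="uniq">
  <h1>Hello World!</h1>
  </body>
  </html>
HTML

doc = REXML::Document.new(html)
strip_text!(doc.root)
puts doc.root
```

Note that the whitespace between tags is made of text nodes too, so it is removed along with the data and the result comes out on a single line.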

Inserting an element in local HTML file

I am trying to write a Ruby script that would read a local HTML file, and insert some more HTML (basically a string) into it after a certain #divid.
I am kinda noob so please don't hesitate to put in some code here.
Thanks
I was able to do this with the following:
doc = Nokogiri::HTML(open('file.html'))
data = "<div>something</div>"
doc.children.css("#divid").first.add_next_sibling(data)
And then (over)write the file with same data...
File.open("file.html", 'w') {|f| f.write(doc.to_html) }
This is a slightly more correct way to do it:
html = '<html><body><div id="certaindivid">blah</div></body></html>'
doc = Nokogiri::HTML(html)
doc.at_css('div#certaindivid').add_next_sibling('<div>junk goes here</div>')
print doc.to_html
Which outputs:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div id="certaindivid">blah</div>
<div>junk goes here</div>
</body></html>
Notice the use of .at_css(), which finds the first occurrence of the target node and returns it, avoiding getting a nodeset back, and relieving you of the need to grab the .first() node.
