Modifying parsed Atom feed in Ruby - ruby

I need to modify the contents of an Atom feed parsed using the standard RSS library. I've found documentation for parsing and generating feeds, but nothing about the correct way to modify existing structure (maybe it's deemed obvious?).
Specifically, I'm trying to add a content element to each entry, with a type of 'html' and wrapped in a CDATA section.
This is what I have so far:
feed = RSS::Parser.parse open(some_uri), true
feed.items.each do |item|
item.content = RSS::Atom::Feed::Entry::Content.new
item.content.type = 'html'
item.content.content = '<html>my content that i have</html>'
end
Is that the preferred way, and if so, how do I add the CDATA tag?

Related

How to add link to source code in Sphinx

class torch.FloatStorage[source]
byte()
Casts this storage to byte type
char()
Casts this storage to char type
Im trying to get some documentation done, i have managed to to get the format like the one shown above, But im not sure how to give that link of source code which is at the end of that function!
The link takes the person to the file which contains the code,But im not sure how to do it,
This is achieved thanks to one of the builtin sphinx extension.
The one you are looking for in spinx.ext.viewcode. To enable it, add the string 'sphinx.ext.viewcode' to the list extensions in your conf.py file.
In summary, you should see something like that in conf.py
extensions = [
# other extensions that you might already use
# ...
'sphinx.ext.viewcode',
]
I'd recommend looking at the linkcode extension too. Allows you to build a full HTTP link to the code on GitHub or such like. This is sometimes a better option that including the code within the documentation itself. (E.g. may have stronger permission on it than the docs themselves.)
You write a little helper function in your conf.py file, and it does the rest.
What I really like about linkcode is that it creates links for enums, enum values, and data elements, which I could not get to be linked with viewcode.
I extended the link building code to use #:~:text= to cause the linked-to page to scroll to the text. Not perfect, as it will only scroll to the first instance, which may not always be correct, but likely 80~90% of the time it will be.
from urllib.parse import quote
def linkcode_resolve(domain, info):
# print(f"domain={domain}, info={info}")
if domain != 'py':
return None
if not info['module']:
return None
filename = quote(info['module'].replace('.', '/'))
if not filename.startswith("tests"):
filename = "src/" + filename
if "fullname" in info:
anchor = info["fullname"]
anchor = "#:~:text=" + quote(anchor.split(".")[-1])
else:
anchor = ""
# github
result = "https://<github>/<user>/<repo>/blob/master/%s.py%s" % (filename, anchor)
# print(result)
return result

Removing XML tags when parsing XML

Using Ruby with Nokogiri is there an easy way to remove tags around returned results? I can't find one in the docs.
Example from the Nokogiri site:
characters[0].to_s # => "<character>Al Bundy</character>"
I was hoping to get:
Al Bundy
Try using the text method:
characters[0].text
You can use the .inner_html method. Here is an example you can use from a basic xml sitemap:
parse_content.css("url").each do |x|
location = x.css("loc").inner_html
last_mod = x.css("lastmod").inner_html
end
You can read about sitemaps here: https://www.sitemaps.org/protocol.html

Ruby Nokogiri - How to prevent Nokogiri from printing HTML character entities

I have a html which I am parsing using Nokogiri and then generating a html out of this like this
htext= File.open(input.html).read
h_doc = Nokogiri::HTML(htmltext)
/////Modifying h_doc//////////
File.open(output.html, 'w+') do |file|
file.write(h_doc)
end
Question is how to prevent NOkogiri from printing HTML character entities (< >, & ) in the final generated html file.
Instead of HTML character entities (< > & ) I want to print actual character (< ,> etc).
As an example it is printing the html like
<title><%= ("/emailclient=sometext") %></title>
and I want it to output like this
<title><%= ("/emailclient=sometext")%></title>
So... you want Nokogiri to output incorrect or invalid XML/HTML?
Best suggestion I have, replace those sequences with something else beforehand, cut it up with Nokogiri, then replace them back. Your input is not XML/HTML, there is no point expecting Nokogiri to know how to handle it correctly. Because look:
<div>To write "&", you need to write "&amp;".</div>
This renders:
To write "&", you need to write "&".
If you had your way, you'd get this HTML:
<div>To write "&", you need to write "&".</div>
which would render as:
To write "&", you need to write "&".
Even worse in this scenario, say, in XHTML:
<div>Use the <script> tag for JavaScript</div>
if you replace the entities, you get undisplayable file, due to unclosed <script> tag:
<div>Use the <script> tag for JavaScript</div>
EDIT I still think you're trying to get Nokogiri to do something it is not designed to do: handle template HTML. I'd rather assume that your documents normally don't contain those sequences, and post-correct them:
doc.traverse do |node|
if node.text?
node.content = node.content.gsub(/^(\s*)(\S.+?)(\s*)$/,
"\\1<%= \\2 %>\\3")
end
end
puts doc.to_html.gsub('<%=', '<%=').gsub('%>', '%>')
You absolutely can prevent Nokogiri from transforming your entities. Its a built in function even, no voodoo or hacking needed. Be warned, I'm not a nokogiri guru and I've only got this to work when I'm actuing directly on a node inside document, but I'm sure a little digging can show you how to do it with a standalone node too.
When you create or load your document you need to include the NOENT option. Thats it. You're done, you can now add entities to your hearts content.
It is important to note that there are about half a dozen ways to call a doc with options, below is my personal favorite method.
require 'nokogiri'
noko_doc = File.open('<my/doc/path>') { |f| Nokogiri.<XML_or_HTML>(f, &:noent)}
xpath = '<selector_for_element>'
noko_doc.at_<css_or_xpath>(xpath).set_attribute('I_can_now_safely_add_preformatted_entities!', '&&&&&')
puts noko_doc.at_xpath(xpath).attributes['I_can_now_safely_add_preformatted_entities!']
>>> &&&&&
As for as usefulness of this feature... I find it incredibly useful. There are plenty of cases where you are dealing with preformatted data that you do not control and it would be a serious pain to have to manage incoming entities just so nokogiri could put them back the way they were.

How to convert xml of a pcap type (PDML) to json and print it in proper format

Is their any way to convert xml to json and vice versa for xml of type pcap?
The to_json does convert it, but the whole output gets printed in one line. How can I get properly formatted output?
You could use the http://cobravsmongoose.rubyforge.org library to do just that. Here's a simple example based on the code from the link above:
require 'cobravsmongoose'
xml = '<pdml><packet><proto name="geninfo" pos="1" showname="General information" size="74">...' # PDML document contents
json = CobraVsMongoose.xml_to_json(xml)
# => "pdml":{"packet":{"proto":[{"#name":"geninfo","#pos":"1","#showname":"General information","#size":"74",...
I used the ICMP example at http://gd.tuwien.ac.at/.vhost/analyzer.polito.it/30alpha/docs/dissectors/PDMLSpec.htm for testing the conversion above.
To address the comment to the original question about how to pretty print the output, you can use the #pretty_generate method in the JSON library to do that:
require 'json'
pretty_json = JSON.pretty_generate(JSON.parse json) # same json as above
puts json

Using Nokogiri with multiple search elements

In this XML snippet I need to replace the data in the UID for some of the blocks. The actual file contains more than 100 similar blocks.
Although I have been able to extract subsets based on name="Track (Timeline)", I am struggling to reduce this subset to the specific block I need by also using the data in the <TrackID>, if name="Track (TimeLine)" and the text of <TrackID> is 0x1200 then set UID to xxxx.
I am new to Nokogiri and, although I write test scripts, I do not consider myself a programmer.
<StructuralMetadata key="06.0E.2B.34.02.53.01.01.0D.01.01.01.01.01.3B.00" length="116" name="Track (TimeLine)">
<EditRate>25/1</EditRate>
<Origin>0</Origin>
<Sequence>32-04-25-67-E7-A7-86-4A-9B-28-53-6F-66-74-65-6C</Sequence>
<TrackID>0x1200</TrackID>
<TrackName>Softel VBI Data</TrackName>
<TrackNumber>0x17010101</TrackNumber>
<UID>34-C1-B9-B9-5F-07-A4-4E-8F-F4-53-6F-66-74-65-6C</UID>
</StructuralMetadata>
<StructuralMetadata key="06.0E.2B.34.02.53.01.01.0D.01.01.01.01.01.3B.00" length="116" name="Track (TimeLine)">
<EditRate>25/1</EditRate>
<Origin>0</Origin>
<Sequence>35-12-2D-86-E6-74-0B-4C-B4-24-53-6F-66-74-65-6C</Sequence>
<TrackID>0x1300</TrackID>
<TrackName>Softel VBI Data</TrackName>
<TrackNumber>0x0</TrackNumber>
<UID>37-0C-80-34-4C-8D-CE-41-85-F3-53-6F-66-74-65-6C</UID>
</StructuralMetadata>
Using xpath:
//StructuralMetadata
will select all StructuralMetadata elements in your XML. The double slash at the start means to select nodes wherever they appear in the document.
You don't want all the nodes though, you can filter the ones you want with a predicate:
//StructuralMetadata[#name="Track (TimeLine)" and TrackID="0x1200"]
This will select all StructuralMetadata elements that have a name attribute with the value Track (TimeLine), and a TrackID child element with contents 0x1200.
As you're interested in the UID element, you can further refine the expression:
//StructuralMetadata[#name="Track (TimeLine)" and TrackID="0x1200"]/UID
This expression will match all the UID elements that are children of StructuralMetadata elements that match the predicate described above.
Putting this to use:
require 'nokogiri'
# Parse the document, assuming xml_file is a File object containing the XML
doc = Nokogiri::XML(xml_file)
# I'm assuming there is only one element in the document that matches
# the criteria, so I'm using at_xpath
node = doc.at_xpath('//StructuralMetadata[#name="Track (TimeLine)" and TrackID="0x1200"]/UID')
# At this point, doc contains a representation of the xml, and node points to
# the UID node within that representation. We can update the contents of
# this node
node.content = 'XXX'
# Now write out the updated XML. This just writes it to standard output,
# you could write it to a file or elsewhere if needed
puts doc.to_xml
A great way to approach this problem is with the ‘map reduce’ style of programming, which works to take a large list of things and narrow it down and combine it into the result you're after. Specifically, Array#find and Array#select are really useful for this sort of problem. Check out this example:
require 'nokogiri'
xml = Nokogiri::XML.parse(File.read "sample.xml")
element = xml.css('StructuralMetadata').find { |item|
item['name'] == "Track (TimeLine)" and item.css('TrackID').text == "0x1200"
}
puts element.to_xml
This little program first uses the CSS selector to get all of the <StructuralMetadata> elements in the document. It returns an array, which we can filter to just what we want using the Array#find method. Array#select is its cousin which returns an array of all the matching objects instead of the first one it happens to find.
Inside the block we have a test to check if the <StructuralMetadata> tag is the one we’re after. Then it puts the element.to_xml string to the console so you can see which thing it found if you run this as a command-line script. Now you can find the element, you can modify it in the usual way and save out a new XML file or whatever.

Resources