Nokogiri replace inner text with <span>ed words - ruby

Here's an example HTML fragment:
<p class="stanza">Thus grew the tale of Wonderland:<br/>
Thus slowly, one by one,<br/>
Its quaint events were hammered out -<br/>
And now the tale is done,<br/>
And home we steer, a merry crew,<br/>
Beneath the setting sun.<br/></p>
I need to surround each word with a <span id="w0">Thus </span> like this:
<span id='w1'>Anon,</span> <span id='w2'>to</span> <span id='w3'>sudden</span>
<span id='w4'>silence</span> <span id='w5'>won,</span> ....
I written this which creates the new fragment. How do I replace/swap the new for old?
def callchildren(n)
n.children.each do |n| # call recursively until arrive at a node w/o children
callchildren(n)
end
if n.node_type == 3 && n.to_s.strip.empty? != true
new_node = ""
n.to_s.split.each { |w|
new_node = new_node + "<span id='w#{$word_number}'>#{w}</span> "
$word_number += 1
}
# puts new_node
# HELP? How do I get new_node swapped in?
end
end

My attempt to provide a solution for your problem:
require 'nokogiri'
Inf = 1.0/0.0
def number_words(node, counter = nil)
# define infinite counter (Ruby >= 1.8.7)
counter ||= (1..Inf).each
doc = node.document
unless node.is_a?(Nokogiri::XML::Text)
# recurse for children and collect all the returned
# nodes into an array
children = node.children.inject([]) { |acc, child|
acc += number_words(child, counter)
}
# replace the node's children
node.children = Nokogiri::XML::NodeSet.new(doc, children)
return [node]
end
# for text nodes, we generate a list of span nodes
# and return it (this is more secure than OP's original
# approach that is vulnerable to HTML injection)n
node.to_s.strip.split.inject([]) { |acc, word|
span = Nokogiri::XML::Node.new("span", node)
span.content = word
span["id"] = "w#{counter.next}"
# add a space if we are not at the beginning
acc << Nokogiri::XML::Text.new(" ", doc) unless acc.empty?
# add our new span to the collection
acc << span
}
end
# demo
if __FILE__ == $0
h = <<-HTML
<p class="stanza">Thus grew the tale of Wonderland:<br/>
Thus slowly, one by one,<br/>
Its quaint events were hammered out -<br/>
And now the tale is done,<br/>
And home we steer, a merry crew,<br/>
Beneath the setting sun.<br/></p>
HTML
doc = Nokogiri::HTML.parse(h)
number_words(doc)
p doc.to_xml
end

Given a Nokogiri::HTML::Document in doc, you could do something like this:
i = 0
doc.search('//p[#class="stanza"]/text()').each do |n|
spans = n.content.scan(/\S+/).map do |s|
"<span id=\"w#{i += 1}\">" + s + '</span>'
end
n.replace(spans.join(' '))
end

Related

Simplify similar loops into one

I am writing a braille converter. I have this method to handle the top line of a braille character:
def top(input)
braille = ""
#output_first = ""
#top.each do |k, v|
input.chars.map do |val|
if k.include?(val)
braille = val
braille = braille.gsub(val, v)
#output_first = #output_first + braille
end
end
end
#output_first
end
I'm repeating the same each loop for the middle and bottom lines of a character. The only thing that is different from the method above is that the #top is replaced with #mid and #bottom to correspond to the respective lines.
Trying to figure a way to simplify the each loop so I can call it on top, mid and bottom lines.
You can put the loop in a separate method.
def top(input)
#output_first = handle_line(#top)
end
def handle_line(line)
result = ''
line.each do |k, v|
input.chars.map do |val|
if k.include?(val)
braille = val
braille = braille.gsub(val, v)
result = result + braille
end
end
end
result
end
You can then call handle_line in your #mid and #bottom processing
I'm not sure whats in the #top var but I believe braille has limited number of characters and therefore I would consider some map structure
BRAILLE_MAP = {
'a' => ['..',' .','. '], # just an example top,mid,bot line for character
'b' => ['..','..',' '],
# ... whole map
}
def lines(input)
top = '' # representation of each line
mid = ''
bot = ''
input.each_char do |c|
representation = BRAILLE_MAP[c]
next unless representation # handle invalid char
top << representation[0] # add representation to each line
mid << representation[1]
bot << representation[2]
end
[top,mid,bot] # return the lines
end
There may be better way to handle those 3 variables, but I cant think of one right now

Merging Ranges using Sets - Error - Stack level too deep (SystemStackError)

I have a number of ranges that I want merge together if they overlap. The way I’m currently doing this is by using Sets.
This is working. However, when I attempt the same code with a larger ranges as follows, I get a `stack level too deep (SystemStackError).
require 'set'
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten!
sets_subsets = set.divide { |i, j| (i - j).abs == 1 } # this line causes the error
puts sets_subsets
The line that is failing is taken directly from the Ruby Set Documentation.
I would appreciate it if anyone could suggest a fix or an alternative that works for the above example
EDIT
I have put the full code I’m using here:
Basically it is used to add html tags to an amino acid sequence according to some features.
require 'set'
def calculate_formatting_classes(hsps, signalp)
merged_hsps = merge_ranges(hsps)
sp = format_signalp(merged_hsps, signalp)
hsp_class = (merged_hsps - sp[1]) - sp[0]
rank_format_positions(sp, hsp_class)
end
def merge_ranges(ranges)
set = Set.new
ranges.each { |r| set << r.to_set }
set.flatten
end
def format_signalp(merged_hsps, sp)
sp_class = sp - merged_hsps
sp_hsp_class = sp & merged_hsps # overlap regions between sp & merged_hsp
[sp_class, sp_hsp_class]
end
def rank_format_positions(sp, hsp_class)
results = []
results += sets_to_hash(sp[0], 'sp')
results += sets_to_hash(sp[1], 'sphsp')
results += sets_to_hash(hsp_class, 'hsp')
results.sort_by { |s| s[:pos] }
end
def sets_to_hash(set = nil, cl)
return nil if set.nil?
hashes = []
merged_set = set.divide { |i, j| (i - j).abs == 1 }
merged_set.each do |s|
hashes << { pos: s.min.to_i - 1, insert: "<span class=#{cl}>" }
hashes << { pos: s.max.to_i - 0.1, insert: '</span>' } # for ordering
end
hashes
end
working_hsp = [Range.new(7, 136), Range.new(143, 178)]
not_working_hsp = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
sp = Range.new(1, 20).to_set
# working
results = calculate_formatting_classes(working_hsp, sp)
# Not Working
# results = calculate_formatting_classes(not_working_hsp, sp)
puts results
Here is one way to do this:
ranges = [Range.new(73, 856), Range.new(82, 1145),
Range.new(116, 2914), Range.new(3203, 3241)]
ranges.size.times do
ranges = ranges.sort_by(&:begin)
t = ranges.each_cons(2).to_a
t.each do |r1, r2|
if (r2.cover? r1.begin) || (r2.cover? r1.end) ||
(r1.cover? r2.begin) || (r1.cover? r2.end)
ranges << Range.new([r1.begin, r2.begin].min, [r1.end, r2.end].max)
ranges.delete(r1)
ranges.delete(r2)
t.delete [r1,r2]
end
end
end
p ranges
#=> [73..2914, 3203..3241]
The other answers aren't bad, but I prefer a simple recursive approach:
def merge_ranges(*ranges)
range, *rest = ranges
return if range.nil?
# Find the index of the first range in `rest` that overlaps this one
other_idx = rest.find_index do |other|
range.cover?(other.begin) || other.cover?(range.begin)
end
if other_idx
# An overlapping range was found; remove it from `rest` and merge
# it with this one
other = rest.slice!(other_idx)
merged = ([range.begin, other.begin].min)..([range.end, other.end].max)
# Try again with the merged range and the remaining `rest`
merge_ranges(merged, *rest)
else
# No overlapping range was found; move on
[ range, *merge_ranges(*rest) ]
end
end
Note: This code assumes each range is ascending (e.g. 10..5 will break it).
Usage:
ranges = [ 73..856, 82..1145, 116..2914, 3203..3241 ]
p merge_ranges(*ranges)
# => [73..2914, 3203..3241]
ranges = [ 0..10, 5..20, 30..50, 45..80, 50..90, 100..101, 101..200 ]
p merge_ranges(*ranges)
# => [0..20, 30..90, 100..200]
I believe your resulting set has too many items (2881) to be used with divide, which if I understood correctly, would require 2881^2881 iterations, which is such a big number (8,7927981983090337174360463368808e+9966) that running it would take nearly forever even if you didn't get stack level too deep error.
Without using sets, you can use this code to merge the ranges:
module RangeMerger
def merge(range_b)
if cover?(range_b.first) && cover?(range_b.last)
self
elsif cover?(range_b.first)
self.class.new(first, range_b.last)
elsif cover?(range_b.last)
self.class.new(range_b.first, last)
else
nil # Unmergable
end
end
end
module ArrayRangePusher
def <<(item)
if item.kind_of?(Range)
item.extend RangeMerger
each_with_index do |own_item, idx|
own_item.extend RangeMerger
if new_range = own_item.merge(item)
self[idx] = new_range
return self
end
end
end
super
end
end
ranges = [Range.new(73, 856), Range.new(82, 1145), Range.new(116, 2914), Range.new(3203, 3241)]
new_ranges = Array.new
new_ranges.extend ArrayRangePusher
ranges.each do |range|
new_ranges << range
end
puts ranges.inspect
puts new_ranges.inspect
This will output:
[73..856, 82..1145, 116..2914, 3203..3241]
[73..2914, 3203..3241]
which I believe is the intended output for your original problem. It's a bit ugly, but I'm a bit rusty at the moment.
Edit: I don't think this has anything to do with your original problem before the edits which was about merging ranges.

A very specific Conway's Game of Life (Ruby beginner)

Looking for feedback on obvious logic errors on this, not optimizing. I keep getting weird tick counts on the end game message (ex: 1 tick turns into 11 ticks)
The largest error I can spot while running the code is on the 2nd tick, a very large amount of alive cells appear. I am too new to this to understand why, but it seems like the #alive_cells is not resetting back to 0 after each check.
Here is my entire code, its large but it should be child's play to anyone with experience.
class CellGame
def initialize
puts "How big do you want this game?"
#size = gets.chomp.to_i
#cell_grid = Array.new(#size) { Array.new(#size) }
#grid_storage = Array.new(#size) { Array.new(#size) }
#tick_count = 0
fill_grid_with_random_cells
end
def fill_grid_with_random_cells
#cell_grid.each do |row|
row.map! do |cell|
roll = rand(10)
if roll > 9
"•"
else
" "
end
end
end
check_cells_for_future_state
end
def check_for_any_alive_cells
#cell_grid.each do |row|
if row.include?("•")
check_cells_for_future_state
break
else
end_game_print_result
end
end
end
def check_cells_for_future_state
#cell_grid.each_with_index do |row, row_index|
row.each_with_index do |cell, cell_index|
#live_neighbors = 0
add_row_shift = (row_index + 1)
if add_row_shift == #size
add_row_shift = 0
end
add_cell_shift = (cell_index + 1)
if add_cell_shift == #size
add_cell_shift = 0
end
def does_this_include_alive(cell)
if cell.include?("•")
#live_neighbors +=1
end
end
top_left_cell = #cell_grid[(row_index - 1)][(cell_index - 1)]
does_this_include_alive(top_left_cell)
top_cell = #cell_grid[(row_index - 1)][(cell_index)]
does_this_include_alive(top_cell)
top_right_cell = #cell_grid[(row_index - 1)][(add_cell_shift)]
does_this_include_alive(top_right_cell)
right_cell = #cell_grid[(row_index)][(add_cell_shift)]
does_this_include_alive(right_cell)
bottom_right_cell = #cell_grid[(add_row_shift)][(add_cell_shift)]
does_this_include_alive(bottom_right_cell)
bottom_cell = #cell_grid[(add_row_shift)][(cell_index)]
does_this_include_alive(bottom_cell)
bottom_left_cell = #cell_grid[(add_row_shift)][(cell_index - 1)]
does_this_include_alive(bottom_left_cell)
left_cell = #cell_grid[(row_index)][(cell_index - 1)]
does_this_include_alive(left_cell)
if #live_neighbors == 2 || #live_neighbors == 3
#grid_storage[row_index][cell_index] = "•"
else
#grid_storage[row_index][cell_index] = " "
end
end
end
update_cell_grid
end
def update_cell_grid
#cell_grid = #grid_storage
print_cell_grid_and_counter
end
def print_cell_grid_and_counter
system"clear"
#cell_grid.each do |row|
row.each do |cell|
print cell + " "
end
print "\n"
end
#tick_count += 1
print "\n"
print "Days passed: #{#tick_count}"
sleep(0.25)
check_for_any_alive_cells
end
def end_game_print_result
print "#{#tick_count} ticks were played, end of game."
exit
end
end
I couldn't see where your code went wrong. It does have a recursive call which can easily cause strange behavior. Here is what I came up with:
class CellGame
def initialize(size)
#size = size; #archive = []
#grid = Array.new(size) { Array.new(size) { rand(3).zero? } }
end
def lives_on?(row, col)
neighborhood = (-1..1).map { |r| (-1..1).map { |c| #grid[row + r] && #grid[row + r][col + c] } }
its_alive = neighborhood[1].delete_at(1)
neighbors = neighborhood.flatten.count(true)
neighbors == 3 || neighbors == 2 && its_alive
end
def next_gen
(0...#size).map { |row| (0...#size).map { |col| lives_on?(row, col) } }
end
def play
tick = 0; incr = 1
loop do
#archive.include?(#grid) ? incr = 0 : #archive << #grid
sleep(0.5); system "clear"; #grid = next_gen
puts "tick - #{tick += incr}"
puts #grid.map { |row| row.map { |cell| cell ? '*' : ' ' }.inspect }
end
end
end
cg = CellGame.new 10
cg.play
The tick count stops but the program keeps running through the oscillator at the end.
I wanted to revisit this and confidently say I have figured it out! Here is my new solution - still super beginner focused. Hope it helps someone out.
class Game
# Uses constants for values that won't change
LIVE = "🦄"
DEAD = " "
WIDTH = 68
HEIGHT = 34
def initialize
# Sets our grid to a new empty grid (set by method below)
#grid = empty_grid
# Randomly fills our grid with live cells
#grid.each do |row|
# Map will construct our new array, we use map! to edit the #grid
row.map! do |cell|
if rand(10) == 1
LIVE # Place a live cell
else
DEAD # Place a dead cell
end
end
end
# Single line implimentation
# #grid.each {|row|row.map! {|cell|rand(10) == 1 ? LIVE : DEAD}}
loop_cells #start the cycle
end
def empty_grid
Array.new(HEIGHT) do
# Creates an array with HEIGHT number of empty arrays
Array.new(WIDTH) do
# Fills each array with a dead cell WIDTH number of times
DEAD
end
end
# Single line implimentation
# Array.new(HEIGHT){ Array.new(WIDTH) { DEAD } }
end
def print_grid # Prints our grid to the terminal
system "clear" # Clears the terminal window
# Joins cells in each row with an empty space
rows = #grid.map do |row|
row.join(" ")
end
# Print rows joined by a new line
print rows.join("\n")
# Single line implimentation
# print #grid.map{|row| row.join(" ")}.join("\n")
end
def loop_cells
print_grid # Start by printing the current grid
new_grid = empty_grid # Set an empty grid (this will be the next life cycle)
# Loop through every cell in every row
#grid.each_with_index do |row, row_index|
row.each_with_index do |cell, cell_index|
# Find the cells friends
friends = find_friends(row_index, cell_index)
# Apply life or death rules
if cell == LIVE
state = friends.size.between?(2,3)
else
state = friends.size == 3
end
# Set cell in new_grid for the next cycle
new_grid[row_index][cell_index] = state ? LIVE : DEAD
end
end
# Replace grid and start over
#grid = new_grid
start_over
end
def find_friends(row_index, cell_index)
# Ruby can reach backwards through arrays and start over at the end - but it cannot reach forwards. If we're going off the grid, start over at 0
row_fix = true if (row_index + 1) == HEIGHT
cell_fix = true if (cell_index + 1) == WIDTH
# You'll see below I will use 0 if one of these values is truthy when checking cells to the upper right, right, lower right, lower, and lower left.
# Check each neighbor, use 0 if we're reaching too far
friends = [
#grid[(row_index - 1)][(cell_index - 1)],
#grid[(row_index - 1)][(cell_index)],
#grid[(row_index - 1)][(cell_fix ? 0 : cell_index + 1)],
#grid[(row_index)][(cell_fix ? 0 : cell_index + 1)],
#grid[(row_fix ? 0 : row_index + 1)][(cell_fix ? 0 : cell_index + 1)],
#grid[(row_fix ? 0 : row_index + 1)][(cell_index)],
#grid[(row_fix ? 0 : row_index + 1)][(cell_index - 1)],
#grid[(row_index)][(cell_index - 1)]
]
# Maps live neighbors into an array, removes nil values
friends.map{|x| x if x == LIVE}.compact
end
def start_over
sleep 0.1
loop_cells
end
end
# Start game when file is run
Game.new

Ruby Array Loop

I am currently trying to create a ruby algorithm to execute the following:
l = Array.new
Given array is text in the form of an array and has three manifests each titled Section No. 1, Section No. 2, Section No. 3 respectively.
Put the entire text in one string by looping through the array(l) and adding each line to the one big string each time.
Split the string using the split method and the key word "Section No." This will create an array with each element being one section of the text.
Loop through this new array to create files for each element.
So far I have the following:
a = l.join ''
b = Array.new
b = a.split ("Section No.")`
How would I go writing the easiest method to the third part?
Should only be about 2-3 lines.
Output would be the creation of three files each named after the manifest titles.
"Complex Version"
file_name = "Section"
section_number = "1"
new_text = File.open(file_name + section_number, 'w')
i = 0
n= 1
while i < l.length
if (l[i]!= "SECTION") and (l[i+1]!= "No")
new_text.puts l[i]
i = i + 1
else
new_text.close
section_number = (section_number.to_i +1).to_s
new_text = File.open(file_name + section_number, "w")
new_text.puts(l[i])
new_text.puts(l[i+1])
i=i+2
end
end
b.each_with_index(1) do |text, index|
File.write "section_#{index}.txt", text
end
To answer your most basic question, you could probably get away with:
sections.each_with_index do |section, index|
File.open("section_#{index}.txt", 'w') { |file| file.print section }
end
Here's an alternate solution:
input_string = "This should be your manifest string"
starting_string = "Section No."
copy_input_string = input_string.clone
sections = []
while(copy_input_string.length > 0)
index_of_next_start = copy_input_string.index(starting_string, starting_string.length) || copy_input_string.length
sections.push(copy_input_string.slice!(0...index_of_next_start))
end
sections.each_with_index do |section, index|
File.open("section_#{index}.txt", 'w') { |file| file.print section }
end
create string s by putting a space between each string in l
s = l.join ' '
split on 'Section No.' - note that 'Section No.' no longer appears in a
a = s.split('Section No.')
throw away the part before the first section
a = a[1..-1]
create the files
a.each do |section|
File.open('Section' + section.strip[0], 'w') do |file_handle|
file_handle.puts section
end
end

How to do this in Ruby?

I have an array(list?) in ruby:
allRows = ["start","one","two","start","three","four","start","five","six","seven","eight","start","nine","ten"]
I need to run a loop on this to get a list that clubs elements from "start" till another "start" is encountered, like the following:
listOfRows = [["start","one","two"],["start","three","four"],["start","five","six","seven","eight"],["start","nine","ten"]]
Based on Array#split from Rails:
def group(arr, bookend)
arr.inject([[]]) do |results, element|
if (bookend == element)
results << []
end
results.last << element
results
end.select { |subarray| subarray.first == bookend } # if the first element wasn't "start", skip everything until the first occurence
end
allRows = ["start","one","two","start","three","four","start","five","six","seven","eight","start","nine","ten"]
listOfRows = group(allRows, "start")
# [["start","one","two"],["start","three","four"],["start","five","six","seven","eight"],["start","nine","ten"]]
If you can use ActiveSupport (for example, inside Rails):
def groups(arr, delim)
dels = arr.select {|e| == delim }
objs = arr.split(delim)
dels.zip(objs).map {|e,objs| [e] + objs }
end

Resources