ConditionalFormatting of cell containing string in CAxlsx - ruby

Using the CAxlsx gem (https://github.com/caxlsx/caxlsx), I'm trying to add conditional formatting to a range of cells, where the style should be applied if the cell contains the character -. Here's the snippet I'm using at the moment.
worksheet.add_conditional_formatting(range,
type: :containsText,
formula: "-",
dxfId: @styles[:invalid],
priority: 1)
Unfortunately, this doesn't seem to work. It does seem to apply the styling when the cell doesn't contain text, but a negative number, but that's not my use case. The documentation is severely lacking as well, and it doesn't offer a lot of explanation on what should be done in this case. (E.g., there's a cellIs type, with which the containsText operator can be used, but there's also a containsText type and no explanation as to what the difference between them is - and neither seem to work in my case.) Any pointers would be greatly appreciated, so far it's just been trial and error.

Assuming your range is something like "A1:A4", the formula you are looking for is NOT(ISERROR(SEARCH("-",A1))).
Example:
require 'axlsx'
package = Axlsx::Package.new
workbook = package.workbook
s = workbook.styles.add_style({b:true, type: :dxf})
rows = ['a','b-c','d','e-f']
workbook.add_worksheet do |sheet|
rows.each do |row|
sheet.add_row [row]
end
sheet.add_conditional_formatting("A1:A4", { :type => :containsText,
:operator => :containsText,
:formula => 'NOT(ISERROR(SEARCH("-",A1)))',
:dxfId => s,
:priority => 1 })
end
package.serialize('conditional_test.xlsx')
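The formula string can also be built by a small helper, so the same rule can be reused for other search strings and ranges. This is only a sketch: contains_text_formula is a hypothetical name and not part of CAxlsx. It assumes the formula should refer to the top-left cell of the range as a relative reference, and that double quotes in the search text are doubled as in any Excel string literal. Note that SEARCH treats ?, * and ~ as wildcard characters, which this sketch does not escape.

```ruby
# Hypothetical helper (not part of CAxlsx): builds the "cell contains text"
# formula for a conditional formatting rule applied to `range`.
def contains_text_formula(needle, range)
  # The formula is written relative to the first (top-left) cell of the range.
  first_cell = range.split(":").first.delete("$")
  # Excel string literals escape a double quote by doubling it.
  escaped = needle.gsub('"', '""')
  format('NOT(ISERROR(SEARCH("%s",%s)))', escaped, first_cell)
end

puts contains_text_formula("-", "A1:A4")
```

The returned string would be passed as the :formula option in the add_conditional_formatting call above.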
I have found that the easiest way to determine the appropriate formula is to:
manually create a new excel work book
fill in an appropriate number of cells
conditionally format them in excel
save this workbook and close
change the extension to .zip (because xlsx is just zipped XML files)
open the zip file and navigate to /xl/worksheets/sheet1.xml
open this file and it will show you the formula used for the conditional formatting e.g.
<x14:conditionalFormattings>
<x14:conditionalFormatting xmlns:xm="http://schemas.microsoft.com/office/excel/2006/main">
<x14:cfRule type="containsText" priority="1" operator="containsText" id="{E75522C8-BC6E-4142-B282-D21DF586C852}">
<xm:f>NOT(ISERROR(SEARCH("-",A1)))</xm:f>
<xm:f>"-"</xm:f>
<x14:dxf>
<font>
<color rgb="FF9C0006"/>
</font>
<fill>
<patternFill>
<bgColor rgb="FFFFC7CE"/>
</patternFill>
</fill>
</x14:dxf>
</x14:cfRule>
<xm:sqref>A1:A4</xm:sqref>
</x14:conditionalFormatting>
</x14:conditionalFormattings>

Related

How do I hide columns using rubyXL 3.4.0 or later?

The example here (https://github.com/weshatheleopard/rubyXL/issues/145) appears to be out-of-date. (sheet.cols.find now returns an Enumerator, so doesn't have a hidden method.)
Code from rubyXL issue #145:
# Assuming that the cells/rows/cols are respectively locked in the test file:
doc = RubyXL::Parser.parse('test.xlsx')
sheet = doc.worksheets[0]
sheet.sheet_data.rows[0].hidden
=> nil
sheet.sheet_data.rows[1].hidden
=> true
(c = sheet.cols.find(0)) && c.hidden
=> nil
(c = sheet.cols.find(1)) && c.hidden
=> true
xf = doc.workbook.cell_xfs[c.style_index || 0]
xf.apply_protection && xf.protection.locked
=> true
xf.apply_protection && xf.protection.hidden
=> true
I figured it out:
The following code does generate a new workbook with column B hidden:
require "rubyXL"
require "rubyXL/convenience_methods"
workbook = RubyXL::Workbook.new
sheet = workbook.worksheets.first
sheet.add_cell(0,0, "Sam")
sheet.add_cell(0,1, "George")
sheet.add_cell(0,2, "John")
sheet.cols.get_range(1).hidden = true
workbook.write('hide_b.xlsx')
(I also put a copy of this code here: https://github.com/kurmasz/rubyXL-recipes)
The example you cite does not hide columns; it reads the hidden status of columns.
Edited to prevent misleading others: I originally wrote that rubyXL has no feature for hiding columns, because its main purpose is to parse xlsx files and the hidden attribute is read-only. Actually you can change it; see the solution Zack figured out.
I found that the write_xlsx gem can hide columns and rows, but it is meant for creating xlsx files. There is an example in its GitHub repo showing how to hide them. I didn't find any other gem that can easily hide specific columns in an existing xlsx file. Maybe you can:
read the original xlsx file with rubyXL
use the data parsed by rubyXL to build a whole new xlsx file with the write_xlsx gem (hide the target column in this step)
replace the original xlsx file

Concept for recipe-based parsing of webpages needed

I'm working on a web-scraping solution that grabs totally different webpages and lets the user define rules/scripts in order to extract information from the page.
I started scraping from a single domain and built a parser based on Nokogiri.
Basically everything works fine.
I could now add a ruby class each time somebody wants to add a webpage with a different layout/style.
Instead I thought about using an approach where the user specifies elements where content is stored using xpath and storing this as a sort of recipe for this webpage.
Example: The user wants to scrape a table-structure extracting the rows using a hash (column-name => cell-content)
I was thinking about writing a ruby function for extraction of this generic table information once:
require 'nokogiri'

# Extracts a table's rows as an array of hashes (column_name => cell content).
# html        - the HTML file as a string
# xpath_table - XPath of the HTML table that holds the data to be extracted
def basic_table(html, xpath_table)
  html_doc = Nokogiri::HTML(html)
  # Column names come from the header cells.
  row_headers = html_doc.xpath("#{xpath_table}/thead/tr/th").map(&:inner_text)
  # Double quotes are required here so that xpath_table is interpolated.
  table_rows = html_doc.xpath("#{xpath_table}/tbody/tr")
  table_rows.map do |table_row|
    row_content_hash = {}
    table_row.xpath('td').each_with_index do |cell, column_index|
      # Key each cell by the header in the same column.
      row_content_hash[row_headers[column_index]] = cell.inner_text
    end
    row_content_hash
  end
end
The user could now specify a website-recipe-file like this:
<basic_table xpath='//div[@id="grid"]/table[@id="displayGrid"]' />
The function basic_table is referenced here, so that by parsing the website-recipe-file I would know that I can use the function basic_table to extract the content from the table referenced by the xPath.
This way the user can specify simple recipe-scripts and only has to dive into writing actual code if he needs a new way of extracting information.
The code would not change every time a new webpage needs to be parsed.
Whenever the structure of a webpage changes only the recipe-script would need to be changed.
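To make the idea more concrete, here is a minimal sketch of how a recipe could be dispatched to extraction functions. Everything in it is an assumption for illustration: the recipe is written as YAML rather than the XML-like syntax above (only because YAML parsing ships with Ruby), EXTRACTORS and run_recipe are hypothetical names, and the registered lambda is a stand-in that only records its arguments. In the real application it would call basic_table.

```ruby
require "yaml"

# Hypothetical registry: recipe step name => callable doing the extraction.
# The lambda below is a stand-in so the sketch runs without Nokogiri; the
# real entry would be ->(html, xpath) { basic_table(html, xpath) }.
EXTRACTORS = {
  "basic_table" => lambda do |html, xpath|
    { extractor: "basic_table", xpath: xpath, bytes: html.bytesize }
  end
}.freeze

# Runs every step of a YAML recipe against one HTML string and returns
# the result of each step in order.
def run_recipe(recipe_yaml, html, extractors = EXTRACTORS)
  steps = YAML.safe_load(recipe_yaml)
  steps.map do |step|
    extractor = extractors.fetch(step.fetch("extractor")) do |name|
      raise ArgumentError, "unknown extractor: #{name}"
    end
    extractor.call(html, step.fetch("xpath"))
  end
end

# Illustrative recipe for one website.
RECIPE = <<~YAML
  - extractor: basic_table
    xpath: '//div[@id="grid"]/table[@id="displayGrid"]'
YAML

results = run_recipe(RECIPE, "<html></html>")
p results
```

With this layout a new page layout only needs a new recipe file, and a new kind of extraction only needs one more entry in the registry.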
I was thinking that someone might be able to tell me how he would approach this. Rules/rule engines pop into my mind, but I'm not sure if that really is the solution to my problem.
Somehow I have the feeling that I don't want to "invent" my own solution to handle this problem.
Does anybody have a suggestion?
J.

Ruby RDF query - extracting simple data from Seq and Bag items

I am receiving XML-serialised RDF (as part of XMP media descriptions, in case that is relevant), and processing it in Ruby. I am trying to work with the rdf gem, although I am happy to look at other solutions.
I have managed to load and query the most basic data, but am stuck when trying to build a query for items which contain sequences and bags.
Example XML RDF:
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='' xmlns:dc='http://purl.org/dc/elements/1.1/'>
<dc:date>
<rdf:Seq>
<rdf:li>2013-04-08</rdf:li>
</rdf:Seq>
</dc:date>
</rdf:Description>
</rdf:RDF>
My best attempt at putting together a query:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new( :subject => { RDF::DC11.date => :date } )
results = date_query.execute(graph)
results.map { |result| { result.subject.to_s => result.date.inspect } }
=> [{"test.rdf"=>"#<RDF::Node:0x3fc186b3eef8(_:g70100421177080)>"}]
I get the impression that my results at this stage ("query solutions"?) are a reference to the rdf:Seq container. But I am lost as to how to progress. For the example above, I'd expect to end up, eventually, with an array ["2013-04-08"].
When there is incoming data without the rdf:Seq and rdf:li containers, I am able to extract the strings I want using RDF::Query, following examples at http://rdf.rubyforge.org/RDF/Query.html - unfortunately I cannot find any examples of more complex queries or RDF structures processed in Ruby.
Edit: In addition, when I try to find appropriate methods to use with the RDF::Node object, I cannot see any way to explore any further relations it may have:
results[0].date.methods - Object.methods
=> [:original, :original=, :id, :id=, :node?, :anonymous?, :unlabeled?, :labeled?, :to_sym, :resource?, :constant?, :variable?, :between?, :graph?, :literal?, :statement?, :iri?, :uri?, :valid?, :invalid?, :validate!, :validate, :to_rdf, :inspect!, :type_error, :to_ntriples]
# None of the above leads AFAICS to more data in the graph
I know how to get the same data in xpath (well, at least provided we always get the same paths in the serialisation), but feel it is not the best query language to use in this case (it's my backup plan, however, if it turns out too complex to implement an RDF-query solution)
I think you're correct when saying "my results at this stage ("query solutions"?) are a reference to the rdf:Seq container". RDF/XML is a really horrible serialisation format; instead, think of the data as a graph. Here is a picture of an RDF:Bag. RDF:Seq works the same way, and the #students in the example is analogous to the #date in your case.
So to get to the date literal, you need to hop one node further in the graph. I'm not familiar with the syntax of this Ruby library, but something like:
require 'rdf'
require 'rdf/rdfxml'
require 'rdf/vocab/dc11'
graph = RDF::Graph.load( 'test.rdf' )
date_query = RDF::Query.new({
:yourThing => {
RDF::DC11.date => :dateSeq
},
:dateSeq => {
RDF.type => RDF.Seq,
RDF._1 => :dateLiteral
}
})
date_query.execute(graph).each do |solution|
puts "date=#{solution.dateLiteral}"
end
Of course, if you expect the Seq to actually contain multiple dates (otherwise it wouldn't make sense to have a Seq), you will have to match them with RDF._1 => :dateLiteral1, RDF._2 => :dateLiteral2, RDF._3 => :dateLiteral3 etc.
Or for a more generic solution, match all the properties and objects on the dateSeq with:
:dateSeq => {
:property => :dateLiteral
}
and then filter out the case where :property ends up being RDF:type while :dateLiteral isn't actually the date but RDF:Seq. Maybe the library has also a special method to get all the Seq's contents.
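The filtering step can be sketched without any RDF library. In the sketch below the triples are plain arrays of strings standing in for the query solutions, container_members is a hypothetical helper name, and the second date is invented sample data. It keeps only the container membership properties (rdf:_1, rdf:_2, ...), which drops rdf:type, and returns the members in index order.

```ruby
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical helper: returns the members of a container node in
# rdf:_1, rdf:_2, ... order. Triples are [subject, predicate, object]
# arrays of strings, standing in for real query solutions.
def container_members(triples, container)
  membership = /\A#{Regexp.escape(RDF_NS)}_(\d+)\z/
  triples
    .select { |subject, _pred, _obj| subject == container }
    .filter_map do |_subject, pred, obj|
      index = pred[membership, 1]
      # Non-membership properties such as rdf:type yield nil and are dropped.
      [index.to_i, obj] if index
    end
    .sort_by(&:first)
    .map(&:last)
end

# Sample graph: the Seq from the question plus one invented extra member.
triples = [
  ["test.rdf", "http://purl.org/dc/elements/1.1/date", "_:seq"],
  ["_:seq", RDF_NS + "type", RDF_NS + "Seq"],
  ["_:seq", RDF_NS + "_2", "2013-04-09"],
  ["_:seq", RDF_NS + "_1", "2013-04-08"]
]

p container_members(triples, "_:seq")
# => ["2013-04-08", "2013-04-09"]
```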

Adding formatted break with Nokogiri

I'm trying to add a few elements to an already existing XML document. The following code is successful at adding the desired nodes and content, however it doesn't format the inserted elements. All the added elements end up on one line instead of with line breaks and indentations after each element.
Any suggestions about how I could add this formatting?
The code is:
doc.xpath("//tei:div[@xml:id='versionlog']", {"tei" => "http://www.tei-c.org/ns/1.0"}).each do |node|
new_entry = Nokogiri::XML::Node.new "div", doc
new_entry["xml:id"] = "v_#{ed_no}"
head = Nokogiri::XML::Node.new "head", doc
head.content = "Description of changes for #{ed_no}"
new_entry.add_child(head)
para = Nokogiri::XML::Node.new "p", doc
para.content = "#{version_description}"
new_entry.add_child(para)
node.add_child(new_entry)
end
Why is it important that the XML not be on one line? It's purely cosmetic having "pretty-printed" XML, and not required by the XML spec or the parser when the XML is reloaded. Personally, I'd recommend having no formatting for your transfer speed and reduced disk size, but YMMV.
You can either run the XML through an XML beautifier, or play a game with Nokogiri along the lines of:
new_entry.add_child(para.to_xml + "\n")
The line break will be added as a text node between the tags, but it's benign and not significant to XML's ability to deliver its payload.
If you insist, "How do I pretty-print HTML with Nokogiri?" describes how to get there.

Can I avoid transposing an array in Ruby on Rails?

I have a Rails app that has a COUNTRIES list with full country names and abbreviations created inside the Company model. The array for the COUNTRIES list is used for a select tag on the input form to store abbreviations in the DB. See below. VALID_COUNTRIES is used for validations of abbreviations in the DB. FULL_COUNTRIES is used to display the full country name from the abbreviation.
class Company < ActiveRecord::Base
COUNTRIES = [["Afghanistan","AF"],["Aland Islands","AX"],["Albania","AL"],...]
COUNTRIES_TRANSPOSE = COUNTRIES.transpose
VALID_COUNTRIES = COUNTRIES_TRANSPOSE[1]
FULL_COUNTRIES = COUNTRIES_TRANSPOSE[0]
validates :country, inclusion: { in: VALID_COUNTRIES, message: "enter a valid country" }
...
end
On the form:
<%= select_tag(:country, options_for_select(Company::COUNTRIES, 'US')) %>
And to convert back to the full country name:
full_country = FULL_COUNTRIES[VALID_COUNTRIES.index(:country)]
This seems like an excellent application for a hash, except the key/value order is wrong. For the select I need:
COUNTRIES = {"Afghanistan" => "AF", "Aland Islands" => "AX", "Albania" => "AL",...}
While to take the abbreviation from the DB and display the full country name I need:
COUNTRIES = {"AF" => "Afghanistan", "AX" => "Aland Islands", "AL" => "Albania",...}
Which is a shame, because COUNTRIES.keys or COUNTRIES.values would give me the validation list (depending on which hash layout is used).
I'm relatively new to Ruby/Rails and am looking for the more Ruby-like way to solve the problem. Here are the questions:
Does the transpose occur only once, and if so, when is it executed?
Is there a way to specify the FULL_ and VALID_ lists that do not require the transpose?
Is there a better or reasonable alternate way to do this? For instance, VALID_COUNTRIES is COUNTRIES[x][1] and FULL_COUNTRIES is COUNTRIES[x][0], but VALID_ must work with the validation.
Is there a way to make this work with just one hash rather than one for the select_tag and one for converting the abbreviations in the DB back to full names for display?
1) Does the transpose occur only once, and if so, when is it executed?
Yes. It runs once, when the class body is evaluated as the file is loaded, because the result is assigned to a constant. If you want it to be evaluated every time, use a lambda:
FULL_COUNTRIES = lambda { COUNTRIES_TRANSPOSE[0] }
2) Is there a way to specify the FULL_ and VALID_ lists that do not require the transpose?
Yes, use map or collect (they are the same thing). The abbreviation is the last element of each pair and the full name is the first:
VALID_COUNTRIES = COUNTRIES.map(&:last)
FULL_COUNTRIES = COUNTRIES.map(&:first)
3) Is there a better or reasonable alternate way to do this? For instance, VALID_COUNTRIES is COUNTRIES[x][1] and FULL_COUNTRIES is COUNTRIES[x][0], but VALID_ must work with the validation.
See Above
4) Is there a way to make the hash work?
Yes. I am not sure why a hash isn't working: the Rails docs say options_for_select will use hash.to_a.map(&:first) for the option text and hash.to_a.map(&:last) for the option values, so the first hash you give should work. If you can clarify why it does not, I can help you more.
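As a rough sketch of the single-hash approach, the reverse lookup can be derived with Hash#invert. The constant names COUNTRY_CODES and COUNTRY_NAMES and the full_country helper are hypothetical, and only three entries from the question's list are shown.

```ruby
# Sample data only: three entries stand in for the full list.
# name => abbreviation, the order the select tag needs.
COUNTRY_CODES = {
  "Afghanistan"   => "AF",
  "Aland Islands" => "AX",
  "Albania"       => "AL"
}.freeze

# abbreviation => name, derived once when the constant is defined.
COUNTRY_NAMES = COUNTRY_CODES.invert.freeze

# List of abbreviations for the inclusion validation.
VALID_COUNTRIES = COUNTRY_CODES.values.freeze

# Looks up the full name for an abbreviation stored in the DB.
def full_country(code)
  COUNTRY_NAMES.fetch(code) { "Unknown (#{code})" }
end

puts full_country("AX")
# prints "Aland Islands"
```

COUNTRY_CODES would then be passed to options_for_select in place of the nested array, and VALID_COUNTRIES used in the inclusion validation.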
