String to kebab-case with SCSS/SASS - sass

With a little help from this gist, I now have a function for kebab-casing a string in SCSS:
@function replace($string, $substr, $newsubstr, $all: 0) {
  $string: quote(#{$string});
  $substr: quote(#{$substr});
  $newsubstr: quote(#{$newsubstr});
  $position-found: str-index($string, $substr);
  $processed: ();
  @while ($position-found and $position-found > 0) {
    $length-substr: str-length($substr);
    @if (1 != $position-found) {
      $processed: append($processed, str-slice($string, 0, $position-found - 1));
    }
    $processed: append($processed, $newsubstr);
    $string: str-slice($string, $position-found + $length-substr);
    $position-found: 0;
    @if ($all > 0) {
      $position-found: str-index($string, $substr);
    }
  }
  $processed: append($processed, $string);
  $string: "";
  @each $s in $processed {
    $string: #{$string}#{$s};
  }
  @return $string;
}
@function kebabCase($string) {
  $replace: " ", "-", "–", "—", "_", ",", ";", ":", ".", "+", "=", "?", "&", "*", "/", "|", ">", "<", "(", ")";
  @each $char in $replace {
    $string: replace($string, $char, "-", 1);
  }
  @return $string;
}
My only problem is that this does not work with "camelCased" or "PascalCased" strings. For example, this:
.example {
#{kebabCase("justify content")}: 'center'; // Works
#{kebabCase("justifyContent")}: 'center'; // Does not work
}
produces:
.example {
justify-content: "center";
justifyContent: "center";
}
My question: is there any way SCSS can detect capital letters, so I can break words at their position (unless it is the first letter in the string)? I am not looking for a JS solution, SCSS only.
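Since the question asks for SCSS only, the JavaScript below is not a drop-in answer; it is a minimal sketch of the per-character test such a function needs: a character counts as a capital when lowercasing it changes it, and a hyphen goes before each capital except one in the first position. In Sass, the same loop could be built from the built-in str-length(), str-slice() and to-lower-case() functions and combined with the kebabCase() function above. The name camelToKebab is hypothetical.

```javascript
// Hypothetical helper, for illustration only: mirrors the per-character logic
// an SCSS @function would need (the question asks for SCSS, not JS).
function camelToKebab(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    const lower = ch.toLowerCase();
    // Treat a character as a capital when lowercasing changes it.
    const isCapital = ch !== lower;
    // Break the word before a capital, except when it is the first character.
    if (isCapital && i > 0) {
      out += '-';
    }
    out += lower;
  }
  return out;
}

console.log(camelToKebab('justifyContent'));  // justify-content
console.log(camelToKebab('PascalCased'));     // pascal-cased
console.log(camelToKebab('justify-content')); // justify-content (unchanged)
```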

Related

Codeigniter - $sOrder and $sLimit in datatables

I'm retrieving records using CodeIgniter and DataTables. Removing $sOrder and $sLimit loads the data, but filtering causes a database error:
"You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near ''desc' LIMIT '0', '10'' at line 5
SELECT SQL_CALC_FOUND_ROWS id, FName, LName, status, authorizedby, userName
FROM users
ORDER BY id 'desc'
LIMIT '0', '10'"
Here's the code:
if (isset($_REQUEST['iSortCol_0'])) {
$sOrder = "ORDER BY ";
for ($i = 0; $i < intval($_REQUEST['iSortingCols']); $i++) {
if ($_REQUEST['bSortable_' . intval($_REQUEST['iSortCol_' . $i])] == "true") {
$sOrder .= $aColumns[intval($_REQUEST['iSortCol_' . $i])] . "
" . $this->db->escape($_REQUEST['sSortDir_' . $i]) . ", ";
}
}
$sOrder = substr_replace($sOrder, "", -2);
if ($sOrder == "ORDER BY") {
$sOrder = "";
}
}
$sWhere = "";
// this for search code
if ($_REQUEST['sSearch'] != "") {
$sWhere = "WHERE (";
for ($i = 0; $i < count($aColumns); $i++) {
$sWhere .= $aColumns[$i] . " LIKE '%" . $this->db->escape($_REQUEST['sSearch']) . "%' OR ";
}
$sWhere = substr_replace($sWhere, "", -3);
$sWhere .= ')';
}
for ($i = 0; $i < count($aColumns); $i++) {
if ($_REQUEST['bSearchable_' . $i] == "true" && $_REQUEST['sSearch_' . $i] != '') {
if ($sWhere == "") {
$sWhere = "WHERE ";
} else {
$sWhere .= " AND ";
}
$sWhere .= $aColumns[$i] . " LIKE '%" . $this->db->escape($_REQUEST['sSearch_' . $i]) . "%' ";
}
}
// generate sql query
$sQuery = "SELECT SQL_CALC_FOUND_ROWS " . str_replace(" , ", " ", implode(", ", $aResultColumns)) . "
FROM $sTable
$sWhere
$sOrder
$sLimit
";
Removing the last two lines ($sOrder and $sLimit) lets the data load, but filtering still fails with an error. How can this be fixed?
This is how I solved it:
$sWhere = "";
// this for search code
if ($_REQUEST['sSearch'] != "") {
$sWhere = "WHERE (";
for ($i = 0; $i < count($aColumns); $i++) {
$sWhere .= $aColumns[$i] . " LIKE '*%" . $this->db->escape($_REQUEST['sSearch']) . "%*' OR ";
}
$sWhere = substr_replace($sWhere, "", -3);
$sWhere .= ')';
}
for ($i = 0; $i < count($aColumns); $i++) {
if ($_REQUEST['bSearchable_' . $i] == "true" && $_REQUEST['sSearch_' . $i] != '') {
if ($sWhere == "") {
$sWhere = "WHERE ";
} else {
$sWhere .= " AND ";
}
$sWhere .= $aColumns[$i] . " LIKE '*%" . $this->db->escape($_REQUEST['sSearch_' . $i]) . "%*' ";
}
}
$sOrder = str_replace("'", "", $sOrder);
$sLimit = str_replace("'", "", $sLimit);
$sWhere = str_replace("'", "", $sWhere);
$sWhere = str_replace("*", "'", $sWhere);
// generate sql query
$sQuery = "SELECT SQL_CALC_FOUND_ROWS " . str_replace(" , ", " ", implode(", ", $aResultColumns)) . "
FROM $sTable
$sWhere
$sOrder
$sLimit
";
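The root cause appears to be that CodeIgniter's $this->db->escape() wraps string values in single quotes. That turns the sort direction and the LIMIT numbers into quoted literals ('desc', '0'), which MySQL rejects in ORDER BY and LIMIT, and inside LIKE '%...%' it produces nested quotes. Note also that the workaround above turns any * typed into the search box into a quote character in the final SQL. A common alternative is to not escape those parts at all and whitelist them instead: only ASC/DESC for the direction, only known column indexes, and integers for LIMIT (for the LIKE values, CodeIgniter's escape_like_str() escapes without adding quotes). The JavaScript below is a language-neutral sketch of the whitelist idea; the helper names and the default of 10 rows are assumptions. In the PHP code, the same checks would compare the direction against a fixed list and cast the numbers with intval() before concatenating them.

```javascript
// Hypothetical helpers (not CodeIgniter APIs) sketching the whitelist approach.

// Sort direction: only ever emit ASC or DESC, never the raw request value.
function sortDirection(raw) {
  return String(raw).toLowerCase() === 'desc' ? 'DESC' : 'ASC';
}

// ORDER BY: accept only indexes into the known column list.
// sortCols is an array of { index, dir } pairs taken from the request.
function orderByClause(columns, sortCols) {
  const parts = [];
  for (const { index, dir } of sortCols) {
    const i = Number.parseInt(index, 10);
    if (Number.isInteger(i) && i >= 0 && i < columns.length) {
      parts.push(`${columns[i]} ${sortDirection(dir)}`);
    }
  }
  return parts.length > 0 ? `ORDER BY ${parts.join(', ')}` : '';
}

// LIMIT: coerce both values to integers so they are emitted unquoted.
// Assumption: fall back to offset 0 and 10 rows when a value is missing or invalid.
function limitClause(start, length) {
  const s = Number.parseInt(start, 10);
  const n = Number.parseInt(length, 10);
  const offset = Number.isInteger(s) && s >= 0 ? s : 0;
  const count = Number.isInteger(n) && n > 0 ? n : 10;
  return `LIMIT ${offset}, ${count}`;
}

console.log(orderByClause(['id', 'FName', 'LName'], [{ index: '0', dir: 'desc' }])); // ORDER BY id DESC
console.log(limitClause('0', '10')); // LIMIT 0, 10
```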

Unique Case of "Call to a member function set_type() on null"

Hello guys, I have gone through all 25 questions related to the title of my question, and I believe my scenario is totally different.
Here is the error:
Fatal error: Uncaught Error: Call to a member function set_type() on null in /path_to_file/login.php:55 Stack trace: #0 /path_to_call_page/login.php(24): login->authenticate() #1 {main} thrown in /path_to_file/login.php on line 55
Here is my code:
<?php
class login
{
public $username;
public $password;
public $err;
public $table_name;
public $session_data;
public $md5 = true;
public $username_column = "username";
public $password_column = "password";
public $builder;
function _construct($username,$password)
{
$this->username = $username;
$this->password = $password;
$this->builder = new queryBuilder();
}
function set_table($tablename)
{
$this->table_name = $tablename;
}
/*
* Tells the login class where to find the username and password in database table
*/
function set_columns($username_col,$password_col)
{
$this->username_column = $username_col;
$this->password_column = $password_col;
}
function md5_on()
{
$this->md5 = true;
}
function md5_off()
{
$this->md5 = false;
}
function authenticate()
{
$db = new mySQLConnection();
$db->select();
// if md5 is turned on
if($this->md5)
$this->password = md5($this->password);
$this->builder->set_type("SELECT");
$this->builder->set_table_name($this->table_name);
$this->builder->set_where("WHERE $this->username_column = '$this->username' and $this->password_column = '$this->password'");
$query = $this->builder->build_query();
if($db->execute_query($query))
$data = $db->fetch($db->result);
else
{
die($db->error);
}
if($db->rows_affected() == 1)
{
$this->session_data = $data[0];
return true;
}
else
{
$this->err = "Invalid username or password. Please try again";
return false;
}
}
function get_error()
{
return $this->err;
}
}
?>
The error occurs everywhere I use
$this->builder
and I have defined it in the _construct method.
This is the queryBuilder class:
class queryBuilder
{
var $data;
var $field;
var $tableName;
var $databaseName;
var $where;
var $order;
var $group;
var $limit;
var $queryString;
var $error;
private function put_quotes1($field)
{
$field = trim($field);
$field = "`".$field."`";
return $field;
}
private function put_quotes2($field)
{
$field = trim($field);
$field = "'".$field."'";
return $field;
}
function set_type($type)
{
$this->type = $type;
}
function set_data($data)
{
$this->data = $data;
}
function set_field($field = null)
{
$this->field = $field;
}
function set_where($where)
{
$this->where = $where;
}
function set_limit($limit)
{
$this->limit = $limit;
}
function set_order($order)
{
$this->order = $order;
}
function set_table_name($name)
{
$this->tableName = $name;
}
function prepare_data($data)
{
if(is_array($data))
{
foreach($data as $k => $v)
{
$this->field[] = $k; //setting the column names
$this->data[] = $v; // setting the values
}
}
}
function build_query()
{
switch($this->type)
{
case 'SHOW':
$database_name = $this->put_quotes1($this->databaseName);
$this->queryString = "SHOW ";
if(!isset($this->field) || is_null($this->field))
$this->queryString .= "DATABASES LIKE 'thirdeye%'; ";
else{
$noFields = count($this->field); //no of fields in table
for($i = 0; $i < $noFields; $i++)
{
if($i == ($noFields- 1)) // if on the last field
$this->queryString .= $this->put_quotes1($this->field[$i]).' ';
else
$this->queryString .= $this->put_quotes1($this->field[$i]).',';
}
}
break;
case 'INSERT':
$table_name = $this->put_quotes1($this->tableName);
$this->queryString = "INSERT INTO ".$table_name." (";
$noFields = count($this->field);
$noData = count($this->data);
if($noFields > 0 && $noData > 0)
{
for($i = 0; $i < $noFields; $i++)
{
if($i == ($noFields- 1))
$this->queryString .= $this->put_quotes1($this->field[$i]).')';
else
$this->queryString .= $this->put_quotes1($this->field[$i]).',';
}
$this->queryString.= " VALUES (";
for($i = 0; $i < $noData; $i++)
{
if($i == ($noData -1))
$this->queryString .= $this->put_quotes2($this->data[$i]).');';
else
$this->queryString .= $this->put_quotes2($this->data[$i]).',';
}
}
else
{
$this->error = "No column name or data was supplied";
}
break;
case 'SELECT':
$table_name = $this->put_quotes1($this->tableName);
$this->queryString = "SELECT ";
if(!isset($this->field) || is_null($this->field))
$this->queryString .= "* ";
else{
$noFields = count($this->field); //no of fields in table
for($i = 0; $i < $noFields; $i++)
{
if($i == ($noFields- 1)) // if on the last field
$this->queryString .= $this->put_quotes1($this->field[$i]).' ';
else
$this->queryString .= $this->put_quotes1($this->field[$i]).',';
}
}
$this->queryString .= "FROM ".$table_name;
if(isset($this->where))
$this->queryString .= " ".$this->where;
if(isset($this->order))
$this->queryString .= " ".$this->order;
if(isset($this->limit))
$this->queryString .= " ".$this->limit;
else
$this->queryString .= ";";
break;
case 'UPDATE':
$table_name = $this->put_quotes1($this->tableName);
$this->queryString = "UPDATE ". $table_name. " SET ";
$noFields = count($this->field); //no of fields in table
if(is_array($this->field) && is_array($this->data) && isset($this->where))
{
for($i = 0; $i < $noFields; $i++)
{
if($i == ($noFields -1))
$this->queryString .= $this->put_quotes1($this->field[$i])." = ". $this->put_quotes2($this->data[$i]).' ';
else
$this->queryString .= $this->put_quotes1($this->field[$i])." = ". $this->put_quotes2($this->data[$i]).',';
}
$this->queryString .= " ".$this->where.";";
}
else
{
$this->error = "Cannot build query. One of the following was not set";
}
break;
case 'DELETE':
$table_name = $this->put_quotes1($this->tableName);
$this->queryString = "DELETE FROM ".$table_name;
if(isset($this->where))
{
$this->queryString .= " ".$this->where.";";
}
else
{
$this->error = "Cannot build. No condition was set";
}
break;
}
return $this->queryString;
}
}
Any pointers would help. Remember, I have been through the previous questions, so a suggested edit or code answer would be great.

How can I pass a third parameter into a form validation callback function that receives the field value?

Here is the validation code:
public function registration () {
//
//
// -- code
$this->form_validation->set_rules('password', 'Password', 'required|callback__check_length[6,10]');
}
function _check_length($input, $min, $max)
{
$length = strlen($input);
if ($length <= $max && $length >= $min)
{
return TRUE;
}
elseif ($length < $min)
{
$this->form_validation->set_message('_check_length', 'Minimum number of characters is ' . $min);
return FALSE;
}
elseif ($length > $max)
{
$this->form_validation->set_message('_check_length', 'Maximum number of characters is ' . $max);
return FALSE;
}
}
It gives me this error:
Message: Missing argument 3 for Person::_check_length(), called in C:\wamp64\www\abc\system\libraries\Form_validation.php on line 744 and defined
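You can do this by exploding the second parameter. CodeIgniter passes everything between the brackets of a callback rule to the function as a single string argument (here "6,10"), so the callback never receives a third argument: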
public function _check_length($input, $minmax) {
$minmax = explode(',', $minmax);
$min = $minmax[0];
$max = $minmax[1];
$length = strlen($input);
if ($length <= $max && $length >= $min) {
return TRUE;
} elseif ($length < $min) {
$this->form_validation->set_message('_check_length', 'Minimum number of characters is ' . $min);
return FALSE;
} elseif ($length > $max) {
$this->form_validation->set_message('_check_length', 'Maximum number of characters is ' . $max);
return FALSE;
}
}
You also don't need your own function for this: the built-in min_length[x] and max_length[x] rules do the same job, e.g. 'required|min_length[6]|max_length[10]'.
https://www.codeigniter.com/userguide3/libraries/form_validation.html#rule-reference

How do I use beautify in Ace Editor?

I've found the beautify extension in Ace editor but I don't see any examples of how to use it. Here's what I have so far:
var beautiful = ace.require("ace/ext/beautify");
beautiful.beautify();
but I get the error:
Result of expression 'e' [undefined] is not an object.
It looks like this works:
var beautify = ace.require("ace/ext/beautify"); // get reference to extension
var editor = ace.edit("editor"); // get reference to editor
beautify.beautify(editor.session);
It requires that you pass in the Ace Editor session as the first parameter. In my original question I did not pass in any arguments, and that was throwing the error.
Note: it did not work well, as mentioned in the extension's release notes; it was not working well enough to use.
I didn't get it working:
var beautify = ace.require("ace/ext/beautify"); // get reference to extension
Beautify was always undefined.
After a while I gave up and used the external Beautify library instead (link):
function beatify() {
var val = editor.session.getValue();
// Trim the first line (removes its leading spaces)
var array = val.split(/\n/);
array[0] = array[0].trim();
val = array.join("\n");
//Actual beautify (prettify)
val = js_beautify(val);
//Change current text to formatted text
editor.session.setValue(val);
}
Had the same problem. I ended up building a simplified prettify method that fits my needs (mainly, not having everything on the same line).
Note: I was using the React version of Ace Editor, but the same applies to plain JS. The method does not support comments, because my generated code does not contain any; you may need to extend it if you want to support them.
const html = prettifyHtml('<div id="root"><div class="container"><div class="row"><div class="col-lg-6">hello there<p>What <strong>is</strong> this? <br /> yes</p></div><div class="col-lg-6"></div></div></div></div>');
const scss = prettifyScss('.container { strong {color:green; background-color:white; border:1px solid green; &:hover {cursor:pointer} } }');
<AceEditor
mode="html" // or "scss"
theme="github"
defaultValue={html} // or scss
onChange={this.onChange.bind(this)}
/>
html:
export const prettifyHtml = (html) => {
let indent = 0,
mode = 'IDLE',
inTag = false,
tag = '',
tagToCome = '',
shouldBreakBefore = false,
shouldBreakAfter = false,
breakBefore = ['p', 'ul', 'li'],
breakAfter = ['div', 'h1', 'h2', 'h3', 'h4', 'p', 'ul', 'li'];
return html
.split('')
.reduce((output, char, index) => {
if (char === '<') {
tagToCome = whichTag(html, index);
shouldBreakBefore = tagToCome && breakBefore.indexOf(tagToCome) >= 0;
mode = 'TAG';
inTag = true;
output += (shouldBreakBefore ? br(indent) : '') + '<';
} else if (char === '/' && mode == 'TAG') {
mode = 'CLOSING_TAG';
inTag = true;
output += '/';
} else if (char === ' ') {
inTag = false;
output += ' ';
} else if (char === '>') {
if (mode === 'TAG' || mode === 'CLOSING_TAG') {
indent += mode === 'TAG' ? +1 : -1;
shouldBreakAfter = breakAfter.indexOf(tag) >= 0;
inTag = false;
tag = '';
}
output += '>';
output += shouldBreakAfter ? br(indent) : '';
} else {
output += char;
if (inTag) {
tag += char;
}
}
return output;
}, '');
}
sass:
export const prettifyScss = (scss) => {
let indent = 0,
closeBefore = 0;
return scss
.split('')
.reduce((output, char) => {
closeBefore++;
if (char === '{') {
indent++;
output += '{' + br(indent);
} else if (char === '}') {
indent--;
output += br(indent) + '}' + (closeBefore > 3 ? '\n' : '') + _tabs(indent);
closeBefore = 0;
} else if (char === '.') {
output += br(indent) + '.';
} else if (char === ';') {
output += ';' + br(indent);
} else {
output += char;
}
return output;
}, '');
}
helper methods:
const _tabs = (number) => {
let output = '';
for (let cnt = 0; cnt < number; cnt++) {
output += '\t';
}
return output;
}
const br = (indent) => {
return '\n' + _tabs(indent);
}
export const whichTag = (html, index) => {
let inTag = true,
tag = '';
const arr = html.split('');
for (let i = index + 1; i < index + 10; i++) {
const char = arr[i];
if (char >= 'a' && char <= 'z' && inTag) {
tag += char;
} else if (char !== '/') {
inTag = false;
}
}
return tag;
}
Faced the same issue but fixed it by adding two script files.
<script src="https://cdnjs.cloudflare.com/ajax/libs/ace/1.2.6/ace.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/ace/1.2.6/ext-beautify.js"></script>
You may need to call beautify.beautify after the window has loaded, so that editor.session is initialized when the page opens.
window.addEventListener('load', () => {
beautify.beautify(editor.session)
})
According to the Ace docs, Ace's beautify extension only works for PHP.
For me, the best solution was https://github.com/beautify-web/js-beautify
It has a lot of settings, beautifies JS, CSS and HTML, and works with npm or Python, via import or require, etc.
import beautify from 'js-beautify';
// your code
beautifyHTML() {
this.html = beautify.html(this.html, {
indent_size: '2',
indent_char: ' ',
max_preserve_newlines: '5',
preserve_newlines: true,
keep_array_indentation: false,
break_chained_methods: false,
indent_scripts: 'normal',
brace_style: 'expand',
space_before_conditional: true,
unescape_strings: false,
jslint_happy: false,
end_with_newline: false,
wrap_line_length: '80',
indent_inner_html: true,
comma_first: false,
e4x: false
});
}
See the js-beautify docs for more settings.
In the beautify file, just point beautify to window (the global object); after that you can call beautify from the global object.
In ext-beautify.js, at line 330, add:
window.beautify = exports;
Then you can use it.
vm.session = vm.editor.getSession();
beautify.beautify(vm.session);

Retrieve online data and generate an XML output from it

The project requires extracting online data and generating an XML file from it. This is what the output should look like:
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="306" issue="1">
<Provider>Cadmus</Provider>
<IssueDate>January 1, 2014</IssueDate>
<PageRange>C1-C76</PageRange>
<TOC>
<TocSection>
<Heading>Editorial Focus</Heading>
<DOI>10.1152/ajpcell.00342.2013</DOI>
</TocSection>
<TocSection>
<Heading>Review</Heading>
<DOI>10.1152/ajpcell.00281.2013</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Stem Cell Physiology and Pathophysiology</Heading>
<DOI>10.1152/ajpcell.00156.2013</DOI>
<DOI>10.1152/ajpcell.00066.2013</DOI>
</TocSection>
<TocSection>
<Heading>Articles</Heading>
<DOI>10.1152/ajpcell.00130.2013</DOI>
<DOI>10.1152/ajpcell.00047.2013</DOI>
<DOI>10.1152/ajpcell.00070.2013</DOI>
<DOI>10.1152/ajpcell.00096.2013</DOI>
</TocSection>
<TocSection>
<Heading>Corrigendum</Heading>
<DOI>10.1152/ajpcell.zh0-7419-corr.2014</DOI>
</TocSection>
</TOC>
</MetaIssue>
The output I am getting is:
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="306" issue="1">
<Provider>Cadmus</Provider>
<IssueDate>January 1, 2014 </IssueDate>
<PageRange>C1-</PageRange>
<TOC>
<TocSection>
<Heading>Review</Heading>
<DOI>10.1152/ajpcell.00281.2013</DOI>
</TocSection>
<TocSection>
<Heading>CALL FOR PAPERS | Stem Cell Physiology and Pathophysiology</Heading>
<DOI>10.1152/ajpcell.00156.2013</DOI>
</TocSection>
<TocSection>
<Heading>Articles</Heading>
<DOI>10.1152/ajpcell.00130.2013</DOI>
</TocSection>
<TocSection>
<Heading>Corrigendum</Heading>
<DOI>10.1152/ajpcell.zh0-7419-corr.2014</DOI>
</TocSection>
</TOC>
</MetaIssue>
The code I tried is:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $path1 = $ARGV[0];
open(F6, ">meta_issue.xml");
print "Enter the URL:";
my $url = <STDIN>;
chomp $url;
print "Enter the Volume Number:";
my $vol = <STDIN>;
chomp $vol;
print "Enter the Issue Number:";
my $iss = <STDIN>;
chomp $iss;
my $website_content = get($url);
print F6 "\<\!DOCTYPE MetaIssue SYSTEM \"http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd\">\n";
print F6 "<MetaIssue volume=\"$vol\" issue=\"$iss\">\n";
print F6 "<Provider>Cadmus</Provider>\n";
if ($website_content =~ m#<span class="highwire-cite-metadata-date">(.*?)</span>#s) {
#<span class="highwire-cite-metadata-date">January 1, 2014 </span>
print F6 "<IssueDate>$1</IssueDate>\n"; #<IssueDate>January 1, 2014</IssueDate>
}
if ($website_content =~ m#(<span class="label">:</span>\s?(.*?)(-(.*?))?</span>)#gs) {
#.*?(?!<span class="label">:</span>\s?(.*?)(-(.*?))?</span>)$#gs) #<PageRange>C1-C76</PageRange>
my $first = $2;
print F6 "<PageRange>$2-</PageRange>\n";
}
print F6 "<TOC>\n";
while ($website_content =~ m#<h2 id=".*?" class=".*?">(.*?)</h2>#gs) {
my $h = $1;
print F6 "<TocSection>\n";
print F6 "<Heading>$h</Heading>\n";
if ( $website_content =~ m#(.*?<p><span class="label">DOI:</span>\s?(.*?)\n?</p>\s?</span>\s?\n?</div>.*?)#gs ) {
my $doi = $1;
my $doi1 = $2;
print F6 "<DOI>$doi1</DOI>\n";
print F6 "</TocSection>\n";
}
}
print F6 "</TOC>\n</MetaIssue>\n";
Note: each <Heading> might have one or more <DOI> values, which I am not able to retrieve.
I cannot place the particular <DOI> values under that <Heading>.
I cannot retrieve the last page number from
<span class="label">:</span>\s?(.*?)(-(.*?))?</span>
since there are variations such as </span> c14</span> or <span> c12-c14</span>. So from here I need to extract the last page, i.e. c14.
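One way to keep only the last page is to let an optional "first-" prefix be consumed before capturing, which is what parse_last_page in the code below does. The pattern uses only syntax shared by Perl and JavaScript, so here is a minimal JavaScript sketch for trying it out; the /i flag and the lastPage name are assumptions, added because the examples above use lower-case labels.

```javascript
// Same pattern as parse_last_page in the code below, with the /i flag added
// (an assumption) so that lower-case labels such as "c14" also match.
const lastPagePattern = /(?:[A-Z0-9]+-)?([0-9A-Z]+)/i;

// Returns the last page of a range like "C12-C14", or the only page in "C14".
function lastPage(text) {
  const m = text.match(lastPagePattern);
  return m ? m[1] : null;
}

console.log(lastPage(' C1-C76'));  // C76
console.log(lastPage(' c12-c14')); // c14
console.log(lastPage(' c14'));     // c14
```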
I execute the code in cmd as follows:
D:\Code>Perl File name (Enter)
Enter the URL: http://ajpcell.physiology.org/content/306/1
Enter the Volume Number: 306
Enter the Issue Number: 1
UPDATE:
For these URLs:
1) http://ajpendo.physiology.org/content/283/5
2) http://ajpendo.physiology.org/content/280/1
The DOI is not there, so in that case, the output in place of
<DOI>$_</DOI> tag
should be
<ResId type="publisher-id">$volume/$issue/$first_page</ResId>
where $first_page is specific to that particular section.
I added an else {} branch in sub retrieve_doi() and an extra line in the for {} loop below (both marked #UPDATE), but I am not getting the desired output.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use HTML::Parser;
use WWW::Mechanize;
my ($date, $first_page, $last_page, @toc);
sub get_date {
my ($self, $tag, $attr) = @_;
if ('span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-date' eq $attr->{class}
and not defined $date
) {
$self->handler(text => \&next_text_to_date, 'self, text');
} elsif ('span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-pages' eq $attr->{class}
) {
if (not defined $first_page) {
$self->handler(text => \&parse_first_page, 'self, text');
} else {
$self->handler(text => \&parse_last_page, 'self, text');
}
} elsif ('span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-doi' eq $attr->{class}
) {
$self->handler(text => \&retrieve_doi, 'self, text');
} elsif ('div' eq $tag
and $attr->{class}
and $attr->{class} =~ /\bissue-toc-section\b/
) {
$self->handler(text => \&next_text_to_toc, 'self, text');
}
}
sub next_text_to_date {
my ($self, $text) = @_;
$text =~ s/^\s+|\s+$//g;
$date = $text;
$self->handler(text => undef);
}
sub parse_first_page {
my ($self, $text) = @_;
if ($text =~ /([A-Z0-9]+)(?:-[0-9A-Z]+)?/) {
$first_page = $1;
$self->handler(text => undef);
}
}
sub parse_last_page {
my ($self, $text) = @_;
if ($text =~ /(?:[A-Z0-9]+-)?([0-9A-Z]+)/) {
$last_page = $1;
$self->handler(text => undef);
}
}
sub next_text_to_toc {
my ($self, $text) = @_;
push @toc, [$text];
$self->handler(text => undef);
}
sub retrieve_doi {
my ($self, $text) = @_;
if ('DOI:' ne $text)
{
$text =~ s/^\s+|\s+$//g;
push @{ $toc[-1] }, $text;
$self->handler(text => undef);
}
else #UPDATE
{
$text =~ s/^\s+|\s+$//g;
push @{ $toc[-1] }, $text;
$self->handler(text => undef);
}
}
print STDERR 'Enter the URL: ';
chomp(my $url = <>);
my ($volume, $issue) = (split m(/), $url)[-2, -1];
my $p = 'HTML::Parser'->new( api_version => 3,
start_h => [ \&get_date, 'self, tagname, attr' ],
);
my $mech = 'WWW::Mechanize'->new(agent => 'Mozilla');
$mech->get($url);
my $contents = $mech->content;
$p->parse($contents);
$p->eof;
my $toc;
for my $section (@toc) {
$toc .= "<TocSection>\n";
$toc .= "<Heading>".shift(@$section)."</Heading>\n";
$toc .= join q(), map "<DOI>$_</DOI>\n", @$section;
$toc .= join q(), map "<ResId type=\"publisher-id\">$volume/$issue/$first_page</ResId>\n", @$section; #UPDATE
$toc .= "</TocSection>\n";
}
open (F6, ">meta_issue_$issue.xml");
print F6 <<"__HTML__";
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="$volume" issue="$issue">
<Provider>Cadmus</Provider>
<IssueDate>$date</IssueDate>
<PageRange>$first_page-$last_page</PageRange>
<TOC>
$toc</TOC>
</MetaIssue>
__HTML__
Please let me know how to update the code to get the desired output.
Use a proper module to parse the HTML:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use HTML::Parser;
use WWW::Mechanize;
my ($date, $first_page, $last_page, @toc);
sub get_info {
my ($self, $tag, $attr) = @_;
if ('span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-date' eq $attr->{class}
and not defined $date
) {
$self->handler(text => \&next_text_to_date, 'self, text');
} elsif ('span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-pages' eq $attr->{class}
) {
if (not defined $first_page) {
$self->handler(text => \&parse_first_page, 'self, text');
} else {
$self->handler(text => \&parse_last_page, 'self, text');
}
} elsif ('span' eq $tag
and $attr->{class}
and 'highwire-cite-metadata-doi' eq $attr->{class}
) {
$self->handler(text => \&retrieve_doi, 'self, text');
} elsif ('div' eq $tag
and $attr->{class}
and $attr->{class} =~ /\bissue-toc-section\b/
) {
$self->handler(text => \&next_text_to_toc, 'self, text');
}
}
sub next_text_to_date {
my ($self, $text) = @_;
$text =~ s/^\s+|\s+$//g;
$date = $text;
$self->handler(text => undef);
}
sub parse_first_page {
my ($self, $text) = @_;
if ($text =~ /([A-Z0-9]+)(?:-[0-9A-Z]+)?/) {
$first_page = $1;
$self->handler(text => undef);
}
}
sub parse_last_page {
my ($self, $text) = @_;
if ($text =~ /(?:[A-Z0-9]+-)?([0-9A-Z]+)/) {
$last_page = $1;
$self->handler(text => undef);
}
}
sub next_text_to_toc {
my ($self, $text) = @_;
push @toc, [$text];
$self->handler(text => undef);
}
sub retrieve_doi {
my ($self, $text) = @_;
if ('DOI:' ne $text) {
$text =~ s/^\s+|\s+$//g;
push @{ $toc[-1] }, $text;
$self->handler(text => undef);
}
}
print STDERR 'Enter the URL: ';
chomp(my $url = <>);
my ($volume, $issue) = (split m(/), $url)[-2, -1];
my $p = 'HTML::Parser'->new( api_version => 3,
start_h => [ \&get_info, 'self, tagname, attr' ],
);
my $mech = 'WWW::Mechanize'->new(agent => 'Mozilla');
$mech->get($url);
my $contents = $mech->content;
$p->parse($contents);
$p->eof;
my $toc;
for my $section (@toc) {
$toc .= " <TocSection>\n";
$toc .= " <Heading>" . shift(@$section) . "</Heading>\n";
$toc .= join q(), map " <DOI>$_</DOI>\n", @$section;
$toc .= " </TocSection>\n";
}
print << "__HTML__";
<!DOCTYPE MetaIssue SYSTEM "http://schema.highwire.org/public/toc/MetaIssue.pubids.dtd">
<MetaIssue volume="$volume" issue="$issue">
<Provider>Cadmus</Provider>
<IssueDate>$date</IssueDate>
<PageRange>$first_page-$last_page</PageRange>
<TOC>
$toc </TOC>
</MetaIssue>
__HTML__
Basic explanation:
HTML::Parser is callback-based, which means you give it subroutines to run when it encounters a given event in the parsed document. I use a general callback get_info, which searches for various indicators of needed information in the HTML. As we are often interested in something like "nearest text after the given span", it just registers the new callback for the text. For example, when the span with the class highwire-cite-metadata-date is found and date is not yet defined, it registers a new text handler, which would run next_text_to_date. The handler just assigns the text to the $date variable and removes the handler. I'm not sure that's the "correct" way to do it, but in this case at least, it works.
I used WWW::Mechanize in order to be able to specify the User Agent. With the default value of the much simpler LWP::Simple, I wasn't getting the whole HTML.
The output smells of a template. Switching to the Template module (Template Toolkit) might be a good step forward.
