I'm writing a PHP script to automate a mysqldump command. The method I've selected is using exec( $exec_string . ' 2>&1' ). This script must work on Windows and *nix platforms.
Sadly, some passwords contain the dreaded $ symbol, so the password argument (-p'passwordcontaining$') must be quoted.
Here are the challenges I've noted so far:
On *nix, you must use single quotes; otherwise the shell expands the $ as a variable.
On Windows, you must use double quotes, because single quotes are treated literally.
Escaping the $ is not an option, because on Windows the backslash is only interpreted as an escape character when it precedes a double quote (\"), so \$ would be interpreted literally.
I don't know how to reliably detect the OS to be able to "interactively" switch between single and double quotes.
Is there a trick I am missing that will work cross-platform?
You can dump from within a PHP script using PDO:
<?php
$dumpSettings = array(
'include-tables' => array('table1', 'table2'),
'exclude-tables' => array('table3', 'table4'),
'compress' => CompressMethod::GZIP, /* CompressMethod::[GZIP, BZIP2, NONE] */
'no-data' => false, /* http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_no-data */
'add-drop-table' => false, /* http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_add-drop-table */
'single-transaction' => true, /* http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_single-transaction */
'lock-tables' => false, /* http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_lock-tables */
'add-locks' => true, /* http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_add-locks */
'extended-insert' => true /* http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html#option_mysqldump_extended-insert */
);
$dump = new MySQLDump('database','database_user','database_pass','localhost', $dumpSettings);
$dump->start('forum_dump.sql.gz');
See https://github.com/clouddueling/mysqldump-php
Not crazy about the approach I ended up adopting (MySQLDump class seems to be a more robust approach than exec() ), but here it is in case it helps anyone.
Even though I didn't want to do OS detection, at first blush it doesn't seem that I can resolve the issues in quote interpretation. Happily, OS detection is relatively easy, if you're prepared to make some assumptions, such as "if it ain't running on Windows, then it's on some kind of *nix server".
Here's my basic approach, which I'm sure will get flamed.
// Inside class
protected $os_quote_char = "'"; // Unix default quote character
// Inside constructor
// Detect OS and set quote character appropriately
if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN')
$this->os_quote_char = '"';
// Method for adding quotes
/**
* Adds the OS non-interpreted quote character to the string(s), provided each string doesn't already start with said quote character.
*
* The quote character is set in the constructor, based on detecting Windows OSes vs. any other (assumed to be *nix).
* You can pass in an array, in which case you will get an array of quoted strings in return.
*
* @param string|array $string_or_array
* @return string|array
*/
protected function os_quote( $string_or_array ){
$quoted_strings = array();
$string_array = (array) $string_or_array;
foreach ( $string_array as $string_to_quote ){
// don't quote already quoted strings
if ( substr( $string_to_quote, 0, 1 ) == $this->os_quote_char )
$quoted_strings[] = $string_to_quote;
else
$quoted_strings[] = $this->os_quote_char . $string_to_quote . $this->os_quote_char;
}
if ( is_array( $string_or_array ) )
return $quoted_strings;
else
return $quoted_strings[0];
}
// Actual usage example:
if ( function_exists( 'exec' ) ){
list( $user, $pwd, $host, $db, $file ) = $this->os_quote( array( DB_USER, DB_PASSWORD, DB_HOST, DB_NAME, $backup_file.'.sql' ) );
$exec = $this->get_executable_path() . 'mysqldump -u' . $user . ' -p' . $pwd . ' -h' . $host . ' --result-file=' . $file . ' ' . $db . ' 2>&1';
exec ( $exec, $output, $return );
}
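For comparison, here is a minimal Python sketch of the same "pick the quoting style per OS" idea. It is not the PHP code above: quote_arg and its windows flag are hypothetical names made up for this illustration. It relies on shlex.quote for POSIX shells and subprocess.list2cmdline for the Microsoft C runtime rules; the latter does not protect against cmd.exe metacharacters such as % or &. In PHP itself, escapeshellarg() is the built-in aimed at this problem and may be worth trying before hand-rolled quoting.

```python
import shlex
import subprocess


def quote_arg(arg: str, windows: bool) -> str:
    """Quote one command-line argument for the target platform (hypothetical helper)."""
    if windows:
        # Double quotes, with embedded quotes backslash-escaped (MS C runtime rules).
        return subprocess.list2cmdline([arg])
    # POSIX shells: single quotes stop $ from being expanded.
    return shlex.quote(arg)


password = "pa$$ word"
print(quote_arg(password, windows=False))
print(quote_arg(password, windows=True))
```

Unlike the os_quote() method above, both library calls also deal with a quote character embedded in the value itself.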
It's hard to guess which limitations you are talking about, but I suspect some of them are false.
passthru('mysqldump -uddd -p"pass\'word$" ddd');
This worked for me under Windows and FreeBSD, as did the command itself in the respective shells.
The perl DBI documentation says this :
Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.
DBD::SQLite with parameter (sqlite_unicode => 1), or DBD::Pg with parameter (pg_enable_utf8 => -1) -- which is the default -- do indeed perform such conversions.
With DBD::Oracle (v1.83, NLS_LANG='FRENCH_FRANCE.UTF8') it is not so : if non-Unicode strings are passed to INSERT or UPDATE statements, the driver does not upgrade them automatically to utf8.
Here is my test suite. Variants for SQLite and Pg succeed, but this Oracle variant fails :
use utf8;
use strict;
use warnings;
use Test::More;
use SQL::Abstract::More;
use Scalar::Util qw/looks_like_number/;
use DBI;
my @DBI_CONNECT_ARGS = @ARGV;
my ($table, $key_col, $val_col) = qw/TST_UTF8 KEY VAL/; # assuming this table is already created
binmode $_, ':utf8' for *STDERR, *STDOUT;
# strings for tests
my %str;
$str{utf8} = "il était une bergère"; # has flag utf8 because of 'use utf8'
$str{native} = $str{utf8}; utf8::downgrade($str{native}); # without flag utf8
$str{wide_chars} = "il était une bergère♥♡"; # chars > 256 - cannot be a native string (\x{2665}\x{2661})
$str{named_chars} = "il \N{LATIN SMALL LETTER E WITH ACUTE}tait une " # identical to string 'wide_chars'
. "berg\N{LATIN SMALL LETTER E WITH GRAVE}re"
. "\N{BLACK HEART SUIT}\N{WHITE HEART SUIT}";
# check that test strings meet expectations
ok utf8::is_utf8($str{utf8}), "perl string with utf8 flag";
ok !utf8::is_utf8($str{native}), "perl string without utf8 flag, (native chars ... latin1)";
is $str{utf8}, $str{native}, "strings 'utf8' and 'native' have different encodings but represent the same chars";
ok utf8::is_utf8($str{wide_chars}), "string with wide chars must have utf8 flag";
ok utf8::is_utf8($str{named_chars}), "string with named wide chars must have utf8 flag";
is $str{wide_chars}, $str{named_chars}, "named chars are identical to chars from perl source";
my $dbh = DBI->connect(@DBI_CONNECT_ARGS);
my $sqlam = SQL::Abstract::More->new;
my ($sql, @bind);
# suppress records from previous run
my @k = keys %str;
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
my $del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";
# insert strings via bind values
while (my ($key, $val) = each %str) {
($sql, @bind) = $sqlam->insert(-into => $table, -values => {$key_col => $key, $val_col => $val});
my $ins = $dbh->do($sql, {}, @bind);
note "INSERT via bind $key: $ins";
}
# read data back
($sql, @bind) = $sqlam->select(-from => $table,
-columns => [$key_col, $val_col],
-where => {$key_col => {-in => \@k}});
my $rows = $dbh->selectall_arrayref($sql, {}, @bind);
my %str_from_db = map {@$_} @$rows;
# check round trip
is_deeply \%str_from_db, \%str, 'round trip with bind values';
# suppress again
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
$del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";
# insert strings via raw sql
while (my ($key, $val) = each %str) {
my $ins = $dbh->do("INSERT INTO $table($key_col, $val_col) VALUES ('$key', '$val')");
note "INSERT via raw SQL $key: $ins";
}
# check round trip
is_deeply \%str_from_db, \%str, 'round trip with raw SQL';
As a workaround, I added some callbacks for automatic upgrading of native strings; with this addition the tests pass :
$dbh->{Callbacks}{prepare} = sub {
# warn "PREPARE : upgrading stmt: $_[1]\n";
utf8::upgrade($_[1]);
return;
};
$dbh->{Callbacks}{ChildCallbacks}{execute} = sub {
# warn "EXECUTE: ";
foreach my $i (1 .. $#_) {
if ($_[$i] && ! ref $_[$i] && ! looks_like_number(($_[$i]))) {
# warn "upgrading $i : $_[$i];";
utf8::upgrade($_[$i]);
}
}
print STDERR "\n";
return;
};
If I understand the DBI spec properly, this automatic upgrade should be performed by the DBD::Oracle driver, not by the application code. Or am I missing something?
Message: preg_match(): Unknown modifier 'p'
Filename: core/Router.php
Line Number: 399
Backtrace:
File: /home/spdcin/public_html/demo/no-waste/index.php
Line: 292
Function: require_once
I am getting this error on the line marked "line no 2":
$key = str_replace(array(':any', ':num'), array('[^/]+', '[0-9]+'), $key);
// Does the RegEx match?
//line no 2
if (preg_match('#^'.$key.'$#', $uri, $matches))
{
// Are we using callbacks to process back-references?
if ( ! is_string($val) && is_callable($val))
{
// Remove the original string from the matches array.
array_shift($matches);
// Execute the callback using the values in matches as its parameters.
$val = call_user_func_array($val, $matches);
}
// Are we using the default routing method for back-references?
elseif (strpos($val, '$') !== FALSE && strpos($key, '(') !== FALSE)
{
$val = preg_replace('#^'.$key.'$#', $val, $uri);
}
$this->_set_request(explode('/', $val));
return;
}
}
There is a problem with your regex: PHP thinks you are trying to apply a 'p' modifier, which is not valid.
You will probably see what is wrong with your regex if you print it:
echo '#^'.$key.'$#';
Since you are writing a router, $key most probably contains '#p' (common in URLs).
Solution : In your case you can escape the character '#' with backslashes. Quoted from the php documentation :
"If the delimiter needs to be matched inside the pattern it must be escaped using a backslash."
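If I understand your problem correctly, surround $key with preg_quote(), passing the delimiter as the second argument, like this:
if (preg_match('#^'.preg_quote($key, '#').'$#', $uri, $matches))
This function escapes all regex metacharacters in $key, plus the given delimiter. Note that it also escapes the [^/]+ and [0-9]+ patterns substituted for :any and :num, so it only suits keys that should match literally.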
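To illustrate the escaping idea while keeping the wildcards working, here is a minimal Python sketch. Python has no regex delimiters, so it cannot reproduce the "Unknown modifier" error itself; it only shows escaping the literal parts of a route key and expanding the placeholders afterwards. compile_route and PLACEHOLDERS are hypothetical names invented for this example, not part of any framework.

```python
import re

# Hypothetical placeholder table mirroring the :any / :num wildcards.
PLACEHOLDERS = {":any": "[^/]+", ":num": "[0-9]+"}


def compile_route(key: str) -> re.Pattern:
    """Escape the literal parts of a route key, then expand its wildcards."""
    # The capture group keeps the placeholders in the split result,
    # so only the literal text between them is escaped.
    parts = re.split(r"(:any|:num)", key)
    regex = "".join(PLACEHOLDERS.get(p, re.escape(p)) for p in parts)
    return re.compile("^" + regex + "$")


route = compile_route("products/#page/:num")
print(bool(route.match("products/#page/42")))
print(bool(route.match("products/#page/abc")))
```

The same order of operations (escape first, substitute wildcards second) should carry over to PHP with preg_quote($key, '#') followed by str_replace(), but that port is untested here.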
I’d like to understand why, with the --include parameter, grep doesn’t search the file identification.php that was targeted.
With the include parameter :
admin@server:/filer/www/website/httpdocs$ egrep -Rns --include=*.php "deleteTemp" *
identification.php-sed:61: $deleteTemp = " DELETE FROM ".$table_name."
identification.php-sed:64: $execTemp = @mysql_query ( $deleteTemp );
And without :
admin@server:/filer/www/website/httpdocs$ egrep -Rns "deleteTemp" *
identification.php:61: $deleteTemp = " DELETE FROM ".$table_name."
identification.php:64: $execTemp = @mysql_query ( $deleteTemp );
identification.php-sed:61: $deleteTemp = " DELETE FROM ".$table_name."
identification.php-sed:64: $execTemp = @mysql_query ( $deleteTemp );
I also tried with quotes for the pattern of the include and the result is the same.
I think it is because your pattern is being interpreted by the shell. Try:
egrep -Rns --include=\*.php "deleteTemp" *
By the way, do you know ag?
Edit - Answer posted below
I have a script that usually uses @ARGV arguments, but in some cases it is invoked by another script (which I cannot modify) that instead only passes a config filename. Among other things, that file contains the command line options that should have been passed directly.
Example:
Args=--test --pdf "C:\testing\my pdf files\test.pdf"
If possible I'd like a way to parse this string into an array that would be identical to @ARGV.
I have a workaround where I set up an external Perl script that just echoes @ARGV, and I invoke this script as shown below (standard boilerplate removed).
echo-args.pl
print join ("\n", @ARGV);
test-echo-args.pl
$my_args = '--test --pdf "C:\testing\my pdf files\test.pdf"';
@args = map { chomp ; $_ } `perl echo-args.pl $my_args`;
This seems inelegant but it works. Is there a better way without invoking a new process? I did try splitting and processing but there are some oddities on the command line e.g. -a"b c" becomes '-ab c' and -a"b"" becomes -ab" and I'd rather not worry about edge cases but I know that'll bite me one day if I don't.
Answer - thanks ikegami!
I've posted a working program below that uses Win32::API and CommandLineToArgvW from shell32.dll, based on ikegami's advice. It is intentionally verbose in the hope that it will be easier to follow for anyone who, like me, is extremely rusty with C and pointer arithmetic.
Any tips are welcome, apart from the obvious simplifications :)
use strict;
use warnings;
use Encode qw( encode decode );
use Win32::API qw( );
use Data::Dumper;
# create a test argument string, with some variations, and pack it
# apparently an empty string returns $^X which is documented so check before calling
my $arg_string = '--test 33 -3-t" "es 33\t2 ';
my $packed_arg_string = encode('UTF-16le', $arg_string."\0");
# create a packed integer buffer for output
my $packed_argc_buf_ptr = pack('L', 0);
# create then call the function and get the result
my $func = Win32::API->new('shell32.dll', 'CommandLineToArgvW', 'PP', 'N')
or die $^E;
my $ret = $func->Call($packed_arg_string, $packed_argc_buf_ptr);
# unpack to get the number of parsed arguments
my $argc = unpack('L', $packed_argc_buf_ptr);
print "We parsed $argc arguments\n";
# parse the return value to get the actual strings
my @argv = decode_LPWSTR_array($ret, $argc);
print Dumper \@argv;
# try not to leak memory
my $local_free = Win32::API->new('kernel32.dll', 'LocalFree', 'N', '')
or die $^E;
$local_free->Call($ret);
exit;
sub decode_LPWSTR_array {
my ($ptr, $num) = @_;
return undef if !$ptr;
# $ptr is the memory location of the array of strings (i.e. more pointers)
# $num is how many we need to get
my @strings = ();
for (1 .. $num) {
# convert $ptr to a long, using that location read 4 bytes - this is the pointer to the next string
my $string_location = unpack('P4', pack('L', $ptr));
# make it human readable
my $readable_string_location = unpack('L', $string_location);
# decode the string and save it for later
push(@strings, decode_LPCWSTR($readable_string_location));
# our pointers are 32-bit
$ptr += 4;
}
return @strings;
}
# Copied from http://stackoverflow.com/questions/5529928/perl-win32api-and-pointers
sub decode_LPCWSTR {
my ($ptr) = @_;
return undef if !$ptr;
my $sW = '';
for (;;) {
my $chW = unpack('P2', pack('L', $ptr));
last if $chW eq "\0\0";
$sW .= $chW;
$ptr += 2;
}
return decode('UTF-16le', $sW);
}
On Unix systems, it's the shell that parses a shell command into strings. But on Windows, it's up to each application. I think this is normally done using the CommandLineToArgvW system call (which you could call with the help of Win32::API), but the spec is documented here if you want to reimplement it yourself.
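As a rough illustration of the Unix side only, here is a small Python sketch using shlex.split, which follows POSIX shell-style rules. It is not a substitute for CommandLineToArgvW: the Windows rules differ (for example around backslashes and doubled quotes), so the results will not always match what a Windows program sees. parse_args_posix is a hypothetical wrapper name, and the path is a made-up Unix path rather than the one from the question.

```python
import shlex


def parse_args_posix(command_line: str) -> list:
    """Split a command-line string the way a POSIX shell would (not Windows rules)."""
    return shlex.split(command_line)


# A quoted argument containing spaces stays in one piece.
print(parse_args_posix('--test --pdf "/tmp/my pdf files/test.pdf"'))
# Quotes inside a word are removed and the pieces are joined.
print(parse_args_posix('-a"b c"'))
```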
As an example:
I load in the input from a .txt:
Benjamin,Schuvlein,Germany,1912,M,White
I do some code that I will not post here for brevity and get to the link:
https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ
I want to scrape multiple things from that page. In the code below, I only do 1.
I'd also like to make each item be separated by a , in the output .txt.
And, I'd like the output to be preceded by the input.
I'm using the following packages in the code:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use Data::Dumper;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
use HTML::DOM;
Here's the relevant code:
my $ua = LWP::UserAgent->new;
open(my $o, '>', 'out2.txt') or die "Can't open output file: $!";
# Here is the url, although in practice, it is scraped itself using different code
my $url = 'https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ';
print "My URL is <$url>\n";
my $request = HTTP::Request->new(GET => $url);
$request->push_header('Content-Type' => 'application/json');
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
my $dom_tree = new HTML::DOM;
$dom_tree->write($response->content);
$dom_tree->close;
my $str = $dom_tree->getElementsByTagName('table')->[0]->getElementsByTagName("td")->[10]->as_text();
print $str;
print $o $str;
Desired Output (from that link) is something like:
Benjamin,Schuvlein,Germany,1912,M,White,Queens,New York,Married,Same Place,Head, etc ....
(How much of that output section is scrapable?)
Any help on how to get the link within the link would be much appreciated!
This is fairly simply done using HTML::TreeBuilder::XPath to access the HTML. This program builds a hash of the data using the labels as keys, so any of the desired information can be extracted. I have enclosed in quotes any fields that contain commas or whitespace.
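The quoting rule described here (wrap a field in double quotes when it contains a comma or whitespace) can be sketched in a few lines of Python. This is only an illustration of the rule, not a port of the Perl program: quote_field and make_record are hypothetical names, and the sample values are taken from the record discussed in this thread. The sketch does not escape double quotes embedded in a value; Python's csv module would be the safer choice for real data.

```python
import re

# Sample values for illustration; keys follow the labels used on the page.
data = {
    "name": "Benjamin Schuvlein",
    "birthplace": "Germany",
    "estimated birth year": "1912",
    "gender": "Male",
}


def quote_field(value: str) -> str:
    """Wrap a field in double quotes if it contains a comma or whitespace."""
    return f'"{value}"' if re.search(r"[,\s]", value) else value


def make_record(row: dict, columns: list) -> str:
    """Join the selected columns of a row into one comma-separated line."""
    return ",".join(quote_field(row[c]) for c in columns)


print(make_record(data, ["name", "birthplace", "estimated birth year", "gender"]))
```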
I don't know whether you have the permission of this web site to extract data this way, but I should draw your attention to this X-Copyright header in the HTTP responses. This approach clearly falls under the header of programmatic access.
X-Copyright: COPYRIGHT WARNING Data accessible through the FamilySearch API is protected by copyright. Any programmatic access, reformatting, or rerouting of this data, without permission, is prohibited. FamilySearch considers such unauthorized use a violation of its reproduction, derivation, and distribution rights. Contact devnet (at) familysearch.org for further information.
Am I to expect an email from you? I replied to your first mail but haven't heard since.
use strict;
use warnings;
use URI;
use LWP;
use HTML::TreeBuilder::XPath;
my $url = URI->new('https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ');
my $ua = LWP::UserAgent->new;
my $resp = $ua->get($url);
die $resp->status_line unless $resp->is_success;
my $tree = HTML::TreeBuilder::XPath->new_from_content($resp->decoded_content);
my @results = $tree->findnodes('//table[@class="result-data"]//tr[@class="result-item"]');
my %data;
for my $item (@results) {
my ($key, $val) = map $_->as_trimmed_text, $item->content_list;
$key =~ s/:$//;
$data{$key} = $val;
}
my $record = join ',', map { local $_ = $data{$_}; /[,\s]/ ? qq<"$_"> : $_ }
'name', 'birthplace', 'estimated birth year', 'gender', 'race (standardized)',
'event place', 'marital status', 'residence in 1935',
'relationship to head of household (standardized)';
print $record, "\n";
output
"Benjamin Schuvlein",Germany,1912,Male,White,"Assembly District 2, Queens, New York City, Queens, New York, United States",Married,"Same Place",Head
Try this
use LWP::Simple;
use LWP::UserAgent;
use HTML::TableExtract;
$ENV{'PERL_LWP_SSL_VERIFY_HOSTNAME'} = 0;
$ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.91 Safari/537.11");
$req = HTTP::Request->new(GET => "https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ");
$res = $ua->request($req);
$content = $res->content;
#$content = get("https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ") or die "Couldn't get it! $!";
$te = HTML::TableExtract->new( attribs => { 'class' => 'result-data' } );
# $te = HTML::TableExtract->new( );
$te->parse($content);
$table = $te->first_table_found;
# print $content; exit;
# $te->tables_dump(1);
#print Dumper($te);
#print Dumper($table);
print $table->cell(4,0) . ' = ' . $table->cell(4,1), "\n"; exit;
Which prints out
event place: = Assembly District 2, Queens, New York City, Queens, New York, United States
I also noticed this header:
X-Copyright:COPYRIGHT WARNING Data accessible through the FamilySearch API is protected by copyright. Any programmatic access, reformatting, or rerouting of this data, without permission, is prohibited. FamilySearch considers such unauthorized use a violation of its reproduction, derivation, and distribution rights. Contact devnet (at) familysearch.org for further information.
See also http://metacpan.org/pod/HTML::Element#SYNOPSIS
I thought I had answered your question.
The problem is that you are trying to fetch the web page with LWP. Why are you trying to do that if you already have WWW::Mechanize::Firefox?
Did you try this?
It will retrieve and save each link for further analysis. With a small change you 'get' the DOM tree. Sorry, I do not have access to this page, so I just hope it will work.
my $i=1;
for my $link (@links) {
print Dumper $link->url;
print Dumper $link->text;
my $tempfile = "./$i.html"; $i++; # double quotes so that $i is interpolated
$mech->get( $link, ':content_file' => $tempfile, synchronize => 1 );
my $dom_tree = $mech->document();
my $str = $dom_tree->getElementsByTagName('table')->[0]->getElementsByTagName("td")->[9]->as_text();
}
EDIT:
Process the page content with regexps. (Everyone: please remember, there is always more than one way to do something with Perl! It works, it is easy...)
I tried it out with this command:
wget -nd 'https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ' -O 1.html|cat 1.html|1.pl
use Data::Dumper;
use strict;
use warnings;
local $/=undef;
my $html = <>;#read from file
#$html = $mech->content( format => 'html' );# read data from mech object
my $data = {};
my $current_label = "not_defined";
while ($html =~ s!(<td[^>]*>.*?</td>)!!is){ # process each TD
my $td = $1;
print "td: $td\n";
my $td_val = $td;
$td_val =~ s!<[^>]*>!!gis;
$td_val =~ s!\s+! !gs;
$td_val =~ s!(\A\s+|\s+\z)!!gs;
if ($td =~ m!result-label!){ #primitive state machine, store the current label
print "current_label: $current_label\n";
$current_label = $td_val;
} elsif ($td =~ m!result-value!){ #add each data to current label
push(@{$data->{$current_label}},$td_val);
} else {
warn "found something else: $td\n";
}
}
#process it using a white lists of known entries (son,race, etc).Delete from the result if you find it on white list, die if you find something new.
#multi type
foreach my $type (qw(son wife daughter head)){
process_multi($type,$data->{$type});
delete($data->{$type});
}
#simple type
foreach my $type (qw(birthplace age)){
process_simple($type,$data->{$type});
delete($data->{$type});
}
die "Unknown label!".Dumper($data) if scalar(keys %{$data})>0;
Output:
'line number:' => [
'28'
],
'estimated birth year:' => [
'1912'
],
'head' => [
'Benjamin Schuvlein',
'M',
'28',
'Germany'
],