What's going on here?
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
<img src="002.jpg" caption="w&aacute;ssup?" />
</album>
XML;
$xml = simplexml_load_string($string);
// $xmlobj = simplexml_load_file("xml.xml"); // same thing
echo "<pre>";
var_dump($xml);
echo "</pre>";
Error:
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 5: parser error : Entity 'aacute' not defined
&aacute; is not an XML entity; you're thinking of HTML.
Special characters are usually used "as is" in XML. Running html_entity_decode() on the input data (don't forget to specify UTF-8 as the character set) should do the trick:
$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
I had this problem the other day.
Any literal & that is not part of a valid XML entity needs to be inside a CDATA section
<album>
<img src="002.jpg" />
<caption><![CDATA[now you can put whatever characters you need & include html]]></caption>
</album>
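to keep the parser from failing. CDATA sections are only allowed in element content, not in attribute values, which is why the caption moves into its own element.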
You may want to look at Matt Robinson's article on an alternative method: Converting named entities to numeric in PHP. It mentions the html_entity_decode method (already pointed out by another answer) and some potential pitfalls:
There are two possible problems with this approach. The first is invalid entities: html_entity_decode() won't touch them, which means you'll still get XML errors. The second is encoding. I suppose it's possible that you don't actually want UTF-8. You should, because it's awesome, but maybe you have a good reason. If you don't tell html_entity_decode() to use UTF-8, it won't convert entities that don't exist in the character set you specify. If you tell it to output in UTF-8 and then use something like iconv() to convert it, then you'll lose any characters that aren't in the output encoding.
Also, if you find the script rather cumbersome, you can also use the one shared on SourceRally.
Another solution is to change
"w&aacute;ssup?"
to
"w&#225;ssup?"
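Doing that replacement by hand does not scale, so here is a minimal sketch of automating it. This is not the script from the article mentioned above; named_entities_to_numeric() is a hypothetical helper name, and the sketch assumes the iconv extension is available and that only HTML 4.01 entity names need to be handled.

```php
<?php
// Hypothetical helper: rewrites HTML-only named entities as numeric
// character references, which any XML parser accepts.
function named_entities_to_numeric(string $xml): string
{
    // The five entities XML itself predefines must stay as they are.
    $predefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($predefined): string {
            if (in_array($m[1], $predefined, true)) {
                return $m[0];
            }
            $char = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
            if ($char === $m[0]) {
                // Unknown entity name: left untouched, the parser will still reject it.
                return $m[0];
            }
            // UTF-8 character -> Unicode code point via a 4-byte big-endian integer.
            $codepoint = unpack('N', iconv('UTF-8', 'UCS-4BE', $char))[1];
            return '&#' . $codepoint . ';';
        },
        $xml
    );
}

echo named_entities_to_numeric('<img src="002.jpg" caption="w&aacute;ssup?" />'), "\n";
// prints: <img src="002.jpg" caption="w&#225;ssup?" />
```

The result can then be passed to simplexml_load_string() as usual. Entity names that html_entity_decode() does not know are deliberately left alone, so they still surface as parser errors instead of being silently dropped.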
Try this function, simplexml_load_entity_string:
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<album>
<img src="002.jpg" caption="test&lt;w&aacute;ssup?" />
</album>
XML;
$xml = simplexml_load_entity_string($string);
var_dump($xml);
function simplexml_load_entity_string($string = '')
{
// Protect the five predefined XML entities so html_entity_decode() leaves them intact
$string = str_replace([
'&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
], [
'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
], $string);
// Decode every other (HTML-only) named entity to its UTF-8 character
$string = html_entity_decode($string, ENT_QUOTES, "utf-8");
// Restore the predefined XML entities
$string = str_replace([
'SPECIALquotMARK', 'SPECIALampMARK', 'SPECIALaposMARK', 'SPECIALltMARK', 'SPECIALgtMARK',
], [
'&quot;', '&amp;', '&apos;', '&lt;', '&gt;',
], $string);
// load xml
return simplexml_load_string($string);
}
How to add line break in laravel language files?
I have tried to use <br />, \n\r and other variants
to break the line and add a new line, but none of these work.
return [
'best_hospitality' => 'Simply <br /> the best hospitality',
];
You can use
'best_hospitality' => "<pre>Simply\r\nthe best hospitality</pre>",
or
'best_hospitality' => sprintf('<pre>Simply%sthe best hospitality</pre>', PHP_EOL),
Note the double quotes in the first example: PHP only turns \r\n into a line break inside a double-quoted string. Inside single quotes it stays as the literal characters backslash, r, backslash, n.
If you try echo(Lang::get('message.best_hospitality')) you will see the new line.
I am not sure you need the pre tag; it depends on whether the string ends up in HTML or not. For example, using (double quotes here):
'best_hospitality' => "Simply\r\nthe best hospitality",
and var_dump(Lang::get('message.best_hospitality')); exit;
has the output
C:\wamp64\www\test\app\Http\Controllers\TestController.php:24:string 'Simply
the best hospitality' (length=39)
Does this cover your case?
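To make the single-quote versus double-quote point concrete, here is a small sketch that only compares string lengths. The variable names are just for illustration.

```php
<?php
// Single quotes: \r\n stays as four literal characters (\, r, \, n).
$single = 'Simply\r\nthe best hospitality';
// Double quotes: \r\n becomes a carriage return plus a line feed (two bytes).
$double = "Simply\r\nthe best hospitality";

var_dump(strlen($single)); // int(30)
var_dump(strlen($double)); // int(28)
var_dump(strpos($single, "\n") !== false); // bool(false), no real line break in there
var_dump(strpos($double, "\n") !== false); // bool(true)
```

So if a translation string shows a literal \r\n on the page, the quoting in the lang file is the first thing to check.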
You need HTML entity names in your lang files.
Try this:
return [
'best_hospitality' => 'Simply &lt;br&gt; the best hospitality',
];
May I suggest creating a custom helper function for this specific case?
Add this into your helpers.php file:
if (! function_exists('trans_multiline')) {
/**
* Retrieve an escaped translated multiline string with <br> instead of newline characters.
*/
function trans_multiline($key, array $replace = [], ?string $locale = null): string
{
return nl2br(e(__($key, $replace, $locale)));
}
}
Now you will have a function trans_multiline() available for you in any view, which will behave pretty much like the built-in __() helper.
The function will fetch a localized string of text and insert a <br> tag before every newline (\r\n or \n) in it.
Caveat: escaping must happen before the nl2br() call, as in the code above; otherwise the <br> tags themselves would be escaped. So, to prevent any weird errors due to double-escaping, you must use this custom helper without any additional escaping, like so:
{!! trans_multiline('misc.warning', ['name' => 'Joe']) !!}
Escaping will be handled by the e() function (which is what Laravel uses under the hood of {{ }}) inside the helper itself.
And here's how you define a multiline translation string:
'warning' => "There's only 1 apple, :name!\r\nDon't eat it!"
Make sure to use double-quotes, so PHP actually replaces \r\n with a newline character.
Obviously, parameter replacing still works exactly like with the __() helper.
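If you want to see the order-of-operations point outside Laravel, here is a minimal sketch. It assumes plain htmlspecialchars() is a close enough stand-in for Laravel's e() helper; escape_html() and multiline() are hypothetical names used only for this demo.

```php
<?php
// Stand-in for Laravel's e() helper (assumption: htmlspecialchars with
// ENT_QUOTES and UTF-8 is close enough for demonstration purposes).
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Escape first, then add <br /> tags, so the tags survive unescaped.
function multiline(string $text): string
{
    return nl2br(escape_html($text));
}

// Wrong order for comparison: the <br /> tags get escaped too.
function multiline_wrong_order(string $text): string
{
    return escape_html(nl2br($text));
}

$text = "Only 1 <b>apple</b>, Joe!\r\nDon't eat it!";

echo multiline($text), "\n";
echo multiline_wrong_order($text), "\n";
```

The first call keeps a real line-break tag while the user-supplied <b> is neutralised; the second prints the break tag as visible text, which is the double-escaping problem the caveat above warns about.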
I have a problem in Perl with a Turkish character. I have set the Turkish character set but my Turkish character isn't displaying properly.
This is my Perl script.
#!"c:\xampp\perl\bin\perl.exe"
use strict;
use warnings;
use CGI qw(:standard);
my $name = param('name');
my $surname = param('surname');
my $age = param('age');
my $gender = param('gender');
my $q = new CGI;
# We're outputting plain text, not HTML
print $q->header(-content_type => 'text/plain' , -charset => 'ISO-8859-9');
my $text = $name." ".$surname." ".$age." ".$gender." kaydı, sistemde yapılacak olan güncellemelerden sonra sisteme başarıyla eklenecektir.";
# printf("%s %s %d %d kaydı, sistemde yapılacak olan güncellemelerden sonra sisteme başarıyla eklenecektir.", $name , $surname , $age , $gender);
print $text;
How can I fix the problem?
First and foremost: Do not use the HTML generation functionality in CGI.pm. Using those leads to an unmaintainable mess of a presentation layer. Instead, use Template Toolkit based templates to separate presentation and functionality.
Second, do not use indirect object notation. That is, do not write:
my $cgi = new CGI;
Instead, write
my $cgi = CGI->new;
The use of $q or $query to refer to the CGI object is weird and has its roots in the early days of the WWW. There is no reason to perpetuate it when you are learning things from scratch.
In addition, given that you have just instantiated an object, don't use plain subs such as param and don't pollute the namespace of your script. Access parameter values using:
my $value = $cgi->param('surname');
Next, if you are going to use "interesting" characters in your source code such as Ş, save your source code as UTF-8 and specify
use utf8;
at the top of your script.
Finally, do also save all your HTML templates in UTF-8 and generate all output from your script encoded in UTF-8 and specify the document encoding as UTF-8. All other paths lead to insanity.
Also, don't use sexual as a parameter name. Use nouns as parameter and variable names. And, most infuriating to me as a Turk, Mr. (Bay) and Ms. (Bayan) are just titles and they are not appropriate choices for an input field asking about the sex of the respondent.
See also How can I deal with diverse gender identities in user profiles? That may not be on your radar in Turkey at the moment, but you will eventually encounter the issue.
Here is an untested script that might work for you:
#!"c:\xampp\perl\bin\perl.exe"
use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
# Provides only object oriented interface
# without HTML generation cruft
use CGI::Simple;
run( CGI::Simple->new );
sub run {
my $cgi = shift;
binmode STDOUT, ':encoding(UTF-8)';
my ($name, $surname, $age, $gender) = map $cgi->param($_), qw(name surname age gender);
print $cgi->header(-type => 'text/plain', -charset => 'utf-8');
printf("%s %s %d %d kaydı, sistemde yapılacak olan güncellemelerden sonra sisteme başarıyla eklenecektir.\n",
$name , $surname , $age , $gender);
return;
}
I think you are misunderstanding the purpose of the charset attribute on the Content-type header. Your program is emitting this header:
Content-Type: text/plain; charset=ISO-8859-9
This says to an HTTP client (a browser, for example) "I am going to send you some plain text which is encoded as ISO-8859-9". But it's important to note that the header is purely informational. It tells the world that your text is encoded as ISO-8859-9. It does not do the encoding for you. That is up to you.
This is why Borodin and others have been asking you questions which you have not been answering. Most editors will create text that is encoded either as ISO-8859-1 or UTF-8. Unless you have a special Turkish editor or you have changed the configuration of your editor, it seems very unlikely to me that you are producing text in ISO-8859-9.
If you are determined to emit ISO-8859-9 text, then you need to do that encoding yourself. You can use the encode() function from the Encode module to do that.
use Encode;
my $encoded_text = encode('ISO-8859-9', $text);
print $encoded_text;
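The same distinction holds in any language: declaring a charset and producing bytes in that charset are two separate steps. A minimal PHP sketch of the difference, assuming the iconv extension is available (the sample word is taken from the question's text):

```php
<?php
// "kaydı" written with an escape so the example does not depend on
// how this file itself is saved. PHP emits it as UTF-8.
$text = "kayd\u{0131}";

// Explicit conversion to ISO-8859-9 (Latin-5), analogous to Encode::encode in Perl.
$latin5 = iconv('UTF-8', 'ISO-8859-9', $text);

echo bin2hex($text), "\n";   // 6b617964c4b1  (dotless i is two bytes in UTF-8)
echo bin2hex($latin5), "\n"; // 6b617964fd    (one byte, 0xFD, in ISO-8859-9)
```

If the header says ISO-8859-9 but the bytes sent are the first form, the browser decodes 0xC4 0xB1 as two unrelated Latin-5 characters, which is the kind of garbage the question describes.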
But I wonder why you want to use a relatively obscure encoding like ISO-8859-9. UTF-8 covers all of the characters used in Turkish. Why not use that instead? Your life will become far easier if you embrace the same standards as the web.
As an aside, you have introduced a small strangeness in your code. You use CGI.pm in "functions" mode and load it in a way which imports a number of its functions into your namespace.
use CGI qw(:standard);
And then you use the param() function a few times in this way. But after that you create a CGI query object in order to call the header() method on it.
my $q = new CGI;
print $q->header(...);
You probably don't realise it, but the header() function is included in the :standard set of imports, so you can call it without creating a CGI object.
print header(...);
I used it that way in my answer to your previous question. I'm not sure why you changed the code to make it more complicated.
I should also point out that if you do want to create a CGI query object, then you shouldn't use the indirect object notation:
my $q = new CGI;
This will cause you problems at some point. Far better to write:
my $q = CGI->new;
(As demonstrated in the CGI.pm documentation)
I know I can use {{{ }}} to escape all HTML tags in output text, but I want to escape only unsafe tags, not all of them (for example, I want to be able to use the br tag in the text).
You should definitely implement it by yourself. I'm assuming that the tags you want to escape are probably just <script> and <iframe>; however, in my opinion it is more appropriate to remove that content entirely instead of keeping escaped content on your page for no reason.
You could use regex for simple substitution, something like
$html = preg_replace("/<iframe.*?>/is", "", $html);
$html = preg_replace("/<script(.*?)>(.*?)<\/script>/is", "", $html);
However, this is considered bad practice: no regular expression reliably matches every way of writing HTML, so you could leave a hole in your security.
A better idea would be using the PHP DOMDocument Parser. You can do something like this to remove script tags:
$doc = new DOMDocument();
$doc->loadHTML($html);
$script_tags = $doc->getElementsByTagName('script');
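// getElementsByTagName() returns a live list, so iterate backwards;
// a forward loop skips nodes as the list shrinks during removal
for ($i = $script_tags->length - 1; $i >= 0; $i--) {
$script_tags->item($i)->parentNode->removeChild($script_tags->item($i));
}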
$clean_html = $doc->saveHTML();
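If the goal really is "escape everything except br", as the question asks, another option is to turn the problem around: escape all markup first and then re-enable only the tags you trust. A minimal sketch under that assumption; escape_except_br() is a hypothetical helper name, and only attribute-free br tags are restored.

```php
<?php
// Allow-list approach: everything is escaped, then plain <br>, <br/> and
// <br /> are turned back into real tags. A br carrying attributes stays escaped.
function escape_except_br(string $html): string
{
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');
    return preg_replace('~&lt;br\s*/?&gt;~i', '<br>', $escaped);
}

echo escape_except_br('Line one<br/>Line two<script>alert(1)</script>'), "\n";
// prints: Line one<br>Line two&lt;script&gt;alert(1)&lt;/script&gt;
```

The regex here runs on already-escaped text, so a mistake in it can only fail to restore a br; it cannot let an unescaped tag through.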
I'm using htmlspecialchars() on the input field for the user's last name to prevent XSS, but it's not working.
Let's say $user_data->user_last_name; holds the user's last name, so I did:
htmlspecialchars( $user_data->user_last_name, ENT_QUOTES, 'UTF-8' );
When I try to save user last name as 'Lastname<script>alert("xss")</script>', I get JS alert with 'xss' message.
Any clue maybe?
Try this; it may work:
$string = htmlentities($user_data->user_last_name, ENT_QUOTES, 'ISO-8859-15');
While retrieving the input you should use:
$value = $this->input->post('input_name', true);
Here, true will clean the input value of xss.
It works, but the output is interpreted by your browser as HTML.
// Put this line above your code to see the real output
<?php
header('Content-Type: text/plain');
?>
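One more thing worth checking, as an assumption, since the question does not show how the value is printed: htmlspecialchars() does not modify its argument. It returns the escaped string, and that return value is what has to reach the page. A minimal sketch:

```php
<?php
$last_name = 'Lastname<script>alert("xss")</script>';

// Return value discarded: $last_name is unchanged and still dangerous to echo.
htmlspecialchars($last_name, ENT_QUOTES, 'UTF-8');

// Return value kept: this is the string that is safe to put into HTML.
$safe = htmlspecialchars($last_name, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// prints: Lastname&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Escaping at output time (in the template, right where the value is echoed) also covers values that were saved before any filtering was in place.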
I can't figure out how to search for text containing single quotes using XPath.
For example, I've added a quote to the title of this question. The following line
$x("//*[text()='XQuery looking for text with 'single' quote']")
Returns an empty array.
However, if I try the following
$x("//*[text()=\"XQuery looking for text with 'single' quote\"]")
It does return the link for the title of the page, but I would like to be able to accept both single and double quotes in there, so I can't just tailor it for the single/double quote.
You can try it in chrome's or firebug's console on this page.
Here's a hackaround (Thanks Dimitre Novatchev) that will allow me to search for any text in xpaths, whether it contains single or double quotes. Implemented in JS, but could be easily translated to other languages
function cleanStringForXpath(str) {
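var parts = str.match(/[^'"]+|['"]/g) || [""]; // an empty string yields no matches
parts = parts.map(function(part){
if (part === "'") {
return '"\'"'; // output "'"
}
if (part === '"') {
return "'\"'"; // output '"'
}
return "'" + part + "'";
});
// concat() needs at least two arguments, so a single part is returned as a plain literal
return parts.length === 1 ? parts[0] : "concat(" + parts.join(",") + ")";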
}
If I'm looking for I'm reading "Harry Potter" I could do the following
var xpathString = cleanStringForXpath( "I'm reading \"Harry Potter\"" );
$x("//*[text()="+ xpathString +"]");
// The xpath created becomes
// //*[text()=concat('I',"'",'m reading ','"','Harry Potter','"')]
Here's a (much shorter) Java version. It's exactly the same as JavaScript, if you remove type information. Thanks to https://stackoverflow.com/users/1850609/acdcjunior
String escapedText = "concat('" + originalText.replace("'", "', \"'\", '") + "', '')";
In XPath 2.0 and XQuery 1.0, the delimiter of a string literal can be included in the string literal by doubling it:
let $a := "He said ""I won't"""
or
let $a := 'He said "I can''t"'
The convention is borrowed from SQL.
This is an example:
/*/*[contains(., "'") and contains(., '"') ]/text()
When this XPath expression is applied on the following XML document:
<text>
<t>I'm reading "Harry Potter"</t>
<t>I am reading "Harry Potter"</t>
<t>I am reading 'Harry Potter'</t>
</text>
the wanted, correct result (a single text node) is selected:
I'm reading "Harry Potter"
Here is verification using the XPath Visualizer (A free and open source tool I created 12 years ago, that has taught XPath the fun way to thousands of people):
Your problem may be that you are not able to specify this XPath expression as string in the programming language that you are using -- this isn't an XPath problem but a problem in your knowledge of your programming language.
Additionally, if you are using XQuery rather than XPath, as the title says, you can also use the XML entities:
&quot; for double quotes and &apos; for single quotes
They also work within single-quoted strings.
You can do this using a regular expression. For example (as TypeScript code):
export function escapeXPathString(str: string): string {
str = str.replace(/'/g, `', "'", '`);
return `concat('${str}', '')`;
}
This replaces all ' in the input string by ', "'", '.
The final , '' is important because concat('string') is an error.
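For completeness, here is the same idea as a minimal PHP sketch, run against the sample document from the earlier answer. xpath_literal() is a hypothetical helper name; the sketch assumes the DOM extension is enabled and only falls back to concat() when the text contains both kinds of quote, so the trailing empty argument is not needed.

```php
<?php
// Hypothetical helper: turns arbitrary text into an XPath 1.0 string expression.
function xpath_literal(string $text): string
{
    if (strpos($text, "'") === false) {
        return "'" . $text . "'";      // no apostrophe: single-quoted literal
    }
    if (strpos($text, '"') === false) {
        return '"' . $text . '"';      // no double quote: double-quoted literal
    }
    // Both present: split on apostrophes and glue the pieces with concat().
    // At least one apostrophe exists here, so concat() gets three or more arguments.
    return "concat('" . str_replace("'", "', \"'\", '", $text) . "')";
}

$doc = new DOMDocument();
$doc->loadXML(
    '<text>'
    . '<t>I\'m reading "Harry Potter"</t>'
    . '<t>I am reading "Harry Potter"</t>'
    . '<t>I am reading \'Harry Potter\'</t>'
    . '</text>'
);
$xpath = new DOMXPath($doc);

$needle = 'I\'m reading "Harry Potter"';
$nodes = $xpath->query('//t[text()=' . xpath_literal($needle) . ']');

echo $nodes->length, "\n"; // 1
```

Only the first t element matches, because the comparison is against the exact text including both quote characters.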
I was on the same quest, and after a while I found that XPath 1.0 has no escape mechanism for this, which is quite disappointing. But we can always work around it.
I wanted something simple and straightforward. What I came up with is to pick your own replacement for the apostrophe: a unique code that you will not otherwise encounter in your XML text. I chose //apos//, for example. You put that in both your XML text and your XPath query (if you didn't write the XML yourself, the replace function of any editor will do it). Then you search normally, retrieve the result, and replace //apos// back with '.
Below are some samples from what I was doing (replace_special_char_xpath() is the function you need to write):
function replace_special_char_xpath($str){
$str = str_replace("//apos//","'",$str);
/* add all other replacements here */
return $str;
}
function xml_lang($xml_file,$category,$word,$language){ // path can be relative or absolute
$language = str_replace("-","_",$language); // replace - with _ so that codes like "en-us" can be used
$xml = simplexml_load_file($xml_file);
$xpath_result = $xml->xpath("{$category}/def[en_us = '{$word}']/{$language}");
$result = (string) $xpath_result[0]; // text of the first matching element
return replace_special_char_xpath($result);
}
the text in xml file:
<def>
<en_us>If you don//apos//t know which server, Click here for automatic connection</en_us> <fr_fr>Si vous ne savez pas quelle serveur, Cliquez ici pour une connexion automatique</fr_fr> <ar_sa>إذا لا تعرفوا أي سرفير, إضغطوا هنا من أجل إتصال تلقائي</ar_sa>
</def>
and the call in the php file (generated html):
<span><?php echo xml_lang_body("If you don//apos//t know which server, Click here for automatic connection")?>