Data read from a remote server shows ugly symbols - Laravel

In Laravel 5.8 / Vue.js 2.6, I run a search against a remote server, and when I output the data I read, I see some ugly symbols:
https://prnt.sc/p2nb2w
I suppose these (or some of them) are Arabic letters. I read the data with cURL, using these options:
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/json; charset=utf-8"));
My site uses UTF-8 and the Cera-GR font.
When I dump the data behind the screenshot above and open it in the Kate editor, I see these text pieces:
[Description] => 🌎🌍🌏 Moroccan Travel lover living in Paris/Dubai 📍Dubai
[FullName] => ☕️💈pdl - Est. 2018 💈☕️
[Description] => Latin American Restaurant🍴
Pisco Bar& Lounge 🍸
Members' Club✨
+971(0)43169600 ☎️ reservations@coyarestaurant.ae 💌
I am not sure what to do here. What kind of symbols are these? Should I change the fonts or strip some of the symbols?
If I just strip them, by which rules?
Thanks!
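You can find out what these symbols are by printing the Unicode name and category of each non-ASCII character. A minimal Python sketch follows; the helper name `describe_symbols` is hypothetical.

```python
import unicodedata

def describe_symbols(text):
    """List (character, code point, category, name) for each non-ASCII character."""
    found = []
    for ch in text:
        if ord(ch) > 127:  # skip plain ASCII characters
            found.append((
                ch,
                f"U+{ord(ch):04X}",
                unicodedata.category(ch),           # e.g. "So" = other symbol
                unicodedata.name(ch, "<unnamed>"),  # fallback for unnamed code points
            ))
    return found

# The start of the FullName value from the dump above
for row in describe_symbols("\u2615\ufe0f\U0001F488pdl - Est. 2018"):
    print(row)
```

For that value the sketch reports HOT BEVERAGE, VARIATION SELECTOR-16 and BARBER POLE. These are emoji (category So) plus an invisible selector that requests emoji-style rendering, not Arabic letters.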

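If you just want to get rid of them, you can strip them from the string in JavaScript. Run the string through this:
removeIllegalCharacters(string) {
  // Remove everything that is not a word character or whitespace.
  // In JavaScript, \w matches only [A-Za-z0-9_], so accented and Arabic
  // letters are removed along with the emoji.
  return string.replace(/[^\w\s]/gi, '').trim()
}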
This will retain your spaces, but .trim() will ensure you don't have any leading or trailing spaces on the string.
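For comparison, here is a Python sketch of the same idea. In Python's `re`, `\w` is Unicode-aware for `str` patterns. As a result, letters in any script (Arabic, accented Latin) survive, and only symbols such as emoji and punctuation are dropped. The name `strip_symbols` is hypothetical.

```python
import re

def strip_symbols(text):
    # [^\w\s] = anything that is neither a word character nor whitespace.
    # For str patterns, \w covers letters and digits from every script.
    cleaned = re.sub(r"[^\w\s]", "", text)
    return cleaned.strip()  # trim leading/trailing whitespace, like .trim()

print(strip_symbols("Pisco Bar& Lounge \U0001F378"))
print(strip_symbols("Paris/Dubai"))
```

Plain removal can glue words together ("Paris/Dubai" becomes "ParisDubai"). Replacing each symbol with a space and then collapsing repeated whitespace may read better.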

Related

Laravel 5.2 Maatwebsite Excel::load not parsing Japanese characters

I'm using Maatwebsite Excel ~2.1.0 for my Laravel 5.2 project. My problem is that the Japanese characters are not rendered, as seen in the photo below. As you can see, I have 23 heading rows, and the displayed data contains only English characters. Columns that use Japanese characters are null.
This is my data in my CSV file.
This is my approach in loading the CSV file:
Excel::load(request()->file('file'), function($reader) {
$results = $reader->all();
dd($results);
});
What should I change to make the Japanese characters readable?
Okay, after exploring more, I found the answer. In the excel.php file, I changed 'to_ascii' => true to 'to_ascii' => false, and my problem was fixed. Hope this helps someone in the future.
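A small Python sketch shows why an ASCII-only conversion loses Japanese text. This is not Maatwebsite's code. It assumes headings are folded to ASCII by decomposing accented letters and discarding whatever is left. The helper `ascii_heading` and the sample rows are hypothetical.

```python
import unicodedata

def ascii_heading(heading):
    # NFKD splits "é" into "e" + a combining accent; encoding to ASCII with
    # errors="ignore" then drops the accent and every non-Latin character.
    folded = unicodedata.normalize("NFKD", heading)
    return folded.encode("ascii", "ignore").decode("ascii").strip().lower()

headings = ["ID", "Café", "名前", "住所"]
row = ["1", "latte", "田中", "東京"]

# Keying the row by the folded headings: both Japanese headings become ""
# and collide, so one column silently overwrites the other.
print(dict(zip(map(ascii_heading, headings), row)))
```

Latin headings such as "Café" survive as "cafe". Japanese headings have no ASCII equivalent and come out empty. This is one way columns can end up missing or null after such a step.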

Mac computers aren't processing mailto: links correctly when they have // in them (mailto://)

Sorry for the question title, it's a little difficult to phrase in my opinion. Here is the full question:
The WYSIWYG HTML editor we use on our websites adds a // to mailto: links inserted into the editor box (mailto://). We are a web firm and use this editor on many, many websites. For example, every mail link inserted looks like this:
<a href="mailto://email@domain.com">Text Here</a>
We noticed this morning that on Windows computers, clicking the link does not put the // into the To: field, regardless of which email client opens it. The address appears as normal (email@domain.com).
Mac computers, however, do include the //. Whenever someone sends an email using these links, the client tries to email //email@domain.com, which isn't delivered because the // makes the address invalid.
Does anyone know why this is happening? The WYSIWYG editor we are using is obout. If we have to go back and remove these // from every website we've built, it will be a tremendous task. I'm just wondering why Macs seem not to process the link correctly while Windows computers do.
The Macs are processing the link correctly. Windows is incorrectly removing data and your editor is incorrectly encoding the data.
The mailto: URL scheme is defined by RFC 2368 (since obsoleted by RFC 6068), which specifies it as:
mailtoURL = "mailto:" [ to ] [ headers ]
to = #mailbox
headers = "?" header *( "&" header )
header = hname "=" hvalue
hname = *urlc
hvalue = *urlc
"#mailbox" is as specified in RFC 822 [RFC822]. This means that it
consists of zero or more comma-separated mail addresses, possibly
including "phrase" and "comment" components. Note that all URL
reserved characters in "to" must be encoded: in particular,
parentheses, commas, and the percent sign ("%"), which commonly occur
in the "mailbox" syntax.
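There is no provision for removing characters such as /. The text after "mailto:" is the mailbox list as written, so mailto://email@domain.com addresses the mailbox "//email@domain.com".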
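Below is a rough Python sketch of reading the "to" part the way the grammar describes. It takes everything between "mailto:" and the first "?", splits on commas and percent-decodes the result. The second helper is a hypothetical bulk fix that rewrites stored mailto:// links. Both function names are illustrative, and the comma split ignores RFC 822 quoting and comments.

```python
import re
from urllib.parse import unquote

def mailto_recipients(url):
    """Return the addresses in the "to" part of a mailto: URL (simplified)."""
    if not url.lower().startswith("mailto:"):
        raise ValueError("not a mailto: URL")
    to_part = url[len("mailto:"):].split("?", 1)[0]  # headers start at "?"
    # Naive comma split: quoted phrases containing commas are not handled
    return [unquote(addr) for addr in to_part.split(",") if addr]

def fix_mailto_links(html):
    # Rewrite href="mailto://..." to href="mailto:..." (either quote style)
    return re.sub(r"""(href\s*=\s*["'])mailto://""", r"\1mailto:", html,
                  flags=re.IGNORECASE)

print(mailto_recipients("mailto://email@domain.com"))
print(fix_mailto_links('<a href="mailto://email@domain.com">Text Here</a>'))
```

Read this way, the broken links yield "//email@domain.com", which matches what the Mac clients put in the To: field.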

Not sure why the output of my PHP scripts contains random embedded spaces within character strings

I have written several PHP scripts to read the contents of a database and output those contents in an email message. Every once in a while, I will see a SPACE (0x20) character embedded in the output where there shouldn't be any. For example, in one script, I reference a PHP global variable containing exactly "n" non-space characters, and sometimes (not always), when that variable is dumped to an email message, the string will appear with an embedded blank (making the total length of the string "n+1"). Other times, an HTML tag (such as <BR>) will appear as < BR> (note the SPACE before the "B").
Because the behavior of the script is not consistent (some emails are affected, and others aren't), I can't seem to find the problem.
I am enclosing a link to the PHP script that is occasionally embedding a space into the BREAK tag. I have removed the lines that provide specific login information to the databases. Otherwise, everything else is intact. In the code file you can find at the link below, line 281 is the one that contained the BREAK command with the embedded SPACE (as described above). This has happened only once!
http://jem-software.com/temptest.txt
I guess the only other potentially relevant information is that this script file is taken from code entered into a JUMI code block contained within a Joomla! based website.
Edit 1:
Thank you, Riccardo, for your suggestions. Here is some more clarification:
I am not reading an email and parsing the results in order to insert into a database. Just the opposite, I am reading from a database and using the results to create an email. I will check the database to see what character set was used, and explicitly pass the character set to see if that makes a difference.
I don't use Joomla functions to access the database because the database I am referencing is external to the Joomla! environment. It is a pre-existing database created from PHP scripts written several years prior. When my old website was re-written using Joomla, I wanted to "port" the PHP database access code intact, so I installed the JUMI plugin to make this possible.
I will check the character encoding in the database and make it match the encoding of the email message.
I don't understand how a character encoding issue would insert a SPACE into the hard-coded HTML tag. This tag did not come from any database; it was typed into the email as a literal string.
This is a strange issue, but here are my two cents:
First, you're not using Joomla functions to access the database and the mail subsystem. While this can work, it isn't good practice.
Second, this smells like a character set / code page issue.
Here are a few considerations on the character set issue:
I skimmed your code and didn't notice anything wrong. However, Joomla uses UTF-8, and your database connection doesn't specify a character set (mysql_set_charset() is missing!), which could be a first issue.
Second, the emails you read will have different character sets, depending on the senders' settings. Make sure you handle code page issues properly. Here is a snippet of a function I use for parsing email:
$mime = imap_fetchmime($this->connection, $this->messageNumber, $partNumber);
return $this->decodeMailBody($data, $mime); // $data holds the QUOTED_PRINTABLE body of this part

// Decode a quoted-printable body and convert it to UTF-8.
// Example $mime values:
// Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8
// Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=windows-1252
function decodeMailBody($string, $mime) {
    $str = quoted_printable_decode($string);
    // Read the charset parameter; assume UTF-8 when none is declared
    $charset = 'utf-8';
    if (preg_match('/charset="?([^";\s]+)"?/i', $mime, $matches)) {
        $charset = strtolower(trim($matches[1]));
    }
    if ($charset == 'utf-8') {
        return $str;
    }
    return iconv($charset, 'UTF-8', $str);
}
Finally, make sure you use UTF-8 when you insert the mail into the database after parsing it.
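The same two steps as decodeMailBody (quoted-printable decoding, then conversion from the declared charset) look like this in Python with the standard `quopri` module. The function name and sample headers are hypothetical. An unknown charset name would raise `LookupError`.

```python
import quopri
import re

def decode_mail_body(raw, mime_headers):
    """Decode a quoted-printable body and return it as a Python str."""
    data = quopri.decodestring(raw)  # e.g. b"=E9" -> b"\xe9"
    match = re.search(r'charset="?([^";\s]+)"?', mime_headers, re.IGNORECASE)
    charset = match.group(1).lower() if match else "utf-8"  # assume UTF-8 if absent
    # errors="replace" substitutes U+FFFD for bytes invalid in that charset
    return data.decode(charset, errors="replace")

headers = ("Content-Transfer-Encoding: quoted-printable "
           "Content-Type: text/plain; charset=windows-1252")
print(decode_mail_body(b"caf=E9", headers))
```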

How is Illegal char's URL working?

Many sites (such as Stack Overflow) include the title of the page in the URL.
I am looking for the algorithm they use to avoid illegal URL characters. (I don't want URL encoding; I want a replace/remove algorithm.)
For example, 'How is Illegal char's URL working?' would become 'How-is-Illegal-chars-URL-working'.
Thanks!
The algorithm to do this is generally called 'slugify', because it turns a string into a 'slug' to be used in a URL. Searching for that should give you plenty of useful implementations.
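Here is a minimal slugify sketch in Python that reproduces the example from the question. It is not how Stack Overflow does it, since that isn't stated here. It assumes an ASCII-only slug is acceptable, so a title written entirely in a non-Latin script would come out empty.

```python
import re
import unicodedata

def slugify(title, lowercase=False):
    # Fold accented letters to ASCII (é -> e); characters with no ASCII
    # equivalent are dropped.
    text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Remove everything except letters, digits, whitespace and hyphens.
    text = re.sub(r"[^A-Za-z0-9\s-]", "", text)
    # Collapse runs of whitespace/hyphens into one hyphen and trim the ends.
    text = re.sub(r"[\s-]+", "-", text).strip("-")
    return text.lower() if lowercase else text

print(slugify("How is Illegal char's URL working?"))
```

Lowercasing is optional here. Many sites lowercase slugs so that URLs differing only in case don't both appear.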
No idea how SO does it, but I would just strip every non-alphanumeric character and replace spaces with underscores.
In Python:
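def cleanTitle(title):
    # Keep lowercase letters and digits, turn spaces into underscores,
    # and drop every other character.
    temp = ''
    for character in title.lower():
        if character in 'abcdefghijklmnopqrstuvwxyz1234567890':
            temp += character
        elif character == ' ':
            temp += '_'
    return temp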
I see you are working in C#. I don't know C#, so you'll have to translate this code. I doubt it's hard to do, though.

How do I guarantee that utf-8 characters are scraped accurately using CURL in php?

I am scraping webpages (using PHP's cURL) that contain accented characters (like "é").
In the source of those webpages, these characters are written in UTF-8 (they are not HTML-encoded).
However, when the result is produced using the following code, I get question marks instead of the accented characters.
$ch = curl_init();
$timeout = 5;
curl_setopt ($ch, CURLOPT_URL, $website);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file = curl_exec($ch);
curl_close($ch);
The header returned by the scraped webpage says the content type is "html/text"; there's no indication that it's UTF-8 encoded. I've tried using the CURLOPT_HTTPHEADER cURL option to change the text encoding, but that doesn't do anything.
What am I missing?
As per the answer to my question, have a look at:
characters changed in a Curl request
Dominic Rodger's answer there just saved my day.
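The Python sketch below shows why the charset matters. It assumes the page bytes really are UTF-8: the scraped bytes are fine, and what counts is which charset is used to decode and display them. `decode_body` is a hypothetical helper, not the fix from the linked answer. It takes the charset from a Content-Type value and falls back to UTF-8 when none is declared, as with the header in the question.

```python
import re

def decode_body(body, content_type):
    """Decode raw response bytes using the charset from a Content-Type value.

    Falls back to UTF-8 when no charset is declared."""
    match = re.search(r"""charset=["']?([\w-]+)""", content_type or "", re.IGNORECASE)
    charset = match.group(1) if match else "utf-8"
    return body.decode(charset, errors="replace")

raw = "é".encode("utf-8")             # b"\xc3\xa9", as sent by the page
print(decode_body(raw, "html/text"))  # no charset declared, so UTF-8 is used
print(raw.decode("latin-1"))          # decoding with the wrong charset gives mojibake
```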
