Read double-byte characters in a plist from the shell - macOS

I am working on a Mac. I have a plist entry containing double-byte Chinese characters,
i.e. ProductRoot /Users/labuser/Desktop/您好.
Now I am running this command in Terminal:
defaults read "path to p-list" ProductRoot
and I am getting /Users/labuser/Desktop/\u60a8\u597d
How can I fix this?

"defaults read" doesn't seem to have any way to change the format of the output. Maybe you could pipe that to another command-line tool to unescape the Unicode characters.
Failing that, it'd be very easy to write a tool in Objective-C or Swift to dump just that one value as a string.
As a side note, you say the file has double-byte characters. If it's being created by native Mac code, it's more likely to be UTF-8-encoded. I don't know whether that matters here, but I figured I'd mention it in case it's relevant.
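For example, you could read the value with PlistBuddy instead of defaults, since it prints the stored string as-is rather than in the escaped old-style plist form. A sketch, keeping the placeholder path from the question (note that PlistBuddy takes the actual .plist file path, whereas defaults takes a domain):
/usr/libexec/PlistBuddy -c "Print :ProductRoot" "path to p-list"   # prints the raw string value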

You could try this:
defaults read | grep ppt | perl -npe 's/\\\\U(\w\w\w\w)/chr hex $1/ge'
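Adapted to the key from the question, the same idea might look like the sketch below. It assumes the escapes in your output look like \Uxxxx or \uxxxx (hence the /i flag, and the single escaped backslash matching the single \ shown in your output), and that your terminal expects UTF-8, which is why -CO is added so Perl writes UTF-8 to stdout:
# decode \Uxxxx escapes in the value of ProductRoot back into characters
defaults read "path to p-list" ProductRoot | perl -CO -pe 's/\\U(\w{4})/chr hex $1/gie'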

Related

Unable to make a regex work in new Mac Big Sur terminal

I am trying to make a Perl one-liner work in the macOS Big Sur terminal. The one-liner is this:
perl -pi -e 's/REGULAR_EXPRESSION_TO_BE_FOUND/REPLACEMENT/g' *.html
When I try to search and replace in the editor BBEdit it works fine, but when I try it in the macOS terminal it does not replace anything. I believe it may have to do with encoding, since I am working with Spanish texts. But my texts are in UTF-8.
If your regex or replacement text contains Unicode, you need the utf8 pragma to tell Perl to decode the command-line script as UTF-8. Otherwise it will be interpreted as raw bytes, or according to your locale. Just because it looks like the right character on your terminal doesn't mean it really is, because the terminal does its own decoding of the bytes you paste, type, or print.
Add -Mutf8 to the command line or put use utf8; in the script. You can always use the B::perlstring function to see what Perl thinks of what you typed:
# v5.22
$ perl -e 'use B; print B::perlstring "愛"; '
"\346\204\233"
$ perl -Mutf8 -e 'use B; print B::perlstring "愛"; '
"\x{611b}"
$ perl -e 'print "\346\204\233"; '
愛
The regex and the file have to be in the same encoding for the matching to work, because obviously "\346\204\233" != "\x{611b}". To remove the ambiguity of the terminal, you might have to write a short script file to debug it. You might also need -CSD.
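Put together for the one-liner from the question, that might look like the sketch below; the pattern and replacement are still just placeholders, and -CSD is only needed if the HTML files themselves also have to be read and written as UTF-8:
# -Mutf8: decode the one-liner itself as UTF-8; -CSD: UTF-8 for std handles and file I/O
perl -CSD -Mutf8 -pi -e 's/REGULAR_EXPRESSION_TO_BE_FOUND/REPLACEMENT/g' *.html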
See the following for more information:
perlrun
utf8 pragma
HTH

In a shell script, colon (:) is being treated as an operator for variable creation

I have the following snippet:
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
the output is:
:80ps://example.com
How can I escape the colon here? I also tried:
url="${host}\:${port}"
but it did not work.
Expected output is:
https://example.com:80
You've most likely run into what I call the Linefeed-Limbo.
If I copy the code you provided from Stack Overflow and run it on my machine (bash version 4.4.19(1)), then it outputs correctly:
user@host:~$ cat script.sh
host="https://example.com"
port="80"
url="${host}:${port}"
echo $url
user@host:~$ bash script.sh
https://example.com:80
What is Linefeed-Limbo?
Different operating systems use different ASCII characters to represent where a new line occurs in a text, such as a script. The Wikipedia article on newlines gives a good introduction.
As you can see, Unix and Unix-like systems use the single character \n, also called a "line feed". Windows, as well as some other systems, uses \r\n, that is, a "carriage return" followed by a "line feed".
What happens is that when you write a script on Windows in an editor such as Notepad, what you actually write is host="https://example.com"\r\n. When you copy this file to Linux, Linux interprets the \r as if it were part of the script, since only \n is considered a new line. And indeed, when I change my newline style to DOS-style, I get exactly the output you get.
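A quick way to confirm this is to make the carriage returns visible. A sketch; cat -A is the GNU spelling, and on macOS/BSD cat -e does the same job:
cat -A script.sh
# a DOS-style line shows a ^M before the end-of-line marker, e.g. host="https://example.com"^M$
file script.sh
# GNU file reports something like: ASCII text, with CRLF line terminators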
How can I fix this?
You have several options to fix this issue.
Converting the script (with dos2unix)
Since all you need to do is replace every instance of \r\n with \n, you could use any text-editing software you want. However, if you like simple solutions, then dos2unix (and its sister unix2dos) might be what you're looking for:
user@host:~$ dos2unix script.sh
dos2unix: converting file script.sh to Unix format...
That's it. Run your file now and you will see it behaves well.
Encoding the source-file correctly
By using a more advanced text editor such as Notepad++, you can define which style of newline you would like to use.
By changing the newline-type to whichever system you intend to run your script on, you will not run into any problems like this anymore.
Bonus round: Why does it output :80ps://example.com?
To understand why your output is like this, you have to look at what your script is doing, and what \r means.
Try thinking of your terminal as an old-fashioned typewriter. Returning the carriage means you start writing on the left again. Making a "new line" means sliding the paper. These two things are separate, and I think that's why some systems decided to use these two characters as a logical "new line".
But I digress. Let's look at the first line, host="https://example.com"\r.
What this means when printed is "print https://example.com, then put the carriage back at the start". When you then print :80\r, it doesn't start after ".com"; it starts at the beginning of the line, because that's where you (unknowingly) told the cursor to go. It then overwrites the first few characters, resulting in ":80ps://example.com" being written. Keep in mind that after 80 you again placed a carriage return symbol, so any new text you would have written would end up overwriting the beginning again.
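You can reproduce the effect directly in the terminal; the sketch below just prints the same byte sequence the broken script ends up echoing:
printf 'https://example.com\r:80\r\n'
# the terminal displays: :80ps://example.com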
It works for me; try removing the carriage returns from the variables and then try again:
new_host=$(echo "$host" | tr -d '\r')
new_port=$(echo "$port" | tr -d '\r')
new_url="${new_host}:${new_port}"
echo "$new_url"

Bash is putting out an "invalid char ' ' in expression" in my awk section

I am currently writing a small game in gawk, using a Raspberry Pi with gawk installed.
After saving my code in a Windows editor to move the file to my Raspberry Pi, I encountered some problems with the encoding (I had to remove some pesky ^M chars). The file is saved in UTF-8 (locale tells me LANG=en_GB.UTF-8). :set encoding? in vim tells me I am using UTF-8 too (same goes for :set fileencoding?).
When I try to execute my code, which is saved as a .sh script, the interpreter stops at the first OR sign "||":
while ((FieldSize !~ /^[[:digit:]]+$/) || (FieldSize < 4))
The error message is: invalid char '<squarish-looking char>' in expression.
I have tried several fixes. I also viewed the file in a hex viewer, and both | chars are correctly identified as 7C, the hex value of | in the ASCII chart.
The error only happens when I use the combination AltGr-7 to input the character into vim. The error won't happen if I enter vim's INSERT mode and use Ctrl-V 124 (which is the decimal value of | in the ASCII chart). If I view either of the two options in a hex viewer, the chars are correctly nested between a space char (hex 20) on either side --> 20 7c 7c 20.
I am now also highlighting non-ASCII chars in my vim with:
syntax match nonascii "[^\x00-\x7F]"
highlight nonascii guibg=Red ctermbg=2
and the pipe char only gets highlighted when I use AltGr-7.
Always using Ctrl-V xxx isn't really a desirable solution, in my humble opinion.
I want my script to work when I use AltGr-7 to put in a "|". Is there any solution to this apart from the workaround with Ctrl-V 124?
A squarish-looking char means it isn't a normal pipe; it is something your terminal doesn't have a glyph for. If you can't see that with xxd, then you likely aren't looking at the right file or place.
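One way to track the offending byte down is to search the script for anything outside printable ASCII. A sketch (game.sh is just a stand-in for your script's file name; the literal tab in the bracket expression is allowed through on purpose):
# print the numbers of lines containing any byte outside tab/space..tilde
LC_ALL=C grep -n $'[^\t -~]' game.sh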
This is not a shell script but an awk script. What you are seeing is the shell complaining that it does not know what to do with two pipes right after each other. This would be the standalone version:
#!/usr/bin/gawk -f
# your code here
Then your OS will automatically figure out from the hashbang (#!) that it should run gawk on it.
Alternatively, this would also work, by first running a new shell which then calls an awk command:
#!/usr/bin/env sh
awk 'your awk code here'
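If you go the shebang route, usage might look like this (game.awk is a hypothetical file name):
chmod +x game.awk   # make the script executable once
./game.awk          # the kernel reads the #! line and effectively runs: /usr/bin/gawk -f ./game.awk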

Perl on Windows: Problems with Encoding

I have a problem with my Perl scripts. On UNIX-like systems they print all Unicode characters like ä properly to the console. On the Windows command line, the characters come out as senseless glyphs. Is there a simple way to avoid this? I'm using use utf8;.
Thanks in advance.
use utf8; simply tells Perl your source is encoded using UTF-8.
It's not working on unix either. There are some strings that won't print properly (print chr(0xE9);), and most that do will print a "Wide character" warning (print chr(0x2660);). You need to decode your inputs and encode your outputs.
On unix systems, that's usually:
use open ':std', ':encoding(UTF-8)';
On Windows systems, you'll need to use chcp to find the console's code page (437 for me):
use open ':std', ':encoding(cp437)'; # Encoding used by console
use open IO => ':encoding(cp1252)'; # Encoding used by files
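As a quick sanity check from the Windows prompt, something like the sketch below could be used; cp437 is just the code page chcp reported on my machine, so substitute whatever chcp prints for you:
chcp
perl -e "use open ':std', ':encoding(cp437)'; print qq(h\x{E4}llo\n)"
# if the encoding layer matches the console code page, this prints: hällo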

Grep on Windows: words created in text file without spaces

I am using grep to take all the four-letter words out of a dictionary text file and place them into a new text file.
This command should work on Unix; however, on Windows it does not.
I need one word per line; on Windows it gives me all the words, but piled together without spaces.
This is the grep command I'm using:
grep "^[a-z]\{4\}$" dictionaryfilename > outputfilename
I believe it's something to do with a difference in newline characters between Unix and Windows?
Anyway, I'm not sure how to fix this for Windows; could someone please help.
Thanks a lot :)
You probably have a Unix-formatted text file (newlines without carriage returns), which looks like one big line in Windows; grep just deals in whatever the system says is 'a line', so it has little to do with the problem.
Try converting the file from LF to CRLF line endings and see if you get better results.
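If that's the case, converting the output file's line endings should fix how it displays; a sketch (outputfilename is the placeholder from the question, unix2dos was mentioned in an earlier answer here, and the sed form assumes GNU sed):
unix2dos outputfilename
# or, if unix2dos is not available but GNU sed is:
sed -i 's/$/\r/' outputfilename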
