{} symbols are making my application crash - internationalization

I've got this line in my cakePHP project:
<?= __('Doorzoek {0,number,#,###} foto\'s', $totalimg) ?>
Extracting this with the I18n Shell gives in the default.pot:
#: Template/Search/start.ctp:148
msgid "Doorzoek {0,number,#,###} foto's"
msgstr ""
Opening, translating and exporting to default.po with PoEdit makes this line:
#: Template/Search/start.ctp:148
msgid "Doorzoek {0,number,#,###} foto's"
msgstr "Search in {0, number, #, ###} pictures"
Now the {} symbols make my application crash.
What is the correct way to escape these so the phrase prints something like:
Search in 564,646 pictures
EDIT
So the problem is the space before the hash signs in the number pattern. Changing the line to
msgstr "Search in {0, number,#,###} pictures"
solved the problem.

rename multiple mp3 filenames from greek to english with bash script

I have dozens of mp3 files where the filename contains Greek letters. I would like to rename them to "latin only characters" so that the title etc. is displayed correctly on all common playback devices.
It takes a long time to do this manually, so I need your help.
Is there a simple bash script that can do this job?
For example:
I want the script to rename the file from σαγαπώ.mp3 to sagapo.mp3
Edit:
I was now able to rename the files with a Python script.
From:
Βασίλης Μπατής - Ζημιά _ Vasilis Mpatis - Zimia _ Official Video Clip HQ 2017.mp3
it produced:
Basilis Mpatis - Zimia _ Vasilis Mpatis - Zimia _ Official Video Clip HQ 2017.mp3
So far so good, now the question is how do I get rid of all "unnecessary" information from the file name, so that in the end only the artist and title remain as file names.
This is what the file name should look like at the end.
Basilis Mpatis - Zimia.mp3
Does anyone have an idea?
Here is my Python script:
import os
# Path to the folder containing the MP3 files
path = '/home/sakis/mp3'
# Loop over all MP3 files in the folder
for file in os.listdir(path):
    if file.endswith('.mp3'):
        # Store the current file name and convert it to Unicode
        old_name = file.encode('utf-8').decode('utf-8')
        # Split the file name into two parts
        name, extension = old_name.rsplit('.', 1)
        # Replace Greek letters in the file name
        new_name = old_name.replace('Ά', 'A').replace('Έ', 'E').replace('Ή', 'H').replace('Ί', 'I').replace('Ό', 'O').replace('Ύ', 'Y').replace('Ώ', 'W').replace('ΐ', 'I').replace('Α', 'A').replace('Β', 'B').replace('Γ', 'G').replace('Δ', 'D').replace('Ε', 'E').replace('Ζ', 'Z').replace('Η', 'H').replace('Θ', 'TH').replace('Ι', 'I').replace('Κ', 'K').replace('Λ', 'L').replace('Μ', 'M').replace('Ν', 'N').replace('Ξ', 'X').replace('Ο', 'O').replace('Π', 'P').replace('Ρ', 'R').replace('Σ', 'S').replace('Τ', 'T').replace('Υ', 'Y').replace('Φ', 'F').replace('Χ', 'X').replace('Ψ', 'PS').replace('Ω', 'O').replace('ά', 'a').replace('έ', 'e').replace('ή', 'i').replace('ί', 'i').replace('ό', 'o').replace('ύ', 'y').replace('ώ', 'w').replace('ϊ', 'i').replace('ϋ', 'u').replace('ό', 'o').replace('α', 'a').replace('β', 'b').replace('γ', 'g').replace('δ', 'd').replace('ε', 'e').replace('ζ', 'z').replace('η', 'i').replace('θ', 'th').replace('ι', 'i').replace('κ', 'k').replace('λ', 'l').replace('μ', 'm').replace('ν', 'n').replace('ξ', 'x').replace('ο', 'o').replace('π', 'p').replace('ρ', 'r').replace('ς', 's').replace('σ', 's').replace('τ', 't').replace('υ', 'y').replace('φ', 'f').replace('χ', 'x').replace('ψ', 'ps').replace('ω', 'o')
        # Remove all other characters from the file name
        # (note: the cleaned-up name is not used in the rename below)
        name = ''.join(c for c in name if c.isalnum() or c in [' ', '-', '_'])
        # Set the new file name
        os.rename(os.path.join(path, old_name), os.path.join(path, new_name))
print('Done!')
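You need to define your own translation table, because nobody can guess how you want to transliterate the names. Assuming that the Greek name is stored in the variable greek_name, something like this could do:
english_name=$(tr 'αβΓγΔδεΖζ...' 'avGgDdeZz...' <<< "$greek_name")
Note that tr reads its input from standard input, and that GNU tr works on bytes rather than characters, so with GNU coreutils this only works for single-byte encodings, not for UTF-8 file names.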
Of course you have to make compromises: Since for instance the letter υ can be pronounced as "i", "f" or as "w" depending on the context, you have to settle for one.
Another problem is that several greek letters are pronounced the same; for instance, Ο and Ω. If you don't manage to map them uniquely, it might happen that two greek file names map to the same english file name. Therefore, when you do the renaming, make sure that you get at least an error message in this case:
if ! mv -n "$greek_name" "$english_name"
then
    echo "Cannot rename $greek_name, because $english_name already exists"
fi
UPDATE:
It's not clear how you would translate, for example, ψ, as the most natural mapping would be to use two characters, "ps". You could either use an English letter which has no equivalent in Greek anyway ('c' comes to my mind), or you translate these special cases in a separate step, for instance:
# english_name could still contain a ψ because this
# was not handled by `tr`
english_name=${english_name//ψ/ps}
You of course have to make up your mind whether you want the upper-case Ψ to be translated into PS or Ps.
You have not specified how you want to produce the English letter b in the transliteration. In Greek, this sound is written as μπ, i.e. two Greek letters map to a single English one. If you want to implement this mapping, you have to do it before the one-to-one translation done by tr, for instance:
# Preprocess μπ before translating the other
# Greek letters:
greek_name=${greek_name//μπ/b}
greek_name=${greek_name//Μ[πΠ]/B}
This reflects the idea that a Greek word starting with Μπ is meant to be a word starting with an upper-case letter, and ΜΠ is meant to start an all-upper-case word, both corresponding to an upper-case B in English.
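Since the question ended up using Python, the same three ideas (digraphs first, then a one-to-one table that may produce two letters, then a collision check) can be sketched there as well. This is only an illustration: the mapping below is one possible set of compromises (for instance β becomes v and υ becomes y), it covers only unaccented letters plus the letters with a plain accent (tonos), and the names GREEK_TO_LATIN, DIGRAPHS, transliterate and plan_renames are made up for this sketch.

```python
# Hypothetical lowercase mapping; every choice is a compromise, not a standard.
GREEK_TO_LATIN = {
    "α": "a", "ά": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "έ": "e",
    "ζ": "z", "η": "i", "ή": "i", "θ": "th", "ι": "i", "ί": "i", "κ": "k",
    "λ": "l", "μ": "m", "ν": "n", "ξ": "x", "ο": "o", "ό": "o", "π": "p",
    "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "y", "ύ": "y", "φ": "f",
    "χ": "ch", "ψ": "ps", "ω": "o", "ώ": "o",
}
# Two Greek letters that stand for one Latin letter; handled before the table.
DIGRAPHS = {"μπ": "b"}

# Build the translation table for lowercase and uppercase letters.
TABLE = {}
for greek, latin in GREEK_TO_LATIN.items():
    TABLE[ord(greek)] = latin
    TABLE[ord(greek.upper())] = latin.capitalize()


def transliterate(name):
    """Return name with the Greek letters replaced by Latin ones."""
    for greek, latin in DIGRAPHS.items():
        name = name.replace(greek, latin)
        name = name.replace(greek.capitalize(), latin.upper())
        name = name.replace(greek.upper(), latin.upper())
    return name.translate(TABLE)


def plan_renames(names):
    """Map old names to new names; refuse if two names would collide."""
    plan = {}
    seen = {}
    for old in names:
        new = transliterate(old)
        if new in seen:
            raise ValueError(f"{old!r} and {seen[new]!r} both map to {new!r}")
        seen[new] = old
        plan[old] = new
    return plan


print(transliterate("σαγαπώ.mp3"))
```

Building the whole plan before renaming anything means a collision is reported before any file has been touched, which plays the same role as the mv -n check above.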

Moving chunks of data in a file with awk

I'm moving my bookmarks from kippt.com to pinboard.in.
I exported my bookmarks from Kippt and for some reason, they were storing tags (preceded by #) and description within the same field. Pinboard keeps tags and description separated.
This is what a Kippt bookmark looks like after export:
<DT>This is a title
<DD>#tag1 #tag2 This is a description
This is what it should look like before importing into Pinboard:
<DT>This is a title
<DD>This is a description
So basically, I need to replace #tag1 #tag2 by TAGS="tag1,tag2" and move it on the first line within <A>.
I've been reading about moving chunks of data here: sed or awk to move one chunk of text betwen first pattern pair into second pair?
I haven't been able to come up with a good recipe so far. Any insight?
Edit:
Here's an actual example of what the input file looks like (3 entries out of 3500):
<DT>Phabricator
<DD>#bug #tracking
<DT>The hidden commands for diagnosing and improving your Netflix streaming quality – Quartz
<DT>Icelandic Farm Holidays | Local experts in Iceland vacations
<DD>#iceland #tour #car #drive #self Self-driving tour of Iceland
This might not be the most beautiful solution, but since it seems to be a one-time-thing it should be sufficient.
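import re
dt = re.compile('^<DT>')
dd = re.compile('^<DD>')
current_dt = ""
current_dd = ""
with open('bookmarks.xml', 'r') as f:
    for line in f:
        if dt.match(line):
            # A new entry starts: print the previous one first, so that
            # <DT> lines without a following <DD> are not lost
            if current_dt:
                print(current_dt)
            if current_dd:
                print(current_dd)
            current_dt = line.strip()
            current_dd = ""
        elif dd.match(line):
            # split() without arguments also drops the trailing newline
            words = line[4:].split()
            tags = [w[1:] for w in words if w.startswith('#')]
            # Add TAGS="..." to the opening <A ...> tag of the title line
            current_dt = re.sub('(<A[^>]+)>', '\\1 TAGS="' + ','.join(tags) + '">', current_dt)
            # Keep only the description; an empty <DD> line is dropped
            description = ' '.join(w for w in words if not w.startswith('#'))
            current_dd = '<DD>' + description if description else ""
        else:
            # Any other line ends the current entry
            print(current_dt)
            print(current_dd)
            current_dt = ""
            current_dd = ""
# Print the last entry
print(current_dt)
print(current_dd)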
If some parts of the code are not clear, just tell me. You can of course use python to write the lines to a file instead of printing them, or even modify the original file.
Edit: Added if-clause so that empty <DD> lines won't show up in the result.
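A variation on the idea above, in case a description may itself contain a # character: only the run of #tags at the very start of the <DD> line is treated as tags, and everything after it is kept as the description. This is a minimal sketch under that assumption; the names DD_PATTERN, split_dd and tags_attribute are made up for the example.

```python
import re

# A leading run of "#tag" words, followed by the free-text description.
DD_PATTERN = re.compile(r"^<DD>((?:#\S+\s*)*)(.*)$")


def split_dd(line):
    """Return (tags, description) for one '<DD>...' line."""
    match = DD_PATTERN.match(line.rstrip("\n"))
    if match is None:
        raise ValueError("not a <DD> line: " + repr(line))
    tags = [word[1:] for word in match.group(1).split()]
    return tags, match.group(2).strip()


def tags_attribute(tags):
    """Format the tags the way the question asks for them."""
    return 'TAGS="' + ",".join(tags) + '"'


tags, description = split_dd("<DD>#tag1 #tag2 This is a description")
print(tags_attribute(tags))
print(description)
```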
script.awk
BEGIN{FS="#"}
/^<DT>/{
    if(d==1) print "<DT>"s # for printing lines with no tags
    s=substr($0,5);tags="" # Copying the line after "<DT>". You'll know why
    d=1
}
/^<DD>/{
    d=0
    m=match(s,/>/) # Find the end of the HREF descriptor: first match of ">"
    for(i=2;i<=NF;i++){sub(/ $/,"",$i);tags=tags","$i} # Concatenate tags
    td=match(tags,/ /) # Parse for tag description (marked by a preceding space).
    if(td==0){ # No description exists
        tags=substr(tags,2)
        tagdes=""
    }
    else{ # Description exists
        tagdes=substr(tags,td)
        tags=substr(tags,2,td-2)
    }
    print "<DT>" substr(s,1,m-1) ", TAGS=\"" tags "\"" substr(s,m)
    print "<DD>" tagdes
}
awk -f script.awk kippt > pinboard
INPUT
<DT>Phabricator
<DD>#bug #tracking
<DT>The hidden commands for diagnosing and improving your Netflix streaming quality – Quartz
<DT>Icelandic Farm Holidays | Local experts in Iceland vacations
<DD>#iceland #tour #car #drive #self Self-driving tour of Iceland
OUTPUT:
<DT>Phabricator
<DD>
<DT>The hidden commands for diagnosing and improving your Netflix streaming quality – Quartz
<DT>Icelandic Farm Holidays | Local experts in Iceland vacations
<DD> Self-driving tour of Iceland

I need to URL-encode a string in AppleScript

My script searches a website for songs, but when there are spaces it doesn't search, you have to add underscores. I was wondering if there was a way to replace my spaces with underscores.
Could you please use my current code below to show me how to do it?
set search to text returned of (display dialog "Enter song you wish to find" default answer "" buttons {"Search", "Cancel"} default button 1)
open location "http://www.mp3juices.com/search/" & search
end
Note: The solution no longer works as of Big Sur (macOS 11) - it sounds like a bug; do tell us if you have more information.
Try the following:
set search to text returned of (display dialog "Enter song you wish to find" default answer "" buttons {"Search", "Cancel"} default button 1)
do shell script "open 'http://www.mp3juices.com/search/'" & quoted form of search
end
What you need is URL encoding (i.e., encoding of a string for safe inclusion in a URL), which involves more than just replacing spaces.
The open command-line utility, thankfully, performs this encoding for you, so you can just pass it the string directly; you need do shell script to invoke open, and quoted form of ensures that the string is passed through unmodified (to be URI-encoded by open later).
As you'll see, the kind of URL encoding open performs replaces spaces with %20, not underscores, but that should still work.
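To make the difference between percent-encoding and plain space replacement concrete, here is a small sketch in Python (used only for illustration, since the thread itself is about AppleScript). The base URL is the one from the question; the function name search_url is made up for the example.

```python
from urllib.parse import quote, quote_plus

BASE_URL = "http://www.mp3juices.com/search/"  # taken from the question


def search_url(term):
    """Append the percent-encoded search term to the base URL."""
    # safe="" also escapes "/", so the term stays a single path segment
    return BASE_URL + quote(term, safe="")


print(search_url("my favourite song"))  # spaces become %20
print(quote_plus("my favourite song"))  # form-style encoding uses + instead
```

Percent-encoding handles every reserved character, not just spaces, which is why it is the safer default whenever the site accepts it.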
mklement0's answer is correct about URL encoding, but mp3juices uses RESTful (clean) URLs. Clean URLs are meant to stay human-readable, so you normally won't see percent-encoded hex values in them. The underscore form mentioned in the question is not what the site uses, but it is common for clean URLs to substitute another character for whitespace (instead of %20) and for some other characters. The slug must therefore be converted to the site's own convention first, before standard URL encoding is applied.
set search to text returned of (display dialog "Enter song you wish to find" default answer "" buttons {"Search", "Cancel"} default button 1)
set search to stringReplace(search, space, "-")
do shell script "open 'http://www.mp3juices.com/search/'" & quoted form of search
on stringReplace(theText, searchString, replaceString)
    set {oldTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, searchString}
    set textItems to every text item of theText
    set AppleScript's text item delimiters to replaceString
    set newText to textItems as string
    set AppleScript's text item delimiters to oldTID
    return newText
end stringReplace
EDIT: I updated the code. Unlike what the question says (spaces converted to underscores), mp3juices uses hyphens as the substitute for whitespace.
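The two-step order described in this answer (apply the site's own separator first, then standard URL encoding for whatever is left) can be sketched as follows. Python is used only for illustration; slugify is a made-up helper name, and the hyphen separator is the convention reported above for this site, not a general rule.

```python
import re
from urllib.parse import quote


def slugify(term, separator="-"):
    """Replace whitespace with the separator, then percent-encode the rest."""
    # Collapse each run of whitespace into a single separator
    slug = re.sub(r"\s+", separator, term.strip())
    # Percent-encode any remaining reserved characters
    return quote(slug, safe="")


print(slugify("  hello   world "))
print(slugify("rock & roll"))
```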
An update on this, despite the fact that the answer is 3 years old, as I faced the same problem: on recent versions of macOS/OS X/Mac OS X (I think, 10.10 or later), you can use ASOC, the AppleScript/Objective-C bridge:
use framework "Foundation"
urlEncode("my search string with [{?#äöü or whatever characters")
on urlEncode(input)
    tell current application's NSString to set rawUrl to stringWithString_(input)
    set theEncodedURL to rawUrl's stringByAddingPercentEscapesUsingEncoding:4 -- 4 is NSUTF8StringEncoding
    return theEncodedURL as Unicode text
end urlEncode
It should be noted that stringByAddingPercentEscapesUsingEncoding is deprecated, but it will take some time until it’s removed from macOS.
URL encoding in AppleScript
For a general use case (for me at the moment to pass any ASCII url containing chars like #, &, ß, ö to the bit.ly API), I stumbled upon a nice code snippet that instantly added full support to my ShortURL clipboard pasting shortcut. Here's a quote from source:
i was looking for a quick and dirty way to encode some data to pass to a url via POST or GET with applescript and Internet Explorer, there were a few OSAXen which have that ability, but i didn't feel like installing anything, so i wrote this thing (works with standard ascii characters, characters above ascii 127 may run into character set issues see: applescript for converting macroman to windows-1252 encoding)
Notes
Double encoding should be duly noted.
Not tested on non-ASCII URLs.
Tested on OS X 10.8.5.
Code
on urlencode(theText)
    set theTextEnc to ""
    repeat with eachChar in characters of theText
        set useChar to eachChar
        set eachCharNum to ASCII number of eachChar
        if eachCharNum = 32 then
            set useChar to "+"
        else if (eachCharNum ≠ 42) and (eachCharNum ≠ 95) and (eachCharNum < 45 or eachCharNum > 46) and (eachCharNum < 48 or eachCharNum > 57) and (eachCharNum < 65 or eachCharNum > 90) and (eachCharNum < 97 or eachCharNum > 122) then
            set firstDig to round (eachCharNum / 16) rounding down
            set secondDig to eachCharNum mod 16
            if firstDig > 9 then
                set aNum to firstDig + 55
                set firstDig to ASCII character aNum
            end if
            if secondDig > 9 then
                set aNum to secondDig + 55
                set secondDig to ASCII character aNum
            end if
            set numHex to ("%" & (firstDig as string) & (secondDig as string)) as string
            set useChar to numHex
        end if
        set theTextEnc to theTextEnc & useChar as string
    end repeat
    return theTextEnc
end urlencode
If you need to get the URL as a string (not just feed it into open which does a nifty job of encoding for you) and you're not above using a little Automator, you can throw some JavaScript into your AppleScript:
encodeURIComponent is a built in JavaScript function - it is a complete solution for encoding components of URIs.
For copy/pasters, here are all three scripts in the above Automator chain:
on run {input, parameters}
    return text returned of (display dialog "Enter song you wish to find" default answer "" buttons {"Search", "Cancel"} default button 1)
end run
function run(input, parameters) {
    return encodeURIComponent(input);
}
on run {input, parameters}
    display dialog "http://www.mp3juices.com/search/" & input buttons {"okay!"} default button 1
end run
I was hunting around for URL encoding and decoding and came across this helpful link.
Which you can use like so:
set theurl to "https://twitter.com/zackshapiro?format=json"
do shell script "php -r 'echo urlencode(\"" & theurl & "\");'"
# gives me "https%3A%2F%2Ftwitter.com%2Fzackshapiro%3Fformat%3Djson"
set theurl to "https%3A%2F%2Ftwitter.com%2Fzackshapiro%3Fformat%3Djson"
return do shell script "php -r 'echo urldecode(\"" & theurl & "\");'"
# gives me "https://twitter.com/zackshapiro?format=json"
Or as functions:
on encode(str)
    do shell script "php -r 'echo urlencode(\"" & str & "\");'"
end encode
on decode(str)
    do shell script "php -r 'echo urldecode(\"" & str & "\");'"
end decode
Just so it's said, AppleScriptObjC allows us to use NSString to do the encoding. The script is complicated by the fact that different parts of the URL allow different characters (all of which I've added options for) but in most cases the 'query' option will be used.
See NSCharacterSet's dev page (the section called "Getting Character Sets for URL Encoding") for descriptions of the various URL parts.
use AppleScript version "2.4" -- Yosemite 10.10 or later
use framework "Foundation"
property NSString : class "NSString"
property NSCharacterSet : class "NSCharacterSet"
-- example usage
my percentEncode:"some text" ofType:"query"
on percentEncode:someText ofType:encodeType
    set unencodedString to NSString's stringWithString:someText
    set allowedCharSet to my charSetForEncodeType:encodeType
    set encodedString to unencodedString's stringByAddingPercentEncodingWithAllowedCharacters:allowedCharSet
    return encodedString as text
end percentEncode:ofType:
on charSetForEncodeType:encodeType
    if encodeType is "path" then
        return NSCharacterSet's URLPathAllowedCharacterSet()
    else if encodeType is "query" then
        return NSCharacterSet's URLQueryAllowedCharacterSet()
    else if encodeType is "fragment" then
        return NSCharacterSet's URLFragmentAllowedCharacterSet()
    else if encodeType is "host" then
        return NSCharacterSet's URLHostAllowedCharacterSet()
    else if encodeType is "user" then
        return NSCharacterSet's URLUserAllowedCharacterSet()
    else if encodeType is "password" then
        return NSCharacterSet's URLPasswordAllowedCharacterSet()
    else
        return missing value
    end if
end charSetForEncodeType:
The Python Approach:
Find your python3 path (which python3) or if you don't have it, install using brew or miniconda
Now try this:
property python_path : "/path/to/python3"
set search_query to "testy test"
tell application "Google Chrome"
    set win to make new window
    -- "my" is needed to call a script handler from inside a tell block
    open location "https://www.google.com/search?q=" & my url_encode(search_query)
end tell
on url_encode(input)
    -- printf avoids the trailing newline that echo would add to the input
    return (do shell script "printf %s " & quoted form of input & " | " & python_path & " -c \"import urllib.parse, sys; print(urllib.parse.quote(sys.stdin.read()))\"")
end url_encode
credits to #Murphy https://stackoverflow.com/a/56321886

Ruby: Remove invisible characters after converting string to UTF-8

I am working with text coming from this website with windows-1252 charset. Converting the text to UTF-8 was done using force_encoding, but the text still contains whitespace that I can't get rid of. The whitespace cannot be removed using text.gsub!(/\s/, ' ') or a similar technique.
The iconv gem doesn't do the trick either - as explained here. It is clear that the whitespace is a remnant of the original text and the windows-1252 charset, as I get an invalid multibyte char (US-ASCII) warning if I don't specify the encoding as UTF-8.
I'm not an expert of text encoding so I may be overlooking something trivial.
Update: This is the script that I currently use.
#!/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'nokogiri'
require 'open-uri'
URL = 'http://www.eximsystems.com/LaVerdad/Antiguo/Gn/Genesis.htm'
html = Nokogiri.HTML(open(URL))
# Extract Paragraphs
text = ''
html.css('p').each do |p|
  text += p.text
end
# Clean Up Text
text.gsub!(/\s+/, ' ')
puts text
This is a sample of the text that contains invisible characters that I try to remove. The space before the number 16 is what I am referring to.
cobraron aliento para conversar con él.   16 Al punto corrió la voz, y
se divulgó generalmente esta noticia en el palacio del rey: Han
Without seeing your code, it's hard to know exactly what's going on for you. I'll point out, however, that String#force_encoding doesn't transcode the String; it's a way of saying, "No, really, this is UTF-8", for example. To transcode from one encoding to another, use String#encode.
This seems to work for me:
require 'net/http'
s = Net::HTTP.get('www.eximsystems.com', '/LaVerdad/Antiguo/Gn/Genesis.htm')
s.force_encoding('windows-1252')
s.encode!('utf-8')
In general, /[[:space:]]/ should capture more kinds of whitespace than /\s/ (which is equivalent to /[ \t\r\n\f]/), but it doesn't appear to be necessary in this case. I can't find any abnormal whitespace in s at this point. If you're still having problems, you'll need to post your code and a more precise description of the issue.
Update: Thanks for updating your question with your code and an example of the problem. It looks like the issue is non-breaking spaces. I think it's simplest to get rid of them at the source:
require 'nokogiri'
require 'open-uri'
URL = 'http://www.eximsystems.com/LaVerdad/Antiguo/Gn/Genesis.htm'
s = open(URL).read        # Separate these three lines to convert &nbsp;
s.gsub!('&nbsp;', ' ')    # to normal ' ' in source rather than after
html = Nokogiri.HTML(s)   # conversion to unicode non-breaking space
# Extract Paragraphs
text = ''
html.css('p').each do |p|
  text += p.text
end
# Clean Up Text
text.gsub!(/\s+/, ' ')
puts text
There's now just a single, normal space between the period at the end of 15 and the number 16:
15) Besó también José a todos sus hermanos, orando sobre cada uno de ellos; después de cuyas demostraciones cobraron aliento para conversar con él. 16 Al punto corrió la voz, y se divulgó generalmente esta noticia en el palacio del rey: Han venido los hermanos de José; y holgóse de ello Faraón y toda su corte.
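The two points made in this answer (transcoding rather than relabelling bytes, and non-breaking spaces not being matched by Ruby's /\s/) can be illustrated with a short Python sketch. Python is used only for illustration; the byte string is a made-up windows-1252 sample modelled on the text above, and normalize_whitespace is a hypothetical helper name. Note that in Python 3, \s in a str pattern is Unicode-aware and does match U+00A0.

```python
import re


def normalize_whitespace(raw_bytes, encoding="cp1252"):
    """Decode bytes from the given encoding and collapse all whitespace runs."""
    text = raw_bytes.decode(encoding)  # real transcoding: bytes -> str
    # \s is Unicode-aware here, so it also matches U+00A0 (non-breaking space)
    return re.sub(r"\s+", " ", text)


# Made-up sample: 0xE9 is "é" and 0xA0 is a non-breaking space in windows-1252
raw = b"conversar con \xe9l.\xa0\xa0 16 Al punto"
print(normalize_whitespace(raw))
```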
You can try to use text.strip for removing the whitespaces.

Programmatically get a list of characters a certain .ttf font file supports

Is there a way to programmatically get a list of characters a .ttf file supports using Ruby and/or Bash? I am trying to pipe the supported character codes into a text file for later processing.
(I would prefer not to use Font Forge.)
Found a Ruby gem called ttfunk which can be found here.
After a gem install ttfunk, you can get all unicode characters by running the following script:
require 'ttfunk'
file = TTFunk::File.open("path/to/font.ttf")
cmap = file.cmap
chars = {}
unicode_chars = []
cmap.tables.each do |subtable|
  next if !subtable.unicode?
  chars = chars.merge( subtable.code_map )
end
unicode_chars = chars.keys.map{ |dec| dec.to_s(16) }
puts "\n -- Found #{unicode_chars.length} characters in this font \n\n"
p unicode_chars
Which will output something like:
 -- Found 2815 characters in this font
["20", "21", "22", "23", ... , "fef8", "fef9", "fefa", "fefb", "fefc", "fffc", "ffff"]
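For the "later processing" part of the question, the merge-and-format step is independent of the font library. The sketch below assumes the code maps have already been read from the font as plain dictionaries from code point to glyph id (as the ttfunk script above obtains them); the function names and the U+XXXX output format are choices made for this example only.

```python
def merge_code_maps(code_maps):
    """Merge several {code point: glyph id} maps into one."""
    merged = {}
    for code_map in code_maps:
        merged.update(code_map)
    return merged


def format_code_points(code_map):
    """Return the supported code points as sorted 'U+XXXX' strings."""
    return ["U+%04X" % code_point for code_point in sorted(code_map)]


def write_code_points(code_map, path):
    """Write one code point per line, ready for further processing."""
    with open(path, "w", encoding="ascii") as handle:
        for line in format_code_points(code_map):
            handle.write(line + "\n")


# Made-up sample data standing in for two cmap subtables
sample = merge_code_maps([{0x21: 4, 0x20: 3}, {0x20: 3, 0xFEFB: 900}])
print(format_code_points(sample))
```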
