Plain text URL to HTML code (Automator/AppleScript)

Suppose I have a plain text file open in a text editor such as TextEdit:
title 1
http://a.b/c
title 2
http://d.e/f
...
I'd like to convert all the lines beginning with http:// into HTML link code, so that the content above becomes:
title 1
<a href="http://a.b/c">http://a.b/c</a>
title 2
<a href="http://d.e/f">http://d.e/f</a>
...
How can I get this done in Automator or AppleScript? (My current solution is using Gmail, but it involves multi-step copy-paste.)
Thank you very much in advance.

This will let you avoid another editor:
set inFile to "/Users/you/Desktop/Urls.txt"
set outFile to "/Users/you/Desktop/Urls2.txt"
do shell script "sed 's/\\(http[^ ]*\\)/<a href=\"\\1\">\\1<\\/a>/g' " & quoted form of inFile & " >" & quoted form of outFile

Just do a regex search and replace in a text editor or Terminal:
sed -E 's|^(http:.*)|<a href="\1">\1</a>|g' file.txt
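The same substitution can be wrapped in a small shell function for reuse. This is a minimal sketch, assuming a sed that supports -E (both the macOS and GNU versions do); url_lines_to_links is a hypothetical helper name. Like the one-liner, it wraps every whole line that starts with http: in an anchor tag and leaves other lines alone. It does not HTML-escape the URL, so a link containing & or " would need extra handling.

```shell
# Hypothetical helper: turn every line that starts with "http:" into an HTML link.
# Reads the named file(s), or standard input if none are given, and writes to stdout.
url_lines_to_links() {
  # ^(http:.*) captures the whole line; \1 reuses it as both the href and the link text.
  sed -E 's|^(http:.*)|<a href="\1">\1</a>|' "$@"
}

# Usage with the sample content from the question.
printf 'title 1\nhttp://a.b/c\ntitle 2\nhttp://d.e/f\n' | url_lines_to_links
```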

Related

Sed "invalid command code <", when using sed within AppleScript

I am trying to use sed to do string replacement within my AppleScript.
I am using the following command:
set selected_text to do shell script "echo " & "\"" & selected_text & "\"" & " | sed /'s/<\\(.*\\)>/\\1/'"
and I am being met with the following error:
The action “Run AppleScript” encountered an error: “sed: 1: "/s/<\(.*\)>/\1/": invalid command code <”
Expected input:
FirstName LastName <FirstName.LastName#email.com>
Expected output:
FirstName.LastName#email.com
Does anyone have any suggestions?
The better (easier) way is to use quoted form of when using do shell script; AppleScript then escapes the text for you.
The slash in front of the sed script was wrong: sed read /s/ as an address (lines matching the regex s) and then tried to run < as a command, hence "invalid command code <". Without the slash the error disappears, but there was still an error in your regex. sed replaces only the part of the line that the regex matches, so the regex has to match your whole input; the parentheses then define the part you want to keep. The first part, [^<]*, matches everything up to <; the second, \(.*\), captures the email address you want to keep; and finally comes the closing >.
set selected_text to do shell script "echo " & quoted form of selected_text & " | sed 's/[^<]*<\\(.*\\)>/\\1/'"
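The corrected expression can also be checked directly in a shell, outside AppleScript. A minimal sketch; extract_address is a hypothetical helper name, and printf is used instead of echo so that any backslashes in the input are passed through unchanged. Note that inside a plain shell script the backslashes are single, not doubled as in the AppleScript string.

```shell
# Hypothetical helper: print the address between < and > in "Name <address>".
extract_address() {
  # [^<]* consumes everything before "<", \(.*\) captures the address,
  # and the final ">" is matched but not kept in the output.
  printf '%s\n' "$1" | sed 's/[^<]*<\(.*\)>/\1/'
}

extract_address 'FirstName LastName <FirstName.LastName#email.com>'
```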
Best, Michael / Hamburg

AppleScript to remove all text not between two strings

I'm trying to make a script that displays a dialog of the current iTunes Top 20. I intend to do this by getting the HTML code from the Top 100 website and then extracting the text between two strings to get the name of each song. For the first song, this is extremely successful.
However, it only works for the first song each time. The only way I can think of to fix this is, rather than getting everything between the two strings, to delete everything not between them. This would hopefully give me all the song names as a string.
Does anyone know how to do this?
Get the HTML:
set curlcommand to "curl https://www.apple.com/itunes/charts/songs/"
set html to (do shell script curlcommand)
Get the song name:
set AppleScript's text item delimiters to "width=\"100\" height=\"100\" alt=\""
set theText to item 2 of every text item of html
set AppleScript's text item delimiters to "\"></a>"
set theText to item 1 of every text item of theText
set AppleScript's text item delimiters to ""
You could use the shell to get all song names, one per line (this is not an AppleScript list).
set charts to (do shell script "
curl -s 'https://www.apple.com/itunes/charts/songs/'| tr '\"' '\n' |awk '/^ alt=$/ {getline;print}'")
tr '\"' '\n' replaces the double quotes with newlines (see the tr manual page)
awk '/^ alt=$/ {getline;print}' prints the line following the line alt=, which is where the song name is written (see the awk manual page)
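To see how the two stages work together without fetching the live page, the pipeline can be run on a small piece of sample markup. The HTML below is hypothetical and only imitates the pattern the answer relies on (an img tag whose alt attribute holds the song name); the real chart page may look different. song_names is likewise a hypothetical helper name.

```shell
# Hypothetical sample markup standing in for the chart page.
html='<li><a href="/s1"><img src="1.jpg" width="100" height="100" alt="Song One"></a></li>
<li><a href="/s2"><img src="2.jpg" width="100" height="100" alt="Song Two"></a></li>'

# Split the markup at every double quote, then print the line that follows
# each line consisting exactly of " alt=" (that following line is the song name).
song_names() {
  tr '"' '\n' | awk '/^ alt=$/ {getline; print}'
}

printf '%s\n' "$html" | song_names
```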

How to select line in AppleScript

I'm trying to figure out how to use Text Item Delimiters on a long line of text that is in a log file.
Within the log there is always a constant phrase that I'm searching for, which leads me to the line of text. I'm getting to the line I want by searching for "[Constant]", for example.
The problem I'm having is that I can't select the whole line to perform a Delimiter. Below is a very basic example of what the log looks like.
qwertyuiop
mnbvcxza
oqeryuiiop
[Constant] 1234567890123456-098765432109876-8765432118976543
odgnsgnsanfadf
joiergjdfmgadfs
Any advice would be appreciated.
So far I'm using:
repeat 16 times
key code 124 using (shift down)
end repeat
Which does the job fine but it is clunky.
An easy way to find a line of text containing a specific string is the shell command grep.
set theConstant to "Constant"
set theText to "qwertyuiop
mnbvcxza
oqeryuiiop
Constant 1234567890123456-098765432109876-8765432118976543
odgnsgnsanfadf
joiergjdfmgadfs"
set foundLine to do shell script "echo " & quoted form of theText & " | tr '\\r' '\\n' | grep " & quoted form of theConstant
The tr part, which replaces return (0x0d) characters with linefeed (0x0a) characters, is necessary because grep and the other shell tools expect lines separated by linefeeds, while text in AppleScript may use returns as line separators.
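A minimal sketch of why the tr step matters, assuming the text uses return (0x0d) characters between lines; the sample data is taken from the question.

```shell
# Sample text from the question, with return characters (0x0d) between lines.
text=$(printf 'qwertyuiop\rmnbvcxza\roqeryuiiop\rConstant 1234567890123456-098765432109876-8765432118976543\rodgnsgnsanfadf\rjoiergjdfmgadfs')

# grep works on linefeed-terminated lines, so without tr it would see one long
# line and print all of it. After tr, only the matching line is printed.
printf '%s\n' "$text" | tr '\r' '\n' | grep 'Constant'
```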
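If the constant contains special characters it's a bit more complicated, because you have to escape them so that grep does not interpret them as regular-expression syntax (unescaped, [Constant] is a bracket expression that matches any single one of the characters C, o, n, s, t, a).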
set theConstant to "\\[Constant\\]"
set theText to "qwertyuiop
mnbvcxza
oqeryuiiop
[Constant] 1234567890123456-098765432109876-8765432118976543
odgnsgnsanfadf
joiergjdfmgadfs"
set foundLine to do shell script "echo " & quoted form of theText & " | tr '\\r' '\\n' | grep " & quoted form of theConstant
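The effect of the escaping can be checked in a shell. A minimal sketch using sample lines from the question: unescaped, the pattern matches any line containing one of its letters; escaped, it matches only the literal text. A closing ] outside a bracket expression is already literal, so only the opening [ strictly needs the backslash. The last command shows grep -F, a swapped-in alternative not used in the answer, which treats the pattern as a fixed string so no escaping is needed.

```shell
log='qwertyuiop
[Constant] 1234567890123456-098765432109876-8765432118976543
odgnsgnsanfadf'

# Unescaped: [Constant] is a bracket expression matching any one of C, o, n, s, t, a,
# so all three sample lines match.
printf '%s\n' "$log" | grep '[Constant]'

# Escaped opening bracket: matches the literal text "[Constant]" only.
printf '%s\n' "$log" | grep '\[Constant]'

# Fixed-string search: no regex interpretation at all.
printf '%s\n' "$log" | grep -F '[Constant]'
```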
If you want to read the text from a file on disk, you can use this:
set logFile to (path to library folder from user domain as text) & "Logs:myLogFile.log"
set theText to read file logFile as «class utf8»
Your question is puzzling. Do you want to parse a text/log file, or do you want a script that works with the GUI of some app? Your code suggests the latter...
If you want to parse a log file, which is easier, you can use the good old Unix tools that OS X comes with. You can use them from inside AppleScript like this...
set logfile to "/some/path/file.log"
# Quote the string in case it contains spaces... or add single quotes above...
set qlogfile to quoted form of logfile
# Prepare the shell command to run
set cmd to "grep '^\\[Constant]' " & qlogfile & " | cut -c 12- | tr '-' '\\n'"
# Run it and capture the output
try
set cmdoutput to (do shell script cmd)
on error
# Oh no, command errored. Best we do something there
end try
The result looks like this...
tell current application
do shell script "grep '^\\[Constant]' '/some/path/file.log' | cut -c 12- | tr '-' '\\n'"
--> "1234567890123456
098765432109876
8765432118976543"
end tell
Result:
"1234567890123456
098765432109876
8765432118976543"
To break it down, the shell commands are:
grep ... | reads the file, selects all lines that start (^) with the text [Constant], and passes (|) what it finds on to the next command
cut -c 12- keeps the characters from position 12 to the end of the line (the trailing - means "until the end")
tr replaces every - character with \n, the newline character in Unix
The doubled backslashes (\\) are there because the command is written inside an AppleScript string; if you run it directly in Terminal, you need only one.
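The pipeline just described can be tried directly in a shell, with single backslashes. A minimal sketch; the log file is a hypothetical temporary file filled with sample lines from the question.

```shell
# Hypothetical sample log written to a temporary file, removed when the shell exits.
logfile=$(mktemp)
trap 'rm -f "$logfile"' EXIT
printf '%s\n' 'qwertyuiop' \
  '[Constant] 1234567890123456-098765432109876-8765432118976543' \
  'odgnsgnsanfadf' > "$logfile"

# grep: keep lines starting with [Constant]
# cut:  drop the first 11 characters ("[Constant] "), keeping column 12 onward
# tr:   put each dash-separated value on its own line
grep '^\[Constant]' "$logfile" | cut -c 12- | tr '-' '\n'
```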
If you need to know which values came from which line, remove the last command, | tr '-' '\\n', and it will return
Result:
"1234567890123456-098765432109876-8765432118976543"

Unix grep command outputs garbage

I'm executing the command "grep bruno < bash.txt", which gives me the right output "bruno" but also the garbage "\f0\fs24 \cf0".
I'm using the command shell on Mac OS X v10.6.8, and I'm pretty sure I should be getting only the line containing the found word, not garbage.
This is the Output:
Mobile-Devs-MacBook-Pro:Screenshots Poupe mdev$ grep bruno < bash.txt
\f0\fs24 \cf0 bruno\
In bash.txt I have written only "bruno"; if I output it with "cat bash.txt", it also gives me the following garbage:
Mobile-Devs-MacBook-Pro:Screenshots Poupe mdev$ cat bash.txt
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf360
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
\paperw11900\paperh16840\margl1440\margr1440\vieww9000\viewh8400\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\ql\qnatural\pardirnatural
\f0\fs24 \cf0 bruno\
If I run "echo bruno > bash.txt" and then "cat bash.txt", I get clean output. Why am I not seeing clean output when I write the file by hand?
Your file isn't a plain text file. It is RTF. grep is giving you the line containing "bruno", along with the rich text formatting.
When you do:
echo bruno > bash.txt
bash.txt contains only "bruno".
When you "edit the file by hand", your editor is saving as RTF. You need to save as plain text.
That isn't a plain text file. That looks like an RTF. Grep only understands text and its job is to output the entire line where the search text is found.
I cannot tell from your formatting, but I have to believe the "garbage" you are seeing is on the same line as the "bruno" text.
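This can be reproduced with a shortened RTF file. A minimal sketch; the RTF below is a trimmed-down version of the file shown in the question, written to a hypothetical temporary file. grep prints the whole matching line, and in the RTF that line carries the formatting codes.

```shell
# Shortened version of the RTF that TextEdit saved, written to a temporary file.
rtf_file=$(mktemp)
trap 'rm -f "$rtf_file"' EXIT
cat > "$rtf_file" <<'EOF'
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf360
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
\f0\fs24 \cf0 bruno\
}
EOF

# grep prints the entire matching line, formatting codes included.
grep bruno "$rtf_file"
```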
As others have pointed out, the problem is that the file is in RTF format, and contains formatting information. If you want to create a plain text file in TextEdit, use the menu option Format > Make Plain Text before saving it. Better yet, don't use TextEdit at all -- my favorite for plain text editing is TextWrangler, but there are plenty of other options.

Script to convert lower case characters into upper case is working differently as service action

I am trying a simple script as a service action in Automator which performs this function:
Receives selected text in any application and replaces selected text
with the text containing capital letters
So I used this script:
on run {input, parameters}
set upperCaseString to ""
repeat with i in input
if (ASCII number i) > 96 and (ASCII number i) < 123 then
set upperCaseString to upperCaseString & (ASCII character ((ASCII number i) - 32))
else
set upperCaseString to upperCaseString & (ASCII character (ASCII number i))
end if
end repeat
return upperCaseString
end run
But I found this problem:
It returns only the first letter of the input, in upper case, e.g.
input: lowercasetext, output: L, whereas the expected output was
LOWERCASETEXT.
To check the problem I added this line of code in the repeat loop:
display dialog i
and found that it displays the complete text instead of a single character at a time, i.e. instead of displaying l.. o.. w.. for lowercasetext, it displays lowercasetext at once.
Can anyone tell me why it misbehaves as a service action while it works fine in AppleScript Editor?
This works for a lot of languages:
on toUpper(s)
tell AppleScript to return do shell script "shopt -u xpg_echo; export LANG='" & user locale of (system info) & ".UTF-8'; echo " & quoted form of s & " | tr '[:lower:]' '[:upper:]'"
end toUpper
on toLower(s)
tell AppleScript to return do shell script "shopt -u xpg_echo; export LANG='" & user locale of (system info) & ".UTF-8'; echo " & quoted form of s & " | tr '[:upper:]' '[:lower:]'"
end toLower
When I run your script, I get the correct result. But one thing you may want to do is to explicitly coerce your result to text. The easiest way to do that would be at the end:
return upperCaseString as text
That may or may not do it for you, but you'll avoid a lot of frustration if you explicitly coerce data when there is a possibility of ambiguity.
Another (faster) way is to leverage the Unix tr (translate) command via do shell script:
set upperCaseString to ¬
(do shell script ("echo " & quoted form of (input as text) & " | tr a-z A-Z;"))
That's enough for English, but you can also add translations for accented characters, like so:
set upperCaseString to ¬
(do shell script ("echo " & quoted form of (input as text) & " | tr a-zäáà A-ZÄÁÀ;"))
tr will translate any character into any other, so you can add any characters you may encounter and what you'd like them to translate to. A 'leet-speak' translator comes to mind.
You will get the same result in AppleScript Editor if the input variable is set to a list. The input parameter of an Automator action is also a list, so repeat with i in input runs only once, with i referring to the whole text; ASCII number then looks only at the first character, which is why you get just "L". Note that the id property of text (character id) has made the ASCII character and ASCII number commands obsolete; see the AppleScript release notes for Mac OS X 10.5.
@Matt Strange:
You could also try:
set upperCaseString to ¬
do shell script "echo " & quoted form of (input as text) & " | tr '[:lower:]' '[:upper:]'"
If you run man tr on OS X 10.10, you will see that the manual page recommends the character classes [:lower:] and [:upper:] instead of explicit character ranges such as a-z or A-Z, because, as explained there, such ranges may not produce correct results.
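The character-class version can be tried in a shell on its own. A minimal sketch; to_upper and to_lower are hypothetical helper names. The quotes around [:lower:] and [:upper:] keep the shell from treating them as filename patterns, which could otherwise expand if a matching single-letter file exists in the current directory.

```shell
# Hypothetical helpers built on tr's POSIX character classes.
to_upper() { printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'; }
to_lower() { printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'; }

to_upper 'lowercasetext'
to_lower 'MiXeD Case'
```

Whether accented letters are converted depends on the locale and the tr implementation; GNU tr, for example, works on single bytes, so multibyte UTF-8 characters should be left unchanged.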
