extract string between two strings from text document using AppleScript - macos

I am very new to writing code. I've been looking at every way I can find of locating a string in a text document and then returning part of the string on the following line. The end goal is to put this extracted string into an Excel file, but I'm nowhere near that step yet. I've been playing around with a lot of different options and I cannot for the life of me get it to work. I feel like I'm close, and it's killing me because I just can't figure out where I'm going wrong.
Goal: to extract the name of the person who posted the job from the text below without knowing the person's name. I know the string "Job posted by" will immediately precede the name I'm looking for, and I know " · " will immediately follow the name. Neither of these surrounding strings appears anywhere else in the text document.
I'm running OS X El Capitan
file name for this example is ExtractedTextOutput.txt
file location for this example is "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt"
My attempt so far is below. My problem is that it seems to simply return the entire text document instead of just the name I'm looking for:
set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
set theFileContents to read theFile
set output to {}
set od to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"
"}
set all_lines to every text item of theFileContents
repeat with the_line in all_lines
if "Job posted by" is not in the_line then
set output to output & the_line
else
set AppleScript's text item delimiters to {"Job posted by"}
set latter_part to last text item of the_line
set AppleScript's text item delimiters to {" "}
set last_word to last text item of latter_part
set output to output & ("$ " & last_word as string)
end if
end repeat
set AppleScript's text item delimiters to {"
"}
set output to output as string
set AppleScript's text item delimiters to od
return output
Any and all help and ideas are enormously appreciated.
Sample text in the file:
9/2/2016 Application Security Engineer Job at Datadog in Greater New York City Area | LinkedIn
60
Home Profile
Job description
My Network Jobs
 Search for people, jobs, companies, and more... Interests
 Advanced
 
Business Services

Go to Lynda.c
Application Security Engineer
Datadog
Greater New York City Area
Posted 15 days ago 93 views
1 alum works here
Apply on company website
We’re on a mission to bring sanity to cloud operations and we need you to build resilient and secure applications on our platform. What you will do
Perform code and design reviews, contribute code that improves security throughout Datadog's products Educate your fellow engineers about security in code and infrastructure
Monitor production applications for anomalous activity
Prioritize and track application security issues across the company
Help improve our security policies and processes
Job posted by
Ryan Elberg · 2nd
Head of Tech Talent Acquisition at Datadog Greater New York City Area
Send Inmail

I had some difficulty determining exactly what your second separator is. Your text example shows '·', but when I checked what comes just after 'Elberg' and before '2nd...', I found 4 characters: code 32 (space), code 194 (¬), code 183 (∑), code 32 (space). Bytes 194 and 183 are the UTF-8 encoding of '·'; read without an "as" parameter interprets the file in the system's primary encoding (Mac Roman on an English system), where those bytes display as '¬' and '∑'.
In the script below, I used code 194. It works when I paste your text example into a file. Here is the script:
set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
-- your separator seems to be code 32 (space), code 194 (¬), code 183 (∑), code 32 (space)
set Separator to ASCII character 194 -- "¬", the first byte of the UTF-8 middle dot as read in Mac Roman
set theFileContents to read theFile
set myAuthor to ""
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"Job posted by "}
if (count of text items of theFileContents) is 2 then
set Part2 to text item 2 of theFileContents -- this part starts just after "Job posted by "
set AppleScript's text item delimiters to {Separator}
set myAuthor to text item 1 of Part2
end if
set AppleScript's text item delimiters to savedDelimiters -- restore the previous delimiters
log "result=//" & myAuthor & "//" -- show the result in variable myAuthor
Note: if the text does not contain "Job posted by ", myAuthor stays empty ("").
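Instead of matching the Mac Roman byte 194, another option is to read the file as UTF-8 so that the middle dot stays a single "·" character and can be used directly as the end delimiter. This is a sketch that assumes the file really is UTF-8; the handler names textBetween and trimWhitespace are illustrative, not built-ins. It also trims the line break and spaces around the name.

```applescript
-- Returns the text between the first occurrence of startText and the next
-- occurrence of endText, with surrounding spaces/line breaks removed.
-- Returns "" if startText is not found. (Hypothetical helper, not a built-in.)
on textBetween(theText, startText, endText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to startText
	if (count of text items of theText) < 2 then
		set AppleScript's text item delimiters to savedTIDs
		return ""
	end if
	set afterStart to text item 2 of theText
	set AppleScript's text item delimiters to endText
	set foundText to text item 1 of afterStart
	set AppleScript's text item delimiters to savedTIDs
	return trimWhitespace(foundText)
end textBetween

-- Strips leading and trailing spaces, tabs and line breaks.
on trimWhitespace(theText)
	set whitespaceChars to {space, tab, return, linefeed}
	repeat while theText is not "" and character 1 of theText is in whitespaceChars
		if length of theText is 1 then return ""
		set theText to text 2 thru -1 of theText
	end repeat
	repeat while theText is not "" and character -1 of theText is in whitespaceChars
		if length of theText is 1 then return ""
		set theText to text 1 thru -2 of theText
	end repeat
	return theText
end trimWhitespace

-- With the real file (assumed to be UTF-8), for example:
-- set theFileContents to read (POSIX file "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt") as «class utf8»
set sampleText to "Job posted by" & linefeed & "Ryan Elberg · 2nd" & linefeed & "Head of Tech Talent Acquisition at Datadog"
set myAuthor to textBetween(sampleText, "Job posted by", "·")
log myAuthor
```

Here myAuthor comes back as "Ryan Elberg". Keeping the search in a handler also makes it easy to reuse for other start/end pairs in the same file.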

You had the right idea to use AppleScript's text item delimiters, but the way you tried to extract the name was giving you trouble. First, though, I'll go through some things you can do to improve your script:
set all_lines to every text item of theFileContents
repeat with the_line in all_lines
if "Job posted by" is not in the_line then
set output to output & the_line
else
…
end repeat
There's no need to break the file contents into lines; text item delimiters can operate on the entire text at once.
Removing these unnecessary steps (and adding new ones to make it work on the entire file) shrinks the script considerably:
set theFile to ("/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt")
set theFileContents to read theFile
set output to {}
set od to AppleScript's text item delimiters
if "Job posted by" is in theFileContents then
set AppleScript's text item delimiters to {"Job posted by"}
set latter_part to last text item of theFileContents
set AppleScript's text item delimiters to {" "}
set last_word to last text item of latter_part
set output to output & ("$ " & last_word as string)
else
display alert "Poster of job listing not found"
set output to theFileContents
end if
set AppleScript's text item delimiters to od
return output
This is what's giving you the wrong output:
set last_word to last text item of latter_part
set output to output & ("$ " & last_word as string)
This is incorrect. It's not the last word you want; that's the last word of the file! To extract the poster of the job listing, change it to the following:
repeat with theChar in characters of latterPart
set thisChar to contents of theChar
if thisChar is "¬" then exit repeat
set output to output & thisChar
end repeat
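Each pass of the loop handles one character of latterPart, and the loop stops at the separator. Because read, without an "as" parameter, interprets the file in the system's primary encoding (Mac Roman on an English system), the UTF-8 dot (·) that separates the name from the rest of the line arrives as the two characters "¬∑". So, we look for "¬" instead.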
Some last code fixes:
Some of your variable names use the_snake_case, while others use theCamelCase. It's generally a good idea to use one convention or another, so I fixed that, too.
I assumed you wanted that dollar sign in the output for whatever reason, so I kept it in. If you don't want it, just replace set output to "$ " with set output to "".
So, your final, working script looks like this:
set theFile to "/Users/RaquelBianca/Desktop/ExtractTextOutput2.txt"
set theFileContents to read theFile as text
set output to "$ "
set od to AppleScript's text item delimiters
if "Job posted by" is in theFileContents then
set AppleScript's text item delimiters to {"Job posted by"}
set latterPart to last text item of theFileContents
-- walk the rest of the text one character at a time, stopping at "¬" (the first half of "¬∑")
repeat with theChar in characters of latterPart
set thisChar to contents of theChar
if thisChar is "¬" then exit repeat
-- skip the line break that follows "Job posted by"
if thisChar is not in {return, linefeed} then set output to output & thisChar
end repeat
-- drop the space that precedes the separator
if output ends with " " then set output to text 1 thru -2 of output
else
display alert "Poster of job listing not found"
set output to theFileContents
end if
set AppleScript's text item delimiters to od
return output

Related

Extract Text Between Two Strings Repeatedly in AppleScript

I'm one of many AppleScript beginners here. It's now going on 3am, I've done all the reading I possibly can, and I still haven't found my answer. Hopefully some experts can shed some light.
I'm looking to extract multiple values that sit between two strings in a block of HTML code, REPEATEDLY. (The HTML string is obtained by using JavaScript to look up a particular id/class on a site.)
After hours of searching and reading, I've found many discussions of this using AppleScript's text item delimiters. However, all of them extract a value only once.
I thought a repeat statement might be my answer, but it doesn't seem to apply here (most likely because I'm such a noob).
By far this is the most commonly used method:
set AppleScript's text item delimiters to startText
set text1 to text item 2 of InputString -- everything after the first startText
set AppleScript's text item delimiters to endText
set text2 to text item 1 of text1 -- everything before the next endText
set AppleScript's text item delimiters to {""}
The problem is that it only extracts the first value and ignores any further start/end strings in the input string.
In the post "Applescript to remove all text not between two strings", someone gave a simple shell script that achieved what the OP was asking for (and is by far the closest to what I'm looking to do). I wish I could use that, but as a noob I have no idea how to adapt a shell script.
Thank you so much!
EDIT:
At one expert's request, I'm adding a sample string and the expected output to demonstrate my goal.
<div class="table-1"><div class="row"><div class="table-3">Customer ID:</div><div class="table-5">1234567890</div></div><div id="title" class="row"><div class="table-3">Title:</div><div class="table-5"></div></div><div id="customer-name" class="row"><div class="table-3">Name:</div><div class="table-5"><span>FirstName LastName</span> </div></div><div id="primary-email" class="row"><div class="table-3">Primary Email:</div><div class="table-5">test_123#google.com</div></div><div id="customer-email" class="row"><div class="table-3">Account Email:</div><div class="table-5">test_abc#google.com</div></div></div>
Goal is to obtain the customer ID, name and account email.
With the method provided by wch1zpink, I was able to erase all the HTML strings, but that creates a bigger problem: now all of the values I need are in one long string that cannot be separated. I understand this is no easy task, and I may not be approaching this in the right direction at all. I greatly appreciate all of your kind help!
PS.
I thought about having the script find any text that appears between a ">" and a "<". If "><" occurs, there is no value, so move on. At the end it should give me the values I need plus some extras such as "Name:" or "Title:". Then, if the output can be itemized as a list, I can grab each item by its number. Of course, that's just noob talk; I wish I knew how.
EDIT2:
Instead of extracting 3 values all at once from a long, inconsistent block of text, I've decided to use different methods to extract each value individually, and I've tentatively achieved my goal. The erase method provided by wch1zpink proved very helpful. Once again, thank you all for chipping in!
PSS.
I welcome any future additional comments/feedback/suggestions! :D
This AppleScript code works for me using the latest version of macOS Mojave.
-- Define Source Text Here
set fullTextString to "<p>I thought repeat statement</p> <p>After hours of searching/reading</p>"
-- Define As Many Strings As You Want Removed Here
set removeFromFullTextString to {"<p>", "</p>"}
set cleanedText to stripOuterTextTID(fullTextString, removeFromFullTextString)
on stripOuterTextTID(fullTextString, removeFromFullTextString)
set savedTIDs to AppleScript's text item delimiters
-- split the text on every string that should be removed...
set AppleScript's text item delimiters to removeFromFullTextString
set tempText to text items of fullTextString
-- ...then join the pieces back together with nothing in between
set AppleScript's text item delimiters to ""
set cleanedText to tempText as text
set AppleScript's text item delimiters to savedTIDs
return cleanedText
end stripOuterTextTID
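The handler above removes the delimiters in one pass, which is why the values end up run together. To collect each value between a start string and an end string into a list (what the question originally asked for), one approach is to split on the start string first and then cut each piece at the end string. This is a sketch; extractAllBetween is a hypothetical name, and it deliberately skips empty values.

```applescript
-- Returns a list of every non-empty piece of text found between startText
-- and the next endText. (Hypothetical helper, not a built-in.)
on extractAllBetween(theText, startText, endText)
	set savedTIDs to AppleScript's text item delimiters
	set foundValues to {}
	-- Every text item after the first begins right after an occurrence of startText.
	set AppleScript's text item delimiters to startText
	set chunks to text items of theText
	set AppleScript's text item delimiters to endText
	repeat with i from 2 to (count chunks)
		set thisChunk to item i of chunks
		-- Only keep chunks that are properly closed by endText.
		if thisChunk contains endText then
			set theValue to text item 1 of thisChunk
			if theValue is not "" then set end of foundValues to theValue
		end if
	end repeat
	set AppleScript's text item delimiters to savedTIDs
	return foundValues
end extractAllBetween

set sampleHTML to "<div class=\"table-5\">1234567890</div><div class=\"table-5\"></div><div class=\"table-5\">test_abc#google.com</div>"
-- Values between a specific opening tag and its closing tag:
set tableValues to extractAllBetween(sampleHTML, "<div class=\"table-5\">", "</div>")
-- tableValues is {"1234567890", "test_abc#google.com"}; the empty div is skipped
```

The same handler covers the idea in the question's PS: extractAllBetween(sampleHTML, ">", "<") returns the text between every ">" and the following "<", dropping the empty "><" gaps, so labels such as "Customer ID:" come back as list items too and each value can be picked by its index.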

Use paragraph-separated list to remove songs from iTunes playlist

I can get songs from a playlist based on a text file with a different title in each paragraph, but can I use another text file to delete any songs in playlist 'SongList' whose title matches one of the titles in my 'SongList' text file?
I know how to read the text file:
set mySampleText to read file "Macintosh SSD:path:to:file.txt"
And how to get its paragraphs:
set paras to paragraphs of mySampleText
But I can't find a way to delete any tracks who share a name with any line of mySampleText.
Can anyone help?
Thanks
Tardy
The script below finds tracks whose title matches a line in your text file and deletes them.
Be careful, there is no reverse to the delete action !!!
set FText to choose file with prompt "Select your text file" -- select text file
set mySampleText to read FText -- read txt file
set Paras to paragraphs of mySampleText
tell application "iTunes"
tell playlist "My preferred playlist"
repeat with aSong in Paras -- loop to each title
-- contents of turns the loop reference into a plain string for the whose clause
set myTracks to (every track whose name is (contents of aSong))
if (count of myTracks) > 0 then -- only if found
set myTrack to item 1 of myTracks -- take first item found
delete myTrack
end if
end repeat
end tell
end tell
You may have to manage the case when you have multiple tracks with the same title. Script above only takes the 1st track found.
I strongly recommend replacing "delete" with "play" while debugging, to be safe!
Last, but not least, make sure your txt file is properly encoded if you have special characters in your titles.
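Following on from the encoding remark, one way to prepare the title list is to read the file explicitly as UTF-8 and drop blank lines (a trailing line break or an empty line between titles would otherwise produce an empty title to search for). A minimal sketch, assuming the file is UTF-8; nonEmptyLines is a hypothetical helper name.

```applescript
-- Returns the paragraphs of theText, skipping empty lines.
-- (Hypothetical helper, not a built-in.)
on nonEmptyLines(theText)
	set titles to {}
	set allLines to paragraphs of theText
	repeat with aLine in allLines
		-- contents of turns the loop reference into a plain string,
		-- which is safer to pass to an application's whose clause
		set thisTitle to contents of aLine
		if thisTitle is not "" then set end of titles to thisTitle
	end repeat
	return titles
end nonEmptyLines

-- With a real file (assumed to be UTF-8), for example:
-- set Paras to nonEmptyLines(read (choose file with prompt "Select your text file") as «class utf8»)
set Paras to nonEmptyLines("First Song" & linefeed & linefeed & "Second Song" & linefeed)
```

The resulting Paras list can replace the one built from paragraphs of mySampleText in the iTunes loop unchanged.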

AppleScript loop to ask (3) questions, select multiple finder files, combine results into a text file

I am looking for some help getting an AppleScript set up. I have been trying to copy and paste from different examples on the web, to no avail. I am setting up a journal / diary for a family member and need a text file that contains the following information.
The AppleScript will display a dialogue box asking for three things:
The name of an event
The date of the event
A description for the event
Each of those would be stored as a separate variable.
Then the script would ask for a selection of files from the Finder, nothing nested, just a selection of 15 - 30 files all contained in the same folder.
Finally a new TextEdit document would be created
The beginning of the document would have the (3) variables mixed in with some default text.
The middle of the file would be filled in with a repeat loop based on the number of files selected from the finder. Their file paths would be mixed in with additional default text.
The last section would be default text only, no variables required.
I am sure my description is way more complicated than the script will probably be. Would anyone be able to provide this script for me? It would be most appreciated.
Here is a rough idea what the final thing would look like. The bold areas are the variables.
The activity of the day was scuba diving.
The date you went scuba diving was January 1, 2016.
This is a description of your event. The day was quite beautiful and the water was perfect. You were able to see a wide variety of fishes!
These are the locations of the files from this event.
The first file is /events/scuba/scuba1.txt
These are the locations of the files from this event.
The first file is /events/scuba/scuba2.txt
These are the locations of the files from this event.
The first file is /events/scuba/scuba3.txt
This was a summary of your scuba diving activity. These memories will last a lifetime!
I appreciate the help with this. And if the family member in question were able to say thanks, know that they would as well.
You can do that like this:
set evName to text returned of (display dialog "The name of an event" default answer "")
set evdate to text returned of (display dialog "The date of the event" default answer "")
set evDesc to text returned of (display dialog "A description for the event" default answer "")
set theText to "The activity of the day was " & evName & "." & return & "The date you went " & evName & " was " & evdate & "." & return & evDesc & return & return
set x to choose file with multiple selections allowed
set def1 to "These are the locations of the files from this event."
set def2 to "The first file is "
repeat with i in x
set theText to theText & def1 & return & def2 & (POSIX path of i) & return
end repeat
set theText to theText & return & "This was a summary of your " & evName & " activity. These memories will last a lifetime!"
tell application "TextEdit"
make new document with properties {text:theText}
activate
end tell
May I suggest an alternative solution using Evernote?
You could create a "template" note using a table to fill in the activity, date, and description. Any time you want a new journal entry, just select the template, and go to Note > Copy to Notebook.
Then you can attach and/or import the text of the files.
This would also allow you to add images and other attachments, and search much easier. And of course it is easy to share.
Let me know if you'd like more details.

How can I get a URL from a selected hyperlink using Applescript?

I’m attempting to create an Applescript that will grab the URL from a selected hyperlink.
For some backstory: the system that my company has in place doesn’t play well with generating reports, so I created a script into which I can paste a list of URLs, at which point Safari will go through each page and select all the data, copy it, and parse out what I need.
However, each page that I'm parsing has a link on it that says, for example, "Edit". If I paste it into, say, Pages, the hyperlink is preserved. It would GREATLY speed up my workflow if I could somehow get the URL contained in that hyperlink.
Any ideas?
Drew, I suspect you got no answer because it's a little difficult to discern what you want. But here is a script that grabs the raw text of a web page, finds the first href hyperlink whose link text is "Edit", and returns the target URL it links to. It uses curl to download the page and offset to find the link text. You might have to adjust the tag identifiers surrounding the link name you're searching for.
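property baseURL : "http://www.mycoolsite.index.html"
property linkName : "Edit"
-- download the raw HTML of the page
set rawHTML to do shell script "curl -s " & quoted form of baseURL
-- find where the link text appears, e.g. ...">Edit</a>
set theOffset to offset of ("\">" & linkName & "</a>") in rawHTML
if theOffset is 0 then error "Link \"" & linkName & "\" not found"
-- keep only the HTML before the link text; the href value sits at its very end
set rawHTML to text 1 thru (theOffset - 1) of rawHTML
set otid to AppleScript's text item delimiters
set AppleScript's text item delimiters to "http://"
-- the last "http://" before the link text starts the target URL; put the prefix back
set targetURL to "http://" & (text item -1 of rawHTML)
set AppleScript's text item delimiters to otid
return targetURL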
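If the link might use https or a relative path, splitting on "http://" will miss it. A sketch of an alternative, assuming the link is written as <a href="...">Edit</a> with href as the last attribute before the link text: cut the HTML at the link text, then take whatever follows the last href=" before it. hrefForLinkText is a hypothetical name, and the sample HTML is made up for illustration.

```applescript
-- Returns the href value of the first <a ...>linkText</a>, or missing value
-- if no such link exists. Assumes href is the last attribute in the tag.
-- (Hypothetical helper, not a built-in.)
on hrefForLinkText(rawHTML, linkText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "\">" & linkText & "</a>"
	if (count of text items of rawHTML) < 2 then
		set AppleScript's text item delimiters to savedTIDs
		return missing value
	end if
	-- everything before the link text; it ends with the href value
	set beforeLink to text item 1 of rawHTML
	set AppleScript's text item delimiters to "href=\""
	set targetURL to text item -1 of beforeLink
	set AppleScript's text item delimiters to savedTIDs
	return targetURL
end hrefForLinkText

set sampleHTML to "<p><a href=\"/reports\">Reports</a> <a class=\"btn\" href=\"https://example.com/items/42/edit\">Edit</a></p>"
set targetURL to hrefForLinkText(sampleHTML, "Edit")
-- targetURL is "https://example.com/items/42/edit"
```

The rawHTML returned by curl can be passed straight in as hrefForLinkText(rawHTML, linkName). A relative result such as "/items/42/edit" still needs the site's base address prepended.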

Using rb-appscript to write a bulleted/numbered list in pages or textedit

I need to use rb-appscript to create a new Pages document that contains bulleted and numbered lists. Looking into this, I see that paragraphs have a property called list_style, but I'm not familiar enough with rb-appscript or AppleScript to figure out how to set that property. I have read the documentation generated by ASDictionary, but my knowledge of AppleScript is apparently too limited to understand it.
Any help with either understanding how to use the information presented in the documentation, or writing a list using rb-appscript in pages would be much appreciated.
Edit: I'm not stuck on Pages; TextEdit is also a viable option.
rb-appscript:
require 'rubygems'
require 'appscript'; include Appscript
lst=["a", "b"]
doc = app('Pages').documents[0]
doc.selection.get.paragraph_style.set("Body Bullet")
doc.selection.set(lst.join("\n"))
AppleScript:
set lst to {"a", "b"}
set text item delimiters to linefeed
tell application "Pages" to tell document 1
set paragraph style of (get selection) to "Body Bullet"
set selection to (lst as text)
end tell
The current crop of Apple applications are weird to script. I don't use rb-appscript, but here is working code for Applescript that you should be able to alter to taste and port:
property dummyList : {"Tyler Durden", "Marla Singer", "Robert Paulson"}
tell application "Pages"
set theDocument to make new document
tell theDocument
set bulletListStyle to ""
set lastListStyle to (count list styles)
repeat with thisListStyle from 1 to lastListStyle
set theListStyle to item thisListStyle of list styles
if name of theListStyle is "Bullet" then
set bulletListStyle to theListStyle
end if
end repeat
repeat with thisItem from 1 to (count dummyList)
set body text to body text & item thisItem of dummyList & return
end repeat
set paraCount to count paragraphs of theDocument
repeat with thisPara from 1 to paraCount
select paragraph thisPara
set theSelection to selection
set paragraph style of theSelection to "Body Bullet"
end repeat
end tell
end tell
What this does, essentially, is place each list item in its own paragraph (that is what a list item is for all intents and purposes: an indented paragraph with a bullet), select each paragraph in turn, and then apply the list paragraph style to the selection. For some reason, the paragraph object just returns the text of the given paragraph and does not hold any state of its own. This isn't the best way to handle this scenario, but at least all the components are there to get you what you need.
