Extract Text Between Two Strings Repeatedly in AppleScript - applescript

I'm one of many AppleScript beginners here. It's going on 3 a.m., I've done all the reading I can, and I still haven't found my answer. Hopefully some experts can shed some light.
I'm looking to REPEATEDLY extract multiple values that sit between two strings in a block of HTML. (The HTML string is obtained by using JavaScript to look up a particular id/class on a site.)
After hours of searching and reading, I've found many posts that do this with AppleScript's text item delimiters. However, so far, all of them extract a single match, once only.
I thought a repeat statement might be my answer, but it doesn't seem to apply here. (Most likely because I'm such a noob.)
This is by far the most commonly used method:
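-- Take the text that follows the first startText
set AppleScript's text item delimiters to startText
set text1 to text item 2 of InputString
-- Keep only the part of it that precedes endText
set AppleScript's text item delimiters to endText
set text2 to text item 1 of text1
-- Reset the delimiters
set AppleScript's text item delimiters to {""}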
Problem is, it only executes once and doesn't care if there are multiple start/end strings in the input string.
In the post "Applescript to remove all text not between two strings", someone gave a simple shell script that achieved what the OP was asking for (and is by far the closest to what I'm looking to do). I wish I could use it, but as a noob I have no idea how to adapt a shell script.
Thank you so much!
EDIT:
At one of the expert's request, I'm adding sample string and expected output to demonstrate my goal.
<div class="table-1"><div class="row"><div class="table-3">Customer ID:</div><div class="table-5">1234567890</div></div><div id="title" class="row"><div class="table-3">Title:</div><div class="table-5"></div></div><div id="customer-name" class="row"><div class="table-3">Name:</div><div class="table-5"><span>FirstName LastName</span> </div></div><div id="primary-email" class="row"><div class="table-3">Primary Email:</div><div class="table-5">test_123#google.com</div></div><div id="customer-email" class="row"><div class="table-3">Account Email:</div><div class="table-5">test_abc#google.com</div></div></div>
Goal is to obtain the customer ID, name and account email.
With the method provided by wch1zpink, I was able to erase all the HTML tags, but that presents a greater problem: all of the values I need are now in one long string that cannot be separated. I understand this is no easy task to tackle, and I may not be approaching it in the right direction at all. I greatly appreciate all of your kind help!
PS.
I thought about having the script find any text that appears between a ">" and a "<". If "><" occurs, there is no value, so move on. At the end it should give me the values I need plus some extras such as "Name:" or "Title:". Then, if the output can be itemized as a list, I can grab each item by its number. Of course, this is just noob talk; I wish I knew how.
EDIT2:
Instead of extracting 3 values all at once from a long, inconsistent block of text, I've decided to use different methods to extract each value individually, and I have tentatively achieved my goal. The erase method provided by wch1zpink proved to be very helpful. Once again, thank you all for chipping in!
PPS.
I welcome any future additional comments/feedback/suggestions! :D

This AppleScript code works for me using the latest version of macOS Mojave.
-- Define Source Text Here
set fullTextString to "<p>I thought repeat statement</p> <p>After hours of searching/reading</p>"
-- Define As Many Strings As You Want Removed Here
set removeFromFullTextString to {"<p>", "</p>"}
set cleanedText to stripOuterTextTID(fullTextString, removeFromFullTextString)

on stripOuterTextTID(fullTextString, removeFromFullTextString)
	-- Remember the current delimiters so they can be restored afterwards
	set savedDelimiters to AppleScript's text item delimiters
	-- Split the text on every string that should be removed
	set AppleScript's text item delimiters to removeFromFullTextString
	set tempText to text items of fullTextString
	-- Rejoin the pieces with nothing in between
	set AppleScript's text item delimiters to ""
	set cleanedText to tempText as text
	set AppleScript's text item delimiters to savedDelimiters
	return cleanedText
end stripOuterTextTID
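
The idea from the question's PS (collect whatever sits between ">" and "<", skip the empty "><" cases, and return a list that can be indexed) can also be sketched outside AppleScript. The following is a minimal Python sketch, not an AppleScript solution; `extract_between` is a hypothetical helper name, and the sample string is a shortened version of the HTML in the question.

```python
import re

def extract_between(text, start, end):
    """Return every non-empty substring found between start and end."""
    # re.escape lets start/end contain characters that are special in regexes
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    matches = re.findall(pattern, text, flags=re.DOTALL)
    # Drop empty or whitespace-only matches such as the "><" case
    return [m for m in matches if m.strip()]

html = ('<div class="table-3">Customer ID:</div>'
        '<div class="table-5">1234567890</div>'
        '<div class="table-3">Title:</div>'
        '<div class="table-5"></div>')

values = extract_between(html, ">", "<")
print(values)  # ['Customer ID:', '1234567890', 'Title:']
```

Because the result is a list, a value can be picked by position, for example `values[1]` for the customer ID in this sample. Labels and values alternate only while no field is empty, so matching on the label text is safer than relying on fixed positions.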

Related

How to find xpath of an element under a heading

In a web page:
<h3 class="xh-highlight">Units Currently On Bed List</h3>
"[total beds=0]
"
I want to find the XPath of the text [total beds=0].
How can I do that?
Your question and your comment are a bit contradictory. Do you want to find the text after a heading or do you want to find the element containing the text [total beds=0]? Also, how exact do you want to navigate your document?
To find a text after any h3 element you can use this: //h3/following-sibling::text()[1] (see XPath - select text after certain node).
To find a text after an h3 element with the class "xh-highlight" you can use this: //h3[@class='xh-highlight']/following-sibling::text()[1]
To be even more precise you can also look for the heading text: //h3[@class='xh-highlight' and text()='Units Currently On Bed List']/following-sibling::text()[1]
This doesn't match the html in your first comment however, so you might want to adjust the header class and text values. Also, it will find any first text even if there are other elements between it and the h3 element.
Now, your second comment makes it seem you actually want to find the element containing the text. The reason //*[text()='[total beds=0]'] doesn't work is because of the newline in the text. If you can get rid of that in the source it should match, otherwise you can "ignore" it in the xpath by using //*[normalize-space(text())='[total beds=0]']. (This is assuming the quotes around the text in your question aren't actually in the document.)
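
For readers without an XPath 1.0 engine at hand, the same lookup can be approximated with Python's standard library. This is only a sketch: `xml.etree.ElementTree` supports a limited XPath subset (attribute predicates, but not `following-sibling::text()`), so the text after the heading is read from the element's `tail` instead. The wrapping `<div>` is an assumption added to make the fragment well-formed.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper <div>; the question only shows the h3 and the text after it
snippet = """<div><h3 class="xh-highlight">Units Currently On Bed List</h3>
[total beds=0]
</div>"""

root = ET.fromstring(snippet)

# Attribute predicates use @class, as in the XPath expressions above
heading = root.find(".//h3[@class='xh-highlight']")

# .tail is the text that directly follows the element; strip() plays the
# role of normalize-space() for the surrounding newlines
text_after = (heading.tail or "").strip()
print(text_after)  # [total beds=0]
```

Unlike `following-sibling::text()[1]`, `tail` only covers text that comes immediately after the heading, before any other element.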

Selecting text from multiple paragraphs using xpath

I have a situation where my end points and mid points can vary.
I always have:
<p style="margin-top: 0px;" >
and
<p class="contactAdvisor">
In between, I will have varying items including <b>, <i>, <strong>, <br> and headings 1, 2 or 3. I might also have one or more <p> elements in between the two fixed items.
What I'm trying to get is all of the text in between these two elements no matter whether wrapped in headings, various stylings or inside sub paragraph elements.
I've messed around with contains and preceding/following-sibling but my best attempt has been to create based on pre/follow for each use case. And even that leaves me with some issues because if there are multiple <p> inside and I'm trying to select all of them, I only get one.
Depending on the hierarchy, you can use either preceding:: or preceding-sibling::.
Try selecting something along the lines of:
//*[preceding::p[@style="margin-top: 0px;"] and not (preceding::p[@class="contactAdvisor"])]
This should exclude everything before the first p with the first condition and everything after the second with the second. Untested so you may have to tweak the check a little.
//p[@style="margin-top: 0px;"]/following::*[following::p[@class="contactAdvisor"]]
//*[preceding::p[@style="margin-top: 0px;"] and following::p[@class="contactAdvisor"]]
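
The "after the first marker, before the second marker" logic of these expressions can be sketched in Python with the standard library. This is a rough analogue rather than an XPath evaluation, and it assumes the two marker paragraphs and everything between them are siblings under one parent; the sample markup and the helper name `text_between` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two fixed marker paragraphs with mixed content between
doc = ET.fromstring("""<div>
<p style="margin-top: 0px;">Start marker</p>
<h2>Heading</h2>
<p>First <b>bold</b> paragraph</p>
<p>Second <i>italic</i> paragraph</p>
<p class="contactAdvisor">End marker</p>
</div>""")

def text_between(parent):
    """Collect the text of every sibling between the two marker paragraphs."""
    collecting = False
    chunks = []
    for child in parent:
        if child.get("style") == "margin-top: 0px;":
            collecting = True   # start after the first marker
            continue
        if child.get("class") == "contactAdvisor":
            break               # stop at the second marker
        if collecting:
            # itertext() flattens nested <b>, <i>, <strong>, ... into plain text
            chunks.append("".join(child.itertext()).strip())
    return chunks

print(text_between(doc))
# ['Heading', 'First bold paragraph', 'Second italic paragraph']
```

If the markers are nested at different depths, the sibling assumption no longer holds and a real XPath engine with the `preceding::`/`following::` axes is the better tool.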

CKEDITOR How to find and wrap text in span

I am writing a CKEDITOR plugin that needs to wrap certain pieces of text in a tag. From a web service, I have an array of items that need to be wrapped. The array is just plain text strings, such as:
["best buy", "horrible migraine", "eat cake"]
I need to find the instances of this text in the editor and wrap them in a span tag.
This is further complicated because the text may be marked up. So the HTML for "best buy" might be
"<strong>best</strong> buy"
but the text returned from the web service is stripped of any markup.
I started trying to use a CKEDITOR.htmlParser() object, and that seems like it is moderately successful. I am able to catch the parser.onText event and check if the text contains anything in my array.
But then I cannot modify that text. Modifications are not persisted back to the source html. So I think using the htmlParser() is a dead-end.
What is the best way to accomplish this task?
Oh, and as a bonus, I also do not want to lose my user's current cursor position when the changes are displayed.
Here is what I wound up doing and it seems to be working so far.
I created a text filter rule that searches through my array of items for any item that is contained (or partially contained) in the text. If so, it wraps the element in my span.
A drawback here is that I wind up with two spans for items with markup. But in my use case, this is tolerable.
Then I set the results using:
editor.document.getBody().setHtml(results);
Because of this, I also have to strip this markup back out when this text gets read. I do this using an elements filter on editor.dataProcessor.htmlFilter.
This seems to be working well for my (so far limited) test cases.
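
The matching-and-wrapping step of such a text rule can be sketched independently of CKEditor. The Python below is only an illustration of the idea on a plain text run, not CKEditor API code; `wrap_phrases` and the `flagged` class name are hypothetical, and like the approach above it does not match phrases that are split by markup.

```python
import re

def wrap_phrases(text, phrases, css_class="flagged"):
    """Wrap each occurrence of any phrase in a <span>, ignoring case."""
    if not phrases:
        return text
    # Longest phrases first, so overlapping candidates prefer the longer match
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in ordered), re.IGNORECASE)
    # m.group(0) keeps the original casing of the matched text
    return pattern.sub(
        lambda m: f'<span class="{css_class}">{m.group(0)}</span>', text
    )

items = ["best buy", "horrible migraine", "eat cake"]
print(wrap_phrases("I had a horrible migraine after Best Buy.", items))
# I had a <span class="flagged">horrible migraine</span> after <span class="flagged">Best Buy</span>.
```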

How can I stop Joomla from stripping HTML code from the Contact info?

I've only spent maybe 30 mins searching online for this, and couldn't come up with a decent answer.
But anyway, in Joomla there are normal input fields for the Contacts component, but there's a textarea for the Address.
This would make me assume you can enter multiple lines of address in there, and it would be displayed as separate lines... but it doesn't. Even if I enter line breaks, the output is rendered on one line.
So I try to enter <br> to separate, and upon saving, Joomla strips these tags out.
In the template, the output is being written simply by echoing $this->contact->address
Is there any way to explode this input and replace line breaks with <br> tags?
UPDATE:
For now as a temporary measure I'm able to add HTML code into the database values, which saves and outputs on the front end.
On a separate note, I'm now looking to remove the Subject line from the contact form without hacking the code, using overrides as much as possible. Can anyone help?
Have you tried the Sourcerer extension?
Your question is pretty old, but did you get a solution to this Lee?
The easiest way to create line breaks in Joomla titles, text areas, etc. is to use the ReReplacer extension from NoNumber: http://extensions.joomla.org/extensions/edition/replace/4336
I personally use this to add line breaks in, e.g., menu-item titles, where <br /> isn't allowed and gets stripped.
With ReReplacer, you can create a custom tag, e.g. {br}, and then have ReReplacer replace {br} with <br />.
So every time you need a line break anywhere in Joomla where HTML usually gets stripped, you can just add {br} to get a new line.
Very old question but I've fallen into the same issue and tried to find a more user friendly solution.
You can enter multiple lines in the address textarea, and they are correctly outputted to the HTML page source. But as you know, newlines in HTML are not rendered, they have to be transformed to <br>.
For this PHP has a nice function, nl2br, that inserts a <br> each time it encounters a newline in a string.
So in html\com_contact\contact\default_address.php of your template, replace:
echo $this->contact->address;
with
echo nl2br($this->contact->address);
This does the job nicely: the user can type line breaks naturally in the contact address textarea, and each one is rendered with the appropriate <br>. I believe this is more user-friendly than the previous solution, which requires the user to insert {br} tags in the address field.
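
To make the effect of nl2br concrete, here is a small Python sketch of the same transformation. It is an illustration only, not Joomla or PHP code: it inserts `<br />` before each newline sequence and leaves the newline in place, which is what PHP documents for nl2br. The sample address is made up.

```python
import re

def nl2br(text):
    """Insert "<br />" before every newline sequence, keeping the newline."""
    # \r\n is listed first so a Windows line ending is treated as one break
    return re.sub(r"(\r\n|\n\r|\n|\r)", r"<br />\1", text)

address = "1 Example Street\nExample Town\nEX1 2MP"
print(nl2br(address))
# 1 Example Street<br />
# Example Town<br />
# EX1 2MP
```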

How can I query XPath matching an attribute from one tag against another attribute from another tag?

I'm scripting a web app that has labels for input fields. EG:
<label for="surname">Family Name:</label>...<input id="surname" ...></input>
I'd like to be able to write an XPath expression that takes the text "Family Name:" to match the label, then takes the "for" attribute from the label and uses that to find the associated input tag by matching its "id" attribute.
I can make this "work" as follows:
//label[contains(.,"Family Name:")]/following::input[1]
However, for this web app I think it would be more reliable to match for/id. (EG: How can I be sure that in all possible layouts the first input tag following the label is the one I want? And what happens if somewhere down the line this page is rendered using a right-to-left script?)
My ultimate aim is to create a library function that we can use to write QA scripts in advance of web pages to test against, when all we have is a picture or document with an input field labelled "Family Name:" and no idea what id some programmer will ultimately assign to the field.
Maybe this is what you're looking for?
//input[@id = //label[contains(., "Family Name:")]/@for]
As for your ultimate goal, you may want to take a look at the XPath gem for Ruby. Some of what you're doing may already be implemented there. (Specifically, check out the library's HTML Helpers)
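
The for/id join in that expression can also be written out step by step, which may help when building the library function described in the question. The Python below is a minimal sketch using only the standard library; `input_for_label` is a hypothetical helper name and the form markup is invented for the example, with an extra input placed before the target so that "first following input" would give the wrong answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical form: the input that follows the label is NOT the one it names
form = ET.fromstring("""<form>
<label for="surname">Family Name:</label>
<input id="given" />
<input id="surname" />
</form>""")

def input_for_label(root, label_text):
    """Find the <input> whose id equals the 'for' of the matching <label>."""
    for label in root.iter("label"):
        # Same idea as contains(., "Family Name:") in the XPath above
        if label_text in "".join(label.itertext()):
            target_id = label.get("for")
            for field in root.iter("input"):
                if field.get("id") == target_id:
                    return field
    return None

field = input_for_label(form, "Family Name:")
print(field.get("id"))  # surname
```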
