ImportXML and replacing quotes for enters - xpath

I'm trying to import a Google Play Store description into a Google spreadsheet, and that works fairly well with this formula:
=importXML("https://play.google.com/store/apps/details?id=com.facebook.katana", "//div[@itemprop='description']")
However, I'm running into the issue that this:
Keeping up with friends is faster than ever.<p>• See what friends are up to...</p>
Will be parsed as:
"Keeping up with friends is faster than ever.• See what friends are up to..."
Ideally I'd like to see the <p> tag replaced by a line break, or at least a space. I've been trying the following formula:
=importXML("https://play.google.com/store/apps/details?id=com.facebook.katana", "normalize-space(translate(//div[@itemprop='description'],'&quot;',' '))")
but this removes every occurrence of &, q, u, o, t and ;
How can I replace these HTML tags with a line break or a space?

You can actually use this:
=join(char(10),IMPORTXML("https://play.google.com/store/apps/details?id=com.facebook.katana","//*[@jsname='C4s9Ed']"))
which gives you a newline for each element. Note that if you also want to get rid of the • from the first example, you can substitute it with a space or a newline.
If you want a space instead of a newline between the elements, change the char(10) to " ".
Here is another app page I tried it with:
=join(char(10),IMPORTXML("https://play.google.com/store/apps/details?id=com.facebook.orca","//*[@jsname='C4s9Ed']"))

Try:
=SUBSTITUTE(importXML("https://play.google.com/store/apps/details?id=com.facebook.katana", "//div[@itemprop='description']"), "•"," ")
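
Outside Sheets, the underlying idea (turn each <p> into a line break before flattening the markup to text) can be sketched in Python. This is only an illustration, assuming the description HTML is already available as a string; the class and function names are made up for this example, and only the standard library's html.parser module is used.

```python
from html.parser import HTMLParser


class BreakingTextExtractor(HTMLParser):
    """Collect text content, emitting a newline wherever a <p> or <br> starts."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Treat paragraph and break tags as line breaks instead of dropping them.
        if tag in ("p", "br"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def text_with_breaks(html):
    """Hypothetical helper: flatten an HTML fragment to text, keeping breaks."""
    parser = BreakingTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = ("Keeping up with friends is faster than ever."
          "<p>• See what friends are up to...</p>")
result = text_with_breaks(sample)
print(result)
```

Appending " " instead of "\n" in handle_starttag gives the space-separated variant.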

How to get rid of ISBLANK extra spaces in the Numbers app?

This is in Numbers on Mac: I generate a paragraph of text from a row whose columns hold parts of the text. However, as some cells are empty, I had to use a conditional ISBLANK for the formula to work:
""&B9&" "&IF(ISBLANK(C9);"";C9)&" "&IF(ISBLANK(D9);"";D9)&" "&IF(ISBLANK(E9);"";E9)&" "&IF(ISBLANK(F9);"";F9)&" "&G9&" "&H9&""
This works, but I end up with double spaces wherever one or more columns are empty (each empty cell still contributes its separating space).
Is there a way I could use another conditional, like a nested ISBLANK(ISBLANK...), to get rid of those?
This is not a huge problem, but it is annoying and time-consuming, because I have to fix these texts afterwards (there are a lot of them). :-(
Change the parts with &" "& from IF(ISBLANK(C9);"";C9)&" "& into IF(ISBLANK(C9);"";C9&" ")&
In your formula you check whether C9 is blank, but whether it is blank or not, it is followed by a " " space. So if it is blank you get a space added with no data before it. If the next cell is empty too, you get another space without data, and so on.
By moving the &" " inside the IF statement, a space is only added when C9 is not blank. A blank cell adds no data and no space.
=TRIM(CONCATENATE(B9;" ";C9;" ";D9;" ";E9;" ";F9;" ";G9;" ";H9))
You could also look into TEXTJOIN, which can ignore empty cells.
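
The logic shared by these answers (only emit a separator between non-empty parts) can be sketched in Python. This is an illustration only, not a Numbers formula; the function name and the sample row are made up for the example.

```python
def join_nonblank(parts, sep=" "):
    """Join the parts with sep, skipping None, empty, and whitespace-only cells."""
    cleaned = [p.strip() for p in parts if p is not None and p.strip()]
    return sep.join(cleaned)


# Hypothetical row standing in for B9..H9, with some empty cells.
row = ["Dear", "", "Dr.", None, "  ", "Smith", "welcome"]
sentence = join_nonblank(row)
print(sentence)
```

Because empty cells are dropped before joining, no double spaces can appear, which is the same effect as moving the &" " inside the IF.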

Power Query remove repetitive substrings

I have a column in Power Query (standalone Power Query with Excel), with text like this:
"Hazelnut Berries Nuts Raspberry"
I need to be able to identify whether there is more than one instance of "nut" (or "berry") in it and remove the generic word, so that the result is
"Hazelnut Raspberry"
I have seen this post, but it works off whole words repeated.
I'm not entirely certain about your criteria for searching for the words you want to remove (PQ is fairly limited in how it can evaluate this with built in functions anyways). This will look through that string and remove any words that start with "Nut" or "Berr".
Text.Combine(List.Transform(Text.Split("Hazelnut Berries Nuts Raspberry", " "), each if (Text.StartsWith(_, "Nut") or Text.StartsWith(_, "Berr")) then null else _), " ")
Which will get your desired output. Don't know if you need more detailed criteria for evaluating each word, but that would probably need a custom function.
List.Distinct: https://learn.microsoft.com/en-ie/powerquery-m/list-distinct should do it; something like: List.Distinct(Text.Split("Hazelnut Berries Nuts Raspberry", " "))
You might need a bit more if your list could contain multiple spaces or other "stuff"
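
For comparison, the prefix-based filter from the M expression above can be sketched in Python. This is a minimal sketch under the same assumption as that answer: the generic words are recognised by the case-sensitive prefixes "Nut" and "Berr". The constant and function names are hypothetical.

```python
# Assumed prefixes, copied from the M expression in the answer above.
GENERIC_PREFIXES = ("Nut", "Berr")


def drop_generic_words(text, prefixes=GENERIC_PREFIXES):
    """Remove every space-separated word that starts with one of the prefixes."""
    kept = [word for word in text.split(" ") if not word.startswith(prefixes)]
    return " ".join(kept)


cleaned = drop_generic_words("Hazelnut Berries Nuts Raspberry")
print(cleaned)
```

The match is case-sensitive, so "Hazelnut" survives because it does not start with "Nut".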

Processing form input in a Joomla component

I am creating a Joomla component and one of the pages contains a form with a text input for an email address.
When a < character is typed in the input field, that character and everything after is not showing up in the input.
I tried $_POST['field'] and JFactory::getApplication()->input->getCmd('field')
I also tried alternatives for getCmd like getVar, getString, etc. but no success.
E.g. John Doe <j.doe@mail.com> returns only John Doe.
When the < is left out, as in John Doe j.doe@mail.com>, the value comes in correctly.
What can I do to also have the < character in the posted variable?
BTW, I had to use &lt; in this question to display it the way I want. This form suffers from the same problem!
You actually need to set the filtering that you want when you grab the input. Otherwise, you will get some heavy filtering. (Typically, I will also lose @ symbols.)
Replace this line:
JFactory::getApplication()->input->getCmd('field');
with this line:
JFactory::getApplication()->input->getRaw('field');
The name after the get part of the method is the filter that will be applied. Cmd strips everything except alphanumeric characters and ., -, and _. String runs the value through Joomla's HTML tag cleaning and, depending on your settings, will strip anything between < and >. (That usually doesn't happen for me, but my settings are generally pretty open, to the point of no filtering for super admins and such.)
getRaw should definitely work, but note that it applies no filtering at all, which can open security holes in your application.
The default text filter trims html from the input for your field. You should set the property
filter="raw"
in your form's manifest (xml) file, and then use getRaw() to retrieve the value. getCmd removes the non-alphanumeric characters.
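
To see why the filters lose the address, here is a rough Python approximation of the two behaviours described above. It is not Joomla's implementation: cmd_like keeps only the characters the answer lists for the Cmd filter, and strip_tags_like naively drops anything that looks like a tag. Both function names are invented for this sketch.

```python
import re


def cmd_like(value):
    """Approximate a Cmd-style filter: keep letters, digits, '.', '-', '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "", value)


def strip_tags_like(value):
    """Approximate tag cleaning: drop anything between '<' and '>'."""
    return re.sub(r"<[^>]*>", "", value).strip()


raw = "John Doe <j.doe@mail.com>"
print(cmd_like(raw))         # spaces, '<', '@' and '>' are removed
print(strip_tags_like(raw))  # the address looks like a tag and is dropped
print(raw)                   # unfiltered, comparable to what getRaw returns
```

The second result matches the symptom in the question: the address is treated as markup and removed.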

Using Regex to grab multiple values from a string and drop them into an array?

Trying to grab the two $ values and the X value from this string in Ruby/watir:
16.67%: $xxx.xx down, includes the Policy Fee, and x installments of $xxx.xx
So far I've got:
16.67%:\s+\$(\d+.\d{2})
which grabs the first xxx.xx fine, what do I need to add to it to grab the last two variables and load this all into an array?
You can use the following, but regex may be unnecessary if the surrounding text is always the same:
\$(\d+\.\d{2}).*?(\d+) installments.*?\$(\d+\.\d{2})
http://www.rubular.com/r/sk5wO3fyZF
If you know that the text in between will always be the same, you could just use:
16\.67%:\s+\$(\d+\.\d{2}) down, includes the Policy Fee, and (\d+) installments of \$(\d+\.\d{2})
You'd be better off using scan.
sub(/.*%/, '').scan(/\$?([\d\.]+)/)
Have you considered just splitting the string on the $ character, then manipulating what you get with a regex or basic string commands?
/\$(\d+\.\d{2}).+\$(\d+\.\d{2})/ should do it. It won't matter what text is there, only that there are two "$" amounts in the sentence. Note that this captures only the two dollar values, not the installment count.
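
As a cross-check of the three-group pattern, here is a small Python sketch (the question is about Ruby, so this is only an illustration of the regex itself). The sample amounts and the installment count are made-up stand-ins for the xxx.xx and x placeholders.

```python
import re

# Made-up values in place of the placeholders from the question.
line = "16.67%: $123.45 down, includes the Policy Fee, and 5 installments of $67.89"

# Down payment, installment count, installment amount. The dots are escaped
# so that they match a literal decimal point only.
pattern = re.compile(r"\$(\d+\.\d{2}).*?(\d+) installments.*?\$(\d+\.\d{2})")

match = pattern.search(line)
values = list(match.groups()) if match else []
print(values)
```

The captured groups come back as strings in the order they appear in the sentence.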

Inserting characters before whatever is on a line, for many lines

I have been looking at regular expressions to try to do this. I can find the start of a line with ^, but I can't work out how to insert text there.
I can also match the first characters on a line, but replacing them does not keep the original text intact.
Unfortunately I don't have access to a tool like cut since I am on a Windows machine, so is there any way to do what I want with just a regexp?
Use Notepad++. It offers a way to record a sequence of actions, which can then be repeated for all lines in the file.
Did you try replacing the regular expression ^ with the text you want to put at the start of each line? Also you should use the multiline option (also called m in some regex dialects) if you want ^ to match the start of every line in your input rather than just the first.
string s = "test test\ntest2 test2";
s = Regex.Replace(s, "^", "foo", RegexOptions.Multiline);
Console.WriteLine(s);
Result:
footest test
footest2 test2
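
The same replacement can be sketched in Python, assuming Python 3.7 or later, where re.sub replaces the empty match at the start of every line. The function name is made up for this example; re.MULTILINE plays the role of RegexOptions.Multiline.

```python
import re


def prefix_lines(text, prefix):
    """Insert prefix at the start of every line of text."""
    # With re.MULTILINE, ^ matches at the start of each line, not only
    # at the start of the whole string.
    return re.sub(r"^", prefix, text, flags=re.MULTILINE)


s = "test test\ntest2 test2"
prefixed = prefix_lines(s, "foo")
print(prefixed)
```

Since ^ matches a position rather than a character, nothing on the line is consumed and the existing text stays intact.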
I used to program on the mainframe and got used to SPF panels. I was thrilled to find a Windows version of the same editor at Command Technology. Makes problems like this drop-dead simple. You can use expressions to exclude or include lines, then apply transforms on just the excluded or included lines and do so inside of column boundaries. You can even take the contents of one set of lines and overlay the contents of another set of lines entirely or within column boundaries which makes it very easy to generate mass assignments of values to variables and similar tasks. I use Notepad++ for most stuff but keep a copy of SPFSE around for special-purpose editing like this. It's not cheap but once you figure out how to use it, it pays for itself in time saved.
