My problem is to find the title of every article and the last name of each article's first author. I have an assignment on XPath, and this is the last question; I've been banging my head against it for over an hour but can't figure it out. Thanks for the help.
The first thing you need to do is find all of the articles. We can do this either by using their specific position in the tree (/Publications/Proceedings/Article) or by finding all the Article elements wherever they are (//Article). However, this selects whole Article elements, which is more information than we need, so it has to be narrowed down.
What we need are two pieces of information: the title of the article and the last name of its first Author.
Getting the title is easy: from the Article, we can just select its Title child, like so: Article/Title.
Getting the Author information is slightly harder. The last name of any given Author can be selected with Author/Surname; however, we need only one Author.
To get the first Author of any given article, we can use the position() function to pick the first one (remember that XPath positions start at 1, not 0): Author[position()=1]. We can shorten this to just the number itself: Author[1].
From here you have enough information to build the two XPaths you need. Good luck.
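If you want to check your expressions against a small document, here is a minimal sketch using Python's standard xml.etree.ElementTree, which supports only a subset of XPath (no leading //, no axes) but does handle relative paths and position predicates like Author[1]. The sample XML is hypothetical; only the element names Publications, Proceedings, Article, Title, Author and Surname come from the paths above, and Forename is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names follow the paths used above.
SAMPLE = """
<Publications>
  <Proceedings>
    <Article>
      <Title>First Paper</Title>
      <Author><Forename>Ann</Forename><Surname>Jones</Surname></Author>
      <Author><Forename>Bob</Forename><Surname>Brown</Surname></Author>
    </Article>
    <Article>
      <Title>Second Paper</Title>
      <Author><Forename>Cat</Forename><Surname>Taylor</Surname></Author>
    </Article>
  </Proceedings>
</Publications>
"""


def titles_and_first_surnames(xml_text):
    """Return (title, first author's surname) for every Article."""
    root = ET.fromstring(xml_text.strip())
    results = []
    # ".//Article" is ElementTree's spelling of //Article (any depth below the root)
    for article in root.iterfind(".//Article"):
        title = article.findtext("Title")                # Article/Title
        surname = article.findtext("Author[1]/Surname")  # first Author only
        results.append((title, surname))
    return results


print(titles_and_first_surnames(SAMPLE))
# [('First Paper', 'Jones'), ('Second Paper', 'Taylor')]
```

A full XPath 1.0 engine would accept the absolute or // forms directly; ElementTree needs the leading "." because every search is relative to an element.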
I'm very new to XPath, so forgive me if I stumble over terms. I'm using IMPORTXML() in a Google Sheets spreadsheet to pull information from a web page. Basically, what I'm trying to do is turn this
into
What I can't figure out is how to pull the text between the <br> nodes and the string inside the <a> node.
I've fumbled my way as far as =IMPORTXML($A$1, "//p/b[starts-with(text(), '"& $A4 &"')]/following-sibling::text()[1]"), which returns 1 for Casting Time, but I can't get any further.
The end goal is to do this for about a dozen different values across the page and cycle the checks through about 500 web pages, hence the cells in the formula. Any help would be appreciated.
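If the page's markup is shaped roughly like the hypothetical sample below, following-sibling::text()[1] selects only the first text node after the <b>, which would explain why the formula stops at "1": the rest of the value sits inside the <a> element, which is a sibling element rather than a sibling text node. IMPORTXML itself cannot be run outside Sheets, so the sketch below swaps in a different technique (a Python html.parser walk) purely to illustrate the logic of "everything after the label up to the next <br>, including link text". The markup and class name are assumptions, not the real d20pfsrd page.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect "label -> value" pairs from markup shaped like
    <b>Label</b> value text <a>linked text</a><br>

    The value is all text after a </b>, including text inside <a>, up to
    the next <b>, <br> or </p>.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_label = False
        self._label_parts = []
        self._label = None
        self._value_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._flush()           # a new label ends the previous value
            self._in_label = True
            self._label_parts = []
        elif tag == "br":
            self._flush()           # <br> ends the current value

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_label = False
            self._label = "".join(self._label_parts).strip()
            self._value_parts = []
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_label:
            self._label_parts.append(data)
        elif self._label is not None:
            self._value_parts.append(data)  # includes text inside <a>

    def close(self):
        super().close()
        self._flush()               # emit the last value if no <br> or </p> followed it

    def _flush(self):
        if self._label:
            # collapse whitespace left over from the markup
            self.fields[self._label] = " ".join("".join(self._value_parts).split())
        self._label = None
        self._value_parts = []


def extract_fields(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.fields


SAMPLE = ('<p><b>Casting Time</b> 1 <a href="#">standard action</a><br>'
          '<b>Components</b> V, S<br><b>Range</b> touch</p>')
print(extract_fields(SAMPLE))
# {'Casting Time': '1 standard action', 'Components': 'V, S', 'Range': 'touch'}
```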
Super in depth clarification section
Using XPath and a Google Sheet, I am attempting to automatically create a Roll20-formatted template macro for each spell on a spellcaster's spell list.
For example, on the Shaman Spell List I used //tr/td[1]/a[@href] and //tr/td[1]/a/@href to create side-by-side columns of spell names and their associated URLs.
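As a rough illustration of what those two expressions select, here is a sketch using Python's xml.etree.ElementTree on a hypothetical, well-formed stand-in table (a real HTML page usually is not valid XML and would need an HTML parser). ElementTree paths cannot return an attribute node like .../a/@href, so the href is read with .get() instead; the second link and its URL are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a spell-list table (not the real page markup).
TABLE = """
<table>
  <tr><td><a href="/magic/all-spells/a/arcane-mark">Arcane Mark</a></td><td>description</td></tr>
  <tr><td><a href="/example-spell">Example Spell</a></td><td>description</td></tr>
  <tr><td>No link in this cell</td><td>description</td></tr>
</table>
"""


def spell_links(xml_text):
    """Pair each link's text with its href, like the two side-by-side columns."""
    root = ET.fromstring(xml_text.strip())
    # .//tr/td[1]/a[@href] mirrors //tr/td[1]/a[@href]: links in the first cell
    # of each row that actually carry an href attribute.
    return [(a.text, a.get("href")) for a in root.iterfind(".//tr/td[1]/a[@href]")]


for name, url in spell_links(TABLE):
    print(name, url)
```

Rows whose first cell has no link simply produce nothing, which is why the two IMPORTXML columns stay aligned only as long as every matched <a> has both text and an href.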
Then, on another sheet, I can copy and paste the entire class spell list and use VLOOKUP to get the associated URLs while keeping the tables organized by spell level, like so (note that the hyperlinked spell names are rich text, so the internal URL is invisible to IMPORTXML; hence the extra step).
With a single class having upwards of 500 spells, the ultimate goal is to create a series of IMPORTXML calls that look at each spell's URL and pull the relevant data from this particular section. For this example I'm using Arcane Mark.
The final goal is to use IMPORTXML to get each important category, such as School, Casting Time, Target, Effect, Area, Range, etc., put them in their respective columns, and have a CONCATENATE formula I've written pull all the various parts into one big formatted string compatible with the Roll20 macro template, so that it looks like &{template:default} {{Name=Arcane mark}} {{School=Universal}} {{Casting Time=1 Standard Action}} {{Components=V,S}} {{Range=Touch}} {{Effect=One personal rune or mark, all of which must fit within 1 sq. ft.}} {{Duration=Permanent}} {{Saving Throw=None}} {{Spell Resistance=No}}
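The concatenation step can be sketched as a join over (label, value) pairs. This Python version is only an illustration of the string the CONCATENATE formula needs to produce; skipping empty values is an assumption on my part (not every spell has a Target or Area line), and the function name is hypothetical.

```python
def roll20_template(fields):
    """Build a Roll20 default-template macro from (label, value) pairs.

    Pairs are emitted in the order given; empty values are skipped.
    """
    parts = ["&{template:default}"]
    parts += ["{{%s=%s}}" % (label, value) for label, value in fields if value]
    return " ".join(parts)


print(roll20_template([
    ("Name", "Arcane mark"),
    ("School", "Universal"),
    ("Casting Time", "1 Standard Action"),
    ("Target", ""),  # missing on this spell, so it is left out
]))
# &{template:default} {{Name=Arcane mark}} {{School=Universal}} {{Casting Time=1 Standard Action}}
```

In the sheet, the equivalent of the empty-value check would be wrapping each piece in an IF on whether its cell is blank.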
=ARRAYFORMULA(REGEXEXTRACT(TRANSPOSE(QUERY(TRANSPOSE(QUERY(ARRAY_CONSTRAIN(
IMPORTDATA("http://www.d20pfsrd.com/magic/all-spells/a/arcane-mark"),1000,5),
"where Col1 contains 'School'", 0)),,999^99)), A10&"\</b>\ (.+)\;"))
I'm struggling to rank values from highest to lowest; please see the attached example of what I'm trying to achieve.
My current custom expression is:
Sum([ViolationAmt])
I have tried this:
Sum([ViolationAmt]) over Rank([ViolationAmt])
I've played around with the Rank expressions but haven't been able to implement it. I would be very grateful for some help.
Spotfire Rank Example
I need to make a lot of assumptions here because I don't know anything about your data set or really what your end goal is, so please comment back and/or provide more info in your question if I am off base.
The first assumption is that each row in your dataset represents, for simplicity, one [AccountID] with a [ViolationAmt]. I'm also guessing you want to show the top N accounts with the highest violations in a table, since that's what you've shown here.
So it sounds like you are going to need two calculated columns: one to get the total [ViolationAmt] per account, and another to rank them.
For the first, create a column called [TotalViolationAmt] or similar and use:
Sum([ViolationAmt]) OVER ([AccountID])
For the second:
Rank([TotalViolationAmt], "desc")
The "desc" argument makes the highest total rank 1; without it, Rank ranks in ascending order. It will be useful to read the documentation on ranking functions if you haven't already.
You could probably combine these two into a single column with something like:
Rank(Sum([ViolationAmt]) OVER ([AccountID]), "desc")
but I haven't tested this at all. Again, if you add a bit more detail about what you're trying to accomplish, it will help you get a better, more detailed answer :)
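Spotfire expressions can't be run outside Spotfire, so as a rough cross-check of the logic (total per account, then rank highest first), here is a Python analogue. It is not Spotfire's implementation: the row layout, function names and sample data are hypothetical, and it keeps one total per account, whereas an OVER column repeats the total on every row of that account.

```python
from collections import defaultdict


def total_per_account(rows):
    """Rough analogue of Sum([ViolationAmt]) OVER ([AccountID]).

    rows: (account_id, violation_amt) pairs, one per violation.
    """
    totals = defaultdict(float)
    for account_id, amount in rows:
        totals[account_id] += amount
    return dict(totals)


def rank_desc(totals):
    """Rank accounts so the highest total is 1; tied totals share a rank (1, 1, 3, ...)."""
    # sorted() is stable even with reverse=True, so ties keep their input order
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    for position, (account_id, total) in enumerate(ordered, start=1):
        if position > 1 and ordered[position - 2][1] == total:
            ranks[account_id] = ranks[ordered[position - 2][0]]  # tie: reuse previous rank
        else:
            ranks[account_id] = position
    return ranks


rows = [("A", 100), ("B", 50), ("A", 25), ("C", 125), ("D", 50)]
print(rank_desc(total_per_account(rows)))
```

Ranking per account rather than per row avoids the duplicate-rank effect you would see when several rows of the same account all carry the same total.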
I am building a system that uses Elasticsearch to store and retrieve library catalogue data. One thing I've been asked for is a browse interface.
Here's a definition of what this is:
The user does a search, for example "Author starts with", and they supply "Smith".
The system puts them into the middle of a list of authors, at or near the position of the first one that starts with "Smith", so they might see:
Smart, Murray
Smart, Murray J.
Smeaton, Duncan
Smieliauskas, Wally
Smillie, John
Smith Milway, Katie <-- this being the first actual search result
Smith, A. M. C.
Smith, Andrew
Smith, Andrew M. C.
etc.
The one with the marker is the one actually searched for, but you can see the ones around it according to the sort order, including ones that don't actually match the query.
These will be paged, with ~20 results per page. If the user pages back, they head towards the start of the alphabet; if they page forward, they go onward.
Each result shown will have a count beside it showing how many results (i.e. catalogue items) are associated with that author.
Clicking on a result takes you to everything by that author (this, and everything beyond it, is fairly easy and mostly implemented already).
I'm wondering if anyone has any good ideas on how to approach this. At this stage, I don't care too much about handling searches that aren't "field starts with" searches, as exactly how that will be done is currently up in the air and I'll deal with it when the time comes.
Here's what I'm thinking, but there are serious issues with it:
All the fields that are going to be browsed are faceted
I get a list of all the facets for that field, search through it to find the starting point, and handle the paging manually in code.
This has the big problem that I might be fetching hundreds of thousands of terms and processing them, which won't be quick.
In retrospect, this is no different from loading all the values into their own index and fetching them all in sorted order.
I'm open to any options here, whether I can somehow jump into the middle of a large set of facets like the query "from" field, or if I should instead put everything into another index specifically for this purpose (though I don't know how I'd structure and query it), or something else.
From what I can see, my ideal solution would be that I can specify the facet field, tell ES that I want to start at the one that starts with "Smith", and it displays from around there, then I have the ability to say "go 20 back", but I'm not sure that this is possible.
You can see an example of the sort of thing I'm talking about in action here: http://hollisclassic.harvard.edu/ - put in Smith as "Author (last name first)", and it gives you a (terribly ugly looking) browse list.
Any thoughts?
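The "jump into the middle, then page back or forward" behaviour described above comes down to a binary search for the first term >= the prefix plus some offset arithmetic. Here is an in-memory Python sketch of just that arithmetic using bisect; it does not use Elasticsearch at all and still assumes the full sorted term list is available, which is exactly the scaling concern raised in the question. The function name and parameters are hypothetical, and it relies on plain code-point ordering (which happens to match the sample list); a real catalogue would need consistent normalization of case and diacritics.

```python
from bisect import bisect_left


def browse_page(sorted_terms, query, page=0, page_size=20, lead=5):
    """Return (terms_on_page, anchor_index) for a "starts with" browse.

    anchor_index is the position of the first term >= query, i.e. the first
    term starting with `query` if one exists. Page 0 begins `lead` entries
    before the anchor so some preceding context is visible; negative pages
    move towards the start of the list, positive pages move onward.
    """
    anchor = bisect_left(sorted_terms, query)
    start = anchor - lead + page * page_size
    end = start + page_size
    # Clamp at the start of the list; slicing already handles the end.
    return sorted_terms[max(start, 0):max(end, 0)], anchor


AUTHORS = [
    "Smart, Murray", "Smart, Murray J.", "Smeaton, Duncan", "Smieliauskas, Wally",
    "Smillie, John", "Smith Milway, Katie", "Smith, A. M. C.", "Smith, Andrew",
    "Smith, Andrew M. C.",
]

terms, anchor = browse_page(AUTHORS, "Smith", page_size=4, lead=2)
print(terms, AUTHORS[anchor])
```

The same offset logic would carry over if the sorted terms lived in a dedicated index, as the question considers; how to query such an index efficiently is not shown here.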
On:
The one with the marker is the one actually searched for, but you can see the ones around it according to the sort order, including ones that don't actually match the query.
I had a similar requirement: "Show the user how many records we would have found if the search-conditions were more relaxed".
I solved this by doing two searches (one exact, one more relaxed): ES performs well enough that running one search or two makes little difference. In my case, the time is eaten up by displaying the results, not by the search.
You would still need to merge the two result sets in your application to generate a single list to display.
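A minimal Python sketch of that merge step, assuming each hit has already been reduced to a dict with hypothetical "id" and "name" keys (this is not the shape of an Elasticsearch response). Exact hits win when the same id appears in both lists and are flagged, so the display layer can mark the actual search results.

```python
def merge_results(exact, relaxed, sort_key=None):
    """Merge exact and relaxed hit lists into one display list.

    Each hit is a dict with a hypothetical "id" key. Exact hits take
    precedence on duplicate ids and are flagged with "exact": True.
    """
    seen = set()
    merged = []
    for hits, is_exact in ((exact, True), (relaxed, False)):
        for hit in hits:
            if hit["id"] in seen:
                continue  # already added from the exact list
            seen.add(hit["id"])
            merged.append({**hit, "exact": is_exact})
    if sort_key is not None:
        merged.sort(key=sort_key)  # e.g. restore alphabetical browse order
    return merged


exact = [{"id": 2, "name": "Smith Milway, Katie"}]
relaxed = [{"id": 1, "name": "Smillie, John"}, {"id": 2, "name": "Smith Milway, Katie"}]
print(merge_results(exact, relaxed, sort_key=lambda h: h["name"]))
```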
I wrote a Ruby script that prepended "data" to every word in the English dictionary and then filtered out various strings using different criteria. Now I want to use a site like Namecheap or Gandi.net to feed each of these strings into its domain name availability checker and find out which ones are available.
My understanding is that this will involve making an HTTP POST request of some kind and then extracting the element in question from the response, but I don't really know what I should read about in order to do this kind of thing.
I imagine I will be rate-limited after a few requests, but as a learning exercise I am still curious how I would go about doing this.
I inspected the element (on Namecheap) to see what the tag looked like and to find any uniquely identifiable class or id names I could use to grab that specific part of the source. Inside a fieldset tag there was a line of HTML that I can't seem to paste here, so here is a picture:
Thanks in advance for any guidance in helping me learn about web scripting!
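The general shape of "submit a form by POST, then inspect the response" can be sketched with Python's standard urllib. Everything site-specific here is an assumption: CHECK_URL, the "domain" form field and the ".com" suffix are hypothetical placeholders, not Namecheap's or Gandi's real interface. By default the sketch only builds the requests (a dry run) so nothing is sent; with send=True it posts each one and pauses between requests.

```python
import time
import urllib.parse
import urllib.request

CHECK_URL = "https://example.com/domain-check"  # hypothetical endpoint


def build_check_request(domain):
    # Form-encode a (hypothetical) "domain" field the way an HTML <form method="post"> would
    body = urllib.parse.urlencode({"domain": domain}).encode("ascii")
    return urllib.request.Request(
        CHECK_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def check_all(words, tld=".com", delay_seconds=2.0, send=False):
    """Yield (domain, request) pairs; with send=True, yield (domain, response_text)."""
    for word in words:
        domain = word + tld
        req = build_check_request(domain)
        if not send:
            yield domain, req       # dry run: nothing leaves the machine
            continue
        with urllib.request.urlopen(req, timeout=10) as resp:
            yield domain, resp.read().decode("utf-8", errors="replace")
        time.sleep(delay_seconds)   # crude throttle between requests


for domain, req in check_all(["data" + w for w in ["base", "point"]]):
    print(domain, req.get_method(), req.data)
```

The response text would then need parsing (for example with html.parser) to find the element you saw in the inspector. If the checker fills in its results with JavaScript, the raw response may not contain that element at all. Both registrars also publish official APIs, which may be a more reliable route than scraping; check their documentation and terms of use.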
I need to develop an application that will index several texts, and I need to search for people’s names inside these texts. The problem is that while a person’s correct name is “Gregory Jackson Junior”, inside the text the name might be written as:
- Greg Jackson Jr
- Gegory Jackson Jr
- Gregory Jackson
- Gregory J. Junior
I plan to index the texts on a nightly basis and build a database index to speed up the search. I would like recommendations for good books and/or articles on the subject.
Thanks
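One simple starting point for the variants listed above is to normalize names (lower-case, drop dots, expand suffixes and nicknames) and then compare tokens with a string-similarity ratio. This is a different technique from the Bayesian filter suggested in the answers; it is a plain heuristic, sketched in Python with difflib. The nickname and suffix tables are hypothetical and tiny, and the scoring rule (average best match per candidate token, initials matching any token with the same first letter) is my own illustrative choice.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real system would need far larger ones.
NICKNAMES = {"greg": "gregory", "bob": "robert"}
SUFFIXES = {"jr": "junior", "sr": "senior"}


def normalize(name):
    """Lower-case, drop dots, and expand known suffixes and nicknames."""
    tokens = []
    for raw in name.lower().replace(".", " ").split():
        token = SUFFIXES.get(raw, raw)
        tokens.append(NICKNAMES.get(token, token))
    return tokens


def token_similarity(a, b):
    # A single letter such as "J." is treated as an initial of any token starting with it
    if len(a) == 1 or len(b) == 1:
        return 1.0 if a[0] == b[0] else 0.0
    return SequenceMatcher(None, a, b).ratio()  # tolerates typos like "gegory"


def name_score(candidate, canonical):
    """Average, over the candidate's tokens, of each token's best match in the canonical name."""
    cand, canon = normalize(candidate), normalize(canonical)
    if not cand or not canon:
        return 0.0
    return sum(max(token_similarity(c, k) for k in canon) for c in cand) / len(cand)


for variant in ["Greg Jackson Jr", "Gegory Jackson Jr", "Gregory Jackson",
                "Gregory J. Junior", "Robert Smith"]:
    print(variant, round(name_score(variant, "Gregory Jackson Junior"), 2))
```

Note that missing tokens are not penalized ("Gregory Jackson" scores as a full match), which may or may not be what you want; a threshold on the score then decides what counts as a hit.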
Check these related questions.
Algorithm to find articles with similar text
How to search for a person's name in a text? (heuristic)
Your question is incorrectly phrased. The examples do not indicate misspellings but variations in how a full name is written.
And,
Would your search be expected to match on words like "son", with reference to the example?
Would it be expected to match "Bob" when looking for a name like Robert?
Are you looking for things like this and this?
OK, your comment suggests you don't want to venture into that.
For the record: use a Bayesian filter. You could use Mechanical Turk to initialize your algorithm.