I would like to fetch part of a web page with YQL. I have tried several queries, and all of them return the correct result except one.
Here is the query:
select * from html where url="http://www.cngold.org/img_date/livesilvercn_rmb.html" and xpath='//div[6]/div[2]/div/div[2]/table/tbody/tr[4]/td[6]'
I expect to get the price, but I get an empty result instead.
If I retrieve the whole page with YQL and check the XPath of that element, the XPath is
//div[3]/div/div[2]/a/div/div[2]/table/tbody/tr[4]/td[6]
Why are the two paths so different?
How should I handle this situation?
Thanks in advance.
YQL cannot get values that are computed dynamically. In that case, you are better off using phantom.js.
This answer https://stackoverflow.com/a/7978072/1337392 lists several tools you can use for HTML scraping.
Hope it helps!
I'm learning Elasticsearch.
I wanted to ask if there is any way to get the most searched results first,
for example:
by altering the documents and updating a value in them with scores,
using some kind of formula or something else.
Thanks, community :)
You don't get that out of the box, but you can indeed modify a document and increase its number of views whenever a user clicks on a result, then use that field as part of the score (function_score might help with this).
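A minimal sketch of such a query body, assuming a hypothetical numeric field `views` that your application increments on each click and a hypothetical text field `title`. It only builds the request body as a Python dict; how you send it depends on your client. `boost_mode` is set to `sum` here so that documents with zero views keep their relevance score instead of being multiplied down to zero.

```python
import json


def popularity_query(text, views_field="views", text_field="title"):
    """Build a function_score body that adds a view-count bonus to the match score.

    `views_field` and `text_field` are hypothetical field names for illustration.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {text_field: text}},
                "field_value_factor": {
                    "field": views_field,
                    "modifier": "log1p",  # dampen very large view counts
                    "missing": 0,         # documents without the field get no bonus
                },
                "boost_mode": "sum",      # add the bonus to the relevance score
            }
        }
    }


print(json.dumps(popularity_query("silver price"), indent=2))
```

Whether to add or multiply the bonus, and which modifier to use, is a tuning choice that depends on how strongly popularity should outweigh text relevance.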
I am using GSA 7.2. The GSA Search Report has two report types:
Searches that returned results
Searches that did not return results
What is the difference between these types?
I ran last week's Search Report with both types, and some of the same keywords and queries appear in both reports with different occurrence counts. My question is: if the GSA returns results for a keyword or query, it should not appear in the report of searches without results. Maybe my understanding is wrong; please correct me.
Thank you for your help.
It looks like this issue has existed for a long time.
Check this out. I am not sure whether it has been fixed, so it is best to check with Google support.
By the way, do you have user-specific search (role-based search)? If so, try searching for the same term as each user/role and see whether any user/role gets zero results.
When you run the reports, are you restricting the search to a specific collection and/or time range?
If you run a report for "All Collections" then you might see items show in both reports because users are searching against a collection that does not have the documents.
What you'd want to do is run a report for a single day. If you see the same behavior for a single collection then download the search logs for that day and look for searches for that key term and see if the search query parameters are the same. If they are different then there could be some malformed search queries being executed. If not then it could be a transient issue with the GSA.
As part of an integration requirement, I need to query Opportunity records that have been modified after a specific date and time.
Opportunity has a child object called ProductRevenue, with a one-to-many relationship. Is there any way I can construct a QueryPage call that fetches records where the Opportunity's fields OR its child ProductRevenue's fields have been modified after a specific date and time?
I have tried using the SearchSpec argument, but it does not let me query across child object fields.
For example:
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.searchspec = "([ModifiedDate] > '01/01/2013 00:00:00') OR ([ProductRevenueData.ModifiedDate] >= '01/01/2013 00:00:00')";
[The code above, written in C#, threw an error saying: The object Opportunity does not have an integration component called - ProductRevenueData.ModifiedDate.]
Any help will be greatly appreciated. Thank you.
I have been looking for the answer myself, and here is my understanding although not a solution.
In Web Services 2.0, Oracle says "all parent records matching the parent criteria and only children matching the child criteria are returned."
You actually can define "searchspec" on the parent AND the child, and it does work in the way that Oracle defined. However, it is probably not the behavior you are looking for. When you do this, you get ALL parents that match the parent.searchspec, regardless of whether their children match the child.searchspec. However, those parents will only have the child(ren) that match your child.searchspec in the query result. So if all you wanted was "parents that have these children" or "these children", you are out of luck, because what you get is "many parents and some of their children."
So even when you are post-processing with two queries you will have to spend some time. :(
By the way your two separate queries will have to look something like this:
query 1.
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.searchspec = "([ModifiedDate] > '01/01/2013 00:00:00')";
query 2.
ObjOpptyQueryPageInput.ListOfOpportunity.Opportunity.ListOfProductRevenue.ProductRevenue.searchspec = "([ModifiedDate] >= '01/01/2013 00:00:00')";
Then post-process query 2 to take out all parents who have no children.
Then union that with the results from query 1.
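The post-processing step can be sketched roughly as follows. This is only an illustration in Python, not the web service API: it assumes each query result has already been converted to a list of dicts, with a hypothetical `Id` key for the Opportunity and a hypothetical `ProductRevenue` key holding the list of child records returned for it.

```python
def merge_opportunities(query1_parents, query2_parents):
    """Union query 1 with the query 2 parents that still have children.

    Both arguments are lists of dicts; "Id" and "ProductRevenue" are
    hypothetical key names used only for this sketch.
    """
    # Start with every parent that matched on its own ModifiedDate.
    merged = {opp["Id"]: opp for opp in query1_parents}
    for opp in query2_parents:
        # Query 2 returns all parents; keep only those with matching children.
        if opp.get("ProductRevenue"):
            merged.setdefault(opp["Id"], opp)
    return list(merged.values())


q1 = [{"Id": "A", "ProductRevenue": []}]
q2 = [
    {"Id": "A", "ProductRevenue": []},
    {"Id": "B", "ProductRevenue": [{"Id": "r1"}]},
    {"Id": "C", "ProductRevenue": []},
]
print([opp["Id"] for opp in merge_opportunities(q1, q2)])  # ['A', 'B']
```

When an Opportunity appears in both results, this sketch keeps the copy from query 1; swap the order if you would rather keep the copy that carries the filtered children.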
In my experience, you will not be able to do this using their V2.0 API (i.e. searchspec). You can do it using the V1.0 API, BUT this will return all parent records matching your criteria plus all related ProductRevenue records, whether they meet the criteria or not. I do something similar and then post-process the data with an XPath predicate filter. The only other option, I think, is two separate queries.
I had the same problem and tried many ways to resolve it, but for now you have to work with the result that is returned: you can use DOM, XPath or regular expressions to extract the information you want from the returned result.
In my case I used XPath because it is very fast and easier. This is a link to the question I posted, with the correct answer:
Xpath solution for the parent-child query result
I hope this will fix the problem.
I am trying to create a query for GSA Search:
The query is supposed to return all the results, excluding those with certain text inside a meta tag, something like:
inmeta:Education -first and -second and -courses...
But I have had no luck with that. Does anybody know what the right syntax is?
I have used the equal to condition before and the query looks like this -
(inmeta:education=first OR inmeta:education=second)
If your results must exclude the pages whose Education meta tag has the value first, then use -
inmeta:Education-first
For multiple values you can use the OR condition -
(inmeta:Education-first OR inmeta:Education-second)
You can find a lot more information here -
https://developers.google.com/search-appliance/documentation/46/xml_reference#request_query_terms
Hope this helps you.
I need to facet inside n documents which are selected like
... ORDER BY something DESC LIMIT 100
Is that possible with Solr? How?
This is a total hack, but here goes...
Do your initial query, and get your results back.
Then construct a new query, like so:
http://localhost:8080/solr/select/?q=id%3A123+OR+id%3A456...(keep OR-ing them up)...&facet=true&facet.field=something
Here you concatenate all of the ids from the first result set into a new query using OR. Then, when you facet on your field, the facet summary will only apply to those results.
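Building that second URL can be sketched like this, assuming the same `localhost:8080` endpoint as above and a unique key field named `id`. The helper name `facet_url` is made up for this example.

```python
from urllib.parse import urlencode


def facet_url(ids, facet_field, base="http://localhost:8080/solr/select/"):
    """Build a Solr URL that facets only over the documents with the given ids."""
    # OR together one id:<value> clause per document from the first query.
    q = " OR ".join(f"id:{doc_id}" for doc_id in ids)
    params = {"q": q, "facet": "true", "facet.field": facet_field}
    # urlencode escapes ':' as %3A and spaces as '+'.
    return base + "?" + urlencode(params)


print(facet_url([123, 456], "something"))
# http://localhost:8080/solr/select/?q=id%3A123+OR+id%3A456&facet=true&facet.field=something
```

Ids that contain Solr query syntax characters would need escaping first, and a long id list can run into URL length limits or Solr's `maxBooleanClauses` setting, so this is probably only practical for small result sets such as the 100 documents in the question.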
AFAIK no, that's not supported / implemented. Facets aren't really meant to be "stats" but a guidance to the end-user. Picture yourself browsing a faceted interface and seeing facets change whenever you change sort order or paging. Faceted browsing would be useless if it worked like that.
I think this would be a nice feature for the StatsComponent though.
I think this is possible with results grouping (now in trunk!):
http://wiki.apache.org/solr/FieldCollapsing
... the only problem is that you can set only one 'facet.field' (i.e. group.field)
But the great thing is that you get scored facets!