Scraping the right item - xpath

I need to do this:
The system (a web page) asks the user for a smartphone model
The user specifies it in a form
The system queries an external page and extracts what it needs with XPath
The system returns the cheapest item found (if any) by running some XPath query on the external page, showing the image, price, and name, and giving the link to the original page
Now the problem is: if I order the results by descending price and take the first one, it is not the cheapest (and sometimes it is not even the model I was looking for), and if I take the last one it may not be a smartphone at all, but an accessory such as a flip cover.
I thought of using the XPath contains() function to check whether the name the user entered at the beginning appears in the value returned by the XPath query, but I found that contains() is case-sensitive, and making it case-insensitive is not so easy.
Here you can find an example of the issues described above: I query for "note 4" and order by price, and neither the first nor the last result can be used.
Currently I use PHP to load and query the desired page, and I can easily extract everything I need (for example the first, the last, or any other block of items from the page), but I need some method to pick exactly the cheapest item of the right model from all those noisy results.
Any other ideas?
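
One possible approach, sketched below, is to stop relying on result order: extract every item, filter in code, and only then take the minimum price. The sketch is in Python rather than PHP, and the sample rows, the accessory word list, and the function names are all assumptions for illustration, not anything taken from the real page. If the filtering has to stay inside XPath 1.0, the usual workaround for the case-sensitivity of contains() is to lowercase the text with translate(), e.g. contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'note 4').

```python
import re

# Hypothetical rows, standing in for what the XPath extraction would return.
items = [
    {"name": "Flip Cover for Samsung Galaxy NOTE 4", "price": 9.90},
    {"name": "Samsung Galaxy Note 4 32GB Black", "price": 499.00},
    {"name": "SAMSUNG GALAXY NOTE 4 32GB White", "price": 479.00},
    {"name": "Samsung Galaxy Note 3", "price": 299.00},
    {"name": "Samsung Galaxy Note 4 Battery", "price": 19.00},
]

# Assumed list of words that mark an accessory; tune it for the real shop.
ACCESSORY_WORDS = {"cover", "case", "charger", "cable", "film", "protector", "battery"}


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore case and spacing."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def cheapest_match(items, query):
    """Return the cheapest item whose name contains the query and is not an accessory."""
    wanted = normalize(query)
    candidates = []
    for item in items:
        name = normalize(item["name"])
        if wanted not in name:
            continue  # wrong model
        if ACCESSORY_WORDS & set(re.findall(r"[a-z0-9]+", name)):
            continue  # looks like a gadget, not the phone itself
        candidates.append(item)
    return min(candidates, key=lambda item: item["price"], default=None)


best = cheapest_match(items, "note 4")
```

A plain substring test is still crude (a query for "note 4" would also match a hypothetical "note 40"), so a word-boundary check or a minimum plausible price may be worth adding.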

Related

Web Scraping returning empty data table UiPath

I’m using Data Scraping to scrape product information (i.e. product name, URL, price, model) from a shopping website.
When I search for a product, I want to scrape the data of whichever item comes first, and for that purpose I have set the maximum number of results to 1. But the problem is that sometimes it returns an empty data table, and I cannot figure out why.
What I think is that if the current search results match the elements I selected in the Data Scraping wizard, it returns the data table, and if they don't match, it returns an empty data table.
For example, while I was selecting elements in the Data Scraping wizard, the search results were Samsung monitors. When I ran the project and searched for Dell monitors, it returned a data table, but when I searched for Samsung series or Dell series it returned an empty data table. What is wrong with this?
You need to say what output you actually need.
But if your output is empty, the reason is most likely one of the following:
make sure the timeout is high enough; set it to 30000 if you are unsure
use a robust selector that keeps working even when the website changes for some reason
For me it works properly with a sufficient timeout and a flexible selector with a *.

Laravel - find offset of record in mysql results

I have a MySQL table of records with several fields.
These records are shown and updated live in the browser. They are displayed in an order chosen by the user, with optional filters also chosen by the user.
Sometimes a change is made to one of the records, and it may affect the order for a given user.
In order to position the message correctly in the list, I need to find where its new offset falls after the change to the record. Basically, I need to get the "id" for the record that now comes before it in the MySQL results, so that I can use Javascript on the client side to reposition the record on the screen.
In raw SQL, I'd do something like this:
SET @rank=0;
SELECT rank
FROM
(SELECT @rank:=@rank+1 AS rank,
subQuery.id AS innerQuery
FROM ...{rest of custom query here}... AS subQuery)
AS outerQuery WHERE outerQuery.innerQuery={ID TO FIND};
Then I can just subtract 1 from the resulting rank, and find the ID of the record that comes immediately before the record in question.
Is this kind of query possible with Laravel's query builder? Or is there a better strategy than what I've come up with here to accomplish the same task?
EDIT: There are a lot of records. So if possible I'd like to avoid loading all the records into memory to find the offset of the record. They are originally loaded on the screen in an "infinite scroll" type method, since it would be too much to load all of them at once.
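
One alternative that avoids ranking the whole result set is to ask the database directly for the row that sorts immediately before the changed record, using the same ordering and filters as the list. The sketch below illustrates the idea with Python's sqlite3 module rather than Laravel or MySQL; the records table, the score column, and the "score descending, id ascending" ordering are assumptions made up for the example.

```python
import sqlite3

# Hypothetical table: the list is displayed ORDER BY score DESC, id ASC.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(1, 50), (2, 80), (3, 80), (4, 10)])

# Rows that sort before (score, id) in that ordering.
BEFORE = "score > ? OR (score = ? AND id < ?)"


def previous_id(conn, record_id):
    """Return the id of the row displayed just before record_id, or None if it is first."""
    row = conn.execute("SELECT score FROM records WHERE id = ?", (record_id,)).fetchone()
    if row is None:
        return None
    score = row[0]
    # Walk the "before" rows in reverse display order and keep the nearest one.
    prev = conn.execute(
        f"SELECT id FROM records WHERE {BEFORE} ORDER BY score ASC, id DESC LIMIT 1",
        (score, score, record_id),
    ).fetchone()
    return prev[0] if prev else None


def offset_of(conn, record_id):
    """Return the zero-based position of record_id in the displayed order."""
    score = conn.execute("SELECT score FROM records WHERE id = ?", (record_id,)).fetchone()[0]
    return conn.execute(f"SELECT COUNT(*) FROM records WHERE {BEFORE}",
                        (score, score, record_id)).fetchone()[0]
```

Any user-chosen filters would be added to the WHERE clause of both queries. Because each query is an ordinary comparison with ORDER BY and LIMIT, the same shape should be expressible with a query builder's where, orderBy, and limit calls, and with a suitable index it does not need to load the full list.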

Efficient way to query

My app has a class that stores the pictures that users upload. Each object in the class has a city property that holds the name of the city where the picture was taken, and a like property that tracks the number of likes.
I want to be able to send a query that returns one picture per city, where each picture is the most liked one in the city it belongs to. How can I do that?
One way I first thought of is to run multiple queries: fetch the most liked picture of one city, save it in an array, and then do the same for the other cities.
However, each country has more than one city, so that is not very efficient.
Parse doesn't support the ordinary operations used in databases. I also tried a compound query, but unfortunately I can't set a limit or an ordering on the subqueries. Is there any good solution for this?
It would be easy using group by. Unfortunately, Parse does not support "select distinct" or "group by" features.
As you've suggested, you need to fetch all the cities for each country, and for each one get the top-rated photo.
BUT, since Parse has strict limits on the execution time of a request (3 sec for an event listener, 7 sec for a custom function), I suggest you do this in a background job that saves the top-rated photo for each city in a new table. That way you can easily query the db from the client. Background jobs can run for up to 15 minutes before Parse drops them, so you can run that kind of query without timeouts.
Hope it helps
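
As a rough illustration of what such a background job would compute, the sketch below makes a single pass over the pictures and keeps the most liked one per city. It is plain Python rather than Parse Cloud Code, and the sample data and the function name are assumptions; in a real job the rows would be paged in from Parse and the result written to the summary table.

```python
def top_picture_per_city(pictures):
    """Return a dict mapping each city to its most liked picture (first one wins on ties)."""
    best = {}
    for pic in pictures:
        current = best.get(pic["city"])
        if current is None or pic["likes"] > current["likes"]:
            best[pic["city"]] = pic
    return best


# Hypothetical sample rows.
pictures = [
    {"id": "a", "city": "Rome", "likes": 12},
    {"id": "b", "city": "Milan", "likes": 7},
    {"id": "c", "city": "Rome", "likes": 30},
    {"id": "d", "city": "Milan", "likes": 7},
]

top = top_picture_per_city(pictures)
```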

How to properly organize search of the person?

Let's say I have a list of persons in my datastore. Each person there may have the following fields:
last name (*)
first name
middle name
id (*)
driving licence id (*)
another id (*)
date of birth
region
place of birth
At least one of the fields marked with (*) must exist.
Now the user provides me with the same list of fields (and again at least one of the fields marked with (*) must be provided). I should search for the person the user described, but not all fields have to match. I should somehow show the user how confident I am in the search results. Something like:
if the person matched by id and last name (and the user provided just these 2 fields for the search), then I am sure that the result is correct (100%);
if the person matched by id and last name (and the user provided other fields, which were found in the database but did not match), then I am only about 60% sure that the result is correct;
etc.
(the numbers are provided just as an example)
How can I organize such a search? Is there any standard algorithm? I would also like to minimize the number of requests to the database.
P.S. I cannot show the user the actual field values from the database.
It sounds like your logic for determining the quality of a match will be too complex to handle at the database layer. I think you'll get the best performance by retrieving all of the records that match at least one of the mandatory keys, calculating the match score for each of them in memory, and returning the best score. For example, if the user provides you with an id, last name and place of birth, your query would look something like:
SELECT * FROM users WHERE id = :the_id OR last_name = :the_last_name;
This could be a performance problem if you have a VERY large dataset with lots of common last names but otherwise I would expect not to see too many collisions. You can check this on your own dataset outside of GAE. You could also get better performance if all mandatory fields MUST match by changing the OR to an AND.
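
The in-memory scoring step could be sketched roughly as follows. The field names, the weights, and the rule "matched weight divided by the weight of all fields present on both sides" are assumptions chosen only so that the two example cases from the question come out at 100% and 60%; they are not a standard algorithm.

```python
# Hypothetical weights: identifying fields count more than descriptive ones.
WEIGHTS = {
    "id": 3, "driving_licence_id": 3, "another_id": 3, "last_name": 3,
    "first_name": 1, "middle_name": 1, "date_of_birth": 2,
    "region": 1, "place_of_birth": 2,
}


def match_score(query, record):
    """Return a 0-100 score over the fields present in both the query and the record."""
    compared = matched = 0
    for field, weight in WEIGHTS.items():
        wanted, stored = query.get(field), record.get(field)
        if wanted is None or stored is None:
            continue  # nothing to compare, so the field is ignored
        compared += weight
        if wanted == stored:
            matched += weight
    return round(100 * matched / compared) if compared else 0


def best_score(query, candidates):
    """Score every candidate fetched by the single query and return the highest score."""
    return max((match_score(query, c) for c in candidates), default=0)


record = {"id": "A1", "last_name": "Smith",
          "date_of_birth": "1980-01-01", "place_of_birth": "Leeds"}
exact = match_score({"id": "A1", "last_name": "Smith"}, record)
partial = match_score({"id": "A1", "last_name": "Smith",
                       "date_of_birth": "1990-05-05", "place_of_birth": "York"}, record)
```

Only the score is returned, so the stored field values themselves never have to be shown to the user.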

Using Yahoo APIs, how to get list of locations matching certain prefix that have weather data available

I have an app that (among other things) uses the Yahoo Weather API to display weather conditions for a location selected by the user.
In the configuration dialog where the user can enter the location, I'd love to offer autocompletion, so that while the user is typing a location name, a list of matching cities is suggested.
I can use YQL to fetch locations matching the prefix, i.e.:
select * from geo.places where text = 'Vie*'
but the problem is that not every location has a weather station associated with it and I'd love to skip these in my autocompletion list.
Using community tables (a table called weather.woeid), the following query joins the previous query with the weather API, returning only locations that do have weather stations:
select location from weather.woeid where w in (select woeid from geo.places where text = 'Vie*')
This almost solves my problem, except that this query (which produces the same result as a weather API call) returns neither the WOEID nor any other identifier I can use to query the Weather API directly after configuration. How can I capture the value of the join parameter w? I tried something like select w, location ... but that doesn't seem to work.
Is there any other way to get a list of locations (incl. WOEID) matching a certain prefix that have weather data associated with them?
Afaik it is not possible with YQL to pass values through from the sub-select (the inner SELECT statement) to the outer SELECT, which is what you want to do, if I understand you correctly.
Based on your use case I want to propose another solution though:
I assume that the list of locations that have a weather station associated with them is relatively static, meaning it does not change very often. If that is the case, regenerating the list with YQL on every request would be wasteful. Instead I would generate the list offline, store it in a file, MySQL, or elsewhere, and then use that static list to answer the AJAX call of your autocomplete field.
The data in that static list could look something like this:
{
"Vienna" => 72342,
"Hamburg" => 12334,
...
}
Once the user has selected a location and pressed enter, then you can send the YQL query to weather.woeid to look up the current weather based on the WOEID.
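
A minimal sketch of how the autocomplete endpoint could answer prefix lookups from such a static list is shown below, using a sorted index and binary search. The two extra entries and their WOEID values are invented placeholders, as are the function and variable names; the real list would be generated offline as described above.

```python
import bisect

# Static name -> WOEID list; "Vienne" and "Vientiane" carry made-up placeholder ids.
LOCATIONS = {"Vienna": 72342, "Hamburg": 12334, "Vienne": 11111, "Vientiane": 22222}

# Sorted lowercase keys allow a case-insensitive binary search for the prefix.
_index = sorted((name.lower(), name) for name in LOCATIONS)
_keys = [key for key, _ in _index]


def suggest(prefix, limit=10):
    """Return up to `limit` (name, woeid) pairs whose name starts with the prefix."""
    wanted = prefix.lower()
    start = bisect.bisect_left(_keys, wanted)
    matches = []
    for key, name in _index[start:]:
        if not key.startswith(wanted) or len(matches) >= limit:
            break
        matches.append((name, LOCATIONS[name]))
    return matches


vie = suggest("Vie")
```

Because each suggestion already carries its WOEID, the selected entry can be used directly for the later weather lookup.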
