Web Scraping returning empty data table UiPath

I’m using Data Scraping to scrape product information (i.e. Product Name, Url, Price, Model) from a shopping website.
When I search for a product, I want to scrape the data of whichever item comes first, so I have set the maximum number of results to 1. The problem is that it sometimes returns an empty data table, and I cannot figure out why.
My guess is that if the current search result matches the elements I selected in the Data Scraping wizard, it returns the data table, and if it doesn’t match, it returns an empty data table.
For example, while I was selecting elements in the Data Scraping wizard, the search results were Samsung monitors. When I ran the project and searched for Dell monitors, it returned a data table, but when I searched for Samsung series or Dell series, it returned an empty data table. What is wrong with this?

You need to say what you actually expect as output.
If the output is empty, though, the cause is usually one of the following:
The timeout is too low. Set it to 30000 if you are unsure.
The selector is too strict. Use one that keeps working even when the website's content changes.
For me it works properly with a sufficient timeout and a flexible selector containing a * wildcard.

Related

Laravel - find offset of record in mysql results

I have a MySQL table of records with several fields.
These records are shown and updated live in the browser. They are displayed in an order chosen by the user, with optional filters also chosen by the user.
Sometimes a change is made to one of the records, and it may affect the order for a given user.
In order to position the record correctly in the list, I need to find where its new offset falls after the change. Basically, I need to get the "id" of the record that now comes before it in the MySQL results, so that I can use Javascript on the client side to reposition the record on the screen.
In raw SQL, I'd do something like this:
SET @rank=0;
SELECT rank
FROM
(SELECT @rank:=@rank+1 AS rank,
subQuery.id AS innerQuery
FROM ...{rest of custom query here}... as subQuery)
AS outerQuery WHERE outerQuery.innerQuery={ID TO FIND};
Then I can just subtract 1 from the resulting rank, and find the ID of the record that comes before the record in question.
Is this kind of query possible with Laravel's query builder? Or is there a better strategy than what I've come up with here to accomplish the same task?
EDIT: There are a lot of records. So if possible I'd like to avoid loading all the records into memory to find the offset of the record. They are originally loaded on the screen in an "infinite scroll" type method, since it would be too much to load all of them at once.

Retrieve filtered data from ALV

Is there an easy way of retrieving the ALV data that is displayed when there are also filters used on that ALV?
The ALV used is an object of CL_GUI_ALV_GRID. When showing it to the user, there is a filter placed on it by default. The user also has a button that processes the data in the ALV. How can I make sure the process only works with the data that is displayed, even if the user places his own filters on the ALV?
e.g: An ALV gets created from an itab that has 10 rows, but because there is also a filter passed on the ALV, only 8 rows are showing. When pressing a button, I only want to work with the 8 rows currently showing to the user.
I have tried finding a function module for this purpose but I can only find a FM which works with the selected rows in an ALV.
EDIT: Further, there is a method called get_filtered_entries, but it only retrieves those entries that are NOT displayed. Translating its result into the displayed entries would be quite time-consuming.
Thanks in advance.

GET_FILTERED_ENTRIES returns a table of excluded row indices. You just have to skip those in your processing.
" Copy original table
DATA(lit_buffer) = it_out[].
" Get excluded rows
o_grid->get_filtered_entries(
IMPORTING
et_filtered_entries = DATA(lit_index)
).
" Process in descending order so the remaining indices stay valid; thanks futu
SORT lit_index DESCENDING.
" Remove excluded rows from buffer
LOOP AT lit_index ASSIGNING FIELD-SYMBOL(<index>).
DELETE lit_buffer INDEX <index>.
ENDLOOP.
EDIT: I debugged cl_gui_alv_grid a little, and it doesn't seem that a filtered version of the original table exists at all. The lines get filtered, sorted, grouped and immediately transferred into a table of cells. It looks like it is nearly impossible to get the displayed rows without a performance drawback.
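
The reason for the descending sort in the snippet above can be illustrated outside ABAP. The following Python sketch is only an analogy: displayed_rows and the sample data are made up for illustration, and the indices are treated as 1-based, as in an ABAP internal table.

```python
def displayed_rows(rows, excluded_indices):
    """Return the rows left after dropping the 1-based excluded indices."""
    remaining = list(rows)  # work on a copy, like lit_buffer
    # Delete from the highest index down, so that an earlier deletion
    # never shifts the position of a row that still has to be deleted.
    for index in sorted(excluded_indices, reverse=True):
        del remaining[index - 1]  # convert the 1-based index to 0-based
    return remaining


rows = [f"row{i}" for i in range(1, 11)]
visible = displayed_rows(rows, [3, 7])
print(len(visible), visible)
```

Deleting in ascending order instead would remove row3 first, shift everything after it up by one, and then delete row8 rather than row7.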

Smart pagination algorithm that works with local data cache

This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server side database of products containing roughly 50,000 products (50 MB)
Assume no db type, we interact with it via REST/GraphQL interface
Assume a single product record is < 1kB
Assume a max payload for a resultset of 256kB
Assume max 5MB storage on the client
Assume search result sets ranging between 0 ... 5000 items per search
Challenge
The challenge is to define a stateless but (network) efficient way to fetch pages from a result set so that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
"totalResults": 2458,
"firstResult": 100,
"pageSize": 100,
"results": [
{"some": "item"},
{"some": "other item"},
// 98 more ...
]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page. Because by the time we request the next page, the result set may have changed (due to changes in the DB), influencing which items are part of the result set. Even a small change can have a big impact: one item removed from the DB, that happened to be on page 0 of the result set, will change what results we will get when requesting all subsequent pages.
Goal
I am looking for a mechanism to make the definition of the result set independent of future database changes, so if someone was looking for shoes and got a result set of 2458 items, he could actually fetch all pages of that result set reliably even if it got influenced by later changes in the DB (I plan to not really delete items, but set a removed flag on them, for this purpose)
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with for now is to just send a list of all IDs in the result set... That way pages can be fetched by doing a SELECT * FROM products WHERE id IN (3,4,6,9,...)... but this feels rather inelegant...
Anyway, I hope this is not too broad or theoretical. I have a web-based DB, just no good idea of how to do paging with it. I am looking for answers that point me in a direction to learn, not full solutions.

Versioning the DB is the answer for result set consistency.
Each record has a primary id, a modification counter (version number) and a timestamp of modification/creation. Instead of modifying record r in place, you add a new record with the same id, version number + 1 and sysdate as the modification time.
In the fetch response you add the DB request_time (do not use a client timestamp, because the client and server clocks may differ). The first page is served normally, but you return sysdate as request_time. Other pages are served differently: you add a condition like modification_time <= request_time for each versioned table.
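
A minimal sketch of this idea, using Python's sqlite3 module purely for illustration. The table layout, the save and fetch_page helpers, and the integer timestamps (standing in for the server's sysdate) are all assumptions, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER, version INTEGER, name TEXT, modified_at INTEGER,"
    " PRIMARY KEY (id, version))"
)


def save(product_id, name, now):
    """Never update in place: append a new version of the record."""
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM products WHERE id = ?",
        (product_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        (product_id, latest + 1, name, now),
    )


def fetch_page(request_time, first_result, page_size):
    """Page through the data as it looked at request_time."""
    return conn.execute(
        "SELECT p.id, p.name FROM products p"
        " WHERE p.version = (SELECT MAX(version) FROM products"
        "                    WHERE id = p.id AND modified_at <= :t)"
        " ORDER BY p.name, p.id LIMIT :n OFFSET :o",
        {"t": request_time, "n": page_size, "o": first_result},
    ).fetchall()


save(1, "boot", now=10)
save(2, "sandal", now=10)
save(3, "clog", now=10)
page_one = fetch_page(request_time=20, first_result=0, page_size=2)
save(1, "zebra boot", now=30)  # later edit that would reorder the list
page_two = fetch_page(request_time=20, first_result=2, page_size=2)
print(page_one, page_two)
```

Because the second request reuses request_time=20, the rename made at time 30 is invisible to it, so "sandal" is neither skipped nor is "boot" returned twice.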

You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query. The frontend can then request something like next_page with the unique ID that it got the first time it made the query. You should still go ahead with your approach of changing the DELETE operation to a removed flag, because that makes sure that none of the entries in the result set is deleted. You can discard the result set of the query from the cache when the frontend reaches the end of the result set, or you can set a time limit on the lifetime of the cache entry.
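
As a rough sketch of such a server-side cache, assuming an in-memory dictionary and a hypothetical ResultSetCache class (a real deployment would more likely use a shared store):

```python
import time
import uuid


class ResultSetCache:
    """Keeps the ordered ID list of a query under an opaque token."""

    def __init__(self, ttl_seconds=600):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # token -> (expires_at, list of ids)

    def store(self, ids):
        """Remember a result set and return the token for the frontend."""
        token = uuid.uuid4().hex
        self._entries[token] = (time.monotonic() + self.ttl_seconds, list(ids))
        return token

    def page(self, token, first_result, page_size):
        """Return one page of IDs; raises KeyError for unknown or expired tokens."""
        expires_at, ids = self._entries[token]
        if time.monotonic() > expires_at:
            del self._entries[token]
            raise KeyError(token)
        chunk = ids[first_result:first_result + page_size]
        if first_result + page_size >= len(ids):
            del self._entries[token]  # end of result set reached: discard
        return chunk


cache = ResultSetCache()
token = cache.store([3, 4, 6, 9, 12])
first = cache.page(token, 0, 2)
second = cache.page(token, 2, 2)
last = cache.page(token, 4, 2)
print(first, second, last)
```

The IDs returned for each page would then be used to load the actual records, which stay available because removed items are only flagged rather than deleted.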

Pagination in Classic ASP with VB Script

I am using ASP/VBScript in my project, but I don't know much about pagination in Classic ASP. I have designed a datagrid layout using tables and loops. The table is filled from a database. As we have a huge amount of data to display, we need pagination.
Thanks in advance

The pagination problem is not inherent to ASP classic or VBScript. You first need to decide which strategy to follow:
In the client:
Ajax style pagination (You can use a jQuery plugin like SlickGrid)
Linked pagination: Your page has links to page 1, page 2, etc.
Infinite scrolling: This is a modern way to do pagination, with more results added to the page via ajax
In the server:
Retrieve the full DB results and return only the page asked for. This is sometimes necessary.
Retrieve the full DB results but cache them, so subsequent page requests come from the cache, not the DB
Ask the DB only for the page requested (different techniques depending on the DB engine)

There is an issue you need to be aware of: the built-in ASP recordset allows paging, but it is not very efficient. The entire result set gets returned from the database, and only then is the appropriate page located and displayed.
Think of it like this: your result set is a four-shelf bookcase. When you ask for page one, all four shelves of books get returned. Then the display code says "Okay, now only show page 1". If you then ask for page two, all four shelves of books get returned again, and then the display code says "Okay, now only show page 2".
So, you should look for a paging solution that takes place on the server, inside the database. This way, if you ask for page 15 of a 50 page result, the database will only return one shelf of books.
This google query should put you on the right track.
Edit: How SQL Paging Works
You must use a stored procedure
One of the input parameters is the page to view
The stored procedure filters the results on the server
Here is the basic concept of what happens inside the proc:
Step 1:
Create a temp table that stores the entire result set. My preference is to store only two values in this temp table. An identity seed value called RowId and the primary key of the result data. (I'm one of those people that believes in non-sensical identity seed keys)
Step 2:
Insert all the PKey values from the select statement into the temp table
Step 3:
Determine the StartRowId and EndRowId based on the input page parameter.
Step 4:
Select from the temp table using an inner join to the datatable on the PKey. In the where clause limit the result so the RowId (of the temp table) is between StartRowId and EndRowId. Make sure to Order By the RowId.
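
The four steps can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with SQLite, which has no stored procedures, so a Python function stands in for the proc, and the products table, the page_keys temp table and the fetch_page name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (pkey INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (pkey, name) VALUES (?, ?)",
    [(i, f"item {i:02d}") for i in range(1, 26)],
)


def fetch_page(page, page_size):
    """Stand-in for the stored procedure; page is 1-based."""
    # Step 1: temp table holding only a generated RowId and the primary key.
    conn.execute("DROP TABLE IF EXISTS temp.page_keys")
    conn.execute(
        "CREATE TEMP TABLE page_keys (row_id INTEGER PRIMARY KEY, pkey INTEGER)"
    )
    # Step 2: insert all primary keys in the order the list should have.
    conn.execute(
        "INSERT INTO page_keys (pkey) SELECT pkey FROM products ORDER BY name DESC"
    )
    # Step 3: translate the page number into a RowId range.
    start_row_id = (page - 1) * page_size + 1
    end_row_id = page * page_size
    # Step 4: join back to the data table and keep only that range.
    return conn.execute(
        "SELECT p.pkey, p.name FROM page_keys k"
        " JOIN products p ON p.pkey = k.pkey"
        " WHERE k.row_id BETWEEN ? AND ? ORDER BY k.row_id",
        (start_row_id, end_row_id),
    ).fetchall()


print(fetch_page(page=2, page_size=10))
```

Only the rows of the requested page leave the database; the temp table carries nothing but keys, which keeps it small even for wide result rows.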

Set page size
recordset.PageSize = 100 ' number of records per page
Set the current page
recordset.AbsolutePage = nPage ' nPage being the page you want to jump to.
Other useful bits:
recordset.RecordCount ' number of records returned
recordset.PageCount ' number of pages based on PageSize and RecordCount
That's the basic info. You'll still need to loop through the appropriate number of records, and check the page number as it is passed back to the page.
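
The arithmetic behind those properties, and the check on the incoming page number, can be sketched independently of ADO. The Python function below is hypothetical and only mirrors the relationship described above (PageCount derived from PageSize and RecordCount, with 1-based page numbers); it is not the recordset API itself.

```python
import math


def page_bounds(record_count, page_size, requested_page):
    """Return (page_count, page, first_index, end_index) for a 1-based page."""
    page_count = math.ceil(record_count / page_size)
    # Clamp the page number coming back from the browser into a valid range.
    page = max(1, min(requested_page, page_count))
    first_index = (page - 1) * page_size  # 0-based index of the first record
    end_index = min(first_index + page_size, record_count)  # exclusive
    return page_count, page, first_index, end_index


print(page_bounds(250, 100, 3))
print(page_bounds(250, 100, 9))
```

The last page is usually shorter than PageSize, which is why the loop over records has to stop at the end of the data as well as at the page size.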

Telerik Report omitting data

After one of the managers performs a product evaluation, others can change the scoring for certain categories. These changes in scoring are stored in the database for reference.
The structure of the evaluation is like this:
Evaluation
- Category
- Scoring point
An evaluation can have many categories, each of which can have many scoring points.
My problem is the following:
If I change a scoring point a few times, everything is entered in the database, but in the reports I only see the first scoring point. The rest of them with the same name are left blank but still take up space, just as they would if all were visible. The stored procedure that delivers the data is working fine. It brings all the data to the report, which then displays it wrong.
=Fields.CategoryName is working fine... every category name is displayed correctly
=Fields.ScoringPointName is not working... it displays only the first and leaves all the rest blank... if, for example, a scoring point name is Product robustness, it displays only the first change of scoring but not the rest
Any ideas???

Found out what the problem was. Maybe it will be helpful for other people.
I was showing the data in a group header section with grouping =Fields.DefinitionText. Thus it only repeats when Fields.DefinitionText is distinct. The empty space is caused by the detail section, which repeats for every data record. So if I want to display all of the data records, I have to move the group header section textboxes to the report's detail section.
Here and Here are some useful things about reporting.
Cheers
