Data scraping from web for custom pages UiPath RPA


I am scraping a website (a book site) and am looking for a dynamic way to extract the data from pages 5 to 7. I don't know in advance how many pages a search will return, and counting all the rows would be difficult. I am trying to extract page by page. If the search returns only 6 pages and there is no page 7, I would like an error handler that tells me only 6 pages are available.

Think simple.
Select whole table.
Delete the first 80 rows (pages 1 to 4, assuming 20 entries per page).
Done.
Alternatively:
Open the results table at the right offset via a URL parameter. For example, in Google:
https://www.google.com/search?q=uipath&start=50
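
As a rough illustration of the URL-parameter idea, the offset for a given page can be computed and placed into the query string. This is only a sketch in Python, not UiPath code. It assumes 10 results per page and that `start` is a zero-based row offset, as in the Google example; `page_url` is a hypothetical helper, and the book site may use different parameter names.

```python
from urllib.parse import urlencode


def page_url(base_url, query, page, per_page=10):
    """Build a search URL that starts at the first result of `page` (1-based).

    Assumption: the site accepts `q` and a zero-based `start` offset,
    like the Google URL in the answer above.
    """
    start = (page - 1) * per_page
    return f"{base_url}?{urlencode({'q': query, 'start': start})}"


# Page 6 with 10 results per page starts at offset 50.
print(page_url("https://www.google.com/search", "uipath", 6))
```

In a workflow, the resulting string would be what the browser is opened with, once per page from 5 to 7.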

Don't use the Data Scraping wizard's automatic next-page feature. Instead, check for the next-page button with Element Exists, and keep a page counter in a loop with an If on that check.
Example: if the next-page button does not exist and the counter is below 7, raise an error or send an alert mail.
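
The counter logic above can be sketched outside UiPath as well. The following Python is a minimal sketch, not the UiPath activity set: `fetch_page` and `has_next_page` are hypothetical callables standing in for "extract the table on this page" and "Element Exists on the next-page button".

```python
def scrape_pages(fetch_page, has_next_page, first=5, last=7):
    """Walk pages from 1, keep rows from `first` to `last`.

    Returns (rows, last_page_reached) so the caller can tell
    whether the search had fewer pages than requested.
    """
    rows = []
    page = 1
    while True:
        if page >= first:
            rows.extend(fetch_page(page))
        # Stop at the last wanted page or when there is no next-page button.
        if page == last or not has_next_page(page):
            return rows, page
        page += 1


# Fake site with only 6 pages and 2 rows per page (illustrative data only).
rows, reached = scrape_pages(
    fetch_page=lambda p: [(p, i) for i in range(2)],
    has_next_page=lambda p: p < 6,
)
if reached < 7:
    print(f"Only {reached} pages available; got {len(rows)} rows")
```

Returning the last page reached, rather than raising immediately, keeps the rows already collected from pages 5 and 6 while still allowing an alert.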

Related

Java8 - Asynchronous execution of independent code

In my portal, the page (Homepage) that is displayed after a customer logs in has to show multiple sections, such as
Order details, Available invoices, Catalog pdf etc.
There is no dependency between these sections; each section's data just needs to be displayed on the front end.
If I fetch this data sequentially on the back end, the Homepage takes a long time to display.
I'm looking at speeding up this process. The options I could think of:
Instead of fetching all details on the back end after login, first display the Homepage, then make individual ajax calls for each section using jQuery.
Use a ForkJoinPool (RecursiveTask) on the back end and fire up a recursive task for each section.
I prefer doing it on the back end (business tier). Is there any other solution available in Java 8 to achieve this? I am looking at calling each section's logic asynchronously instead of in a sequential flow.

tt_news timed publishing of articles

In my TYPO3 6.2.17 installation, I use the tt_news extension 3.6.
My articles are located in a folder and are displayed in a default tt_news list element. Usually, when I save an article, I need to flush the Frontend Cache. So far so good. However, sometimes I need to publish news articles on a schedule, i.e. starting from a certain date, which I did with the start value in the access tab of the news article. The problem is that the news is not displayed on the required date; it only appears once someone clears the Frontend Cache after the start date has passed.
What can I do so that the articles are displayed after the access start date without anyone clearing the Frontend Cache manually?
Edit:
This problem cannot be solved with cron jobs alone, because it would be too difficult for the content editors to create a cron job for every single news article.

Disabling caching completely on a given page isn't the best choice, especially if you have a large number of news items to render at once and/or a large number of visitors; in such cases even relatively short caching is better than no caching at all. The easiest way is to shorten the cache period only for the pages that display the list and single views, for example:
[globalVar = TSFE:id = 123|345]
config.cache_period = 60
[end]
(where 123 is the UID of your list page and 345 the UID of the single-view page). Instead of using a condition, you can also create extension TypoScript templates on these pages.
Keep in mind that the cache period is counted from the time the cache entry was created, so some posts may need two periods to disappear (the first one may be missed because of the time difference). If it is absolutely important to you that an item is hidden right on time, set the cache_period value to 29 seconds.
Finally, if the list/single pages contain elements that also require expensive rendering (like advanced TMENUs etc.), you can cache these additionally with the cache function. This prevents the menu from being re-rendered between the page's cache expirations, and you are still able to force clearing it from the BE with the yellow flash icon.
Pseudo code:
lib.mainMenu = COA
lib.mainMenu {
  stdWrap.cache.key = lib_mainMenu_{page:uid}_{TSFE:sys_language_uid}
  stdWrap.cache.key.insertData = 1
  stdWrap.cache.lifetime = 3600
  10 = HMENU
  10 {
    // ... your menu code
  }
}

For clearing the cache through an external trigger see this question, which was asked yesterday: Refresh Typo3 by web server cron job
Alternatively you could exclude the page with the list plugin from caching. Check the behavior tab in the page properties.

Codeigniter cache page

I have a site developed in CodeIgniter.
On the search page there is a form; when it is submitted, I send a request to a server with cURL and get XML back.
The query plus rendering the data takes about 15 seconds, because I have to query several servers, and that time cannot be reduced.
The problem is this: I get a list of elements, and when I click on an element I make a query to retrieve that element's data.
But if I click back, or follow a link back to the full list of search results, I don't want to run another query that takes 15 seconds.
The search is a GET request, so I have a link like this:
http://myurl/backend/hotel/hotel_list?nation=94&city=1007&check-in=12%2FApr%2F2013&check-out=13%2FApr%2F2013&n_single_rooms=1&n_double_rooms=0&n_triple_rooms=0&n_extra_beds=0
I load the page and it can contain several elements. I click on one of them through a simple link like this:
http://myurl/backend/hotel/hotel_view?id_service=tra_0_YYW
From this page I need to go back to the previous URL (the first one) without re-running the query that takes so many seconds.
I can't cache the result for long, because it is a realtime database that changes every minute or even every second. What I thought of is caching the search page when I first load it, and when I go back to it, reloading it from the cache if it is less than 2 minutes old, for example.
Is this a good approach, or is there a more performant way to do this in CodeIgniter?
I can't put it in the session because the data is large.
The other solutions are:
- cache page (but every minute I have to delete it)
- cache result (but every minute I have to delete it)
- create session flashdata (but I have a large amount of data)
Is there a way to make the browser not regenerate the page when I go back?
Thanks

cache page (but every minute I have to delete it)
I think you can easily implement this with CodeIgniter's page caching function "$this->output->cache(1);", which caches the page for one minute.
cache result (but every minute I have to delete it)
You will have to use CodeIgniter's object caching to implement this.
create session flashdata (but I have a large amount of data)
It's not a good idea to store huge data in the session. Use database-backed sessions instead; they work the same way, and CodeIgniter has built-in support for them.
Hope this helps. If you are just starting out, you can read more about all the kinds of caching CodeIgniter offers.
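
To make the "cache result" option more concrete, here is a minimal sketch of a time-limited result cache keyed by the search URL. It is written in Python purely to illustrate the logic and is not CodeIgniter's caching API; `TTLCache` and `get_or_compute` are hypothetical names, and the 120-second lifetime follows the "2 minutes" mentioned in the question.

```python
import time


class TTLCache:
    """Keep computed values for `ttl_seconds`, then recompute on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (created_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the slow query
        value = compute()
        self._store[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=120)
# The key would be the full search query string; the lambda stands in
# for the slow 15-second multi-server request.
results = cache.get_or_compute("nation=94&city=1007", lambda: ["hotel A", "hotel B"])
```

Because entries expire on their own, nothing has to be deleted manually every minute; stale data is simply recomputed on the next request after the lifetime has passed.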

How do I pass data across ActionResults in MVC 3?

I have a series of pages that an end user must fill out (check boxes) and when they are finished with each page I attempt to create a List of the check boxes they selected. At the end of the series of pages, I would like to show them everything that they have selected in a confirmation page. I have noticed that between requests the information in the List<> I create on each page is not available to the final confirmation page. I've tried a few different solutions (private globals) to no avail. How would I pass data across ActionResults to accomplish displaying all the selected data on the confirmation page? Thanks.
One potential solution. Others?

The web is stateless, meaning you have to store things if you want to keep them around for later use. That's true for any web framework. You'll need to store each page's results somewhere.
Options for building a wizard:
Store all of the selected answers in session and keep building it up from page to page. The final confirmation would get the results from session.
Store them in a database.
Store the results in a cookie.
Store them in HTML5 local storage.
Carry them through on each page with hidden fields. Page 2 would have Page 1's answers in hidden fields, etc.
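
The session option can be sketched roughly as follows. This is Python rather than ASP.NET MVC, and the plain dictionary `session` only stands in for whatever per-user session store the framework provides; `save_step` and `confirmation` are hypothetical helper names.

```python
def save_step(session, step, selected):
    """Record the check boxes selected on one wizard page."""
    answers = session.setdefault("wizard_answers", {})
    answers[step] = list(selected)


def confirmation(session):
    """Collect every selection, in page order, for the confirmation page."""
    answers = session.get("wizard_answers", {})
    return [(step, item) for step in sorted(answers) for item in answers[step]]


session = {}  # stand-in for the per-user session object
save_step(session, 1, ["newsletter", "sms"])
save_step(session, 2, ["express shipping"])
print(confirmation(session))
```

Each request handler would call `save_step` when its page is posted, and only the final handler reads everything back, so nothing depends on fields that live in a single request.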

You need to save a state between the requests. You can do this with:
Query string parameters
Session state
Hidden fields
Db (if you wanna persist the intermediate choices after each request)
Local storage
Cookies
Anything else?
I'd guess, as RyanW points out, that storing them in session state is the usual way to do it. You could however fetch all steps in one request, do some fancy JS / store the intermediary results locally and make a final post when the questionnaire is complete.

How does pagination on Reddit's home page work?

Reddit uses a time decay algorithm. That would mean the sort order is subject to change. When a user goes to page 2, is there a mechanism to prevent them from seeing a post that was on page 1 but was bumped down to page 2 before they paged over? Is it just an acceptable flaw of the sort method? Or are the first couple of pages cached for the user so this doesn't happen?
Side note: It's my understanding that Digg cannot suffer from this issue but that HackerNews and Reddit can.

Look at the URL of the "next" link: http://www.reddit.com/?count=25&after=t3_dj7xt
So clearly the next page ensures that page 2 starts at the post after t3_dj7xt, whatever that translates to. This could be accomplished using IDs: you'd pass after=188 and the next page starts at 189, ensuring you don't see the same post if a time delay occurred.

It might be using the last ID as opposed to limiting from. Take these two examples of SQL:
SELECT * FROM Stories WHERE StoryID > $LastStoryID ORDER BY StoryID LIMIT 10;
rather than:
SELECT * FROM Stories ORDER BY StoryID LIMIT 20, 10;
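
The difference between the two queries can be tried out with a small in-memory SQLite table. This is only a sketch under assumed data: the `Stories` table and `StoryID` column come from the SQL above, everything else (IDs, page size, helper names) is made up for illustration, and it says nothing about how Reddit actually stores posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stories (StoryID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Stories VALUES (?)", [(i,) for i in range(10, 80, 10)])


def page_by_offset(offset, size=3):
    """Classic LIMIT/OFFSET paging: position-based."""
    cur = conn.execute(
        "SELECT StoryID FROM Stories ORDER BY StoryID LIMIT ? OFFSET ?", (size, offset)
    )
    return [row[0] for row in cur]


def page_after(last_story_id, size=3):
    """Paging by last seen ID: starts strictly after the last row already shown."""
    cur = conn.execute(
        "SELECT StoryID FROM Stories WHERE StoryID > ? ORDER BY StoryID LIMIT ?",
        (last_story_id, size),
    )
    return [row[0] for row in cur]


page1 = page_by_offset(0)  # the user sees the first three stories
# Between requests, a story that sorts onto page 1 appears.
conn.execute("INSERT INTO Stories VALUES (15)")

print(page_by_offset(3))      # position-based: repeats the last story of page 1
print(page_after(page1[-1]))  # ID-based: continues after the last story seen
```

With the offset query, the new row pushes the last story of page 1 onto page 2, so the reader sees it twice; the ID-based query is anchored to the last story already shown, so it is unaffected.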
