How to get all Chrome download links with the wget command tool automatically? - bash

I'm trying to download all the images that appear on the page with wget, and everything seems fine, but the command actually downloads only the first 6 images, and no more. I can't figure out why.
The command I used:
wget -nd -r -P . -A jpeg,jpg http://www.edpeers.com/2013/weddings/umbria-wedding-photographer/
It's downloading only the first 6 relevant images on the page, plus other stuff that I don't need. Looking at the page, any idea why it's only getting the first 6 relevant images?
Thanks in advance.
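There is no single fix, but based on the answers to the related questions below, a variant worth trying loosens the usual limits (robots.txt rules, the default recursion depth, page requisites); the exact flags needed depend on why the crawl stops, so this is only a guess. If the remaining images are injected by JavaScript (lazy loading), wget will not see them at all, as the answer about third-party scripts further down explains.
# Guesses at what might be limiting the crawl: robots.txt rules, the default
# recursion depth, or inline images not being fetched as page requisites.
wget -nd -r --level=inf -p -e robots=off \
     -A jpeg,jpg -P . \
     http://www.edpeers.com/2013/weddings/umbria-wedding-photographer/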

Related

Downloading all images bigger than a certain size in kb from all pages of a website in Ubuntu 22.04

I have figured out how to download all images from a particular website:
wget -i `wget -qO- http://example.com | sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' | awk '{gsub("thumb-350-", "");print}'`
What I HAVEN'T figured out is how to download images from ALL pages of a website whose URLs increment like this (http://example.com/page/), and how to restrict the download to images of a certain size or bigger. Can you help me?
I used that command in Ubuntu's terminal and managed to download all the images from the original page. Now, to avoid repeating the same process 131 times (the number of pages in the blog), and to avoid downloading images that are too small, I'd like your help tweaking that command.
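A minimal sketch of one way to approach this, assuming the pages follow an http://example.com/page/N pattern, that the extraction pipeline above works on every page, and that 131 pages is correct; the 100 KB cut-off and the images output directory are placeholders, and since wget has no minimum-size filter, small files are simply deleted afterwards.
# Loop over the blog's numbered pages, pull the image URLs out of each page
# with the same sed/awk pipeline as above, and feed them to wget via stdin.
base="http://example.com"
outdir="images"
mkdir -p "$outdir"
for n in $(seq 1 131); do
    wget -qO- "$base/page/$n/" \
        | sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' \
        | awk '{gsub("thumb-350-", ""); print}' \
        | wget -nc -P "$outdir" -i -    # -i - reads the URL list from stdin
done
# wget cannot filter by file size, so delete anything under 100 KB afterwards.
find "$outdir" -type f -size -100k -delete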

WGET - Download specific files (by extension or mime-type) from third party websites

I need to get ALL ".js" extension files from a website by using wget, including third-party ones, but it doesn't always work.
This is the command I use:
wget -H -p -A "*.js" -e robots=off --no-check-certificate https://www.quantcast.com
For example, if I execute wget to "https://www.stackoverflow.com" I want to get all "*.js" files from stackoverflow.com but also third party websites, such as "scorecardresearch.com", "secure.quantserve.com" and others.
Is something missing in my code?
Thanks in advance!
Wget with the -p flag will only download simple page requisites, like scripts with a src, links with an href, or images with a src.
Third-party scripts are often loaded dynamically using script snippets (such as Google Tag Manager, https://developers.google.com/tag-manager/quickstart). These dynamically loaded scripts will not be downloaded by wget, since the JavaScript has to run for them to be loaded at all. To get absolutely everything, you would likely need something like Puppeteer or Selenium to load the page and scrape the contents.
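If a full Puppeteer or Selenium setup is more than you need, one rough shell-only alternative (assuming a headless-capable Chrome or Chromium is installed, and that the scripts you want show up in the rendered DOM) is to let the browser render the page and then feed the script URLs it finds to wget:
# Render the page with headless Chrome, dump the final DOM and download the
# .js files it references. The grep is a rough heuristic, not an HTML parser;
# relative URLs are skipped and would need the site prefix prepended.
url="https://www.quantcast.com"
google-chrome --headless --dump-dom "$url" \
    | grep -oE 'src="[^"]+\.js[^"]*"' \
    | sed 's/^src="//; s/"$//' \
    | grep -E '^https?://' \
    | wget -H -e robots=off --no-check-certificate -P js-files -i -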

How to download a page from GET url using wget

I am trying to download the Google search results page for the GOP primaries results using wget, but I am not able to do that (this page). However, I noticed that the webpage gets its data from this file https://goo.gl/KPGSqS, which it fetches using a GET request.
So I was wondering if there is a way to download that file with wget? The usual way I do it is wget -c url, but that is not working. So, any ideas on what I should do here?
I tried with the user-agent option, but even that isn't working.
If you want to download a webpage's content (parsed to a simple text file, or as the source HTML code), you could consider using lynx. Install lynx by typing sudo apt-get install lynx, and then you can save the page content using lynx -dump http://your.url/ > savefile.txt (use -source instead of -dump to keep the raw HTML).
You can find out how to use lynx on this page.
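For reference, the two lynx modes the answer mentions look like this (the URL is a placeholder):
# Plain-text rendering of the page, roughly what a user would read
lynx -dump "http://your.url/" > savefile.txt
# Raw HTML source, if you want to parse it further yourself
lynx -source "http://your.url/" > savefile.html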

How do I use wget to download all images from a domain?

Hello I would like to download all the pictures from the www.demotywatory.pl website.
I have seen another question with an accepted answer, but it does not work for me at all.
The answer was:
wget -r -P /save/location -A jpeg,jpg,bmp,gif,png http://www.domain.com
So I tried that with several websites and always got the same result: it looks like it only tried to save the one file.
Have you tried doing this:
wget -r -A.jpg http://www.demotywatory.pl
It will download all .jpg files from the given URL.
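If that still only saves the HTML page, one common reason is that the site's images live on a different host (a CDN or an img. subdomain) or are disallowed by robots.txt. A hedged variant that spans hosts could look like the following; the host names after -D are guesses and need to be adjusted to wherever the images are actually served from:
# Recurse from the page, follow links onto the listed hosts, keep only image
# files and ignore robots.txt. The img.demotywatory.pl host is an assumption.
wget -r -l 2 -nd -H -D demotywatory.pl,img.demotywatory.pl \
     -A jpeg,jpg,bmp,gif,png -e robots=off \
     -P /save/location http://www.demotywatory.pl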

Wget Not Downloading Every Folder

Hey, I have a bash script running a wget command to get a directory:
wget -r -nH --cut-dirs=5 "ftp://$USER:$PASS@stumpyinc.com/subdomains/cydia/httpdocs/theme/themes/$theme_root"
And what it's supposed to do is download a folder structure that looks like this:
$theme_root/Library/Themes/$theme_name.theme/Icons
For some reason, it won't download any folder that's inside the $theme_name.theme folder. There's also a UIImages folder in there that's not showing up, although the files that are in that folder are being downloaded. Does anyone notice anything that I might have done wrong? Thanks in advance!
EDIT
If you add --level=inf it works perfectly!
Wget's default directory retrieval depth is 5 directories, as per the wget manual. If the files you are trying to get are deeper than that from your starting position, it will not go down to them. You can try giving a larger --level option, or, as in your edit, --level=inf.
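Put together, the command from the question with the depth limit lifted would look something like this (credentials and paths are the question's own placeholders):
# Same retrieval as before, but with unlimited recursion depth so folders
# nested more than 5 levels deep (e.g. .../$theme_name.theme/Icons) are fetched too.
wget -r --level=inf -nH --cut-dirs=5 \
     "ftp://$USER:$PASS@stumpyinc.com/subdomains/cydia/httpdocs/theme/themes/$theme_root"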
