Can't download all pdf files from a domain - caching

I'm trying to download all the PDF files from fsppm.fulbright.edu.vn/cache or fsppm.fulbright.edu.vn/documents. These files are publicly accessible via Google search. For example, I can find this file by searching for "Why Doesn't Vietnam Grow Faster?" site:fsppm.fulbright.edu.vn.
I tried using wget -r -A "*.pdf" "https://www.fsppm.fulbright.edu.vn/", but I could only retrieve some of the files, and none of the ones I actually want. When I pointed wget at the cache folder directly, it returned a 403 Forbidden error. Also, none of the downloaded folders is named cache or documents.
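A fuller invocation I assume would be needed looks something like this (the extra flags are only guesses: -e robots=off ignores robots.txt exclusions, which is one common reason wget skips directories, and --user-agent makes wget identify itself as a browser):
wget -r -l inf -A pdf -e robots=off --user-agent="Mozilla/5.0" "https://www.fsppm.fulbright.edu.vn/"
Even so, wget -r only follows links found in crawled pages, so files that Google indexes but the site itself does not link to would still be missed.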
Is it possible to download these files? If so, could anyone help me, please? Thank you very much.

Related

How to download files from Google Drive via the command line

I need to download files or folders from my Google Drive via the command line.
I am thinking of a script or batch file, on Windows.
I have seen that I could use the gdrive app, but I am having some trouble with the syntax.
I tried:
gdrive-windows-x64.exe download -r --path "G:\My Drive\myfolder"
but it gives me an "invalid arguments" error.
I am also interested in a way to zip the contents of a folder on my Google Drive, again via the command line.
Can someone help me?
Thanks a lot,
marco
drive is an option.
Create a new folder and do the initial setup;
Create a sub folder mirroring the remote folder structure;
cd into the sub folder and run $ drive pull
See the drive documentation for more details on pulling.
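A minimal sketch of those steps, assuming the remote folder is called myfolder (the directory names are just placeholders):
mkdir gdrive && cd gdrive
drive init                       # one-time setup; prompts for Google authorization
mkdir myfolder && cd myfolder    # sub folder named after the remote folder
drive pull                       # pulls the remote folder's contents into this directory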
There is a command-line utility for working with Google Drive on GitHub:
https://github.com/google/skicka
Examples:
skicka download /folder1 ~/folder2
The contents of your local ~/folder2 directory will then match the contents of the remote /folder1 folder.
The remote Drive path comes first and the local destination second, for example:
skicka download /remote/dir ~/local/dir
gdrive is an option written in Go. It does require connecting a Google account. This command downloads a Google Drive directory:
gdrive download --recursive DRIVEID
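Applied to the Windows binary from the question, the invocation would look roughly like the line below; this is a sketch rather than tested syntax. DRIVEID is the ID of the remote folder (visible in its Drive URL), and --path sets the local directory the files are written to:
gdrive-windows-x64.exe download --recursive --path "G:\My Drive\myfolder" DRIVEID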

Download files from Artifactory

How can I download files from Artifactory? Is it possible to download using a batch script? I used curl commands to upload, so in the same way, please provide suggestions for downloading. Appreciate your help.
You can use the JFrog CLI - a compact and smart client that provides a simple interface that automates access to JFrog products. The CLI works on both Windows and Linux.
For downloading files, take a look at the command for downloading files from Artifactory. This command lets you download specific files, multiple files (using wildcards), or complete folders.
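As a sketch, once the Artifactory server has been configured in the CLI, downloads look roughly like this (the repository name my-repo and the paths are placeholders):
jfrog rt download "my-repo/builds/*.zip" output/
jfrog rt download "my-repo/release/" output/
The first form uses a wildcard to fetch multiple files into ./output/; the second downloads a complete folder.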
Use GNU Wget from here - http://gnuwin32.sourceforge.net/packages/wget.htm
It is a very small utility and supports showing download progress and a lot of other options, like overwriting or skipping the download if the file already exists.
Hi, I used the same curl command with Ansible, but I had missed configuring the remote server for Ansible, so curl was not working. After configuring the remote server, it was able to download. Thanks a lot for the response.
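For reference, a curl download from Artifactory mirrors the upload: the same credentials, but a plain GET instead of a PUT (the host, repository, and path below are placeholders):
curl -u myuser:mypassword -O "https://artifactory.example.com/artifactory/my-repo/path/file.zip"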

Update Magento extension using ssh to extract .tgz tar file

I am trying to update a module to a newer version. In the past I have manually uploaded each file carefully into the new directory and overwritten older files using FTP. However, I wanted to use SSH to do this more easily and without any file permission problems.
I have:
Uploaded the .tgz file to the root folder (/http) on the server
Logged into the server via SSH
Changed the directory to the correct directory
Run the following command: tar -zxvf fishpig_splash.tgz
In the command line I was then given a list of all the files that had been extracted. However, if I use FTP to look at any of these files, I can see that they are still the older versions and have not been overwritten.
I was expecting that the files would extract into the correct directories and overwrite any that already existed. I have tested the extraction by creating a temporary directory and extracting into that and everything worked fine.
Is there another part to this script I need to use to overwrite the files?
Thanks
Glynn
Sorry, this was just me being stupid! When extracting the tar file there was a subfolder within it for the extension, which I completely missed. I just went down a level in the archive, zipped up the contents only, then extracted them at the root, and everything worked fine. Thanks for the help though!
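For anyone hitting the same thing: if the archive wraps everything in a single top-level folder, GNU tar can drop that leading directory during extraction instead of repacking the archive (this assumes the wrapper is exactly one level deep):
tar -zxvf fishpig_splash.tgz --strip-components=1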

Can't curl then unzip zip file

I'm just trying to curl this zip file, then unzip it
curl -sS https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv.zip > labeledTrainData.tsv.zip
unzip labeledTrainData.tsv.zip labeledTrainData.tsv
but I keep getting this error:
Archive: labeledTrainData.tsv.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
I'm using the same syntax as found in this response, I think. Is there something wrong with the file I'm downloading? I feel like I'm making a noob mistake. I run these two commands in a shell script.
I am able to replicate your error. This sort of error generally indicates one of two things:
The file was not packaged properly
You aren't downloading what you think you're downloading.
In this case, your problem is the latter. It looks like you're downloading the file from the wrong URL. I'm seeing this when I open up the alleged zip file for reading:
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
Long story short, you need to download from the alternate URL specified above. Additionally, Kaggle usually requires login credentials when downloading, so you'll need to specify your username/password as well.
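As a sketch of what that looks like with curl, -L follows the "Object moved" redirect instead of saving the HTML stub, and credentials are passed along the lines this answer describes (whether Kaggle accepts plain username/password auth here, rather than a browser session cookie, is an assumption):
curl -sSL -u myuser:mypassword -o labeledTrainData.tsv.zip "https://www.kaggle.com/c/word2vec-nlp-tutorial/download/labeledTrainData.tsv.zip"
unzip labeledTrainData.tsv.zip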

Download large files in Heroku

I am facing some issues when downloading large files in Heroku. I have to download and parse files greater than 1 GB. What I am trying to do right now is use curl to download them into the /tmp folder (of a Rails application).
The curl command is: "curl --retry 999 -o #{destination} #{uri} 2> /dev/null" and destination is Rails.root.join("tmp", "file.example")
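Expanded, the interpolated command that actually runs looks roughly like this (the URL is a placeholder, and on Heroku Rails.root is typically /app, though that is an assumption here):
curl --retry 999 -o /app/tmp/file.example https://example.com/large-file 2> /dev/null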
The problem is that after a few minutes of downloading, the curl process that is downloading the file finishes, long before the download is actually complete. Before it finishes, the logs show lots of "Memory exceeded" errors. This led me to think that when I save to the /tmp folder, the downloaded content is being stored in memory, and when memory hits its limit, the process is killed.
I would like to know if any of you have already experienced a similar issue on Heroku, and whether saving to the /tmp folder really works like this. If so, do you have any suggestions for getting this working on Heroku?
thanks,
Elvio
You are probably better off saving the file to an external cloud provider like S3 using the fog gem. In any case, Heroku has a read-only filesystem, so they won't allow you to curl to it, much less write to it.
