Iterate through specific files using WebHDFS in a bash script

I want to download specific files in a HDFS directory, with their names starting with "total_conn_data_". Since I've got many files I want to write a bash script.
Here's what I do:
myPatternFile="total_conn_data_*.csv"
for filename in `curl -i -X GET "https://knox.blabla/webhdfs/v1/path/to/the/directory/?OP=LISTSTATUS" -u username`; do
    curl -i -X GET "https://knox.blabla/webhdfs/v1/path/to/the/directory/$filename?OP=OPEN" -u username -L -o "./data/$filename" -k;
done
But it does not work, since curl -i -X GET "https://knox.blabla/webhdfs/v1/path/to/the/directory/?OP=LISTSTATUS" -u username sends back JSON text, not file names.
How should I do this? Thanks

curl only gives you the WebHDFS response as JSON; LISTSTATUS does not return a plain list of file names. You will have to use another tool such as jq or sed to parse that output and extract the names before looping over them.
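For example, here is a minimal sketch with jq, assuming the standard WebHDFS LISTSTATUS response shape ({"FileStatuses":{"FileStatus":[{"pathSuffix":...},...]}}) and the Knox URL from your question. Note that -i is dropped so the response body is pure JSON:
base="https://knox.blabla/webhdfs/v1/path/to/the/directory"
# List the directory, pull out each file name, keep only the matching ones
curl -s -k -u username "$base/?OP=LISTSTATUS" \
| jq -r '.FileStatuses.FileStatus[].pathSuffix' \
| grep '^total_conn_data_.*\.csv$' \
| while read -r filename; do
    curl -s -k -u username -L -o "./data/$filename" "$base/$filename?OP=OPEN"
done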

Related

Ruby output as input for system command

I am trying to download a ton of files via gsutil (Google Cloud). You can pass a list of URLs to download:
You can pass a list of URLs (one per line) to copy on stdin instead of as command line arguments by using the -I option. This allows you to use gsutil in a pipeline to upload or download files / objects as generated by a program, such as:
some_program | gsutil -m cp -I gs://my-bucket
How can I do this from Ruby, from within the program I mean? I tried to output them but that doesn't seem to work.
urls = ["url1", "url2", "url3"]
`echo #{puts urls} | gsutil -m cp -I gs://my-bucket`
Any idea?
A potential workaround would be to save the URLs in a file and use cat file | gsutil -m cp -I gs://my-bucket but that feels like overkill.
Can you try `echo '#{urls.join("\n")}' | gsutil -m cp -I gs://my-bucket` instead?
puts returns nil rather than the string you want to return, so the interpolation in your version inserts nothing; that is why it fails.
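For reference, the equivalent shell pipeline (bucket name taken from the question) shows what gsutil expects on stdin, namely one URL per line:
printf '%s\n' "url1" "url2" "url3" | gsutil -m cp -I gs://my-bucket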

Sending file using CURL in windows

I'm trying to send a file using curl on Windows.
Here's the command I'm using:
C:\curl>curl -X POST -F chat_id=@telegramchannel -F photo=@IMAGE.png https://api.telegram.org/bot812312342:XXXXXXXXXXXXXXXXXXXXXX/sendPhoto
and I keep getting this error:
curl: (26) Failed to open/read local data from file/application
Does anybody know how to solve it, and how to use -F properly with files on Windows?
Thanks
If telegramchannel is not a file, then you have to escape the @ with a backslash or wrap the value in single quotes, because @ has a special meaning in curl's -F context: it tells curl to read the field value from a file. Use either
curl -X POST -F chat_id='@telegramchannel' -F photo=@IMAGE.png https://api.telegram.org/bot812312342:XXXXXXXXXXXXXXXXXXXXXX/sendPhoto
or
curl -X POST -F chat_id=\@telegramchannel -F photo=@IMAGE.png https://api.telegram.org/bot812312342:XXXXXXXXXXXXXXXXXXXXXX/sendPhoto
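If your curl is reasonably recent, --form-string is another way out: it submits its value completely literally, with no special handling of @ or < (the bot URL below is the placeholder from the question):
curl -X POST --form-string chat_id=@telegramchannel -F photo=@IMAGE.png https://api.telegram.org/bot812312342:XXXXXXXXXXXXXXXXXXXXXX/sendPhoto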

Curl wildcard delete

I'm trying to use curl to delete files before I upload a new set, and I'm having trouble trying to wildcard the files.
The below code works to delete one specific file
curl -v -u usr:"pass" ftp://11.11.11.11/outgoing/ -Q "DELE /outgoing/configuration-1.zip"
But when I try to wildcard the file with the command below
curl -v -u usr:"pass" ftp://11.11.11.11/outgoing/ -Q "DELE /outgoing/configuration-*.zip"
I get the error below
errorconfiguration-*: No such file or directory
QUOT command failed with 550
Can I use wildcards in a curl delete?
Thanks
Curl does not support wildcards in any commands on an FTP server. In order to perform the required delete, you'll have to first list the files in the directory on the server, filter down to the files you want, and then issue delete commands for those.
Assuming your files are in the path ftp://11.11.11.11/outgoing, you could do something like:
curl -u usr:"pass" -l ftp://11.11.11.11/outgoing/ \
| grep '^configuration[-][[:digit:]]\+[.]zip$' \
| xargs -I{} -- curl -v -u usr:"pass" ftp://11.11.11.11/outgoing/ -Q "DELE /outgoing/{}"
That command (untested, since I don't have access to your server) does the following:
Outputs a name-only listing (-l) of the /outgoing directory on the server; the trailing slash on the URL is what makes curl list the directory rather than try to download a file named outgoing.
Filters that directory listing for file names that start with configuration-, then have one or more digits, and then end with .zip. You may need to adjust this regex for different patterns.
Supplies the matching names to xargs, which substitutes each name for the {} placeholder and runs one curl command per file, issuing a DELE with the full path (as in your working single-file command).
You could use one curl command to delete all of the files by passing one -Q option per matched name, but that would be less legible for use as an example.
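For completeness, here is a sketch of that single-invocation variant (equally untested; it assumes bash 4+ for mapfile):
#!/bin/bash
# Collect the matching names, then issue every DELE in one curl session.
mapfile -t files < <(curl -s -u usr:"pass" -l ftp://11.11.11.11/outgoing/ \
    | grep '^configuration[-][[:digit:]]\+[.]zip$')
args=()
for f in "${files[@]}"; do
    args+=(-Q "DELE /outgoing/$f")
done
# The listing transfer just carries the session; the -Q commands run first.
curl -v -u usr:"pass" "${args[@]}" ftp://11.11.11.11/outgoing/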

Using all files in a directory with curl?

This is my script:
#!/bin/bash
curl -X POST -T /this/is/my/path/system.log https://whatever;
As you see, I am using a file called system.log. How can I do that for the complete /this/is/my/path/ path in a loop? There are about 50 files in /this/is/my/path/ which I want to use with curl.
Thanks!
You can upload multiple files using curl's brace globbing syntax:
$ curl -u ftpuser:ftppass -T "{file1,file2}" ftp://ftp.testserver.com
A more flexible solution is to iterate with a for loop; that also lets you add echo commands, deletions, or whatever other per-file commands you want.
#!/bin/bash
for file in /this/is/my/path/*; do
    # $file already expands to the full path, so pass it to -T as-is
    curl -X POST -T "$file" https://whatever
done

How to download a file using curl

I'm on Mac OS X and can't figure out how to download a file from a URL via the command line. It's from a static page, so I thought copying the download link and then using curl would do the trick, but it isn't working.
I referenced this StackOverflow question but that didn't work. I also referenced this article which also didn't work.
What I've tried:
curl -o https://github.com/jdfwarrior/Workflows.git
curl: no URL specified!
curl: try 'curl --help' or 'curl --manual' for more information
wget -r -np -l 1 -A zip https://github.com/jdfwarrior/Workflows.git
zsh: command not found: wget
How can a file be downloaded through the command line?
The -o/--output option means curl writes its output to the file you specify instead of stdout. Your mistake was putting the URL immediately after -o, so curl took the URL as the file name to write to and concluded that no URL had been specified. You need a file name after -o, then the URL:
curl -o ./filename https://github.com/jdfwarrior/Workflows.git
And wget is not available by default on OS X.
curl -OL https://github.com/jdfwarrior/Workflows.git
-O: write the output to a local file named like the remote file we get. In this case, that file would be Workflows.git.
-L: if the server reports that the requested page has moved to a different location (indicated with a Location: header and a 3XX response code), make curl redo the request at the new location.
Ref: curl man page
The easiest solution for your question is to keep the original filename. In that case, you just need the capital-O option ("-O", the letter, not a zero!). So it looks like:
curl -O https://github.com/jdfwarrior/Workflows.git
There are several options to make curl output to a file
# saves it to myfile.txt
curl http://www.example.com/data.txt -o myfile.txt -L
# The #1 gets replaced by whatever the first URL globbing pattern
# (here [1-3]) matched, so each saved file is numbered to match
curl "http://www.example.com/data_[1-3].txt" -o "file_#1.txt" -L
# saves to data.txt, the filename extracted from the URL
curl http://www.example.com/data.txt -O -L
# saves to filename determined by the Content-Disposition header sent by the server.
curl http://www.example.com/data.txt -O -J -L
# -O Write output to a local file named like the remote file we get
# -o <file> Write output to <file> instead of stdout (#N glob replacement is performed on <file>)
# -J Use the Content-Disposition filename instead of extracting filename from URL
# -L Follow redirects
