How to get the file size on Unix in a Makefile? - shell

I would like to implement this as a Makefile task:
# step 1:
curl -u username:password -X POST \
-d '{"name": "new_file.jpg","size": 114034,"description": "Latest release","content_type": "text/plain"}' \
https://api.github.com/repos/:user/:repo/downloads
# step 2:
curl -u username:password \
-F "key=downloads/octocat/Hello-World/new_file.jpg" \
-F "acl=public-read" \
-F "success_action_status=201" \
-F "Filename=new_file.jpg" \
-F "AWSAccessKeyId=1ABCDEF..." \
-F "Policy=ewogIC..." \
-F "Signature=mwnF..." \
-F "Content-Type=image/jpeg" \
-F "file=#new_file.jpg" \
https://github.s3.amazonaws.com/
In the first part however, I need to get the file size (and content type if it's easy, not required though), so some variable:
{"name": "new_file.jpg","size": $(FILE_SIZE),"description": "Latest release","content_type": "text/plain"}
I tried this but it doesn't work (Mac 10.6.7):
$(shell du path/to/file.js | awk '{print $1}')
Any ideas how to accomplish this?

If you have GNU coreutils:
FILE_SIZE=$(stat -L -c %s $filename)
The -L tells it to follow symlinks; without it, if $filename is a symlink it will give you the size of the symlink rather than the size of the target file.
The MacOS stat equivalent appears to be:
FILE_SIZE=$(stat -L -f %z)
but I haven't been able to try it. (I've written this as a shell command, not a make command.) You may also find the -s option useful:
Display information in "shell output", suitable for initializing variables.

For reference, an alternative method is using du with -b bytes output and -s for summary only. Then cut to only keep the first element of the return string
FILE_SIZE=$(du -sb $filename | cut -f1)
This should return the same result in bytes as #Keith Thompson answer, but will also work for full directory sizes.
Extra: I usually use a macro for this.
define sizeof
$$(du -sb \
$(1) \
| cut -f1 )
endef
Which can then be called like,
$(call sizeof,$filename_or_dirname)

I think this is a case where parsing the output of ls is legitimate:
% FILE_SIZE=`ls -l $filename | awk '{print $5}'`
(no it's not: use stat, as noted by Keith Thompson)
For the type, you can use
% FILE_TYPE=`file --mime-type --brief $filename`

Related

Using wget in shell trouble with variable that has \

I'm trying to run a script for pulling finance history from yahoo. Boris's answer from this thread
wget can't download yahoo finance data any more
works for me ~2 out of 3 times, but fails if the crumb returned from the cookie has a "\" character in it.
Code that sometimes works looks like this
#!usr/bin/sh
symbol=$1
today=$(date +%Y%m%d)
tomorrow=$(date --date='1 days' +%Y%m%d)
first_date=$(date -d "$2" '+%s')
last_date=$(date -d "$today" '+%s')
wget --no-check-certificate --save-cookies=cookie.txt https://finance.yahoo.com/quote/$symbol/?p=$symbol -O C:/trip/stocks/stocknamelist/crumb.store
crumb=$(grep 'root.*App' crumb.store | sed 's/,/\n/g' | grep CrumbStore | sed 's/"CrumbStore":{"crumb":"\(.*\)"}/\1/')
echo $crumb
fileloc=$"https://query1.finance.yahoo.com/v7/finance/download/$symbol?period1=$first_date&period2=$last_date&interval=1d&events=history&crumb=$crumb"
echo $fileloc
wget --no-check-certificate --load-cookies=cookie.txt $fileloc -O c:/trip/stocks/temphistory/hs$symbol.csv
rm cookie.txt crumb.store
But that doesn't seem to process in wget the way I intend either, as it seems to be interpreting as described here:
https://askubuntu.com/questions/758080/getting-scheme-missing-error-with-wget
Any suggestions on how to pass the $crumb variable into wget so that wget doesn't error out if $crumb has a "\" character in it?
Edited to show the full script. To clarify I've got cygwin installed with wget package. I call the script from cmd prompt as (example where the script above is named "stocknamedownload.sh, the stock symbol I'm downloading is "A" from the startdate 19800101)
c:\trip\stocks\StockNameList>bash stocknamedownload.sh A 19800101
This script seems to work fine - unless the crumb returned contains a "\" character in it.
The following implementation appears to work 100% of the time -- I'm unable to reproduce the claimed sporadic failures:
#!/usr/bin/env bash
set -o pipefail
symbol=$1
today=$(date +%Y%m%d)
tomorrow=$(date --date='1 days' +%Y%m%d)
first_date=$(date -d "$2" '+%s')
last_date=$(date -d "$today" '+%s')
# store complete webpage text in a variable
page_text=$(curl --fail --cookie-jar cookies \
"https://finance.yahoo.com/quote/$symbol/?p=$symbol") || exit
# extract the JSON used by JavaScript in the page
app_json=$(grep -e 'root.App.main = ' <<<"$page_text" \
| sed -e 's#^root.App.main = ##' \
-e 's#[;]$##') || exit
# use jq to extract the crumb from that JSON
crumb=$(jq -r \
'.context.dispatcher.stores.CrumbStore.crumb' \
<<<"$app_json" | tr -d '\r') || exit
# Perform our actual download
fileloc="https://query1.finance.yahoo.com/v7/finance/download/$symbol?period1=$first_date&period2=$last_date&interval=1d&events=history&crumb=$crumb"
curl --fail --cookie cookies "$fileloc" >"hs$symbol.csv"
Note that the tr -d '\r' is only necessary when using a native-Windows jq mixed with an otherwise native-Cygwin set of tools.
You are adding quotes to the value of the variable instead of quoting the expansion. You are also trying to use tools that don't know what JSON is to process JSON; use jq.
wget --no-check-certificate \
--save-cookies=cookie.txt \
"https://finance.yahoo.com/quote/$symbol/?p=$symbol" \
-O C:/trip/stocks/stocknamelist/crumb.store
# Something like thist; it's hard to reverse engineer the structure
# of crumb.store from your pipeline.
crumb=$(jq 'CrumbStore.crumb' crumb.store)
echo "$crumb"
fileloc="https://query1.finance.yahoo.com/v7/finance/download/$symbol?period1=$first_date&period2=$last_date&interval=1d&events=history&crumb=$crumb"
echo "$fileloc"
wget --no-check-certificate \
--load-cookies=cookie.txt "$fileloc" \
-O c:/trip/stocks/temphistory/hs$symbol.csv

Bash: Parse Urls from file, process them and then remove them from the file

I am trying to automate a procedure where the system will fetch the contents of a file (1 Url per line), use wget to grab the files from the site (https folder) and then remove the line from the file.
I have made several tries but the sed part (at the end) cannot understand the string (I tried escaping characters) and remove it from that file!
cat File
https://something.net/xxx/data/Folder1/
https://something.net/xxx/data/Folder2/
https://something.net/xxx/data/Folder3/
My line of code is:
cat File | xargs -n1 -I # bash -c 'wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "#" -P /mnt/USB/ && sed -e 's|#||g' File'
It works up until the sed -e 's|#||g' File part..
Thanks in advance!
Dont use cat if it's posible. It's bad practice and can be problem with big files... You can change
cat File | xargs -n1 -I # bash -c
to
for siteUrl in $( < "File" ); do
It's be more correct and be simpler to use sed with double quotes... My variant:
scriptDir=$( dirname -- "$0" )
for siteUrl in $( < "$scriptDir/File.txt" )
do
if [[ -z "$siteUrl" ]]; then break; fi # break line if him empty
wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf --no-parent --restrict-file-names=nocontrol --user=test --password=pass --no-check-certificate "$siteUrl" -P /mnt/USB/ && sed -i "s|$siteUrl||g" "$scriptDir/File.txt"
done
#beliy answers looks good!
If you want a one-liner, you can do:
while read -r line; do \
wget -r -nd -l 1 -c -A rar,zip,7z,txt,jpg,iso,sfv,md5,pdf \
--no-parent --restrict-file-names=nocontrol --user=test \
--password=pass --no-check-certificate "$line" -P /mnt/USB/ \
&& sed -i -e '\|'"$line"'|d' "File.txt"; \
done < File.txt
EDIT:
You need to add a \ in front of the first pipe
I believe you just need to use double quotes after sed -e. Instead of:
'...&& sed -e 's|#||g' File'
you would need
'...&& sed -e '"'s|#||g'"' File'
I see what you trying to do, but I dont understand the sed command including pipes. Maybe some fancy format that I dont understand.
Anyway, I think the sed command should look like this...
sed -e 's/#//g'
This command will remove all # from the stream.
I hope this helps!

bash/shell: easiest way to get kernel name components on command line

On my Fedora machine I sometimes need to find out certain components of the kernel name, e.g.
VERSION=3.18.9-200.fc21
VERSION_ARCH=3.18.9-200.fc21.x86_64
SHORT_VERSION=3.18
DIST_VERSION=fc21
EXTRAVERSION = -200.fc21.x86_64
I know uname -a/-r/-m but these give me not all the components I need.
Of course I can just disassemble uname -r e.g.
KERNEL_VERSION_ARCH=$(uname -r)
KERNEL_VERSION=$(uname -r | cut -d '.' -f 1-4)
KERNEL_SHORT_VERSION=$(uname -r | cut -d '.' -f 1-2)
KERNEL_DIST_VERSION=$(uname -r | cut -d '.' -f 4)
EXTRAVERSION="-$(uname -r | cut -d '-' -f 2)"
But this seems very cumbersome and not future-safe to me.
Question: is there an elegant way (i.e. more readable and distribution aware) to get all kernel version/name components I need?
Nice would be s.th. like
kernel-ver -f "%M.%m.%p-%e.%a"
3.19.4-200.fc21.x86_64
kernel-ver -f "%M.%m"
3.19
kernel-ver -f "%d"
fc21
Of course the uname -r part would need a bit sed/awk/grep magic. But there are some other options you can try:
cat /etc/os-release
cat /etc/lsb-release
Since it's fedora you can try: cat /etc/fedora-release
lsb_release -a is also worth a try.
cat /proc/version, but that nearly the same output as uname -a
In the files /etc/*-release the format is already VARIABLE=value, so you could source the file directly and access the variables later:
$ source /etc/os-release
$ echo $ID
fedora
To sum this up a command that should work on every system that combines the above ideas:
cat /etc/*_ver* /etc/*-rel* 2>/dev/null

How to define subroutines in a Makefile

I am working on a Makefile which has a¹ receipt producing some file using M4. It uses some complex shell constructions to compute macro values which have to be passed to M4. How can I organize code to avoid redundant declarations displayed in the following example?
M4TOOL= m4
M4TOOL+= -D PACKAGE=$$(cd ${PROJECTBASEDIR} && ${MAKE} -V PACKAGE)
M4TOOL+= -D VERSION=$$(cd ${PROJECTBASEDIR} && ${MAKE} -V VERSION)
M4TOOL+= -D AUTHOR=$$(cd ${PROJECTBASEDIR} && ${MAKE} -V AUTHOR)
M4TOOL+= -D RDC960=$$(openssl rdc960 ${DISTFILE} | cut -d ' ' -f 2)
M4TOOL+= -D SHA256=$$(openssl sha256 ${DISTFILE} | cut -d ' ' -f 2)
Portfile: Portfile.m4
${M4TOOL} ${.ALLSRC} > ${.TARGET}
¹ Actually a lot!
You should define pseudo-commands using the -c option of the shell, like this:
PROJECTVARIABLE=sh -c 'cd ${PROJECTBASEDIR} && ${MAKE} -V $$1' PROJECTVARIABLE
OPENSSLHASH=sh -c 'openssl $$1 $$2 | cut -d " " -f 2' OPENSSLHASH
Note the use of $ or $$ to use bsdmake variable expansion or shell variable expansion. With these defintions you can reorganise your code like this:
M4TOOLS+= -D PACKAGE=$$(${PROJECTVARIABLE} PACKAGE)
M4TOOLS+= -D VERSION=$$(${PROJECTVARIABLE} VERSION)
M4TOOLS+= -D AUTHOR=$$(${PROJECTVARIABLE} AUTHOR)
M4TOOLS+= -D RMD160=$$(${OPENSSLHASH} rmd160 ${DISTFILE})
M4TOOLS+= -D SHA256=$$(${OPENSSLHASH} sha256 ${DISTFILE})
The result is arguably easier to read and maintain. When you write such scripts, remember to use error codes and stderr to report errors.
PS: You can take a look at the COPYTREE_SHARE macro in /usr/ports/Mk/bsd.port.mk on a FreeBSD system. It illustrates well the technique.

Wget page title

Is it possible to Wget a page's title from the command line?
input:
$ wget http://bit.ly/rQyhG5 <<code>>
output:
If it’s broke, fix it right - Keeping it Real Estate. Home
This script would give you what you need:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
But there are lots of situations where it breaks, including if there is a <title>...</title> in the body of the page, or if the title is on more than one line.
This might be a little better:
wget --quiet -O - http://bit.ly/rQyhG5 \
| paste -s -d " " \
| sed -e 's!.*<head>\(.*\)</head>.*!\1!' \
| sed -e 's!.*<title>\(.*\)</title>.*!\1!'
but it does not fit your case as your page contains the following head opening:
<head profile="http://gmpg.org/xfn/11">
Again, this might be better:
wget --quiet -O - http://bit.ly/rQyhG5 \
| paste -s -d " " \
| sed -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!' \
| sed -e 's!.*<title>\(.*\)</title>.*!\1!'
but there is still ways to break it, including no head/title in the page.
Again, a better solution might be:
wget --quiet -O - http://bit.ly/rQyhG5 \
| paste -s -d " " \
| sed -n -e 's!.*<head[^>]*>\(.*\)</head>.*!\1!p' \
| sed -n -e 's!.*<title>\(.*\)</title>.*!\1!p'
but I am sure we can find a way to break it. This is why a true xml parser is the right solution, but as your question is tagged shell, the above it the best I can come with.
The paste and the 2 sed can be merged in a single sed, but is less readable. However, this version has the advantage of working on multi-line titles:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;T;s!.*<title>\(.*\)</title>.*!\1!p}'
Update:
As explain in the comments, the last sed above uses the T command which is a GNU extension. If you do not have a compatible version, you can use:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext;b;:next;s!.*<title>\(.*\)</title>.*!\1!p}'
Update 2:
As above still not working on Mac, try:
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -e 'H;${x;s!.*<head[^>]*>\(.*\)</head>.*!\1!;tnext};b;:next;s!.*<title>\(.*\)</title>.*!\1!p'
and/or
cat << EOF > script
H
\$x
\$s!.*<head[^>]*>\(.*\)</head>.*!\1!
\$tnext
b
:next
s!.*<title>\(.*\)</title>.*!\1!p
EOF
wget --quiet -O - http://bit.ly/rQyhG5 \
| sed -n -f script
(Note the \ before the $ to avoid variable expansion.)
It seams that the :next does not like to be prefixed by a $, which could be a problem in some sed version.
The following will pull whatever lynx thinks the title of the page is, saving you from all of the regex nonsense. Assuming the page you are retrieving is standards compliant enough for lynx, this should not break.
lynx -dump example.com | sed '2q;d'

Resources