How to check a text variable length in a makefile?

As the title suggests, I need to check the length of a text variable in a Makefile.

I don't think there's anything built in, but
# XXX BROKEN - see below
len := $(shell printf '%s' '$(VARIABLE)' | wc -c)
should roughly do what you want.
This will fail if there are literal single quotes in the value, though. They can be escaped but doing that correctly is tedious and pesky. Maybe try
len := $(shell printf '%s' '$(subst ','"'"',$(VARIABLE))' | wc -c)
but I'm not sure I have covered all possible bases.
Also note the difference between wc -c (bytes) and wc -m (characters): they yield the same result for plain ASCII, but differ for multibyte Unicode strings (depending on your system's locale and encoding).
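For instance, a quick way to see the difference from within make itself (a sketch assuming GNU make and a UTF-8 locale; the variable name and value are purely illustrative):
WORD := héllo
len_bytes := $(shell printf '%s' '$(WORD)' | wc -c)
len_chars := $(shell printf '%s' '$(WORD)' | wc -m)
$(info bytes=$(len_bytes) chars=$(len_chars)) # typically bytes=6 chars=5 under UTF-8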

Related

Assigning a macro value in Make from a shell command

The I/O blocksize is going to figure prominently in a Makefile I need to write, so I need a way to calculate it. This script does what I want:
> cat blksz.bash
#!/bin/bash
bsl=$(du --block-size=1 testfile)
bsl=($bsl)
echo ${bsl[0]}
(Someone might have a better way to do it, but if you can bear with me, that's not really the most general point of the question.)
I can call this from my Makefile, and it works fine:
> cat Makefile
BLOCKSIZE := $(shell ./blksz.bash)
blocksize:
echo $(BLOCKSIZE)
> make blocksize
echo 4096
4096
Then I think: this is such a small script, wouldn't it be better to just put it in the Makefile? But then it no longer works.
> cat Makefile
BLOCKSIZE := $(shell bsl=$(du --block-size=1 testfile) ; \
bsl=($bsl) ; \
echo ${bsl[0]})
blocksize:
echo $(BLOCKSIZE)
> make blocksize
echo
i.e. BLOCKSIZE is never defined. Clearly, I have defined the shell command incorrectly in the Makefile. Can anyone tell me the correct way to do this? Also, there might be better ways to get the block size, but the broader issue of how to get a return value out of a shell command so the Makefile can see it will probably come up again for me at some point, and that is fundamentally what I am trying to figure out.
One last thing, regarding duplicate questions, there are a few similar questions around, but nothing that gets quite at what I am asking, AFAICT. I think what makes this different is that I am using variables within the shell command, and somehow their contents are being lost.
To answer your question directly, $ is a special character to make (it introduces make variables and functions), so if you want to pass it to the shell you have to escape it as $$. On top of that there's the issue of array variables not being available in /bin/sh, as mentioned by @tripleee.
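As a tiny illustration of the escaping (the variable names here are made up for demonstration):
# $$ reaches the shell as a single $, so this echoes your home directory
good := $(shell echo $$HOME)
# here make expands $H (usually empty) before the shell runs, so the shell sees only "echo OME"
bad := $(shell echo $HOME)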
So your command must be:
BLOCKSIZE := $(shell /bin/bash -c 'bsl=$$(du --block-size=1 testfile) ; bsl=($$bsl) ; echo $${bsl[0]}').
Personally I don't like to use bash-specific features or awk, but YMMV. A better way to do this IMO is to use the stat program:
BLOCKSIZE := $(shell stat -c %s testfile)
If you don't want to stat, then another way would be:
BLOCKSIZE := ${shell set -- $$(du --block-size=1 testfile); echo $$1}
Unless you hack SHELL=/bin/bash your shell commands will be executed by /bin/sh. But you don't need Bash for this.
BLOCKSIZE := $(shell du --block-size=1 testfile | awk '{ print $$1 }')

How much memory does a variable take?

The variable "a=b" contains 1 char 'a' for the name, and 1 char 'b' for the value.
Together 2 bytes.
How many characters can you store with one byte?
The variable needs a pointer. 8 bytes.
How many bytes do pointers take up?
Together 10 bytes.
Does a variable "a=b" stored in memory take about 10 bytes? And would 10 variables of the same size take about 100 bytes?
So 1000 variables of 1000 bytes each would be almost 1MB of memory?
I have a file data.sh that only contains variables.
I need to retrieve the value of one variable in that file.
I do this by using a function.
(called by "'function-name' 'datafile-name' 'variable-name'")
#!/usr/pkg/bin/ksh93
readvar () {
while read -r line
do
typeset "${line}"
done < "${1}"
nameref indirect="${2}"
echo "${indirect}"
}
readvar datafile variable
The function reads the file data.sh line by line.
While it does that it typesets each line.
After it's done with that, it makes a name reference from the variable name in the function call to one of the variables of the file data.sh, and finally prints the value of that variable.
When the function is finished it no longer uses up memory.
But as long as the function is running it does.
This means all variables in the file data.sh are at some point stored in memory.
Correct ?
In reality I have a file with IP addresses as variable names and nicknames as values, so I suppose this will not be such a problem for memory. But if I also use this for posts from visitors, the variable values will be larger. Then again, it would be possible to have this function only keep, for instance, 10 variables in memory at a time.
However I wonder if my way of calculating this memory usage of variables is making any sense.
Edit:
This might be a solution to avoid loading the whole file in memory.
#!/bin/ksh
readvar () {
input=$(print "${2}" | sed 's/\[/\\[/g' | sed 's/\]/\\]/g')
line=$(grep "${input}" "${1}")
typeset ${line}
nameref indirect="${2}"
print "${indirect}"
}
readvar ./test.txt input[0]
With the input test.txt
input[0]=192.0.0.1
input[1]=192.0.0.2
input[2]=192.0.0.2
And the output
192.0.0.1
Edit:
Of course!!!
In the original post
Bash read array from an external file
it said:
# you could do some validation here
so:
while read -r line
do
# you could do some validation here
declare "$line"
done < "$1"
lines would be declared (or typeset in ksh) under a condition.
Your real concern seems not to be "how much memory does this take?" but "how can I avoid using needlessly much memory for this?". I'm going to answer this one first. For a bunch of thoughts about the original question, see the end of my answer.
To avoid using up memory, I propose using grep to get the one line which is of interest to you and ignore all the others:
line=$(grep "^$2=" "$1")
Then you can extract the information you need from this line:
result=$(echo "$line" | cut -d= -f 2)
Now the variable result contains the value which would have been assigned to $2 in the file $1. Since you have no need to store more than one such result value you definitely have no memory issue.
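Put together, a minimal sketch of that approach (the function body below is my own illustration, not the OP's code; only the matching line is ever held in memory):
readvar () {
line=$(grep "^${2}=" "${1}") || return 1 # fetch just the one assignment we care about
echo "$line" | cut -d= -f 2 # strip the "name=" part, keep the value
}
readvar data.sh variable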
Now, to the original question:
Finding out how much memory a shell uses for each variable is tricky. You would need to look into the shell's source to be sure about the implementation. It can vary from shell to shell (you appear to be using ksh, which can be different from bash in this respect). It can also vary from version to version.
One way to get an idea would be to watch a shell process's memory usage while making the shell set variables in large amounts:
bash -c 'a="$(head -c 1000 /dev/zero | tr "\0" x)"; for ((i=0; i<1000; i++)); do eval a${i}="$a"; done; grep ^VmPeak /proc/$$/status'
bash -c 'a="$(head -c 1000 /dev/zero | tr "\0" x)"; for ((i=0; i<10000; i++)); do eval a${i}="$a"; done; grep ^VmPeak /proc/$$/status'
bash -c 'a="$(head -c 1000 /dev/zero | tr "\0" x)"; for ((i=0; i<100000; i++)); do eval a${i}="$a"; done; grep ^VmPeak /proc/$$/status'
bash -c 'a="$(head -c 1000 /dev/zero | tr "\0" x)"; for ((i=0; i<200000; i++)); do eval a${i}="$a"; done; grep ^VmPeak /proc/$$/status'
This prints the peak amount of memory in use by a bash which sets 1000, 10000, 100000, and 200000 variables with a value of 1000 x characters. On my machine (using bash 4.2.25(1)-release) this gave the following output:
VmPeak: 19308 kB
VmPeak: 30220 kB
VmPeak: 138888 kB
VmPeak: 259688 kB
This shows that the memory used is growing more or less in a linear fashion (plus a fixed offset of ~17000k) and that each new variable takes ~1.2kB of additional memory.
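That ~1.2 kB figure is just the slope between the last two measurements; a rough back-of-the-envelope check:
# (259688 - 138888) kB spread over (200000 - 100000) extra variables
echo $(( (259688 - 138888) * 1024 / (200000 - 100000) )) # prints 1236, i.e. roughly 1.2 kB per variable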
But as I said, other shells' results may vary.

Assign string containing null-character (\0) to a variable in Bash

While trying to process a list of file-/foldernames correctly (see my other questions) through the use of a NULL-character as a delimiter I stumbled over a strange behaviour of Bash that I don't understand:
When assigning a string containing one or more NULL-character to a variable, the NULL-characters are lost / ignored / not stored.
For example,
echo -ne "n\0m\0k" | od -c # -> 0000000 n \0 m \0 k
But:
VAR1=`echo -ne "n\0m\0k"`
echo -ne "$VAR1" | od -c # -> 0000000 n m k
This means that I would need to write that string to a file (for example, in /tmp) and read it back from there if piping directly is not desired or feasible.
When executing these scripts in Z shell (zsh) the strings containing \0 are preserved in both cases, but sadly I can't assume that zsh is present in the systems running my script while Bash should be.
How can strings containing \0 chars be stored or handled efficiently without losing any (meta-) characters?
In Bash, you can't store the NULL-character in a variable.
You may, however, store a plain hex dump of the data (and later reverse this operation again) by using the xxd command.
VAR1=`echo -ne "n\0m\0k" | xxd -p | tr -d '\n'`
echo -ne "$VAR1" | xxd -r -p | od -c # -> 0000000 n \0 m \0 k
As others have already stated, you can't store or use a NUL char:
in a variable
in an argument on the command line.
However, you can handle any binary data (including NUL char):
in pipes
in files
So to answer your last question:
can anybody give me a hint how strings containing \0 chars can be
stored or handled efficiently without losing any (meta-) characters?
You can use files or pipes to store and handle efficiently any string with any meta-characters.
If you plan to handle data, you should note additionally that:
Only the NUL char will be eaten by variables and command-line arguments; you can check this.
Be aware that command substitution (as $(command..) or `command..`) has an additional twist beyond plain variables, as it will also eat your trailing newlines.
Bypassing limitations
If you want to use variables, then you must get rid of the NUL char by encoding it, and various other solutions here give clever ways to do that (an obvious way is to use for example base64 encoding/decoding).
If you are concerned by memory or speed, you'll probably want to use a minimal parser and only quote NUL character (and the quoting char). In this case this would help you:
quote() { sed 's/\\/\\\\/g;s/\x0/\\x00/g'; }
Then you can secure your data before storing it in variables and command-line arguments by piping your sensitive data into quote, which will output a safe data stream without NUL chars. You can get the original string (with NUL chars) back by using echo -en "$var_quoted", which will send the correct string to standard output.
Example:
## Our example output generator, with NUL chars
ascii_table() { echo -en "$(echo '\'0{0..3}{0..7}{0..7} | tr -d " ")"; }
## store
myvar_quoted=$(ascii_table | quote)
## use
echo -en "$myvar_quoted"
Note: use | hd to get a clean view of your data in hexadecimal and check that you didn't lose any NUL chars.
Changing tools
Remember you can go pretty far with pipes without using variables or command-line arguments; don't forget, for instance, the <(command ...) construct, which creates a named pipe (a sort of temporary file).
EDIT: the first implementation of quote was incorrect and would not deal correctly with \ special characters interpreted by echo -en. Thanks @xhienne for spotting that.
EDIT2: the second implementation of quote had a bug because it used only \0, which would actually eat up more zeroes, as \0, \00, \000 and \0000 are equivalent. So \0 was replaced by \x00. Thanks to @MatthijsSteen for spotting this one.
Use uuencode and uudecode for POSIX portability
xxd and base64 are not POSIX 7 but uuencode is.
VAR="$(uuencode -m <(printf "a\0\n") /dev/stdout)"
uudecode -o /dev/stdout <(printf "$VAR") | od -tx1
Output:
0000000 61 00 0a
0000003
Unfortunately I don't see a POSIX 7 alternative to the Bash <() process substitution extension except writing to a file, and uuencode/uudecode are not installed on Ubuntu 12.04 by default (sharutils package).
So I guess that the real answer is: don't use Bash for this, use Python or some other saner interpreted language.
I love jeff's answer. I would use Base64 encoding instead of xxd. It saves a little space and would be (I think) more recognizable as to what is intended.
VAR=$(echo -ne "foo\0bar" | base64)
echo -n "$VAR" | base64 -d | xargs -0 ...
As for -e, it is needed when echoing a literal string with an encoded null ('\0'). I also seem to recall something about echo -e being unsafe if you're echoing any user input, as they could inject escape sequences that echo will interpret and end up with bad things. The -e flag is not needed when echoing the encoded, stored string into the decode.
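A quick round-trip check of the base64 approach, using od (as elsewhere in this thread) to confirm the NUL survives:
VAR=$(echo -ne "foo\0bar" | base64)
echo -n "$VAR" | base64 -d | od -c # -> 0000000 f o o \0 b a r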
Here’s a maximally memory-efficient solution that just escapes the NUL bytes with \xFF.
(Since I wasn’t happy with base64 or the like. :)
esc0() { sed 's/\xFF/\xFF\xFF/g; s/\x00/\xFF0/g'; }
cse0() { sed 's/\xFF0/\xFF\x00/g; s/\xFF\(.\)/\1/g'; }
It of course escapes any actual \xFF by doubling it too, so it works exactly like when backslashes are used for escaping. This is also why a simple mapping can’t be used, and referring to the match in the replacement is required.
Here’s an example that paints gradients onto the framebuffer (doesn’t work in X), using variables to pre-render blocks and lines for speed:
width=7680; height=1080; # Set these to your framebuffer’s size.
blocksPerLine=$(( $width / 256 ))
block="$( for i in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do for j in 0 1 2 3 4 5 6 7 8 9 A B C D E F; do echo -ne "\x$i$j"; done; done | esc0 )"
line="$( for ((b=0; b < blocksPerLine; b++)); do echo -en "$block"; done )"
for ((l=0; l <= $height; l++)); do echo -en "$line"; done | cse0 > /dev/fb0
Note how $block contains escaped NULLs (plus \xFFs), and at the end, before writing everything to the framebuffer, cse0 unescapes them.

Handle special characters in bash for...in loop

Suppose I've got a list of files
file1
"file 1"
file2
a for...in loop splits it on whitespace, not just on newlines:
for x in $( ls ); do
echo $x
done
results:
file
1
file1
file2
I want to execute a command on each file. "file" and "1" above are not actual files. How can I do that if the filenames contain things like spaces or commas?
It's a little trickier than what I think find -print0 | xargs -0 could handle, because I actually want the command to be something like "convert input/file1.jpg .... output/file1.jpg", so I need to transform the filename in the process.
Actually, Mark's suggestion works fine without even doing anything to the internal field separator. The problem is that running ls in a subshell, whether via backticks or $( ), makes the for loop unable to distinguish spaces within names from separators between names. Simply using
for f in *
instead of the ls solves the problem.
#!/bin/bash
for f in *
do
echo "$f"
done
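Applied to the convert example from the question, a minimal sketch (it assumes the input/ and output/ directories named there and JPEG files; adapt as needed):
#!/bin/bash
for f in input/*.jpg
do
convert "$f" "output/${f##*/}" # ${f##*/} strips the leading "input/" directory part
done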
UPDATE BY OP: this answer sucks and shouldn't be on top ... @Jordan's post below should be the accepted answer.
one possible way:
ls -1 | while read x; do
echo $x
done
I know this one is LONG past "answered", and with all due respect to eduffy, I came up with a better way and I thought I'd share it.
What's "wrong" with eduffy's answer isn't that it's wrong, but that it imposes what for me is a painful limitation: there's an implied creation of a subshell when the output of the ls is piped and this means that variables set inside the loop are lost after the loop exits. Thus, if you want to write some more sophisticated code, you have a pain in the buttocks to deal with.
My solution was to take the "readline" function and write a program out of it, in which you can specify any specific line number you want out of the output of any given command. ... As a simple example, starting with eduffy's:
ls_output=$(ls -1)
# The cut at the end of the following line removes any trailing new line character
declare -i line_count=$(echo "$ls_output" | wc -l | cut -d ' ' -f 1)
declare -i cur_line=1
while [ $cur_line -le $line_count ] ;
do
# NONE of the values in the variables inside this do loop are trapped here.
filename=$(echo "$ls_output" | readline -n $cur_line)
# Now filename contains a filename from the preceding ls command
cur_line=cur_line+1
done
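If you don't have a readline program like the one described, sed -n can extract a single line by number in the same way (this is only a stand-in, not the author's tool):
filename=$(echo "$ls_output" | sed -n "${cur_line}p")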
Now you have wrapped up all the subshell activity into neat little contained packages and can go about your shell coding without having to worry about the scope of your variable values getting trapped in subshells.
I wrote my version of readline in GNU C; if anyone wants a copy, it's a little too big to post here, but maybe we can find a way...
Hope this helps,
RT

How can I select random files from a directory in bash?

I have a directory with about 2000 files. How can I select a random sample of N files through using either a bash script or a list of piped commands?
Here's a script that uses GNU sort's random option:
ls |sort -R |tail -$N |while read file; do
# Something involving $file, or you can leave
# off the while to just get the filenames
done
You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:
ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..
Adjust the -n, --head-count=COUNT value to return the number of wanted lines. For example to return 5 random filenames you would use:
find dirname -type f | shuf -n 5
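To then run a command on each of the selected files, one NUL-safe sketch (assuming GNU find and shuf) is:
find dirname -type f -print0 | shuf -zn 5 | xargs -0 -n1 echo
Replace echo with whatever you want to run per file.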
Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their name. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.
This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.
a=( * )
randf=( "${a[RANDOM%${#a[#]}]"{1..42}"}" )
This feature is not very well documented.
If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!
N=42
a=( * )
eval randf=( \"\${a[RANDOM%\${#a[#]}]\"\{1..$N\}\"}\" )
I personally dislike eval and hence this answer!
The same using a more straightforward method (a loop):
N=42
a=( * )
randf=()
for((i=0;i<N;++i)); do
randf+=( "${a[RANDOM%${#a[#]}]}" )
done
If you don't want to possibly have several times the same file:
N=42
a=( * )
randf=()
for((i=0;i<N && ${#a[@]};++i)); do
((j=RANDOM%${#a[@]}))
randf+=( "${a[j]}" )
a=( "${a[@]:0:j}" "${a[@]:j+1}" )
done
Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.
ls | shuf -n 10 # ten random files
A simple solution for selecting 5 random files while avoiding to parse ls. It also works with files containing spaces, newlines and other special characters:
shuf -ezn 5 * | xargs -0 -n1 echo
Replace echo with the command you want to execute for your files.
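For example, to copy 5 random JPEGs into a samples/ directory (assumed to already exist):
shuf -ezn 5 *.jpg | xargs -0 -I{} cp -v {} samples/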
This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)
But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.
Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.
Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.
That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random value modulo N; here we want a random index between 0 and the length of our array minus one. Here's the approach, broken into two lines for clarity's sake:
LENGTH=${#a[@]}
RANDFILE=${a[RANDOM%LENGTH]}
But this solution does it in a single line, removing the unnecessary variable assignment.
Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc: echo "filename"{1..25}".txt".
The expression in the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a single digit in between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)
The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
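A quick way to convince yourself of that ordering (the values shown will differ, of course):
echo "$RANDOM"{,,} # brace expansion runs first, so $RANDOM is expanded three separate times: three different numbers
The same ordering is why {1..$N} wouldn't work: $N hasn't been expanded yet when the braces are processed.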
Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:
shopt -s globstar
a=( ** )
This turns on a shell option that causes ** to match recursively. Now your $a array contains every file in the entire hierarchy.
If you have Python installed (works with either Python 2 or Python 3):
To select one file (or line from an arbitrary command), use
ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"
To select N files/lines, use (note N is at the end of the command, replace this by a number)
ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N
If you want to copy a sample of those files to another folder:
ls | shuf -n 100 | xargs -I % cp % ../samples/
Make the samples directory first, obviously.
macOS does not have the sort -R and shuf commands, so I needed a bash-only solution that randomizes all files without duplicates, and I did not find that here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.
The script should be easy to modify to stop after N samples, using a counter with if or gniourf_gniourf's for loop with N (a sketch of that variant follows the script below). $RANDOM is limited to ~32000 files, but that should do for most cases.
#!/bin/bash
array=(*) # this is the array of files to shuffle
# echo ${array[@]}
for dummy in "${array[@]}"; do # do loop length(array) times; once for each file
length=${#array[@]}
randomi=$(( $RANDOM % $length )) # select a random index
filename=${array[$randomi]}
echo "Processing: '$filename'" # do something with the file
unset -v "array[$randomi]" # set the element at index $randomi to NULL
array=("${array[@]}") # remove NULL elements introduced by unset; copy array
done
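A sketch of the counter variant mentioned above, stopping after N samples instead of shuffling everything (N=5 is arbitrary):
#!/bin/bash
N=5
count=0
array=(*)
while (( count < N && ${#array[@]} )); do
length=${#array[@]}
randomi=$(( RANDOM % length )) # select a random index
echo "Processing: '${array[randomi]}'" # do something with the file
unset -v "array[randomi]" # remove it so it can't be picked again
array=("${array[@]}") # reindex after unset
(( ++count ))
done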
If you have a lot of files in your folder, you can use the piped command below, which I found on Unix Stack Exchange.
find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/
Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.
This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:
ls command: how can I get a recursive full-path listing, one line per file?
http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/
#!/bin/bash
# Reads a given directory and picks a random file.
# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"
# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
if [[ -d "${DIR}" ]]
then
# Runs ls on the given dir, and dumps the output into a matrix,
# it uses the new lines character as a field delimiter, as explained above.
# file_matrix=($(ls -LR "${DIR}"))
file_matrix=($(ls -R $DIR | awk '/:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
num_files=${#file_matrix[*]}
# This is the command you want to run on a random file.
# Change "ls -l" by anything you want, it's just an example.
ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi
exit 0
I use this: it uses a temporary file but descends into a directory tree until it finds a regular file, and returns it.
# find for a quasi-random file in a directory tree:
# directory to start search from:
ROOT="/";
tmp=/tmp/mytempfile
TARGET="$ROOT"
FILE="";
n=
r=
while [ -e "$TARGET" ]; do
TARGET="$(readlink -f "${TARGET}/$FILE")" ;
if [ -d "$TARGET" ]; then
ls -1 "$TARGET" 2> /dev/null > $tmp || break;
n=$(cat $tmp | wc -l);
if [ $n != 0 ]; then
FILE=$(shuf -n 1 $tmp)
# or if you don't have/want to use shuf:
# r=$(($RANDOM % $n)) ;
# FILE=$(tail -n +$(( $r + 1 )) $tmp | head -n 1);
fi ;
else
if [ -f "$TARGET" ] ; then
rm -f $tmp
echo $TARGET
break;
else
# is not a regular file, restart:
TARGET="$ROOT"
FILE=""
fi
fi
done;
How about a Perl solution slightly doctored from Mr. Kang over here:
How can I shuffle the lines of a text file on the Unix command line or in a shell script?
$ ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'
