print environment variables sorted by name including variables with newlines - bash

I couldn't find an existing answer to this specific case: I would like to simply display all exported environment variables sorted by their name. Normally I can do this simply with:
$ env | sort
However, if some environment variables contain newlines in their values (as is the case on the CI system I'm working with), this does not work because the multi-line values will get mixed up with other variables.

Answering my own question since I couldn't find this elsewhere:
$ env -0 | sort -z | tr '\0' '\n'
env -0 separates each variable by a null character (which is more-or-less how they are already stored internally). sort -z uses null characters instead of newlines as the delimiter for fields to be sorted, and finally tr '\0' '\n' replaces the nulls with newlines again.
Note: env -0 and sort -z are non-standard extensions provided by the GNU coreutils versions of these utilities. I'm open to other ideas for how to do this with POSIX sort; I'm sure it is possible, but it might require a for loop or something, so it's not as easy as a one-liner.
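One hedged sketch in that direction: POSIX awk exposes the environment through ENVIRON, and variable names themselves cannot contain newlines under POSIX rules, so the names can be sorted on their own and each value printed afterwards. printenv is an assumption here (provided by coreutils and busybox, but not POSIX itself), and multi-line values will still span several output lines; they just can no longer disturb the sort order:
# Sort the names (newline-safe: POSIX variable names cannot contain
# newlines), then print each value via printenv.
for name in $(awk 'BEGIN { for (k in ENVIRON) print k }' | sort); do
    printf '%s=%s\n' "$name" "$(printenv "$name")"
done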

The bash builtin export prints a sorted list of the exported variables:
export -p | sed 's/declare -x //'
Similarly, to print a sorted list of exported functions (without their definitions):
export -f | grep 'declare -fx' | sed 's/declare -fx //'
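A side note on the multi-line case from the question: export -p emits quoted values, so a value containing newlines stays visibly attached to its name. An illustrative session (GREETING is an invented example variable):
$ export GREETING=$'hello\nworld'
$ export -p | sed 's/declare -x //'
...
GREETING="hello
world"
...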

In a limited environment where env -0 is not available, e.g. Alpine 3.13 or 3.14 (where the commands are slimmed-down busybox versions), you can use awk:
awk 'BEGIN { for (K in ENVIRON) { printf "%s=%s%c", K, ENVIRON[K], 0; }}' | sort -z | tr '\0' '\n'
This uses awk to print each environment variable terminated with a null, simulating env -0. Note that setting ORS to null (-vORS='\0') does not work in this limited version of awk, nor does directly printing \0 in the printf; hence the %c to print 0.
Busybox awk lacks any sort functions, so the rest of the pipeline is the same as in the answer at the top.
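To convince yourself that the %c trick really emits NUL bytes, a quick check (od is also part of busybox):
$ awk 'BEGIN { printf "A=1%cB=2%c", 0, 0 }' | od -c
0000000   A   =   1  \0   B   =   2  \0
0000010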

env | sort -f
Worked for me.
The -f option makes sort ignore case, which is what you probably want 99% of the time.

Related

AWK NR Variable Syntax Issue

I am new to AWK and trying to write some code where I can delete all files in a directory apart from the newest N.
My code works if I use a hard coded number instead of a variable.
Works:
delete=`ls -t | awk 'NR>3'`
echo $delete
Does Not Work:
amount=3
delete=`ls -t | awk 'NR>$amount'`
echo $delete
I know the problem lies somewhere with the bash variable not being recognised as an awk variable; however, I do not know how to correct it.
I have tried variations of code to fix this such as below, however I am still at a loss.
amount=3
delete=`ls -t | awk -v VAR=${amount} 'NR>VAR'`
echo $delete
Could you please advise what the correct code is?
Shells don't expand anything inside single quotes.
Either:
amount=3
delete=$(ls -t | awk "NR>$amount")
or:
amount=3
delete=$(ls -t | awk -v amount=$amount 'NR > amount')
Be aware that parsing the output of ls is fraught if your file names are not limited to the portable file name character set. Spaces, newlines, and other special characters in the file name can wreck the parsing.
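If the underlying goal really is "delete all files apart from the newest N", here is a newline-safe sketch that avoids parsing ls altogether. Note that -printf, sort -z, tail -z, cut -z, and xargs -0 are GNU extensions, so this is an assumption beyond POSIX:
amount=3
find . -maxdepth 1 -type f -printf '%T@\t%P\0' |  # mtime<TAB>name, NUL-terminated
    sort -znr |                                   # newest first
    tail -zn +"$((amount + 1))" |                 # skip the newest $amount entries
    cut -zf 2- |                                  # drop the timestamp field
    xargs -0r echo rm --                          # remove 'echo' to actually delete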
The simplest fix is to use double quotes instead of single. Single quotes prevent the shell from interpolating the variable $amount in the quoted string.
amount=3
ls -t | awk "NR>$amount"
I would not use a variable to capture the result. If you do use one, you need to quote it properly when interpolating it.
amount=3
delete=$(ls -t | awk -v VAR="$amount" 'NR>VAR')
echo "$delete"
Note that this is basically identical to your second attempt, which should have worked, modulo the string quoting issues.

How to grep multiple variables in bash

I need to grep multiple strings, but I don't know the exact number of strings.
My code is:
s2=( $(echo $1 | awk -F"," '{ for (i=1; i<=NF; i++) {print $i} }') )
for pattern in "${s2[@]}"; do
    ssh -q host tail -f /some/path |
        grep -w -i --line-buffered "$pattern" > some_file 2>/dev/null &
done
Now, the code is not doing what it's supposed to do. For example, if I run ./script s1,s2,s3,s4,... it prints all lines that contain any one of s1, s2, s3, ...
The script is supposed to do something like grep "$s1" | grep "$s2" | grep "$s3" ....
grep doesn't have an option to match all of a set of patterns. So the best solution is to use another tool, such as awk (or your choice of scripting languages, but awk will work fine).
Note, however, that awk and grep have subtly different regular expression implementations. It's not clear from the question whether the target strings are literal strings or regular expression patterns, and if the latter, what the expectations are. However, since the argument comes delimited with commas, I'm assuming that the pieces are simple strings and should not be interpreted as patterns.
If you want the strings to be interpreted as patterns, you can change index to match in the following little program:
ssh -q host tail -f /some/path |
awk -v STRINGS="$1" -v IGNORECASE=1 \
'BEGIN{split(STRINGS,strings,/,/)}
{for(i in strings)if(!index($0,strings[i]))next}
{print;fflush()}'
Note:
IGNORECASE is only available in GNU awk; in (most) other implementations it will do nothing. It seems that case-insensitive matching is what you want, given the -i in your grep invocation.
fflush() is also an extension, although it works with both gawk and mawk. In POSIX awk, fflush requires an argument; if you were using POSIX awk, you'd be better off printing to stderr.
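For comparison, the literal grep "$s1" | grep "$s2" | ... chain the question asks for can also be assembled at run time. A hedged bash sketch (-F treats the comma-separated pieces as fixed strings; --line-buffered is a GNU grep extension):
chain_grep() {
    if [ "$#" -eq 0 ]; then
        cat                             # no patterns left: pass the stream through
    else
        local pattern=$1; shift
        grep -w -i -F --line-buffered "$pattern" | chain_grep "$@"
    fi
}
IFS=, read -r -a patterns <<< "$1"
ssh -q host tail -f /some/path | chain_grep "${patterns[@]}"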
You can use extended grep
egrep "$s1|$s2|$s3" fileName
If you don't know how many patterns you need to grep, but you have all of them in an array called s, you can use
egrep $(sed 's/ /|/g' <<< "${s[@]}") fileName
This creates a here-string with all elements of the array, sed replaces bash's field separator (a space) with |, and feeding that to egrep greps for all of the strings in the array s.
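The same join can be had without sed: "${s[*]}" joins the array elements with the first character of IFS, and a subshell keeps the IFS change contained (assuming, as above, that no element itself contains a |):
( IFS='|'; egrep "${s[*]}" fileName )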
test.sh:
#!/bin/bash -x
a=" $@"
grep ${a// / -e } .bashrc
It works like this:
$ ./test.sh 1 2 3
+ a=' 1 2 3'
+ grep -e 1 -e 2 -e 3 .bashrc
(lines matching any of the arguments are printed here)

Loop over list of files to merge according to their names

Files in the directory look like that:
A_1_email.txt
A_1_phone.txt
A_2_email.txt
A_2_phone.txt
B_1_email.txt
B_1_phone.txt
B_2_email.txt
B_2_phone.txt
What I want:
To merge files A_1_email.txt and A_1_phone.txt; to merge files B_1_email.txt and B_1_phone.txt and so on.
What I mean by that: if the first two fields of the file names match (for example A to A; 1 to 1), then merge the files.
How I tried to do this:
ls * | cut -d "_" -f 1-2 | sort | uniq -c | awk '{print $2}' > names &&
for name in $(cat names); do
And I am lost here; I really don't know how I should carry on.
The following are based on @MichaelJ.Barber's answer (which had the excellent idea of using join), but with the specific intention to avoid the dangerous practice of parsing the output of ls:
# Simple loop: avoids subshells, pipelines
for file in *_email.txt; do
if [[ -r "$file" && -r "${file%_*}_phone.txt" ]]; then
join "$file" "${file%_*}_phone.txt"
fi
done
or
##
# Use IFS and a function to avoid contaminating the global environment.
joinEmailPhone() {
local IFS=$'\n'
local -x LC_COLLATE=C # Ensure glob expansion sorting makes sense.
# According to `man (1) bash`, globs expand sorted "alphabetically".
# If we use LC_COLLATE=C, we don't need to sort again.
# Use an awk test (!seen[$0]++) to ensure uniqueness and a parameter expansion instead of cut
awk '!seen[$0]++{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }' <<< "${*%_*}" | sh
}
joinEmailPhone *
But in all probability (again assuming LC_COLLATE=C) you can get away with:
printf 'join %s %s\n' * | sh
I'll assume that the files all have tab-separated name-value pairs, where the value is email or phone as appropriate. If that's not the case, do some pre-sorting or otherwise modify as appropriate.
ls *_{email,phone}.txt |
cut -d "_" -f1-2 | # could also do this with variable expansion
sort -u |
awk '{ printf("join %s_email.txt %s_phone.txt\n", $1, $1) }' |
sh
What this does is to identify the unique prefixes for the files and use 'awk' to generate shell commands for joining the pairs, which are then piped into sh to actually run the commands.
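With the sample files from the question, the commands piped into sh would be:
join A_1_email.txt A_1_phone.txt
join A_2_email.txt A_2_phone.txt
join B_1_email.txt B_1_phone.txt
join B_2_email.txt B_2_phone.txt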
In the given scenario, i.e. when no newline characters are to be expected in the file path names, you may use printf '%s\n' *_{email,phone}.txt | ... instead of the ls pipeline above. At least one external command fewer!
Use a for loop to iterate over the email files, using the read command
with the proper value of IFS to split the file name into the necessary parts.
Note that this does use one non-POSIX feature that bash provides, namely
using a here-string (<<<) to pass the value of $email to the read command.
for email in *_email.txt; do
    IFS=_ read -r fst snd rest <<< "$email"
    phone=${fst}_${snd}_phone.txt
    # merge $email and $phone
done
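If "merge" just means concatenation (the question leaves that open, and the output file name here is invented), the commented line could become:
cat "$email" "$phone" > "${fst}_${snd}_merged.txt"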

List all environment variable names in busybox

Environment variables with multiline values may confuse env's output:
# export A="B
> C=D
> E=F"
# env
A=B
C=D
E=F
TERM=xterm
SHELL=/bin/bash
USER=root
MAIL=/var/mail/root
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/root
LANG=en_US.UTF-8
PS1=\h:\w\$
SHLVL=1
HOME=/root
LOGNAME=root
_=/usr/bin/env
In this case I cannot just use awk -F= to extract all the names, because it shows the wrong names C and E:
# env | awk -F= '{print $1}'
A
C
E
TERM
SHELL
USER
MAIL
PATH
PWD
LANG
PS1
SHLVL
HOME
LOGNAME
_
Then I figured out that env supports the flag -0 to end each output line with a 0 byte rather than a newline, so using sed I could cut off the values in bash:
# env -0 | sed -e ':a;N;$!ba;s/\([^=]\+\)=\([^\x00]*\)\x00/\1\n/g'
A
TERM
SHELL
USER
MAIL
PATH
PWD
LANG
PS1
SHLVL
HOME
LOGNAME
_
But BusyBox's version of env does not support flag -0. Is there another way to do it?
If you are using Linux (I thought busybox ran only on Linux, but I may be wrong), /proc/self/environ contains a NUL-separated environment in the same form as env -0 gives you. You can replace env -0 | with < /proc/self/environ.
sed -e ':a;N;$!ba;s/\([^=]\+\)=\([^\x00]*\)\x00/\1\n/g' < /proc/self/environ
This is maybe not an elegant solution, but it works. It first extracts all possible names from env's output, then verifies each of them using the shell's expansion ${parameter+word}. And finally it removes duplicates, since the same variable name could be printed on several lines of env's output (as a real variable name and as part of some other variable's multiline value):
env | awk -F= '/[a-zA-Z_][a-zA-Z_0-9]*=/ {
if (!system("[ -n \"${" $1 "+y}\" ]")) print $1 }' | sort | uniq
PS: The | sort | uniq part can be also implemented in awk.
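As a sketch of that PS: the uniq half collapses into the usual order-preserving awk dedup idiom (the sort half would still need an external sort, or GNU awk's asort):
env | awk -F= '/[a-zA-Z_][a-zA-Z_0-9]*=/ {
    if (!system("[ -n \"${" $1 "+y}\" ]")) print $1 }' | sort | awk '!seen[$0]++'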
This will break if your environment variable values contain nulls, but that would also break POSIX compatibility, so it should work.
...unless you expect to encounter environment variable names which contain newlines. In that case the newlines will be truncated when they're displayed. However, I can't seem to fathom how to create an environment variable with a newline in its name in a busybox shell; my local shells balk at it at any rate, so I don't think that would be a big problem. As far as POSIX goes, "Other characters may be permitted by an implementation; applications shall tolerate the presence of such names", so I think stripping them and not erroring out is tolerable.
# Read our environment; it's delimited by null bytes.
# Remove newlines
# Replace null bytes with newlines
# On each line, grab everything before the first '='
cat /proc/self/environ | tr -d '\n' | tr '\0' '\n' | cut -d '=' -f 1

Get list of variables whose name matches a certain pattern

In bash
echo ${!X*}
will print all the names of the variables whose name starts with 'X'.
Is it possible to get the same with an arbitrary pattern, e.g. get all the names of the variables whose name contains an 'X' in any position?
Use the builtin command compgen:
compgen -A variable | grep X
This should do it:
env | grep ".*X.*"
Edit: sorry, that looks for X in the value too.
This version only looks for X in the variable name:
env | awk -F "=" '{print $1}' | grep ".*X.*"
As Paul points out in the comments, if you're looking for local variables too, env needs to be replaced with set:
set | awk -F "=" '{print $1}' | grep ".*X.*"
Easiest might be to do a
printenv | grep 'D.*='
The only difference is that it also prints out the variables' values.
This will search for X only in variable names and output only matching variable names:
set | grep -oP '^\w*X\w*(?==)'
or for easier editing of searched pattern
set | grep -oP '^\w*(?==)' | grep X
or simply (maybe more easy to remember)
set | cut -d= -f1 | grep X
If you want to match X inside variable names, but output in name=value form, then:
set | grep -P '^\w*X\w*(?==)'
and if you want to match X inside variable names, but output only value, then:
set | grep -P '^\w*X\w*(?==)' | grep -oP '(?<==).*'
Enhancing Johannes Schaub - litb's answer: in modern bash we can avoid the fork/exec of grep with
compgen -A variable -X '!*X*'
i.e. an X in any position in the variable name.
env | awk -F= '{if($1 ~ /X/) print $1}'
To improve on Johannes Schaub - litb's answer:
There is a shortcut for -A variable and a flag to include a pattern:
compgen -v -X '!*SEARCHED*'
-v is a shortcut for -A variable
-X takes a pattern that must not be matched.
Hence -v -X '!*SEARCHED*' reads as:
variables that do not, not match "anything + SEARCHED + anything"
Which is equivalent to:
variables that do match "anything + SEARCHED + anything"
The question explicitly mentions "variables" but I think it's safe to say that many people will be looking for "custom declared things" instead.
But neither functions nor aliases are listed by -v.
If you are looking for variables, functions and aliases, you should use the following instead:
compgen -av -A function -X '!*SEARCHED*'
# equivalent to:
compgen -A alias -A variable -A function -X '!*SEARCHED*'
And if you only search for things that start with a PREFIX, compgen does that for you by default:
compgen -v PREFIX
You may of course adjust the options as needed, and the official doc will help you: https://www.gnu.org/software/bash/manual/html_node/Programmable-Completion-Builtins.html
To expand on Phi's and Johannes Schaub - litb's answers for the following use case:
print contents of all environment variables whose names match a pattern as strings which can be reused in other (Bash) scripts, i.e. with all special characters properly escaped and the whole contents quoted
In case you have the following environment variables
export VAR_WITH_QUOTES=\"FirstName\ LastName\"\ \<firstname.lastname@example.com\>
export VAR_WITH_WHITESPACES="
a bc
"
export VAR_EMPTY=""
export VAR_WITH_QUOTES_2=\"\'
then the following snippet prints all VAR* environment variables in a reusable form:
for var in $(compgen -A export -X '!VAR*'); do
    printf "%s=%s\n" "$var" "${!var@Q}"
done
The snippet is valid for Bash 4.4+ (the ${var@Q} expansion was introduced in Bash 4.4).
The output is as follows; note the representation of newlines, empty variables, and variables which contain quotation characters:
VAR_EMPTY=''
VAR_WITH_QUOTES='"FirstName LastName" <firstname.lastname@example.com>'
VAR_WITH_QUOTES_2='"'\'''
VAR_WITH_WHITESPACES=$' \n\ta bc\n'
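Because ${var@Q} emits shell-quoted strings, the captured lines can be fed straight back to another bash process. A sketch, where vars.sh is a hypothetical file holding the loop's output:
for var in $(compgen -A export -X '!VAR*'); do
    printf '%s=%s\n' "$var" "${!var@Q}"
done > vars.sh
# later, in another bash script:
source vars.sh
printf '%s' "$VAR_WITH_WHITESPACES"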
This also relates to the question Escape a variable for use as content of another script
