bash to store unique values of an array in a variable - bash

The bash script below goes to a folder and stores all the unique values that are .html file names in f1. It then removes all text after the _ in $p. I added a for loop to get the unique id in $p. The terminal output shows $p is correct, but only the last value is being stored in the new array ($sorted_unique_ids); I am not sure why all three are not.
dir=/path/to
var=$(ls -td "$dir"/*/ | head -1) ## store newest <run> in var
for f1 in "$var"/qc/*.html ; do
# Grab file prefix
bname=`basename $f1` # strip off path
p="$(echo $bname|cut -d_ -f1)"
typeset -A myarray ## define associative array
myarray[${p}]=yes ## store p in myarray
for i in ${!myarray[@]}; do echo ${!myarray[@]} | tr ' ' '\n' | sort; done
done
output
id1
id1
id1
id2
id1
id2
id1
id2
id3
id1
id2
id3
desired sorted_unique_ids
id1
id2
id3

Maybe something like this:
dir=$(ls -td "$dir"/*/ | head -1)
find "$dir" -maxdepth 1 -type f -name '*_*.html' -printf "%f\n" |
cut -d_ -f1 | sort -u
For input directory structure created like:
dir=dir
mkdir -p dir/dir
touch dir/dir/id{1,2,3}_{a,b,c}.html
So it looks like this:
dir/dir/id2_b.html
dir/dir/id1_c.html
dir/dir/id2_c.html
dir/dir/id1_b.html
dir/dir/id3_b.html
dir/dir/id2_a.html
dir/dir/id3_a.html
dir/dir/id1_a.html
dir/dir/id3_c.html
The script will output:
id1
id2
id3
Tested on repl.
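If you want the result in the sorted_unique_ids array the question mentions, the same pipeline can feed mapfile (a sketch; printf stands in for the find output):

```shell
# simulate the filename list, extract prefixes, deduplicate, load into array
mapfile -t sorted_unique_ids < <(
  printf '%s\n' id2_b.html id1_c.html id3_a.html id1_a.html |
  cut -d_ -f1 | sort -u
)
printf '%s\n' "${sorted_unique_ids[@]}"   # prints id1, id2, id3
```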

latest=`ls -t "$dir"|head -1` # or …|sed q` if you're really jonesing for keystrokes
for f in "$latest"/qc/*_*.html; do f=${f##*/}; printf %s\\n "${f%_*}"; done | sort -u

Define an associative array:
typeset -A myarray
Use each p value as the index for an array element; assign any value you want to the array element (the value just acts as a placeholder):
myarray[${p}]=yes
If you run across the same p value more than once, each assignment to the array will overwrite the previous assignment; the net result is that you'll only ever have a single element in the array with an index of p.
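A minimal demonstration of that overwrite behaviour: three assignments, but only two distinct indexes, so the array ends up with two elements:

```shell
typeset -A seen               # associative array
for p in id1 id1 id2; do
  seen[$p]=yes                # second id1 just overwrites the first
done
echo "${#seen[@]}"            # prints 2
```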
To obtain your unique list of p values, you can loop through the indexes for the array, eg:
for i in ${!myarray[@]}
do
echo ${i}
done
If you need the array indexes generated in sorted order try:
echo ${!myarray[@]} | tr ' ' '\n' | sort
You can then use this sorted result set as needed (eg, dump to stdout, feed to a loop, etc).
So, adding my code to the OP's original code would give us:
typeset -A myarray ## define associative array
dir=/path/to
var=$(ls -td "$dir"/*/ | head -1) ## store newest <run> in var
for f1 in "$var"/qc/*.html ; do
# Grab file prefix
bname=`basename $f1` # strip off path
p="$(echo $bname|cut -d_ -f1)"
myarray[${p}]=yes ## store p in myarray
done
# display sorted, unique set of p values
for i in ${!myarray[@]}; do echo ${!myarray[@]} | tr ' ' '\n' | sort; done


BASH print lowest numbered strings in a variable

I'm currently writing a script to remove old folders, something like a log rotation.
The directory contains folders like the following (where 12345678 is the epoch time from the creation):
123-1.2.3.4-12345678
I managed to get the script to sort out all the unused folders older than X days.
I now want to remove all folders with the lowest numbers in the folder name, except for the X newest.
The folder names are saved in a variable like:
123-1.2.3.4-12345679
123-1.2.3.4-12345680
123-1.2.3.4-12345681
123-1.2.3.4-12345682
How can I find out which are the X newest, and save the other ones in a variable to remove them in a next step?
Assuming the directory names do not include newline characters,
would you please try:
#!/bin/bash
# example of an array of directory names
dirnames=(
"123-1.2.3.4-12345679"
"123-1.2.3.4-12345680"
"123-1.2.3.a.b-12345681"
"h5p-32.ad-12345682"
"foo bar baz-12345678"
)
x=2 # remove older directories except for the x newest ones
for d in "${dirnames[@]}"; do # loop over the directory names
ts=${d##*-} # extract the timestamp
printf "%s\t%s\n" "$ts" "$d" # prepend the timestamp to the directory name delimited by a tab character
done | sort -nrk1,1 | tail -n +$(( x + 1 )) | cut -f2- | tr "\n" "\0" | xargs -0 echo rm -rf
sort -nrk1,1 sorts the directory names with the timestamp in descending
order (newest first, oldest last).
tail -n +$(( x + 1 )) outputs the (x+1)-th line and onward.
cut -f2- removes the prepended timestamp.
tr "\n" "\0" converts a newline character to a null character to preserve
blank characters in the directory names combined with xargs -0.
If the output looks good, drop echo prior to rm.
If you instead want to save the output in an array, change the last four lines to:
mapfile -t array < <(for d in "${dirnames[@]}"; do
ts=${d##*-}
printf "%s\t%s\n" "$ts" "$d"
done | sort -nrk1,1 | tail -n +$(( x + 1 )) | cut -f2-)
Then the array array will hold the directory names to be removed.
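A hypothetical follow-up step, with echo kept as a dry run (the sample names are illustrative):

```shell
# "array" holds the directory names produced above
array=("123-1.2.3.4-12345679" "foo bar baz-12345678")
for d in "${array[@]}"; do
  echo rm -rf -- "$d"       # drop echo once the output looks right
done
```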

How to assign value to a variable that is provided from file in for loop

I'm facing a problem with assigning a value to a variable whose name is stored in another variable or a file.
cat ids.txt
ID1
ID2
ID3
What I want to do is:
for i in `cat ids.txt`; do $i=`cat /proc/sys/kernel/random/uuid`
or
for i in ID1 ID2 ID3; do $i=`cat /proc/sys/kernel/random/uuid`
But it's not working.
What I would like to have is something like:
echo $ID1
5dcteeee-6abb-4agg-86bb-948593020451
echo $ID2
5dcteeee-6abb-4agg-46db-948593322990
echo $ID3
5dcteeee-6abb-4agg-86cb-948593abcd45
Use declare. https://linuxcommand.org/lc3_man_pages/declareh.html
# declare values
for i in ID1 ID2 ID3; do
declare ${i}=$(cat /proc/sys/kernel/random/uuid)
done
# read values (note the `!` in the variable to simulate "$ID1", not ID1)
for i in ID1 ID2 ID3; do echo ${!i}; done
3f204128-bac6-481e-abd3-37bb6cb522da
ccddd0fb-1b6c-492e-bda3-f976ca62d946
ff5e04b9-2e51-4dac-be41-4c56cfbce22e
Or better yet... Reading IDs from the file:
for i in $(cat ids.txt); do
echo "ID from file: ${i}"
declare ${i}=$(cat /proc/sys/kernel/random/uuid)
echo "${i}=${!i}"
done
Result:
$ cat ids.txt
ID1
ID2
ID3
$ for i in $(cat ids.txt); do echo "ID from file: ${i}"; declare ${i}=$(cat /proc/sys/kernel/random/uuid); echo "${i}=${!i}"; done
ID from file: ID1
ID1=d5c4a002-9039-498b-930f-0aab488eb6da
ID from file: ID2
ID2=a77f6c01-7170-4f4f-a924-1069e48e93db
ID from file: ID3
ID3=bafe8bb2-98e6-40fa-9fb2-0bcfd4b69fad
A one-liner using . built-in, process and command substitution, and printf's implicit loop:
. <(printf '%s=$(cat /proc/sys/kernel/random/uuid)\n' $(<ids.txt))
echo "ID1=$ID1"; echo "ID2=$ID2"; echo "ID3=$ID3"
Note: The lines of ids.txt must consist of only valid variable names and the file must come from a trusted source. Checking that file by grep -vq '^[[:alpha:]][[:alnum:]]*$' ids.txt before calling this command may be a safer approach.
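A sketch of that guard, using a throwaway file in place of a real ids.txt (the filename and the message are illustrative):

```shell
# create a stand-in for ids.txt containing one invalid name
printf '%s\n' ID1 ID2 'bad name' > /tmp/ids_check.txt
# grep -vq succeeds if any line is NOT a valid variable name
if grep -vq '^[[:alpha:]][[:alnum:]]*$' /tmp/ids_check.txt; then
  echo "invalid variable name found"   # refuse to source the file here
fi
```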
Method with associative array:
#!/usr/bin/env bash
# Declares associative array to store ids and UUIDs
declare -A id_map=()
# Reads ids.txt into array
mapfile -t ids < ids.txt
# Iterates ids
for id in "${ids[@]}"; do
# Populates id_map with uuid for each id
# Prepends $id with an x because associative array keys must not be empty
read -r id_map["x$id"] < /proc/sys/kernel/random/uuid
done
# Debug content of id_map
for x_id in "${!id_map[@]}"; do
id="${x_id#?}" # Trims leading x filler
printf '%s=%s\n' "$id" "${id_map[$x_id]}"
done

How to get from a file only the words with a character repeated a given number of times

I need to extract from the file the words that contain certain letters in a certain amount.
I apologize if this question has been resolved in the past, I just did not find anything that fits what I am looking for.
File:
wab 12aaabbb abababx ab ttttt baaabb zabcabc
baab baaabb cbaab ab ccabab zzz
For example
1. If I choose the letter a and the number is 1, the output should be:
wab
ab
ab
// only the words that contain a, where the char appears in the word 1 time
2. If I choose the letters a,b and the number is 3, the output should be:
12aaabbb
abababx
baaabb
// only the words that contain a and b, where both chars appear in the word 3 times
3. If I choose the letters a,b,c and the number 2, the output should be:
ccabab
zabcabc
// only the words that contain a, b, and c, where the chars appear in the word 2 times
Is it possible to find 2 letters in the same script?
I was able to search for a single letter, but I only get the words where the letters appear in sequence, and I do not want to find only those words. That's what I did:
egrep '([a])\1{N-1}' file
And another problem: I cannot get only the specific words; I get the whole file with the letter I am looking for, "a", shown in red.
I tried using -w but it does not display anything.
::: EDIT :::
I tried to edit what you did into a for loop:
i=$1
fileName=$2
letters=${@: 3}
tr -s '[:space:]' '\n' < $fileName* |
for letter in $letters; do
grep -E "^[^$letter]*($letter[^$letter]*){$i}$"
done | uniq
There are various ways to split input so that grep sees a single word per line. tr is most common. For example:
tr -s '[:space:]' '\n' < file | ...
We can build a function to find a specific number of a particular letter:
NofL(){
num=$1
letter=$2
regex="^[^$letter]*($letter[^$letter]*){$num}$"
grep -E "$regex"
}
Then:
# letter=a number=1
tr -s '[:space:]' '\n' < file | NofL 1 a
# letters=a,b number=3
tr -s '[:space:]' '\n' < file | NofL 3 a | NofL 3 b
# letters=a,b,c number=2
tr -s '[:space:]' '\n' < file | NofL 2 a | NofL 2 b | NofL 2 c
Regexes are not really suited for that job, as there are more efficient ways, but it is possible using repeated matching. We first select all words; from those we select words with n a's, and from those we select words with n b's, and so on.
Example for n=3 and a, b:
grep -Eo '[[:alnum:]]+' |
grep -Ex '[^a]*a[^a]*a[^a]*a[^a]*' |
grep -Ex '[^b]*b[^b]*b[^b]*b[^b]*'
To auto-generate such a command from an input like 3 a b, you need to dynamically create a pipeline, which is possible, but also a hassle:
exactly_n_times_char() {
(( $# >= 2 )) || { cat; return; }
local n="$1" char="$2" regex
regex="[^$char]*($char[^$char]*){$n}"
shift 2
grep -Ex "$regex" | exactly_n_times_char "$n" "$@"
}
grep -Eo '[[:alnum:]]+' file.txt | exactly_n_times_char 3 a b
With PCREs (requires GNU grep or pcregrep) the check can be done in a single regex:
exactly_n_times_char() {
local n="$1" regex=""
shift
for char; do # could be done without a loop using sed on $*
regex+="(?=[^$char\\W]*($char[^$char\\W]*){$n})"
done
regex+='\w+'
grep -Pow "$regex"
}
exactly_n_times_char 3 a b < file.txt
If a matching word appears multiple times (like baaabb in your example) it is printed multiple times too. You can filter out duplicates by piping through sort -u but that will change the order.
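If preserving the original word order matters, a common alternative to sort -u is the awk idiom that prints a line only the first time it is seen:

```shell
# deduplicate while keeping first-occurrence order
printf '%s\n' baaabb 12aaabbb baaabb | awk '!seen[$0]++'
# prints: baaabb, 12aaabbb
```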
A method using sed and bash would be:
#!/bin/bash
file=$1
n=$2
chars=$3
for ((i = 0; i < ${#chars}; ++i)); do
c=${chars:i:1}
args+=(-e)
args+=("/^\([^$c]*[$c]\)\{$n\}[^$c]*\$/!d")
done
sed "${args[@]}" <(tr -s '[:blank:]' '\n' < "$file")
Notice that filename, count, and characters are parameterized. Use it as
./script filename 2 abc
which should print out
zabcabc
ccabab
given the file content in the question.
An implementation in pure bash, without calling an external program, could be:
#!/bin/bash
readonly file=$1
readonly n=$2
readonly chars=$3
while read -ra words; do
for word in "${words[@]}"; do
for ((i = 0; i < ${#chars}; ++i)); do
c=${word//[^${chars:i:1}]}
(( ${#c} == n )) || continue 2
done
printf '%s\n' "$word"
done
done < "$file"
You can match a string containing exactly N occurrences of character X with the (POSIX-extended) regexp [^X]*(X[^X]*){N}. To do this for multiple characters you could chain them, and the traditional way to process one 'word' at a time, simplistically defined as a sequence of non-whitespace chars, is like this
<infile tr -s ' \t\n' ' ' | grep -Ex '[^a]*(a[^a]*){3}' | \grep -Ex '[^b]*(b[^b]*){3}'
# may need to add \r on Windows-ish systems or for Windows-derived data
If you get colorized output from egrep and grep (and maybe some other utilities), it's usually because in a GNU-ish environment aliases were set -- often via a profile that was provided automatically and that you didn't look at or modify -- turning them into e.g. egrep --color=auto or possibly/rarely =always. Using \grep or command grep or a pathname such as /usr/bin/grep bypasses the alias, or you could just un-set the alias(es). Another possibility is that you have envvar(s) set, in which case you need to remove or suppress them, or explicitly say --color=never, or (somewhat hackily) pipe the output through ... | cat, which makes [e]grep's stdout a pipe rather than a tty and thus turns off =auto.
However, GNU awk (not necessarily others) can also do this more directly:
<infile awk -vRS='[ \t\n]+' -F '' '{delete f;for(i=1;i<=NF;i++)f[$i]++}
f["a"]==3&&f["b"]==3'
or to parameterize the criteria:
<infile awk -vRS='[ \t\n]+' -F '' 'BEGIN{split("ab",w,//);n=3}
{delete f;for(i=1;i<=NF;i++)f[$i]++;s=1;for(t in w)if(f[w[t]]!=n)s=0} s'
perl can do pretty much everything awk can do, and so can some other general-purpose tools, but I leave those as exercises.

Can I use 'column' in bash where a column has a newline?

Suppose I have an associative array of ( [name]="Some description" ).
declare -A myItems=(
[item1]='Item1 description'
[item2]='Item2 description'
)
I now want to print a table of myItems with nice, even column lengths.
str=''
for n in $(echo "${!myItems[@]}" | tr " " "\n" | sort); do
str+="$n\t${myItems[$n]}\n"
done
# $(printf '\t') was the simplest way I could find to capture a tab character
echo -e "$str" | column -t -s "$(printf '\t')"
### PRINTS ###
item1 Item1 description
item2 Item2 description
Cool. This works nicely.
Now suppose an item has a description that is multiple lines.
myItems=(
[item1]='Item1 description'
[item2]='Item2 description'
[item3]='This item has
multiple lines
in it'
)
Now running my script prints
item1 Item1 description
item2 Item2 description
item3 This item has
multiple lines
in it
What I want is
item1  Item1 description
item2  Item2 description
item3  This item has
       multiple lines
       in it
Is this achievable with column? If not, can I achieve it through some other means?
Assuming the array keys are always single lines, you can prefix all newline characters inside the values by a tab so that column aligns them correctly.
By the way: Bash supports so-called C-strings, which allow you to specify tabs and newlines by their escape sequences. $'\n\t' is a newline followed by a tab.
for key in "${!myItems[@]}"; do
printf '_%s\t%s\n' "$key" "${myItems[$key]//$'\n'/$'\n_\t'}"
done |
column -ts $'\t' |
sed 's/^_/ /'
If you also want to sort the keys as in your question, I'd suggest something more robust than for ... in $(echo ...), for instance
printf %s\\n "${!myItems[@]}" | sort |
while IFS= read -r key; do
...
And here is a general solution allowing for multi-line keys and values:
printf %s\\0 "${!myItems[@]}" | sort -z |
while IFS= read -rd '' key; do
paste <(printf %s "$key") <(printf %s "${myItems[$key]}")
done |
sed 's/^/_/' | column -ts $'\t' | sed 's/^_//'
You may use this 2 pass code:
# find max length of key
for i in "${!myItems[@]}"; do ((${#i} > max)) && max=${#i}; done
# create a variable filled with space using max+4; assuming 4 is tab space size
printf -v spcs "%*s" $((max+4)) " "
# finally print key and value using max and spcs filler
for i in "${!myItems[@]}"; do
printf '%-*s\t%s\n' $max "$i" "${myItems[$i]//$'\n'/$'\n'$spcs}"
done
item1    Item1 description
item2    Item2 description
item3    This item has
         multiple lines
         in it
Something tested
#!/usr/bin/env bash
declare -A myItems=(
[item1]='Item1 description'
[item2]='Item2 description'
[item3]='This item has
multiple lines
in it'
)
# Iterate each item by key
for item in "${!myItems[@]}"; do
# Transform lines of item name into an array
IFS=$'\n' read -r -d '' -a itemName <<<"$item"
# Transform lines of item description into an array
IFS=$'\n' read -r -d '' -a itemDesc <<<"${myItems[$item]}"
# Computes the biggest number of lines
lines=$((${#itemName[@]}>${#itemDesc[@]}?${#itemName[@]}:${#itemDesc[@]}))
# Print each line formatted
for (( l=0; l<lines; l++)); do
printf '%-8s%s\n' "${itemName[$l]}" "${itemDesc[$l]}"
done
done
Output:
item1   Item1 description
item2   Item2 description
item3   This item has
        multiple lines
        in it

Bash array size not reflecting actual size when used with local builtin command

I have a log file ala.txt looking like that:
dummy FromEndPoint = PW | dummy | ToEndPoint = LALA | dummy
dummy FromEndPoint = PW | dummy | ToEndPoint = PAPA | dummy
dummy FromEndPoint = WF | dummy | ToEndPoint = LALA | dummy
dummy FromEndPoint = WF | dummy | ToEndPoint = KAKA | dummy
I used sed to generate an array containing every combination of FromEndPoint and ToEndPoint. Then I want to iterate through it.
function main {
file="./ala.txt"
local a=`sed 's/^.*FromEndPoint = \([a-zA-Z\-]*\).*ToEndPoint = \([a-zA-Z\-]*\).*$/\1;\2/' $file | sort -u`
echo ${#a[@]} # prints 1
for connectivity in ${a[@]}; do
echo "conn: $connectivity" # iterates 4 times
#conn: PW;LALA
#conn: PW;PAPA
#conn: WF;KAKA
#conn: WF;LALA
done
}
Why echo ${#a[#]} prints 1 if there are 4 elements in the array? How can I get a real size of it?
Bash used: 4.4.12(1)-release
Don't use variables to store multi-line content, use arrays!
In a bash shell you could use the process substitution feature to make the command output appear as file content for you to parse into an array. In bash versions 4 and above, the commands readarray and mapfile can parse multi-line output given a delimiter, without needing a for loop with read:
#!/usr/bin/env bash
array=()
while read -r unique; do
array+=( "$unique" )
done< <(sed 's/^.*FromEndPoint = \([a-zA-Z\-]*\).*ToEndPoint = \([a-zA-Z\-]*\).*$/\1;\2/' file | sort -u)
Now echo "${#array[@]}" prints the expected length of 4.
And always separate the declaration of local variables from the assignment of values to them. See Why does local sweep the return code of a command?
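A minimal sketch of that fix (bash 4+): declare the local array first, then fill it with mapfile, so ${#a[@]} reports the real element count (printf stands in for the sed | sort -u pipeline):

```shell
#!/usr/bin/env bash
main() {
  local -a a    # declaration is a separate command, so it can't mask exit codes
  mapfile -t a < <(printf '%s\n' 'PW;LALA' 'PW;PAPA' 'WF;LALA' 'WF;KAKA' | sort -u)
  echo "${#a[@]}"   # prints 4
}
main
```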
