Is it possible to do something like:
for (a,b) in (1,2) (2,1)
do
    run_program.py $a $b
done
I only know the for/do/done syntax in bash. I want to run the program with those two specific (a,b) instances (of course, this easily generalizes to many more than two).
There is no tuple construct in bash, and also no destructuring (the behavior which you're relying on to assign a=1 and b=2 when iterating over (1,2)). What you can do is have multiple arrays, where the same index in each refers to corresponding data, and iterate by index.
#!/bin/bash
# ^^^^ - IMPORTANT: /bin/sh does not support arrays, you *must* use bash

a1=( 1 2 )  # a1[0]=1; a1[1]=2
a2=( 2 1 )  # a2[0]=2; a2[1]=1

for idx in "${!a1[@]}"; do   # iterate over indices: idx=0, then idx=1
    a=${a1[$idx]}            # look up idx in array a1
    b=${a2[$idx]}            # look up idx in array a2
    run_program.py "$a" "$b" # ...and use both
done
Syntax pointers:
"${!array[#]}" expands to the list of indices for the array array.
a1=( 1 2 ) assigns to an array named a1. See BashFAQ #5 for an introduction to arrays in bash.
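For example, a quick sketch of what those expansions produce, using the a1 array from above:
printf 'index: %s\n' "${!a1[@]}"   # prints "index: 0" then "index: 1"
printf 'value: %s\n' "${a1[@]}"    # prints "value: 1" then "value: 2"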
If you have constraints in your input that allow items to be split unambiguously, it's also possible to (hackishly) exploit that. For an example using a pattern of behaviors explained in BashFAQ #1:
inputs='1:2,2:1,'
while IFS=: read -r -d, a b <&3; do
    run_program.py "$a" "$b" 3<&-
done 3<<<"$inputs"
Note that the use of FD 3 here is arbitrary. File descriptors 0, 1 and 2 are reserved for stdin, stdout and stderr; descriptors 3 through 9 are explicitly available for shell scripts to use; and in practice, higher FD numbers tend to be available as well, though those are prone to being dynamically auto-allocated by the shell (for instance, to store backups of temporarily-redirected descriptors). That said, a well-behaved shell won't stomp on an FD that the user has explicitly allocated, and will move an auto-allocated FD out of the way if the user puts its number to explicit use.
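As a minimal sketch of explicit FD allocation (the file name input.txt is just a placeholder):
exec 3<input.txt        # open input.txt for reading on FD 3
read -r first_line <&3  # read from FD 3, leaving stdin (FD 0) untouched
exec 3<&-               # close FD 3 when done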
I'm writing a bash script to analyse some files. In a first iteration I create associative arrays with word counts for each category that is analysed. These categories are not known in advance so the names of these associative arrays are variable but all with the same prefix count_$category. The associative arrays have a word as key and its occurrence count in that category as value.
After all files are analysed, I have to summarise the results for each category. I can iterate over the variable names using ${!count_*}, but how can I access the associative arrays behind those variable names? For each associative array (each count_* variable) I should iterate over the words and their counts.
I have already tried with indirect access like this but it doesn't work:
for categorycount in ${!count_*}   # categorycount now holds the name of the associative array variable for each category
do
    array=${!categorycount}
    for word in ${!array[@]}
    do
        echo "$word occurred ${array[$word]} times"
    done
done
The modern (bash 4.3+) approach uses "namevars", a facility borrowed from ksh:
for _count_var in "${!count_@}"; do
    declare -n count=$_count_var                   # make count an alias for $_count_var
    for key in "${!count[@]}"; do                  # iterate over keys, via same
        echo "$key occurred ${count[$key]} times"  # extract value, likewise
    done
    unset -n count                                 # clear that alias
done
declare -n count=$count_var allows "${count[foo]}" to be used to look up item foo in the associative array whose name is stored in count_var; similarly, count[foo]=bar will assign to that same item. unset -n count then removes this mapping.
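A minimal illustration of that aliasing (the count_fruit name is made up for the demo):
declare -A count_fruit=( [apple]=3 [pear]=1 )
declare -n count=count_fruit
echo "${count[apple]}"   # prints 3, read through the namevar
count[pear]=2            # assigns through to count_fruit[pear]
unset -n count           # removes the alias; count_fruit keeps its contents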
Prior to bash 4.3:
for _count_var in "${!count_@}"; do
    printf -v cmd '_count_keys=( "${!%q[@]}" )' "$_count_var" && eval "$cmd"
    for key in "${_count_keys[@]}"; do
        var="$_count_var[$key]"
        echo "$key occurred ${!var} times"
    done
done
Note the use of %q, rather than substituting a variable name directly into a string, to generate the command to eval. Even though in this case we're probably safe (because the set of possible variable names is restricted), following this practice reduces the amount of context that needs to be considered to determine whether an indirect expansion is secure.
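To see what %q buys you, consider a hostile value (hypothetical, for illustration only):
evil='x[$(rm -rf ~)]'
printf '%q\n' "$evil"   # prints x\[\$\(rm\ -rf\ \~\)\], with every shell
                        # metacharacter escaped, so code built with %q and
                        # passed to eval cannot execute the injected command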
In both cases, note that internal variables (_count_var, _count_keys, etc) use names that don't match the count_* pattern.
Array indexing is 0-based in bash, and 1-based in zsh (unless option KSH_ARRAYS is set).
As an example: To access the first element of an array, is there something nicer than:
if [ -n "$BASH_VERSION" ]; then
    echo "${array[0]}"
else
    echo "${array[1]}"
fi
TL;DR:
To always get consistent behaviour, use:
${array[@]:offset:length}
Explanation
For code which works in both bash and zsh, you need to use the offset:length syntax rather than the [subscript] syntax.
Even for zsh-only code, you'll still need to do this (or use emulate -LR zsh) since zsh's array subscripting basis is determined by the option KSH_ARRAYS.
E.g., to reference the first element of an array:
${array[@]:0:1}
Here, ${array[@]} is all the elements, 0 is the offset (which is always 0-based), and 1 is the number of elements desired.
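A short sketch of the contrast, runnable in either shell:
array=( first second third )
echo "${array[@]:0:1}"   # prints "first" in both bash and zsh
# By contrast, plain subscripting diverges:
# ${array[0]} is "first" in bash, while default zsh uses ${array[1]}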
I have multiple SVG files, and I need to rename them by adding 1 to each number:
0.svg --> 1.svg
1.svg --> 2.svg
2.svg --> 3.svg
etc...
What would be the best way to do this using the linux terminal?
The trick is to process the files backwards so you don't overwrite existing files while renaming. Use parameter expansion to extract the numbers from the file names.
#!/bin/bash
files=( ?.svg )
for (( i = ${#files[@]} - 1; i >= 0; --i )); do
    n=${files[i]%.svg}
    mv -- "$n.svg" "$(( n + 1 )).svg"
done
If the files can have names of different length (e.g. 9.svg, 10.svg) the solution will be more complex, as you need to sort the files numerically rather than lexicographically.
To handle the case where the filename numbers have multiple digits, try the following:
while IFS= read -r num; do
    new="$(( num + 1 )).svg"
    mv -- "$num.svg" "$new"
done < <(
    for f in *.svg; do
        n=${f%.svg}
        echo "$n"
    done | sort -rn
)
This Shellcheck-clean code is intended to operate safely and cleanly no matter what is in the current directory:
#! /bin/bash -p

shopt -s nullglob   # Globs that match nothing expand to nothing
shopt -s extglob    # Enable extended globbing (+(...), ...)

# Put the file base numbers in a sparse array.
# (Bash automatically keeps such arrays sorted by increasing indices.)
sparse_basenums=()
for svgfile in +([0-9]).svg ; do
    # Skip files with extra leading zeros (e.g. '09.svg')
    [[ $svgfile == 0[0-9]*.svg ]] && continue
    basenum=${svgfile%.svg}
    sparse_basenums[$basenum]=$basenum
done

# Convert the sparse array to a non-sparse array (preserving order)
# so it can be processed in reverse order with a 'for' loop
basenums=( "${sparse_basenums[@]}" )

# Process the files in reverse (i.e. decreasing) order by base number
for (( i = ${#basenums[*]} - 1; i >= 0; i-- )); do
    basenum=${basenums[i]}
    mv -i -- "$basenum.svg" "$((basenum+1)).svg"
done
shopt -s nullglob prevents bad behaviour if the directory doesn't contain any files whose names are a decimal number followed by '.svg'. Without it the code would try to process a file called '+([0-9]).svg'.
shopt -s extglob enables a richer set of globbing patterns than the default. See the 'extglob' section in glob - Greg's Wiki for details.
The usefulness of sparse_basenums depends on the fact that Bash arrays can have arbitrary non-negative integer indices, that arrays with gaps in their indices are stored efficiently (sparse arrays), and that elements in arrays are always stored in order of increasing index. See Arrays (Bash Reference Manual) for more information.
The code skips files whose names have extra leading zeros ('09.svg', but not '0.svg') because it can't handle them safely as it is now. Trying to treat '09' as a number causes an error because it's treated as an illegal octal number. That is easily fixable, but there could still be problems if, for instance, you had both '9.svg' and '09.svg' (they would both naturally be renamed to '10.svg').
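The octal problem itself has a well-known fix: prefix the value with 10# inside the arithmetic expansion to force base-10 interpretation. A quick sketch:
basenum=09
echo "$(( 10#$basenum + 1 ))"   # prints 10 instead of raising an octal error
That alone wouldn't resolve the 9.svg/09.svg collision described above, though.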
The code uses mv -i to prompt for user input in case something goes wrong and it tries to rename a file to one that already exists.
Note that the code will silently do the wrong thing (due to arithmetic overflow) if the numbers are too big (e.g. '99999999999999999999.svg'). The problem is fixable.
I have 2 large arrays with hash values stored in them. I'm trying to find the best way to verify all of the hash values in array_a are also found in array_b. The best I've got so far is
Import the Hash files into an array
Sort each array
For loop through array_a
Inside of array_a's for loop, do another for loop over array_b (seems inefficient).
If found unset value in array_b
Set "found" value to 1 and break loop
If array_a doesn't have a match output to file.
I have large images that I need to verify have been uploaded to the site and the hash values match. I've created a file from the original files and scraped the website ones to create a second list of hash values. Trying to keep this as vanilla as possible, so only using typical bash functionality.
#!/bin/bash
array_a=($(< original_sha_values.txt))
array_b=($(< sha_values_after_downloaded.txt))

# Sort to speed up.
IFS=$'\n' array_a_sorted=($(sort <<<"${array_a[*]}"))
unset IFS
IFS=$'\n' array_b_sorted=($(sort <<<"${array_b[*]}"))
unset IFS

for item1 in "${array_a_sorted[@]}" ; do
    found=0
    for item2 in "${!array_b_sorted[@]}" ; do
        if [[ $item1 == "${array_b_sorted[$item2]}" ]]; then
            unset 'array_b_sorted[item2]'
            found=1
            break
        fi
    done
    if [[ $found == 0 ]]; then
        echo "$item1" >> hash_is_missing_a_match.log
    fi
done
Sorting sped it up a lot:
IFS=$'\n' array_a_sorted=($(sort <<<"${array_a[*]}"))
unset IFS
IFS=$'\n' array_b_sorted=($(sort <<<"${array_b[*]}"))
unset IFS
Is this really the best way of doing this?
for item1 in "${array_a_sorted[@]}" ; do
    ...
    for item2 in "${!array_b_sorted[@]}" ; do
        if ...
            unset 'array_b_sorted[item2]'
            break
Both arrays have 12,000 lines of 64-bit hashes, and the comparison takes 20+ minutes. Is there a way to improve the speed?
You're doing it the hard way. If the task is to find the entries in file1 that are not in file2, here is a shorter approach:
$ comm -23 <(sort f1) <(sort f2)
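Applied to the file names from the question, that might look like:
comm -23 <(sort original_sha_values.txt) <(sort sha_values_after_downloaded.txt) > hash_is_missing_a_match.log
comm -23 suppresses column 2 (lines only in the second file) and column 3 (lines common to both), leaving exactly the hashes present in the first file but missing from the second.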
I think karakfa's answer is probably the best approach if you just want to get it done and not worry about optimizing bash code.
However, if you still want to do it in bash, and you are willing to use some bash-specific features, you could shave off a lot of time using an associative array instead of two regular arrays:
# Read the original hash values into a bash associative array
declare -A original_hashes=()
while read -r hash; do
    original_hashes["$hash"]=1
done < original_sha_values.txt

# Then read the downloaded values and check each one to see if it exists
# in the associative array. Lookup time *should* be O(1)
while read -r hash; do
    if [[ -z "${original_hashes["$hash"]+x}" ]]; then
        echo "$hash" >> hash_is_missing_a_match.log
    fi
done < sha_values_after_downloaded.txt
This should be a lot faster than the nested loop implementation using regular arrays. Also, I didn't need any sorting, and all of the insertions and lookups on the associative array should be O(1), assuming bash implements associative arrays as hash tables. I couldn't find anything authoritative to back that up though, so take that with a grain of salt. Either way, it should still be faster than the nested loop method.
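Note the ${original_hashes["$hash"]+x} expansion above: it is the standard set-test idiom, expanding to x if the key exists (even with an empty value) and to nothing if it doesn't. A small demonstration, with a made-up array:
declare -A seen=( [abc]="" )                        # key exists, value is empty
[[ -n "${seen[abc]+x}" ]] && echo "abc is present"  # prints: abc is present
[[ -n "${seen[xyz]+x}" ]] || echo "xyz is absent"   # prints: xyz is absent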
If you want the output sorted, you can just change the last line to:
done < <(sort sha_values_after_downloaded.txt)
in which case you're still only having to sort one file, not two.
I want to generate a random number from a given list.
For example if I give the numbers
1,22,33,400,400,23,12,53 etc.
I want to select a random number from the given numbers.
Couldn't find an exact duplicate of this, so here goes my attempt, exactly what 123 mentions in the comments. The solution is portable across shell variants and does not invoke any external binaries, which keeps it fast.
You can run the below commands directly on the console.
# Read the elements into a bash array, with IFS as the delimiter for input
IFS="," read -ra randomNos <<< "1,22,33,400,400,23,12,53"

# Print a random element using the built-in '$RANDOM' variable, modulo the
# array length.
printf "%s\n" "${randomNos[$RANDOM % ${#randomNos[@]}]}"
As per the comments below, if you want to exclude a certain list of numbers when selecting from a range, take the approach below:
#!/bin/bash

# Initializing the ignore list with the numbers you have mentioned
declare -A ignoreList=( [21]=1 [25]=1 [53]=1 [80]=1 [143]=1 [587]=1 [990]=1 [993]=1 )

# Generating the random number
randomNumber="$(( RANDOM % 1023 ))"

# Printing the number if it is not in the ignore list
[[ -z "${ignoreList["$randomNumber"]}" ]] && printf "%s\n" "$randomNumber"
You can save it in a bash variable like
randomPortNumber=$([[ -z "${ignoreList["$randomNumber"]}" ]] && printf "%s\n" "$randomNumber")
Remember that associative arrays need bash version ≥ 4 to work.
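If the script might run under an older bash, a sketch of a guard you could put near the top:
if (( BASH_VERSINFO[0] < 4 )); then
    echo "associative arrays require bash >= 4" >&2
    exit 1
fi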