Bash: list files ordered by the number they contain

Goal
I have a series of files with various names. Each file's name contains a unique integer made of between 1 and 4 digits. I would like to list the files (by displaying their full names) ordered by the number they contain.
Example
The files ..
Hello54.txt
piou2piou.txt
hap_1002_py.txt
JustAFile_78.txt
darwin2012.txt
.. should be listed as
piou2piou.txt
Hello54.txt
JustAFile_78.txt
hap_1002_py.txt
darwin2012.txt

You just need to sort by the numeric part. One approach might be to strip the non-numeric part and use the result to build an array. Conveniently, bash arrays are sparse, so whatever order you add members in, they come out in ascending index order. Pseudo code:
array[name_with_non_numerics_stripped]=name
print array
My quick and not careful implementation:
sortnum() {
    # Store the output in a sparse array to get implicit sorting
    local -a map
    local key
    # (REPLY is the default name `read` reads into.)
    local REPLY
    while read -r; do
        # Pattern-substitution parameter expansion to strip non-numeric chars
        key="${REPLY//[^0-9]}"
        # If a key occurs more than once, you will lose a member.
        map[key]="$REPLY"
    done
    # Join the entries with newlines.
    local IFS=$'\n'
    echo "${map[*]}"
}
If you have two members with the same "numeric" value, only the last one will be printed. @BenjaminW suggested appending such entries together with a newline, something like…
[[ ${map[key]} ]] && map[key]+=$'\n'
map[key]+="$REPLY"
…in place of the map[key]="$REPLY" above.
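For reference, feeding the question's sample names through the function (redefined here so the snippet stands alone) reproduces the desired order:

```shell
# Same logic as above, repeated so this snippet is self-contained.
sortnum() {
    local -a map
    local key REPLY
    while read -r; do
        key="${REPLY//[^0-9]}"   # strip non-digits to get the sort key
        map[key]="$REPLY"        # sparse array index = numeric key
    done
    local IFS=$'\n'
    echo "${map[*]}"             # entries come out in ascending index order
}

printf '%s\n' Hello54.txt piou2piou.txt hap_1002_py.txt \
    JustAFile_78.txt darwin2012.txt | sortnum
```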

Related

Arithmetic in shell script (arithmetic in string)

I'm trying to write a simple script that creates five text files enumerated by a variable in a loop. Can anybody tell me how to make the arithmetic expression be evaluated? This doesn't seem to work:
touch ~/test$(($i+1)).txt
(I am aware that I could evaluate the expression in a separate statement or change the loop...)
Thanks in advance!
The correct answer would depend on the shell you're using. It looks a little like bash, but I don't want to make too many assumptions.
The command you list, touch ~/test$(($i+1)).txt, will correctly touch the file with whatever $i+1 evaluates to, but what it's not doing is changing the value of $i.
What it seems to me like you want to do is:
Find the largest value of n amongst the files named testn.txt where n is a number larger than 0
Increment the number as m.
touch (or otherwise output) to a new file named testm.txt where m is the incremented number.
Using techniques listed here you could strip the parts of the filename to build the value you wanted.
Assume the following was in a file named "touchup.sh":
#!/bin/bash
# first param is the basename of the file (e.g. "$HOME/test")
# second param is the extension of the file (e.g. ".txt")
# assume the files are named so that we can locate them via $1*$2 (test*.txt)
largest=0
for candidate in "$1"*"$2"; do
    intermed=${candidate#"$1"}
    final=${intermed%"$2"}
    # don't want to assume that the files are in any specific order
    if [[ $final -gt $largest ]]; then
        largest=$final
    fi
done
# Now, increment and output.
largest=$((largest + 1))
touch "$1$largest$2"
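Run against a scratch directory, the loop behaves like this (the directory and file names here are invented for the demo):

```shell
# Set up a scratch directory with a few numbered files
dir=$(mktemp -d)
touch "$dir/test1.txt" "$dir/test3.txt" "$dir/test7.txt"

# Same logic as touchup.sh, inlined with base "$dir/test" and extension ".txt"
largest=0
for candidate in "$dir/test"*".txt"; do
    intermed=${candidate#"$dir/test"}
    final=${intermed%".txt"}
    if [[ $final -gt $largest ]]; then
        largest=$final
    fi
done
touch "$dir/test$((largest + 1)).txt"

ls "$dir"    # now also contains test8.txt
```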

How do I move files into folders with similar names in Unix?

I'm sorry if this question has been asked before, I just didn't know how to word it as a search query.
I have a set of folders that look like this:
Brain - Amygdala/
Brain - Caudate (basal ganglia)/
Brain - Spinal cord (cervical c-1)/
Lung/
Skin - Sun Exposed (Lower leg)/
Whole Blood/
I also have a set of files that look like this:
Brain_Amygdala.v7.covariates_output.txt
Brain_Caudate_basal_ganglia.v7.covariates_output.txt
Brain_Spinal_cord_cervical_c-1.v7.covariates_output.txt
Skin_Not_Sun_Exposed_Suprapubic.v7.covariates_output.txt
Skin_Sun_Exposed_Lower_leg.v7.covariates_output.txt
Whole_Blood.v7.covariates_output.txt
As you can see, the files do not perfectly match up with the directories in their names. For example, Brain_Amygdala.v7.covariates_output.txt is not totally identical to Brain - Amygdala/. Even if we were to excise the tissue name from the covariates file, Brain_Amygdala is formatted differently from its corresponding folder.
Same with Whole Blood/. It is different from Whole_Blood.v7.covariates_output.txt, even if you were to isolate the tissue name from the covariates file Whole_Blood.
What I want to do, however, is to move each of these tissue files to its corresponding folder. If you notice, the covariate files are named after the tissue leading up to the first dot . in the file name, with the words separated by underscores _. My idea was to break out the words leading up to the first . of the file name so that I can easily move each file to its corresponding folder.
e.g.
Brain_Amygdala.v7.covariates_output.txt -> Brain*Amygdala [mv]-> Brain*Amygdala/
a) I'm not sure how to isolate the first words of a file name leading up to the first . in a filename
b) if I were to do that, I don't know how to insert a wildcard in between each word and match that to the corresponding folder.
However, I am completely open to other ways of doing something like this.
Not a full answer, but it should address some of your concerns:
a) to isolate the first word of a string, leading up to the first .: use Parameter Expansions
string=Brain_Amygdala.v7.covariates_output.txt
until_dot=${string%%.*}
echo "$until_dot"
will output Brain_Amygdala (which we saved in the variable until_dot).
b) You may want to use the ${parameter/pattern/string} parameter expansion:
# Replace all non-alphabetic characters by the glob *
glob_pattern=${until_dot//[^[:alpha:]]/*}
echo "$glob_pattern"
will output (with the same variables as above) Brain*Amygdala
c) To use all of this: it's probably a good idea to determine the possible targets first, and do some basic checks:
# Use nullglob to have a non-matching glob expand to nothing
shopt -s nullglob
# DO NOT USE QUOTES IN THE FOLLOWING EXPANSION:
# the variable is actually a glob!
# Could also do dirs=( $glob_pattern*/ ) to check if directory
dirs=( $glob_pattern/ )
# Now check how many matches there are:
if (( ${#dirs[@]} == 0 )); then
    echo >&2 "No matches for $glob_pattern"
elif (( ${#dirs[@]} > 1 )); then
    echo >&2 "More than one match for $glob_pattern: ${dirs[*]}"
else
    echo "All good!"
    # Remove the echo to actually perform the move
    echo mv "$string" "${dirs[0]}"
fi
I don't know how your data will effectively conform to these, but I hope this answer actually answers some of your questions! (and to learn more about parameter expansions, do read — and experiment with — the link to the reference I gave you).
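Putting the pieces together, a loop over all the covariate files might look like the sketch below (the function name is made up, and the echo keeps it a dry run):

```shell
shopt -s nullglob   # non-matching globs expand to nothing

# Sketch: for each covariate file, derive a glob from the part before the
# first dot and move the file if the glob matches exactly one directory.
move_covariates() {
    local string until_dot glob_pattern dirs
    for string in *.v7.covariates_output.txt; do
        until_dot=${string%%.*}
        glob_pattern=${until_dot//[^[:alpha:]]/*}
        dirs=( $glob_pattern/ )          # unquoted on purpose: it's a glob
        if (( ${#dirs[@]} == 1 )); then
            echo mv "$string" "${dirs[0]}"   # drop echo to actually move
        else
            echo >&2 "skipping $string: ${#dirs[@]} matches for $glob_pattern"
        fi
    done
}
```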

How to Get Random Array From Case

I tried to get a random array entry in bash.
I used the following code:
agents=$(shuf $DISTR)
case $DISTR in
"1")
echo "amos"
;;
"2")
echo "dia"
;;
"3")
echo "dia lagi"
;;
esac
echo "$agents"
But I am getting a blank result.
Can anyone help me?
I want the output to be one of the cases.
Assumption
$DISTR is the path of a non-empty file.
Fix Existing Code
agents=($(shuf "$DISTR")) # create shuffled array
echo "${agents[0]}" # print the first entry
The first entry is a random entry since the array was shuffled.
Note that this solution only works well for files with one word (no whitespace) per line. If a line contains more than one word, each word will be stored in a separate array entry after the lines are shuffled. For a file with the lines
...
first second
1st 2nd
...
we may print first or 1st, but never second or 2nd.
Better Solution
Shuffling the complete array is a lot of work if you are only interested in one random entry. It is faster to read the array in its original order and pick a random index.
mapfile -t agents < "$DISTR" # create array, one line per entry
length=${#agents[@]}
randomIndex=$(( RANDOM % length )) # random number between 0 and length-1
echo "${agents[randomIndex]}"
Or in short
mapfile -t agents < "$DISTR"
echo "${agents[RANDOM % ${#agents[@]}]}"
RANDOM is a built-in variable that yields a pseudo-random number (do not use it for encryption and the like). mapfile stores one line per array entry, even if the lines contain whitespace.
Best Solution
Again we did too much work. Why read the whole file into an array when we could pick just one random line? (A typical case of an XY problem.)
shuf -n 1 "$DISTR" # print one random line
To generate random numbers in bash, use the $RANDOM internal Bash variable. Note that $RANDOM should not be used to generate an encryption key; bash seeds its generator from values such as the process ID and the current time.
$(( ( RANDOM % 10 ) + 3 )) # random number between 3 and 12

How do I open / manipulate multiple files in bash?

I have a bash script that takes advantage of a local toolbox to perform an operation.
My question is fairly simple.
I have multiple files containing the same quantities but at different time steps. I would like to first untar them all and then use the toolbox to perform some manipulation, but I am not sure if I am on the right track.
=============================================
The file is as follows
INPUTS
fname = a very large number of files with the same name but different numbering
e.g wnd20121.grb
wnd20122.grb
.......
wnd2012100.grb
COMMANDS
> cdo -f nc copy fname ofile(s)
(If ofile(s) is the output file, how can I store it for subsequent use? Take the ofile (output file) from the command and use it / save it as input to the next, producing a new, subsequently numbered output set ofile(s)2.)
>cdo merge ofile(s) ofile2
(then automatically take the ofile(s)2 and input them to the next command and so on, producing always an array of new output files with specific set name I set but different numbering for distinguishing them)
>cdo sellon ofile(s)2 ofile(s)3
------------------------------------
To make my question clearer, I would like to know how I can instruct the terminal, through a bash script, to "grab" multiple files that usually have the same name but different numbering that distinguishes their recorded time steps,
e.g. file1 file2 ... filen
and then get multiple outputs, with every output corresponding to the number of the file it converted,
e.g. output1 output2 ... outputn
How can I set these parameters so that the moment they are generated they are stored for subsequent use in later commands of the script?
Your question isn't clear, but perhaps the following will help; it demonstrates how to use arrays as argument lists and how to parse command output into an array, line by line:
#!/usr/bin/env bash
# Create the array of input files using pathname expansion.
inFiles=(wnd*.grb)
# Pass the input-files array to another command and read its output
# - line by line - into a new array, `outFiles`.
# The example command here simply prepends 'out-' to each element of the
# input-files array and outputs each (modified) element on its own line.
# Note: The assumption is that the filenames have no embedded newlines
# (which is usually true).
IFS=$'\n' read -r -d '' -a outFiles < \
    <(printf "%s\n" "${inFiles[@]}" | sed 's/^/out-/')
# Note: If you use bash 4+, you could use `readarray -t outFiles < <(...)` instead.
# Output the resulting array.
# This also demonstrates how to use an array as an argument list
# to pass to _any_ command.
printf "%s\n" "${outFiles[@]}"
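Applied to the first cdo step from the question, the same pattern might look like the sketch below (the function name is made up, the cdo invocation is copied from the question rather than verified, and the echo keeps it a dry run):

```shell
# Sketch: derive a .nc output name for every .grb input and echo the
# conversion command (drop `echo` to actually run cdo).
grb_to_nc() {
    local f out
    for f in "$@"; do
        out="${f%.grb}.nc"
        echo cdo -f nc copy "$f" "$out"
    done
}
```

Called as `grb_to_nc wnd*.grb`, this emits one numbered output name per numbered input, which you can capture into an array for the next cdo stage.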

Open file in bash script

I've got a bash script accepting several files as input which are mixed with various script's options, for example:
bristat -p log1.log -m lo2.log log3.log -u
I created an array where I save all the indexes at which files appear in the script's call, so in this case it would be an array of 3 elements where
arr_pos[0] = 2
arr_pos[1] = 4
arr_pos[2] = 5
Later in the script I must call "head" and "grep" on those files, and I tried this:
head -n 1 ${arr_pos[0]}
but I get this error at runtime:
head: cannot open `2' for reading: No such file or directory
I tried various parenthesis combinations, but I can't find which one is correct.
The problem here is that ${arr_pos[0]} stores the index at which the file name appears, not the file name itself -- so you can't simply head it. The array storing your arguments is given by $@.
A possible way to access the data you want is:
#! /bin/bash
declare -a arr_pos=(2 4 5)
echo "${@:${arr_pos[0]}:1}"
Output:
log1.log
The expansion ${@:${arr_pos[0]}:1} means you're taking one element from the positional parameters $@, starting at index ${arr_pos[0]}.
Another way to do so, as pointed out by @flaschenpost, is to eval the index preceded by $, so that you'd be accessing the argument list directly. Although it works very well, it may be risky depending on who is going to run your script -- as they may inject commands via the argument line.
Anyway, you should try to loop through the entire argument list at the beginning of the script, stashing the values you find, so that you won't have trouble fetching each one later. You can loop with for + case ... esac and store the values in arrays.
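That suggestion might be sketched as follows (the function name is invented, and the option names are taken from the example call at the top of the question):

```shell
# Sketch: walk the argument list once and collect the file operands,
# so later commands can use "${files[@]}" instead of positional indices.
parse_args() {
    files=()            # intentionally global so callers can read it
    local arg
    for arg in "$@"; do
        case $arg in
            -p|-m|-u) ;;                        # known flags, handled elsewhere
            -*) echo >&2 "unknown option: $arg" ;;
            *)  files+=("$arg") ;;
        esac
    done
}
```

After `parse_args -p log1.log -m lo2.log log3.log -u`, the array holds the three log files, so `head -n 1 "${files[0]}"` works directly.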
I think eval is what you need.
#!/bin/bash
arr_pos[0]=2;
arr_pos[1]=4;
arr_pos[2]=5;
eval "cat \$${arr_pos[1]}"
For me that works.
