Shell script sort list - algorithm

I have a list with the following content:
VIP NAME DATE ARRIVE_TIME FLIGHT_TIME
1 USER1 11-02 20.00 21.00
3 USER2 11-02 20.45 21.45
4 USER2 11-03 20.00 21.30
2 USER1 11-04 17.20 19.10
I want to sort this and similar lists with a shell script. The result should be a new list containing only lines that do not collide. VIP 1 is the most important: if a VIP with a higher number has an ARRIVE_TIME before VIP 1's FLIGHT_TIME on the same DATE, that line should be removed. In other words, when DATE, ARRIVE_TIME and FLIGHT_TIME collide, the VIP number decides which line to keep. Similarly, VIP 2 is more important than VIP 3, and so on.
This is pretty advanced, and I am completely out of ideas on how to solve it.

You can use the Unix sort command to do this. It lets you set primary and secondary sort keys with the -k option.
The uniq command is what you need to remove dupes.
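As a minimal sketch of multi-key sorting on the sample data (header line removed), the helper below sorts by date, then arrival time, then VIP number. The name sort_flights is made up for illustration, and the input is assumed to arrive on standard input.

```bash
#!/bin/bash
# Hypothetical helper: sort "VIP NAME DATE ARRIVE FLIGHT" lines read from stdin.
# Key 1: field 3 (date, compared as text)
# Key 2: field 4 (arrival time, compared numerically)
# Key 3: field 1 (VIP number, compared numerically)
sort_flights() {
    # LC_ALL=C keeps "." as the decimal point regardless of the user's locale.
    LC_ALL=C sort -k3,3 -k4,4n -k1,1n
}

printf '%s\n' \
    '2 USER1 11-04 17.20 19.10' \
    '4 USER2 11-03 20.00 21.30' \
    '3 USER2 11-02 20.45 21.45' \
    '1 USER1 11-02 20.00 21.00' | sort_flights
```

Sorting alone does not remove colliding lines; it only puts candidates for a collision next to each other.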

This might get you started:
I'm ignoring the header line. You can get rid of it using head or skip it in the for loop.
Sort the flights by date, arrival, departure and vip number - having the vip number as a sort key simplifies the logic later.
I'm saving the result in an array, but you could redirect it to a temporary file and read it in a line at a time with a while read line; do ...; done <tempfile loop.
I'm using indirection to make things more readable (naming the fields instead of using array indices directly - the exclamation point means indirection here instead of "not")
For each line in the result that occurs on the same date as the most recently printed line, compare its arrival time to the previous flight's departure time
Echo the lines that are appropriate.
Save the date and departure time for later comparison.
You should adjust the < comparison to be <= if that works better for your data.
Here is the script:
#!/bin/bash
saveIFS="$IFS"
IFS=$'\n'
flights=($(sort -k3,3 -k4,4n -k5,5n -k1,1n flights ))
IFS="$saveIFS"
date=fields[2]
arrive=fields[3]
depart=fields[4]
for line in "${flights[@]}"
do
fields=($line)
if [[ ${!date} == $prevdate && ${!arrive} < $prevdep ]]
then
echo "deleted: $line" # or you could do something else here
else
echo "$line"
prevdep=${!depart}
prevdate=${!date}
fi
done
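The script above keeps whichever flight comes first in time. If the VIP number should decide instead, as the question describes, one possible variation is to process the lines in VIP order and keep a line only when it does not overlap a line already kept. This is a sketch under assumptions: the helper name filter_by_vip is hypothetical, the header line has been removed, and times are zero-padded HH.MM strings so that text comparison orders them correctly.

```bash
#!/bin/bash
# Hypothetical helper: read "VIP NAME DATE ARRIVE FLIGHT" lines on stdin and
# print, in VIP order, every line that does not collide with a kept line.
filter_by_vip() {
    local kept_date=() kept_arr=() kept_dep=()
    local vip name date arrive depart i clash
    # Lowest VIP number first, so more important lines are considered first.
    sort -k1,1n | while read -r vip name date arrive depart; do
        clash=0
        for i in "${!kept_date[@]}"; do
            # Collision: same date and the two time windows intersect.
            if [[ $date == "${kept_date[i]}" &&
                  $arrive < "${kept_dep[i]}" &&
                  ${kept_arr[i]} < $depart ]]; then
                clash=1
                break
            fi
        done
        if (( clash == 0 )); then
            printf '%s\n' "$vip $name $date $arrive $depart"
            kept_date+=("$date"); kept_arr+=("$arrive"); kept_dep+=("$depart")
        fi
    done
}
```

With the sample list, `filter_by_vip < flights` (header removed) would drop the VIP 3 line, because it arrives at 20.45 while VIP 1 does not leave until 21.00 on the same date. As with the original script, change `<` to `<=` semantics if touching windows should also count as a collision.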

Related

Bash: checking substring increments with modular arithmetic

I have a list of files with file names that contain a substring of 6 numbers that represents HHMMSS, HH: 2 digits hour, MM: 2 digits minutes, SS: 2 digits seconds.
If the list of files is ordered, the increments should be in steps of 30 minutes, that is, the first substring should be 000000, followed by 003000, 010000, 013000, ..., 233000.
I want to check that no file is missing iterating the list of files and checking that neither of these substrings is missing. My approach:
string_check=000000
for file in "${file_list[@]}"; do
if [[ ${file:22:6} == $string_check ]]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
string_check=$((string_check+3000)) #this is the key line
done
The next-to-last line is the key. I know how to format the result to 6 digits, but I want to add time the way a clock does, that is, with modular arithmetic modulo 60. How can that be done?
Assumptions:
all 6-digit strings are of the format xx[03]000 (ie, the minutes are exactly 00 or 30 and the seconds are 00)
if there are strings like xx1529 ... these will be ignored (see 2nd half of answer - use of comm - to address OP's comment about these types of strings being an error)
Instead of trying to do a bunch of mod 60 math for the MM (minutes) portion of the string, we can use a sequence generator to generate all the desired strings:
$ for string_check in {00..23}{00,30}00; do echo $string_check; done
000000
003000
010000
013000
... snip ...
230000
233000
While OP should be able to add this to the current code, I'm thinking we might go one step further and look at pre-parsing all of the filenames, pulling the 6-digit strings into an associative array (ie, the 6-digit strings act as the indexes), eg:
unset myarray
declare -A myarray
for file in "${file_list[@]}"
do
myarray[${file:22:6}]+=" ${file}" # in case multiple files have same 6-digit string
done
Using the sequence generator as the driver of our logic, we can pull this together like such:
for string_check in {00..23}{00,30}00
do
[[ -z "${myarray[${string_check}]}" ]] &&
echo "Problem: (file) '${string_check}' is missing"
done
NOTE: OP can decide if the process should finish checking all strings or if it should exit on the first missing string (per OP's current code).
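The two loops above can be wrapped into one self-contained function. This is a sketch only: the name missing_slots is hypothetical, and it keeps the question's assumption that the HHMMSS string starts at offset 22 of each file name. It needs bash 4 or later for associative arrays and zero-padded brace ranges.

```bash
#!/bin/bash
# Hypothetical helper: print every half-hour slot that has no matching file.
# Arguments: the file names to check.
missing_slots() {
    local -A seen=()
    local file slot
    for file in "$@"; do
        slot=${file:22:6}                 # offset 22 is taken from the question
        [[ -n $slot ]] && seen[$slot]=1   # skip names too short to hold a slot
    done
    for slot in {00..23}{00,30}00; do
        [[ -n ${seen[$slot]:-} ]] || printf '%s\n' "$slot"
    done
}
```

Called as `missing_slots "${file_list[@]}"`, it prints nothing when all 48 slots are present, so an empty result can be treated as success.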
One idea for using comm to compare the 2 lists of strings:
# display sequence generated strings that do not exist in the array:
comm -23 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
# OP has commented that strings not like 'xx[03]000' should generate an error;
# display strings (extracted from file names) that do not exist in the sequence
comm -13 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
Where:
comm -23 - display only the lines from the first 'file' that do not exist in the second 'file' (ie, missing sequences of the format xx[03]000)
comm -13 - display only the lines from the second 'file' that do not exist in the first 'file' (ie, filenames with strings not of the format xx[03]000)
These lists could then be used as input to a loop, or passed to xargs, for additional processing as needed; keeping in mind the comm -13 output will display the indices of the array, while the associated contents of the array will contain the name of the original file(s) from which the 6-digit string was derived.
This is easy to do with a POSIX shell using only built-ins, if it is enough to check that all 48 files are present:
#!/usr/bin/env sh
# Print an x for each glob matched file, and store result in string_check
string_check=$(printf '%.0sx' ./*[0-2][0-9][03]000*)
# Now string_check length reflects the number of matches
if [ ${#string_check} -eq 48 ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
Alternatively:
#!/usr/bin/env sh
if [ "$(printf '%.0sx' ./*[0-2][0-9][03]000*)" \
= 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
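For completeness, the clock-style increment asked about in the question can also be done directly with shell arithmetic, by converting to minutes and wrapping at 24 hours. A minimal sketch, assuming a hypothetical helper named next_half_hour that receives a well-formed HHMMSS string:

```bash
#!/bin/bash
# Hypothetical helper: add 30 minutes to an HHMMSS string, wrapping at midnight.
next_half_hour() {
    local t=$1
    # The 10# prefix forces base 10 so that "08" or "09" is not read as octal.
    local h=$((10#${t:0:2})) m=$((10#${t:2:2})) s=$((10#${t:4:2}))
    # Work in minutes since midnight; modulo 1440 handles the day wrap,
    # and the division below carries minutes into hours.
    local total=$(( (h * 60 + m + 30) % (24 * 60) ))
    printf '%02d%02d%02d\n' $((total / 60)) $((total % 60)) "$s"
}

next_half_hour 003000
next_half_hour 233000
```

In the question's loop, the key line would then become `string_check=$(next_half_hour "$string_check")`.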

How can I create a one-to-one relationship between two loops in bash?

In a bash script, I want to create a one-to-one relationship between two for loops that each have variables defined as a sequence. For example, I want something like
for g in `seq 11 1 21`;do
for i in `seq 1 1 10`;do
cat >$i.txt <<EOF
this one is $g.
EOF
done
done
to result in ten files (1.txt, 2.txt, 3.txt, etc). 1.txt would contain "this one is 11." 2.txt would contain "this one is 12." Etc.
The above example is permutative, where each value of g acts on each value of i. Is there a way to make it so only one value of g acts on only one value of i in a corresponding order (ie 1 corresponds to 11, 2 corresponds to 12, etc)?
Any help is greatly appreciated. Thank you.
Before I answer, there's a critical problem with the question: the two sequences have different lengths (there are eleven g values, but only ten i values). Either one of g's values must be skipped, or something filled in as the extra value for i. For my answer I'll assume g should actually run from 11 to 20, not 21.
If you don't want all combinations, then you only want one loop; the trick is to make a single loop iterate through both sequences simultaneously. There are a couple of ways to do this in bash. One is to store both sequences as arrays, and then iterate over their indexes:
g_array=( {11..20} ) # Could also use g_array=( $(seq 11 1 20) ) here
i_array=( {1..10} )
for index in "${!g_array[@]}"; do # The ${!arr[@]} gets the *indexes* of the array
cat >"${i_array[index]}.txt" <<EOF
this one is ${g_array[index]}.
EOF
done
Alternately, since these are just numeric sequences, you can use the for ((init; test; step)) structure to do it:
for ((g=11,i=1; g<=20; g++,i++)); do # Note: semicolons between parts, commas between things that happen together
cat >"$i.txt" <<EOF
this one is $g.
EOF
done
arr1=( $(seq 11 21) )
arr2=( $(seq 1 10) )
# Array indexes start at 0, so the last valid index is the length minus 1
for idx in $(seq 0 $(( ${#arr2[@]} - 1 )))
do
file="/tmp/${arr2[$idx]}.txt"
echo "This one is ${arr1[$idx]} " > "$file"
done
First, both sequences are assigned to two arrays. Then, as you iterate over the indexes of the second array, you build the string and write it to the file.
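Another way to walk two sequences in step, without arrays, is to let paste join them line by line and read the pairs back. This is a sketch, not one of the approaches above: the helper name pair_sequences is hypothetical, and the files are written to a scratch directory created with mktemp rather than the current directory.

```bash
#!/bin/bash
# Hypothetical helper: print "g i" pairs, matching the Nth value of the
# first sequence with the Nth value of the second.
# Arguments: first-start first-end second-start second-end
pair_sequences() {
    paste -d' ' <(seq "$1" "$2") <(seq "$3" "$4")
}

outdir=$(mktemp -d)   # scratch directory for the demo files
pair_sequences 11 20 1 10 | while read -r g i; do
    printf 'this one is %s.\n' "$g" > "$outdir/$i.txt"
done
cat "$outdir/1.txt"
```

If the two sequences have different lengths, paste pads the shorter one with empty fields, so the lengths should be checked first when that matters.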

How to iterate over two strings simultaneously ksh

I'm using data that is returned by another person's ksh93 script in the format of a print to the standard output. Depending on the flag I give it, their script gives me the information I need for my code. It comes out like a list separated by spaces, such that a run of the program has the format of:
"1 3 4 7 8"
"First Third Fourth Seventh Eighth"
For what I'm working on, I need to be able to match the entries of each output, so that I could make the information print in the following format:
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
I need to do more than just printing with the data, I just need to be able to access the pairs of information in each of the strings. Even though the actual contents of the strings can be any number of values, the two strings I get from running the other script will always be the same length.
I'm wondering if there exists a way to iterate over both at the same time, something along the lines of:
str_1=$(other_script -f)
str_2=$(other_script -i)
for a,b in ${str_1},${str_2} ; do
print "${a}:${b}"
done
This obviously isn't the right syntax, but I have been unable to find a way to make it work. Is there a way to iterate over both at the same time?
I know I could convert them to arrays first then iterate by numerical element, but I would like to save the time of converting them if there's a way to iterate over both simultaneously.
Why do you think it is not quick to convert the strings to arrays?
For example:
#!/bin/ksh93
set -u
set -A line1
string1="1 3 4 7 8"
line1+=( ${string1} )
set -A line2
string2="First Third Fourth Seventh Eighth"
line2+=( ${string2} )
typeset -i num_elem_line1=${#line1[@]}
typeset -i num_elem_line2=${#line2[@]}
typeset -i loop_counter=0
if (( num_elem_line1 == num_elem_line2 ))
then
while (( loop_counter < num_elem_line1 ))
do
print "${line1[${loop_counter}]}:${line2[${loop_counter}]}"
(( loop_counter += 1 ))
done
fi
As with the other comments, not sure why an array would be out of the question, especially if you plan on referencing the individual elements more than once later in your code.
A sample script that assumes you want to maintain your str_1/str_2 variables as strings; we'll load into arrays for referencing individual elements:
$ cat testme
#!/bin/ksh
str_1="1 3 4 7 8"
str_2="First Third Fourth Seventh Eighth"
str1=( ${str_1} )
str2=( ${str_2} )
# at this point matching array elements have the same index (0..4) ...
echo "++++++++++ str1[index]=element"
for i in "${!str1[@]}"
do
echo "str1[${i}]=${str1[${i}]}"
done
echo "++++++++++ str2[index]=element"
for i in "${!str1[@]}"
do
echo "str2[${i}]=${str2[${i}]}"
done
# since matching array elements have the same index, we just need
# to loop through one set of indexes to allow us to access matching
# array elements at the same time ...
echo "++++++++++ str1:str2"
for i in "${!str1[@]}"
do
echo ${str1[${i}]}:${str2[${i}]}
done
echo "++++++++++"
And a run of the script:
$ testme
++++++++++ str1[index]=element
str1[0]=1
str1[1]=3
str1[2]=4
str1[3]=7
str1[4]=8
++++++++++ str2[index]=element
str2[0]=First
str2[1]=Third
str2[2]=Fourth
str2[3]=Seventh
str2[4]=Eighth
++++++++++ str1:str2
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
++++++++++
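If explicit arrays really are unwanted, the positional parameters can stand in for one of the lists: load the first string with set --, loop over the words of the second, and shift once per iteration. A minimal sketch that should run in ksh93, bash, or a POSIX sh; the function name pair_words is hypothetical, and it assumes both strings hold the same number of space-separated words with no glob characters in them.

```bash
# Hypothetical helper: print "a:b" for each pair of words at the same position.
# $1 and $2 are space-separated lists of equal length (assumption).
pair_words() {
    second=$2
    set -- $1            # positional parameters now hold the words of the first list
    for b in $second; do
        printf '%s:%s\n' "$1" "$b"
        shift            # advance to the next word of the first list
    done
}

pair_words "1 3 4 7 8" "First Third Fourth Seventh Eighth"
```

Inside the loop, "$1" and "$b" are the current pair, so any other processing can replace the printf.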

Is there a way to implement a counter in bash but for letters instead of numbers?

I'm working with an existing script which was written a bit messily. Setting up a loop with all of the spaghetti code could make a bigger headache than I want to deal with in the near term. Maybe when I have more time I can clean it up but for now, I'm just looking for a simple fix.
The script deals with virtual disks on a xen server. It reads multipath output and asks if particular LUNs should be formatted in any way based on specific criteria. However, rather than taking that disk path and inserting it, already formatted, into a configuration file, it simply presents every line in the format
'phy:/dev/mapper/UUID,xvd?,w',
UUID, of course, is an actual UUID.
The script actually presents each of the found LUNs in this format expecting the user to copy and paste them into the config file replacing each ? with a letter in sequence. This is tedious at best.
There are several ways to increment a number in bash. Among others:
var=$((var+1))
((var+=1))
((var++))
Is there a way to do the same with characters which doesn't involve looping over the entire alphabet such that I could easily "increment" the disk assignment from xvda to xvdb, etc?
To do an "increment" on a letter, define the function:
incr() { LC_CTYPE=C printf "\\$(printf '%03o' "$(($(printf '%d' "'$1")+1))")"; }
Now, observe:
$ echo $(incr a)
b
$ echo $(incr b)
c
$ echo $(incr c)
d
Because this increments up through ASCII, incr z becomes {.
How it works
The first step is to convert a letter to its ASCII numeric value. For example, a is 97:
$ printf '%d' "'a"
97
The next step is to increment that:
$ echo "$((97+1))"
98
Or:
$ echo "$(($(printf '%d' "'a")+1))"
98
The last step is to convert the incremented number back to a letter:
$ LC_CTYPE=C printf "\\$(printf '%03o' "98")"
b
Or:
$ LC_CTYPE=C printf "\\$(printf '%03o' "$(($(printf '%d' "'a")+1))")"
b
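The same three steps can be combined with a modulo so that z wraps around to a instead of becoming {. A sketch under assumptions: the function name incr_wrap is made up here, and the input is a single lowercase ASCII letter.

```bash
#!/bin/bash
# Hypothetical variant of incr that wraps from z back to a.
# Assumes $1 is one lowercase ASCII letter.
incr_wrap() {
    local code
    code=$(printf '%d' "'$1")                 # letter -> ASCII code (a is 97)
    code=$(( (code - 97 + 1) % 26 + 97 ))     # step forward within a..z
    LC_CTYPE=C printf "\\$(printf '%03o' "$code")"   # ASCII code -> letter
}

echo "$(incr_wrap a)"
echo "$(incr_wrap z)"
```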
Alternative
With bash, we can define an associative array to hold the next character:
$ declare -A Incr; last=a; for next in {b..z}; do Incr[$last]=$next; last=$next; done; Incr[z]=a
Or, if you prefer code spread out over multiple lines:
declare -A Incr
last=a
for next in {b..z}
do
Incr[$last]=$next
last=$next
done
Incr[z]=a
With this array, characters can be incremented via:
$ echo "${Incr[a]}"
b
$ echo "${Incr[b]}"
c
$ echo "${Incr[c]}"
d
In this version, the increment of z loops back to a:
$ echo "${Incr[z]}"
a
How about an array with entries A-Z assigned to indexes 1-26?
IFS=':' read -r -a alpharray <<< ":A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z"
This has 1=A, 2=B, etc. If you want 0=A, 1=B, and so on, remove the first colon.
IFS=':' read -r -a alpharray <<< "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z"
Then later, where you actually need the letter;
var=$((var+1))
'phy:/dev/mapper/UUID,xvd${alpharray[$var]},w',
The only problem is that if you end up running past 26 letters, you'll start getting blanks returned from the array.
Use a Bash 4 Range
You can use a Bash 4 feature that lets you specify a range within a sequence expression. For example:
for letter in {a..z}; do
echo "phy:/dev/mapper/UUID,xvd${letter},w"
done
See also Ranges in the Bash Wiki.
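Applied to the original problem, the range can be stored in an array and used to fill in the ? of each line the existing script prints. This is a sketch with assumptions: the helper name assign_letters is hypothetical, and the lines are assumed to arrive on standard input with exactly one ? each.

```bash
#!/bin/bash
# Hypothetical helper: replace the first "?" in each input line with the
# next letter in sequence (a for the first line, b for the second, ...).
assign_letters() {
    local letters n=0 line
    letters=( {a..z} )
    while IFS= read -r line; do
        if (( n >= ${#letters[@]} )); then
            echo "Error: more than ${#letters[@]} disks" >&2
            return 1
        fi
        printf '%s\n' "${line/\?/${letters[n]}}"
        n=$((n + 1))
    done
}

printf '%s\n' \
    "'phy:/dev/mapper/UUID1,xvd?,w'," \
    "'phy:/dev/mapper/UUID2,xvd?,w'," | assign_letters
```

The UUID1 and UUID2 strings are placeholders; in practice the existing script's output would be piped into the function.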
Here's a function that will return the next letter in the range a-z. An input of 'z' returns 'a'.
nextl(){
((num=(36#$(printf '%c' $1)-9) % 26+97));
printf '%b\n' '\x'$(printf "%x" $num);
}
It treats the first letter of the input as a base 36 integer, subtracts 9, and returns the character whose ordinal number is 'a' plus that value mod 26.
Use Jot
While the Bash range option uses built-ins, you can also use a utility like the BSD jot utility. This is available on macOS by default, but your mileage may vary on Linux systems. For example, you'll need to install athena-jot on Debian.
More Loops
One trick here is to pre-populate a Bash array and then use an index variable to grab your desired output from the array. For example:
letters=( "" $(jot -w %c 26 a) )
for idx in 1 26; do
echo ${letters[$idx]}
done
A Loop-Free Alternative
Note that you don't have to increment the counter in a loop. You can do it other ways, too. Consider the following, which will increment any letter passed to the function without having to prepopulate an array:
increment_var () {
local new_var=$(jot -nw %c 2 "$1" | tail -1)
if [[ "$new_var" == "{" ]]; then
echo "Error: You can't increment past 'z'" >&2
exit 1
fi
echo -n "$new_var"
}
var="c"
var=$(increment_var "$var")
echo "$var"
This is probably closer to what the OP wants, but it certainly seems more complex and less elegant than the original loop recommended elsewhere. However, your mileage may vary, and it's good to have options!

Generate ID number from a name in bash

Currently I have a bunch of names that are tied to numbers, for example:
Joe Bloggs - 17
John Smith - 23
Paul Smith - 24
Joe Bloggs - 32
Using the name and the number I'd like to generate a random/unique ID made of 4 numbers that also ends with the initial number.
So for example, Joe Bloggs and 17 would make something random/unique like: xxxx17.
Is this possible in bash? Would it be better in some other language?
This would be used on debian and darwin based systems.
It is impossible to ensure that a 4-digit hash (checksum) would be unique for a set of 10-character-long names.
As an alternative, you can try
file="./somefile"
paste -d"\0\n" <(seq -f "%04g" 9999 | sort -R | head -$(grep -c '' "$file")) <(grep -oP '\d+' "$file")
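or, split across lines for better readability (gsort is GNU sort as installed on macOS, for example by Homebrew's coreutils; use plain sort on Debian):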
paste -d"\0\n" <(
seq -f "%04g" 9999 | gsort -R | head -$(grep -c '' "$file")
) <(
grep -oP '\d+' "$file"
)
for your input produces something like:
010817
161523
748024
269032
All lines are in the form RRRRXX, where:
the RRRR is a guaranteed unique and random number (from the range 0001 up to 9999)
the XX is the number from your input
decomposition:
seq produces 9999 4-digit numbers (ofc, each number is unique)
sort -R sorts the lines in random order (based on their hash, so get unique random numbers)
head - from the random list show only first N lines, where the N is the number of lines in your file,
the number of lines is counted by grep -c '' (better than wc -l)
the grep -oP filters the numbers from your file
finally the paste combines the two inputs to the final output
the <(..) <(..) is process substitution
Each name, after you add their number, becomes unique already unless there are two Joe Bloggs 17. In your case, there are two Joe Bloggs, one with 17 and 32. Put those together, you have uniqueness "Joe Bloggs 17" and "Joe Bloggs 32" are not the same. Using this, you can simply assign a number to each name + number pair and remember that number in an associative array (dictionary). No need to be random. When you find a name that isn't already in the dictionary, just keep incrementing the number and, then, associate the new number with the name. If uniqueness is the only goal, then you are in good shape for 10,000 people.
Python is a great language for this, but you can make associative arrays in BASH too.
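A minimal sketch of that dictionary approach in bash 4 or later follows. The names id_for, next_id, assign_id and result are all hypothetical. The function returns its value in a global variable rather than printing it, because calling it inside $(...) would run it in a subshell and lose the updates to the table.

```bash
#!/bin/bash
declare -A id_for     # maps "Name - NN" to its assigned 4-digit prefix
next_id=1             # next unused sequential prefix
result=""             # set by assign_id

# Hypothetical helper: set $result to the 6-digit ID for a "Name - NN" line.
assign_id() {
    local key=$1 num=${1##* }     # num is the text after the last space
    if [[ -z ${id_for[$key]:-} ]]; then
        # First time this name and number pair is seen: take the next prefix.
        id_for[$key]=$(printf '%04d' "$next_id")
        next_id=$((next_id + 1))
    fi
    result="${id_for[$key]}${num}"
}

assign_id "Joe Bloggs - 17"; echo "$result"
assign_id "Joe Bloggs - 32"; echo "$result"
```

The same pair always maps to the same ID within one run; to keep IDs stable across runs, the table would have to be saved to a file.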
You can get very close to doing exactly what you want by using the random string generated by $(date +%N) and then selecting 4 digits to use as the first four characters of the new ID. You can choose from the beginning if you want IDs that are closer together, or from the middle of the string for more randomness. After selecting your random 4, just keep track of the ones used in an array and check against the array as each new ID is assigned. This overhead is negligible for 10,000 or so IDs:
#!/bin/bash
declare -a used4=0 # array to hold IDs you have assigned
declare -i dupid=0 # a flag to prompt regeneration in case of a dup
while read -r line || [ -n "$line" ]; do
name=${line% -*}
id2=${line##* }
while [ $dupid -eq 0 ]; do
ns=$(date +%N) # fill variable with nanoseconds
fouri=${ns:4:4} # take 4 integers (mid 4 for better randomness)
# test for duplicate (this is BASH only test - use loop if portability needed)
[[ " ${used4[*]} " == *" $fouri "* ]] && continue
newid="${fouri}${id2}" # concatenate 4 ints + orig 2 digit id
used4+=( "$fouri" ) # add 4ints to used4 array
dupid=1
done
dupid=0 # reset flag
printf "%s => %s\n" "$line" "$newid"
done<"$1"
output:
$ bash fourid.sh dat/nameid.dat
Joe Bloggs - 17 => 762117
John Smith - 23 => 603623
Paul Smith - 24 => 210424
Joe Bloggs - 32 => 504732

Resources