Rename a list of files based on a master list and a directory of files - bash

So, I have a master file - countryCode.tsv, which goes like this,
01 united_states
02 canada
etc.
I have another list of country files, which go like this,
united_states.txt
Wyoming
Florida
etc.
canada.txt
some
blah
shit
etc.
and, I have a list of files that are named like this,
01_1
01_2
02_1
02_2
etc.
The first part of the filename corresponds to the country code in the master list, and the second part corresponds to the line number in that country's file.
For example,
01_2 would contain the info related to Florida (united_states).
Now, here comes my question:
how do I rename these numerically named files to the country_state format? For example,
01_2 becomes united_states_florida

The way I would do this is to first read all of the countries into an associative array, then I would iterate over that array looking for '.txt' files for each country. When I find one, read each line in turn and look for a file that matches the country code and the line number from that file. If found, rename it.
Here is some sample code:
#!/bin/bash

declare -A countries    # countries is an associative array.

while read -r code country; do
    if [ ${#code} -ne 0 ]; then    # Ignore blank lines.
        countries[${code}]=${country}
    fi
done < countryCode.tsv  # countryCode.tsv is stdin for the while loop,
                        # which is passed on to the read command.

for code in "${!countries[@]}"; do  # Iterate over the array indices.
    counter=0
    country=${countries[${code}]}
    if [ -r "${country}.txt" ]; then  # In case the country file does not exist.
        while read -r state; do
            ((counter++))
            if [ -f "${code}_${counter}" ]; then
                mv "${code}_${counter}" "${country}_${state}"
            fi
        done < "${country}.txt"
    fi
done
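To sanity-check the approach, here is a self-contained demo that builds a miniature version of the question's layout in a temporary directory and runs the same rename logic; the Canadian province names are made up for illustration.

```shell
#!/bin/bash
# Demo of the rename logic on made-up sample data in a temp directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Master list, per-country files, and the numerically named files.
printf '01 united_states\n02 canada\n' > countryCode.tsv
printf 'Wyoming\nFlorida\n' > united_states.txt
printf 'Ontario\nQuebec\n'  > canada.txt
touch 01_1 01_2 02_1 02_2

declare -A countries
while read -r code country; do
    [ ${#code} -ne 0 ] && countries[$code]=$country
done < countryCode.tsv

for code in "${!countries[@]}"; do
    counter=0
    country=${countries[$code]}
    if [ -r "${country}.txt" ]; then
        while read -r state; do
            ((counter++))
            if [ -f "${code}_${counter}" ]; then
                mv "${code}_${counter}" "${country}_${state}"
            fi
        done < "${country}.txt"
    fi
done

ls    # 01_1 is now united_states_Wyoming, 02_2 is canada_Quebec, etc.
```

Running this should leave files named united_states_Wyoming, united_states_Florida, canada_Ontario and canada_Quebec in the temp directory.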


How to make below bash script clean and less repetitive?

I'm making a birthday tracker for only a few people and was wondering how to combine all the if statements into one, since the date and name are the only things that differ in each if statement; the rest is the same format.
date=$(date +%b-%d)
echo "$date"
if [ "$date" == "Sep-09" ]; then
    echo "Happy Birthday abc!"
fi
if [ "$date" == "Oct-11" ]; then
    echo "Happy Birthday xyz!"
fi
.
.
.
I think I can use positional arguments ($1 for date, $2 for name, ...) but can't figure out how to use them for multiple dates and names. Or if there's a better way, that would be good to know as well.
Using case:
...
case "$date" in
    "Sep-09") name=abc;;
    "Oct-11") name=xyz;;
    ...
    *) exit;;  # if no condition met
esac
echo "Happy Birthday $name!"
You can use a map/dictionary data structure here. You can find an example here: How to define hash tables in Bash?
You could store your birthdays in a text file and read them with while and read.
read lets you read into one or more variables, and if there are more words than variables it stuffs all remaining words into the last variable, so it's perfect to use here, where a person's name could be multiple words:
help read
The line is split into fields as with word splitting, and the first word is assigned to the first NAME, the second word to the second NAME, and so on, with any leftover words assigned to the last NAME. Only the characters found in $IFS are recognized as word delimiters.
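A quick illustration of that leftover-word behaviour (the date and name are made up): the first word goes to the first variable, and everything else lands in the last one.

```shell
# read splits on $IFS: the first word goes to the first variable,
# and all leftover words go to the last variable.
read -r checkdate name <<< "09-08 John Ronald Smith"
echo "$checkdate"   # -> 09-08
echo "$name"        # -> John Ronald Smith
```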
I also suggest using the date format +%m-%d rather than +%b-%d as it means the dates in your file will be in order (e.g. you can run sort birthdays.txt to see everyone in birthday order, which you couldn't do if the month was represented by 3 letters).
birthdays.txt:
09-08 John Smith
10-11 Alice
checkbirthdays.sh:
#!/bin/bash
nowdate="$(date +%m-%d)"
(
    while read checkdate name ; do
        if [ "$nowdate" = "$checkdate" ] ; then
            echo "Happy Birthday $name!"
        fi
    done
) < birthdays.txt
output of ./checkbirthdays.sh on September 8th:
Happy Birthday John Smith!
Assuming you have bash 4.0 or newer, and names don't contain whitespace or glob characters, an associative array would be appropriate for this task:
#!/bin/bash
declare -A birthday_to_names=(
    ['Sep-09']='abc def'
    ['Oct-11']='xyz'
)
names=${birthday_to_names[$(date +%b-%d)]}
[[ $names ]] && printf 'Happy Birthday %s!\n' $names
Note that this version congratulates all birthday people if there are multiple names associated with the given birthday.
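The multi-name behaviour comes from two things working together: the unquoted $names expansion word-splits into separate arguments, and printf reuses its format string once per remaining argument.

```shell
# printf repeats the format string for every remaining argument,
# so passing two names produces two greetings.
printf 'Happy Birthday %s!\n' abc def
# -> Happy Birthday abc!
# -> Happy Birthday def!
```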

Compare multiple tsv files for matching columns

I have 4466 .tsv files with this structure:
(image: file_structure)
I want to compare the 4466 files to see how many IDs (the first column) match.
I only found bash commands for comparing two files, such as comm. Could you tell me how I could do that?
Thank you
I read your question as:
Amongst all TSV files, which column IDs are found in every file?
If that's true, we want the intersection of all the sets of column IDs from all files. We can use the join command to get the intersection of any two files, and we can use the algebraic properties of intersection to effectively join all files.
Consider the intersection of ID for these three files:
file1.tsv    file2.tsv    file3.tsv
---------    ---------    ---------
ID           ID           ID
1            1            2
2            3            3
3
"3" is the only ID shared between all three. We can only join two files together at a time, so we need some way to effectively compute join (join file1.tsv file2.tsv) file3.tsv. Fortunately for us, intersection is idempotent and associative, so we can apply join iteratively in a loop over all the files, like so:
# "Prime" the common file
cp file1.tsv common.tsv

for TSV in file*.tsv; do
    join "$TSV" common.tsv > myTmp
    mv myTmp common.tsv
    echo "After joining $TSV, common IDs are:"
    cat common.tsv
done
When I run that it prints the following:
After joining file1.tsv, common IDs are:
ID
1
2
3
After joining file2.tsv, common IDs are:
ID
1
3
After joining file3.tsv, common IDs are:
ID
3
The first iteration joins file1 with itself (because we primed common with file1); this is where we need intersection to be idempotent
The second iteration joins in file2, cutting out ID "2"
The third iteration joins in file3, cutting ID down to just "3"
Technically, join considers the string "ID" to be one of the things to evaluate... it doesn't know what a header line is, or what an ID is... it just knows to look in some number of fields for common values. In that example we didn't specify a field, so it defaulted to the first field, and it always found "ID" and it always found "3".
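That behaviour is easy to reproduce with two throwaway files (names made up). Note that digits sort before uppercase letters in ASCII, so putting "ID" last keeps both files in the sorted order join requires:

```shell
# Throwaway demo: join treats "ID" as just another value.
# Digits sort before uppercase in ASCII, so these files are sorted.
demo=$(mktemp -d)
printf '1\n2\n3\nID\n' > "$demo/a.txt"
printf '1\n3\nID\n'    > "$demo/b.txt"
join "$demo/a.txt" "$demo/b.txt"
# -> 1
# -> 3
# -> ID
```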
For your files, we need to tell join to:
separate on a tab character, with -t <TAB-CHAR>
only output the join field (which, by default, is the first field), with -o 0
Here's my full implementation:
#!/bin/sh
TAB="$(printf '\t')"

# myJoin joins tsvX with the previously-joined common on
# the first field of both files, saving the first field
# of the joined output back into common
myJoin() {
    tsvX="$1"
    join -t "$TAB" -o 0 common.tsv "$tsvX" > myTmp.tsv
    mv myTmp.tsv common.tsv
}

# "Prime" common
cp input1.tsv common.tsv

for TSV in input*.tsv; do
    myJoin "$TSV"
done

echo "The common IDs are:"
tail -n +2 common.tsv   # skip the "ID" header line
For an explanation of why "$(printf '\t')" is used, check out the following notes on POSIX compliance:
https://www.shellcheck.net/wiki/SC3003
https://unix.stackexchange.com/a/468048/366399
The question sounds quite vague. So, assuming that you want to extract IDs that all 4466 files have in common, i.e. IDs such that each of them occurs at least once in all of the *.tsv files, you can do this (e.g.) in pure Bash using associative arrays and calculating “set intersections” on them.
#!/bin/bash

# removes all IDs from array $1 that do not occur in array $2.
intersect_ids() {
    local -n acc="$1"
    local -rn operand="$2"
    local id
    for id in "${!acc[@]}"; do
        ((operand["$id"])) || unset "acc['${id}']"
    done
}

# prints IDs that occur in all files called *.tsv in directory $1.
get_ids_intersection() (
    shopt -s nullglob
    local -ar files=("${1}/"*.tsv)
    local -Ai common_ids next_ids
    local file id _
    if ((${#files[@]})); then
        while read -r id _; do ((++common_ids["$id"])); done < "${files[0]}"
        for file in "${files[@]:1}"; do
            while read -r id _; do ((++next_ids["$id"])); done < "$file"
            intersect_ids common_ids next_ids
            next_ids=()
        done
    fi
    for id in "${!common_ids[@]}"; do printf '%s\n' "$id"; done
)

get_ids_intersection /directory/where/tsv/files/are

Open file with two columns and dynamically create variables

I'm wondering if anyone can help. I've not managed to find much in the way of examples and I'm not sure where to start coding wise either.
I have a file with the following contents...
VarA=/path/to/a
VarB=/path/to/b
VarC=/path/to/c
VarD=description of program
...
The columns are delimited by the '=' and some of the items in the 2nd column may contain spaces, as they aren't just paths.
Ideally I'd love to open this in my script once and store the first column as the variable and the second as the value, for example...
echo $VarA
...
/path/to/a
echo $VarB
...
/path/to/b
Is this possible or am I living in a fairy land?
Thanks
You might be able to use the following loop:
while IFS='=' read -r name value; do
    declare "$name=$value"
done < file.txt
Note, though, that a line like foo="3 5" would include the quotes in the value of the variable foo.
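If quoted values are a concern, one possible workaround is to strip the surrounding double quotes after reading. This is only a sketch (the helper name is made up), not a full shell-quoting parser:

```shell
# Sketch: strip one leading and one trailing double quote, if present.
strip_quotes() {
    local v="$1"
    v="${v#\"}"   # remove a leading double quote, if any
    v="${v%\"}"   # remove a trailing double quote, if any
    printf '%s' "$v"
}

strip_quotes '"3 5"'      # -> 3 5
```

Values without quotes pass through unchanged, so it is safe to apply to every value read from the file.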
Keep in mind that a minus sign or other special characters aren't allowed in a variable name in Unix shells, so this only works when every key is a valid identifier.
You may consider using BASH associative array for storing key and value together:
# declare an associative array
declare -A arr

# read file and populate the associative array
while IFS='=' read -r k v; do
    arr["$k"]="$v"
done < file

# check output of our array
declare -p arr
declare -A arr='([VarA]="/path/to/a" [VarC]="/path/to/c" [VarB]="/path/to/b" [VarD]="description of program" )'
What about source my-file? It won't work with spaces though, but will work for what you've shared. This is an example:
reut@reut-home:~$ cat src
test=123
test2=abc/def
reut@reut-home:~$ echo $test $test2

reut@reut-home:~$ source src
reut@reut-home:~$ echo $test $test2
123 abc/def

Loop two variables through one command in shell

I want to run a shell script that can simultaneously loop through two variables.
So that I can have an input and output file name. I feel like this isn't too hard of a concept but any help is appreciated.
Files = "File1,
File2,
...
FileN
"
Output = "OutFile1,
Outfile2,
...
OutfileN
"
and I would in theory my code would be:
for File in $Files
do
    COMMAND --file $File --output $Output
done
Obviously, there needs to be another loop but I'm stuck, any help is appreciated.
You don't really need to loop 2 variables, just use 2 BASH arrays:
input=("File1" "File2" "File3")
output=("OutFile1" "OutFile2" "OutFile3")

for ((i=0; i<${#input[@]}; i++)); do
    echo "Processing input=${input[$i]} and output=${output[$i]}"
done
zsh supports multiple loop variables before the list.
#!/bin/zsh
input2output=(
    'File1' 'Outfile1'
    'File2' 'Outfile2'
)

for input output in $input2output
do
    echo "[$input] --> [$output]"
done
Quoting the zsh(5.9) manual, man zshmisc:
for name ... [ in word ... ] term do list done
More than one parameter name can appear before the list of words. If N names are given, then on each execution of the loop the next N words are assigned to the corresponding parameters. If there are more names than remaining words, the remaining parameters are each set to the empty string.
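For comparison, a similar pairwise walk over one flat list can be sketched in plain bash (array contents are made up) by stepping through the indices two at a time:

```shell
#!/bin/bash
# Pairwise loop in plain bash: step through the flat pair list
# two elements at a time.
input2output=(
    'File1' 'Outfile1'
    'File2' 'Outfile2'
)

for ((i = 0; i < ${#input2output[@]}; i += 2)); do
    input=${input2output[i]}
    output=${input2output[i+1]}
    echo "[$input] --> [$output]"
done
# -> [File1] --> [Outfile1]
# -> [File2] --> [Outfile2]
```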

Save a newline separated list into several bash variables

I'm relatively new to shell scripting and am writing a script to organize my music library. I'm using awk to parse the id3 tag info and am generating a newline separated list like so:
Kanye West
College Dropout
All Falls Down
I want to store each field in a separate variable so I can easily compose some mkdir and mv commands. I've tried piping the output to IFS=$'\n' read artist album title but each variable remains empty. I'm open to producing a different output from awk, but I still want to know how to parse a newline separated list using bash.
Edit:
It turns out that piping directly to read, as in:
id3info "$filename" | awk "$awkscript" | { read artist; read album; read title; }
WILL NOT WORK: the last stage of a pipeline runs in a subshell, so the variables exist in a different scope. I found that using a here-string works best:
{ read artist; read album; read title; } <<< "$(id3info "$filename" | awk "$awkscript")"
read normally reads one line at a time. So, if your id3 info is in the file testfile.txt, you can read it in as follows:
{ read artist ; read album ; read song ; } <testfile.txt
echo "artist='$artist' album='$album' song='$song'"
# insert your mkdir and mv commands....
When run on your test file, the above outputs:
artist='Kanye West' album='College Dropout' song='All Falls Down'
You can just read the file into a bash array and loop through the array like so:
IFS=$'\r\n' content=($(cat "${filepath}"))

for ((idx = 0; idx < ${#content[@]}; idx += 3)); do
    artist=${content[idx]}
    album=${content[idx+1]}
    title=${content[idx+2]}
done
Or read three lines in a loop.
yourscript |
while read artist; do   # read first line of input
    read album          # read second line of input
    read song           # read third line of input
    : self-destruct if the genre is rap
done
This loop will consume input lines in groups of three. If there is not an even multiple of three lines of input, the reads after that inside the loop will simply fail and the variables will be empty.
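If instead the body should only run for complete groups of three, a small variation is to chain the reads in the loop condition; an incomplete group at the end then simply terminates the loop (the A/B/C/D lines here are made-up sample input):

```shell
# Chained reads: the body runs only when all three reads succeed,
# so the trailing incomplete group (just "D") is silently dropped.
printf '%s\n' A B C D |
while read -r artist && read -r album && read -r song; do
    echo "$artist / $album / $song"
done
# -> A / B / C
```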
You can read the output from awk into an array. E.g.
readarray -t array <<< "$(printf '%s\n' 'Kanye West' 'College Dropout' 'All Falls Down')"

for ((i=0; i<${#array[@]}; i++)); do
    echo "array[$i]=${array[$i]}"
done
Produces:
array[0]=Kanye West
array[1]=College Dropout
array[2]=All Falls Down
