Removing unneeded information in filename using bash script - bash

I need to remove everything before and after a name in bash script, the following are examples
test_3123_123_testone-2.cpp
abc_3123_12312_a.cpp
johnchase_4123123123_123123123_johnc-1.cpp
I need them changed to simply
testone.cpp
a.cpp
johnc.cpp
But I'm having trouble with the regex and getting this set up properly. Any advice would be great!

the text right before .cpp but ignoring any dashes(-2) if they exist behind it.
Do exactly that. Write it from the end.
"before .cpp" -> so .cpp must be last
"ignoring any dashes" - so there is a dash(-2)
"if they exist behind it" - the dash is optional
"the text" - so match the text.
var=test_3123_123_testone-2.cpp
[[ "$var" =~ ([^_-]*)(-[0-9]+)?\.cpp$ ]]
echo "${BASH_REMATCH[1]}.cpp"
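As a rough sketch, the match can be wrapped in a small function and run over the three sample names from the question. The function name strip_name is made up for illustration, and the dot before cpp is escaped so that it matches only a literal dot.

```bash
#!/usr/bin/env bash
# strip_name is a hypothetical helper, not part of the original answer.
strip_name() {
    # Text without _ or -, an optional -<digits> suffix, then a literal .cpp at the end.
    local re='([^_-]*)(-[0-9]+)?\.cpp$'
    if [[ $1 =~ $re ]]; then
        printf '%s.cpp\n' "${BASH_REMATCH[1]}"
    else
        # Names that do not match are printed unchanged.
        printf '%s\n' "$1"
    fi
}

for f in test_3123_123_testone-2.cpp abc_3123_12312_a.cpp johnchase_4123123123_123123123_johnc-1.cpp; do
    strip_name "$f"
done
```

Keeping the regex in a variable and expanding it unquoted on the right of =~ avoids quoting surprises between bash versions.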

If your filename is in a variable called file, do it in two steps: name=${file##*_}; echo "${name/-[0-9]*[.]/.}". (The nested one-liner echo ${${file##*_}/-[0-9]*[.]/.} works in zsh, but bash rejects it with a "bad substitution" error.)
Breaking that down:
${file##*_} removes everything from the start of the string to the last _
${name/-[0-9]*[.]/.} replaces a dash, a digit and any characters up to a dot with just a dot
This is not bullet-proof, but it covers the given cases.
$ for file in test_3123_123_testone-2.cpp abc_3123_12312_a.cpp johnchase_4123123123_123123123_johnc-1.cpp
do
name=${file##*_}
echo "${name/-[0-9]*[.]/.}"
done
testone.cpp
a.cpp
johnc.cpp
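To actually rename files rather than just print the new names, something along these lines might work. It applies the two expansions one after the other, since bash does not accept one expansion nested inside another. Both function names are hypothetical, and the check for an existing target is an added precaution, not something the question asked for.

```bash
#!/usr/bin/env bash
# new_name is a hypothetical helper that computes the shortened file name.
new_name() {
    local name=${1##*_}                    # drop everything up to the last _
    printf '%s\n' "${name/-[0-9]*[.]/.}"   # drop a -<digit>... suffix before the dot
}

# rename_cpp_files is also hypothetical: it renames every .cpp file in a
# directory and skips any target name that already exists.
rename_cpp_files() {
    local dir=$1 f target
    for f in "$dir"/*.cpp; do
        [[ -e $f ]] || continue            # the glob matched nothing
        target=$dir/$(new_name "${f##*/}")
        [[ $f == "$target" ]] && continue  # already in the desired form
        if [[ -e $target ]]; then
            echo >&2 "skipping $f: $target already exists"
        else
            mv -- "$f" "$target"
        fi
    done
}
```

Calling rename_cpp_files . would process the current directory.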

Related

How do I move files into folders with similar names in Unix?

I'm sorry if this question has been asked before, I just didn't know how to word it as a search query.
I have a set of folders that look like this:
Brain - Amygdala/ Brain - Spinal cord (cervical c-1)/ Skin - Sun Exposed (Lower leg)/
Brain - Caudate (basal ganglia)/ Lung/ Whole Blood/
I also have a set of files that look like this:
Brain_Amygdala.v7.covariates_output.txt Skin_Not_Sun_Exposed_Suprapubic.v7.covariates_output.txt
Brain_Caudate_basal_ganglia.v7.covariates_output.txt Skin_Sun_Exposed_Lower_leg.v7.covariates_output.txt
Brain_Spinal_cord_cervical_c-1.v7.covariates_output.txt Whole_Blood.v7.covariates_output.txt
As you can see, the files do not perfectly match up with the directories in their names. For example, Brain_Amygdala.v7.covariates_output.txt is not totally identical to Brain - Amygdala/. Even if we were to excise the tissue name from the covariates file, Brain_Amygdala is formatted differently from its corresponding folder.
Same with Whole Blood/. It is different from Whole_Blood.v7.covariates_output.txt, even if you were to isolate the tissue name from the covariates file Whole_Blood.
What I want to do, however, is to move each of these tissue files to their corresponding folder. If you notice, the covariate files are named after the tissue leading up to the first dot . in the file name. They are separated by underscores _. How I was thinking about approaching this was to break up the first few words leading up to the first . of the file name so that I can easily move it to its corresponding file.
e.g.
Brain_Amygdala.v7.covariates_output.txt -> Brain*Amygdala [mv]-> Brain*Amygdala/
a) I'm not sure how to isolate the first words of a file name leading up to the first . in a filename
b) if I were to do that, I don't know how to insert a wildcard in between each word and match that to the corresponding folder.
However, I am completely open to other ways of doing something like this.
Not a full answer, but it should address some of your concerns:
a) to isolate the first word of a string, leading up to the first .: use Parameter Expansions
string=Brain_Amygdala.v7.covariates_output.txt
until_dot=${string%%.*}
echo "$until_dot"
will output Brain_Amygdala (which we saved in the variable until_dot).
b) You may want to use the ${parameter/pattern/string} parameter expansion:
# Replace all non-alphabetic characters by the glob *
glob_pattern=${until_dot//[^[:alpha:]]/*}
echo "$glob_pattern"
will output (with the same variables as above) Brain*Amygdala
c) To use all of this: it's probably a good idea to determine the possible targets first, and do some basic checks:
# Use nullglob to have non matching glob expand to nothing
shopt -s nullglob
# DO NOT USE QUOTES IN THE FOLLOWING EXPANSION:
# the variable is actually a glob!
# Could also do dirs=( $glob_pattern*/ ) to check if directory
dirs=( $glob_pattern/ )
# Now check how many matches there are:
if ((${#dirs[@]} == 0)); then
echo >&2 "No matches for $glob_pattern"
elif ((${#dirs[@]} > 1)); then
echo >&2 "More than one match for $glob_pattern: ${dirs[@]}"
else
echo "All good!"
# Remove the echo to actually perform the move
echo mv "$string" "${dirs[0]}"
fi
I don't know how your data will effectively conform to these, but I hope this answer actually answers some of your questions! (and to learn more about parameter expansions, do read — and experiment with — the link to the reference I gave you).
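Putting a) to c) together, a minimal sketch could look like the following. The function name move_to_matching_dir is invented for illustration. It uses the variant with a trailing * on the glob, on the assumption that folder names may end in characters the file name lacks, such as the closing parenthesis in Skin - Sun Exposed (Lower leg)/.

```bash
#!/usr/bin/env bash
# move_to_matching_dir is a hypothetical helper combining steps a) to c).
# Run it from the directory that holds both the files and the folders.
move_to_matching_dir() {
    local file=$1 until_dot glob_pattern dirs
    until_dot=${file%%.*}                        # text before the first dot
    glob_pattern=${until_dot//[^[:alpha:]]/*}    # non-letters become *
    shopt -s nullglob                            # no match expands to nothing
    dirs=( $glob_pattern*/ )                     # intentionally unquoted: it is a glob
    shopt -u nullglob                            # note: this also clears a nullglob set by the caller
    if (( ${#dirs[@]} != 1 )); then
        echo >&2 "$file: expected 1 matching directory, found ${#dirs[@]}"
        return 1
    fi
    mv -- "$file" "${dirs[0]}"
}
```

A loop such as for f in *.covariates_output.txt; do move_to_matching_dir "$f"; done would then process every file and report the ones it could not place.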

Bash check if path contains two folders

How can I search an arbitrary path and determine if it has two folder names? The folder names can appear in any position in either order. Not a shell expert so seeking help here.
if [ -p "$PATH" ]; then
echo "path is set"
else
echo "path is not set"
fi
I found this segment but I'm not sure it's useful. $PATH is a special variable correct?
First, let me make sure I understand the question right. You have some path (like "/home/sam/foo/bar/baz") and you want to test whether it contains two specific directory names (e.g. "foo" and "bar") in either order, right? So, looking for "foo" and "bar":
/home/sam/foo/bar/baz would match
/mnt/bar/subdir/foo would also match
/mnt/bar/foo2 would not match, because "foo2" is not "foo"
If that's correct, you can do this in bash as two tests:
dir1="foo"
dir2="bar"
if [[ "/$path/" = *"/$dir1/"* && "/$path/" = *"/$dir2/"* ]]; then
echo "$path contains both $dir1 and $dir2"
else
echo "$path does not contain both $dir1 and $dir2"
fi
Notes:
This is using the [[ ]] conditional expression, which is different from [ ] and not available in basic shells. If you use this type of expression, you need to start the shell script with a shebang that tells the OS to run it with bash, not a generic shell (i.e. the first line should be either #!/bin/bash or #!/usr/bin/env bash), and do not run it with the sh command (that will override the shebang).
The way the comparison works is that it sees whether the path matches both the patterns *"/$dir1/"* and *"/$dir2/"* -- that is, it matches those names, with a slash at each end, maybe with something else (*) before and after. But since the path might not start and/or end with a slash, we add them ("/$path/") to make sure they're there.
Do not use PATH as a variable in your script -- it's a very special variable that tells the shell where to find executable commands. If you ever use it for anything else, your script will suddenly start getting "command not found" errors. Actually, there are a bunch of all-caps special-meaning variables; to avoid conflicts with them, use lowercase or mixed-case variables for your things.
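For reuse, the same test could be wrapped in a function and run against the three example paths above. The name path_has_dirs is made up for this sketch.

```bash
#!/usr/bin/env bash
# path_has_dirs is a hypothetical wrapper around the test shown above.
# Usage: path_has_dirs PATH DIR1 DIR2
path_has_dirs() {
    local path=$1 dir1=$2 dir2=$3
    # Slashes are added around the path so the first and last components match too.
    [[ "/$path/" == *"/$dir1/"* && "/$path/" == *"/$dir2/"* ]]
}

for p in /home/sam/foo/bar/baz /mnt/bar/subdir/foo /mnt/bar/foo2; do
    if path_has_dirs "$p" foo bar; then
        echo "$p contains both foo and bar"
    else
        echo "$p does not contain both foo and bar"
    fi
done
```

The function's exit status is simply the result of the [[ ]] test, so it can be used directly in an if.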

Comparing two sets of variables line by line in unix, code only prints out the very last line

this is my first stackoverflow question, regarding bash scripting. I am a beginner in this language, so be kind with me.
I am trying to write a comparison script. I tried to store all the outputs into variables, but only the last one is stored.
Example code:
me:1234567
you:2345678
us:3456789
My code:
#!/bin/bash
while read -r forName forNumber
do
aName="$forName"
echo "$aName"
aNumber="$forNumber"
echo "$aNumber"
done < "exampleCodeFile.txt"
echo "$aNumber"
For the first time, everything will be printed out fine. However, the second echo will only print out "3456789", but not all the numbers again. Same with $aName. This is a problem because i have another file, which i stored a bunch of numbers to compare $aNumber with, using the same method listed above, called $aMatcher, consisting:
aMatcher:
1234567
2345678
3456789
So if i tried to run a comparison:
if [ "$aNumber" == "$aMatcher" ]; then
echo "match found!"
fi
Expected output (with bash -x "scriptname"):
'['1234567 == 1234567']'
echo "match found!"
Actual output (with bash -x "scriptname"):
'['3456789 == 3456789']'
echo "match found!"
Of course my end product would wish to list out all the matches, but i wish to solve my current issue before attempting anything else. Thanks!
When you run your following code
aNumber="$forNumber"
You are over-writing the variable $aNumber for every line of the file exampleCodeFile.txt rather than appending.
If you really want the values to be appended, change the above line to
aNumber="$aNumber $forNumber"
And while matching with $aMatcher, you again have to use a for/while loop to iterate through every value in $aNumber and $aMatcher.
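One way to avoid collecting the values at all is to compare inside the loop, while each line is still available. The sketch below assumes the data file holds name:number lines as in the sample, so it splits on the colon, and that the second file holds one number per line. The function name find_matches and both file arguments are illustrative.

```bash
#!/usr/bin/env bash
# find_matches is a hypothetical helper.
# Usage: find_matches DATA_FILE MATCHER_FILE
find_matches() {
    local data_file=$1 matcher_file=$2 name number candidate
    # Outer loop: one name:number pair per line, split on the colon.
    while IFS=: read -r name number; do
        # Inner loop: compare against every number in the matcher file.
        while IFS= read -r candidate; do
            if [[ $number == "$candidate" ]]; then
                echo "match found: $name $number"
            fi
        done < "$matcher_file"
    done < "$data_file"
}
```

Rereading the matcher file for every line is fine for small inputs. For large files a tool such as grep -F -f may be a better fit.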

Trouble getting shell script to work with simple conditional statement

This is part of a bigger project but I can't get this part to work and I'm having a brain fart.
#!/bin/bash
echo -n "Do you wish to download/checkout the source code? > "
read text
if ["$text" = "Yes"]
then
do something
else
do something else
fi
It should simply be reading in what the user types and then go through a simple conditional. but I get this error
./check.sh: line 6: [Yes: command not found
I thought I had formatted the shell script correctly but I guess not.
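Add spaces inside the brackets. [ is a command, so it must be separated from its arguments by a space, and the closing ] must be a separate word:
if [ "$text" = "Yes" ]
In bash you can also use double square brackets, which need the same spaces:
if [[ "$text" = "Yes" ]]
When comparing strings in bash, double square brackets are usually the safer choice. Variables inside [[ ]] do not undergo word splitting or filename expansion, so values containing spaces or newlines are handled correctly even when unquoted.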
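Both forms can be tried side by side with a pair of small functions. The names wants_download and wants_download_bash are invented for this sketch.

```bash
#!/usr/bin/env bash
# wants_download is a hypothetical helper that succeeds only for the answer "Yes".
wants_download() {
    local text=$1
    # Note the spaces: [ is a command and ] is its last argument.
    [ "$text" = "Yes" ]
}

# The same test with the bash-only [[ ]] keyword. The spaces are still
# required, but the variable does not need quotes here.
wants_download_bash() {
    local text=$1
    [[ $text = "Yes" ]]
}
```

In the script from the question this would be used as if wants_download "$text"; then ... fi.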

Writing a shell conditional for extensions

I'm writing a quick shell script to build and execute my programs in one fell swoop.
I've gotten that part down, but I'd like to include a little if/else to catch bad extensions - if it's not an .adb (it's an Ada script), it won't let the rest of the program execute.
My two-part question is:
How do I grab just the extension? Or is it easier to just say *.adb?
What would the if/else statement look like? I have limited experience in Bash so I understand that's a pretty bad question.
Thanks!
There are ways to extract the extension, but you don't really need to:
if [[ $filename == *.adb ]] ; then
. . . # this code is run if $filename ends in .adb
else
. . . # this code is run otherwise
fi
(The trouble with extracting the extension is that you'd have to define what you mean by "extension". What is the extension of a file named foo? How about a file named report.2012.01.29? So general-purpose extension-extracting code is tricky, and not worth it if your goal is just to confirm that file has a specific extension.)
There are multiple ways to do it. Which is best depends in part on what the subsequent operations will be.
Given a variable $file, you might want to test what the extension is. In that case, you probably do best with:
extn=${file##*.}
This deletes everything up to the last dot in the name, slashes and all, leaving you with adb if the file name was adafile.adb.
If, on the other hand, you want to do different things depending on the extension, you might use:
case "$file" in
(*.adb) ...do things with .adb files;;
(*.pqr) ...do things with .pqr files;;
(*) ...cover the rest - maybe an error;;
esac
If you want the name without the extension, you can do things the more traditional way with:
base=$(basename "$file" .adb)
path=$(dirname "$file")
The basename command gives you the last component of the file name with the extension .adb stripped off. The dirname command gives you the path leading to the last component of the file name, defaulting to . (the current directory) if there is no specified path.
The more recent way to do those last two operations is:
base=${file##*/}
path=${file%/*}
The advantage of these is that they are built-in operations that do not invoke a separate executable, so they are quicker. The disadvantage of the built-ins is that if you have a name that ends with a slash, the built-in treats it as significant but the command does not (and the command is probably giving you the more desirable behaviour, unless you want to argue GIGO).
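The trailing-slash difference is easy to see with a short experiment. The path below is an arbitrary example and does not need to exist.

```bash
#!/usr/bin/env bash
file=/tmp/project/src/

# The external commands ignore the trailing slash.
echo "basename: $(basename "$file")"    # prints: basename: src
echo "dirname:  $(dirname "$file")"     # prints: dirname:  /tmp/project

# Parameter expansion treats the trailing slash as significant.
echo "expansion base: '${file##*/}'"    # prints: expansion base: ''
echo "expansion path: '${file%/*}'"     # prints: expansion path: '/tmp/project/src'
```

If names with a trailing slash are possible, stripping it first with file=${file%/} brings the built-in results in line with the commands for paths like this one.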
There are other techniques available too. The expr command is an old, rather heavy-weight mechanism that would not normally be used (but it is very standard). There may be other techniques using the (( ... )), $(( ... )) and [[ ... ]] operators to evaluate various sorts of expression.
To get just the extension from the file path and name, use parameter expansion:
${filename##*.} # deletes everything to the last dot
To compare it with the string adb, just do
if [[ ${filename##*.} != adb ]] ; then
echo Invalid extension at "$filename".
exit 1
fi
or, using else:
if [[ ${filename##*.} != adb ]] ; then
echo Invalid extension at "$filename".
else
# Run the script...
fi
Extension:
fileext=`echo "$filename" | sed 's_.*\.__'`
Test
if [[ x"${fileext}" = "xadb" ]] ; then
#do something
fi
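One caveat with ${filename##*.} is that a name without any dot comes back unchanged, so a file literally called adb would pass the comparison. A small guard, sketched here with a made-up function name get_extension, returns an empty string in that case.

```bash
#!/usr/bin/env bash
# get_extension is a hypothetical helper. It prints the text after the last
# dot of the final path component, or an empty line if there is no dot.
get_extension() {
    local filename=${1##*/}          # ignore any directory part
    case $filename in
        *.*) printf '%s\n' "${filename##*.}" ;;
        *)   printf '\n' ;;          # no dot: no extension
    esac
}

if [[ $(get_extension adafile.adb) == adb ]]; then
    echo "adafile.adb has the right extension"
fi
```

Stripping the directory first also keeps a dot in a folder name, as in dir.d/foo, from being mistaken for the start of an extension.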

Resources