Getting File name from String in Shell Script - shell

I need to extract the file name (IN_INK_GOOGLE_20200519.dat) from the string below, using a shell script. The file name must start with IN_INK and end with .dat, and if it matches more than once, only the first match should be printed. Is this possible?
echo "-sIN.GOOGLE.IN.INK.INNK.INACC\n-i/am/ft/data/INK/out/GOOGLE_INK/INK/out/IN_INK_GOOGLE_20200519.dat\n-o/apps/tnk/in/download/in/IN_INK_GOOGLE_20200519.dat\n-uinnko1\n-end"

One solution is to pipe into sed as follows:
echo ... | sed -e 's/[.]dat.*/.dat/' -e 's|.*/||'
To use a "pure shell" solution try the # and % features of variable expansions:
e="-sIN.GOOGLE.IN.INK.INNK.INACC\n-i/am/ft/data/INK/out/GOOGLE_INK/INK/out/IN_INK_GOOGLE_20200519.dat\n-o/apps/tnk/in/download/in/IN_INK_GOOGLE_20200519.dat\n-uinnko1\n-end"
f="${e%%.dat*}.dat"           # strip everything after the first ".dat"
echo "${f##*/}"               # print just the file name
echo "IN_INK${f##*/IN_INK}"   # variation: anchor on the IN_INK prefix
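If the IN_INK prefix should be enforced by the pattern itself, as the question asks, one option is grep -o, which prints only the matched text, followed by head to keep the first match. This is a minimal sketch: it assumes a grep that supports -o (GNU and BSD grep do, but POSIX does not require it) and that the file name contains no slash or backslash. The variable names e and name are illustrative.

```shell
# The input is kept as one line with literal "\n" sequences, as in the question.
e='-sIN.GOOGLE.IN.INK.INNK.INACC\n-i/am/ft/data/INK/out/GOOGLE_INK/INK/out/IN_INK_GOOGLE_20200519.dat\n-o/apps/tnk/in/download/in/IN_INK_GOOGLE_20200519.dat\n-uinnko1\n-end'

# IN_INK, then any run of characters other than "/" or "\", then ".dat".
# grep -o prints every match on its own line; head keeps only the first.
name=$(printf '%s\n' "$e" | grep -o 'IN_INK[^/\\]*\.dat' | head -n 1)
echo "$name"
```

Because printf '%s\n' never interprets backslashes, this behaves the same whether the separators are literal \n sequences or real newlines.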

Related

Replacing 1 character with sed

I am trying to process a change of a specific character with regex using sed.
Essentially I am running a bash script that is renaming files that have a specific string and I need to keep this string mostly constant. Here is an example file name:
_FILE20210714.023.jpg
So I am trying to create a variable nfile that is used for the mv command and will convert it to the following:
_FILE20210714.123.jpg
Keep in mind that I only want to change the last 0 to a 1.
I came up with the following regex to grab that specific character, but I'm lost on how to substitute with sed:
_FILE\d{8}\.\K0
nfile=$(echo ${file}| sed -e 's/_FILE\d{8}\.\K0/_FILE\d{8}\.\K1/')
When I then echo the nfile variable, I get the original name, and I'm not sure how to resolve this.
echo ${file}
echo ${nfile}
/home/user/_FILE20210714.023.jpg
/home/user/_FILE20210714.023.jpg
So essentially, once I can change 023 to 123 I'm set. The only problem is that I have multiple files ending in things like .034.jpg, so I can't match on a fixed string.
sed doesn't support the \d escape sequence, you need to use [0-9].
Unless you use the -E option, you have to escape {} quantifiers.
sed doesn't support \K, but I don't think it's needed here.
You need to use a capture group to copy the digits from the original name to the replacement.
nfile=$(echo "${file}"| sed -E -e 's/(_FILE[0-9]{8}\.)0/\11/')
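The same substitution can also be written without -E, which illustrates the escaping rule mentioned above: in basic regular expressions, the group parentheses and the {8} quantifier need backslashes. A minimal sketch, using the sample path from the question. In the replacement, \11 is the backreference \1 followed by a literal 1.

```shell
file=/home/user/_FILE20210714.023.jpg

# Basic regular expression: \( \) delimit the group, \{8\} is the quantifier.
nfile=$(printf '%s\n' "$file" | sed 's/\(_FILE[0-9]\{8\}\.\)0/\11/')

echo "$file"
echo "$nfile"
```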
For this particular case a simple parameter substitution should suffice:
for file in '_FILE20210714.023.jpg' '/home/user/_ACH20210714.023.jpg'
do
nfile="${file//.0/.1}"
echo "######################"
echo " file: ${file}"
echo "nfile: ${nfile}"
done
This generates:
######################
file: _FILE20210714.023.jpg
nfile: _FILE20210714.123.jpg
######################
file: /home/user/_ACH20210714.023.jpg
nfile: /home/user/_ACH20210714.123.jpg
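One caveat with this approach: ${file//.0/.1} replaces every occurrence of .0, so a directory such as v1.0 in the path would be rewritten as well. If that matters, the name can instead be taken apart at its last two dots with POSIX prefix and suffix removal. This is a sketch under the assumption that names always have the form stem.number.extension. The path and the helper variable names are hypothetical.

```shell
file='/data/v1.0/_FILE20210714.023.jpg'   # hypothetical path with ".0" in a directory

ext=${file##*.}     # extension after the last dot: jpg
base=${file%.*}     # everything before the last dot
num=${base##*.}     # numeric part between the last two dots: 023
stem=${base%.*}     # everything before the numeric part

# Drop one leading 0 from the numeric part (if present) and put a 1 in front.
nfile="$stem.1${num#0}.$ext"
echo "$nfile"
```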
If you have the perl rename on your system, you'd write
rename -v 's/\.0(\d+\.jpg)$/.1$1/' *.jpg
Since you tagged bash, here is a pure-bash function:
newname () {
local parts=() IFS="."
read -ra parts <<< "$1"
parts[1]="1${parts[1]#0}"
echo "${parts[*]}"
}
for file in *.jpg; do
mv -v "$file" "$(newname "$file")"
done

Remove suffix as well as prefix from path in bash

I have filepaths of the form:
../healthy_data/F35_HC_532d.dat
I want to extract F35_HC_532d from this. I can remove prefix and suffix from this filename in bash as:
for i in ../healthy_data/*; do echo ${i#../healthy_data/}; done # REMOVES PREFIX
for i in ../healthy_data/*; do echo ${i%.dat}; done # REMOVES SUFFIX
How can I combine these so that in a single command I would be able to remove both and extract only the part that I want?
You can use a bash regex for this and print capture group #1:
for file in ../healthy_data/*; do
[[ $file =~ .*/([_[:alnum:]]+)\.dat$ ]] && echo "${BASH_REMATCH[1]}"
done
If you can use Awk, it is pretty simple,
for i in ../healthy_data/*
do
stringNeeded=$(awk -F/ '{split($NF,temp,"."); print temp[1]}' <<<"$i")
printf "%s\n" "$stringNeeded"
done
The -F/ option splits the input string on the / character, so $NF is the last field, in this case F35_HC_532d.dat. The split() function is then called with . as the delimiter to extract the part before the first dot.
The options and functions used in the Awk command above are POSIX compatible.
Also, bash does not support nested parameter expansions, so you need to do it in two steps, something like this:
tempString="${i#*/*/}"
echo "${tempString%.dat}"
In a single loop:
for i in ../healthy_data/*; do tempString="${i#*/*/}"; echo "${tempString%.dat}" ; done
In this two-step approach, "${i#*/*/}" stores F35_HC_532d.dat in the variable tempString, and "${tempString%.dat}" then removes the .dat suffix from it.
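The two steps can also be wrapped in a small function so that the loop body becomes a single call. This is a minimal sketch, and the function name strip_name is made up for illustration. It uses ${1##*/} rather than ${i#*/*/}, so it does not depend on the path containing exactly two slashes.

```shell
# Print the file name without its directory part and without a .dat suffix.
strip_name() {
    name=${1##*/}                   # step 1: drop everything up to the last "/"
    printf '%s\n' "${name%.dat}"    # step 2: drop a trailing ".dat", if any
}

strip_name ../healthy_data/F35_HC_532d.dat
```

In the loop from the question it would be called as: for i in ../healthy_data/*; do strip_name "$i"; done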
If all files end with .dat (as you confirmed) you can use the basename command:
basename -s .dat /path/to/files/*
If there are many(!) of those files, use find to avoid an argument list too long error:
find /path/to/files -maxdepth 1 -name '*.dat' -exec basename -s .dat {} +
For a shell script that must cope with any number of .dat files, use the second command!
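Note that the -s option is an extension found in GNU and BSD basename. Where it is unavailable, the form standardized by POSIX takes a single path followed by the suffix to strip. A minimal sketch using the path from the question:

```shell
# POSIX form: basename PATH [SUFFIX]
name=$(basename ../healthy_data/F35_HC_532d.dat .dat)
echo "$name"
```

This form handles one path per call, so it would sit inside the loop rather than replace it.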
Do you count this as one step?
for i in ../healthy_data/*; do
sed 's#\.[^.]*##'<<< "${i##*/}"
done
You can't strip both a prefix and suffix in a single parameter expansion.
However, this can be accomplished in a single loop using parameter expansion operations only. Just save the prefix stripped expansion to a variable and use expansion again to remove its suffix:
for file in ../healthy_data/*; do
prefix_stripped="${file##*\/healthy_data\/}"
echo "${prefix_stripped%.dat}"
done
If you are on zsh, one way to achieve this without the need for defining another variable is
for i in ../healthy_data/*; do echo "${${i#../healthy_data/}%.dat}"; done
This removes prefix and suffix in one step.
In your specific example, the prefix exists only because the files are located in a different directory. In this case you can get rid of it by changing into that directory with cd:
(cd ../healthy_data && for i in *; do echo "${i%.dat}"; done)
The parentheses run the commands in a subshell, so your current shell stays where it is. If you don't want a subshell, you can cd back easily:
cd ../healthy_data && { for i in *; do echo "${i%.dat}"; done; cd -; }

Unix: Remove date from filename using sed without modifying existing one

I have legacy code that transmits a file only if the command contains a date. However, the client does not want the date appended to the file name. The legacy code cannot be modified, since many other transmissions depend on it. So my requirement is to keep the date parameter in the command, but then remove it again within the same command.
Condition in legacy code:
grep '\`date' $COMMAND
Note: COMMAND will contain the complete command defined below and not the filename (not CMD output).
So ideally my command should have `date added. I added a command like this below.
CMD=`echo prefix_filename.txt | sed 's/^prefix_//'`_`date +%m%d%Y`
The above command removes prefix_ and sends the file name. Its output is filename.txt_09232016. I added the date because the legacy code only checks whether the command contains `date. Is there a way to remove the date again in the same command, so that the output is filename.txt?
Current output:
filename.txt_09232016
Expected output:
filename.txt
Get the file name before date part:
echo 'filename.txt_09232016' | grep -o '^.*\.txt'
Or remove date from the end of the file:
echo 'filename.txt_09232016' | sed 's/_[0-9]\+$//'
There are a number of things you can do to improve and simplify your code. The main one is that bash has very nice built-in string manipulation. Another is that you should probably use $(...) instead of the `...` notation:
CMD=`echo prefix_filename.txt | sed 's/^prefix_//'`_`date +%m%d%Y`
Can be replaced with
ORIG=prefix_filename.txt
CMD=${ORIG#prefix_}_$(date +%m%d%Y)
Continuing,
echo "$CMD"
NODATE=${CMD%_*}
echo "$NODATE"
This prints
filename.txt_09232016
filename.txt
The construct ${var#pattern} removes the shortest match of pattern from the start of the variable's value: in this case, prefix_. Similarly, ${var%pattern} removes the shortest match of pattern from the end: in this case, _*.
In the first case, you could just as well have used ${var##pattern}, since prefix_ is a fixed string. In the second case, however, you should not use ${var%%pattern}: the pattern _* contains a wildcard, and %% removes the longest match, so it would truncate at the first underscore rather than the last.
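To make the difference between the shortest-match and longest-match forms concrete, here are all four operators applied to one value. The value is hypothetical and was chosen to contain several underscores, since with a single underscore the two suffix forms would give the same result.

```shell
var=prefix_file_name.txt_09232016   # hypothetical value with several underscores

a=${var#*_}     # shortest prefix match removed: file_name.txt_09232016
b=${var##*_}    # longest prefix match removed:  09232016
c=${var%_*}     # shortest suffix match removed: prefix_file_name.txt
d=${var%%_*}    # longest suffix match removed:  prefix

printf '%s\n' "$a" "$b" "$c" "$d"
```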
Just as an FYI, the links point to www.tldp.org, which has the best Bash manual I have come across by far. It gets dense sometimes, but the explanations are generally worth it in the end.
Just do this:
echo filename.txt_09232016 | sed 's/_[^_]*$//'
This replaces (with nothing) an underscore and all characters after it up to the end of the string ($), provided none of them is itself an underscore ([^_]). As a result, only the part after the last underscore is removed.
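A quick check with a name that itself contains an underscore shows that only the final segment is stripped. The file name here is hypothetical.

```shell
# Only the segment after the last underscore is removed.
out=$(echo 'my_file.txt_09232016' | sed 's/_[^_]*$//')
echo "$out"
```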

Create variable by combining text + another variable

Long story short, I'm trying to grep a value contained in the first column of a text file by using a variable.
Here's a sample of the script, with the grep command that doesn't work:
for ii in `cat list.txt`
do
grep '^$ii' >outfile.txt
done
Contents of list.txt :
123,"first product",description,20.456789
456,"second product",description,30.123456
789,"third product",description,40.123456
If I perform grep '^123' list.txt, it produces the correct output... Just the first line of list.txt.
If I try to use the variable (ie grep '^ii' list.txt) I get a "^ii command not found" error. I tried to combine text with the variable to get it to work:
VAR1= "'^"$ii"'"
but the VAR1 variable contained a carriage return after the $ii variable:
'^123
'
I've tried a laundry list of things to remove the CR/LF (e.g. sed and awk), but to no avail. There has to be an easier way to perform the grep command using the variable. I would prefer to stay with the grep command because it works perfectly when performing it manually.
You have mixed things up in the command grep '^ii' list.txt: the character ^ anchors the match at the beginning of the line, and $ is what introduces the value of a variable.
When you want to grep for the value of the variable ii (here 123) at the beginning of the line, use
ii="123"
grep "^$ii" list.txt
(You should use double quotes here)
This is a good moment to learn good habits: keep using lowercase variable names (well done), and use curly braces (they do no harm and are required in some other cases):
ii="123"
grep "^${ii}" list.txt
Now we are both forgetting something: our grep will also match 1234,"4-digit product",description,11.1111. Include a , in the pattern:
ii="123"
grep "^${ii}," list.txt
And how did you get the "^ii command not found" error? I think you used backquotes (the old syntax for command substitution; $(...) is preferred, as in echo "example: $(date)") and wrote
grep `^ii` list.txt # wrong !
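The effect of the trailing comma can be checked on a small sample. This is a sketch only: it builds a throwaway copy of the data in a scratch directory (mktemp -d is assumed to be available), adds the 4-digit line mentioned above, and counts matching lines with grep -c. The names tmp, loose and strict are illustrative.

```shell
tmp=$(mktemp -d)    # scratch directory for the sample data
cat >"$tmp/list.txt" <<'EOF'
123,"first product",description,20.456789
1234,"4-digit product",description,11.1111
456,"second product",description,30.123456
EOF

ii=123
loose=$(grep -c "^${ii}" "$tmp/list.txt")     # lines that merely start with 123
strict=$(grep -c "^${ii}," "$tmp/list.txt")   # lines whose first field is exactly 123
echo "without comma: $loose, with comma: $strict"

rm -r "$tmp"
```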
#!/bin/sh
# Read everything before the first comma into the variable ii.
while IFS=, read -r ii rest; do
    # Echo the value of ii. If these values are what you want, you're done; no
    # need for grep.
    echo "ii = $ii"
    # If you want to find something associated with these values in another
    # file, however, you can grep the file for the values. Use double quotes so
    # that the value of $ii is substituted in the argument to grep, and append
    # with >> so that each iteration does not overwrite the previous results.
    grep "^$ii" some_other_file.txt >>outfile.txt
done <list.txt

Running command on substring of every file

Let's say I've some files like:
samplea.txt
sampleb.txt
samplec.txt
And I want to run some command with this form:
./cmd -foo a.xml -bar samplea.txt
First I've tried to
for file in "./*.txt"
do
echo -e $file
done
But this way it will print every file in a straight line. By trying:
echo -e $file\n
It does not produce the expected (single line for each file).
I couldn't even get past the first part of the problem, running a command on each file (which could be achieved with find (...) -exec), but what I really wanted to do was extract a substring of each name.
Doing:
echo ${file:1}
won't work since I could only do so after splitting the filenames, to get the "a","b","c" from each one.
I'm sorry if it sounds confusing, but it's my first bash script.
Do not quote the wildcard expression. You can use parameter expansion to remove parts of a string:
for file in sample*.txt ; do
part=${file#sample} # Remove "sample" at the beginning.
part=${part%.txt} # Remove ".txt" at the end.
./cmd -foo "$part".xml -bar "$file"
done
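Before running the real command, it can help to print what the loop would execute. The sketch below is a dry run under a few assumptions: it creates the three sample files in a scratch directory (mktemp -d is assumed to be available) and echoes each command line instead of invoking ./cmd.

```shell
tmp=$(mktemp -d)    # scratch directory holding the sample files
touch "$tmp/samplea.txt" "$tmp/sampleb.txt" "$tmp/samplec.txt"

out=$(
    cd "$tmp" || exit 1
    for file in sample*.txt; do      # unquoted glob: one iteration per file
        part=${file#sample}          # remove "sample" at the beginning
        part=${part%.txt}            # remove ".txt" at the end
        echo "./cmd -foo $part.xml -bar $file"
    done
)
echo "$out"
rm -r "$tmp"
```

Each file yields its own line, which is the behaviour the quoted "./*.txt" in the question prevented: a quoted pattern is not expanded by the for loop, so the loop body ran only once.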
