How to display 0-nth character in a file name - shell

I have a file named 'test_acn_mark_down_201400000.csv'.
In Unix, I want to get only the value file1='test_acn_mark_down', that is, fields 1 through 4, where the delimiter is '_'.
Please help me.

You can use cut:
file='test_acn_mark_down_201400000.csv'
echo "$file" | cut -d _ -f1-4
file1=$(echo "$file" | cut -d _ -f1-4)
echo "$file1"
test_acn_mark_down

You asked for the first four fields.
The simplest way is to use cut:
file=test_acn_mark_down_201400000.csv
file1=$(echo "${file}" | cut -d _ -f1-4)
If you know that the rest of the filename contains no '_' characters, you can also use parameter expansion, which removes everything from the last underscore to the end:
file=test_acn_mark_down_201400000.csv
file1="${file%_*}"
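For comparison, here is a small bash sketch that runs both approaches on the example name (the variable names by_cut and by_expansion are only for illustration). They agree here only because nothing after the fourth underscore contains another _.

```shell
#!/bin/bash
file='test_acn_mark_down_201400000.csv'

# Keep fields 1-4 of the "_"-separated name (starts an external cut process)
by_cut=$(printf '%s\n' "$file" | cut -d _ -f1-4)

# Remove the shortest suffix matching "_*", i.e. from the last "_" onward
# (pure shell, no subprocess)
by_expansion=${file%_*}

printf '%s\n' "$by_cut" "$by_expansion"
```

For a name such as a_b_c_d_e_f.csv the two differ: cut -f1-4 gives a_b_c_d, while ${file%_*} gives a_b_c_d_e, so pick the one that matches what you mean by "the prefix".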

Related

Cut everything from specific char (or after) + Bash

I have files which all look like this:
filename.bla_1
Of course, I cannot know whether the filename has "_" in it; it could be file_name.bla_1.
I want to write a function that takes a filename and deletes the _# at the end.
filename.bla_1 will be --> filename.bla
echo $filename | rev | cut -d "_" -f2 | rev
will do the trick if the file doesn't have "_" in the name, but I want to make sure this also works for filenames with "_".
You can use parameter expansion. % removes the shortest match of a pattern from the right side (end) of the value, and ## removes the longest match from the left side (start):
#!/bin/bash
for f in filename.bla_1 \
         file_name_with_underscores.foo_2 \
         file_name_with_underscores.foo \
         filename.with_dots.foo_2 ; do
    ext=${f##*.}
    basename=${f%.*}
    echo "$basename.${ext%_*}"
done
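To see how the four trimming operators differ, here is a sketch on a made-up name (file_name.tar.gz_2 is only an example); the comment on each line shows what it prints.

```shell
#!/bin/bash
f='file_name.tar.gz_2'

printf '%s\n' "${f%_*}"    # shortest "_*" removed from the end:   file_name.tar.gz
printf '%s\n' "${f%%_*}"   # longest "_*" removed from the end:    file
printf '%s\n' "${f#*.}"    # shortest "*." removed from the start: tar.gz_2
printf '%s\n' "${f##*.}"   # longest "*." removed from the start:  gz_2
```

The script above uses ## to isolate the last extension and % to trim the _N from it, which is why underscores earlier in the name do not matter.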
If you are willing to enable extended globbing (extglob), a single expansion does it:
shopt -s extglob
for f in abc.bla a_b_c_.bla abc.bla_1 a_b_c_.bla_2 123.456.789 123.456.789_x abc_
do
    echo "${f%_+([^._])}"
done
Output:
abc.bla
a_b_c_.bla
abc.bla
a_b_c_.bla
123.456.789
123.456.789
abc_
${f%_+([^._])} means the value of $f with a _ followed immediately by one or more non-dot-or-underscore characters trimmed OFF the end.
Use @choroba's answer.
But to fix your code, after you reverse the filename, you need to take the 2nd and all following fields, not just the 2nd:
$ filename=foo_bar_baz.bla_1
$ rev <<<"$filename" | cut -d_ -f2- | rev
foo_bar_baz.bla
The trailing hyphen in -f2- is the magic here: it selects field 2 through the last field. See the cut man page.
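Since the question asks for a function, here is a sketch that assumes the "#" in "_#" stands for one or more digits; the name strip_num_suffix is made up. Unlike ${f%_*}, it leaves a name unchanged when it has no numeric suffix, no matter how many underscores appear elsewhere.

```shell
#!/bin/bash
# Hypothetical helper: print the name with a trailing "_<digits>" removed.
# Names without such a suffix are printed unchanged.
strip_num_suffix() {
    local name=$1
    local re='^(.*)_[0-9]+$'   # greedy (.*) keeps everything before the LAST "_<digits>"
    if [[ $name =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        printf '%s\n' "$name"
    fi
}

for f in filename.bla_1 file_name.bla_1 file_name.bla abc_; do
    strip_num_suffix "$f"
done
```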

Using sed to find a string with wildcards and then replacing with same wildcards

So I am trying to remove newlines using sed, because it's the only way I can think of to do it. I'm completely self-taught, so there may be a more efficient way that I just don't know.
The string I am searching for is \HF=-[0-9](newline character). The problem is that the data it searches through can look like this (note: there are actual newline characters in this data, which I think is causing part of the problem):
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-1.
3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.6974
,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.\H
,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.,1
.3948,3.\C,0,0.,-1.3948,3.\C,0,1.2079,0.6974,3.\C,0,-1.2079,0.6974,3.\
C,0,-1.2079,-0.6974,3.\C,0,1.2079,-0.6974,3.\H,0,0.,2.4822,3.\H,0,2.14
97,1.2411,3.\H,0,-2.1497,1.2411,3.\H,0,-2.1497,-1.2411,3.\H,0,2.1497,-
1.2411,3.\H,0,0.,-2.4822,3.\\Version=ES64L-G09RevD.01\State=1-AG\HF=-4
61.3998608\MP2=-463.0005321\RMSD=3.490e-09\PG=D02H [SG"(C4H4),X(C8H8)]
\\#
OR
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3.1_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-
1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.69
74,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.
\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.
,1.3948,3.1\C,0,0.,-1.3948,3.1\C,0,1.2079,0.6974,3.1\C,0,-1.2079,0.697
4,3.1\C,0,-1.2079,-0.6974,3.1\C,0,1.2079,-0.6974,3.1\H,0,0.,2.4822,3.1
\H,0,2.1497,1.2411,3.1\H,0,-2.1497,1.2411,3.1\H,0,-2.1497,-1.2411,3.1\
H,0,2.1497,-1.2411,3.1\H,0,0.,-2.4822,3.1\\Version=ES64L-G09RevD.01\St
ate=1-AG\HF=-461.4104442\MP2=-463.0062587\RMSD=3.651e-09\PG=D02H [SG"(
C4H4),X(C8H8)]\\#
OR
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3.3_Slide1.7\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.
,-1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.
6974,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,
0.\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,
0.,-0.3052,3.3\C,0,0.,-3.0948,3.3\C,0,1.2079,-1.0026,3.3\C,0,-1.2079,-
1.0026,3.3\C,0,-1.2079,-2.3974,3.3\C,0,1.2079,-2.3974,3.3\H,0,0.,0.782
2,3.3\H,0,2.1497,-0.4589,3.3\H,0,-2.1497,-0.4589,3.3\H,0,-2.1497,-2.94
11,3.3\H,0,2.1497,-2.9411,3.3\H,0,0.,-4.1822,3.3\\Version=ES64L-G09Rev
D.01\State=1-AG\HF=-461.436061\MP2=-463.0177441\RMSD=7.859e-09\PG=C02H
[SGH(C4H4),X(C8H8)]\\#
OR
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3.6_Slide0.9\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.
,-1.3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.
6974,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,
0.\H,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,
0.,0.4948,3.6\C,0,0.,-2.2948,3.6\C,0,1.2079,-0.2026,3.6\C,0,-1.2079,-0
.2026,3.6\C,0,-1.2079,-1.5974,3.6\C,0,1.2079,-1.5974,3.6\H,0,0.,1.5822
,3.6\H,0,2.1497,0.3411,3.6\H,0,-2.1497,0.3411,3.6\H,0,-2.1497,-2.1411,
3.6\H,0,2.1497,-2.1411,3.6\H,0,0.,-3.3822,3.6\\Version=ES64L-G09RevD.0
1\State=1-AG\HF=-461.4376969\MP2=-463.0163868\RMSD=7.263e-09\PG=C02H [
SGH(C4H4),X(C8H8)]\\#
Basically the number I am looking for can be broken up into two lines at any point based on character count. I need to get rid of the newline breaking up the number so that I can extract the entire value into a separate file. (I have no problems with the extraction to a new file, hence why it isn't included in the code)
Currently I am using this code
sed -i ':a;N;$!ba;s/HF=-*[0-9]*\n/HF=-*[0-9]*/g' $i &&
Which ALMOST works, except that it doesn't replace the wildcard values with the same values. It replaces them with the literal text [0-9] instead, and doesn't always remove the newline character.
It is important to note that THERE ARE ACTUAL NEWLINE CHARACTERS in the output file, and there is no way to change that without messing up the other 30 lines I am extracting from this output file.
What I want is to just get rid of the newline characters that occur when that string is found, regardless of how many digits there are in between the - sign and the newline character.
So the expected output would be something like
1\1\GINC-N076\SP\RMP2-FC\CC-pVDZ\C12H12\R2536\09-Apr-2020\0\\# mp2/cc-
pVDZ\\Squish3_Slide0\\0,1\H,0,0.,2.4822,0.\C,0,0.,1.3948,0.\C,0,0.,-1.
3948,0.\C,0,1.2079,0.6974,0.\C,0,-1.2079,0.6974,0.\C,0,-1.2079,-0.6974
,0.\C,0,1.2079,-0.6974,0.\H,0,2.1497,1.2411,0.\H,0,-2.1497,1.2411,0.\H
,0,-2.1497,-1.2411,0.\H,0,2.1497,-1.2411,0.\H,0,0.,-2.4822,0.\C,0,0.,1
.3948,3.\C,0,0.,-1.3948,3.\C,0,1.2079,0.6974,3.\C,0,-1.2079,0.6974,3.\
C,0,-1.2079,-0.6974,3.\C,0,1.2079,-0.6974,3.\H,0,0.,2.4822,3.\H,0,2.14
97,1.2411,3.\H,0,-2.1497,1.2411,3.\H,0,-2.1497,-1.2411,3.\H,0,2.1497,-
1.2411,3.\H,0,0.,-2.4822,3.\\Version=ES64L-G09RevD.01\State=1-AG\HF=-461.3998608\MP2=-463.0005321\RMSD=3.490e-09\PG=D02H [SG"(C4H4),X(C8H8)]
\\#
These files are rather large and this line of code runs more than 1500 times, so the more efficient, the better.
Everything else in the script this is in is using a combination of grep, awk, sed, and basic UNIX commands.
EDIT
After trying
sed -i -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/' $i &&
I still had no luck getting rid of those pesky newline characters.
In case it affects the answers at all, here is the rest of the code that goes with the one line that is causing problems:
echo name HF MP2 mpdiff | cat > allE
for i in *.out
do
    echo name HF MP2 mpdiff | cat > $i.allE
    grep "Slide" $i | cut -d "\\" -f2 | cat | tr -d '\n' > $i.name &&
    grep "EUMP2" $i | cut -d "=" -f3 | cut -c 1-25 | tr '\n' ' ' | tr -s ' ' >> $i.mp &&
    grep "EUMP2" $i | cut -d "=" -f2 | cut -c 1-25 | tr '\n' ' ' | tr -s ' ' >> $i.mpdiff &&
    sed -i -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/' $i &&
    grep '\\HF' $i | awk -F 'HF' '{print substr($2,2,14)}' | tr '\n' ' ' >> $i.hf &&
    paste $i.name >> $i.energies &&
    sed -i 's/ /0 /g' $i.hf &&
    sed -i 's/\\/0/g' $i.hf &&
    sed -i 's/[A-Z]/0/g' $i.hf &&
    paste $i.hf >> $i.energies &&
    sed -i 's/[ABCEFGHIJKLMNOPQRSTUVWXYZ]//g' $i.mp &&
    paste $i.mp >> $i.energies &&
    sed -i 's/[ABCEFGHIJKLMNOPQRSTUVWXYZ]//g' $i.mpdiff &&
    paste $i.mpdiff >> $i.energies &&
    transpose $i.energies >> $i.allE #temp.txt &&
    #cat temp.txt > $i.energies
    #echo $i is finished
done
echo see allE for energies
#rm *.energies #temp.txt
rm *.name
rm *.mp
rm *.hf
rm *.mpdiff
Here is how you can fix your current attempt.
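sed -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n/\1/g'
Add the -i option if you want to edit the file in place. (In your script, && only runs the next command if this one succeeds; it is a single & that sends a job to the background.) The -E flag switches sed to extended regular expressions, so that the parentheses creating the group used by the backreference (see below), and the ? after the minus sign, work without backslashes.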
I made the following changes: I changed -* to -?, since there should be at most one minus sign (if I understand correctly that it is in fact a minus sign, not a dash). I added the period to the bracket expression, so that the decimal point is matched too (inside a bracket expression, the dot is an ordinary character). I wrapped everything except the newline in parentheses, making it a subexpression that you can refer to with a backreference, which is what \1 does in the replacement part.
A few notes, though. This will join the lines even if the entire number is at the end of one line and the closing \ is at the start of the next. If you want to leave such cases alone, you can change the sed command slightly. On the other hand, this does not handle situations where, for example, one line ends in \H and the next line begins with F=304.222\. You only mentioned a "split number" in your problem statement; shouldn't you also handle cases where the newline splits the \HF=...\ token outside its number portion?
It looks like your input lines start with a space; I have ignored that in this solution. With GNU sed, -z treats the whole file as a single NUL-terminated record, so \n can be matched directly without the :a;N;$!ba loop:
sed -rz 's/(AG\\HF=-[.0-9]*)\n/\1/g' "$i"
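Putting the pieces together, here is a sketch that assumes GNU sed and a made-up two-line sample shaped like the question's data. The function name join_hf is hypothetical, and the optional space after \n (the " ?" in the pattern) is an assumption, in case continuation lines start with a space. After joining, grep -o and cut pull out just the value, which is one possible way to do the extraction the question mentions.

```shell
#!/bin/bash
# Hypothetical helper (assumes GNU sed): slurp the whole input with :a;N;$!ba,
# then delete a newline (plus one optional leading space on the next line)
# that directly follows "\HF=", an optional minus sign and any digits/dots.
join_hf() {
    sed -E ':a;N;$!ba;s/(\\HF=-?[.0-9]*)\n ?/\1/g' "$@"
}

# Made-up sample in the same shape as the question's data
sample='\Version=ES64L-G09RevD.01\State=1-AG\HF=-4
 61.3998608\MP2=-463.0005321\RMSD=3.490e-09'

joined=$(printf '%s\n' "$sample" | join_hf)
printf '%s\n' "$joined"

# Once the number is on one line, extract only the value
printf '%s\n' "$joined" | grep -oE 'HF=-?[0-9.]+' | cut -d= -f2
```

As with the commands above, a number that ends exactly at a line break is also joined to the following line.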

Extracting a part of lines matching a pattern

I have a configuration file and need to parse out some values using bash
Ex. Inside config.txt
some_var= Not_needed
tests= spec1.rb spec2.rb spec3.rb
some_other_var= Also_not_needed
Basically I just need to get "spec1.rb spec2.rb spec3.rb" WITHOUT all the other lines and "tests=" removed from the line.
I have this and it works, but I'm hoping there's a much simpler way to do this:
while read run_line; do
    if [[ $run_line =~ ^tests=* ]]; then
        echo "FOUND"
        all_selected_specs=`echo ${run_line} | sed 's/^tests= /''/'`
    fi
done <${config_file}
echo "${all_selected_specs}"
all_selected_specs=$(awk -F '= ' '$1=="tests" {print $2}' "$config_file")
Using a field separator of "= ", look for lines where the first field is tests and print the second field.
This should work too:
grep "^tests=" "${config_file}" | sed -e "s/^tests= //"
How about grep and cut?
all_selected_specs=$(grep "^tests=" "$config_file" | cut -d= -f2-)
Try:
all_selected_specs=$(awk '/^tests/{sub(/.*= /,"");print}' "$config_file")
This searches for lines that start with tests, deletes everything up to and including "= " so that only the spec values remain, and prints the line. Finally, $(awk ...) saves that output to the variable.
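If you would rather keep a pure-bash loop like the one in the question, here is a sketch of a helper (get_config_value is a made-up name) that splits each line on = with read instead of calling sed. It assumes the "key= value" layout of config.txt, with one space after =, and compares the whole key, so a line such as tests_extra= ... would not match, unlike a plain ^tests pattern.

```shell
#!/bin/bash
# Hypothetical helper: print the value for key $1 from config file $2.
# Assumes lines of the form "key= value" (one space after "=").
get_config_value() {
    local want=$1 file=$2 key value
    while IFS='=' read -r key value; do
        if [[ $key == "$want" ]]; then
            printf '%s\n' "${value# }"   # drop the single space after "="
            return 0
        fi
    done < "$file"
    return 1   # key not found
}

# Example with a throwaway copy of the question's config.txt
config_file=$(mktemp)
printf '%s\n' 'some_var= Not_needed' \
    'tests= spec1.rb spec2.rb spec3.rb' \
    'some_other_var= Also_not_needed' > "$config_file"

all_selected_specs=$(get_config_value tests "$config_file")
echo "$all_selected_specs"
rm -f "$config_file"
```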

Extracting a substring

I have to find a substring where my string starts with country=" and ends with ", like the following:
country="NZ"
I have to extract only the NZ part and add it to an existing string, like:
string+=NZ
Please help!
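Use sed with extended regular expressions (-r). Here [^"]* matches everything up to the next quote (sed's regular expressions have no lazy .*? quantifier):
string=""
INPUT='country="NZ"'
string+=$(echo "$INPUT" | sed -r 's/country="([^"]*)"/\1/')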
If your string truly only contains country="CODE", then cut works too, using " as the delimiter:
echo 'country="NZ"' | cut -d\" -f2
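Without starting any external process, bash's own =~ operator can capture the code. This is a sketch; the starting value of string is made up. Keeping the regex in a variable (re) avoids quoting surprises inside [[ ]], and [^"]* stops at the first closing quote.

```shell
#!/bin/bash
input='country="NZ"'
string='Country: '   # made-up existing string to append to

# Capture whatever sits between country=" and the next "
re='country="([^"]*)"'
if [[ $input =~ $re ]]; then
    string+=${BASH_REMATCH[1]}
fi
printf '%s\n' "$string"
```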

BASH: I can echo string + grep + sed, but how do I add more strings on the same line?

Asking a question here is always my last resort. I tried everything, even the most embarrassing code, so I'm not sure how to explain what I tried without success. I have:
echo $output | grep -i -m 1 "Time:" | sed 's/.*\s\([0-9]*:[0-9]*:[0-9]*\).time.*/\1/'
it outputs:
23:25:31
Easy.
But I'd like to add one more string to the end, like " , $year", so that I get:
23:25:31 , 2013
The problem is that whatever I tried (printf, -n, -e, -ne, brackets, quotes, |, ;, &, /r, etc.) gives an error or goes to a new line anyway.
Any suggestion will be really appreciated.
Thanks
time=$(echo $output | grep -i -m 1 "Time:" | sed 's/.*\s\([0-9]*:[0-9]*:[0-9]*\).time.*/\1/')
echo "The time is ${time}, 2013"
Alternatives:
Add tr -d '\n' at the end of the echo+grep+sed pipeline, so that whatever you echo next continues on the same line.
{ entire-echo-grep-sed-pipeline ; echo , 2013 ; } | xargs echo (this, however, adds a space before the comma)
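A sketch combining the first answer's approach with printf, using a made-up value for $output since its real format is not shown, and assuming GNU sed (which understands \s, as the question's command already requires). Command substitution strips the trailing newline from the pipeline, and printf prints a newline only where the format asks for one. Quoting "$output" keeps its lines separate, so grep -m 1 really stops at the first matching line (an unquoted echo $output joins all lines into one).

```shell
#!/bin/bash
# Made-up sample; the real format of $output is not shown in the question.
output=$'Job started\nElapsed Time: 23:25:31 time total\nDone'
year=2013

# $(...) removes the trailing newline, so $time holds just "23:25:31"
time=$(printf '%s\n' "$output" | grep -i -m 1 "Time:" |
    sed 's/.*\s\([0-9]*:[0-9]*:[0-9]*\).time.*/\1/')

# One format string, one newline at the very end
printf '%s , %s\n' "$time" "$year"
```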
