Remove characters from specific length - shell

How can I remove a set of characters at a specified position in a file using a shell script?
Example:
Filename : abc.txt
helloshell
Now how can I remove characters starting from 8 to 10 (the ell at the end)?
I have tried sed -r command on Linux servers but it's not working on AIX servers.
Linux command:
sed -r 's/.(.{3}).*/\1/' filename.txt

With Bash (extract the first 7 characters, i.e. the substring of length 7 starting at offset 0):
str="helloshell"
echo "${str:0:7}"
With sed (keeps the first 7 characters and removes the 3 characters that follow, i.e. positions 8 to 10):
str="helloshell"
startpos=7;
nbchar=3;
echo "$str" | sed "s/^\(.\{$startpos\}\).\{$nbchar\}\(.*\)/\1\2/"
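If sed behaves differently on AIX, cut may be a simpler route, since it only needs the character positions to keep. A minimal sketch, assuming the positions to delete are 8 to 10 as in the question; remove_cols is a hypothetical helper name and it assumes the start position is greater than 1:

```shell
# remove_cols START END: hypothetical helper; deletes character positions
# START..END (1-based, START > 1) from every line read on stdin by keeping
# columns 1..START-1 and END+1..end of line.
remove_cols() {
  cut -c "1-$(( $1 - 1 )),$(( $2 + 1 ))-"
}

printf 'helloshell\n' | remove_cols 8 10   # prints: hellosh
```

To process the file from the question, something like remove_cols 8 10 < abc.txt should do. Lines shorter than 8 characters pass through unchanged.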

$ sed -r 's/^(.{7}).{0,3}(.*)$/\1\2/g'
helloshell
hellosh
1234567890
1234567
$
{0,3} matches at most 3 characters after the 7th position: it removes 3 if they are present, otherwise the 1 or 2 that exist, and it matches nothing (0 characters) if the line ends at position 7.
If you want to remove exactly 3 characters starting at the 8th position, use {3}. In that case nothing is removed when fewer than 3 characters follow the 7th, e.g.:
$ sed -r 's/^(.{7}).{3}(.*)$/\1\2/g'
123456789
123456789
$
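The difference between the two intervals can be checked side by side. This is only a sketch: strip_upto3 and strip_exact3 are hypothetical helper names, and the patterns are written in basic-regex syntax so that no -r switch is involved. The trailing (.*) group is left out because sed keeps the unmatched rest of the line anyway.

```shell
# Both helpers read lines on stdin; the names are made up for this example.
strip_upto3()  { sed 's/^\(.\{7\}\).\{0,3\}/\1/'; }   # removes up to 3 chars after position 7
strip_exact3() { sed 's/^\(.\{7\}\).\{3\}/\1/'; }     # removes exactly 3 chars, or nothing

printf '123456789\n' | strip_upto3    # prints: 1234567
printf '123456789\n' | strip_exact3   # prints: 123456789 (no match, line unchanged)
```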
Edit1:
You can use this instead, without the -r switch, by escaping the parentheses and braces: sed 's/^\(.\{7\}\).\{0,3\}\(.*\)$/\1\2/g'
To perform the above operation only on lines starting with BH, restrict the substitute command with an address like this:
sed '/^BH/s/^\(.\{7\}\).\{0,3\}\(.*\)$/\1\2/g'
The /^BH/ address ensures the substitution is performed only on lines starting with BH.
$ sed '/^BH/s/^\(.\{7\}\).\{0,3\}\(.*\)$/\1\2/g'
BHhelloshell
BHhelloll
BH123456789
BH123459
helloshell
helloshell
$
To exclude BH while counting you can use:
$ sed '/^BH/s/^BH\(.\{7\}\).\{0,3\}\(.*\)$/BH\1\2/g'
BHhelloshell
BHhellosh
BH123456789
BH1234567
helloshell
helloshell
$
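For comparison, the variant that does not count BH can also be expressed with awk and fixed positions instead of a regex. A minimal sketch, assuming the same layout as above (BH, then 7 characters to keep, then up to 3 to drop); strip_after_bh is a hypothetical name:

```shell
# On lines starting with BH, keep positions 1-9 (BH plus 7 characters),
# drop positions 10-12, and keep everything from position 13 onwards.
# All other lines are printed unchanged.
strip_after_bh() {
  awk '/^BH/ { $0 = substr($0, 1, 9) substr($0, 13) } { print }'
}

printf 'BHhelloshell\nhelloshell\n' | strip_after_bh
# prints:
# BHhellosh
# helloshell
```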

Try this if you are always expecting the last 3 to be deleted.
echo helloshell | sed 's/...$//'
For 8-10 try this:
echo helloshell | sed 's/\(.\{7\}\).../\1/'
echo helloshellHowAreYou | sed 's/\(.\{7\}\).../\1/'
For AIX you may need to remove the \ from \{
sed 's/\(.{7}\).../\1/'
And if there is a pattern that you want to keep in the search string, you have to adjust the value between the \( and \).

Related

Trim ending white space of lines in .txt file

I am trying to remove the trailing space from each line of file.txt, which contains many rows.
I just need to remove "only the last space" after the third column/each line.
My file looks like this:
3 180 120
3 123 145
6 234 0
4 122 12
I have been trying with the following script, but so far it does not work. Can somebody help me, please?
#!/bin/bash
var="val1 val2 val3 "
var="${var%"${var##*[![:space:]]}"}"
echo "===$var===" <Antart_csv1_copy.txt> trimmed.txt
You can use sed:
sed -i -e 's/ $//g' filename.txt
-i edits the file in place (it changes the original file).
-e 's/ $//g' matches a space at the end of a line (<space><endline>) and replaces it with nothing. sed applies the command to every line of the file anyway; the g modifier is redundant here because $ can match only once per line.
You can try it first without -i and redirect output:
sed -e 's/ $//g' filename.txt > trimmed.txt
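Another solution, removing all trailing spaces from each line (Bash; the *( ) pattern needs extglob, and IFS= stops read from trimming the line itself):
shopt -s extglob
while IFS= read -r line; do printf '%s\n' "${line%%*( )}"; done < Antart_csv1_copy.txt > trimmed.txt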
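The parameter expansion from the question is fine for a single variable; it only needs to be applied to each line of the file rather than to a hard-coded string. A sketch under that assumption, using only POSIX shell features; rtrim_lines is a hypothetical name:

```shell
# Read stdin line by line and strip trailing whitespace from each line.
rtrim_lines() {
  while IFS= read -r line || [ -n "$line" ]; do
    # Remove everything up to the last non-space character;
    # what is left is exactly the trailing whitespace.
    trailing=${line##*[![:space:]]}
    # Strip that trailing whitespace from the end of the line.
    printf '%s\n' "${line%"$trailing"}"
  done
}

printf '3 180 120 \n6 234 0  \n' | rtrim_lines
# prints:
# 3 180 120
# 6 234 0
```

With the file names from the question this would be run as rtrim_lines < Antart_csv1_copy.txt > trimmed.txt.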

Identify "$" that is immediately followed by only alphabet/alphanumeric words

"$" should not be immediately followed by digits [0-9]. It should only show the
"$" in the output when it is immediately followed by a letter (an alphabetic or alphanumeric word).
Input: dirname $0/../bin/$12JAVA_INV/$FILE12NAME
Output: $FILE12NAME
grep -o '[$][a-zA-z_]*'
Using this I'm receiving an output as: $ $ $FILENAME
You're getting $ in the result because * means to match zero or more of the preceding pattern. $0 matches because it has a $ followed by 0 letters.
If you want at least 1 letter, use + instead, it means one or more.
But if you want to be able to match $FILE12NAME, you also need to allow digits after the first character. So use:
grep -i -o '\$[a-z_][a-z_0-9]*'
This matches $, followed by a letter or underscore, followed by zero or more letters, underscores, or numbers.
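A quick check of that pattern against the input from the question. This sketch spells the character classes out instead of using -i, which should be equivalent here; shell_vars is a hypothetical name, and grep -o is assumed to be available as in the question:

```shell
# Print every "$name" where name starts with a letter or underscore.
shell_vars() {
  grep -o '\$[A-Za-z_][A-Za-z_0-9]*'
}

echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | shell_vars
# prints: $FILE12NAME
```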
It looks like you want:
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | awk '{print $NF}' FS=/
$FILE12NAME
But if you really want to parse it the way you describe, you could do either of:
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | sed -e 's/.*\(\$[^0-9]\)/\1/'
$FILE12NAME
$ echo 'dirname $0/../bin/$12JAVA_INV/$FILE12NAME' | sed -E 's/.*(\$[^0-9])/\1/'
$FILE12NAME

Reverse four length of letters with sed in unix

How can I reverse every four-character word in a line with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
Not sure it is possible to do this with GNU sed for all cases. If _ doesn't occur immediately before or after the four-letter words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is word boundary, word definition being any alphabet or digit or underscore character. So \b will ensure to match only whole words not part of words
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
A tool with lookaround support would work for all cases:
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?![a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?![a-z0-9]) are negative lookbehind and negative lookahead respectively.
This can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?![a-z0-9])/reverse $&/gie'
which uses the e modifier to place Perl code in the substitution section. This form makes it easy to change the length of the words to be reversed.
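Possibly the shortest sed solution (GNU sed), which also works when a four-character word contains underscores. Note that . matches any character, so a span such as "a bc" is reversed too: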
sed -r 's/\<(.)(.)(.)(.)\>/\4\3\2\1/g'
The following awk script may also help. It was tested in GNU awk, only with the sample input shown, and it hard-codes which fields get reversed (the 2nd and the last):
echo "the year was 1815." |
awk '
# Return val reversed, keeping a trailing "." in place.
function reverse(val,    num, array, i, var) {
  num = split(val, array, "")
  i = (array[num] == ".") ? num - 1 : num
  for (; i > 0; i--)
    var = var array[i]
  return (array[num] == ".") ? var "." : var
}
{
  for (j = 1; j <= NF; j++)
    printf("%s%s", (j == NF || j == 2) ? reverse($j) : $j, (j == NF) ? RS : FS)
}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse strings of other lengths: by changing the first regexp, strings of any length can be reversed. Strings between two lengths may also be reversed, e.g. /\<\w{2,4}\>/ will change all words between 2 and 4 characters long.
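This is a recurring problem, so there is a command-line utility called "rev" (part of util-linux, not a Bash builtin). Note that both commands below reverse every word, not only the four-letter ones.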
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)".
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '
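If GNU sed extensions or Perl are not available, the same idea can be sketched in plain awk by walking over each run of letters and digits and reversing only the runs of length 4. This is an illustration rather than a tested general solution; reverse_four is a hypothetical name, and "word" is assumed to mean a run of ASCII letters and digits:

```shell
reverse_four() {
  awk '{
    line = $0; out = ""
    # Consume the line one alphanumeric run at a time.
    while (match(line, /[A-Za-z0-9]+/)) {
      word = substr(line, RSTART, RLENGTH)
      if (RLENGTH == 4) {
        rev = ""
        for (i = 4; i >= 1; i--) rev = rev substr(word, i, 1)
        word = rev
      }
      # Copy the separator text before the run, then the (possibly reversed) run.
      out = out substr(line, 1, RSTART - 1) word
      line = substr(line, RSTART + RLENGTH)
    }
    print out line
  }'
}

echo 'the year was 1815.' | reverse_four   # prints: the raey was 5181.
```

Because underscores are treated as separators here, '_good food' becomes '_doog doof', matching the Perl version above.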

Bash: replace 4 occourance of a string if exist

I have a string that is sometimes
xxx.11_222_33_44_555.yyy
and sometimes
xxx.11_222_33_44.yyy
I would like to:
Check if it has 4 occurrences of _ (I have figured out how to do this).
If so - remove string's _33 (the 33 string changes, can be any number), so I am left with xxx.11_222_44.yyy.
Using sed :
sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
It matches a run containing four underscores and replaces the whole match with only the needed parts.
Test run :
$ echo "xxx.11_222_33_44_555.yyy" | sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
xxx.11_222_44_555.yyy
$ echo "xxx.11_222_33_44.yyy" | sed 's/\(_[0-9]*\)_[0-9]*\(_[0-9]*_[0-9]*\)/\1\2/'
xxx.11_222_33_44.yyy
Perhaps something like this:
echo "xxx.11_222_33_44.yyy" | sed -e's/\.\([0-9]\+\)_\([0-9]\+\)_\([0-9]\+\)_\([0-9]\+\)\./.\1_\2_\4./'
This checks whether there are 4 groups of numbers separated by _ between the two dots and, if so, leaves out the third group.
Try this (note that it prints nothing when the string has fewer than four underscores):
echo "xxx.11_222_33_44_555.yyy" | awk -F'_' 'NF>4{print $1"_"$2"_"$4"_"$5};'
Solution using Perl with \K (which acts like a lookbehind) and a lookahead:
$ a="xxx.11_222_33_44_555.yyy"
$ perl -pe 's/\.\d+_\d+_\K\d+_(?=\d+_\d+\.)//' <<< "$a"
xxx.11_222_44_555.yyy
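The two steps from the question (count the underscores, then drop the third group) can also be kept separate, which makes the condition explicit. A minimal sketch using only POSIX tools; drop_third_group is a hypothetical name and the string is passed as the first argument:

```shell
drop_third_group() {
  # Count the underscores by deleting every other character.
  count=$(printf '%s' "$1" | tr -cd '_' | wc -c)
  if [ "$count" -eq 4 ]; then
    # Keep the first two _-separated fields, drop the third one.
    printf '%s\n' "$1" | sed 's/^\([^_]*_[^_]*\)_[^_]*/\1/'
  else
    printf '%s\n' "$1"
  fi
}

drop_third_group "xxx.11_222_33_44_555.yyy"   # prints: xxx.11_222_44_555.yyy
drop_third_group "xxx.11_222_33_44.yyy"       # prints: xxx.11_222_33_44.yyy
```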

delete first N characters conditionally using sed or awk or anything

I am trying to delete zeros at the start of each line, up to the fourth character. A zero occurring beyond the 4th position should not be deleted. I have not been able to achieve this correctly.
Condition:
01230 <------Delete 1 zero at start.
001230 <-----Delete 2 zeros at start.
0001230 <----Delete 3 zeros at start.
00001230<----Delete 4 zero at start.
000001230<---Delete 4 zero at start and leave 1, output 01230
1234560<-----Delete nothing.
Example:
INPUT file:
cat file
0000abc0
00abcde0
0abcede0
00000abcede0
Expected output:
abc0
abcde0
abcede0
0abcede0
What I have already tried (which of course did not help):
cat file |sed 's/^[0]//g' <----This just delete one zero at the start
000abc0
0abcde0
abcede0
0000abcede0
cat file | sed 's/^[0][0][0][0]//g'<---THis only works for line having 4 zeros.
abc0
00abcde0
0abcede0
0abcede0
cat file | sed 's/^[0]*//g' <-----Removes all the zeros at start.
abc0
abcde0
abcede0
abcede0
cat file | sed 's/0//g'{4} <------I am lost what it do!!
000abc
00abcde0
0abcede0
000abcede
Use {} to specify the number of occurrences and -r to enable extended regexp syntax:
sed -r 's/^0{1,4}//g'
This deletes from one up to four zeros at the start of the line.
gawk '{gsub(/^0{1,4}/,"")}1' file
abc0
abcde0
abcede0
0abcede0
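The same interval can be written in basic-regex syntax, which avoids depending on the -r switch. A small sketch checked against the sample lines from the question; strip_leading_zeros is a hypothetical name:

```shell
# Remove between one and four zeros from the start of each line on stdin.
strip_leading_zeros() {
  sed 's/^0\{1,4\}//'
}

printf '0000abc0\n00000abcede0\n1234560\n' | strip_leading_zeros
# prints:
# abc0
# 0abcede0
# 1234560
```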
