'grep' a file name, but only show - bash

I have files that are named ACC160-MASTER-201503, ACC160L-MASTER-201503 and IS160-MASTER-201503. I'm basically trying to output the first few letters (in the example, just ACC and IS) and not the numbers. After that I need another grep to only grab up to the - (in the example, ACC160, ACC160L, and IS160).

If this is bash/ksh/zsh/sh/..., just use parameter substitution:
for file in ACC160-MASTER-201503 ACC160L-MASTER-201503 IS160-MASTER-201503; do
alnum=${file%%-*} # remove the first "-" and following chars
alpha=${alnum%%[0-9]*} # remove the first digit and all following chars
echo "$file => $alnum => $alpha"
done
ACC160-MASTER-201503 => ACC160 => ACC
ACC160L-MASTER-201503 => ACC160L => ACC
IS160-MASTER-201503 => IS160 => IS

If you want to get all the filename until the first '-' then grep -oP ".*?(?=-)" should do the trick.
-o is the only-matching option
-P enables Perl syntax
.*? is a reluctant quantifiers, it reads while looking for the next motif in the regex
(?=-) is a lookahead tag: it tells you what to expect (-) after the matching substring

Related

extract substring until first digit

I have the following string:
test1234a or test1234 for example and I want to extract only test from that string.
I tried the follwing
echo "Test12h" | sed -e 's/[0-9]\*$//' but is not working.
Is there any possibility to extract the substring until first digit?
Please let me know what I miss.
Thank you
The proper tool for extracting substrings matching a regexp from a command's output is grep. Like,
echo "Test12h" | grep -o '^[^[:digit:]]*'
will output Test.
If Test12h is in a variable, you don't even need external utilities; parameter expansions can easily handle that, e.g:
var='Test12h'
echo "${var%%[[:digit:]]*}"
Could you please try following.
sed 's/\([^0-9]*\).*/\1/' Input_file
OR
sed 's/[0-9][0-9a-zA-Z]*//' Input_file
If the source happens to reside in a variable you can use parameter substitution, eg:
$ for test in test1234a test1234 Test12h
do
echo "${test} => ${test//[0-9]*/}"
done
test1234a => test
test1234 => test
Test12h => Test
Where the general format is: ${var//Pattern/Replacement}; in the above:
var == test
Pattern == [0-9]* (first occurrence of a digit and then everything that follows
Replacement == '' (nothing, empty string)
So we end up stripping off everything after the first occurrence of a number.
You made a small error in your try:
echo "Test12h" | sed -e 's/[0-9].*$//' # will output Test
The \* expects a real * in the input:
echo "Test1*" | sed -e 's/[0-9]\*$//' # will output Test
The .* matches any number of any characters.

sed removing # and ; comments from files up to certain keyword

I have files that need to be removed from comments and white space until keyword . Line number varies . Is it possible to limit multiple continued sed substitutions based on Keyword ?
This removes all comments and white spaces from file :
sed -i -e 's/#.*$//' -e 's/;.*$//' -e '/^$/d' file
For example something like this :
# string1
# string2
some string
; string3
; string4
####
<Keyword_Keep_this_line_and_comments_white_space_after_this>
# More comments that need to be here
; etc.
sed -i '1,/keyword/{/^[#;]/d;/^$/d;}' file
I would suggest using awk and setting a flag when you reach your keyword:
awk '/Keyword/ { stop = 1 } stop || !/^[[:blank:]]*([;#]|$)/' file
Set stop to true when the line contains Keyword. Do the default action (print the line) when stop is true or when the line doesn't match the regex. The regex matches lines whose first non-blank character is a semicolon or hash, or blank lines. It's slightly different to your condition but I think it does what you want.
The command prints to standard output so you should redirect to a new file and then overwrite the original to achieve an "in-place edit":
awk '...' input > tmp && mv tmp input
Use grep -n keyword to get the line number that contains the keyword.
Use sed -i -e '1,N s/#..., when N is the line number that contains the keyword, to only remove comments on the lines 1 to N.

Replace and increment letters and numbers with awk or sed

I have a string that contains
fastcgi_cache_path /var/run/nginx-cache15 levels=1:2 keys_zone=MYSITEP:100m inactive=60m;
One of the goals of this script is to increment nginx-cache two digits based on the value find on previous file. For doing that I used this code:
# Replace cache_path
PREV=$(ls -t /etc/nginx/sites-available | head -n1) #find the previous cache_path number
CACHE=$(grep fastcgi_cache_path $PREV | awk '{print $2}' |cut -d/ -f4) #take the string to change
SUB=$(echo $CACHE |sed "s/nginx-cache[0-9]*[0-9]/&#/g;:a {s/0#/1/g;s/1#/2/g;s/2#/3/g;s/3#/4/g;s/4#/5/g;s/5#/6/g;s/6#/7/g;s/7#/8/g;s/8#/9/g;s/9#/#0/g;t a};s/#/1/g") #increment number
sed -i "s/nginx-cache[0-9]*/$SUB/g" $SITENAME #replace number
Maybe not so elegant, but it works.
The other goal is to increment last letter of all occurrences of MYSITEx (MYSITEP, in that case, should become MYSITEQ, after MYSITEP, etc. etc and once MYSITEZ will be reached add another letter, like MYSITEAA, MYSITEAB, etc. etc.
I thought something like:
sed -i "s/MYSITEP[A-Z]*/MYSITEGG/g" $SITENAME
but it can't works cause MYSITEGG is a static value and can't be used.
How can I calculate the last letter, increment it to the next one and once the last Z letter will be reached, add another letter?
Thank you!
Perl's autoincrement will work on letters as well as digits, in exactly the manner you describe
We may as well tidy your nginx-cache increment as well while we're at it
I assume SITENAME holds the name of the file to be modified?
It would look like this. I have to assign the capture $1 to an ordinary variable $n to increment it, as $1 is read-only
perl -i -pe 's/nginx-cache\K(\d+)/ ++($n = $1) /e; s/MYSITE\K(\w+)/ ++($n = $1) /e;' $SITENAME
If you wish, this can be done in a single substitution, like this
perl -i -pe 's/(?:nginx-cache|MYSITE)\K(\w+)/ ++($n = $1) /ge' $SITENAME
Note: The solution below is needlessly complicated, because as Borodin's helpful answer demonstrates (and #stevesliva's comment on the question hinted at), Perl directly supports incrementing letters alphabetically in the manner described in the question, by applying the ++ operator to a variable containing a letter (sequence); e.g.:
$ perl -E '$letters = "ZZ"; say ++$letters'
AAA
The solution below may still be of interest as an annotated showcase of how Perl's power can be harnessed from the shell, showing techniques such as:
use of s///e to determine the replacement string with an expression.
splitting a string into a character array (split //, "....")
use of the ord and chr functions to get the codepoint of a char., and convert a(n incremented) codepoint back to a char.
string replication (x operator)
array indexing and slices:
getting an array's last element ($chars[-1])
getting all but the last element of an array (#chars[0..$#chars-1])
A perl solution (in effect a re-implementation of what ++ can do directly):
perl -pe 's/\bMYSITE\K([A-Z]+)/
#chars = split qr(), $1; $chars[-1] eq "Z" ?
"A" x (1 + scalar #chars)
:
join "", #chars[0..$#chars-1], chr (1 + ord $chars[-1])
/e' <<'EOF'
...=MYSITEP:...
...=MYSITEZP:...
...=MYSITEZZ:...
EOF
yields:
...=MYSITEQ:... # P -> Q
...=MYSITEZQ:... # ZP -> ZQ
...=MYSITEAAA:... # ZZ -> AAA
You can use perl's -i option to replace the input file with the result
(perl -i -pe '...' "$SITENAME").
As Borodin's answer demonstrates, it's not hard to solve all tasks in the question using perl alone.
The s function's /e option allows use of a Perl expression for determining the replacement string, which enables sophisticated replacements:
$1 references the current MYSITE suffix in the expression.
#chars = split qr(), $1 splits the suffix into a character array.
$chars[-1] eq "Z" tests if the last suffix char. is Z
If so: The suffix is replaced with all As, with an additional A appended
("A" x (1 + scalar #chars)).
Otherwise: The last suffix char. is replaced with the following letter in the alphabet
(join "", #chars[0..$#chars-1], chr (1 + ord $chars[-1]))

remove only *some* fullstops from a csv file

If I have lines like the following:
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
how can I replace all instances of ,., with ,?,
I want to preserve actual decimal places in the numbers so I can't just do
sed 's/./?/g' file
however when doing:
sed 's/,.,/,?,/g' file
this only appears to work in some cases. i.e. there are still instances of ,., hanging around.
anyone have any pointers?
Thanks
This should work :
sed ':a;s/,\.,/,?,/g;ta' file
With successive ,., strings, after a substitution succeeded, next character to be processed will be the following . that doesn't match the pattern, so with you need a second pass.
:a is a label for upcoming loop
,\., will match dot between commas. Note that the dot must be escaped because . is for matching any character (,a, would match with ,.,).
g is for general substitution
ta tests previous substitution and if it succeeded, loops to :a label for remaining substitutions.
Using sed it is possible by running a loop as shown in above answer however problem is easily solved using perl command line with lookarounds:
perl -pe 's/(?<=,)\.(?=,)/?/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This command doesn't need a loop because instead of matching surrounding commas we're just asserting their position using a lookbehind and lookahead.
All that's necessary is a single substitution
$ perl -pe 's/,\.(?=,)/,?/g' dots.csv
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
You have an example using sed style regular expressions. I'll offer an alternative - parse the CSV, and then treat each thing as a 'field':
#!/usr/bin/perl
use strict;
use warnings;
#iterate input row by row
while ( <DATA> ) {
#remove linefeeds
chomp;
#split this row on ,
my #row = split /,/;
#iterate each field
foreach my $field ( #row ) {
#replace this field with "?" if it's "."
$field = "?" if $field eq ".";
}
#stick this row together again.
print join ",", #row,"\n";
}
__DATA__
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This is more verbose than it needs to be, to illustrate the concept. This could be reduced down to:
perl -F, -lane 'print join ",", map { $_ eq "." ? "?" : $_ } #F'
If your CSV also has quoting, then you can break out the Text::CSV module, which handles that neatly.
You just need 2 passes since the trailing , found on a ,., match isn't available to match the leading , on the next ,.,:
$ sed 's/,\.,/,?,/g; s/,\.,/,?,/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
The above will work in any sed on any OS.

filter specific attribute from a file

I have an input.txt file has following text. I have to filter the "".
- <ci>
<id>a573f0d014c18a5811793aedb5aad3</id>
<viewName>Windows</viewName>
</ci>
- <ci>
<id>7ad9088802ef62d75a15c9d4799fe8</id>
<viewName>Network</viewName>
</ci>
- <ci>
<id>abbbeeb60c4074bbc8483f321e0b43</id>
<viewName>Unix</viewName>
</ci>
Output should be like this:
a573f0d014c18a5811793aedb5aad3
7ad9088802ef62d75a15c9d4799fe8
abbbeeb60c4074bbc8483f321e0b43
With gnu grep you can use a positive lookahead and a positive lookbehind:
$ grep -oP '(?<=<id>).*(?=</id>)' file
a573f0d014c18a5811793aedb5aad3
7ad9088802ef62d75a15c9d4799fe8
abbbeeb60c4074bbc8483f321e0b43
another grep alternative based on data pattern
grep -o '[a-f0-9]\{30\}'
Perl solution:
perl -lane 'print $1 if /^\s*<id>(\S+)<\/id>/' file
The /regex/ captures the information between < id > and < /id > into variable $1
These command-line options are used:
n loop around every line of the input file, put the line in the $_ variable, do not automatically print every line
l removes newlines before processing, and adds them back in afterwards
a autosplit mode – perl will automatically split input lines on whitespace into the #F array
e : execute the perl code

Resources