Extracting release number from Jira created bitbucket branch - bash

I am using Jira to create a bitbucket branch for releases. As part of the build process I need to extract the release number from the branch name.
An earlier part of the build dumps the whole branch name to a text file. The problem I'm having is removing all the text before the build number.
An example branch name would be:
release/some-jira-ticket-343-X.X.X
Where X.X.X is the release number e.g. 1.11.1 (each of X could be any length integer).
My first thought was to literally just select the last 3 characters with sed, however as X could be any length this won't work.
Another post (Removing non-alphanumeric characters with sed) suggests using the sed alpha class. However, this won't work as the Jira ticket ID contains numbers.
Any ideas?

You can remove all characters up to last -:
$ sed 's/.*-//' <<< "release/some-jira-ticket-343-1.11.2"
1.11.2
or with grep, to output only digits and dots at the end of the line:
grep -o '[0-9.]*$'

awk solution,
$ awk -F- '{print $NF}' <<< "release/some-jira-ticket-343-1.11.1"
grep solution,
grep -oP '[0-9]-\K.*' <<< "release/some-jira-ticket-343-1.11.1"

use string operators:
var="release/some-jira-ticket-343-2.155.7"
echo "${var##*-}"
prints:
2.155.7
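The expansion above returns whatever follows the last hyphen, so a branch without a release suffix would silently yield the ticket number instead. A minimal sketch that adds a format check, assuming the release number is always three dot-separated integers; release_from_branch is a hypothetical helper name, not part of the build:

```shell
# Hypothetical helper: print the trailing X.X.X release number of a branch name.
# ${1##*-} drops everything up to and including the last '-';
# grep -Ex prints the result only if the whole string looks like N.N.N,
# and returns non-zero otherwise.
release_from_branch() {
  printf '%s\n' "${1##*-}" | grep -Ex '[0-9]+\.[0-9]+\.[0-9]+'
}

release_from_branch "release/some-jira-ticket-343-2.155.7"
```

Because the function returns non-zero when the suffix is missing, a build script can stop early instead of carrying on with a bogus version string.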

Awk solution:
awk -F '[-.]' '{ print $5"."$6"."$7 }' <<< "release/some-jira-ticket-343-12.4.7"
12.4.7
Set the field delimiter to - and . and then extract the pieces of data we need.
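The fixed positions $5, $6 and $7 only line up when the part before the release number contains exactly three hyphens and no dots. A hedged variant that counts fields from the end of the line instead, assuming every input ends in X.X.X; last_three_fields is a hypothetical name used only for illustration:

```shell
# Split on '-' and '.', then print the last three fields joined by dots,
# so the number of hyphens in the ticket name does not matter.
# Assumes each line has at least three fields.
last_three_fields() {
  awk -F'[-.]' '{ print $(NF-2) "." $(NF-1) "." $NF }'
}

printf '%s\n' "release/some-jira-ticket-343-12.4.7" | last_three_fields
printf '%s\n' "release/abc-12-3.4.5" | last_three_fields
```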

Related

Delete everything before a pattern

I'm trying to clean a text file.
I want to delete everything before the first 12-digit number on each line.
1:0:135103079189:0:0:2:0::135103079189:000011:00
A:908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Output desired:
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
Here's my command, but it doesn't seem to work:
sed '/:\([0-9]\{12\}\)/d' t.txt
The d command in sed deletes the entire line when the regex matches; you need the s command to search and replace only part of the line. However, sed is not well suited to this problem, as it doesn't support non-greedy regex.
You can use perl instead:
$ perl -pe's/^.*?(?=\d{12}:)//' ip.txt
135103079189:0:0:2:0::135103079189:000011:00
908529896240:0:10250:2:0:1:
603307102606:0:0:1:0::01000::M
.*? match zero or more characters as minimally as possible
(?=\d{12}:) only if it is followed by 12-digits ending with :
use perl -i -pe for in-place editing
some possible corner cases
$ # this is matching part of field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe's/^.*?(?=\d{12}:)//'
135103079189:23:603307102606:1
$ # this is not matching 12-digit field at end of line
$ echo 'foo:123:135103079189' | perl -pe's/^.*?(?=\d{12}:)//'
foo:123:135103079189
$ # so, add start/end of line matching cases and restrict 12-digits to whole field
$ echo 'foo:123:abc135103079189:23:603307102606:1' | perl -pe 's/^(?:.*?:)?(?=\d{12}(:|$))//'
603307102606:1
$ echo 'foo:123:135103079189' | perl -pe's/^(?:.*?:)?(?=\d{12}(:|$))//'
135103079189
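The whole-field restriction from the corner cases above can also be sketched in awk, for systems where perl is unavailable. This is an alternative technique, not the answer's own method: it splits on ':' and tests each field's length and content, which avoids interval expressions entirely. The function name strip_before_12digit_field is hypothetical.

```shell
# Print each line starting at the first field that is exactly 12 digits.
# Lines without such a field are printed unchanged.
strip_before_12digit_field() {
  awk -F: '{
    for (i = 1; i <= NF; i++) {
      if (length($i) == 12 && $i ~ /^[0-9]+$/) {
        out = $i
        # re-join the remaining fields with the original separator
        for (j = i + 1; j <= NF; j++) out = out ":" $j
        print out
        next
      }
    }
    print
  }' "$@"
}

printf '%s\n' 'foo:123:abc135103079189:23:603307102606:1' | strip_before_12digit_field
```

With no arguments the function reads standard input; with file names it reads those files.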
Could you please try following.
awk --re-interval 'match($0,/[0-9]{12}/){print substr($0,RSTART)}' Input_file
Since I have an old version of awk, I am using --re-interval; you can remove it if you have a newer version.
This might work for you (GNU sed):
sed -n 's/[0-9]\{12\}/\n&/;s/.*\n//p' file
We only want to print specific lines so use the -n option to turn off automatic printing. If a line contains a 12 digit number, insert a newline before it. Remove any characters before and including a newline and print the result.
If you want to print lines that do not contain a 12 digit number as is, use:
sed 's/[0-9]\{12\}/\n&/;s/.*\n//' file
The crux of the problem is to identify the start of a multi-character string, insert a unique marker and delete all characters before and including the unique marker. As sed uses the newline to delimit lines, only the user can introduce newlines into the pattern space and as a result, newlines will always be unique.
Taking the nice answer from @Sundeep, in case you would like to use grep or pcregrep (macOS/BSD), you could try:
$ grep -oP '^(?:.*?:)?(?=\d{12})\K.*' file
or
$ pcregrep -o '^(?:.*?:)?(?=\d{12})\K.*' file
The \K discards everything matched before it, so only the text after that point is printed.
Alternative thoughts - I almost think your data is too dirty for a quick sed fix but if generally it's all similar to your sample set of data then certainly pick one of the answers with sed etc. However if you wanted to be more particular about it you could build up a set of commands to ensure the values. I like doing this for debugging and when speed isn't urgent.
Take this tiny sample of code. You could do this other ways, but here I'm getting the value of each part of the string, and I know the order because the parts are contiguous. You could then set up controls on which parts to keep as it builds out, say, a new string per line. Overwrought for sure, but sometimes that is a better long-term approach.
#!/bin/bash
while IFS= read -r line; do
  # split the line on ':' into an array
  IFS=':' read -r -a array <<< "$line"
  for ((i=0; i<${#array[@]}; i++)); do
    echo "part : ${array[$i]}"
  done
done < "test_data.txt"
You could then build the data back up how you wanted and more easily understand what's happening every step of the way ..
part : 1
part : 0
part : 135103079189
part : 0
part : 0
part : 2
part : 0
part :
part : 135103079189
part : 000011
part : 00
part : A
part : 908529896240
part : 0

Mask email address, phonenumber, ssn (pattern) using awk

Requirement is to mask some sensitive data from the log file, below code works as expected when awk version is 4.0.2.
I will be grepping the log files and then have to mask some data using the pattern in the awk snippet below, and then return the result.
echo "123-123-432-123-999-889 and 123456 and 1234-1234-4321-1234 and xyz@abc.com" | awk ' gsub (/[0-9]{6,}|([0-9]{3,}.){3,}|\w{2,}@\w{2,}.\w{2,}/, "****") 1'
The same is not working in awk version 3.1.7 which is production server version.
I can use only grep, cat, awk and there is no permission to use perl or sed as it is restricted by Admin Team.
Expected Output:
****and **** and ****and ****
Solution should also work if the content is in file, for example
sample.log
123-123-432-123-999-889
and
123456
and
1234-1234-4321-1234
and xyz@abc.com
Command:
cat sample.log | awk ' gsub (/[0-9]{6,}|([0-9]{3,}.){3,}|\w{2,}@\w{2,}.\w{2,}/, "****") 1'
Please help me with awk which can work in 3.1.7 version of awk
Activate RE intervals with:
awk --re-interval '...'
You MAY also need to replace each \w with [[:alnum:]_].
The problem you're having is that you're using a very old version of gawk from before RE intervals (e.g. {1,3}) were enabled by default. In that old gawk every { and } is just a literal character, for backward compatibility with the 1980s awks (old, broken awk and nawk), so you need to explicitly tell gawk to interpret {1,3} as an RE interval instead of a literal string of 5 chars.
Idk if back then \w was supported or not so you MAY also need to use the bracket expression I suggested above instead.
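If --re-interval turns out not to be usable either, the same pattern can be written without intervals or \w by spelling the repetitions out. The sketch below is an alternative to the flag, not a description of what gawk 3.1.7 needs; it assumes the email separator is @ and that \w can be approximated by the ASCII set [A-Za-z0-9_]. The name mask_sensitive is hypothetical.

```shell
# Mask long digit runs, digit groups and email-like tokens with "****",
# using only +, | and bracket expressions (no {n,} intervals, no \w).
mask_sensitive() {
  awk '
    BEGIN {
      d3 = "[0-9][0-9][0-9]+."                  # three or more digits, then any character
      w2 = "[A-Za-z0-9_][A-Za-z0-9_]+"          # two or more word characters (ASCII only)
      re = "[0-9][0-9][0-9][0-9][0-9][0-9]+"    # six or more digits
      re = re "|(" d3 ")(" d3 ")(" d3 ")+"      # three or more digit groups
      re = re "|" w2 "@" w2 "." w2              # email-like token
    }
    { gsub(re, "****"); print }
  ' "$@"
}

echo "123-123-432-123-999-889 and 123456 and 1234-1234-4321-1234 and xyz@abc.com" | mask_sensitive
```

As in the original pattern, the unescaped dot after each digit group matches any character, which is why the space following a digit group is swallowed in the expected output.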

How do I delete all rows with a blank space in the third column within a file?

So, I have a file which contains the results of some calculations I've run in the past weeks. I've collected the results in a file which I intend to plot. It is basically a bunch of rows with the format "x" "y" "f(x,y)", like this:
1.7 4.7 -460.5338556921
1.7 4.9 -460.5368762353
1.7 5.5
However, some lines, exemplified by the last one, contain a blank space in the 3rd column, resulting from failed calculations. I'd still like to plot the viable points, but, as there are thousands of points (and therefore rows), that task can't be accomplished easily by hand. I'd like to know how to make a script or program (I'd prefer a shell script, but I'll gladly go along with whatever works) which identifies those lines and deletes them. Does anyone know a way to do it?
awk '$3' <filename>
or better:
awk 'NF > 2' <filename> # also keeps rows whose third column is 0, which awk '$3' would drop
This will serve the purpose!
The simplest form of grep command that should probably be understood by any shell these days:
grep -v '^[^[:space:]]*[[:space:]]*[^[:space:]]*[[:space:]]*$' <filename>
With grep:
grep ' .* [^ ]' file
or using ERE:
grep -E '\s\S+\s\S' file
I would use:
perl -lanE 'print if @F==3 && /^[\d\s\.+-]+$/' file
will print only lines:
which contains 3 fields
and contains only numbers, spaces, and .+-
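The same two conditions can be sketched in awk for setups without perl. This is an approximation rather than an exact port: it checks the three fields instead of the raw line, and like the perl version it would reject values in scientific notation such as 1e-5. The name keep_complete_rows is hypothetical.

```shell
# Keep only rows with exactly three fields made of digits, '.', '+' and '-'.
keep_complete_rows() {
  awk 'NF == 3 && ($1 $2 $3) ~ /^[0-9.+-]+$/' "$@"
}

printf '%s\n' '1.7 4.7 -460.5338556921' '1.7 4.9 -460.5368762353' '1.7 5.5 ' | keep_complete_rows
```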
I do not know how you are going to plot. You would probably want a grep or awk solution that pipes all valid lines into your plotting application.
When you need to call a program for each set of values, you can skip the invalid lines when you are reading the values:
while read -r x y fxy; do
if [ -n "${fxy}" ]; then
myplotter "$x" "$y" "${fxy}"
fi
done < file

Unix: Removing date from a string in single command

To satisfy legacy code, I had to append the date to a filename as shown below (it's definitely needed, and I cannot modify the legacy code :( ). But I need to remove the date within the same command, without going to a new line. The command is read from a text file, so I have to do this in a single command.
$((echo "$file_name".`date +%Y%m%d`| sed 's/^prefix_//')
So here I am removing the prefix from the filename and appending a date to it. I also want to remove the date I added. For example, prefix_filename.txt or prefix_filename.zip should give the output below.
Expected output:
filename.txt
filename.zip
Current output:
filename.txt.20161002
filename.zip.20161002
Assuming all the files are formatted as filename.ext.date, you can pipe the output to the 'cut' command and get only the 1st and 2nd fields:
~> X=filename.txt.20161002
~> echo $X | cut -d"." -f1,2
filename.txt
I am not sure that I understand your question correctly, but perhaps this does what you want:
$((echo "$file_name".`date +%Y%m%d`| sed -e 's/^prefix_//' -e 's/\.[^.]*$//')
Sample input:
cat sample
prefix_original.txt.log.tgz.10032016
prefix_original.txt.log.10032016
prefix_original.txt.10032016
prefix_one.txt.10032016
prefix.txt.10032016
prefix.10032016
grep from start of the string till a literal dot "." followed by digit.
grep -oP '^.*(?=\.\d)' sample
prefix_original.txt.log.tgz
prefix_original.txt.log
prefix_original.txt
prefix_one.txt
prefix.txt
prefix
perhaps, following should be used:
grep -oP '^.*(?=\.\d)|^.*$' sample
If I understand your question correctly, you want to remove the date part from a variable, AND you already know from the context that the variable DOES contain a date part and that this part comes after the last period in the name.
In this case, the question boils down to removing the last period and what comes after.
This can be done (Posix shell, bash, zsh, ksh) by
filename_without=${filename_with%.*}
assuming that filename_with contains the filename which has the date part in the end.
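Both steps from the question, dropping the prefix and dropping the date, can be done with parameter expansion alone and no external process. A minimal sketch, assuming the literal prefix is prefix_ as in the question and that the date is whatever follows the last period; strip_prefix_and_date is a hypothetical name:

```shell
# The function body runs in a subshell so the helper variable does not leak.
strip_prefix_and_date() (
  name=${1#prefix_}            # remove a leading "prefix_" if present
  printf '%s\n' "${name%.*}"   # remove the last "." and everything after it
)

strip_prefix_and_date "prefix_filename.txt.20161002"
strip_prefix_and_date "prefix_filename.zip.20161002"
```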
% cat example
filename.txt.20161002
filename.zip.20161002
% cat example | sed 's/\.[0-9]*$//'
filename.txt
filename.zip
%

bash how to extract a field based on its content from a delimited string

Problem - I have a set of strings that essentially look like this:
|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|...|ZZZZZZZZZ|
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field, however, I do not know what N is, that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
Thanks
As an example, let's extract the field that matches MyField:
Using sed
$ s='|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|12MyField34|ZZZZZZZZZ|'
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
12MyField34
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
12MyField34
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
12MyField34
The -P option requires GNU grep.
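Since the question asks to keep program invocations down, the awk variant can read the data files directly and do both the line selection and the field extraction in one process. A hedged sketch: extract_field is a hypothetical wrapper, and on SunOS the -v option may require nawk or /usr/xpg4/bin/awk rather than the default awk, which is an assumption to verify on the target machine.

```shell
# Print the first pipe-delimited field on each line that matches the regex
# given as the first argument. Remaining arguments are input files;
# standard input is read if there are none.
extract_field() (
  re=$1
  shift
  awk -F'|' -v re="$re" '{
    for (i = 1; i <= NF; i++)
      if ($i ~ re) { print $i; break }   # stop at the first matching field
  }' "$@"
)

printf '%s\n' '|AAAAAA|BBBBBB|CCCCCCC|XXXXXXXXX|12MyField34|ZZZZZZZZZ|' | extract_field 'MyField'
```

Lines with no matching field produce no output, so a separate grep pass to pre-select lines is not needed.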
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if used something like:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.
