Extract version using shell commands - shell

I am trying to extract version from below mentioned URL's using shell commands like, grep, awk, sed, cut which ever is most suitable
https://abcd/efgh/1.1.3/hijkl/mnop
https://abcd/efgh/hijkl/2.3.4.5/mnop
https://abcd/3.4/efgh/hijkl/mnop
I am looking to extract the version(numbers with dot) alone from the URL, where the position may vary as in above example. Looking for suggestions.
Expected output to be :
1.1.3
2.3.4.5
3.4

You may use this grep:
grep -Eo '[0-9]+(\.[0-9]+)+' file
1.1.3
2.3.4.5
3.4

With awk, written and tested with shown samples in GNU awk.
awk 'match($0,/([0-9]+\.){1,}[0-9]+/){print substr($0,RSTART,RLENGTH)}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
match($0,/([0-9]+\.){1,}[0-9]+/){ ##using match function to match regex of ([0-9]+\.){1,}[0-9]+ in current line.
print substr($0,RSTART,RLENGTH) ##Printing sub string of matched regex above, starting index is RSTART till value of RLENGTH here.
}
' Input_file ##Mentioning Input_file name here.

I would use GNU AWK following way, let file.txt content be
https://abcd/efgh/1.1.3/hijkl/mnop
https://abcd/efgh/hijkl/2.3.4.5/mnop
https://abcd/3.4/efgh/hijkl/mnop
then
awk 'BEGIN{RS="[/\n]"}/^[.[:digit:]]+$/' file.txt
output
1.1.3
2.3.4.5
3.4
Explanation: I specify row seperator (RS) as either / or newline (\n) then print only rows (i.e. parts between / or newline and / or / and newline) which contain only . or digits - in order to achieve such effect I use ^ and $ denoting start and end of record.

Related

How to remove string between two characters and before the first occurrence using sed

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?
You don't need to group : in your pattern and use it in substitution.
You should keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
: matches a colon and [^|]* matches 0 or more non-pipe characters
1st solution: With awk you could try following awk program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: Using match function of awk, where matching from : to till first occurrence of | here. So what match function does is, whenever a regex is matched in it, it will SET values for its OOTB variables named RSTART and RLENGTH, so based on that we are printing sub-string to neglect matched part and print everything else as per required output in question.
2nd solution: Using FPAT option in GNU awk, try following, written and tested with your shown samples only.
awk -v FPAT=':[^|]*' '{print $1,$2}' Input_file

sed extract part of string from a file

I've ben trying to extract only part of string from a file looking like this:
str1=USER_NAME
str2=justAstring
str3=https://product.org/v-4.5-bin.zip
str4=USER_HOME
I need to extract ONLY the version - in this case: 4.5
I did it by grep and then sed but now the output is 4.5-bin.zip
-> grep str3 file.txt
str3=https://product.org/v-4.5-bin.zip
-> echo str3=https://product.org/v-4.5-bin.zip | sed -n "s/^.*v-\(\S*\)/\1/p"
4.5-bin.zip
What should I do in order to remove also the -bin.zip at the end?
Thanks.
1st solution: With your shown samples, please try following sed code.
sed -n '/^str3=/s/.*-\([^-]*\)-.*/\1/p' Input_file
Explanation: Using sed's -n option which will STOP printing of values by default, to only print matched part. In main program checking condition if line starts from str3= then perform substitution there. In substitution catching everything between 1st - and next - in a capturing group and substituting whole line with it by using \1 and printing the matched portion only by using p option.
2nd solution: Using GNU grep you could try following grep program.
grep -oP '^str3=.*?-\K([^-]*)' Input_file
3rd solution: Using awk program for getting expected output as per shown smaples.
awk -F'-' '/^str3=/{print $2}' Input_file
4th solution: Using awk's match function to get expected results with help of using RSTART and RLENGTH variables which get set once a TRUE match is found by match function.
awk 'match($0,/^str3=.*-/){split(substr($0,RSTART,RLENGTH),arr,"-");print arr[2]}' Input_file
If you know the version contains just digits and dots, replace \S by [0-9.]. Also, match the remaining characters outside of the capture group to get it removed.
sed -n 's/^.*v-\([0-9.]*\).*/\1/p'

Delete first column of csv file [duplicate]

This question already has answers here:
awk - how to delete first column with field separator
(5 answers)
Closed 5 years ago.
I would like to know how i can delete the first column of a csv file with awk or sed
Something like this :
FIRST,SECOND,THIRD
To something like that
SECOND,THIRD
Thanks in advance
Following awk will be helping you in same.
awk '{sub(/[^,]*/,"");sub(/,/,"")} 1' Input_file
Following sed may also help you in same.
sed 's/\([^,]*\),\(.*\)/\2/' Input_file
Explanation:
awk ' ##Starting awk code here.
{
sub(/[^,]*/,"") ##Using sub for substituting everything till 1st occurence of comma(,) with NULL.
sub(/,/,"") ##Using sub for substituting comma with NULL in current line.
}
1 ##Mentioning 1 will print edited/non-edited lines here.
' Input_file ##Mentioning Input_file name here.
Using awk
$awk -F, -v OFS=, '{$1=$2; $2=$3; NF--;}1' file
SECOND,THIRD
With Sed
sed -i -r 's#^\w+,##g' test.csv
Grab the begin of the line ^, every character class [A-Za-z0-9] and also underscore until we found comma and replace with nothing.
Adding g after delimiters you can do a global substitution.
Using sed : ^[^,]+, regex represent the first column including the first comma. ^ means start of the line, [^,]+, means anything one or more times but a comma sign followed by a comma.
you can use -i with sed to make changes in file if needed.
sed -r 's/^[^,]+,//' input
SECOND,THIRD

"grep" a csv file including multi-lines fields?

file.csv:
XA90;"standard"
XA100;"this is
the multi-line"
XA110;"other standard"
I want to grep the "XA100" entry like this:
grep XA100 file.csv
to obtain this result:
XA100;"this is
the multi-line"
but grep return only one line:
XA100;"this is
source.csv contains 3 entries.
The "XA100" entry contain a multi-line field.
And grep doesn't seem to be the right tool to "grep" CSV file including multilines fields.
Do you know the way to make the job ?
Edit: the real world file contains many columns. The researched term can be in any column (not at begin of line, nor at the begin of field). All fields are encapsulated by ". Any field can contain a multi-line, from 1 line to any, and this cannot be predicted.
Give this line a try:
awk '/^XA100;/{p=1}p;p&&/"$/{p=0}' file
I extended your example a bit:
kent$ cat f
XA90;"standard"
XA100;"this is
the
multi-
line"
XA110;"other standard"
kent$ awk '/^XA100;/{p=1}p;p&&/"$/{p=0}' f
XA100;"this is
the
multi-
line"
In the comments you mention: In the real world file, each line start with ". I assume they also end with " and present you this:
Test file:
$ cat file
"single line"
"multi-
lined"
Code and outputs:
$ awk 'BEGIN{RS=ORS="\"\n"} /single/' file
"single line"
$ awk 'BEGIN{RS=ORS="\"\n"} /m/' file
"multi-
lined"
You can also parametrize the search:
$ awk -v s="multi" 'BEGIN{RS=ORS="\"\n"} match($0,s)' file
"multi-
lined"
try:
Solution 1:
awk -v RS="XA" 'NR==3{gsub(/$\n$/,"");print RS $0}' Input_file
Making Record separator as string XA then looking for line 3rd here and then globally substituting the $\n$(which is to remove the extra line at the end of the line) with NULL. Then printing the Record Separator with the current line.
Solution 2:
awk '/XA100/{print;getline;while($0 !~ /^XA/){print;getline}}' Input_file
Looking for string XA100 then printing the current line and using getline to go to next line, using while loop then which will run and print the lines until a line is starting from XA.
If this file was exported from MS-Excel or similar then lines end with \r\n while the newlines inside quotes are just \ns so then all you need is:
$ awk -v RS='\r\n' '/XA100/' file
XA100;"this is
the multi-line"
The above uses GNU awk for multi-char RS. On some platforms, e.g. cygwin, you'll have to add -v BINMODE=3 so gawk sees the \rs rather than them getting stripped by underlying C primitives.
Otherwise, it's extremely hard to parse CSV files in general without a real CSV parser (which awk currently doesn't have but is in the works for GNU awk) but you could do this (again with GNU awk for multi-char RS):
$ cat file
XA90;"standard"
XA100;"this is
the multi-line"
XA110;"other standard"
$ awk -v RS="\"[^\"]*\"" -v ORS= '{gsub(/\n/," ",RT); print $0 RT}' file
XA90;"standard"
XA100;"this is the multi-line"
XA110;"other standard"
to replace all newlines within quotes with blank chars and then process it as regular 1-line-per-record file.
Using PS response, this works for the small example:
sed 's/^X/\n&/' file.csv | awk -v RS= '/XA100/ {print}'
For my real world CSV file, with many columns, with researched term anywhere, with unknown count of multi-lines, with characters " replaced by "", with multi-lines lines beginning with ", with all fields encapsulated by ", this works. Note the exclusion of the second character " in sed part:
sed 's/^"[^"]/\n&/' file.csv | awk -v RS= '/RESEARCH_TERM/ {print}'
Because first column of any entry cannot start with "". First column allways looks like "XXXXXXXXX", where X is any character but ".
Thank you all for so much responses, maybe others solutions are working depending the CSV file format you use.

Using a multi-character field separator in awk on Solaris

I wish to use a string (BIRCH) as a field delimiter in awk to print second field. I am trying the following command:
cat tmp.log|awk -FBirch '{ print $2}'
Below output is getting printed:
irch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Desired output:
2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Contents of tmp.log file.
-bash-3.2# cat tmp.log
Dec 05 13:49:23 [x.x.x.x.180.100] business-log-dev/int [TEST][0x80000001][business-log][info] mpgw(Test): trans(8497187)[request][10.x.x.x]:
Birch2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Am I doing something wrong?
OS: Solaris10
Shell: Bash
Tried below command suggested in one of the ansers below. I am getting the desired output, but with an extra empty line at the top. How can this be eliminated from the output?
-bash-3.2# /usr/xpg4/bin/awk -FBirch '{print $2}' tmp.log
2014/06/23,04:36:45,3,1401503,xml-harlan,P12345-1,temp,0a653356353635635,temp,L,Success
Originally, I suggested putting quotes around "Birch" (-F'Birch') but actually, I don't think that should make any difference.
I'm not at all experienced working with Solaris but you may want to also try using nawk ("new awk") instead of awk.
nawk -FBirch '{print $2}' file
If this works, you may want to consider creating an alias so that you always use the newer version of awk with more features.
You may also want to try using the version of awk in the /usr/xpg4/bin directory, which is a POSIX compliant implementation so should support multi-character FS:
/usr/xpg4/bin/awk -FBirch '{print $2}' file
If you only want to print lines which have more than one field, you can add a condition:
/usr/xpg4/bin/awk -FBirch 'NF>1{print $2}' file
This only prints the second field when there is more than one field.
From the man page of the default awk on solaris usr/bin/awk
-Fc Uses the character c as the field separator
(FS) character. See the discussion of FS
below.
As you can see solaris awk only takes a single character as a Field separator
Also in the man page is split
split(s, a, fs)
Split the string s into array elements a[1], a[2], ...
a[n], and returns n. The separation is done with the
regular expression fs or with the field separator FS if
fs is not given.
As you can see here it takes a regular expression as a separator so we can use.
awk 'split($0,a,"Birch"){print a[2]}' file
To print the second field split by Birch

Resources