sed extract part of string from a file - bash

I've been trying to extract only part of a string from a file that looks like this:
str1=USER_NAME
str2=justAstring
str3=https://product.org/v-4.5-bin.zip
str4=USER_HOME
I need to extract ONLY the version - in this case: 4.5
I did it by grep and then sed but now the output is 4.5-bin.zip
-> grep str3 file.txt
str3=https://product.org/v-4.5-bin.zip
-> echo str3=https://product.org/v-4.5-bin.zip | sed -n "s/^.*v-\(\S*\)/\1/p"
4.5-bin.zip
What should I do in order to remove also the -bin.zip at the end?
Thanks.

1st solution: With your shown samples, please try following sed code.
sed -n '/^str3=/s/.*-\([^-]*\)-.*/\1/p' Input_file
Explanation: sed's -n option suppresses the default printing of every line, so only what we explicitly print is shown. The /^str3=/ address restricts the substitution to lines starting with str3=. Because the leading .* is greedy, the capture group \([^-]*\) catches the text between the last two hyphens on the line (4.5 in the shown sample). The whole line is replaced with that group via \1, and the p flag prints the result.
2nd solution: Using GNU grep you could try following grep program.
grep -oP '^str3=.*?-\K([^-]*)' Input_file
3rd solution: Using an awk program to get the expected output as per the shown samples.
awk -F'-' '/^str3=/{print $2}' Input_file
4th solution: Using awk's match function to get expected results with help of using RSTART and RLENGTH variables which get set once a TRUE match is found by match function.
awk 'match($0,/^str3=.*-/){split(substr($0,RSTART,RLENGTH),arr,"-");print arr[2]}' Input_file
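One caveat worth checking before picking a solution: the commands above do not all treat hyphens the same way. A small sketch, assuming a hypothetical URL with an extra hyphen in the host name (not part of the question's sample), comparing the 1st and 3rd solutions:

```shell
# Hypothetical input: same shape as the sample, but with an extra hyphen earlier in the URL.
line='str3=https://my-product.org/v-4.5-bin.zip'

# 1st solution: the greedy .* means the capture lands between the LAST two hyphens.
by_sed=$(printf '%s\n' "$line" | sed -n '/^str3=/s/.*-\([^-]*\)-.*/\1/p')

# 3rd solution: splitting on every hyphen, $2 is whatever follows the FIRST hyphen.
by_awk=$(printf '%s\n' "$line" | awk -F'-' '/^str3=/{print $2}')

echo "sed: $by_sed"   # sed: 4.5
echo "awk: $by_awk"   # awk: product.org/v
```

So if hyphens can appear before the version, the sed form is probably the safer of the two.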

If you know the version contains just digits and dots, replace \S by [0-9.]. Also, match the remaining characters outside of the capture group to get it removed.
sed -n 's/^.*v-\([0-9.]*\).*/\1/p'
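To tie this back to the question, the separate grep can be folded into sed as an address. A minimal sketch, assuming the sample file from the question (recreated here in a temporary file; the variable names are only for illustration):

```shell
# Recreate the sample file from the question in a temporary location.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
str1=USER_NAME
str2=justAstring
str3=https://product.org/v-4.5-bin.zip
str4=USER_HOME
EOF

# /^str3=/ replaces the grep step; [0-9.]* captures only digits and dots,
# and the trailing .* consumes "-bin.zip" so it is dropped from the output.
version=$(sed -n '/^str3=/s/^.*v-\([0-9.]*\).*/\1/p' "$tmp")
echo "$version"   # 4.5

rm -f "$tmp"
```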

Related

How to remove string between two characters and before the first occurrence using sed

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?
You don't need to group : in your pattern and use it in substitution.
You should keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
: matches a colon and [^|]* matches 0 or more non-pipe characters
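Note that without the g flag only the first match on each line is removed, which is what the question asks for. A quick sketch, assuming a hypothetical input with two colon parts, to show the difference:

```shell
# Hypothetical input with two ":..." parts (the question's sample has only one).
s='|abc:1.2.3|def:4.5|'

# No g flag: only the first colon part is removed.
first_only=$(printf '%s\n' "$s" | sed 's/:[^|]*//')

# With g: every colon part is removed.
all=$(printf '%s\n' "$s" | sed 's/:[^|]*//g')

echo "$first_only"   # |abc|def:4.5|
echo "$all"          # |abc|def|
```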
1st solution: With awk you could try following awk program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: awk's match function looks for the regex :[^|]*, i.e. a colon followed by everything up to (but not including) the next |. Whenever the regex matches, match sets the built-in variables RSTART (start position of the match) and RLENGTH (its length), so we print the substring before the match followed by the substring after it, which drops the matched part.
2nd solution: Using the part to be removed as awk's field separator, try the following; written with your shown samples only.
awk -F':[^|]*' '{print $1 $2}' Input_file

Clean output using sed

I have a file that begins with this kind of format
INFO|NOT-CLONED|/folder/another-folder/another-folder|last-folder-name|
What I need is to read the file and get this output:
INFO|NOT-CLONED|last-folder-name
I have this so far:
cat clone_them.log | grep 'INFO|NOT-CLONED' | sed -E 's/INFO\|NOT-CLONED\|(.*)/g'
But it is not working as intended.
NOTE: the last "another-folder" and "last-folder-name" are the same.
If you want a sed solution:
$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' file
INFO|NOT-CLONED|last-folder-name
How it works:
-E
Use extended regex
-n
Don't print unless we explicitly tell it to.
s/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p
Look for lines that include INFO|NOT-CLONED| (save this in group 1) followed by anything, .*, followed by | followed by any characters not |, [^|]* (saved in group 2), followed by | at the end of the line. The replacement text is group 1 followed by group 2.
The p option tells sed to print the line if the match succeeds. Since the substitution only succeeds for lines that contain INFO|NOT-CLONED|, this eliminates the need for an extra grep process.
Variation: Returning just the last-folder-name
To just get the last-folder-name without the INFO|NOT-CLONED, we need only remove \1 from the output:
$ sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\2/p' file
last-folder-name
Since we no longer need the first capture group, we could simplify and remove the now unneeded parens so that the only capture group is the last folder name:
$ sed -En 's/INFO\|NOT-CLONED\|.*\|([^|]*)\|$/\1/p' file
last-folder-name
It's simpler in awk, as the input file is properly delimited by the | symbol. You need to tell awk that the input fields are separated by | and that the output should also remain separated by |, using FS and OFS respectively.
awk 'BEGIN{FS=OFS="|"}/INFO\|NOT-CLONED/{print $1,$2,$(NF-1)}' clone_them.log
INFO|NOT-CLONED|last-folder-name
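To check that the sed and awk answers agree and that neither needs the extra grep, here is a small sketch. The log content is made up for illustration: the first line follows the question's format, the second is a hypothetical non-matching line.

```shell
# Build a tiny stand-in for clone_them.log (contents are assumptions for this demo).
log=$(mktemp)
cat > "$log" <<'EOF'
INFO|NOT-CLONED|/folder/another-folder/last-folder-name|last-folder-name|
INFO|CLONED|/folder/other|other|
EOF

# sed: the substitution only succeeds (and prints) on NOT-CLONED lines.
sed_out=$(sed -En 's/(INFO\|NOT-CLONED\|).*\|([^|]*)\|$/\1\2/p' "$log")

# awk: the trailing | leaves an empty last field, so the name is $(NF-1).
awk_out=$(awk 'BEGIN{FS=OFS="|"} /^INFO\|NOT-CLONED\|/{print $1,$2,$(NF-1)}' "$log")

echo "$sed_out"
echo "$awk_out"

rm -f "$log"
```

Both commands should print the single line INFO|NOT-CLONED|last-folder-name for this input.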

how to grep everything between single quotes?

I am having trouble figuring out how to grep the characters between two single quotes.
I have this in a file
version: '8.x-1.0-alpha1'
and I like to have the output like this (the version numbers can be various):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I also like to remove 8.x- which I edited to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command only works on Linux; it does not work on macOS. How can I change it so that it works on macOS?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"
If you just want to have that information from the file, and only that you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F "'": Here we tell awk to define the field separator FS to be a <single quote> '. This means that all lines will be split into fields $1, $2, ... ,$NF, with a ' between each pair of fields. We can now reference these fields by using $1 for the first field, $2 for the second, and so on up to $NF, where NF is the total number of fields per line.
/version/{print $2}: This is the condition-action pair.
condition: /version/:: The condition reads: If a substring in the current record/line matches the regular expression /version/ then do action. Here, this is simply translated as if the current line contains a substring version
action: {print $2}:: If the previous condition is satisfied, then print the second field. In this case, the second field would be what the OP requests.
There are now several things that can be done.
Improve the condition to be /^version:/ && NF==3, which reads: if the current line starts with the substring version: and the current line has 3 fields, then do action.
If you only want the first occurrence, you can tell awk to exit immediately after the find by updating the action to {print $2; exit}
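Both refinements can be combined. A minimal sketch, assuming a hypothetical input with an unrelated quoted line and two version lines (only the middle line comes from the question):

```shell
# Hypothetical multi-line input; only the second line is from the question.
input="name: 'demo'
version: '8.x-1.0-alpha1'
version: '9.x-2.0'"

# Match only lines that start with "version:" and split into exactly 3 fields
# on single quotes, then stop after the first hit.
first=$(printf '%s\n' "$input" | awk -F"'" '/^version:/ && NF==3 {print $2; exit}')

echo "$first"   # 8.x-1.0-alpha1
```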
I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we are looking for "version: '" and then the \K directive will forget what it just saw, leaving .*(?=') to match up to the last single quote.
Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything up to the last ' on the line (.* is greedy)
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
When you only want to look at the quotes, you can use cut.
grep -e 'version' myfile.txt | cut -d "'" -f2
grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print matches you don't want: it prints the quoted part of every line that has 2 single quotes (') in it. And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt
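On the macOS part of the question: \s is a GNU extension, and other sed implementations are not required to support it. A sketch of a more portable variant, assuming that \s is what breaks on the asker's machine, using the POSIX class [[:space:]] and plain basic regex syntax:

```shell
line="version: '8.x-1.0-alpha1'"

# Everything between the single quotes, using only POSIX basic regex features.
full=$(printf '%s\n' "$line" | sed -n "s/^version:[[:space:]]*'\([^']*\)'.*/\1/p")

# Same, but also dropping the leading "8.x-" as in the question's second command.
short=$(printf '%s\n' "$line" | sed -n "s/^version:[[:space:]]*'8\.x-\([^']*\)'.*/\1/p")

echo "$full"    # 8.x-1.0-alpha1
echo "$short"   # 1.0-alpha1
```

Using [^']* instead of .* also stops the capture at the first closing quote rather than the last one.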

substitute a letter at a specific position in the file itself using bash

I am trying to do this:
I have a file with content like below;
file:
abcdefgh
I am looking for a way to do this;
file:
aBCdefgh
So, make the 2nd and 3rd letters uppercase in the file itself, because I have to do multiple conversions at different positions in a string in the file. Can someone please help me with how to do this?
I came to know something like this below, but it does only for a single first character of the string in the file:
sed -i 's/^./\U&/' file
output:
Abcdefgh
Thanks much!
Change your sed approach to the following:
sed -i 's/\(.\)\(..\)/\1\U\2/' file
$ cat file
aBCdefgh
matching section:
\(.\) - match the 1st char of the string into the 1st captured group
\(..\) - match the next 2 chars placing into the 2nd captured group
replacement section:
\1 - points to the 1st parenthesized group \1 i.e. the 1st char
\U\2 - uppercase the characters from the 2nd captured group \2
Bonus approach for "I want to capitalize the 105th & 106th characters":
sed -Ei 's/(.{104})(..)/\1\U\2/' file
awk on duty.
echo "abcdefgh" | awk '{print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}'
Output will be as follows.
aBCdefgh
In case you have an Input_file and you want to save the edits into the same Input_file.
awk '{print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}' Input_file > temp_file && mv temp_file Input_file
Explanation: Please do not run the code below; it is annotated for explanation purposes only.
echo "abcdefgh" ##using echo command to print a string on the standard output.
| ##Pipe(|) is used for taking a command's standard output to pass as a standard input to another command(in this case echo is passing it's standard output to awk).
awk '{ ##Starting awk here.
##Print command in awk is being used to print anything variable, string etc etc.
##substr is awk's built-in function which lets us get specific parts of a line or variable. Its syntax is substr(line/variable, starting position, number of characters you need from the starting position); if you don't give a number of characters, it takes everything from the starting position to the end of the line.
##toupper is also a built-in awk function; it converts any text passed to it to UPPER CASE, so in this case I am passing the 2nd and 3rd characters to it as per OP's request.
print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}'
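Since the question mentions several conversions at different positions, the same substr/toupper idea can be wrapped in a small helper. This is only a sketch: upper_range is a hypothetical function name, and it takes a 1-based start position and a character count.

```shell
# upper_range START COUNT  (hypothetical helper, reads stdin, writes stdout)
# Uppercases COUNT characters beginning at 1-based position START on every line.
upper_range() {
  awk -v s="$1" -v n="$2" \
    '{print substr($0,1,s-1) toupper(substr($0,s,n)) substr($0,s+n)}'
}

# 2nd and 3rd characters, as in the question.
once=$(printf '%s\n' abcdefgh | upper_range 2 2)

# Several positions: chain the helper.
twice=$(printf '%s\n' abcdefgh | upper_range 2 2 | upper_range 6 1)

echo "$once"    # aBCdefgh
echo "$twice"   # aBCdeFgh
```

To change the file itself, the output could be redirected to a temporary file and moved over the original, as in the awk answer above.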

How to extract specific string in a file using awk, sed or other methods in bash?

I have a file with the following text (multiple lines with different values):
TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...
I want to extract the value of start_token and end_token. I tried awk and cut, but I am not able to figure out the best way to extract the targeted values.
Something like:
cat filename| get the values of start_token and end_token
grep -oP '(?<=token:)\d+' filename
Explanation:
-o: print only part that matches, not complete line
-P: use Perl regex engine (for look-around)
(?<=token:): positive look-behind – zero-width pattern
\d+: one or more digits
Result:
8050285221437500528
8051783269940793406
A (potentially more efficient) variant of this, as pointed out by hek2mgl in his comment, uses \K, the variable-width look-behind:
grep -oP 'token:\K\d+'
\K keeps everything that has been matched to the left of it, but does not include it in the match (see perlre).
Using awk:
awk -F '[(:,]' '{print $3, $5}' file
8050285221437500528 8051783269940793406
First value is start_token and last value is end_token.
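If the two values are needed separately in a script, the awk output can be split with parameter expansion. A minimal sketch using the line from the question exactly as posted (including its trailing "..."); the variable names are assumptions for illustration:

```shell
# The sample line from the question, kept verbatim.
line='TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...'

# awk prints "start end" separated by a single space.
tokens=$(printf '%s\n' "$line" | awk -F '[(:,]' '{print $3, $5}')

start_token=${tokens%% *}   # everything before the first space
end_token=${tokens##* }     # everything after the last space

echo "start=$start_token end=$end_token"
```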
a sed version
sed -e '/^TokenRange(/!d' -e 's/.*:\([0-9]*\),.*:\([0-9]*\),.*/\1 \2/' YourFile

Resources