how to grep everything between single quotes? - bash

I am having trouble figuring out how to grep the characters between two single quotes.
I have this in a file
version: '8.x-1.0-alpha1'
and I would like the output to look like this (the version number can vary):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I would also like to remove 8.x-, so I edited the command to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command works on Linux but not on macOS. How can I change it so that it also works on macOS?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"

If you just want that information from the file, and nothing else, you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F"'": Here we tell awk to set the field separator FS to a single quote '. This means every line is split into fields $1, $2, ..., $NF, with a ' between adjacent fields. We can then refer to the first field as $1, the second as $2, and so on up to $NF, where NF is the total number of fields on the line.
/version/{print $2}: This is the condition-action pair.
condition: /version/:: The condition reads: if a substring of the current record/line matches the regular expression /version/, then do action. Here, this simply means: if the current line contains the substring version.
action: {print $2}:: If the previous condition is satisfied, then print the second field. In this case, the second field would be what the OP requests.
There are now several things that can be done.
Improve the condition to /^version:/ && NF==3, which reads: if the current line starts with the substring version: and has 3 fields, then do action.
If you only want the first occurrence, you can tell awk to exit immediately after the first match by changing the action to {print $2; exit}.
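Putting both refinements together gives the following minimal sketch. The function name extract_version is made up for illustration, and the pattern assumes the line looks exactly like the sample (version: at the start of the line, with exactly one quoted value).

```shell
# Hypothetical helper (not from the question): print the first quoted
# version string. Reads the named files, or stdin when none are given.
extract_version() {
    # -F"'"        split each line on single quotes
    # /^version:/  only lines that start with "version:"
    # NF==3        exactly one quoted value: text, value, (empty) rest
    # exit         stop after the first match
    awk -F"'" '/^version:/ && NF==3 {print $2; exit}' "$@"
}

printf "version: '8.x-1.0-alpha1'\n" | extract_version
```

Called with a file name instead (extract_version myfile.txt), it reads that file rather than stdin.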

I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we look for "version: '"; the \K directive then discards what has been matched so far, leaving .*(?=') to match everything up to, but not including, the last single quote.

Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything up to the last ' on the line (.* is greedy)
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
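The question's addition asks for a variant that also works on macOS. \s is not part of POSIX regular expressions, which is a likely reason the command fails with BSD sed. A minimal sketch that sticks to POSIX features uses the class [[:space:]] instead. The function name quoted_version is an illustrative assumption, and [^']* is used so the capture stops at the first closing quote.

```shell
# Hypothetical helper: print the text between the single quotes on the
# "version:" line. Reads the named files, or stdin when none are given.
quoted_version() {
    # [[:space:]]*  POSIX replacement for the GNU-only \s*
    # \([^']*\)     capture everything up to the next single quote
    sed -n "s/^version:[[:space:]]*'\([^']*\)'.*/\1/p" "$@"
}

printf "version: '8.x-1.0-alpha1'\n" | quoted_version

# To also drop the leading "8.x-", a second sed can be chained:
printf "version: '8.x-1.0-alpha1'\n" | quoted_version | sed 's/^8\.x-//'
```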

When you only want to look at the quotes, you can use cut.
grep -e 'version' myfile.txt | cut -d "'" -f2

grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print things you don't want: it prints the quoted part of every line that contains two single quotes ('). And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt

Related

sed extract part of string from a file

I've been trying to extract only part of a string from a file that looks like this:
str1=USER_NAME
str2=justAstring
str3=https://product.org/v-4.5-bin.zip
str4=USER_HOME
I need to extract ONLY the version - in this case: 4.5
I did it by grep and then sed but now the output is 4.5-bin.zip
-> grep str3 file.txt
str3=https://product.org/v-4.5-bin.zip
-> echo str3=https://product.org/v-4.5-bin.zip | sed -n "s/^.*v-\(\S*\)/\1/p"
4.5-bin.zip
What should I do in order to remove also the -bin.zip at the end?
Thanks.
1st solution: With your shown samples, please try following sed code.
sed -n '/^str3=/s/.*-\([^-]*\)-.*/\1/p' Input_file
Explanation: sed's -n option suppresses the default printing of every line, so only explicitly printed lines appear. The address /^str3=/ restricts the substitution to lines that start with str3=. Because the leading .* is greedy, the capture group holds the text between the last two hyphens on the line (the only two in the sample); the whole line is replaced with it via \1, and the p flag prints the result.
2nd solution: Using GNU grep you could try following grep program.
grep -oP '^str3=.*?-\K([^-]*)' Input_file
3rd solution: Using an awk program to get the expected output as per the shown samples.
awk -F'-' '/^str3=/{print $2}' Input_file
4th solution: Using awk's match function to get expected results with help of using RSTART and RLENGTH variables which get set once a TRUE match is found by match function.
awk 'match($0,/^str3=.*-/){split(substr($0,RSTART,RLENGTH),arr,"-");print arr[2]}' Input_file
If you know the version contains just digits and dots, replace \S by [0-9.]. Also, match the remaining characters outside of the capture group to get it removed.
sed -n 's/^.*v-\([0-9.]*\).*/\1/p'
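The last suggestion can be combined with a line address so that only the str3 line is considered. This is a minimal sketch, assuming the version always follows v- and consists of digits and dots only; the function name product_version is hypothetical.

```shell
# Hypothetical helper: print the dotted version number that follows "v-"
# on the str3= line. Reads the named files, or stdin when none are given.
product_version() {
    # /^str3=/     only touch the str3 line
    # \([0-9.]*\)  capture the digits and dots right after "v-"
    # .*           consume the rest of the line so it is dropped
    sed -n '/^str3=/s/^.*v-\([0-9.]*\).*/\1/p' "$@"
}

printf 'str2=justAstring\nstr3=https://product.org/v-4.5-bin.zip\n' | product_version
```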

Delete words in a line using grep or sed

I want to delete three words with a special character on a line such as
Input:
\cf4 \cb6 1749,1789 \cb3 \
Output:
1749,1789
I have tried a couple sed and grep statements but so far none have worked, mainly due to the character \.
My unsuccessful attempt:
sed -i 's/ [.\c ] //g' inputfile.ext >output file.ext
Awk accepts a regex Field Separator (in this case, comma or space):
$ awk -F'[ ,]' '$0 = $3 "." $4' <<< '\cf4 \cb6 1749,1789 \cb3 \'
1749.1789
-F'[ ,]' - Use a single character from the set space/comma as Field Separator
$0 = $3 "." $4 - Set the entire line $0 to Field 3 ($3), followed by a literal period ".", followed by Field 4 ($4). The assignment doubles as the pattern: when its result is a non-empty string, awk performs the default action (print the entire line)
Replace <<< 'input' with file if every line of that file has the same delimiters (spaces/comma) and number of fields. If your input file is more complex than the sample you shared, please edit your question to show actual input.
The backslash is a special meta-character that confuses bash.
We treat it like any other meta-character, by escaping it, with--you guessed it--a backslash!
But first, we need to grep this pattern out of our file
grep -E '\\... \\... [0-9]+,[0-9]+ \\... \\' our_file # Close enough! -E makes + mean "one or more"
Now, just sed out those pesky backslashes
| sed -e 's/\\//g' # Don't forget the g, otherwise it'll only strip out 1 backslash
Now, finally, sed out the clusters of 2 letters followed by a digit and a space!
| sed -e 's/[a-z][a-z][0-9] //g'
And, finally....
grep -E '\\... \\... [0-9]+,[0-9]+ \\... \\' our_file | sed -e 's/\\//g' | sed -e 's/[a-z][a-z][0-9] //g'
Output:
1749,1789
My guess is you are having trouble because you have backslashes in input and can't figure out how to get backslashes into your regex. Since backslashes are escape characters to shell and regex you end up having to type four backslashes to get one into your regex.
Ben Van Camp already posted an answer that uses single quotes to make the escaping a little easier; however I shall now post an answer that simply avoids the problem altogether.
grep -o '[0-9]*,[0-9]*' | tr , .
Locks on to the comma and selects the digits on either side and outputs the number. Alternately if comma is not guaranteed we can do it this way:
grep -Eo ' [0-9,]+|^[0-9,]+' | tr , . | tr -d ' '
Both of these assume there's only one usable number per line.
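The escaping point made above can be sketched as follows: inside single quotes the shell leaves backslashes alone, so one regex-level escape (\\) is enough to match a literal backslash. The function name strip_control_words is hypothetical, and the pattern assumes the control words look like those in the sample (a backslash, optional lowercase letters, optional digits).

```shell
# Hypothetical helper: remove backslash-introduced control words such as
# \cf4 or \cb6 and keep whatever else is on the line.
strip_control_words() {
    # \\[a-z]*[0-9]* *  a backslash, optional letters and digits, then spaces
    # s/ *$//           trim the spaces left at the end of the line
    sed -e 's/\\[a-z]*[0-9]* *//g' -e 's/ *$//' "$@"
}

printf '%s\n' '\cf4 \cb6 1749,1789 \cb3 \' | strip_control_words
```

This keeps the comma as it is in the question's expected output; piping the result through tr , . would turn it into a period as in the answers above.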
$ awk '{sub(/,/,".",$3); print $3}' file
1749.1789
$ sed 's/\([^ ]* \)\{2\}\([^ ]*\).*/\2/; s/,/./' file
1749.1789

How do I separate a link to get the end of a URL in shell?

I have some data that looks like this
"thumbnailUrl": "http://placehold.it/150/adf4e1"
I want to know how I can get the trailing part of the URL, I want the output to be
adf4e1
I was trying to grep when starting with / and ending with " but I'm only a beginner in shell scripting and need some help.
I came up with a quick and dirty solution, using grep (with the \w shorthand, a GNU extension) and cut:
$ cat file
"thumbnailUrl": "http://placehold.it/150/adf4e1"
"anotherUrl": "http://stackoverflow.com/questions/3979680"
"thumbnailUrl": "http://facebook.com/12f"
"randortag": "http://google.com/this/is/how/we/roll/3fk19as1"
$ cat file | grep -o '/\w*"$' | cut -d'/' -f2- | cut -d'"' -f1
adf4e1
3979680
12f
3fk19as1
We could kill this with a thousand little cuts, or just one blow from Awk:
awk -F'[/"]' '{ print $(NF-1); }'
Test:
$ echo '"thumbnailUrl": "http://placehold.it/150/adf4e1"' \
| awk -F'[/"]' '{ print $(NF-1); }'
adf4e1
Filter through Awk using double quotes and slashes as field separators. This means that the trailing part ../adf4e1" is separated as {..}</>{adf4e1}<">{} where curly braces denote fields and angle brackets separators. The Awk variable NF gives the 1-based number of fields and so $NF is the last field. That's not the one we want, because it is blank; we want $(NF-1): the second last field.
"Golfed" version:
awk -F[/\"] '$0=$(NF-1)'
If the original string is coming from a larger JSON object, use something like jq to extract the value you want.
For example:
$ jq -n '{thumbnail: "http://placehold.it/150/adf4e1"}' |
> jq -r '.thumbnail|split("/")[-1]'
adf4e1
(The first command just generates a valid JSON object representing the original source of your data; the second command parses it and extracts the desired value. The split function splits the URL into an array, from which you only care about the last element.)
You can also do this purely in bash using string replacement and substring removal if you wrap your string in single quotes and assign it to a variable.
#!/bin/bash
string='"thumbnailUrl": "http://placehold.it/150/adf4e1"'
string="${string//\"}"
echo "${string##*/}"
adf4e1 #output
You can do that using the cut command in Linux. Cut on '/' and keep the last piece. Try it, it's fun!
Refer http://www.thegeekstuff.com/2013/06/cut-command-examples
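cut has no option that selects the last field directly, so "keep the last cut" needs a small trick. One way, sketched here under the assumption that the rev utility is available, is to reverse the line so that the last field becomes the first. The function name last_url_segment is illustrative.

```shell
# Hypothetical helper: print the last /-separated piece of each input line.
last_url_segment() {
    # tr -d '"'  drop the double quotes
    # rev        reverse each line, so the last field comes first
    # cut        keep that first field
    # rev        reverse it back into reading order
    tr -d '"' | rev | cut -d/ -f1 | rev
}

printf '%s\n' '"thumbnailUrl": "http://placehold.it/150/adf4e1"' | last_url_segment
```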

shell script cut from variables

The file is like this
aaa&123
bbb&234
ccc&345
aaa&456
aaa$567
bbb&678
I want to output the text after & from the lines that contain "aaa":
123
456
I want to do it in a shell script.
Consider the following code:
#!/bin/bash
raw=$(grep 'aaa' 1.txt)
var=$(cut -f2 -d"&" "$raw")
echo $var
It gives me an error like
cut: aaa&123
aaa&456
aaa$567: No such file or directory
How do I fix it? And how do I cut (or grep, or similar) from an existing variable?
Many thanks!
With GNU grep:
grep -oP 'aaa&\K.*' file
Output:
123
456
\K: discard everything matched so far (here aaa&), so that only the text after it is reported as the match
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
-P, --perl-regexp
Interpret PATTERN as a Perl compatible regular expression (PCRE)
Cyrus has my vote. An awk alternative if GNU grep is not available:
awk -F'&' 'NF==2 && $1 ~ /aaa/ {print $2}' file
Using & as the field separator, for lines that have 2 fields (i.e. & must be present) and whose first field contains "aaa", print the 2nd field.
The error in your script is that cut treats its argument, the grep output, as a filename. What you want is this:
grep 'aaa.*&' file | cut -d'&' -f2
The pattern means "aaa appears before an &"
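To cut from text that is already in a variable, as the question asks, the text has to reach cut on standard input rather than as an argument. A minimal sketch follows; the function name after_amp is hypothetical, and -s tells cut to skip lines that lack the delimiter, such as aaa$567.

```shell
# Hypothetical helper: print the second &-separated field of every line
# held in the string passed as $1.
after_amp() {
    # printf feeds the variable's contents to cut on stdin
    printf '%s\n' "$1" | cut -s -d'&' -f2
}

# Stands in for: raw=$(grep 'aaa' 1.txt)
raw=$(printf '%s\n' 'aaa&123' 'aaa&456' 'aaa$567')
after_amp "$raw"
```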

Display all fields except the last

I have a file as shown below
1.2.3.4.ask
sanma.nam.sam
c.d.b.test
I want to remove the last field from each line. The delimiter is . and the number of fields is not constant.
Can anybody help me with an awk or sed to find out the solution. I can't use perl here.
Both these sed and awk solutions work independent of the number of fields.
Using sed:
$ sed -r 's/(.*)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
Note: -r is the flag for extended regexp; it could be -E, so check with man sed. If your version of sed doesn't have a flag for this, then just escape the parentheses:
sed 's/\(.*\)\..*/\1/' file
1.2.3.4
sanma.nam
c.d.b
The sed solution does a greedy match up to the last . and captures everything before it, then replaces the whole line with the captured part (n-1 fields). Use the -i option if you want the changes to be stored back to the file.
Using awk:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file
1.2.3.4
sanma.nam
c.d.b
The awk solution simply prints the first n-1 fields. To store the changes back to the file, use redirection:
$ awk 'BEGIN{FS=OFS="."}{NF--; print}' file > tmp && mv tmp file
Reverse, cut, reverse back.
rev file | cut -d. -f2- | rev >newfile
Or, replace from last dot to end with nothing:
sed 's/\.[^.]*$//' file >newfile
The regex [^.] matches one character which is not dot (or newline). You need to exclude the dot because the repetition operator * is "greedy"; it will select the leftmost, longest possible match.
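The same "remove from the last dot to the end" idea can also be sketched in plain POSIX shell, without sed or awk. This is an alternative to the tools the question asked for, and the function name strip_last_field is hypothetical.

```shell
# Hypothetical helper: drop the last .-separated field of each stdin line.
strip_last_field() {
    # ${line%.*} removes the shortest suffix matching ".*" (a glob here,
    # not a regex): the last dot and everything after it.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line%.*}"
    done
}

printf '%s\n' 1.2.3.4.ask sanma.nam.sam c.d.b.test | strip_last_field
```

Lines that contain no dot pass through unchanged, which matches the behavior of the sed command above.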
With cut on the reversed string
cat yourFile | rev | cut -d "." -f 2- | rev
If you want to keep the "." use below:
awk '{gsub(/[^\.]*$/,"");print}' your_file

Resources