Parse file by splitting string in file and get desired output using single command - bash

I'm using bash to look into a file and parse the results. Can someone tell me how to use cut/awk to split the string and get the desired output with a single command? I can get the output below with two separate cut commands plus concatenation, but I want to do it with one command instead of two.
test.log:
1/98 | (PASSED) com.yahoo.qa.java.projects.stackoverview.questions.Password_01() | 21:20:20
Tried code:
str1=`cat test.log | tail -1 | cut -d '|' -f 1`
str2=`cat test.log | tail -1 | cut -d '|' -f 2 | sed -e 's/com.yahoo.qa.java.projects./''/g'`
str3="${str1} | ${str2}"
Expected:
1/98 | (PASSED) stackoverview.questions.Password_01

Since this is a simple substitution on an individual line it's better suited to sed than awk and not at all appropriate for cut:
$ sed 's/\(.*| [^ ]* \)com\.yahoo\.qa\.java\.projects\.\([^(]*\).*/\1\2/' file
1/98 | (PASSED) stackoverview.questions.Password_01
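Since the original attempt used tail -1 to pick the last line, the same substitution can be restricted to the last line inside sed itself. This is a minimal sketch, not the answerer's exact command: the temporary file and its first line are made up for illustration, and only the second line comes from the question.

```shell
# Hypothetical sample log; only the last line is taken from the question.
log=$(mktemp)
printf '%s\n' \
  '1/97 | (FAILED) com.yahoo.qa.java.projects.stackoverview.questions.Login_01() | 21:19:58' \
  '1/98 | (PASSED) com.yahoo.qa.java.projects.stackoverview.questions.Password_01() | 21:20:20' > "$log"

# The $ address selects the last line; -n plus the p flag prints only that line.
result=$(sed -n '$s/\(.*| [^ ]* \)com\.yahoo\.qa\.java\.projects\.\([^(]*\).*/\1\2/p' "$log")

echo "$result"
rm -f "$log"
```

The first capture group keeps everything up to and including the status word, and the second keeps the class path up to the opening parenthesis.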

The following single awk command may help:
awk 'END{sub(/com\.yahoo\.qa\.java\.projects\./,"",$4);print $1,$2,$3,$4}' Input_file
Or, to work in every awk, including those that do not retain the last record in the END block (as per Sir Ed's suggestion):
awk '{value=$0} END{split(value, a," ");sub(/com\.yahoo\.qa\.java\.projects\./,"",a[4]);print a[1],a[2],a[3],a[4]}' Input_file

Using awk
$ awk -F "com[.]yahoo[.]qa[.]java[.]projects[.]" -v OFS= 'sub(/\(\).*/,"",$2)' file
1/98 | (PASSED) stackoverview.questions.Password_01

Related

How to grep only matching string from this result?

I am just simply trying to grab the commit ID, but not quite sure what I'm missing:
➜ ~ curl https://github.com/microsoft/vscode/releases -s | grep -oE 'microsoft/vscode/commit/(.*?)/hovercard'
microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard
The only thing I need back from this is ccbaa2d27e38e5afa3e5c21c1c7bef4657064247.
This works just fine on regex101.com and in ruby/python. What am I missing?
If supported, you can use grep -oP
echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | grep -oP "microsoft/vscode/commit/\K.*?(?=/hovercard)"
Output
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
Another option is to use sed with a capture group
echo "microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard" | sed -E 's/microsoft\/vscode\/commit\/([^\/]+)\/hovercard/\1/'
Output
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
The point is that grep does not support extracting capturing group submatches. If you install pcregrep you could do that with
curl https://github.com/microsoft/vscode/releases -s | \
pcregrep -o1 'microsoft/vscode/commit/(.*?)/hovercard' | head -1
The | head -1 part is to fetch the first occurrence only.
I would suggest using awk here:
awk 'match($0,/microsoft\/vscode\/commit\/[^\/]*\/hovercard/){print substr($0,RSTART+24,RLENGTH-34);exit}'
The regex will match a line containing
microsoft\/vscode\/commit\/ - microsoft/vscode/commit/ fixed string
[^\/]* - zero or more chars other than /
\/hovercard - a /hovercard string.
substr($0,RSTART+24,RLENGTH-34) prints the part of the line that starts at index RSTART+24 (24 is the length of microsoft/vscode/commit/) and is RLENGTH-34 characters long (34 is the length of microsoft/vscode/commit/ plus the length of /hovercard), which leaves just the commit ID.
The exit command will fetch you the first occurrence. Remove it if you need all occurrences.
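The offsets 24 and 34 can also be computed from the fixed strings instead of being hard-coded. The following is a sketch of that idea; the sample input line and the variable names prefix, suffix, p, and s are assumptions made for illustration.

```shell
# Hypothetical line of HTML containing one commit link.
line='<a href="/microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard">'
prefix='microsoft/vscode/commit/'
suffix='/hovercard'

# Skip length(p) characters from the start of the match and drop
# length(p) + length(s) characters from the matched length.
commit=$(printf '%s\n' "$line" | awk -v p="$prefix" -v s="$suffix" '
  match($0, /microsoft\/vscode\/commit\/[^\/]*\/hovercard/) {
    print substr($0, RSTART + length(p), RLENGTH - length(p) - length(s))
    exit
  }')

echo "$commit"
```

Deriving the numbers with length() keeps the command correct if the prefix or suffix ever changes, as long as the regex is updated to match.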
You can use sed:
curl -s https://github.com/microsoft/vscode/releases |
sed -En 's=.*microsoft/vscode/commit/([^/]+)/hovercard.*=\1=p' |
head -n 1
head -n 1 prints only the first match (there are 10).
grep -o will print (only) everything that matches, including microsoft/ etc.
Your task cannot be achieved with Mac's grep. grep -o prints all matching text (instead of the default behaviour of printing matching lines), including microsoft/ etc. A grep that implements Perl-compatible regexes (like GNU grep on Linux) could use lookahead/lookbehind (grep -Po '(?<=microsoft/vscode/commit/)[^/]+(?=/hovercard)'), but that is not available in Mac's grep.
On macOS you don't have GNU utilities available by default. You can just pipe your output to a simple awk like this:
curl https://github.com/microsoft/vscode/releases -s |
grep -oE 'microsoft/vscode/commit/[^/]+/hovercard' |
awk -F/ '{print $(NF-1)}'
ccbaa2d27e38e5afa3e5c21c1c7bef4657064247
3a6960b964327f0e3882ce18fcebd07ed191b316
f4af3cbf5a99787542e2a30fe1fd37cd644cc31f
b3318bc0524af3d74034b8bb8a64df0ccf35549a
6cba118ac49a1b88332f312a8f67186f7f3c1643
c13f1abb110fc756f9b3a6f16670df9cd9d4cf63
ee8c7def80afc00dd6e593ef12f37756d8f504ea
7f6ab5485bbc008386c4386d08766667e155244e
83bd43bc519d15e50c4272c6cf5c1479df196a4d
e7d7e9a9348e6a8cc8c03f877d39cb72e5dfb1ff
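When a match is already stored in a shell variable, POSIX parameter expansion can strip the fixed prefix and suffix without grep -P or any other external tool. This is a different technique from the answers above, shown only as a minimal sketch; the variable names match and commit are illustrative.

```shell
# One match as printed by: grep -oE 'microsoft/vscode/commit/[^/]+/hovercard'
match='microsoft/vscode/commit/ccbaa2d27e38e5afa3e5c21c1c7bef4657064247/hovercard'

# ${var#pattern} removes the shortest matching prefix,
# ${var%pattern} removes the shortest matching suffix.
commit=${match#microsoft/vscode/commit/}
commit=${commit%/hovercard}

echo "$commit"
```

This works the same way in bash and zsh, so it does not depend on which grep is installed.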

I am having trouble removing new lines from text in a shell script (using sed and grep)

I am using zsh on macOS
I have a shell script that produces a text file with speedtest results in the following layout:
Download: 63.57 Mbps (data used: 69.3 MB)
Upload: 16.11 Mbps (data used: 23.0 MB)
I can manipulate the layout and produce this:
↓ 63.57 Mbps |
↑ 16.11 Mbps
Note the line break before the first line of text and the one after the pipe. In the Terminal only the final line is printed out: ↑ 16.11 Mbps
The script to transform the input is this:
DOWNLOAD=$(cat ~/Terminal_Projects/temp_speedtest_result.txt | grep Download | sed 's/ Download: /↓ /g' | sed 's/ (data used: //g' | sed -E 's/[0-9]{1,4}\.[0-9] MB)//g' | sed 's/\n\r\t//')
UPLOAD=$(cat ~/Terminal_Projects/temp_speedtest_result.txt | grep Upload | sed 's/ Upload: /↑ /g' | sed 's/ (data used: //g' | sed -E 's/[0-9]{1,4}\.[0-9] MB)//g' | tr '\n' ' ')
RESULT=$DOWNLOAD" | "$UPLOAD
echo $RESULT
I used multiple instances of sed because I couldn't get it to work in just one instance. You may know how to get it to work.
What I want to do is output the DOWNLOAD and UPLOAD variables on a single line. I have another very similar script that achieves that with exactly the same manipulation of variables.
What I have tried:
Using RESULT="$DOWNLOAD | $UPLOAD"
Using RESULT="${DOWNLOAD} | ${UPLOAD}"
Using tr '\n' ' ' instead of the sed command to remove \n
I tried removing the up and down arrows in case those symbols aren't supported - same behaviour.
I have tried using sed on the RESULT variable to try removing new lines. I also tried writing the contents of the RESULT variable to a new temp txt file and then retrieving the contents of the file and using grep to extract the results one by one in the hope the new lines would not be copied. Didn't work for me.
It looks like there are line breaks that I have been unable to remove but I could be wrong.
I am new to command line and shell scripts. Trying to apply my very limited knowledge to a new scenario. Any help would be appreciated.
tr seems to do the job and echo -n "$UPLOAD" shows on a single line, so I think you're on the right track and only need to fix the DOWNLOAD part.
I suggest you simplify the script a bit using something along these lines:
# A quoted ~ is not expanded, so use $HOME instead.
INPUT_FILE="$HOME/Terminal_Projects/temp_speedtest_result.txt"
# Command substitution already strips the trailing newline, so tr is not needed here.
DOWNLOAD="$(grep Download "$INPUT_FILE" | awk '{print "↓ " $2 " " $3}')"
UPLOAD="$(grep Upload "$INPUT_FILE" | awk '{print "↑ " $2 " " $3}')"
echo "$DOWNLOAD | $UPLOAD"
How about
RESULT="$(grep -Ew '(Down|Up)load' <~/Terminal_Projects/temp_speedtest_result.txt | tr '\n' ' ')"
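? This is more efficient (only one grep and one tr process are needed) and joins the two matching lines as they are. Note that the pipe in RESULT=$DOWNLOAD" | "$UPLOAD sits inside double quotes, so the shell treats it as a literal character; that assignment is valid, although quoting the whole right-hand side is easier to read.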
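The whole transformation can also be done in a single awk call that builds both halves and prints them on one line. This is a sketch under assumptions: the temporary file stands in for temp_speedtest_result.txt, and the leading whitespace in the sample lines is a guess at the real layout.

```shell
# Hypothetical copy of the speedtest output from the question.
input=$(mktemp)
printf '%s\n' \
  '    Download:    63.57 Mbps (data used: 69.3 MB)' \
  '      Upload:    16.11 Mbps (data used: 23.0 MB)' > "$input"

# awk splits on runs of whitespace, so $2 is the speed and $3 is the unit.
result=$(awk '
  $1 == "Download:" { down = "↓ " $2 " " $3 }
  $1 == "Upload:"   { up   = "↑ " $2 " " $3 }
  END { print down " | " up }
' "$input")

echo "$result"
rm -f "$input"
```

Because awk selects fields rather than deleting the unwanted text, no separate sed or tr step is needed to clean up the line.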

Sed output a value between two matching strings in a url

I have multiple urls as input
https://drive.google.com/a/domain.com/file/d/1OR9QLGsxiLrJIz3JAdbQRACd-G9ZfL3O/view?usp=drivesdk
https://drive.google.com/a/domain.com/file/d/1sEWMFqGW9p2qT-8VIoBesPlVJ4xvOzXD/view?usp=drivesdk
How can I create a sed command to simply return only the file ID
desired output:
1OR9QLGsxiLrJIz3JAdbQRACd-G9ZfL3O
1sEWMFqGW9p2qT-8VIoBesPlVJ4xvOzXD
Looks like I need to start between /d/ and stop at /view but I'm not quite sure how to do that.
I've tried: sed -e 's/d\(.*\)\/view/\1/'
I was able to do this with cut -d '/' -f 8
also awk -F/ '{print $8}' file worked, thanks!
Your command was almost right:
# Wrong
sed -e 's/d\(.*\)\/view/\1/'
# better, removing unmatched stuff including the / after the d
sed -e 's/.*d\/\(.*\)\/view.*/\1/'
# better: using # for making the command easier to read
sed -e 's#.*d/\(.*\)/view.*#\1#'
# Alternative: using cut when you don't know which field /d/ is
some_stream | grep -Eo '/d/.*/view' | cut -d/ -f3
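As a quick check of the corrected sed command, the sketch below runs a slightly tighter variant on the two URLs from the question. Using [^/]* instead of .* for the ID and adding -n with the p flag (so lines without /d/.../view print nothing) are my own adjustments, not part of the answer.

```shell
# The two sample URLs from the question.
urls='https://drive.google.com/a/domain.com/file/d/1OR9QLGsxiLrJIz3JAdbQRACd-G9ZfL3O/view?usp=drivesdk
https://drive.google.com/a/domain.com/file/d/1sEWMFqGW9p2qT-8VIoBesPlVJ4xvOzXD/view?usp=drivesdk'

# Capture the path segment between /d/ and /view; # is the s command delimiter.
ids=$(printf '%s\n' "$urls" | sed -n 's#.*/d/\([^/]*\)/view.*#\1#p')

echo "$ids"
```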

grep return the string in between words

I am trying to use grep to filter out the RDS snapshot identifier from the rds describe-db-snapshots command output below:
"arn:aws:rds:ap-southeast-1:123456789:snapshot:rds:apple-pie-2018-05-06-17-12",
"rds:apple-pie-2018-05-06-17-12",
How can I return exactly the following output?
rds:apple-pie-2018-05-06-17-12
I tried using
grep -Eo ",rds:"
but it does not work.
The following awk may also help:
awk 'match($0,/^"rds[^"]*/){print substr($0,RSTART+1,RLENGTH-1)}' Input_file
Your grep -Eo ",rds:" fails for several reasons:
You did not add a " in the string to match
Between the comma and rds you need to match the character.
You are trying to match the comma that can be on the previous line
Your sample input is 2 lines (with a newline in between), perhaps the real input is without the newline.
You want to match until the next double quote.
You can support both input-styles (with/without newline) with
grep -Eo '(,|^)"rds:[^"]*' rdsfile |cut -d'"' -f2
You can do this in one command with
sed -rn 's/.*(,|^)"(rds:[^"]*).*/\2/p' rdsfile
EDIT: Manipulating stdout instead of the file works with similar commands:
yourcommand | grep -Eo '(,|^)"rds:[^"]*' |cut -d'"' -f2
# or
yourcommand | sed -rn 's/.*(,|^)"(rds:[^"]*).*/\2/p'
You can also test the original commands with yourcommand > rdsfile.
If rdsfile is missing data that you saw on the screen, that data was written to stderr; in that case add 2>&1:
yourcommand 2>&1 | grep -Eo '(,|^)"rds:[^"]*' |cut -d'"' -f2
# or
yourcommand 2>&1 | sed -rn 's/.*(,|^)"(rds:[^"]*).*/\2/p'
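To see that the grep and cut pipeline handles both input styles, it can be wrapped in a small function and fed each layout. This is a sketch: the function name extract_snapshot_id is hypothetical, and the single-line sample is simply the two lines from the question joined together.

```shell
# Hypothetical helper wrapping the grep | cut pipeline; reads stdin.
extract_snapshot_id() {
  grep -Eo '(,|^)"rds:[^"]*' | cut -d'"' -f2
}

# Style 1: the identifier is on its own line.
two_lines=$(printf '%s\n' \
  '"arn:aws:rds:ap-southeast-1:123456789:snapshot:rds:apple-pie-2018-05-06-17-12",' \
  '"rds:apple-pie-2018-05-06-17-12",' | extract_snapshot_id)

# Style 2: everything is on one line, so the identifier follows a comma.
one_line=$(printf '%s\n' \
  '"arn:aws:rds:ap-southeast-1:123456789:snapshot:rds:apple-pie-2018-05-06-17-12","rds:apple-pie-2018-05-06-17-12",' \
  | extract_snapshot_id)

echo "$two_lines"
echo "$one_line"
```

In both cases the ARN is skipped because its rds: is not directly preceded by a double quote at the start of the line or after a comma.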

Shell script for string search between particular lines, timestamps

I have a file with more than 10000 lines. I am trying to search for a string in between particular set of lines, between 2 timestamps.
I am using sed command to achieve this.
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' filename | grep -i "string"
The above command returns matches, but it considers the entire file rather than only the lines I specified.
Is there a way to achieve this?
Please help
I think the problem is here:
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' filename |
                                                                     ^^^^^^^^
You want to pipe the output of your first sed command into the second. As written, the second sed is still given filename as an argument, so it ignores its standard input: the output of the first sed is discarded and the whole file is scanned again.
Try this:
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' | grep -i "string"
Any time you find yourself chaining together pipes of seds and greps, stop and just use one awk command instead:
awk -v IGNORECASE=1 '/2015-08-12/{f=1} f&&/string/; /2015-09-12/||(NR==4133){exit}' file
The above uses GNU awk for IGNORECASE, with other awks you'd just change /string/ to tolower($0)~/string/.
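The portable variant mentioned above can be tried on a small file. This is a sketch only: the log lines are invented for illustration, and the line limit is passed in through a hypothetical awk variable named last instead of being written into the program.

```shell
# Hypothetical log; the dates mirror the two timestamps from the question.
log=$(mktemp)
printf '%s\n' \
  '2015-08-11 String before the window' \
  '2015-08-12 window opens' \
  '2015-08-20 found STRING here' \
  '2015-09-12 window closes with string' \
  '2015-09-13 string after the window' > "$log"

# f is set at the first timestamp; matching lines print while f is set;
# awk stops at the second timestamp or at line number "last".
hits=$(awk -v last=4133 '
  /2015-08-12/ { f = 1 }
  f && tolower($0) ~ /string/
  /2015-09-12/ || NR == last { exit }
' "$log")

echo "$hits"
rm -f "$log"
```

Using tolower($0) makes the match case-insensitive in any POSIX awk, so IGNORECASE is not required.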
