Read file until match pattern - bash
I'm reading a file and would like to extract everything up to a matching pattern.
Given the file at https://ufile.io/182kx
I would like to get the JSON that follows lastActiveTimes: up to, but not including, ,chatNotif:0
The expected result is:
{"707514313":1505610703,"1568212945":1505638160,"732898933":1505638352,"100009336847960":1505635266,"721251435":1505570865,"718844397":1505623246,"1461941075":1505501435,"100004389551456":1505637706,"1211838231":1505582601,"1040249145":1505636186,"1242203773":1505628782,"517814298":1505567030,"807572767":1505638353,"738307936":1505638009,"683874946":1505598251,"822469152":1505636589,"727476234":1505627000,"781209703":1505631577,"1058918804":1505629365,"539657070":1505629599,"1506662943":1505606109,"538279690":1505575467,"1122078957":1505633239,"1426504238":1505614371,"1760126206":1505637897,"100009494169236":1505633218,"100000193088625":1505633785,"628050112":1505599301,"692803720":1505602132,"100000982526361":1505611187,"1567918281":1505549275,"562061542":1505633121,"680188549":1505637979,"201400626":1505510516,"709905371":1505635235,"100000921265645":1505637511,"100002576634271":1505633420,"100001152648289":1505638358,"1580474418":1505583268,"1093906498":1505635647,"1568491642":1505613600,"1759941492":1505592915,"1021502749":1505621933,"100001091369712":1505593740,"1201111516":1505631603,"511729394":1505637150,"1228064980":1505627119,"1484357891":1505632720,"773982263":1505636776,"610763631":1505581711,"581839860":1505636663,"100001509228647":1505550106,"100001496847848":1505520708,"553024640":1505631903,"1657607627":1505460838,"100008134920032":1505636261,"518105631":1505610763,"100000167522595":1505559871,"604094302":1505591423,"831534764":1505498705,"716402163":1505625063,"100005862197805":1505615273,"779160397":1505625381,"683029723":1505602056,"1105801871":1505638150,"1007323327":1505618323,"500432034":1505617899,"1019441248":1505593648,"1321064988":1505549642,"600465009":1505557526,"734790522":1505614982,"1139898038":1505597330,"762749332":1505595541,"100006926654236":1505637009,"100007887856728":1505580453,"1073032118":1505602788,"575893114":1505630287,"1463373342":1505609305}
I tried sed with
sed -n '/lastActiveTimes:/,/chatNotif/p' home.html | sed '1s/.*lastActiveTimes://; $s/chatNotif.*//' > end.json
but it did not work.
If you do not mind using Perl, you can try:
perl -lne 'print $& if /(?<=lastActiveTimes:).*?(?=,chatNotif)/g' home.txt
It prints everything between the two lookaround assertions, that is, after lastActiveTimes: and before ,chatNotif
Or, with ack:
ack -o '(?<=lastActiveTimes:).*?(?=,chatNotif)' home.txt
With GNU grep and Perl regular expression (-P):
grep -Poz '(?<=lastActiveTimes:).*(\n.*)*(?=,chatNotif)' file
Output:
{"707514313":1505610703,"1568212945":1505639008,"732898933":1505641310,"100009336847960":1505641325,"721251435":1505570865,"718844397":1505623246,"1461941075":1505501435,"100004389551456":1505637706,"1211838231":1505582601,"1040249145":1505639741,"1242203773":1505628782,"517814298":1505567030,"807572767":1505638510,"738307936":1505641007,"683874946":1505598251,"822469152":1505636589,"727476234":1505627000,"781209703":1505631577,"1058918804":1505629365,"539657070":1505629599,"1506662943":1505606109,"538279690":1505640516,"1122078957":1505633239,"1426504238":1505614371,"1760126206":1505637897,"100009494169236":1505633218,"100000193088625":1505633785,"628050112":1505599301,"692803720":1505641333,"100000982526361":1505611187,"1567918281":1505549275,"562061542":1505641305,"680188549":1505637979,"201400626":1505510516,"709905371":1505635235,"100000921265645":1505637511,"100002576634271":1505633420,"100001152648289":1505640582,"1580474418":1505583268,"1093906498":1505635647,"1568491642":1505638670,"1759941492":1505592915,"1021502749":1505621933,"100001091369712":1505593740,"1201111516":1505631603,"511729394":1505637150,"1228064980":1505627119,"1484357891":1505632720,"773982263":1505641308,"610763631":1505581711,"581839860":1505641241,"100001509228647":1505550106,"100001496847848":1505520708,"553024640":1505631903,"1657607627":1505460838,"100008134920032":1505636261,"518105631":1505610763,"100000167522595":1505559871,"604094302":1505591423,"831534764":1505498705,"716402163":1505625063,"100005862197805":1505615273,"779160397":1505625381,"683029723":1505602056,"1105801871":1505641175,"1007323327":1505640781,"500432034":1505617899,"1019441248":1505593648,"1321064988":1505549642,"600465009":1505557526,"734790522":1505614982,"1139898038":1505597330,"762749332":1505595541,"100006926654236":1505637009,"100007887856728":1505580453,"1073032118":1505602788,"575893114":1505630287,"1463373342":1505640415}
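If neither Perl nor GNU grep is available, a plain sed substitution may also do the job. The following is a minimal sketch, assuming both markers sit on the same line; the sample line and the function name extract_between are invented for illustration and are not taken from the linked file. That same-line layout may also be why the range form /start/,/end/ in the question failed, since sed looks for the end pattern only on lines after the one that opened the range.

```shell
#!/usr/bin/env bash
# Hypothetical sample line standing in for the real page source.
sample='foo,lastActiveTimes:{"707514313":1505610703,"1568212945":1505638160},chatNotif:0,bar'

# Print only the text between "lastActiveTimes:" and ",chatNotif".
# Assumes both markers are on one line; lines without them print nothing (-n ... p).
extract_between() {
  sed -n 's/.*lastActiveTimes:\(.*\),chatNotif.*/\1/p'
}

printf '%s\n' "$sample" | extract_between
```

On a real file this would be called as extract_between < home.html > end.json. Because .* is greedy, the capture runs up to the last ,chatNotif on the line.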
Related
GREP: is there a way to use grep inserting a text between filename and the pattern?
I have
grep --color -EH "^([^,]*\,){3}5" try.csv
and its output is:
try.csv:410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
try.csv:652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
try.csv:109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
I tried
grep --color -EH "^([^,]*,){3}5" try.csv | perl -ne 'print ",$_"'
but the output looks like this:
,try.csv:410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
,try.csv:652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
,try.csv:109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
Expected output:
try.csv:,410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
try.csv:,652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
try.csv:,109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
I am very new to Perl and shell. I'm searching in CSV files.
You can insert the comma with sed:
$ grep --color -EH "^([^,]*\,){3}5" try.csv | sed 's/:/&,/'
In s/:/&,/, the special character & in the replacement refers to the portion of the line that matched (here, the colon). Adding a comma after & gives the output you want.
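To see the & replacement in isolation, here is a small sketch that feeds one invented grep -H style line through the same substitution. The shortened sample row and the function name add_comma_after_filename are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical line in "filename:match" form, shortened from the question's data.
line='try.csv:410,30151010,K,5001,,512'

# Replace the first colon on each line with itself ("&") followed by a comma.
add_comma_after_filename() {
  sed 's/:/&,/'
}

printf '%s\n' "$line" | add_comma_after_filename
```

Only the first colon on each line is touched, so this relies on the file name itself not containing a colon.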
Need string extraction between tags
I have the string <tr><td>-Xms36g</td></tr> and need to extract Xms36g from it. I got it working with grep -oE '[Xms0-9g]' | xargs | sed 's| ||g' but I would like to know whether there is another way to achieve this. Thank you.
Using grep with PCRE (-P):
grep -Po -- '-\K[^<]+'
- matches - literally, and \K discards what has been matched so far
[^<]+ gets the portion up to the next <, i.e. the desired portion
With sed:
sed -E 's/^[^-]*-([^<]+)<.*/\1/'
^[^-]*- matches the substring up to and including the -
The only captured group, ([^<]+), gets the portion up to the next <
<.* matches the rest
In the replacement we use only the captured group.
Example:
% grep -Po -- '-\K[^<]+' <<<'<tr><td>-Xms36g</td></tr>'
Xms36g
% sed -E 's/^[^-]*-([^<]+)<.*/\1/' <<<'<tr><td>-Xms36g</td></tr>'
Xms36g
Parsing HTML with regular expressions is frowned upon. If you have xmllint, which ships in the libxml2-utils package on Debian-based systems, you can use this:
xmllint --html --xpath '//text()' file
You can also pipe to standard input. In that case, use - as the file name:
foo | xmllint --html --xpath '//text()' -
There are seemingly endless ways you could do this. Here's an awk example:
awk -F'-|<' '{print $4}'
Another variation:
awk -F'[-<]' '$0=$4 {print}'
Using sed:
sed -E 's/.*-([^/<>]*).*/\1/'
Using cut:
cut -b 10-15
Using echo:
echo "${str:9:6}"
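The fixed offsets in the cut and ${str:9:6} variants depend on this exact string layout. As one more variation, here is a sketch using shell parameter expansion that keys on the - and < delimiters instead; the variable names tmp and value are made up for the example.

```shell
#!/usr/bin/env bash
str='<tr><td>-Xms36g</td></tr>'

# Drop the shortest prefix ending in "-", leaving 'Xms36g</td></tr>'.
tmp=${str#*-}
# Drop the longest suffix starting with "<", leaving 'Xms36g'.
value=${tmp%%<*}

printf '%s\n' "$value"
```

No external process is started, which can matter inside a loop. It assumes the wanted text follows the first - in the string.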
Text Manipulation using sed or AWK
I get the following result when I run my script against my services. The result differs depending on the service, but the text pattern shown below is similar. The result of my script is assigned to var1, and I need to extract data from this variable:
$var1=HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
I need to strip the service-list name from $var1, so that the end result is printed on separate lines as follows:
svc1
svc2
svc3
svc4
svc5
svc6
Can you please help with this? Regards
Using sed and grep (GNU sed, since \| is a GNU extension):
sed 's/[^ ]* :\|,//g' <<< "$var1" | grep -o '[^ ]*'
sed deletes the run of non-whitespace characters before each colon, along with the colon itself, and every comma. grep then outputs the remaining services one per line.
Using GNU grep and GNU sed:
grep -oP ': *\K\w+(, \w+)?' <<< "$var1" | sed 's/, /\n/'
svc1
svc3
svc4
svc5
svc6
grep is the perfect tool for the job. From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Sounds perfect! This should work with GNU grep (both -o and \+ are extensions beyond POSIX grep):
echo "$var1" | grep -o 'svc[0-9]\+'
This matches "svc" followed by one or more digits. You can also enable the "highly experimental" Perl regexp mode with -P, which means you can use the \d digit character class and no longer have to escape the +:
grep -Po 'svc\d+' <<<"$var1"
In bash, <<< (a here string) supplies "$var1" to grep on standard input.
By the way, if your data was originally on separate lines, like:
HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
this would be a good job for awk:
awk -F': ' '{n=split($2,a,", "); for (i=1; i<=n; i++) print a[i]}'
Iterating by index keeps the services in their original order; for (i in a) does not guarantee any particular order.
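To try the awk approach end to end, here is a minimal sketch that stores the question's sample data in var1, one record per line, and prints each service on its own line. The function name list_services is hypothetical.

```shell
#!/usr/bin/env bash
# Sample data from the question, one record per line.
var1='HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6'

# Split each line on ": ", then split the right-hand side on ", "
# and print the pieces in their original order.
list_services() {
  awk -F': ' '{ n = split($2, a, ", "); for (i = 1; i <= n; i++) print a[i] }'
}

printf '%s\n' "$var1" | list_services
```

split returns the number of pieces, so the indexed loop does not depend on awk's unspecified for (i in a) ordering.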
How to grep -o without the -o
I've got BusyBox v1.01 providing my commands. Hence, -o is not included in the grep. How can I get grep -o behavior without the ... -o?
awk solution: awk '/PATTERN/{match($0,/PATTERN/);print substr($0,RSTART,RLENGTH)}' inputFile
If you have sed, you can use a simple regex (see linuxquestions.org):
sed -n 's/.*\(PATTERN\).*/\1/p' FILE
So to find only the text StackOverflow in the file file.txt, you'd write:
sed -n 's/.*\(StackOverflow\).*/\1/p' file.txt
Remember that the pattern in the sed command is a regular expression, so if your pattern contains any regular-expression metacharacters, they need to be escaped. Note also that, because the leading .* is greedy, this prints only the last match on each line.
You could use Perl instead: perl -lne 'print $1 while /(pattern)/g' FILE
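The awk and sed answers above print at most one match per line, whereas grep -o prints every match. A rough sketch of an awk loop that walks along each line is shown below. It assumes an awk with match(), RSTART and RLENGTH; whether the awk in BusyBox v1.01 supports these has not been checked here. The function name grep_o is made up.

```shell
#!/usr/bin/env bash
# Emulate grep -o: print every non-empty match of an ERE, one per output line.
# $1 is the pattern; input is read from stdin. Because the pattern is passed
# with -v, awk applies escape processing to any backslashes in it.
grep_o() {
  awk -v pat="$1" '{
    line = $0
    # Stop on an empty match so the loop cannot spin forever.
    while (match(line, pat) && RLENGTH > 0) {
      print substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)   # continue after this match
    }
  }'
}

printf 'svc1, svc22 and svc3\nnothing here\n' | grep_o 'svc[0-9]+'
```

Each pass searches the remainder of the line, so anchors such as ^ would be evaluated against that remainder and not the original line.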
Bash grep sth. then to find the position
I've long been wondering about this. Say I first grep some lines from a file:
cat 101127_2.bam |grep 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA'
This prints the whole line containing the string. However, is there some simple bash code to find out which line the string is on (the 100th? the 1000th?)
grep -n 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA' 101127_2.bam
I found it by running man grep and searching with /line number.
EDIT: Thanks @Keith Thompson. I changed the post from cat file | grep -n pattern to grep -n pattern file. Sorry, I was in a hurry.
try this: cat 101127_2.bam |grep -n 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA'
This might work for you too:
sed '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/=;d' 101127_2.bam
or
sed -n '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/=' 101127_2.bam
The above solutions output only the matching line numbers. To see the matched lines too:
sed '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/!d;=' 101127_2.bam
or
sed -n '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/{=;p}' 101127_2.bam
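Since the search term here is a literal string and not a regular expression, an awk variant based on index() is another option. This is only a sketch: the function name find_line_numbers and the short sample input are invented for the example.

```shell
#!/usr/bin/env bash
# Print "line-number: line" for every line containing a fixed string.
# $1 is the string; any remaining arguments are files (stdin if none).
find_line_numbers() {
  pattern=$1
  shift
  # index() does a plain substring search; FNR is the line number within
  # the current file.
  awk -v s="$pattern" 'index($0, s) { print FNR ": " $0 }' "$@"
}

printf 'AAA\nCCTGATTACGG\nTTT\n' | find_line_numbers 'TGATTAC'
```

With a file argument it would be called as find_line_numbers 'TGATTAC...' 101127_2.bam, which gives roughly what grep -n prints for a literal search string.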