Read file until match pattern - bash

I'm reading a file and would like to extract its content up to the point where a pattern matches.
The file is here: https://ufile.io/182kx
I would like to get the JSON that starts after lastActiveTimes: and ends before ,chatNotif:0
The result should be:
{"707514313":1505610703,"1568212945":1505638160,"732898933":1505638352,"100009336847960":1505635266,"721251435":1505570865,"718844397":1505623246,"1461941075":1505501435,"100004389551456":1505637706,"1211838231":1505582601,"1040249145":1505636186,"1242203773":1505628782,"517814298":1505567030,"807572767":1505638353,"738307936":1505638009,"683874946":1505598251,"822469152":1505636589,"727476234":1505627000,"781209703":1505631577,"1058918804":1505629365,"539657070":1505629599,"1506662943":1505606109,"538279690":1505575467,"1122078957":1505633239,"1426504238":1505614371,"1760126206":1505637897,"100009494169236":1505633218,"100000193088625":1505633785,"628050112":1505599301,"692803720":1505602132,"100000982526361":1505611187,"1567918281":1505549275,"562061542":1505633121,"680188549":1505637979,"201400626":1505510516,"709905371":1505635235,"100000921265645":1505637511,"100002576634271":1505633420,"100001152648289":1505638358,"1580474418":1505583268,"1093906498":1505635647,"1568491642":1505613600,"1759941492":1505592915,"1021502749":1505621933,"100001091369712":1505593740,"1201111516":1505631603,"511729394":1505637150,"1228064980":1505627119,"1484357891":1505632720,"773982263":1505636776,"610763631":1505581711,"581839860":1505636663,"100001509228647":1505550106,"100001496847848":1505520708,"553024640":1505631903,"1657607627":1505460838,"100008134920032":1505636261,"518105631":1505610763,"100000167522595":1505559871,"604094302":1505591423,"831534764":1505498705,"716402163":1505625063,"100005862197805":1505615273,"779160397":1505625381,"683029723":1505602056,"1105801871":1505638150,"1007323327":1505618323,"500432034":1505617899,"1019441248":1505593648,"1321064988":1505549642,"600465009":1505557526,"734790522":1505614982,"1139898038":1505597330,"762749332":1505595541,"100006926654236":1505637009,"100007887856728":1505580453,"1073032118":1505602788,"575893114":1505630287,"1463373342":1505609305}
I tried sed with:
sed -n '/lastActiveTimes:/,/chatNotif/p' home.html | sed '1s/.*lastActiveTimes://; $s/chatNotif.*//' > end.json
but it did not work.

If you do not mind using Perl, you can try:
perl -lne 'print $& if /(?<=lastActiveTimes:).*?(?=,chatNotif)/g' home.txt
It prints whatever lies between the two assertions: the lookbehind lastActiveTimes: and the lookahead ,chatNotif
or
ack -o '(?<=lastActiveTimes:).*?(?=,chatNotif)' home.txt
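If neither Perl nor ack is available, a single sed substitution might be enough. This is only a sketch: the sample string is made up, and it assumes both markers occur once and sit on the same line (I have not checked that against the real home.html).

```shell
# Hypothetical one-line sample standing in for the real file.
sample='x,lastActiveTimes:{"1":10,"2":20},chatNotif:0,y'

# Capture the text between the two markers and print only that.
# Assumes each marker occurs once and both are on the same line.
json=$(printf '%s\n' "$sample" |
  sed -n 's/.*lastActiveTimes:\(.*\),chatNotif.*/\1/p')

printf '%s\n' "$json"
```

Because .* is greedy, a line with several ,chatNotif occurrences would be cut at the last one, which is where the non-greedy Perl pattern behaves differently.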

With GNU grep and Perl regular expression (-P):
grep -Poz '(?<=lastActiveTimes:).*(\n.*)*(?=,chatNotif)' file
Output:
{"707514313":1505610703,"1568212945":1505639008,"732898933":1505641310,"100009336847960":1505641325,"721251435":1505570865,"718844397":1505623246,"1461941075":1505501435,"100004389551456":1505637706,"1211838231":1505582601,"1040249145":1505639741,"1242203773":1505628782,"517814298":1505567030,"807572767":1505638510,"738307936":1505641007,"683874946":1505598251,"822469152":1505636589,"727476234":1505627000,"781209703":1505631577,"1058918804":1505629365,"539657070":1505629599,"1506662943":1505606109,"538279690":1505640516,"1122078957":1505633239,"1426504238":1505614371,"1760126206":1505637897,"100009494169236":1505633218,"100000193088625":1505633785,"628050112":1505599301,"692803720":1505641333,"100000982526361":1505611187,"1567918281":1505549275,"562061542":1505641305,"680188549":1505637979,"201400626":1505510516,"709905371":1505635235,"100000921265645":1505637511,"100002576634271":1505633420,"100001152648289":1505640582,"1580474418":1505583268,"1093906498":1505635647,"1568491642":1505638670,"1759941492":1505592915,"1021502749":1505621933,"100001091369712":1505593740,"1201111516":1505631603,"511729394":1505637150,"1228064980":1505627119,"1484357891":1505632720,"773982263":1505641308,"610763631":1505581711,"581839860":1505641241,"100001509228647":1505550106,"100001496847848":1505520708,"553024640":1505631903,"1657607627":1505460838,"100008134920032":1505636261,"518105631":1505610763,"100000167522595":1505559871,"604094302":1505591423,"831534764":1505498705,"716402163":1505625063,"100005862197805":1505615273,"779160397":1505625381,"683029723":1505602056,"1105801871":1505641175,"1007323327":1505640781,"500432034":1505617899,"1019441248":1505593648,"1321064988":1505549642,"600465009":1505557526,"734790522":1505614982,"1139898038":1505597330,"762749332":1505595541,"100006926654236":1505637009,"100007887856728":1505580453,"1073032118":1505602788,"575893114":1505630287,"1463373342":1505640415}

Related

GREP: is there a way to use grep inserting a text between filename and the pattern?

I have grep --color -EH "^([^,]*\,){3}5" try.csv
and its output is:
try.csv:410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
try.csv:652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
try.csv:109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
I tried grep --color -EH "^([^,]*,){3}5" try.csv | perl -ne 'print ",$_"'
but the output looks like this :
,try.csv:410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
,try.csv:652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
,try.csv:109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
Expected output:
try.csv:,410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
try.csv:,652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
try.csv:,109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
I am very new to Perl and shell. I'm searching in the CSV files.
You may insert a comma by using sed,
$ grep --color -EH "^([^,]*\,){3}5" try.csv | sed 's/:/&,/'
s/:/&,/: the special character & in the replacement refers to the portion of the string that matched (here, the first colon). Adding a comma after & gives the output you want.
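To see the effect of & in isolation, here is a small sketch on one shortened sample line (the line is an abbreviated stand-in for the real grep -H output):

```shell
# Sample line in the shape grep -H prints: file name, colon, matched line.
line='try.csv:410,30151010,K,5001'

# & stands for the text matched by the pattern, here the first colon,
# so the replacement is "colon followed by a comma".
result=$(printf '%s\n' "$line" | sed 's/:/&,/')
printf '%s\n' "$result"
```

Without the g flag only the first colon on each line is touched, so colons inside the CSV data itself are left alone.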

Need string extraction between tags

I have the string <tr><td>-Xms36g</td></tr>
I need to extract Xms36g from it, and I managed to do so with
grep -oE '[Xms0-9g]' | xargs | sed 's| ||g'
But I would like to know whether there is any other way to achieve this.
Thank you.
Using grep with PCRE (-P)
grep -Po -- '-\K[^<]+'
- matches a literal - and \K discards everything matched so far
[^<]+ gets the portion up to the next <, i.e. our desired portion
With sed:
sed -E 's/^[^-]*-([^<]+)<.*/\1/'
^[^-]*- matches the substring up to and including the first -
The only captured group, ([^<]+), gets the portion up to the next <
<.* matches the rest
In the replacement we use only the captured group
Example:
% grep -Po -- '-\K[^<]+' <<<'<tr><td>-Xms36g</td></tr>'
Xms36g
% sed -E 's/^[^-]*-([^<]+)<.*/\1/' <<<'<tr><td>-Xms36g</td></tr>'
Xms36g
Parsing HTML with regular expressions is frowned upon. If you have xmllint, which ships with libxml2-utils, you can use this:
xmllint --html --xpath '//text()' file
You can also pipe to standard input. In this case you need to use - for the filename:
foo | xmllint --html --xpath '//text()' -
There are seemingly endless ways you could do this. Here's an awk example:
awk -F'-|<' '{print $4}'
Another variation:
awk -F'[-<]' '$0=$4 {print}'
Using sed:
sed -E 's/.*-([^/<>]*).*/\1/'
Using cut:
cut -b 10-15
Using echo:
echo "${str:9:6}"
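One more variation, as a sketch: plain shell parameter expansion, which avoids the fixed offsets used by cut and ${str:9:6}. It assumes the value follows the first - and ends at the next <; the variable names tmp and value are just for illustration.

```shell
str='<tr><td>-Xms36g</td></tr>'

tmp=${str#*-}      # drop everything up to and including the first "-"
value=${tmp%%<*}   # drop everything from the first remaining "<" onward
printf '%s\n' "$value"
```

This starts no external process and keeps working if the option has a different length, e.g. -Xms512m.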

Text Manipulation using sed or AWK

I get the following result in my script when I run it against my services. The result differs depending on the service, but the text pattern shown below is similar. The result of my script is assigned to var1, and I need to extract data from this variable:
$var1=HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6
I need to extract the service names from $var1, so that the end result is printed on separate lines as follows:
svc1
svc2
svc3
svc4
svc5
svc6
Can you please help with this?
Regards
Using sed and grep:
sed 's/[^ ]* :\|,//g' <<< "$var1" | grep -o '[^ ]*'
sed deletes every run of non-space characters that precedes a colon (together with the colon) as well as every comma. grep then prints the remaining service names one per line.
Using GNU grep and GNU sed:
grep -oP ': *\K\w+(, \w+)?' <<< "$var1" | sed 's/, /\n/'
svc1
svc3
svc4
svc5
svc6
grep is the perfect tool for the job.
From man grep:
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Sounds perfect!
As far as I'm aware this will work on any grep:
echo "$var1" | grep -o 'svc[0-9]\+'
Matches "svc" followed by one or more digits. You can also enable the "highly experimental" Perl regexp mode with -P, which means you can use the \d digit character class and don't have to escape the + any more:
grep -Po 'svc\d+' <<<"$var1"
In bash you can use <<< (a Here String) which supplies "$var1" to grep on the standard input.
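For a quick check, here is the same idea as a sketch with the sample data from the question. It writes "one or more digits" as [0-9][0-9]*, which avoids the \+ escape, and pipes with printf so it also runs in shells without here strings. Note that -o itself is not required by POSIX, so this still assumes a grep that supports it.

```shell
var1='HOST1*prod*gem.dot*serviceList : svc1 HOST1*prod*kem.dot*serviceList : svc3, svc4 HOST1*prod*fen.dot*serviceList : svc5, svc6'

# [0-9][0-9]* is the basic-regex spelling of "one or more digits".
services=$(printf '%s\n' "$var1" | grep -o 'svc[0-9][0-9]*')
printf '%s\n' "$services"
```

This prints svc1, svc3, svc4, svc5 and svc6 on separate lines, the same list as the GNU grep and sed answer above.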
By the way, if your data was originally on separate lines, like:
HOST1*prod*gem.dot*serviceList : svc1
HOST1*prod*kem.dot*serviceList : svc3, svc4
HOST1*prod*fen.dot*serviceList : svc5, svc6
This would be a good job for awk:
awk -F': ' '{n=split($2,a,", "); for (i=1; i<=n; i++) print a[i]}'

How to grep -o without the -o

I've got BusyBox v1.01 providing my commands. Hence, -o is not included in the grep. How can I get grep -o behavior without the ... -o?
awk solution:
awk '/PATTERN/{match($0,/PATTERN/);print substr($0,RSTART,RLENGTH)}' inputFile
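Note that the awk command above prints only the first match on each line, whereas grep -o prints every match. A sketch that loops over the rest of the line, with [0-9]+ standing in for PATTERN and made-up input (I have not tried it on BusyBox v1.01 itself):

```shell
# Hypothetical input; [0-9]+ stands in for PATTERN.
matches=$(printf 'ab12cd345\nnone\n7\n' | awk '{
  s = $0
  # match() sets RSTART and RLENGTH for the leftmost match in s.
  # The pattern must not match the empty string, or this loops forever.
  while (match(s, /[0-9]+/)) {
    print substr(s, RSTART, RLENGTH)
    s = substr(s, RSTART + RLENGTH)   # continue after this match
  }
}')
printf '%s\n' "$matches"
```

Patterns anchored with ^ would need extra care here, because each pass sees only the remainder of the line.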
If you have sed you can use simple regex. (see linuxquestions.org)
sed -n 's/.*\(PATTERN\).*/\1/p' FILE
So to find only the text StackOverflow in a file file.txt you'd write
sed -n 's/.*\(StackOverflow\).*/\1/p' file.txt
Remember that the pattern in the sed command is a regular expression, so if it contains any regular expression metacharacters, they need to be escaped.
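A small sketch of why the escaping matters, using a made-up line and the pattern 1.5:

```shell
line='see 1.5 and 1x5'

# Unescaped dot matches any character, so "1x5" also qualifies;
# the greedy leading .* makes sed keep the last candidate on the line.
loose=$(printf '%s\n' "$line" | sed -n 's/.*\(1.5\).*/\1/p')

# Escaped dot matches only a literal ".".
exact=$(printf '%s\n' "$line" | sed -n 's/.*\(1\.5\).*/\1/p')

printf '%s\n%s\n' "$loose" "$exact"
```

The first command prints 1x5 and the second prints 1.5. It also shows a limit of this approach: at most one match per line is printed, and it is the last one.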
You could use Perl instead:
perl -lne 'print $1 while /(pattern)/g' FILE

Bash: grep for something, then find its position

I've long been wondering about this.
Say I first grep some lines from a file:
cat 101127_2.bam |grep 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA'
This prints every whole line containing the string.
However, is there some simple bash code to find out on which line the string occurs? (100th? 1000th?...)
grep -n 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA' 101127_2.bam
I found it by opening man grep and searching with /line number
EDIT: Thanks #Keith Thompson. I changed the command from cat file | grep -n pattern to grep -n pattern file; I was in a hurry, sorry.
try this:
cat 101127_2.bam |grep -n 'TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA'
This might work for you too:
sed '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/=;d' 101127_2.bam
or
sed -n '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/=' 101127_2.bam
The above solutions output only the matching line numbers. To see the matched lines too:
sed '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/!d;=' 101127_2.bam
or
sed -n '/TGATTACTTGCTTTATTTTAGTGTTTAATTTGTTCTTTTCTAATAA/{=;p}' 101127_2.bam
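To compare the output formats, here is a sketch on a tiny made-up text sample, with CCGT standing in for the long sequence:

```shell
# Hypothetical four-line sample; CCGT stands in for the long sequence.
sample=$(printf 'AAA\nCCGT\nTTT\nGGCCGTA')

# grep -n prefixes each matching line with "number:".
with_lines=$(printf '%s\n' "$sample" | grep -n 'CCGT')

# sed = prints only the line number of each matching line.
only_numbers=$(printf '%s\n' "$sample" | sed -n '/CCGT/=')

printf '%s\n' "$with_lines" "$only_numbers"
```

grep -n gives 2:CCGT and 4:GGCCGTA, while the sed command gives just 2 and 4, which is easier to feed into another command.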
