I have used this script to extract all the occurrences of function data between the name __libc_memalign and : from "file1" to "file2". Now file2 contains multiple (6000 groups) occurrences of code within this pattern. How can I iterate through each group in "file2" and process each group?
`awk '/__libc_memalign/ {p=1;print;next} /:/ && p {p=0;print} p' file1.out >file2`
sample input
0 0xc40840 : __libc_memalign
0 0x40bac0 0x7ffe493d0d50 W
0 0x40bac2 0x7ffe493d0d48 W
0 0x40bac4 0x7ffe493d0d40 W
..
0 0xc40840 : __libc_memalign
0 0x40bac0 0x7ffe493d0d50 R
0 0x40bac2 0x7ffe493d0d48 R
0 0x40bac4 0x7ffe493d0d40 R
....
0 0xc40840 : __libc_memalign
0 0x40bab0 0x7ffe493b0d50 W
0 0x40bab2 0x7ffe493dbd48 R
0 0x40bac4 0x7ffe493d0d40 W
It's not really clear what you mean by "group" or "process", but hopefully this can at least nudge you in the right direction.
Assuming there are no empty lines in your groups, add a separator between them; then loop over the sequences between empty lines. Your Awk script is easily amended to print an empty line when it finishes a group, so you can simply
awk '/__libc_memalign/ {p=1; print; next}
     /:/ && p {p=0; print; print ""}   # empty line ends the group
     p' file1.out |
while IFS= read -r line; do
    { while true; do
          case $line in '') break;; esac
          echo "$line"
          IFS= read -r line || break
      done; } |
    # Pipe the collected group into "process"
    process
done
This is fairly clumsy, and can probably be refactored significantly. If you don't particularly need the intermediate results, maybe simply
awk '/__libc_memalign/ {
    p=1; cmd = "process"; print | cmd; next}
/:/ && p { p=0; print | cmd; close(cmd) }
p { print | cmd }' file1.out
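If, as in the sample input, a group simply runs until the next __libc_memalign header, the close() belongs in the header rule instead. A minimal sketch of that variant, with wc -l standing in for the hypothetical process command so that each group's line count prints separately:

awk '/__libc_memalign/ { if (cmd) close(cmd)   # flush the previous group
                         cmd = "wc -l"         # stand-in for "process"
                         print | cmd; next }
     cmd { print | cmd }
     END { if (cmd) close(cmd) }' file1.out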
I am trying to write a util function in a bash script that can take a multi-line string and append it to the supplied file if it does not already exist.
This works fine using grep if the pattern does not contain \n.
if grep -qF "$1" $2
then
return 1
else
echo "$1" >> $2
fi
Example usage
append 'sometext\nthat spans\n\tmultiple lines' ~/textfile.txt
I am on macOS, btw, which has presented some problems, with some of the solutions I've seen posted elsewhere being very Linux-specific. I'd also like to avoid installing any other tools to achieve this if possible.
Many thanks
If the files are small enough to slurp into a Bash variable (you should be OK up to a megabyte or so on a modern system), and don't contain NUL (ASCII 0) characters, then this should work:
IFS= read -r -d '' contents <"$2"
if [[ "$contents" == *"$1"* ]]; then
return 1
else
printf '%s\n' "$1" >>"$2"
fi
In practice, the speed of Bash's built-in pattern matching might be more of a limitation than ability to slurp the file contents.
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I replaced echo with printf.
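Wrapped up as the append helper from the question, it might look like this (a minimal sketch under the same assumptions; note the $'…' quoting in the call, which turns \n and \t into real characters, unlike the plain single quotes in the original usage example):

append() {
    local contents
    IFS= read -r -d '' contents < "$2"       # slurp; assumes the file exists
    [[ "$contents" == *"$1"* ]] && return 1  # already present, nothing to do
    printf '%s\n' "$1" >> "$2"
}

append $'sometext\nthat spans\n\tmultiple lines' ~/textfile.txt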
Using awk:
awk '
BEGIN {
n = 0 # length of pattern in lines
m = 0 # number of matching lines
}
NR == FNR {
pat[n++] = $0
next
}
{
if ($0 == pat[m])
m++
else if (m > 0 && $0 == pat[0])
m = 1
else
m = 0
}
m == n {
exit
}
END {
if (m < n) {
for (i = 0; i < n; i++)
print pat[i] >>FILENAME
}
}
' - "$2" <<EOF
$1
EOF
If necessary, one would need to properly escape any regex metacharacters inside FS/OFS:
jot 7 9 |
{m,g,n}awk 'BEGIN { FS = OFS = "11\n12\n13\n"
_^= RS = (ORS = "") "^$" } _<NF || ++NF'
9
10
11
12
13
14
15
jot 7 -2 | (... awk stuff ...)
-2
-1
0
1
2
3
4
11
12
13
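A plainer way to express the same idea — slurp the file as a single record and append the block only when it is absent — might look like the following sketch. Using index() sidesteps the metacharacter-escaping problem entirely; RS = "^$" as a whole-file record separator is GNU awk behavior:

block=$'11\n12\n13\n'
gawk -v b="$block" '
    BEGIN { RS = "^$" }            # GNU awk only: the whole file is one record
    {
        printf "%s", $0            # emit the original contents unchanged
        if (index($0, b) == 0)     # literal substring test, no regex involved
            printf "%s", b         # append the block when it is missing
    }' file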
I have a .txt file with numeric indices of certain 'outlier' data points, each on its own line, referenced by $outlier_file:
1
7
30
43
48
49
56
57
65
Using the following code, I can successfully remove certain files (volumes of neuroimaging data in this case) by using while + read.
while read outlier; do
# Remove current outlier vol from eddy unwarped DWI data
rm $DWI_path/$1/vol000*"$outlier".nii.gz;
done < $outlier_file
However, I also need to remove the numbers located at these 'outlier' indices from another text file stored in $bvec_file, which has 69 columns and 3 rows. Within each row, the numbers are space-delimited. So, for this example, I need to remove all 3 rows of columns 1, 7, 30, etc., and then save this version with the outliers removed into a new *.txt file.
0 0.9988864166 -0.0415925034 -0.06652866169 -0.6187155495 0.2291534462 0.8892356214 0.7797364286 0.1957395685 0.9236669465 -0.5400265342 -0.3845263463 -0.4903989539 0.4863306385 -0.6496130843 0.5571164636 0.8110081715 0.9032142094 -0.3234596075 -0.1551409525 -0.806059879 0.4811597826 -0.7820757748 -0.9528881463 0.1916556621 -0.007136403284 -0.2459431735 -0.7915263574 -0.1938049261 -0.1578786349 0.8688043633 -0.5546072294 -0.4019951732 0.2806154851 0.3478762022 0.9548067252 -0.9696777541 -0.4816255837 -0.7962240023 0.6818610905 0.7097978218 0.6739686799 0.1317547111 -0.7648252249 -0.1456021218 -0.5948047487 0.0934205064 0.5268769564 -0.8618324858 -0.3721029232 -0.1827616535 0.691353613 0.4159071597 0.4605505287 0.1312199424 0.426674893 -0.4068291509 0.7167859082 0.2330824665 0.01909161256 -0.06375254731 -0.5981122948 -0.2672253674 0.6875472994 0.2302943724 0 0 0 0
0 0.04258194557 0.9988207007 0.6287131425 0.7469024143 0.5528476637 0.3024964957 0.1446931241 0.9305823612 0.1675139932 0.8208211337 0.8238722992 0.5983722761 0.4238174961 0.639429196 0.1072148887 0.5551578885 0.003337599176 0.511740508 0.9516619405 0.3851404227 0.8526321065 0.1390947346 0.2030449535 0.7759459569 0.165587903 0.9523372297 0.5801228933 0.3277276562 0.7413928896 0.442482978 0.2320585706 0.1079269171 0.1868672655 0.1606136006 0.2968573235 0.1682337977 0.8745679247 0.5989061899 0.4172933119 0.01746934331 0.5641480832 0.7455469091 0.3471016571 0.8035001467 0.5870623128 0.361107261 0.8192579877 0.4160218909 0.5651330299 0.4070513153 0.7221181184 0.714223583 0.6971767133 0.4937978446 0.4232911691 0.8011701162 0.2870385494 0.9016941521 0.09688949547 0.9086826131 0.2631932421 0.152678096 0.6295753848 0.9712458578 0 0 0 0
0 -0.02031513434 -0.02504539005 -0.7747862425 0.2435730944 0.8011542666 0.343155766 -0.6091592581 -0.3093581909 -0.3446424728 -0.1860752773 -0.4163819443 -0.6336083058 0.7641081337 -0.4112580017 -0.8234841915 0.1845683194 0.4291770641 -0.7959243273 -0.2650864686 0.449371034 -0.203724703 0.6074620459 0.2253373638 -0.6009791836 -0.9861692137 0.1804598471 0.1922068008 -0.9246806119 0.6522353256 -0.2222336438 0.7990992685 -0.9092588527 -0.9414539684 0.9236803664 0.0148272357 -0.1772637652 0.05628269894 -0.08566629406 -0.6007759525 0.7041888058 0.4769729119 0.6532997034 -0.5427364139 -0.5772239915 0.5491494803 0.9278330427 0.2263117816 -0.290121617 0.7363179158 0.8949343019 -0.02399176716 0.5629439653 -0.5493977074 -0.8596191107 -0.7992328333 0.4388809483 0.6354737076 0.3641705918 0.9951120218 0.412591228 -0.75696169 0.9514620339 -0.3618197699 0.06038199928 0 0 0 0
The farthest I've gotten with one approach is using awk to index the right columns (just printing them right now), but I can only get this to work if I call $1 (i.e., the numeric index of the first outlier column)...
awk -F ' ' '{print $1}' $bvec_file
If I try to refer to the value in $outlier, it doesn't work; instead, this prints the entire contents of $bvec_file:
while read outlier; do
# Remove current outlier vol from eddy unwarped DWI data
rm $DWI_path/$1/vol000*"$outlier".nii.gz;
# Remove outlier #'s from bvec file
awk -F ' ' '{print $1}' $bvec_file
done < $outlier_file
I am completely stuck on how to get this done. Any advice would be greatly appreciated.
To delete the outliers from bvec_file after the loop and only delete the ones where the associated file was successfully removed:
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
while IFS= read -r outlier; do
# Remove current outlier vol from eddy unwarped DWI data
rm "$DWI_path/$1"/vol000*"$outlier".nii.gz &&
echo "$outlier"
done < "$outlier_file" |
awk '
NR==FNR { os[$0]; next }
{
for (o in os) {
$o=""
}
$0=$0; $1=$1
}
1' - "$bvec_file" > "$tmp" &&
mv "$tmp" "$bvec_file"
Or to delete the outliers one at a time as the files are removed:
#!/usr/bin/env bash
tmp=$(mktemp) || exit 1
while IFS= read -r outlier; do
# Remove current outlier vol from eddy unwarped DWI data
rm "$DWI_path/$1"/vol000*"$outlier".nii.gz &&
# Remove outlier #'s from bvec file
awk -v o="$outlier" '{$o=""; $0=$0; $1=$1} 1' "$bvec_file" > "$tmp" &&
mv "$tmp" "$bvec_file"
done < <(sort -rnu "$outlier_file")
Always quote your shell variables (see https://mywiki.wooledge.org/Quotes); the && at the end of each line ensures the next command only runs if the previous commands succeeded.
The magical incantation in the awk script does the following. Let's say your input is a b c and the outlier field is field number 2, b:
$ echo 'a b c'
a b c
$
$ echo 'a b c' | awk -v o=2 '{$o=""; print NF ":", $0}'
3: a c
$
$ echo 'a b c' | awk -v o=2 '{$o=""; $0=$0; print NF ":", $0}'
2: a c
$
$ echo 'a b c' | awk -v o=2 '{$o=""; $0=$0; $1=$1; print NF ":", $0}'
2: a c
The o="" sets the field value to null, the $0=$0 forces awk to resplit $0 into fields so it effectively deletes field 2 (as opposed to the previous step which set it to null but it still existed as such), and the $1=$1 recombines $0 from it's fields replacing every FS (any contiguous chain of white space chars including the 2 blanks now between a and c) with OFS (a single blank char).
Reading a text file into an array, extracting elements and sorting them is taking a very long time.
The text file is ffmpeg console output for R128 audio analysis. I need to get the highest M and S values. Example:
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.49998 M: -22.2 S: -29.9 I: -27.0 LUFS LRA: 9.8 LU FTPK: -12.4 dBFS TPK: -9.7 dBFS
[Parsed_ebur128_0 @ 0x7fd32a60caa0] t: 4.69998 M: -22.5 S: -28.6 I: -25.9 LUFS LRA: 11.3 LU FTPK: -12.7 dBFS TPK: -9.7 dBFS
The text file can be hundreds or thousands of lines long, depending on the duration of the audio file being analysed.
I want to find the highest M (-22.2) and S (-28.6) values and assign them to variables M and S.
This is what I am using currently:
ARRAY=()
while read LINE
do
ARRAY+=("$LINE")
done < $tempDir/text.txt
for LINE in "${ARRAY[@]}"
do
echo "$LINE" | sed -n '/M:/p' | sed 's/S:.*//' | sed -n -e 's/^.*M://p' | sed -n -e 's/-//p' >>$tempDir/R128M.txt
done
for LINE in "${ARRAY[@]}"
do
echo "$LINE" | sed -n '/M:/p' | sed 's/I:.*//' | sed -n -e 's/^.*S://p' | sed -n -e 's/-//p' >>$tempDir/R128S.txt
done
cat $tempDir/R128M.txt
M=( $(sort $tempDir/R128M.txt) )
cat $tempDir/R128S.txt
S=( $(sort $tempDir/R128S.txt) )
Is there a faster way of doing this?
Rather than reading the whole file into memory, writing bits of it out to separate files, and reading those in again, just parse it and pick out the largest values:
$ awk '$7 > m || m == "" { m = $7 } $9 > s || s == "" { s = $9 } END { print m, s }' data
-22.2 -28.6
In your data, fields 7 and 9 contain the values of M and S. The awk script will update its m and s variables if it finds larger values in these fields and then print the largest found at the end. The m == "" and s == "" tests are needed to trigger initialization of the values if no value has been read yet.
Another way with awk, which may look cleaner:
$ awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { print m, s }' data
To assign them to M and S in the shell:
$ declare $( awk 'FNR == 1 { m = $7; s = $9; next } $7 > m { m = $7 } $9 > s { s = $9 } END { printf("M=%f S=%f\n", m, s) }' data )
$ echo $M $S
-22.200000 -28.600000
Adjust the printf() format to use %s instead of %f if you want the original strings instead of float values, or set the number of decimals you might want with, e.g., %.2f in place of %f.
First of all, a three-process pipe is a bit redundant for a single value extraction, especially considering that you reinstantiate it anew for every line.
Next, you save all the values into a file and then sort that file, while all you need is the maximum value. You can easily find it during the very first (value-extraction) loop, at an additional O(N) running time, instead of paying the I/O overhead and O(N log N) cost of sorting. See ARITHMETIC EXPANSION and conditional expressions in the bash manual.
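For instance, here is a minimal sketch of tracking both maxima inside a single read loop, assuming (as in the sample) that every M:/S: value has exactly one decimal place, so the dot can be stripped and the comparison done with integer arithmetic:

max_m=-9999 max_s=-9999                  # sentinels below any real level
while IFS= read -r line; do
    # pull the M: and S: numbers out of lines like "... M: -22.2 S: -29.9 ..."
    [[ $line =~ M:\ +(-?[0-9]+\.[0-9])\ +S:\ +(-?[0-9]+\.[0-9]) ]] || continue
    m=${BASH_REMATCH[1]} s=${BASH_REMATCH[2]}
    (( ${m/./} > ${max_m/./} )) && max_m=$m   # -22.2 compares as -222
    (( ${s/./} > ${max_s/./} )) && max_s=$s
done < "$tempDir/text.txt"
M=$max_m S=$max_s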
Consider a plain text file containing the page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing the page-breaking ASCII control character.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
local nline="$1"
local input_file="$2"
local npag=0
local ln=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( ++npag ))
ln=$(echo -n "$page" | wc -l)
total=$(( total + ln ))
if [ $total -ge $nline ]; then
echo "${npag}"
return
fi
done < "$input_file"
echo "0"
return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution ?
Thanks!
The idea to use read -d $'\f' and then to count the lines is good.
This version might not appear elegant: if nline is greater than or equal to the number of lines in the file, then the file is read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
local nline="${1}"
local input_file="${2}"
if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
printf "0\n"
else
printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
fi
}
awk's performance is better than that of the bash version above; awk was created for exactly this kind of text processing.
Give this tested version a try:
function get_page_from_line ()
{
awk -v nline="${1}" '
BEGIN {
npag=1;
}
{
if (index($0,"\f")>0) {
npag++;
}
if (NR==nline) {
print npag;
linefound=1;
exit;
}
}
END {
if (!linefound) {
print 0;
}
}' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
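For example, with the sample file from the question saved as file (a hypothetical run; line 5, two, falls on page 2):

$ get_page_from_line 5 file
2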
----
For history, here is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh you provided in the comments showed it a little ahead (approx. 20 sec), which makes it roughly equivalent to your version:
function get_page_from_line ()
{
local nline="$1"
local input_file="$2"
local npag=0
local total=0
while IFS= read -d $'\f' -r page; do
npag=$(( npag + 1 ))
IFS=$'\n'
for line in ${page}
do
total=$(( total + 1 ))
if [[ total -eq nline ]] ; then
printf "%d\n" ${npag}
unset IFS
return
fi
done
unset IFS
done < "$input_file"
printf "0\n"
return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
Updated the anchoring as commented below:
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" \
    '$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>s contained in a file. (It will work in a POSIX shell as well, with a substitute for the substring expansion and expr for the math.) For example,
#!/bin/bash
declare -i ln=1 ## line count
declare -i pg=1 ## page count
fname="${1:-/dev/stdin}" ## read from file or stdin
printf "\nln:pg text\n" ## print header
while read -r l; do ## read each line
if [ "${l:0:1}" = $'\f' ]; then ## if form-feed found
((pg++))
printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
else
printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
fi
((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can set RS (the record separator, default newline) to form feed (\f) and FS (the field separator, default any sequence of horizontal whitespace) to newline (\n), and obtain the number of lines as the number of "fields" in a "record", which is a "page".
The placement of form feeds in your data will produce some empty lines within a page, so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, feed just the first number lines to the script with head -n number (a one-liner sketch follows the loop below), or loop over the counts until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
old=$line
((line += count))
echo "Lines $old through $((line - 1)) are on page $page"
((page++))
done
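The head -n variant mentioned above can be a one-liner. A sketch, assuming the target line number is in n and that, as in the affex.txt example above, each form feed begins the first line of a new page:

# Records are \f-separated, so the number of records seen after
# reading only the first n lines is the page number of line n.
head -n "$n" file | awk -v RS='\f' 'END { print NR }'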
This GNU awk script prints the "page" for the line number given as a command-line argument:
BEGIN { ffcount=1;
search = ARGV[2]
delete ARGV[2]
if (!search ) {
print "Please provide linenumber as argument"
exit(1);
}
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05, where formfeeds.awk is the script, formfeeds.txt is the file, and '05' is a line number.
The BEGIN rule deals mostly with the command line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the command-line argument stored in search
/[\f]/ applies when there is a formfeed
Okay, I have two files: one is a baseline and the other is a generated report. I have to validate that a specific string in both files matches; it is not just a single word, see the example below:
.
.
name os ksd
56633223223
some text..................
some text..................
My search criterion here is to find a unique number such as "56633223223" and retrieve 1 line above it and 3 lines below; I can do that on both the base file and the report, and then compare whether they match. Overall, I need a shell script for this.
Since the strings above and below are unique but the line count varies, I have put them in a file called "actlist":
56633223223 1 5
56633223224 1 6
56633223225 1 3
.
.
Now from below "Rcount" I get how many iterations to be performed, and in each iteration i have to get ith row and see if the word count is 3, if it is then take those values into variable form and use something like this
I'm stuck at the below, which command to be used. I'm thinking of using AWK but if there is anything better please advise. Here's some pseudo-code showing what I'm trying to do:
xxxxx=/root/xxx/xxxxxxx
Rcount=`wc -l $xxxxx | awk -F " " '{print $1}'`
i=1
while ((i <= Rcount))
do
record=_________________'(Awk command to retrieve ith(1st) record (of $xxxx),
wcount=_________________'(Awk command to count the number of words in $record)
(( i=i+1 ))
done
Note: record, wcount values are later printed to a log file.
Sounds like you're looking for something like this:
#!/bin/bash
while read -r word1 word2 word3 junk; do
if [[ -n "$word1" && -n "$word2" && -n "$word3" && -z "$junk" ]]; then
echo "all good"
else
echo "error"
fi
done < /root/shravan/actlist
This will go through each line of your input file, assigning the three columns to word1, word2 and word3. The -n tests that read hasn't assigned an empty value to each variable. The -z checks that there are only three columns, so $junk is empty.
I PROMISE you, you are going about this all wrong. To find words in file1 and search for those words in file2 and file3 is just:
awk '
NR==FNR{ for (i=1;i<=NF;i++) words[$i]; next }
{ for (word in words) if ($0 ~ word) print FILENAME, word }
' file1 file2 file3
or similar (assuming a simple grep -f file1 file2 file3 isn't adequate). It DOES NOT involve shell loops to call awk to pull out strings to save in shell variables to pass to other shell commands, etc, etc.
So far all you're doing is asking us to help you implement part of what you think is the solution to your problem, but we're struggling to do that because what you're asking for doesn't make sense as part of any reasonable solution to what it sounds like your problem actually is, so it's hard to suggest anything sensible.
If you tell us what you are trying to do AS A WHOLE, with sample input and expected output for your whole process, then we can help you.
We don't seem to be getting anywhere, so let's take a stab at the kind of solution I think you might want and then take it from there.
Look at these 2 files "old" and "new" side by side (line numbers added by the cat -n):
$ paste old new | cat -n
1 a b
2 b 56633223223
3 56633223223 c
4 c d
5 d h
6 e 56633223225
7 f i
8 g Z
9 h k
10 56633223225 l
11 i
12 j
13 k
14 l
Now let's take this "actlist":
$ cat actlist
56633223223 1 2
56633223225 1 3
and run this awk command on all 3 of the above files (yes, I know it could be briefer, more efficient, etc. but favoring simplicity and clarity for now):
$ cat tst.awk
ARGIND==1 {
numPre[$1] = $2
numSuc[$1] = $3
}
ARGIND==2 {
oldLine[FNR] = $0
if ($0 in numPre) {
oldHitFnr[$0] = FNR
}
}
ARGIND==3 {
newLine[FNR] = $0
if ($0 in numPre) {
newHitFnr[$0] = FNR
}
}
END {
for (str in numPre) {
if ( str in oldHitFnr ) {
if ( str in newHitFnr ) {
for (i=-numPre[str]; i<=numSuc[str]; i++) {
oldFnr = oldHitFnr[str] + i
newFnr = newHitFnr[str] + i
if (oldLine[oldFnr] != newLine[newFnr]) {
print str, "mismatch at old line", oldFnr, "new line", newFnr
print "\t" oldLine[oldFnr], "vs", newLine[newFnr]
}
}
}
else {
print str, "is present in old file but not new file"
}
}
else if (str in newHitFnr) {
print str, "is present in new file but not old file"
}
}
}
$ awk -f tst.awk actlist old new
56633223225 mismatch at old line 12 new line 8
j vs Z
It's outputting that result because the 2nd line after 56633223225 is j in file "old" but Z in file "new", and the file "actlist" said the two files had to be common from one line before until 3 lines after that pattern.
Is that what you're trying to do? The above uses GNU awk for ARGIND but the workaround is trivial for other awks.
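That workaround is the usual FNR == 1 idiom: bump a counter whenever a new input file starts, then use it in place of ARGIND (a sketch; note it would miscount if an input file were empty):

awk 'FNR == 1 { argind++ }               # a new input file just started
     { print argind, FILENAME, $0 }' actlist old new

In tst.awk itself, that just means adding the FNR == 1 rule at the top and replacing each ARGIND test with argind.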
Use the below code:
awk '{if (NF == 3) { word1=$1; word2=$2; word3=$3; print "Words are:", word1, word2, word3} else {print "Line", NR, "has", NF, "words" }}' filename.txt
I have given the solution as per the requirement.
awk '{ # awk starts here and reads the file line by line
if (NF == 3) # Check whether the current line has 3 fields; NF is the number of fields in the current line
{ word1=$1; # If the current line has exactly 3 fields, the 1st field is assigned to the word1 variable
word2=$2; # The 2nd field is assigned to the word2 variable
word3=$3; # The 3rd field is assigned to the word3 variable
print word1, word2, word3} # Print all 3 fields
}' filename.txt >> output.txt # These 3 fields are redirected to a file which can be used for further processing.
This is as per the requirement; there are many other ways of doing it, but the question asked for awk.