Script with nawk doesn't print output to screen - shell

I came across a nice one-liner to search for text and print out the results with a given number of preceding and following lines. I'm trying to create a script out of this.
So far this is what I have:
# *********************************************************
# This uses nawk to list lines coming prior to and after
# the given search string.
# Format: xgrep <string> <file> <before> <after>
# ********************************************************
STR=$1
FILE=$2
BEF=$3
AFT=$4
export STR
export FILE
export BEF
export AFT
nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=$BEF a=$AFT s=$STR $FILE
The problem is that the output of the nawk command doesn't appear on the screen
>xgrep "bin/sh" * 0 3
>
But if I type in the command, I get a proper output:
>nawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=0 a=3 s="bin/sh" *
#!/bin/sh
export AEGIS_PROJECT=$1
export EDITOR=cat
#!/bin/sh
aegis -cpu $pwd
aegis -dbu $1
#!/bin/sh
cd $HOME/ex_7/src/DTTCom
NRMan alphabuild clean
NRMan alphabuild library ex_7 8.19 app TEST DTTCom debug -j10
#!/bin/sh
cd $HOME/icx/src/DTTCom
NRMan alphabuild clean
NRMan alphabuild library icx_1 1.1 app TEST DTTCom debug -j10
#!/bin/sh
# *********************************************************
# This uses nawk to list lines coming prior to and after
# the given search string.
What's the reason for this and how can I get the script to work?

Try using:
xgrep "bin/sh" '*' 0 3
instead.
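The current shell expands the wildcard before the arguments are passed to your script, so $2 is only the first matching file name and $3 and $4 typically receive further file names instead of your before and after counts, as shown in this transcript: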
pax: xgrep include *.c 0 3
#include <stdio.h>
gawk: (FILENAME=binmath.c FNR=1) fatal: division by zero attempted in '%'
pax: xgrep include '*.c' 0 3
#include <stdio.h>
// Prototypes - should go in separate header file.
void compBinNum (char*);
#include <stdio.h>
#include <string.h>
#include <string.h>
#define UTF8ERR_TOOSHORT -1
#define UTF8ERR_BADSTART -2
#include <stdio.h>
#include <errno.h>
#include <errno.h>
int main (void) {
FILE *file = fopen ("words.txt", "r");
You can see how these arguments work with the following simple script and output:
pax: cat argtest
#!/bin/bash
echo Number of arguments: $#
echo Arguments: "$*"
pax: argtest *
Number of arguments: 32
Arguments: Makefile a.exe argtest binmath.c binmath.exe dev
file.db infile input.txt inputfile limit10 output.txt
p2.sh p2expected p2input1 p2input2 qq qq.c qq.cpp qq.exe
qq.exe.stackdump qq.pl qq.py qqin qqq.c qqq.s tmpfile
words2.txt xgrep xx.c xx.exe xx.pl
pax: argtest '*'
Number of arguments: 1
Arguments: *
Update:
Based on your question in the comments:
Thanks. It worked when I wrapped the file with single quotes. Is there a way that I could do this inside the script so that the user doesn't have to bother with typing single quotes?
No, because the shell performs the expansion before your script ever sees the arguments. However, if you move the file specification to the end of the command line thus:
xgrep include 0 3 *.c
you could modify your script to not just process argument number 4 but every argument after that as well, one at a time. Then, when they've been expanded by the shell, it won't matter.
Something like this:
STR=$1
BEF=$2
AFT=$3
while [[ -n "$4" ]] ; do
    echo ========== "$4"
    gawk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' \
        b="$BEF" a="$AFT" s="$STR" "$4" |
        sed "s/^/$4: /"
    shift
done
echo ==========
Letting the shell handle your expansion and using a loop also allows you to do tricks such as printing the file name with each block (or line) of output:
pax: xgrep include 0 3 *.c
========== binmath.c
binmath.c: #include <stdio.h>
binmath.c:
binmath.c: // Prototypes - should go in separate header file.
binmath.c: void compBinNum (char*);
========== qq.c
qq.c: #include <stdio.h>
qq.c: #include <string.h>
qq.c: #include <string.h>
qq.c:
qq.c: #define UTF8ERR_TOOSHORT -1
qq.c: #define UTF8ERR_BADSTART -2
========== qqq.c
========== xx.c
xx.c: #include <stdio.h>
xx.c: #include <errno.h>
xx.c: #include <errno.h>
xx.c:
xx.c: int main (void) {
xx.c: FILE *file = fopen ("words.txt", "r");
==========
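A variation on the same loop, as a minimal sketch rather than a drop-in replacement: it discards the three leading arguments with shift 3, iterates over "$@", and has awk print FILENAME itself instead of piping through sed, so file names containing / or & cannot disturb a sed replacement. The function name xgrep_files is hypothetical, and plain awk stands in for nawk/gawk on the assumption that a POSIX awk is installed.

```shell
#!/bin/bash
# Hypothetical helper: xgrep_files <string> <before> <after> <file>...
xgrep_files() {
    local str=$1 bef=$2 aft=$3
    shift 3                      # what remains in "$@" is the file list
    local f
    for f in "$@"; do
        echo "========== $f"
        # Same ring-buffer logic as the one-liner, with a file-name prefix.
        awk -v s="$str" -v b="$bef" -v a="$aft" '
            c-- > 0 { print FILENAME ": " $0 }
            $0 ~ s {
                if (b) for (c = b + 1; c > 1; c--) print FILENAME ": " r[(NR - c + 1) % b]
                print FILENAME ": " $0
                c = a
            }
            b { r[NR % b] = $0 }
        ' "$f"
    done
    echo "=========="
}
```

It would be called as xgrep_files include 0 3 *.c, leaving the wildcard unquoted so the shell expands it.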

Related

diff with file names prefixing each line

How can I diff two (or more) files, displaying the file name at the beginning of each line?
That is, instead of this:
--- file1.c
+++ file2.c
@@ -1 +1 @@
-int main() {
+int main(void) {
I would prefer something like this:
file1.c:- int main() {
file2.c:+ int main(void) {
This is not so useful when there are only two files, but extremely handy when using --from-file/--to-file.
I could not find a more concise solution, so I wrote my own script to do it, using several calls to diff to add a different prefix each time.
#!/bin/bash
# the first argument is the original file that others are compared with
orig=$1
len1=${#1}
shift
# we compute the length of the filenames to ensure they are aligned
for arg in "$@"
do
len2=${#arg}
maxlen=$((len1 > len2 ? len1 : len2))
prefix1=$(printf "%-${maxlen}s" "$orig")
prefix2=$(printf "%-${maxlen}s" "$arg")
diff --old-line-format="$prefix1:-%L" \
--new-line-format="$prefix2:+%L" \
--unchanged-line-format="" "$orig" "$arg"
echo "---" # not necessary, but helps visual separation
done
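The alignment step of the script can be looked at in isolation. The sketch below pads two names to the longer of their lengths with printf's %-Ns format, which is the same trick the script uses to build prefix1 and prefix2. The helper name pad_names is hypothetical and exists only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: print both names left-justified to a common width,
# each followed by ":" so that the text after the prefix lines up.
pad_names() {
    local a=$1 b=$2
    local len1=${#a} len2=${#b}
    local maxlen=$(( len1 > len2 ? len1 : len2 ))
    # printf reuses the format for each remaining argument.
    printf "%-${maxlen}s:\n" "$a" "$b"
}
```

For example, pad_names a.c longname.c prints two lines of equal length, with a.c followed by padding spaces before the colon.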
I improved the script above by adding group formats and by printing the file names as a header before each comparison, which is essential when running diff recursively over several subdirectories. I also pass -rwbit to diff: -w and -b ignore whitespace differences, -i ignores case, and -t expands tabs in the output. Remove -wbit if you don't need those; -r is harmless and can stay.
I use this quite a lot with git, as follows: git difftool -v -y -x mydiff
+ cat mydiff
#!/usr/bin/bash
# the first argument is the original file that others are compared with
orig=$1
len1=${#1}
shift
# we compute the length of the filenames to ensure they are aligned
for arg in "$@"
do
len2=${#arg}
maxlen=$((len1 > len2 ? len1 : len2))
prefix1=$(printf "%-${maxlen}s" "$orig")
prefix2=$(printf "%-${maxlen}s" "$arg")
echo -e "\nmydiff $orig $arg =========================\n"
diff -rwbit \
--old-line-format="$prefix1:-%L" \
--new-line-format="$prefix2:+%L" \
--old-group-format='%df%(f=l?:,%dl)d%dE
%<' \
--new-group-format='%dea%dF%(F=L?:,%dL)
%>' \
--changed-group-format='%df%(f=l?:,%dl)c%dF%(F=L?:,%dL)
%<---
%>' \
--unchanged-line-format="" "$orig" "$arg"
echo "---" # not necessary, but helps visual separation
done
+ cat -n test1
1 1st line in test1
2 2nd line in test1
3 3rd line in test1
4 4th line in test1
5 6th line in test1
+ cat -n test2
1 1st line in test1
2 2nd line in test2 changed
3 3rd line added in test2
4 4th line in test1
+ mydiff test1 test2
mydiff test1 test2 =========================
2,3c2,3
test1:-2nd line in test1
test1:-3rd line in test1
---
test2:+2nd line in test2 changed
test2:+3rd line added in test2
5d4
test1:-6th line in test1
---

shell script runs out of memory

I have written the following random-number generator shell script:
for i in $(seq 1 $1) #for as many times, as the first argument ($1) defines...
do
echo "$i $((RANDOM%$2))" #print the current iteration number and a random number in [0, $2)
done
I run it like that:
./generator.sh 1000000000 101 > data.txt
to generate 1B rows of an id and a random number in [0,100] and store this data in file data.txt.
My desired output is:
1 39
2 95
3 61
4 27
5 85
6 44
7 49
8 75
9 52
10 66
...
It works fine for a small number of rows, but with 1B rows I get the following out-of-memory error:
./generator.sh: xrealloc: ../bash/subst.c:5179: cannot allocate 18446744071562067968 bytes (4299137024 bytes allocated)
Which part of my program creates the error?
How could I write the data.txt file line-by-line?
I have tried replacing the echo line with:
echo "$i $((RANDOM%$2))" >> $3
where $3 is data.txt, but I see no difference.
The problem is your for loop:
for i in $(seq 1 $1)
This will first expand $(seq 1 $1), creating a very big list, which you then pass to for.
Using while, however, we can read the output of seq line-by-line, which will take a small amount of memory:
seq 1 1000000000 | while read i; do
echo $i
done
$(seq 1 $1) is computing the whole list before iterating over it. So it takes memory to store the entire list of 10^9 numbers, which is a lot.
I am not sure whether seq can be made to run lazily, i.e., to produce the next number only when needed. You can use a simple C-style for loop instead:
for ((i=0; i<$1;++i))
do
echo "$i $((RANDOM%$2))"
done
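If the goal is only the data file, another way to avoid building the whole list in memory is to let awk generate both columns in one streaming process. This is a minimal sketch, assuming a POSIX awk is available; the function name gen_rows is hypothetical, and awk's rand() is seeded from the clock, so the numbers come from a different generator than bash's $RANDOM.

```shell
#!/bin/bash
# Hypothetical helper: gen_rows <count> <modulus>
# Prints "<id> <random integer in [0, modulus)>" for ids 1..count,
# one line at a time, so memory use does not grow with count.
gen_rows() {
    awk -v n="$1" -v m="$2" 'BEGIN {
        srand()
        for (i = 1; i <= n; i++)
            print i, int(rand() * m)
    }'
}
```

It would be used as gen_rows 1000000000 101 > data.txt.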
If you want it fast, this C++ program should work.
You will need to compile it with g++, using the form
g++ -o <executable> <C++file>
For example, I did it this way:
g++ -o inseq.exe CTest.cpp
CTest.cpp
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <string>
int main(int argc, char *argv[])
{
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <count>\n";
        return 1;
    }
    std::stringstream ss;
    int x = std::atoi(argv[1]);
    for (int i = 1; i <= x; i++) {
        ss << i << "\n";
        // Flush the buffer to stdout every 10000 lines.
        if (i % 10000 == 0) {
            std::cout << ss.rdbuf();
            ss.clear();
            ss.str(std::string());
        }
    }
    // Write whatever is left in the buffer.
    std::cout << ss.rdbuf();
}
Speed comparisons
Lowest times out of 3 runs for each of the methods presented, generating a 1000000-line file.
Jidder
$ time ./inseq 1000000 > file
real 0m0.143s
user 0m0.131s
sys 0m0.011s
Carpetsmoker
$ cat Carpet.sh
#!/bin/bash
seq 1 $1 | while read i; do
echo $i
done
$ time ./Carpet.sh 1000000 > file
real 0m12.223s
user 0m9.753s
sys 0m2.140s
Hari Shankar
$ cat Hari.sh
#!/bin/bash
for ((i=0; i<$1;++i))
do
echo "$i $((RANDOM%$2))"
done
$ time ./Hari.sh 1000000 > file
real 0m9.729s
user 0m8.084s
sys 0m1.064s
As you can see from the results, my way is faster by a factor of roughly 70 to 85.
Edit
Because Python is great:
$ cat Py.sh
#!/usr/bin/python
for x in xrange(1, 1000000):
    print (x)
$ time ./Py.sh >file
real 0m0.543s
user 0m0.499s
sys 0m0.016s
That is about 4 times slower than C++, so if the file took an hour to generate with the C++ program, it would take about four hours with these two lines of Python.
EDIT 2
I decided to try Python and C++ on the 1000000000-line file.
For a task that is not CPU-intensive, this seems to use a lot of CPU:
PID USER %CPU TIME+ COMMAND
56056 me 96 2:51.43 Py.sh
Results for Python
real 9m37.133s
user 8m53.550s
sys 0m8.348s
Results for c++
real 3m9.047s
user 2m53.400s
sys 0m2.842s

Bash script - search for files whose name matches a pattern

I am trying to write a simple bash script. It should search for files whose names match a pattern supplied as an argument and list the first few lines of each file. All the files will be in one directory.
I know I should use head -n 3 to list the first few lines of a file, but I have no idea how to search for the supplied pattern or how to put it all together.
Thank you very much for all the answers.
No need really, the shell will do patterns for you:
head -3 *.c
==> it.c <==
#include<stdio.h>
int main()
{
==> sem.c <==
#include <stdio.h> /* printf() */
#include <stdlib.h> /* exit(), malloc(), free() */
#include <sys/types.h> /* key_t, sem_t, pid_t */
==> usbtest.c <==
Another example:
head -3 file[0-9]
==> file1 <==
file1 line 1
file1 line 2
file1 line 3
==> file2 <==
file2 line 1
file2 line 2
file2 line 3
==> file9 <==
file9 line 1
file9 line 2
file9 line 3
Bash has a globstar option that when set will enable you to use ** to search subdirectories:
head -3 **/mypattern*.txt
To set globstar you can add the following to your .bashrc:
shopt -s globstar
find . -type f -name 'mypattern*.txt' -exec head -n 3 {} \;
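Add -maxdepth 1 immediately after the starting directory (find . -maxdepth 1 -type f ...) if you do not want to descend into subdirectories. Note that -maxdepth 0 would test only the starting directory itself, so no files would match.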
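Putting the pieces together for the script the question asks about, here is a minimal sketch. The function name head_matching and its optional directory argument are hypothetical, and -maxdepth is assumed to be supported by the installed find (it is an extension found in GNU and BSD find, not in POSIX).

```shell
#!/bin/bash
# Hypothetical helper: head_matching <pattern> [dir]
# Prints the first 3 lines of every regular file directly inside dir
# (default: current directory) whose name matches the shell pattern.
# Quote the pattern when calling so that find, not the calling shell,
# expands it.
head_matching() {
    local pattern=$1 dir=${2:-.}
    find "$dir" -maxdepth 1 -type f -name "$pattern" -exec head -n 3 {} \;
}
```

It would be called as head_matching 'mypattern*.txt'. Because head runs once per file here, it does not print the ==> name <== headers shown in the earlier examples.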

How to extract strings between < and > with sed or awk

I want to extract all header file names between < and >. I have a file called configure.ac from a git repository, and I want to know which header files are referenced in it. I would like to generate a list file containing only the header files. Example:
# _NL_MEASUREMENT_MEASUREMENT is an enum and not a define
AC_MSG_CHECKING([for _NL_MEASUREMENT_MEASUREMENT])
AC_LINK_IFELSE(
[AC_LANG_PROGRAM(
[[#include <langinfo.h>]],
[[char c = *((unsigned char *) nl_langinfo(_NL_MEASUREMENT_MEASUREMENT));]])],
[nl_ok=yes],
[nl_ok=no])
AC_MSG_RESULT($nl_ok)
if test "$nl_ok" = "yes"; then
AC_DEFINE(HAVE__NL_MEASUREMENT_MEASUREMENT, 1,
[Define to 1 if _NL_MEASUREMENT_MEASUREMENT is available])
fi
if test "$ac_cv_header_sys_shm_h" = "yes"; then
AC_MSG_CHECKING(whether shmctl IPC_RMID allowes subsequent attaches)
AC_RUN_IFELSE(
[AC_LANG_SOURCE([[
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int main()
{
int id;
char *shmaddr;
id = shmget (IPC_PRIVATE, 4, IPC_CREAT | 0600);
if (id == -1)
exit (2);
shmaddr = shmat (id, 0, 0);
shmctl (id, IPC_RMID, 0);
if ((char*) shmat (id, 0, 0) == (char*) -1)
{
shmdt (shmaddr);
exit (1);
}
shmdt (shmaddr);
shmdt (shmaddr);
exit (0);
}
]])],
[AC_DEFINE([IPC_RMID_DEFERRED_RELEASE],[1],
[Define to 1 if shared memory segments are released deferred.])
AC_MSG_RESULT(yes)],
[AC_MSG_RESULT(no)],
[AC_MSG_RESULT(assuming no)])
AC_DEFINE(USE_SYSV_SHM, 1, [Define to 1 to use SYSV shared memory])
else
shmtype=none
fi
The output file must contain:
langinfo.h
types.h
ipc.h
shm.h
I tried:
echo "#include <stdio.h>" | sed -n 's/.*<\(.*\)\>.*/\1/p'
---> stdio.h
cat configure.ac | sed -n 's/.*<\(.*\)\>.*/\1/p' | sort -u > list.txt
---> It doesn't work
I can't find the error.
It depends a bit on your version of sed. On Mac OS X 10.9.1 Mavericks (BSD sed), this works:
$ sed -n 's/.*\<\(.*\)\>.*/\1/p' data
langinfo.h
sys/types.h
sys/ipc.h
sys/shm.h
$
(where data is the fragment of configure.ac you quote in the question). OTOH, GNU sed (version 4.2.2) gives (with the ... being elided lines):
$ /usr/gnu/bin/sed -n 's/.*\<\(.*\)\>.*/\1/p' data
a
_NL_MEASUREMENT_MEASUREMENT
AC_LINK_IFELSE
AC_LANG_PROGRAM
h
_NL_MEASUREMENT_MEASUREMENT
yes
...
AC_LANG_SOURCE
h
h
h
main
id
shmaddr
...
else
shmtype
fi
$
Change the regex to:
$ /usr/gnu/bin/sed -n 's/.*<\(.*\)>.*/\1/p' data
langinfo.h
sys/types.h
sys/ipc.h
sys/shm.h
$
and the same output with BSD sed.
Moral: by default, the angle brackets < and > are not metacharacters and do not need backslash escaping.
When they are escaped, GNU sed gives them a special meaning: \< matches the start of a word and \> matches the end of a word.
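The output requested in the question lists types.h rather than sys/types.h, and wants the list sorted without duplicates. A minimal sketch of one way to get that is shown here; the function name list_headers is hypothetical, and the pattern assumes every header of interest appears on a line containing #include.

```shell
#!/bin/bash
# Hypothetical helper: list_headers <file>
# 1. Keep only what sits between < and > on lines containing #include.
# 2. Strip any leading directory part (sys/types.h -> types.h).
# 3. Sort and remove duplicates.
list_headers() {
    sed -n 's/.*#include[[:space:]]*<\([^>]*\)>.*/\1/p' "$1" |
        sed 's|.*/||' |
        sort -u
}
```

It would be used as list_headers configure.ac > list.txt. Using [^>]* instead of .* stops the match at the first closing bracket, and requiring #include keeps comparison operators in C code from matching.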
perl -lne 'print $1 if(/\<(.*?)\>/)' your_file

Error when concatenating in a gcc preprocessor macro

I'm getting an error when I try to use ## in a macro. This is what I am trying to do:
With these defines:
#define PORT 2
#define PIN 3
I want the preprocessor to generate:
PM2.3=1
when I call a macro like this:
SetPort(PORT,PIN)
Since PORT and PIN have to be substituted as well as concatenated, I think I must use 2 defines:
#define SetP2(PORT,PIN) PM##PORT.PIN = 1
#define SetPort(PORT,PIN) SetP2(PORT,PIN)
but I get an error on:
#define PIN 3 --> expected identifier before numeric constant
and a warning on:
SetPort(PORT,PIN) --> Syntax error
Any idea?
This works for me:
$ cat portpin.c
#define PORT 2
#define PIN 3
#define SetP2(prefix,prt) prefix ## prt
#define SetPort(prt,pn) SetP2(PM,prt).pn = 1
SetPort(PORT,PIN)
$ gcc -E portpin.c
# 1 "portpin.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "portpin.c"
PM2. 3 = 1
$
I don't know how important it is for there to be no space between the . and the 3, but the preprocessor seems to want to insert it.
UPDATE:
Actually I tried your original code, and it seems to produce the same result, so my answer above is probably not much use to you.
UPDATE 2:
It turns out the OP is expecting the pre-processor to generate PM2.no3=1 and not PM2.3=1. This can easily be done as follows:
$ cat portpin.c
#define PORT 2
#define PIN 3
#define SetP2(PORT,PIN) PM##PORT.no##PIN=1
#define SetPort(PORT,PIN) SetP2(PORT,PIN)
SetPort(PORT,PIN)
$ gcc -E portpin.c
# 1 "portpin.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "portpin.c"
PM2.no3=1
$
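The reason two macro levels are needed is that a macro argument used as an operand of ## is pasted as written, without being expanded first. Passing PORT and PIN through SetPort expands them to 2 and 3 before SetP2 pastes them. The sketch below is one way to check an expansion quickly without creating a source file. The function name expand_setport is hypothetical, the parameters are renamed to lowercase for readability, and a C compiler reachable as cc that accepts -E, -P and -x c (as gcc and clang do) is assumed.

```shell
#!/bin/bash
# Hypothetical helper: preprocess the two-level macro from the answer and
# print only the non-blank output lines. Assumes a C compiler named cc.
expand_setport() {
    cc -E -P -x c - <<'EOF' | grep -v '^[[:space:]]*$'
#define PORT 2
#define PIN 3
#define SetP2(port, pin) PM##port.no##pin=1
#define SetPort(port, pin) SetP2(port, pin)
SetPort(PORT, PIN)
EOF
}
```

The -P option suppresses the # 1 "portpin.c" line markers seen in the transcripts, so only the expanded statement remains.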
