I have a Unix script ending in a pipeline that finally yields a line number in the subject file.
Now I need to print the file contents from that particular line to the end.
Is it possible to feed the line number to sed via xargs, so that sed prints the desired range? Something like:
..... | tail -1 | cut -f 1 | xargs sed ...?
..... | tail -1 | cut -f 1 | xargs -I{} sed -n '{},$p' your_file
(xargs -i is the older, deprecated spelling of -I{}.)
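For a concrete sketch (hypothetical file and marker; cat -n emits tab-separated line numbers, which is why the default cut -f 1 works):
cat -n your_file | grep 'marker' | tail -1 | cut -f 1 | xargs -I{} sed -n '{},$p' your_file
This prints everything from the last line containing "marker" through the end of your_file.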
The following script.sh compares part of a string (coming from stdin by cat-ing a CSV file) to a defined string and reports the differences in a certain format:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done < "${1:-/dev/stdin}"
It is intended to be executed on a number of rows from a very large file in the format
XYZ,ABMDEFG
and it works well when I use it in a pipe:
cat large_file | ./find_something.sh
However, when I try to use it with parallel, I get this error:
$ cat large_file | parallel ./find_something.sh
./find_something.sh: line 9: XYZ, ABMDEFG : No such file or directory
What is causing this? Is parallel supposed to work for something like this, if I want to redirect the output to a single file afterwards?
Less important side note: I'm rather proud of my string comparison method, but if someone has a faster way to get from comparing ABCDEFG and XYZ,ABMDEFG to XYZ,C3M, I'd be happy to hear that, too.
Edit:
I should have said, I also want to preserve the order of each line in the output, corresponding to the input. Is that possible using parallel?
Your script accepts its input from a file (defaulting to stdin), whereas parallel will pass input as arguments, not via stdin. In that sense, parallel is closer to xargs.
Presumably, you want each of the lines in large_file to be processed as a unit, possibly in parallel.
That means you need your script to only process one such line at a time, and let parallel call your script many times, once for each line.
So your script should look like this:
#!/usr/bin/env bash
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
line="$1"
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
Then you can redirect to a file as follows:
cat large_file | parallel ./find_something.sh > output_file
Another option is to keep the while read loop as-is and let parallel split the input into chunks for it; -k keeps the order.
#!/usr/bin/env bash
doit() {
reference="ABCDEFG"
ref_transp=$(echo "$reference" | sed -e 's/\(.\)/\1\n/g')
while read line; do
line_transp=$(echo "$line" | cut -d',' -f2 | sed -e 's/\(.\)/\1\n/g')
output=$(paste -d ' ' <(echo "$ref_transp") <(echo "$line_transp") | grep -vnP '([A-Z]) \1' | sed -E 's/([0-9][0-9]*):([A-Z]) ([A-Z]*)/\2\1\3/' | grep '^[A-Z][0-9][0-9]*[A-Z*]$')
echo "$(echo ${line:0:35}, $output)"
done
}
export -f doit
cat large_file | parallel --pipe -k doit
# or
parallel --pipepart -a large_file --block -10 -k doit
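Both versions print in input order because of -k, so collecting everything into one file is again just a redirect, e.g.:
cat large_file | parallel --pipe -k doit > output_file
--pipe chops stdin into blocks (about 1 MB by default) and feeds each block to one doit instance on its stdin, which is why the while read loop stays inside the function; --pipepart does the same but reads directly from the file, which is considerably faster.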
I have this command:
cat -n file.log | grep "Start new test" | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/is not alive/&/p' file.log
It outputs the whole matching lines:
Jan 19 23:20:33 s_localhost#file platMgt.xbin[3260]: blade 10 is not alive
Jan 19 23:20:33 s_localhost#file platMgt.xbin[3260]: blade 11 is not alive
How can I modify it to get only the last part:
blade 11 is not alive
Can I modify it so that it displays the following?
Error:blade 11 is not alive
Thank you for your response.
You can use cut to split on the colons (the timestamp accounts for the first two) and then add the error message:
cat -n file.log | grep "Start new test" | tail -1 | cut -f 1 | xargs -I % sed -n %',$s/is not alive/&/p' file.log | cut -d: -f 4 | xargs -I % echo Error: %
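Alternatively, sed can do the whole rewrite itself, saving the two extra pipeline stages. A sketch (the greedy .*: eats everything up to the last colon):
cat -n file.log | grep "Start new test" | tail -1 | cut -f 1 | xargs -I % sed -n '%,$s/.*: \(.*is not alive\)/Error:\1/p' file.log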
To get the last colon-separated part, awk is the better tool:
s='Jan 19 23:20:33 s_localhost#file platMgt.xbin[3260]: blade 10 is not alive'
awk -F':' '{print "Error:" $NF}' <<< "$s"
OUTPUT:
Error: blade 10 is not alive
EDIT: With your piped commands you can combine it as:
grep "Start new test" file.log|tail -1|awk -F':' '{print "Error:" $NF}'
PS: This whole thing is possible in awk by itself, though.
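To back that up, here is a two-pass sketch of an awk-only version: the first pass over the log remembers where the last "Start new test" is, the second prints and reformats the failures after it.
awk 'NR==FNR { if (/Start new test/) start = FNR; next }
     start && FNR >= start && /is not alive/ { sub(/.*: /, ""); print "Error:" $0 }' file.log file.log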
The following obtains the last ":"-separated field with a sed command:
cat text.txt | sed 's/^.*: \([^:]*$\)/\1/g'
Currently, I have a command in a bash script that greps for a given string in a text file and prints only the line numbers, using sed:
grep -n "<string>" file.txt | sed -n 's/^\([0-9]*\).*/\1/p'
The grep could find multiple matches, and thus, print multiple line numbers. From this command's output, I would like to extract the minimum and maximum values, and assign those to respective bash variables. How could I best modify my existing command or add new commands to accomplish this? If using awk or sed will be necessary, I have a preference of using sed. Thanks!
You can get the minimum and maximum with this:
grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'
You can also read them into an array:
F=($(grep -n "<string>" input | sed -n -e 's/^\([0-9]*\).*/\1/' -e '1p;$p'))
echo ${F[0]} # min
echo ${F[1]} # max
grep -n "<string>" file.txt | sed -n -e '1s/^\([0-9]*\).*/\1/p' -e '$s/^\([0-9]*\).*/\1/p'
grep .... | awk -F: '!f{print $1; f=1} END{print $1}'
(This prints the line number of the first match, then of the last.)
Here's how I'd do it, since grep -n 'pattern' file prints output in the format line number:line contents ...
minval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | head -1)
maxval=$(grep -n '<string>' input | cut -d':' -f1 | sort -n | tail -1)
The cut -d':' -f1 command splits the grep output around the colon and pulls out only the first field (the line numbers). sort -n then sorts the numeric line numbers in ascending order (which they already are, but it's good practice to make sure). Finally, head -1 and tail -1 take the first and last values of the sorted list respectively, i.e. the minimum and maximum, which get assigned to $minval and $maxval.
Hope this helps!
Edit: Turns out you can't do it the way I originally had it, since echoing an unquoted list of newline-separated values collapses them onto one line.
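For reference, the pitfall mentioned in that edit is just unquoted expansion; quoting the variable preserves the newlines:
vals="$(printf '1\n2\n3\n')"
echo $vals    # unquoted: word-split onto one line -> 1 2 3
echo "$vals"  # quoted: three separate lines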
It can be done with one process. Like this:
awk '/expression/{if(!n)print NR;n=NR} END {print n}' file.txt
Then you can assign them to an array (as perreal suggested), or you can modify the script and assign them to variables using eval:
eval $(awk '/expression/{if(!n)print "A="NR;n=NR} END {print "B="n}' file.txt)
echo $A
echo $B
Output (where file.txt is three lines, each matching expression):
1
3
I have an application (myapp) that gives me this multiline result:
abc|myparam1|def
ghi|myparam2|jkl
mno|myparam3|pqr
stu|myparam4|vwx
With grep and sed I can get my parameters as below
myapp | grep '|' | sed -e 's/^[^|]*//' | sed -e 's/|.*//'
But I then want these myparamx values passed as parameters to a script, executed once for each parameter:
myscript.sh myparam1
myscript.sh myparam2
etc.
Any help greatly appreciated
Please see xargs. For example:
myapp | grep '|' | sed -e 's/^[^|]*//' | sed -e 's/|.*//' | xargs -n 1 myscript.sh
Maybe this can help:
myapp | awk -F"|" '{ print $2 }' | while read -r line; do /path/to/script/ "$line"; done
I like the xargs -n 1 solution from Dark Falcon, and while read is the classic tool for this kind of thing, but just for completeness:
myapp | awk -F'|' '{print "myscript.sh", $2}' | bash
As a side note, speaking about extraction of the 2nd field, you could use cut:
myapp | cut -d'|' -f2 # -f2 => second field (cut counts fields from 1)
I have a comma-delimited file "myfile.csv" where the 5th column is a date/time stamp (mm/dd/yyyy hh:mm). I need to list all the rows that contain duplicate dates (there are lots).
I'm using a bash shell via cygwin for WinXP
$ cut -d, -f 5 myfile.csv | sort | uniq -d
correctly returns a list of the duplicate dates
01/01/2005 00:22
01/01/2005 00:37
[snip]
02/29/2009 23:54
But I cannot figure out how to feed this to grep to give me all the rows.
Obviously, I can't use xargs straight up since the output contains spaces. I thought I could do uniq -z -d but for some reason, combining those flags causes uniq to (apparently) return nothing.
So, given that
$ cut -d, -f 5 myfile.csv | sort | uniq -d -z | xargs -0 -I {} grep '{}' myfile.csv
doesn't work... what can I do?
I know that I could do this in perl or another scripting language... but my stubborn nature insists that I should be able to do it in bash using standard commandline tools like sort, uniq, find, grep, cut, etc.
Teach me, oh bash gurus. How can I get the list of rows I need using typical cli tools?
sort -k5,5 will do the sort on fields and avoid the cut;
uniq -f 4 will ignore the first 4 fields for the uniq;
Plus a -D on the uniq will get you all of the repeated lines (vs -d, which gets you just one);
but uniq expects blank-delimited fields instead of CSV, so tr ',' '\t' to fix that.
Problem is if you have fields after #5 that are different. Are your dates all the same length? You might be able to add a -w 16 (to include the time) or -w 10 (for just the date) to the uniq.
So:
tr ',' '\t' < myfile.csv | sort -k5,5 | uniq -f 4 -D -w 16
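If you would rather sort the raw CSV first, sort can be told about the comma directly; the same caveat about spaces in the first four columns applies (a sketch):
sort -t, -k5,5 myfile.csv | tr ',' '\t' | uniq -f 4 -D -w 16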
The -z option of uniq needs the input to be NUL separated. You can filter the output of cut through:
tr '\n' '\000'
To get zero separated rows. Then sort, uniq and xargs have options to handle that. Try something like:
cut -d, -f 5 myfile.csv | tr '\n' '\000' | sort -z | uniq -d -z | xargs -0 -I {} grep '{}' myfile.csv
Edit: the position of tr in the pipe was wrong.
You can tell xargs to use each line as an argument in its entirety using the -d option. Try:
cut -d, -f 5 myfile.csv | sort | uniq -d | xargs -d '\n' -I '{}' grep '{}' myfile.csv
This is a good candidate for awk:
BEGIN { FS="," }
{ split($5,A," "); date[A[1]] = date[A[1]] " " NR }
END { for (i in date) print i ":" date[i] }
Set the field separator to ',' (CSV).
Split the fifth field on the space; A[1] holds the date part (awk's split arrays start at 1).
Append the current line number to the list already stored for that date.
Print out the line numbers for each date.
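Since the question ultimately wants the duplicate rows rather than line numbers, a two-pass variant gets there directly. A sketch that reads the file twice and keys on the whole 5th field:
awk -F, 'NR==FNR { count[$5]++; next } count[$5] > 1' myfile.csv myfile.csv
The first pass counts each date/time stamp; the second prints every row whose stamp occurred more than once.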
Try escaping the spaces with sed:
echo 01/01/2005 00:37 | sed 's/ /\\ /g'
cut -d, -f 5 myfile.csv | sort | uniq -d | sed 's/ /\\ /g' | xargs -I '{}' grep '{}' myfile.csv
(Yet another way would be to read the duplicate date lines into an IFS=$'\n' array and iterate over it in a for loop.)
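A sketch of that array approach (mapfile needs bash 4; -F makes grep treat the date as a fixed string):
mapfile -t dups < <(cut -d, -f 5 myfile.csv | sort | uniq -d)
for d in "${dups[@]}"; do
    grep -F -- "$d" myfile.csv
done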