I have the following text file:
File: Test.txt
Number of Cars:
10
Number of Bikes:
20
Number of Cycles:
10
Note: I want to get the number of bikes, i.e. the 20 on the line following "Number of Bikes:", from the file.
My Try: 1
sed -n '/Number of Bikes:/{N;p;}' Test.txt
Output:
Number of Bikes:
20
My Try: 2
awk '/Number of Bikes/{_=2}_&&_--' Test.txt
Output:
Number of Bikes:
20
Expected Output:
20
If you only need to search for Bikes, you can grep the line including the following line, pipe it to tail to get the last line, and remove any leading blank space.
grep -A 1 "^Number of Bikes:$" file1|tail -1|sed -e 's/ *//'
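Run against the sample file above (the answer uses file1; substitute your actual file name):
$ grep -A 1 "^Number of Bikes:$" Test.txt | tail -1 | sed -e 's/ *//'
20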
Here's a way you could do it using awk:
awk 'f && f-- { print $1 } /Number of Bikes:/ { f = 1 }' file
The flag f is set when the heading is matched. The first field of the next line is printed, because this is the only line where f is true. f-- means that f will be set back to 0 for all subsequent lines.
Perhaps slightly better in this case is to simply exit after printing one line:
awk 'f { print $1; exit } /Number of Bikes:/ { f = 1 }' file
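For example, run against the sample Test.txt, this prints just the desired value:
$ awk 'f { print $1; exit } /Number of Bikes:/ { f = 1 }' Test.txt
20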
I have multiple files in the following format. This one has 3 sequences (the number of sequences varies between files, but the names always end in ".") with 40 positions each, as indicated by the numbers in the first line. At the beginning of the lines (except the first one) are the names of the sequences:
3 40
00076284. ATGTCTGTGG TTCTTTAACC
00892634. TTGTCTGAGG TTCGTAAACC
00055673. TTGTCTGAGG TCCGTGAACC

GCCGGGAACA TCCGCAAAAA
ACCGTGAAAC GGGGTGAACT
TCCCCCGAAC TCCCTGAACG
I need to convert it to this format, where the sequences are continuous, with no spaces or newlines, and on a new line after their names. The only spaces that should remain are between the two numbers in the first line.
3 40
00076284.
ATGTCTGTGGTTCTTTAACCGCCGGGAACATCCGCAAAAA
00892634.
TTGTCTGAGGTTCGTAAACCACCGTGAAACGGGGTGAACT
00055673.
TTGTCTGAGGTCCGTGAACCTCCCCCGAACTCCCTGAACG
I tried sed to delete the spaces and newlines, but I don't know how to apply it only after the first line, nor how to avoid producing one huge line.
Thanks
Here's a shell script that may provide what you need:
head -1 input
awk '
NR == 1 { sequences = $1 ; positions = $2 ; next }
{
if ( $1 ~ /^[0-9]/ ) {
sid = $1 ; $1 = "" ; sequence_name[ NR - 1 ] = sid
sequence[ NR - 1 ] = $0
} else {
sequence[ ( NR - 1 ) % ( sequences + 1 ) ] = sequence[ (NR-1) % ( sequences + 1 ) ] " " $0
}
}
END {
for ( x = 1 ; x <= length( sequence_name ) ; x++ )
{
print sequence_name[x]
print sequence[x]
}
}' input | tr -d ' '
I added head -1 at the top of the script just to output the first line of your file; I couldn't print the first line from within the awk script because of the pipe to tr -d ' '.
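For comparison, here is a sketch that keeps everything in awk by stripping the spaces with gsub instead of piping through tr, so the header can be printed from the same script; the variable names (n, name, seq, c) are my own:
awk '
NR == 1 { print; next }                # print the header line as-is
/^[0-9]/ { n++; name[n] = $1; $1 = ""; seq[n] = $0; next }  # record a name, start its sequence
NF { c = (c % n) + 1; seq[c] = seq[c] $0 }  # append continuation lines to the sequences in order
END {
    for ( i = 1; i <= n; i++ ) {
        gsub(/ /, "", seq[i])          # remove all spaces (replaces tr -d)
        print name[i]
        print seq[i]
    }
}' input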
I think this should work, but my output is longer, since when I actually concatenate all the trailing "orphan" sequences I get a much longer line.
cat input.txt | awk '/^[0-9]+ [0-9]+$/{printf("%s\n",$0); next} /[0-9]+[.]/{ printf("\n%s\n",$1);for(i=2; i<=NF;i++){printf("%s",$i)}; next} /^ */{ for(i=1; i<=NF;i++){printf("%s",$i)}; next;}'
3 40
Please try and let me know.
Remember the position of the empty line and merge the lines before it with those after:
awk '
NR==1{print;next}
NR!=1 && !empty{arr[NR]=$1 "\n" $2 $3}
/^$/{empty=NR-1;next}
NR!=1 && empty{printf "%s%s%s\n", arr[NR-empty], $1, $2}
' file
My second solution, without awk: merge the file with itself, using the empty line as separator.
cat >file <<EOF
3 40
00076284. ATGTCTGTGG TTCTTTAACC
00892634. TTGTCTGAGG TTCGTAAACC
00055673. TTGTCTGAGG TCCGTGAACC

GCCGGGAACA TCCGCAAAAA
ACCGTGAAAC GGGGTGAACT
TCCCCCGAAC TCCCTGAACG
EOF
head -n1 file
paste <(sed -n '1!{ /^$/q;p; }' file) <(sed -n '1!{ /^$/,//{/^$/!p}; }' file) |
sed 's/[[:space:]]//g; s/\./.\n/'
Will output:
3 40
00076284.
ATGTCTGTGGTTCTTTAACCGCCGGGAACATCCGCAAAAA
00892634.
TTGTCTGAGGTTCGTAAACCACCGTGAAACGGGGTGAACT
00055673.
TTGTCTGAGGTCCGTGAACCTCCCCCGAACTCCCTGAACG
Explanation:
head -n1 file - outputs the first line
sed -n '1!{ /^$/q;p; }' file
1! - don't output the first line
/^$/q - quit at the empty line
p - print everything else
sed -n '1!{ /^$/,//{/^$/!p}; }' file
1! - ignore the first line
/^$/,// - from the empty line until the end
/^$/!p - output if not an empty line
paste <(..) <(...) - merge the two seds with a tab
sed 's/[[:space:]]//g; s/\./.\n/'
s/[[:space:]]//g - remove all whitespace
s/\./.\n/ - replace a dot with a dot and a newline.
Let's say I have a standard 80-column terminal and execute a command with long output lines (e.g. stdout from ls) that wrap onto two or more lines, and I want to indent the continuation lines of all my bash stdout.
The indent should be configurable: 1, 2, 3 or however many spaces.
from this
lrwxrwxrwx 1 root root 24 Feb 19 1970 sdcard -> /storage/emula
ted/legacy/
to this
lrwxrwxrwx 1 root root 24 Feb 19 1970 sdcard -> /storage/emula
    ted/legacy/
I read Indenting multi-line output in a shell script, so I tried piping through | sed 's/^/ /', but that gives me the exact opposite: it indents the first line and not the continuation.
Ideally I would put a script in profile.rc or whatever, so that every time I open an interactive shell and execute any command, long output gets indented.
I'd use awk for this.
awk -v width="$COLUMNS" -v spaces=4 '
BEGIN {
pad = sprintf("%*s", spaces, "") # generate `spaces` spaces
}
NF { # if current line is not empty
while (length > width) { # while length of current line is greater than `width`
print substr($0, 1, width) # print `width` chars from the beginning of it
$0 = pad substr($0, width + 1) # and leave `pad` + remaining chars
}
if ($0 != "") # if there are remaining chars
print # print them too
next
} 1' file
In one line:
awk -v w="$COLUMNS" -v s=4 'BEGIN{p=sprintf("%*s",s,"")} NF{while(length>w){print substr($0,1,w);$0=p substr($0,w+1)} if($0!="") print;next} 1'
As @Mark suggested in the comments, you can put this in a function and add it to .bashrc for ease of use.
function wrap() {
awk -v w="$COLUMNS" -v s=4 'BEGIN{p=sprintf("%*s",s,"")} NF{while(length>w){print substr($0,1,w);$0=p substr($0,w+1)} if($0!="") print;next} 1'
}
Usage:
ls -l | wrap
Edit by Ed Morton per request:
Very similar to oguzismail's script above, but should work with Busybox or any other awk:
$ cat tst.awk
BEGIN { pad = sprintf("%" spaces "s","") }
{
while ( length($0) > width ) {
printf "%s", substr($0,1,width)
$0 = substr($0,width+1)
if ( $0 != "" ) {
print ""
$0 = pad $0
}
}
print
}
$
$ echo '123456789012345678901234567890' | awk -v spaces=3 -v width=30 -f tst.awk
123456789012345678901234567890
$ echo '123456789012345678901234567890' | awk -v spaces=3 -v width=15 -f tst.awk
123456789012345
   678901234567
   890
$ echo '' | awk -v spaces=3 -v width=15 -f tst.awk
$
That first test case is to show that you don't get a blank line printed after a full-width input line and the third is to show that it doesn't delete blank rows. Normally I'd have used sprintf("%*s",spaces,"") to create the pad string but I see in a comment that that doesn't work in the apparently non-POSIX awk you're using.
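For example, to see that both pad idioms produce the same string (the first uses the POSIX dynamic-width %*s, the second builds the format string by concatenation, which also works in awks that lack %*s):
$ awk 'BEGIN { printf "[%s]\n", sprintf("%*s", 4, "") }'
[    ]
$ awk -v spaces=4 'BEGIN { printf "[%s]\n", sprintf("%" spaces "s", "") }'
[    ]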
This might work for you (GNU sed):
sed 's/./&\n  /80;P;D' file
This splits lines into length 80 and indents the following line(s) by 2 spaces.
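To see the mechanism at a smaller width, here is the same command with a width of 5 and a 2-space indent:
$ echo '0123456789' | sed 's/./&\n  /5;P;D'
01234
  567
  89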
Or if you prefer:
s='  ' c=80
sed "s/./&\n$s/$c;P;D" file
To prevent empty lines being printed, use:
sed 's/./&\n/80;s/\n$//;s/\n/\n  /;P;D' file
or more easily:
sed 's/./\n  &/81;P;D' file
One possible solution using pure bash string manipulation.
You can make the script read stdin and format whatever it reads.
MAX_LEN=5  # maximum line width (the length of the first chunk)
IND_LEN=2  # number of indentation spaces
short_len="$((MAX_LEN-IND_LEN))"
while IFS= read -r line; do               # -r keeps backslashes intact
  printf '%s\n' "${line:0:MAX_LEN}"
  # continuation chunks start at MAX_LEN and advance short_len at a time;
  # stopping at length-1 avoids an empty indented chunk for full-width lines
  for i in $(seq "$MAX_LEN" "$short_len" "$((${#line}-1))"); do
    printf '%*s%s\n' "$IND_LEN" "" "${line:i:short_len}"
  done
done
Usage: (assuming the script is saved as indent.sh)
$ echo '0123456789' | ./indent.sh
01234
  567
  89
This is my sample file, and this is what I want to do:
I have a fixed requirement to delete the 2nd and 3rd lines, keeping the 1st line.
From the bottom, I want to delete the 2 lines above the last line, excluding the last line itself, as I don't know the last line number; it depends on the file.
Once the 2nd and 3rd lines are deleted, the 4th line should move up to 2nd and so on; the same applies at the bottom after deletion.
I want to use head/tail commands and modify the existing file only, writing the changes back to the same file.
Sample file text format.
Input File
This is First Line
Delete Delete Delete This Line
Delete Delete Delete This Line
..
..
..
..
Delete Delete Delete This Line
Delete Delete Delete This Line
This is Last Line, should not be deleted. It could come at any line number (variable)
Output file (same file modified)
This is First Line
..
..
..
..
This is Last Line, should not be deleted. It could come at any line number (variable)
Edit: Because of compatibility issues on Unix (using HP-UX with the ksh shell), I want to implement this using head/tail/awk, not sed.
Adding a solution as per the OP's request.
Approach: with this solution the OP can provide lines to skip from the start and from the end of any Input_file, and those lines will be skipped.
What the code does: it generates an awk command from the given lines to be skipped, then runs it.
cat print_lines.ksh
start_line="2,3"
end_line="2,3"
total_lines=$(wc -l<Input_file)
awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" '
BEGIN{
num_start=split(start, a,",");
num_end=split(end, b,",");
for(i=1;i<=num_start;i++){
val=val?val OFS "FNR=="a[i]:"FNR=="a[i]};
for(j=1;j<=num_end;j++){
b[j]=b[j]>1?len-(b[j]-1):b[j];
val=val?val OFS "FNR=="b[j]:"FNR=="b[j]};
print "awk " s1 val "{next} 1" s1" Input_file"}
' | sh
Change Input_file to your actual file name and let me know how it goes.
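For illustration, with start_line="2,3", end_line="2,3" and a hypothetical 7-line Input_file, the inner awk would generate and pipe to sh:
awk 'FNR==2||FNR==3||FNR==6||FNR==5{next} 1' Input_file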
The following awk may help you with the same (since I don't have an HP system, I didn't test it).
awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
EDIT: Adding a non-one-liner form of the solution too.
awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file
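A quick check with a hypothetical 7-line file:
$ seq 7 > Input_file
$ awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
1
4
7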
wc + sed solution:
len=$(wc -l inpfile | cut -d' ' -f1)
sed "$(echo "$((len-2)),$((len-1))")d; 2,3d" inpfile > tmp_f && mv tmp_f inpfile
$ cat inputfile
This is First Line
..
..
..
..
This is Last Line, should not be deleted. It could come at any line number (variable)
Perl suggestion... read the whole file into array @L, get the index of the last line. Delete the 2nd-last, 3rd-last, 3rd and 2nd lines. Print what's left.
perl -e '@L=<>; $p=$#L; delete $L[$p-1]; delete $L[$p-2]; delete $L[2]; delete $L[1]; print @L' file.txt
Or, maybe a little more succinctly with splice:
perl -e '@L=<>; splice @L,1,2; splice @L,$#L-2,2; print @L' file.txt
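For example, with a hypothetical 7-line file, the first splice removes lines 2 and 3 and the second removes the two lines above the last:
$ seq 7 > file.txt
$ perl -e '@L=<>; splice @L,1,2; splice @L,$#L-2,2; print @L' file.txt
1
4
7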
If you wish to have some flexibility, a ksh script approach may work, though it is a little expensive in terms of resources:
#!/bin/ksh
[ -f "$1" ] || echo "Input is not a file" || exit 1
total=$(wc -l "$1" | cut -d' ' -f1 )
echo "How many lines to delete at the end?"
read no
[ -z "$no" ] && echo "Not sure how many lines to delete, aborting" && exit 1
sed "2,3d;$((total-no)),$((total-1))d" "$1" >tempfile && mv tempfile "$1"
And feed the file as argument to the script.
Notes
This deletes the second and third lines, plus a user-supplied number of lines from the end, excluding the last line.
Note: My ksh version is 93u+ 2012-08-01
awk '{printf "%d\t%s\n", NR, $0}' < file | sed '2,3d;N;$!P;D' file
The awk here serves the purpose of prefixing line numbers; its output is then passed to sed, which performs the required deletions.
%d : used to print the numbers (you can also use %i)
\t : used to place a tab between the number and the string
%s : to print the string of characters
\n : to create a new line
NR : the line number, starting from 1
For sed:
N : read/append the next line of input into the pattern space.
$! : do not apply to the last line.
D : if the pattern space contains no newline, start a normal new cycle as if the d command was issued; otherwise, delete text in the pattern space up to the first newline and restart the cycle with the resultant pattern space, without reading a new line of input.
P : print up to the first embedded newline of the current pattern space. This prints the lines that remain after removing the targeted lines.
I enjoyed this task and wrote an awk script for the more scalable case (huge files).
It reads/scans the input file once (no need to know the line count) and does not store the whole file in memory.
script.awk
BEGIN { range = 3} # define sliding window range
{lines[NR] = $0} # capture each line in array
NR == 1 {print} # print 1st line
NR > range * 2{ # for lines in sliding window range bottom
print lines[NR - range]; # print sliding window top line
delete lines[NR - range]; # delete sliding window top line
}
END {print} # print last line
running:
awk -f script.awk input.txt
input.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
output:
line 1
line 4
line 5
line 6
line 7
line 10
I have a file and I want to extract specific lines from it, like lines 2, 10, 15, 21, ... and so on. There are around 200 thousand lines to be extracted from the file. How can I do it efficiently in bash?
Maybe you're looking for:
sed -n -e 1p -e 4p afile
Put the line numbers of the lines you want in a file called "wanted", like this:
2
10
15
21
Then run this script:
#!/bin/bash
while read w
do
sed -n ${w}p yourfile
done < wanted
TOTALLY ALTERNATIVE METHOD
Or you could let awk do it all for you, like this, which is probably miles faster since you won't have to create 200,000 sed processes:
awk 'FNR==NR{a[$1]=1;next}{if(FNR in a){print;}}' wanted yourfile
The FNR==NR portion detects when awk is reading the file called "wanted"; if so, it sets element $1 of array a to 1 so we know that this line number is wanted. The second set of curly braces is active only when processing the bigger file, and prints the current line if its line number is in the array a built while reading the "wanted" file.
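A quick demonstration with a hypothetical 100-line file:
$ seq 100 > yourfile
$ printf '%s\n' 2 10 15 21 > wanted
$ awk 'FNR==NR{a[$1]=1;next}{if(FNR in a){print;}}' wanted yourfile
2
10
15
21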
$ gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
Wanted line numbers have to be stored in the file lines, one per line, and they may safely be in random order. It is almost exactly the same as @Mark Setchell's second method, but uses a slightly clearer way to determine which file is current. ARGIND is a GNU extension, though, hence gawk. If you are limited to the original AWK or mawk, you can write it as:
$ awk 'FILENAME==ARGV[1] { L[$0]++ }; FILENAME==ARGV[2] && FNR in L' lines file > file.lines
Efficiency test:
$ awk 'BEGIN { for (i=1; i<=1000000; i++) print i }' > file
$ shuf -i 1-1000000 -n 200000 > lines
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m1.734s
user 0m1.460s
sys 0m0.052s
UPD:
As @Costi Ciudatu pointed out, there is room for improvement for the case when all wanted lines are in the head of a file.
#!/usr/bin/gawk -f
ARGIND==1 { L[$0]++ }
ENDFILE { L_COUNT = FNR }
ARGIND==2 && FNR in L { L_PRINTED++; print }
ARGIND==2 && L_PRINTED == L_COUNT { exit 0 }
The script stops as soon as the last wanted line has been printed, so it now takes a few milliseconds to filter out 2,000 random lines from the first 1% of a one-million-line file.
$ time ./getlines.awk lines file > file.lines
real 0m0.016s
user 0m0.012s
sys 0m0.000s
Reading the whole file, by contrast, still takes about a second:
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m0.780s
user 0m0.756s
sys 0m0.016s
Provided your system supports sed -f - (i.e. for sed to read its script on standard input; it works on Linux, but not on some other platforms) you can turn the file of line numbers into a sed script, naturally using sed:
sed 's/$/p/' lines | sed -n -f - inputfile >output
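For example, if lines contains the numbers 2, 10, 15 and 21, the generated script that sed reads on its standard input is simply:
$ sed 's/$/p/' lines
2p
10p
15p
21p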
If the lines you're interested in are close to the beginning of the file, you can make use of head and tail to efficiently extract specific lines.
For your example line numbers (assuming the list doesn't go on until close to 200,000), a naive but still efficient approach to read those lines would be the following:
for n in 2 10 15 21; do
head -n $n /your/large/file | tail -1
done
sed Example
sed -n '2p' file
awk Example
awk 'NR==2' file
This will print the 2nd line of the file.
Use the same logic in a loop, say a for loop:
for VARIABLE in 2 10 15 21
do
awk "NR==$VARIABLE" file
done
Give your line numbers this way.
I've got a semicolon-separated text file, which contains the column headers in the first line:
column1;column2;colum3
foo;123;345
bar;345;23
baz;089;09
Now I want a short command that outputs the first line and the matching line(s). Is there a shorter way than:
head -n 1 file ; cat file | grep bar
This should do the job:
sed -n '1p;2,${/bar/p}' file
where:
1p will print the first line
2,$ will match from second line to the last line
/bar/p will print those lines that match bar
Note that this won't print the header line twice if there's a match in the column names.
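Run against the sample file:
$ sed -n '1p;2,${/bar/p}' file
column1;column2;colum3
bar;345;23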
This might work for you:
cat file | awk 'NR<2;$0~v' v=baz
column1;column2;colum3
baz;089;09
Usually cat file | ... is useless, but in this case it keeps the file argument out of the way and allows the variable v to be amended quickly.
Another solution:
cat file | sed -n '1p;/foo/p'
column1;column2;colum3
foo;123;345
You can use grouping commands, then pipe to column command for pretty-printing
$ { head -1; grep bar; } <input.txt | column -ts';'
column1 column2 colum3
bar 345 23
What if the first row contains bar too? Then it's printed twice with your version. An awk solution:
awk 'NR == 1 { print } NR > 1 && $0 ~ "bar" { print }' FILE
If you want to pass the search string as (almost) the last item on the command line:
awk 'ARGIND > 1 { exit } NR == 1 { print } NR > 1 && $0 ~ ARGV[2] { print }' FILE YOURSEARCHSTRING 2>/dev/null
sed solution:
sed -n '1p;1d;/bar/p' FILE
The advantage of both is that each runs as a single process.
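For example, searching for column1, which appears only in the header, the 1d guard keeps the header from being printed twice:
$ sed -n '1p;1d;/column1/p' file
column1;column2;colum3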
head -n 1 file && grep bar file
Maybe there is an even shorter version, but it would get a bit complicated.
EDIT: as per bobah's comment I have added && between the commands so there is only a single error for a missing file.
Here is the shortest command yet:
awk 'NR==1||/bar/' file