Delete range of lines when line number of known or not in unix using head and tail? - shell

This is my sample file.
I want to do this.
I have fixed requirement to delete 2nd and 3rd line keeping the 1st line.
From the bottom, I want to delete above 2 lines excluding last line, as I wouldn't know what my last line number is as it depends on file.
Once I delete my 2nd and 3rd line 4th line should ideally come at 2nd and so on, same for a bottom after delete.
I want to use head/tail command and modify the existing file only. as Changes to write back to the same file.
Sample file text format.
Input File
> This is First Line
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> ..
> ..
> ..
> ..
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> This is Last Line, should not be deleted It could be come at any line
number (variable)
Output file (same file modified)
This is First Line
..
..
..
..
This is Last Line, should not be deleted It could be come at any line number (variable)
Edit - Because of compatibility issues on Unix (Using HP Unix on ksh shell) I want to implement this using head/tail/awk. not sed.

Adding solution as per OP's request to make it genuine solution.
Approach: In this solution OP could provide lines from starting point and from ending point of any Input_file and those lines will be skipped.
What code will do: I have written code in that way it will generate an awk code as per your given lines to be skipped then and will run it too.
cat print_lines.ksh
start_line="2,3"
end_line="2,3"
total_lines=$(wc -l<Input_file)
awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" -v lines=$(wc -l <Input_file) '
BEGIN{
num_start=split(start, a,",");
num_end=split(end, b,",");
for(i=1;i<=num_start;i++){
val=val?val OFS "FNR=="a[i]:"FNR=="a[i]};
for(j=1;j<=num_end;j++){
b[j]=b[j]>1?len-(b[j]-1):b[j];
val=val?val OFS "FNR=="b[j]:"FNR=="b[j]};
print "awk " s1 val "{next} 1" s1" Input_file"}
' | sh
Change Input_file name to your actual file name and let me know how it goes then.
Following awk may help you in same(Since I don't have Hp system so didn't test it).
awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
EDIT: Adding non-one liner form of solution too now.
awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file

wc + sed solution:
len=$(wc -l inpfile | cut -d' ' -f1)
sed "$(echo "$((len-2)),$((len-1))")d; 2,3d" inpfile > tmp_f && mv tmp_f inpfile
$ cat inputfile
> This is First Line
> ..
> ..
> ..
> ..
> This is Last Line, should not be deleted It could be come at any line

Perl suggestion... read whole file into array #L, get index of last line. Delete 2nd last, 3rd last, 3rd and 2nd line. Print what's left.
perl -e '#L=<>; $p=$#L; delete $L[$p-1]; delete $L[$p-2]; delete $L[2]; delete $L[1]; print #L' file.txt
Or, maybe a little more succinctly with splice:
perl -e '#L=<>; splice #L,1,2; splice #L,$#L-2,2; print #L' file.txt

If you wish to have some flexibility a ksh script approach may work, though little expensive in terms of resources :
#!/bin/ksh
[ -f "$1" ] || echo "Input is not a file" || exit 1
total=$(wc -l "$1" | cut -d' ' -f1 )
echo "How many lines to delete at the end?"
read no
[ -z "$no" ] && echo "Not sure how many lines to delete, aborting" && exit 1
sed "2,3d;$((total-no)),$((total-1))d" "$1" >tempfile && mv tempfile "$1"
And feed the file as argument to the script.
Notes
This deletes second and third lines.
Plus no number of lines from last excluding last as read from user.
Note: My ksh version is 93u+ 2012-08-01

awk '{printf "%d\t%s\n", NR, $0}' < file | sed '2,3d;N;$!P;D' file
The awk here serves the purpose of providing line numbers and then passing the output to the sed which uses the line numbers to do the required operations.
%d : Used to print the numbers. You can also use '%i'
'\t' : used to place a tab between the number and string
%s : to print the string of charaters
'\n' : To create a new line
NR : to print lines numbers starting from 1
For sed
N: Read/append the next line of input into the pattern space.
$! : is for not deleting the last line
D : This is used when pattern space contains no new lines normal and start a new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the specified lines, and restart cycle with the resultant pattern space, without reading a new line of input.
P : Print up to the first embedded newline of the current pattern space.This
prints the lines after removing the subjected lines.

I enjoyed this task and wrote awk script for more scaleable case (huge files).
Reading/scanning the input file once (no need to know line count), not storing the whole file in memory.
script.awk
BEGIN { range = 3} # define sliding window range
{lines[NR] = $0} # capture each line in array
NR == 1 {print} # print 1st line
NR > range * 2{ # for lines in sliding window range bottom
print lines[NR - range]; # print sliding window top line
delete lines[NR - range]; # delete sliding window top line
}
END {print} # print last line
running:
awk -f script.awk input.txt
input.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
output:
line 1
line 4
line 5
line 6
line 7
line 10

Related

Search file A for a list of strings located in file B and append the value associated with that string to the end of the line in file A

This is a bit complicated, well I think it is..
I have two files, File A and file B
File A contains delay information for a pin and is in the following format
AD22 15484
AB22 9485
AD23 10945
File B contains a component declaration that needs this information added to it and is in the format:
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
So what I am trying to achieve is the following output
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
There is no order to the pin numbers in file A or B
So I'm assuming the following needs to happen
open file A, read first line
search file B for first string field in the line just read
once found in file B at the end of the line add the text "\nPIN_DELAY='"
add the second string filed of the line read from file A
add the following text at the end "';"
repeat by opening file A, read the second line
I'm assuming it will be a combination of sed and awk commands and I'm currently trying to work it out but think this is beyond my knowledge. Many thanks in advance as I know it's complicated..
FILE2=`cat file2`
FILE1=`cat file1`
TMPFILE=`mktemp XXXXXXXX.tmp`
FLAG=0
for line in $FILE1;do
echo $line >> $TMPFILE
for line2 in $FILE2;do
if [ $FLAG == 1 ];then
echo -e "PIN_DELAY='$(echo $line2 | awk -F " " '{print $1}')'" >> $TMPFILE
FLAG=0
elif [ "`echo $line | grep $(echo $line2 | awk -F " " '{print $1}')`" != "" ];then
FLAG=1
fi
done
done
mv $TMPFILE file1
Works for me, you can also add a trap for remove tmp file if user send sigint.
awk to the rescue...
$ awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' keys data
'DXN_0':
PIN_NUMBER='(AD22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='15484';
'DXP_0':
PIN_NUMBER='(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,AD23,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='10945';
'VREFN_0':
PIN_NUMBER='(AB22,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)';
PIN_DELAY='9485';
Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, for any matching key print value of the key in the requested format. Single quotes in awk is little tricky, setting a q variable is one way of handling it.
FINAL Script for my application, A big thank you to all that helped..
# ! /usr/bin/sh
# script created by Adam with a LOT of help from users on stackoverflow
# must pass $1 file (package file from Xilinx)
# must pass $2 file (chips.prt file from the PCB design office)
# remove these temp files, throws error if not present tho, whoops!!
rm DELAYS.txt CHIP.txt OUTPUT.txt
# BELOW::create temp files for the code thanks to Glastis#stackoverflow https://stackoverflow.com/users/5101968/glastis I now know how to do this
DELAYS=`mktemp DELAYS.txt`
CHIP=`mktemp CHIP.txt`
OUTPUT=`mktemp OUTPUT.txt`
# BELOW::grep input file 1 (pkg file from Xilinx) for lines containing a delay in the form of n.n and use TAIL to remove something (can't remember), sed to remove blanks and replace with single space, sed to remove space before \n, use awk to print columns 3,9,10 and feed into awk again to calculate delay provided by fedorqui#stackoverflow https://stackoverflow.com/users/1983854/fedorqui
# In awk, NF refers to the number of fields on the current line. Since $n refers to the field number n, with $(NF-1) we refer to the penultimate field.
# {...}1 do stuff and then print the resulting line. 1 evaluates as True and anything True triggers awk to perform its default action, which is to print the current line.
# $(NF-1) + $NF)/2 * 141 perform the calculation: `(penultimate + last) / 2 * 141
# {$(NF-1)=sprintf( ... ) assign the result of the previous calculation to the penultimate field. Using sprintf with %.0f we make sure the rounding is performed, as described above.
# {...; NF--} once the calculation is done, we have its result in the penultimate field. To remove the last column, we just say "hey, decrease the number of fields" so that the last one gets "removed".
grep -E -0 '[0-9]\.[0-9]' $1 | tail -n +2 | sed -e 's/[[:blank:]]\+/ /g' -e 's/\s\n/\n/g' | awk '{print ","$3",",$9,$10}' | awk '{$(NF-1)=sprintf("%.0f", ($(NF-1) + $NF)/2 * 169); NF--}1' >> $DELAYS
# remove blanks in part file and add additional commas (,) so that the following awk command works properly
cat $2 | sed -e "s/[[:blank:]]\+//" -e "s/(/(,/g" -e 's/)/,)/g' >> $CHIP
# this awk command is provided by karakfa#stackoverflow https://stackoverflow.com/users/1435869/karakfa Explanation: scan the first file for key/value pairs. For each line in the second data file print the line, for any matching key print value of the key in the requested format. Single quotes in awk is little tricky, setting a q variable is one way of handling it. https://stackoverflow.com/questions/32458680/search-file-a-for-a-list-of-strings-located-in-file-b-and-append-the-value-assoc
awk -vq="'" 'NR==FNR{a[$1]=$2;next} {print; for(k in a) if(match($0,k)) {print "PIN_DELAY=" q a[k] q ";"; next}}' $DELAYS $CHIP >> $OUTPUT
# remove the additional commas (,) added in earlier before ) and after ( and you are done..
cat $OUTPUT | sed -e 's/(,/(/g' -e 's/,)/)/g' >> chipsd.prt

Extract specified lines from a file

I have a file and I want to extract specific lines from that file like lines 2, 10, 15,21, .... and so on. There are around 200 thousand lines to be extracted from the file. How can I do it efficiently in bash
Maybe looking for:
sed -n -e 1p -e 4p afile
Put the linenumbers of the lines you want in a file called "wanted", like this:
2
10
15
21
Then run this script:
#!/bin/bash
while read w
do
sed -n ${w}p yourfile
done < wanted
TOTALLY ALTERNATIVE METHOD
Or you could let "awk" do it all for you, like this which is probably miles faster since you won't have to create 200,000 sed processes:
awk 'FNR==NR{a[$1]=1;next}{if(FNR in a){print;}}' wanted yourfile
The FNR==NR portion detects when awk is reading the file called "wanted" and if so, it sets element "$1" of array "a" to "1" so we know that this line number is wanted. The stuff in the second set of curly braces is active when processing your bigger file only and it prints the current line if its linenumber is in the array "a" we created when reading the "wanted" file.
$ gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
Wanted line numbers have to be stored in lines delimited by newline and they may safely be in random order. It almost exactly the same as #Mark Setchell’s second method, but uses a little more clear way to determine which file is current. Although this ARGIND is GNU extension, so gawk. If you are limited to original AWK or mawk, you can write it as:
$ awk 'FILENAME==ARGV[1] { L[$0]++ }; FILENAME==ARGV[2] && FNR in L' lines file > file.lines
Efficiency test:
$ awk 'BEGIN { for (i=1; i<=1000000; i++) print i }' > file
$ shuf -i 1-1000000 -n 200000 > lines
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m1.734s
user 0m1.460s
sys 0m0.052s
UPD:
As #Costi Ciudatu pointed out, there is room for impovement for the case when all wanted lines are in the head of a file.
#!/usr/bin/gawk -f
ARGIND==1 { L[$0]++ }
ENDFILE { L_COUNT = FNR }
ARGIND==2 && FNR in L { L_PRINTED++; print }
ARGIND==2 && L_PRINTED == L_COUNT { exit 0 }
Sript interrupts when last line is printed, so now it take few milliseconds to filter out 2000 random lines from first 1 % of a one million lines file.
$ time ./getlines.awk lines file > file.lines
real 0m0.016s
user 0m0.012s
sys 0m0.000s
While reading a whole file still takes about a second.
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m0.780s
user 0m0.756s
sys 0m0.016s
Provided your system supports sed -f - (i.e. for sed to read its script on standard input; it works on Linux, but not on some other platforms) you can turn the file of line numbers into a sed script, naturally using sed:
sed 's/$/p/' lines | sed -n -f - inputfile >output
If the lines you're interested in are close to the beginning of the file, you can make use of head and tail to efficiently extract specific lines.
For your example line numbers (assuming that list doesn't go on until close to 200,000), a dummy but still efficient approach to read those lines would be the following:
for n in 2 10 15 21; do
head -n $n /your/large/file | tail -1
done
sed Example
sed -n '2p' file
awk Example
awk 'NR==2' file
this will print 2nd line of file
use same logic in loop & try.
say a for loop
for VARIABLE in 2 10 15 21
do
awk "NR==$VARIABLE" file
done
Give your line numbers this way..

Print text between two lines (from list of line numbers in file) in Unix [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 9 years ago.
I have a sample file which has thousands of lines.
I want to print text between two line numbers in that file. I don't want to input line numbers manually, rather I have a file which contains list of line numbers between which text has to be printed.
Example : linenumbers.txt
345|789
999|1056
1522|1366
3523|3562
I need a shell script which will read line numbers from this file and print the text between each range of lines into a separate (new) file.
That is, it should print lines between 345 and 789 into a new file, say File1.txt, and print text between lines 999 and 1056 into a new file, say File2.txt, and so on.
considering your target file has only thousands of lines. here is a quick and dirty solution.
awk -F'|' '{system("sed -n \""$1","$2"p\" targetFile > file"NR)}' linenumbers.txt
the targetFile is your file containing thousands of lines.
the oneliner does not require your linenumbers.txt to be sorted.
the oneliner allows line range to be overlapped in your linenumbers.txt
after running the command above, you will have n filex files. n is the row counts of linenumbers.txt x is from 1-n you can change the filename pattern as you want.
Here's one way using GNU awk. Run like:
awk -f script.awk numbers.txt file.txt
Contents of script.awk:
BEGIN {
# set the field separator
FS="|"
}
# for the first file in the arguments list
FNR==NR {
# add the row number and field one as keys to a multidimensional array with
# a value of field two
a[NR][$1]=$2
# skip processing the rest of the code
next
}
# for the second file in the arguments list
{
# for every element in the array's first dimension
for (i in a) {
# for every element in the second dimension
for (j in a[i]) {
# ensure that the first field is treated numerically
j+=0
# if the line number is greater than the first field
# and smaller than the second field
if (FNR>=j && FNR<=a[i][j]) {
# print the line to a file with the suffix of the first file's
# line number (the first dimension)
print > "File" i
}
}
}
}
Alternatively, here's the one-liner:
awk -F "|" 'FNR==NR { a[NR][$1]=$2; next } { for (i in a) for (j in a[i]) { j+=0; if (FNR>=j && FNR<=a[i][j]) print > "File" i } }' numbers.txt file.txt
If you have an 'old' awk, here's the version with compatibility. Run like:
awk -f script.awk numbers.txt file.txt
Contents of script.awk:
BEGIN {
# set the field separator
FS="|"
}
# for the first file in the arguments list
FNR==NR {
# add the row number and field one as a key to a pseudo-multidimensional
# array with a value of field two
a[NR,$1]=$2
# skip processing the rest of the code
next
}
# for the second file in the arguments list
{
# for every element in the array
for (i in a) {
# split the element in to another array
# b[1] is the row number and b[2] is the first field
split(i,b,SUBSEP)
# if the line number is greater than the first field
# and smaller than the second field
if (FNR>=b[2] && FNR<=a[i]) {
# print the line to a file with the suffix of the first file's
# line number (the first pseudo-dimension)
print > "File" b[1]
}
}
}
Alternatively, here's the one-liner:
awk -F "|" 'FNR==NR { a[NR,$1]=$2; next } { for (i in a) { split(i,b,SUBSEP); if (FNR>=b[2] && FNR<=a[i]) print > "File" b[1] } }' numbers.txt file.txt
I would use sed to process the sample data file because it is simple and swift. This requires a mechanism for converting the line numbers file into the appropriate sed script. There are many ways to do this.
One way uses sed to convert the set of line numbers into a sed script. If everything was going to standard output, this would be trivial. With the output needing to go to different files, we need a line number for each line in the line numbers file. One way to give line numbers is the nl command. Another possibility would be to use pr -n -l1. The same sed command line works with both:
nl linenumbers.txt |
sed 's/ *\([0-9]*\)[^0-9]*\([0-9]*\)|\([0-9]*\)/\2,\3w file\1.txt/'
For the given data file, that generates:
345,789w > file1.txt
999,1056w > file2.txt
1522,1366w > file3.txt
3523,3562w > file4.txt
Another option would be to have awk generate the sed script:
awk -F'|' '{ printf "%d,%dw > file%d.txt\n", $1, $2, NR }' linenumbers.txt
If your version of sed will allow you to read its script from standard input with -f - (GNU sed does; BSD sed does not), then you can convert the line numbers file into a sed script on the fly, and use that to parse the sample data:
awk -F'|' '{ printf "%d,%dw > file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f - sample.data
If your system supports /dev/stdin, you can use one of:
awk -F'|' '{ printf "%d,%dw > file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/stdin sample.data
awk -F'|' '{ printf "%d,%dw > file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/fd/0 sample.data
Failing that, use an explicit script file:
awk -F'|' '{ printf "%d,%dw > file%d.txt\n", $1, $2, NR }' linenumbers.txt > sed.script
sed -n -f sed.script sample.data
rm -f sed.script
Strictly, you should deal with ensuring the temporary file name is unique (mktemp) and removed even if the script is interrupted (trap):
tmp=$(mktemp sed.script.XXXXXX)
trap "rm -f $tmp; exit 1" 0 1 2 3 13 15
awk -F'|' '{ printf "%d,%dw > file%d.txt\n", $1, $2, NR }' linenumbers.txt > $tmp
sed -n -f $tmp sample.data
rm -f $tmp
trap 0
The final trap 0 allows your script to exit successfully; omit it, and you script will always exit with status 1.
I've ignored Perl and Python; either could be used for this in a single command. The file management is just fiddly enough that using sed seems simpler. You could also use just awk, either with a first awk script writing an awk script to do the heavy duty work (trivial extension of the above), or having a single awk process read both files and produce the required output (harder, but far from impossible).
If nothing else, this shows that there are many possible ways of doing the job. If this is a one-off exercise, it really doesn't matter very much which you choose. If you will be doing this repeatedly, then choose the mechanism that you like. If you're worried about performance, measure. It is likely that converting the line numbers into a command script is a negligible cost; processing the sample data with the command script is where the time is taken. I would expect sed to excel at that point; I've not measured to confirm that it does.
You could do the following
# myscript.sh
linenumbers="linenumber.txt"
somefile="afile"
while IFS=\| read start end ; do
echo "sed -n '$start,${end}p;${end}q;' $somefile > $somefile-$start-$end"
done < $linenumbers
run it like so sh myscript.sh
sed -n '345,789p;789q;' afile > afile-345-789
sed -n '999,1056p;1056q;' afile > afile-999-1056
sed -n '1522,1366p;1366q;' afile > afile-1522-1366
sed -n '3523,3562p;3562q;' afile > afile-3523-3562
then when you're happy do sh myscript.sh | sh
EDIT Added William's excellent points on style and correctness.
EDIT Explanation
The basic idea is to get a script to generate a series of shell commands that can be checked for correctness first before being executed by "| sh".
sed -n '345,789p;789q; means use sed and don't echo each line (-n) ; there are two commands saying from line 345 to 789 p(rint) the lines and the second command is at line 789 q(uit) - by quitting on the last line you save having sed read all the input file.
The while loop reads from the $linenumbers file using read, read if given more than one variable name populates each with a field from the input, a field is usually separated by space and if there are too few variable names then read will put the remaining data into the last variable name.
You can put the following in at your shell prompt to understand that behaviour.
ls -l | while read first rest ; do
echo $first XXXX $rest
done
Try adding another variable second to the above to see what happens then, it should be obvious.
The problem is your data is delimited by |s and that's where using William's suggestion of IFS=\| works as now when reading from the input the IFS has changed and the input is now separated by |s and we get the desired result.
Others can feel free to edit,correct and expand.
To extract the first field from 345|789 you can e.g use awk
awk -F'|' '{print $1}'
Combine that with the answers received from your other question and you will have a solution.
This might work for you (GNU sed):
sed -r 's/(.*)\|(.*)/\1,\2w file-\1-\2.txt/' | sed -nf - file

Shell Script error unexpected else token

I am using below code to read records from one file and comparing its 6 field with other file and accordingly writing the output to the files.
for FILE_NAME in `GET_FILE_LIST $1 $2 $3`
do
y=`echo $FILE_NAME|awk '{print $6}'`"X"
echo "Start --- `date`" >> $LOGF
echo $y >> $LOGF
cat $x |nawk ' BEGIN { FS=",";
while ((getline < "SUBNO.txt") > 0)
myarray[$1] = $2
if(myarray[$6]==0)
' > {${FILE_NAME}_P}
else
' > {${FILE_NAME}_S}
fi
UPD_FILE_STATUS $FILE_NAME 35
done
The issue is I am geeting following error:
./TEST.sh: line 95: syntax error near unexpected token `else'
./TEST.sh: line 95: `else'
I have following case:
In one text file I have data like this:
GGO 099E7C5S 34 533196588 45696 22
PPC 93403DSA 35 784397429 44696 56
In the second file data is like this:
22,0
24,1
26,0
What I want to do is write line from first file, compare the last field value with second file.If it is value is 0 put it in new file_1 and if it is 1 then put it in file_2.
I hope this will clear the case.
Please help.
There are a lot of bad practices in this script which I will not address here. The problem you have is that you seem confused about quoting rules. Consider the code:
cat $x |nawk ' BEGIN { FS=",";
while ((getline < "SUBNO.txt") > 0)
myarray[$1] = $2
if(myarray[$6]==0)
' > {${FILE_NAME}_P}
else
The single quotes delimit the script that is passed to nawk, so the above snippet is essentially equal to:
cat $x | nawk cmds > {${FILE_NAME}_P}
else
The colorization of your editor (and even in the question's display here on SO) should make this clear. You will notice that the else does not have a matching if, so the shell treats that as a syntax error.
Please note that you are redirecting to a file named {xxx_P}. It is unusual to put { and } characters in a filename.
This will do what you updated your question to say you wanted (split file1 based on info from file2)
awk 'NR==FNR{sfx[$1]=$2; next} {print > ("file_" sfx[$NF]+0)}' FS="," file2 FS=" " file1
You didn't say what to do with lines from file1 when the last field on the line has no entry in file2 so my script treats it the same as if there was a 0-ending entry in file2. If that's not what you want, it's an easy fix.
The following will extract from file2 all lines whose sixth field is present with a ",0" suffix in file1. It does this by transforming the relevant lines from file1 into a simple awk script. Repeat with the ",1" suffix to get the other set, or learn enough awk to do it all in a single pass; it's not hard.
sed /,0$/!d;s///;s/.*/$6 == "&"/p' file1 |
awk -f - file2
The sed script is somewhat obscure; it deletes any non-matching lines, then replaces the matching suffix with nothing, then wraps the remainder with a suitable awk condition.
Depending on taste and requirements, you could make the second script a sed script instead (hint: look for the w command) or the first script an awk script as well.

Search file, show matches and first line

I've got a comma separated textfile, which contains the column headers in the first line:
column1;column2;colum3
foo;123;345
bar;345;23
baz;089;09
Now I want a short command that outputs the first line and the matching line(s). Is there a shorter way than:
head -n 1 file ; cat file | grep bar
This should do the job:
sed -n '1p;2,${/bar/p}' file
where:
1p will print the first line
2,$ will match from second line to the last line
/bar/p will print those lines that match bar
Note that this won't print the header line twice if there's a match in the columns names.
This might work for you:
cat file | awk 'NR<2;$0~v' v=baz
column1;column2;colum3
baz;089;09
Usually cat file | ... is useless but in this case it keeps the file argument out of the way and allows the variable v to be amended quickly.
Another solution:
cat file | sed -n '1p;/foo/p'
column1;column2;colum3
foo;123;345
You can use grouping commands, then pipe to column command for pretty-printing
$ { head -1; grep bar; } <input.txt | column -ts';'
column1 column2 colum3
bar 345 23
What if the first row contains bar too? Then it's printed two times with your version. awk solution:
awk 'NR == 1 { print } NR > 1 && $0 ~ "bar" { print }' FILE
If you want the search sting as the almost last item on the line:
awk 'ARGIND > 1 { exit } NR == 1 { print } NR > 1 && $0 ~ ARGV[2] { print }' FILE YOURSEARCHSTRING 2>/dev/null
sed solution:
sed -n '1p;1d;/bar/p' FILE
The advantage for both of them, that it's a single process.
head -n 1 file && grep bar file Maybe there is even a shorter version but will get a bit complicated.
EDIT: as per bobah 's comment I have added && between the commands to have only a single error for missing file
Here is the shortest command yet:
awk 'NR==1||/bar/' file

Resources