This is my sample file.
I want to do this.
I have fixed requirement to delete 2nd and 3rd line keeping the 1st line.
From the bottom, I want to delete above 2 lines excluding last line, as I wouldn't know what my last line number is as it depends on file.
Once I delete my 2nd and 3rd line 4th line should ideally come at 2nd and so on, same for a bottom after delete.
I want to use head/tail command and modify the existing file only. as Changes to write back to the same file.
Sample file text format.
Input File
> This is First Line
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> ..
> ..
> ..
> ..
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> This is Last Line, should not be deleted It could be come at any line
number (variable)
Output file (same file modified)
This is First Line
..
..
..
..
This is Last Line, should not be deleted It could be come at any line number (variable)
Edit - Because of compatibility issues on Unix (Using HP Unix on ksh shell) I want to implement this using head/tail/awk. not sed.
Adding solution as per OP's request to make it genuine solution.
Approach: In this solution OP could provide lines from starting point and from ending point of any Input_file and those lines will be skipped.
What code will do: I have written code in that way it will generate an awk code as per your given lines to be skipped then and will run it too.
cat print_lines.ksh
start_line="2,3"
end_line="2,3"
total_lines=$(wc -l<Input_file)
awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" -v lines=$(wc -l <Input_file) '
BEGIN{
num_start=split(start, a,",");
num_end=split(end, b,",");
for(i=1;i<=num_start;i++){
val=val?val OFS "FNR=="a[i]:"FNR=="a[i]};
for(j=1;j<=num_end;j++){
b[j]=b[j]>1?len-(b[j]-1):b[j];
val=val?val OFS "FNR=="b[j]:"FNR=="b[j]};
print "awk " s1 val "{next} 1" s1" Input_file"}
' | sh
Change Input_file name to your actual file name and let me know how it goes then.
Following awk may help you in same(Since I don't have Hp system so didn't test it).
awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
EDIT: Adding non-one liner form of solution too now.
awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file
wc + sed solution:
len=$(wc -l inpfile | cut -d' ' -f1)
sed "$(echo "$((len-2)),$((len-1))")d; 2,3d" inpfile > tmp_f && mv tmp_f inpfile
$ cat inputfile
> This is First Line
> ..
> ..
> ..
> ..
> This is Last Line, should not be deleted It could be come at any line
Perl suggestion... read whole file into array #L, get index of last line. Delete 2nd last, 3rd last, 3rd and 2nd line. Print what's left.
perl -e '#L=<>; $p=$#L; delete $L[$p-1]; delete $L[$p-2]; delete $L[2]; delete $L[1]; print #L' file.txt
Or, maybe a little more succinctly with splice:
perl -e '#L=<>; splice #L,1,2; splice #L,$#L-2,2; print #L' file.txt
If you wish to have some flexibility a ksh script approach may work, though little expensive in terms of resources :
#!/bin/ksh
[ -f "$1" ] || echo "Input is not a file" || exit 1
total=$(wc -l "$1" | cut -d' ' -f1 )
echo "How many lines to delete at the end?"
read no
[ -z "$no" ] && echo "Not sure how many lines to delete, aborting" && exit 1
sed "2,3d;$((total-no)),$((total-1))d" "$1" >tempfile && mv tempfile "$1"
And feed the file as argument to the script.
Notes
This deletes second and third lines.
Plus no number of lines from last excluding last as read from user.
Note: My ksh version is 93u+ 2012-08-01
awk '{printf "%d\t%s\n", NR, $0}' < file | sed '2,3d;N;$!P;D' file
The awk here serves the purpose of providing line numbers and then passing the output to the sed which uses the line numbers to do the required operations.
%d : Used to print the numbers. You can also use '%i'
'\t' : used to place a tab between the number and string
%s : to print the string of charaters
'\n' : To create a new line
NR : to print lines numbers starting from 1
For sed
N: Read/append the next line of input into the pattern space.
$! : is for not deleting the last line
D : This is used when pattern space contains no new lines normal and start a new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the specified lines, and restart cycle with the resultant pattern space, without reading a new line of input.
P : Print up to the first embedded newline of the current pattern space.This
prints the lines after removing the subjected lines.
I enjoyed this task and wrote awk script for more scaleable case (huge files).
Reading/scanning the input file once (no need to know line count), not storing the whole file in memory.
script.awk
BEGIN { range = 3} # define sliding window range
{lines[NR] = $0} # capture each line in array
NR == 1 {print} # print 1st line
NR > range * 2{ # for lines in sliding window range bottom
print lines[NR - range]; # print sliding window top line
delete lines[NR - range]; # delete sliding window top line
}
END {print} # print last line
running:
awk -f script.awk input.txt
input.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
output:
line 1
line 4
line 5
line 6
line 7
line 10
I have this file:
2016,05,P,0002 ,CJGLOPSD8
00,BBF,BBDFTP999,051000100,GBP, , -2705248.00
00,BBF,BBDFTP999,059999998,GBP, , -3479679.38
00,BBF,BBDFTP999,061505141,GBP, , -0.40
00,BBF,BBDFTP999,061505142,GBP, , 6207621.00
00,BBF,BBDFTP999,061505405,GBP, , -0.16
00,BBF,BBDFTP999,061552000,GBP, , -0.24
00,BBF,BBDFTP999,061559010,GBP, , -0.44
00,BBF,BBDFTP999,062108021,GBP, , -0.34
00,BBF,BBDFTP999,063502007,GBP, , -0.28
I want to programmatically (in unix, or informatica if possible) grab the first two fields in the top row, concatenate them, append them to the end of each line and remove that first row.
Like so:
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
This is my current attempt:
awk -vvar1=`cat OF\ OPSDOWN8.CSV | head -1 | cut -d',' -f1` -vvar2=`cat OF\ OPSDOWN8.CSV | head -1 | cut -d',' -f2` 'BEGIN {FS=OFS=","} {print $0, var 1var2}' OF\ OPSDOWN8.CSV> OF_OPSDOWN8.csv
Any pointers? I've tried looking around the forum but can only find answers to part of my question.
Thanks for your help.
Use this awk:
awk 'BEGIN{FS=OFS=","} NR==1{val=$1$2;next} {gsub(/ */,"");print $0,val}' file
Explanation:
BEGIN{FS=OFS=","} - This block will set FS (Field Separator) and OFS (Output Field Separator) as ,.
NR==1 - Working with line number 1. Here, $1 and $2 denotes field number.
print $0,val - Printing $0 (whole line) and stored value from val.
I would use the following awk command:
awk 'NR==1{d=$1$2;next}{$(NF+1)=d;gsub(/[[:space:]]/,"")}1' FS=, OFS=, file
Explanation:
NR==1{d=$1$2;next} applies on line 1 and set's a variable d(ate) to the value of the first and the second field. The variable is being used when processing the remaining lines. next tells awk to go ahead with the next line right away without processing further instructions on this line.
{$(NF+1)=d;gsub(/[[:space:]]/,"")}1 appends a new field to the line (NF is the number of fields, assigning d to $(NF+1) effectively adds a field. gsub() is used to removing spaces. 1 at the end always evaluates to true and makes awk print the modified line.
FS=, is a command line argument. It set's the input field delimiter to ,.
OFS=, is a command line argument. It set's the output field delimiter to ,.
Output:
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
With sed :
sed '1{s/\([^,]*\),\([^,]*\),.*/\1\2/;h;d};/.*/G;s/\n/,/;s/ //g' file
in ERE mode :
sed -r '1{s/([^,]*),([^,]*),.*/\1\2/;h;d};/.*/G;s/\n/,/;s/ //g' file
Output :
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
This might work for you (GNU sed):
sed '1s/,//;1s/,.*//;1h;1d;s/ //g;G;s/\n/,/' file
For the first line only: remove the first comma, remove from the next comma to the end of the line, store the amended line in the hold space (HS) and then delete the current line (the d abruptly ends processing). For subsequent lines: remove all spaces, append the HS and replace the newline (from the G command) with a comma.
Or if you prefer:
sed '1{s/,//;s/,.*//;h;d};s/ //g;G;s/\n/,/' file
If you want to use Informatica for this, use two Source Qualifiers. Read the file twice - just one line in one SQ (filter out the rest) and in the second SQ read the whole file except the first line (skip header). Join the two on dummy port and you're done.
1,n1,abcd,1234
2,n2,abrt,5666
,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
,k1,yyyy,5234
4,22,yyyy,5234
the above given is my input file abc.txt , all I want the missing first column value should fill with next row first value.
example:
3,h2,yyyy,123x
3,h2,yyyy,123y
I want output like below,
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x// the missing first column value 3 should fill with second row first value
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
How to implement this with help of AWK or some other alternate in shell script,please help.
Using awk you can do:
awk -F, '$1 ~ /^ *$/ {
p=p RS $0
next
}
p!="" {
gsub(RS " +", RS $1, p)
sub("^" RS, "", p)
print p
p=""
} 1' file
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
I would reverse the file, and then replace the value from the previous line:
tac filename | awk -F, '$1 ~ /^[[:blank:]]*$/ {$1 = prev} {print; prev=$1}' | tac
This will also fill in missing values on multiple lines.
With GNU sed:
$ sed '/^ ,/{N;s/ \(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/}' infile
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
The sed command does the following:
/^ ,/ { # If the line starts with 'space comma'
N # Append the next line
# Extract the value before the comma, prepend to first line
s/ \(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/
}
BSD sed would require an extra semicolon before the closing brace.
This only works with non-contiguous lines with missing values.