I have a text file and I need to delete the first blank line and then all the text after the 2nd blank line - bash

I'm using bash and I have a file that consists of 3 parts of text: the first part, then a blank line, then the 2nd part, then another blank line, then the 3rd part. I need to output this to a new file that contains only the first 2 parts without the blank line in between. I've been playing with sed and awk, but can't quite figure it out.

Most simply with awk:
awk -v RS= 'NR <= 2' filename
With an empty record separator RS, awk splits the file into records at empty lines. With the selection NR <= 2, only the first two are printed (delimited by the default output record separator, which is a newline).
If the file is very large, it might be prudent to amend this to
awk -v RS= '1; NR == 2 { exit }' filename
This stops processing the file after the second record and prints all until then.
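As a quick check of the early-exit variant, here is a minimal sketch on a throwaway three-part file. The sample contents (a1, b1, c1 and so on) and the temporary file are made up for illustration; only the awk command comes from the answer.

```shell
# Hypothetical sample: three parts separated by blank lines.
tmp=$(mktemp)
printf 'a1\na2\n\nb1\nb2\n\nc1\nc2\n' > "$tmp"

# Paragraph mode (empty RS): print each record, stop after the second one.
result=$(awk -v RS= '1; NR == 2 { exit }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The third part is never printed, and the blank line between the first two parts is gone because each record is printed followed by a single newline.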
Addendum: Obligatory crazy sed solution (not recommended for use, written for fun):
sed -n '/^$/ { x; /./q; H; d; }; p' filename

Related

delete lines if firstline matches expression, but next 2 lines do not match different expression

I have a test file in this format:
G03X22Y22.5
G01X48.5
M98P9001 (OFF)****
G00X20Y25
M98P8051 (FAST CUT)
G01X22Y34
G01X25Y33
I am trying to make a bash or MS-DOS script that will:
Find all lines in the file that match M98P9001.
If the NEXT 2 LINES do not contain one of the codes { M98P8050, M98P8080 or M09 }, delete all 3 lines, which would result in the output:
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
I've tried solutions with SED or AWK, but haven't gotten the right one yet:
sed -e '/M98P9001/,+2d' input.txt >> output.txt
This one always deletes all 3 lines after finding the match, but I need to delete them only if the 2 lines following the match do not contain one of { M98P8050, M98P8080 or M09 }.
A mark-and-sweep approach:
$ awk 'NR==FNR {if(!(/M98P80[58]0|M09/ && p~/M98P80[58]0|M09/) && pp~/M98P9001/)
{a[NR]; a[NR-1]; a[NR-2]}
pp=p; p=$0; next}
!(FNR in a)' file{,}
G03X22Y22.5
G01X48.5
G01X22Y34
G01X25Y33
This seems to give your desired output:
awk '
/M98P9001/ {
getline l2; getline l3;
if((l2 l3)~/M98P8050|M98P8080|M09/) printf "%s\n%s\n%s\n", $0, l2, l3;
next;
}
{ print; }' file
Description:
If the first-line pattern matches, read the next two lines into the variables l2 and l3.
Check the concatenation of both lines for any of the 3 secondary patterns.
If it matches, print all three lines; otherwise print nothing.
Go to the next record.
On all other lines, print.
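The getline approach can be tried on a small sample that contains one block to delete and one block to keep. This is only a sketch: the sample lines are invented, and two details are my own additions rather than part of the answer, namely clearing l2 and l3 before reading and joining them with a newline so that a code cannot match across the line boundary.

```shell
# Hypothetical sample: the first M98P9001 block has no follow-up code and is
# dropped; the second is followed by M98P8050 and is kept.
result=$(printf '%s\n' \
    'G03X22Y22.5' \
    'M98P9001 (OFF)' \
    'G00X20Y25' \
    'M98P8051 (FAST CUT)' \
    'G01X22Y34' \
    'M98P9001 (OFF)' \
    'M98P8050' \
    'G01X25Y33' |
awk '
/M98P9001/ {
    l2 = l3 = ""                 # clear leftovers in case the file ends early
    getline l2; getline l3       # read the two lines after the match
    if ((l2 "\n" l3) ~ /M98P8050|M98P8080|M09/)
        printf "%s\n%s\n%s\n", $0, l2, l3
    next
}
{ print }')
printf '%s\n' "$result"
```

Note that if M98P9001 appears on one of the last two lines of the input, getline fails and this sketch still prints empty strings for the missing lines when a code matches; handling that case is left out here.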
This might work for you (GNU sed):
sed -E ':a;N;s/\n/&/2;Ta;/^[^\n]*M98P9001/{/\n.*(M98P8050|M98P8080|M09)/!d};P;D' file
Open a three line window throughout the length of the file.
If the first line of the window contains M98P9001 and neither the second nor the third line contains M98P8050, M98P8080 or M09, delete the entire window and repeat.
Otherwise, print/delete the first line of the window and repeat.
N.B. The idiom :a;N;s/\n/&/2;Ta tops up the three line window.

How to replace a specific character in a file, only on the lines by counting this specific character in the line?

I would like to double the 4th comma in the lines counting 7 and only 7 commas in all the csv's of a folder.
In this command line, I double the 4th comma:
sed  's/,/,,/4' Person_7.csv > new.csv
In this command line, I can find and count all the commas in a line:
sed 's/[^,]//g' dat | awk '{ print length }'
In this command line, I can create a new file with the lines containing 7 commas (that is, 8 fields):
awk -F , 'NF == 8' <Person_test.csv >Person_7.csv
But I don't know how to do the specific work...
You need something to select only the lines that contain exactly 7 commas and then operate on just these lines. You can do that with sed:
sed '/^\([^,]*,\)\{7\}[^,]*$/s/,/&&/4'
where ^\([^,]*,\)\{7\}[^,]*$ defines a line that contains exactly 7 commas.
It's a bit easier with awk, though:
awk -F, -v OFS=, 'NF == 8 { $4 = $4 OFS } 1'
This sets the input and output field separators to a comma, and then for lines with 8 fields (7 commas) appends a , to the end of the 4th field, doubling the 4th comma. The final 1 makes sure every line gets printed.
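Since the question asks about all the CSVs of a folder, here is a minimal sketch that wraps the awk command in a loop. The folder, the file name and the sample rows are hypothetical, and rewriting each file through a .tmp file is my assumption about how the result should be stored.

```shell
# Hypothetical folder with one CSV: the first line has 7 commas, the second has 5.
dir=$(mktemp -d)
printf '%s\n' 'a,b,c,d,e,f,g,h' 'a,b,c,d,e,f' > "$dir/Person_test.csv"

# Rewrite every CSV in the folder through a temporary file.
for f in "$dir"/*.csv; do
    awk -F, -v OFS=, 'NF == 8 { $4 = $4 OFS } 1' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$dir/Person_test.csv")
printf '%s\n' "$result"
rm -rf "$dir"
```

Only the line with exactly 7 commas gets its 4th comma doubled; the shorter line passes through unchanged.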

Unix Shell Scripting - how can I remove particular characters inside a text file?

I have a text file with 5 rows and 5 columns. The columns are separated by the "|" symbol. The content of the 2nd column should be at most 7 characters long.
If the 2nd column is longer than 7 characters, I want to remove the extra characters without opening the file.
For example:
cat file1
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe_jnbsnjv|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|
In the example above, the 2nd and 3rd rows have a 2nd column longer than 7 characters. I want to remove those extra characters, without opening the input file, using an awk or sed command.
I'm waiting for your responses guys.
Thanks in advance!!
Take a substring of length 7 from the second column with awk:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
Now any strings longer than 7 characters will be made shorter. Any strings that were shorter will be left as they are.
The 1 at the end is the shortest true condition to trigger the default action, { print }.
If you're happy with the changes, then you can overwrite the original file like this:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file > tmp && mv tmp file
i.e. redirect to a temporary file and then overwrite the original.
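To see the truncation without touching any file, the same awk program can be fed two rows from the question on standard input. This is just a sketch for checking the behaviour.

```shell
# Two rows from the question: the first already fits, the second is too long.
result=$(printf '%s\n' \
    'ff|hahaha1|kjbsb|122344|jbjbnjuinnv|' \
    'df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|' |
    awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) } 1')
printf '%s\n' "$result"
```

The trailing | survives because awk sees an empty last field and rebuilds the line with OFS between all fields.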
First try
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
What is happening here? We construct the command step-by-step:
# Replace something
sed 's/hadb123_udcvb/replaced/' file1
# Remember the matched string (will be used in a later command)
sed 's/\(hadb123_udcvb\)/replaced/' file1
# Replace the first run of exactly 7 characters that are not '|' (once per line)
sed 's/\([^|]\{7\}\)/replaced/' file1
# Remove additional character until a '|'
sed 's/\([^|]\{7\}\)[^|]*/replaced/' file1
# Put back the string you remembered
sed 's/\([^|]\{7\}\)[^|]*/\1/' file1
# Extend the matched string with start-of-line (^), any-length first field, '|'
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
When this shows the desired output, you can add the option -i for changing the input file:
sed -i 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
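Before adding -i, it may be worth checking that rows whose second field is shorter than 7 characters are left alone. The sketch below uses one row from the question and one invented row; the anchor ^ is written outside the group here, which matches the same lines and, as far as I know, is the more portable spelling.

```shell
# First row: second field too long. Second row (made up): second field short.
result=$(printf '%s\n' \
    'df|hadb123_udcvb|sbfuisdbvdkh|122344|' \
    'jj|abc|muchlongerfield|090900|' |
    sed 's/^\([^|]*|[^|]\{7\}\)[^|]*/\1/')
printf '%s\n' "$result"
```

The second row does not change because the pattern needs 7 non-| characters directly after the first |, and the anchor prevents it from matching a later, longer field instead.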

sed/awk - Put all text on the same line as a preceding number

How can I get all text that follows 'number:number' onto the same line as the preceding 'number:number'?
10:15
text line one
text line two
text no pattern
11:12
random text
text is random
totally random
could be four lines
could be five
Should then become
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
This works for your example (GNU sed, which accepts \n in the replacement):
tr '\n' ' ' < file.txt | sed 's/[0-9]*:[0-9]*/\n&/g'
Explanation:
tr first puts everything on the same line.
Then the sed one-liner inserts a newline before each num:num pattern.
Given that input file, all you need is to tell awk to read a blank-line-separated paragraph at a time using RS=<null> and recompile each record using the default OFS value of a single blank character:
$ awk -v RS= '{$1=$1}1' file
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
Both sed and awk solutions join lines till a new record is detected or input is done in which case the joined lines are printed and cleared - use either solution
the sed oneliner
sed -nr '/^[0-9]{2}:[0-9]{2}$/!{H;$!b}; x; s/\n/ /gp'
the awk script
awk '
!/^[0-9]{2}:[0-9]{2}$/ {
lines=lines" "$0
next
}
{if(lines) print lines; lines=$0}
END {print lines}
'
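Some awk implementations do not support interval expressions such as {2}. As a hedged alternative, the sketch below follows the same join-until-next-header idea but matches the header with + instead, so it also accepts headers like 9:5. The variable name line and the shortened sample input are mine.

```shell
result=$(printf '%s\n' '10:15' 'text line one' 'text line two' '11:12' 'random text' |
awk '
# A header line: flush what was collected so far and start a new record.
/^[0-9]+:[0-9]+$/ { if (line != "") print line; line = $0; next }
# Any other line is appended to the current record.
{ line = line " " $0 }
END { if (line != "") print line }')
printf '%s\n' "$result"
```

This version does not depend on blank lines between the blocks, only on the number:number lines.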
Here is a GNU AWK script:
script.awk
BEGIN { RS = "\n[0-9]+:[0-9]+|\n$" }
{ gsub(/\n/," ",$0)
printf( "%s%s", $0,RT) }
Use it like this: gawk -f script.awk file.txt
It uses the GNU AWK specific extensions RT and regex RS:
The record separator is set to "colon-separated number pairs".
To get the final newline at the end of the file, "|\n$" is added to match the last newline in the file.
In order to start separation at the second pair, "\n" is added in front. Thus the first colon-separated number pair "10:15" is included in the first $0 and not in RT.
Inside each record the remaining newlines are replaced with spaces, and RT (the newline plus the next number pair) is printed back after the record.
The trick here is that you want to split the file on paragraphs instead of lines. In awk, if you set RS="" it enables paragraph mode. Each iteration of the awk loop will have a paragraph in $0. You can then substitute the newlines and turn them into spaces.
awk <data.txt 'BEGIN { RS = "" ; FS = "\n" } { gsub(/\n/, " ", $0) ; print }'
Output:
10:15 text line one text line two text no pattern
11:12 random text text is random totally random could be four lines could be five
The benefit of this is that awk handles all the special cases for you: files that end in a blank line, end without a blank line, end without a newline, etc.
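The claim about trailing blank lines can be checked with a short sketch. The input below is invented and ends with two extra blank lines; the program uses the field-rebuild form ($1 = $1) of paragraph mode rather than gsub, which gives the same result for this input.

```shell
# Hypothetical input: two paragraphs, with extra blank lines at the end.
result=$(printf '10:15\ntext line one\ntext line two\n\n11:12\nrandom text\n\n\n' |
    awk 'BEGIN { RS = "" } { $1 = $1 } 1')
printf '%s\n' "$result"
```

In paragraph mode a newline also acts as a field separator, so rebuilding the record joins the lines with single spaces, and the trailing blank lines do not produce an empty record.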

Append and replace using awk/sed

I have this file:
2016,05,P,0002 ,CJGLOPSD8
00,BBF,BBDFTP999,051000100,GBP, , -2705248.00
00,BBF,BBDFTP999,059999998,GBP, , -3479679.38
00,BBF,BBDFTP999,061505141,GBP, , -0.40
00,BBF,BBDFTP999,061505142,GBP, , 6207621.00
00,BBF,BBDFTP999,061505405,GBP, , -0.16
00,BBF,BBDFTP999,061552000,GBP, , -0.24
00,BBF,BBDFTP999,061559010,GBP, , -0.44
00,BBF,BBDFTP999,062108021,GBP, , -0.34
00,BBF,BBDFTP999,063502007,GBP, , -0.28
I want to programmatically (in unix, or informatica if possible) grab the first two fields in the top row, concatenate them, append them to the end of each line and remove that first row.
Like so:
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
This is my current attempt:
awk -vvar1=`cat OF\ OPSDOWN8.CSV | head -1 | cut -d',' -f1` -vvar2=`cat OF\ OPSDOWN8.CSV | head -1 | cut -d',' -f2` 'BEGIN {FS=OFS=","} {print $0, var 1var2}' OF\ OPSDOWN8.CSV> OF_OPSDOWN8.csv
Any pointers? I've tried looking around the forum but can only find answers to part of my question.
Thanks for your help.
Use this awk:
awk 'BEGIN{FS=OFS=","} NR==1{val=$1$2;next} {gsub(/ +/,"");print $0,val}' file
Explanation:
BEGIN{FS=OFS=","} - This block sets FS (Field Separator) and OFS (Output Field Separator) to ,.
NR==1{val=$1$2;next} - On line number 1, concatenates the first and second fields ($1 and $2) into val, then skips to the next line so the header itself is not printed.
gsub(/ +/,"") - Removes all spaces from the current line.
print $0,val - Prints $0 (the whole line) followed by the stored value from val.
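A minimal sketch of this on the header and the first data row from the question, fed through standard input instead of a file. The space pattern here is written as / +/.

```shell
result=$(printf '%s\n' \
    '2016,05,P,0002 ,CJGLOPSD8' \
    '00,BBF,BBDFTP999,051000100,GBP, , -2705248.00' |
    awk 'BEGIN { FS = OFS = "," }
         NR == 1 { val = $1 $2; next }           # remember 2016 and 05 as 201605
         { gsub(/ +/, ""); print $0, val }')     # strip spaces, append the value
printf '%s\n' "$result"
```

The header row is consumed by next, so only the data row appears in the output.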
I would use the following awk command:
awk 'NR==1{d=$1$2;next}{$(NF+1)=d;gsub(/[[:space:]]/,"")}1' FS=, OFS=, file
Explanation:
NR==1{d=$1$2;next} applies to line 1 and sets a variable d(ate) to the concatenation of the first and second fields. The variable is used when processing the remaining lines. next tells awk to move on to the next line right away without processing further instructions on this line.
{$(NF+1)=d;gsub(/[[:space:]]/,"")}1 appends a new field to the line (NF is the number of fields, so assigning d to $(NF+1) effectively adds a field). gsub() is used to remove whitespace. The 1 at the end always evaluates to true and makes awk print the modified line.
FS=, is a command line argument. It sets the input field delimiter to ,.
OFS=, is a command line argument. It sets the output field delimiter to ,.
Output:
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
With sed :
sed '1{s/\([^,]*\),\([^,]*\),.*/\1\2/;h;d};/.*/G;s/\n/,/;s/ //g' file
in ERE mode :
sed -r '1{s/([^,]*),([^,]*),.*/\1\2/;h;d};/.*/G;s/\n/,/;s/ //g' file
Output :
00,BBF,BBDFTP999,051000100,GBP,,-2705248.00,201605
00,BBF,BBDFTP999,059999998,GBP,,-3479679.38,201605
00,BBF,BBDFTP999,061505141,GBP,,-0.40,201605
00,BBF,BBDFTP999,061505142,GBP,,6207621.00,201605
00,BBF,BBDFTP999,061505405,GBP,,-0.16,201605
00,BBF,BBDFTP999,061552000,GBP,,-0.24,201605
00,BBF,BBDFTP999,061559010,GBP,,-0.44,201605
00,BBF,BBDFTP999,062108021,GBP,,-0.34,201605
00,BBF,BBDFTP999,063502007,GBP,,-0.28,201605
This might work for you (GNU sed):
sed '1s/,//;1s/,.*//;1h;1d;s/ //g;G;s/\n/,/' file
For the first line only: remove the first comma, remove from the next comma to the end of the line, store the amended line in the hold space (HS) and then delete the current line (the d abruptly ends processing). For subsequent lines: remove all spaces, append the HS and replace the newline (from the G command) with a comma.
Or if you prefer:
sed '1{s/,//;s/,.*//;h;d};s/ //g;G;s/\n/,/' file
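The hold-space technique can also be written one command per line, which should not rely on GNU-specific parsing of ; before }. This is a sketch on the header and one data row from the question, not a replacement for the one-liners above.

```shell
result=$(printf '%s\n' \
    '2016,05,P,0002 ,CJGLOPSD8' \
    '00,BBF,BBDFTP999,061505142,GBP, , 6207621.00' |
sed '1{
s/,//
s/,.*//
h
d
}
s/ //g
G
s/\n/,/')
printf '%s\n' "$result"
```

On line 1 the two substitutions reduce the header to 201605, h stores it in the hold space and d drops the line. On every other line, G appends a newline plus the held value, and the last substitution turns that newline into a comma.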
If you want to use Informatica for this, use two Source Qualifiers. Read the file twice - just one line in one SQ (filter out the rest) and in the second SQ read the whole file except the first line (skip header). Join the two on dummy port and you're done.
