I wrote a working sed script that replaces multiple spaces between tokens with a single space (it skips lines containing # or //):
#!/bin/sed -f
/.*#/ !{
/\/\//n
# handle more than one space between tokens
s/\([^ ]\)\s\+/\1 /g
}
I run it on Ubuntu like this: ./spaces.sed < spa.txt
spa.txt:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
Now I want the script to also skip the lines between a pair of patterns (the pair can be spread over multiple lines). The patterns are /* and */
I tried many things, but to no avail:
#!/bin/sed -f
/.*#/ !{
/\/\*/,/\*\// {
/\/\*/n #it skips successfully the /* line
n #also skips next line
/\*\// !{
}
}
/\/\//n
# handle more than one space between tokens
s/\([^ ]\)\s\+/\1 /g
}
but script isn't working as expected.
Expected output:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
suggestions?
Thanks
I'd re-engineer the script a bit, to handle # and // comments on their own. With the /* … */ comments, you have to deal with single-line and multi-line variants separately. I'd also use the [[:space:]] notation to spot spaces or tabs. I prefer to avoid backslashes (an aversion caused by working with troff in the days of my youth; if you've never needed 16 backslashes in a row to get the desired effect, you've not suffered enough), so I use \%…% to choose the % character as the search marker instead of / (which means there's no need to escape the slashes in the pattern with a backslash), and I use [*] instead of \*. The { p; d; } notation prints the current line, then deletes it and moves on to the next line. (Using n prints the current line and reads the next one into the pattern space, but then carries on with the rest of the script instead of starting afresh; it isn't what you need.) The second semicolon isn't required by GNU sed but is by BSD (macOS) sed. The spaces in those braces are optional but make it easier to read.
Putting this together, you might have spaces.sed like this:
#!/bin/sed -f
# Comments with a #
/#/ { p; d; }
# Comments with //
\%//% { p; d; }
# Single line /* ... */ comments
\%/[*].*[*]/% { p; d; }
# Multi-line /* ... */ comments
\%/[*]%,\%[*]/% { p; d; }
s/\([^[:space:]]\)[[:space:]]\{2,\}/\1 /g
On your sample data (thanks for including it!), this produces:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
That looks like what you wanted.
Limitations
It doesn't remove multiple spaces at the start of a line.
the leading blanks are not removed.
If you have a line with multiple spaces and // or #, the multiple spaces remain:
these spaces // survive
so do # these
If you have multiple single line comments on a single line, you don't get spaces removed in between them:
/* these */ spaces are not /* removed */
If you have a single-line comment and the start of a multi-line comment on a single line, the multi-line comment is not spotted. Similarly, if you have a multi-line comment that ends on a line and has a single-line comment starting after it, then if there are any multiple spaces between the end of the one comment and the start of the next, they are not handled.
/* this */ is not /* handled
very well */ nor are these /* spaces */
This doesn't deal with the subtleties of backslash-newline in the middle of a start or end comment symbol, nor with backslash-newline at the end of a // comment. Only brain-dead programs (or programmers) produce such comments, so it shouldn't be a real problem. Fortunately, you're not writing a compiler; those have to deal with the nonsense. And don't get me started on trigraphs!
It doesn't handle comment-like sequences inside strings (or multi-character character constants):
"/* this is not a comment */"
'/*', ' ', '*/'
However, most of these issues are subtle enough that you're probably OK without dealing with them. If you must deal with them, then you need a program, not a sed script (assuming you value your sanity).
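As a rough illustration of what "a program" for this could look like, here is a minimal Python sketch of a character-level scanner. It is not a replacement for the sed script and it still ignores backslash-newline and trigraphs. The function name squeeze_spaces and the decision to treat # as a comment starter (as in the sample file) are assumptions made for the example.

```python
def squeeze_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs between tokens, but only in code.

    Comments (/* */, //, #) and quoted strings are copied unchanged,
    and leading indentation is preserved, as in the sed script.
    """
    out = []
    state = "code"  # "code", "block", "line", or the opening quote character
    i, n = 0, len(text)
    while i < n:
        ch, two = text[i], text[i:i + 2]
        if state == "code":
            if two == "/*":
                state = "block"
                out.append(two)
                i += 2
                continue
            # A run of blanks that follows a non-blank character becomes one space.
            if ch in " \t" and out and out[-1][-1] not in " \t\n":
                while i < n and text[i] in " \t":
                    i += 1
                out.append(" ")
                continue
            if two == "//" or ch == "#":
                state = "line"
            elif ch in "\"'":
                state = ch
        elif state == "block":
            if two == "*/":
                state = "code"
                out.append(two)
                i += 2
                continue
        elif state == "line":
            if ch == "\n":
                state = "code"
        else:  # inside a string or character literal
            if ch == "\\" and i + 1 < n:
                out.append(two)  # keep escaped characters together
                i += 2
                continue
            if ch == state or ch == "\n":
                state = "code"
        out.append(ch)
        i += 1
    return "".join(out)
```

Unlike the line-based sed rules, this handles several comments on one line and comment-like text inside strings, at the cost of being considerably longer.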
I am trying to do a right and left outer join of these two RECFM=VB files, but
I don't get anything from the F2 file.
//STEP2000 EXEC PGM=SORT
//* JOIN
//*
//SYSOUT DD SYSOUT=*
//*
//SORTJNF1 DD DSN=YXX122.TEMP.EXPORT.TYPEN,
// DISP=SHR
//*
//SORTJNF2 DD DSN=YXX122.TEMP.EXPORT.TYPEC,
// DISP=SHR
//*
//SORTOUT DD DSN=YXX122.DYXX122.EXPORT.XSUM,
// DISP=(NEW,CATLG,DELETE),
// UNIT=(DEV,2),
// SPACE=(CYL,(150,20),RLSE),
// DCB=(RECFM=VB,LRECL=304,BLKSIZE=0)
//*
//SYSIN DD *
SORT FIELDS=COPY
JOINKEYS FILES=F1,
FIELDS=(13,4,A,18,5,A,17,1,A,23,1,A,33,8,A,41,4,A)
JOINKEYS FILES=F2,
FIELDS=(13,4,A,18,5,A,17,1,A,23,1,A,33,8,A,41,4,A)
JOIN UNPAIRED,F1,F2
REFORMAT FIELDS=(F1:5,300)
OUTFIL FTOV
//
The problem is that I can't work out how to include the F2 file in REFORMAT FIELDS.
I tried REFORMAT FIELDS=(F1:5,300,F2:5,300), but the output file then had a length of 600.
I would like to know how to get both F1 and F2 records into my SORTOUT file with a VB length of 304.
Any idea how to fix this problem?
It turns out you have DFSORT, not SyncSORT, which makes things simpler, as you can definitely use the Match Marker ? in the REFORMAT statement. Up-to-date SyncSORT may have the Match Marker as an undocumented feature.
Putting all the unmatched records on one OUTFIL may be confusing (you won't know which input they have come from).
This conceptualises your join (where the Output is the joined data, and b represents blank).
F1
A
C
E
F2
B
C
F
Output
Ab
bB
Eb
bF
So if you want B and F you need to specify some data from F2. You also need to identify the "blanks" so that you know which part of the REFORMAT record currently has data in (DFSORT has a Match Marker for this, SyncSORT does not).
For that you need to identify one byte which can never be blank in the record. If that is not possible, one byte which can never be another given value (which you specify on FILL= on the REFORMAT). Failing that, two or more bytes with the same characteristics. As a final fail-safe you can check the entire part of the REFORMAT record from one file or the other for blank.
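The conceptual picture above can be modelled in a few lines of Python. This is only a sketch of the idea, not of how the sort product works internally: the function name join_unpaired is made up, both inputs are assumed to be sorted with unique keys, and the marker values '1', '2' and 'B' follow the Match Marker convention (F1 only, F2 only, both).

```python
def join_unpaired(f1, f2, only=False):
    """Model a full outer join of two key-sorted lists of (key, data).

    Returns (marker, f1_data, f2_data) tuples; the side with no
    matching record is left empty (the "blank" half of the record).
    """
    out = []
    i = j = 0
    while i < len(f1) or j < len(f2):
        if j >= len(f2) or (i < len(f1) and f1[i][0] < f2[j][0]):
            out.append(("1", f1[i][1], ""))      # key only in F1
            i += 1
        elif i >= len(f1) or f2[j][0] < f1[i][0]:
            out.append(("2", "", f2[j][1]))      # key only in F2
            j += 1
        else:
            if not only:                          # ONLY drops the paired records
                out.append(("B", f1[i][1], f2[j][1]))
            i += 1
            j += 1
    return out


f1 = [("A", "a-data"), ("C", "c-data-1"), ("E", "e-data")]
f2 = [("B", "b-data"), ("C", "c-data-2"), ("F", "f-data")]
for row in join_unpaired(f1, f2, only=True):
    print(row)
```

The marker in the first position plays the role of the byte you test later to decide which half of the joined record holds real data.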
Since you want V-type output, you could make your REFORMAT record variable:
REFORMAT FIELDS=(F1:1,4,?,F1:5,300,F2:5)
And use VLTRIM on OUTFIL.
Or fixed:
REFORMAT FIELDS=(F1:5,300,F2:5,300)
And use FTOV with VLTRIM on OUTFIL.
Then you need some code, which tests the byte/bytes/partofdata you have chosen for being space/thevalueyouhavechosen and uses BUILD to create a record which contains the data you want (plus trailing blanks/values which will be killed by the VLTRIM).
IFTHEN=(WHEN=(logicalexpression),
BUILD=(1,4,5,300)),
IFTHEN=(WHEN=NONE,
BUILD=(1,4,305,300))
Or
IFTHEN=(WHEN=(logicalexpression),
BUILD=(1,300)),
IFTHEN=(WHEN=NONE,
BUILD=(301,300))
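To make the second (fixed-length) variant concrete, here is a small Python model of the "check the whole half for blank" fallback followed by trailing-blank trimming. It assumes a 600-byte REFORMAT record made of two 300-byte halves; the function name pick_half is invented for the example.

```python
def pick_half(record: str, width: int = 300) -> str:
    """Return whichever half of the joined record holds data.

    Mirrors the IF/ELSE pair above: if the F1 half is not all blanks,
    keep it (like BUILD=(1,300)), otherwise keep the F2 half (like
    BUILD=(301,300)). Trailing blanks are then dropped, which is the
    job VLTRIM does on the OUTFIL.
    """
    first, second = record[:width], record[width:2 * width]
    chosen = first if first.strip(" ") else second
    return chosen.rstrip(" ")
```

As noted, this only works if genuine data can never be entirely blank; otherwise a different fill character is needed.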
Here's some code which does what you want. Probably. I can't test it with SyncSORT.
Data:
F1
A 11111111111111111111111111111111111
C 2222222222222222222222
E 3
F2
B 4444444444444444
C 55555555555555555555555555
F 6666666666666
Code:
OPTION COPY
JOINKEYS F1=INA,FIELDS=(5,1,A),SORTED,NOSEQCK
JOINKEYS F2=INB,FIELDS=(5,1,A),SORTED,NOSEQCK
JOIN UNPAIRED,F1,F2,ONLY
REFORMAT FIELDS=(F1:1,4,F1:5,76,?,F2:5)
OUTFIL FNAMES=EXT,VLTRIM=C' ',
IFTHEN=(WHEN=(81,1,CH,EQ,C'2'),
BUILD=(1,4,82)),
IFTHEN=(WHEN=NONE,
BUILD=(1,4,5,76))
The Match Marker, ?, will be set to 1 for unmatched F1, 2 for unmatched F2 and B for matched records (which you won't get, because of the ONLY on the JOIN statement).
This presumes your data is already in sequence. Remove the SORTED,NOSEQCK for data which is not in sequence.
I've used an LRECL of 80 and a simple key and some simple data.
Output:
For EXT:
A 11111111111111111111111111111111111
B 4444444444444444
E 3
F 6666666666666
SORTOUT would show the unchanged REFORMAT record. That is for you to see how it works. You can remove the FNAMES=EXT or remove the SORTOUT from the JCL when you understand everything.
The F1:1,4 ensures that the REFORMAT record is variable-length. The 5,300 should use blank-padding for shorter records. That's why you need the VLTRIM later. The F2:5 says "file two, position five, to the end of the file two record".
If your data can have genuine trailing blanks, you'll have to use FILL= and VLTRIM= for the same character.
IFTHEN=(WHEN=(logicalexpression) processing finishes when an IFTHEN is true. So the combination in the code is effectively an IF/ELSE.
See also this, Compare two files and write it to "match" and "nomatch" files and Sync sort, Unpaired records of File1 have spaces for no records in F2 file. Can we replace those specific column's spaces by ZEROS? for further examples.
I have a task to do, but I have no idea how to write it with sed.
I have to change the comment style in a file from:
//something6
//something4
//something5
//something3
//something2
to
/*something6
* something4
* something5
* something3
* something2*/
from
//something6
//something4
//something5
//something3
//something2
to
/*something6
something4
something5
something3
something2*/
from
/*something6
* something4
* something5
* something3
* something2*/
to
//something6
//something4
//something5
//something3
//something2
from
/*something6
something4
something5
something3
something2*/
to
//something6
//something4
//something5
//something3
//something2
Those 4 conversions have to be done with sed (I guess, but I'm not sure about that).
I tried but without luck. I can replace single words with other ones, but how do I change the comment style? No clue. I would be very grateful for help and assistance.
Given that the task is:
Please write a script that allows to change style of comments in source files for example : /* .... */ goes to // .... The style of comment is an argument of the script.
I have tried to use just typical:
sed -i 's/'"$lookingfor"'/'"$changing"'/g' $filename
In this context, either $lookingfor or $changing or both will contain slashes, so that simple formulation doesn't work, as you correctly observe.
The conversion of // comments to /* comments is easy as long as you know that you can choose an arbitrary character to separate the sections of the s/// command, such as %. So, for example, you could use:
sed -i.bak -e 's%// *\(.*\)%/*\1 */%'
This looks for a double-slash followed by zero or more spaces and anything and converts it to /* anything */.
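For comparison, the same substitution can be written with Python's re module. This is just an illustrative equivalent of the sed one-liner, with the same blind spots (it will happily rewrite a // inside a string); the function name line_to_block is made up.

```python
import re


def line_to_block(line: str) -> str:
    """Rewrite the first // comment on a line as a /* ... */ comment."""
    # '// *' eats the marker and any following spaces; '(.*)' is the comment text.
    return re.sub(r"// *(.*)", r"/*\1 */", line, count=1)


print(line_to_block("//something6"))
print(line_to_block("x = 1;  // note"))
```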
The conversion of /* comments is much harder. There are two cases to be concerned about:
/* A single line comment */
/*
** A multiline comment
*/
That's before you get into:
/* OK */ "/* OK */" /* Really?! */
which is a single line containing two comments and a string containing text that outside a string would look like a comment. This I am studiously ignoring! Or, more accurately, I am studiously deciding that it will be OK when converted to:
// OK */ "/* OK */" /* Really?!
which isn't the same at all, but serves you right for writing convoluted C in the first place.
You can deal with the first case with something like:
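sed -e '\%/\*\(.*\)\*/% { s%%//\1%; p; d; }'
I have the grouping braces and the p; d sequence in there (print the line, then start on the next one) so that single-line comments don't also match the second case. An n command would not do here: it prints the line and reads the next one, but then carries on with the rest of the script on that new line:
-e '\%/\*%,\%\*/% {
\%/\*% { s%/\*\(.*\)%//\1%; p; d; }
\%\*/% { s%\(.*\)\*/%//\1%; p; d; }
s%^\( *\)%\1//%
}'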
The first line selects a range of lines between one matching /* and the next matching */. The \% tells sed to use the % instead of / as the search delimiter. There are three operations within the outer grouping { … }:
Convert /*anything into //anything and start on the next line.
Convert anything*/ into //anything and start on the next line.
Convert any other line so that it preserves leading blanks but puts // after them.
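The same line-by-line rules can be sketched in Python, which makes the "am I inside a comment?" state explicit. This is a model of the approach under the same simplifying assumptions (one comment per line, no strings containing comment markers); the function name c_to_cpp_comments is invented for the example.

```python
import re


def c_to_cpp_comments(lines):
    """Convert /* ... */ comments to // comments, one line at a time."""
    out = []
    in_block = False
    for line in lines:
        if not in_block:
            single = re.search(r"/\*(.*)\*/", line)
            if single:
                # Single-line comment: keep the text, drop both markers.
                out.append(line[:single.start()] + "//" + single.group(1)
                           + line[single.end():])
            elif "/*" in line:
                # First line of a multi-line comment.
                in_block = True
                out.append(re.sub(r"/\*(.*)", r"//\1", line, count=1))
            else:
                out.append(line)
        elif "*/" in line:
            # Last line of a multi-line comment.
            in_block = False
            out.append(re.sub(r"(.*)\*/", r"//\1", line, count=1))
        else:
            # Middle line: keep leading blanks, then add the marker.
            out.append(re.sub(r"^( *)", r"\1//", line, count=1))
    return out
```

Each branch corresponds to one of the operations listed above, and every line is handled by exactly one branch.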
This is still ridiculously easy to subvert if the comments are maliciously formed. For example:
/* a comment */ int x = 0;
is mapped to:
// a comment int x = 0;
Fixing problems like that, and the example with a string, is something I'd not even start trying in sed. And that's before you get onto the legal but implausible C comments, like:
/\
\
* comment
*\
\
/
/\
/\
noisiness \
commentary \
continued
Which contains just two comments (but does contain two comments!). And before you decide to deal with trigraphs (??/ is a backslash). Etc.
So, a moderate approximation to a C to C++ comment conversion is:
sed -e '\%/\*\(.*\)\*/% { s%%//\1%; p; d; }' \
-e '\%/\*%,\%\*/% {
\%/\*% { s%/\*\(.*\)%//\1%; p; d; }
\%\*/% { s%\(.*\)\*/%//\1%; p; d; }
s%^\( *\)%\1//%
}' \
-i.bak "$@"
I'm assuming you aren't using a C shell; if you are, you need more backslashes at the ends of the lines in the script so that the multi-line single-quoted sed command is treated correctly.
How should I write a program in lex (or flex) that removes nested comments from text and prints just the text which is not in comments?
I should probably somehow track the state when I am inside a comment, and the number of opening "tags" of block comments seen so far.
Let's have these rules:
1.block comment
/*
block comment
*/
2. line comment
// line comment
3. Comments can be nested.
Example 1
show /* comment /* comment */ comment */ show
output:
show show
Example 2
show /* // comment
comment
*/
show
output:
show
show
Example 3
show
///* comment
comment
// /*
comment
//*/ comment
//
comment */
show
output:
show
show
You got the theory right. Here's a simple implementation; could be improved.
%x COMMENT
%%
%{
    int comment_nesting = 0;
%}
"/*"             BEGIN(COMMENT); ++comment_nesting;
"//".*           /* // comments to end of line */
<COMMENT>[^*/]*  /* Eat non-comment delimiters */
<COMMENT>"/*"    ++comment_nesting;
<COMMENT>"*/"    if (--comment_nesting == 0) BEGIN(INITIAL);
<COMMENT>[*/]    /* Eat a / or * if it doesn't match comment sequence */
  /* Could have been .|\n ECHO, but this is more efficient. */
([^/]*([/][^/*])*)*  ECHO;
%%
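To see the counting logic without the lex machinery, here is a minimal Python model of the same idea: a nesting counter that goes up on /* and down on */, with // comments skipped only outside block comments. It is a sketch for experimenting with the examples in the question, not a translation of the generated scanner; the function name strip_nested_comments is made up.

```python
def strip_nested_comments(text: str) -> str:
    """Remove nested /* */ comments and // line comments from text."""
    out = []
    depth = 0  # how many /* are currently open
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        if two == "/*":
            depth += 1
            i += 2
        elif depth > 0:
            if two == "*/":
                depth -= 1
                i += 2
            else:
                i += 1  # swallow anything else inside a comment
        elif two == "//":
            # Skip to the end of the line; the newline itself is kept.
            end = text.find("\n", i)
            i = len(text) if end == -1 else end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(strip_nested_comments("show /* comment /* comment */ comment */ show"))
```

Note that whitespace around the removed comments is left alone, so the first example keeps both spaces between the two words.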
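This is exactly what you need: yy_push_state(COMMENT). It uses a stack to store the start states, which comes in handy for nested constructs. (In flex, the start-condition stack is only available when the scanner declares %option stack.)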
I am afraid that @rici's answer might be wrong. First, we need to record the line number, and might change the file line directive later. Second, we are given open_sign and close_sign. We have the following principles:
1) use an integer for stack control: push for an open sign, pop for a close sign
2) eat up every character before EOF, and any close sign without an open sign, inside the comment
<comments>{open} {no_open_sign++;}
<comments>\n {curr_lineno++;}
<comments>[^({close})({open})(EOF)] /*EAT characters by doing nothing*/
3) Errors might happen when no_open_sign goes down to zero, hence
<comments>{close} similar as above post
4) EOF should not be inside the string, hence you need a rule
<comments>(EOF) {return ERROR_TOKEN;}
To make it more robust, you also need to have another close-checking rule outside of
And in practice, you should use negative lookahead and lookbehind in the regular expression grammar if your lexical analyzer supports it.
How do I change the comments to look like /* */ instead of // in VS 2008?
// this is a line comment, it will only comment this line
// for the next line you need to repeat //
/* this is a block comment
you can do all sort of stuff here
and you won't have to worry about beginning the line with some special chars
until the end*/
Since those two types of comments are a bit different, I would say that you should use both of them. It's not an error to have line and block comments in the same file.
I suppose you could run a regexp replace that replaces // at the beginning of the line with /* and adds */ at the end, but you will end up with something like this:
/* first line comment */
/* second line comment */
/* third line comment */
/* fourth line comment */
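For what it's worth, such a replace might look like the following Python sketch (shown outside the IDE simply because it is easy to try; the pattern and the function name wrap_line_comments are assumptions, not a Visual Studio feature). It only touches lines that start with //, optionally after indentation.

```python
import re


def wrap_line_comments(text: str) -> str:
    """Turn each line that starts with // into a one-line /* ... */ comment."""
    # MULTILINE makes ^ and $ match at every line boundary in the text.
    return re.sub(r"^([ \t]*)//[ \t]?(.*)$", r"\1/* \2 */", text,
                  flags=re.MULTILINE)


print(wrap_line_comments("// first line comment\n// second line comment"))
```

The result is one block comment per line, as in the listing above, rather than a single block spanning all the lines.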