The GNU CPP says about backslash new lines
A continued line is a line which ends with a backslash, ‘\’. The backslash is removed and the following line is joined with the current one. No space is inserted, so you may split a line anywhere, even in the middle of a word. (It is generally more readable to split lines only at white space.)
When I compile the following, breaking the printf word I get an error like:
error: unknown type name 'prin'
// Online C compiler https://www.programiz.com/c-programming/online-compiler/
#include <stdio.h>
int main() {
// Write C code here
prin\
tf("Hello world");
return 0;
}
Why are the lines not joined together?
The lines are joined together. However, as the second line starts with indent, you end up with prin tf. Remove all leading white space on the line with tf, and it will compile.
Related
I wrote a correctly working sed script which replaces multiple spaces with single space between tokens (it skips lines with # or //) :
#!/bin/sed -f
/.*#/ !{
/\/\//n
# handle more than one space between tokens
s/\([^ ]\)\s\+/\1 /g
}
i run it on ubuntu like this: ./spaces.sed < spa.txt
spa.txt:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
now i want to add the functionality that script should skip over lines between a pattern (pattern can be distributed in multiple lines). This is the pattern: /* and */
I tried many things but of no use:
#!/bin/sed -f
/.*#/ !{
/\/\*/,/\*\// {
/\/\*/n #it skips successfully the /* line
n #also skips next line
/\*\// !{
}
}
/\/\//n
# handle more than one space between tokens
s/\([^ ]\)\s\+/\1 /g
}
but script isn't working as expected.
Expected output:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
suggestions?
Thanks
I'd re-engineer the script a bit, to handle # and // comments on their own. With the /* … */ comments, you have to deal with single-line and multi-line variants separately. I'd also use the [[:space:]] notation to spot spaces or tabs. I prefer to avoid backslashes (an aversion caused by working with troff in the days of my youth — if you've never needed 16 backslashes in a row to get the desired effect, you've not suffered enough), so I use \%…% to choose the % character as the search marker instead of / (which means there's no need to escape the slashes in the pattern with a backslash), and I use [*] instead of \*. The { p; d; } notation prints the current line and then deletes it and moves onto the next line. (Using n appends the next line to the current line; it isn't what you need.). The second semicolon isn't required by GNU sed but is by BSD (macOS) sed. The spaces in those braces are optional but make it easier to read.
Putting this together, you might have spaces.sed like this:
#!/bin/sed -f
# Comments with a #
/#/ { p; d; }
# Comments with //
\%//% { p; d; }
# Single line /* ... */ comments
\%/[*].*[*]/% { p; d; }
# Multi-line /* ... */ comments
\%/[*]%,\%[*]/% { p; d; }
s/\([^[:space:]]\)[[:space:]]\{2,\}/\1 /g
On your sample data (thanks for including it!), this produces:
/** spa.txt text
date : some date
hih+jjhh jgjg
if ( hjh>=hjhjh )
y **/
# this is a comment
// this is a comment
lines begins here ;
/****** this line is comment ****/
some more lines
// again comment
more lines words
/** again multi line co
mmment it
comment line
follows till here**/
file ends
That looks like what you wanted.
Limitations
It doesn't remove multiple spaces at the start of a line.
the leading blanks are not removed.
If you have a line with multiple spaces and // or #, the multiple spaces remain:
these spaces // survive
so do # these
If you have multiple single line comments on a single line, you don't get spaces removed in between them:
/* these */ spaces are not /* removed */
If you have a single-line comment and the start of a multi-line comment on a single line, the multi-line comment is not spotted. Similarly, if you have a multi-line comment that ends on a line and has a single-line comment starting after it, then if there are any multiple spaces between the end of the one comment and the start of the next, they are not handled.
/* this */ is not /* handled
very well */ nor are these /* spaces */
This doesn't deal with the subtleties of backslash-newline in the middle of a start or end comment symbol, nor with backslash-newline at the end of a // comment. Only brain-dead programs (or programmers) produce such comments, so it shouldn't be a real problem. Fortunately, you're not writing a compiler; those have to deal with the nonsense. And don't get me started on trigraphs!
It doesn't handle comment-like sequences inside strings (or multi-character character constants):
"/* this is not a comment */"
'/*', ' ', '*/'
However, most of these issues are subtle enough that you're probably OK without dealing with them. If you must deal with them, then you need a program, not a sed script (assuming you value your sanity).
I've gotten lost in an edge case of sorts. I'm working on a conversion of some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions are for dealing with bitstrings, and a common case within these is a sentence like the following: Starting character is the blank " " which has the value 0.
I tried writing this as an inline literal the following ways: Starting character is the blank `` `` which has the value 0. or Starting character is the blank :literal:` ` which has the value 0. but there are a few problems with how these end up working:
reST syntax objects to a whitespace immediately inside of the literal, and it doesn't get recognized.
The above can be "fixed"--it looks correct in the HTML () and plaintext (" ") output--with a non-breaking space character inside the literal, but technically this is a lie in our case, and if a user copied this character, they wouldn't be copying what they expect.
The space can be wrapped in regular quotes, which allows the literal to be properly recognized, and while the output in HTML is probably fine (" "), in plaintext it ends up double-quoted as "" "".
In both 2/3 above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it's at the start/end of the line.
I feel like I'm missing something; is there a good way to handle this?
Try using the unicode character codes. If I understand your question, this should work.
Here is a "|space|" and a non-breaking space (|nbspc|)
.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space
You should see:
Here is a “ ” and a non-breaking space ( )
I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (will sort out exactly what it should look like during our review process) but the basics are intact.
There are two main components to the approach:
introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
modify the text-wrapper Sphinx uses so that it won't break at the space.
Here's the code:
class TextWrapperDeux(TextWrapper):
_wordsep_re = re.compile(
r'((?<!`)\s+(?!`)|' # whitespace not between backticks
r'(?<=\s)(?::[a-z-]+:)`\S+|' # interpreted text start
r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|' # hyphenated words
r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))') # em-dash
#property
def wordsep_re(self):
return self._wordsep_re
def char_role(name, rawtext, text, lineno, inliner, options={}, content=[]):
"""Describe a character given by unicode name.
e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
"""
try:
character = nodes.unicodedata.lookup(text)
except KeyError:
msg = inliner.reporter.error(
':char: argument %s must be valid unicode name at line %d' % (text, lineno))
prb = inliner.problematic(rawtext, rawtext, msg)
return [prb], [msg]
app = inliner.document.settings.env.app
describe_char = "(U+%05X %s)" % (ord(character), text)
char = nodes.inline("char:", "char:", nodes.literal(character, character))
char += nodes.inline(describe_char, describe_char)
return [char], []
def setup(app):
app.add_role('char', char_role)
The code above lacks some glue to actually force the use of the new TextWrapper, imports, etc. When a full version settles out I may try to find a meaningful way to republish it; if so I'll link it here.
Markup: Starting character is the :char:`SPACE` which has the value 0.
It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.
And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.
The HTML output ends up looking roughly like: Starting character is the char:(U+00020 SPACE) which has the value 0.
Firstly, sorry for the question, I know I've heard something that could help, but I just can't remember.
Basically I would like to create my own syntax for a programming language. For example this code:
WRITE OUT 'Hello World!'
NEW LINE
would turn into this Java code:
System.out.print("Hello World!");
System.out.println();
How could I achieve this? Is there a method?
Olá.
There are techniques and proper algorithms to do that.
Search for "compiler techniques" and "Interpreter pattern".
An initial approach could be a basic pattern interpreter.
Assuming simple sentences and only one sentence per line, you could read the input file line by line and search for defined patterns (regular expressions).
The patterns describe the structure of the commands in your invented language.
If you get a match then you do the translation.
In particular, we use the regex.h library in c to perform the regular expression search.
Of course regex is also available in java.
Ex. NEW LINE match the pattern " *NEW +LINE *"
The * means that the preceding character occurs 0 or more times.
The + means that the preceding character occurs 1 or more times.
Thus, this pattern can match the command " NEW LINE " with arbitrary spaces between the words.
Ex. WRITE OUT 'Hello World!' match the pattern "WRITE OUT '([[:print:]]*)'"
or if you want to allow spaces " *WRITE +OUT +'([[:print:]]*)' *"
[[:print:]] means: match one printable character (ex. 'a' or 'Z' or '0' or '+')
Thus, [[:print:]]* match a sequence of 0, 1 or more printable characters
If a line of your input file matched the pattern of some command then you can do the translation, but in most cases you will need to retrieve some information before,
ex. the arbitrary text after WRITE OUT. Thats why you need to put parenthesis around [[:print:]]*. That will indicate to the function that perform the search that you want retrieve that particular part of your pattern.
A nice coincidence is that I recently assisted a friend with an college project similar to the problem you want to solve: a translator from c to basic. I reused that code to make an example for you.
I tested the code and it works.
It can translate:
WRITE OUT 'some text'
WRITE OUT variable
NEW LINE
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>
#define STR_SHORT 100
#define MATCHES_SIZE 10
/**************************************************************
Returns the string of a match
**************************************************************/
char * GetExp(char *Source, char *Destination, regmatch_t Matches) {
//Source The string that was searched
//Destination Will contains the matched string
//Matches One element of the vector passed to regexec
int Length = Matches.rm_eo - Matches.rm_so;
strncpy(Destination, Source+Matches.rm_so, Length);
Destination[Length]=0;
return Destination;
}
/**************************************************************
MAIN
**************************************************************/
int main(int argc, char *argv[]) {
//Usage
if (argc==1) {
printf("Usage:\n");
printf("interpreter source_file\n");
printf("\n");
printf("Implements a very basic interpreter\n");
return 0;
}
//Open the source file
FILE *SourceFile;
if ( (SourceFile=fopen(argv[1], "r"))==NULL )
return 1;
//This variable is used to get the strings that matched the pattern
//Matches[0] -> the whole string being searched
//Matches[1] -> first parenthetical
//Matches[2] -> second parenthetical
regmatch_t Matches[MATCHES_SIZE];
char MatchedStr[STR_SHORT];
//Regular expression for NEW LINE
regex_t Regex_NewLine;
regcomp(&Regex_NewLine, " *NEW +LINE *", REG_EXTENDED);
//Regular expression for WRITE OUT 'some text'
regex_t Regex_WriteOutStr;
regcomp(&Regex_WriteOutStr, " *WRITE +OUT +'([[:print:]]*)' *", REG_EXTENDED);
//Regular expresion for WRITE OUT variable
regex_t Regex_WriteOutVar;
regcomp(&Regex_WriteOutVar, " *WRITE +OUT +([_[:alpha:]][[:alnum:]]*) *", REG_EXTENDED);
//Regular expression for an empty line'
regex_t Regex_EmptyLine;
regcomp(&Regex_EmptyLine, "^([[:space:]]+)$", REG_EXTENDED);
//Now we read the file line by line
char Buffer[STR_SHORT];
while( fgets(Buffer, STR_SHORT, SourceFile)!=NULL ) {
//printf("%s", Buffer);
//Shorcut for an empty line
if ( regexec(&Regex_EmptyLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
printf("\n");
continue;
}
//NEW LINE
if ( regexec(&Regex_NewLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
printf("System.out.println();\n");
continue;
}
//WRITE OUT 'some text'
if ( regexec(&Regex_WriteOutStr, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
printf("System.out.print(\"%s\");\n", GetExp(Buffer, MatchedStr, Matches[1]));
continue;
}
//WRITE OUT variable
//Assumes variable is a string variable
if ( regexec(&Regex_WriteOutVar, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
printf("System.out.print(\"%%s\", %s);\n", GetExp(Buffer, MatchedStr, Matches[1]));
continue;
}
//Unknown command
printf("Unknown command: %s", Buffer);
}
return 0;
}
Proper solution for this question requires the following steps:
Parse the original syntax code and create a syntax tree.
That is commonly done with tools like ANTLR.
Go through the syntax tree and either convert it to Java code, or to a Java syntax tree.
Both of those steps have their own complexity, so it would be better to ask separate questions about specific issues you encounter while implementing them.
Strictly speaking you can skip step 2 and generate Java directly when parsing, but unless your language is very simple renaming of Java concepts, you wouldn't be able to do that easily.
I'm working on a simple Ruby program that should count of the lines of text in a Java file that contain actual Java code. The line gets counted even if it has comments in it, so basically only lines that are just comments won't get counted.
I was thinking of using a regular expression to approach this problem. My program will just iterate line by line and compare it to a "regexp", like:
while line = file.gets
if line =~ regex
count+=1
end
end
I'm not sure what regexp format to use for that, though. Any ideas?
Getting the count for "Lines of code" can be a little subjective. Should auto-generated stuff like imports and package name really count? A person usually didn't write it. Does a line with just a closing curly brace count? There's not really any executing logic on that line.
I typically use this regex for counting Java lines of code:
^(?![ \s]*\r?\n|import|package|[ \s]*}\r?\n|[ \s]*//|[ \s]*/\*|[ \s]*\*).*\r?\n
This will omit:
Blank lines
Imports
Lines with the package name
Lines with just a }
Lines with single line comments //
Opening multi-line comments ((whitespace)/* whatever)
Continuation of multi-line comments ((whitespace)* whatever)
It will also match against either \n or \r\n newlines (since your source code could contain either depending on your OS).
While not perfect, it seems to come pretty close to matching against all, what I would consider, "legitimate" lines of code.
count = 0
file.each_line do |ln|
# Manage multiline and single line comments.
# Exclude single line if and only if there isn't code on that line
next if ln =~ %r{^\s*(//|/\*[^*]*\*/$|$)} or (ln =~ %r{/\*} .. ln =~ %r{\*/})
count += 1
end
There's only a problem with lines that have a multilines comment but also code, for example:
someCall(); /* Start comment
this a comment
even this
*/ thisShouldBeCounted();
However:
imCounted(); // Comment
meToo(); /* comment */
/* comment */ yesImCounted();
// i'm not
/* Nor
we
are
*/
EDIT
The following version is a bit more cumbersome but correctly count all cases.
count = 0
comment_start = false
file.each_line do |ln|
# Manage multiline and single line comments.
# Exclude single line if and only if there isn't code on that line
next if ln =~ %r{^\s*(//|/\*[^*]*\*/$|$)} or (ln =~ %r{^\s*/\*} .. ln =~ %r{\*/}) or (comment_start and not ln.include? '*/')
count += 1 unless comment_start and ln =~ %r{\*/\s*$}
comment_start = ln.include? '/*'
end
(I'm using Mac OS X, and this question might be specific to that variant of Unix)
I'm trying to split a file using csplit with a regular expression. It consists of various articles merged into one single long text file. Each article ends with "All Rights Reserved". This is at the end of the line: grep Reserved$ finds them all. Only, csplit claims there is no match.
csplit filename /Reserved$/
yields
csplit: Reserved$: no match
which is a clear and obvious lie. If I leave out the $, it works; but I want to be sure that I don't get any stray occurrences of 'Reserved' in the middle of the text. I tried a different word with the beginning-of-line character ^, and that seems to work. Other words (which do occur at the end of a line in the data) also do not match when used (eg and$).
Is this a known bug with OS X?
[Update: I made sure it's not a DOS/Unix line end character issue by removing all carriage return characters]
I have downloaded the source code of csplit from http://www.opensource.apple.com/source/text_cmds/text_cmds-84/csplit/csplit.c and tested this in the debugger.
The pattern is compiled with
if (regcomp(&cre, re, REG_BASIC|REG_NOSUB) != 0)
errx(1, "%s: bad regular expression", re);
and the lines are matched with
/* Read and output lines until we get a match. */
first = 1;
while ((p = csplit_getline()) != NULL) {
if (fputs(p, ofp) == EOF)
break;
if (!first && regexec(&cre, p, 0, NULL, 0) == 0)
break;
first = 0;
}
The problem is now that the lines returned by csplit_getline() still have a trailing newline character \n. Therefore "Reserved" are not the last characters in the string and the pattern "Reserved$" does not match.
After a quick-and-dirty insertion of
p[strlen(p)-1] = 0;
to remove the trailing newline from the input string the "Reserved$" pattern worked as expected.
There seem to be more problems with csplit in Mac OS X, see the remarks to the answer of Looking for correct Regular Expression for csplit (the repetition count {*} does also not work).
Remark: You can match "Reserved" at the end of the line with the following trick:
csplit filename /Reserved<Ctrl-V><Ctrl-J>/
where you actually use the Control keys to enter a newline character on the command line.