sed: replace text based on line number dynamically - shell

I am looking for a bash script that converts comments from // to /* */.
I got it partially working:
sed -i '14s/\/\//\/*/' a.c
This replaces // with /*. How do I add */ at the end?
Original file
#include <stdio.h>
char buffer[10] = {'0'}; // comment1
int main()
{
printf("Hello World"); // Comment2
return 0;
}
Expected file
#include <stdio.h>
char buffer[10] = {'0'}; /* comment1 */
int main()
{
printf("Hello World"); /* Comment2 */
return 0;
}

Simplest solution
Assuming the idiosyncratic spacing in the desired output shown in the question is unintentional:
sed 's%// *\(.*\)%/* \1 */%'
The keys here are:
Using % instead of / to mark the separate parts of the s/// (or s%%%) command.
Capturing the text of the comment in \(…\).
Replacing it with \1 (preceded by /* and followed by */, each separated from it by a single space).
Working on a direct copy of the data from the question, the output is:
#include <stdio.h>
char buffer[10] = {'0'}; /* comment1 */
int main()
{
printf("Hello World"); /* Comment2 */
return 0;
}
Improving the handling of spaces
There are trailing blanks after the comments — ugly! We can fix that with care:
sed 's%//[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$%/* \1 */%'
That matches zero or more spaces after the // opening the comment, and matches up to the last non-space before an optional string of spaces at the end of the line. That generates:
#include <stdio.h>
char buffer[10] = {'0'}; /* comment1 */
int main()
{
printf("Hello World"); /* Comment2 */
return 0;
}
And you can deal with all trailing white space first, which is probably a good idea anyway, using:
sed -e 's/[[:space:]]\{1,\}$//' -e 's%//[[:space:]]*\(.*\)%/* \1 */%'
which yields:
#include <stdio.h>
char buffer[10] = {'0'}; /* comment1 */
int main()
{
printf("Hello World"); /* Comment2 */
return 0;
}
That differs from the previous output by not having a space after main().
Proper comment handling is hard!
Note that this simple code can easily be confused by valid C, such as:
printf("// this is not a comment\n");
To understand C fully enough not to make that mistake is beyond sensible sed. Less seriously, it will miss some valid but implausible character sequences that are officially comments, such as:
/\
/this is a comment\
and this is also part of the comment\
even with extra spaces
and if you allow trigraphs (don't), then:
/??/
/??/
This is part of the comment started two lines before!
This sort of stuff shouldn't afflict any actual code base, but it is the sort of garbage that compiler writers have to handle correctly.

Related

How would I write a program that reads from a standard input and outputs only 6 characters to a line?

For example if the input was:
My name is Alex and
I also love coding
The correct output should be:
1:My nam
1:e is A
1:lex an
1:d
2:I also
2: love
2:coding
So far I have this
int main () {
string i;
i.substr(0,6);
while (getline(cin, i)) {
cout << i << endl;
}
}
Using ranges, what you ask is almost as easy as
auto result = view | split('\n') | transform(chunk(6));
where view somehow represents the input, | split('\n') splits that input into lines, and | transform(chunk(6)) transforms each line by splitting it into chunks of 6 chars. The result is therefore a "range of ranges of chunks", which you can loop over with two nested for loops.
Here's a full example:
#include <iostream>
#include <sstream>
#include <string>
#include <fstream>
#include <range/v3/range/conversion.hpp>
#include <range/v3/view/chunk.hpp>
#include <range/v3/view/istream.hpp>
#include <range/v3/view/split.hpp>
#include <range/v3/view/transform.hpp>
// Comment/uncomment the line below
//#define FROM_FILE
using namespace ranges;
using namespace ranges::views;
int main() {
// prepare a path-to-file or string buffer
#ifdef FROM_FILE
std::string path_to_file{"/path/to/file"};
#else
std::basic_stringbuf<char> strbuf{"My name is Alex and\nI also love coding"};
#endif
// generate an input stream from the file or the string buffer
#ifdef FROM_FILE
std::ifstream is(path_to_file);
#else
std::istream is(&strbuf);
#endif
// prevent the stream from skipping whitespaces
is >> std::noskipws;
// generate a range view on the stream
ranges::istream_view<char> view(is);
// manipulate the view
auto out_lines = view | split('\n') // split at line breaks
| transform(chunk(6)); // split each in chunks of 6
// output
int index{};
for (auto line : out_lines) {
++index;
for (auto chunk_of_6 : line) {
std::cout << index << ':'
<< (chunk_of_6 | to<std::string>)
<< std::endl;
}
}
}
First I suggest that you give your variables meaningful names. i isn't good for a variable you use to read lines from std::cin. I've changed that name to line in my example below.
You are on the right track with i.substr(0,6); but you've placed it outside of the loop where i is empty - and you don't print it.
You are also supposed to prepend each line with the line number but that part is completely missing.
You have also missed that you should print the next 6 characters of the read line on the next line until you've printed everything that you read.
Here's an example of how that could be fixed:
#include <iostream>
#include <string>
int main() {
unsigned max_len = 6;
std::string line;
for(unsigned line_number = 1; std::getline(std::cin, line); ++line_number) {
// loop until the read line is empty:
while(!line.empty()) {
// print max `max_len` characters and prepend it with the line number:
std::cout << line_number << ':' << line.substr(0, max_len) << '\n';
// if the line was longer than `max_len` chars, remove the first
// `max_len` chars:
if(line.size() > max_len) {
line = line.substr(max_len);
} else { // otherwise, make it empty
line.clear();
}
}
}
}
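A possible variation on the loop above, shown only as a sketch: instead of repeatedly cutting the front off the line, walk through it with an index and let substr clamp the last piece. The function name chunk_line and its return type are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` into pieces of at most `max_len`
// characters and prefixes each piece with "<line_number>:".
std::vector<std::string> chunk_line(const std::string& line,
                                    unsigned line_number,
                                    std::size_t max_len = 6) {
    std::vector<std::string> pieces;
    if (max_len == 0) return pieces;  // avoid an endless loop
    for (std::size_t pos = 0; pos < line.size(); pos += max_len) {
        // substr clamps the length at the end of the string, so the last
        // piece may be shorter than max_len
        pieces.push_back(std::to_string(line_number) + ':' +
                         line.substr(pos, max_len));
    }
    return pieces;
}
```

Calling it once per line read with std::getline and printing each returned piece on its own line gives the same output format as the loop above.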

Unable to find cause of 'syntax error' in Bison code

I'm trying to connect simple flex and bison code that would just recognize a character for now. Yet I'm facing this error. I've read through a lot of answers to figure out what is wrong but am lost. Any help would be highly appreciated as I'm just starting out to explore this and could not find a lot of resources for it.
This is my .l file
%{
#include <stdlib.h>
#include <stdio.h>
#include "MiniJSC.tab.h"
void yyerror (char *s);
int yylex();
%}
%%
[0-9]+ { yylval.num = atoi(yytext); return T_INT_VAL; }
%%
int yywrap (void) {return 1;}
my .y file
%{
void yyerror (char *s);
int yylex();
#include <stdio.h> /* C declarations used in actions */
#include <stdlib.h>
%}
%union {int num; char id;} /* Yacc definitions */
%start line
%token print
%token T_INT_VAL
%type <num> line
%type <num> term
%type <num> T_INT_VAL
%%
/* descriptions of expected inputs corresponding actions (in C) */
line : print term ';' {printf("Printing %d\n", $2);}
;
term : T_INT_VAL {$$ = $1;}
;
%% /* C code */
void yyerror (char *s) {
fprintf (stderr, "%s\n", s);
}
int main (void) {
return yyparse ( );
}
The compilation and output:
$ bison MiniJSC.y -d
$ lex MiniJSC.l
$ gcc lex.yy.c MiniJSC.tab.c
$ ./a.out
10
syntax error
$
line : print term ';'
According to this, a valid line consists of a print token, followed by a term, followed by a ';'. Since a term must be a T_INT_VAL token, that means a valid line is a print token followed by a T_INT_VAL token and a semicolon.
Your input consists only of a T_INT_VAL token, so it is not a valid line and that's why you get a syntax error.
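Also note that your lexer never produces a print token, so even if you entered print 10; as the input, it'd be an error since the lexer isn't going to recognize print as a token. The same goes for ';': without a matching rule, flex's default action just echoes the character instead of returning it. So you should add patterns for those as well.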
You should also rename print to match your naming convention for tokens (i.e. ALL_CAPS).
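To make the mismatch between the grammar and the input concrete, here is a hand-written stand-in in C++. It is not flex or bison code: the Token enum and both functions are hypothetical, and the tokenizer simply assumes rules for print and ';' exist, which the original lexer lacks.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Token { Print, IntVal, Semicolon, Unknown };

// Hypothetical tokenizer standing in for a lexer that has rules for
// "print", integer literals and ';' (whitespace is skipped).
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isdigit(c)) {
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            tokens.push_back(Token::IntVal);
        } else if (input.compare(i, 5, "print") == 0) {
            i += 5;
            tokens.push_back(Token::Print);
        } else if (c == ';') {
            ++i;
            tokens.push_back(Token::Semicolon);
        } else {
            ++i;
            tokens.push_back(Token::Unknown);
        }
    }
    return tokens;
}

// Mirrors the rule  line : print term ';'  with  term : T_INT_VAL.
bool is_valid_line(const std::vector<Token>& tokens) {
    return tokens.size() == 3 && tokens[0] == Token::Print &&
           tokens[1] == Token::IntVal && tokens[2] == Token::Semicolon;
}
```

With this sketch, the input 10 yields a single IntVal token and is rejected, while print 10; yields exactly the three tokens the rule expects.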

Accepting and printing a string

Can we accept and print a string like this in c++?
This code is not working properly.
#include<iostream>
#include<string>
using namespace std;
main()
{
string a;char ch;
for(int i=0;i<5;i++)
{cin>>ch;
a[i]=ch;
}
a[5]='\0';
cout<<a;
}
I am able to print individual elements like a[1], a[2], etc., but I am unable to print the entire string. Why?
If you want to take a string, you could do the following.
#include <iostream>
#include <string>
int main() {
std::string str;
std::getline(std::cin, str);
std::cout << str;
}
Also, C++ automatically null terminates any string literal you use.
Well it's not really anywhere near best-practices but to fix your immediate issue you need to actually resize the string.
#include<iostream>
#include<string>
int main()
{
std::string a;char ch;
a.resize(5); // <--- gives the string 5 characters that can then be overwritten
for(int i=0;i<5;i++)
{
std::cin>>ch;
a[i]=ch;
}
a[5]='\0'; //<-- unnecessary
std::cout<<a;
}
Alternatively, you can append the characters:
#include<iostream>
#include<string>
int main()
{
std::string a;char ch;
for(int i=0;i<5;i++)
{
std::cin>>ch;
a+=ch;
}
std::cout<<a;
}
The real problem here is not that you can't read or can't print the string; it is that you are writing to unallocated memory. operator[], which is what you are using when you do something like a[i]=ch, does not do any kind of bounds checking, so you are causing undefined behavior. On my machine, nothing is printed, for instance.
In short, you need to make sure that you have space to write your characters. If you are certain that you are going to read 5 characters (and adding a \0 at the end, making it 6 in length), you could do something like this:
std::string a(6, '\0');
If you are uncertain of how many characters you are going to read, std::string is ready to allocate space as needed, but you need to use its push_back member function to give it a chance to do so. Your loop contents would be something like:
cin >> ch;
a.push_back(ch);
If you are uncertain where the std::string object is coming from (as in, this is library code that accepts a std::string as an argument), you could use at(i) (e.g., a.at(i) = ch instead of a[i] = ch), which throws an exception if the index is out of range.
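The difference between operator[] and at() can be shown with a small sketch. The wrapper name try_set_char is hypothetical; the only library behavior relied on is that std::string::at throws std::out_of_range when the index is not less than size().

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: stores `ch` at index `i` and reports whether the
// index was valid, instead of silently invoking undefined behavior.
bool try_set_char(std::string& s, std::size_t i, char ch) {
    try {
        s.at(i) = ch;  // at() checks the index and throws std::out_of_range
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

On an empty string every index is rejected; after a.resize(5), indices 0 through 4 are accepted and index 5 is rejected.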
You can print the string like this
#include<iostream>
#include<string>
using namespace std;
int main()
{
string a;char ch;
for(int i=0;i<5;i++)
{
cin>>ch;
a.push_back(ch);
}
// no explicit '\0' is needed; std::string keeps track of its own length
cout << a;
return 0;
}

How to open a file containing ASCII numbers?

I have an assignment that tasks me with reading from a file that contains a series of numbers in ASCII decimal format and converting them to integers. I've made a function that does this, but I don't know what the numbers in the file are. How do I open a file that contains this type of number? Whenever I open it in a text editor or some other program, I end up with a series of integer numbers. Is this what it should look like?
Thank you in advance
Assuming you have a text file containing a series of numbers in ASCII decimal format, one number per line, you can easily accomplish your task using a C program like this one:
#include <stdlib.h>
#include <stdio.h>
#define MAX_LINE_LEN (32)
int main ( int argc, char * argv[] )
{
FILE * pf;
char line[ MAX_LINE_LEN ];
/* open text file for reading */
pf = fopen( "integers.txt", "r" );
if( !pf )
{
printf("error opening input file.\n");
return 1;
}
/* loop through the lines of the file */
while( fgets( line, MAX_LINE_LEN, pf ) )
{
/* convert ASCII to integer */
int n = atoi( line );
/* display integer */
printf("%d\n", n );
}
/* close text file */
fclose( pf );
return 0;
}
/* eof */
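A file in ASCII decimal format does look like ordinary numbers in a text editor, because each digit is stored as its character code ('0' to '9' are the byte values 48 to 57). What atoi does with those characters can be sketched by hand as follows. The function name is hypothetical, and the sketch assumes the value fits in an int.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts a run of ASCII decimal digits, optionally
// preceded by '+' or '-', to an int. Conversion stops at the first
// non-digit (for example the newline kept by fgets).
// Assumption: the value fits in an int; overflow is not checked.
int ascii_decimal_to_int(const std::string& text) {
    std::size_t i = 0;
    bool negative = false;
    if (i < text.size() && (text[i] == '-' || text[i] == '+')) {
        negative = (text[i] == '-');
        ++i;
    }
    int value = 0;
    for (; i < text.size() && text[i] >= '0' && text[i] <= '9'; ++i) {
        // subtracting '0' (48 in ASCII) turns a digit character into 0..9
        value = value * 10 + (text[i] - '0');
    }
    return negative ? -value : value;
}
```

For real input, a library routine such as strtol is usually preferable because it can report overflow and invalid text.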

Using sed to transform a C struct and typedef

I have a couple structure definitions in my input code. For example:
struct node {
int val;
struct node *next;
};
or
typedef struct {
int numer;
int denom;
} Rational;
I used the following line to convert them into one line and copy it twice.
sed '/struct[^(){]*{/{:l N;s/\n//;/}[^}]*;/!t l;s/ */ /g;p;p}'
the result is this:
struct node { int val; struct node *next;};
struct node { int val; struct node *next;};
struct node { int val; struct node *next;};
typedef struct { int numer; int denom;} Rational;
typedef struct { int numer; int denom;} Rational;
typedef struct { int numer; int denom;} Rational;
This is what I want:
1. I would like the first line to be restored to the original structure block.
2. I would like the second line to turn into a function heading that looks like this:
void init_structName( structName *var, int data1, int data2 )
-structName is basically the name of the structure.
-var is any name you like.
-data1, data2.... are values that are in the struct.
3. I would like the third line to turn into the function body, where I initialize the data parameters. It would look like this:
{
var->data1 = data1;
var->data2 = data2;
}
Keep in mind that ALL my struct definitions in the input file are placed on one line and copied three times. So when the code finds a structure definition, it can assume that there will be two more copies below.
For example, this is the output I want if the input file had the repeating lines shown above.
struct node {
int val;
struct node *next;
};
void init_node(struct node *var, int val, struct node *next)
{
var->val = val;
var->next = next;
}
typedef struct {
int numer;
int denom;
} Rational;
void init_Rational( Rational *var, int numer, int denom )
{
var->numer = numer;
var->denom = denom;
}
In case someone was curious. These functions will be called from the main function to initialize the struct variables.
Can someone help? I realize this is kind of tough.
Thanks so much!!
Seeing that sed is Turing Complete, it is possible to do it in a single go, but that doesn't mean that the code is very user friendly =)
My attempt at a solution would be:
#!/bin/sed -nf
/struct/b continue
p
d
: continue
# 1st step:
s/\(struct\s.*{\)\([^}]*\)\(}.*\)/\1\
\2\
\3/
s/;\(\s*[^\n}]\)/;\
\1/g
p
s/.*//
n
# 2nd step:
s/struct\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*{\([^}]*\)}.*/void init_\1(struct \1 *var, \2)/
s/typedef\s*struct\s*{\([^}]*\)}\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*;/void init_\2(\2 *var, \1)/
s/;/,/g
s/,\s*)/)/
p
s/.*//
n
# 3rd step
s/.*{\s*\([^}]*\)}.*/{\
\1}/
s/[A-Za-z \t]*[\* \t]\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*;/\tvar->\1 = \1;\
/g
p
I'll try to explain everything I did, but firstly I must warn that this probably isn't very generalized. For example, it assumes that the three identical lines follow each other (ie. no other line between them).
Before starting, notice that the file is a script that requires the "-n" flag to run. This tells sed not to print anything to standard output unless the script explicitly tells it to (through the "p" command, for example). The "-f" option is a "trick" to tell sed to read its commands from the file that follows. When executing the script with "./myscript.sed", the system will execute "/bin/sed -nf ./myscript.sed", so it will correctly read the rest of the script.
Step zero would be just a check to see if we have a valid line. I'm assuming every valid line contains the word struct. If the line is valid, the script branches (jumps, the "b" command is equivalent to the goto statement in C) to the continue label (differently from C, labels start with ":", rather than ending with it). If it isn't valid, we force it to be printed with the "p" command, and then delete the line from pattern space with the "d" command. By deleting the line, sed will read the next line and start executing the script from the beginning.
If the line is valid, the actions to change the lines start. The first step is to generate the struct body. This is done by a series of commands.
1. Separate the line into three parts: everything up to the opening bracket, everything up to the closing bracket (but without including it), and everything from the closing bracket (now including it). I should mention that one of the quirks of sed is that we search for newlines with "\n", but write newlines with a "\" followed by an actual newline. That's why this command is split into three different lines. IIRC this behaviour is specific to POSIX sed, but probably the GNU version (present in most Linux distributions) allows writing a newline with "\n".
2. Add a newline after every semicolon. The way this works is a bit awkward: we match the semicolon together with the text that follows it, and write that text back after an inserted newline. The g flag tells sed to do this repeatedly, and that is why it works. Also note again the newline escaping.
3. Force the result to be printed.
Before the second step, we manually clear the lines from the pattern-space (ie. buffer), so we can start fresh for the next line. If we did this with the "d" command, sed would start reading the commands from the start of the file again. The "n" command then reads the next line into the pattern-space. After that, we start the commands to transform the line into a function declaration:
1. We first match the word struct, followed by zero or more white space, then followed by a C identifier that can start with underscore or alphabetic letters, and can contain underscores and alphanumeric characters. The identifier is captured into the "variable" "\1". We then match the content between brackets, which is stored into "\2". These are then used to generate the function declaration.
2. We then do the same process, but now for the "typedef" case. Notice that now the identifier is after the brackets, so "\1" now contains the contents inside the brackets and "\2" contains the identifier.
3. Now we replace all semicolons with commas, so it can start looking more like a function definition.
4. The last substitute command removes the extra comma before the closing parenthesis.
5. Finally print the result.
Again, before the last step, manually clean the pattern-space and read the next line. The step will then generate the function body:
1. Match and capture everything inside the brackets. Notice the ".*" before the opening bracket and after the closing bracket. This is used so only the contents of the brackets are written afterwards. When writing the output, we place the opening bracket on a separate line.
2. We match alphabetic characters and spaces, so we can skip the type declaration. We require at least a white space character or an asterisk (for pointers) to mark the start of the identifier. We then proceed to capture the identifier. This only works because of what follows the capture: we explicitly require that after the identifier there are only optional white spaces followed by a semicolon. This forces the expression to get the identifier characters before the semicolon, i.e. if there are two or more words, it will only get the last word. Therefore it would work with "unsigned int var", capturing "var" correctly. When writing the output, we place some indentation, followed by the desired format, including the escaped newline.
3. Print the final output.
I don't know if I was clear enough. Feel free to ask for any clarifications.
Hope this helps =)
This should give you a few tips on how inappropriate sed actually is for this sort of task. I couldn't figure out how to do it in one pass and by the time I finished writing the scripts, I noticed you were expecting somewhat different results.
Your problem is better suited for a scripting language and a parsing library. Consider python + pyparsing (here is an example C struct parsing grammar, but you would need something much simpler than that) or perl6's rules.
Still, perhaps this will be of some use if you decide to stick to sed:
pass-one.sh
#!/bin/sed -nf
/^struct/ {
s|^\(struct[^(){]*{\)|\1\n|
s|[^}];|;\n|gp
a \\n
}
/^typedef/ {
h
# create signature
s|.*{\(.*\)} \(.*\);|void init_\2( \2 *var, \1 ) {|
# insert argument list to signature and remove trailing ;
s|\([^;]*\); ) {|\1 ) {|g
s|;|,|g
p
g
# add constructor (further substitutions follow in pass-two)
s|.*{\(.*\)}.*|\1|
s|;|;\n|g
s|\n$||p
a }
a \\n
}
pass-two.sh
#!/bin/sed -f
# fix struct indent
/^struct/ {
:loop1
n
s|^ | |
t loop1
}
# unsigned int name -> var->name = name
/^void init_/{
:loop2
n
s|.* \(.*\);| var->\1 = \1;|
t loop2
}
Usage
$ cat << EOF | ./pass-one.sh | ./pass-two.sh
struct node { int val; struct node *next;};
typedef struct { int numer; int denom;} Rational;
struct node { int val; struct node *next;};
typedef struct { int numer; unsigned int denom;} Rational;
EOF
struct node {
int va;
struct node *nex;
};
void init_Rational( Rational *var, int numer, int denom ) {
var->numer = numer;
var->denom = denom;
}
struct node {
int va;
struct node *nex;
};
void init_Rational( Rational *var, int numer, unsigned int denom ) {
var->numer = numer;
var->denom = denom;
}

Resources