I've used /dev/null a lot in bash programming to send unnecessary output into a black hole.
For example, this command:
$ echo 'foo bar' > /dev/null
$
Will not echo anything. I've read that /dev/null is an empty file used to dispose of unwanted output through redirection. But how exactly does this disposal take place? I can't imagine /dev/null writing the content to a file and then immediately deleting that file. So what actually happens when you redirect to this file?
>/dev/null redirects the command's standard output to the null device, a special device that discards everything written to it.
It's all implemented via file_operations (drivers/char/mem.c if you're curious to look yourself):
static const struct file_operations null_fops = {
    .llseek = null_lseek,
    .read = read_null,
    .write = write_null,
    .splice_write = splice_write_null,
};
write_null is what's called when you write to /dev/null. It always returns the same number of bytes that you write to it:
static ssize_t write_null(struct file *file, const char __user *buf,
                          size_t count, loff_t *ppos)
{
    return count;
}
That's it. The buffer is just ignored.
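From user space, the same behaviour can be observed with a short C sketch. The helper name write_to_null is made up for this illustration; it only assumes a Linux system where /dev/null exists and is writable.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes from data to /dev/null and
 * return what write(2) reports, or -1 if the device cannot be opened. */
ssize_t write_to_null(const char *data, size_t len)
{
    assert(data != NULL);

    int fd = open("/dev/null", O_WRONLY);
    if (fd == -1)
        return -1;

    /* The kernel's write_null() ignores the buffer and returns count,
     * so the caller sees every byte reported as written. */
    ssize_t written = write(fd, data, len);

    close(fd);
    return written;
}
```

Calling write_to_null("foo bar\n", 8) should report 8 bytes written, although nothing is stored anywhere.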
I have a text file and want to remove the first line (header), so that I can read the file without the header into a pipeline. This seems like a trivial question that has been answered many times, but due to the size of the files I'm facing, the solutions I have found so far do not work. For my test runs I used echo "$(tail -n +2 "$FILE_NAME")" > "$FILE_NAME", but running this on a bigger file results in the following error: bash: xrealloc: cannot allocate 18446744071562067968 bytes (1679360 bytes allocated) Is there any method that edits the file in place? Loading the files into memory won't work, since some of them are up to 400 GB in size.
Thanks for the help!
You can use code like this:
awk 'NR!=1 {print}' input_file >output_file
This sends everything except the first line to output_file. You can use the same construction to feed your operations:
awk 'NR!=1 {print}' input_file|operation1|operation2...
Changing your command this way does the job:
tail -n +2 "$FILE_NAME" > "${FILE_NAME}.new"
This needs up to twice the disk space while both files exist.
Tail is reasonably efficient for this operation.
The issue is with you wanting to overwrite the original file.
Using bash "$()" to defer the creation of the output file means bash has to hold the content in memory, hence the error message. For large files you would be better off writing the output to a temporary file, then use mv to move that over the original.
When sed is used in overwrite mode it does exactly this (for anything over a few lines).
sed -i 1d "$FILE_NAME"
It runs sed with the very simple script 1d, which picks the first line (the 1 selector) and deletes it (the d command). Thanks to the in-place option -i, your file is overwritten without you having to manage an intermediate file yourself.
Even though you do not have to bother with an intermediate file, sed uses its own intermediate file internally. Your disk usage can grow to up to twice the file size during this operation.
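The temporary-file-then-rename approach described above can be sketched in C as follows. This is only an illustration: the helper name drop_first_line and the ".new" suffix are assumptions, and the sketch does not preserve ownership or permissions of the original file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: remove the first line of `path` by streaming the
 * rest into "<path>.new" and renaming that over the original.
 * Returns 0 on success, -1 on failure. Memory use is one small buffer;
 * disk use is temporarily up to twice the file size. */
int drop_first_line(const char *path)
{
    assert(path != NULL);

    char tmp_path[4096];
    int n = snprintf(tmp_path, sizeof tmp_path, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;

    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(tmp_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* Skip everything up to and including the first newline. */
    int c;
    while ((c = getc(in)) != EOF && c != '\n')
        ;

    /* Copy the remainder in fixed-size chunks. */
    char buf[BUFSIZ];
    size_t got;
    int ok = 1;
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, got, out) != got) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;

    /* Replace the original only if the copy is complete. */
    if (!ok || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

Because the original is replaced only after the copy has been written completely, a failure part-way through leaves the input file untouched.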
I'm just going to address the "edit the file in place" portion of the question, although it appears that was not really what you were looking for. You will find many solutions describing features that claim to do in-place editing, but usually those solutions don't actually edit the file at all. Instead, they write to a temporary file and then overwrite the original with the temporary file. (eg, sed --in-place is a common solution which writes to a temporary file). Editing the file in place is something that you almost never actually want to do, since mutating a file is dangerous. Truly, if you believe you want to edit a file in place, give it serious thought and assume that you are wrong. However, if for some reason you really do need to do it, it's probably safest to just do it:
#include <err.h>
#include <stdio.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
FILE * xfopen(const char *path, const char *mode);
int is_regular(int, const char *);
int
main(int argc, char **argv)
{
const char *rpath = argc > 1 ? argv[1] : "stdin";
const char *wpath = argc > 1 ? argv[1] : "stdout";
FILE *fr = argc > 1 ? xfopen(rpath, "r") : stdin;
FILE *fw = argc > 1 ? xfopen(wpath, "r+") : stdout;
char buf[BUFSIZ];
int c;
size_t rc;
off_t length = 0;
/* Discard the first line */
while( (c = getc(fr)) != EOF && c != '\n' ) {
;
}
if( c != EOF) while( (rc = fread(buf, 1, BUFSIZ, fr)) > 0) {
size_t wc;
wc = fwrite(buf, 1, rc, fw);
length += wc;
if( wc!= rc) {
break;
}
}
if( fclose(fr) ) {
err(EXIT_FAILURE, "%s", rpath);
}
if( is_regular(fileno(fw), wpath) && ftruncate(fileno(fw), length)) {
err(EXIT_FAILURE, "%s", wpath);
}
if( fclose(fw)) {
err(EXIT_FAILURE, "%s", wpath);
}
return EXIT_SUCCESS;
}
FILE *
xfopen(const char *path, const char *mode)
{
FILE *fp = fopen(path, mode);
if( fp == NULL ) {
perror(path);
exit(EXIT_FAILURE);
}
return fp;
}
int
is_regular(int fd, const char *name)
{
struct stat s;
if( fstat(fd, &s) == -1 ) {
perror(name);
exit(EXIT_FAILURE);
}
return S_ISREG(s.st_mode); /* S_IFREG is a file-type value, not a single flag bit */
}
By being explicit, it's pretty clear that you can easily lose data in the file. But if you want to avoid reading the entire file into memory, or avoid having two copies on some backing media at the same time, there's no way to avoid doing that and any solution which obscures that risk is fooling you. So making it explicit and knowing where the dangers lie is the right thing to do.
We can use the -i (in-place) option with sed to write the change back to the input file instead of printing the result to stdout:
sed -i '1d' FILE
I have created a misc driver and written a sample read function like this:
static ssize_t test_read(struct file *file, char __user *buffer,
size_t count, loff_t *ppos)
{
pr_info("Count arg : %zu\n", count); /* count is a size_t, so use %zu */
return ret;
}
I now try to read the device using userspace code as shown below:
uint64_t read_buff;
fread(&read_buff, sizeof(read_buff), 1, fp);
The dmesg log I get is
[ 1593.273163] Count arg : 4096
I was expecting it to be the size of uint64_t. Could anybody point out why I get an unexpected value?
It seems that fread() buffers data in user space. I found the source code of one fread() implementation that buffers data (in __srefill()), so this behaviour is expected: the stdio layer asks the kernel for a whole buffer (4096 bytes here) and hands your program only the bytes it requested.
If you want to avoid such unexpected results, drop down one level and call read() directly from user space.
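A minimal sketch of the read() variant is shown below. The helper name read_u64_unbuffered is hypothetical, and the path is a parameter because the device node name is not given in the question.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one uint64_t from `path` with a single
 * unbuffered read(2). Returns the number of bytes read, or -1 on error. */
ssize_t read_u64_unbuffered(const char *path, uint64_t *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* No stdio layer in between: the size requested from the kernel is
     * exactly sizeof *out, i.e. 8 bytes. */
    ssize_t got = read(fd, out, sizeof *out);

    close(fd);
    return got;
}
```

With a call like this, the count argument seen by the driver's read handler should match sizeof(uint64_t) rather than the stdio buffer size.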
I'm trying to view the filename via kgdb, so I cannot call functions and macros to get it programmatically. I need to find it by manually inspecting data structures.
Like if I had a breakpoint here in gdb, how could I look around with gdb and find the filename?
I've tried looking around in filp.f_path, filp.f_inode, etc. I cannot see the filename anywhere.
ssize_t do_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
struct kiocb kiocb;
ssize_t ret;
init_sync_kiocb(&kiocb, filp);
kiocb.ki_pos = *ppos;
kiocb.ki_left = len;
kiocb.ki_nbytes = len;
ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
if (-EIOCBQUEUED == ret)
ret = wait_on_sync_kiocb(&kiocb);
*ppos = kiocb.ki_pos;
return ret;
}
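You can get the filename from struct file *filp with filp->f_path.dentry->d_iname. Note that d_iname only holds the name when it is short enough to be stored inline in the dentry; filp->f_path.dentry->d_name.name is valid for names of any length.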
To get the full path call dentry_path_raw(filp->f_path.dentry,buf,buflen).
In the Linux kernel, the file structure is essentially how the kernel "sees" the file. The kernel is not interested in the file name, just the inode of the open file. This means that all of the other information which is important to the user is lost.
EDIT: This answer is wrong. You can get the dentry using filp->f_path.dentry. From there you can get the name of the dentry or the full path using the relevant FS flags.
The path is stored in the file->f_path structure, as its name implies. It is just not stored in plain-text form, but parsed into objects that are more useful for kernel operation, namely a chain of dentry structures and the vfsmount structure pointing to the root of the current subtree.
You can use the d_path function to regenerate a human-readable path name for a struct path like file->f_path. Note, however, that this is not a cheap operation and it may slow down your workload significantly.
The issues mentioned above about open but unlinked files, multiple hard links and similar are valid for mapping from an inode to a pathname, but an open file always has a path associated with it. If the file has been unlinked, d_path will append " (deleted)" to the name, and if the name the file was opened with has since been changed using rename, d_path will not print the original name, but the current name of the entry that was used for opening it.
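The " (deleted)" behaviour can be observed from user space without a debugger. The sketch below assumes a Linux system with /proc mounted and relies on the /proc/self/fd/N symbolic links, which, as far as I know, report the path of the open file in the same form; the helper name fd_path is made up for this illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve the current path of an open descriptor
 * through its /proc/self/fd symlink. Returns the path length, or -1. */
ssize_t fd_path(int fd, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    char link[64];
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);

    ssize_t len = readlink(link, buf, buflen - 1);
    if (len == -1)
        return -1;
    buf[len] = '\0'; /* readlink() does not terminate the string */
    return len;
}
```

Opening a file, unlinking it, and then calling fd_path on the still-open descriptor should yield the old path followed by " (deleted)".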
filp->f_path.dentry->d_name.name
This worked for me
I am a newbie to driver programming and have started writing a simple char driver. I created a special file for my char driver with mknod /dev/simple-driver c 250 0. When I type cat /dev/simple-driver, it shows the string "Hello world from Kernel mode!". I know that the function
static const char g_s_Hello_World_string[] = "Hello world tamil_vanan!\n\0";
static const ssize_t g_s_Hello_World_size = sizeof(g_s_Hello_World_string);
static ssize_t device_file_read(
struct file *file_ptr
, char __user *user_buffer
, size_t count
, loff_t *possition)
{
printk( KERN_NOTICE "Simple-driver: Device file is read at offset = %i, read bytes count = %u", (int)*possition , (unsigned int)count );
if( *possition >= g_s_Hello_World_size )
return 0;
if( *possition + count > g_s_Hello_World_size )
count = g_s_Hello_World_size - *possition;
if( copy_to_user(user_buffer, g_s_Hello_World_string + *possition, count) != 0 )
return -EFAULT;
*possition += count;
return count;
}
gets called. It is mapped to (*read) in the file_operations structure of my driver. My question is how this function gets called and how parameters like struct file, the char buffer, count and offset are passed, because I simply typed the cat command. Please elaborate on how this happens.
In Linux, almost everything is represented as a file. Whether a file is a device file or a regular file is recorded in its inode: a device node created with mknod carries a type (character or block) and a major and minor number.
For example, in your case, cat /dev/simple-driver opens that device node.
From the device node simple-driver, the kernel retrieves the major and minor numbers.
From those numbers (the major number selects the driver, the minor number identifies the individual device) it finds your character driver.
From the driver it uses the struct file_operations structure to find the read function, which is nothing but your read function:
static ssize_t device_file_read(struct file *file_ptr, char __user *user_buffer, size_t count, loff_t *possition)
user_buffer points to the buffer supplied by the caller, and count is its size in bytes, so never copy more than count bytes. It is better to keep a check on this.
The string is copied to user_buffer with copy_to_user, which checks that the user-space address is valid before copying.
The position is 0 for the first read and advances by the number of bytes copied: position += count.
Once the read function returns, cat writes the buffer contents to stdout, which is nothing but your console.
cat calls the POSIX read() wrapper from glibc. glibc puts the arguments in registers or on the stack (this depends on your hardware architecture) and switches to kernel mode with a system call. In the kernel, the file descriptor is looked up to find the matching struct file, and in the end your read function is called with that struct file, the user buffer, the count and a pointer to the file position.
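What cat does on the user-space side can be sketched as a plain read()/write() loop. This is a simplified illustration, not the actual coreutils source; the helper name copy_file_to_fd and the 4096-byte buffer size are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, stripped-down cat: copy `path` to out_fd and return the
 * number of bytes copied, or -1 on error. */
ssize_t copy_file_to_fd(const char *path, int out_fd)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    char buf[4096];
    ssize_t total = 0;
    ssize_t got;

    /* Each read() enters the kernel; for a character device the driver's
     * read handler receives this buffer and count. A return value of 0
     * means end of file and ends the loop. */
    while ((got = read(fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < got) {
            ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
            if (put == -1) {
                close(fd);
                return -1;
            }
            off += put;
        }
        total += got;
    }

    close(fd);
    return got == -1 ? -1 : total;
}
```

This is why the driver above returns 0 once the position has reached the end of the string: without that, a loop like this one would never terminate.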
I have a couple structure definitions in my input code. For example:
struct node {
int val;
struct node *next;
};
or
typedef struct {
int numer;
int denom;
} Rational;
I used the following line to convert them into one line and copy it twice.
sed '/struct[^(){]*{/{:l N;s/\n//;/}[^}]*;/!t l;s/ */ /g;p;p}'
the result is this:
struct node { int val; struct node *next;};
struct node { int val; struct node *next;};
struct node { int val; struct node *next;};
typedef struct { int numer; int denom;} Rational;
typedef struct { int numer; int denom;} Rational;
typedef struct { int numer; int denom;} Rational;
This is what I want:
1. I would like the first line to be restored to the original structure block.
2. I would like the second line to turn into a function heading that looks like this:
void init_structName( structName *var, int data1, int data2 )
- structName is the name of the structure.
- var is any name you like.
- data1, data2, ... are the members of the struct.
3. I would like the third line to turn into the function body, where I initialize the data parameters. It would look like this:
{
var->data1 = data1;
var->data2 = data2;
}
Keep in mind that ALL my struct definitions in the input file are placed on one line and copied three times. So when the code finds a structure definition, it can assume that there will be two more copies below.
For example, this is the output I want if the input file had the repeating lines shown above.
struct node {
int val;
struct node *next;
};
void init_node(struct node *var, int val, struct node *next)
{
var->val = val;
var->next = next;
}
typedef struct {
int numer;
int denom;
} Rational;
void init_Rational( Rational *var, int numer, int denom )
{
var->numer = numer;
var->denom = denom;
}
In case someone is curious: these functions will be called from the main function to initialize the struct variables.
Can someone help? I realize this is kind of tough.
Thanks so much!!
Seeing that sed is Turing Complete, it is possible to do it in a single go, but that doesn't mean that the code is very user friendly =)
My attempt at a solution would be:
#!/bin/sed -nf
/struct/b continue
p
d
: continue
# 1st step:
s/\(struct\s.*{\)\([^}]*\)\(}.*\)/\1\
\2\
\3/
s/;\(\s*[^\n}]\)/;\
\1/g
p
s/.*//
n
# 2nd step:
s/struct\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*{\([^}]*\)}.*/void init_\1(struct \1 *var, \2)/
s/typedef\s*struct\s*{\([^}]*\)}\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*;/void init_\2(\2 *var, \1)/
s/;/,/g
s/,\s*)/)/
p
s/.*//
n
# 3rd step
s/.*{\s*\([^}]*\)}.*/{\
\1}/
s/[A-Za-z \t]*[\* \t]\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*;/\tvar->\1 = \1;\
/g
p
I'll try to explain everything I did, but firstly I must warn that this probably isn't very generalized. For example, it assumes that the three identical lines follow each other (ie. no other line between them).
Before starting, notice that the file is a script that requires the "-n" flag to run. This tells sed not to print anything to standard output unless the script explicitly tells it to (through the "p" command, for example). The "-f" option is a "trick" to tell sed to open the file that follows. When the script is executed as "./myscript.sed", the system runs "/bin/sed -nf ./myscript.sed", so sed correctly reads the rest of the script.
Step zero would be just a check to see if we have a valid line. I'm assuming every valid line contains the word struct. If the line is valid, the script branches (jumps, the "b" command is equivalent to the goto statement in C) to the continue label (differently from C, labels start with ":", rather than ending with it). If it isn't valid, we force it to be printed with the "p" command, and then delete the line from pattern space with the "d" command. By deleting the line, sed will read the next line and start executing the script from the beginning.
If the line is valid, the actions to change the lines start. The first step is to generate the struct body. This is done by a series of commands.
Separate the line into three parts, everything up to the opening bracket, everything up to the closing bracket (but without including it), and everything from the closing bracket (now including it). I should mention that one of the quirks of sed is that we search for newlines with "\n", but write newlines with a "\" followed by an actual newline. That's why this command is split into three different lines. IIRC this behaviour is specific to POSIX sed, but probably the GNU version (present in most Linux distributions) allows writing a newline with "\n".
Add a newline after every semicolon. The way this works is a bit awkward: we copy everything that follows the semicolon to after a newline inserted behind the semicolon. The g flag tells sed to do this repeatedly, and that is why it works. Also note again the newline escaping.
Force the result to be printed
Before the second step, we manually clear the lines from the pattern-space (ie. buffer), so we can start fresh for the next line. If we did this with the "d" command, sed would start reading the commands from the start of the file again. The "n" command then reads the next line into the pattern-space. After that, we start the commands to transform the line into a function declaration:
We first match the word struct, followed by zero or more white space, then followed by a C identifier that can start with underscore or alphabetic letters, and can contain underscores and alphanumeric characters. The identifier is captured into the "variable" "\1". We then match the content between brackets, which is stored into "\2". These are then used to generate the function declaration.
We then do the same process, but now for the "typedef" case. Notice that now the identifier is after the brackets, so "\1" now contains the contents inside the brackets and "\2" contains the identifier.
Now we replace all semicolons with commas, so it can start looking more like a function definition.
The last substitute command removes the extra comma before the closing parenthesis.
Finally print the result.
Again, before the last step, manually clean the pattern-space and read the next line. The step will then generate the function body:
Match and capture everything inside the brackets. Notice the ".*" before the opening bracket and after the closing bracket. This is used so that only the contents of the brackets are written afterwards. When writing the output, we place the opening bracket on a separate line.
We match alphabetic characters and spaces, so we can skip the type declaration. We require at least a white space character or an asterisk (for pointers) to mark the start of the identifier. We then proceed to capture the identifier. This only works because of what follows the capture: we explicitly require that after the identifier there are only optional white spaces followed by a semicolon. This forces the expression to get the identifier characters before the semicolon, ie. if there are more than two words, it will only get the last word. Therefore it would work with "unsigned int var", capturing "var" correctly. When writing the output, we place some indentation, followed by the desired format, including the escaped newline.
Print the final output.
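The "take the last identifier before the semicolon" idea from the third step can also be written out explicitly. The following C sketch is only an illustration of that rule, not a replacement for the sed script; the helper name member_name is made up, and like the script it does not handle arrays or function pointers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the member name of a C declaration such as
 * "struct node *next;" into out. Like the sed expression, it takes the
 * last identifier. Returns 0 on success, -1 if none is found or fits. */
int member_name(const char *decl, char *out, size_t outsz)
{
    assert(decl != NULL && out != NULL);

    size_t end = strlen(decl);

    /* Skip trailing white space and an optional semicolon. */
    while (end > 0 && (isspace((unsigned char)decl[end - 1]) ||
                       decl[end - 1] == ';'))
        end--;

    /* Walk back over identifier characters. */
    size_t start = end;
    while (start > 0 && (isalnum((unsigned char)decl[start - 1]) ||
                         decl[start - 1] == '_'))
        start--;

    size_t len = end - start;
    if (len == 0 || len >= outsz)
        return -1;

    memcpy(out, decl + start, len);
    out[len] = '\0';
    return 0;
}
```

For "unsigned int var ;" this picks "var", and for "struct node *next;" it picks "next", which is the name that ends up on both sides of the var-> assignment.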
I don't know if I was clear enough. Feel free to ask for any clarifications.
Hope this helps =)
This should give you a few tips on how inappropriate sed actually is for this sort of task. I couldn't figure out how to do it in one pass and by the time I finished writing the scripts, I noticed you were expecting somewhat different results.
Your problem is better suited for a scripting language and a parsing library. Consider python + pyparsing (here is an example C struct parsing grammar, but you would need something much simpler than that) or perl6's rules.
Still, perhaps this will be of some use if you decide to stick to sed:
pass-one.sh
#!/bin/sed -nf
/^struct/ {
s|^\(struct[^(){]*{\)|\1\n|
s|[^}];|;\n|gp
a \\n
}
/^typedef/ {
h
# create signature
s|.*{\(.*\)} \(.*\);|void init_\2( \2 *var, \1 ) {|
# insert argument list to signature and remove trailing ;
s|\([^;]*\); ) {|\1 ) {|g
s|;|,|g
p
g
# add constructor (further substitutions follow in pass-two)
s|.*{\(.*\)}.*|\1|
s|;|;\n|g
s|\n$||p
a }
a \\n
}
pass-two.sh
#!/bin/sed -f
# fix struct indent
/^struct/ {
:loop1
n
s|^ | |
t loop1
}
# unsigned int name -> var->name = name
/^void init_/{
:loop2
n
s|.* \(.*\);| var->\1 = \1;|
t loop2
}
Usage
$ cat << EOF | ./pass-one.sh | ./pass-two.sh
struct node { int val; struct node *next;};
typedef struct { int numer; int denom;} Rational;
struct node { int val; struct node *next;};
typedef struct { int numer; unsigned int denom;} Rational;
EOF
struct node {
int va;
struct node *nex;
};
void init_Rational( Rational *var, int numer, int denom ) {
var->numer = numer;
var->denom = denom;
}
struct node {
int va;
struct node *nex;
};
void init_Rational( Rational *var, int numer, unsigned int denom ) {
var->numer = numer;
var->denom = denom;
}