How to extract (read and delete) a line from file with a single command? - bash

I would like to extract the first line from a file, read into a variable and delete right afterwards, with a single command. I know sed can read the first line as follows:
sed '1q' file.txt
or delete it as follows:
sed '1q;d' file.txt
but can I somehow do both with a single command?
The reason for this is that multiple processes will be reading the first line of the file, and I want to minimize the chances of them getting the same line.

It's impossible.
Except you read the manpage, and have Gnu-sed:
echo -e {1..3}"\n" > input
cat input
1
2
3
sed -n '1p;2,$ Woutput' input
1
cat output
2
3
Explanation:
sed -n '1p;2,$ Woutput' input
-n no output by default
1p; print line 1
2,$ from line 2 until $ last line
W (non posix) Write buffer to file
From the man page gnu sed:
w filename
Write the current pattern space to filename.
W filename
Write the first line of the current pattern space to filename. This is a GNU extension.
However, reading and experimenting takes longer, than opening the file in a full blown office suite and deleting the line by hand, or invoking a text-to-speech framework and training it, to do the job.
It doesn't work if invoked in posix style:
sed -n --posix '1p;2,$ Woutput' input
And you still have the hard hanwork of renaming output to input again.
I didn't try to write to input in place, because that could damage my carefully crafted input file - try it on own risk:
sed -n '1p;2,$ Winput' input
However, you might set up a filesystem notify job, which always rename freshly created output files to input again. But I fear you can't do it from within the sed command. Except ... (to be continued)

Related

Sed through files without using for loop?

I have a small script which basically generates a menu of all the scripts in my ~/scripts folder and next to each of them displays a sentence describing it, that sentence being the third line within the script commented out. I then plan to pipe this into fzf or dmenu to select it and start editing it or whatever.
1 #!/bin/bash
2
3 # a script to do
So it would look something like this
foo.sh a script to do X
bar.sh a script to do Y
Currently I have it run a for loop over all the files in the scripts folder and then run sed -n 3p on all of them.
for i in $(ls -1 ~/scripts); do
echo -n "$i"
sed -n 3p "~/scripts/$i"
echo
done | column -t -s '#' | ...
I was wondering if there is a more efficient way of doing this that did not involve a for loop and only used sed. Any help will be appreciated. Thanks!
Instead of a loop that is parsing ls output + sed, you may try this awk command:
awk 'FNR == 3 {
f = FILENAME; sub(/^.*\//, "", f); print f, $0; nextfile
}' ~/scripts/* | column -t -s '#' | ...
Yes there is a more efficient way, but no, it doesn't only use sed. This is probably a silly optimization for your use case though, but it may be worthwhile nonetheless.
The inefficiency is that you're using ls to read the directory and then parse its output. For large directories, that causes lots of overhead for keeping that list in memory even though you only traverse it once. Also, it's not done correctly, consider filenames with special characters that the shell interprets.
The more efficient way is to use find in combination with its -exec option, which starts a second program with each found file in turn.
BTW: If you didn't rely on line numbers but maybe a tag to mark the description, you could also use grep -r, which avoids an additional process per file altogether.
This might work for you (GNU sed):
sed -sn '1h;3{H;g;s/\n/ /p}' ~/scripts/*
Use the -s option to reset the line number addresses for each file.
Copy line 1 to the hold space.
Append line 3 to the hold space.
Swap the hold space for the pattern space.
Replace the newline with a space and print the result.
All files in the directory ~/scripts will be processed.
N.B. You may wish to replace the space delimiter by a tab or pipe the results to the column command.

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest write this in bash ?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This would read the file twice, but handling text files in the shell, i.e. storing their content in variables should generally be limited to small files. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you don't need to store the content in shell variables, and read the file twice.
A solution using sed:
foo=$(sed '/^---$/q;p' -n file.txt)
bar=$(sed '1,/^---$/b;p' -n file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line;
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
If version of coreutils >= 8.22, --suppress-matched option can be used and sed processing is not required, like
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}".

How do I get rid of “--” line separator when using grep

I'm using the commands given below for splitting my fastq file into two separate paired end reads files:
grep '#.*/1' -A 3 24538_7#2.fq >24538_7#2_1.fq
grep '#.*/2' -A 3 24538_7#2.fq >24538_7#2_2.fq
But it's automatically introducing a -- line separator between the entries. Hence, making my fastq file inappropriate for further processing(because it then becomes an invalid fastq format).
So, I want to get rid of the line separator(--).
PS: I've found the answer for Linux machine but I'm using MacOS, and those didn't work on Mac terminal.
You can use the --no-group-separator option to suppress it (in GNU grep).
Alternatively, you could use (GNU) sed:
sed '\|#.*/1|,+3!d'
deletes all lines other than the one matching #.*/1 and the next three lines.
For macOS sed, you could use
sed -n '\|#.*/1|{N;N;N;p;}'
but this gets unwieldy quickly for more context lines.
Another approach would be to chain grep with itself:
grep '#.*/1' -A 3 file.fq | grep -v "^--"
The second grep selects non-matching (-v) lines that start with -- (though this pattern can sometimes be interpreted as a command line option, requiring some weird escaping like "[-][-]", which is why i put the ^ there).

Delete everything after a certain line in bash

I was wondering if there was a way to delete everything after a certain line of a text file in bash. So say there's a text file with 10 lines, and I want to delete every line after line number 4, so only the first 4 lines remained, how would I go about doing that?
You can use GNU sed:
sed -i '5,$d' file.txt
That is, 5,$ means the range line 5 until the end, and d means to delete.
Only the first 4 lines will remain.
The -i flag tells sed to edit the file in-place.
If you have only BSD sed, then the -i flag requires a backup file suffix:
sed -i.bak '5,$d' file.txt
As #ephemient pointed out, while this solution is simple,
it's inefficient because sed will still read the input until the end of the file, which is unnecessary.
As #agc pointed out, the inverse logic of my first proposal might be actually more intuitive. That is, do not print by default (-n flag),
and explicitly print range 1,4:
sed -ni.bak 1,4p file.txt
Another simple alternative, assuming that the first 4 lines are not excessively long and so they easily fit in memory, and also assuming that the 4th line ends with a newline character,
you can read the first 4 lines into memory and then overwrite the file:
lines=$(head -n 4 file.txt)
echo "$lines" > file.txt
Minor refinements on Janos' answer, ephemient's answer, and cdark's comment:
Simpler (and faster) sed code:
sed -i 4q file
When a filter util can't directly edit a file, there's
sponge:
head -4 file | sponge file
Most efficient for Linux might be truncate -- coreutils sibling util to fallocate, which offers the same minimal I/O of ephemient's more portable, (but more complex), dd-based answer:
truncate -s `head -4 file | wc -c` file
The sed method that #janos is simple but inefficient. It will read every line from the original file, even ones it could ignore (although that can be fixed using 4q), and -i actually creates a new file (which it renames to replace the original file). And there's the annoying bit where you need to use sed -i '5,$d' file.txt with GNU sed but sed -i '' '5,$d' file.txt with BSD sed in order to remove the existing file instead of leaving a backup.
Another method that performs less I/O:
dd bs=1 count=0 if=/dev/null of=file.txt \
seek=$(grep -b ^ file.txt | tail -n+5 | head -n1 | cut -d: -f1)
grep -b ^ file.txt prints out byte offsets on each line, e.g.
$ yes | grep -b ^
0:y
2:y
4:y
...
tail -n+5 skips the first 4 lines, outputting the 5th and subsequent lines
head -n1 takes only the next line (e.g. only the 5th line)
After head reads the one line, it will exit. This causes tail to exit because it has nowhere to output to anymore. This causes grep to exit for the same reason. Thus, the rest of file.txt does not need to be examined.
cut -d: -f1 takes only the first part before the : (the byte offset)
dd bs=1 count=0 if=/dev/null of=file.txt seek=N
using a block size of 1 byte, seek to block N of file.txt
copy 0 blocks of size 1 byte from /dev/null to file.txt
truncate file.txt here (because conv=notrunc was not given)
In short, this removes all data on the 5th and subsequent lines from file.txt.
On Linux there is a command named fallocate which can similarly extend or truncate a file, but that's not portable.
UNIX filesystems support efficiently truncating files in-place, and these commands are portable. The downside is that it's more work to write out.
(Also, dd will print some unnecessary stats to stderr, and will exit with an error if the file has fewer than 5 lines, although in that case it will leave the existing file contents in place, so the behavior is still correct. Those can be addressed also, if needed.)
If I don't know the line number, merely the line content (I need to know that there is nothing below the line containing 'knowntext' that I want to preserve.), then I use.
sed -i '/knowntext/,$d' inputfilename
to directly alter the file, or to be cautious
sed '/knowntext/,$d' inputfilename > outputfilename
where inputfilename is unaltered, and outputfilename contains the truncated version of the input.
I am not competent to comment on the efficiency of this, but I know that files of 20kB or so are dealt with faster than I can blink.
Using GNU awk (v. 4.1.0+, see here). First we create a test file (NOTICE THE DISCLAIMER):
$ seq 1 10 > file # THIS WILL OVERWRITE FILE NAMED file WITH TEST DATA
Then the code and validation (WILL MODIFY THE ORIGINAL FILE NAMED file):
$ awk -i inplace 'NR<=4' file
$ cat file
1
2
3
4
Explained:
$ awk -i inplace ' # edit is targetted to the original file (try without -i ...)
NR<=4 # output first 4 records
' file # file
You could also exit on line NR==5 which would be quicker if you redirected the output of the program to a new file (remove # for action) which would be the same as head -4 file > new_file:
$ awk 'NR==5{exit}1' file # > new_file
When testing, don't forget the seq part first.

Send output from `split` utility to stdout

From this question, I found the split utilty, which takes a file and splits it into evenly sized chunks. By default, it outputs these chunks to new files, but I'd like to get it to output them to stdout, separated by a newline (or an arbitrary delimiter). Is this possible?
I tried cat testfile.txt | split -b 128 - /dev/stdout
which fails with the error split: /dev/stdoutaa: Permission denied.
Looking at the help text, it seems this tells split to use /dev/stdout as a prefix for the filename, not to write to /dev/stdout itself. It does not indicate any option to write directly to a single file with a delimiter. Is there a way I can trick split into doing this, or is there a different utility that accomplishes the behavior I want?
It's not clear exactly what you want to do, but perhaps the --filter option to split will help out:
--filter=COMMAND
write to shell COMMAND; file name is $FILE
Maybe you can use that directly. For example, this will read a file 10 bytes at a time, passing each chunk through the tr command:
split -b 10 --filter "tr [:lower:] [:upper:]" afile
If you really want to emit a stream on stdout that has separators between chunks, you could do something like:
split -b 10 --filter 'dd 2> /dev/null; echo ---sep---' afile
If afile is a file in my current directory that looks like:
the quick brown fox jumped over the lazy dog.
Then the above command will result in:
the quick ---sep---
brown fox ---sep---
jumped ove---sep---
r the lazy---sep---
dog.
---sep---
From info page :
`--filter=COMMAND'
With this option, rather than simply writing to each output file,
write through a pipe to the specified shell COMMAND for each
output file. COMMAND should use the $FILE environment variable,
which is set to a different output file name for each invocation
of the command.
split -b 128 --filter='cat ; echo ' inputfile
Here is one way of doing it. You will get each 128 character into variable "var".
You may use your preferred delimiter to print or use it for further processing.
#!/bin/bash
cat yourTextFile | while read -r -n 128 var ; do
printf "\n$var"
done
You may use it as below at command line:
while read -r -n 128 var ; do printf "\n$var" ; done < yourTextFile
No, the utility will not write anything to standard output. The standard specification of it says specifically that standard output in not used.
If you used split, you would need to concatenate the created files, inserting a delimiter in between them.
If you just want to insert a delimiter every N th line, you may use GNU sed:
$ sed '0~3a\-----\' file
This inserts a line containing ----- every 3rd line.
To divide the file into chunks, separated by newlines, and write to stdout, use fold:
cat yourfile.txt | fold -w 128
...will write to stdout in "chunks" of 128 chars.

Resources