How can I delete every Xth line in a text file? - bash

Consider a text file with scientific data, e.g.:
5.787037037037037063e-02 2.048402977658663748e-01
1.157407407407407413e-01 4.021264347118673754e-01
1.736111111111111049e-01 5.782032163406526371e-01
How can I easily delete, for instance, every second line, or 9 out of every 10 lines in the file? Is it possible, for example, with a bash script?
Background: the file is very large but I need much less data to plot. Note that I am using Ubuntu/Linux.

This is easy to accomplish with awk.
Remove every other line:
awk 'NR % 2 == 0' file > newfile
Remove every 10th line:
awk 'NR % 10 != 0' file > newfile
The NR variable in awk is the current line (record) number. Anything outside of { } in awk is a pattern (a condition), and the default action for lines that match is to print them.
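Since the goal here is thinning a large data file for plotting, the modulus can also be made a parameter rather than hard-coded; a minimal sketch (the variable name n is just an example):
awk -v n=100 'NR % n == 1' file > newfile # keep lines 1, 101, 201, ... i.e. one line in every hundred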

How about perl?
perl -n -e '$.%10==0&&print' # print every 10th line
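The complementary operation, deleting every 10th line and keeping the rest, is just the negated condition; presumably:
perl -n -e '$. % 10 && print' file > newfile # $. is perl's current line number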

You could possibly do it with sed, e.g.
sed -n -e 'p;N;d;' file # print every other line, starting with line 1
If you have GNU sed it's pretty easy
sed -n -e '0~10p' file # print every 10th line
sed -n -e '1~2p' file # print every other line starting with line 1
sed -n -e '0~2p' file # print every other line starting with line 2
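Since the question asks about deleting rather than printing, the same first~step addresses combine with the d command, and with -i for in-place editing; a sketch (GNU sed assumed):
sed -i '0~10!d' file # keep only every 10th line, editing file in place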

Try something like:
awk 'NR%3==0{print $0}' file
This will print one line in three. Or:
awk 'NR%10<9{print $0}' file
will print 9 lines out of ten.

This might work for you (GNU sed):
seq 10 | sed '0~2d' # delete every 2nd line
1
3
5
7
9
seq 100 | sed '0~10!d' # delete 9 out of 10 lines
10
20
30
40
50
60
70
80
90
100

You can use awk and a shell script. Awk can be tricky, but...
This will delete specific lines you tell it to:
nawk -f awkfile.awk [filename]
awkfile.awk contents
BEGIN {
    # line numbers to delete; can be overridden with -v lines="..."
    if (!lines) lines = "3 4 7 8"
    n = split(lines, lA, FS)
    for (i = 1; i <= n; i++)
        linesA[lA[i]]        # build a set keyed by line number
}
!(FNR in linesA)             # print every line whose number is not in the set
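Because of the if (!lines) guard, the list of line numbers can presumably also be supplied on the command line instead of editing the script:
nawk -v lines="2 5 11" -f awkfile.awk filename # delete lines 2, 5 and 11 instead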
Also, I can't remember whether Vim comes with standard Ubuntu or not; if not, install it.
Then open the file with vim
vim [filename]
Then type
:%!awk NR\%2
This filters the whole buffer through awk, keeping the odd-numbered lines, i.e. it deletes every other line. Just change the 2 to another integer for a different frequency.
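The same filter idea extends to other selections; for example, to keep only every 10th line of the buffer (the backslash stops Vim from expanding % into the current file name):
:%!awk NR\%10==0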

Related

Extract lines from text file, using starting line number and amount of lines to extract, in bash?

I have seen How can I extract a predetermined range of lines from a text file on Unix? but I have a slightly different use case: I want to specify a starting line number, and a count/amount/number of lines to extract, from a text file.
So, I tried to generate a text file, and then compose an awk command to extract a count of 10 lines starting from line number 100 - but it does not work:
$ seq 1 500 > test_file.txt
$ awk 'BEGIN{s=100;e=$s+10;} NR>=$s&&NR<=$e' test_file.txt
$
So, what would be an easy approach to extract lines from a text file using a starting line number, and count of lines, in bash? (I'm ok with awk, sed, or any such tool, for instance in coreutils)
This gives you text that is inclusive of both end points (eleven output lines, here):
$ START=100
$ sed -n "${START},$((START + 10))p" < test_file.txt
The -n says "no print by default", and the p says "print this line", for lines within the example range of 100,110.
When you want to use awk, use something like
seq 1 500 | awk 'NR>=100 && NR<=110'
The advantage of awk is its flexibility when the requirements change. When you want to use a variable start and exclude the endpoints, it becomes:
start=100
seq 1 500 | awk -v start="${start}" 'NR > start && NR < start + 10'
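The count can be passed in the same way as the start; a sketch with assumed variable names start and count:
start=100 count=10
seq 1 500 | awk -v start="$start" -v count="$count" 'NR >= start && NR < start + count'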
Another alternative with tail and head:
tail -n +$START test_file.txt | head -n $NUMBER
If test_file.txt is very large and $START and $NUMBER are small, the following variant should be the fastest, since head stops reading the input after the first START+NUMBER lines:
head -n $((START+NUMBER)) test_file.txt | tail -n +$START
Anyway, for short input files I prefer the sed solution noted above:
sed -n "$START,$((START+NUMBER)) p" test_file.txt
sed -n "$Start,$End p" file
is likely a better way to get those lines.
$ seq 1 500 > test_file.txt
$ awk 'BEGIN{s=100;e=$s+10;} NR>=$s&&NR<=$e' test_file.txt
$
$s in GNU awk means the value of the s-th field, and $e means the value of the e-th field. There are no fields yet in the BEGIN clause, so $s is unset for any s; because you use it in arithmetic context it is assumed to be 0, and therefore e is set to 10. The output of seq is a single number per line, so there is no 10th field, and GNU awk assumes it to be zero when asked to compare it with a number. As NR is always strictly greater than 0, your condition never holds, so the output is empty.
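The intended command therefore needs a plain awk variable rather than a field reference, for example passed in with -v:
awk -v s=100 'NR >= s && NR <= s+10' test_file.txt # prints lines 100..110 inclusive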
Use a range if you are able to prepare a condition which holds solely for the starting line and a condition which holds solely for the ending line; in this case
awk 'BEGIN{s=100}NR==s,NR==s+10' test_file.txt
gives output
100
101
102
103
104
105
106
107
108
109
110
Keep in mind that this will process the whole file. If you have a huge file and the area of interest is relatively near the beginning, you can decrease the time consumption by ending processing at the end of the area of interest, like this:
awk 'BEGIN{s=100}NR>=s{print}NR==s+10{exit}' test_file.txt
(tested in GNU Awk 5.0.1)
This command extracts 30 lines starting from line 100
sed -n '100,$p' test_file.txt | head -30
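If you want sed itself to stop reading once the range is done (worthwhile on large files), the q command can end processing at the last wanted line; a sketch printing the same 30 lines:
sed -n '100,129p;129q' test_file.txt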

Sed file from row number stored in array

I've an array such
echo ${arr[@]}
1 13 19 30 34
I would like to use this array to sed rows (1, 13, 19, 30 and 34) from another file. I know that I can use a loop, but I would like to know if there is a more straightforward way to do this. So far I've not been able to do it.
Thanks
sed solution:
a=(1 13 19 30 34)
sed -n "$(sed 's/[^[:space:]]*/&p;/g' <<< ${a[#]})" file
This will extract 1, 13, 19, 30 and 34th rows from file
You can execute a single sed command on each line by appending the command and a semicolon to each line, and run the result as a sed program. This can be managed in a compact way using bash pattern replacement in variables and arrays; for example, to print the selected lines, use the p command (-n suppresses printing the unselected lines):
sed -n "${arr[*]/%/p;}"
Works fine also with more complex commands like s/from/to/:
sed "${arr[*]/%/s/from/to/;}"
This will perform the replacement only on the selected lines.
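Putting it together as a runnable sketch (array contents and file name assumed):
arr=(1 13 19 30 34)
sed -n "${arr[*]/%/p;}" file # expands to: sed -n '1p; 13p; 19p; 30p; 34p;' file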
awk -v rows="${arr[*]}" 'BEGIN{split(rows,tmp); for (i in tmp) nrs[tmp[i]]} NR in nrs' file
You could use awk and the system function to run the sed command
awk '{ for (i=1;i<=NF;i++) { system("sed -n \""$i"p\" filename") } }' <<< ${arr[@]}
This can be open to command injection though and so assess the risk accordingly.

How to add 100 spaces at end of each line of a file in Unix

I have a file which is supposed to contain 200 characters in each line. I received a source file with only 100 characters in each line, so I now need to add 100 extra spaces to each line. If it were a few spaces, we could have used sed like:
sed 's/$/ /' filename > newfilename
Since it's 100 spaces, can anyone tell me how it is possible to add them in Unix?
If you want a fixed n characters per line (i.e. you don't trust that the input file has exactly m characters per line), do the following. For an input file with a varying number of characters per line:
$ cat file
1
12
123
1234
12345
extend to 10 chars per line.
$ awk '{printf "%-10s\n", $0}' file | cat -e
1 $
12 $
123 $
1234 $
12345 $
Obviously, change 10 to 200 in your script. Here $ marks the end of line; it's not there as a character. You don't need cat -e; it's used here just to show that the lines are extended.
With awk
awk '{printf "%s%100s\n", $0, ""}' file.dat
$0 refers to the entire line.
Updated after Glenn's suggestion
As Glenn suggests in the comments, the substitution is unnecessary; you can just add the spaces. Taking that logic further, you don't even need the addition: you can simply say them after the original line.
perl -nlE 'say $_," "x100' file
Original Answer
With Perl:
perl -pe 's/$/" " x 100/e' file
That says... "Substitute (s) the end of each line ($) with the calculated expression (e) of 100 repetitions of a space".
If you wanted to pad all lines to, say, 200 characters even if the input file was ragged (all lines of differing length), you could use something like this:
perl -pe '$pad=200-length;s/$/" " x $pad/e'
which would make up lines of 83, 102 and 197 characters to 200 each.
If you use Bash, you can still use sed, but use some readline functionality to keep you from manually typing 100 spaces (see manual for "Readline arguments").
You start typing normally:
sed 's/$/
Now you want to insert 100 spaces. You can do this by prefixing the space-bar press with a readline argument that says you want it repeated 100 times, i.e., you manually enter what looks like this as a readline key sequence:
M-1 0 0 \040
Or, if your meta key is the Alt key: Alt+1 0 0 Space
This inserts 100 spaces, and you get
sed 's/$/ /' filename
after typing the rest of the command.
This is useful for working in an interactive shell, but not very pretty for scripts – use any of the other solutions for that.
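If you do want something scriptable in the same spirit, the run of spaces can be generated with printf instead of typed interactively; a minimal sketch:
printf -v spaces '%100s' '' # put 100 spaces into a shell variable
sed "s/\$/$spaces/" filename > newfilename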
Just in case you are looking for a bash solution,
while IFS= read -r line
do
    printf "%s%100s\n" "$line" ""
done < file > newfile
Test
Say I have a file with 3 lines in it:
$ wc -c file
16 file
$ wc -c newfile
316 newfile
Original Answer
spaces=$(echo {1..101} | tr -d 0-9)   # 101 numbers leave 100 separating spaces
while IFS= read -r line
do
    echo "${line}${spaces}" >> newfile
done < file
You can use printf in awk:
awk '{printf "%s%*.s\n", $0, 100, " "}' filename > newfile
This printf will append 100 spaces at the end of each line.
Another way in GNU awk using string-manipulation function sprintf.
awk 'BEGIN{s=sprintf("%-100s", "");}{print $0 s}' input-file > file-with-spaces
A proof with an example:
$ cat input-file
1234jjj hdhyvb 1234jjj
6789mmm mddyss skjhude
khora77 koemm sado666
nn1004 nn1004 457fffy
$ wc -c input-file
92 input-file
$ awk 'BEGIN{s=sprintf("%-100s", "");}{print $0 s}' input-file > file-with-spaces
$ wc -c file-with-spaces
492 file-with-spaces

How do you use grep to find a pattern in a file, EDIT IT with awk (or some other thing), and then save it?

I need to edit specific lines in a text file. I have a pattern here, pattern.txt:
1
3
6
17
etc...
and a file with text, file.txt:
1 text
2 text
3 text
4 text
5 text
etc...
I want to add the words _PUT FLAG HERE to the end of each line of file.txt that is matched by a line in pattern.txt.
I have
grep -F -f pattern.txt file.txt | awk '{print $0 "_PUT FLAG HERE" }'
But I can't seem to figure out a way to shove those changes back into the original file so it looks like this:
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
6 teeskjtkljeltsj _PUT FLAG HERE
etc...
It's a lot like trying to use tr, but much more convoluted. There should be a logical way to string awk and grep together; I just can't seem to conceive of a way to put the pieces into one pipe that would do this, and I can't find the answer anywhere. (If you explain a sed way to do this, please explain the regex.)
Assume your awk has been taken hostage.
A GNU sed/grep solution! To generate a sed script that does what you want, we get the lines to change from the input file:
$ grep -wFf pattern.txt file.txt
1 text
3 text
6 text
17 text
This matches complete words (-w) so 1 text is matched, but 11 text is not; -F is for fixed strings (no regex, should be faster) and -f pattern.txt reads the patterns to look for from a file.
Now we pipe this to sed to generate a script:
$ grep -wFf pattern.txt file.txt | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#'
/^1 text$/s/$/_PUT FLAG HERE/
/^3 text$/s/$/_PUT FLAG HERE/
/^6 text$/s/$/_PUT FLAG HERE/
/^17 text$/s/$/_PUT FLAG HERE/
The sed command in the pipe matches the complete line (.*) and assembles an address plus substitution command (& stands for the whole previously matched line).
Now we take all that and use it as input for sed by means of process substitution (requires Bash):
$ sed -f <(grep -wFf pattern.txt file.txt | sed 's#.*#/^&$/s/$/_PUT FLAG HERE/#') file.txt
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
6 text_PUT FLAG HERE
7 text
8 text
9 text
10 text
11 text
12 text
13 text
14 text
15 text
16 text
17 text_PUT FLAG HERE
Done!
Yes, yes, awk is shorter¹, faster and more beautiful.
¹ Actually not, but still.
Another remark: the grep step is not actually required, see answers by potong and Walter A.
Try this:
pattern.txt:
1
3
6
17
file.txt:
1 text
2 text
3 text
4 text
5 text
Use awk:
$ awk 'NR == FNR{seen[$1];next} $1 in seen{printf("%s_PUT FLAG HERE\n",$0);next}1' pattern.txt file.txt
Output:
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
awk to the rescue!
you don't need other tools with the full power of awk at your disposal
$ awk -v tag='_PUT FLAG HERE' 'NR==FNR{a[$1];next}
{print $0 ($1 in a?tag:"")}' pattern file
1 text_PUT FLAG HERE
2 text
3 text_PUT FLAG HERE
4 text
5 text
just as an exercise, do the same with join/sort
$ sort <(join pattern file --nocheck-order |
sed 's/$/_PUT_FLAG_HERE/') <(join -v2 pattern file --nocheck-order)
1 text_PUT_FLAG_HERE
2 text
3 text_PUT_FLAG_HERE
4 text
5 text
perhaps defining a function for DRY:
$ f() { join $1 pattern file --nocheck-order; }; sort <(f "" |
sed 's/$/_PUT_FLAG_HERE/') <(f -v2)
The solution of @Benjamin can be simplified to
sed -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt
Explanation
# Read sed commands from a file
sed -f sedcommands.txt file.txt
# Read sed commands from another command
sed -f <(other_command) file.txt
# Append string to every line by replacing end-of-line character $
sed 's/$/_PUT FLAG HERE/'
# Only append string on lines matching something
sed '/something/s/$/_PUT FLAG HERE/'
# Only append string on lines matching something at the beginning of the line followed by a space
sed '/^something /s/$/_PUT FLAG HERE/'
# Get the word something into the above command: select the whole line with .* and insert it into the new sed command with &.
# The slashes are used by the inner sed command, so use # as the outer delimiter here
sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt
# Now all together:
sed -f <(sed 's#.*#/^& /s/$/_PUT FLAG HERE/#' pattern.txt) file.txt
This might work for you (GNU sed):
sed 's#.*#/&/s/$/_PUT FLAG HERE/#' pattern.txt | sed -f - file
This turns the pattern file into a sed script which is then invoked against the text file.
This solution uses only Bash (4.0+) features:
# Set up associative array 'patterns' whose keys are patterns
declare -A patterns
for pat in $(< pattern.txt) ; do patterns[$pat]=1 ; done
# Slurp all the lines of 'file.txt' into the 'lines' array
readarray -t lines < file.txt
# Write each old line in the file, possibly with a suffix, back to the file
for line in "${lines[@]}" ; do
read -r label text <<< "$line"
printf '%s%s\n' "$line" "${patterns[$label]+_PUT FLAG HERE}"
done > file.txt
NOTES:
The changes are written back to 'file.txt', as the question seems to specify.
Bash 4.0 or later is required for associative arrays and readarray.
Bash is very slow, so this solution may not be practical if either of the files is large (more than 10 thousand lines).

sed: Argument list too long

I have created a script in a Unix environment. In the script, I use the sed command shown below to delete some lines from a file: a specified set of lines, not necessarily a simple range, given by line numbers.
sed -i "101d; 102d; ... 4930d;" <file_name>
When I execute this it shows the following error:
sed: Arg is too long
Can you please help to resolve this problem?
If you want to delete a contiguous range of lines, you can specify a range of line numbers:
sed -i '101,4930d' file
If you want to delete some arbitrary set of lines that can't easily be expressed as a range, you can put the commands in a file rather than on the command line, and use sed -f.
For example, if foo.sed contains:
2d
4d
6d
8d
10d
then this:
sed -i -f foo.sed file
will delete lines 2, 4, 6, 8, and 10 from file. Putting the commands in a file rather than on the command line avoids limits on command line length.
If there's some pattern to the lines you want to delete, you might consider using a more sophisticated tool such as Awk or Perl.
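The script file itself doesn't have to be written by hand; if the line numbers live in a shell array, printf can generate it. A sketch with assumed names:
lines=(101 205 4930)
printf '%dd\n' "${lines[@]}" > foo.sed # one "Nd" deletion command per line number
sed -i -f foo.sed file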
I had this exact same problem.
I originally put the giant sed command sed -i "101d; 102d; ... 4930d;" <file_name> in a file and tried to execute it as a bash script.
To fix it, put only the deletion commands in a file and run that file as a sed script. I was able to execute 18,193 deletion commands that had failed to run before.
sed -i -f to_delete.sed input_file
to_delete.sed:
101d;102d;...4930d
With awk:
awk ' NR < 101 || NR > 4930 { print } ' input_file
This might work for you (GNU sed and awk):
cat <<\! >/tmp/a
> 2
> 4
> 6
> 8
> !
seq 10 >/tmp/b
sed 's/$/d/' /tmp/a | sed -f - /tmp/b
1
3
5
7
9
10
awk 'NR==FNR{a[$0];next};FNR in a{next};1' /tmp/{a,b}
1
3
5
7
9
10