Print every nth line from a file - bash

I'm trying to print every nth line from a file, but n is not a constant, it's a variable.
For instance, I want to replace sed -n '1~5p' with something like sed -n '1~${i}p'.
Is this possible?

awk can also do it in a more elegant way:
awk -v n=YOUR_NUM 'NR%n==1' file
With -v n=YOUR_NUM you pass the number in. Then NR%n==1 evaluates to true exactly when the line number is of the form n*k+1, so the line gets printed.
Note how good awk is for this: if you want the lines of the form n*k+j for some other offset j, you just pass that in as another variable: awk -v n=7 -v j=3 'NR%n==j' file.
Example
Let's print every 7th line:
$ seq 50 | awk -v n=7 'NR%n==1'
1
8
15
22
29
36
43
50
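The offset variant works the same way; for example, to get the lines of the form 7k+3:
$ seq 50 | awk -v n=7 -v j=3 'NR%n==j'
3
10
17
24
31
38
45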
Or in sed:
$ n=7
$ seq 50 | sed -n "1~$n p" # quote the expression, so that "$n" is expanded
1
8
15
22
29
36
43
50

The point is that you should wrap your sed code in double quotes rather than single quotes: variables are not expanded inside single quotes. So:
sed -n "1~${i} p"

You can do it like this for example:
i=3
sed "2,${i}s/.*/changed line/g" InputFile
Example:
AMD$ cat File
aaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccc
ddddddddddddddddddddddd
eeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffffffffff
ggggggggggggggggggggg
AMD$ i=4; sed "2,${i}s/.*/changed line/g" File
aaaaaaaaaaaaaaaaaaaaaaaa
changed line
changed line
changed line
eeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffffffffff
ggggggggggggggggggggg
The key is to use double quotes (" ") so that the variable is substituted.

Why sed at all?
head -n "$i" YourFile
(Note this prints the first $i lines, not every ith line.)

Related

POSIX: abcdef to ab bc cd de ef

Using POSIX sed or awk, I would like to duplicate every second character in every pair of neighboring characters and list every newly-formed pair on a new line.
example.txt:
abcd 10001.
Expected result:
ab
bc
cd
d
1
10
00
00
01
1.
So far, this is what I have (N.B. omit "--posix" if on macOS). For some reason, adding a literal newline character before \2 does not produce the expected result. Removing the first group and using \1 has the same effect. What am I missing?
sed --posix -E -e 's/(.)(.)/&\2\
/g' example.txt
abb
cdd
100
000
1..
Try:
$ echo "abcd 10001." | awk '{for(i=1;i<length($0);i++) print substr($0,i,2)}'
ab
bc
cd
d
1
10
00
00
01
1.
You may use
sed --posix -e 's/./&\
&/g' example.txt | sed '1d;$d'
The first sed command matches every character and replaces it with the same character, a newline, and the same character again. Since this also happens to the first and last characters, the first and last resulting lines must be removed, which is achieved with sed '1d;$d'.
Had sed supported lookarounds, one could have used (?!^).(?!$) (any char, but not at the start or end of the string) and the last sed command would not have been necessary, but that is not possible with sed. You could use it in perl though: perl -pe 's/(?!^).(?!$)/$&\n$&/g' example.txt ($& in the replacement is the same as the & placeholder in sed, the whole match).
With GNU awk, you could try the following. It was written and tested with the shown samples; there is also a demo at this link:
https://ideone.com/qahp0S
awk '
BEGIN{
FS=""
}
{
for(i=1;i<=(NF-1);i++){
print $i$(i+1)
}
}
' Input_file
Explanation: the BEGIN section sets the field separator to the empty string, so every character of each line becomes its own field. Then, in the main program, a for loop runs from the 1st field to the 2nd-to-last field, printing the current field and the next one on each iteration.
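The same program as a one-liner, run on the sample input (it prints the same ten pairs as the substr() version above):
$ echo "abcd 10001." | awk 'BEGIN{FS=""} {for(i=1;i<NF;i++) print $i $(i+1)}'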
Using the same approach, it can be done in bash itself:
s='abcd 10001.'
for((i=0; i<${#s}-1; i++)); do echo "${s:i:2}"; done
ab
bc
cd
d
1
10
00
00
01
1.
Just for fun, a single sed consisting of 3 substitutions:
$ echo "abcd 10001." | sed 's/./&&/g;s/\(^.\|.$\)//g;s/../&\n/g'
The first part duplicates all characters, the second part removes the first and last character, the third part adds a newline character after each character-pair.
If you want to be POSIX compliant you have to do:
$ echo "abcd 10001." | sed -e 's/./&&/g' -e 's/^.//g' -e 's/.$//g' -e 's/../&\n/g'
Here we had to add an extra expression, because the alternation in \(^.\|.$\) is a GNU extension to BREs; POSIX sed only accepts a plain BRE, which has no alternation. (Strictly speaking, \n in the replacement is also a GNU extension; a fully POSIX sed needs a backslash followed by a literal newline there, as in the question.)
This might work for you (GNU sed):
sed 's/.\(.\)/&\n\1/;/../P;D' file
Replace the first two characters with the first two characters, a newline, and the second character.
If the pattern space holds at least two characters, print up to the embedded newline; then delete up to the newline and repeat on what is left.
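A sketch of how the pattern space evolves for the input abcd:
abcd → ab\nbcd (the s command inserts the newline; /../ matches, P prints "ab", D deletes through the newline)
bcd → bc\ncd (P prints "bc")
cd → cd\nd (P prints "cd")
d → the s command and /../ both fail, so D ends the cycle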
Alternative, more long winded:
sed -E ':a;s/^(([^\n]{2}\n)*[^\n])([^\n])([^\n])/\1\3\n\3\4/;ta' file
Or, with no hardcoded new line:
sed -E '/.../{G;s/^(.(.))(.*)(.)/\1\4\2\3/;P;D}' file
Lastly:
sed 's/./&\n&/g;s/^..\|..$//g' file
Process substitution isn't specified by POSIX. But the question's POSIX requirement was only for sed and awk, so maybe the next solution is acceptable:
paste -d '\0' <(echo; fold -w1 example.txt) <(fold -w1 example.txt) | grep ..
or
while IFS= read -r -n1 ch; do   # IFS= keeps the space character; -r keeps backslashes
printf "%s\n%s" "${ch}" "${ch}"
done < example.txt | grep ..
or
sed 's/./&&/g;s/.//' example.txt | grep -o ..

Multiplying all values in a txt file by another value

My aim is to multiply all values in a text file by a number. In my case it is 1000.
Original text in file:
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
I want the output to look like:
(so, changing the contents of the file to...)
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
Or, preferably:
4.9
43.8
149.7
443.1
882.0
975.7
995.7
1000
I am using bash on macOS in the terminal.
If you have dc:
cat infile | dc -f - -e '1k1000sa[la*Sdz0!=Z]sZzsclZx[Ld1/psblcd1-sc1<Y]sYlYx'
Using Perl
perl -lpe ' $_=$_*1000 '
With an input file and in-place replacement:
$ cat andy.txt
0.00493293814
0.0438981727
0.149746656
0.443125129
0.882018387
0.975789607
0.995755374
1
$ perl -i -lpe ' $_=$_*1000 ' andy.txt
$ cat andy.txt
4.93293814
43.8981727
149.746656
443.125129
882.018387
975.789607
995.755374
1000
$
One decimal place
perl -lpe ' $_=sprintf("%0.1f",$_*1000 ) '
Zero decimal places and rounding off
perl -lpe ' $_=sprintf("%0.0f",$_*1000 ) '
Zero decimal places and truncating
perl -lpe ' $_=sprintf("%0.0f",int($_*1000) ) '
awk to the rescue!
$ awk '{printf "%.1f\n", $1*1000}' file > tmp && mv tmp file
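The %.1f conversion rounds; if you want the truncated variant of the output instead (the asker's second sample), here is a sketch using awk's int(), with the caveat that whole numbers print as 1000.0 rather than 1000:
$ awk '{x=$1*1000; printf "%.1f\n", int(x*10)/10}' file > tmp && mv tmp file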
Using num-utils. For answers to 8 decimal places:
numprocess '/*1000/' n.txt
For rounded answers to 1 decimal place:
numprocess '/*1000/' n.txt | numround -n '.1'
Use sed to prefix each line with 1000*, then process the resulting mathematical expressions with bc. To show only the first digit after the decimal point you can use sed again.
sed 's/^/1000*/' yourFile | bc | sed -E 's/(.*\..).*/\1/'
This will print the latter of your expected outputs. Just as you wanted, decimals are cut rather than rounded (1.36 is converted to 1.3).
To remove all decimal digits either replace the last … | sed … with sed -E 's/\..*//' or use the following command
sed 's:^.*$:1000*&/1:' yourFile | bc
With these commands, overwriting the file directly is not possible. You have to write to a temporary file (append > tmp && mv tmp yourFile) or use the sponge command from the moreutils package (append | sponge yourFile).
However, if you only want the integer part after the multiplication, there is a trick. Instead of actually multiplying by 1000 we can syntactically shift the decimal point three places. This can be done in one single sed command, and sed has the -i option to overwrite input files.
sed -i.bak -E 's/\..*/&000/;s/^[^.]*$/&.000/;s/\.(...).*/\1/;s/^(-?)0*(.)/\1\2/' yourFile
The command changes yourFile's content to
4
43
149
443
882
975
995
1000
A backup yourFile.bak of the original is created.
The single sed command should work with every input number format too (even for things like -.1 → -100).
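To see how the four substitutions interact, here is a trace for the input line 0.0438981727:
0.0438981727 → 0.0438981727000 (s/\..*/&000/ pads the decimal part with three zeros)
(the second substitution is skipped: the line already contains a dot)
0.0438981727000 → 0043 (s/\.(...).*/\1/ keeps the integer part plus three digits, dropping the dot)
0043 → 43 (s/^(-?)0*(.)/\1\2/ strips the leading zeros)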

bash - how do I use 2 numbers on a line to create a sequence

I have this file content:
2450TO3450
3800
4500TO4560
And I would like to obtain something of this sort:
2450
2454
2458
...
3450
3800
4500
4504
4508
..
4560
Basically I need a one-liner in sed/awk that reads the values on both sides of the TO separator and either feeds them to a seq command or does the loop on its own, dumping the result into the same file, one value per line, with an arbitrary increment, say 4 in the example above.
I know I could use a temp file, the read command and so on, but I would like to do it in a one-liner starting with cat filename | ... as it is already part of a bigger script.
Correctness of the input is guaranteed, so the left side of TO is always smaller than the right side.
Thanks
Like this:
awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
or, if you like starting with cat:
cat file | awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}'
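For instance, with the sample file and a larger increment just to keep the output short:
$ awk -F'TO' -v inc=500 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
2450
2950
3450
3800
4500
Note that the right endpoint is printed only when the range length is a multiple of the increment (as it is with increment 4 in the question's data).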
Something like this might work:
awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
This tells awk to system() (execute) the command seq 10 4 10 for a line containing just 10 (which outputs 10), and something like seq 10 4 40 for a line like 10TO40. The output matches your example.
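The ($2 ? $2 : $1) part is what handles the single-number lines: $2 is empty there, so seq is given the same number twice:
$ echo 3800 | awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
3800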
Given:
txt="2450TO3450
3800
4500TO4560"
You can do:
echo "$txt" | awk -F TO '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i++) print i}'
If you want an increment greater than 1:
echo "$txt" | awk -F TO -v p=4 '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i+=p) print i}'
Give a try to this:
sed 's/TO/ /' file.txt | while read -r first second; do if [ -n "$second" ]; then seq "$first" 4 "$second"; else printf "%s\n" "$first"; fi; done
sed is used to replace TO with a space character.
read is used to read each line; if there are two numbers, seq generates the sequence. Otherwise, the single number is printed.
This might work for you (GNU sed):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file
This evaluates the RHS of the substitution command if the LHS contains TO.

how to delete a large number of lines from a file

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the line numbers in a text file, but I don't know if I can use that as input to sed. Maybe perl?
Thanks
A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
The sed version turns each line number into a d command; the awk version reads the numbers into an array while NR==FNR (i.e. while reading the first file) and then prints only the lines whose number is not in that array.
perl -MPath::Class -e '
%del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
$f = file("data_file")->openr();
while (<$f>) {
print unless $del{$.};
}
'
perl -ne'
BEGIN{ local @ARGV = pop; @h{<>} = () }
exists $h{"$.\n"} or print;
' myfile.txt lines
You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line):
$ cat lines
1
34
45
678
Convert this file to sed format:
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now give this sed script as input to the sed command:
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile
You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
A first sed selects only the numbers from the file lines.txt (just in case).
There's something quite special to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines from the end: start with 678, then 45, and so on. That's why we use sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to each number.
Then we issue the w (write) and q (quit) commands.
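For the sample lines.txt, the script fed to ed is therefore:
678d
45d
34d
1d
w
q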
Note that this overwrites the original file!

Translate newline to comma

I have a text file containing a list of around 150 to 200 file names:
abc.txt
pqr.txt
xyz.txt
...
...
I need strings of comma-separated file names.
Each string should have no more than 20 files, so the result will look something like this...
$string1="abc.txt,pqr.txt,xyz.txt..."
$string2="abc1.txt,pqr1.txt,xyz1.txt..."
...
The number of strings will be different depending upon the number of lines in the file. I have written something like this...
#!/bin/sh
delim=','
for gsfile in `cat filelist.txt`
do
filelist=$filelist$delim$gsfile
echo $filelist
done
The translate command works as expected, but how do I restrict each string to 20 file names?
cat filelist.txt | tr '\n' ','
Just use xargs:
$ seq 1 50 | xargs -n20 | tr ' ' ,
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
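Applied to the question's file (assuming the file names contain no spaces, quotes or backslashes, all of which xargs treats specially):
xargs -n20 < filelist.txt | tr ' ' ,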
One way using sed:
As in Igor Chubin's answer, put 50 numbers in infile:
seq 1 50 >infile
Content of script.sed:
:b
## While not on the last line...
$! {
## Check whether the pattern space already holds 19 newlines (i.e. 20 lines):
## try substituting it with itself and test whether that succeeded. If so,
## jump to ':a'; otherwise append the next line and loop.
s/\(\n[^\n]*\)\{19\}/&/
ta
N
bb
}
## There are 20 lines in the buffer (or we reached end of file), so replace
## every '\n' with a comma and print.
:a
s/\n/,/g
p
Run it like:
sed -nf script.sed infile
With following output:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41,42,43,44,45,46,47,48,49,50
This might work for you (paste -s cycles through its list of delimiters, here 19 commas followed by a newline, so every 20th join is a line break):
seq 41 | paste -sd ',,,,,,,,,,,,,,,,,,,\n'
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41
or GNU sed (gather lines in the pattern space until it contains 19 embedded newlines, then translate them all to commas):
seq 41 | sed ':a;$bb;N;s/\n/&/19;Ta;:b;y/\n/,/'
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
41
Use a number flag in sed's s command to replace the 20th comma with a newline, then print and delete up to it and repeat:
< filelist.txt tr '\n' , | sed ':a; s/,/\n/20; P; D; ta'; echo
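If you actually need the groups in separate variables (string1, string2, ... as in the question), here is a bash sketch built on the xargs answer; printf -v assigns to a dynamically named variable:
i=1
while IFS= read -r line; do
printf -v "string$i" '%s' "$line"   # sets $string1, $string2, ...
i=$((i+1))
done < <(xargs -n20 < filelist.txt | tr ' ' ,)
echo "$string1"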
