How to add 100 spaces at end of each line of a file in Unix - shell

I have a file which is supposed to contain 200 characters in each line. I received a source file with only 100 characters in each line. I need to add 100 extra white spaces to each line now. If it were few blank spaces, we could have used sed like:
sed 's/$/ /' filename > newfilename
Since I need 100 spaces, is there a practical way to add them in Unix?

If you want a fixed n characters per line (and don't trust that the input file has exactly m characters per line), do the following. For an input file with a varying number of characters per line:
$ cat file
1
12
123
1234
12345
extend to 10 chars per line.
$ awk '{printf "%-10s\n", $0}' file | cat -e
1         $
12        $
123       $
1234      $
12345     $
Obviously, change 10 to 200 in your script. Here $ marks the end of the line; it is not a character in the file. You don't need cat -e; it is used here only to show that each line has been extended.
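The same idea can be wrapped in a small shell function so that the width becomes a parameter. This is only a minimal sketch and not part of the answer above: the name pad_lines is made up for illustration, and the format string is assembled from an awk variable rather than hard-coded.

```shell
# pad_lines WIDTH [FILE...]
# Hypothetical helper: right-pad every line with spaces to WIDTH characters.
# Lines already longer than WIDTH are passed through unchanged (not truncated).
pad_lines() {
    width=$1
    shift
    # Build the format "%-<width>s\n" inside awk from the variable w.
    awk -v w="$width" '{ printf "%-" w "s\n", $0 }' "$@"
}
```

It could be called as pad_lines 200 file > newfile. With no file argument it reads standard input.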

With awk
awk '{printf "%s%100s\n", $0, ""}' file.dat
$0 refers to the entire line.

Updated after Glenn's suggestion
As Glenn suggests in the comments, the substitution is unnecessary: you can just append the spaces. Taking that logic further, you don't even need to append; you can simply print the spaces after the original line.
perl -nlE 'say $_," "x100' file
Original Answer
With Perl:
perl -pe 's/$/" " x 100/e' file
That says... "Substitute (s) the end of each line ($) with the calculated expression (e) of 100 repetitions of a space".
If you wanted to pad all lines to, say, 200 characters even if the input file was ragged (all lines of differing length), you could use something like this:
perl -lpe '$pad=200-length;s/$/" " x $pad/e' file
which would pad lines of 83, 102 and 197 characters to 200 each. The -l switch strips the newline before length is taken, so the newline is not counted toward the 200.

If you use Bash, you can still use sed, but use some readline functionality to keep you from manually typing 100 spaces (see manual for "Readline arguments").
You start typing normally:
sed 's/$/
Now you want to insert 100 spaces. You can do this by giving readline a numeric argument before pressing the space bar, so that the keypress is repeated 100 times. Written as a readline key sequence, what you type is:
M-1 0 0 \040
Or, if your meta key is the Alt key: Alt+1 0 0 Space
This inserts 100 spaces, and you get
sed 's/$/ /' filename
after typing the rest of the command.
This is useful for working in an interactive shell, but not very pretty for scripts – use any of the other solutions for that.
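For scripts, the run of spaces can be generated instead of typed. The following is a minimal sketch under that assumption, not something taken from the answer above; the function name append_spaces is hypothetical.

```shell
# append_spaces COUNT [FILE...]
# Hypothetical helper: append COUNT spaces to the end of every line using sed.
append_spaces() {
    count=$1
    shift
    # printf pads the empty string to COUNT characters, which yields COUNT spaces.
    spaces=$(printf "%${count}s" '')
    # \$ keeps the dollar sign literal inside double quotes, so sed sees s/$/<spaces>/.
    sed "s/\$/$spaces/" "$@"
}
```

For the original question this would be append_spaces 100 filename > newfilename.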

Just in case you are looking for a bash solution,
while IFS= read -r line
do
printf "%s%100s\n" "$line"
done < file > newfile
Test
Say I have a file with 3 lines in it:
$ wc -c file
16 file
$ wc -c newfile
316 newfile
Original Answer
spaces=$(echo {1..101} | tr -d 0-9)
while IFS= read -r line
do
    # printf adds exactly one newline; echo -e with "\n" would emit an extra blank line
    printf '%s%s\n' "$line" "$spaces" >> newfile
done < file

You can use printf in awk:
awk '{printf "%s%*.s\n", $0, 100, " "}' filename > newfile
This printf appends 100 spaces to the end of each line.

Another way in GNU awk using string-manipulation function sprintf.
awk 'BEGIN{s=sprintf("%-100s", "");}{print $0 s}' input-file > file-with-spaces
A proof with an example:-
$ cat input-file
1234jjj hdhyvb 1234jjj
6789mmm mddyss skjhude
khora77 koemm sado666
nn1004 nn1004 457fffy
$ wc -c input-file
92 input-file
$ awk 'BEGIN{s=sprintf("%-100s", "");}{print $0 s}' input-file > file-with-spaces
$ wc -c file-with-spaces
492 file-with-spaces

Related

Making bash output a certain word from a .txt file

I have a question on Bash:
Like the title says, I require bash to output a certain word, depending on where it is in the file. In my explicit example I have a simple .txt file.
I already found out that you can count the number of words within a file with the command:
wc -w < myFile.txt
An output example would be:
78501
There certainly is also a way to make "cat" to only show word number x. Something like:
cat myFile.txt | wordno. 3125
desired-word
Notice, that I will welcome any command, that gets this done, not only cat.
Alternatively or in addition, I would be happy to know how you can make certain characters in a file show, based on their place in it. Something like:
cat myFile.txt | characterno. 2342
desired-character
I already know how you can achieve this with a variable:
a="hello, how are you"
echo ${a:9:1}
w
The only problem is that a variable can only be so long. If it has to hold a whole .txt file, this won't work.
I look forward to your answers!
You could use awk for this job: tr turns the newlines into spaces so that the whole file becomes one line, and awk splits that line at whitespace and prints field number $wordnumber:
cat myFile.txt | tr '\n' ' ' | awk -v wordnumber=5 '{ print $wordnumber }'
And if you want, for example, the 5th character, you could do it like this:
head -c 5 myFile.txt | tail -c 1
Since you have not shown a sample of Input_file or the expected output, I couldn't test this. You could do it with awk along these lines:
awk 'FNR==1{print substr($0,2342,1);next}' Input_file
Here FNR==1 tells awk to look only at the first line, and substr($0,2342,1) takes 1 character starting at position 2342. Increase the last argument to take more characters.
With gawk:
awk 'BEGIN{RS="[[:space:]]+"} NR==12345' file
or
gawk 'NR==12345' RS="[[:space:]]+" file
I'm setting the record separator to a sequence of whitespace characters, which includes newlines, and then printing the 12345th record.
To improve the average performance you can exit the script once the match is found:
gawk 'BEGIN{RS="[[:space:]]+"}NR==12345{print;exit}' file
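The two lookups can also be sketched as shell functions that do not depend on a regular-expression record separator. This is an illustrative variant rather than any answerer's method; the names nth_word and nth_char are hypothetical, and nth_char assumes head -c and tail -c are available.

```shell
# nth_word N [FILE...]
# Hypothetical helper: print the N-th whitespace-separated word of the input.
nth_word() {
    n=$1
    shift
    awk -v n="$n" '
        {
            for (i = 1; i <= NF; i++) {
                if (++count == n) {   # count words across all lines
                    print $i
                    exit              # stop reading once the word is found
                }
            }
        }
    ' "$@"
}

# nth_char N [FILE]
# Hypothetical helper: print the N-th byte of the input.
# If the input is shorter than N bytes, the last byte is printed instead.
nth_char() {
    n=$1
    shift
    head -c "$n" "$@" | tail -c 1
}
```

For example, nth_word 3125 myFile.txt and nth_char 2342 myFile.txt correspond to the two requests in the question.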

Adding line number to a list redirected to a .txt file in Bash [duplicate]

How can I add numbers to the beginning of every line in a file?
E.g.:
This is
the text
from the file.
Becomes:
000000001 This is
000000002 the text
000000003 from the file.
Don't use cat or any other tool which is not designed to do that. Use the program:
nl - number lines of files
Example:
$ nl --number-format=rz --number-width=9 foobar
$ nl -n rz -w 9 foobar # short-hand
Because nl is made for it ;-)
AWK's printf, NR and $0 make it easy to have precise and flexible control over the formatting:
~ $ awk '{printf("%010d %s\n", NR, $0)}' example.txt
0000000001 This is
0000000002 the text
0000000003 from the file.
You're looking for the nl(1) command:
$ nl -nrz -w9 /etc/passwd
000000001 root:x:0:0:root:/root:/bin/bash
000000002 daemon:x:1:1:daemon:/usr/sbin:/bin/sh
000000003 bin:x:2:2:bin:/bin:/bin/sh
...
-w9 asks for numbers nine digits long; -nrz asks for the numbers to be formatted right-justified with zero padding.
cat -n thefile will do the job, albeit with the numbers in a slightly different format.
Easiest, simplest option is
awk '{print NR,$0}' file
See comment above on why nl isn't really the best option.
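Here's a bash script that will also do this:
#!/bin/bash
# Usage: pass the file to number as the first argument.
counter=1
filename=$1
while IFS= read -r line
do
    # Quote the arguments so that lines containing spaces stay intact.
    printf '%010d %s\n' "$counter" "$line"
    counter=$((counter + 1))
done < "$filename"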
perl -pe 'printf "%09u ", $.' -- example.txt
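The awk approach above generalizes to an arbitrary number width. The following is a small sketch under that assumption; the function name number_lines is invented for illustration.

```shell
# number_lines WIDTH [FILE...]
# Hypothetical helper: prefix each line with its zero-padded line number.
# With several files, NR keeps counting across them rather than restarting.
number_lines() {
    width=$1
    shift
    # Build the format "%0<width>d %s\n" from the awk variable w.
    awk -v w="$width" '{ printf "%0" w "d %s\n", NR, $0 }' "$@"
}
```

Calling number_lines 9 example.txt gives nine-digit numbers, as in the question.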

Matching a pattern with sed and getting an integer out at the same time

I have an xml file with these lines (among others):
#Env=DEV2,DEV3,DEV5,DEV6
#Enter your required DEV environment after the ENV= in the next line:
Env=DEV6
I need to:
Verify that the text after ENV= is of the pattern DEV{1..99}
extract the number (in this case, 6) from the line ENV=DEV6 to some environment variable
I know a bit of awk and grep, and can use those to get the number, but I'm thinking of Sed, which I'm told matches patterns nicer than awk and takes less time. Also, I'm concerned about long long lines of greps matching the beginning of the line for that particular Env= .
How would I go about doing it with Sed? would I get away with a shorter line?
I'm a sed newbie, read a bunch of tutorials and examples and got my fingers twisted trying to do both things at the same time...
You can also use grep if PCRE support is available:
$ cat ip.txt
#Env=DEV2,DEV3,DEV5,DEV6
#Enter your required DEV environment after the ENV= in the next line:
Env=DEV6
foo
Env=DEV65
bar
Env=DEV568
$ grep -xoP 'Env=DEV\K[1-9][0-9]?' ip.txt
6
65
-x match whole line
-o output only matching text
-P use pcre regex
Env=DEV\K match Env=DEV but not part of output
[1-9][0-9]? range of 1 to 99
I suggest with GNU sed:
var=$(sed -nE 's/^Env=DEV([0-9]{1,2})$/\1/p' file)
echo "$var"
Output:
6
awk -F'Env=DEV' '/Env=DEV[0-9]$|Env=DEV[0-9][0-9]$/{print $2}' input
Input:
echo '
Env=DEV6
Env=DEVasd
Env=DEV62
Env=DEV622'
Output:
awk -F'Env=DEV' '/Env=DEV[0-9]$|Env=DEV[0-9][0-9]$/{print $2}' input
6
62
To store it into any variable:
var=$(awk command)
In awk. First some test cases:
$ cat file
foo
Env=DEV0
Env=DEV1
Env=DEV99
Env=DEV100
$ awk 'sub(/^Env=DEV/,"") && /^[1-9][0-9]?$/' file
1
99
You can use sed as follows:
$ sed -n 's/^Env=DEV\([1-9][0-9]\?\)$/\1/p' file
6
You can use the above command directly in an export command:
export YOUR_EXPORT_VARIABLE=$(sed -n 's/^Env=DEV\([1-9][0-9]\?\)$/\1/p' file)
Or, it's pretty straightforward with perl:
$ perl -nle 'print $1 if /^Env=DEV(\d+)/' file
6
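Validation and extraction can be combined in one helper. This is a hedged sketch rather than any answerer's exact command: the function name get_dev_number is hypothetical, and it uses the POSIX \{0,1\} form instead of the GNU \? extension.

```shell
# get_dev_number [FILE...]
# Hypothetical helper: print the number from the first line of the exact form
# Env=DEV<n> with n in 1..99; print nothing if no such line exists.
get_dev_number() {
    # -n suppresses default output; p prints only lines where the substitution succeeded.
    # ^ and $ anchor the match, so commented lines and values like DEV100 are rejected.
    sed -n 's/^Env=DEV\([1-9][0-9]\{0,1\}\)$/\1/p' "$@" | head -n 1
}
```

An empty result means the line was missing or invalid, so a caller might write num=$(get_dev_number file) and then test [ -n "$num" ] before exporting it.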

Searching a file (grep/awk) for 2 carriage return/line-feed characters

I'm trying to write a script that'll simply count the occurrences of \r\n\r\n in a file. (Opening the sample file in vim binary mode shows me the ^M character in the proper places, and the newline is still read as a newline).
Anyway, I know there are tons of solutions, but they don't seem to get me what I want.
e.g. awk -e '/\r/,/\r/!d' or using $'\n' as part of the grep statement.
However, none of these seem to produce what I need. I can't find the \r\n\r\n pattern with grep's "trick", since that just expands one variable. The awk solution is greedy, and so gets me way more lines than I want/need.
Switching grep to binary/Perl/no-newline mode seems to be closer to what I want,
e.g. grep -UPzo '\x0D', but really what I want then is grep -UPzo '\x0D\x00\x0D\x00', which doesn't produce the output I want.
It seems like such a simple task.
By default, awk treats \n as the record separator. That makes it very hard to count \r\n\r\n. If we choose some other record separator, say a letter, then we can easily count the appearance of this combination. Thus:
awk '{n+=gsub("\r\n\r\n", "")} END{print n}' RS='a' file
Here, gsub returns the number of substitutions made. These are summed and, after the end of the file has been reached, we print the total number.
Example
Here, we use bash's $'...' construct to explicitly add carriage returns and linefeeds:
$ echo -n $'\r\n\r\n\r\n\r\na' | awk '{n+=gsub("\r\n\r\n", "")} END{print n}' RS='a'
2
Alternate solution (GNU awk)
We can tell it to treat \r\n\r\n as the record separator and then return the count (minus 1) of the number of records:
cat file <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
In awk, RS is the record separator and NR is the count of the number of records. Since we are using a multiple-character record separator, this requires GNU awk.
If the file ends with \r\n\r\n, a plain record count would be off by one. To avoid that, the echo 1 command is used to ensure that there is always at least one character after the last \r\n\r\n in the file.
Examples
Here, we use bash's $'...' construct to explicitly add carriage returns and linefeeds:
$ echo -n $'abc\r\n\r\n' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
1
$ echo -n $'abc\r\n\r\ndef' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
1
$ echo -n $'\r\n\r\n\r\n\r\n' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
2
$ echo -n $'1\r\n\r\n2\r\n\r\n3' | cat - <(echo 1) | awk 'END{print NR-1;}' RS='\r\n\r\n'
2
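If GNU awk is not available, the same non-overlapping count can be approximated line by line with the default record separator. This is a different technique from the two above, offered only as a sketch: the name count_crlf_pairs is hypothetical, and it assumes the input ends with a linefeed (a final bare \r with no linefeed after it would be counted as if it completed a pair).

```shell
# count_crlf_pairs [FILE...]
# Hypothetical helper: count non-overlapping occurrences of \r\n\r\n.
# With \n as record separator, a match is a line ending in \r followed by a line that is exactly \r.
count_crlf_pairs() {
    awk '
        $0 == "\r" && prev_cr { n++; prev_cr = 0; next }  # line is just CR and closes a pair
        { prev_cr = ($0 ~ /\r$/) }                         # remember whether this line ended in CR
        END { print n + 0 }                                # n + 0 prints 0 when nothing matched
    ' "$@"
}
```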

How can I delete every Xth line in a text file?

Consider a text file with scientific data, e.g.:
5.787037037037037063e-02 2.048402977658663748e-01
1.157407407407407413e-01 4.021264347118673754e-01
1.736111111111111049e-01 5.782032163406526371e-01
How can I easily delete, for instance, every second line, or every 9 out of 10 lines in the file? Is it for example possible with a bash script?
Background: the file is very large but I need much less data to plot. Note that I am using Ubuntu/Linux.
This is easy to accomplish with awk.
Remove every other line:
awk 'NR % 2 == 0' file > newfile
Remove every 10th line:
awk 'NR % 10 != 0' file > newfile
The NR variable in awk is the line number. Anything outside of { } in awk is a conditional, and the default action is to print.
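The two conditions above can be parameterized so that the step is passed in from the shell. A minimal sketch, with the function names drop_every and keep_every chosen only for illustration:

```shell
# drop_every N [FILE...]
# Hypothetical helper: delete every N-th line (N must be a positive integer).
drop_every() {
    n=$1
    shift
    awk -v n="$n" 'NR % n != 0' "$@"
}

# keep_every N [FILE...]
# Hypothetical helper: keep only every N-th line, i.e. delete N-1 out of every N lines.
keep_every() {
    n=$1
    shift
    awk -v n="$n" 'NR % n == 0' "$@"
}
```

For the plotting case in the question, keep_every 10 file > newfile keeps one line in ten.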
How about perl?
perl -n -e '$.%10==0&&print' # print every 10th line
You could possibly do it with sed, e.g.
sed -n -e 'p;N;d;' file # print every other line, starting with line 1
If you have GNU sed it's pretty easy
sed -n -e '0~10p' file # print every 10th line
sed -n -e '1~2p' file # print every other line starting with line 1
sed -n -e '0~2p' file # print every other line starting with line 2
Try something like:
awk 'NR%3==0{print $0}' file
This will print one line in three. Or:
awk 'NR%10<9{print $0}' file
will print 9 lines out of ten.
This might work for you (GNU sed):
seq 10 | sed '0~2d' # delete every 2nd line
1
3
5
7
9
seq 100 | sed '0~10!d' # delete 9 out of 10 lines
10
20
30
40
50
60
70
80
90
100
You can use awk and a shell script. Awk can be difficult, but...
This will delete specific lines you tell it to:
nawk -f awkfile.awk [filename]
awkfile.awk contents
BEGIN {
if (!lines) lines="3 4 7 8"
n=split(lines, lA, FS)
for(i=1;i<=n;i++)
linesA[lA[i]]
}
!(FNR in linesA)
Also, I can't remember whether Vim comes with standard Ubuntu or not. If not, install it.
Then open the file with vim
vim [filename]
Then type
:%!awk NR\%2
This will delete every other line. Just change the 2 to another integer for a different frequency.
