I need to write a shell script that appends characters to each line in a text file so that all lines are the same length. For example, if the input is:
Line 1 has 25 characters.
Line two has 27 characters.
Line 3: all lines must have the same number of characters.
Here "Line 3" has 58 characters (not including the newline character) so I have to append 33 characters to "Line 1" and 31 characters to "Line 2". The output should look like:
Line 1 has 25 characters.000000000000000000000000000000000
Line two has 27 characters.0000000000000000000000000000000
Line 3: all lines must have the same number of characters.
We can assume the max length (58 in the above example) is known.
Here is one way of doing it:
while read -r; do                                  # Read from the file one line at a time
    printf "%s" "$REPLY"                           # Print the line without the newline
    for (( i = 1; i <= 58 - ${#REPLY}; i++ )); do  # Loop over the difference in length
        printf "%s" "0"                            # Pad 0s
    done
    printf "\n"                                    # Add the newline
done < file
Output:
Line 1 has 25 characters.000000000000000000000000000000000
Line two has 27 characters.0000000000000000000000000000000
Line 3: all lines must have the same number of characters.
Of course this is easy if you know the max length of the line. If you don't, then you need to read the file into an array, keep track of the length of each line, and keep the length of the longest line in a variable. Once you have completely read the file, iterate over the array and apply the same for loop shown above.
awk '{print length($0)}' <file_name> | sort -nr | head -1
This way you would not need a shell loop to find the longest line.
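Putting the two together, here is a minimal two-pass sketch (assuming the input file is named file, as above): first compute the maximum length with awk, then run the same padding loop against it:
#!/bin/bash
max=$(awk '{ if (length($0) > m) m = length($0) } END { print m }' file)
while IFS= read -r line; do                  # read each line verbatim
    printf '%s' "$line"                      # print it without the newline
    for (( i = ${#line}; i < max; i++ )); do
        printf '0'                           # pad with 0s up to the max length
    done
    printf '\n'
done < file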
Here's a cryptic one:
perl -lpe '$_.="0"x(58-length)' file
(-l strips the newline on input and adds it back on output, -p prints each line after the code runs, and the code appends "0" repeated 58 - length($_) times.)
I'm trying to make a script that creates a file, say file01.txt, and writes a number on each line:
001
002
...
998
999
Then I want to read the file line by line and, for each line, sum its digits and say whether the sum is even or odd. For example,
0+0+1 = 1, which is odd, and
9+9+8 = 26, which is even. So the output should be:
001 odd
002 even
..
998 even
999 odd
I tried
while IFS=read -r line; do sum+=line >> file02.txt; done <file01.txt
but that sums the whole file, not each line.
You can do this fairly easily in bash itself, using built-in parameter expansions to trim the leading zeros from each line before summing the digits for the odd/even test.
When reading from a file (either a named file or stdin by default), you can use a parameter expansion with a default value, so the first positional parameter is used as the filename if given and stdin is used otherwise, e.g.
#!/bin/bash
infile="${1:-/dev/stdin}"    ## read from file provided as $1 or stdin
You then use infile with your while loop, e.g.
while read -r line; do    ## loop reading each line
    ...
done < "$infile"
To trim the leading zeros, first capture the substring of leading zeros by removing everything from the first non-zero digit to the end, e.g.
leading="${line%%[1-9]*}" ## get leading 0's
Now, using the same type of parameter expansion with # instead of %%, trim the leading-zeros substring from the front of line, saving the resulting number in value, e.g.
value="${line#$leading}" ## trim from front
Now zero your sum and loop over the digits in value to obtain the sum of digits:
for ((i = 0; i < ${#value}; i++)); do    ## loop summing digits
    sum=$((sum + ${value:$i:1}))
done
All that remains is your even/odd test. Putting it all together in a short example script that intentionally outputs the sum of digits in addition to your wanted "odd"/"even" output, you could do:
#!/bin/bash

infile="${1:-/dev/stdin}"    ## read from file provided as $1 or stdin

while read -r line; do    ## read each line
    [ "$line" -eq "$line" 2>/dev/null ] || continue    ## validate integer
    leading="${line%%[1-9]*}"    ## get leading 0's
    value="${line#$leading}"     ## trim from front
    sum=0                        ## zero sum
    for ((i = 0; i < ${#value}; i++)); do    ## loop summing digits
        sum=$((sum + ${value:$i:1}))
    done
    printf "%s (sum=%d) - " "$line" "$sum"    ## output line w/sum (temporary output)
    if ((sum % 2 == 0)); then                 ## check odd / even
        echo "even"
    else
        echo "odd"
    fi
done < "$infile"
(Note: you can actually loop over the digits in line and skip removing the leading-zeros substring. The removal ensures that if the whole value is used in arithmetic it isn't interpreted as an octal value -- up to you.)
Example Use/Output
Using a quick process substitution to provide input of 001 - 020 on stdin you could do:
$ ./sumdigitsoddeven.sh < <(printf "%03d\n" {1..20})
001 (sum=1) - odd
002 (sum=2) - even
003 (sum=3) - odd
004 (sum=4) - even
005 (sum=5) - odd
006 (sum=6) - even
007 (sum=7) - odd
008 (sum=8) - even
009 (sum=9) - odd
010 (sum=1) - odd
011 (sum=2) - even
012 (sum=3) - odd
013 (sum=4) - even
014 (sum=5) - odd
015 (sum=6) - even
016 (sum=7) - odd
017 (sum=8) - even
018 (sum=9) - odd
019 (sum=10) - even
020 (sum=2) - even
You can simply remove the "(sum=X)" output once you have confirmed the script operates as you expect, and redirect the output to your new file. Let me know if I understood your question properly and if you have further questions.
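For instance, a sketch of the final wiring (using the script name from the example above): trim the temporary printf down to just the number, then redirect to the file named in the question:
## replace the temporary output line with:
printf "%s " "$line"    ## print just the number and a space

## then run the script against the generated file:
./sumdigitsoddeven.sh file01.txt > file02.txt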
Please try this bash version:
parity=("even" "odd")
while IFS= read -r line; do
    mapfile -t ary < <(fold -w1 <<< "$line")    # one digit per array element
    sum=0
    for i in "${ary[@]}"; do
        (( sum += i ))
    done
    echo "$line" "${parity[sum % 2]}"
done < file01.txt > file02.txt
fold -w1 <<< "$line" breaks the string $line into lines of one character each (one digit per line). mapfile then assigns those lines to the elements of the array ary.
Please note that this bash script is not time-efficient and is not suitable for large inputs.
With GNU awk:
awk -vFS='' '{sum=0; for(i=1;i<=NF;i++) sum+=$i;
print $0, sum%2 ? "odd" : "even"}' file01.txt
The FS awk variable defines the field separator. If it is set to the empty string (this is what the -vFS='' option does) then each character is a separate field.
The rest is trivial: the block between curly braces is executed for each line of the input. It computes the sum of the fields with a for loop (NF is another awk variable; its value is the number of fields of the current record). It then prints the original line ($0) followed by the string "odd" if the sum is odd, else "even".
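For example, the first few lines of output against the file01.txt from the question (deterministic, so shown here):
$ awk -vFS='' '{sum=0; for(i=1;i<=NF;i++) sum+=$i;
  print $0, sum%2 ? "odd" : "even"}' file01.txt | head -3
001 odd
002 even
003 odd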
pure awk:
BEGIN {
    for (i = 1; i <= 999; i++) {
        printf ("%03d\n", i) > ARGV[1]
    }
    close(ARGV[1])
    ARGC = 2
    FS = ""
    result[0] = "even"
    result[1] = "odd"
}
{
    printf("%s: %s\n", $0, result[($1 + $2 + $3) % 2])
}
Processing a file line by line, and doing math, is a perfect task for awk.
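A sketch of the invocation (matching the comparison below): the BEGIN block writes the number list into the file named as the first argument, then the main block reads it back and prints the results on stdout:
$ awk -f solution.awk numlist > result
$ head -n 3 result
001: odd
002: even
003: odd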
pure bash:
set -e

printf '%03d\n' {1..999} > "${1:?no path provided}"

result=(even odd)
mapfile -t num_list < "$1"
for i in "${num_list[@]}"; do
    echo "$i: ${result[(${i:0:1} + ${i:1:1} + ${i:2:1}) % 2]}"
done
A similar method can be applied in bash, but it's slower.
comparison:
bash is about 10x slower.
$ cd ./tmp.Kb5ug7tQTi
$ bash -c 'time awk -f ../solution.awk numlist-awk > result-awk'
real 0m0.108s
user 0m0.102s
sys 0m0.000s
$ bash -c 'time bash ../solution.bash numlist-bash > result-bash'
real 0m0.931s
user 0m0.929s
sys 0m0.000s
$ diff --report-identical result*
Files result-awk and result-bash are identical
$ diff --report-identical numlist*
Files numlist-awk and numlist-bash are identical
$ head -n 5 *
==> numlist-awk <==
001
002
003
004
005
==> numlist-bash <==
001
002
003
004
005
==> result-awk <==
001: odd
002: even
003: odd
004: even
005: odd
==> result-bash <==
001: odd
002: even
003: odd
004: even
005: odd
read is a bottleneck in a while IFS= read -r line loop; the built-in reads input in very small chunks so it can stop exactly at each newline, which makes per-line reads slow.
mapfile (combined with a for loop) can be slightly faster, but is still slow (it also copies all the data into an array first).
Both solutions create the number list in a new file (as the question asked) and print the odd/even results to stdout. The path for the file is given as a single argument.
In awk, you can set the field separator to empty (FS="") to process individual characters.
In bash it can be done with substring expansion (${var:index:length}).
Modulo 2 (number % 2) to get odd or even.
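A quick interactive illustration of those last two points (substring expansion and modulo 2):
$ n=042
$ echo "${n:0:1} + ${n:1:1} + ${n:2:1}"
0 + 4 + 2
$ echo $(( (${n:0:1} + ${n:1:1} + ${n:2:1}) % 2 ))
0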
Sorry in advance for the beginner question, but I'm quite stuck and keen to learn.
I am trying to echo a string (in hex) and then cut a piece of it with the cut command. It looks like this:
for y in "${Offset}"; do
echo "${entry}" | cut -b 60-$y
done
Where echo ${Offset} results in
75 67 69 129 67 567 69
I would like each entry to be printed, and then cut from the 60th byte until the respective number in $Offset.
So the first entry would be cut 60-75.
However, I get an error:
cut: 67: No such file or directory
cut: 69: No such file or directory
cut: 129: No such file or directory
cut: 67: No such file or directory
cut: 567: No such file or directory
cut: 69: No such file or directory
I tried adding/removing parentheses around each variable but never got the right result.
Any help will be appreciated!
UPDATE: I updated the code with changes from markp-fuso. However, this code still does not work as intended. I would like to print every entry based on its respective offset, but it goes wrong: it prints every entry seven times, once for each of the seven offsets. Any ideas on how to fix this?
#!/bin/bash

MESSAGES=$( sqlite3 -csv file.db 'SELECT quote(data) FROM messages' | tr -d "X'" )

for entry in ${MESSAGES}; do
    Offset='75 67 69 129 67 567 69'
    for y in $Offset; do
        echo "${entry:59:(y-59)}"
    done
done
echo ${MESSAGES} results in seven strings, with a minimum length of 80 bytes and a maximum of 600.
My output should be:
String one: cut by first offset
String two: cut by second offset
and so on...
In order for for to iterate over each space-separated word in $Offset, you need to get rid of the quotes, which make the whole string read as a single item:
for y in ${Offset}; do
    echo "${entry}" | cut -b 60-$y
done
To eliminate the sub-process that the | cut ... pipeline invokes, we can look at a comparable parameter expansion solution ...
Quick reminder on how to extract a substring from a variable:
${variable:start_position:length}
Keeping in mind that the first character in ${variable} is in position zero/0.
Next, we need to convert each individual offset (y) into a 'length':
length=$((y-60+1))
Rolling these changes into your code (and removing the quotes from around ${Offset}) gives us:
for y in ${Offset}
do
start=$((60-1))
length=$((y-60+1))
echo "${entry:${start}:${length}}"
#echo "${entry:59:(y-59)}"
done
NOTE: You can also replace the start/length/echo with the single commented-out echo.
Using a smaller data set for demo purposes, and using 3 (instead of 60) as the start of our extraction:
# base-10 character position
# 1 2
# 123456789012345678901234567
$ entry='123456789ABCDEFGHIabcdefghi'
$ echo ${#entry} # length of entry?
27
$ Offset='5 8 10 13 20'
$ for y in ${Offset}
do
    start=$((3-1))
    length=$((y-3+1))
    echo "${entry:${start}:${length}}"
done
345 # 3-5
345678 # 3-8
3456789A # 3-10
3456789ABCD # 3-13
3456789ABCDEFGHIab # 3-20
And consolidating the start/length/echo into a single echo:
$ for y in ${Offset}
do
    echo "${entry:2:(y-2)}"
done
345 # 3-5
345678 # 3-8
3456789A # 3-10
3456789ABCD # 3-13
3456789ABCDEFGHIab # 3-20
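Regarding the UPDATE: to print each entry cut by its own offset, rather than applying all seven offsets to every entry, one sketch is to read entries and offsets into parallel arrays and pair them by index. This assumes, as in the update, that ${MESSAGES} splits on whitespace into exactly as many entries as there are offsets:
entries=( ${MESSAGES} )    # split MESSAGES into an array (assumes no spaces inside an entry)
offsets=( 75 67 69 129 67 567 69 )

for idx in "${!entries[@]}"; do    # iterate by index so entry and offset stay paired
    y=${offsets[idx]}
    echo "${entries[idx]:59:(y - 59)}"
done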
So what I'm trying to do is this: I've been using keybr.com to sharpen my typing skills, and on this site you can "provide your own custom text." I've been taking chapters out of books to type so it's a little more interesting than just typing groups of letters. Now I also want to insert numbers into the text: specifically, between each word, something like "393", with random sets smaller and larger than that example.
So I have saved a chapter of a book into a file in my home folder. Now I just need a command to search for spaces and insert a group of numbers plus a space, so a sentence would look like this: The 293 dog 328 is 102 black. 334 The... etc.
I have looked up Linux commands through search engines and I've found out how to replace strings in text files with:
sed -i 's/original/new/g' file.txt
and how to generate random numbers with:
$ shuf -i MIN-MAX -n COUNT
I just cannot figure out how to build a one-line command that puts random numbers between each word. I'm still searching, so thanks to anyone who takes the time to read my problem.
Perl to the rescue!
perl -pe 's/ /" " . (100 + int rand 900) . " "/ge' < input.txt > output.txt
-p reads the input line by line; after reading a line, it runs the code and prints the line to the output
s/// is similar to the substitution you know from sed
/g means global, i.e. it substitutes as many times as possible
/e means the replacement part is a code to run. In this case, the code generates a random number (100-999).
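If you also want digit groups shorter and longer than three digits, as the question suggests, widening the range is a small tweak (a sketch; here 10-9999, adjust to taste):
perl -pe 's/ /" " . (10 + int rand 9990) . " "/ge' < input.txt > output.txt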
Given:
$ echo "$txt"
Here is some random words. Please
insert a number a space between each one.
Here is a simple awk to do that:
$ echo "$txt" | awk '{for (i=1;i<=NF;i++) printf "%s %d ", $i, rand()*100; print ""}'
Here 92 is 59 some 30 random 57 words. 74 Please 78
insert 43 a 33 number 77 a 10 space 78 between 83 each 76 one. 49
And here is roughly the same thing in pure Bash:
while read -r line; do
    for word in $line; do
        printf "%s %s " "$word" "$((1 + RANDOM % 100))"
    done
    echo
done < <(echo "$txt")
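And tying back to the shuf command the question already found, a variant that uses shuf for each number (a sketch: noticeably slower, since it forks one shuf process per word; book.txt stands in for your saved chapter file):
while read -r line; do
    for word in $line; do
        printf "%s %s " "$word" "$(shuf -i 100-999 -n 1)"    # random 3-digit number per word
    done
    echo
done < book.txt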
Suppose I have a file containing numbers like:
1 4 7
2 5 8
and I want to add 1 to all these numbers, making the output like:
2 5 8
3 6 9
Is there a simple one-line command (e.g. awk) to achieve this?
Try the following; the for loop adds 1 to every field, and the trailing 1 tells awk to print the modified line:
awk '{for(i=1;i<=NF;i++){$i=$i+1}} 1' Input_file
EDIT: As per the OP's request for a version without a loop, here is a solution (written as per the shown sample only), with the number of fields hardcoded:
awk -v RS='[ \n]' '{ORS=NR%3==0?"\n":" ";print $0+1}' Input_file
OR
Without hardcoding the number of fields:
awk -v RS='[ \n]' -v col=$(awk 'FNR==1{print NF}' Input_file) '{ORS=NR%col==0?"\n":" ";print $0+1}' Input_file
Explanation: In the first EDIT solution I hardcoded the number of fields by writing 3 there; in the OR solution I instead create a variable named col, which reads only the very first line of Input_file to get the number of fields, so the whole file is not read twice. As for the code itself: the record separator is set to a space or a newline, so each number becomes its own record and 1 is added to it without a loop, printing a space after each incremented value. A newline is printed only when the record number is exactly divisible by the value of col (which is why the field count was captured in the -v col section).
In native bash (no awk or other external tool needed):
#!/usr/bin/env bash
while read -r -a nums; do          # read a line into an array, splitting on spaces
    out=( )                        # initialize an empty output array for that line
    for num in "${nums[@]}"; do    # iterate over the input array...
        out+=( "$(( num + 1 ))" )  # ...and add n+1 to the output array.
    done
    printf '%s\n' "${out[*]}"      # then print that output array with a newline following
done <in.txt >out.txt              # with input from in.txt and output to out.txt
You can do this using GNU awk:
awk -v RS="[[:space:]]+" '{$0++; ORS=RT} 1' file
2 5 8
3 6 9
Setting RS to a run of whitespace makes each number its own record; ORS=RT (RT is a gawk-only variable holding the text that actually matched RS) reuses the original separator, so spacing and line breaks are preserved.
If you don't mind Perl:
perl -pe 's/(\d+)/$1+1/eg' file
Substitute any run of digits (\d+) with that number ($1) plus 1. /e means to evaluate the replacement as code, and /g means globally throughout each line.
As mentioned in the comments, the above only works for positive integers, per the OP's original sample file. If you wanted it to work with negative numbers and decimals, while still retaining text and spacing, you could go for something like this:
perl -pe 's/([-]?[.0-9]+)/$1+1/eg' file
Input file
Some column headers # words
1 4 7 # a comment
2 5 cat dog # spacing and stray words
+5 0 # plus sign
-7 4 # minus sign
+1000.6 # positive decimal
-21.789 # negative decimal
Output
Some column headers # words
2 5 8 # a comment
3 6 cat dog # spacing and stray words
+6 1 # plus sign
-6 5 # minus sign
+1001.6 # positive decimal
-20.789 # negative decimal
I have a file with a bunch of text in it, separated by newlines:
ex.
"This is sentence 1.\n"
"This is sentence 2.\n"
"This is sentence 3. It has more characters then some other ones.\n"
"This is sentence 4. Again it also has a whole bunch of characters.\n"
I want to use some set of command-line tools that will, for each line, count the number of characters, and then, if there are more than X characters on that line, split on periods (".") and count the number of characters in each element of the split line.
ex. of final output, by line number:
1. 24
2. 24
3. 69: 20, 49 (i.e. "This is sentence 3" has 20 characters, "It has more characters then some other ones" has 49 characters)
wc only takes a file name as input, so I'm having trouble directing it to take in a text string to do the character count on.
head -n2 processed.txt | tr "." "\n" | xargs -0 -I line wc -m line
gives me the error: ": open: No such file or directory"
awk is perfect for this. The code below should get you started and you can work out the rest:
awk -F. '{print length($0),NF,length($1)}' yourfile
Output:
23 2 19
23 2 19
68 3 19
70 3 19
It uses a period as the field separator (-F.), prints the length of the whole line ($0), the number of fields (NF), and the length of the first field ($1).
Here is another little example. Note that because the loop starts at i=0 it first prints the length of the whole line (length($0)), and because it stops at i<NF the last field is skipped:
awk -F. '{print $0;for(i=0;i<NF;i++)print length($i)}' yourfile
"This is sentence 1.\n"
23
19
"This is sentence 2.\n"
23
19
"This is sentence 3. It has more characters then some other ones.\n"
68
19
44
"This is sentence 4. Again it also has a whole bunch of characters.\n"
70
19
46
By the way, "wc" can process strings sent to its stdin like this:
echo -n "Hello" | wc -c
5
How about:
head -n2 processed.txt | tr "." "\n" | wc -m
You should first get a better understanding of what xargs does and how pipes work; do look up a good tutorial on those before using them =).
xargs passes each line separately to the next utility as arguments. That is not what you want here: you want wc to read all the lines on its stdin, so just pipe the entire output of tr to it.
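If the goal is one count per piece rather than a single total, a small sketch that loops over the split pieces (reusing the head/tr pipeline from above):
head -n2 processed.txt | tr '.' '\n' | while IFS= read -r part; do
    printf '%s' "$part" | wc -m    # character count of each piece, without a trailing newline
done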