So what i'm trying to do is this: I've been using keybr.com to sharpen my typing skills and on this site you can "provide your own custom text." Now i've been taking chapters out of books to type so its a little more interesting than just typing groups of letters. Now I want to also insert numbers into the text. Specifically, between each word have something like "393" and random sets smaller and larger than that example.
so i have saved a chapter of a book into a file in my home folder. Now i just need a command to search for spaces and input a group of numbers and add a space so a sentence would look like this: The 293 dog 328 is 102 black. 334 The... etc.
I have looked up linux commands through search engines and i've found out how to replace strings in text files with:
sed -i 's/original/new/g' file.txt
and how to generate random numbers with:
$ shuf -i MIN-MAX -n COUNT
i just can not figure out how to output a one line command that will have random numbers between each word. I'm still-a-searching so thanks to anyone that takes the time to read my problem.
Perl to the rescue!
perl -pe 's/ /" " . (100 + int rand 900) . " "/ge' < input.txt > output.txt
-p reads the input line by line, after reading a line, it runs the code and prints the line to the output
s/// is similar to the substitution you know from sed
/g means global, i.e. it substitutes as many times as possible
/e means the replacement part is a code to run. In this case, the code generates a random number (100-999).
Given:
$ echo "$txt"
Here is some random words. Please
insert a number a space between each one.
Here is a simple awk to do that:
$ echo "$txt" | awk '{for (i=1;i<=NF;i++) printf "%s %d ", $i, rand()*100; print ""}'
Here 92 is 59 some 30 random 57 words. 74 Please 78
insert 43 a 33 number 77 a 10 space 78 between 83 each 76 one. 49
And here is roughly the same thing in pure Bash:
while read -r line; do
for word in $line; do
printf "%s %s" "$word $((1+$RANDOM % 100))"
done
echo
done < <(echo "$txt")
Related
I was trying to solve one of my old assignment I am literally stuck in this one Can anyone help me?
There is a file called "datafile". This file has names of some friends and their
ages. But unfortunately, the names are not in the correct format. They should be
lastname, firstname
But, by mistake they are firstname,lastname
The task of the problem is writing a shell script called fix_datafile
to correct the problem, and sort the names alphabetically. The corrected filename
is called datafile.fix .
Please make sure the original structure of the file should be kept untouched.
The following is the sample of datafile.fix file:
#personal information
#******** Name ********* ***** age *****
Alexanderovich,Franklin 47
Amber,Christine 54
Applesum,Franky 33
Attaboal,Arman 18
Balad,George 38
Balad,Sam 19
Balsamic,Shery 22
Bojack,Steven 33
Chantell,Alex 60
Doyle,Jefry 45
Farland,Pamela 40
Handerman,jimmy 23
Kashman,Jenifer 25
Kasting,Ellen 33
Lorux,Allen 29
Mathis,Johny 26
Maxter,Jefry 31
Newton,Gerisha 40
Osama,Franklin 33
Osana,Gabriel 61
Oxnard,George 20
Palomar,Frank 24
Plomer,Susan 29
Poolank,John 31
Rochester,Benjami 40
Stanock,Verona 38
Tenesik,Gabriel 29
Whelsh,Elsa 21
If you can use awk (I suppose you can), than this there's a script which does what you need:
#!/bin/bash
RESULT_FILE_NAME="datafile.new"
cat datafile.fix | head -4 > datafile.new
cat datafile.fix | tail -n +5 | awk -F"[, ]" '{if(!$2){print()}else{print($2","$1, $3)}}' >> datafile.new
Passing -F"[, ]" allows awk to split columns both by , and space and all that remains is just print columns in a needed format. The downsides are that we should use if statement to preserve empty lines and file header also should be treated separately.
Another option is using sed:
cat datafile.fix | sed -E 's/([a-zA-Z]+),([a-zA-Z]+) ([0-9]+)/\2,\1 \3/g' > datafile.new
The downside is that it requires regex that is not as obvious as awk syntax.
awk -F[,\ ] '
!/^$/ && !/^#/ {
first=$1;
last=$2;
map[first][last]=$0
}
END {
PROCINFO["sorted_in"]="#ind_str_asc";
for (i in map) {
for (j in map[i])
{
print map[i][j]
}
}
}' namesfile > datafile.fix
One liner:
awk -F[,\ ] '!/^$/ && !/^#/ { first=$1;last=$2;map[first][last]=$0 } END { PROCINFO["sorted_in"]="#ind_str_asc";for (i in map) { for (j in map[i]) { print map[i][j] } } }' namesfile > datafile.fix
A solution completely in gawk.
Set the field separator to both , and space. Then ignore any lines that are empty or start with #. Mark the first and last variables based on the delimited fields and then create a two dimensional array called map indexed by first and last name and the value equal to the line. At the end, set the sort to indices string ascending and loop through the array printing the names in order as requested.
Completely in bash:
re="^[[:space:]]*([^#]([[:space:]]|[[:alpha:]])+),(([[:space:]]|[[:alpha:]])*[[:alpha:]]) *([[:digit:]]+)"
while read line
do
if [[ ${line} =~ $re ]]
then
echo ${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}
else
echo "${line}"
fi
done < names.txt
The core of this is to capture, using bash regex matching (=~ operator of the [[ command), parenthesis groupings, and the BASH_REMATCH array, the name before the comma (([^#]([[:space:]]|[[:alpha:]])+)), the name after the comma ((([[:space:]]|[[:alpha:]])*[[:alpha:]])), and the age ( *([[:digit:]]+)). The first-name regex is constructed so as to exclude comments, and the last-name regex is constructed as to handle multiple spaces before the age without including them in the name. Preconditions: Commented lines with or without leading spaces (^[[:space:]]*([^#]), or lines without a comma, are passed through unchanged. Either first names or last names may have internal spaces. Once the last name and first name are isolated, it is easy to print them in reverse order followed by the age (echo ${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}). Note that the letter/space groupings are counted as matches which is why we skip 2 and 4.
I have tried using awk and sed.
Try if this works
less dataflie.fix | sed 's/ /,/g' | awk -F "," '{print $2,$1,$3}' | sed 's/ /,/' | sed 's/^,//' | sort -u > dataflie_new.fix
Sorry in advance for the beginner question, but I'm quite stuck and keen to learn.
I am trying to echo a string (in hex) and then cut a piece of that with cut command. It looks like this:
for y in "${Offset}"; do
echo "${entry}" | cut -b 60-$y
done
Where echo ${Offset} results in
75 67 69 129 67 567 69
I would like each entry to be printed, and then cut from the 60th byte until the respective number in $Offset.
So the first entry would be cut 60-75.
However, I get an error:
cut: 67: No such file or directory
cut: 69: No such file or directory
cut: 129: No such file or directory
cut: 67: No such file or directory
cut: 567: No such file or directory
cut: 69: No such file or directory
I tried adding/removing parentheses around each variable but never got the right result.
Any help will be appreciated!
UPDATE: updated the code with changed from markp-fuso. However, this codes still does not work as intended. I would like to print every entry based on the respective offset, but it goes wrong. This prints every entry seven times, where each time is based on seven different offsets. Any ideas on how to fix this?
#!/bin/bash
MESSAGES=$( sqlite3 -csv file.db 'SELECT quote(data) FROM messages' | tr -d "X'" )
for entry in ${MESSAGES}; do
Offset='75 67 69 129 67 567 69'
for y in $Offset; do
echo "${entry:59:(y-59)}"
done
done
echo ${MESSAGES}
Results in seven strings with minimal length 80 bytes and max 600.
My output should be:
String one: cut by first offset
String two: cut by second offset
and so on...
In order for for to iterate over each space-separated "word" in $Offset, you need to get rid of the quotes, which are making it read as a single variable.
for y in ${Offset}; do
echo "${entry}" | cut -b 60-$y
done
To eliminate the sub-process that's going to be invoked due to the | cut ..., we could look at a comparable parameter expansion solution ...
Quick reminder on how to extract a substring from a variable:
${variable:start_position:length}
Keeping in mind that the first character in ${variable} is in position zero/0.
Next, we need to convert each individual offset (y) into a 'length':
length=$((y-60+1))
Rolling these changes into your code (and removing the quotes from around ${Offset}) gives us:
for y in ${Offset}
do
start=$((60-1))
length=$((y-60+1))
echo "${entry:${start}:${length}}"
#echo "${entry:59:(y-59)}"
done
NOTE: You can also replace the start/length/echo with the single commented-out echo.
Using a smaller data set for demo purposes, and using 3 (instead of 60) as the start of our extraction:
# base-10 character position
# 1 2
# 123456789012345678901234567
$ entry='123456789ABCDEFGHIabcdefghi'
$ echo ${#entry} # length of entry?
27
$ Offset='5 8 10 13 20'
$ for y in ${Offset}
do
start=$((3-1))
length=$((y-3+1))
echo "${entry:${start}:${length}}"
done
345 # 3-5
345678 # 3-8
3456789A # 3-10
3456789ABCD # 3-13
3456789ABCDEFGHIab # 3-20
And consolidating the start/length/echo into a single echo:
$ for y in ${Offset}
do
echo "${entry:2:(y-2)}"
done
345 # 3-5
345678 # 3-8
3456789A # 3-10
3456789ABCD # 3-13
3456789ABCDEFGHIab # 3-20
I have a little script to extract specific data and cleanup the output a little. It seems overly messy and i'm wondering if the script can be trimmed down a bit.
The input file contains of pairs of lines -- names, followed by numbers.
Line pairs where the numeric value is not between 80 and 199 should be discarded.
Pairs may sometimes, but will not always, be preceded or followed by blank lines, which should be ignored.
Example input file:
al12t5682-heapmemusage-latest.log
38
al12t5683-heapmemusage-latest.log
88
al12t5684-heapmemusage-latest.log
100
al12t5685-heapmemusage-latest.log
0
al12t5686-heapmemusage-latest.log
91
Example/wanted output:
al12t5683 88
al12t5684 100
al12t5686 91
Current script:
grep --no-group-separator -PxB1 '([8,9][0-9]|[1][0-9][0-9])' inputfile.txt \
| sed 's/-heapmemusage-latest.log//' \
| awk '{$1=$1;printf("%s ",$0)};NR%2==0{print ""}'
Extra input example
al14672-heapmemusage-latest.log
38
al14671-heapmemusage-latest.log
5
g4t5534-heapmemusage-latest.log
100
al1t0000-heapmemusage-latest.log
0
al1t5535-heapmemusage-latest.log
al1t4676-heapmemusage-latest.log
127
al1t4674-heapmemusage-latest.log
53
A1t5540-heapmemusage-latest.log
54
G4t9981-heapmemusage-latest.log
45
al1c4678-heapmemusage-latest.log
81
B4t8830-heapmemusage-latest.log
76
a1t0091-heapmemusage-latest.log
88
al1t4684-heapmemusage-latest.log
91
Extra Example expected output:
g4t5534 100
al1t4676 127
al1c4678 81
a1t0091 88
al1t4684 91
another awk
$ awk -F- 'NR%2{p=$1; next} 80<=$1 && $1<=199 {print p,$1}' file
al12t5683 88
al12t5684 100
al12t5686 91
UPDATE
for the empty line record delimiter
$ awk -v RS= '80<=$2 && $2<=199{sub(/-.*/,"",$1); print}' file
al12t5683 88
al12t5684 100
al12t5686 91
Consider implementing this in native bash, as in the following (which can be seen running with your sample input -- including sporadically-present blank lines -- at http://ideone.com/Qtfmrr):
#!/bin/bash
name=; number=
while IFS= read -r line; do
[[ $line ]] || continue # skip blank lines
[[ -z $name ]] && { name=$line; continue; } # first non-blank line becomes name
number=$line # second one becomes number
if (( number >= 80 && number < 200 )); then
name=${name%%-*} # prune everything after first "-"
printf '%s %s\n' "$name" "$number" # emit our output
fi
name=; number= # clear the variables
done <inputfile.txt
The above uses no external commands whatsoever -- so whereas it might be slower to run over large input than a well-implemented awk or perl script, it also has far shorter startup time since no interpreter other than the already-running shell is required.
See:
BashFAQ #1 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?, describing the while read idiom.
BashFAQ #100 - How do I do string manipulations in bash?; or The Bash-Hackers' Wiki on parameter expansion, describing how name=${name%%-*} works.
The Bash-Hackers' Wiki on arithmetic expressions, describing the (( ... )) syntax used for numeric comparisons.
perl -nle's/-.*//; $n=<>; print "$_ $n" if 80<=$n && $n<=199' inputfile.txt
With gnu sed
sed -E '
N
/\n[8-9][0-9]$/bA
/\n1[0-9]{2}$/!d
:A
s/([^-]*).*\n([0-9]+$)/\1 \2/
' infile
I have a file with a bunch of text in it, separated by newlines:
ex.
"This is sentence 1.\n"
"This is sentence 2.\n"
"This is sentence 3. It has more characters then some other ones.\n"
"This is sentence 4. Again it also has a whole bunch of characters.\n"
I want to be able to use some set of command line tools that will, for each line, count the number of characters in each line, and then, if there are more than X characters per that line, split on periods (".") and then count the number of characters in each element of the split line.
ex. of final output, by line number:
1. 24
2. 24
3. 69: 20, 49 (i.e. "This is sentence 3" has 20 characters, "It has more characters then some other ones" has 49 characters)
wc only takes as input a file name, so I'm having trouble directing it it to take in a text string to do character count on
head -n2 processed.txt | tr "." "\n" | xargs -0 -I line wc -m line
gives me the error: ": open: No such file or directory"
awk is perfect for this. The code below should get you started and you can work out the rest:
awk -F. '{print length($0),NF,length($1)}' yourfile
Output:
23 2 19
23 2 19
68 3 19
70 3 19
It uses a period as the field separator (-F.), prints the length of the whole line ($0), the number of fields (NF), and the length of the first field ($1).
Here is another little example that prints the whole line and the length of each field:
awk -F. '{print $0;for(i=0;i<NF;i++)print length($i)}' yourfile
"This is sentence 1.\n"
23
19
"This is sentence 2.\n"
23
19
"This is sentence 3. It has more characters then some other ones.\n"
68
19
44
"This is sentence 4. Again it also has a whole bunch of characters.\n"
70
19
46
By the way, "wc" can process strings sent to its stdin like this:
echo -n "Hello" | wc -c
5
How about:
head -n2 processed.txt | tr "." "\n" | wc -m line
You should understand better what xargs does and how pipes work. Do google for a good tutorial on those before using them =).
xargs passes each line separately to the next utility. This is not what you want: you want wc to get all the lines here. So just pipe the entire output of tr to it.
I have a question. I have a file with coordinates (TAB separated)
2 10
35 50
90 200
400 10000
...
I would like to substract the first column of the second line from the second column of the fist line , i.e. calculate the distance, i.e. I would like a file with
25
40
200
...
How could I do that using awk???
Thank you very much in advance
here is an awk one-liner may help you:
kent$ awk 'a{print $1-a}{a=$2}' file
25
40
200
Here's a pure bash solution:
{
read _ ps
while read f s; do
echo $((f-ps))
((ps=s))
done
} < input_file
This only works if you have (small) integers, as it uses bash's arithmetic. If you want to deal with arbitrary sized integers or floats, you can use bc (with only one fork):
{
read _ ps
while read f s; do
printf '%s-%s\n' "$f" "$ps"
ps=$s
done
} < input_file | bc
Now I leave the others give an awk answer!
Alright, since nobody wants to upvote my answer, here's a really funny solution that uses bash and bc:
a=( $(<input_file) )
printf -- '-(%s)+(%s);\n' "${a[#]:1:${#a[#]}-2}" | bc
or the same with dc (shorter but doesn't work with negative numbers):
a=( $(<input_file) )
printf '%s %sr-pc' "${a[#]:1:${#a[#]}-2}" | dc
using sed and ksh for evaluation
sed -n "
1x
1!H
$ !b
x
s/^ *[0-9]\{1,\} \(.*\) [0-9]\{1,\} *\n* *$/\1 /
s/\([0-9]\{1,\}\)\(\n\)\([0-9]\{1,\}\) /echo \$((\3 - \1))\2/g
s/\n *$//
w /tmp/Evaluate.me
"
. /tmp/Evaluate.me
rm /tmp/Evaluate.me