Sed capitalize first letter of word in key-value pair - bash

I'm currently working on my karaoke files and I see a lot of non-capitalized words.
The .txt files are structured as key-value pairs, and I was wondering how to capitalize the first letter of every word in the value.
Example txt:
#TITLE:fire and Water
#ARTIST:Some band
#CREATOR:yunho
#LANGUAGE:Korean
#EDITION:UAS
#MP3:2NE1 - Fire.mp3
#COVER:2NE1 - Fire.jpg
#VIDEO:2NE1 - Fire.avi
#VIDEOGAP:11.6
#BPM:595
#GAP:3860
F -4 4 16 I
F 2 4 16 go
F 8 6 16 by
F 16 4 16 the
F 22 6 16 name
F 30 4 16 of
F 36 10 16 C
F 46 10 16 L
F 58 6 16 of
F 66 5 16 2
F 71 3 16 N
F 74 4 16 E
F 78 18 16 1
I'd like to capitalize the words after the keys TITLE, ARTIST, LANGUAGE and EDITION,
so for the example txt:
#TITLE:**F**ire **A**nd **W**ater
#ARTIST:**S**ome **B**and
#CREATOR:yunho
#LANGUAGE:**K**orean
#EDITION:**U**AS
#MP3:2NE1 - Fire.mp3
#COVER:2NE1 - Fire.jpg
#VIDEO:2NE1 - Fire.avi
#VIDEOGAP:11.6
#BPM:595
#GAP:3860
F -4 4 16 I
F 2 4 16 go
F 8 6 16 by
F 16 4 16 the
F 22 6 16 name
F 30 4 16 of
F 36 10 16 C
F 46 10 16 L
F 58 6 16 of
F 66 5 16 2
F 71 3 16 N
F 74 4 16 E
F 78 18 16 1
Another thing is that I have loads of these txt files, each in its own directory. I want to run the program from the parent directory, recursively, for all *.txt files.
Example directories:
Library/Some Band/Some Band - Some Song/some txt file.txt
Library/Some Band2/Some Band2 - Some Song/sometxtfile.txt
Library/Some Band3/Some Band3 - Some Song/some3333 txt file.txt
I've tried to do so with find . -name '*.txt' -exec sed -i command {} +
but I got stuck on the search and replace with sed. Would anyone care to help me out?

You can use this GNU sed command to uppercase the first letter of each word on matching lines:
sed -E '/^#(TITLE|ARTIST|LANGUAGE|EDITION):/s/\b([a-z])/\u\1/g' file
#TITLE:Fire And Water
#ARTIST:Some Band
#CREATOR:yunho
#LANGUAGE:Korean
#EDITION:UAS
#MP3:2NE1 - Fire.mp3
#COVER:2NE1 - Fire.jpg
#VIDEO:2NE1 - Fire.avi
#VIDEOGAP:11.6
#BPM:595
#GAP:3860
F -4 4 16 I
F 2 4 16 go
F 8 6 16 by
F 16 4 16 the
F 22 6 16 name
F 30 4 16 of
F 36 10 16 C
F 46 10 16 L
F 58 6 16 of
F 66 5 16 2
F 71 3 16 N
F 74 4 16 E
F 78 18 16 1
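One caveat with \b is that it also matches right after an apostrophe, so a title such as "don't stop" would come out as "Don'T Stop". A minimal sketch of a variant follows, assuming GNU sed (for \u) and assuming that a word only ever starts after the key's colon or after whitespace. The name capitalize_values is a hypothetical helper, not part of the answer above.

```shell
# capitalize_values [FILE...]: hypothetical helper; needs GNU sed for \u.
# Uppercases a lowercase letter only when it follows ':' or whitespace,
# so the "t" in "don't" is left alone.
capitalize_values() {
    sed -E '/^#(TITLE|ARTIST|LANGUAGE|EDITION):/s/(:|[[:space:]])([a-z])/\1\u\2/g' "$@"
}

# Demo: the TITLE line is rewritten, the CREATOR line is not.
printf '%s\n' "#TITLE:don't stop me now" '#CREATOR:yunho' | capitalize_values
```

The trade-off is that hyphenated words such as rock-n-roll only get their first letter capitalized, whereas the \b version capitalizes every part.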
For find + sed command use:
find . -name '*.txt' -exec \
sed -E -i '/^#(TITLE|ARTIST|LANGUAGE|EDITION):/s/\b([a-z])/\u\1/g' {} +
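Because -i rewrites the files directly, it may be worth keeping backups on the first run. A small sketch under that assumption: capitalize_tags is a hypothetical wrapper name, and the .bak suffix is an arbitrary choice; GNU sed is assumed as above.

```shell
# capitalize_tags [DIR]: hypothetical wrapper around the find + sed command.
# -i.bak makes GNU sed keep the original of every edited file as FILE.bak.
capitalize_tags() {
    find "${1:-.}" -type f -name '*.txt' -exec \
        sed -E -i.bak '/^#(TITLE|ARTIST|LANGUAGE|EDITION):/s/\b([a-z])/\u\1/g' {} +
}
```

It could be run as capitalize_tags Library from the parent directory. Once the results look right, the backups can be removed with find Library -name '*.txt.bak' -delete.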

Related

Regularly spaced numbers between bounds without jot

I want to generate a sequence of integer numbers between 2 included bounds. I tried with seq, but I could only get the following:
$ low=10
$ high=100
$ n=8
$ seq $low $(( (high-low) / (n-1) )) $high
10
22
34
46
58
70
82
94
As you can see, the 100 is not included in the sequence.
I know that I can get something like that using jot:
$ jot 8 10 100
10
23
36
49
61
74
87
100
But the server I use does not have jot installed, and I do not have permission to install it.
Is there a simple method that I could use to reproduce this behaviour without jot?
If you don't mind launching an extra process (bc) and if it's available on that machine, you could also do it like this:
$ seq -f'%.f' 10 $(bc <<<'scale=2; (100 - 10) / 7') 100
10
23
36
49
61
74
87
100
Or, building on oguz ismail's idea (but using a precision of 4 decimal places):
$ declare -i low=10
$ declare -i high=100
$ declare -i n=8
$ declare incr=$(( (${high}0000 - ${low}0000) / (n - 1) ))
$
$ incr=${incr::-4}.${incr: -4}
$
$ seq -f'%.f' "$low" "$incr" "$high"
10
23
36
49
61
74
87
100
You can try this naive implementation of jot:
jot_naive() {
local -i reps=$1 begin=${2}00 ender=${3}00
local -i x step='(ender - begin) / (reps - 1)'
for ((x = begin; x <= ender; x += step)); do
printf '%.f\n' ${x::-2}.${x: -2}
done
}
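Another option is to compute each value directly from its index instead of accumulating a step, which avoids both the external process and the fixed-point string slicing. This is only a sketch: seq_between is a hypothetical name, and it assumes N >= 2 and LOW <= HIGH with integer arguments.

```shell
# seq_between N LOW HIGH: hypothetical helper using integer arithmetic only.
# Assumes N >= 2 and LOW <= HIGH.
# Value number i (counting from 0) is LOW + round(i * span / (N - 1)).
seq_between() {
    local n=$1 low=$2 high=$3 i=0
    local span=$((high - low)) denom=$((n - 1))
    while [ "$i" -lt "$n" ]; do
        # Adding denom before dividing by 2 * denom rounds to nearest (halves up).
        echo $(( low + (2 * i * span + denom) / (2 * denom) ))
        i=$((i + 1))
    done
}

seq_between 8 10 100
```

For 8 values between 10 and 100 this prints 10, 23, 36, 49, 61, 74, 87, 100, the same values as the jot output in the question. Whether it matches jot's rounding for every input has not been checked.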
You could use awk for that (note that int() truncates, so some values are one lower than jot's rounded output):
awk -v reps=8 -v begin=10 -v end=100 '
BEGIN{
step = (end - begin) / (reps-1);
for ( f = i = begin; i <= end; i = int(f += step) )
print i
}
'
10
22
35
48
61
74
87
100
UPDATE 1: fixed double-printing of the final row when the remaining difference is smaller than a tiny epsilon.
To keep the direction consistent, rounding is based on the sign of the final value:
e.g. if the final value is negative, any rounding is done as if the current step value (CurrSV) were negative, regardless of the actual sign of CurrSV.
While I haven't tested every possible edge case, I believe this version of the code handles both positive and negative rounding properly, for the most part.
That said, this isn't a jot replacement at all: it implements only a very small subset of the step-counting feature rather than being a full-blown clone of it:
{m,g}awk '
function __________(_) {
return -_<+_?_:-_
}
BEGIN {
CONVFMT = "%.250g"; OFMT = "%.13f"
_____ = (_+=_^=_______=______="")^-_^!_
} {
____ = (((_=$(__=(___=$NF)^(_<_)))^(_______=______="")*___\
)-(__=$++__))/--_
_________ = (_____=(-(_^=_<_))^(________=+____<-____\
)*(_/++_))^(++_^_++-+_*_--+-_)
if (-___<=+___) {
_____=__________(_____)
_________=__________(_________)
}
do { print ______,
++_______, int(__+_____), -____+(__+=____)
} while(________? ___<(__-_________) : (__+_________)<___)
print ______, ++_______, int(___+_____), ___, ORS
}' <<< $'8 -3 -100\n8 10 100\n5 -15 -100\n5 15 100\n11 100 11\n10 100 11'
Output:
1 -3 -3
2 -17 -16.8571428571429
3 -31 -30.7142857142857
4 -45 -44.5714285714286
5 -58 -58.4285714285714
6 -72 -72.2857142857143
7 -86 -86.1428571428572
8 -100 -100
1 10 10
2 23 22.8571428571429
3 36 35.7142857142857
4 49 48.5714285714286
5 61 61.4285714285714
6 74 74.2857142857143
7 87 87.1428571428572
8 100 100
1 -15 -15
2 -36 -36.2500000000000
3 -58 -57.5000000000000
4 -79 -78.7500000000000
5 -100 -100
1 15 15
2 36 36.2500000000000
3 58 57.5000000000000
4 79 78.7500000000000
5 100 100
1 100 100
2 91 91.1000000000000
3 82 82.2000000000000
4 73 73.3000000000000
5 64 64.4000000000000
6 55 55.5000000000000
7 47 46.6000000000000
8 38 37.7000000000000
9 29 28.8000000000000
10 20 19.9000000000000
11 11 11
1 100 100
2 90 90.1111111111111
3 80 80.2222222222222
4 70 70.3333333333333
5 60 60.4444444444445
6 51 50.5555555555556
7 41 40.6666666666667
8 31 30.7777777777778
9 21 20.8888888888889
10 11 11

Alternating columns in two files and make one file

I have two text files (tsv format), each with 240 columns and 100 lines. I would like to interleave their columns, alternating between the two files, and make one file (480 columns and 100 lines). How could I achieve this goal with standard command line tools in Linux?
Example (in case of a single line) :
FileA:
1 2 3 4 5 ・・・
FileB:
001 002 003 004 005 ・・・
Expected Result:
1 001 2 002 3 003 ・・・
Just awk with getline:
==> file1 <==
a b c d e f g h i j k l m
n o p q r s t u v w x y z
==> file2 <==
1 2 3 4 5 6 7 8 9 10 11 12 13
14 15 16 17 18 19 20 21 22 23 24 25 26
$ awk '{split($0,f1);
getline < "file2";
for(i=1;i<=NF;i++) printf "%s%s%s%s", f1[i], OFS, $i, (i==NF?ORS:OFS)}' file1
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13
n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
If space is not the required output delimiter, set OFS accordingly.
P.S. Using getline is normally discouraged for any non-trivial script, and beginners should usually avoid it.
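For readers who would rather avoid getline altogether, the same interleaving can be sketched with the usual two-file awk idiom: hold the lines of the first file in memory, then combine them with the second file row by row. The name interleave_cols is hypothetical, and the sketch assumes both files have the same number of lines and columns and that the first file is small enough to hold in memory.

```shell
# interleave_cols FILE1 FILE2: hypothetical helper, no getline involved.
# Assumes both files have the same number of lines and columns.
interleave_cols() {
    awk '
        NR == FNR { first[FNR] = $0; next }    # first pass: remember FILE1 lines
        {
            split(first[FNR], a)               # columns of the matching FILE1 line
            line = ""
            for (i = 1; i <= NF; i++)
                line = line (i > 1 ? OFS : "") a[i] OFS $i
            print line
        }
    ' "$1" "$2"
}
```

Calling interleave_cols file1 file2 on the sample files above should give the same two lines as the getline version. For tab-separated data, adding -F'\t' -v OFS='\t' to the awk call would keep tabs as the delimiter.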
paste + awk solution:
Sample file1:
a b c d e f g h i j k l m n o p q r s t u v w x y z
a b c d e f g h i j k l m n o p q r s t u v w x y z
Sample file2:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
paste file1 file2 \
| awk '{ len=NF/2;
for (i=1; i<=len; i++)
printf "%s %s%s", $i, $(i+len),(i==len? ORS:OFS)
}'
The output:
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 i 9 j 10 k 11 l 12 m 13 n 14 o 15 p 16 q 17 r 18 s 19 t 20 u 21 v 22 w 23 x 24 y 25 z 26
Use bash to make some dummy files that match the spec, along with some letter-line suffixes to tell them apart:
for f in {A..z} {A..j} ; do echo $( seq -f '%g'"$f" 240 ) ; done > FileA
for f in {z..A} {j..A} ; do echo $( seq -f '%03.3g'"$f" 240 ) ; done > FileB
Use bash, paste and xargs:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo
Since the output of that is a bit unwieldy, show the first ten lines, with only the first six and last five columns:
paste -d' ' <(tr ' ' '\n' < FileA) <(tr ' ' '\n' < FileB) | xargs -L 240 echo |
head | cut -d' ' -f1-6,476-480
1A 001z 2A 002z 3A 003z 238z 239A 239z 240A 240z
1B 001y 2B 002y 3B 003y 238y 239B 239y 240B 240y
1C 001x 2C 002x 3C 003x 238x 239C 239x 240C 240x
1D 001w 2D 002w 3D 003w 238w 239D 239w 240D 240w
1E 001v 2E 002v 3E 003v 238v 239E 239v 240E 240v
1F 001u 2F 002u 3F 003u 238u 239F 239u 240F 240u
1G 001t 2G 002t 3G 003t 238t 239G 239t 240G 240t
1H 001s 2H 002s 3H 003s 238s 239H 239s 240H 240s
1I 001r 2I 002r 3I 003r 238r 239I 239r 240I 240r
1J 001q 2J 002q 3J 003q 238q 239J 239q 240J 240q

Remove rows that have a specific numeric value in a field

I have a very bulky file about 1M lines like this:
4001 168991 11191 74554 60123 37667 125750 28474
8 145 25 101 83 51 124 43
2985 136287 4424 62832 50788 26847 89132 19184
3 129 14 101 88 61 83 32 1 14 10 12 7 13 4
6136 158525 14054 100072 134506 78254 146543 41638
1 40 4 14 19 10 35 4
2981 112734 7708 54280 50701 33795 75774 19046
7762 339477 26805 148550 155464 119060 254938 59592
1 22 2 12 10 6 17 2
6 136 16 118 184 85 112 56 1 28 1 5 18 25 40 2
1 26 2 19 28 6 18 3
4071 122584 14031 69911 75930 52394 89733 30088
1 9 1 3 4 3 11 2 14 314 32 206 253 105 284 66
I want to remove rows that have a value less than 100 in the second column.
How to do this with sed?
I would use awk to do this. Example:
awk ' $2 >= 100 ' file.txt
This will display only the rows from file.txt whose second column ($2) is greater than or equal to 100.
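Since awk prints to standard output, the file itself is left unchanged. A minimal sketch of an in-place variant, assuming a temporary file is acceptable; keep_rows_min is a hypothetical helper name and the default threshold of 100 comes from the question.

```shell
# keep_rows_min FILE [MIN]: hypothetical helper; keeps rows whose 2nd field >= MIN.
keep_rows_min() {
    local file=$1 min=${2:-100} tmp
    tmp=$(mktemp) || return 1
    # $2 + 0 forces a numeric comparison, so 00982 is treated as 982.
    if awk -v min="$min" '$2 + 0 >= min + 0' "$file" > "$tmp"; then
        cat "$tmp" > "$file"    # overwrite the contents, keeping the file's permissions
    fi
    rm -f "$tmp"
}
```

Running keep_rows_min file.txt would then rewrite file.txt with only the rows to keep. The file is only overwritten if awk exits successfully.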
Use the following approach:
sed -E '/^\w+\s+0*[0-9]{1,2}\b/d' /tmp/test.txt
(replace /tmp/test.txt with your current file path)
0*[0-9]{1,2} - matches a one- or two-digit number, optionally preceded by leading zeros (e.g. 7, 42, 012), i.e. any integer from 0 to 99
d - delete the pattern space;
-E (--regexp-extended) - use extended regular expressions rather than basic regular expressions
To remove the matched lines in place, use the -i option:
sed -i -E '/^\w+\s+0*[0-9]{1,2}\b/d' /tmp/test.txt

Replace repeated elements in a list with unique identifiers

I have a list like the below:
1 . Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 . Sam 3 4 56 6 89
3 . Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 . Pig 2 5 67 2 21
(except the real list is 40 million lines long).
There are repeated elements in the second column (i.e. the ".").
I want to replace these with unique identifiers (e.g. ".1", ".2", ".3"...".n").
I tried to do this with a bash loop / sed combination, but it didn't work...
Failed attempt:
for i in 1..4
do
sed -i "s_//._//."$i"_"$i""
done
(Essentially, I was trying to get sed to replace each nth "." with ".n", but this didn't work.)
Here's a way to do it with awk (assuming your file is called input):
$ awk '$2=="."{$2="."++counter}{print}' input
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .2 Sam 3 4 56 6 89
3 .3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .4 Pig 2 5 67 2 21
The awk program replaces the second column ($2) by a string formed by concatenating . and a pre-incremented counter (++counter) if the second column was exactly .. It then prints out all the columns it got (with $2 modified or not) ({print}).
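One side effect worth knowing: assigning to $2 makes awk rebuild the record using OFS, which is a single space by default. That is invisible with the space-separated sample, but tab-separated input would come out space-separated on the modified lines. A sketch for that case, assuming the real file is tab-delimited (the question does not say); number_dots_tsv is a hypothetical name.

```shell
# number_dots_tsv [FILE...]: hypothetical variant for tab-separated input.
number_dots_tsv() {
    awk 'BEGIN { FS = OFS = "\t" }      # split and rebuild records on tabs
         $2 == "." { $2 = "." (++n) }   # n counts only the rows that had "."
         { print }' "$@"
}
```

Setting FS and OFS to the same character keeps the delimiter unchanged on lines where $2 is modified.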
Plain bash alternative:
c=1
while read -r a b line ; do
if [ "$b" == "." ] ; then
echo "$a ."$((c++))" $line"
else
echo "$a $b $line"
fi
done < input
Since your question is tagged sed and bash, here are a few examples for completeness.
Bash only
Use parameter expansion. The second column will be unique, but not sequential, because i is incremented on every line, whether or not it contains a dot:
i=1; while read line; do echo ${line/\./.$((i++))}; done < input
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .3 Sam 3 4 56 6 89
3 .4 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .6 Pig 2 5 67 2 21
Bash + sed
sed cannot increment variables, it has to be done externally.
For each line, increment $i if line contains a ., then let sed append $i after the .
i=0
while read line; do
[[ $line == *.* ]] && i=$((i+1))
sed "s#\.#.$i#" <<<"$line"
done < input
Output:
1 .1 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 .2 Sam 3 4 56 6 89
3 .3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 .4 Pig 2 5 67 2 21
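You can use this command. Note that it replaces every . with a bare counter value (starting at 0, without the dot), and the counter advances on every line, matched or not: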
awk '{gsub(/\./,c++);print}' filename
Output:
1 0 Fred 1 6 78 8 09
1 1 Geni 1 4 68 9 34
2 2 Sam 3 4 56 6 89
3 3 Flit 2 4 56 8 34
3 4 Dog 2 5 67 8 78
3 5 Pig 2 5 67 2 21

Appending matching strings to specific lines (sed/bash)

Using bash/sed, I am trying to search for a matching string and, when a match is found, append the corresponding value to the end of the applicable line.
Two lists:
[linuxbox tmp]$ cat lista
a 23
c 4
e 55
b 2
f 44
d 74
[linuxbox tmp]$ cat listb
a 3
e 34
c 84
b 1
f 500
d 666666
#!/bin/bash
rm -rf listc
cat listb |while read rec
do
var1="$(echo $rec | awk '{ print $1 }')"
var2="$(echo $rec | awk '{ print $2 }')"
if egrep "^$var1" lista; then
sed "/^$var1/ s/$/ $var2/1" lista >> listc
fi
done
When I run it I get:
[linuxbox tmp]$ ./blah.sh
a 23
e 55
c 4
b 2
f 44
d 74
[linuxbox tmp]$ cat listc
a 23 3
c 4
e 55
b 2
f 44
d 74
a 23
c 4
e 55 34
b 2
f 44
d 74
a 23
c 4 84
e 55
b 2
f 44
d 74
a 23
c 4
e 55
b 2 1
f 44
d 74
a 23
c 4
e 55
b 2
f 44 500
d 74
a 23
c 4
e 55
b 2
f 44
d 74 666666
The output I'm trying to get is:
a 23 3
e 55 34
c 4 84
b 2 1
f 44 500
d 74 666666
What am I doing wrong here? Is there a better way to accomplish this?
Thank you in advance.
If you don't mind getting a sorted output:
join <(sort lista) <(sort listb)
One way using awk:
awk 'FNR==NR { array[$1]=$2; next } { if ($1 in array) print $1, array[$1], $2 }' lista listb
Results:
a 23 3
e 55 34
c 4 84
b 2 1
f 44 500
d 74 666666
Based on your input files (no duplicate keys in a single file), the following will do the trick:
>> for key in $(awk '{print $1}' lista) ; do
+> echo $key $(awk -vK=$key '$1==K{$1="";print}' lista listb)
+> done
a 23 3
c 4 84
e 55 34
b 2 1
f 44 500
d 74 666666
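As for what goes wrong in the original script: sed prints every line of lista on each pass, changed or not, so listc ends up with one full copy of lista per key, and the egrep call is what echoes the matching lines to the terminal. A minimal sketch of the same loop with that fixed, assuming keys contain no regex metacharacters or slashes; append_matches is a hypothetical name.

```shell
# append_matches LISTA LISTB: hypothetical fix of the loop from the question.
# Assumes keys contain no regex metacharacters and values contain no slashes.
append_matches() {
    local lista=$1 listb=$2 key val
    while read -r key val; do
        # -n plus the p flag prints only the line that was actually changed.
        sed -n "/^${key}[[:space:]]/s/\$/ ${val}/p" "$lista"
    done < "$listb"
}
```

Running append_matches lista listb > listc produces the lines in listb order, as in the desired output. It still starts one sed process per line of listb, so the awk and join answers are likely the better choice for large files.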
