How would I replace the last six spaces on each line of a text file with commas, using bash?
I have:
$cat myfile
foo bar foo 6 1 3 23 1 20
foo bar 6 1 2 18 1 15
foo 5 5 0 15 1 21
What I want is:
$cat myfile
foo bar foo,6,1,3,23,1,20
foo bar,6,1,2,18,1,15
foo,5,5,0,15,1,21
Any help is appreciated! Thanks!
It looks like the rule could be to replace any space before a digit with a comma:
sed 's/ \([0-9]\)/,\1/g' file
Alternatively, following your specification (replace the last six spaces), you could go for something like this:
awk '{for(i=1; i<=NF; ++i)printf "%s%s", $i, (i<NF-6?FS:(i<NF?",":RS))}' file
This loops through the fields in the input, printing each one followed by either a space (FS), a comma or a newline (RS), depending on how close it is to the end of the line.
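The same idea can be parameterised on the number of separators to replace. This is only a sketch: last_spaces_to_commas is a name made up for illustration, it assumes single-space-separated input, and (like the one-liner above) it drops empty lines.

```shell
# Hypothetical helper: replace the last n field separators on each line with commas.
# Usage: last_spaces_to_commas N < file
last_spaces_to_commas() {
    awk -v n="$1" '{
        for (i = 1; i <= NF; ++i) {
            if (i == NF)         sep = ORS   # last field: end the line
            else if (i < NF - n) sep = " "   # still in the untouched prefix
            else                 sep = ","   # one of the last n separators
            printf "%s%s", $i, sep
        }
    }'
}

printf '%s\n' 'foo bar foo 6 1 3 23 1 20' 'foo 5 5 0 15 1 21' | last_spaces_to_commas 6
```

If a line has n or fewer spaces, every one of them becomes a comma.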
A more complete sed solution, using rev to reverse each line, might be:
rev myfile | sed 's/ /,/; s/ /,/; s/ /,/; s/ /,/; s/ /,/; s/ /,/' | rev
The sed part, which replaces the first six occurrences on the reversed line, can of course be simplified if needed!
This might work for you (GNU sed):
sed -r ':a;s/(.*) /\1,/;x;s/^/x/;/^x{6}/{z;x;b};x;ba' file
This uses greedy matching to find the last space on a line, and keeps track of the number of spaces replaced with a counter in the hold space.
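The greedy-match trick also works without the hold-space counter if you simply repeat the substitution. A rough sketch, assuming plain spaces as separators; replace_last_spaces is a made-up helper name, and it builds the sed script in a loop:

```shell
# Hypothetical helper: replace the last N spaces on each line with commas.
# Each "s/\(.*\) /\1,/" replaces only the LAST space, because .* is greedy.
replace_last_spaces() {
    n=$1
    script=
    while [ "$n" -gt 0 ]; do
        script="${script}s/\\(.*\\) /\\1,/;"
        n=$((n - 1))
    done
    sed "$script"
}

printf '%s\n' 'foo bar foo 6 1 3 23 1 20' | replace_last_spaces 6
```

Once a line runs out of spaces, the remaining substitutions simply do nothing.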
I have a file that looks like this:
ABCDEFGH
ABCDEFGH
ABC
ABCDEFGH
ABCDEFGH
ABCD
ABCDEFGH
Most of the lines have a fixed length of 8. But there are some lines in between that have a length less than 8. I need a simple line of code that appends each of those short lines to its previous line.
I have tried the following code but it takes lots of memory when working with large files.
cat FILENAME | awk 'BEGIN{OFS=FS="\t"}{print length($1), $1}' | tr '\n' '\t' | sed 's/8/\n/g' | awk 'BEGIN{OFS="";FS="\t"}{print $2, $4}'
The output I expect:
ABCDEFGH
ABCDEFGHABC
ABCDEFGH
ABCDEFGHABCD
ABCDEFGH
If perl is your option, please try:
perl -0777 -pe 's/(\n)(.{1,7})$/\2/mg' filename
-0777 option tells perl to slurp all lines.
The pattern (\n)(.{1,7})$ matches a line shorter than 8 characters together with the newline before it, capturing the newline as \1 and the short line as \2.
The replacement \2 does not contain the preceding newline and is appended to the previous line.
sed <FILENAME 'N;/\n.\{8\}/!s/\n//;P;D'
N; - append next line to pattern space
/\n.\{8\}/ - does second line contain 8 characters?
!s/\n//; - no: join the two lines
P - print first line of pattern space
D - delete first line of pattern space, start next cycle
By default, print each line without a trailing newline, and emit a newline first whenever the current line has length 8, so that short lines end up appended to the previous line.
The first and last line are special.
awk 'NR==1 {printf "%s", $0; next}
length($0)==8 {printf "\n"}
{printf("%s",$0)}
END { printf "\n" }' FILENAME
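The same streaming approach can be wrapped up with the line width as a parameter. This is a sketch, not a drop-in replacement: join_short_lines is a made-up name, and it assumes every "full" line is at least the given width.

```shell
# Hypothetical helper: append every line shorter than WIDTH to the line before it.
# Works line by line, so memory use does not grow with file size.
join_short_lines() {
    awk -v w="$1" '
        NR > 1 && length($0) >= w { printf "\n" }   # a full line starts a new output line
        { printf "%s", $0 }                         # print without a trailing newline
        END { if (NR) printf "\n" }                 # terminate the last output line
    '
}

printf '%s\n' ABCDEFGH ABCDEFGH ABC ABCDEFGH ABCDEFGH ABCD ABCDEFGH | join_short_lines 8
```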
If you have GNU sed 4.2 or later (which supports the -z option), you can try the following, although (see comments) it is the inferior solution:
sed -rz 's/\n(.{0,7})\n/\1\n/g' FILENAME
If you like old traditional tools, you can use ed, the standard text editor:
printf '%s\n' 'g/^.\{,7\}$/-,.j' wq | ed -s filename
Given a file foo.txt containing file names such as:
2015_275_14_1,Siboney_by_The_Tailor_Maids
2015_275_16_1,Louis_Armstrong_Cant_Give_You_Anything_But_Love
2015_275_17_1,Benny_Goodman_Trio_Nice_Work_Avalon
2015_275_18_1,Feather_On_Jazz_Jazz_In_The_Concert_Hall
2015_235_1_1,Integration_Report_1
2015_273_2_1_1,Cab_Calloway_Home_Movie_1
2015_273_2_2_1,Cab_Calloway_Home_Movie_2
I want to replace the _ in the part before the comma with . and the _ in the second part after the comma with a space.
I can accomplish each individually with:
sed -E -i '' 's/([0-9]{4})_([0-9]{3})_([0-9]{2})_([0-9])/\1.\2.\3.\4./'
for the first part, and the second part then with:
sed -E -i '' "s/_/ /g"
But I was hoping to accomplish it in an easier fashion by using cut with sed but that doesn't work:
cut -d "," -f 1 foo.txt | sed -E -i '' "s/_/./g" foo.txt && cut -d "," -f 2 foo.txt | sed -E -i '' "s/_/ /g" foo.txt
No good.
So, is there a way to accomplish this with sed or maybe awk or maybe something else where I'm treating the , as a delimiter such as in cut?
Desired output:
2015.275.14.1,Siboney by The Tailor Maids
You can do this with awk:
$ awk -F',' '{gsub(/_/,".",$1);gsub(/_/," ",$2);printf "%s,%s\n",$1,$2}' file
2015.275.14.1,Siboney by The Tailor Maids
2015.275.16.1,Louis Armstrong Cant Give You Anything But Love
2015.275.17.1,Benny Goodman Trio Nice Work Avalon
2015.275.18.1,Feather On Jazz Jazz In The Concert Hall
2015.235.1.1,Integration Report 1
2015.273.2.1.1,Cab Calloway Home Movie 1
2015.273.2.2.1,Cab Calloway Home Movie 2
Similar to @CWLiu's answer, but I use OFS (the output field separator) instead of adding the comma back in and having to add a newline with printf.
awk -F ',' 'BEGIN {OFS = FS} {gsub(/_/, ".", $1); gsub(/_/, " ", $2); print;}' foo.txt
Explanation:
-F ',' sets the field separator
BEGIN {OFS = FS} sets the output field separator (default space) equal to the field separator so the comma is printed back out
gsub(/_/, ".", $1) global substitution on the first column
gsub(/_/, " ", $2) global substitution on the second column
print prints the whole (modified) line
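One caveat with splitting on every comma: a title that itself contains a comma would spill into $3 and keep its underscores. A sketch that splits only at the first comma instead; fix_names, head and tail are names made up for illustration:

```shell
# Hypothetical helper: "_" -> "." before the first comma, "_" -> " " after it.
fix_names() {
    awk '{
        p = index($0, ",")                 # position of the first comma, 0 if none
        if (p == 0) { print; next }        # no comma: leave the line alone
        head = substr($0, 1, p - 1)
        tail = substr($0, p + 1)
        gsub(/_/, ".", head)
        gsub(/_/, " ", tail)
        print head "," tail
    }'
}

printf '%s\n' '2015_275_14_1,Siboney_by_The_Tailor_Maids' | fix_names
```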
$ awk 'BEGIN{FS=OFS=","} {gsub(/_/,".",$1); gsub(/_/," ",$2)} 1' file
2015.275.14.1,Siboney by The Tailor Maids
2015.275.16.1,Louis Armstrong Cant Give You Anything But Love
2015.275.17.1,Benny Goodman Trio Nice Work Avalon
2015.275.18.1,Feather On Jazz Jazz In The Concert Hall
2015.235.1.1,Integration Report 1
2015.273.2.1.1,Cab Calloway Home Movie 1
2015.273.2.2.1,Cab Calloway Home Movie 2
Try this for GNU sed:
$ cat input.txt
2015_275_14_1,Siboney_by_The_Tailor_Maids
2015_275_16_1,Louis_Armstrong_Cant_Give_You_Anything_But_Love
2015_275_17_1,Benny_Goodman_Trio_Nice_Work_Avalon
2015_275_18_1,Feather_On_Jazz_Jazz_In_The_Concert_Hall
2015_235_1_1,Integration_Report_1
2015_273_2_1_1,Cab_Calloway_Home_Movie_1
2015_273_2_2_1,Cab_Calloway_Home_Movie_2
$ sed -r ':loop;/^[^_]+,/{s/_/ /g;bend};s/_/./;bloop;:end' input.txt
2015.275.14.1,Siboney by The Tailor Maids
2015.275.16.1,Louis Armstrong Cant Give You Anything But Love
2015.275.17.1,Benny Goodman Trio Nice Work Avalon
2015.275.18.1,Feather On Jazz Jazz In The Concert Hall
2015.235.1.1,Integration Report 1
2015.273.2.1.1,Cab Calloway Home Movie 1
2015.273.2.2.1,Cab Calloway Home Movie 2
Explanation:
use s/_/./ to replace one _ with . per pass of the loop, until every _ before the , has been replaced, which is detected by ^[^_]+,;
then, once ^[^_]+, matches, use s/_/ /g to replace every remaining _ (all of which are after the ,) with a space
You could cut and paste:
$ paste -d, <(cut -d, -f1 infile | sed 'y/_/./') <(cut -d, -f2 infile | sed 'y/_/ /')
2015.275.14.1,Siboney by The Tailor Maids
2015.275.16.1,Louis Armstrong Cant Give You Anything But Love
2015.275.17.1,Benny Goodman Trio Nice Work Avalon
2015.275.18.1,Feather On Jazz Jazz In The Concert Hall
2015.235.1.1,Integration Report 1
2015.273.2.1.1,Cab Calloway Home Movie 1
2015.273.2.2.1,Cab Calloway Home Movie 2
The process substitution <() lets you treat the output of commands like a file, and paste -d, pastes the output of each command side-by-side, separated by a comma.
The sed y command transliterates characters and is, in this case, equivalent to s/_/./g and s/_/ /g respectively.
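A quick check of that equivalence, as a sketch (the sample string and variable names are made up for illustration):

```shell
# y/// maps characters one-to-one; for a single character it matches s///g.
with_y=$(printf 'a_b_c\n' | sed 'y/_/./')
with_s=$(printf 'a_b_c\n' | sed 's/_/./g')
printf '%s %s\n' "$with_y" "$with_s"

# Unlike s///g, y/// can map several characters in one go.
both=$(printf 'a_b-c\n' | sed 'y/_-/. /')
printf '%s\n' "$both"
```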
You could also do it purely in sed, but it's a bit unwieldy:
sed 'h;s/.*,//;y/_/ /;x;s/,.*//;y/_/./;G;s/\n/,/' infile
Explained:
h # Copy pattern space to hold space
s/.*,// # Remove first part including comma
y/_/ / # Replace all "_" by spaces in the remaining second part
x # Swap pattern and hold space
s/,.*// # Remove second part including comma
y/_/./ # Replace all "_" by periods in the remaining first part
G # Append hold space to pattern space
s/\n/,/ # Replace linebreak with comma
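To see what the pattern space holds just before the final substitution, you can swap s/\n/,/ for the l command, which prints the pattern space unambiguously. Only a sketch; the sample line and the variable name trace are made up:

```shell
# Same script as above, but stop after G and dump the pattern space with "l".
# -n suppresses the normal output, so only the dump is printed.
trace=$(printf '2015_1,A_B\n' | sed -n 'h;s/.*,//;y/_/ /;x;s/,.*//;y/_/./;G;l')
printf '%s\n' "$trace"
```

In the dump, the embedded newline shows up as \n and the end of the pattern space as $, so you can see the converted first part, the newline added by G, and the converted second part.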
Or, alternatively (from comment by potong):
sed 's/,/\n/;h;y/_/ /;x;y/_/./;G;s/\n.*\n/,/' infile
Explained:
s/,/\n/ # Replace comma by linebreak
h # Copy pattern space to hold space
y/_/ / # Replace all "_" by spaces
x # Swap pattern and hold space
y/_/./ # Replace all "_" by periods
G # Append hold space
s/\n.*\n/,/ # Remove second and third line in pattern space
I have a file with multiple lines of text similar to:
foo
1
2
3
bar
fool
1
2
3
bar
food
1
2
3
bar
So far the following gives me a closer answer:
sed -n '/foo/,/bar/ p' file.txt | sed -e '$d'
...but it fails by introducing duplicates if it encounters words like "food" or "fool". I want to make the code above do a whole word match only (i.e. grep -w), but inserting the \b switch doesn't seem to work:
sed -n '/foo/\b,/bar/ p' file.txt | sed -e '$d'
I would like to print anything after "foo" (including the first foo) up until "bar", but matching only "foo", and not "foo1".
Use the Regex tokens ^ and $ to indicate the start and end of a line respectively:
sed -n '/^foo$/,/^bar$/ p' file.txt
sed -n '/\<foo\>/,/\<bar\>/ p' file.txt
Or maybe this, if foo and bar have to be the first word on a line:
sed -n '/^\<foo\>/,/^\<bar\>/ p' file
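If the goal is everything from foo up to, but not including, bar, an awk flag is another way to get exact-line matching. A sketch, assuming foo and bar sit alone on their lines and that every bar should be dropped; print_blocks and inside are names I made up:

```shell
# Hypothetical helper: print from a line that is exactly "foo" up to,
# but not including, the next line that is exactly "bar".
print_blocks() {
    awk '
        $0 == "foo" { inside = 1 }   # exact match, so "fool" and "food" do not count
        $0 == "bar" { inside = 0 }
        inside                       # print while the flag is set
    '
}

printf '%s\n' foo 1 2 3 bar fool 1 2 3 bar food 1 2 3 bar | print_blocks
```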
This question was asked in an interview and I could not answer it, so I am looking for some help to understand the logic, i.e. how to put a space between a run of digits and a run of letters.
Given the string "1abc2abcd3efghi10z11jkl100pqrs", what command would you use to get the following result?
"1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs"
Thanks in advance.
Here is another, simple way to think about it:
echo "1abc2abcd3efghi10z11jkl100pqrs" | \
sed -r 's/([0-9])([a-zA-Z])/\1 \2/g; s/([a-zA-Z])([0-9])/\1 \2/g'
The first substitution adds a space wherever a digit is followed by a letter; the second adds one wherever a letter is followed by a digit.
() captures a group, and \1 and \2 refer back to the first and second captured groups.
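The same two substitutions can be written in basic regular expressions, so they do not depend on -r. A sketch; space_out is just an illustrative name:

```shell
# Hypothetical helper: insert a space at every digit/letter boundary.
space_out() {
    sed -e 's/\([0-9]\)\([A-Za-z]\)/\1 \2/g' \
        -e 's/\([A-Za-z]\)\([0-9]\)/\1 \2/g'
}

echo "1abc2abcd3efghi10z11jkl100pqrs" | space_out
```

Because only the two characters at each boundary are matched, no leading or trailing space is produced and nothing needs trimming afterwards.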
With GNU sed:
$ echo "1abc2abcd3efghi10z11jkl100pqrs" | sed -e 's/[0-9]\+/ & /g' -e 's/^ \| $//g'
1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
With awk:
$ echo "1abc2abcd3efghi10z11jkl100pqrs" | awk '{gsub(/[0-9]+/," & ",$0); $1=$1}1'
1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
gsub surrounds every run of digits with a space before and after it.
$1=$1 makes awk rebuild the entire line, joining the fields with OFS (a single space by default), which also drops the leading and trailing spaces.
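The effect of $1=$1 is easy to see in isolation. A small sketch (the sample input and the variable name squeezed are made up):

```shell
# Assigning to a field forces awk to rebuild $0 from the fields, joined by OFS.
squeezed=$(printf '  1  abc   2 \n' | awk '{ $1 = $1 } 1')
printf '[%s]\n' "$squeezed"
```

The brackets make it visible that the leading and trailing spaces are gone and the inner runs are collapsed to single spaces.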
I would have chosen sed over awk:
echo "1abc2abcd3efghi10z11jkl100pqrs" | sed 's/[0-9]\+/ & /g; s/^[ ]//; s/[ ]$//'
It surrounds each run of digits with spaces and afterwards removes the (possibly) leading and trailing ones.
It yields:
1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
echo 1abc2abcd3efghi10z11jkl100pqrs | \
sed -r -e 's/([[:digit:]]+)/ \1 /g' -e 's/^ *//g' -e 's/ *$//g'
Take the expression -e 's/([[:digit:]]+)/ \1 /g' first.
The parentheses around [[:digit:]]+ 'capture' each sequence of one or more digits. Since it's the first capture group, it's referenced in the substitution by \1 (then there's the space before and after: \1 ).
The g tells sed to perform this substitution 'globally' on the input.
The -r before the expression tells sed to use extended regular expressions.
The other two 'expressions' (each expression has -e before it to show that it's an expression):
-e 's/^ *//g' will remove leading whitespace, and -e 's/ *$//g' will remove trailing whitespace.
Using perl:
echo 1abc2abcd3efghi10z11jkl100pqrs | perl -F'(\d+)' -ane \
'$F[0] and print "@F\n" or print "@F[1..$#F]"'
Some explanation:
-an together tells Perl to split each line of input and put the resulting fields into the array @F.
-F specifies a delimiter of one or more digits to use with -an to split the input. The parentheses cause the delimiters themselves to be stored in the array, not just the strings they separate.
-e specifies the code to run after each line is read. We simply want to print the contents of @F, with the default field separator (space) used to separate elements of the array. The and...or combination is used to ignore the first field if it is empty, as it will be if the input line starts with a delimiter.
How can I convert a string of the form ABC_DEF_GHI to AbcDefGhi using a one-line command such as sed?
Here's a one-liner using gawk:
echo ABC_DEF_GHI | gawk 'function cap(s){return toupper(substr(s,1,1))tolower(substr(s,2))}{n=split($0,x,"_");for(i=1;i<=n;i++)o=o cap(x[i]); print o}'
AbcDefGhi
Optimized awk 1-liner
awk -v RS=_ '{printf "%s%s", substr($0,1,1), tolower(substr($0,2))}'
Optimized sed 1-liner
sed 's/\(.\)\(..\)_\(.\)\(..\)_\(.\)\(..\)/\1\L\2\U\3\L\4\U\5\L\6/'
Edit:
Here's a gawk version:
gawk -F_ '{for (i=1;i<=NF;i++) printf "%s%s",substr($i,1,1),tolower(substr($i,2)); printf "\n"}'
Original:
Using sed for this is pretty scary:
sed -r 'h;s/(^|_)./\n/g;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;x;s/((^|_)(.))[^_]*/\3\n/g;G;:a;s/(^.*)([^\n])\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;ta;s/\n//g'
Here it is broken down:
# make a copy in hold space
h;
# replace all the characters which will remain upper case with newlines
s/(^|_)./\n/g;
# lowercase all the remaining characters
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;
# swap the copy into pattern space and the lowercase characters into hold space
x;
# discard all but the characters which will remain upper case
s/((^|_)(.))[^_]*/\3\n/g;
# append the lower case characters to the end of pattern space
G;
# top of the loop
:a;
# shuffle the lower case characters back into their proper positions (see below)
s/(^.*)([^\n])\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;
# if a replacement was made, branch to the top of the loop
ta;
# remove all the newlines
s/\n//g
Here's how the shuffle works:
At the time it starts, this is what pattern space looks like:
A
D
G
bc
ef
hi
The shuffle loop picks up the string that's between the last newline and the end and moves it to the position before the two consecutive newlines (actually three) and moves the extra newline so it's before the character that it previously followed.
After the first step through the loop, this is what pattern space looks like:
A
D
Ghi
bc
ef
And processing proceeds similarly until there's nothing before the extra newline at which point the match fails and the loop branch is not taken.
If you want to title case a sequence of words separated by spaces, the script would be similar:
$ echo 'BEST MOVIE THIS YEAR' | sed -r 'h;s/(^| )./\n/g;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;x;s/((^| ).)[^ ]*/\1\n/g;G;:a;s/(^.*)( [^\n]*)\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;ta;s/^([^\n]*)(.*)\n([^\n]*)$/\1\3\2/;s/\n//g'
Best Movie This Year
One liner using perl:
$ echo 'ABC_DEF_GHI' | perl -npe 's/([A-Z])([^_]+)_?/$1\L$2\E/g;'
AbcDefGhi
This might work for you:
echo "ABC_DEF_GHI" |
sed 'h;s/\(.\)[^_]*\(_\|$\)/\1/g;x;y/'$(printf "%s" {A..Z} / {a..z})'/;G;:a;s/\(\(^[a-z]\)\|_\([a-z]\)\)\([^\n]*\n\)\(.\)/\5\4/;ta;s/\n//'
AbcDefGhi
Or using GNU sed:
echo "ABC_DEF_GHI" | sed 's/\([A-Z]\)\([^_]*\)\(_\|$\)/\1\L\2/g'
AbcDefGhi
Less scary sed version with tr (note that it lowercases every letter, so it prints abcdefghi rather than AbcDefGhi):
echo ABC_DEF_GHI | sed -e 's/_//g' - | tr 'A-Z' 'a-z'
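If you want the first letter of each part kept in upper case regardless of the input's case, here is a short sketch in awk. It assumes _ as the only separator; to_camel is a made-up name:

```shell
# Hypothetical helper: ABC_DEF_GHI -> AbcDefGhi, one conversion per input line.
to_camel() {
    awk -F_ '{
        out = ""                           # reset for every line
        for (i = 1; i <= NF; i++)
            out = out toupper(substr($i, 1, 1)) tolower(substr($i, 2))
        print out
    }'
}

echo ABC_DEF_GHI | to_camel
```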