I'm trying to capitalize the first letter of each field in a CSV that looks like this:
a23;asd23;sdg3
What I want is output like this:
a23;Asd23;Sdg3
So the first field should stay as is, but the second and third should have their first letter capitalized. I tried with AWK and sed but I didn't find the right solution. Can someone help?
Just capitalise every character that follows a semicolon (\U and \E are GNU sed extensions):
sed -e 's/;./\U&\E/g'
Bash (version 4 and up) has a "first uppercase" operator, ${var^}, but in this case I think it is better to use sed:
sed -r 's/;(.)/;\U\1/g' <<< "a23;asd23;sdg3"
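For comparison, here is a Bash-only sketch built on the ${var^} operator mentioned above. It assumes Bash 4 or later and one line of input; the function name cap_fields is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch (Bash 4+): uppercase the first character of every field except the
# first, using ${var^} instead of sed. cap_fields is a hypothetical name.
cap_fields() {
  local IFS=';'
  local -a f
  local i
  read -r -a f <<< "$1"            # split the line on ';' into array f
  for (( i = 1; i < ${#f[@]}; i++ )); do
    f[i]=${f[i]^}                  # uppercase the first character of field i
  done
  printf '%s\n' "${f[*]}"          # "${f[*]}" joins with the first char of IFS (';')
}

cap_fields 'a23;asd23;sdg3'        # prints a23;Asd23;Sdg3
```

Because the loop starts at index 1, the first field (a23) is left untouched.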
echo "a23;asd23;sdg3" | perl -ne 's/(?<=\W)(\w)/ uc($1) /gex;print $_'
a23;Asd23;Sdg3
$ var="a23;asd23;sdg3"
$ echo "$var" | awk -F";" '{for(i=2;i<=NF;i++) $i=toupper(substr($i,1,1))substr($i,2) }1' OFS=";"
a23;Asd23;Sdg3
I am trying to use awk to get the name of a file given the absolute path to the file.
For example, when given the input path /home/parent/child/filename I would like to get filename
I have tried:
awk -F "/" '{print $5}' input
which works perfectly.
However, I am hard coding $5 which would be incorrect if my input has the following structure:
/home/parent/child1/child2/filename
So a generic solution requires always taking the last field (which will be the filename).
Is there a simple way to do this with the awk substr function?
Use the fact that awk splits each line into fields based on a field separator, which you can define. Setting the field separator to /, you can say:
awk -F "/" '{print $NF}' input
Since NF holds the number of fields in the current record, printing $NF prints the last one.
So given a file like this:
/home/parent/child1/child2/child3/filename
/home/parent/child1/child2/filename
/home/parent/child1/filename
This would be the output:
$ awk -F"/" '{print $NF}' file
filename
filename
filename
In this case it is better to use basename instead of awk:
$ basename /home/parent/child1/child2/filename
filename
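One difference worth knowing: basename strips trailing slashes, while awk's $NF simply returns whatever follows the last /, which is empty for a path ending in a slash. A small sketch (the sample path is just an example):

```shell
# Sketch: compare the two approaches on a path that ends in a slash.
p='/home/parent/child/'

# awk: the field after the trailing '/' is empty, so this prints an empty line.
printf '%s\n' "$p" | awk -F/ '{print $NF}'

# basename removes trailing slashes first, so this prints "child".
basename "$p"
```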
If you're open to a Perl solution, here is one similar to fedorqui's awk solution:
perl -F/ -lane 'print $F[-1]' input
-F/ specifies / as the field separator
$F[-1] is the last element in the @F autosplit array
Another option is to use bash parameter expansion.
$ foo="/home/parent/child/filename"
$ echo ${foo##*/}
filename
$ foo="/home/parent/child/child2/filename"
$ echo ${foo##*/}
filename
I know I'm about 5 years late, but thanks for all the suggestions. I used to do it this way:
$ echo /home/parent/child1/child2/filename | rev | cut -d '/' -f1 | rev
filename
Glad to see there are better ways.
This should be a comment on the basename answer, but I don't have enough reputation.
If you do not use double quotes, basename will not work with a path that contains a space:
$ basename /home/foo/bar foo/bar.png
bar
It works with double quotes:
$ basename "/home/foo/bar foo/bar.png"
bar.png
Example with a file:
$ cat a
/home/parent/child 1/child 2/child 3/filename1
/home/parent/child 1/child2/filename2
/home/parent/child1/filename3
$ while IFS= read -r b; do basename "$b"; done < a
filename1
filename2
filename3
I know I'm about 3 years late on this, but...
you should consider parameter expansion: it's built into the shell, so no external process is started, which makes it faster.
If your input is in a variable, say $var1, just use ${var1##*/}. See below:
$ var1='/home/parent/child1/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/filename'
$ echo ${var1##*/}
filename
$ var1='/home/parent/child1/child2/child3/filename'
$ echo ${var1##*/}
filename
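If the paths are in a file rather than a variable, the same expansion can be applied line by line. This is a sketch; the function name basenames is hypothetical. ${p##*/} is POSIX, so it should also work in plain sh, and no basename process is started per line:

```shell
# Sketch: print the last path component of every input line using ${p##*/}.
# IFS= and -r keep leading spaces and backslashes in each path intact.
basenames() {
  while IFS= read -r p; do
    printf '%s\n' "${p##*/}"   # remove the longest prefix ending in '/'
  done
}

basenames <<'EOF'
/home/parent/child 1/child 2/filename1
/home/parent/child1/filename3
EOF
```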
You can skip all of that complex regex:
echo '/home/parent/child1/child2/filename' |
mawk '$!_=$-_=$NF' FS='[/]'
filename
2nd to last :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$NF' FS='/'
child2
3rd last field :
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$--NF' FS='[/]'
child1
4th-last :
echo '/home/parent/child000/child00/child0/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
child0
echo '/home/parent/child1/child2/filename' |
mawk '$!--NF=$(--NF-!-FS)' FS='/'
parent
Major caveat: `gawk/nawk` has a slight discrepancy with `mawk` in how it tracks multiple, and potentially conflicting, decrements to `NF`, so apart from the 1st solution (last field), the rest are for now only applicable to `mawk-1/2`.
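A portable alternative that does not modify NF at all, so it should behave the same in mawk, gawk and nawk: $(NF - n + 1) is the n-th field counted from the end. This is a sketch; the function name nth_from_last is hypothetical.

```shell
# Sketch: print the n-th '/'-separated field counted from the end (n=1 is the
# last field). n must be between 1 and NF, otherwise awk reports an error.
nth_from_last() {
  awk -F/ -v n="$1" '{ print $(NF - n + 1) }'
}

echo '/home/parent/child1/child2/filename' | nth_from_last 2   # child2
echo '/home/parent/child1/child2/filename' | nth_from_last 4   # parent
```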
I just realized it's much cleaner this way, and it works in mawk, gawk and nawk:
echo '/home/parent/child1/child2/filename' | …
awk ++NF FS='.+/' OFS=   # updated so that a bare root "/" still gets printed
filename
You can also use:
sed -n 's/.*\/\([^\/]\{1,\}\)$/\1/p'
or
sed -n 's/.*\/\([^\/]*\)$/\1/p'
I have a bunch of strings like "{one}two", where "{one}" can differ and "two" is always the same. I need to replace each original string with "three{one}", where "three" is also constant. It could easily be done with Python, for example, but I need it done with shell tools like sed or awk.
If I understand correctly, you want:
{one}two --> three{one}
{two}two --> three{two}
{n}two --> three{n}
sed with a backreference will do that:
echo "{one}two" | sed 's/\(.*\)two$/three\1/'
The search stores all the text up to your fixed string, and the replacement is your new string prepended to that stored text. Because .* is greedy, it grabs all the text up to the final occurrence of your fixed string, even if the variable part repeats it (e.g., {two}two still maps to three{two} properly).
Using sed:
s="{one}two"
sed 's/^\(.*\)two/three\1/' <<< "$s"
three{one}
echo "XXXtwo" | sed -E 's/(.*)two/three\1/'
Here's a Bash-only solution (${string%two} removes "two" only from the end of the string):
string="{one}two"
echo "three${string%two}"
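Why a trailing-match removal matters here: ${var/pattern/} deletes the first match anywhere in the string, while ${var%pattern} deletes the match only at the end. They differ when the variable part itself contains "two". A short sketch (Bash, since ${var/pattern/} is not POSIX):

```shell
#!/usr/bin/env bash
# Sketch: ${s/two/} removes the FIRST "two" anywhere in the string,
# while ${s%two} removes "two" only if it is at the very end.
s='{two}two'
echo "three${s/two/}"   # three{}two  (removed the "two" inside the braces)
echo "three${s%two}"    # three{two}
```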
With GNU awk (gensub() is a gawk extension):
awk '{a=gensub(/(.*)two/,"three\\1","g"); print a}' <<< "{one}two"
Output:
three{one}
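Since gensub() exists only in GNU awk, here is a sketch using the POSIX sub() function instead, which should also run in mawk and nawk. sub() returns the number of replacements made, so unlike the gensub() version, lines that do not end in "two" are skipped rather than printed unchanged:

```shell
# Sketch: strip a trailing "two" with sub() and print "three" in front.
# Lines without a trailing "two" produce no output.
printf '%s\n' '{one}two' '{two}two' | awk '{ if (sub(/two$/, "")) print "three" $0 }'
```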
awk '/{.*}two/ { split($0,s,"}"); print "three"s[1]"}" }' <<< "{one}two"
This also outputs:
three{one}
Here, awk selects the matching lines and then splits them on "}", so each line must contain only the single "}" that closes the variable part.
With GNU sed:
$ echo 'foo {one}two bar' | sed -r 's/(\{[^}]*\})two/three\1/g'
foo three{one} bar
With basic sed:
$ echo 'foo {one}two bar' | sed 's/\({[^}]*}\)two/three\1/g'
foo three{one} bar
How can I replace every 5th comma in some input with a newline?
For example:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
becomes
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
Looking for a one-liner using something like sed...
This should work:
sed 's/\(\([^,]*,\)\{4\}[^,]*\),/\1\n/g'
Example:
$ echo "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" |
> sed 's/\(\([^,]*,\)\{4\}[^,]*\),/\1\n/g'
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
This expression also works (GNU sed, numeric fields only):
sed 's/\(\([0-9]\+,\)\{4\}\)\([0-9]\+\),/\1\3\n/g'
http://ideone.com/d4Va2
$ echo -n 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 | xargs -d, printf '%d,%d,%d,%d,%d\n'
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
The accepted solution works, but is overly complicated. Try:
sed ':d s/,/\n/5; P; D; Td'
Not every sed allows commands to be separated by semicolons, so you may need a literal newline after each semicolon. I'm also not sure that every sed allows a label followed by a command on the same line, so a literal newline may be required before the s command. In other words:
sed ':d
s/,/\n/5
P
D
Td'
nawk -F, '{for(i=1;i<=NF;i++){printf("%s%s",$i,i%5?",":"\n")}}' file3
test:
pearl.246> nawk -F, '{for(i=1;i<=NF;i++){printf("%s%s",$i,i%5?",":"\n")}}' file3
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
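The loop above prints a comma after every field whose index is not a multiple of 5, including the last one, so an input line with 7 fields would end in "6,7," with no newline. A sketch of the same loop that always ends the line with a newline (plain POSIX awk; the function name split5 is hypothetical):

```shell
# Sketch: newline after every 5th field AND after the last field of the line.
split5() {
  awk -F, '{ for (i = 1; i <= NF; i++) printf("%s%s", $i, (i % 5 && i < NF) ? "," : "\n") }'
}

echo '1,2,3,4,5,6,7' | split5
```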
I need to get a specific string from a bigger string:
From strings like Abcd1234_Tot9012_tore.dr or Abcd1234_Tot9012.tore.dr
I want the number between Tot and either _ or ., so I should get 9012. The important thing is that the number of characters before and after the number may vary.
Could anyone give me a nice solution for this? Thanks in advance!
This should also work if you are only looking for the number after Tot (it needs GNU awk, since the three-argument match() is a gawk extension):
[srikanth@myhost ~]$ echo "Abcd1234_Tot9012_tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
[srikanth@myhost ~]$ echo "Abcd1234_Tot9012.tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
I know this is tagged as bash/sed but perl is clearer for this kind of task, in my opinion. In case you're interested:
perl -ne 'print $1 if /Tot([0-9]+)[._]/' input.txt
-ne tells perl to loop the specified one-liner over the input file without printing anything by default.
The regex reads as: match Tot, followed by a number, followed by either a dot or an underscore; capture the number (that's what the parens are for). As it's the first capture group, it's assigned to the $1 variable, which is then printed.
Pure Bash:
string="Abcd1234_Tot9012_tore.dr" # or ".tore.dr"
string=${string##*_Tot}
string=${string%%[_.]*}
echo "$string"
Remove longest leading part ending with '_Tot'.
Remove longest trailing part beginning with '_' or '.'.
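The two steps can be wrapped in a function. This is a sketch; the name tot_number is hypothetical. Both ${var##pattern} and ${var%%pattern} are POSIX expansions, so it should also work in sh, not only Bash:

```shell
# Sketch: extract the number that follows "_Tot" and precedes "_" or ".".
tot_number() {
  s=${1##*_Tot}   # remove the longest leading part ending with "_Tot"
  s=${s%%[_.]*}   # remove the longest trailing part starting with "_" or "."
  printf '%s\n' "$s"
}

tot_number 'Abcd1234_Tot9012_tore.dr'   # 9012
tot_number 'Abcd1234_Tot9012.tore.dr'   # 9012
```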
Result:
9012
awk
string="Abcd1234_Tot9012_tore.dr"
num=$(awk -F'Tot|[._]' '{print $3}' <<<"$string")
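sed (note: this captures the last four consecutive digits in the string, so it only works when the number after Tot is exactly four digits long and no digits come after it)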
string="Abcd1234_Tot9012_tore.dr"
num=$(sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string")
Example
$ string="Abcd1234_Tot9012_tore.dr"; awk -F'Tot|[._]' '{print $3}' <<<"$string"
9012
$ string="Abcd1234_Tot9013.tore.dr"; sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string"
9013
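The sed command above captures whichever four digits come last in the string rather than the digits after Tot. A sketch that anchors on Tot instead, accepts any number of digits, and uses -n with the p flag so that non-matching lines print nothing (the function name tot_digits is hypothetical):

```shell
# Sketch: print the digits between "Tot" and the next "_" or "."; lines
# without that pattern produce no output because of -n and the p flag.
tot_digits() {
  sed -n 's/.*Tot\([0-9]\{1,\}\)[._].*/\1/p'
}

printf '%s\n' 'Abcd1234_Tot9012_tore.dr' 'Abcd1234_Tot12.tore.dr' 'no_match.dr' | tot_digits
```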
You can use perl one-liner:
perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
Test:
[jaypal:~/Temp] cat file
Abcd1234_Tot9012_tore.dr
Abcd1234_Tot9012.tore.dr
[jaypal:~/Temp] perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
9012
9012
Using grep you can do:
str=Abcd1234_Tot9012.tore.dr; grep -o "Tot[0-9]*" <<< $str|grep -o "[0-9]*$"
OUTPUT:
9012
This might work for you:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/Tot[^0-9]*\([0-9]*\)[_.].*/\n\1/;s/.*\n//'
9012
9012
This works equally as well:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/.*Tot\([0-9]*\).*/\1/'
9012
9012
I want to extract a certain part of a string, if it exists. I'm interested in the xml filename, i.e. I want what's between an "_" and ".xml".
This is ok, it prints "555"
MYSTRING=`echo "/sdd/ee/publ/xmlfile_555.xml" | sed 's/^.*_\([0-9]*\).xml/\1/'`
echo "STRING = $MYSTRING"
This is not ok because it returns the whole string. In this case I don't want any result.
It prints "/sdd/ee/publ/xmlfile.xml"
MYSTRING=`echo "/sdd/ee/publ/xmlfile.xml" | sed 's/^.*_\([0-9]*\).xml/\1/'`
echo "STRING = $MYSTRING"
Any ideas how to get an empty result in the second case?
Thanks!
You just need to tell sed to keep its mouth shut if it doesn't find a match. The -n option is used for that.
MYSTRING=`echo "/sdd/ee/publ/xmlfile_555.xml" | sed -n 's/^.*_\([0-9]*\)\.xml/\1/p'`
I only made two changes to what you had: the aforementioned -n option to sed, and the p flag that comes after the s/// command, which tells sed to print the output only if the substitution was successfully done.
EDIT: I've also escaped the final . as suggested in the comments.
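To see both cases side by side, the corrected command can be wrapped in a function (the name xml_id is hypothetical). With -n and the p flag, sed prints only when the substitution succeeded, so the second call produces nothing:

```shell
# Sketch: the answer's sed command, applied to both paths from the question.
xml_id() {
  printf '%s\n' "$1" | sed -n 's/^.*_\([0-9]*\)\.xml/\1/p'
}

echo "STRING = $(xml_id /sdd/ee/publ/xmlfile_555.xml)"   # STRING = 555
echo "STRING = $(xml_id /sdd/ee/publ/xmlfile.xml)"       # STRING = (empty)
```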
Try this? Passing .xml as basename's second argument strips that suffix first:
basename /sdd/ee/publ/xmlfile_555.xml .xml | awk -F_ '{print $2}'
The output is 555
With the other path:
basename /sdd/ee/publ/xmlfile.xml .xml | awk -F_ '{print $2}'
The output is an empty string.
$ path=/sdd/ee/publ/xmlfile_555.xml
$ echo ${path##*/}
xmlfile_555.xml
$ path=${path##*/}
$ echo ${path%.xml}
xmlfile_555
$ path=${path%.xml}
$ echo ${path##*_}
555
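Note that for the second path, /sdd/ee/publ/xmlfile.xml, the same three expansions end with ${path##*_} returning xmlfile rather than an empty string, because there is no "_" to strip up to. A sketch that adds a check for that case (the function name xml_suffix_id is hypothetical):

```shell
# Sketch: the expansions above, plus a case test so that a file name
# without "_" yields an empty result instead of the whole name.
xml_suffix_id() {
  name=${1##*/}          # e.g. xmlfile_555.xml
  name=${name%.xml}      # e.g. xmlfile_555
  case $name in
    *_*) printf '%s\n' "${name##*_}" ;;   # e.g. 555
    *)   printf '\n' ;;                    # no "_": empty result
  esac
}

xml_suffix_id /sdd/ee/publ/xmlfile_555.xml   # 555
xml_suffix_id /sdd/ee/publ/xmlfile.xml       # empty line
```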