Now I have strings in the form "temp:10" and I use temp=$(echo $str|awk '{split($0,array,":")} END{print array[1]}') to split them, which is overkill and slow. There must be a simpler way to do this?
Use bash's parameter expansion with suffix removal:
temp=${str%%:*}
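As a small sketch of the same idea, the part after the colon can be taken with prefix removal. The variable names name and value are only illustrative:

```shell
str="temp:10"
name=${str%%:*}    # remove the longest suffix matching ':*'
value=${str#*:}    # remove the shortest prefix matching '*:'
echo "$name"
echo "$value"
```

Both expansions are POSIX, so this should work in plain sh as well as bash.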
There's also the read command:
$ str="temp:10"
$ IFS=: read before after <<< "$str"
$ echo "$before"
temp
$ echo "$after"
10
If I understand you right, you need the value before the :, temp in this example. If so, then you can use the cut command:
cut -d':' -f1
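cut reads from standard input, so a complete version might look like the following sketch (printf is used here only to feed the string in):

```shell
str="temp:10"
# -d sets the delimiter, -f1 keeps the first field
temp=$(printf '%s\n' "$str" | cut -d':' -f1)
echo "$temp"
```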
Suppose I have a string such as s=DNA128533_mutect2_filtered.vcf.gz. How could I extract DNA128533 as an ID variable?
I tried
id=(cut -d_ -f1 <<< ${s}
echo $id
It does not seem to work. Any suggestions? Thanks
No need to spend a sub-shell calling cut -d'_' -f1 and using bashism <<< "$s".
The POSIX shell grammar has built-in provision for stripping-out the trailing elements with variable expansion, without forking a costly sub-shell or using non-standard Bash specific <<<"here string".
#!/usr/bin/env sh
s=DNA128533_mutect2_filtered.vcf.gz
id=${s%%_*}
echo "$id"
You want to filter the DNA... part out of the filename. Therefore:
s="DNA128533_mutect2_filtered.vcf.gz"
id=$(echo "$s" | cut -d'_' -f1)
echo "$id"
If you want to use your way of doing it (with <<<), do this:
id=$(cut -d'_' -f1 <<< "$s")
echo "$id"
Your command has some syntax issues, like you are missing ).
And you want the output of the command to be stored in variable id, so you have to make it run via the $( ) syntax.
IFS is the shell's field separator. Setting it for a single read command lets bash split the string for us:
IFS='_' read -r -a array <<< "a_b_c_d"
echo "${array[0]}"
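Applied to the filename from the question, a rough sketch could look like this. The array name parts is just an example, and read -a and <<< are bash features:

```shell
s="DNA128533_mutect2_filtered.vcf.gz"
# IFS is changed only for this one read command
IFS='_' read -r -a parts <<< "$s"
id=${parts[0]}
echo "id: $id"
echo "fields: ${#parts[@]}"
```

Because the IFS assignment is a prefix to read, the shell's normal word splitting is left untouched afterwards.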
I have a string like that:
rthbp-0.0.3-sources.jar
And I want to get a new string (say string2) containing just the version from the original string, so '0.0.3'. The rthbp- at the start will remain constant, but the part after the version (-sources.jar) may change.
Is it possible in bash to extract just the version info?
I am doing this: echo ${f:6}, but it only gives me 0.0.3-sources.jar
If you're using bash, you can extract an arbitrary substring by specifying both offset and length:
$ filename=rthbp-0.0.3-sources.jar
$ echo "${filename:6:5}"
#=> 0.0.3
But using exact character offsets like that is fragile. You might want to use something like this:
$ IFS=- read pre version post <<<"$filename"
$ echo "$version"
#=> 0.0.3
Or, somewhat more clunkily:
$ ltrim=${filename%%-*}-
$ rest=${filename#$ltrim}
$ version=${rest%%-*}
Or as others mentioned you could call out to cut or awk to do the splitting for you..
No one has mentioned regular expression matching yet, so I will.
[[ $string1 =~ rthbp-(.*)-sources.jar ]]
version=${BASH_REMATCH[1]}
(You may want a slightly more general regular expression; this just demonstrates how to match against a regular expression containing a capture group and how to extract the captured value.)
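One possible "slightly more general" expression is sketched below. It assumes the version is a dotted run of digits sitting between the first and second dash, which is my assumption and not something the question guarantees:

```shell
filename="rthbp-0.0.3-sources.jar"
# Keep the regex in a variable so the shell does not re-interpret it
re='^[^-]+-([0-9]+(\.[0-9]+)*)-'
if [[ $filename =~ $re ]]; then
    version=${BASH_REMATCH[1]}   # first capture group
    echo "$version"
fi
```

This no longer depends on the prefix being exactly rthbp or the suffix being exactly -sources.jar.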
You can use read built-in:
s='rthbp-0.0.3-sources.jar'
IFS=- read a ver _ <<< "$s" && echo "$ver"
0.0.3
I would recommend just using cut for this. Define the delimiter as dashes and keep field two:
$ echo "rthbp-0.0.3-sources.jar" | cut -d'-' -f 2
0.0.3
If you want to use pure bash, you can use parameter expansion, but it isn't as clean. Assuming the version always starts in the same place and is the same length, you can use:
$ str="rthbp-0.0.3-sources.jar"
$ echo "${str:6:5}"
0.0.3
Another variant of regex match:
$ echo "rthbp-0.0.3-sources.jar"|grep -e '([[:digit:]]+\.)+[[:digit:]]+' -oP
0.0.3
echo "rthbp-0.0.3-sources.jar" | awk -F- '$2=="0.0.3"{print $2}'
0.0.3
echo "rthbp-0.0.3-sources.jar" | awk -F- '{print $2}'
0.0.3
I need to strip a substring off a string in my shell script. I am trying the following:
fileName="Test_VSS_TT.csv.old"
Here I want to remove the string ".csv.old", and my attempt is
test=${fileName%.*}
but I am getting a bad substitution error.
You are looking for test=${fileName%%.*}
the doc for parameter expansion in bash here and in zsh here
%.* removes the shortest suffix matching .* (only the last extension, .old), whereas %%.* removes the longest one (everything from the first dot, .csv.old)
[edit]
if sed is available, you could try something like that : echo "filename.txt.bin" | sed "s/\..*//g" which yields filename
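To see the difference between % and %% side by side, here is a quick sketch using the filename from the question (short and long are just illustrative names):

```shell
fileName="Test_VSS_TT.csv.old"
short=${fileName%.*}     # shortest matching suffix removed: drops .old
long=${fileName%%.*}     # longest matching suffix removed: drops .csv.old
echo "$short"
echo "$long"
```

This prints Test_VSS_TT.csv and then Test_VSS_TT.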
Here you go,
$ echo $f
Test_VSS_TT.csv.old
$ test=${f%%.*}
$ echo $test
Test_VSS_TT
%% does a longest match, so it matches from the first dot up to the end of the string and then removes the matched characters.
If your intention is to extract file name without extension, then how about this?
$ echo ${fileName}
Test_VSS_TT.csv.old
$ test=`echo ${fileName} |cut -d '.' -f1`
$ echo $test
Test_VSS_TT
echo "Test_VSS_TT.csv.old"| awk -F"." '{print $1}'
Hi,
Can someone help me with splitting mac addresses from a log file? :-)
This:
000E0C7F6676
should be:
00:0E:0C:7F:66:76
At the moment I split these with OpenOffice, but with over 200 MAC addresses this is very boring and slow...
It would be nice if the solution is in bash. :-)
Thanks in advance.
A simple sed script ought to do it.
sed -e 's/[0-9A-F]\{2\}/&:/g' -e 's/:$//' myFile
That'll take a list of mac addresses in myFile, one per line, and insert a ':' after every two hex-digits, and finally remove the last one.
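A quick way to try it without a file is to pipe a couple of addresses in. The second address below is made up for the test:

```shell
# Insert ':' after every pair of hex digits, then drop the trailing ':'
macs=$(printf '%s\n' 000E0C7F6676 001A2B3C4D5E |
    sed -e 's/[0-9A-F]\{2\}/&:/g' -e 's/:$//')
echo "$macs"
```

Note that the bracket expression only covers uppercase hex digits; for lowercase input it would need to be [0-9A-Fa-f].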
$ mac=000E0C7F6676
$ s=${mac:0:2}
$ for((i=2;i<${#mac};i+=2)); do s=$s:${mac:$i:2}; done
$ echo $s
00:0E:0C:7F:66:76
Pure Bash. This snippet
mac='000E0C7F6676'
array=()
for (( CNTR=0; CNTR<${#mac}; CNTR+=2 )); do
array+=( ${mac:CNTR:2} )
done
IFS=':'
string="${array[*]}"
echo -e "$string"
prints
00:0E:0C:7F:66:76
$ perl -lne 'print join ":", $1 =~ /(..)/g while /\b([\da-f]{12})\b/ig' file.log
00:0E:0C:7F:66:76
If you prefer to save it as a program, use
#! /usr/bin/perl -ln
print join ":" => $1 =~ /(..)/g
while /\b([\da-f]{12})\b/ig;
Sample run:
$ ./macs file.log
00:0E:0C:7F:66:76
imo, regular expressions are the wrong tool for a fixed width string.
perl -alne 'print join(":",unpack("A2A2A2A2A2A2",$_))' filename
Alternatively,
gawk -v FIELDWIDTHS='2 2 2 2 2 2' -v OFS=':' '{$1=$1;print }'
That's a little funky with the assignment to change the behavior of print. Might be more clear to just print $1,$2,$3,$4,$5,$6
Requires Bash version >= 3.2
#!/bin/bash
for i in {1..6}
do
pattern+='([[:xdigit:]]{2})'
done
saveIFS=$IFS
IFS=':'
while read -r line
do
[[ $line =~ $pattern ]]
mac="${BASH_REMATCH[*]:1}"
echo "$mac"
done < macfile.txt > newfile.txt
IFS=$saveIFS
If your file contains other information besides MAC addresses that you want to preserve, you'll need to modify the regex and possibly move the IFS manipulation inside the loop.
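For lines that carry other fields, one possible shape is sketched below. The helper name format_mac and the sample log line are invented for illustration, and the sketch assumes the MAC appears as a whitespace-separated word of exactly 12 hex digits:

```shell
# format_mac (hypothetical helper): prints the colon-separated form of a
# 12-hex-digit word, or fails if the word is not a bare MAC address.
format_mac() {
    local pair='([[:xdigit:]]{2})'
    local pattern="^${pair}${pair}${pair}${pair}${pair}${pair}\$"
    [[ $1 =~ $pattern ]] || return 1
    local IFS=':'                        # only affects this function
    printf '%s\n' "${BASH_REMATCH[*]:1}"
}

line='host1 000E0C7F6676 up'             # made-up sample log line
read -r -a words <<< "$line"
out=()
for word in "${words[@]}"; do
    # Keep the word unchanged when it is not a MAC address
    out+=( "$(format_mac "$word" || printf '%s' "$word")" )
done
result="${out[*]}"
echo "$result"
```

Keeping the IFS change local to the function means the rest of the loop still splits and joins on whitespace. Runs of whitespace in the original line are collapsed to single spaces by this approach.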
Unfortunately, there's not an equivalent in Bash to sed 's/../&:/' using something like ${mac//??/??:/}.
a='0123456789AB'
m=${a:0:2}:${a:2:2}:${a:4:2}:${a:6:2}:${a:8:2}:${a:10:2}
result:
01:23:45:67:89:AB
I am a newbie in Bash and I am doing some string manipulation.
I have the following file among other files in my directory:
jdk-6u20-solaris-i586.sh
I am doing the following to get jdk-6u20 in my script:
myvar=`ls -la | awk '{print $9}' | egrep "i586" | cut -c1-8`
echo $myvar
but now I want to convert jdk-6u20 to jdk1.6.0_20. I can't seem to figure out how to do it.
It must be as generic as possible. For example if I had jdk-6u25, I should be able to convert it at the same way to jdk1.6.0_25 so on and so forth
Any suggestions?
Depending on exactly how generic you want it, and how standard your inputs will be, you can probably use AWK to do everything. By using FS="regexp" to specify field separators, you can break down the original string by whatever tokens make the most sense, and put them back together in whatever order using printf.
For example, assuming both dashes and the letter 'u' are only used to separate fields:
myvar="jdk-6u20-solaris-i586.sh"
echo $myvar | awk 'BEGIN {FS="[-u]"}; {printf "%s1.%s.0_%s",$1,$2,$3}'
Flavour according to taste.
Using only Bash:
for file in jdk*i586*
do
file="${file%*-solaris*}"
file="${file/-/1.}"
file="${file/u/.0_}"
do_something_with "$file"
done
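To follow what each expansion in the loop does, here is the same sequence applied to a single literal name instead of a glob (no files needed):

```shell
file="jdk-6u20-solaris-i586.sh"
file="${file%*-solaris*}"   # strip from -solaris onward: jdk-6u20
file="${file/-/1.}"         # replace the first '-' with '1.': jdk1.6u20
file="${file/u/.0_}"        # replace the first 'u' with '.0_': jdk1.6.0_20
echo "$file"
```

The ${var/pattern/replacement} form is a bash extension, so this will not work in a strictly POSIX sh.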
I think that sed is the command for you.
You can try this snippet:
for fname in *; do
newname=`echo "$fname" | sed 's,^jdk-\([0-9]\)u\([0-9][0-9]*\)-.*$,jdk1.\1.0_\2,'`
if [ "$fname" != "$newname" ]; then
echo "old $fname, new $newname"
fi
done
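The same substitution can be wrapped in a small function to check that it is generic. The name to_java_version is hypothetical, and the sketch assumes a single-digit major version, as in the pattern above:

```shell
# to_java_version (hypothetical helper): jdk-6u20-... -> jdk1.6.0_20
to_java_version() {
    printf '%s\n' "$1" |
        sed 's,^jdk-\([0-9]\)u\([0-9][0-9]*\)-.*$,jdk1.\1.0_\2,'
}

v20=$(to_java_version "jdk-6u20-solaris-i586.sh")
v25=$(to_java_version "jdk-6u25-solaris-i586.sh")
other=$(to_java_version "README.txt")   # no match: passed through unchanged
echo "$v20 $v25 $other"
```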
awk '{if(match($9,"i586")){gsub("jdk-6u20","jdk1.6.0_20");print $9;}}'
The if(match()) supersedes the egrep bit if you want to use it. You could use substr($9,1,8) instead of cut as well.
garph0 has a good idea with sed; you could do
myvar=`ls jdk*i586.sh | sed 's/jdk-\([0-9]\)u\([0-9]\+\).\+$/jdk1.\1.0_\2/'`
Your needing awk in there is an artifact of the -l switch on ls. For pattern substitution on lines of text, sed is the long-time champion:
ls | sed -n '/^jdk/s/jdk-\([0-9][0-9]*\)u\([0-9][0-9]*\)$/jdk1.\1.0_\2/p'
This was written in "old-school" sed which should have greater portability across platforms. The expression says:
-n: don't print lines unless told to
on lines beginning with 'jdk' do:
on a line that contains only "jdk-IntegerAuIntegerB"
change it to "jdk1.IntegerA.0_IntegerB"
and print it
Your sample becomes even simpler as:
myvar=`echo *solaris-i586.sh | sed 's/-solaris-i586\.sh//'`