cut a string after a specified pattern (comma) - shell

I want to cut a string and assign it to a variable after first occurrence of comma.
my_string="a,b,c,d,e,f"
Output expected:
output="b,c,d,e,f"
When I use the command
output=`echo $my_string | cut -d ',' -f2`
I am getting only b as output.

Adding a dash '-' to the end of your -f2 will output the remainder of the string.
$ echo "a,b,c,d,e,f,g"|cut -d, -f2-
b,c,d,e,f,g

With parameter expansion instead of cut:
$ my_string="a,b,c,d,e,f"
$ output="${my_string#*,}"
$ echo "$output"
b,c,d,e,f
${my_string#*,} stands for "remove everything up to and including the first comma from my_string" (see the Bash manual).
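The same family of expansions can strip from either end of the string. A minimal sketch, reusing my_string from above; the other variable names are only illustrative:

```shell
my_string="a,b,c,d,e,f"

after_first="${my_string#*,}"     # shortest matching prefix removed: b,c,d,e,f
after_last="${my_string##*,}"     # longest matching prefix removed:  f
before_last="${my_string%,*}"     # shortest matching suffix removed: a,b,c,d,e
before_first="${my_string%%,*}"   # longest matching suffix removed:  a

printf '%s\n' "$after_first" "$after_last" "$before_last" "$before_first"
```

Unlike the cut pipeline, none of these start an external process.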

You must add the minus sign (-) after the position you are looking for.
a=`echo $my_string|cut -d "," -f 2-`
echo $a
b,c,d,e,f

Related

Replace one character by the other (and vice-versa) in shell

Say I have strings that look like this:
$ a='/o\\'
$ echo $a
/o\
$ b='\//\\\\/'
$ echo $b
\//\\/
I'd like a shell script (ideally a one-liner) to replace / occurrences by \ and vice-versa.
Suppose the command is called invert, it would yield (in a shell prompt):
$ invert $a
\o/
$ invert $b
/\\//\
For example using sed, it seems unavoidable to use a temporary character, which is not great, like so:
$ echo $a | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
\o/
$ echo $b | sed 's#/#%#g' | sed 's#\\#/#g' | sed 's#%#\\#g'
/\\//\
For some context, this is useful for proper printing of git log --graph --all | tac (I like to see newer commits at the bottom).
tr is your friend:
% echo 'abc' | tr ab ba
bac
% echo '/o\' | tr '\\/' '/\\'
\o/
(escaping the backslashes in the output might require a separate step)
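Wrapped up as the invert command from the question, this might look like the sketch below. Using printf instead of echo is my substitution, not part of the answer above; it is there because some echo implementations interpret backslashes themselves.

```shell
# invert: swap every / with \ and every \ with / in the first argument.
invert() {
    printf '%s\n' "$1" | tr '/\\' '\\/'
}

invert '/o\'      # prints \o/
invert '\//\\/'   # prints /\\//\
```

Quote the argument when calling it with a variable, for example invert "$a".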
I think this can be done with (g)awk:
$ echo a/\\b\\/c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a\/b/\c
$ echo a\\/b/\\c | gawk -F "/" 'BEGIN{ OFS="\\" } { for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); print $0; }'
a/\b\/c
$
-F "/" defines the field separator: the input is split on "/", so the fields no longer contain a "/" character.
for(i=1;i<=NF;i++) gsub(/\\/,"/",$i); replaces every backslash (\) with a slash (/) in each field. Printing the fields joined by OFS="\\" then puts a backslash where each slash used to be.
If you want to replace every instance of / with \ and vice versa, you can use the y command of sed, which works much like tr:
$ a='/o\'
$ echo "$a"
/o\
$ echo "$a" | sed 'y|/\\|\\/|'
\o/
$ b='\//\\/'
$ echo "$b"
\//\\/
$ echo "$b" | sed 'y|/\\|\\/|'
/\\//\
If you are strictly limited to GNU AWK you might get desired result following way, let file.txt content be
\//\\\\/
then
awk 'BEGIN{FPAT=".";OFS="";arr["/"]="\\";arr["\\"]="/"}{for(i=1;i<=NF;i+=1){if($i in arr){$i=arr[$i]}};print}' file.txt
gives output
/\\////\
Explanation: FPAT="." tells GNU AWK that every single character is a field, and OFS="" sets the output field separator to the empty string. The array arr maps each character to be replaced to its replacement; \ must be escaped, so "\\" denotes a literal backslash. For each line, the for loop iterates over all fields and, if a field holds a character that is a key of arr, replaces it with the corresponding value. After the loop, the line is printed.
(tested in gawk 4.2.1)

Need to split 1st string before delimiter which is comma(,)

Need to split the 1st string before delimiter comma.
For example
A="ABC:20.10.0-5,DEF:21.10.0-9,XYZ:20.10.0-9"
We need to extract 1st string before the comma(,) and the result should be like this -
B="ABC:20.10.0-5"
After this, I need to extract the numbers after colon(:) and before the dash(-). So the final value should be -
C="20.10.0"
It can be done with simple shell substitution:
A="ABC:20.10.0-5,DEF:21.10.0-9,XYZ:20.10.0-9"
B="${A%%,*}" # Remove everything after the first comma and the comma itself
nodash="${B%%-*}" # Remove everything after the dash and the dash itself
C="${nodash##*:}" # Remove everything before the colon and the colon itself
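The three expansions can be folded into a small function. This is only a sketch: first_version is a hypothetical name, and it assumes the label before the colon contains no dash.

```shell
# first_version: print the version of the first comma-separated entry.
first_version() {
    first="${1%%,*}"         # keep only the text before the first comma
    nodash="${first%%-*}"    # drop the first dash and everything after it
    printf '%s\n' "${nodash##*:}"   # drop everything up to the last colon
}

first_version "ABC:20.10.0-5,DEF:21.10.0-9,XYZ:20.10.0-9"   # prints 20.10.0
first_version "ABC:20.10.0-5"                               # prints 20.10.0
```

If the input has no comma, the first expansion leaves the string unchanged, so a single entry works too.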
You can refer to the code below. I wasn't sure where exactly you are running it, as there is only one tag, or whether you actually wanted the double quotes printed in the output, so I added them as well. I am assuming you are aware of the cut command.
-bash-4.2$ cat test2.sh
#!/bin/bash
A="ABC:20.10.0-5,DEF:21.10.0-9,XYZ:20.10.0-9"
echo "A="\"$A\"
B=`echo $A | cut -d"," -f1`
echo "B="\"$B\"
C=`echo $B | cut -d":" -f2 | cut -d"-" -f1`
echo "C="\"$C\"
-bash-4.2$ ./test2.sh
A="ABC:20.10.0-5,DEF:21.10.0-9,XYZ:20.10.0-9"
B="ABC:20.10.0-5"
C="20.10.0"

Bash split word with same characters

How can I split a string that contains the same delimiter character more than once?
For example name=John:adress=London. I need name as the variable and John:adress=London as the value.
I have no idea how to. Thanks.
You can use cut.
# print first field
echo "name=John:#(ADDRESS=(LONDON=(STREET=XY)))" | cut -d = -f 1
# print remaining fields
echo "name=John:#(ADDRESS=(LONDON=(STREET=XY)))" | cut -d = -f 2-
You can use cut with command substitution:
INPUT='name=#(ADDRESS=(LONDON=(STREET=XY)))'
NAME=$(echo "$INPUT" | cut -d '=' -f 1)
INFO=$(echo "$INPUT" | cut -d '=' -f 2-)
The single quotes in the first line make the shell treat every character literally, so none of the special symbols are interpreted. The variable NAME receives the output of a command substitution, written $(). $INPUT is echoed into cut, where the -d flag sets the delimiter to = and the -f flag selects the first field.
Next, INFO is assigned the output of a second command substitution that selects the second field through to the end. The dash after the 2 in -f 2- tells cut to select everything from the second field to the end of the line, that is, everything after the first = sign.
The first equals sign will not be in the $INFO variable at the end.
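The same split can also be done without cut, using parameter expansion. A minimal sketch with the INPUT value from above:

```shell
INPUT='name=#(ADDRESS=(LONDON=(STREET=XY)))'

NAME="${INPUT%%=*}"   # remove the longest suffix that starts at an "=": name
INFO="${INPUT#*=}"    # remove the shortest prefix that ends at an "=": the rest

printf 'NAME=%s\nINFO=%s\n' "$NAME" "$INFO"
```

Because %% matches as much as possible and # as little as possible, both expansions split at the first equals sign, and later equals signs stay in INFO.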

How to process large csv files efficiently using shell script, to get better performance than that for following script?

I have a large csv file input_file with 5 columns. I want to do two things to second column:
(1) Remove last character
(2) Append leading and trailing single quote
Following are the sample rows from input_file.dat
420374,2014-04-06T18:44:58.314Z,214537888,12462,1
420374,2014-04-06T18:44:58.325Z,214537850,10471,1
281626,2014-04-06T09:40:13.032Z,214535653,1883,1
Sample output would look like :
420374,'2014-04-06T18:44:58.314',214537888,12462,1
420374,'2014-04-06T18:44:58.325',214537850,10471,1
281626,'2014-04-06T09:40:13.032',214535653,1883,1
I have written the following code to do this.
#!/bin/sh
inputfilename=input_file.dat
outputfilename=output_file.dat
count=1
while read line
do
echo $count
count=$((count + 1))
v1=$(echo $line | cut -d ',' -f1)
v2=$(echo $line | cut -d ',' -f2)
v3=$(echo $line | cut -d ',' -f3)
v4=$(echo $line | cut -d ',' -f4)
v5=$(echo $line | cut -d ',' -f5)
v2len=${#v2}
v2len=$((v2len -1))
newv2=${v2:0:$v2len}
newv2="'$newv2'"
row=$v1,$newv2,$v3,$v4,$v5
echo $row >> $outputfilename
done < $inputfilename
But it's taking lot of time.
Is there any efficient way to achieve this?
You can do this with awk
awk -v q="'" 'BEGIN{FS=OFS=","} {$2=q substr($2,1,length($2)-1) q}1' input_file.dat
How it works:
BEGIN{FS=OFS=","} : set input and output field separator (FS, OFS) to ,.
-v q="'" : assign a literal single quote to the variable q (to avoid complex escaping in the awk expression)
{$2=q substr($2,1,length($2)-1) q} : Replace the second field ($2) with a single quote (q) followed by the value of the 2nd field without the last character (substr(string, start, length)) and appending a literal single quote (q) at the end.
1 : Just invoke the default action, which is print the current (edited) line.
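To check the program before running it on the large file, it can be fed a couple of the sample rows from the question. A small sketch; the result variable is only there for illustration:

```shell
# Two sample rows from the question, piped through the same awk program.
result=$(printf '%s\n' \
    '420374,2014-04-06T18:44:58.314Z,214537888,12462,1' \
    '281626,2014-04-06T09:40:13.032Z,214535653,1883,1' |
    awk -v q="'" 'BEGIN{FS=OFS=","} {$2=q substr($2,1,length($2)-1) q}1')

printf '%s\n' "$result"
# 420374,'2014-04-06T18:44:58.314',214537888,12462,1
# 281626,'2014-04-06T09:40:13.032',214535653,1883,1
```

For the real file, redirect the output of the awk command to the output file instead of capturing it in a variable. The gain over the original loop comes from a single awk process reading the whole file, rather than five cut processes per line.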

How to split a string in shell and get the last field

Suppose I have the string 1:2:3:4:5 and I want to get its last field (5 in this case). How do I do that using Bash? I tried cut, but I don't know how to specify the last field with -f.
You can use string operators:
$ foo=1:2:3:4:5
$ echo ${foo##*:}
5
This trims everything from the front until a ':', greedily.
${foo <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
Another way is to reverse before and after cut:
$ echo ab:cd:ef | rev | cut -d: -f1 | rev
ef
This makes it very easy to get the last but one field, or any range of fields numbered from the end.
It's difficult to get the last field using cut, but here are some solutions in awk and perl
echo 1:2:3:4:5 | awk -F: '{print $NF}'
echo 1:2:3:4:5 | perl -F: -wane 'print $F[-1]'
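Since NF is just a number in awk, fields can also be counted from the end. A short sketch, assuming the input has at least two fields:

```shell
echo 1:2:3:4:5 | awk -F: '{print $(NF-1)}'           # second-to-last field: 4
echo 1:2:3:4:5 | awk -F: '{print $(NF-1) ":" $NF}'   # last two fields: 4:5
```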
Assuming fairly simple usage (no escaping of the delimiter, for example), you can use grep:
$ echo "1:2:3:4:5" | grep -oE "[^:]+$"
5
Breakdown - find all the characters not the delimiter ([^:]) at the end of the line ($). -o only prints the matching part.
You could try something like this if you want to use cut:
echo "1:2:3:4:5" | cut -d ":" -f5
You can also use grep, like this:
echo " 1:2:3:4:5" | grep -o '[^:]*$'
One way:
var1="1:2:3:4:5"
var2=${var1##*:}
Another, using an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
var2=${var2[@]: -1}
Yet another with an array:
var1="1:2:3:4:5"
saveIFS=$IFS
IFS=":"
var2=($var1)
IFS=$saveIFS
count=${#var2[@]}
var2=${var2[$count-1]}
Using Bash (version >= 3.2) regular expressions:
var1="1:2:3:4:5"
[[ $var1 =~ :([^:]*)$ ]]
var2=${BASH_REMATCH[1]}
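The regular expression only matches when the string contains at least one ':'. A sketch that checks the result of the match first; treating a string without ':' as a single field is my assumption about the desired behaviour:

```shell
var1="1:2:3:4:5"

if [[ $var1 =~ :([^:]*)$ ]]; then
    var2=${BASH_REMATCH[1]}   # text after the last ':'
else
    var2=$var1                # no ':' at all: take the whole string as the only field
fi

echo "$var2"   # prints 5
```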
$ echo "a b c d e" | tr ' ' '\n' | tail -1
e
Simply translate the delimiter into a newline and choose the last entry with tail -1.
Using sed:
$ echo '1:2:3:4:5' | sed 's/.*://' # => 5
$ echo '' | sed 's/.*://' # => (empty)
$ echo ':' | sed 's/.*://' # => (empty)
$ echo ':b' | sed 's/.*://' # => b
$ echo '::c' | sed 's/.*://' # => c
$ echo 'a' | sed 's/.*://' # => a
$ echo 'a:' | sed 's/.*://' # => (empty)
$ echo 'a:b' | sed 's/.*://' # => b
$ echo 'a::c' | sed 's/.*://' # => c
There are many good answers here, but still I want to share this one using basename :
basename $(echo "a:b:c:d:e" | tr ':' '/')
However it will fail if there are already some '/' in your string.
If slash / is your delimiter then you just have to (and should) use basename.
It's not the best answer but it just shows how you can be creative using bash commands.
If your last field is a single character, you could do this:
a="1:2:3:4:5"
echo ${a: -1}
echo ${a:(-1)}
Check string manipulation in bash.
Using Bash.
$ var1="1:2:3:4:0"
$ IFS=":"
$ set -- $var1
$ eval echo \$${#}
0
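Note that the snippet above leaves IFS set to ":" in the current shell. A sketch of the same idea without eval, wrapped in a function whose body runs in a subshell so the IFS change does not leak; last_field is a hypothetical name:

```shell
# last_field: print the last ':'-separated field of the first argument.
# The parentheses make the function body a subshell, so IFS and set -f
# are restored automatically when it returns.
last_field() (
    IFS=':'
    set -f                                    # no glob expansion of the unquoted $1
    set -- $1                                 # split into positional parameters
    [ "$#" -gt 1 ] && shift "$(( $# - 1 ))"   # drop all but the last one
    printf '%s\n' "$1"
)

last_field "1:2:3:4:0"   # prints 0
```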
echo "a:b:c:d:e"|xargs -d : -n1|tail -1
First, xargs splits the input on ":" (-d is a GNU xargs option); -n1 puts each part on its own line. Then tail -1 prints the last part.
Regex matching in sed is greedy (always goes to the last occurrence), which you can use to your advantage here:
$ foo=1:2:3:4:5
$ echo ${foo} | sed "s/.*://"
5
A solution using the read builtin:
IFS=':' read -a fields <<< "1:2:3:4:5"
echo "${fields[4]}"
Or, to make it more generic:
echo "${fields[-1]}" # prints the last item
for x in `echo $str | tr ";" "\n"`; do echo $x; done
Improving on the answers from @mateusz-piotrowski and @user3133260:
echo "a:b:c:d::e:: ::" | tr ':' ' ' | xargs | tr ' ' '\n' | tail -1
First, tr ':' ' ' replaces each ':' with a space.
Then xargs trims and collapses the whitespace.
After that, tr ' ' '\n' replaces the remaining spaces with newlines.
Lastly, tail -1 gets the last string.
For those who are comfortable with Python, https://github.com/Russell91/pythonpy is a nice choice to solve this problem.
$ echo "a:b:c:d:e" | py -x 'x.split(":")[-1]'
From the pythonpy help: -x treat each row of stdin as x.
With that tool, it is easy to write python code that gets applied to the input.
Edit (Dec 2020):
Pythonpy is no longer online.
Here is an alternative:
$ echo "a:b:c:d:e" | python -c 'import sys; sys.stdout.write(sys.stdin.read().split(":")[-1])'
It contains more boilerplate code (i.e. sys.stdin.read and sys.stdout.write) but requires only Python's standard library.

Resources