How to remove string between two characters and before the first occurrence using sed - bash

I would like to remove the string between ":" and the first "|" using sed.
input:
|abc:1.2.3|def|
output from sed:
|abc|def|
I managed to come up with sed 's|\(:\)[^|]*|\1|', but this sed command does not remove the first character (":"). How can I modify this command to also remove the colon?

You don't need to capture the : in a group and put it back in the replacement; that back-reference is exactly what keeps the colon in the output.
Keep it simple:
s='|abc:1.2.3|def|'
sed 's/:[^|]*//' <<< "$s"
|abc|def|
: matches a colon and [^|]* matches 0 or more non-pipe characters
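To make the "first occurrence" behaviour concrete, here is a minimal sketch. The helper names strip_tag and strip_all_tags and the second sample line are assumptions for illustration, not part of the question. Without the g flag, sed replaces only the first match on each line.

```shell
# strip_tag: hypothetical helper; removes ":" plus everything up to (not including) the next "|".
strip_tag() {
    printf '%s\n' "$1" | sed 's/:[^|]*//'
}

# strip_all_tags: same pattern with the g flag, so every ":..." segment on the line is removed.
strip_all_tags() {
    printf '%s\n' "$1" | sed 's/:[^|]*//g'
}

strip_tag '|abc:1.2.3|def|'            # prints |abc|def|
strip_tag '|abc:1.2.3|def:4.5|'        # prints |abc|def:4.5|
strip_all_tags '|abc:1.2.3|def:4.5|'   # prints |abc|def|
```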

1st solution: With awk, you could try the following program.
awk 'match($0,/:[^|]*/){print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)}' Input_file
Explanation: awk's match function is used with a regex that matches from : up to, but not including, the first following |. Whenever the regex matches, match sets the built-in variables RSTART (the start position of the match) and RLENGTH (its length). Based on those, the program prints the substring before the match and the substring after it, which skips the matched part and gives the output required in the question.
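The values that match() sets can be inspected directly. This is only a sketch; show_match is a hypothetical helper name, and the sample line is the one from the question.

```shell
# show_match: hypothetical helper; reports what match() found in its argument.
show_match() {
    printf '%s\n' "$1" | awk 'match($0, /:[^|]*/) {
        # RSTART is the 1-based index of the match, RLENGTH its length in characters.
        print "RSTART=" RSTART, "RLENGTH=" RLENGTH, "matched=" substr($0, RSTART, RLENGTH)
    }'
}

show_match '|abc:1.2.3|def|'   # prints RSTART=5 RLENGTH=6 matched=:1.2.3
```

Everything before position RSTART and everything from RSTART+RLENGTH onward is what the 1st solution prints.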
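2nd solution: Using a regex field separator in awk, so that the text to remove (from : up to the next |) acts as the separator and the parts on either side are printed back to back. Written for your shown samples only.
awk -F':[^|]*' '{print $1 $2}' Input_file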

Related

sed extract part of string from a file

I've been trying to extract only part of a string from a file that looks like this:
str1=USER_NAME
str2=justAstring
str3=https://product.org/v-4.5-bin.zip
str4=USER_HOME
I need to extract ONLY the version - in this case: 4.5
I did it by grep and then sed but now the output is 4.5-bin.zip
-> grep str3 file.txt
str3=https://product.org/v-4.5-bin.zip
-> echo str3=https://product.org/v-4.5-bin.zip | sed -n "s/^.*v-\(\S*\)/\1/p"
4.5-bin.zip
What should I do in order to remove also the -bin.zip at the end?
Thanks.
1st solution: With your shown samples, please try the following sed code.
sed -n '/^str3=/s/.*-\([^-]*\)-.*/\1/p' Input_file
Explanation: sed's -n option turns off the default printing of every line, so only what is printed explicitly appears. The /^str3=/ address limits the substitution to lines starting with str3=. Because the leading .* is greedy, the substitution captures the text between the last two - characters on the line (here the only two) in a group, replaces the whole line with it via \1, and the p flag prints the result.
2nd solution: Using GNU grep, you could try the following.
grep -oP '^str3=.*?-\K([^-]*)' Input_file
3rd solution: Using an awk program to get the expected output as per the shown samples.
awk -F'-' '/^str3=/{print $2}' Input_file
4th solution: Using awk's match function, which sets the RSTART and RLENGTH variables once a match is found; the matched text is then split on - and the second piece is printed.
awk 'match($0,/^str3=.*-/){split(substr($0,RSTART,RLENGTH),arr,"-");print arr[2]}' Input_file
If you know the version contains just digits and dots, replace \S with [0-9.]. Also, match the remaining characters outside of the capture group so that they are removed.
sed -n 's/^.*v-\([0-9.]*\).*/\1/p'
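As a quick check of that last command, the sketch below wraps it in a function. The name extract_version and the 10.2.1 sample are assumptions added for illustration.

```shell
# extract_version: hypothetical helper; prints the digits-and-dots run that follows "v-".
extract_version() {
    printf '%s\n' "$1" | sed -n 's/^.*v-\([0-9.]*\).*/\1/p'
}

extract_version 'str3=https://product.org/v-4.5-bin.zip'      # prints 4.5
extract_version 'str3=https://product.org/v-10.2.1-bin.zip'   # prints 10.2.1
extract_version 'str4=USER_HOME'                              # prints nothing: no "v-" in the line
```

Because the leading .* is greedy, the capture starts after the last "v-" on the line, which is worth keeping in mind if the URL could contain that sequence more than once.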

Delete first column of csv file [duplicate]

I would like to know how I can delete the first column of a CSV file with awk or sed.
Something like this:
FIRST,SECOND,THIRD
should become:
SECOND,THIRD
Thanks in advance.
The following awk command should do this.
awk '{sub(/[^,]*/,"");sub(/,/,"")} 1' Input_file
The following sed command may also help.
sed 's/\([^,]*\),\(.*\)/\2/' Input_file
Explanation:
awk ' ##Starting awk code here.
{
sub(/[^,]*/,"") ##Using sub to replace everything up to the 1st occurrence of a comma (,) with an empty string.
sub(/,/,"") ##Using sub to replace that first comma with an empty string in the current line.
}
1 ##Mentioning 1 prints every line, edited or not.
' Input_file ##Mentioning the Input_file name here.
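Using awk (this shifts the fields by hand, so it fits a three-column file only; some awk versions do not rebuild the record when NF is decremented):
$ awk -F, -v OFS=, '{$1=$2; $2=$3; NF--;}1' file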
SECOND,THIRD
With Sed
sed -i -r 's#^\w+,##g' test.csv
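^ anchors the match at the beginning of the line, \w+ matches one or more word characters ([A-Za-z0-9] plus underscore), and the comma after them is included, so the whole first column is replaced with nothing. This assumes the first column contains word characters only.
The g flag after the last delimiter requests a global substitution, but because the pattern is anchored with ^ it can match only once per line, so g makes no difference here.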
Using sed: the regex ^[^,]+, represents the first column including the first comma. ^ means start of the line, and [^,]+, means one or more characters that are not a comma, followed by a comma.
You can use -i with sed to make the changes in the file itself if needed.
sed -r 's/^[^,]+,//' input
SECOND,THIRD
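One edge case separates [^,]+ from [^,]*: an empty first field. The sketch below assumes a hypothetical helper name drop_first_col and two extra sample rows that are not in the question. It also assumes plain comma-separated data with no quoted fields containing commas.

```shell
# drop_first_col: hypothetical helper; removes the first comma-separated column from stdin.
# [^,]* (zero or more) is used instead of [^,]+ so an empty first field is removed too.
drop_first_col() {
    sed 's/^[^,]*,//'
}

printf '%s\n' 'FIRST,SECOND,THIRD' ',SECOND,THIRD' 'ONLY' | drop_first_col
# prints:
# SECOND,THIRD
# SECOND,THIRD
# ONLY
```

A row without any comma is left unchanged, since the pattern requires the separator.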

substitute a letter at a specific position in the file itself using bash

I am trying to do this:
I have a file with content like below;
file:
abcdefgh
I am looking for a way to do this;
file:
aBCdefgh
So, make the 2nd and 3rd letters uppercase in the file itself, because I have to do multiple conversions at different positions in a string in the file. Can someone please help me with how to do this?
I came to know something like this below, but it does only for a single first character of the string in the file:
sed -i 's/^./\U&/' file
output:
Abcdefgh
Thanks much!
Change your sed approach to the following:
sed -i 's/\(.\)\(..\)/\1\U\2/' file
$ cat file
aBCdefgh
matching section:
\(.\) - match the 1st char of the string into the 1st captured group
\(..\) - match the next 2 chars placing into the 2nd captured group
replacement section:
\1 - refers to the 1st captured group, i.e. the 1st char
\U\2 - uppercase the characters from the 2nd captured group \2
Bonus approach, in case you want to capitalize the 105th and 106th characters:
sed -Ei 's/(.{104})(..)/\1\U\2/' file
awk on duty.
echo "abcdefgh" | awk '{print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}'
Output will be as follows.
aBCdefgh
In case you have an Input_file and you want to save the edits into the same Input_file.
awk '{print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}' Input_file > temp_file && mv temp_file Input_file
Explanation: Please run the code above; the annotated version below is for explanation purposes only.
echo "abcdefgh" ##Using the echo command to print a string on standard output.
| ##The pipe (|) passes one command's standard output as standard input to another command (in this case echo passes its standard output to awk).
awk '{ ##Starting awk here.
##The print command in awk is used to print anything: variables, strings, etc.
##substr is a built-in awk function for getting specific parts of a line or variable. Its syntax is substr(line/variable, starting position, number of characters needed from that starting position); if the number of characters is omitted, it takes everything from the starting position to the end of the line.
##toupper is also a built-in awk function; it converts any text passed to it to UPPER CASE, so in this case the 2nd and 3rd characters are passed to it as requested in the question.
print substr($0,1,1) toupper(substr($0,2,2)) substr($0,4)}'
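Since the question mentions several conversions at different positions, the same substr/toupper idea can be parameterized. This is a sketch only; the function name upper_range and its argument order are assumptions, not something from the answers above.

```shell
# upper_range: hypothetical helper; uppercases LEN characters starting at 1-based position START.
# Usage: upper_range START LEN < input
upper_range() {
    awk -v start="$1" -v len="$2" '{
        print substr($0, 1, start - 1) toupper(substr($0, start, len)) substr($0, start + len)
    }'
}

echo "abcdefgh" | upper_range 2 2                      # prints aBCdefgh
echo "abcdefgh" | upper_range 2 2 | upper_range 6 1    # prints aBCdeFgh
```

To change the file itself, the output can be written to a temporary file and moved over the original, as in the command above.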

Shell Script Replace a Specified Column with sed

I have a example dataset separated by semicolon as below;
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the values in the second column must stay the same.
Desired output is;
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried;
sed 's/;ZMIR;/;IZMIR;/' file.txt
The problem is that it changes matching values anywhere in the file, not just the ones in the 3rd column.
I also tried;
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
and here it specifies the column but, it somehow adds spaces;
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC
sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
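The literal comparison generalizes to any column. The following is a sketch under stated assumptions: replace_field is a hypothetical name, and the values are passed with -v, which means awk interprets backslash escape sequences in them.

```shell
# replace_field: hypothetical helper; replaces field COL with NEW when it equals OLD exactly.
# Usage: replace_field COL OLD NEW < input
replace_field() {
    # Appending "" forces a string comparison even when both values look numeric.
    awk -v col="$1" -v old="$2" -v new="$3" \
        'BEGIN { FS = OFS = ";" } ($col "") == (old "") { $col = new } 1'
}

printf '%s\n' '123;IZMIR;ZMIR;123' 'abc;ANKAR;aaa;999' 'AAA;ZMIR;ZMIR;bob' \
    | replace_field 3 ZMIR IZMIR
# prints:
# 123;IZMIR;IZMIR;123
# abc;ANKAR;aaa;999
# AAA;ZMIR;IZMIR;bob
```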
Regarding what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
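That says: find "ZMIR" in the 2nd semi-colon-separated field and replace it with ";IZMIR;" and also change every existing ";" on the line to a blank character. The second effect happens because assigning to a field makes awk rebuild the record using OFS, which defaults to a single space.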
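The separator change can be reproduced in isolation. In this small sketch the variable names are arbitrary; the no-op assignment $2 = $2 is enough to make awk rebuild the record.

```shell
line='AAA;ZMIR;ZMIR;bob'

# Only FS is set: touching a field rebuilds $0 with the default OFS, a single space.
default_ofs=$(echo "$line" | awk -F';' '{ $2 = $2 } 1')

# FS and OFS both set: the rebuilt record keeps the semicolons.
explicit_ofs=$(echo "$line" | awk 'BEGIN { FS = OFS = ";" } { $2 = $2 } 1')

echo "$default_ofs"    # prints AAA ZMIR ZMIR bob
echo "$explicit_ofs"   # prints AAA;ZMIR;ZMIR;bob
```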
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
If you know exactly which line holds the word to replace and which occurrence on that line it is, you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2'
The 3 at the beginning selects the third line, and the 2 at the end selects the second occurrence on that line. The awk solution is the better one, but now you know how it works in sed ;)
This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.
Perl on Command Line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
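$. == 1 restricts the substitution to the first row; for the second row use $. == 2.
$F[0] is the first column, so the substitution uses that column's value; the fourth column is $F[3]. The value is interpolated into the regex and the first match anywhere in the line is replaced, so this assumes the same text does not occur earlier in the line.
-a -F\; turns on autosplit into @F with ; as the delimiter.
What you want: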
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
for row == 2 and column == 2
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also without -a -F
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
If you want to edit the file, add the -i option, which means edit in place: it finds, replaces, and saves to the same file.
perl -i -a -F\; and so on
You need to anchor the pattern to fixed reference points in the line:
^ for the beginning of the line
an unambiguous separator pattern
^.*ZMIR and [^;]*;ZMIR give different results: the first takes everything before ZMIR, and because sed takes the longest possible match, .* runs up to the last ZMIR on the line, while [^;]* cannot cross a separator.
Specific
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember that both are inserted into the sed expression, so regex rules apply and special characters must be escaped):
Old='ZMIR'
New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'"${Old}"';/\1'"${New}"';/' YourFile
In this simple case sed is an alternative, but awk is better for a complex or long line.
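The count in \{2\} is simply the target column minus one, so the sed form can take the column as a parameter. This is a sketch with a hypothetical function name, replace_col_sed. As noted above, the old and new values are used as regex and replacement text unescaped, and the pattern requires a ; after the column, so the last column of a line is not handled.

```shell
# replace_col_sed: hypothetical helper; replaces OLD with NEW in column COL of ;-separated input.
# Usage: replace_col_sed COL OLD NEW < input
replace_col_sed() {
    col=$1 old=$2 new=$3
    skip=$((col - 1))    # number of whole "field;" groups to step over
    sed "s/^\(\([^;]*;\)\{$skip\}\)$old;/\1$new;/"
}

printf '%s\n' '123;IZMIR;ZMIR;123' 'AAA;ZMIR;ZMIR;bob' | replace_col_sed 3 ZMIR IZMIR
# prints:
# 123;IZMIR;IZMIR;123
# AAA;ZMIR;IZMIR;bob
```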

How to extract specific string in a file using awk, sed or other methods in bash?

I have a file with the following text (multiple lines with different values):
TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...
I want to extract the value of start_token and end_token. I tried awk and cut, but I am not able to figure out the best way to extract the targeted values.
Something like:
cat filename| get the values of start_token and end_token
grep -oP '(?<=token:)\d+' filename
Explanation:
-o: print only part that matches, not complete line
-P: use Perl regex engine (for look-around)
(?<=token:): positive look-behind – zero-width pattern
\d+: one or more digits
Result:
8050285221437500528
8051783269940793406
A (potentially more efficient) variant of this, as pointed out by hek2mgl in his comment, uses \K, the variable-width look-behind:
grep -oP 'token:\K\d+'
\K keeps everything that has been matched to the left of it, but does not include it in the match (see perlre).
Using awk:
awk -F '[(:,]' '{print $3, $5}' file
8050285221437500528 8051783269940793406
First value is start_token and last value is end_token.
a sed version
sed -e '/^TokenRange(/!d' -e 's/.*:\([0-9]*\),.*:\([0-9]*\),.*/\1 \2/' YourFile
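A variant that keys on the field names rather than on their position in the line may be easier to read. It is only a sketch: extract_tokens is a hypothetical name, the "endpoints" tail in the sample is made up because the question elides the rest of the line, and it assumes end_token directly follows start_token as in the sample.

```shell
# extract_tokens: hypothetical helper; prints "START END" for each line that has both tokens.
extract_tokens() {
    sed -n 's/.*start_token:\([0-9][0-9]*\),end_token:\([0-9][0-9]*\).*/\1 \2/p'
}

# The "endpoints" part is invented for illustration only.
printf '%s\n' 'TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,endpoints:[10.0.0.1])' \
    | extract_tokens
# prints 8050285221437500528 8051783269940793406
```

Lines that do not contain both tokens produce no output because of -n together with the p flag.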

Resources