add character to particular field using awk

How can I add a '0' before the 1st and 9th digits of the 2nd field using awk?
data
12345,20150303024955
output
12345,0201503030024955
I am new to shell scripting.

Assuming by "add to" you mean "prefix with":
$ echo '12345,20150303024955' |
awk 'BEGIN{FS=OFS=","} {sub(/.{8}/,"&0",$2); $2="0"$2}1'
12345,0201503030024955

You asked for awk but this is also easy to do in sed:
$ echo '12345,20150303024955' | sed -r 's/,(.{8})/,0\10/'
12345,0201503030024955
How it works
-r
Turn on extended regex so that we don't need backslash escapes.
s/,(.{8})/,0\10/
Look for a comma followed by eight characters. Replace that with a comma, a zero, those eight characters, and another zero.
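If your awk or sed does not handle the {8} interval expression (some older ones do not), the same result can be had with plain substr calls. A minimal sketch, assuming the second field always has at least eight characters; pad_field2 is just an illustrative name:

```shell
# Prefix field 2 with "0" and insert another "0" after its 8th character.
# pad_field2 is a hypothetical helper name; it reads CSV lines on stdin.
pad_field2() {
  awk 'BEGIN { FS = OFS = "," }
       { $2 = "0" substr($2, 1, 8) "0" substr($2, 9) }
       1'
}

printf '12345,20150303024955\n' | pad_field2
# prints 12345,0201503030024955
```

Setting OFS together with FS matters here: assigning to $2 makes awk rebuild the line, and it joins the fields with OFS.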

Related

Split 2nd occurrence of pattern of camel style text in sed

I am trying to create a key, value strings table for a mac app using sed and awk. So far I have got it to the point of having lines like:
"exif:DateTimeOriginal" = "DateTimeOriginal:\t";
I want to do a final step to get:
"exif:DateTimeOriginal" = "Date Time Original:\t";
In other words split up the second occurrence of the camel text.
I have seen sed like this:
sed 's/\([A-Z]\)/ \1/g'
Which would do it globally and then just do the 2nd occurrence with:
sed 's/\([A-Z]\)/ \1/2g'
Or is it the 3rd occurrence? Unfortunately, on macOS you can't combine a number with the g flag.
So is there another way to do this?
BTW, I could make it so that you start with:
"exif:DateTimeOriginal" = DateTimeOriginal:\t";
That is, leave out the leading quote of the camel text, so that if a leading space is added by splitting the camel text, it would be added after the = which wouldn't matter. Then add the leading quote after the camel text is split.

Here is how you could do it with sed:
sed -E -e ':a' -e 's/^([^=]+)= (.*)([a-z])([A-Z])/\1= \2\3 \4/' -e 'ta'
The idea is to apply repeated substitutions (:a and ta) where you match the part you don't want to change ([^=]+) and then insert a space between a lowercase letter followed by an upper case letter ([a-z][A-Z]) in the remainder.
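To see the loop at work, here is a small sketch that wraps the command in a function and feeds it the sample line. split_camel_value is a made-up name, and the sketch assumes each line contains a single "= " separator:

```shell
# split_camel_value is a hypothetical wrapper around the looped substitution.
# Each pass inserts one space at the last lower/upper boundary after "= ";
# "ta" jumps back to ":a" for as long as a substitution was made.
split_camel_value() {
  sed -E -e ':a' -e 's/^([^=]+)= (.*)([a-z])([A-Z])/\1= \2\3 \4/' -e 'ta'
}

printf '%s\n' '"exif:DateTimeOriginal" = "DateTimeOriginal:\t";' | split_camel_value
```

Because (.*) is greedy, the first pass splits at "eO" and the second at "eT"; the third pass finds nothing left to split, so the loop ends.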

With GNU awk (not the default on your OS):
$ awk -F'"' -v OFS='"' '{$4=gensub(/([^A-Z])([A-Z])/,"\\1 \\2","g",$4)}1' file
"exif:DateTimeOriginal" = "Date Time Original:\t";
Depending on your locale, you may need the [[:lower:]] or [[:upper:]] character classes instead of ranges like [A-Z].

With any POSIX awk:
$ awk 'BEGIN{FS=OFS="\""} {gsub(/[[:upper:]]/," &",$4); sub(/^ /,"",$4)} 1' file
"exif:DateTimeOriginal" = "Date Time Original:\t";

This might work for you (GNU sed):
sed 'h;s/\B[[:upper:]]/ &/g;H;x;s/=.*=/=/' file
Make a copy of the current line.
Insert a space before all capitals within a word.
Append the result to the original line.
Remove the tail of the original line and the head of the result.

Using Perl
$ echo '"exif:DateTimeOriginal" = DateTimeOriginal:\t"' | perl -F'"' -lane ' $F[2]=~s/(?=[A-Z])/ /g;$F[2]=~s/\s+=\s+/=\"/g; print "\"$F[1]\"$F[2]\"" '
"exif:DateTimeOriginal"="Date Time Original: "

how to grep everything between single quotes?

I am having trouble figuring out how to grep the characters between two single quotes .
I have this in a file
version: '8.x-1.0-alpha1'
and I like to have the output like this (the version numbers can be various):
8.x-1.0-alpha1
I wrote the following but it does not work:
cat myfile.txt | grep -e 'version' | sed 's/.*\?'\(.*?\)'.*//g'
Thank you for your help.
Addition:
I used the sed command sed -n "s#version:\s*'\(.*\)'#\1#p"
I also like to remove 8.x- which I edited to sed -n "s#version:\s*'8.x-\(.*\)'#\1#p".
This command only works on Linux; it does not work on a Mac. How can I change it so that it works on a Mac?
sed -n "s#version:\s*'8.x-\(.*\)'#\1#p"

If you just want to have that information from the file, and only that you can quickly do:
awk -F"'" '/version/{print $2}' file
Example:
$ echo "version: '8.x-1.0-alpha1'" | awk -F"'" '/version/{print $2}'
8.x-1.0-alpha1
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands.
-F "'": Here we tell awk to set the field separator FS to a single quote ('). This means every line is split into fields $1, $2, ..., $NF, with a ' between each pair of fields. We can then refer to the first field as $1, the second as $2, and so on up to $NF, where NF is the total number of fields on the line.
/version/{print $2}: This is the condition-action pair.
condition: /version/: The condition reads: if a substring of the current record/line matches the regular expression /version/, then do the action. Here that simply means: if the current line contains the substring version.
action: {print $2}: If the condition is satisfied, print the second field. In this case, the second field is what the OP asked for.
Several refinements are possible:
Tighten the condition to /^version:/ && NF==3, which reads: if the current line starts with the substring version: and has 3 fields, then do the action.
If you only want the first occurrence, you can make awk exit immediately after the match by changing the action to {print $2; exit}
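Putting both refinements together might look like the sketch below. The sample line has no space before the colon, so the pattern used here is /^version:/. first_version is a hypothetical name, and the extra input lines are made up for the demonstration:

```shell
# first_version is a hypothetical helper: print the text between the single
# quotes on the first line that starts with "version:" and has exactly
# three quote-separated fields, then stop reading.
first_version() {
  awk -F"'" '/^version:/ && NF == 3 { print $2; exit }'
}

printf "%s\n" "name: 'demo'" "version: '8.x-1.0-alpha1'" "version: '9.x-2.0'" | first_version
# prints 8.x-1.0-alpha1
```

This uses nothing beyond POSIX awk, so it should behave the same on Linux and macOS.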

I'd use GNU grep with pcre regexes:
grep -oP "version: '\\K.*(?=')" file
where we are looking for "version: '" and then the \K directive will forget what it just saw, leaving .*(?=') to match up to the last single quote.

Try something like this: sed -n "s#version:\s*'\(.*\)'#\1#p" myfile.txt. This avoids the redundant cat and grep by finding the "version" line and extracting the contents between the single quotes.
Explanation:
the -n flag tells sed not to print lines automatically. We then use the p command at the end of our sed pattern to explicitly print when we've found the version line.
Search for pattern: version:\s*'\(.*\)'
version:\s* Match "version:" followed by any amount of whitespace
'\(.*\)' Match a single ', then capture everything up to the last ' on the line (.* is greedy)
Replace with: \1; This is the first (and only) capture group above, containing contents between single quotes.
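For the macOS part of the question: \s is a GNU extension that the BSD sed shipped with macOS may not understand, whereas the POSIX class [[:space:]] should work with both. A sketch that also drops the 8.x- prefix, assuming the line looks like the sample; version_suffix is a made-up name:

```shell
# version_suffix is a hypothetical helper. [[:space:]] is the POSIX spelling
# of GNU's \s, and 8\.x- matches the literal prefix that should be dropped.
version_suffix() {
  sed -n "s#^version:[[:space:]]*'8\.x-\(.*\)'.*#\1#p"
}

printf "%s\n" "version: '8.x-1.0-alpha1'" | version_suffix
# prints 1.0-alpha1
```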

When you only want to look at the quotes, you can use cut.
grep -e 'version' myfile.txt | cut -d "'" -f2

grep can almost do this alone:
grep -o "'.*'" file.txt
But this may also print lines you don't want: it prints every line with at least 2 single quotes (') in it. And the output still has the single quotes (') around it:
'8.x-1.0-alpha1'
But sed alone can do it properly:
sed -rn "s/^version: +'([^']+)'.*/\1/p" file.txt

Shell Script Replace a Specified Column with sed

I have a example dataset separated by semicolon as below;
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
I would like to replace values in a specified column. Let's say I want to change "ZMIR" to "IZMIR", but only in the third column; the ones in the second column must stay the same.
Desired output is;
123;IZMIR;IZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;IZMIR;bob
BBB;ANKR;RRRR;ABC
I tried;
sed 's/;ZMIR;/;IZMIR;/' file.txt
the problem is that it changes all the values on the file not just the 3rd one.
I also tried;
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
and here it specifies the column but, it somehow adds spaces;
123 I;IZMIR; ZMIR 123
abc;ANKAR;aaa;999
AAA ;IZMIR; ZMIR bob
BBB;ANKR;RRRR;ABC

sed doesn't know about columns, awk does (but in awk they're called "fields"):
awk 'BEGIN{FS=OFS=";"} $3=="ZMIR"{$3="IZMIR"} 1' file
Note that since the above is doing a literal string search and replace, you don't have to worry about regexp or backreference metacharacters in the search or replacement strings, unlike in a sed solution (see https://stackoverflow.com/a/29626460/1745001).
wrt what you tried previously with awk:
awk -F";" '{gsub("ZMIR",";IZMIR;",$2)}1'
That says: find "ZMIR" in the 2nd semi-colon-separated field and replace it with ";IZMIR;" and also change every existing ";" on the line to a blank character.
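The blank-for-semicolon effect is easy to reproduce in isolation. The sketch below shows it, followed by a parameterised form of the exact-match replacement. replace_in_column is a hypothetical helper name; note that awk interprets backslash escapes in values passed with -v:

```shell
# Assigning to a field makes awk rebuild the record with OFS (a space by
# default), which is where the stray blanks came from.
printf 'AAA;ZMIR;ZMIR;bob\n' | awk -F';' '{ $3 = "IZMIR" } 1'
# prints AAA ZMIR IZMIR bob

# With OFS set to match FS, the rebuilt record keeps its semicolons.
printf 'AAA;ZMIR;ZMIR;bob\n' | awk 'BEGIN { FS = OFS = ";" } { $3 = "IZMIR" } 1'
# prints AAA;ZMIR;IZMIR;bob

# replace_in_column is a hypothetical helper: exact-match replace in one column.
replace_in_column() {   # usage: replace_in_column COL OLD NEW < file
  awk -v col="$1" -v old="$2" -v new="$3" \
      'BEGIN { FS = OFS = ";" } $col == old { $col = new } 1'
}
```

For the data in the question that would be called as replace_in_column 3 ZMIR IZMIR < file.txt.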
To learn awk, read the book Effective Awk Programming, 4th Edition, by Arnold Robbins.

If you exactly know where the word to replace is located and how many of them are in that line you could use sed with something like:
sed '3 s/ZMIR/IZMIR/2'
The 3 at the beginning selects the third line, and the 2 at the end selects the second occurrence on that line. The awk solution is the better one, though; this is just so you know how it works in sed ;)

This might work for you (GNU sed):
sed -r 's/[^;]+/\n&\n/3;s/\nZMIR\n/IZMIR/;s/\n//g' file
Surround the required field by unique markers then replace the required string (plus markers) by the replacement string. Finally remove the unique markers.

Perl on Command Line
Input
123;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
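$. == 1 restricts the substitution to the first row; for the second row use $. == 2.
$F[0] is the first column, so the substitution searches for that column's value; the fourth column would be $F[3]. Note that s/$F[0]/.../ replaces the first place that text appears in the line, which is not necessarily that column.
-a -F\; turns on autosplit with ; as the delimiter.
For example: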
perl -a -F\; -pe 's/$F[0]/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
for row == 2 and column == 2
perl -a -F\; -pe 's/$F[1]/***/ if $. == 2' your-file
123;IZMIR;ZMIR;123
abc;***;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
Also without -a -F
perl -pe 's/123/***/ if $. == 1' your-file
output
***;IZMIR;ZMIR;123
abc;ANKAR;aaa;999
AAA;ZMIR;ZMIR;bob
BBB;ANKR;RRRR;ABC
If you want to edit the file itself, add the -i option, which means edit in place: perl finds, replaces, and saves to the same file.
perl -i -a -F\; and so on

You need to include some absolute references in the pattern:
^ for the beginning of the line
an unambiguous separator pattern
^.*ZMIR and [^;]*;ZMIR match different things: in the first, .* takes everything before ZMIR, and sed always takes the longest possible match.
Specific
sed 's/^\([^;]*;[^;]*;\)ZMIR;/\1IZMIR;/' YourFile
Generic, where Old and New are shell variables (remember that Old is used as a regex, so regex rules apply, such as escaping special characters):
#Old='ZMIR'
#New='IZMIR'
sed 's/^\(\([^;]*;\)\{2\}\)'"${Old}"';/\1'"${New}"';/' YourFile
In this simple case sed is an alternative, but awk is better for a complex or long line.

Bash: Filtering records in a file based on multi column delimiter

Need help in Bash to filter records based on a multicolumn delimiter.
Delimiter is |^^|
Sample record
xyz#ATT.NET|^^|xyz|^^|307
Awk runs fine when used with a single-character delimiter, but not with a multi-character one.
awk -F"|^^|" "NF !=3 {print}" file.txt
Any suggestions?

The issue is that every character in your delimiter is a regexp metacharacter so you need to escape them when appropriate so awk knows you want them treated literally. This might be overkill:
awk -F'\\|\\^\\^\\|' 'NF!=3' file.txt
but I can't test it since you only provided one line of input, not the selection of lines some of which do/don't match that'd be required to test the script.
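For what it's worth, here is a small sketch with made-up input that exercises the escaped separator. Only the first sample line comes from the question; the other two are hypothetical, and bad_records is just an illustrative name:

```shell
# bad_records is a hypothetical helper: print every line that does not
# split into exactly 3 fields on the literal delimiter |^^| .
bad_records() {
  awk -F'\\|\\^\\^\\|' 'NF != 3'
}

# Hypothetical sample: the first line has 3 fields, the other two do not.
printf '%s\n' \
  'xyz#ATT.NET|^^|xyz|^^|307' \
  'short|^^|row' \
  'a|^^|b|^^|c|^^|d' |
  bad_records
```

The doubled backslashes are needed because awk first processes the -F value as a string, which turns each \\ into a single \, and only then compiles the result as a regex.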

awk -F "<regex>" ...
The argument to -F is not a multi-character delimiter, it is a regular expression. Simple regexes, such as one that matches a single character, are what you are used to, but they are not all there is.

One way is to escape all the regex characters, as Ed Morton answered.
Alternatively,
you can replace all |^^| with a single character which never shows in your file content, here let's say a comma
sed 's/|^^|/,/g' file.txt
xyz#ATT.NET,xyz,307
The command would be
sed 's/|^^|/,/g' file.txt | awk -F, 'NF != 3'

Remove all characters existing between first n occurrences of a specific character in each line in shell

Say I have txt file with characters as follows:
abcd|123|kds|Name|Place|Phone
ldkdsd|323|jkds|Name1|Place1|Phone1
I want to remove all the characters in each line that exist within the first 3 occurrences of the | character. I want my output as:
Name|Place|Phone
Name1|Place1|Phone1
Could anyone help me figure this out? How can I achieve this using sed?

This would be a typical task for cut
cut -d'|' -f4- file
output:
Name|Place|Phone
Name1|Place1|Phone1
The -f4- means you want everything from the fourth field to the end. Adjust the 4 if you have a different requirement.
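If the number of leading fields to drop varies, the field number can be computed. A small sketch; drop_leading_fields is a hypothetical name:

```shell
# drop_leading_fields is a hypothetical helper: remove the first N
# "|"-separated fields by asking cut for field N+1 onwards.
drop_leading_fields() {   # usage: drop_leading_fields N < file
  cut -d'|' -f"$(( $1 + 1 ))-"
}

printf '%s\n' 'abcd|123|kds|Name|Place|Phone' 'ldkdsd|323|jkds|Name1|Place1|Phone1' |
  drop_leading_fields 3
# prints:
# Name|Place|Phone
# Name1|Place1|Phone1
```

Keep in mind that cut passes lines containing no delimiter through unchanged unless you add -s.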

You could try the below sed command,
$ sed -r 's/^(\s*)[^|]*\|[^|]*\|[^|]*\|/\1/g' file
Name|Place|Phone
Name1|Place1|Phone1
^(\s*) captures any whitespace at the start of the line.
[^|]*\|[^|]*\|[^|]*\| matches up to the third |, so abcd|123|kds| will be matched.
All the matched characters are replaced by the contents of the first captured group.

This might work for you (GNU sed):
sed 's/^\([^|]*|\)\{3\}//' file
or more readably:
sed -r 's/^([^|]*\|){3}//' file

sed 's/\(\([^|]*|\)\{3\}\)//' YourFile
This is a POSIX (basic regular expression) version. In a basic regular expression an unescaped | is a literal character; it only means "OR" in extended regular expressions (-E or -r), where it would have to be escaped.
Explanation
Replace the first 3 occurrences (\{3\}) of [any characters other than | followed by a | (\([^|]*|\))] with nothing (// is an empty replacement).
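The same idea with the count passed in as a parameter, as a sketch. strip_first_n is a made-up name, and -E is assumed to be available (it is in current GNU and BSD sed):

```shell
# strip_first_n is a hypothetical helper: delete everything up to and
# including the Nth "|" on each line. -E selects extended regex, where a
# literal pipe outside a bracket expression must be written as \| .
strip_first_n() {   # usage: strip_first_n N < file
  sed -E "s/^([^|]*\|){$1}//"
}

printf '%s\n' 'abcd|123|kds|Name|Place|Phone' | strip_first_n 3
# prints Name|Place|Phone
```

Lines with fewer than N pipes do not match and are printed unchanged.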

You can print the last 3 fields:
awk '{print $(NF-2),$(NF-1),$NF}' FS=\| OFS=\| file
Name|Place|Phone
Name1|Place1|Phone1
