Find and Replace with awk - bash

I have this value, cut from a .txt file:
,Request Id,dummy1,dummy2,dummyN
I am trying to find and replace the space with "_", like this:
#iterator to read lines of txt
#if conditions
trim_line=$(echo "$user" | awk '{gsub(" ", "_", $0); print}')
echo $trim_line
but the echo is showing:
Id,dummy1,dummy2,dummyN
Expected output:
,Request_Id,dummy1,dummy2,dummyN
Where is my bug?
EDIT:
The echo of $user is not what I expected; it is:
Id,dummy1,dummy2,dummyN
And should be:
,Request Id,dummy1,dummy2,dummyN
To do this operation I am using:
for user in $(cut -d: -f1 $FILENAME)
do (....) find/replace

You can try bash search and replace substring :
echo $user
,Request Id,dummy1,dummy2,dummyN
echo ${user// /_} ## For all the spaces
,Request_Id,dummy1,dummy2,dummyN
echo ${user/ /_} ## For first match
This will replace all the blank spaces with _. Note that two / are used after user: this makes the search and replace apply to the whole string. With only one /, the replacement is done on the first match only.

Your problem is your use of a for loop to read the contents of your file. The shell splits the output of your command substitution $(cut -d: -f1 $FILENAME) on white space and you have one in the middle of your line, so it breaks.
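For illustration, a minimal reproduction of that splitting (not part of the original answer):
line=',Request Id,dummy1,dummy2,dummyN'
for user in $line; do echo "got: $user"; done
got: ,Request
got: Id,dummy1,dummy2,dummyN
The last iteration is exactly the truncated value you are seeing.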
Use a while read loop to read the file line by line:
while IFS=: read -r col junk; do
col=${col// /_}
# use $col here
done < "$FILENAME"
As others have mentioned, there's no need to use an external tool to make the substitution.
...That said, if you don't plan on doing something different (e.g. executing other commands) with each line, then the best option is to use awk:
awk -F: '{ gsub(/ /, "_", $1); print $1 }' "$FILENAME"
The output of this command is the first column of your input file, with the substitution made.

If your data is already in an environment variable, the fastest way is to directly use built-in bash replacement feature:
echo "${user// /_/}"
With awk, set the field separator to ,, otherwise the space character is treated as the separator (which matters as soon as you reference individual fields).
echo ",Request Id,dummy1,dummy2,dummyN" | awk -F, '{gsub(" ", "_", $0); print}'
,Request_Id,dummy1,dummy2,dummyN
note: if it's just to replace a character in a raw string (no tokens, no fields), bash, sed and tr are best suited.
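For instance, either of these one-liners does the raw replacement (assuming the value is in $user):
echo "$user" | tr ' ' '_'
echo "$user" | sed 's/ /_/g'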

Related

Writing the output of a command to specific columns of a csv file, unix

I want to write the output of a command to specific columns (3rd and 5th) of a csv file.
#!/bin/bash
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
1234567890,/training/folder
0325435287,/training/newfolder
Expected Output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $1, "", $2, ""}
' input.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
With sed you could try following code. Which is using sed's capability of back reference.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: the -E option of sed enables ERE (extended regular expressions). The s command then performs the substitution. Its first part creates 2 back references: the regex captures the first space-delimited token into group \1 and the rest of the line into group \2 (values matched by a parenthesized sub-expression are kept in a buffer to be reused later in the substitution). Its second part replaces the whole line with 2 commas, the 2nd capturing group \2, 2 more commas, the 1st capturing group \1, and a final ,.
You can use awk instead of sed
awk '{print ",," $2 ",," $1 ","}' input.csv >> file.csv
awk processes its input line by line and splits each line into whitespace-separated fields ($1 and $2 in your case). The print statement concatenates them with the literal ,, and , strings to build each output row.
You can trivially add empty columns as part of your sed script.
sed -E 'y/ /,/;s/^([^,]*),(.*)/,,\2,,\1,/' input.csv >> file.csv
This translates the space to a comma, then rearranges the two fields into the 3rd and 5th columns, adding the empty ones around them.
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.
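If spaces in the path are a concern, here is a small awk sketch (my addition, not from the answer above) that splits only on the first run of spaces, assuming the first column itself never contains one:
awk '{ path = $0; sub(/^[^ ]+ +/, "", path); print ",," path ",," $1 "," }' input.csv >> file.csv
Commas in the file names would still break the CSV, as noted above.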

How to extract two pieces of data from a string

I am trying to extract two pieces of data from a string and I am having a bit of trouble. The string is formatted like this:
11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd
What I am trying to achieve is to print the first column (11111111-2222:3333:4444:555555555555) and the third section of the colon string (cccccccc), on the same line with a space between the two, as the first column is an identifier. Ideally in a way that can just be run as one-line from the terminal.
I have tried using cut and awk but I have yet to find a good way to make this work.
How about a sed expression like this?
echo "11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd" |
sed -e "s/\(.*\) .*:.*:\(.*\):.*/\1 \2/"
Result:
11111111-2222:3333:4444:555555555555 cccccccc
The following awk script does the job without relying on the format of the first column.
awk -F: 'BEGIN {RS=ORS=" "} NR==1; NR==2 {print $3}'
Use it in a pipe or pass the string as a file (simply append the filename as an argument) or as a here-string (append <<< "your string").
Explanation:
Instead of lines, this awk script splits the input into space-separated records (RS=ORS=" "). Each record is subdivided into :-separated fields (-F:). The first record is printed as is (NR==1; is the same as NR==1 {print $0}). In the second record, only the 3rd field is printed (NR==2 {print $3}); in case of the record aaa:bbb:ccc:ddd the 3rd field is ccc.
I think the answer from user803422 is better but here's another option. Maybe it'll help you use cut in the future.
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
first=$(echo "$str" | cut -d ' ' -f1)
second=$(echo "$str" | cut -d ':' -f6)
echo "$first $second"
With pure Bash Regex:
str='11111111-2222:3333:4444:555555555555 aaaaaaaa:bbbbbbbb:cccccccc:dddddddd'
echo "$([[ $str =~ (.*\ ).*:.*:([^:]*) ]])${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
Explanations:
[[ $str =~ (.*\ ).*:.*:([^:]* ]]: Match $str against the POSIX Extended RegEx (.*\ ).*:.*:([^:]*) witch contains two capture groups: 1: (.*\ ) 0 or more of any characters, followed by a space; and capture group 2: ([^:]*) witch contains any number of characters that are not :.
$([[ $str =~ (.*\ ).*:.*:([^:]*) ]]): execute the RegEx match in a sub-shell during the string value expansion. (here it produces no output, but the RegEx captured groups are referenced later).
${BASH_REMATCH[1]}${BASH_REMATCH[2]}: expand the content of the RegEx captured groups that Bash keeps in the dedicated $BASH_REMATCH array.

Ignore comma after backslash in a line in a text file using awk or sed

I have a text file containing several lines of the following format:
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
I need to parse the text file and print the fields while ignoring the escaped commas. Here those will be fields 2 and 3, like this:
science, social
tennis, ping_pong, chess
I do not know how to ignore escaped characters. How can I do it with awk or sed in terminal?
Substitute \, with a character that your records do not contain normally (e.g. \n), and restore it before printing. For example:
$ awk -F',' 'NR>1{ if(gsub(/\\,/,"\n")) gsub(/\n/,",",$2); print $2 }' file
science,social
painting
Since the first gsub is performed on the whole record (i.e. $0), awk is forced to recompute the fields. But the second one is performed on only the second field (i.e. $2), so it will not affect the other fields. See: Changing Fields.
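A toy demonstration of that recomputation (my illustration, not from the answer):
echo 'a-b c' | awk '{ print NF; gsub(/-/, " "); print NF }'
2
3
Replacing in $0 re-splits the record, so NF changes from 2 to 3.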
To be able to extract multiple fields with properly escaped commas you need to gsub \ns in all fields with a for loop as in the following example:
$ awk 'BEGIN{ FS=OFS="," } NR>1{ if(gsub(/\\,/,"\n")) for(i=1;i<=NF;++i) gsub(/\n/,"\\,",$i); print $2,$3 }' file
science\,social,football
painting,tennis\,ping_pong\,chess
See also: What's the most robust way to efficiently parse CSV using awk?.
You could replace the \, sequences by another character that won't appear in your text, split the text around the remaining commas, then replace the chosen character by commas:
sed $'s/\\\,/\x1f/g' input | awk -F, '{ printf "Name: %s\nSubjects : %s\nSports: %s\nSchool: %s\n\n", $1, $2, $3, $4 }' | tr $'\x1f' ','
In this case using the ASCII control char "Unit Separator" ($'\x1f'), which I'm pretty sure your input won't contain.
Why awk and sed when bash with coreutils is just enough:
# Sorry my cat. Using `cat` as input pipe
cat <<EOF |
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
EOF
# remove first line!
tail -n+2 |
# substitute `\,` by an unreadable character:
sed 's/\\\,/\xff/g' |
# read the comma separated list
while IFS=, read -r name list_of_subjects list_of_sports school; do
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_subjects < <(printf "%s" "$list_of_subjects")
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_sports < <(printf "%s" "$list_of_sports")
echo "list_of_subjects : ${list_of_subjects[#]}"
echo "list_of_sports : ${list_of_sports[#]}"
done
will output:
list_of_subjects : science social
list_of_sports : football
list_of_subjects : painting
list_of_sports : tennis ping_pong chess
Note that this will most probably be slower than the awk solutions.
Note that the principle of operation is the same as in the other answers: substitute the \, string with some other unique character and then use that character to iterate over the second and third field elements.
This might work for you (GNU sed):
sed -E 's/\\,/\n/g;y/,\n/\n,/;s/^[^,]*$//Mg;s/\n//g;/^$/d' file
Replace escaped commas by newlines, then transliterate so that newlines become commas and commas become newlines. Empty every (embedded) line that does not contain a comma, remove the remaining newlines, and delete empty lines.
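A step-by-step trace on the second sample line (my annotation, not part of the answer; \n marks an embedded newline in the pattern space):
s/\\,/\n/g    -> Eg2: james,painting,tennis\nping_pong\nchess,highmount_school
y/,\n/\n,/    -> Eg2: james\npainting\ntennis,ping_pong,chess\nhighmount_school
s/^[^,]*$//Mg -> \n\ntennis,ping_pong,chess\n
s/\n//g       -> tennis,ping_pong,chess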
Using Perl: change the \, to some control char, say \x01, and then replace it again with ,.
$ cat laxman.txt
john,science\,social,football,florence_school
james,painting,tennis\,ping_pong\,chess,highmount_school
$ perl -ne ' s/\\,/\x01/g and print ' laxman.txt | perl -F, -lane ' for(@F) { if( /\x01/ ) { s/\x01/,/g ; print } } '
science,social
tennis,ping_pong,chess
You can perhaps join columns with a function.
function joincol(col, i) {
$col=$col FS $(col+1)
for (i=col+1; i<NF; i++) {
$i=$(i+1)
}
NF--
}
This might get used thusly:
BEGIN { FS = "," }
{
for (col=1; col<=NF; col++) {
while ($col ~ /\\$/ && col < NF) {   # keep joining while the field still ends in a backslash
joincol(col)
}
}
}
Note that decrementing NF is undefined behaviour in POSIX. It may delete the last field, or it may not, and still be POSIX compliant. This works for me in BSDawk and Gawk. YMMV. May contain nuts.
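If that's a concern, here is a portable sketch of the same idea (my addition, not the answer's code) that collects the logical fields into an array instead of editing NF:
awk '
BEGIN { FS = OFS = "," }
NR > 1 {
    n = 0
    for (i = 1; i <= NF; i++) {
        field = $i
        while (field ~ /\\$/ && i < NF) {   # trailing backslash: the comma was escaped
            field = field FS $(++i)         # re-attach the piece awk split off
        }
        out[++n] = field                    # n counts the logical fields
    }
    print out[2], out[3]
}' file
For the sample input this prints the same two lines as the FS/OFS answer above.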
Use gawk's FPAT:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print $3}' file
#list_of_sports
#football
#tennis\,ping_pong\,chess
then use gawk's gensub() to replace the backslashes:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print gensub("\\\\", "", "g", $3)}' file
#list_of_sports
#football
#tennis,ping_pong,chess

Read line by line from a text file and print how I want in shell scripting

I want to read the file below line by line and print each line in a new format, in a shell script.
Text file content:
zero#123456
one#123
two#12345678
I want to print this as:
zero#1-6
one#1-3
two#1-8
I tried the following:
file="readFile.txt"
while IFS= read -r line
do echo "$line"
done <printf '%s\n' "$file"
Create a script like below: my_print.sh
file="readFile.txt"
while IFS= read -r line
do
one=$(echo $line| awk -F'#' '{print $1}') ## This splits the line based on '#' and picks the 1st value. So, we get zero from 'zero#123456 '
len=$(echo $line| awk -F'#' '{print $2}'|wc -c) ## This takes the 2nd value which is 123456 and counts the number of characters
two=$(echo $line| awk -F'#' '{print $2}'| cut -c 1) ## This picks the 1st character from '123456' which is 1
three=$(echo $line| awk -F'#' '{print $2}'| cut -c $((len-1))) ## This picks the last character from '123456' which is 6
echo $one#$two-$three ## This is basically printing the output in the format you wanted 'zero#1-6'
done <"$file"
Run it like:
mayankp@mayank:~/$ sh my_print.sh > output.txt
mayankp@mayank:~/$ cat output.txt
zero#1-6
one#1-3
two#1-8
Let me know if this helps.
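If you'd rather not spawn awk, wc and cut several times per line, the same transform can be done with bash parameter expansions alone (a sketch; it assumes the lines carry no trailing blanks):
while IFS='#' read -r name digits
do
    echo "${name}#${digits:0:1}-${digits: -1}"
done < readFile.txt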
It's not shell scripting (missed that first, sorry), but using perl with a combined lookahead and lookbehind for a digit:
$ perl -pe 's/(?<=[0-9]).*(?=[0-9])/-/' file
zero#1-6
one#1-3
two#1-8
Explained some:
s//-/ replace with a -
(?<=[0-9]) positive lookbehind, matches if preceded by a digit
(?=[0-9]) positive lookahead, matches if followed by a digit
With sed:
sed -r 's/^(.+)#([0-9])[0-9]*([0-9])\s*$/\1#\2-\3/' readFile.txt
-r: using extended regular expressions (just to write some stuff without escaping it with backslashes)
s/expr1/expr2/: substitute expr1 by expr2
expr1 is described by a regular expression; the relevant matching patterns are caught by 3 capturing groups (the parenthesized ones).
expr2 retrieves the captured strings (\1, \2, \3) and inserts them in the formatted output (the one you wanted).
Regular-Expressions.info is a good place to start with them. You can also check your own regexps with Regex101.com.
Update: Also you could do that with awk:
awk -F'#' '{
gsub(/\s*/, "", $2)
print $1 "#" substr($2, 1, 1) "-" substr($2, length($2), 1)
}' < test.txt
I added a gsub() call because your file seems to have trailing blank characters.

Use "cut" in shell script without space as delimiter

I'm trying to write a script that reads the file content below and extracts the value in the 6th column of each line, then prints each line without the 6th column. The comma is used as the delimiter.
Input:
123,456,789,101,145,5671,hello world,goodbye for now
223,456,789,101,145,5672,hello world,goodbye for now
323,456,789,101,145,5673,hello world,goodbye for now
What I did was
#!/bin/bash
for i in `cat test_input.txt`
do
COLUMN=`echo $i | cut -f6 -d','`
echo $i | cut -f1-5,7- -d',' >> test_$COLUMN.txt
done
The output I got was
test_5671.txt:
123,456,789,101,145,hello
test_5672.txt:
223,456,789,101,145,hello
test_5673.txt:
323,456,789,101,145,hello
The rest of "world, goodbye for now" was not written into the output files, because it seems like the space between "hello" and "world" was used as a delimiter?
How do I get the correct output
123,456,789,101,145,hello world,goodbye for now
It's not a problem with the cut command but with the for loop you're using. For the first loop run the variable i will only contain 123,456,789,101,145,5671,hello.
If you insist to read the input file line-by-line (not very efficient), you'd better use a read-loop like this:
while read -r i
do
...
done < test_input.txt
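Applied to your script, that looks like (a sketch):
while read -r i
do
    COLUMN=$(echo "$i" | cut -f6 -d',')
    echo "$i" | cut -f1-5,7- -d',' >> "test_$COLUMN.txt"
done < test_input.txt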
echo '123,456,789,101,145,5671,hello world,goodbye for now' | while IFS=, read -r one two three four five six seven eight rest
do
echo "$six"
echo "$one,$two,$three,$four,$five,$seven,$eight${rest:+,$rest}"
done
Prints:
5671
123,456,789,101,145,hello world,goodbye for now
See the man bash Parameter Expansion section for the :+ syntax (essentially it outputs a comma and the $rest if $rest is defined and non-empty).
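A quick illustration of the :+ expansion:
rest=""; echo "x${rest:+,$rest}"      # prints: x
rest="now"; echo "x${rest:+,$rest}"   # prints: x,now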
Also, you shouldn't use for to loop over file contents.
As ktf mentioned, your problem is not with cut but with the way you're passing the lines into cut. The solution he/she has provided should work.
Alternatively, you could achieve the same behaviour with a line of awk:
awk -F, '{for(i=1;i<=NF;i++) {if(i!=6) printf "%s%s",$i,(i==NF)?"\n":"," > "test_"$6".txt"}}' test_input.txt
For clarity, here's a verbose version:
awk -F, ' # "-F,": using comma as field separator
{ # for each line in file
for(i=1;i<=NF;i++) { # for each column
sep = (i == NF) ? "\n" : "," # column separator
outfile = "test_"$6".txt" # output file
if (i != 6) { # skip sixth column
printf "%s%s", $i, sep > outfile
}
}
}' test_input.txt
An easy method is to use the tr command to translate the space character into another character (such as #) before cutting, and afterwards translate it back into a space.
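For example (a sketch; it assumes # never occurs in the data):
for i in $(tr ' ' '#' < test_input.txt)
do
    COLUMN=$(echo "$i" | cut -f6 -d',')
    echo "$i" | cut -f1-5,7- -d',' | tr '#' ' ' >> "test_$COLUMN.txt"
done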
