convert white space to tab on first line of a tab delimited file - shell

I have multiple tab delimited files with the same column headers. However, the headers (1st row of the files) are delimited by white spaces instead of tabs. How can I convert the white space to tab on first line of a tab delimited file?

You can use sed, restricting the substitution to the first line only:
sed -i.bak $'1s/ /\t/g' file.csv

Sounds like you can use awk:
awk -v OFS='\t' 'NR == 1 { $1 = $1 } 1' file
Assigning the first field $1 of the first line to itself causes awk to reformat the line, inserting the output field separator OFS (defined here as a tab character). 1 is the shortest true condition, so awk performs the default action, { print }, for every line.
To overwrite "in-place", use a temp file:
awk -v OFS='\t' 'NR == 1 { $1 = $1 } 1' file > tmp && mv tmp file
Note that this will interpret any number of spaces as a single field separator.
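As a quick sanity check, here is the command run on a tiny hypothetical file (the header is space-separated, the data row tab-separated):

```shell
# Hypothetical sample: space-separated header, tab-separated data row
printf 'id name value\nA\t1\tx\n' > sample.txt

# Rebuild only the first line so its fields are joined with tabs
awk -v OFS='\t' 'NR == 1 { $1 = $1 } 1' sample.txt
```

The header comes out as id&lt;TAB&gt;name&lt;TAB&gt;value while the data row passes through untouched.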

Related

grep few columns from a file to another file in shell

The following file is present in file1.txt:
mudId|~|mudType|~|mudNAme|~|mudDate|~|mudEndDate
100|~|Balance|~|Abc|~|21-09-2020|~|22-09-2020
101|~|Clone|~|Bcd|~|11-07-2020|~|12-07-2020
102|~|Ledger|~|Def|~|12-06-2019|~|13-06-2019
How to grep only the columns mudId, mudType and mudDate with all the rows into another file?
The columns are separated by |~|
To meet your criteria of specifying the field names from the heading row, you can use awk with a regular expression as the field-separator variable (e.g. "[|][~][|]"). For the first record (line), read the field names as array indexes and set the value to the current field index. For the second rule, simply output the field values captured in your array that correspond to the strings "mudId", "mudType" and "mudDate".
For example you can do:
awk '
BEGIN { FS="[|][~][|]"; OFS="|~|" }
FNR==1 { for(i=1;i<=NF;i++) arr[$i]=i; next }
{ print $arr["mudId"], $arr["mudType"], $arr["mudDate"] }
' file
(note: the above intentionally generalizes to meet your criteria where you want to specify the string names of the fields to output)
If you simply want to write fields 1, 2, & 4 to a new file, you would do:
awk -v FS="[|][~][|]" -v OFS="|~|" 'FNR>1 {print $1,$2,$4}' file
Example Use/Output
Simply copy/middle-mouse paste the above into an xterm where file is in the current directory, e.g.
$ awk '
> BEGIN { FS="[|][~][|]"; OFS="|~|" }
> FNR==1 { for(i=1;i<=NF;i++) arr[$i]=i; next }
> { print $arr["mudId"], $arr["mudType"], $arr["mudDate"] }
> ' file
100|~|Balance|~|21-09-2020
101|~|Clone|~|11-07-2020
102|~|Ledger|~|12-06-2019
(note: if you want the new file space-delimited, just remove OFS="|~|")
or
$ awk -v FS="[|][~][|]" -v OFS="|~|" 'FNR>1 {print $1,$2,$4}' file
100|~|Balance|~|21-09-2020
101|~|Clone|~|11-07-2020
102|~|Ledger|~|12-06-2019
To write the contents to a new file, just redirect the output to a new filename (e.g. for the last command above, append > newfile after file).
Look things over and let me know if you have further questions.
If the column layout is fixed as mudId|~|mudType|~|mudNAme|~|mudDate|~|mudEndDate, try this:
sed 's/|~|/\t/g' file1.txt | awk '{print $1"|~|"$2"|~|"$4}'
If a literal \t could already occur in file1.txt, change \t to another character that does not occur in the file, and add -F with that character after awk (e.g. -F'\t' when using a tab).
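As a sanity check of the pipeline above, here it is run on the sample rows from the question (GNU sed is assumed, since it interprets \t in the replacement):

```shell
# sed turns the |~| delimiters into tabs, then awk picks fields 1, 2 and 4
# and glues them back together with the original |~| delimiter
printf 'mudId|~|mudType|~|mudNAme|~|mudDate|~|mudEndDate\n100|~|Balance|~|Abc|~|21-09-2020|~|22-09-2020\n' |
  sed 's/|~|/\t/g' |
  awk -F'\t' '{print $1"|~|"$2"|~|"$4}'
```

This prints the header as mudId|~|mudType|~|mudDate and the data row as 100|~|Balance|~|21-09-2020.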

how to replace a string at a specific position in a csv file using bash

I have several .csv files and each csv file has lines which look like this.
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
I am reading through each line of each csv file and then trying to replace the 4th position of each line beginning with AA with "ZZ"
Expected output
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
The variable "y" does contain the 4th field ("1" and "7" respectively), but when I use the sed command it replaces the first occurrence of "1" with "ZZ".
How do I modify my code to replace only the 4th position of each line irrespective of what value it holds?
My code looks like this
file="name of file which contains list of all csv files"
for i in $(cat "$file"); do
    while IFS= read -r line; do
        if [[ $line == AA* ]]; then
            y=$(echo "$line" | cut -d',' -f4)
            sed -i "s/${y}/ZZ/" "$i"
        fi
    done < "$i"
done
Using sed, you can also direct that only the 4th field of a comma separated values file be changed to "ZZ" for lines beginning "AA" with:
sed -i '/^AA/s/[^,][^,]*/ZZ/4' file
Explanation
sed -i call sed to edit file in place;
general form /find/s/match/replace/occurrence; where
find is /^AA/ line beginning with "AA";
match [^,][^,]* a character not a comma followed by any number of non-commas;
replace /ZZ/4 the 4th occurrence of match with "ZZ".
Note, both awk and sed provide good solutions in this case, so see the answers by @perreal and @RavinderSingh13.
Example Input File
$ cat file
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA
Example Use/Output
(note: -i not used below so the changes are simply output to stdout)
$ sed '/^AA/s/[^,][^,]*/ZZ/4' file
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
A robust way to do this is just:
$ awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
Note that the above does a literal string comparison and a literal string replacement. Unlike the other solutions posted so far, it won't fail if the target string (AA in this example) contains regexp metacharacters like . or *, nor if it can appear as part of another string like AAX, nor if the replacement string (ZZ in this example) contains backreference characters like & or \1.
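To illustrate that point, here is a sketch with a hypothetical key containing a regexp metacharacter; the literal comparison touches only the exact match:

```shell
# 'A.A' contains '.', which a regexp match would treat as a wildcard;
# the string comparison $1=="A.A" matches only the literal key, so the
# AXA row (which /^A.A/ as a regexp WOULD match) is left alone
printf 'A.A,1,CC,1,EE\nAXA,2,DD,2,FF\n' |
  awk 'BEGIN{FS=OFS=","} $1=="A.A"{$4="ZZ"} 1'
```

This prints A.A,1,CC,ZZ,EE followed by the unchanged AXA,2,DD,2,FF.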
If you want to map multiple strings in one pass:
$ awk 'BEGIN{FS=OFS=","; m["AA"]="ZZ"; m["BB"]="FOO"} $1 in m{$4=m[$1]} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,FOO,99,AA
and just like GNU sed has -i for "inplace" editing, GNU awk has -i inplace, so you can discard the shell loop and just do:
awk -i inplace '
BEGIN { FS=OFS="," }
(NR==FNR) { ARGV[ARGC++]=$0 }
(NR!=FNR) && ($1=="AA") { $4="ZZ" }
{ print }
' file
and it'll operate on all of the files named in file in one call to awk. "file" in that last case is your file containing a list of other CSV file names.
EDIT1: Since the OP has changed the requirement a bit, adding the following now.
awk 'BEGIN{FS=OFS=","} /^AA/||/^BB/{$4="ZZ"} /^CC/||/^DD/{$5="NEW_VALUE"} 1' Input_file > temp_file && mv temp_file Input_file
Could you please try the following.
awk -F, '/^AA/{$4="ZZ"} 1' OFS=, Input_file > temp_file && mv temp_file Input_file
OR
awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1' Input_file > temp_file && mv temp_file Input_file
Explanation: Adding an explanation of the above code now.
awk '
BEGIN{ ##Starting BEGIN section of awk which will be executed before reading Input_file.
FS=OFS="," ##Setting field separator and output field separator as comma here for all lines of Input_file.
} ##Closing block for BEGIN section of this program.
/^AA/{ ##Checking condition if a line starts from string AA then do following.
$4="ZZ" ##Setting 4th field as ZZ string as per OP.
} ##Closing this condition block here.
1 ##By mentioning 1 we are asking awk to print edited or non-edited line of Input_file.
' Input_file ##Mentioning Input_file name here.
Using sed:
sed -i 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/' input_file
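A quick check of this sed variant on the sample data (without -i, so the changes simply go to stdout):

```shell
# The capture group keeps 'AA,' plus the next two fields and their commas;
# the uncaptured [^,]* (the 4th field) is replaced with ZZ.
# Lines not beginning with AA don't match and pass through unchanged.
printf 'AA,1,CC,1,EE\nBB,6,7,8,99,AA\n' |
  sed 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/'
```

This prints AA,1,CC,ZZ,EE and the untouched BB,6,7,8,99,AA.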

Unix Shell Scripting - how can I remove particular characters inside a text file?

I have one text file. This file has 5 rows and 5 columns. All the columns are separated by "|" (symbol). The 2nd column's content should be 7 characters long.
If the 2nd column is longer than 7 characters, I want to remove the extra characters without opening the file.
For example:
cat file1
ff|hahaha1|kjbsb|122344|jbjbnjuinnv|
df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|
gf|harayhe_jnbsnjv|sdbvdkh|12234|jbjbnj|
qq|kkksks2|datetag|7777|jbjbnj|
jj|harisha|hagte|090900|hags|
For the above case, the 2nd and 3rd rows have a 2nd column longer than 7 characters. Now I want to remove those extra characters without opening the input file, using an awk or sed command.
I'm waiting for your responses guys.
Thanks in advance!!
Take a substring of length 7 from the second column with awk:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file
Now any strings longer than 7 characters will be made shorter. Any strings that were shorter will be left as they are.
The 1 at the end is the shortest true condition to trigger the default action, { print }.
If you're happy with the changes, then you can overwrite the original file like this:
awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1' file > tmp && mv tmp file
i.e. redirect to a temporary file and then overwrite the original.
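For example, with one of the long rows from the question:

```shell
# hadb123_udcvb (13 characters) is trimmed to its first 7 characters;
# the trailing empty field (from the final '|') is preserved on rebuild
printf 'df|hadb123_udcvb|sbfuisdbvdkh|122344|jbjbnjuinnv|\n' |
  awk -F'|' -v OFS='|' '{ $2 = substr($2, 1, 7) }1'
```

This prints df|hadb123|sbfuisdbvdkh|122344|jbjbnjuinnv| with the 2nd column cut to 7 characters.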
First try
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
What is happening here? We construct the command step-by-step:
# Replace something
sed 's/hadb123_udcvb/replaced/' file1
# Remember the matched string (will be used in a later command)
sed 's/\(hadb123_udcvb\)/replaced/' file1
# Replace exactly 7 characters that are not '|' (first match on each line)
sed 's/\([^|]\{7\}\)/replaced/' file1
# Remove additional characters up to the next '|'
sed 's/\([^|]\{7\}\)[^|]*/replaced/' file1
# Put back the string you remembered
sed 's/\([^|]\{7\}\)[^|]*/\1/' file1
# Extend the matched string with start-of-line (^), an any-length first field, and '|'
sed 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1
When this shows the desired output, you can add the option -i for changing the input file:
sed -i 's/\(^[^|]*|[^|]\{7\}\)[^|]*/\1/' file1

How to alter number of columns [with awk] only if a string is in the 1st column of the line while printing changed line and whole text

I want to reduce the number of columns, keeping only the 1st and the last one, for each line containing a >.
But then I want to print the whole file again, with the changed lines, like this:
>TRF [name1]
AAAAAAAAAAAAAAAAAAAAAAAAAAATTGGA
ATGGGGGGGGGGGGGGGGGGGGGGGGGC
I have tried with this code but it only returns the changed lines. Thanks.
awk '$1 ~ />/ { print $1" "$NF}' file
You can use:
awk '$1 ~ />/ { $0 = $1 " " $NF} 1' file
The default action 1 at the end will print all lines of the input.
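A small sketch with a hypothetical FASTA-style header line:

```shell
# Header lines (whose first field contains '>') are reduced to the
# first and last field; sequence lines pass through unchanged
printf '>TRF some extra words [name1]\nAAAATTGGA\n' |
  awk '$1 ~ />/ { $0 = $1 " " $NF } 1'
```

This prints >TRF [name1] followed by the unchanged sequence line AAAATTGGA.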

How to replace the empty place with next line content in shell script

1,n1,abcd,1234
2,n2,abrt,5666
 ,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
 ,k1,yyyy,5234
4,22,yyyy,5234
The above is my input file abc.txt. I want each missing first-column value to be filled with the first-column value of the next row.
example:
3,h2,yyyy,123x
3,h2,yyyy,123y
I want output like below,
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x // the missing first column value is filled with 3, the next row's first value
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
How can I implement this with awk or some other alternative in a shell script? Please help.
Using awk you can do:
awk -F, '$1 ~ /^ *$/ {
p=p RS $0
next
}
p!="" {
gsub(RS " +", RS $1, p)
sub("^" RS, "", p)
print p
p=""
} 1' file
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
I would reverse the file, and then replace the value from the previous line:
tac filename | awk -F, -v OFS=, '$1 ~ /^[[:blank:]]*$/ {$1 = prev} {print; prev=$1}' | tac
(note the explicit -v OFS=, so that rebuilt lines keep their comma separators)
This will also fill in missing values on multiple lines.
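A quick demonstration on a three-line excerpt (tac is GNU coreutils; on BSD systems, tail -r can stand in):

```shell
# Reverse the file, copy the previous (i.e. originally next) line's
# first field into blank first fields, then reverse back
printf '2,n2,abrt,5666\n,h2,yyyy,123x\n3,h2,yyyy,123y\n' |
  tac |
  awk -F, -v OFS=, '$1 ~ /^[[:blank:]]*$/ {$1 = prev} {print; prev=$1}' |
  tac
```

The blank first field is filled with 3, giving 2,n2,abrt,5666 then 3,h2,yyyy,123x then 3,h2,yyyy,123y.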
With GNU sed:
$ sed '/^ ,/{N;s/ \(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/}' infile
1,n1,abcd,1234
2,n2,abrt,5666
3,h2,yyyy,123x
3,h2,yyyy,123y
3,h2,yyyy,1234
4,k1,yyyy,5234
4,22,yyyy,5234
The sed command does the following:
/^ ,/ { # If the line starts with 'space comma'
N # Append the next line
# Extract the value before the comma, prepend to first line
s/ \(.*\n\)\([^,]*\)\(.*\)/\2\1\2\3/
}
BSD sed would require an extra semicolon before the closing brace.
This only works with non-contiguous lines with missing values.
