How to print "\n" character using bash? - bash

I have a .csv file where some strings have some special characters like "\n" (new line).
I'm using this script to extract the data from column 1 and 3:
while IFS=";" read f1 f2 f3 f4
do
echo "\"$f1\" = \"$f3\";"
done < file.csv >file.txt
The main problem is that in some $f3 I have the \n special character and I need to print it.
At the moment, this script is omitting this character.
e.g. If I have
\nXPTO
it will print
XPTO
and I would expect that would print
\nXPTO
Thanks

awk to the rescue!
$ echo "a;b;\nXPTO;d" | awk -F';' '{print $1 "=" $3}'
a=\nXPTO
or with file in/out
$ awk ... input_file > output_file

Use read -r to prevent read from interpreting escape sequences:
while IFS=";" read -r f1 f2 f3 f4
do
echo "\"$f1\" = \"$f3\";"
done < file.csv >file.txt
Side note: although this can be done in bash as shown above, I agree with Karafka that awk is better suited to this kind of problem. It also performs much better than bash, especially when the input file is large.
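To see what -r changes, here is a minimal sketch that reads the same sample line both ways. The function names third_field_raw and third_field_cooked are made up for this illustration; the sample line is the one from the awk answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the third ;-separated field of one input line.

third_field_raw() {
    # -r keeps backslashes exactly as they appear in the input
    IFS=';' read -r f1 f2 f3 f4
    printf '%s\n' "$f3"
}

third_field_cooked() {
    # without -r, read treats the backslash as an escape character and drops it
    IFS=';' read f1 f2 f3 f4
    printf '%s\n' "$f3"
}

printf '%s\n' 'a;b;\nXPTO;d' | third_field_raw      # prints \nXPTO
printf '%s\n' 'a;b;\nXPTO;d' | third_field_cooked   # prints nXPTO
```

printf '%s\n' is used instead of echo so that the output does not depend on how a particular echo handles backslashes.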

Related

Removing newlines in a txt file

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3
Assuming that “using Bash” means “without any external processes”:
if IFS= read -r line; then
printf '%s' "$line"
while IFS= read -r line; do
printf ',%s' "$line"
done
echo
fi
Old answer here
TL;DR:
paste -sd "," export.txt
Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
local -a lines
readarray -t lines <"$1"
local IFS=,
printf "%s\n" "${lines[*]}"
}
file2csv input.txt
You can use awk. If the file name is test.txt then
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas (test1,test2,test3,).
The second awk command just deletes the trailing comma from the string.
Use tr (translate) to replace the newlines with commas, and sed to remove the last comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"
Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
I think this is where I originally picked it up.
If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3
Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.
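The two steps described above can be wrapped in a small function. This is only a sketch: join_lines is a hypothetical name, and it needs bash because of $(< file), ${var//pattern/replacement} and $'\n'.

```shell
#!/usr/bin/env bash
# join_lines FILE: print the lines of FILE joined with commas (hypothetical helper).
join_lines() {
    local contents
    contents=$(< "$1")                     # trailing newlines are stripped here
    printf '%s\n' "${contents//$'\n'/,}"   # every remaining newline becomes a comma
}

demo=$(mktemp)
printf 'test1\ntest2\ntest3\n\n\n' > "$demo"   # two extra blank lines at the end
join_lines "$demo"                              # prints test1,test2,test3
rm -f "$demo"
```

Only trailing newlines are dropped by the command substitution. A blank line in the middle of the file still turns into an empty field (two adjacent commas).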

Convert text file into a comma delimited string

I don't seem to locate an SO question that matches this exact problem.
I have a text file that has one text token per line, without any commas, tabs, or quotes. I want to create a comma delimited string based on the file content.
Input:
one
two
three
Output:
one,two,three
I am using this command:
csv_string=$(tr '\n' ',' < file | sed 's/,$//')
Is there a more efficient way to do this?
The usual command to do this is paste
csv_string=$(paste -sd, file.txt)
You can do it entirely with bash parameter expansion operators instead of using tr and sed.
csv_string=$(<file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
One way with Awk would be to reset the RS and treat the records as separated by blank lines. This would handle words with spaces and format them in CSV format as expected.
awk '{$1=$1}1' FS='\n' OFS=',' RS= file
The {$1=$1} is a way to reconstruct the fields in each line($0) of the file based on modifications to Field (FS/OFS) and/or Record separators(RS/ORS). The trailing 1 is to print every line with the modifications done inside {..}.
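Because an empty RS puts awk into paragraph mode, a blank line in the input starts a new record and therefore a new output row. The sketch below shows that behaviour on made-up data; paragraphs_to_csv is a hypothetical wrapper around the same awk command.

```shell
#!/usr/bin/env bash
# paragraphs_to_csv [FILE]: one comma-joined row per blank-line-separated block.
# Reads standard input when no file is given (hypothetical helper).
paragraphs_to_csv() {
    awk '{$1=$1}1' FS='\n' OFS=',' RS= "${1:--}"
}

# Two blocks separated by a blank line give two rows:
#   one,two,three
#   four,five
printf 'one\ntwo\nthree\n\nfour\nfive\n' | paragraphs_to_csv
```

If the file has no blank lines, the whole file is a single record and the result is the single comma-delimited string the question asks for.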
With Perl one-liner:
$ cat csv_2_text
one
two
three
$ perl -ne '{ chomp; push(@lines,$_) } END { $x=join(",",@lines); print "$x" }' csv_2_text
one,two,three
$ perl -ne ' { chomp; $_="$_," if not eof ;printf("%s",$_) } ' csv_2_text
one,two,three
$
From @codeforester
$ perl -ne 'BEGIN { my $delim = "" } { chomp; printf("%s%s", $delim, $_); $delim="," } END { printf("\n") }' csv_2_text
one,two,three
$
Tested the four approaches on a Linux box - Bash only, paste, awk, Perl, as well as the tr | sed approach shown in the question:
#!/bin/bash
# generate test data
seq 1 10000 > test.file
times=${1:-50}
printf '%s\n' "Testing paste solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(paste -sd, test.file)
done
}
printf -- '----\n%s\n' "Testing pure Bash solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(<test.file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
done
}
printf -- '----\n%s\n' "Testing Awk solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= test.file)
done
}
printf -- '----\n%s\n' "Testing Perl solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(perl -ne '{ chomp; $_="$_," if not eof; printf("%s",$_) }' test.file)
done
}
printf -- '----\n%s\n' "Testing tr | sed solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(tr '\n' ',' < test.file | sed 's/,$//')
done
}
Surprisingly, the Bash only solution does quite poorly. paste comes on top, followed by tr | sed, Awk, and perl:
Testing paste solution
real 0m0.109s
user 0m0.052s
sys 0m0.075s
----
Testing pure Bash solution
real 1m57.777s
user 1m57.113s
sys 0m0.341s
----
Testing Awk solution
real 0m0.221s
user 0m0.152s
sys 0m0.077s
----
Testing Perl solution
real 0m0.424s
user 0m0.388s
sys 0m0.080s
----
Testing tr | sed solution
real 0m0.162s
user 0m0.092s
sys 0m0.141s
For some reason, csv_string=${csv_string//$'\n'/,} hangs on macOS Mojave running Bash 4.4.23.
Related posts:
How to join multiple lines of file names into one with custom delimiter?
Concise and portable “join” on the Unix command-line
Turning multi-line string into single comma-separated

output of oddlines in sed not appearing on separate lines

I have the following file:
>A6NGG8_201_I_F
line2
>B1AK53_719_S_R
line4
>B1AK53_744_D_N
line5
>B7U540_205_R_H
line6
>B7U540_354_T_M
line7
where I want to print out all odd lines. I can do this by:
$ sed -n 1~2p file
>A6NGG8_201_I_F
>B1AK53_719_S_R
>B1AK53_744_D_N
>B7U540_205_R_H
>B7U540_354_T_M
and so I want to store the number in each line as a variable in bash, however I run into a problem - storing the result of sed puts the output all on one line:
#!/bin/bash
line1=$(sed -n 1~2p)
echo ${line1}
in which the output is:
>A6NGG8_201_I_F >B1AK53_719_S_R >B1AK53_744_D_N >B7U540_205_R_H >B7U540_354_T_M
so that when I do something like:
#!/bin/bash
line1=$(sed -n 1~2p)
pos=$(echo ${line1} | awk -F"[__]" 'NF>2{print $2}')
echo ${pos}
I get
201
where I of course want:
201
719
744
205
354
How do I store the result of sed on separate lines so that they are processed properly when piped into my awk statement? I see you can use the /a notation, however when I tried sed -n '/1~2p/a' file this does not work in my bash script. Thanks
As said in comments, you need to quote the variable to make this happen:
echo "${line1}"
instead of
echo ${line1}
However, you can directly say:
awk -F_ 'NR%2 && NF>2 {print $2}' file
This processes the odd-numbered lines (NR%2 is non-zero for them) and prints the second _-separated field of each, but only if the line has more than 2 fields.
From tripleee's answer I observe that a FASTA file can contain a different format. If so, I guess you will still want to get the ID in the lines starting with ">". This can be translated as:
awk -F_ '/^>/ && NF>2 {print $2}' file
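The difference between the two filters only shows up when a record's sequence spans more than one line. The sketch below uses a made-up sample for that case; by_parity and by_header are hypothetical names wrapping the two awk commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the two awk filters; both read stdin by default.
by_parity() { awk -F_ 'NR%2 && NF>2 {print $2}' "${1:--}"; }
by_header() { awk -F_ '/^>/ && NF>2 {print $2}' "${1:--}"; }

# Made-up sample: the first record's sequence takes two lines,
# so the second header lands on line 4 (an even line).
sample='>A6NGG8_201_I_F
SEQLINE1
SEQLINE2
>B1AK53_719_S_R
SEQLINE3'

printf '%s\n' "$sample" | by_parity   # prints 201 only
printf '%s\n' "$sample" | by_header   # prints 201 and 719
```

On a file where headers and data strictly alternate, as in the question, both filters give the same result.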
See an example of how quoting preserves the format:
The file:
$ cat a
hello
bye
Read it into a variable:
$ var=$(< a)
echo without quoting:
$ echo $var
hello bye
Let's quote!
$ echo "$var"
hello
bye
If you are trying to get the header lines out of a FASTA file, your problem statement is wrong -- the data between the headers could be more than one line. You could simply do
sed -n '/^>/!d;s/^[^_]*_//;s/_.*//p' file.fasta
to get just the second underscore-delimited field out of each header line; or equivalently, in Awk,
awk -F _ '/^>/ { print $2 }' file.fasta

print the first word of the line twice, then the rest of the line

I have the following file:
F1
This is the first line.
And the second line.
I want the output to be:
This This This is the first line.
And And And the second line.
I run the following command:
sed -re 's/([^ ]+).*/\1 \1/' F1
It does print twice the first word of the line like this:
This This
And And
but I don't know how to print the whole line afterwards, so that I get:
This This This is the first line.
And And And the second line.
I need it as a 'sed' command.
You can also use awk for this (and it should make it a bit more readable):
awk '{print $1, $1, $0}' F1
Like this:
sed -re 's/([^ ]+)(.*)/\1 \1 \1\2/' F1
sed -re 's/([^ ]+).*/\1 \1 \0/' F1
Using Sed with & command
sed -re 's/([^ ]+)/& \1 \1/' F1
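As a quick check that the capture-group form and the & form agree, here is a small sketch. The function names are invented for the illustration, and -E is used in place of -r on the assumption that the installed sed accepts it (recent GNU sed and BSD sed both do).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: repeat the first word twice in front of each line.

# \1 is the first word, \2 is the rest of the line
with_groups() { sed -E 's/([^ ]+)(.*)/\1 \1 \1\2/'; }

# & is the whole match (here just the first word); the rest of the line
# is not part of the match, so sed leaves it in place
with_ampersand() { sed -E 's/([^ ]+)/& & &/'; }

printf '%s\n' 'This is the first line.' 'And the second line.' | with_groups
printf '%s\n' 'This is the first line.' 'And the second line.' | with_ampersand
```

Both calls print the two lines requested in the question.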
Pretty easy with pure bash
while read -r first rest; do
printf '%s %s %s %s\n' "$first" "$first" "$first" "$rest"
done < F1

Why awk '{ print }' doesn't start a new line but loops on space char

I have this shell script
#!/bin/bash
LINES=$(awk '{ print }' filename.txt)
for LINE in $LINES; do
echo "$LINE"
done
And filename.txt has this content
Loreum ipsum dolores
Loreum perche non se imortale
The shell script is iterating all spaces of the lines in filename.txt while it is supposed to loop only those two lines.
But when I type the "awk '{ print }' filename.txt" in terminal then it loops ok.
Any explanations?
Thanks in advance!
The $(...) construct absorbs all the output from awk as one large string, and then for LINE in $LINES splits on whitespace. You want this construct instead:
#! /bin/sh
while IFS= read -r LINE; do
printf '%s\n' "$LINE"
done < filename.txt
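The splitting behaviour can be made visible by counting iterations. This is a rough sketch with two hypothetical helper functions, using the two lines from the question as input.

```shell
#!/usr/bin/env bash
# Count the items a for loop sees when it iterates over an unquoted expansion.
count_for_items() {
    n=0
    for item in $1; do n=$((n + 1)); done   # unquoted on purpose: word splitting applies
    printf '%s\n' "$n"
}

# Count the lines a while-read loop sees for the same text.
count_read_lines() {
    n=0
    while IFS= read -r line; do n=$((n + 1)); done <<EOF
$1
EOF
    printf '%s\n' "$n"
}

text='Loreum ipsum dolores
Loreum perche non se imortale'

count_for_items "$text"    # prints 8: one iteration per word
count_read_lines "$text"   # prints 2: one iteration per line
```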
The other answers are good. Another thing you can do is temporarily change your IFS (Internal Field Separator) variable, by updating your shell script to look like this:
#!/bin/bash
IFS="
"
LINES=$(awk '{ print }' filename.txt)
for LINE in $LINES; do
echo "$LINE"
done
This sets IFS to a newline only, instead of the default space, tab, and newline, so the unquoted $LINES is split into lines rather than words, which should also do what you want.
Just another suggestion.
Alternatively, store the lines in an array and loop over its elements. Note that a plain LINES=$(...) assignment holds a single string, not an array.
Here's an example how to loop over the lines:
http://tldp.org/LDP/abs/html/arrays.html#SCRIPTARRAY
