Removing newlines in a txt file - bash

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3

Assuming that “using Bash” means “without any external processes”:
# reads from standard input: print the first line bare,
# then prefix every following line with a comma
if IFS= read -r line; then
    printf '%s' "$line"
    while IFS= read -r line; do
        printf ',%s' "$line"
    done
    echo
fi
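To run it against the sample file, save the snippet as, say, join.sh (a name assumed here) and feed the file on stdin:
bash join.sh < input.txt    # prints test1,test2,test3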

Old answer, kept for reference:
TL;DR:
cat "export.txt" | paste -sd ","

Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
    local -a lines
    # slurp the file into an array, one line per element
    readarray -t lines <"$1"
    # "${lines[*]}" joins the elements with the first character of IFS;
    # declaring IFS local keeps the change out of the caller's scope
    local IFS=,
    printf "%s\n" "${lines[*]}"
}
file2csv input.txt

You can use awk. If the file name is test.txt then
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas (test1,test2,test3,).
The second awk command just deletes the last comma from the string.
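The two processes can also be collapsed into a single awk call (a sketch equivalent to the pipeline above, building the string and printing it once at the end):
awk '{s = s sep $1; sep = ","} END {print s}' test.txt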

Use the tr (translate) tool, plus sed to remove the last comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"

Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
Here :a defines a label, N appends the next input line to the pattern space, and $!ba branches back to a on every line but the last, so the whole file is collected in the pattern space before the newlines are replaced with commas.

If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3

Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.

Related

how to extract column by removing delimiter in bash script

I have a huge file in which columns are separated by the |~| delimiter.
How can I extract the required columns using a shell command?
Let's say the file looks like:
column1|~|column2|~|column3|~|column4|~|column5|~|column6|~|column7
and we want to extract columns 4 and 5.
awk -F "(|~|)" '{ print $4,$5 }' file
Set the field delimiter as "|~|" with -F and then print the 4th and 5th fields ($4,$5)
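A quick check against the sample line (using the command above):
printf '%s\n' 'column1|~|column2|~|column3|~|column4|~|column5|~|column6|~|column7' | awk -F '[|~]+' '{ print $4, $5 }'
# column4 column5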
In plain bash:
#!/bin/bash
while IFS= read -r line; do
    # turn each |~| into a newline, then split the line into an array
    readarray -t fields <<< "${line//'|~|'/$'\n'}"
    # arrays are zero-based, so columns 4 and 5 are indices 3 and 4
    printf '%s %s\n' "${fields[3]}" "${fields[4]}"
done < file
or, with awk
awk -F '\\|~\\|' '{ print $4, $5 }' file
or, with GNU sed:
sed -E 's/\|~\|/\n/g; s/([^\n]*\n){3}(([^\n]*\n){2}).*/\2/; s/\n/ /g' file
This first turns every |~| into a newline inside the pattern space, then keeps only the 4th and 5th newline-separated chunks, and finally turns the remaining newlines into spaces.

Trim line to the first comma (bash)

I have a line from which I need to cut the branch name to the first comma:
commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)
I need to cut out MGB-322.
The number of characters in a line is always different.
awk -F "origin/" '{print $2}' - this is how I cut out
MGB-322, refs/pipelines/36877)
But how to tell it to trim to the first comma?
I tried doing it via substr,
awk -F "origin/" '{print substr ($2,1, index $2 ,)}'
But it is not clear how to correctly specify the comma in index
With any awk. Use / and , as field separator:
awk '{print $3}' FS='[/,]' file
Output:
MGB-322
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
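To see why the branch name lands in field 3, print the first few fields (a quick illustration, assuming the sample line is in file):
awk '{print "1=" $1; print "2=" $2; print "3=" $3}' FS='[/,]' file
# 1=commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD
# 2= origin
# 3=MGB-322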
With the OP's code fixed: this assumes origin occurs only once in the line; in case it can occur more than once, change $NF to $2 in the following code. The sub() call removes everything from the first comma onward. Written and tested in https://ideone.com/xjv2we
awk -F"origin/" '{sub(/,.*/, "", $NF); print $NF}' Input_file
sed could also be helpful here; a generic solution based on the first occurrence of , and / as per the OP's title:
sed 's/[^,]*,[^/]*\/\([^,]*\).*/\1/' Input_file
"I need to cut out MGB-322."
You can use cut in two steps:
echo "${line}" | cut -d"/" -f2 | cut -d"," -f1
I would prefer one step with awk (already answered by others) or sed
echo "${line}" | sed -r 's/.*origin.(.*), refs.*/\1/'
Why spawn procs? bash's built-in parameter parsing will handle this.
If
$: line="commit 2bea9e0351dae65f18d2de11621049b465b1e868 (HEAD, origin/MGB-322, refs/pipelines/36877)"
then
$: [[ "$line" =~ .*origin.(.*), ]] && echo "${BASH_REMATCH[1]}"
MGB-322
or maybe
$: tmp=${line#*, origin/}; echo ${tmp%,*}
MGB-322
or even
$: IFS=",/" read _ _ x _ <<< "$line" && echo $x
MGB-322
c.f. https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html

Passing two awk columns into a while read command

SampleFile:
Two Words,Extra
My code:
cat SampleFile | awk -F "," '{print $1" "$2}' | while read var1 var2;
do
    echo $var1
done
This will print out only Two and var2 will take Words. Is there a way so that I can pass Two Words into var1?
You don't have to use awk for this. Bash has a built-in variable to determine where words are split:
while IFS=, read -r var1 var2; do
echo "$var1"
done < SampleFile
IFS is set to a comma, so word splitting takes place at commas.
Instead of piping to the while loop, I use redirection, which has the advantage of not spawning a subshell.
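A quick illustration of why that matters (a sketch, not from the original answer): variables set inside a piped loop do not survive it:
count=0
printf 'a\nb\n' | while read -r line; do ((count++)); done
echo "$count"   # prints 0: the piped loop ran in a subshell
count=0
while read -r line; do ((count++)); done < <(printf 'a\nb\n')
echo "$count"   # prints 2: redirection keeps the loop in the current shell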
A remark: you don't need cat in this context. Awk can take a file name as an argument, so
cat SampleFile | awk -F "," '{print $1" "$2}'
becomes
awk -F "," '{print $1, $2}' SampleFile
Also, when using print, you don't need to explicitly introduce spaces: if you comma-separate your fields, awk will replace the comma by the value of the OFS (output field separator) variable, which defaults to a space.
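For instance, setting OFS explicitly (a quick illustration, not part of the original answer):
awk -F "," '{print $1, $2}' OFS=" | " SampleFile
# Two Words | Extra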
...| while read -r var1; do echo "$var1"; done

Search in CSV file and split each matching line using command-line tools

I'm using the following command:
grep -F "searchterm" source.csv >> output.csv
to search for matching terms in source.csv. Each line in the source file is like so:
value1,value2,value3|value4,value5
How do I insert only the fields value1,value2,value3 into the output file?
You can simply use awk, which goes through the file line by line; set the separator and print the part of the string you want to keep:
awk -F"|" '{print $1}' input.csv > output.csv
You can do it with a simple while read loop:
while read -r line; do echo "${line%|*}"; done < file.csv >> newfile.csv
or grouped in a subshell, so that a single > truncates newfile each time instead of appending:
( while read -r line; do echo "${line%|*}"; done < file.csv ) > newfile.csv
or with sed:
sed -e 's/[|].*$//' file.csv > newfile.csv
This perl solution is similar to the awk solution:
perl -F'\|' -lane 'print $F[0]' input.csv > output.csv
The | field separator character needs to be escaped with a \
-a puts perl into autosplit mode, which populates the fields array @F

Awk: Drop last record separator in one-liner

I have a simple command (part of a bash script) that I'm piping through awk but can't seem to suppress the final record separator without then piping to sed. (Yes, I have many choices and mine is sed.) Is there a simpler way without needing the last pipe?
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd \
| uniq | awk '{IRS="\n"; ORS=","; print}' | sed s/,$//)
Without the sed, this produces output like echo,sierra,victor, and I'm just trying to drop the last comma.
You don't need awk, try:
egrep -o ... | uniq | paste -d, -s
Here is another example:
kent$ echo "a
b
c"|paste -d, -s
a,b,c
Also I think your chained command could be simplified; awk can do all of it in a one-liner.
Instead of egrep, uniq, awk, sed etc., all this can be done in one single awk command:
awk -F":" '!($1 in a){l=l $1 ","; a[$1]} END{sub(/,$/, "", l); print l}' /etc/passwd
Here !($1 in a) lets only the first occurrence of each user name through, l accumulates the names comma-separated, and the END block strips the trailing comma before printing. (Add a pattern match on $1 if you still need to restrict which names are collected.)
Here is a small and quite straightforward one-liner in awk that suppresses the final record separator:
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=","
Gives:
alpha,echo,november
So, your example becomes:
dolls=$(egrep -o 'alpha|echo|november|sierra|victor|whiskey' /etc/passwd | uniq | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=",")
The benefit of using awk over paste or tr is that this also works with a multi-character ORS.
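For example, with a multi-character separator (a quick illustration, not from the original answer):
echo -e "alpha\necho\nnovember" | awk 'y {print s} {s=$0;y=1} END {ORS=""; print s}' ORS=" -> "
# alpha -> echo -> november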
Since you tagged it bash, here is one way of doing it:
#!/bin/bash
# Read the /etc/passwd file into an array called names
while IFS=':' read -r name _; do
    names+=("$name")
done < /etc/passwd
# Join the array with commas; the command substitution runs in a
# subshell, so the IFS change does not leak out of it
dolls=$( IFS=, ; echo "${names[*]}")
# Display the value of the variable
echo "$dolls"
echo "a
b
c" |
mawk 'NF-= _==$NF' FS='\n' OFS=, RS=
a,b,c
