print the first word of the line twice, then the rest of the line - bash

I have the following file:
F1
This is the first line.
And the second line.
I want the output to be:
This This This is the first line.
And And And the second line.
I run the following command:
sed -re 's/([^ ]+).*/\1 \1/' F1
It does print twice the first word of the line like this:
This This
And And
but I don't know how to print the whole line afterwards.
This This This is the first line.
And And And the second line.
I need it as a 'sed' command.

You can also use awk for this (and it should make it a bit more readable):
awk '{print $1, $1, $0}' F1

Like this:
sed -re 's/([^ ]+)(.*)/\1 \1 \1\2/' F1

sed -re 's/([^ ]+).*/\1 \1 \0/' F1

Using sed with & (which stands for the whole match):
sed -re 's/([^ ]+)/& \1 \1/' F1

Pretty easy with pure bash
while read -r first rest; do
printf '%s %s %s %s\n' "$first" "$first" "$first" "$rest"
done < F1
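A minimal sketch of the capture-group idea behind the sed answers above, wrapped in a hypothetical function named triple_first_word (the name is not from the original posts). It assumes words are separated by single spaces and that sed accepts -E for extended regular expressions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: repeat the first space-delimited word of each line
# so that it appears three times in total.
triple_first_word() {
  # ([^ ]+) captures the first run of non-space characters. Without the
  # g flag, sed rewrites only that first match, so the rest of the line
  # stays where it is.
  sed -E 's/([^ ]+)/\1 \1 \1/'
}

printf '%s\n' 'This is the first line.' 'And the second line.' | triple_first_word
```

Because only the first word is matched, there is no need to capture and re-emit the remainder of the line.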

Related

Removing newlines in a txt file

I have a txt file in a format like this:
test1
test2
test3
How can I bring it into a format like this using bash?
test1,test2,test3
Assuming that “using Bash” means “without any external processes”:
if IFS= read -r line; then
printf '%s' "$line"
while IFS= read -r line; do
printf ',%s' "$line"
done
echo
fi
Old answer here
TL;DR:
cat "export.txt" | paste -sd ","
Another pure bash implementation that avoids explicit loops:
#!/usr/bin/env bash
file2csv() {
local -a lines
readarray -t lines <"$1"
local IFS=,
printf "%s\n" "${lines[*]}"
}
file2csv input.txt
You can use awk. If the file name is test.txt then
awk '{print $1}' ORS=',' test.txt | awk '{print substr($1, 1, length($1)-1)}'
The first awk command joins the three lines with commas (test1,test2,test3,).
The second awk command just deletes the last comma from the string.
Use tr (translate) to turn the newlines into commas, and sed to remove the last comma:
tr '\n' , < "$source_file" | sed 's/,$//'
If you want to save the output into a variable:
var="$( tr '\n' , < "$source_file" | sed 's/,$//' )"
Using sed:
$ sed ':a;N;$!ba;s/\n/,/g' file
Output:
test1,test2,test3
I think this is where I originally picked it up.
If you don't want a terminating newline:
$ awk '{printf "%s%s", sep, $0; sep=","}' file
test1,test2,test3
or if you do:
awk '{printf "%s%s", sep, $0; sep=","} END{print ""}' file
test1,test2,test3
Another loopless pure Bash solution:
contents=$(< input.txt)
printf '%s\n' "${contents//$'\n'/,}"
contents=$(< input.txt) is equivalent to contents=$(cat input.txt). It puts the contents of the input.txt file (with trailing newlines automatically removed) into the variable contents.
"${contents//$'\n'/,}" replaces all occurrences of the newline character ($'\n') in contents with the comma character. See Parameter expansion [Bash Hackers Wiki].
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why printf '%s\n' is used instead of echo.
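The parameter-expansion approach can be wrapped in a small function. This is only a sketch: join_file and its optional delimiter argument are hypothetical names, and it assumes the file fits comfortably in memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a file joined by a delimiter
# (a comma unless a second argument is given), using only Bash built-ins.
join_file() {
  local file=$1 delim=${2-,}
  local contents
  contents=$(< "$file")   # trailing newlines are stripped here
  # Replace every remaining newline with the delimiter.
  printf '%s\n' "${contents//$'\n'/$delim}"
}

# Demonstration on a throwaway file.
tmp=$(mktemp)
printf '%s\n' test1 test2 test3 > "$tmp"
join_file "$tmp"          # test1,test2,test3
join_file "$tmp" ' | '    # test1 | test2 | test3
rm -f "$tmp"
```

Unlike the IFS-based variant, the delimiter here may be longer than one character.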

sed command within a while loop doesn't write output

I have this input file
gb|KY798440.1|
gb|KY842329.1|
MG082893.1
MG173246.1
and I want to get all the characters that are between the "|" or the full line if there is no "|". That is a desired output that looks like
KY798440.1
KY842329.1
MG082893.1
MG173246.1
I wrote:
while IFS= read -r line; do
if [[ $line == *\|* ]] ; then
sed 's/.*\|\(.*\)\|.*/\1/' <<< $line >> output_file
else echo $line >> output_file
fi
done < input_file
Which gives me
empty line
empty line
MG082893.1
MG173246.1
(note: "empty line" means an actual empty line; it doesn't actually write "empty line")
The sed command works on a single example (i.e. sed 's/.*\|\(.*\)\|.*/\1/' <<< "gb|KY842329.1|" outputs KY842329.1) but within the loop it just does a line return. The else echo $line >> output_file seems to work.
Bare sed:
$ sed 's/^[^|]*|\||[^|]*$//g' file
Output:
KY798440.1
KY842329.1
MG082893.1
MG173246.1
You could do
sed '/|/s/[^|]*|\([^|]*\)|.*/\1/' input
or
awk 'NF>1 {print $2} NF < 2 { print $1}' FS=\| input
or
sed -e 's/[^|]*|//' -e 's/|.*//' input
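A likely reason for the empty lines in the question: in GNU sed's basic regular expressions, \| means alternation, while a plain | is a literal pipe. The pattern .*\|\(.*\)\|.* therefore reads as "anything OR (anything) OR anything"; the first alternative matches the whole line and \1 stays empty. The sketch below uses an unescaped | instead; extract_id is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the text between the first two "|" characters,
# or the argument unchanged when it contains no "|".
extract_id() {
  # In a basic regular expression a bare | is an ordinary character.
  # The /|/ address restricts the substitution to lines that contain one.
  sed '/|/s/[^|]*|\([^|]*\)|.*/\1/' <<< "$1"
}

extract_id 'gb|KY798440.1|'   # KY798440.1
extract_id 'MG082893.1'       # MG082893.1
```

Since sed already processes its input line by line, the same expression can be applied to the whole file without a while loop.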

Sed remove selected line to file using shell script variable

I have shell script variable var="7,8,9"
These are the line numbers I want to delete from a file using sed.
Here I tried:
sed -i "$var"'d' test_file.txt
But I got the error `sed: -e expression #1, char 4: unknown command: ,'
Is there any other way to remove the line?
sed command doesn't accept comma delimited line numbers.
You can use this awk command, which uses a bit of Bash string manipulation to form a regex from the given comma-separated line numbers:
awk -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt
This will set awk variable var as this regex:
^(7|8|9)$
And then condition NR !~ var ensures that we print only those lines that don't match above regex.
For in-place editing, if you have gnu-awk with version > 4.0, use:
awk -i inplace -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt
Or for older awk use:
awk -v var="^(${var//,/|})$" 'NR !~ var' test_file.txt > $$.tmp && mv $$.tmp test_file.txt
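A sketch of how the regex is assembled, reading from standard input so it can be tried without touching a file. The function name drop_lines is hypothetical. The ^ and $ anchors matter: without them, 7 would also match line numbers such as 17 or 70.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to stdout, skipping the line numbers given
# as a comma-separated list such as "7,8,9".
drop_lines() {
  local list=$1
  local re="^(${list//,/|})$"   # "7,8,9" becomes ^(7|8|9)$
  # NR is compared as a string against the anchored alternation.
  awk -v re="$re" 'NR !~ re'
}

seq 1 10 | drop_lines 7,8,9 | paste -sd, -   # 1,2,3,4,5,6,10
```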
I like sed, you were close to it. You just need to split each line number into a separate command. How about this:
sed -e "$(echo 1,3,4 | tr ',' '\n' | while read N; do printf '%dd;' $N; done)"
Do it like this:
sed -i "`echo $var|sed 's/,/d;/g'`d;" file
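The same sed script can be built with Bash parameter expansion alone, without a second sed or a loop. This is a sketch under the assumption that the list is non-empty and contains only numbers and commas; delete_lines is a hypothetical name, and it filters standard input rather than editing a file in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the comma-separated line numbers in $1 from stdin.
delete_lines() {
  local list=$1
  # Replace every comma with "d;" and append a final "d":
  # "7,8,9" becomes the sed script "7d;8d;9d".
  sed -e "${list//,/d;}d"
}

seq 1 10 | delete_lines 7,8,9 | paste -sd, -   # 1,2,3,4,5,6,10
```

An empty list would produce the bare script d, which deletes every line, so a real script should check for that case first.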
Another option to consider would be ed, with printf '%s\n' to put commands onto separate lines:
lines=( 9 8 7 )
printf '%s\n' "${lines[@]/%/d}" w | ed -s file
The array lines contains the line numbers to be deleted; it's important to put these in descending order! The expansion ${lines[@]/%/d} adds a d (delete) command to each line number and w writes to the file at the end. You can change this to ,p instead, to check the output before overwriting your file.
As an aside, for this example, you could also just use 7,9 as a single entry in the array.

How to print "\n" character using bash?

I have a .csv file where some strings have some special characters like "\n" (new line).
I'm using this script to extract the data from column 1 and 3:
while IFS=";" read f1 f2 f3 f4
do
echo "\"$f1\" = \"$f3\";"
done < file.csv >file.txt
The main problem is that in some $f3 I have the \n special character and I need to print it.
At the moment, this script is omitting this character.
e.g. If I have
\nXPTO
it will print
XPTO
and I would expect that would print
\nXPTO
Thanks
awk to the rescue!
$ echo "a;b;\nXPTO;d" | awk -F';' '{print $1 "=" $3}'
a=\nXPTO
or with file in/out
$ awk ... input_file > output_file
Use read -r to prevent read from interpreting escape sequences:
while IFS=";" read -r f1 f2 f3 f4
do
echo "\"$f1\" = \"$f3\";"
done < file.csv >file.txt
Side note: while this can be done in bash as shown above, I agree with Karafka that awk is ideal for this kind of problem and performs very well, better than bash itself when the input file is large.
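A small sketch of the read -r behaviour, using printf instead of echo so that the output does not depend on how echo treats backslashes. The function name extract_columns is hypothetical; the field layout follows the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print columns 1 and 3 of semicolon-separated input
# in the "key" = "value"; form used in the question.
extract_columns() {
  local f1 f2 f3 f4
  # -r keeps backslashes literal; printf '%s' prints them unchanged.
  while IFS=';' read -r f1 f2 f3 f4; do
    printf '"%s" = "%s";\n' "$f1" "$f3"
  done
}

printf '%s\n' 'a;b;\nXPTO;d' | extract_columns   # "a" = "\nXPTO";
```

Without -r, read treats the backslash as an escape character and drops it before the value ever reaches the output command.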

shell loop match a regex in the current line

I'm trying to create a script to fix a csv file like this:
field_one,field_two,field_three
,field_two,field_three
So I need to check inside my loop if the current line is missing field_one and replace it with sed with a new value for field_one (overwrite the line missing field_one).
For this I have a loop, but I need some help identifying whether the line is missing field one. Should I use grep? If so, how do I use it in a loop and get its response?
while read -r line; do
# this is pseudocode:
# if $line matches regex then
# sed 's/,/newfieldone/'
# overwrite the corrected line in the file
# end if
done < my_file
Thanks a lot in advance for your help!!!!
Inside your loop you can run the following sed command:
sed 's/^\s*,/newfieldone,/'
To see if a line begins with a , and is hence missing field one, you can use if [[ "$line" =~ ^, ]].
For example:
while read -r line; do
if [[ "$line" =~ ^, ]]
then
echo "newfieldone$line"
else
echo "$line"
fi
done < my_file
Just for the heck of it, here's a solution in awk:
awk -F, '{if ($1 == "") print "field_one" $0; else print $0}' < /tmp/test.txt
$ sed -e "/^,/s/^,\([^,]*\),\([^,]\)/new_field_one,\1,\2/" < my_file
Edit: This probably is too complicated. Take one of the other fine answers :)
With sed, try something like this:
sed -i 's|\(^,.*\)|new_field_one\1|g' <your file>
This might work for you:
a=Field_one,Field_two,Field_three
sed '/^,/c\'$a'' file
field_one,field_two,field_three
Field_one,Field_two,Field_three
Or if just inserting field_one:
a=Field_one
sed '/^,/s/^/'$a'/' file
field_one,field_two,field_three
Field_one,field_two,field_three
Simple bash solution using a case statement:
while read -r line; do
case "$line" in
,*) printf "%s%s\n" newfieldone "$line" ;;
*) printf "%s\n" "$line" ;;
esac
done < my_file
case uses "glob" matching, not regular expressions, so ,* matches a string beginning with a comma.
sed -i 's/^,/fieldone,/' YOURFILE
This replaces the leading , of every line that starts with one by fieldone, (in place, so the original file gets overwritten; if you need a backup, try -i.backup).
If you want a dynamic fieldone value, it depends on how dynamic you want it to be :-), e.g.:
MYDYNAMICFIELDONE="DYNAF1"
sed -i "s/^,/${MYDYNAMICFIELDONE},/" YOURFILE
Or with your while loop:
while read -r line; do
  MYDYNAMICFIELDONE="SET IT"
  # Process only the current line; sed -i needs a file name and cannot be used here.
  sed "s/^,/${MYDYNAMICFIELDONE},/" <<< "$line"
done < my_file > tmpfile
mv tmpfile my_file
Or with awk (gensub requires GNU awk):
awk '
/^,/ {
  DYNAF1="SET IT HERE"
  print gensub("^,",DYNAF1 ",","g",$0)
  next
}
{ print }
' INPUT > OUTPUT
This is a pretty short 1-liner with awk
awk '{$1="field_one"}1' FS=',' OFS=',' file.csv
. . . and another awk one-liner:
awk '$1==""{$1="field_one"}1' FS=',' OFS=',' file
What about using bash only:
while IFS=, read -r field_one field_two rest_of_line; do
  echo "${field_one:-default_field_one_value},$field_two,$rest_of_line"
done < my_file > my_correct_file
where the 'default_field_one_value' is used if the 'field_one' is empty
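A sketch of the same ${parameter:-default} idea that splits only on the first comma, so lines with any number of further fields pass through unchanged. The function name fill_first_field and its argument are hypothetical, and it assumes every line contains at least one comma.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill in an empty first CSV field with a default value.
fill_first_field() {
  local default=$1 first rest
  # Splitting on the first comma only: "first" gets field one (possibly
  # empty) and "rest" gets everything after that comma.
  while IFS=, read -r first rest; do
    printf '%s,%s\n' "${first:-$default}" "$rest"
  done
}

printf '%s\n' 'field_one,field_two,field_three' ',field_two,field_three' |
  fill_first_field newfieldone
```

With the sample input from the question, the first line is printed unchanged and the second becomes newfieldone,field_two,field_three.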

Resources