Unix command to convert multiple line data in a single line along with delimiter - shell

Here is the actual file data:
abc
def
ghi
jkl
mno
And the required output should be in this format:
'abc','def','ghi','jkl','mno'
The command I used produces this output instead:
abc,def,ghi,jkl,mno
The command is as follows:
sed -n 's/[0-3]//;s/ //;p' Split_22_05_2013 | \
awk -v ORS= '{print $0" ";if(NR%4==0){print "\n"}}'

In response to sudo_O's comment, here is an awk-less solution in pure bash. It does not exec any external program at all. Of course, instead of the <<XXX ... XXX here-document one could use <filename.
c=""
while read w; do
echo -e "$c'$w'\c"
c=,
done<<XXX
abc
def
ghi
jkl
mno
XXX
Output:
'abc','def','ghi','jkl','mno'
An even shorter version:
printf -v out ",'%s'" $(<infile)
echo ${out:1}
Without the horrifying pipe snakes, you can try something like this:
awk 'NR>1{printf ","}{printf "\x27%s\x27",$0}' <<XXX
abc
def
ghi
jkl
mno
XXX
Output:
'abc','def','ghi','jkl','mno'
Or another version, which reads the whole input as one record (assuming it contains no blank lines):
awk -vRS="" '{gsub("\n","\x27,\x27");print"\x27"$0"\x27"}'
Or a version that makes more use of awk's built-in variables:
awk -vRS="" -F"\n" -vOFS="','" -vORS="'" '{$1=$1;print ORS $0}'
The $1=$1; is needed to tell awk to rebuild $0 using the new output field separator (OFS). The output record separator (ORS) is not part of $0; print appends it after the record.
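To see the effect of the assignment in isolation, here is a minimal sketch. The function names print_unchanged and print_rebuilt are made up for illustration. awk only re-joins the fields with OFS when a field is assigned; a plain print emits the record exactly as it was read.

```shell
# Hypothetical helper: print each record untouched, so OFS is never applied.
print_unchanged() {
    awk -v OFS="," '{ print }'
}

# Hypothetical helper: assigning to a field forces awk to rebuild $0,
# joining the fields with OFS before the record is printed.
print_rebuilt() {
    awk -v OFS="," '{ $1 = $1; print }'
}

printf 'abc def ghi\n' | print_unchanged   # abc def ghi
printf 'abc def ghi\n' | print_rebuilt     # abc,def,ghi
```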

$ cat test.txt
abc
def
ghi
jkl
mno
$ cat test.txt | tr '\n' ','
abc,def,ghi,jkl,mno,
$ cat test.txt | awk '{print "\x27" $1 "\x27"}' | tr '\n' ','
'abc','def','ghi','jkl','mno',
$ cat test.txt | awk '{print "\x27" $1 "\x27"}' | tr '\n' ',' | sed 's/,$//'
'abc','def','ghi','jkl','mno'
The last command can be shortened to avoid UUOC:
$ awk '{print "\x27" $1 "\x27"}' test.txt | tr '\n' ',' | sed 's/,$//'
'abc','def','ghi','jkl','mno'

Using sed alone:
sed -n "/./{s/^\|\$/'/g;H}; \${x;s/\n//;s/\n/,/gp};" test.txt
Edit: Fixed, it should also work with or without empty lines now.

$ cat file
abc
def
ghi
jkl
mno
$ cat file | tr '\n' ' ' | awk -v q="'" -v OFS="','" '$1=$1 { print q $0 q }'
'abc','def','ghi','jkl','mno'
Replace each '\n' with ' ' -> (tr '\n' ' ')
Replace each separator (' ' space) with "','" (quote-comma-quote) -> (-v OFS="','")
Add a quote at the beginning and end of the line -> (print q $0 q)

This can be done pretty briefly with sed and paste:
<infile sed "s/^\|\$/'/g" | paste -sd,
Or more portably (I think, cannot test right now):
sed "s/^\|\$/'/g" infile | paste -s -d , -
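The \| alternation used in both commands is a GNU extension to basic regular expressions, so it may not work with every sed. A sketch that avoids it wraps the whole line with & instead. The name quote_and_join is hypothetical, and note that empty input lines would come out as ''.

```shell
# Hypothetical helper: wrap every input line in single quotes, then join
# the lines with commas. Reads stdin, writes a single line to stdout.
quote_and_join() {
    sed "s/.*/'&'/" | paste -s -d , -
}

printf 'abc\ndef\nghi\njkl\nmno\n' | quote_and_join
# 'abc','def','ghi','jkl','mno'
```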

$ sed "s/[^ ][^ ]*/'&',/g" input.txt | tr -d '\n'
'abc','def','ghi','jkl','mno',
To remove the trailing comma, append:
| sed 's/,$//'

awk 'seen == 1 { printf("'"','"'%s", $1);} seen == 0 {seen = 1; printf("'"'"'%s", $1);} END { printf("'"'"'\n"); }'
In slightly more readable format (suitable for awk -f):
# Print quote-terminator, separator, quote-start, thing
seen == 1 { printf("','%s", $1); }
# Set the "print separator" flag, print quote-start thing
seen == 0 { seen = 1; printf("'%s", $1); }
END { printf("'\n"); } # Print quote-end

perl -l54pe 's/.*/\x27$&\x27/' file

Related

How to get the line number of a string in another string in Shell

Given
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
I'd like to get the line number of the first occurrence of $str in $sourceStr, which should be 3.
I don't know how to do it.
I have tried:
awk 'match($0, v) { print NR; exit }' v=$str <<<$sourceStr
grep -n $str <<< $sourceStr | grep -Eo '^[^:]+';
grep -n $str <<< $sourceStr | cut -f1 -d: | sort -ug
grep -n $str <<< $sourceStr | awk -F: '{ print $1 }' | sort -u
All output 1, not 3.
How can I get the line number of $str in $sourceStr?
Thanks!
You may use this awk + printf in bash:
awk -v s="$str" '$0 == s {print NR; exit}' <(printf "%b\n" "$sourceStr")
3
Or even this awk without any bash support:
awk -v s="$str" -v source="$sourceStr" 'BEGIN {
split(source, a); for (i=1; i in a; ++i) if (a[i] == s) {print i; exit}}'
3
You may use this sed as well:
sed -n "/^$str$/{=;q;}" <(printf "%b\n" "$sourceStr")
3
Or this grep + cut:
printf "%b\n" "$sourceStr" | grep -nxF -m 1 "$str" | cut -d: -f1
3
It's not clear if you've just made a cut-n-paste error, but your sourceStr is not a multiline string (as demonstrated below). Also, you really need to quote your herestring (also demonstrated below). Perhaps you just want:
$ sourceStr="abc\nefg\nhij\nlmn\nhij"
$ echo "$sourceStr"
abc\nefg\nhij\nlmn\nhij
$ sourceStr=$'abc\nefg\nhij\nlmn\nhij'
$ echo "$sourceStr"
abc
efg
hij
lmn
hij
$ cat <<< $sourceStr
abc efg hij lmn hij
$ cat <<< "$sourceStr"
abc
efg
hij
lmn
hij
$ str=hij
$ awk "/${str}/ {print NR; exit}" <<< "$sourceStr"
3
Just use sed!
printf 'abc\nefg\nhij\nlmn\nhij\n' \
| sed -n '/hij/ { =; q; }'
Explanation: if sed meets a line that contains "hij" (regex /hij/), it prints the line number (the = command) and exits (the q command). Else it doesn't print anything (the -n switch) and goes on with the next line.
[update] Hmmm, sorry, I just noticed your "All output 1, not 3".
The primary reason why your commands don't output 3 is that sourceStr="abc\nefg\nhij\nlmn\nhij" doesn't automagically change your \n into new lines, so it ends up being one single line and that's why your commands always display 1.
If you want a multiline string, here are two solutions with bash:
printf -v sourceStr "abc\nefg\nhij\nlmn\nhij"
sourceStr=$'abc\nefg\nhij\nlmn\nhij'
Now that your variable contains whitespace characters (newlines), you must, as William Pursell stated, enclose $sourceStr in double quotes to preserve them:
grep -n "$str" <<< "$sourceStr" | ...
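Putting both points together, here is a sketch that keeps the original single-line assignment and expands the escapes at the point of use. The function name first_line_of is an assumption for illustration; it relies on printf '%b' to turn \n into real newlines and keeps every expansion quoted.

```shell
# Hypothetical helper: print the number of the first line of $2 that is
# exactly equal to $1. Backslash escapes such as \n in $2 are expanded by
# printf '%b'; the double quotes keep the text from being word-split.
first_line_of() {
    printf '%b\n' "$2" | awk -v s="$1" '$0 == s { print NR; exit }'
}

str="hij"
sourceStr="abc\nefg\nhij\nlmn\nhij"
first_line_of "$str" "$sourceStr"   # 3
```

The comparison is a whole-line string match, so a line such as "xhijx" would not count.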
There's always a hard way to do it:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | nl | grep $str | head -1 | gawk '{ print $1 }'
or, a bit more efficient:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | gawk '/'$str/'{ print NR; exit }'

Starting a new line in bash scripting

I need to start a new line after each field. I know I need to use \n at the end of the command, but how would I do that if I am using the cat command at the start?
I have tried using && after the awk -F : 'NR==1' && '\n'. My code is:
cat /etc/shadow | awk -F : 'NR==1' && "\n"
cat /etc/shadow | awk -F : 'NR == 1 { print "Username: " $1, "\n"}'
If you want to split the fields onto separate lines, you can use
... | tr ':' '\n'
or, if you want to keep the : at the end of each line,
... | sed 's/:/:\n/g'
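The same split can be done inside awk, which is closer to the command in the question. This is only a sketch on a made-up sample record rather than /etc/shadow (which normally needs root to read); fields_per_line is a hypothetical name.

```shell
# Hypothetical helper: print every colon-separated field of the first
# input line on its own line.
fields_per_line() {
    awk -F: 'NR == 1 { for (i = 1; i <= NF; i++) print $i }'
}

printf 'alice:x:1000:1000\n' | fields_per_line
# alice
# x
# 1000
# 1000
```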
Maybe
&& echo
or
&& printf "\n"
It is not clear what you mean.

sed | grep weird behaviour in script

THE ISSUE
I have two files,
File1
INT1;INT2;INT3
INT4;INT5;INT6
INT7;INT7;INT9
File2
INT1;INT2;INT3
Next I'll grep the difference between the files and only take the integers of third column.
DIFFERENCE=`grep -vxFf File1 File2 | awk 'BEGIN { FS = ";" } ; { print $3 }'`
resulting in
INT6 INT9
Next I want to substitute the spaces with line breaks
echo $DIFFERENCE | sed 's/ /;\n/g'
which results in
INT6;
INT9
Just as it should.
Instead, when I do it in the script, it returns
INT6
INT9
Why does it do this in the script, and is there a solution to this, or an easy way to modify my result?
ORIGINAL CODE - FOR CLARIFICATION
Original code and output here
CODE=`grep -vxFf $FOUND $COMPARETO | awk 'BEGIN { FS = ";" } ; { print $3 }'`
echo "$CODE;" | sed 's/ /;\n/g' > "testfile"
8000070118157
8002820000804
3394700015011;
Your intermediate output is not INT6 INT9 on a single line; it is already two lines, so there is no space for sed to replace.
You can do all of this in awk itself, for example
$ awk -F';' 'NR==FNR{a[$0];next} !($0 in a){print $3 FS}' file2 file1
INT6;
INT9;
if you don't want the last ;, perhaps easier to pipe to sed '$ s/;$//'
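For readers unfamiliar with the NR==FNR idiom, here is a self-contained sketch. It assumes File1 holds three records, one per line, and File2 holds one, and it writes them to a temporary directory. The function name missing_third_column is hypothetical.

```shell
# Hypothetical helper: print the third ;-separated field (plus a ";") of
# every line of file $2 that does not occur verbatim in file $1.
missing_third_column() {
    # While the first file is read, NR == FNR holds, so its lines are only
    # remembered as array keys; the second file is then filtered.
    awk -F';' 'NR == FNR { seen[$0]; next } !($0 in seen) { print $3 FS }' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'INT1;INT2;INT3\nINT4;INT5;INT6\nINT7;INT7;INT9\n' > "$tmp/File1"
printf 'INT1;INT2;INT3\n' > "$tmp/File2"
missing_third_column "$tmp/File2" "$tmp/File1"
# INT6;
# INT9;
rm -r "$tmp"
```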

How to split a text file on a delimiter into multiple files in unix?

I have a text file that looks like this:
input_file
1|abc
2|def
3|ghi
n|etc...
I need to split this up into two files on the pipe delimiter. So this is the expected output:
File_1:
1
2
3
n
File_2:
abc
def
ghi
etc
I do not know how many lines the input file will have. How do you achieve this in ksh or bash?
Thank you.
awk would be suitable for this task:
awk -F\| '{print $1 > "File_1"; print $2 > "File_2"}' input_file
This splits your text on the "|" and prints each column to the respective file.
If there were more than two fields, you may prefer to use a loop instead:
awk -F\| '{for(i=1;i<=NF;++i) print $i > "File_" i}' input_file
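Some awk implementations parse an unparenthesized concatenation after > differently, so a cautious sketch of the loop puts the file name in parentheses. The names split_columns and prefix are hypothetical, and the sample data is written to a temporary directory.

```shell
# Hypothetical helper: write field N of every line of file $1 to ${2}N.
split_columns() {
    awk -F'|' -v prefix="$2" '{
        for (i = 1; i <= NF; ++i)
            print $i > (prefix i)   # parentheses keep the concatenation unambiguous
    }' "$1"
}

tmp=$(mktemp -d)
printf '1|abc\n2|def\n3|ghi\n' > "$tmp/input_file"
split_columns "$tmp/input_file" "$tmp/File_"
cat "$tmp/File_2"
# abc
# def
# ghi
rm -r "$tmp"
```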
cut -d '|' -f 1 input_file > File_1
cut -d '|' -f 2 input_file > File_2
Only with bash:
while IFS='|' read A B; do echo "$A" >>File_1; echo "$B" >>File_2; done <input_file
Here is another solution using other standard commands (cat, cut and tee):
cat input_file | cut -d '|' -f1 > File_1
cat input_file | cut -d '|' -f2 > File_2
Or you can put them together in one line
cat input_file | tee >(cut -d '|' -f1 > File_1) | cut -d '|' -f2 > File_2

Bash: "xargs cat", adding newlines after each file

I'm using a few commands to cat a few files, like this:
cat somefile | grep example | awk -F '"' '{ print $2 }' | xargs cat
It nearly works, but my issue is that I'd like to add a newline after each file.
Can this be done in a one liner?
(surely I can create a new script or a function that does cat and then echo -n but I was wondering if this could be solved in another way)
cat somefile | grep example | awk -F '"' '{ print $2 }' | while read -r file; do cat "$file"; echo ""; done
Using GNU Parallel http://www.gnu.org/software/parallel/ it may be even faster (depending on your system):
cat somefile | grep example | awk -F '"' '{ print $2 }' | parallel "cat {}; echo"
awk -F '"' '/example/{ system("cat " $2); printf "\n" }' somefile
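A variant of the while loop that copes with spaces in file names might look like the sketch below. cat_with_newline is a hypothetical name, and it assumes one file name per line on stdin.

```shell
# Hypothetical helper: read file names from stdin, one per line, and print
# each file followed by an empty line.
cat_with_newline() {
    while IFS= read -r file; do
        cat -- "$file"
        echo
    done
}

tmp=$(mktemp -d)
printf 'first\n' > "$tmp/a file"
printf 'second\n' > "$tmp/b"
printf '%s\n' "$tmp/a file" "$tmp/b" | cat_with_newline
rm -r "$tmp"
```

In the original pipeline this would replace the final xargs cat stage.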

Resources