send the unix output to a csv file - bash

I want to put the output data from a unix command into a .csv file.
Suppose the output which I am getting is:
A
B
C
I want to put this data in .csv file as
A B C
in three different columns but same row.

Try this:
printf '%s\n' A B C | paste -sd ' ' >> file.csv
or, more classically for a CSV (with a comma as the delimiter):
printf '%s\n' A B C | paste -sd ',' >> file.csv
printf '%s\n' A B C is just an example to reproduce your sample input. My solution works with spaces in the same line too.
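For instance, with made-up values that themselves contain spaces:
printf '%s\n' 'A 1' 'B 2' 'C 3' | paste -sd ',' >> file.csv
# appends the row: A 1,B 2,C 3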
EDIT: from your comments, you seem to need a for loop, so:
for i in {0..5}; do printf '%s\n' {A..C} | paste -sd " " >> file.csv; done
or in pseudo code :
for ...:
unix_command | paste -sd " " >> file.csv
endfor

unix_command | tr "\n" " " > file.csv
or
unix_command | awk 'ORS=FS' > file.csv
Disadvantage in both cases: one trailing space
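If the trailing separator is a problem, a small awk sketch (using the same placeholder unix_command) prints the separator only between items:
unix_command | awk '{printf "%s%s", (NR>1 ? " " : ""), $0} END {print ""}' > file.csv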

As I understand it, @Django needs to join every three lines into one.
paste -d ' ' - - - < infile
If you need the output in csv format (delimited by ,), you can use this:
paste -d ',' - - - < infile
Here is the test result
$ cat infile
Manoj Mishra
Japan
Environment.
Michael Jackson
America
Environment.
$ paste -d ',' - - - < infile
Manoj Mishra,Japan,Environment.
Michael Jackson,America,Environment.
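The number of - arguments determines how many input lines are joined into each row; for records of four lines, for example:
paste -d ',' - - - - < infile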

A more general answer
If the output of your command is multi-line and you want to put the
quoted output in csv format, n items per line, the following script
could be handy.
The groupby program reads from stdin and:
- quotes each input line
- groups n quoted input lines into a csv record, using a comma as a separator
- optionally, with the -s argument, discards the last line of its output if that last line doesn't contain exactly n items
The -h option, as usual, echoes a usage line and exits.
Any other option makes the program print the usage line and exit with an error.
The code
% cat groupby
#!/bin/sh
usage () { echo Usage: $0 [-s] n --- -s is for \"strict\", outputs only records of n items. ; exit $1 ; }
s=0
while getopts :sh o ; do
    case "${o}" in
        s) s=1 ; shift ;;
        h) usage 0 ;;
        *) usage 1 ;;
    esac
done
awk -v n=$1 -v s=$s -v q='"' '
NR==1 {buf = q $0 q ; next}
NR%n==1 {print buf; buf = q $0 q ; next}
{buf = buf "," q $0 q}
END {if(!s||NR%n==0)print buf}'
%
An example of usage
% chmod +x groupby
% echo -e "1\n2\n3\n4\n5" | ./groupby 3
"1","2","3"
"4","5"
% echo -e "1\n2\n3\n4\n5\n6" | ./groupby 3
"1","2","3"
"4","5","6"
% echo -e "1\n2\n3\n4\n5\n6\n7" | ./groupby 3
"1","2","3"
"4","5","6"
"7"
% echo -e "1\n2\n3\n4\n5\n6\n7\n8" | ./groupby -s 4
"1","2","3","4"
"5","6","7","8"
% echo -e "1\n2\n3\n4\n5\n6\n7" | ./groupby -s 4
"1","2","3","4"
%
A different angle
I changed the defaults to best suit the OP's requirements, and introduced other options; see the usage string for details.
#!/bin/sh
usage () { echo 'Usage: '$0' [-s] [-q quote_char] [-c separator_char] n
Reads lines from stdin and prints them grouped by n and separated by spaces.
Optional arguments:
-s is for "strict", outputs only records of n items;
-q quote_char, forces quoting of each input line;
-c separator_char, changes the field separator,
interesting alternatives are tab, comma, semicolon etc;
-h prints this help and exits.' ; exit $1 ; }
# Default options
s=0 ; q='' ; c=' '
# Treatment of optional arguments
while getopts :shc:q: o ; do
    case "${o}" in
        s) s=1 ;;
        c) c="${OPTARG}" ;;
        q) q="${OPTARG}" ;;
        h) usage 0 ;;
        *) usage 1 ;;
    esac
done
shift $(($OPTIND-1))
# awk code
awk -v n=$1 -v s=$s -v q="$q" -v c="$c" '
NR==1 {buf = q $0 q ; next}
NR%n==1 {print buf; buf = q $0 q ; next}
{buf = buf c q $0 q}
END {if(!s||NR%n==0)print buf}'
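An example of usage, assuming this modified script replaces the earlier groupby: to reproduce the OP's quoted, comma-separated format in groups of three:
% echo -e "A\nB\nC\nD\nE\nF" | ./groupby -q '"' -c ',' 3
"A","B","C"
"D","E","F"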

Just use xargs. E.g.:
xargs < filename >> filename.csv
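xargs can also build fixed-size groups; combined with tr this yields one CSV row per n items (a sketch, assuming whitespace-separated input without quotes):
xargs -n 3 < filename | tr ' ' ',' >> filename.csv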

Related

How to detect and remove indentation of a piped text

I'm looking for a way to remove the indentation of a piped text. Below is a solution using cut -c 9-, which assumes the indentation is 8 characters wide.
I'm looking for a solution which can detect the number of spaces to remove. This implies going through the whole (piped) file to find the minimum number of spaces (tabs?) used to indent it, then removing them from each line.
run.sh
help() {
    awk '
        /esac/{b=0}
        b
        /case "\$arg" in/{b=1}' \
        "$me" \
        | cut -c 9-
}
while [[ $# -ge 1 ]]
do
    arg="$1"
    shift
    case "$arg" in
        help|h|?|--help|-h|'-?')
            # Show this help
            help;;
    esac
done
$ ./run.sh --help
help|h|?|--help|-h|'-?')
    # Show this help
    help;;
Note: echo $'    4\n  2\n   3' | python3 -c 'import sys; import textwrap as tw; print(tw.dedent(sys.stdin.read()), end="")' works, but I expect there is a better way, one which depends only on software more common than Python. Maybe awk? I wouldn't mind seeing a perl solution either.
Note 2: echo $'    4\n  2\n   3' | python -c 'import sys; import textwrap as tw; print tw.dedent(sys.stdin.read()),' also works (Python 2.7.15rc1).
The following is pure bash, with no external tools or command substitutions:
#!/usr/bin/env bash
all_lines=( )
min_spaces=9999 # start with something arbitrarily high
while IFS= read -r line; do
    all_lines+=( "$line" )
    if [[ ${line:0:$min_spaces} =~ ^[[:space:]]*$ ]]; then
        continue # this line has at least as much whitespace as those preceding it
    fi
    # this line has *less* whitespace than those preceding it; we need to know how much.
    [[ $line =~ ^([[:space:]]*) ]]
    line_whitespace=${BASH_REMATCH[1]}
    min_spaces=${#line_whitespace}
done
for line in "${all_lines[@]}"; do
    printf '%s\n' "${line:$min_spaces}"
done
Its output is:
4
2
3
Suppose you have:
$ echo $' 4\n 2\n 3\n\ttab'
4
2
3
tab
You can use the Unix expand utility to expand the tabs to spaces. Then run the result through awk to count the minimum number of leading spaces on a line:
$ echo $' 4\n 2\n 3\n\ttab' |
expand |
awk 'BEGIN{min_indent=9999999}
{lines[++cnt]=$0
match($0, /^[ ]*/)
if(RLENGTH<min_indent) min_indent=RLENGTH
}
END{for (i=1;i<=cnt;i++)
print substr(lines[i], min_indent+1)}'
4
2
3
tab
Here's the (semi-) obvious temp file solution.
#!/bin/sh
t=$(mktemp -t dedent.XXXXXXXXXX) || exit
trap 'rm -f "$t"' EXIT
awk '{ n = match($0, /[^ ]/); if (NR == 1 || n < min) min = n }1
     END { exit min }' >"$t"
cut -c $?- "$t"
This obviously fails if all lines have more than 254 leading whitespace characters, because then the result won't fit into the 8-bit exit code from Awk.
This has the advantage that we are not restricting ourselves to the available memory. Instead, we are restricting ourselves to the available disk space. The drawback is that disk might be slower, but the advantage of not reading big files into memory will IMHO trump that.
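If the exit-code limit is a concern, a variant along the same lines (a sketch, not part of the original answer; it assumes at least one non-blank input line) passes the minimum through a command substitution instead:
#!/bin/sh
t=$(mktemp -t dedent.XXXXXXXXXX) || exit
trap 'rm -f "$t"' EXIT
cat >"$t"
min=$(awk '{ n = match($0, /[^ ]/); if (n && (!min || n < min)) min = n }
           END { print min }' "$t")
cut -c "$min"- "$t"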
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(text="$(cat)"; echo "$text" \
| cut -c "$(echo "$text" | sed 's/[^ ].*$//' | awk 'NR == 1 {a = length} length < a {a = length} END {print a + 1}')-"\
)
With explanations:
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(
text="$(cat)" # Obtain the input in a variable
echo "$text" | cut -c "$(
# `cut` removes the n-1 first characters of each line of the input, where n is:
echo "$text" | \
sed 's/[^ ].*$//' | \
awk 'NR == 1 || length < a {a = length} END {print a + 1}'
# sed: keep only the initial spaces, remove the rest
# awk:
# At the first line `NR == 1`, get the length of the line `a = length`.
# For any shorter line (`length < a`), update the length `a = length`.
# At the end of the piped input, print the shortest length + 1.
# ... we add 1 because in `cut`, characters of the line are indexed at 1.
)-"
)
Update:
It is possible to avoid spawning sed. As per tripleee's comment, sed's s/// can be replaced by awk's sub(). Here is an even shorter option, using n = match() as in tripleee's answer.
echo $' 4\n 2\n 3\n \n more spaces in the line\n ...' | \
(
text="$(cat)" # Obtain the input in a variable
echo "$text" | cut -c "$(
# `cut` removes the a-1 first characters of each line of the input, where a is:
echo "$text" | \
awk '
{n = match($0, /[^ ]/)}
n && (!a || n < a) {a = n}
END {print a}'
# awk:
# At every line, get the position `n` of the first non-space character
# (match() returns 0 for all-space lines, which we skip).
# Keep in `a` the smallest such position seen so far.
# At the end of the piped input, print a.
# a is the 1-based column of the left-most non-space character,
# which is exactly where `cut -c` should start keeping characters.
# (A possible refinement: exit early once a == 1, since the
# indent cannot get any smaller.)
)-"
)
Apparently, it's also possible to do this in Perl 6, with the function my &f = *.indent(*);.
Another solution with awk, based on dawg's answer. Major differences include:
- No need to seed the minimum with an arbitrarily large number, which feels hacky.
- It works on text with empty lines, by not considering them when looking for the least-indented line.
awk '
{
lines[++count] = $0
if (NF == 0) next
match($0, /[^ ]/)
if (length(min) == 0 || RSTART < min) min = RSTART
}
END {
for (i = 1; i <= count; i++) print substr(lines[i], min)
}
' <<< $' 4\n 2\n 3'
Or all on the same line
awk '{ lines[++count] = $0; if (NF == 0) next; match($0, /[^ ]/); if (length(min) == 0 || RSTART < min) min = RSTART; } END { for (i = 1; i <= count; i++) print substr(lines[i], min) }' <<< $' 4\n 2\n 3'
Explanation:
Add current line to an array, and increment count variable
{
lines[++count] = $0
If line is empty, skip to next iteration
if (NF == 0) next
Set RSTART to the start index of the first non-space character.
match($0, /[^ ]/)
If min isn’t set or is higher than RSTART, set the former to the latter.
if (length(min) == 0 || RSTART < min) min = RSTART
}
Run after all input is read.
END {
Loop over the array, and for each line print only a substring going from the index set in min to the end of the line.
for (i = 1; i <= count; i++) print substr(lines[i], min)
}
solution using bash
#!/usr/bin/env bash
cb=$(xclip -selection clipboard -o)
firstchar=${cb::1}
if [ "$firstchar" == $'\t' ]; then
    tocut=$(echo "$cb" | awk -F$'\t' '{print NF-1;}' | sort -n | head -n1)
else
    tocut=$(echo "$cb" | awk -F '[^ ].*' '{print length($1)}' | sort -n | head -n1)
fi
echo "$cb" | cut -c$((tocut+1))- | xclip -selection clipboard
Note: this assumes the first line has the left-most indent.
Works for both spaces and tabs.
Copy some text (Ctrl+C), run the script, and the dedented text is now in your clipboard.
solution using python
detab.py
import sys
import textwrap
data = sys.stdin.readlines()
data = "".join(data)
print(textwrap.dedent(data))
use with pipes
xclip -selection clipboard -o | python detab.py | xclip -selection clipboard

Shell Script for combining 3 files

I have 3 files with below data
$cat File1.txt
Apple,May
Orange,June
Mango,July
$cat File2.txt
Apple,Jan
Grapes,June
$cat File3.txt
Apple,March
Mango,Feb
Banana,Dec
I require the below output file.
$cat Output_file.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec
The requirement here is to take the first column as a key: for keys common to several files, the second columns need to be joined with "|". If a key has no match in the other files, its line is printed unchanged in the output file.
I have tried putting this in a while loop, but it takes more time as the file size increases. I'd like a simple solution using a shell script.
This should work:
#!/bin/bash
for FRUIT in $( cat "$@" | cut -d "," -f 1 | sort | uniq )
do
    echo -ne "${FRUIT},"
    awk -F "," "\$1 == \"$FRUIT\" {printf(\"%s|\",\$2)}" "$@" | sed 's/.$/\'$'\n/'
done
Run it as :
$ ./script.sh File1.txt File2.txt File3.txt
A purely native-bash solution (calling no external tools, and thus limited only by the performance constraints of bash itself) might look like:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "ERROR: Bash 4 or newer required" >&2; exit 1;; esac
declare -A items=( )
for file in "$@"; do
    while IFS=, read -r key value; do
        items[$key]+="|$value"
    done <"$file"
done
for key in "${!items[@]}"; do
    value=${items[$key]}
    printf '%s,%s\n' "$key" "${value#'|'}"
done
...called as ./yourscript File1.txt File2.txt File3.txt
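With the three sample files above, the output contains the same five lines as in the question, though in an unspecified order, since bash associative arrays are unordered; for example:
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec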
This is easily done with a single awk command:
awk 'BEGIN{FS=OFS=","} {a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i in a) print i, a[i]}' File{1,2,3}.txt
Orange,June
Banana,Dec
Apple,May|Jan|March
Grapes,June
Mango,July|Feb
If you want output in the same order as strings appear in original files then use this awk:
awk 'BEGIN{FS=OFS=","} !($1 in a) {b[++n] = $1}
{a[$1] = a[$1] (a[$1] == "" ? "" : "|") $2}
END {for (i=1; i<=n; i++) print b[i], a[b[i]]}' File{1,2,3}.txt
Apple,May|Jan|March
Orange,June
Mango,July|Feb
Grapes,June
Banana,Dec

Reverse the words but keep the order Bash

I have a file with lines. I want to reverse the words, but keep them in the same order.
For example: "Test this word"
Result: "tseT siht drow"
I'm using a Mac, so awk doesn't seem to work.
What I've got for now:
input=FILE_PATH
while IFS= read -r line || [[ -n $line ]]
do
    echo $line | rev
done < "$input"
Here is a solution that completely avoids awk:
#!/bin/bash
input=./data
while read -r line ; do
    for word in $line ; do
        output=`echo $word | rev`
        printf "%s " $output
    done
    printf "\n"
done < "$input"
In case xargs works on mac:
echo "Test this word" | xargs -n 1 | rev | xargs
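Note that the final xargs joins everything onto a single output line; to keep one output line per input line, you could apply the pipeline per line (a sketch):
while IFS= read -r line; do
    printf '%s\n' "$line" | xargs -n 1 | rev | xargs
done < file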
Inside your read loop, you can just iterate over the words of your string and pass them to rev:
line="Test this word"
for word in $line; do
    echo -n " $word" | rev
done
echo # Add final newline
output
tseT siht drow
You are actually in fairly good shape with bash. You can use string indexes, string length, and C-style for loops to iterate over the characters in each word, building a reversed string to output. You can control formatting in a number of ways to handle the spaces between words, but a simple flag first=1 is about as easy as anything else. You can do the following with your read:
#!/bin/bash
while read -r line || [[ -n $line ]]; do    ## read line
    first=1                                 ## flag to control space
    a=( $( echo $line ) )                   ## put line in array
    for i in "${a[@]}"; do                  ## for each word
        tmp=                                ## clear temp
        len=${#i}                           ## get length
        for ((j = 0; j < len; j++)); do     ## loop length times
            tmp="${tmp}${i:$((len-j-1)):1}" ## add char len - j to tmp
        done
        if [ "$first" -eq '1' ]; then       ## if first word
            printf "$tmp"; first=0          ## output w/o space
        else
            printf " $tmp"                  ## output w/space
        fi
    done
    echo ""                                 ## output newline
done
Example Input
$ cat dat/lines2rev.txt
my dog has fleas
the cat has none
Example Use/Output
$ bash revlines.sh <dat/lines2rev.txt
ym god sah saelf
eht tac sah enon
Look things over and let me know if you have questions.
Using rev and awk
Consider this as the sample input file:
$ cat file
Test this word
Keep the order
Try:
$ rev <file | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
tseT siht drow
peeK eht redro
(This uses awk but, because it uses no advanced awk features, it should work on MacOS.)
Using in a script
If you need to put the above in a script, then create a file like:
$ cat script
#!/bin/bash
input="/Users/Anastasiia/Desktop/Tasks/test.txt"
rev <"$input" | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
And, run the file:
$ bash script
tseT siht drow
peeK eht redro
Using bash
while read -a arr
do
    x=" "
    for ((i=0; i<${#arr[@]}; i++))
    do
        ((i == ${#arr[@]}-1)) && x=$'\n'
        printf "%s%s" $(rev <<<"${arr[i]}") "$x"
    done
done <file
Applying the above to our same test file:
$ while read -a arr; do x=" "; for ((i=0; i<${#arr[@]}; i++)); do ((i == ${#arr[@]}-1)) && x=$'\n'; printf "%s%s" $(rev <<<"${arr[i]}") "$x"; done; done <file
tseT siht drow
peeK eht redro

unix shell script to check for EOF

I wish to take the names of two files as command line arguments in a bash shell script, and then, for each word in the first file (the words are comma separated and the file has more than one line), count its occurrences in the second file.
I wrote a shell script like this
if [ $# -ne 2 ]
then
    echo "invalid number of arguments"
else
    i=1
    a=$1
    b=$2
    fp=*$b
    while[ fgetc ( fp ) -ne EOF ]
    do
        d=$( cut -d',' -f$i $a )
        echo "$d"
        grep -c -o $d $b
        i=$(( $i + 1 ))
    done
fi
For example, file1 has the words abc,def,ghi,jkl (in the first line) and mno,pqr (in the second line), and file2 has the words abc,abc,def.
The output should then be:
abc 2
def 1
ghi 0
To read a file word by word, with the words separated by commas, use this snippet:
while read -r p; do
    IFS=, && for w in $p; do
        printf "%s: " "$w"
        tr , '\n' < file2 | grep -Fc "$w"
    done
done < file1
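With the sample files from the question (file1 containing abc,def,ghi,jkl and mno,pqr; file2 containing abc,abc,def), this should print:
abc: 2
def: 1
ghi: 0
jkl: 0
mno: 0
pqr: 0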
Another approach:
words=( `tr ',' ' ' < file1` )  # split file1 into words...
for word in "${words[@]}"; do   # iterate over the words
    printf "%s : " "$word"
    awk 'END{print FNR-1}' RS="$word" file2
    # split file2 with 'word' as the record separator;
    # the number of records minus one == the number of occurrences of the word.
done
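One caveat for both approaches: substring hits. grep -Fc "$w" counts lines merely containing the word (so searching for de would also count def), and the record-separator trick likewise counts substring occurrences. For exact, whole-token matches, anchoring the comparison is safer (a sketch):
tr ',' '\n' < file2 | grep -cx "$word"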

Using BASH, how can I expand PDSH list of IP addresses?

I have a list of 3000 or so IP addresses that were the result of pdsh output piped through dshbak -c, which formats the output into a readable form. I like the readability of dshbak -c, but the problem I have is that IPs with common octets are collapsed to save space. I need the full IP addresses for the rest of my project.
Is there an easy way to convert this input:
192.168.38.[217,222],192.168.40.215,192.168.41.[219-222]
to this output:
192.168.38.217,192.168.38.222,192.168.40.215,192.168.41.219,192.168.41.220,192.168.41.221,192.168.41.222
I was thinking sed could be used directly, but I'm not sure how to store the common octets in a variable. For this reason, I believe a bash script will need to be used along with sed. Any help or pointers in the right direction would be appreciated.
If you can change the input, you can use the following form:
echo 192.168.38.{217,222} 192.168.40.215 192.168.41.{219..222} | tr ' ' ','
Otherwise you can transform it with a command and eval:
eval echo $( echo '192.168.38.[217,222],192.168.40.215,192.168.41.[219-222]' | \
sed 's/,/ /g;s/\[/{/g;s/]/}/g;s/-/../g;s/\({[0-9]\+\) \([0-9]\+}\)/\1,\2/g' | \
grep -v '[^0-9{}., ]' ) | tr ' ' ','
Note that eval is pretty dangerous on unvalidated data, therefore I use grep -v '[^0-9{}., ]' to exclude any unexpected symbols.
sed in this command just transforms your original string into the form mentioned above.
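For the sample input, the sed stage produces exactly the brace-expansion form of the first variant, which eval then expands:
192.168.38.{217,222} 192.168.40.215 192.168.41.{219..222}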
If you are ready to use awk, then you can try this:
echo "192.168.38.[217,222],192.168.40.215,192.168.41.[219-222]" |
sed 's/\[//g; s/\]//g' |
awk -F, '{
    for (i = 1; i <= NF; i++) {
        n = split($i, a, ".")
        IPL = ""
        if (n > 1) {                       # a full a.b.c.d field: remember the prefix
            PIP = a[1] "." a[2] "." a[3]
        } else {                           # a bare trailing octet: reuse the last prefix
            IPL = PIP "." $i
        }
        if (index(a[4], "-") > 0) {        # a range such as 219-222
            x = 0
            split(a[4], b, "-")
            for (j = b[1]; j <= b[2]; j++) {
                if (x == 0) { IPL = PIP "." j; x++ }
                else        { IPL = IPL "," PIP "." j }
            }
        } else if (index(a[4], ",") > 0) { # kept from the original; unreachable with FS=","
            split(a[4], b, ",")
            IPL = PIP "." b[1] "," PIP "." b[2]
        } else {
            if (length(IPL) <= 3) IPL = PIP "." a[4]
        }
        printf("%s,", IPL)
    }
}'
If you are interested in using this, I can explain the logic...
This is one way to process it purely with Bash, as required. No awk, sed, or other external tools.
#!/bin/bash
shopt -s extglob
IFS=,
while read -r LINE; do
    OUTPUT=()
    while [[ -n $LINE ]]; do
        case "$LINE" in
        +([[:digit:]]).+([[:digit:]]).+([[:digit:]]).+([[:digit:]]))
            OUTPUT[${#OUTPUT[@]}]=$LINE
            break
            ;;
        +([[:digit:]]).+([[:digit:]]).+([[:digit:]]).+([[:digit:]]),*)
            OUTPUT[${#OUTPUT[@]}]=${LINE%%,*}
            LINE=${LINE#*,}
            ;;
        +([[:digit:]]).+([[:digit:]]).+([[:digit:]]).\[+([[:digit:],-])\]*)
            SET=${LINE%%\]*}
            PREFIX=${SET%%\[*}
            read -a RANGES <<< "${SET:${#PREFIX} + 1}"
            for R in "${RANGES[@]}"; do
                case "$R" in
                +([[:digit:]]))
                    OUTPUT[${#OUTPUT[@]}]=${PREFIX}${R}
                    ;;
                +([[:digit:]])-+([[:digit:]]))
                    X=${R%%-*} Y=${R##*-}
                    if [[ X -le Y ]]; then
                        for (( I = X; I <= Y; ++I )); do
                            OUTPUT[${#OUTPUT[@]}]=${PREFIX}${I}
                        done
                    else
                        for (( I = X; I >= Y; --I )); do
                            OUTPUT[${#OUTPUT[@]}]=${PREFIX}${I}
                        done
                    fi
                    ;;
                esac
            done
            LINE=${LINE:${#SET} + 2}
            ;;
        *)
            # echo "Invalid token: $LINE" >&2
            break
        esac
    done
    echo "${OUTPUT[*]}"
done
For an input of
192.168.38.[217,222],192.168.40.215,192.168.41.[219-222]
Running bash temp.sh < temp.txt yields
192.168.38.217,192.168.38.222,192.168.40.215,192.168.41.219,192.168.41.220,192.168.41.221,192.168.41.222
It's consistent with ranges too: if X is greater than Y, e.g. 200-100, then it generates the IPs in descending order from 200 down to 100. The script can also process multi-line input.
And it should also work with mixed ranges like [100,200-250].
With GNU awk:
$ cat tst.awk
BEGIN{ FS=OFS="," }
{
    $0 = gensub(/(\[[[:digit:]]+),([[:digit:]]+\])/,"\\1+\\2","g")
    gsub(/[][]/,"")
    for (i=1;i<=NF;i++) {
        split($i,a,/\./)
        base = a[1] "." a[2] "." a[3]
        range = a[4]
        split(range,r,/[+-]/)
        printf (i>1 ? "," : "")
        if (range ~ /\+/) {
            printf "%s.%s,", base, r[1]
            printf "%s.%s", base, r[2]
        }
        else if (range ~ /-/) {
            for (j=r[1]; j<=r[2]; j++) {
                printf "%s.%s%s", base, j, (j<r[2] ? "," : "")
            }
        }
        else {
            printf "%s.%s", base, range
        }
    }
    print ""
}
$
$ awk -f tst.awk file
192.168.38.217,192.168.38.222,192.168.40.215,192.168.41.219,192.168.41.220,192.168.41.221,192.168.41.222
We need the gensub() to change the comma inside the square brackets to a different character (+) so we can use the comma outside of the brackets as the field separator; gensub() is what makes this gawk-specific.
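Concretely, the preprocessing turns the sample line
192.168.38.[217,222],192.168.40.215,192.168.41.[219-222]
into
192.168.38.217+222,192.168.40.215,192.168.41.219-222
after which splitting on FS="," yields one field per address group, and the + or - inside a field tells the loop whether to print an enumerated pair or expand a range.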
