What's the difference here? - bash

I have written a script to automate a routine, but I can't understand the difference between the two blocks below. The first works and the second doesn't.
This works:
echo "$(pull_data)" > data.csv
cat data.csv | while read a b c d; do
This doesn't work:
cat "$(pull_data)" | while read a b c d; do
Why is that?

cat concatenates and outputs files - I think you want echo in your second statement:
echo "$(pull_data)" | while read a b c d; do

cat is used to work with files, and you don't have a file here: the command substitution makes cat treat pull_data's output as a filename to open, not as data. If you don't need to store your data in data.csv, you should be able to pipe it directly to the loop:
echo "$(pull_data)" | while read a b c d; do
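To make the difference concrete, here is a minimal sketch; pull_data is a hypothetical stand-in defined locally, since the asker's real function isn't shown:

```shell
# Hypothetical stand-in for the asker's pull_data function
pull_data() { printf '1,foo\n2,bar\n'; }

# Works: the function's output is fed to the loop as text on stdin
echo "$(pull_data)" | while read -r a b c d; do echo "line: $a"; done

# Fails: cat treats the output as a *filename* and tries to open it
cat "$(pull_data)" 2>/dev/null || echo "cat failed: no such file"
```

The first pipeline prints each line of the data; the second makes cat look for a file literally named after the data, which doesn't exist.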

Related

Parsing and storing the values of a csv file using shell script outputs :::: instead of actual characters

I am trying to read a csv file using shell script,using the following command.
cat file.csv | while read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
When I run this command, the first column in the file is not read properly.
For example, if the first column's contents are
number1,
number2,
number3,
number4,
(so on)
It outputs:
::::er1,
::::er2,
::::er3,
::::er4,
Some characters are replaced by ':'.
This happens only for the first column's contents. Where am I going wrong?
The problem is most likely due to a couple of issues:
You are reading the file without setting IFS=","
Your CSV file probably contains carriage returns (\r), which mangle how read processes the input stream: a stray \r makes the terminal jump back to the start of the line, so the characters printed after it overwrite what was already there - which is exactly the :::: effect you are seeing.
To remove the carriage returns, run tr -d '\r' < oldFile.csv > newFile.csv and do the parsing on the new file as mentioned below.
Without setting the Internal Field Separator (IFS=","), read doesn't know where to split the input into words while reading from the stream. Add it to the command as below.
cat file.csv | while IFS="," read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
You can see it working below. file.csv has the following contents:
$ cat file.csv
abc,def,ghi,ijk,lmn,opz
1,2,3,4,5,6
$ cat file.csv | while IFS="," read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
abc:def:ghi:ijk:lmn:opz
1:2:3:4:5:6
Moreover, piping cat into a loop like this is not recommended; bash enthusiasts often call it UUOC - the Useless Use of Cat.
You can avoid it by doing:
#!/bin/bash
while IFS="," read -r a b c d e f;
do
echo "$a:$b:$c:$d:$e:$f"
done < file.csv
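If the file does carry carriage returns, the tr step can be folded into the same pipeline; a sketch using the file.csv sample above, with DOS line endings fabricated for illustration:

```shell
# Build a sample file with DOS line endings (\r\n) for illustration
printf 'abc,def,ghi,ijk,lmn,opz\r\n1,2,3,4,5,6\r\n' > file.csv

# Strip the carriage returns and split on commas in one pass
tr -d '\r' < file.csv | while IFS="," read -r a b c d e f; do
    echo "$a:$b:$c:$d:$e:$f"
done
```

This prints the cleanly delimited fields with no ':' overwriting.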

Stopping paste after any input is exhausted

I have two programs that produce data on stdout, and I'd like to paste their output together. I can successfully do this like so:
paste <(./prog1) <(./prog2)
But I find that this method will print all lines from both inputs,
and what I really want is to stop paste after either input program is finished.
So if ./prog1 produces the output:
a
b
c
But ./prog2 produces:
Hello
World
I would expect the output:
a Hello
b World
Also note that one of the input programs may actually produce infinite output, and I want to be able to handle that case as well. For example, if my inputs are yes and ./prog2, I should get:
y Hello
y World
Use join instead, with a variation on the Schwartzian transform:
numbered () {
nl -ba -nrz
}
join -j 1 <(prog1 | numbered) <(prog2 | numbered) | sed 's/^[^ ]* //'
Piping through nl prefixes every line with a zero-padded number (separated from the text by nl's default tab), and join -j 1 joins the lines that carry the same number. The extra lines in the longer input will have no join partner and be omitted. Once the join is complete, the pipe through sed removes the line numbers again.
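Here is a self-contained sketch of the numbering-and-joining idea, substituting printf for prog1 and prog2 and relying on nl's default tab separator so that join can treat the line number as its own field:

```shell
# Number each line with a zero-padded, tab-separated prefix
numbered() { nl -ba -nrz; }

# join pairs lines that share a number; sed strips the numbers off
join -j 1 <(printf 'a\nb\nc\n' | numbered) \
          <(printf 'Hello\nWorld\n' | numbered) | sed 's/^[^ ]* //'
```

The third line of the first input has no partner and is silently dropped, matching the behavior the question asks for.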
Here's one solution:
while IFS= read -r -u7 a && IFS= read -r -u8 b; do echo "$a $b"; done 7<"$file1" 8<"$file2"
This has the slightly annoying effect of ignoring the last line of an input file if it is not terminated with a newline (but such a file is not a valid text file).
You can wrap this in a function, of course:
paste_short() {
(
while IFS= read -r -u7 a && IFS= read -r -u8 b; do
echo "$a $b"
done
) 7<"$1" 8<"$2"
}
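Since bash process substitutions look like file names, the function can also be pointed at commands, including an infinite producer such as yes; the loop stops as soon as the shorter stream runs dry. A quick sketch, re-defining the function so the example is self-contained:

```shell
paste_short() {
    (
        # Read one line from each stream; stop when either runs out
        while IFS= read -r -u7 a && IFS= read -r -u8 b; do
            echo "$a $b"
        done
    ) 7<"$1" 8<"$2"
}

# yes is infinite, but the loop exits once the second stream ends
paste_short <(yes) <(printf 'Hello\nWorld\n')
```

When the subshell exits, its end of the pipe closes and yes is terminated by SIGPIPE on its next write.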
Consider using awk:
awk 'FNR==NR{a[++i]=$0;next} FNR>i{exit}
{print a[FNR], $0}' <(printf "hello\nworld\n") <(printf "a\nb\nc\n")
hello a
world b
Keep the longer (or infinite) output-producing program as your second input, since the first input is read entirely into memory.

BASH: "while read line ???"

I understand the format below...
while read line
do
etc...
However, I saw this yesterday and haven't been able to figure out what var would be in the following:
while read pkg var
do
etc...
Thanks
read splits the line into whitespace-separated words and assigns them to the variables in order, but any leftover words are all assigned to the last variable.
For example, I have a file like:
a b c d
When I run the command:
$ while read x y
do
echo $x
echo $y
done < file
Result:
a
b c d
"b c d" all ends up in $y.
Of course, if you only assign one var (line), then $line will get the whole line.
The read builtin will read multiple whitespace-separated (or, really, separated by whatever is in $IFS) values.
echo a b c | (read x y z; echo "$y")
#=> b
If there are more fields than variables passed to read, the last variable gets the rest of the line.
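A short sketch of both directions of the mismatch:

```shell
# Fields beyond the variable count are folded into the last variable
echo 'one two three four' | { read -r x y; echo "x=$x"; echo "y=$y"; }

# With more variables than fields, the extras are left empty
echo 'one two' | { read -r w x y z; echo "y=[$y] z=[$z]"; }
```

The first command prints x=one and y=two three four; the second leaves y and z empty.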

Compare Lines of file to every other line of same file

I am trying to write a program that will print out every line from a file with another line of that file added at the end, basically creating pairs from a portion of each line. If the line is the same, it will do nothing. Also, it must avoid repeating the same pairs. A B is the same as B A
In short
FileInput:
otherstuff A
otherstuff B
otherstuff C
otherstuff D
Output:
A B
A C
A D
B C
B D
C D
I was trying to do this with a BASH script, but was having trouble because I could not get my nested while loops to work. It would read the first line, compare it to each other line, and then stop (basically only outputting the first 3 lines in the example output above; the outer while loop only ran once).
I also suspect I might be able to do this using MATLAB, so suggestions using that are also welcome.
Here is the bash script that I have thus far. As I said, it is not printing out correctly for me, as the outer loop only runs once.
#READS IN file from terminal
FILE1=$1
#START count at 0
count0=
exec 3<&0
exec 0< $FILE1
while read LINEa; do
while read LINEb; do
eventIDa=$(echo $LINEa | cut -c20-23)
eventIDb=$(echo $LINEb | cut -c20-23)
echo $eventIDa $eventIDb
done
done
Using bash:
#!/bin/bash
[ -f "$1" ] || { echo >&2 "File not found"; exit 1; }
mapfile -t lines < <(cut -c20-23 <"$1" | sort | uniq)
for i in "${!lines[@]}"; do
elem1=${lines[$i]}
unset 'lines[$i]'
for elem2 in "${lines[@]}"; do
echo "$elem1" "$elem2"
done
done
This will read a file given as a parameter on the command line, sort and filter out duplicates, and output all combinations. You can modify the parameter to cut to adjust to your particular input file.
Due to the particular way you seem to intend to use cut, your input example above won't work. Instead, use something with the correct line length, such as:
123456789012345678 A
123456789012345678 B
123456789012345678 C
123456789012345678 D
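Run against padded input like the sample above, the approach from the answer prints every unordered pair; input.txt is an illustrative name, and bash 4+ is assumed for mapfile:

```shell
# Sample input whose key sits in columns 20-23
printf '%s\n' '123456789012345678 A' '123456789012345678 B' \
              '123456789012345678 C' '123456789012345678 D' > input.txt

# Extract the keys, deduplicate, then emit each unordered pair once
mapfile -t lines < <(cut -c20-23 <input.txt | sort -u)
for i in "${!lines[@]}"; do
    elem1=${lines[$i]}
    unset 'lines[$i]'              # never pair an element with itself
    for elem2 in "${lines[@]}"; do
        echo "$elem1 $elem2"
    done
done
```

Removing each element before the inner loop is what prevents both self-pairs and repeated B A / A B pairs.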
Assuming the otherstuff is not relevant (otherwise you can of course add it back later), this should do the trick in MATLAB:
combnk({'A' 'B' 'C' 'D'},2)

What's the difference between `echo x > y` and `echo x | tee y`?

When I want to redirect output to a file, I usually do this:
$ echo 'a' > b
$ cat b
a
However, I've seen people use tee instead of redirecting directly to a file. I'm wondering what the difference is. I mean in this pattern:
$ echo 'a' | tee c
a
$ cat c
a
It doesn't seem to be doing anything differently than a simple redirect. I know they are conceptually not the same thing, but I'm wondering why people would use one over the other.
In simple words:
echo 'a' > b writes "a" to file b:
$ echo 'a' > b
$ cat b
a
echo 'a' | tee b writes "a" to file b and also displays it (echo's output) on the terminal:
$ echo 'a' | tee b
a
$ cat b
a
Using tee lets you split the output: you can watch it on the terminal you are looking at while also passing it on for further processing. It is handy for keeping track of intermediate stages of a pipeline.
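For instance, tee can snapshot an intermediate stage of a pipeline while the data keeps flowing downstream; sorted.txt is an illustrative name:

```shell
# tee saves the sorted stream to a file while uniq keeps consuming it
printf 'b\na\nc\na\n' | sort | tee sorted.txt | uniq -c
```

Afterwards sorted.txt holds the sorted lines, while the terminal shows the counted unique lines - the stream was viewed, saved, and processed in one pass.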
