Stopping paste after any input is exhausted - bash

I have two programs that produce data on stdout, and I'd like to paste their output together. I can successfully do this like so:
paste <(./prog1) <(./prog2)
But I find that this method will print all lines from both inputs,
and what I really want is to stop paste after either input program is finished.
So if ./prog1 produces the output:
a
b
c
But ./prog2 produces:
Hello
World
I would expect the output:
a Hello
b World
Also note that one of the input programs may actually produce infinite output, and I want to be able to handle that case as well. For example, if my inputs are yes and ./prog2, I should get:
y Hello
y World

Use join instead, with a variation on the Schwartzian transform:
numbered () {
nl -s- -ba -nrz
}
join -j 1 <(prog1 | numbered) <(prog2 | numbered) | sed 's/^[^-]*-//'
Piping to nl numbers each line, and join -1 1 will join corresponding lines with the same number. The extra lines in the longer file will have no join partner and be omitted. Once the join is complete, pipe through sed to remove the line numbers.

Here's one solution:
while IFS= read -r -u7 a && IFS= read -r -u8 b; do echo "$a $b"; done 7<$file1 8<$file2
This has the slightly annoying effect of ignoring the last line of an input file if it is not terminated with a newline (but such a file is not a valid text file).
You can wrap this in a function, of course:
paste_short() {
(
while IFS= read -r -u7 a && IFS= read -r -u8 b; do
echo "$a $b"
done
) 7<"$1" 8<"$2"
}

Consider using awk:
awk 'FNR==NR{a[++i]=$0;next} FNR>i{exit}
{print a[FNR], $0}' <(printf "hello\nworld\n") <(printf "a\nb\nc\n")
hello a
world b
Keep the longer output producing program as your 2nd input.

Related

Extracting file content using a for loop [duplicate]

I'm working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here's my code so far:
cat myfile.csv|while read line
do
read -d, col1 col2 < <(echo $line)
echo "I got:$col1|$col2"
done
It's only printing the first column. As an additional test, I tried the following:
read -d, x y < <(echo a,b,)
And $y is empty. So I tried:
read x y < <(echo a b)
And $y is b. Why?
You need to use IFS instead of -d:
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done < myfile.csv
To skip a given number of header lines:
skip_headers=3
while IFS=, read -r col1 col2
do
if ((skip_headers))
then
((skip_headers--))
else
echo "I got:$col1|$col2"
fi
done < myfile.csv
Note that for general purpose CSV parsing you should use a specialized tool which can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are cvstool and csvkit.
How to parse a CSV file in Bash?
Coming late to this question and as bash do offer new features, because this question stand about bash and because none of already posted answer show this powerful and compliant way of doing precisely this.
Parsing CSV files under bash, using loadable module
Conforming to RFC 4180, a string like this sample CSV row:
12,22.45,"Hello, ""man"".","A, b.",42
should be splitted as
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
bash loadable .C compiled modules.
Under bash, you could create, edit, and use loadable c compiled modules. Once loaded, they work like any other builtin!! ( You may find more information at source tree. ;)
Current source tree (Oct 15 2021, bash V5.1-rc3) do contain a bunch of samples:
accept listen for and accept a remote network connection on a given port
asort Sort arrays in-place
basename Return non-directory portion of pathname.
cat cat(1) replacement with no options - the way cat was intended.
csv process one line of csv data and populate an indexed array.
dirname Return directory portion of pathname.
fdflags Change the flag associated with one of bash's open file descriptors.
finfo Print file info.
head Copy first part of files.
hello Obligatory "Hello World" / sample loadable.
...
tee Duplicate standard input.
template Example template for loadable builtin.
truefalse True and false builtins.
tty Return terminal name.
uname Print system information.
unlink Remove a directory entry.
whoami Print out username of current user.
There is an full working cvs parser ready to use in examples/loadables directory: csv.c!!
Under Debian GNU/Linux based system, you may have to install bash-builtins package by
apt install bash-builtins
Using loadable bash-builtins:
Then:
enable -f /usr/lib/bash/csv csv
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[#]}" | cat -n
1 12
2 22.45
3 Hello, "man".
4 A, b.
5 42
Then in a loop, processing a file.
while IFS= read -r line;do
csv -a aVar "$line"
printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done <myfile.csv
This way is clearly the quickest and strongest than using any other combination of bash builtins or fork to any binary.
Unfortunely, depending on your system implementation, if your version of bash was compiled without loadable, this may not work...
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:
12,22.45,"Hello ""man"",
This is a good day, today!","A, b.",42
should be splitted as
1 12
2 22.45
3 Hello "man",
This is a good day, today!
4 A, b.
5 42
Full sample script for parsing CSV containing multilines fields
Here is a small sample file with 1 headline, 4 columns and 3 rows. Because two fields do contain newline, the file are 6 lines length.
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
And a small script able to parse this file correctly:
#!/bin/bash
enable -f /usr/lib/bash/csv csv
file="sample.csv"
exec {FD}<"$file"
read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\\n' "${headline[#]}"
numcols=${#headline[#]}
while read -ru $FD line;do
while csv -a row "$line" ; (( ${#row[#]} < numcols )) ;do
read -ru $FD sline || break
line+=$'\n'"$sline"
done
printf "$fieldfmt\\n" "${row[#]}"
done
This may render: (I've used printf "%q" to represent non-printables characters like newlines as $'\n')
Id : "1234"
Name : "Cpt1023"
Desc : "Energy\ counter"
Value : "34213"
Id : "2343"
Name : "Sns2123"
Desc : "$'Temperatur sensor\nto trigg for alarm'"
Value : "48.4"
Id : "42"
Name : "Eye1412"
Desc : "$'Solar sensor "Day /\nNight"'"
Value : "12199.21"
You could find a full working sample there: csvsample.sh.txt or
csvsample.sh.
Note:
In this sample, I use head line to determine row width (number of columns). If you're head line could hold newlines, (or if your CSV use more than 1 head line). You will have to pass number or columns as argument to your script (and the number of head lines).
Warning:
Of course, parsing CSV using this is not perfect! This work for many simple CSV files, but care about encoding and security!! For sample, this module won't be able to handle binary fields!
Read carefully csv.c source code comments and RFC 4180!
From the man page:
-d delim
The first character of delim is used to terminate the input line,
rather than newline.
You are using -d, which will terminate the input line on the comma. It will not read the rest of the line. That's why $y is empty.
We can parse csv files with quoted strings and delimited by say | with following code
while read -r line
do
field1=$(echo "$line" | awk -F'|' '{printf "%s", $1}' | tr -d '"')
field2=$(echo "$line" | awk -F'|' '{printf "%s", $2}' | tr -d '"')
echo "$field1 $field2"
done < "$csvFile"
awk parses the string fields to variables and tr removes the quote.
Slightly slower as awk is executed for each field.
In addition to the answer from #Dennis Williamson, it may be helpful to skip the first line when it contains the header of the CSV:
{
read
while IFS=, read -r col1 col2
do
echo "I got:$col1|$col2"
done
} < myfile.csv
If you want to read CSV file with some lines, so this the solution.
while IFS=, read -ra line
do
test $i -eq 1 && ((i=i+1)) && continue
for col_val in ${line[#]}
do
echo -n "$col_val|"
done
echo
done < "$csvFile"

Parsing and storing the values of a csv file using shell script outputs :::: instead of actual characters

I am trying to read a csv file using shell script,using the following command.
cat file.csv | while read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
When i run this command the first column in the file is not being read properly.
For Ex: If 1 st column contents are
number1,
number2,
number3,
number4,
(so on)
It outputs:
::::er1,
::::er2,
::::er3,
::::er4,
some characters are replaced by ':'
this happens only for the first column contents. Where am i going wrong?
The problem is due to most likely a couple of issues:-
You are reading the file without the IFS=,
Your csv file might likely have carriage returns(\r) which could mangle how read command processes the input stream.
To remove the carriage returns(\r) use tr -d '\r' < oldFile.csv > newFile.csv and in the new file do the parsing as mentioned below.
Without setting the Internal Field Separator (IFS=","), while reading from the input stream read doesn't know where to delimit your words. Add the same in the command as below.
cat file.csv | while IFS="," read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
You can see it working as below. I have the contents of the file.csv as follows.
$ cat file.csv
abc,def,ghi,ijk,lmn,opz
1,2,3,4,5,6
$ cat file.csv | while IFS="," read -r a b c d e f; do echo "$a:$b:$c:$d:$e:$f"; done
abc:def:ghi:ijk:lmn:opz
1:2:3:4:5:6
More over using cat and looping it over it is not recommended and bash enthusiasts often call it as UUOC - Useless Use Of Cat
You can avoid this by doing
#!/bin/bash
while IFS="," read -r a b c d e f;
do
echo "$a:$b:$c:$d:$e:$f"
done < file.csv

Append index variable at the end of each line

I am trying to append an index variable at the end of each line of a file that I have. However I dont want to lose the escape characters that I have in the textFile and thus cannot echo into the file again.
Here's what I tried:
while read p; do
tempCom+=$p
tempCom+=$indexVar
echo $tempCom >> otherFile.txt
tempCom=""
done < result.txt
What I am after:
Read:
"asdasdasdasdasdasd\ asdasd/asda"
"qweqweqweqweqweqwe\ qweqwe/qweq"
Output:
"asdasdasdasdasdasd\ asdasd/asda" 1
"qweqweqweqweqweqwe\ qweqwe/qweq" 2
Note that indexVar is an index that is stored elsewhere and does not necessarily correspond to the line that its being appended to.
Your problem is very likely a quoting problem. Observe the IFS= and the -r option in the read statement too.
while IFS= read -r p
tempCom+=$p$indexVar
printf '%s\n' "$tempCom" >> otherFile.txt # Observe the quotes
tempCom=
done < result.txt
If you just want to append the line number to the end why not use awk?
awk '{print $0, "\t", NR}' < file.txt
EDIT 1: It sounds like you want to use paste then (assuming you want to just join line by line)
paste file1.txt file2.txt > fileresults.txt
EDIT 2: You can use sed then:
sed "s|$|${indexVar}|" input
Use the -r option of the read command, so that the backslashes are preserved.
while read -r p; do

Compare Lines of file to every other line of same file

I am trying to write a program that will print out every line from a file with another line of that file added at the end, basically creating pairs from a portion of each line. If the line is the same, it will do nothing. Also, it must avoid repeating the same pairs. A B is the same as B A
In short
FileInput:
otherstuff A
otherstuff B
otherstuff C
otherstuff D
Output:
A B
A C
A D
B C
B D
C D
I was trying to do this with a BASH script, but was having trouble because I could not get my nested while loops to work. It would read the first line, compare it to each other line, and then stop (Basically only outputting the first 3 lines in the example output above, the outer while loop only ran once).
I also suspect I might be able to do this using MATLAB, so suggestions using that are also welcome.
Here is the bash script that I have thus far. As I said, it is no printing out correctly for me, as the outer loop only runs once.
#READS IN file from terminal
FILE1=$1
#START count at 0
count0=
exec 3<&0
exec 0< $FILE1
while read LINEa; do
while read LINEb; do
eventIDa=$(echo $LINEa | cut -c20-23)
eventIDb=$(echo $LINEb | cut -c20-23)
echo $eventIDa $eventIDb
done
done
Using bash:
#!/bin/bash
[ -f "$1" ] || { echo >&2 "File not found"; exit 1; }
mapfile -t lines < <(cut -c20-23 <"$1" | sort | uniq)
for i in ${!lines[#]}; do
elem1=${lines[$i]}
unset lines[$i]
for elem2 in "${lines[#]}"; do
echo "$elem1" "$elem2"
done
done
This will read a file given as a parameter on the command line, sort and filter out duplicates, and output all combinations. You can modify the parameter to cut to adjust to your particular input file.
Due to the particular way you seem to indent to use cut, your input example above won't work. Instead, use something with the correct line length, such as:
123456789012345678 A
123456789012345678 B
123456789012345678 C
123456789012345678 D
Assuming the otherstuff is not relevant (otherwise you can of course add it later) this should do the trick in Matlab:
combnk({'A' 'B' 'C' 'D'},2)

Appending Text From A File To Text In Another File

I have two separate text files, one with 4 letter words and one with 4 digit numbers, all on individual lines. The words in the on file correspond to the numbers on the same line in the other file. For example:
CATS
RATS
HATS
matches up with
2287
7287
4287
What I would like is to append the numbers to the end of their matching word, so it looks like this:
CATS2287
RATS7287
HATS4287
so far what I have is this:
for i in $(cat numbers); do
sed 's/$/'$i'/' words;
done
but the problem is a) that doesn't print/echo out to a new file and b) it loops through each word every time the first loop comes to a new number so in the end, all the words are paired up with the last number in the number file. Thanks in advance for the help.
paste -d "" /path/to/letters /path/to/numbers
Proof of Concept
$ paste -d "" CATS NUMS
CATS2287
RATS7287
HATS4287
You can use the excellent little paste(1) utility:
$ cat a
CATS
RATS
HATS
$ cat b
2287
7287
4287
$ paste -d "" a b
CATS2287
RATS7287
HATS4287
$
-d specifies a list of delimiters; I gave it a blank list: no delimiters, no delimiters.
Hmm, my version of paste with -d"" just results in numbers, the words get overwritten (GNU paste 8.10 on cygwin). my input files have no carriage returns.
paste words numbers | tr -d '\t'
Also, just with shell builtins
exec 3<words
exec 4<numbers
while read -u3 word; do
read -u4 num
echo $word$num
done
exec 3<&-
exec 4<&-
On Mac OS X:
paste -d "\0" <(echo abc) <(echo def)
there are a few ways to do that
Paste:
paste -d "" file1 file2
awk
awk '{ getline f<"file2" ; print $0f}' file1
Bash:
exec 6<"file2"
while read -r line
do
read -r S <&6
printf "${line}${S}\n"
done <"file1"
exec >&6-
Ruby(1.9+)
ruby -ne 'BEGIN{f=File.open("file1")};print $_.chomp+f.readline;END{f.close}' file

Resources