How to evaluate a stream line by line - bash

I am trying to avoid creating any new files to store output, in order to minimize the risk of overwriting something in the directory that has the same name. Instead of writing the output to a file and then using a while read line; do ... done < file loop, I want to evaluate each line of the stream directly from a pipe. Something like:
echo -e "1\n2\n3\n4\n5" | #evaluate current line separately#
Could I somehow read each line into an array and then evaluate the elements in the array? Or is there a better way to avoid accidentally overwriting files?

In bash, the common way is to use process substitution:
while read line ; do
...
done < <(commands producing the input)
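For example, with the input from the question: the loop runs in the current shell, so variables set inside it survive the loop.
total=0
while read -r line; do
    total=$((total + line))
done < <(echo -e "1\n2\n3\n4\n5")
echo "$total"   # prints 15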

You were halfway there...
echo -e "1\n2\n3\n4\n5" | while read line; do
...
done
Note that bash runs each part of the pipeline in a separate process, and any variables defined there will not persist after that block. (ksh93 will preserve them, as the loop will run in the current shell process.)
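A quick sketch of the difference:
count=0
echo -e "1\n2\n3\n4\n5" | while read -r line; do
    count=$((count + 1))
done
echo "$count"   # prints 0 in bash: the loop ran in a subshell
In bash 4.2 or later you can also add shopt -s lastpipe near the top of a script to run the last stage of a pipeline in the current shell; it only takes effect when job control is off, which is the default in non-interactive scripts.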

You can avoid overwriting files by using mktemp or tempfile to create temporary files with unique names. However, I would use process substitution as in choroba's answer.
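If you do end up needing a file, a minimal sketch of the mktemp route:
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT    # clean up the temporary file on exit
echo -e "1\n2\n3\n4\n5" > "$tmpfile"
while read -r line; do
    echo "$line"                # process "$line" here
done < "$tmpfile"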

Related

How can I loop edit multiple files in a bash script?

I have 40 csv files that I need to edit. 20 have matching format and the names only differ by one character, e.g., docA.csv, docB.csv, etc. The other 20 also match and are named pair_docA.csv, pair_docB.csv, etc.
I have the code written to edit and combine docA.csv and pair_docA.csv, but I'm struggling to write a loop that calls both the above files, edits them, combines them under the name combinedA.csv, and then goes on to the next pair.
Can anyone help my rudimentary bash scripting? Here's what I have thus far. I've tried in a single for loop, and now I'm trying in 2 (probably 3) for loops. I'd prefer to keep it in a single loop.
set -x
DIR=/path/to/file/location
for file in `ls $DIR/doc?.csv`
do
#code to edit the doc*.csv files ie $file
done
for pairdoc in `ls $DIR/pair_doc?.csv`
do
#code to edit the pair_doc*.csv files ie $pairdoc
done
#still need to combine the files. I have the join written for a single iteration,
#but how do I loop the code to save each join as a different file corresponding
#to combined*.csv
Something along these lines:
#!/bin/bash
dir=/path/to/file/location
cd "$dir" || exit
for file in doc?.csv; do
pair=pair_$file
# "${file#doc}" deletes the prefix "doc"
combined=combined_${file#doc}
cat "$file" "$pair" >> "$combined"
done
ls, on principle, shouldn't be used in a shell script to iterate over files. It is intended for interactive use and is almost never needed within a script. Also, all-capitalized variable names shouldn't be used as ordinary variables, since they may collide with internal shell variables or environment variables.
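A toy demonstration of why, assuming a filename that contains a space:
touch "doc A.csv"
for file in `ls doc*.csv`; do echo "[$file]"; done   # word splitting: [doc] [A.csv]
for file in doc*.csv; do echo "[$file]"; done        # glob: [doc A.csv]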
Below is a version without changing the directory.
#!/bin/bash
dir=/path/to/file/location
for file in "$dir/"doc?.csv; do
basename=${file#"$dir/"}
pair=$dir/pair_$basename
combined=$dir/combined_${basename#doc}
cat "$file" "$pair" >> "$combined"
done
This might work for you (GNU parallel):
parallel cat {1} {2} \> join_{1}_{2} ::: doc{A..T}.csv :::+ pair_doc{A..T}.csv
Change the cat command to your chosen commands, where {1} represents the docX.csv files and {2} represents the pair_docX.csv files.
N.B. X represents the letters A through T.
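If GNU parallel isn't available, a plain bash sketch of the same join, using the combinedX.csv names from the question:
for x in {A..T}; do
    cat "doc$x.csv" "pair_doc$x.csv" > "combined$x.csv"
done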

How to iterate each line of text file and assign it as variable value in each iteration?

I have a folder which contains hundreds of files. I also have another file that contains the names of all the files in the directory, as follows:
>myfile.txt
11j
33t
dsvd33
343
im#3
I would like to write a bash script that goes through each line of myfile.txt, selects the file name (file ID) in each iteration, and passes it to my CrunchMe.
More specifically:
#!/bin/bash
for ID in myfile.txt:
# do this
CrunchMe ID
end
Can anyone help me with it?
@saterHater is almost right. In his solution you have to set the internal field separator (IFS) if your filenames contain spaces. Instead of his solution, I personally would put the filenames into a for loop. Note that the for loop has some memory overhead, since the whole file is expanded up front, but reading about CrunchMe I don't think that really matters.
#! /bin/bash
IFS=$'\n'
for ID in $(cat myfile.txt); do
CrunchMe "$ID"
done
Otherwise, the approach @chepner linked to could be a solution. The following code uses less memory and more CPU, because it reads the file line by line.
#! /bin/bash
while IFS='' read -r ID; do
CrunchMe "$ID"
done < myfile.txt
Both scripts have the same output. In my test case that's:
1.txt
a 1.txt
lala .txt
I think you want to take a look at child processes when you're changing a lot of files. In that case you should consider the disk overhead, but that's off topic for this thread ;-)
I'm a newbie answer-er, so my apologies if this isn't helpful.
The two things you need to know are command substitution, $(...), which runs in a subshell, and the variable prefix $:
cat myfile.txt | while read -r ID; do
    CrunchMe "$ID"
done
I'm assuming you'll run this in the local directory where both myfile.txt and all the other files are.
Otherwise you can run a find subshell to locate the correct absolute path name of the $ID file.
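For example (a sketch: /path/to/files is a placeholder, and -print -quit is GNU find):
while read -r ID; do
    CrunchMe "$(find /path/to/files -name "$ID" -print -quit)"
done < myfile.txt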

File input getting consumed inside the while loop

I'm reading through a lookup file and performing a set of actions for each line in the file. However, the while loop only reads the first line of the file and exits. Here's the current code that I have.
sql_from_lkp $lookup
function sql_from_lkp {
lkp=$1
while read line; do
sql_from_columns ${line}
echo ${line}
done < ${lkp}
}
function sql_from_columns {
table_name=$1
table_column_info_file=${table_name}_columns
line_count=`cat $table_info_file | wc -l`
....
}
By selectively commenting the code, I found that if I comment the line_count line, the while loop goes through every line in the file and works fine. So the input is getting consumed by the cat statement.
I've checked other answers and understood that ssh usually consumes the file input inside while loops if the -n option is not used. But I'm not sure how to fix this case. Need some help.
You've mistyped a variable name: $table_info_file should be $table_column_info_file.
If you correct that, your problem will go away.
By referring to a nonexistent variable - the mistyped $table_info_file - you're essentially executing cat | wc -l (no filename argument passed to cat) in sql_from_columns(), which makes cat read from stdin.
Therefore - after having read the 1st line in the while loop - the cat command in sql_from_columns() consumes the entire rest of your input (< ${lkp}), which is why the while loop exits after the 1st iteration.
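You can reproduce the effect in isolation; in this toy demonstration, lines.txt is any file with several lines:
while read -r line; do
    echo "got: $line"
    cat | wc -l    # no filename argument: cat drains the rest of stdin
done < lines.txt   # prints "got:" for the first line only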
Generally,
You should double-quote all your variable references so as not to subject their values to word-splitting and globbing.
Bash won't allow you to call functions before they're defined, so as presented in your question, your code fundamentally couldn't work.
While the legacy `...` syntax for command substitutions is still supported, it has pitfalls that can be avoided with the modern $(...) syntax.
A more efficient way to count lines is to pass the input file to wc -l via < rather than via cat and a pipeline (wc also accepts filename operands directly, but it then prints the input filename after the counts).
Incidentally, you probably would have caught your mistyped variable reference more easily had you done that, as Bash would have reported an ambiguous redirect error in the absence of a filename following <.
Here's a reformulation that addresses all the issues:
function sql_from_lkp {
lkp=$1
while read line; do
sql_from_columns "${line}"
echo "${line}"
done < "${lkp}"
}
function sql_from_columns {
table_name=$1
table_column_info_file=${table_name}_columns
line_count=$(wc -l < "$table_column_info_file")
# ...
}
sql_from_lkp "$lookup"
Note that I've only added double quotes where strictly needed to make the command robust; it wouldn't hurt to add them whenever a parameter (variable) is referenced.

Output filename from input in bash

I have this script:
#!/bin/bash
FASTQFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fastq
FASTAFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fasta
clear
for file in $FASTQFILES
do cat $FASTQFILES | perl -e '$i=0;while(<>){if(/^\#/&&$i==0){s/^\#/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' > ~/Programs/ncbi-blast-2.2.29+/DB_files/"${FASTQFILES%.*}.fasta"
mv $FASTAFILES ~/Programs/ncbi-blast-2.2.29+/db/
done
I'm trying to get it to grab the files defined in $FASTQFILES, do the .fastq-to-.fasta conversion, name the output with the same filename as the input, and move it to a new folder. E.g., ~/./DB_files/HELLO.fastq should give a converted ~/./db/HELLO.fasta
The problem is that the output of the conversion is a properly formatted hidden file called .fasta in the first folder instead of the expected one named HELLO.fasta. So there is nothing to mv. I think I'm messing up in the ${FASTQFILES%.*}.fasta argument but I can't seem to fix it.
I see three problems:
One part of your trouble is that you use cat $FASTQFILES instead of cat $file.
You also need to fix the I/O redirection at the end of that line to > ~/Programs/ncbi-blast-2.2.29+/DB_files/"${file%.fastq}.fasta".
The mv command needs to be executed outside the loop.
In fact, when processing a single file at a time, you don't need to use cat at all (UUOC — Useless Use Of Cat). Simply provide "$file" as an argument to the Perl script.
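Putting those fixes together, a sketch of the corrected script (paths and the Perl one-liner are taken from the question):
#!/bin/bash
dbdir=~/Programs/ncbi-blast-2.2.29+/DB_files
clear
for file in "$dbdir"/*.fastq
do
    # write HELLO.fasta next to HELLO.fastq
    perl -e '$i=0;while(<>){if(/^\#/&&$i==0){s/^\#/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' \
        "$file" > "${file%.fastq}.fasta"
done
# move all converted files once, after the loop
mv "$dbdir"/*.fasta ~/Programs/ncbi-blast-2.2.29+/db/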

filename with space

I have written a shell script that recursively picks up all the files inside all the directories and prepares a report with each file's last-modified time and size.
The problem I am facing is that a few files have names like "User Interface" (with a space in between). How do I handle these files in the for loop of the shell script and fetch the files and directories inside them?
Thanks in advance
Just put the filename variable between double quotes: "$FILENAME"
You're probably trying to use something like for file in $(command). Instead, use a while read loop or a for loop with globbing. Make sure you quote variables that contain filenames.
#!/bin/sh
command | while read -r file
do
something_with "$file"
done
or, in shells that support process substitution:
#!/bin/bash
while read -r file
do
something_with "$file"
done < <(command)
If you're simply iterating over a list of files:
for file in "$dir"/*
do
something_with "$file"
done
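And since the original script walks directories recursively: if filenames may contain newlines as well, NUL-delimited find output is the robust variant (bash's read -d '' paired with find -print0):
while IFS= read -r -d '' file
do
    something_with "$file"
done < <(find . -type f -print0)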