Combine while read with grep and cut in bash

I want to modify my existing bash script. This is how it looks now:
#! /bin/bash
SAMPLE=myfile.txt
while read SAMPLE
do
name=$SAMPLE
# some other code
done < $SAMPLE
In this case 'myfile.txt' consists of only one column, with all the info I need.
Now I want to modify this script because 'myfile.txt' now contains more columns and more lines than I need.
grep 'TEST' myfile.txt | cut -d "," -f 1
gives me the values I need. But how can I integrate this into my bash script?

You can pipe the output of any command into a while read loop.
Try this:
#! /bin/bash
INPUT=myfile.txt
grep 'TEST' $INPUT |
cut -d "," -f 1 |
while read SAMPLE
do
name=$SAMPLE
# some other code
done
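One caveat worth knowing (a small illustration, not part of the original answer): because the while loop sits at the end of a pipeline, it runs in a subshell, so variables set inside it are lost when the loop ends. The process-substitution form in the next answer avoids this:
count=0
printf 'a\nb\n' | while read -r line; do count=$((count + 1)); done
echo "$count"   # prints 0: the loop ran in a subshell
while read -r line; do count=$((count + 1)); done < <(printf 'a\nb\n')
echo "$count"   # prints 2: this loop ran in the current shell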

You have to change the input field separator (IFS), which tells read where to split the input line. Then you tell read to read two fields: the one you need and one you do not care about.
#! /bin/bash
SAMPLE=myfile.txt
while IFS=, read SAMPLE dontcare
do
name="$SAMPLE"
# some other code
done < <(grep TEST "$SAMPLE")
By the way: whenever you use read, you should consider using the -r option.
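For example, a minimal illustration of what -r changes (not part of the original answer):
printf '%s\n' 'C:\temp\file' | { read line; echo "$line"; }      # prints C:tempfile  (backslashes are interpreted)
printf '%s\n' 'C:\temp\file' | { read -r line; echo "$line"; }   # prints C:\temp\file (backslashes kept literally)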

Related

How to copy specific columns from one csv file to another csv file?

File1.csv:
File2.csv:
I want to replace the contents of configSku,selectedSku,config_id in File1.csv with the contents of configSku,selectedSku,config_id from File2.csv. The end result should look like this:
Here are the links to download the files so you can try it yourself:
File1.csv: https://www.dropbox.com/s/2o12qjzqlcgotxr/file1.csv?dl=0
File2.csv: https://www.dropbox.com/s/331lpqlvaaoljil/file2.csv?dl=0
Here's what I have tried, but it still fails:
#!/bin/bash
INPUT=/tmp/file2.csv
OLDIFS=$IFS
IFS=,
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
echo "no,my_account,form_token,fingerprint,configSku,selectedSku,config_id,address1,item_title" > /tmp/temp.csv
while read item_title configSku selectedSku config_id
do
cat /tmp/file1.csv |
awk -F ',' -v item_title="$item_title" \
-v configSku="$configSku" \
-v selectedSku="$selectedSku" \
-v config_id="$config_id" \
-v OFS=',' 'NR>1{$5=configSku; $6=selectedSku; $7=config_id; $9=item_title; print}' >> /tmp/temp.csv
done < <(tail -n +2 "$INPUT")
IFS=$OLDIFS
How do I do this?
If I understood the question correctly, how about using:
paste -d, file1.csv file2.csv | awk -F, -v OFS=',' '{print $1,$2,$3,$4,$11,$12,$13,$8,$10}'
This is not nearly as robust as the other answer, and assumes that file1.csv and file2.csv have the same number of lines and that each line in one file corresponds to the same line in the other file. The output would look like this:
no,my_account,form_token,fingerprint,configSku,selectedSku,config_id,address1,item_title
1,account1,asdf234safd,sd4d5s6sa,NEWconfigSku1,NEWselectedSku1,NEWconfig_id1,myaddr1,Samsung Handsfree
2,account2,asdf234safd,sd4d5s6sa,NEWconfigSku2,NEWselectedSku2,NEWconfig_id2,myaddr2,Xiaomi Mi headset
3,account3,asdf234safd,sd4d5s6sa,NEWconfigSku3,NEWselectedSku3,NEWconfig_id3,myaddr3,Ear Headphones with Mic
4,account4,asdf234safd,sd4d5s6sa,NEWconfigSku4,NEWselectedSku4,NEWconfig_id4,myaddr4,Handsfree/Headset
The first part uses paste to put the files side by side, separated by commas, hence the -d option. You then end up with a combined file with 13 columns. The awk part first sets the input and output field separators to a comma (-F, and -v OFS=',', respectively) and then prints the desired columns (columns 1-4 from the first file, then columns 2-4 of the second file, which now correspond to columns 11-13 in the merged file).
The main issue in your original script is that you are reading one file (/tmp/file2.csv) one line at a time, and for each line you parse and print the whole other file (/tmp/file1.csv).
Here is an example of how to merge two CSV files in bash:
#!/bin/bash
# Open both files in "reading mode"
exec 3<"$1"
exec 4<"$2"
# Read(/discard) the header line in both csv files
read -r -u 3
read -r -u 4
# Print the new header line
printf "your,own,header,line\n"
# Read both files one line at a time and print the merged result
while true; do
IFS="," read -r -u 3 your own || break
IFS="," read -r -u 4 header line
printf "%s,%s,%s,%s\n" "$your" "$own" "$header" "$line"
done
exec 3<&-
exec 4<&-
Assuming you saved the script above in "merge_csv.sh", you can use it like this:
$ bash merge_csv.sh /tmp/file1.csv /tmp/file2.csv > /tmp/temp.csv
Be sure to modify the script to suit your needs (I did not use the headers you provided in your question).
If you are not familiar with the exec command, the tldp documentation and the bash hackers wiki both have an entry about it. The man page for read should document the -u option well enough. Finally, the VAR="something" command arg1 arg2 construct (used in the script as IFS="," read -r -u 3) is a common shell-scripting idiom. If you are not familiar with it, I believe this answer should provide enough information on what it does.
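If you want to see that scoping behaviour for yourself, here is a minimal demonstration (my own example, not from the answer):
greeting=hello
greeting=goodbye sh -c 'echo "$greeting"'   # prints "goodbye": the assignment is visible to that one command only
echo "$greeting"                            # prints "hello": the surrounding shell is unchanged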
Note: if you want to do more complex processing of CSV files, I recommend using Python and its csv module.

CSV file parsing in Bash

I have a CSV file with sample entries given below. What I want is to write a Bash script to read the CSV file line by line and put the first entry, e.g. 005, in one variable and the IP, e.g. 192.168.10.1, in another variable, which I then need to pass to some other script.
005,192.168.10.1
006,192.168.10.109
007,192.168.10.12
008,192.168.10.121
009,192.168.10.123
A more efficient approach, without the need to fork cut each time:
#!/usr/bin/env bash
while IFS=, read -r field1 field2; do
# do something with $field1 and $field2
done < file.csv
The gains can be quite substantial for large files.
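If you want to measure the difference on your own data, a rough comparison could look like this (a sketch reusing the same file.csv; the second loop forks cut once per line):
time while IFS=, read -r field1 field2; do :; done < file.csv
time while read -r line; do cut -d, -f1 <<< "$line" > /dev/null; done < file.csv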
Here's how I would do it with GNU tools:
while read line; do
echo $line | cut -d, -f1-2 --output-delimiter=' ' | xargs your_command
done < your_input.csv
while read line; do [...]; done < your_input.csv will read your file line by line.
For each line, we cut it down to its first two fields (separated by commas since it's a CSV) and pass them, separated by spaces, to xargs, which in turn passes them as parameters to your_command.
If this is a very simple CSV file with no quoted fields etc., you can simply use read and cut:
#!/bin/bash
while read line
do
id_field=$(cut -d',' -f 1 <<<"$line") #here 005 for the first line
ip_field=$(cut -d',' -f 2 <<<"$line") #here 192.168.10.1 for the first line
#do something with $id_field and $ip_field
done < file.csv
The program works as follows: we use cut -d',' to obtain the first and second field of each line. We wrap this in a while read loop and use I/O redirection to feed the file to the loop.
Of course you substitute file.csv with the name of the file you want to process, and you can use other variable names than the ones in this sample.

Bash variables not acting as expected

I have a bash script which parses a file line by line, extracts the date using a cut command and then makes a folder using that date. However, it seems like my variables are not being populated properly. Do I have a syntax issue? Any help or direction to external resources is very appreciated.
#!/bin/bash
ls | grep .mp3 | cut -d '.' -f 1 > filestobemoved
cat filestobemoved | while read line
do
varYear= $line | cut -d '_' -f 3
varMonth= $line | cut -d '_' -f 4
varDay= $line | cut -d '_' -f 5
echo $varMonth
mkdir $varMonth'_'$varDay'_'$varYear
cp ./$line'.mp3' ./$varMonth'_'$varDay'_'$varYear/$line'.mp3'
done
You have many errors and non-recommended practices in your code. Try the following:
for f in *.mp3; do
f=${f%%.*}
IFS=_ read _ _ varYear varMonth varDay <<< "$f"
echo $varMonth
mkdir -p "${varMonth}_${varDay}_${varYear}"
cp "$f.mp3" "${varMonth}_${varDay}_${varYear}/$f.mp3"
done
The actual error is that you need to use command substitution. For example, instead of
varYear= $line | cut -d '_' -f 3
you need to use
varYear=$(cut -d '_' -f 3 <<< "$line")
A secondary error there is that $foo | some_command on its own line does not mean that the contents of $foo get piped to the next command as input; rather, $foo is executed as a command, and the output of that command is piped to the next one.
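A quick way to see that behaviour (a hypothetical file name, not from the question's data):
line="recording_01_2014_06_17"
$line | cut -d '_' -f 3        # tries to execute a command named "recording_01_2014_06_17"
cut -d '_' -f 3 <<< "$line"    # what was intended: prints 2014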
Some best practices and tips to take into account:
Use a portable shebang line - #!/usr/bin/env bash (disclaimer: That's my answer).
Don't parse ls output.
Avoid useless uses of cat.
Use More Quotes™
Don't use files for temporary storage if you can use pipes. It is literally orders of magnitude faster, and generally makes for simpler code if you want to do it properly.
If you have to use files for temporary storage, put them in a directory created by mktemp -d, and preferably add a trap to remove the temporary directory cleanly (see the sketch after this list).
There's no need for a var prefix in variables.
grep searches for basic regular expressions by default, so .mp3 matches any single character followed by the literal string mp3. If you want to search for a dot, you need to either use grep -F to search for literal strings or escape the regular expression as \.mp3.
You generally want to use read -r (defined by POSIX) to treat backslashes in the input literally.
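A minimal sketch of that temporary-directory pattern (my own illustration, not code from the question or answer):
#!/usr/bin/env bash
tmpdir=$(mktemp -d) || exit 1      # private scratch directory
trap 'rm -rf "$tmpdir"' EXIT       # removed automatically when the script exits
printf 'working in %s\n' "$tmpdir"
# ... write intermediate files under "$tmpdir" ...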

How to process lines read from standard input in a UNIX shell script?

I get stuck by this problem:
I wrote a shell script that gets a large file with many lines from stdin; this is how it is executed:
./script < filename
I want to use the file as input to another operation in the script, but I don't know how to store the file's name in a variable.
It is a script that takes a file from stdin and then does an awk operation on that file itself. Say I write in the script:
script:
#!/bin/sh
...
read file
...
awk '...' < "$file"
...
it only reads the first line of the input file.
And I found a way to write it like this:
Min=-1
while read line; do
n=$(echo $line | awk -F$delim '{print NF}')
if [ $Min -eq -1 ] || [ $n -lt $Min ];then
Min=$n
fi
done
It takes a very long time to process; it seems awk takes most of the time.
So how can I improve this?
/dev/stdin can be quite useful here.
In fact, it's just a chain of symlinks to your input.
So writing cat /dev/stdin will give you all the input from your file, and you can avoid using the input filename at all.
Now to answer the question: recursively resolve the links, beginning at /dev/stdin, and you will get the filename. Bash code:
r(){
l=`readlink $1`
if [ $? -ne 0 ]
then
echo $1
else
r $l
fi
}
filename=`r /dev/stdin`
echo $filename
UPD:
On Ubuntu I found the -f option to readlink, i.e. readlink -f /dev/stdin gives the same output. This option may be absent on some systems.
UPD2: tests (test.sh is the code above):
$ ./test.sh <input # that is a file
/home/sfedorov/input
$ ./test.sh <<EOF
> line
> EOF
/tmp/sh-thd-214216298213
$ echo 1 | ./test.sh
pipe:[91219]
$ readlink -f /dev/stdin < input
/home/sfedorov/input
$ readlink -f /dev/stdin << EOF
> line
> EOF
/tmp/sh-thd-3423766239895 (deleted)
$ echo 1 | readlink -f /dev/stdin
/proc/18489/fd/pipe:[92382]
You're overdoing this. The way you invoke your script:
the file contents are the script's standard input
the script receives no argument
But awk already takes input from stdin by default, so all you need to do to make this work is:
not give awk any file name argument; it will then read the wrapping script's stdin automatically
not consume any of that input before the wrapping script reaches the awk part. Specifically: no read
If that's all there is to your script, it reduces to the awk invocation, so you might consider doing away with it altogether and just call awk directly. Or make your script directly an awk one instead of a sh one.
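A sketch of that reduction (keeping the question's placeholder awk program):
#!/bin/sh
# No "read file" and no redirection: awk consumes the script's own standard input.
awk '...'
It is still invoked exactly as before: ./script < filename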
Aside: the reason your while read line/multiple-awk variant (the one in the question) is slow is that it spawns an awk process for each and every line of the input, and process spawning is orders of magnitude slower than awk processing a single line. The reason the generate-tmpfile/single-awk variant (the one in your answer) is still a bit slow is that it generates the tmpfile line by line, reopening it to append every time.
Modify your script so that it takes the input file name as an argument, then read from the file in your script:
$ ./script filename
In script:
filename=$1
awk '...' < "$filename"
If your script just reads from standard input, there is no guarantee that there is a named file providing the input; it could just as easily be reading from a pipe or a network socket.
How about invoking the script differently: pipe the contents of your file into your script as follows (the standard output of cat filename then becomes the standard input to your script, actually in this case to the awk command).
For example, I have the file Names.data and the script showNames.sh, executed as follows:
cat Names.data | ./showNames.sh
Contents of filename Names.data
Huckleberry Finn
Jack Spratt
Humpty Dumpty
Contents of script showNames.sh
#!/bin/bash
#whatever awk commands you need
awk "{ print }"
Well, I finally found this way to solve my problem, although it takes several seconds:
grep '.*' >> /tmp/tmpfile
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }' < /tmp/tmpfile)
Just append each line to a temporary file, so that after reading from stdin the tmpfile is the same as the input file.
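For comparison, the temporary file can be skipped entirely by letting a single awk read stdin directly (a sketch based on the same awk program; $delim is assumed to be set as in the question):
Min=$(awk -F"$delim" 'NF < min || min == "" { min = NF } END { print min }')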

Counting commas in a line in bash

Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string='foo,bar,baz'   # a line of input from the CSV file
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
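For example (an illustrative record, not taken from the question):
echo '"Smith, John",42' | awk -F "," '{print NF-1}'   # prints 2, although the record has only two CSV fields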
What worked for me better than the other solutions was this. If test.txt has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces
2
3
This works very well for streaming sources, or tailing logs, etc.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[#]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
Try Perl:
$ perl -ne 'print 0+@{[/,/g]},"\n"'
a
0
a,a
1
a,a,a,a,a
4
Depending on what you are trying to do with the CSV data, it may be helpful to use a wrapper script like csvquote to temporarily replace the problematic newlines (and commas) inside quoted fields, then restore them. For instance:
csvquote inputfile.csv | wc -l
and
csvquote inputfile.csv | cut -d, -f1 | csvquote -u
may be the sort of thing you're looking for. See https://github.com/dbro/csvquote for the code and more information.
An example Python command you could run (Python is installed on most modern systems) is:
python -c "import pathlib; print({l.count(',') for l in pathlib.Path('my_file.csv').read_text().splitlines()})"
This counts the number of commas per line, then makes a set from the counts (so if your lines all have the same number of commas, you'll get a set containing just that number).
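A roughly equivalent shell pipeline (a sketch using the same hypothetical my_file.csv) would be:
awk -F, '{ print NF - 1 }' my_file.csv | sort -nu   # distinct comma counts, one per line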
Just remove all of the carriage returns:
tr -d "\r" old_file > new_file
