How to copy specific columns from one csv file to another csv file? - bash

File1.csv:
File2.csv:
I want to replace the contents of configSku,selectedSku,config_id in File1.csv with the contents of configSku,selectedSku,config_id from File2.csv. The end result should look like this:
Here are the links to download the files so you can try it yourself:
File1.csv: https://www.dropbox.com/s/2o12qjzqlcgotxr/file1.csv?dl=0
File2.csv: https://www.dropbox.com/s/331lpqlvaaoljil/file2.csv?dl=0
Here's what I have tried, but it still fails:
#!/bin/bash
INPUT=/tmp/file2.csv
OLDIFS=$IFS
IFS=,
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
echo "no,my_account,form_token,fingerprint,configSku,selectedSku,config_id,address1,item_title" > /tmp/temp.csv
while read item_title configSku selectedSku config_id
do
    cat /tmp/file1.csv |
    awk -F ',' -v item_title="$item_title" \
        -v configSku="$configSku" \
        -v selectedSku="$selectedSku" \
        -v config_id="$config_id" \
        -v OFS=',' 'NR>1{$5=configSku; $6=selectedSku; $7=config_id; $9=item_title; print}' >> /tmp/temp.csv
done < <(tail -n +2 "$INPUT")
IFS=$OLDIFS
How do I do this?

If I understood the question correctly, how about using:
paste -d, file1.csv file2.csv | awk -F, -v OFS=',' '{print $1,$2,$3,$4,$11,$12,$13,$8,$10}'
This is not nearly as robust as the other answer, and it assumes that file1.csv and file2.csv have the same number of lines and that each line in one file corresponds to the same line in the other file. The output would look like this:
no,my_account,form_token,fingerprint,configSku,selectedSku,config_id,address1,item_title
1,account1,asdf234safd,sd4d5s6sa,NEWconfigSku1,NEWselectedSku1,NEWconfig_id1,myaddr1,Samsung Handsfree
2,account2,asdf234safd,sd4d5s6sa,NEWconfigSku2,NEWselectedSku2,NEWconfig_id2,myaddr2,Xiaomi Mi headset
3,account3,asdf234safd,sd4d5s6sa,NEWconfigSku3,NEWselectedSku3,NEWconfig_id3,myaddr3,Ear Headphones with Mic
4,account4,asdf234safd,sd4d5s6sa,NEWconfigSku4,NEWselectedSku4,NEWconfig_id4,myaddr4,Handsfree/Headset
The first part uses paste to put the files side by side, separated by commas (hence the -d option). You then end up with a combined file of 13 columns. The awk part first sets the input and output field separators to a comma (-F, and -v OFS=',', respectively) and then prints the desired columns: columns 1-4 and 8 from the first file, plus the second file's columns (which, after the merge, are columns 10-13), in the order the output header expects.
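To see what the intermediate merged line looks like, here is a tiny sketch with made-up single-letter fields standing in for the real columns:
printf 'a,b,c,d,e,f,g,h,i\n' > /tmp/f1.csv
printf 'j,k,l,m\n' > /tmp/f2.csv
paste -d, /tmp/f1.csv /tmp/f2.csv
# -> a,b,c,d,e,f,g,h,i,j,k,l,m   (13 fields; $10-$13 are file2's columns)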

The main issue in your original script is that you read one file (/tmp/file2.csv) one line at a time and, for each of those lines, you parse and print the whole of the other file (/tmp/file1.csv).
Here is an example of how to merge two CSV files in bash:
#!/bin/bash
# Open both files in "reading mode"
exec 3<"$1"
exec 4<"$2"
# Read(/discard) the header line in both csv files
read -r -u 3
read -r -u 4
# Print the new header line
printf "your,own,header,line\n"
# Read both files one line at a time and print the merged result
while true; do
IFS="," read -r -u 3 your own || break
IFS="," read -r -u 4 header line
printf "%s,%s,%s,%s\n" "$your" "$own" "$header" "$line"
done
exec 3<&-
exec 4<&-
Assuming you saved the script above in "merge_csv.sh", you can use it like this:
$ bash merge_csv.sh /tmp/file1.csv /tmp/file2.csv > /tmp/temp.csv
Be sure to modify the script to suit your needs (I did not use the headers you provided in your question).
If you are not familiar with the exec command, the tldp documentation and the bash hackers wiki both have an entry about it. The man page for read should document the -u option well enough. Finally, the VAR="something" command arg1 arg2 form (used in the script as IFS="," read -r -u 3 ...) is a common construct in shell scripting. If you are not familiar with it, I believe this answer should provide enough information on what it does.
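For example, a quick sketch of that one-shot assignment form with made-up data:
line='a,b,c'
IFS=',' read -r x y z <<< "$line"   # IFS is changed only for this read command
echo "$x $y $z"                     # -> a b c
printf '%q\n' "$IFS"                # -> $' \t\n'  (the script-wide IFS is untouched)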
Note: if you want to do more complex processing of csv files I recommend using python and its csv package.

Related

bash - combine while read with grep and cut

I want to modify my existing bash script. This is how it looks now:
#! /bin/bash
SAMPLE=myfile.txt
while read SAMPLE
do
    name=$SAMPLE
    # some other code
done < $SAMPLE
In this case 'myfile.txt' consists of only one column, with all the info I need.
Now I want to modify this script because 'myfile.txt' now contains more columns and more lines than I need.
grep 'TEST' myfile.txt | cut -d "," -f 1
gives me the values I need. But how can I integrate this into my bash script?
You can pipe the output of any command into a while read loop.
Try this:
#! /bin/bash
INPUT=myfile.txt
grep 'TEST' $INPUT |
cut -d "," -f 1 |
while read SAMPLE
do
    name=$SAMPLE
    # some other code
done
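One caveat with this pipe-into-while pattern: the loop runs in a subshell, so a variable set inside it (such as name) is empty again once the loop ends; the process-substitution form shown in the next answer avoids that. A quick illustration:
grep 'TEST' myfile.txt | cut -d "," -f 1 |
while read SAMPLE
do
    name=$SAMPLE
done
echo "$name"   # empty here: the loop ran in a subshell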
You have to change the input field separator (IFS), which tells read where to split the input line. Then you tell read to read two fields: the one you need and one you do not care about.
#! /bin/bash
SAMPLE=myfile.txt
while IFS=, read SAMPLE dontcare
do
name="$SAMPLE"
# some other code
done < <(grep TEST "$SAMPLE")
By the way: whenever you use read, you should consider using the -r option.
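A small sketch of the difference, using a made-up value containing backslashes:
printf 'C:\\temp\\new,foo\n' | { IFS=, read    a b; echo "$a"; }   # -> C:tempnew
printf 'C:\\temp\\new,foo\n' | { IFS=, read -r a b; echo "$a"; }   # -> C:\temp\new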

Grep approach to remove all lines in file that match any line in other file?

I have a file of camera information where each line has a unique ID of the format
{"_id":{"$oid":"5b0cfa5845bb0c0004277e13"},"geometry":{"coordinates":[139.751,35.685]},"addEditBy":["dd53cbd9c5306b1baa103335c4b3e91d8b73386ba29124ea2b1d47a619c8c066877843cd8a7745ce31021a8d1548cf2a"],"legacy_cameraID":1,"type":"ip","source":"google","country":"JP","city":"Tokyo","is_active_image":false,"is_active_video":false,"utc_offset":32400,"timezone_id":"Japan Standard Time","timezone_name":"Japan Standard Time","reference_url":"101.110.193.152/","retrieval":{"ip":"101.110.193.152","port":"80","video_path":"/"},"__v":0}
I also have a list of camera IDs that I want to remove from the original file in the format:
5b182800751c3b00044514a9
5b1976b473569e00045dba59
5b197b1273569e00045ddf0f
5b1970cc73569e00045d94fc
How can I use grep or some other command line utility to remove all lines in the input file that have an ID listed in the second file?
Let's say that you have a file called ids.txt that has all of the camera ids that need to be excluded from your data file, which we'll call data.json. We can use grep's -f option (read the patterns from a file) and its -v option (output only non-matching lines) as follows:
grep -f ids.txt -v data.json
grep will only output lines of data.json that do not match any lines in ids.txt.
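If there is any chance an id could contain regex metacharacters, adding -F to treat the patterns as fixed strings is a safe refinement of the same idea (filtered.json is just an arbitrary output name):
grep -v -F -f ids.txt data.json > filtered.json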
You should use a JSON-aware tool. Here is a GNU awk script that uses the json extension:
$ gawk ' # GNU awk
#load "json" # load extension
NR==FNR { # read oids to a hash
oid[$0]
next
}
{ # process json
lines=lines $0 # support multiline json form
if(json_fromJSON(lines,data)!=0) { # once json is complete
if(!(data["_id"]["$oid"] in oid)) # test if oid in exclude list
print # output if not
lines="" # rinse for repeat
}
}' oids json
A simple thing you can do is get ids from camera info and check if they are listed in the second file.
For example:
#!/bin/bash
exec 3<info.txt
while IFS= read -r line <&3; do
id="$(printf '%s' "${line}" | jq '._id."$oid"' | sed -e 's/"//g')"
if ! grep -e "${id}" list.txt >/dev/null; then
printf '%s\n' "${line}"
fi
done >clean.txt
exec 3>&-
Where:
info.txt is the file with camera information
list.txt is the list of ids you do not want
Note that this is not the only way you can achieve it; I used a simple loop just as a proof of concept.
You can achieve it using jq directly, for example:
#!/bin/bash
for id in $(jq '._id."$oid"' info.txt | sed -e 's/"//g'); do
    if ! grep -e "${id}" list.txt >/dev/null; then
        grep -e "${id}" info.txt
    fi
done >clean.txt
Note that in this second example the second grep is needed because you never read the whole line of the info.txt file, only the id.
Also, be aware that if you have an alias like alias grep='grep --color=always' it could break your output.
Assuming your json file is always that regular:
awk -F'"' 'NR==FNR{ids[$1]; next} !($6 in ids)' ids json

CSV file parsing in Bash

I have a CSV file with sample entries given below. What I want is to write a Bash script that reads the CSV file line by line, puts the first entry (e.g. 005) in one variable and the IP (192.168.10.1) in another variable, and then passes them to some other script.
005,192.168.10.1
006,192.168.10.109
007,192.168.10.12
008,192.168.10.121
009,192.168.10.123
A more efficient approach, without the need to fork cut each time:
#!/usr/bin/env bash
while IFS=, read -r field1 field2; do
# do something with $field1 and $field2
done < file.csv
The gains can be quite substantial for large files.
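With the sample data above, handing both fields to another script would look something like this (other_script.sh is a placeholder name):
while IFS=, read -r id ip; do
    ./other_script.sh "$id" "$ip"   # receives e.g. 005 and 192.168.10.1 on the first pass
done < file.csv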
Here's how I would do it with GNU tools:
while read line; do
echo $line | cut -d, -f1-2 --output-delimiter=' ' | xargs your_command
done < your_input.csv
while read line; do [...]; done < your_input.csv will read your file line by line.
For each line, we cut it down to its first two fields (separated by commas since it's a CSV) and pass them, separated by spaces, to xargs, which will in turn pass them as parameters to your_command.
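For instance, with the first sample line and echo standing in for your_command, the intermediate result is:
echo "005,192.168.10.1" | cut -d, -f1-2 --output-delimiter=' ' | xargs echo
# prints: 005 192.168.10.1   (two space-separated arguments)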
If this is a very simple CSV file with no string literals, etc., you can simply use while read and cut:
#!/bin/bash
while read line
do
    id_field=$(cut -d',' -f 1 <<<"$line")   # here 005 for the first line
    ip_field=$(cut -d',' -f 2 <<<"$line")   # here 192.168.10.1 for the first line
    # do something with $id_field and $ip_field
done < file.csv
The program works as follows: we use cut -d',' to obtain the first and second field of each line. We wrap this in a while read line loop and use I/O redirection to feed the file to the loop.
Of course you substitute file.csv with the name of the file you want to process, and you can use other variable names than the ones in this sample.
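As a quick check, the here-string form behaves like this on the first sample line:
cut -d',' -f 1 <<<"005,192.168.10.1"   # prints 005
cut -d',' -f 2 <<<"005,192.168.10.1"   # prints 192.168.10.1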

Passing input to sed, and sed info to a string

I have a list of files (~1000), one per line, in a text file named 'files.txt'.
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed "${1}q;d" files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I use this input script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .c macro.
The first line of your script contains a sed script. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt and shows you only the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed uses -n to suppress printing by default, then prints line $i. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those ways would take more memory and perform less well. I'd use one-liners if you can. :)
To take any of these one-liners and put it into your script, you already have the notation:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
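For example, $b could then be dropped into the generated macro the same way the question's script does it (the path and file names are the ones given in the question):
cat > "MyMacro_${i}.C" <<EOF
myFile = new TFile("/MYPATHNAME/${b}");
EOF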
If I am understanding you correctly, you can do a for loop in bash to call the script multiple times with different arguments.
for i in `seq 1 n`; do ./MakeMacro.sh $i; done
Based on the OP's comment, it seems that he wants to submit the generated files to Condor. You can modify the loop above to include the condor submission.
for i in `seq 1 n`; do ./MakeMacro.sh $i; condor_submit <OutputFile> ; done
i=0
while read file
do
((i++))
cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: <<- strips only leading tabs, so if you indent the here-document or its closing EOF line, use tab indents, not spaces.
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
    echo
    sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
    echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.

How to read specific lines in a file in BASH?

while read -r line will run through each line in a file. How can I have it run through specific lines in a file, for example, lines "1-20", then "30-100"?
One option would be to use sed to get the desired lines:
while read -r line; do
echo "$line"
done < <(sed -n '1,20p; 30,100p' inputfile)
Doing so feeds lines 1-20 and 30-100 of the inputfile to read.
@devnull's sed command does the job. Another alternative is to use awk, since it avoids the read loop and lets you do the processing in awk itself:
awk '(NR>=1 && NR<=20) || (NR>=30 && NR<=100) {print "processing", $0}' file
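If the ranges change often, they can also be passed in as awk variables instead of being hard-coded (the default action prints the matching lines):
awk -v a=1 -v b=20 -v c=30 -v d=100 '(NR>=a && NR<=b) || (NR>=c && NR<=d)' file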
