Suppose we have the following CSV file, file1.csv:
#groups id owner
abc id1 owner1
abc id2 owner1
bcx id1 owner2
cpa id3 owner1
The following script reads file1.csv, filters on the first column (#groups), and adds extra characters:
#!/usr/bin/env python2
import re
import csv

print "Enter path to original file"
GROUPS = raw_input()
print "Enter path to modified file"
WORKING = raw_input()

def filter_lines(f):
    """This generator function uses a regular expression
    to include only lines that have `abc` at the start
    and NO `gep` anywhere in the record.
    """
    filter_regex = r'^abc(?!.*gep)'  # lookahead scans the whole record, per the docstring
    for line in f:
        line = line.strip()
        if re.match(filter_regex, line):
            yield line

pat = re.compile(r'^(abc)(?!.*gep)')  # insert gep in any abc records that don't have gep

with open(GROUPS, 'r') as f:
    with open(WORKING, 'w') as data:
        #next(f)  # Skip over header in input file.
        # filter
        filter_generator = filter_lines(f)
        csv_reader = csv.reader(filter_generator)
        writer = csv.writer(data)  # , quoting=csv.QUOTE_ALL
        count = 0
        for row in csv_reader:
            count += 1
            variable1 = pat.sub(r'\1_gep', row[0])  # modify all filtered records to include gep
            fields = [variable1]
            writer.writerow(fields)

print 'Filtered (abc at start and NO gep) rows count = ' + str(count)
For example, abc would turn into abc_gep, and we would write that to another CSV file, file2.csv.
So file2.csv now contains only:
abc_gep
abc_gep
Good.
Now I want to add the rest of the columns from file1.csv where they match abc.
How could I do that?
I tried the following:
fields = [variable1,row[1],row[2]]
but this hardcodes the columns and isn't dynamic. I'm looking for something more like this:
fields = [variable1, row[i]]
Essentially, this is the result I'm seeking for file2.csv:
abc_gep id1 owner1
abc_gep id2 owner1
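Since csv.reader hands each record back as a plain list, the dynamic version is just a slice: keep the modified first field and concatenate whatever else the row holds. A minimal sketch of the loop body, reusing the names from the script above:

for row in csv_reader:
    count += 1
    variable1 = pat.sub(r'\1_gep', row[0])
    fields = [variable1] + row[1:]  # modified first field plus all remaining columns, however many there are
    writer.writerow(fields)

One caveat: the file1.csv sample above is space-separated, while csv.reader and csv.writer default to commas; if the file really looks like that, pass delimiter=' ' to both, or row[1:] will come back empty.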
Related
I have two .csv files which I am trying to 'multiply' out via a script. The first file is person information and looks basically like this:
First Name, Last Name, Email, Phone
Sally,Davis,sdavis#nobody.com,555-555-5555
Tom,Smith,tsmith#nobody.com,555-555-1212
The second file is account numbers and looks like this:
AccountID
1001
1002
Basically I want to pair every name with every account ID. So if I had 10 names in the first file and 10 account IDs in the second, I should end up with 100 rows in the resulting file, looking like this:
First Name, Last Name, Email, Phone, AccountID
Sally,Davis,sdavis#nobody.com,555-555-5555, 1001
Tom,Smith,tsmith#nobody.com,555-555-1212, 1001
Sally,Davis,sdavis#nobody.com,555-555-5555, 1002
Tom,Smith,tsmith#nobody.com,555-555-1212, 1002
Any help would be greatly appreciated.
You could simply write a for loop that repeats each value once per account ID and appends it, just in the reverse order.
Has that not worked, or have you not tried it?
If Python works for you, here's a script which does that:
def main():
    f1 = open("accounts.txt", "r")
    f1_total_lines = sum(1 for line in open('accounts.txt'))
    f2_total_lines = sum(1 for line in open('info.txt'))
    f1_line_counter = 1
    f2_line_counter = 1

    f3 = open("result.txt", "w")
    f3.write('First Name, Last Name, Email, Phone, AccountID\n')

    for line_account in f1.readlines():
        f2 = open("info.txt", "r")
        for line_info in f2.readlines():
            parsed_line_account = line_account
            parsed_line_info = line_info.rstrip()  # trim the newline character from every line of the 'info' file...
            if f2_line_counter == f2_total_lines:  # ...except the last one (it has no newline character)
                parsed_line_info = line_info
            f3.write(parsed_line_info + ',' + parsed_line_account)
            if f1_line_counter == f1_total_lines:
                f3.write('\n')
            f2_line_counter = f2_line_counter + 1
        f1_line_counter = f1_line_counter + 1
        f2_line_counter = 1  # reset the line counter to the first line
        f2.close()  # close 'info' before reopening it on the next pass

    f1.close()
    f3.close()

if __name__ == '__main__':
    main()
And the files I used are as follows:
info.txt:
Sally,Davis,sdavis#nobody.com,555-555-555
Tom,Smith,tsmith#nobody.com,555-555-1212
John,Doe,jdoe#nobody.com,555-555-3333
accounts.txt:
1001
1002
1003
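For what it's worth, the standard library can do most of the bookkeeping here. A shorter sketch using itertools.product and the csv module, with the same file names as above:

import csv
import itertools

with open('info.txt') as info, open('accounts.txt') as accounts, \
        open('result.txt', 'w') as out:
    people = list(csv.reader(info))
    ids = [line.strip() for line in accounts]
    writer = csv.writer(out)
    writer.writerow(['First Name', 'Last Name', 'Email', 'Phone', 'AccountID'])
    # itertools.product yields every (person, account) pairing -- the cartesian product
    for person, account_id in itertools.product(people, ids):
        writer.writerow(person + [account_id])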
If You Intended to Duplicate Account_ID
If you intended to add each Account_ID to every record in your information file then a short awk solution will do, e.g.
$ awk -F, '
FNR==NR { a[i++] = $0 }
FNR!=NR { b[j++] = $0 }
END {
    print a[0] ", " b[0]
    for (k=1; k<j; k++)
        for (m=1; m<i; m++)
            print a[m] ", " b[k]
}
' info id
First Name, Last Name, Email, Phone, AccountID
Sally,Davis,sdavis#nobody.com,555-555-5555, 1001
Tom,Smith,tsmith#nobody.com,555-555-1212, 1001
Sally,Davis,sdavis#nobody.com,555-555-5555, 1002
Tom,Smith,tsmith#nobody.com,555-555-1212, 1002
Above, the lines in the first file (when the per-file record number equals the overall record number, i.e. FNR==NR) are stored in array a, the lines from the second file (when FNR!=NR) are stored in array b, and then the two are combined and output in the END rule in the desired order.
Without Duplicating Account_ID
Since Account_ID is usually a unique bit of information, if you did not intend to duplicate every ID at the end of each record, then there is no need to loop. The paste command does that for you. In your case, with your information file as info and your account ID file as id, it is as simple as:
$ paste -d, info id
First Name, Last Name, Email, Phone,AccountID
Sally,Davis,sdavis#nobody.com,555-555-5555,1001
Tom,Smith,tsmith#nobody.com,555-555-1212,1002
(note: the -d, option just sets the delimiter to a comma)
Seems a lot easier than trying to reinvent the wheel.
This can be done easily with bash arrays:
OLD=$IFS; IFS=$'\n'
ar1=( $(cat file1) )
ar2=( $(cat file2) )
IFS=$OLD
ind=${!ar1[@]}
for i in $ind; { echo "${ar1[$i]}, ${ar2[$i]}"; }
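(Note this pairs line i of file1 with line i of file2, like paste above, rather than producing the full cartesian product.)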
I'm reading a file (input.txt) and trying to split its contents with '*' as the delimiter. Below is the code I came up with, but it assigns all the values to the first variable, uno, instead of splitting them up. Where am I going wrong? Many thanks for any help.
One more scenario: how can the contents be split when we have no idea how many lines the records span? Here, 3 records are separated by runs of multiple *s, but what about 1000+ records separated by 1000+ runs of *s?
input.txt
Record: C:/my_files/00_Roll_Tom-values.txt
#RakeBoss-Random as on 12/19/2016
[groups]
met = chk\rel_io_chk, chk\dev_op_io, chk\yuijn7, chk\rft65y
div = chk\ret5cf, chk\plo09i, chk\czo987, chk\lopikm, chk\qzt0kc
rakeonly = chk\r7ynsd, chk\bhtw5h, chk\ko9ibv, chk\plot9q
*******************************************************************************************
Record: C:/my_files/Docker_red-values.txt
#RakeBoss-Check it io.
[groups]
met = chk\rel_io_chk, chk\dev_op_io, chk\yuijn7
div = chk\gzlqvg, chk\k7ygyp, chk\lzg0rp, chk\oli6iv
rakeonly = chk\qzliu0, chk\pl6w6t, chk\bzp0l, chk\dzfbhp
*******************************************************************************************
Record: C:/my_files/456_Milo_123-values.txt
#RakeBoss-Jan 21st Prod
[groups]
met = chk\rel_io_chk, chk\dev_op_io
div = chk\ret5cf, chk\plo09i, chk\cz0o9t, chk\lopikm
rakeonly = chk\ztylcd, chk\hrft9l, chk\zzkkmi
*******************************************************************************************
Code:
file = File.open("C:/rake_check/input.txt", "r")
file.each_line.map do |line|
  next if line.chomp! =~ /^$|#/
  uno, deus, id = line.split(/\*+/)  # also tried line.split("*")
  puts deus
end
myarray = File.read('input.txt').split(/^\*{91}\n/)
(or, for example, /^\*{5,}\n/ "split on a line of five or more asterisks" if you don't have exactly 91 each time.)
/\*+/ means "one or more asterisks", which has several dangers: you might get an asterisk somewhere inside a record and split on it, and you're not slurping up the newline, so each record except the first will start off with one.
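The same anchored split carries over to other languages. A quick Python sketch under the same assumption, records separated by lines of five or more asterisks:

import re

with open('input.txt') as f:
    text = f.read()

# (?m) makes ^ match at the start of every line, so only whole
# separator lines of 5+ asterisks split the text into records.
records = [r for r in re.split(r'(?m)^\*{5,}\n', text) if r.strip()]
for record in records:
    print(record.splitlines()[0])  # e.g. the "Record: ..." header of each block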
I'm parsing a CSV file that has line breaks inside double-quoted fields. I'm reading the file line by line with a Groovy script, but I get an ArrayIndexOutOfBoundsException when I try to access the missing tokens.
I was trying to pre-process the file to remove those characters, and I was thinking of doing that with a bash script or with Groovy itself.
Could you please suggest an approach I can use to resolve the problem?
This is what the CSV looks like:
header1,header2,header3,header4
timestamp, "abcdefghi", "abcdefghi","sdsd"
timestamp, "zxcvb
fffffgfg","asdasdasadsd","sdsdsd"
This is the Groovy script I'm using:
def csv = new File(args[0]).text
def retString = ""
def parsedFile = new File("Parsed_" + args[0])

csv.eachLine { line, lineNumber ->
    def splittedLine = line.split(',')
    retString += new Date(splittedLine[0]) + ",${splittedLine[1]},${splittedLine[2]},${splittedLine[3]}\n"
    if (lineNumber % 1000 == 0) {
        parsedFile.append(retString)
        retString = ""
    }
}
parsedFile.append(retString)
UPDATE:
Finally I did this and it works (I needed to format the first column from a timestamp to a human-readable date):
gawk -F',' '{print strftime("%Y-%m-%d %H:%M:%S", substr( $1, 0, length($1)-3 ) )","($2)","($3)","($4)}' TobeParsed.csv > Parsed.csv
Thank you @karakfa
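(For the record: substr($1, 0, length($1)-3) trims the last three digits of the first field, presumably milliseconds, so strftime can format the remaining epoch seconds as a readable date.)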
If you use a proper CSV parser rather than trying to do it with split (which, as you can see, doesn't work with any form of quoting), then it works fine:
@Grab('com.xlson.groovycsv:groovycsv:1.1')
import static com.xlson.groovycsv.CsvParser.parseCsv

def csv = '''header1,header2,header3,header4
timestamp, "abcdefghi", "abcdefghi","sdsd"
timestamp, "zxcvb
fffffgfg","asdasdasadsd","sdsdsd"'''

def data = parseCsv(csv)
data.eachWithIndex { line, index ->
    println """Line $index:
              | 1:$line.header1
              | 2:$line.header2
              | 3:$line.header3
              | 4:$line.header4""".stripMargin()
}
Which prints:
Line 0:
1:timestamp
2:abcdefghi
3:abcdefghi
4:sdsd
Line 1:
1:timestamp
2:zxcvb
fffffgfg
3:asdasdasadsd
4:sdsdsd
awk to the rescue!
This will merge the newline-split fields back together; your process can take it from there:
$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}1' splitted.csv
header1,header2,header3
xxxxxx, "abcdefghi", "abcdefghi"
yyyyyy, "zxcvb fffffgfg","asdasdasadsd"
This assumes that an odd number of quotes means a field was split, and it rejoins the pieces with OFS in place of the newline. If you want to simply delete the newline (so the split parts join directly), remove OFS.
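If pre-processing in another language is an option, Python's standard csv module understands quoted line breaks natively. A minimal sketch (file names borrowed from the gawk command above):

import csv

# csv.reader keeps consuming past a line break inside a quoted field,
# so every row comes back complete; we then flatten the break to a space.
with open('TobeParsed.csv') as src, open('Parsed.csv', 'w') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src, skipinitialspace=True):
        writer.writerow([field.replace('\n', ' ') for field in row])

skipinitialspace=True matters here: the sample has a space after each comma, and without it the opening quote would not be recognized as quoting.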
I have a file "names.txt". The contents are
"Smith,RobJones,MikeJane,SallyPetel,Brian"
and I want to read "names.txt" and make a new file "names2.txt" that looks like:
"Rob Smith Mike Jones Sally Jane Brian Petel"
I know I should be using rstrip('\n') and .split(',').
So far I have:
namesfile = input('Enter name of file: ') #open names.txt
openfile = open(namesfile, 'r')
This will do exactly that. You might be able to polish this and make it more elegant, and I encourage you to do so:
import re

with open('names.txt') as f:
    # Split the names
    names = re.sub(r'([A-Z])(?![A-Z])', r',\1', f.read()).split(',')

# Filter empty results
names = [n for n in names if n != '']

# Swap pairs with each other
for i in range(len(names)):
    if (i + 1) % 2 == 0:
        names[i], names[i-1] = names[i-1], names[i]

print ' '.join(names)
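To land the result in names2.txt as the question asks, the final print can become a write; a two-line sketch:

with open('names2.txt', 'w') as out:
    out.write(' '.join(names))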
I have 2 txt files with different strings and numbers in them, split with ;.
Now I need to subtract:
((number at position 2 in file1) - (number at position 25 in file2)) = result
Then I want to replace the number at position 2 in file1 with the result.
I tried the code below, but it only appends a number at the end of the file, and what gets appended is not the result of the calculation.
def calc
  f1 = File.open("./file1.txt", File::RDWR)
  f2 = File.open("./file2.txt", File::RDWR)
  f1.flock(File::LOCK_EX)
  f2.flock(File::LOCK_EX)

  f1.each.zip(f2.each).each do |line, line2|
    bg = line.split(";").compact.collect(&:strip)
    bd = line2.split(";").compact.collect(&:strip)
    n = bd[2].to_i - bg[25].to_i
    f2.print bd[2] << n
    # puts "#{n}"  # only for testing
  end

  f1.flock(File::LOCK_UN)
  f2.flock(File::LOCK_UN)
  f1.close && f2.close
end
Use something like this:
lines1 = File.readlines('file1.txt').map(&:to_i)
lines2 = File.readlines('file2.txt').map(&:to_i)
result = lines1.zip(lines2).map { |value1, value2| value1 - value2 }
File.write('file1.txt', result.join(?\n))
This code loads both files into memory, calculates the result, and writes it back to the first file.
FYI: if you want to use your own code, just save the result to another file (e.g. result.txt) and copy it over the original at the end.
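Note the snippet above treats each whole line as a single number (to_i only reads the leading digits). If the fields really are ;-separated and the positions matter, the per-field arithmetic might look like this in Python; a sketch assuming the same zero-based indices (2 and 25) used in the question's code:

# Sketch: subtract field 25 of each file2 line from field 2 of the
# matching file1 line, writing the updated file1 lines to a new file.
with open('file1.txt') as f1, open('file2.txt') as f2, \
        open('result.txt', 'w') as out:
    for line1, line2 in zip(f1, f2):
        fields1 = [s.strip() for s in line1.split(';')]
        fields2 = [s.strip() for s in line2.split(';')]
        fields1[2] = str(int(fields1[2]) - int(fields2[25]))
        out.write(';'.join(fields1) + '\n')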