I'm teaching myself Python and this is my first code.
I analyze logs from servers, usually a full day of logs at a time. I wrote a script (this is a simplified example) just to check speed. With plain sequential code, analyzing 20 million rows takes about 12-13 minutes. I need to process 200 million rows in 5 minutes.
What I tried:
Multiprocessing (I ran into a problem with shared memory, which I think I fixed). Result: 300K rows = 20 sec, no matter how many processes I use. (PS: I also need to control the number of CPUs used in advance.)
Threading (it gave no speed-up: 300K rows = 2 sec, the same as the plain code).
Asyncio (I thought the script was slow because it has to read many files). Same result as threading: 300K rows = 2 sec.
In the end I suspect all three of my scripts are incorrect and don't work as intended.
PS: I try to avoid third-party Python modules (like pandas) because they make the script harder to run on different servers. I would rather use the standard library.
Please help me check the first one, multiprocessing.
import csv
import os
from multiprocessing import Process, Queue, Value, Manager
file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}
def argument(m, a, n):
    proc_num = os.getpid()
    a_temp_m = a["vod_miss"]
    a_temp_h = a["vod_hit"]
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m[n] = a_temp_m[n] + 1
            elif j[3].find('HIT') != -1:
                a_temp_h[n] = a_temp_h[n] + 1
    a["vod_miss"][n] = a_temp_m[n]
    a["vod_hit"][n] = a_temp_h[n]
if __name__ == '__main__':
    procs = []
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    n = 1
    vod_live_cuts[i] = manager.list([0] * cpu)
    vod_live_cuts[ii] = manager.list([0] * cpu)
    for m in file:
        proc = Process(target=argument, args=(m, vod_live_cuts, (n-1)))
        procs.append(proc)
        proc.start()
        if n >= cpu:
            n = 1
            proc.join()
        else:
            n += 1
    [proc.join() for proc in procs]
    [proc.close() for proc in procs]
I expect each file to be processed by def argument in its own independent process, with all results finally saved in the dict vod_live_cuts. For each process I added a separate list slot in the dict. I thought this would avoid conflicts when the processes update this parameter. But maybe that's the wrong approach :(
Using IPC is costly, so only use "shared objects" for saving the final result, not for intermediate results while parsing the file.
Limiting the number of processes is done with a multiprocessing.Pool. The following code uses it to reach the maximum hard-disk speed; you only need to post-process the results.
You can only parse data as fast as your HDD can read it (typically 30-80 MB/s), so if you need more performance you should use an SSD or RAID0 for higher disk speed. You cannot get much faster than this without changing your hardware.
import csv
import os
from multiprocessing import Process, Queue, Value, Manager, Pool
file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}
def argument(m, a):
    proc_num = os.getpid()
    a_temp_m_n = 0  # make it local to process
    a_temp_h_n = 0  # as shared lists use IPC
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m_n = a_temp_m_n + 1
            elif j[3].find('HIT') != -1:
                a_temp_h_n = a_temp_h_n + 1
    a["vod_miss"].append(a_temp_m_n)
    a["vod_hit"].append(a_temp_h_n)
if __name__ == '__main__':
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    vod_live_cuts[i] = manager.list()
    vod_live_cuts[ii] = manager.list()
    with Pool(cpu) as pool:
        tasks = []
        for m in file:
            task = pool.apply_async(argument, args=(m, vod_live_cuts))
            tasks.append(task)
        for task in tasks:
            task.get()
    print(list(vod_live_cuts[i]))
    print(list(vod_live_cuts[ii]))
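The "no shared objects for intermediate results" idea can be taken one step further: each worker can simply return its counts, and the parent can add them up, so no Manager is needed at all. The following is only a sketch under assumptions: the helper names count_lines, count_file and count_files are made up for illustration, the log paths are passed on the command line, and rows with fewer than four fields are skipped (the original code would raise on them).

```python
import csv
import sys
from multiprocessing import Pool


def count_lines(lines):
    """Count rows whose 4th space-separated field contains HIT or MISS."""
    hit = miss = 0
    for row in csv.reader(lines, delimiter=' '):
        if len(row) < 4:
            continue  # assumption: short or blank rows are ignored
        if 'MISS' in row[3]:
            miss += 1
        elif 'HIT' in row[3]:
            hit += 1
    return hit, miss


def count_file(path):
    """Worker: parse one file and return its local counts (no shared state)."""
    with open(path, newline='') as f:
        return count_lines(f)


def count_files(paths, workers=2):
    """Parse the files in a pool of `workers` processes and sum the results."""
    with Pool(workers) as pool:
        results = pool.map(count_file, paths)
    return sum(h for h, _ in results), sum(m for _, m in results)


if __name__ == '__main__':
    log_paths = sys.argv[1:]  # e.g. hcs.log hcs1.log hcs2.log hcs3.log
    if log_paths:
        total_hit, total_miss = count_files(log_paths)
        print("hit:", total_hit, "miss:", total_miss)
```

The only data crossing process boundaries here is one small tuple per file, and the workers argument controls the number of processes in advance.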
When I run the function and enter no names at the input prompts, everyone gets the appropriate time based on the variable listed. If an input contains a name, everyone gets time off instead of just that individual.
There are some lists associated with this as well, and that part works fine.
These are the required resources:
from openpyxl import Workbook
from datetime import timedelta, datetime
import random
def add_agents_w1():
    num = 4
    c = ["B","C","D","E","F"]
    tow1 = input("IF anyone taking time off enter 1st person now: \n")
    tow12 = input("If someone else is taking time off enter 2nd person now: \n")
    for x in tl:
        ws1[f"A{num}"] = x
        ws1[f"I{num}"] = x
        for f in c:
            if x in tow1 or tow12:
                ws1[f'{f}{num}'] = "OFF"
            else:
                ws1[f"{f}{num}"] = s2t
        num += 1
    wb.save(dest_filename)
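One likely cause is operator precedence: Python reads `x in tow1 or tow12` as `(x in tow1) or tow12`, so whenever the second input is a non-empty string the condition is truthy for every agent. The sketch below isolates that test from the workbook code; the function names and the sample agent list are made up for illustration.

```python
def buggy_is_off(name, first_off, second_off):
    # Parsed as (name in first_off) or second_off:
    # any non-empty second_off makes this truthy for every name.
    return bool(name in first_off or second_off)


def is_off(name, first_off, second_off):
    # Compare the name against each entered name explicitly;
    # strip() removes stray whitespace typed at the prompt.
    return name in (first_off.strip(), second_off.strip())


agents = ["Ann", "Bob", "Cy"]
print([buggy_is_off(a, "Ann", "Bob") for a in agents])  # [True, True, True]
print([is_off(a, "Ann", "Bob") for a in agents])        # [True, True, False]
print([is_off(a, "", "") for a in agents])              # [False, False, False]
```

Comparing against a tuple of exact names also avoids the substring matching that `x in tow1` performs on a plain string.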
I'm new to programming and I have a basic question.
Below is the code I wrote to extract Excel data.
import os
path = "./data"
file_list = os.listdir(path)
from openpyxl import load_workbook
results = []
for file_name_raw in file_list:
    file_name = "./data/"+file_name_raw
    wb = load_workbook(filename=file_name, data_only=True)
    Ad = wb.get_sheet_by_name('Advanced')
    result = []
    result.append(Ad['C1'].value)
    result.append(Ad['C2'].value)
    result.append(Ad['C3'].value)
    result.append(Ad['C4'].value)
    result.append(Ad['C5'].value)
    ...
    result.append(Ad['C100'].value)
    results.append(result)
print(results)
I want to step through the number in result.append(Ad['C<number>'].value) instead of writing every line out.
How can I write that? Is there a way to use a for loop?
You can write this in a for loop.
Define a range for your repetitions. Say you want to call result.append(...) 100 times, for C1 through C100. Note that range excludes its end value, so the end must be 101:
Assuming you are using Python:
for i in range(1, 101):
    result.append(Ad['C' + str(i)].value)
OR
by specifying a limit:
n = 100
for i in range(1, n + 1):
    result.append(Ad['C' + str(i)].value)
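The loop bounds can be checked without opening a workbook by building only the cell references. This is a small sketch; column_refs is a hypothetical helper, not part of openpyxl. Because range(1, n) stops at n - 1, range(1, n + 1) is what includes C100.

```python
def column_refs(column, n):
    """Return the cell references column1 .. columnN, N included."""
    return [column + str(i) for i in range(1, n + 1)]


refs = column_refs('C', 100)
print(len(refs), refs[0], refs[-1])           # 100 C1 C100

# For comparison, range(1, 100) stops one cell short:
short = ['C' + str(i) for i in range(1, 100)]
print(len(short), short[-1])                  # 99 C99
```

Inside the original loop the same references would be used as `Ad[ref].value`.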
I wrote working code to check whether a credit card number is valid using the Luhn algorithm:
class CreditCard
  def initialize(num)
    @@num_arr = num.to_s.split("")
    raise ArgumentError.new("Please enter exactly 16 digits for the credit card number.") if @@num_arr.length != 16
    @num = num
  end
  def check_card
    final_ans = 0
    i = 0
    while i < @@num_arr.length
      (i % 2 == 0) ? ans = (@@num_arr[i].to_i * 2) : ans = @@num_arr[i].to_i
      if ans > 9
        tens = ans / 10
        ones = ans % 10
        ans = tens + ones
      end
      final_ans += ans
      i += 1
    end
    final_ans % 10 == 0 ? true : false
  end
end
However, when I write driver test code for it, it doesn't work:
card_1 = CreditCard.new(4563960122001999)
card_2 = CreditCard.new(4563960122001991)
p card_1.check_card
p card_2.check_card
I've been playing around with the code, and I noticed that the driver code works if I do this:
card_1 = CreditCard.new(4563960122001999)
p card_1.check_card
card_2 = CreditCard.new(4563960122001991)
p card_2.check_card
I tried to research this before posting. Logically, I don't see why the first driver code wouldn't work. Can someone please explain why this is happening?
Thanks in advance!!!
You are using a class variable (one whose name starts with @@), which is shared among all instances of CreditCard as well as the class itself (and its subclasses). Therefore, its value is overwritten every time you create a new instance. In your first example both cards are created before either check runs, so by then the class variable holds the digits of the last instance created (card_2), and both check_card calls test card_2's number. Use an instance variable (@num_arr) instead so that each card keeps its own digits.
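The same sharing behaviour can be reproduced in Python, where a class attribute plays the role of Ruby's class variable. The sketch below is an analogy, not a translation of the asker's class: SharedCard, Card and luhn_ok are names invented for illustration, and luhn_ok uses the same "double every digit at an even index" rule as the question's code.

```python
def luhn_ok(digits):
    """Luhn check for an even-length digit list (even indices are doubled)."""
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the tens and ones digits
        total += d
    return total % 10 == 0


class SharedCard:
    digits = []  # class attribute: one value shared by every instance

    def __init__(self, num):
        SharedCard.digits = [int(c) for c in str(num)]  # overwrites for all

    def check_card(self):
        return luhn_ok(SharedCard.digits)


class Card:
    def __init__(self, num):
        self.digits = [int(c) for c in str(num)]  # per-instance state

    def check_card(self):
        return luhn_ok(self.digits)


shared_1 = SharedCard(4563960122001999)
shared_2 = SharedCard(4563960122001991)
print(shared_1.check_card(), shared_2.check_card())  # False False

card_1 = Card(4563960122001999)
card_2 = Card(4563960122001991)
print(card_1.check_card(), card_2.check_card())      # True False
```

With shared state both checks see the second number; with per-instance state each card is checked against its own digits.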
I was hoping someone with better math skills could help me figure out the total number of possibilities for a string, given its length and character set.
i.e. [a-f0-9]{6}
How many possible strings match this pattern of random characters?
It is equal to the number of characters in the set raised to the 6th power.
In Python (3.x) interpreter:
>>> len("0123456789abcdef")
16
>>> 16**6
16777216
>>>
EDIT 1:
Why 16.7 million? Well, 000000 ... 999999 = 10^6 = 1M, 16/10 = 1.6 and
>>> 1.6**6
16.77721600000000
EDIT 2:
To create a list in Python, do: print(['{0:06x}'.format(i) for i in range(16**6)])
However, this is too huge. Here is a simpler, shorter example:
>>> ['{0:06x}'.format(i) for i in range(100)]
['000000', '000001', '000002', '000003', '000004', '000005', '000006', '000007', '000008', '000009', '00000a', '00000b', '00000c', '00000d', '00000e', '00000f', '000010', '000011', '000012', '000013', '000014', '000015', '000016', '000017', '000018', '000019', '00001a', '00001b', '00001c', '00001d', '00001e', '00001f', '000020', '000021', '000022', '000023', '000024', '000025', '000026', '000027', '000028', '000029', '00002a', '00002b', '00002c', '00002d', '00002e', '00002f', '000030', '000031', '000032', '000033', '000034', '000035', '000036', '000037', '000038', '000039', '00003a', '00003b', '00003c', '00003d', '00003e', '00003f', '000040', '000041', '000042', '000043', '000044', '000045', '000046', '000047', '000048', '000049', '00004a', '00004b', '00004c', '00004d', '00004e', '00004f', '000050', '000051', '000052', '000053', '000054', '000055', '000056', '000057', '000058', '000059', '00005a', '00005b', '00005c', '00005d', '00005e', '00005f', '000060', '000061', '000062', '000063']
>>>
EDIT 3:
As a function:
def generateAllHex(numDigits):
    assert(numDigits > 0)
    ceiling = 16**numDigits
    for i in range(ceiling):
        formatStr = '{0:0' + str(numDigits) + 'x}'
        print(formatStr.format(i))
This will take a while to print at numDigits = 6.
I recommend dumping this to file instead like so:
def generateAllHex(numDigits, fileName):
    assert(numDigits > 0)
    ceiling = 16**numDigits
    formatStr = '{0:0' + str(numDigits) + 'x}'
    with open(fileName, 'w') as fout:
        for i in range(ceiling):
            # one value per line; without '\n' all values run together
            fout.write(formatStr.format(i) + '\n')
If you are just looking for the number of possibilities, the answer is (charset.length)^(length). If you need to actually generate a list of the possibilities, just loop through each character, recursively generating the remainder of the string.
e.g.
void generate(char[] charset, int length)
{
    generate("", charset, length);
}
void generate(String prefix, char[] charset, int length)
{
    for (int i = 0; i < charset.length; i++)
    {
        if (length == 1)
            System.out.println(prefix + charset[i]);
        else
            // append the character itself, not its index
            generate(prefix + charset[i], charset, length - 1);
    }
}
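For comparison, the same enumeration can be sketched in Python with the standard library's itertools.product, which yields every combination with repetition. The function name all_strings is made up for this example; a length of 2 is used so the output stays small.

```python
from itertools import product


def all_strings(charset, length):
    """Yield every string of the given length over charset, in charset order."""
    for combo in product(charset, repeat=length):
        yield ''.join(combo)


hex_chars = "0123456789abcdef"
two_digit = list(all_strings(hex_chars, 2))
print(len(two_digit), two_digit[0], two_digit[-1])  # 256 00 ff
```

The count matches len(charset) ** length, here 16 ** 2 = 256. Because all_strings is a generator, the length-6 case can be iterated without building the whole list in memory.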
The number of possibilities is the size of your alphabet raised to the power of the length of your string (in the general case, of course).
Assume your string length is 4: _ _ _ _ and your alphabet = { 0 , 1 }:
There are 2 possibilities (0 or 1) for the first place, 2 for the second place, and so on.
So the total is the product 2 * 2 * 2 * 2 = 2^4 = 16, or in general alphabet_size^string_size.
first: 000000
last: ffffff
This matches hexadecimal numbers.
For any given set of possible values, the number of possible sequences (with repetition allowed) is the number of possible values raised to the power of the number of items.
In this case, that would be 16 to the 6th power, or 16777216 possibilities.