How do I repeat the number in the command code? - for-loop

I'm new to programming, and I have a basic question.
Below is the code I wrote to extract Excel data.
import os
path = "./data"
file_list = os.listdir(path)
from openpyxl import load_workbook

results = []
for file_name_raw in file_list:
    file_name = "./data/" + file_name_raw
    wb = load_workbook(filename=file_name, data_only=True)
    Ad = wb.get_sheet_by_name('Advanced')
    result = []
    result.append(Ad['C1'].value)
    result.append(Ad['C2'].value)
    result.append(Ad['C3'].value)
    result.append(Ad['C4'].value)
    result.append(Ad['C5'].value)
    ...
    result.append(Ad['C100'].value)
    results.append(result)
print(results)
If I want to repeat the number in result.append(Ad['C<number>'].value),
how can I write that? Is there a way to use a for loop?

You can write this with a for loop.
Define a range for your repetitions. Say you want to call result.append(...) 100 times (cells C1 through C100); then, assuming you are using Python:

for i in range(1, 101):
    result.append(Ad['C' + str(i)].value)

Or, by specifying a limit:

n = 100
for i in range(1, n + 1):
    result.append(Ad['C' + str(i)].value)
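
For reference, a minimal sketch of how the loop could slot into the original script (assuming every workbook in ./data has an 'Advanced' sheet); a list comprehension does the same job in one line:

import os
from openpyxl import load_workbook

path = "./data"
results = []
for file_name_raw in os.listdir(path):
    wb = load_workbook(filename=os.path.join(path, file_name_raw), data_only=True)
    ad = wb["Advanced"]  # indexing the workbook by sheet name is the current openpyxl API
    # collect C1..C100 in one pass
    result = [ad["C" + str(i)].value for i in range(1, 101)]
    results.append(result)
print(results)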

Related

Improve code result speed by multiprocessing

I'm self-studying Python and this is my first piece of code.
I'm working on analyzing logs from our servers. Usually I need to analyze a full day of logs. I created the script below (this is an example with simplified logic) just to check speed. With straightforward single-process code, analyzing 20 million rows takes about 12-13 minutes. I need to handle 200 million rows in 5 minutes.
What I tried:
Multiprocessing (I hit an issue with shared memory, which I think I fixed). But the result: 300K rows = 20 seconds, no matter how many processes. (PS: I also need to control the number of processes in advance.)
Threading (I found it doesn't give any speedup: 300K rows = 2 seconds, but the plain code is also 300K = 2 seconds).
asyncio (I thought the script was slow because it needs to read many files). The result is the same as threading: 300K = 2 seconds.
In the end I think all three of my scripts are incorrect and don't work as intended.
PS: I'm trying to avoid specialized Python modules (like pandas), because that would make it harder to run on different servers; it's better to stick to the standard library.
Please help me check the first one - multiprocessing.
import csv
import os
from multiprocessing import Process, Queue, Value, Manager

file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}

def argument(m, a, n):
    proc_num = os.getpid()
    a_temp_m = a["vod_miss"]
    a_temp_h = a["vod_hit"]
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m[n] = a_temp_m[n] + 1
            elif j[3].find('HIT') != -1:
                a_temp_h[n] = a_temp_h[n] + 1
    a["vod_miss"][n] = a_temp_m[n]
    a["vod_hit"][n] = a_temp_h[n]

if __name__ == '__main__':
    procs = []
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    n = 1
    vod_live_cuts[i] = manager.list([0] * cpu)
    vod_live_cuts[ii] = manager.list([0] * cpu)
    for m in file:
        proc = Process(target=argument, args=(m, vod_live_cuts, (n - 1)))
        procs.append(proc)
        proc.start()
        if n >= cpu:
            n = 1
            proc.join()
        else:
            n += 1
    [proc.join() for proc in procs]
    [proc.close() for proc in procs]
I expect each file to be processed by an independent process via def argument, and finally all results to be saved in the dict vod_live_cuts. For each process I added an independent list to the dict; I thought that would help with cross-process access to this parameter. But maybe it's the wrong way :(
Using IPC is costly, so only use "shared objects" for saving the final result, not for intermediate results while parsing the file.
Limiting the number of processes is done by using a multiprocessing.Pool; the following code uses it to reach the maximum hard-disk speed, and you only need to post-process the results.
You can only parse data as fast as your HDD can read it (typically 30-80 MB/s), so if you need to improve performance further you should use an SSD or RAID0 for higher disk speed; you cannot get much faster than this without changing your hardware.
import csv
import os
from multiprocessing import Process, Queue, Value, Manager, Pool

file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}

def argument(m, a):
    proc_num = os.getpid()
    a_temp_m_n = 0  # make it local to process
    a_temp_h_n = 0  # as shared lists use IPC
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m_n = a_temp_m_n + 1
            elif j[3].find('HIT') != -1:
                a_temp_h_n = a_temp_h_n + 1
    a["vod_miss"].append(a_temp_m_n)
    a["vod_hit"].append(a_temp_h_n)

if __name__ == '__main__':
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    vod_live_cuts[i] = manager.list()
    vod_live_cuts[ii] = manager.list()
    with Pool(cpu) as pool:
        tasks = []
        for m in file:
            task = pool.apply_async(argument, args=(m, vod_live_cuts))
            tasks.append(task)
        for task in tasks:
            task.get()
    print(list(vod_live_cuts[i]))
    print(list(vod_live_cuts[ii]))
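As a further variant (my sketch, not part of the answer above): you can drop the Manager objects entirely and have each worker return its own counts; the Pool delivers the return values to the parent, so no shared state is needed while parsing. This assumes the same space-delimited log format with the HIT/MISS marker in the fourth column.

import csv
from multiprocessing import Pool

files = ["hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"]

def count_hits(path):
    miss, hit = 0, 0          # plain local counters, no IPC while parsing
    with open(path, newline='') as fh:
        for row in csv.reader(fh, delimiter=' '):
            if 'MISS' in row[3]:
                miss += 1
            elif 'HIT' in row[3]:
                hit += 1
    return path, miss, hit    # returned to the parent via the Pool

if __name__ == '__main__':
    with Pool(4) as pool:                      # 4 = number of worker processes
        results = pool.map(count_hits, files)  # one result tuple per file
    for path, miss, hit in results:
        print(path, "miss:", miss, "hit:", hit)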

Automated Scheduling

When running the function with no names on the input, it gives everyone the appropriate time based on the variable listed. If the input variable does have a name, it gives everyone time off instead of just that individual.
There are some lists associated with this as well, and that part works fine.
These are the required resources:
from openpyxl import Workbook
from datetime import timedelta, datetime
import random

# tl, ws1, wb, s2t and dest_filename are defined elsewhere in the script
def add_agents_w1():
    num = 4
    c = ["B", "C", "D", "E", "F"]
    tow1 = input("IF anyone taking time off enter 1st person now: \n")
    tow12 = input("If someone else is taking time off enter 2nd person now: \n")
    for x in tl:
        ws1[f"A{num}"] = x
        ws1[f"I{num}"] = x
        for f in c:
            if x in tow1 or tow12:
                ws1[f'{f}{num}'] = "OFF"
            else:
                ws1[f"{f}{num}"] = s2t
        num += 1
    wb.save(dest_filename)
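
The "everyone gets time off" behaviour is most likely caused by the condition if x in tow1 or tow12: Python evaluates it as (x in tow1) or (tow12), and any non-empty tow12 string is truthy, so the OFF branch fires for every agent. A minimal sketch of the likely fix, reusing the (defined-elsewhere) names from the question:

for f in c:
    # compare the agent against both entered names, not "x in tow1 or tow12"
    if x == tow1 or x == tow12:        # or equivalently: if x in (tow1, tow12):
        ws1[f"{f}{num}"] = "OFF"
    else:
        ws1[f"{f}{num}"] = s2t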

Jmeter - How to compare two numbers using threshold

For the purpose of my testing I need to compare two numbers, which are real numbers.
a) 0.070103 vs. b) 0.0701029999999999986
What is the best way to achieve that, ideally with a threshold included?
How about rounding them?
Something like:
import java.math.MathContext
def a = 0.070103
def b = 0.0701029999999999986
def roundedA = a.round(new MathContext(5))
def roundedB = b.round(new MathContext(5))
log.info('Rounded a: ' + roundedA)
log.info('Rounded b: ' + roundedB)
log.info('Numbers are equal: ' + roundedA.equals(roundedB))
More information:
BigDecimal.round()
MathContext
Scripting JMeter Assertions in Groovy - A Tutorial

Lua Random number generator always produces the same number

I have looked up several tutorials on how to generate random numbers with Lua, and each said to use math.random(), so I did. However, every time I use it I get the same number; I have tried rewriting the code, and I always get the lowest possible number. I even included a random seed based on the OS time. Code below.
require "math"
math.randomseed(os.time())
num = math.random(0,10)
print(num)
I'm using the random function like this:
math.randomseed(os.time())
num = math.random() and math.random() and math.random() and math.random(0, 10)
This is working fine. The extra math.random() calls just discard the first few values, which on some platforms are poorly distributed right after seeding. Another option would be to improve the built-in random function, described here.
This might help! I had to use these functions to write a class that generates Nano IDs. I basically used the milliseconds from the os.clock() function and used that for math.randomseed().
NanoId = {
    validCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-",
    generate = function (size, validChars)
        local response = ""
        local ms = string.match(tostring(os.clock()), "%d%.(%d+)")
        local temp = math.randomseed(ms)
        if (size > 0 and string.len(validChars) > 0) then
            for i = 1, size do
                local num = math.random(string.len(validChars))
                response = response..string.sub(validChars, num, num)
            end
        end
        return response
    end
}

function NanoId:Generate()
    return self.generate(21, self.validCharacters)
end

-- Runtime Testing
for i = 1, 10 do
    print(NanoId:Generate())
end
--[[
Output:
>>> p2r2-WqwvzvoIljKa6qDH
>>> pMoxTET2BrIjYUVXNMDNH
>>> w-nN7J0RVDdN6-R9iv4i-
>>> cfRMzXB4jZmc3quWEkAxj
>>> aFeYCA2kgOx-s4UN02s0s
>>> xegA--_EjEmcDk3Q1zh7K
>>> 6dkVRaNpW4cMwzCPDL3zt
>>> R2Fct5Up5OwnHeExDnqZI
>>> JwnlLZcp8kml-MHUEFAgm
>>> xPr5dULuv48UMaSTzdW5J
]]

Multiprocessing and shared multiprocessing manager lists for parsing large file

I am trying to parse a huge file (approx 23 MB) using the code below, wherein I populate a multiprocessing Manager list with all the lines read from the file. In the target routine (parse_line) for each process, I pop a line and parse it to create a defaultdict object with certain parsed attributes, and finally push each of these objects into another multiprocessing Manager list.
import multiprocessing as mp
from collections import defaultdict
from pyparsing import (Word, Literal, Suppress, Combine, Optional,
                       OneOrMore, nums, alphas, hexnums)

class parser(object):
    def __init__(self):
        self.manager = mp.Manager()
        self.in_list = self.manager.list()
        self.out_list = self.manager.list()
        self.dict_list, self.lines, self.pcap_text = [], [], []
        self.last_timestamp = [[(999999, 0)] * 32] * 2
        self.num = Word(nums)
        self.word = Word(alphas)
        self.open_brace = Suppress(Literal("["))
        self.close_brace = Suppress(Literal("]"))
        self.colon = Literal(":")
        self.stime = Combine(OneOrMore(self.num + self.colon) + self.num + Literal(".") + self.num)
        self.date = OneOrMore(self.word) + self.num + self.stime
        self.is_cavium = self.open_brace + (Suppress(self.word)) + self.close_brace
        self.oct_id = self.open_brace + Suppress(self.word) + Suppress(Literal("=")) \
            + self.num + self.close_brace
        self.core_id = self.open_brace + Suppress(self.word) + Suppress(Literal("#")) \
            + self.num + self.close_brace
        self.ppm_id = self.open_brace + self.num + self.close_brace
        self.oct_ts = self.open_brace + self.num + self.close_brace
        self.dump = Suppress(Word(hexnums) + Literal(":")) + OneOrMore(Word(hexnums))
        self.opening = Suppress(self.date) + Optional(self.is_cavium.setResultsName("cavium")) \
            + self.oct_id.setResultsName("octeon").setParseAction(lambda toks: int(toks[0])) \
            + self.core_id.setResultsName("core").setParseAction(lambda toks: int(toks[0])) \
            + Optional(self.ppm_id.setResultsName("ppm").setParseAction(lambda toks: int(toks[0])) \
                       + self.oct_ts.setResultsName("timestamp").setParseAction(lambda toks: int(toks[0]))) \
            + Optional(self.dump.setResultsName("pcap"))

    def parse_file(self, filepath):
        self.filepath = filepath
        with open(self.filepath, 'r') as f:
            self.lines = f.readlines()
            for lineno, line in enumerate(self.lines):
                self.in_list.append((lineno, line))
        processes = [mp.Process(target=self.parse_line) for i in range(mp.cpu_count())]
        [process.start() for process in processes]
        [process.join() for process in processes]

    def parse_line(self):
        while self.in_list:
            (lineno, line) = self.in_list.pop()
            print mp.current_process().name, "start"
            dic = defaultdict(int)
            result = self.opening.parseString(line)
            self.pcap_text.append("".join(result.pcap))
            if result.timestamp or result.ppm:
                dic['oct'], dic['core'], dic['ppm'], dic['timestamp'] = result[0:4]
                self.last_timestamp[result.octeon][result.core] = (result.ppm, result.timestamp)
            else:
                dic['oct'], dic['core'] = result[0:2]
                dic['ppm'] = (self.last_timestamp[result.octeon][result.core])[0]
                dic['ts'] = (self.last_timestamp[result.octeon][result.core])[1]
            dic['line'] = lineno
            self.out_list.append(dic)
However, this entire process takes approximately 3 minutes to complete.
My question is: is there a better way to make this faster?
I am using the pyparsing module to parse each line, if that makes any difference.
PS: Made changes to the routine following Paul McGuire's advice below.
Not a big performance issue, but learn to iterate over files directly, instead of using readlines(). In place of this code:
self.lines = f.readlines()
for lineno, line in enumerate(self.lines):
    self.in_list.append((lineno, line))
You can write:
self.in_list = list(enumerate(f))
A hidden performance killer is using while self.in_list: (lineno,line) = list.pop(). Each call to pop removes the 0'th element from the list. Unfortunately, Python's lists are implemented as arrays. To remove the 0'th element, the 1..n-1'th elements have to be moved up one slot in the array. You don't really have to destroy self.in_list as you go, just iterate over it:
for lineno, line in self.in_list:
    <Do something with line and line no. Parse each line and push into out_list>
If you are thinking that consuming self.in_list as you go is a memory-saving measure, then you can avoid the array-shifting inefficiency of Python lists by using a deque instead (from Python's collections module). Deques are implemented internally as linked lists, so pushing or popping at either end is very fast, but indexed access is slow. To use a deque, replace the line:
self.in_list = list(enumerate(f))
with:
self.in_list = deque(enumerate(f))
Then replace the call in your code self.in_list.pop() with self.in_list.popleft().
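
To make the difference concrete, here is a small self-contained timing sketch (my illustration, not part of the original answer) comparing pop(0) on a list with popleft() on a deque over the same data:

import time
from collections import deque

items = list(enumerate(["line"] * 50000))    # 50k (lineno, text) pairs

lst = list(items)
start = time.time()
while lst:
    lst.pop(0)                               # shifts every remaining element each time
print("list.pop(0):     %.3fs" % (time.time() - start))

dq = deque(items)
start = time.time()
while dq:
    dq.popleft()                             # O(1) removal from the left end
print("deque.popleft(): %.3fs" % (time.time() - start))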
But MUCH more likely to be the performance issue is the pyparsing code you are using to process each line. Since you didn't post the parser code, there is not much help we can provide there.
To get an idea of where the time is going, leave all your code in place but comment out the <Do something with line and line no. Parse each line and push into out_list> part (you may have to add a pass statement for the for loop), and run against your 23 MB file. That will give you a rough idea of how much of your 3 minutes is being spent reading and iterating over the file, and how much is spent doing the actual parsing. Then post back in another question when you find where the real performance issues lie.
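
A minimal sketch of that measurement (my own illustration; parse_one is a hypothetical stand-in for the pyparsing grammar): time a pure read-and-iterate pass, then a read-and-parse pass, and compare the two.

import time

def profile(filepath, parse_one=None):
    # Pass 1: just read and iterate over the file.
    start = time.time()
    with open(filepath) as f:
        for lineno, line in enumerate(f):
            pass
    print("read/iterate only:", time.time() - start)

    # Pass 2: read and parse every line (skipped if no parser is supplied).
    if parse_one is not None:
        start = time.time()
        with open(filepath) as f:
            for lineno, line in enumerate(f):
                parse_one(line)
        print("read + parse:", time.time() - start)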
