Python: Pool creates multiple processes but the results are never produced - multiprocessing
All of the functions are placed in one class, including the function that creates the processes and the function they execute; the class's methods are then called from another file.
from multiprocessing import Pool
import pandas as pd

def initData(self, type):
    # create six processes to deal with the data
    if type == 'train':
        data = pd.read_csv('./data/train_merged_8.csv')
    elif type == 'test':
        data = pd.read_csv('./data/test_merged_2.csv')
    modelvec = allWord2Vec('no').getModel()
    modelvec_all = allWord2Vec('all').getModel()
    modelvec_stop = allWord2Vec('stop').getModel()
    p = Pool(6)
    count = 0
    for i in data.index:
        count += 1
        p.apply_async(self.valueCal, args=(i, data, modelvec, modelvec_all, modelvec_stop))
        if count % 1000 == 0:
            print(str(count // 100) + 'h rows of data has been dealed')
    p.close()
    p.join

def valueCal(self, i, data, modelvec, modelvec_all, modelvec_stop):
    # the function that runs in each worker process
    list_con = []
    q1 = str(data.get_value(i, 'question1')).split()
    q2 = str(data.get_value(i, 'question2')).split()
    f1 = self.getF1_union(q1, q2)
    f2 = self.getF2_inter(q1, q2)
    f3 = self.getF3_sum(q1, q2)
    f4_q1 = len(q1)
    f4_q2 = len(q2)
    f4_rate = f4_q1 / f4_q2
    q1 = [','.join(str(ve)) for ve in q1]
    q2 = [','.join(str(ve)) for ve in q2]
    list_con.append('|'.join(q1))
    list_con.append('|'.join(q2))
    list_con.append(f1)
    list_con.append(f2)
    list_con.append(f3)
    list_con.append(f4_q1)
    list_con.append(f4_q2)
    list_con.append(f4_rate)
    f = open('./data/test.txt', 'a')
    f.write('\t'.join(list_con) + '\n')
    f.close()
The output appears very quickly, like this:

10h rows of data have been dealed
20h rows of data have been dealed
30h rows of data have been dealed
40h rows of data have been dealed

But I have not seen the file being created. When I check the task manager, six processes have indeed been created and are consuming a lot of my CPU, yet even after the program finishes the file still does not exist.
How can I solve this problem?
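For comparison, here is a minimal, self-contained sketch of the same apply_async pattern in which the pool is actually joined (p.join() is called rather than just referenced) and every AsyncResult is collected with .get(), so any exception raised inside a worker surfaces in the parent instead of being silently dropped. The worker body, the row data, and the file path are placeholders, not the original project code:

from multiprocessing import Pool

def value_cal(i, row_text):
    # placeholder for the real per-row feature computation
    return str(i) + '\t' + row_text.upper()

if __name__ == '__main__':
    rows = ['what is python', 'what is a process', 'why is my file empty']
    p = Pool(6)
    results = [p.apply_async(value_cal, args=(i, r)) for i, r in enumerate(rows)]
    lines = [res.get() for res in results]  # .get() re-raises any exception from a worker
    p.close()
    p.join()  # note the parentheses: p.join without them does nothing
    # write once from the parent instead of appending from every worker
    with open('./data/test.txt', 'a') as f:
        f.write('\n'.join(lines) + '\n')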
Improve code speed by multiprocessing
I am self-studying Python and this is my first piece of code. I use it to analyse logs from our servers; usually I need to analyse a full day of logs. I created this script (an example with simplified logic) just to check speed. With plain sequential code, analysing 20 million rows takes about 12-13 minutes; I need 200 million rows in 5 minutes. What I tried:

Multiprocessing: I met an issue with shared memory and think I fixed it, but the result is 300K rows = 20 seconds, no matter how many processes I use. (I also need to control the number of processes in advance.)
Threading: it gives no speed-up, 300K rows = 2 seconds, but the plain code gives the same, 300K = 2 seconds.
asyncio (I thought the script was slow because it has to read many files): the result is the same as threading, 300K = 2 seconds.

So I think all three of my scripts are incorrect and do not work properly. I try to avoid specialised Python modules (like pandas) because that would make the script harder to run on different servers; it is better to use common libraries. Please help me check the first one, multiprocessing.

import csv
import os
from multiprocessing import Process, Queue, Value, Manager

file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}

def argument(m, a, n):
    proc_num = os.getpid()
    a_temp_m = a["vod_miss"]
    a_temp_h = a["vod_hit"]
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m[n] = a_temp_m[n] + 1
            elif j[3].find('HIT') != -1:
                a_temp_h[n] = a_temp_h[n] + 1
    a["vod_miss"][n] = a_temp_m[n]
    a["vod_hit"][n] = a_temp_h[n]

if __name__ == '__main__':
    procs = []
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    n = 1
    vod_live_cuts[i] = manager.list([0] * cpu)
    vod_live_cuts[ii] = manager.list([0] * cpu)
    for m in file:
        proc = Process(target=argument, args=(m, vod_live_cuts, (n-1)))
        procs.append(proc)
        proc.start()
        if n >= cpu:
            n = 1
            proc.join()
        else:
            n += 1
    [proc.join() for proc in procs]
    [proc.close() for proc in procs]

I expect each file to be processed by an independent process via def argument, and all results to end up in the dict vod_live_cuts. For each process I added an independent list in the dict; I think this helps avoid cross-process interference on this parameter. But maybe it's the wrong way :(
Using IPC is costly, so only use "shared objects" for saving the final result, not for intermediate results while parsing the file. Limiting the number of processes is done by using a multiprocessing.Pool; the following code uses it to reach the maximum hard-disk speed, so you only need to post-process the results. You can only parse data as fast as your HDD can read it (typically 30-80 MB/s), so if you need to improve performance further you should use an SSD or RAID0 for higher disk speed; you cannot get much faster than this without changing your hardware.

import csv
import os
from multiprocessing import Process, Queue, Value, Manager, Pool

file = {"hcs.log", "hcs1.log", "hcs2.log", "hcs3.log"}

def argument(m, a):
    proc_num = os.getpid()
    a_temp_m_n = 0  # make it local to the process
    a_temp_h_n = 0  # as shared lists use IPC
    with open(os.getcwd() + '/' + m, newline='') as hcs_1:
        hcs_2 = csv.reader(hcs_1, delimiter=' ')
        for j in hcs_2:
            if j[3].find('MISS') != -1:
                a_temp_m_n = a_temp_m_n + 1
            elif j[3].find('HIT') != -1:
                a_temp_h_n = a_temp_h_n + 1
    a["vod_miss"].append(a_temp_m_n)
    a["vod_hit"].append(a_temp_h_n)

if __name__ == '__main__':
    manager = Manager()
    vod_live_cuts = manager.dict()
    i = "vod_hit"
    ii = "vod_miss"
    cpu = 1
    vod_live_cuts[i] = manager.list()
    vod_live_cuts[ii] = manager.list()
    with Pool(cpu) as pool:
        tasks = []
        for m in file:
            task = pool.apply_async(argument, args=(m, vod_live_cuts))
            tasks.append(task)
        for task in tasks:
            task.get()
    print(list(vod_live_cuts[i]))
    print(list(vod_live_cuts[ii]))
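To post-process those results, the per-file counts collected in the two manager lists can simply be summed in the parent process once the pool has finished; a minimal sketch, reusing the vod_live_cuts dict from the code above:

# each list holds one count per input file; total them in the parent
total_miss = sum(list(vod_live_cuts["vod_miss"]))
total_hit = sum(list(vod_live_cuts["vod_hit"]))
print("total MISS:", total_miss, "total HIT:", total_hit)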
applyInPandas() aggregation runs slowly on big delta table
I'm trying to create a gold table notebook in Databricks, but it would take 9 days to fully reprocess the historical data (43GB, 35k parquet files). I tried scaling up the cluster but it doesn't go above 5000 records/second. The bottleneck seems to be the applyInPandas() function. I'm wondering if I could replace pandas with anything else to make the gold notebook execute faster.

The silver table has 60 columns (read_id, reader_id, tracker_timestamp, event_type, ebook_id, page_id, agent_ip, agent_device_type, ...). Each row of data represents a read event of an ebook, e.g. 'page turn', 'click on image', 'click on link', ... All of the events that occurred in a single session have the same read_id. In the gold table I'm trying to group those events into sessions and calculate the number of times each event has occurred in the single session. So instead of 100+ rows of data for a read session in the silver table I would end up with just a single aggregated row in the gold table.

Input is the silver delta table:

import pyspark.sql.functions as F
import pyspark.sql.types as T
import pandas as pd
from pyspark.sql.functions import pandas_udf

input = (spark
    .readStream
    .format("delta")
    .option("withEventTimeOrder", "true")
    .option("maxFilesPerTrigger", 100)
    .load(f"path_to_silver_bucket")
)

I use withWatermark and the session_window function to ensure I end up grouping all of the events from a single read session (a read session automatically ends 30 minutes after the last reader activity):

group = input.withWatermark("tracker_timestamp", "10 minutes").groupBy("read_id", F.session_window(input.tracker_timestamp, "30 minutes"))

In the next step I use the applyInPandas function like so:

sessions = group.applyInPandas(processing_function, schema=processing_function_output_schema)

Definition of the processing_function used in applyInPandas:

def processing_function(df):
    surf_time_ms = df.query('event_type == "surf"')['duration'].sum()
    immerse_time_ms = df.query('event_type == "immersion"')['duration'].sum()
    min_timestamp = df['tracker_timestamp'].min()
    max_timestamp = df['tracker_timestamp'].max()
    shares = len(df.query('event_type == "share"'))
    leads = len(df.query('event_type == "lead_store"'))
    is_read = len(df.query('event_type == "surf"')) > 0
    distinct_pages = df['page_id'].nunique()
    data = {
        "read_id": df['read_id'].values[0],
        "surf_time_ms": surf_time_ms,
        "immerse_time_ms": immerse_time_ms,
        "min_timestamp": min_timestamp,
        "max_timestamp": max_timestamp,
        "shares": shares,
        "leads": leads,
        "is_read": is_read,
        "number_of_events": len(df),
        "distinct_pages": distinct_pages
    }
    for field in not_calculated_string_fields:
        data[field] = df[field].values[0]
    new_df = pd.DataFrame(data=data, index=['read_id'])
    for x in all_events:
        new_df[f"count_{x}"] = df.query(f"type == '{x}'").count()
    for x in duration_events:
        duration = df.query(f"event_type == '{x}'")['duration']
        duration_sum = duration.sum()
        new_df[f"duration_{x}_ms"] = duration_sum
        if duration_sum > 0:
            new_df[f"mean_duration_{x}_ms"] = duration.mean()
        else:
            new_df[f"mean_duration_{x}_ms"] = 0
    return new_df

And finally, I'm writing the calculated row to the gold table like so:

for_partitioning = (sessions
    .withColumn("tenant", F.col("story_tenant"))
    .withColumn("year", F.year(F.col("min_timestamp")))
    .withColumn("month", F.month(F.col("min_timestamp"))))

checkpoint_path = "checkpoint-path"
gold_path = f"gold-bucket"

(for_partitioning
    .writeStream
    .format('delta')
    .partitionBy('year', 'month', 'tenant')
    .option("mergeSchema", "true")
    .option("checkpointLocation", checkpoint_path)
    .outputMode("append")
    .start(gold_path))

Can anybody think of a more efficient way to do a UDF in PySpark than applyInPandas for the above example? I simply cannot afford to wait 9 days to reprocess 43GB of data... I've tried playing around with different input and output options (e.g. .option("maxFilesPerTrigger", 100)), but the real problem seems to be applyInPandas.
You could rewrite your processing_function into native Spark if you really wanted. Each pandas expression below is followed by its native Spark equivalent:

"read_id": df['read_id'].values[0]
F.first('read_id').alias('read_id')

"surf_time_ms": df.query('event_type == "surf"')['duration'].sum()
F.sum(F.when(F.col('event_type') == 'surf', F.col('duration'))).alias('surf_time_ms')

"immerse_time_ms": df.query('event_type == "immersion"')['duration'].sum()
F.sum(F.when(F.col('event_type') == 'immersion', F.col('duration'))).alias('immerse_time_ms')

"min_timestamp": df['tracker_timestamp'].min()
F.min('tracker_timestamp').alias('min_timestamp')

"max_timestamp": df['tracker_timestamp'].max()
F.max('tracker_timestamp').alias('max_timestamp')

"shares": len(df.query('event_type == "share"'))
F.count(F.when(F.col('event_type') == 'share', F.lit(1))).alias('shares')

"leads": len(df.query('event_type == "lead_store"'))
F.count(F.when(F.col('event_type') == 'lead_store', F.lit(1))).alias('leads')

"is_read": len(df.query('event_type == "surf"')) > 0
(F.count(F.when(F.col('event_type') == 'surf', F.lit(1))) > 0).alias('is_read')

"number_of_events": len(df)
F.count(F.lit(1)).alias('number_of_events')

"distinct_pages": df['page_id'].nunique()
F.countDistinct('page_id').alias('distinct_pages')

for field in not_calculated_string_fields:
    data[field] = df[field].values[0]
*[F.first(field).alias(field) for field in not_calculated_string_fields]

for x in all_events:
    new_df[f"count_{x}"] = df.query(f"type == '{x}'").count()

The above can probably be skipped? As far as my tests go, the new columns get NaN values, because .count() returns a Series object instead of one simple value.

for x in duration_events:
    duration = df.query(f"event_type == '{x}'")['duration']
    duration_sum = duration.sum()
    new_df[f"duration_{x}_ms"] = duration_sum
    if duration_sum > 0:
        new_df[f"mean_duration_{x}_ms"] = duration.mean()
    else:
        new_df[f"mean_duration_{x}_ms"] = 0
*[F.sum(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"duration_{x}_ms") for x in duration_events]
*[F.mean(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"mean_duration_{x}_ms") for x in duration_events]

So, instead of

def processing_function(df):
    ...
    ...
sessions = group.applyInPandas(processing_function, schema=processing_function_output_schema)

you could use efficient native Spark:

sessions = group.agg(
    F.first('read_id').alias('read_id'),
    F.sum(F.when(F.col('event_type') == 'surf', F.col('duration'))).alias('surf_time_ms'),
    F.sum(F.when(F.col('event_type') == 'immersion', F.col('duration'))).alias('immerse_time_ms'),
    F.min('tracker_timestamp').alias('min_timestamp'),
    F.max('tracker_timestamp').alias('max_timestamp'),
    F.count(F.when(F.col('event_type') == 'share', F.lit(1))).alias('shares'),
    F.count(F.when(F.col('event_type') == 'lead_store', F.lit(1))).alias('leads'),
    (F.count(F.when(F.col('event_type') == 'surf', F.lit(1))) > 0).alias('is_read'),
    F.count(F.lit(1)).alias('number_of_events'),
    F.countDistinct('page_id').alias('distinct_pages'),
    *[F.first(field).alias(field) for field in not_calculated_string_fields],
    # skipped count_{x}
    *[F.sum(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"duration_{x}_ms") for x in duration_events],
    *[F.mean(F.when(F.col('event_type') == x, F.col('duration'))).alias(f"mean_duration_{x}_ms") for x in duration_events],
)
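Wired into the streaming query from the question, this native aggregation simply takes the place of the applyInPandas call; a minimal sketch reusing the names defined above (read_id is already a grouping key, so it comes through without F.first(), and the remaining aggregate expressions from the answer go into the same list):

sessions = (input
    .withWatermark("tracker_timestamp", "10 minutes")
    .groupBy("read_id", F.session_window(F.col("tracker_timestamp"), "30 minutes"))
    .agg(
        F.min("tracker_timestamp").alias("min_timestamp"),
        F.max("tracker_timestamp").alias("max_timestamp"),
        F.count(F.lit(1)).alias("number_of_events"),
        # add the other aggregate expressions from the answer above in the same way
    ))

The downstream for_partitioning and writeStream code from the question can then stay unchanged.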
R estimating one independent variable more than once
I am trying to estimate a multinomial logit model for predicting systemic banking crises with panel data. Below is my code. I have run this code before and it worked fine. However, I tried to change the names of the independent variables and used the new data to run the model again, and ever since then R has been estimating multiple iterations of the x1 variable. When I drop x1, the model estimation turns out fine again. I have attached screenshots of the results: Faulty_result1, Faulty_result_2 and Result_with_x1_dropped. I can't seem to figure out what the issue is. Any help will be much appreciated.

#Remove all items from memory (if any)
rm(list=ls(all=TRUE))

#Set working directory to load files
setwd("D:/PhD/Codes")

#Load necessary libraries
library(readr)
library(nnet)
library(plm)

#Load data
my_data <- read_csv("D:/PhD/Data/xx_Final Data_4.csv",
                    col_types = cols(`Time Period` = col_date(format = "%d/%m/%Y"),
                                     y = col_factor(levels = c("0", "1", "2")),
                                     x2 = col_double(), x5 = col_double(),
                                     x9 = col_double(), x11 = col_double(),
                                     x13 = col_double(), x24 = col_double()),
                    na = "NA")

#Change levels from numeric to character
levels(my_data$y) <- c("Tranquil", "Pre-crisis", "Crisis")
str(my_data$y)

#Create Panel Data
p_data=pdata.frame(my_data)

#Export dataset
write_csv(p_data,"D:/PhD/Data/Clean_Final Data_4.csv")

#Drop unnecessary columns
p <- subset(p_data, select = c(3:27))

#Set reference level
p$y <- relevel(p$y, ref="Tranquil")

#Create Model
model <- multinom(y~ ., data = p)
summary(model)
stargazer::stargazer(model, type = "text")
Extract multiple protein sequences from a Protein Data Bank along with Secondary Structure
I want to extract protein sequences and their corresponding secondary structure from any protein data bank, say RCSB. I just need short sequences and their secondary structure, something like:

ATRWGUVT Helix

It is fine even if the sequences are long, but I want a tag at the end that denotes the secondary structure. Is there any programming tool or anything else available for this? As shown above, I only want this much minimal information. How can I achieve this?
import sys
from Bio.PDB import *
from distutils import spawn

Extract the sequence:

def get_seq(pdbfile):
    p = PDBParser(PERMISSIVE=0)
    structure = p.get_structure('test', pdbfile)
    ppb = PPBuilder()
    seq = ''
    for pp in ppb.build_peptides(structure):
        seq += pp.get_sequence()
    return seq

Extract the secondary structure with DSSP as explained earlier:

def get_secondary_struc(pdbfile):
    # get secondary structure info for the whole pdb
    if not spawn.find_executable("dssp"):
        sys.stderr.write('dssp executable needs to be in folder')
        sys.exit(1)
    p = PDBParser(PERMISSIVE=0)
    ppb = PPBuilder()
    structure = p.get_structure('test', pdbfile)
    model = structure[0]
    dssp = DSSP(model, pdbfile)
    count = 0
    sec = ''
    for residue in model.get_residues():
        count = count + 1
        # print(residue, count)
        a_key = list(dssp.keys())[count - 1]
        sec += dssp[a_key][2]
    print(sec)
    return sec

This should print both the sequence and the secondary structure.
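If the goal is the single coarse tag from the question ("ATRWGUVT Helix"), one option is to collapse the per-residue DSSP codes returned by get_secondary_struc into a Helix/Sheet/Coil label by majority vote. A minimal sketch; the mapping, the example PDB file name and the helper itself are assumptions for illustration, not part of Biopython:

from collections import Counter

# rough grouping of DSSP one-letter codes into coarse classes (assumed mapping)
DSSP_TO_TAG = {"H": "Helix", "G": "Helix", "I": "Helix",
               "E": "Sheet", "B": "Sheet"}

def dominant_structure(sec):
    # count each residue's coarse class and return the most common one
    tags = Counter(DSSP_TO_TAG.get(code, "Coil") for code in sec)
    return tags.most_common(1)[0][0]

seq = get_seq("1abc.pdb")               # hypothetical PDB file
sec = get_secondary_struc("1abc.pdb")
print(seq, dominant_structure(sec))     # e.g. "ATRWGUVT Helix"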
You can use DSSP. The output of DSSP is explained extensively under 'explanation'. The very short summary of the output codes is:

H = α-helix
B = residue in isolated β-bridge
E = extended strand, participates in β-ladder
G = 3-helix (3₁₀ helix)
I = 5-helix (π-helix)
T = hydrogen-bonded turn
S = bend
Join tiles in Corona SDK into one word for a Breakout game grid?
I have a game project to re-implement Breakout. I want to display two words, each word on a line. They are joined by the bricks block. Inside, the top line is the first name, aligned left. The bottom line is the last name, aligned right. They are input from textboxes, and rendered as shown: Each second that passes, the screen will add a configurable number of bricks to the grid (for example, five bricks per second) until the two words appear complete. I displayed a letter of the alphabet which is created from the matrix(0,1). ...But I don’t know how to join them into one word. How can I join these letters? This is what I've gotten so far: Bricks.lua local Bricks = display.newGroup() -- static object local Events = require("Events") local Levels = require("Levels") local sound = require("Sound") local physics = require("physics") local Sprites = require("Sprites") local Func = require("Func") local brickSpriteData = { { name = "brick", frames = {Sprites.brick} }, { name = "brick2", frames = {Sprites.brick2} }, { name = "brick3", frames = {Sprites.brick3} }, } -- animation table local brickAnimations = {} Sprites:CreateAnimationTable { spriteData = brickSpriteData, animationTable = brickAnimations } -- get size from temp object for later use local tempBrick = display.newImage('red_apple_20.png',300,500) --local tempBrick = display.newImage('cheryGreen2.png',300,500) local brickSize = { width = tempBrick.width, height = tempBrick.height } --tempBrick:removeSelf( ) ---------------- -- Rubble -- needs to be moved to its own file ---------------- local rubbleSpriteData = { { name = "rubble1", frames = {Sprites.rubble1} }, { name = "rubble2", frames = {Sprites.rubble2} }, { name = "rubble3", frames = {Sprites.rubble3} }, { name = "rubble4", frames = {Sprites.rubble4} }, { name = "rubble5", frames = {Sprites.rubble5} }, } local rubbleAnimations = {} Sprites:CreateAnimationTable { spriteData = rubbleSpriteData, animationTable = rubbleAnimations } local totalBricksBroken = 0 -- used to track when level is complete local totalBricksAtStart = 0 -- contains all brick objects local bricks = {} local function CreateBrick(data) -- random brick sprite local obj = display.newImage('red_apple_20.png') local objGreen = display.newImage('cheryGreen2.png') obj.name = "brick" obj.x = data.x --or display.contentCenterX obj.y = data.y --or 1000 obj.brickType = data.brickType or 1 obj.index = data.index function obj:Break() totalBricksBroken = totalBricksBroken + 1 bricks[self.index] = nil obj:removeSelf( ) sound.play(sound.breakBrick) end function obj:Update() if(self == nil) then return end if(self.y > display.contentHeight - 20) then obj:Break() end end if(obj.brickType ==1) then physics.addBody( obj, "static", {friction=0.5, bounce=0.5 } ) elseif(obj.brickType == 2) then physics.addBody( objGreen,"static",{friction=0.2, bounce=0.5, density = 1 } ) end return obj end local currentLevel = testLevel -- create level from bricks defined in an object -- this allows for levels to be designed local function CreateBricksFromTable(level) totalBricksAtStart = 0 local activeBricksCount = 0 for yi=1, #level.bricks do for xi=1, #level.bricks[yi] do -- create brick? 
if(level.bricks[yi][xi] > 0) then local xPos local yPos if(level.align == "center") then --1100-((99*16)*0.5) xPos = display.contentCenterX- ((level.columns * brickSize.width) * 0.5/3) + ((xi-1) * level.xSpace)--display.contentCenterX --xPos = 300 +(xi * level.xSpace) yPos = 100 + (yi * level.ySpace)--100 else xPos = level.xStart + (xi * level.xSpace) yPos = level.yStart + (yi * level.ySpace) end local brickData = { x = xPos, y = yPos, brickType = level.bricks[yi][xi], index = activeBricksCount+1 } bricks[activeBricksCount+1] = CreateBrick(brickData) activeBricksCount = activeBricksCount + 1 end end end totalBricks = activeBricksCount totalBricksAtStart = activeBricksCount end -- create bricks for level --> set from above functions, change function to change brick build type local CreateAllBricks = CreateBricksFromTable -- called by a timer so I can pass arguments to CreateAllBricks local function CreateAllBricksTimerCall() CreateAllBricks(Levels.currentLevel) end -- remove all brick objects from memory local function ClearBricks() for i=1, #bricks do bricks[i] = nil end end -- stuff run on enterFrame event function Bricks:Update() -- update individual bricks if(totalBricksAtStart > 0) then for i=1, totalBricksAtStart do -- brick exists? if(bricks[i]) then bricks[i]:Update() end end end -- is level over? if(totalBricksBroken == totalBricks) then Events.allBricksBroken:Dispatch() end end ---------------- -- Events ---------------- function Bricks:allBricksBroken(event) -- cleanup bricks ClearBricks() local t = timer.performWithDelay( 1000, CreateAllBricksTimerCall) --CreateAllBricks() totalBricksBroken = 0 -- play happy sound for player to enjoy sound.play(sound.win) print("You Win!") end Events.allBricksBroken:AddObject(Bricks) CreateAllBricks(Levels.currentLevel) return Bricks Levels.lua local Events = require("Events") local Levels = {} local function MakeLevel(data) local level = {} level.xStart = data.xStart or 100 level.yStart = data.yStart or 100 level.xSpace = data.xSpace or 23 level.ySpace = data.ySpace or 23 level.align = data.align or "center" level.columns = data.columns or #data.bricks[1] level.bricks = data.bricks --> required return level end Levels.test4 = MakeLevel { bricks = { {0,2,0,0,2,0,0,2,0}, {0,0,2,0,2,0,2,0,0}, {0,0,0,0,2,0,0,0,0}, {1,1,2,1,1,1,2,1,1}, {0,0,0,0,1,0,0,0,0}, {0,0,0,0,1,0,0,0,0}, {0,0,0,0,1,0,0,0,0}, } } Levels.test5 = MakeLevel { bricks = { {0,0,0,1,0,0,0,0}, {0,0,1,0,1,0,0,0}, {0,0,1,0,1,0,0,0}, {0,1,0,0,0,1,0,0}, {0,1,1,1,1,1,0,0}, {1,0,0,0,0,0,1,0}, {1,0,0,0,0,0,1,0}, {1,0,0,0,0,0,1,0}, {1,0,0,0,0,0,1,0} } } -- Levels.test6 = MakeLevel2 -- { -- bricks = -- { ----A "a" = {{0,0,0,1,0,0,0,0}, -- {0,0,1,0,1,0,0,0}, -- {0,0,1,0,1,0,0,0}, -- {0,1,0,0,0,1,0,0}, -- {0,1,1,1,1,1,0,0}, -- {1,0,0,0,0,0,1,0}, -- {1,0,0,0,0,0,1,0}, -- {1,0,0,0,0,0,1,0}, -- {1,0,0,0,0,0,1,0}}, ----B -- "b" = {{1,1,1,1,0,0,0}, -- {1,0,0,0,1,0,0}, -- {1,0,0,0,1,0,0}, -- {1,0,0,0,1,0,0}, -- {1,1,1,1,0,0,0}, -- {1,0,0,0,1,0,0}, -- {1,0,0,0,0,1,0}, -- {1,0,0,0,0,1,0}, -- {1,1,1,1,1,0,0}}, --........... --....... --... 
-- --Z -- "z"= {{1,1,1,1,1,1,1,0}, -- {0,0,0,0,0,1,0,0}, -- {0,0,0,0,1,0,0,0}, -- {0,0,0,0,1,0,0,0}, -- {0,0,0,1,0,0,0,0}, -- {0,0,1,0,0,0,0,0}, -- {0,0,1,0,0,0,0,0}, -- {0,1,0,0,0,0,0,0}, -- {1,1,1,1,1,1,1,0}} -- } -- } -- stores all levels in ordered table so that one can be selected randomly by index Levels.levels = { --Levels.test4, Levels.test5 -- Levels.test6, } function Levels:GetRandomLevel() return self.levels[math.random(#Levels.levels)] end Levels.notPlayedYet = {} Levels.currentLevel = Levels:GetRandomLevel() -- Events function Levels:allBricksBroken(event) self.currentLevel = Levels:GetRandomLevel() end Events.allBricksBroken:AddObject(Levels) return Levels The work I've done thus far (same as above) as an external download: http://www.mediafire.com/download/1t89ftkbznkn184/Breakout2.rar
In the interest of actually answering the question: I'm not 100% sure what you mean by "How can I join these letters", but from poking through the code I have a guess, so please clarify whether it is accurate, or if I am wrong about what you wanted.

Scenario 1

You haven't successfully achieved the image illustrated in the screenshot - you've been able to draw one letter, but not multiple ones.

In this case, you'll need a better understanding of what your code is doing. The CreateBricksFromTable function takes a Level object, which is created by the MakeLevel function from a table with a bricks property; that property is a table of tables representing rows with columns in them, showing what type of brick should be at each position. In your commented-out level, you have created a table where the bricks field contains a field for each letter, but the MakeLevel function still expects a bricks field that directly contains the grid of blocks.

You will have to - as it seems you attempted - create a MakeWordLevel function (or the like) that takes this letter list and a word for each line, and constructs a larger grid by copying the appropriate letters into it. StackOverflow is not your programming tutor, and an SO question is not the right forum for having people write code for you or getting into step-by-step details of how to do this, but I'll leave you a basic outline. Your function would look something like this:

local function MakeWordLevel(data, line1, line2)
    local level = {}
    ...
    return level
end

And then it would have to:

Populate all of the same properties that MakeLevel does
Calculate how wide (level.columns) the level should be with all the letters
Create a table in the same format as the bricks property, but big enough to hold all of the letters
Go through the input strings (line1 and line2), find the correct letter data from what is now the test6 array, and copy that data into the large table
Assign that table as level.bricks

This question is already a bit outside of what StackOverflow is intended for, in that it asks how to implement a feature rather than how to achieve a small, specific programming task, so any further follow-up should take place in a chatroom - perhaps the Hello World room would be helpful.

Scenario 2 (this was my original guess, but after considering and reading past edits, I doubt this is answering the right question)

You may want a solid "background" of, say, red blocks surrounding your letters, making the field into a solid "wall" with the name in a different color. And you may want these bricks to slowly show up a few at a time.

In that case, the main thing you need to do is keep track of which spaces are "taken" by the name bricks. There are many ways to do this, but I would start with a matrix - as big as the final playing field - full of 0's. Then, as you add the bricks for the name, set a 1 at the (x, y) location in that matrix corresponding to that block's coordinate. When you fill in the background, each time you go to add a block at a coordinate, check that "taken" matrix first - if the spot is taken (1), just skip it and move on to the next coordinate. This works whether you fill in the background blocks sequentially (say, left to right, top to bottom) or add them randomly. With random placement, you'd also keep updating the "taken" matrix so you don't try to add a block twice.
The random fill-in, however, presents its own problem - it will keep taking longer to fill in as it goes, because it'll find more and more "taken" blocks and have to pick a new one. There are solutions to this, of course, but I won't go too far down that road when I don't know if that's even what you want.
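The same bookkeeping idea can be sketched in a few lines. The snippet below is in Python purely for illustration (the Corona/Lua version would follow the same logic); all names, the grid size, and the "collect the free cells first, then sample" fix for the slow random fill are assumptions of the sketch, not part of the game code above:

import random

ROWS, COLS = 9, 16                          # assumed playing-field size
taken = [[0] * COLS for _ in range(ROWS)]   # 1 = occupied by a name brick

def place_name_brick(row, col):
    # called for every brick that belongs to the two name words
    taken[row][col] = 1

def fill_background(count):
    # add `count` background bricks, but only on cells not used by the name
    free = [(r, c) for r in range(ROWS) for c in range(COLS) if not taken[r][c]]
    for r, c in random.sample(free, min(count, len(free))):
        taken[r][c] = 1                     # mark so the cell is never picked twice
        # create the actual background brick at (r, c) here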
I don't really understand (or read, for that matter) your code, but from what I see, joining the letters into complete words is easy. You have two possibilities. You can "render" them directly into your level/display data by simply copying them to the appropriate places, like this:

-- The level data.
local level = {}

-- Create the level data.
for row = 1, 25, 1 do
    local rowData = {}
    for column = 1, 80, 1 do
        rowData[column] = "."
    end
    level[row] = rowData
end

-- Now let us setup the letters.
local letters = {
    A = {
        {".",".",".","#",".",".",".","."},
        {".",".","#",".","#",".",".","."},
        {".",".","#",".","#",".",".","."},
        {".","#",".",".",".","#",".","."},
        {".","#","#","#","#","#",".","."},
        {"#",".",".",".",".",".","#","."},
        {"#",".",".",".",".",".","#","."},
        {"#",".",".",".",".",".","#","."},
        {"#",".",".",".",".",".","#","."}
    },
    B = {
        {"#","#","#","#",".",".","."},
        {"#",".",".",".","#",".","."},
        {"#",".",".",".","#",".","."},
        {"#",".",".",".","#",".","."},
        {"#","#","#","#",".",".","."},
        {"#",".",".",".","#",".","."},
        {"#",".",".",".",".","#","."},
        {"#",".",".",".",".","#","."},
        {"#","#","#","#","#",".","."}
    }
}

-- The string to print.
local text = "ABBA"

-- Let us insert the data into the level data.
for index = 1, #text, 1 do
    local char = string.sub(text, index, index)
    local charData = letters[char]
    local offset = index * 7
    for row = 1, 9, 1 do
        local rowData = charData[row]
        for column = 1, 7, 1 do
            level[row][offset + column] = rowData[column]
        end
    end
end

-- Print everything
for row = 1, 25, 1 do
    local rowData = level[row]
    for column = 1, 80, 1 do
        io.write(rowData[column])
    end
    print()
end

You save your letters in a lookup table and then copy them, piece by piece, into the level data. Here I replaced the numbers with dots and number signs to make it prettier on the command line.

Alternatively, you can also "render" the words into a prepared buffer and then insert that into the level data using the same logic.