I need to copy existing private S3 files to another directory, but my process is too slow: on my local machine, each file_get_contents() call takes about 2 seconds per file.
Most of the jobs I process involve 50+ files, so that works out to roughly 2 seconds × 50, which is not a great user experience to wait through. What might be the best approach to refactor this? A queue is not really an option at the moment.
foreach ($sourceAttachmentFiles as $sourceAttachmentFile) {
    $newFullFileName = $newDirectory.$sourceAttachmentFile->filename;

    // ~2 seconds on every iteration
    $content = file_get_contents($sourceAttachmentFile->getS3SignedUrl());

    Storage::disk('s3')->put($newFullFileName, $content, 'private');
}
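One possible refactor, sketched below, is to let S3 copy the objects server-side instead of downloading each file over its signed URL and re-uploading it. This assumes the source and destination live on the same 's3' disk, and the getS3Path() accessor returning the object's key is hypothetical; Laravel's Storage::copy() delegates to Flysystem, whose S3 adapter issues a CopyObject request, so the file contents never pass through PHP.

// Sketch only: getS3Path() is a placeholder for however your model exposes
// the object's key on the 's3' disk.
foreach ($sourceAttachmentFiles as $sourceAttachmentFile) {
    $newFullFileName = $newDirectory.$sourceAttachmentFile->filename;

    // Server-side copy: S3 duplicates the object itself.
    Storage::disk('s3')->copy($sourceAttachmentFile->getS3Path(), $newFullFileName);
    Storage::disk('s3')->setVisibility($newFullFileName, 'private');
}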
I have a shell script that collects some data and sends it to a destination. Part of the data should be copied every 5 minutes and the rest every 20 minutes. How can this be achieved in a single script? As of now I'm collecting the data every 5 minutes by scheduling it with cron.
Best practice would be to use two separate files with two different cron entries, for example the crontab shown below. If you need to reuse part of your code, consider extracting it into functions.
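A minimal sketch, assuming the two parts are split into scripts named collect_5min.sh and collect_20min.sh (placeholder names):

# m    h  dom mon dow  command
*/5    *  *   *   *    /path/to/collect_5min.sh
*/20   *  *   *   *    /path/to/collect_20min.sh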
If you still want to do it in a single file, run it every 5 minutes and on each run check whether the 20-minute part should also be executed.
modulo=$(( $(date +%_M) % 20 ))

# do whatever has to be done every 5 min
# [...]

# check whether the current minute is a multiple of 20
if [ "$modulo" -eq 0 ]; then
    echo "Current minute is $(date +%_M), must execute part 2"
    # whatever has to be done every 20 min
else
    : # do nothing
fi
The reason the modulo variable is defined on the first line is that the 5-minute work can take longer than a minute to run; by the time it finishes, the minute may no longer be 20 but 21. Capturing the value up front is a safeguard against that.
I need to develop a streaming application which reads session logs from several sources.
The batch interval will be on the order of 5 minutes.
The problem is that the files I get in each batch vary enormously in size: in one batch I may get a file of around 10 MB, and in another batch files of around 20 GB.
I want to know if there is any approach to handle this. Is there any limit on the size of the RDDs a file stream can generate for each batch?
Can I limit Spark Streaming to read only a fixed amount of data into the RDD in each batch?
As far as I know there is no direct way to limit that. Which files are considered is controlled by the private isNewFile function in FileInputDStream. Based on that code, I can think of one workaround:
Use the filter function to limit the number of files read per batch: for any file beyond the first 10, return false and use the touch command to update its timestamp so it is considered again in the next window.
import org.apache.hadoop.fs.Path

var globalcounter = 10

val filterF: Path => Boolean = (file: Path) => {
  globalcounter -= 1
  if (globalcounter >= 0) {
    true  // consider only the first 10 files of this window
  } else {
    // touch the file here so its timestamp is updated and it is
    // picked up again in a later window
    false
  }
}
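A rough sketch of wiring that filter into the stream, assuming a text-file source; the directory, application name and 5-minute batch interval are placeholders:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("session-logs"), Minutes(5))

// fileStream accepts a Path => Boolean filter, so filterF above can cap
// how many new files are admitted per batch.
val lines = ssc
  .fileStream[LongWritable, Text, TextInputFormat]("hdfs:///logs/incoming", filterF, newFilesOnly = true)
  .map(_._2.toString)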
I am trying to get to grips with Perl. I am writing a few scripts as a disk-scheduling simulator: FCFS, SSTF, SCAN and LOOK.
I have one array with a list of block requests and another that acts as the buffer. First I copy over the first request, then I need to work out the time it takes to get from the first block to the second.
The buffer reads in blocks at 1 per ms; seek, search and access time are all 1 ms to make the calculations a bit easier, and the simulator always starts on block 1, track 1.
http://postimg.org/image/d9osb8tkj/
So if the first block is 5, the search time will be 3 ms to traverse to the start of the 5th block, the seek time will be zero as it is on the same track, and the access time to read the block will always be 1 ms. The time for this request is therefore 4 ms, so the simulator will read the next 4 requests into the buffer. In first come, first served this is simply the order in which the requests arrive.
So if the next request to serve is 12, the arm is at the end of the 5th block, so it will take 2 ms to get to the right track, then 1 ms to get to the start of the 12th block and another 1 ms to access it.
I was just wondering if anyone could give me some idea of how I could express this as an algorithm. Just some pointers would be much appreciated.
Write a class, HardDiskSim::Abstract, with three subs: seek_time(), spin_time() and read_time().
Write a subclass of HardDiskSim::Abstract for each different set of values/logic for the three methods.
For example:
package HardDiskSim::Simple;
use base qw(HardDiskSim::Abstract);

our $SECTORS_PER_TRACK   = 5;
our $SEEK_TIME_PER_TRACK = 1;

sub read_time { return 1 }

sub seek_time {
    my ($block) = @_;
    my $tracks_to_seek = int($block / $SECTORS_PER_TRACK);
    return $tracks_to_seek * $SEEK_TIME_PER_TRACK;
}

sub spin_time {
    # compute head position at end of seek using seek time and RPM of disk
    # compute number of sectors to spin past using computed head position
    # return number_of_sectors_to_spin_past * time_per_sector
}
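A rough, self-contained FCFS driver sketch built on those methods; the base class is stubbed inline and HardDiskSim::Simple is repeated so the file runs on its own, and the request list and the flat 1 ms spin_time placeholder are assumptions, not part of the answer above.

#!/usr/bin/perl
use strict;
use warnings;

{ package HardDiskSim::Abstract; }          # empty inline stub for the base class

package HardDiskSim::Simple;
use parent -norequire, 'HardDiskSim::Abstract';
our $SECTORS_PER_TRACK   = 5;
our $SEEK_TIME_PER_TRACK = 1;
sub read_time { return 1 }
sub spin_time { return 1 }                  # placeholder: flat 1 ms search time
sub seek_time {
    my ($block) = @_;
    return int($block / $SECTORS_PER_TRACK) * $SEEK_TIME_PER_TRACK;
}

package main;
my @requests = (5, 12, 3, 20);              # FCFS: serve in arrival order
my $total_ms = 0;
for my $block (@requests) {
    $total_ms += HardDiskSim::Simple::seek_time($block)
               + HardDiskSim::Simple::spin_time($block)
               + HardDiskSim::Simple::read_time($block);
}
print "Total service time: ${total_ms}ms\n";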
I had the fun of writing this kind of code in Fortran, for a class, back in 1985.
I've seen developers having this problem for a few years now. I have studied many forums and the official POI documentation; nonetheless, I haven't found an answer yet.
So the problem is: I have tried the following two snippets:
Workbook wb = WorkbookFactory.create(new File("spreadsheet.xlsx"));
and
File file = new File("C:\\spreadsheet.xlsx");
OPCPackage opcPackage = OPCPackage.open(file.getAbsolutePath());
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);
and either approach takes about 5-6 minutes (if the application doesn't run out of memory) to process a simple and fairly small spreadsheet.xlsx file (200 KB).
What do I need to do to fix this? (I'm using Apache POI 3.9)
/*****************************/
The process takes a long time in the following location:
public class XSSFSheet extends POIXMLDocumentPart implements Sheet{
...
protected void read(InputStream is) throws IOException {
try {
-->>> worksheet = WorksheetDocument.Factory.parse(is).getWorksheet();
} catch (XmlException e){
throw new POIXMLException(e);
}
}
...
I can't debug any further. VisualVM points to the same spot!
One factor that might be contributing to the load time is that data has been pasted into the worksheet in a way that makes the used range include every row, i.e. when you check the sheet's used-range row count it returns more than 1,000,000 rows. I'm not sure how this happens, but I found I needed an intermediate step in which, prior to loading the workbook, I 'cleaned' it using some VBA script. The workbook has around 20 sheets of around 5,000 rows each, each filled out by a different part of the business, and it takes a fairly long time (maybe 4 minutes) to load, but that is acceptable in this case. Before I added the cleaning stage it ran for over 30 minutes, which was not acceptable.
A user runs the process I am referring to by pressing two buttons. The first cleans, the second does the rest. The first step is triggered using Runtime.getRuntime().exec() and creates an empty text file; the second step will not run unless that text file is there.
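A rough sketch of that two-button handshake; the paths and the external cleaner command are placeholders, not the actual setup:

import java.io.File;
import java.io.IOException;

import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class CleanThenLoad {
    // Hypothetical flag file dropped by the cleaning step.
    private static final File FLAG = new File("C:\\poi\\cleaned.flag");

    // Button 1: run the external cleaning script, then drop the flag file.
    static void clean() throws IOException, InterruptedException {
        Process cleaner = Runtime.getRuntime()
                .exec(new String[] { "cscript", "C:\\poi\\clean-workbook.vbs" }); // placeholder command
        if (cleaner.waitFor() == 0) {
            FLAG.createNewFile();
        }
    }

    // Button 2: refuse to load the workbook until the flag file exists.
    static Workbook load() throws Exception {
        if (!FLAG.exists()) {
            throw new IllegalStateException("Workbook has not been cleaned yet");
        }
        return WorkbookFactory.create(new File("C:\\spreadsheet.xlsx"));
    }
}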
I'm posting this question here because I'm not sure it's a WordPress issue.
I'm running XAMPP on my local system, with 512 MB max memory and a 2.5-hour PHP timeout. I'm importing about 11,000 records into the WordPress wp_users and wp_usermeta tables via a custom script. The only unknown quantity (performance-wise) on the WordPress end is the wp_insert_user and update_user_meta calls; otherwise it's a straight CSV import.
The process of importing 11,000 users and creating 180,000 usermeta entries took over 2 hours to complete, at about 120 records a minute. That seems awfully slow.
Are there known performance issues importing user data into WordPress? A quick Google search was unproductive (for me).
Are there settings I should be tweaking beyond the timeout in XAMPP? Is its MySQL implementation notoriously slow?
I've read something about virus software dramatically slowing down XAMPP. Is this a myth?
Yes, there are a few differences between local and hosted setups. One of the important things to remember is PHP's max_execution_time; you may need to reset the timer once in a while during the data upload.
I suppose you have a loop that takes the data row by row from a CSV file, for example, and uses an SQL query to insert it into the WP database. I usually put this simple snippet into my loop so it keeps resetting PHP's max_execution_time:
$counter = 1;

if (($handle = fopen("some-file.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        // some upload query: mysql_query(...) with the row data goes here

        // snippet: every 20 loops, reset the execution timer and the counter
        if ($counter == 20) {
            set_time_limit(0);
            $counter = 0;
        }
        $counter = $counter + 1;
    } // end of the loop
    fclose($handle);
}
Also, by the way, 512 MB of room is not much if the database is big. Count how many resources your OS and all running apps are taking. I have a WP database of over 2 GB and my MySQL needs a lot of RAM to run fast (it depends on the queries you are using as well).
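If MySQL's RAM is the bottleneck, one knob that is commonly raised in XAMPP's MySQL config (typically xampp\mysql\bin\my.ini) is the InnoDB buffer pool; the value below is purely illustrative and should be sized to whatever the machine can spare:

[mysqld]
# illustrative value; size to available RAM and restart MySQL afterwards
innodb_buffer_pool_size = 512M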