aiofile vs. aiofiles - python-asyncio

I want to open files from asynchronous processes and I've noticed two different modules that could be used: aiofiles and aiofile. However, I don't seem to find information about pros and cons between the two.
Would anyone have information about their main differences? Is either of them more widely used than the other?
Thanks

They do mostly the same things; currently the aiofiles library supports a bit more than aiofile, such as temporary files.
Current download numbers (pypistats.org):
aiofile: 91,971 downloads last month
aiofiles: 4,732,892 downloads last month
Personally, I would use aiofiles because of those stats and the additional features.
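If it helps, here is a minimal sketch of what basic aiofiles usage looks like (the filename and contents are just placeholders, not anything from your project):

    import asyncio

    import aiofiles  # pip install aiofiles


    async def main():
        # Write a small file without blocking the event loop.
        async with aiofiles.open("notes.txt", mode="w") as f:
            await f.write("hello from aiofiles\n")

        # Read it back asynchronously.
        async with aiofiles.open("notes.txt", mode="r") as f:
            contents = await f.read()
        print(contents)


    asyncio.run(main())

As far as I know, aiofile's API is similar in spirit (it exposes an async_open helper), so switching between the two later shouldn't be a big rewrite either way.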


Extracting the serialized data from unknown files

My dearest stackoverflowers,
I want to access the serialized data contained in files with extensions that are strange to me. The bulk of the data seems to be in a .st and an .idt file.
The program is meant to be run on Windows, and the unix file command gives me only false positives. Any ideas on either what these extensions mean or on how to investigate and extract their contents?
Below I provide the entire list of extensions in the hope that somebody recognizes them. Googling also gives me false positives; for example, .st is commonly used for ATARI emulation files.
Thanks in advance!
.cix
.cmp
.cnt
.dam
.das
.drf
.idt
.irc
.lxp
.mp
.mbr
.str
.vlf
.rpf
.st
Some general advice on how to approach this:
One way to approach this is to use a site like http://filext.com/ to try to figure out where the files came from. This can be tough, because it's not like there's a file extension standard anywhere - anyone can use any extension, so you're going to have a lot of conflicts/disambiguation issues to solve.
Sometimes you can get lucky: if you open the files in a plain text editor, you can occasionally see readable string data, which can help identify the general sort of data a file contains and therefore narrow down the possible sources for it. For example, I have often helped people who received an email attachment with no extension figure out what file type it was using this technique, then add the file extension and open it in the appropriate program.
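If you want to apply that "readable strings" trick programmatically rather than eyeballing each file in an editor, a rough sketch like the following works; it mimics the Unix strings utility, and the fallback filename is just one of the extensions from your list, not a real path:

    import re
    import sys

    def printable_strings(path, min_len=4):
        # Pull out runs of printable ASCII (min_len or more characters) from a
        # binary file -- handy for spotting vendor names, format markers, or
        # field labels inside an unknown file.
        with open(path, "rb") as f:
            data = f.read()
        pattern = rb"[\x20-\x7e]{%d,}" % min_len
        return [m.decode("ascii") for m in re.findall(pattern, data)]

    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else "sample.idt"  # placeholder name
        for s in printable_strings(path):
            print(s)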
There are also sites like http://www.oldversion.com/ that keep old versions of programs that you can (typically) download for free. This is especially helpful if the data you're working with was created 5+ years ago in a proprietary program, and that program is no longer available/purchasable from the vendor who created it.
Once you have a good idea of which files belong to which programs, you're probably going to spend a lot of time trying to find online resources describing the structure of those files. If that isn't available, or you can get a copy of the original program but it won't open the files you're interested in (or you still want raw access to the data), then try generating some sample output files with data that you input, and go Rosetta Stone on it, comparing your known file to the original file.
From there, the additional knowledge you'll probably want is what language/compiler the software was written in, which can give you a lead on what code libraries were used to serialize the data in the first place. Once you know all that, it's a matter of reading through any available documentation on the serialization process and then writing a deserializer.
The one thing this technique won't solve: if you're dealing with corrupt/truncated data files, it may be very difficult to tell the difference between damaged data and simply having the file structure wrong. The "Rosetta Stone" technique might be helpful in that case.
Depending on how many different pieces of source software you're talking about, this sounds like a pretty big project. Good luck!

software for organizing text

(I suspect that the question may not belong here, as it's about software and not about programming. However, this is my computing community, and I trust you guys to refer me elsewhere if you think it's not appropriate to answer it here.)
So,
I'm writing a lot. Text. For myself. Diaries, ideas, insights, observations. It always comes in the form of passages, one passage at a time.
Until now I've been writing in Word documents, organizing them into different documents by rough category, and in chronological order.
I've figured out that this is way suboptimal. I could have more, and I do need more.
I'm looking for a software that will allow me to:
1 - tag passages
2 - store date and time automatically (created and edited)
3 - powerful full text search
4 - besides the above, I'd like it to have as much word processing capabilities as possible
Any recommendations for software that has all this?
Now, I don't need this to be online. I'm doing this for myself and don't want it published anywhere. I figure, however, that many web platforms may have much of what I need, so I don't automatically reject the possibility of using one for my offline needs.
Thanks guys
Gidi
You could install WordPress or any other suitable blogging software locally and have your own private blog. It lets you write passages as short as you like, and you can tag them, categorize them, and search them. It keeps track of when each entry was created and edited, and you can probably add a fair amount of word-processing capability via plugins. And you could put it online whenever you wanted to.
There is a bit of installation overhead required (probably XAMPP), though.

Do I need an ETL?

We currently use DataStage ETL to export a CSV/text file with data from 15 tables (3 different schemas) on a daily basis.
I am wondering if there is a simpler way to accomplish this without using an ETL. I tried Scriptella; it looks simple and fast, but again, it is an ETL. Please suggest.
We use Python. Every programming language -- every single one ever invented -- is an alternative to an ETL.
You never need an ETL.
The questions are these:
Which is cheaper to build? Custom software or a configuration of an ETL?
Which is cheaper to maintain and operate?
Which is easier to adapt to changing requirements?
Why not use a free and easy-to-use ETL tool such as expressor Studio? You can download it at http://www.expressorstudio.com.
My 2 cents.
DataStage is an awful tool, and it's expensive to license.
SSIS is much simpler, or cloverETL is good.
ETL tool vs code is a good question.
ETL tools often have better performance, as they can queue data up ready to be used, whereas hand-written code is generally going to process it one piece at a time; DataStage can also do this in parallel (but again, I think it blows). Plus, ETL tools can get data from multiple heterogeneous sources, whereas you can't do this (easily) with code.
However, if the data transformations are all to be done with data on the same server, I generally end up doing as much in SQL/T-SQL (or PL/SQL) as possible, as it is just far easier to debug and maintain. Primary keys and foreign keys are your friends, and any missed lookups can be caught by checking counts later on to ensure data integrity is in order.
You do not need an ETL tool for that purpose. You can perform all the tasks using Python, right from extracting data from CSVs/XMLs/text files, to transforming data (identifying data types, handling null values), to loading it into tables.
https://towardsdatascience.com/python-etl-tools-best-8-options-5ef731e70b49
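To make that concrete, the daily export itself is only a few lines of Python. This is just a sketch: the table names are hypothetical, and sqlite3 is a stand-in for whatever driver your actual database needs (pyodbc, cx_Oracle, psycopg2, ...):

    import csv
    import sqlite3  # stand-in driver; use the right one for your database


    # Hypothetical table names standing in for the real 15 tables; with a real
    # driver these could be schema-qualified, e.g. "schema_a.customers".
    TABLES = ["customers", "orders", "invoices"]


    def export_table(conn, table, out_dir="."):
        # Dump one table to <table>.csv with a header row.
        cur = conn.execute(f"SELECT * FROM {table}")
        with open(f"{out_dir}/{table}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])
            writer.writerows(cur)


    if __name__ == "__main__":
        conn = sqlite3.connect("warehouse.db")  # placeholder connection
        for table in TABLES:
            export_table(conn, table)
        conn.close()

Schedule something like that with cron (or Task Scheduler on Windows) and you have the daily job without any ETL platform to license or maintain.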
ETL can definitely be performed without the help of ETL tools.
For example, we can develop Python scripts, or there are open-source options like Drift to work with.
I think it's better to use a cheap ETL tool for your task, because ETL tools generally work better than hand-written code and make your task easier.
ETL Tool vs. Manual Script
“According to the IT research firm Forrester, the low-code development platform market will reach a value of $21.2 billion by 2022, growing at an annual rate of 40 percent. What’s more, 45 percent of developers have already used a low-code platform or expect to do so in the near future.”

JavaScript source code analysis (specifically duplication checking)

Partial duplicate of this
Notes:
I already use JSLint extensively via a tool I wrote that scans my current project directory at intervals for recently updated/created .js files. It has drastically improved my productivity, and I doubt there is anything as good as JSLint for the price (it's free).
That said, is there any analysis tool out there that can find repetitive or near-duplicate code blocks, the goal being to make it easier to find opportunities to consolidate large files or small/medium sized projects?
It may not be exactly what you're after, but Google's JavaScript optimizer is worth a look.
Our CloneDR is a tool for finding exact and near-miss cloned code blocks for a variety of languages. It will find duplicates in the same file or across literally thousands of files if you have them. You don't have to provide it with any guidance; it can find the cloned code by itself. And it won't be fooled by different indentation or line breaks, or even consistent renaming of identifiers.
It does support JavaScript, even if it isn't clear from the website.
You can see sample clone reports for a variety of languages at the website.
You may want to have a look at jsinspect.
jsinspect ./src
It will print a list of code blocks that are either identical or structurally very similar.
And there's also jscpd.

Perform a batch of queries on a set of Shark performance logs?

I've been using Shark to benchmark a (very large) application and have a set of features I drill down into each time (e.g., focus on one function and remove stacks containing particular other functions to determine the milliseconds spent on a particular feature in that run). So far, so good.
I'd like to write a script that takes in a bunch of shark session files and outputs the results of these queries for each file: is there a way to programmatically interact with Shark, or perhaps a way to understand the session log format?
Thanks!
I think this will be tricky unless you can reverse-engineer the Shark data files. The only other possibility I can think of is to export the profiles as text and manipulate those (obviously this only works if there's enough info in the exported text to do what you need to do).
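For example, if the exported text report has one line per symbol with a numeric time or sample column, a rough post-processing script could look like the sketch below; the report layout and the function name are assumptions on my part, so you'd adjust the parsing to whatever Shark actually emits:

    import re
    import sys

    def total_for_symbol(report_path, symbol):
        # Sum the first numeric value on every line mentioning the symbol of
        # interest. Assumes the exported report puts a time/percentage figure
        # on the same line as the symbol name -- adjust if the format differs.
        total = 0.0
        with open(report_path) as f:
            for line in f:
                if symbol in line:
                    match = re.search(r"\d+(?:\.\d+)?", line)
                    if match:
                        total += float(match.group(0))
        return total

    if __name__ == "__main__":
        # e.g. python shark_totals.py run1.txt run2.txt run3.txt
        for path in sys.argv[1:]:
            print(path, total_for_symbol(path, "MyExpensiveFunction"))  # hypothetical symbol name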
I would also suggest asking the question again on Apple's PerfOptimization-dev mailing list (PerfOptimization-dev@lists.apple.com); there are a number of Apple engineers on that list who can usually come up with good advice when it comes to performance and the Apple CHUD tools, etc.
