Sort a QTreeWidgetItem by a data value?

Is it possible to sort a PyQt QTreeWidget by a QTreeWidgetItem's data column?
For example, I've got a list of directories displayed in the QTreeWidget that I want to sort by size on disk. Instead of displaying the raw result (in bytes), I use a method to convert the directory size to megabytes/gigabytes, but I still want to be able to sort the items by the actual byte value, which I've stored in the QTreeWidgetItem's data slot.

Yeah, I've run into this a lot. I've always thought that, theoretically, doing something like:
item = QTreeWidgetItem()
item.setData(0, Qt.EditRole, QVariant(1024))
item.setData(0, Qt.DisplayRole, '1 Kb')
SHOULD work... unfortunately, it does not. The only real way to get it to work is to subclass QTreeWidgetItem and either store your sort information your own way or use Qt.UserRole instead of the EditRole or DisplayRole:
class MyTreeWidgetItem(QTreeWidgetItem):
    def __init__(self, *args):
        super(MyTreeWidgetItem, self).__init__(*args)
        self._sortData = {}

    def __lt__(self, other):
        if not isinstance(other, MyTreeWidgetItem):
            return super(MyTreeWidgetItem, self).__lt__(other)

        tree = self.treeWidget()
        column = tree.sortColumn() if tree else 0
        return self.sortData(column) < other.sortData(column)

    def sortData(self, column):
        # Fall back to the display text when no sort data has been set.
        return self._sortData.get(column, self.text(column))

    def setSortData(self, column, data):
        self._sortData[column] = data
So using it is similar to before, but it actually allows sorting via the custom data:
item = MyTreeWidgetItem()
item.setSortData(0, 1024)
item.setText(0, '1 Kb')
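For completeness, here is a minimal sketch of how the subclass might be wired into a tree. It assumes PyQt4 (matching the QVariant usage above); the window setup and the sample sizes are illustrative, not part of the original answer:

from PyQt4.QtCore import Qt
from PyQt4.QtGui import QApplication, QTreeWidget

app = QApplication([])
tree = QTreeWidget()
tree.setColumnCount(1)
tree.setSortingEnabled(True)

# Display human-readable sizes, but sort by the raw byte counts.
for size, label in [(1024, '1 Kb'), (1048576, '1 Mb'), (512, '512 b')]:
    item = MyTreeWidgetItem(tree)
    item.setText(0, label)
    item.setSortData(0, size)

tree.sortItems(0, Qt.AscendingOrder)  # yields 512 b, 1 Kb, 1 Mb
tree.show()
app.exec_()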

Related

multiprocessing a geopandas.overlay() throws no error but seemingly never completes

I'm trying to pass a geopandas.overlay() to multiprocessing to speed it up.
I have used custom functions and functools to partially fill function inputs and then pass the iterative component to the function to produce a series of dataframes that I then concat into one.
from multiprocessing import Pool, cpu_count
import functools
import pandas as pd

def taska(id, points, crs):
    # make_break_points is a custom function defined elsewhere
    return make_break_points((points[points.ID == id]).reset_index(drop=True), crs)

points_gdf = ...  # GeoDataFrame of points with an "ID" field
grid_gdf = ...    # GeoDataFrame polygon grid

partialA = functools.partial(taska, points=points_gdf, crs=grid_gdf.crs)
partialA_results = []
with Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialA, list(points_gdf.ID.unique())):
        partialA_results.append(results)
bpts_gdf = pd.concat(partialA_results)
In the example above I use the list of unique values to subset the df and pass it to a processor to perform the function and return the results. In the end all the results are combined using pd.concat.
When I apply the same approach to a list of dataframes created using numpy.array_split(), the process starts with a number of processors, then they all close and everything hangs, with no indication that work is being done or that it will ever exit.
import numpy as np
import geopandas as gpd

def taskc(tracks, grid):
    return gpd.overlay(tracks, grid, how='union').explode().reset_index(drop=True)

tracks_gdf = ...  # GeoDataFrame of points with an "ID" field
dfs = np.array_split(tracks_gdf, (cpu_count() - 4))
grid_gdf = ...    # GeoDataFrame polygon grid

partialC_results = []
partialC = functools.partial(taskc, grid=grid_gdf)
with Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialC, dfs):
        partialC_results.append(results)
results_df = pd.concat(partialC_results)
I tried using with get_context('spawn').Pool(cpu_count() - 4) as pool: based on the information at https://pythonspeed.com/articles/python-multiprocessing/, with no change in behavior.
Additionally, if I simply run geopandas.overlay(tracks_gdf, grid_gdf) the process is successful and the script carries on to the end with expected results.
Why does the partial function approach work on a list of items but not a list of dataframes?
Is the numpy.array_split() not an iterable object like a list?
How can I pass a single df into geopandas.overlay() in chunks to utilize multiprocessing capabilities and get back a single dataframe or a series of dataframes to concat?
This is my workaround, but I am also interested in whether there is a better way to perform this and similar tasks. Essentially, I modified the partial function so the df split is moved inside it, and then I create a list of values from range() as my iterable.
from multiprocessing import get_context

def taskc(num, tracks, grid):
    return gpd.overlay(np.array_split(tracks, cpu_count() - 4)[num], grid,
                       how='union').explode().reset_index(drop=True)

partialC = functools.partial(taskc, tracks=tracks_gdf, grid=grid_gdf)
dfrange = list(range(0, cpu_count() - 4))
partialC_results = []
with get_context('spawn').Pool(cpu_count() - 4) as pool:
    for results in pool.map(partialC, dfrange):
        partialC_results.append(results)
results_gdf = pd.concat(partialC_results)
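As a side note on the earlier question: numpy.array_split() does return a plain Python list, so iterability is not the issue. A standalone check (not from the original post):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(10)})
chunks = np.array_split(df, 3)
print(type(chunks), len(chunks), type(chunks[0]))
# <class 'list'> 3 <class 'pandas.core.frame.DataFrame'>

A plausible suspect for the hang (an assumption, not verified here) is the cost of serializing large GeoDataFrames when they are shipped to the worker processes.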

How to select two random values from two strings if both strings contain the same values

Two drop-downs:
1) Array _ depart city contains N cities:
aaa,bbb,ccc,ddd,eee,fff,......nnn
2) Array _ arrival city contains N cities:
aaa,bbb,ccc,ddd,eee,fff,......nnn
I want to select two cities from the two strings randomly, but the two cities should not match.
You can include the following in a JSR223 Sampler:
def cities = ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "test"]

// Remove a random city and assign it to dep_city
cities.shuffle()
def dep_city = cities.pop()

// Remove a random city and assign it to arrival_city
cities.shuffle()
def arrival_city = cities.pop()

// Set the JMeter variables
vars.put("dep_city", dep_city)
vars.put("arrival_city", arrival_city)

SampleResult.setIgnore() // No sample result is generated
Groovy is used for the scripting. shuffle() randomly reorders the elements of the list, and pop() removes the first element, so after a shuffle it yields a random city.
First of all, I don't think your approach is correct: a test needs to be repeatable, and your "random" logic may lead to a situation where one test run reveals a performance problem and the next one doesn't, because the data is different.
So maybe it makes more sense to consider using parameterization instead, i.e. put all the cities into a CSV file and use the CSV Data Set Config to read them.
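For example, a hypothetical cities.csv (not from the original answer) could hold one pair per line, with the CSV Data Set Config's "Variable Names" field set to dep_city,arr_city:

aaa,bbb
ccc,ddd
eee,fff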
If you really want your test to use random data and you have these "arrays" in the form of strings like in your question, you can implement the randomization using any suitable JSR223 test element and example code like:
def dep_array = 'aaa,bbb,ccc,ddd,eee,fff'
def arr_array = 'aaa,bbb,ccc,ddd,eee,fff'

def getRandomCity(String cities, Object prev) {
    def array = cities.split(',')
    def rv = array[org.apache.commons.lang3.RandomUtils.nextInt(0, array.size())]
    if (prev != null) {
        if (rv == prev) {
            // Re-draw until the pick differs from the previous city
            rv = getRandomCity(array.join(','), prev)
        }
    }
    return rv
}

def dep_city = getRandomCity(dep_array, null)
def arr_city = getRandomCity(arr_array, dep_city)

vars.put('dep_city', dep_city)
vars.put('arr_city', arr_city)
You will be able to access the values as ${dep_city} and ${arr_city} later on where required
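As a usage illustration (the endpoint is hypothetical), an HTTP Request sampler could then reference them in its path:

/flights?from=${dep_city}&to=${arr_city}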

Conversion of mutable collections to immutable introduces performance penalty

I've encountered a strange behavior regarding conversion of mutable collections to immutable ones, which might significantly affect performance.
Let's take a look at the following code:
val map: Map[String, Set[Int]] = createMap()

while (true) {
  map.get("existing-key")
}
It simply creates a map once and then repeatedly accesses one of its entries, which contains a set as its value. The map may be created in several ways:
With immutable collections:
def createMap() = keys.map(key => key -> (1 to amount).toSet).toMap
Or with mutable collections (note the two conversion options at the end):
def createMap() = {
  val map = mutable.Map[String, mutable.Set[Int]]()
  for (key <- keys) {
    val set = map.getOrElseUpdate(key, mutable.Set())
    for (i <- 1 to amount) {
      set.add(i)
    }
  }
  map.toMap.mapValues(_.toSet) // option #1
  map.mapValues(_.toSet).toMap // option #2
}
Curiously enough, option #1 creates a map that invokes toSet on a value every time get is called (if the entry exists), which may introduce a significant performance hit, depending on the use case.
Why is this happening? How can this be avoided?
mapValues simply returns a map view which maps every key of this map to f(this(key)). The resulting map wraps the original map without copying any elements.
Looking at the implementation, mapValues returns an instance of MappedValues, which overrides the get function:
def get(key: K) = self.get(key).map(f)
If you want to force the materialization of the map, call toMap after the mapValues call. Just like you did in #2!
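A minimal sketch of the difference, assuming Scala 2.12-style collections (where mapValues returns a lazy view); the names are illustrative:

import scala.collection.mutable

val m = mutable.Map("a" -> mutable.Set(1, 2, 3))

// Option #1 shape: mapValues last, so the result is a lazy view and
// _.toSet runs again on every successful get.
val lazyMap = m.toMap.mapValues(_.toSet)

// Option #2 shape: toMap last, which materializes the view once;
// get is then a plain lookup.
val strictMap = m.mapValues(_.toSet).toMap

lazyMap.get("a")   // converts the set again on each access
strictMap.get("a") // no conversion on access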

Graphlab: How to avoid manually duplicating functions that differ only by a string variable?

I imported my dataset with SFrame:
products = graphlab.SFrame('amazon_baby.gl')
products['word_count'] = graphlab.text_analytics.count_words(products['review'])
I would like to do sentiment analysis on a set of words shown below:
selected_words = ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate']
Then I would like to create a new column in the products matrix for each of the selected words, where the entry is the number of times the word occurs, so I created a function for the word "awesome":
def awesome_count(word_count):
    if 'awesome' in word_count:
        return word_count['awesome']
    else:
        return 0

products['awesome'] = products['word_count'].apply(awesome_count)
So far so good, but I would need to manually create more functions like this for each of the selected words, e.g., great_count, etc. How can I avoid this manual effort and write cleaner code?
I think the SFrame.unpack command should do the trick. In fact, the limit parameter will accept your list of selected words and keep only these results, so that part is greatly simplified.
I don't know precisely what's in your reviews data, so I made a toy example:
# Create the data and convert to bag-of-words.
import graphlab

products = graphlab.SFrame({'review': ['this book is awesome',
                                       'I hate this book']})
products['word_count'] = \
    graphlab.text_analytics.count_words(products['review'])

# Unpack the bag-of-words into separate columns.
selected_words = ['awesome', 'hate']
products2 = products.unpack('word_count', limit=selected_words)

# Fill in zeros for the missing values.
for word in selected_words:
    col_name = 'word_count.{}'.format(word)
    products2[col_name] = products2[col_name].fillna(value=0)
I also can't help but point out that GraphLab Create does have its own sentiment analysis toolkit, which could be worth checking out.
I actually found an easier way to do this:
def wordCount_select(wc, selectedWord):
    if selectedWord in wc:
        return wc[selectedWord]
    else:
        return 0

for word in selected_words:
    products[word] = products['word_count'].apply(lambda wc: wordCount_select(wc, word))

Sorting a Lua table by key

I have gone through many questions and Google results but couldn't find a solution.
I am trying to sort a table using the table.sort function in Lua, but I can't figure out how to use it.
I have a table whose keys are random numeric values, and I want to sort the entries in ascending order of those keys. I have gone through the Lua wiki page as well, but table.sort only works with table values.
t = { [223]="asd", [23]="fgh", [543]="hjk", [7]="qwe" }
I want it like:
t = { [7]="qwe", [23]="fgh", [223]="asd", [543]="hjk" }
You cannot set the order in which the elements are retrieved from the hash (which is what your table is) using pairs. You need to get the keys from that table, sort the keys as its own table, and then use those sorted keys to retrieve the values from your original table:
local t = { [223]="asd", [23]="fgh", [543]="hjk", [7]="qwe" }
local tkeys = {}
-- populate the table that holds the keys
for k in pairs(t) do table.insert(tkeys, k) end
-- sort the keys
table.sort(tkeys)
-- use the keys to retrieve the values in the sorted order
for _, k in ipairs(tkeys) do print(k, t[k]) end
This will print
7 qwe
23 fgh
223 asd
543 hjk
Another option would be to provide your own iterator instead of pairs to iterate the table in the order you need, but the sorting of the keys may be simple enough for your needs.
What @lhf said is true: your Lua table holds its contents in whatever order the implementation finds feasible. However, if you want to print it (or iterate over it) in a sorted manner, that is possible (so you can compare it element by element). You can do it in the following way:
for key, value in orderedPairs(mytable) do
  print(string.format("%s:%s", key, value))
end
Unfortunately, orderedPairs is not provided as part of Lua; you can copy the implementation from here, though.
The Lua sort docs provide a good solution
local function pairsByKeys(t, f)
  local a = {}
  for n in pairs(t) do table.insert(a, n) end
  table.sort(a, f)
  local i = 0                -- iterator variable
  local iter = function ()   -- iterator function
    i = i + 1
    if a[i] == nil then return nil
    else return a[i], t[a[i]]
    end
  end
  return iter
end
Then you traverse the sorted structure
local t = { b=1, a=2, z=55, c=0, qa=53, x=8, d=7 }
for key, value in pairsByKeys(t) do
  print(" " .. tostring(key) .. "=" .. tostring(value))
end
There is no notion of order in Lua tables: they are just sets of key-value pairs.
The two tables below have exactly the same contents because they contain exactly the same pairs:
t = { [223]="asd", [23]="fgh", [543]="hjk", [7]="qwe" }
t = { [7]="qwe", [23]="fgh", [223]="asd", [543]="hjk" }
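A quick way to convince yourself (a small check, not part of the original answer):

local t1 = { [223]="asd", [23]="fgh", [543]="hjk", [7]="qwe" }
local t2 = { [7]="qwe", [23]="fgh", [223]="asd", [543]="hjk" }

-- every pair in one table is present in the other
for k, v in pairs(t1) do assert(t2[k] == v) end
for k, v in pairs(t2) do assert(t1[k] == v) end
print("same contents")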
