I am working on grouping the elements of a large dataset in the least amount of time.
Below is a small example dataset:
Input:
pairs = [[1,2],[8,10],[4,8],[3,4],[2,3],[5,6],[6,7],[9,4],[11,12],[13,10],[16,1],[78,79],[61,3],[93,94]]
How to derive the output:
If two pairs share an element, they belong to the same group. Traverse all pairs this way and build the resulting groups.
Output:
[{1, 2, 3, 4, 8, 9, 10, 13, 16, 61}, {5, 6, 7}, {11, 12}, {78, 79}, {93, 94}]
Question:
What is the best way to solve this problem, e.g. a graph database (neo4j) or a specific, readily available algorithm?
Are there any other alternatives beyond the two examples above (neo4j, an algorithm)?
The solution should produce the output shown above and take as little time as possible on a large dataset.
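The grouping rule described in the question amounts to finding connected components, and one readily available algorithm for that is union-find (disjoint-set union). The following is a minimal sketch, not a benchmarked solution; the function name group_pairs and the sorting of the result by smallest member are assumptions made for illustration.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group elements linked, directly or transitively, by a shared pair."""
    parent = {}  # maps each element to its parent in the union-find forest

    def find(x):
        # Register unseen elements as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    # Merge the two groups that each pair connects.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each group under its root.
    groups = defaultdict(set)
    for element in list(parent):
        groups[find(element)].add(element)
    # Sorting by smallest member is only for a stable, readable order.
    return sorted(groups.values(), key=min)


pairs = [[1, 2], [8, 10], [4, 8], [3, 4], [2, 3], [5, 6], [6, 7], [9, 4],
         [11, 12], [13, 10], [16, 1], [78, 79], [61, 3], [93, 94]]
print(group_pairs(pairs))
```

Each pair is processed once and find runs in near-constant amortized time, so the sketch scales roughly linearly with the number of pairs, apart from the final sort.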
Based on the documentation here, https://github.com/huggingface/transformers/blob/v4.21.3/src/transformers/modeling_outputs.py#L101, how can I read all the outputs: last_hidden_state, pooler_output and hidden_states? In my sample code below, I get the outputs:
from transformers import BertModel, BertConfig
config = BertConfig.from_pretrained("xxx", output_hidden_states=True)
model = BertModel.from_pretrained("xxx", config=config)
outputs = model(inputs)
When I print the output (sample below), I can see the values, but I'm not sure of the type. I looked through the documentation to see whether this class has a function that returns just the last_hidden_state values.
The value of last_hidden_state is
tensor([[...
Is it a class, a tuple or an array?
How can I get the values as a plain array, such as
[0, 1, 2, 3, ...]
BaseModelOutputWithPoolingAndNoAttention(
last_hidden_state=tensor([
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
...
hidden_states= ...
The BaseModelOutputWithPoolingAndCrossAttentions you retrieve is a class that inherits from OrderedDict and holds PyTorch tensors. You can access the keys of the OrderedDict like properties of a class and, in case you do not want to work with tensors, you can convert them to Python lists or NumPy arrays. Please have a look at the example below:
from transformers import BertTokenizer, BertModel
t = BertTokenizer.from_pretrained("bert-base-cased")
m = BertModel.from_pretrained("bert-base-cased")
i = t("This is a test", return_tensors="pt")
o = m(**i, output_hidden_states=True)
print(o.keys())
print(type(o))
print(type(o.last_hidden_state))
print(o.last_hidden_state.tolist())
print(o.last_hidden_state.detach().numpy())
Output:
odict_keys(['last_hidden_state', 'pooler_output', 'hidden_states'])
<class 'transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions'>
<class 'torch.Tensor'>
[[[0.36328405141830444, 0.018902940675616264, 0.1893523931503296, ..., 0.09052444249391556, 1.4617693424224854, 0.0774402841925621]]]
[[[ 0.36328405 0.01890294 0.1893524 ... -0.0259465 0.38701165
0.19099694]
[ 0.30656984 -0.25377586 0.76075834 ... 0.2055152 0.29494798
0.4561815 ]
[ 0.32563183 0.02308523 0.665546 ... 0.34597045 -0.0644953
0.5391255 ]
[ 0.3346715 -0.02526359 0.12209094 ... 0.50101244 0.36993945
0.3237842 ]
[ 0.18683438 0.03102166 0.25582778 ... 0.5166369 -0.1238729
0.4419385 ]
[ 0.81130844 0.4746894 -0.03862225 ... 0.09052444 1.4617693
0.07744028]]]
I'm quite new to Power Query. I have a date column called MyDate (format dd/mm/yy) and another column called TotalSales. Is there any way to obtain a column TotalSalesYTD that holds the year-to-date sum of TotalSales for each row? I've seen that you can do this in Power Pivot or Power BI, but I didn't find anything for Power Query.
Alternatively, is there a way to create a column TotalSales12M holding the rolling sum of TotalSales over the last 12 months?
I wasn't able to test this properly, but the following code gave me your expected result:
let
    initialTable = Table.FromRows({
        {#date(2020, 5, 1), 150},
        {#date(2020, 4, 1), 20},
        {#date(2020, 3, 1), 54},
        {#date(2020, 2, 1), 84},
        {#date(2020, 1, 1), 564},
        {#date(2019, 12, 1), 54},
        {#date(2019, 11, 1), 678},
        {#date(2019, 10, 1), 885},
        {#date(2019, 9, 1), 54},
        {#date(2019, 8, 1), 98},
        {#date(2019, 7, 1), 654},
        {#date(2019, 6, 1), 45},
        {#date(2019, 5, 1), 64},
        {#date(2019, 4, 1), 68},
        {#date(2019, 3, 1), 52},
        {#date(2019, 2, 1), 549},
        {#date(2019, 1, 1), 463},
        {#date(2018, 12, 1), 65},
        {#date(2018, 11, 1), 45},
        {#date(2018, 10, 1), 68},
        {#date(2018, 9, 1), 65},
        {#date(2018, 8, 1), 564},
        {#date(2018, 7, 1), 16},
        {#date(2018, 6, 1), 469},
        {#date(2018, 5, 1), 4}
    }, type table [MyDate = date, TotalSales = Int64.Type]),
    ListCumulativeSum = (numbers as list) as list =>
        let
            accumulator = (listState as list, toAdd as nullable number) as list =>
                let
                    previousTotal = List.Last(listState, 0),
                    combined = listState & {List.Sum({previousTotal, toAdd})}
                in combined,
            accumulated = List.Accumulate(numbers, {}, accumulator)
        in accumulated,
    TableCumulativeSum = (someTable as table, columnToSum as text, newColumnName as text) as table =>
        let
            values = Table.Column(someTable, columnToSum),
            cumulative = ListCumulativeSum(values),
            columns = Table.ToColumns(someTable) & {cumulative},
            toTable = Table.FromColumns(columns, Table.ColumnNames(someTable) & {newColumnName})
        in toTable,
    yearToDateColumn =
        let
            groupKey = Table.AddColumn(initialTable, "$groupKey", each Date.Year([MyDate]), Int64.Type),
            grouped = Table.Group(groupKey, "$groupKey", {"toCombine", each
                let
                    sorted = Table.Sort(_, {"MyDate", Order.Ascending}),
                    cumulative = TableCumulativeSum(sorted, "TotalSales", "TotalSalesYTD")
                in cumulative
            }),
            combined = Table.Combine(grouped[toCombine]),
            removeGroupKey = Table.RemoveColumns(combined, "$groupKey")
        in removeGroupKey,
    rolling = Table.AddColumn(yearToDateColumn, "TotalSales12M", each
        let
            inclusiveEnd = [MyDate],
            exclusiveStart = Date.AddMonths(inclusiveEnd, -12),
            filtered = Table.SelectRows(yearToDateColumn, each [MyDate] > exclusiveStart and [MyDate] <= inclusiveEnd),
            sum = List.Sum(filtered[TotalSales])
        in sum
    ),
    sortedRows = Table.Sort(rolling, {{"MyDate", Order.Descending}})
in
    sortedRows
There might be more efficient ways to do what this code does, but if your data is relatively small, this approach should be okay.
For the year-to-date cumulative, the data is grouped by year, each group is sorted by date in ascending order, and a running total column is added.
For the rolling 12-month total, a 12-month window ending at the row's date is built for every row, and the sales within that window are totaled. The totaling is a bit inefficient (all rows are re-processed for each window, rather than only those that have entered or left it), but you might not notice it.
Table.Range could have been used instead of Table.SelectRows when creating the 12-month windows, but I figured Table.SelectRows makes fewer assumptions about the input data (e.g. whether it's sorted, whether any months are missing) and is therefore safer and more robust.
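To make the two calculations easier to follow outside of M, here is a rough Python sketch of the same logic: a running total within each calendar year, and a sum over the window from 12 months before the row's date (exclusive) up to the row's date (inclusive). The names one_year_before and ytd_and_rolling are made up for this illustration, and clamping 29 February to 28 February is an assumption about how the window start should behave.

```python
from datetime import date
from itertools import accumulate, groupby


def one_year_before(d):
    """Same day one year earlier; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def ytd_and_rolling(rows):
    """rows: list of (date, sales) tuples.

    Returns (date, sales, ytd, rolling_12m) tuples, newest date first.
    """
    ordered = sorted(rows)  # ascending by date
    result = []
    # Group by calendar year, then keep a running total inside each year.
    for _, year_rows in groupby(ordered, key=lambda row: row[0].year):
        year_rows = list(year_rows)
        running = accumulate(sales for _, sales in year_rows)
        for (day, sales), ytd in zip(year_rows, running):
            # Window: strictly after `start`, up to and including `day`.
            start = one_year_before(day)
            rolling = sum(s for d, s in ordered if start < d <= day)
            result.append((day, sales, ytd, rolling))
    return sorted(result, reverse=True)


sample = [
    (date(2020, 2, 1), 84),
    (date(2020, 1, 1), 564),
    (date(2019, 12, 1), 54),
    (date(2019, 2, 1), 549),
    (date(2019, 1, 1), 463),
]
for row in ytd_and_rolling(sample):
    print(row)
```

Like the M code, this rescans every row for each window, so it is only suitable for relatively small data.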
I am trying to find all the possible differences between the elements of one list.
For example:
x=[1,4,10,17,20,35].
I would like to have as an answer an array:
y=[3, 9, 16, 19, 34, 3, 6, 13, 16, 31, 9, 6, 7, 10, 25, 16, 13, 7, 3, 18, 19, 16, 10, 3, 15, 34, 31, 25, 18, 15]
corresponding to
[1-4, 1-10, 1-17, 1-20, 1-35, 4-1, 4-10, 4-17, ....]
I have tried to do that with diff, but I only get the differences between consecutive numbers, and I do not really know how to compute it in a loop.
Can you please help?
A Python 1-liner:
>>> [abs(a - b) for i,a in enumerate(x) for j,b in enumerate(x) if i != j]
[3, 9, 16, 19, 34, 3, 6, 13, 16, 31, 9, 6, 7, 10, 25, 16, 13, 7, 3, 18, 19, 16, 10, 3, 15, 34, 31, 25, 18, 15]
A solution written in Python:
elements = [1,4,10,17,20,35]
differences = []
for i, element in enumerate(elements):
    for j, element2 in enumerate(elements):
        # skip comparing an element with itself
        if i != j:
            differences.append(abs(element - element2))
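The nested loops above visit every ordered pair of distinct positions, which is what itertools.permutations(values, 2) yields, in the same order. A minimal sketch of that variant, with the helper name pairwise_abs_differences made up for illustration:

```python
from itertools import permutations


def pairwise_abs_differences(values):
    """Absolute difference for every ordered pair of distinct positions."""
    return [abs(a - b) for a, b in permutations(values, 2)]


x = [1, 4, 10, 17, 20, 35]
print(pairwise_abs_differences(x))
```

Because permutations works on positions rather than values, repeated numbers in the list are handled the same way as with the i != j check.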
In Java, it is as simple as this:
// list is the input List<Integer>; needs java.util.List and java.util.ArrayList
List<Integer> diff = new ArrayList<Integer>();
for (int i = 0; i < list.size(); i++) {
    for (int j = 0; j < list.size(); j++) {
        if (i != j) {
            diff.add(Math.abs(list.get(i) - list.get(j)));
        }
    }
}
I have the hours 1 - 24 in an array as plain integers. I want to know how to format the x axis so that those values are displayed as times, 12:00 am - 12:00 pm. I can't seem to figure out how to do it.
Here's one way you could do it.
var hours = [
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
    13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
];
var timeformat = d3.time.format('%I:%M%p');
var now = new Date();
now.setHours(hours[15]); // hours[15] is 16
now.setMinutes(0);
var time = timeformat(now); // 04:00PM