How to apply partial sort on a Spark DataFrame?

The following code:
val myDF = Seq(83, 90, 40, 94, 12, 70, 56, 70, 28, 91).toDF("number")
myDF.orderBy("number").limit(3).show
outputs:
+------+
|number|
+------+
| 12|
| 28|
| 40|
+------+
Does Spark's laziness in combination with the limit call and the implementation of orderBy automatically result in a partially sorted DataFrame, or are the remaining 7 numbers also sorted, even though it's not needed? And if so, is there a way to avoid this needless computational work?
Using .explain() shows that two sort stages are performed, first on each partition and then (with the top 3 from each) a global one. But it does not state whether these sorts are full or partial.
myDF.orderBy("number").limit(3).explain(true)
== Parsed Logical Plan ==
GlobalLimit 3
+- LocalLimit 3
   +- Sort [number#3416 ASC NULLS FIRST], true
      +- Project [value#3414 AS number#3416]
         +- LocalRelation [value#3414]
== Analyzed Logical Plan ==
number: int
GlobalLimit 3
+- LocalLimit 3
   +- Sort [number#3416 ASC NULLS FIRST], true
      +- Project [value#3414 AS number#3416]
         +- LocalRelation [value#3414]
== Optimized Logical Plan ==
GlobalLimit 3
+- LocalLimit 3
   +- Sort [number#3416 ASC NULLS FIRST], true
      +- LocalRelation [number#3416]
== Physical Plan ==
TakeOrderedAndProject(limit=3, orderBy=[number#3416 ASC NULLS FIRST], output=[number#3416])
+- LocalTableScan [number#3416]

If you explain() your DataFrame, you'll find that Spark will first do a "local" sort within each partition, and then pick only the top three elements from each for a final global sort before taking the top three out of that.
scala> myDF.orderBy("number").limit(3).explain(true)
== Parsed Logical Plan ==
GlobalLimit 3
+- LocalLimit 3
   +- Sort [number#3 ASC NULLS FIRST], true
      +- Project [value#1 AS number#3]
         +- LocalRelation [value#1]
== Analyzed Logical Plan ==
number: int
GlobalLimit 3
+- LocalLimit 3
   +- Sort [number#3 ASC NULLS FIRST], true
      +- Project [value#1 AS number#3]
         +- LocalRelation [value#1]
== Optimized Logical Plan ==
GlobalLimit 3
+- LocalLimit 3
   +- Sort [number#3 ASC NULLS FIRST], true
      +- LocalRelation [number#3]
== Physical Plan ==
TakeOrderedAndProject(limit=3, orderBy=[number#3 ASC NULLS FIRST], output=[number#3])
+- LocalTableScan [number#3]
I think it's best seen in the Optimized Logical Plan section, but the physical plan says the same thing.

myDF.orderBy("number").limit(3).show
myDF.limit(3).orderBy("number").show
1 => will sort the full DataFrame and then pick the first 3 elements.
2 => will take the first 3 rows of the DataFrame and sort only those.
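To make the behaviour of TakeOrderedAndProject concrete, here is a plain-Python sketch of the strategy (an illustration only, not Spark's actual code): each partition keeps at most k rows at a time, and only those per-partition survivors are merged for the final result, so the remaining rows are never fully sorted.

import heapq

def take_ordered(partitions, k):
    # Sketch of a TakeOrderedAndProject-style top-k: nsmallest scans each
    # partition while keeping a bounded heap of at most k elements, so no
    # partition is ever sorted in full.
    survivors = []
    for part in partitions:
        survivors.extend(heapq.nsmallest(k, part))
    # Merge the per-partition winners (at most k * numPartitions values).
    return heapq.nsmallest(k, survivors)

partitions = [[83, 90, 40, 94, 12], [70, 56, 70, 28, 91]]
print(take_ordered(partitions, 3))  # [12, 28, 40]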

Data structure to achieve random delete and insert where elements are weighted in [a,b]

I would like to design a data structure and algorithm such that, given an array of elements, where each element has a weight in [a, b], I can achieve constant-time insertion and deletion. The deletion is performed randomly, where the probability of an element being deleted is proportional to its weight.
I do not believe there is a deterministic algorithm that can achieve both operations in constant time, but I think there are randomized algorithms that can accomplish this.
I don't know if O(1) worst-case time is impossible; I don't see any particular reason it should be. But it's definitely possible to have a simple data structure which achieves O(1) expected time.
The idea is to store a dynamic array of pairs (or two parallel arrays), where each item is paired with its weight. Insertion is done by appending in O(1) amortised time, and an element can be removed by index by swapping it with the last element, so that it can be removed from the end of the array in O(1) time. To sample a random element from the weighted distribution, choose a random index and generate a random number in the half-open interval [0, 2); if it is less than the element's weight, select the element at that index, otherwise repeat until an element is selected. The idea is that each index is equally likely to be chosen, and the probability that it is kept rather than rejected is proportional to its weight.
This is a Las Vegas algorithm, meaning it is expected to complete in a finite time, but with very low probability it can take arbitrarily long to complete. The number of iterations required to sample an element will be highest when every weight is exactly 1, in which case it follows a geometric distribution with parameter p = 1/2, so its expected value is 2, a constant which is independent of the number of elements in the data structure.
In general, if all weights are in an interval [a, b] for real numbers 0 < a <= b, then the expected number of iterations is at most b/a. This is always a constant, but it is potentially a large constant (i.e. it takes many iterations to select a single sample) if the lower bound a is small relative to b.
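A minimal Python sketch of this structure (naming is mine; the upper bound b is passed in so the rejection threshold can be drawn from [0, b)):

import random

class WeightedSampler:
    # Weights are assumed to lie in [a, b] with 0 < a <= b.
    def __init__(self, b):
        self.b = b
        self.items = []   # (element, weight) pairs, in arbitrary order

    def insert(self, element, weight):
        # O(1) amortised: just append.
        self.items.append((element, weight))

    def delete(self, i):
        # O(1): overwrite index i with the last pair, then shrink the array.
        self.items[i] = self.items[-1]
        self.items.pop()

    def sample_index(self):
        # Rejection sampling: expected number of rounds is at most b/a.
        while True:
            i = random.randrange(len(self.items))
            if random.uniform(0, self.b) < self.items[i][1]:
                return i

A weighted random deletion is then sample_index() followed by delete(), both in O(1) expected time.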
This is not an answer per se, just a tiny example to illustrate the algorithm devised by @kaya3.
| value | weight |
| v1    |    1.0 |
| v2    |    1.5 |
| v3    |    1.5 |
| v4    |    2.0 |
| v5    |    1.0 |
| total |    7.0 |
The total weight is 7.0. It's easy to maintain in O(1) by storing it in some memory and increasing/decreasing at each insertion/removal.
The probability of each element is simply its weight divided by the total weight.
| value | proba |           |
| v1    | 1.0/7 | 0.1428... |
| v2    | 1.5/7 | 0.2142... |
| v3    | 1.5/7 | 0.2142... |
| v4    | 2.0/7 | 0.2857... |
| v5    | 1.0/7 | 0.1428... |
Using the algorithm of @kaya3, if we draw a random index, then the probability of drawing each value is 1/size (1/5 here).
The chance of being rejected is 50% for v1 and v5, 25% for v2 and v3, and 0% for v4. So in the first round, the probabilities of being selected are:
| value | proba |       |
| v1    |  2/20 | 0.10  |
| v2    |  3/20 | 0.15  |
| v3    |  3/20 | 0.15  |
| v4    |  4/20 | 0.20  |
| v5    |  2/20 | 0.10  |
| total | 14/20 | (70%) |
Then the probability of needing a 2nd round is 30%, and the probability of each index in that round is 6/20 / 5 = 3/50:
| value | proba 2 rounds |       |
| v1    | 2/20 + 6/200   | 0.130 |
| v2    | 3/20 + 9/200   | 0.195 |
| v3    | 3/20 + 9/200   | 0.195 |
| v4    | 4/20 + 12/200  | 0.260 |
| v5    | 2/20 + 6/200   | 0.130 |
| total | 14/20 + 42/200 | (91%) |
The probability of needing a 3rd round is 9%, that is 9/500 for each index:
| value | proba 3 rounds            |         |
| v1    | 2/20 + 6/200 + 18/2000    | 0.1390  |
| v2    | 3/20 + 9/200 + 27/2000    | 0.2085  |
| v3    | 3/20 + 9/200 + 27/2000    | 0.2085  |
| v4    | 4/20 + 12/200 + 36/2000   | 0.2780  |
| v5    | 2/20 + 6/200 + 18/2000    | 0.1390  |
| total | 14/20 + 42/200 + 126/2000 | (97.3%) |
So we see that the series converges to the correct probabilities. The numerators are multiples of the weights, so it's clear that the relative weight of each element is respected.
This is a sketch of an answer.
With weights only 1, we can maintain a random permutation of the inputs.
Each time an element is inserted, put it at the end of the array, then pick a random position i in the array, and swap the last element with the element at position i.
(It may well be a no-op if the random position turns out to be the last one.)
When deleting, just delete the last element.
Assuming we can use a dynamic array with O(1) (worst case or amortized) insertion and deletion, this does both insertion and deletion in O(1).
With weights 1 and 2, a similar structure may be used.
Perhaps each element of weight 2 should be put in twice instead of once.
Perhaps when an element of weight 2 is deleted, its other copy should also be deleted.
So we should in fact store indices instead of the elements, plus another array, locations, which tracks the two indices of each element. The swaps must keep this locations array up to date.
Deleting an arbitrary element can be done in O(1) similarly to inserting: swap with the last one, delete the last one.
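A rough Python sketch of this idea (my own naming; for simplicity it deletes a uniformly random slot directly rather than maintaining the random permutation described above, which has the same effect):

import random

class TwoWeightBag:
    # A weight-2 element occupies two slots and a weight-1 element one slot,
    # so a uniformly random slot hits an element with probability
    # proportional to its weight.
    def __init__(self):
        self.slots = []       # element ids, one slot per unit of weight
        self.locations = {}   # element id -> set of its slot indices

    def insert(self, eid, weight):
        # weight must be 1 or 2
        self.locations[eid] = set()
        for _ in range(weight):
            self.locations[eid].add(len(self.slots))
            self.slots.append(eid)

    def _remove_slot(self, i):
        # Swap slot i with the last slot, then pop; keep locations in sync.
        last = len(self.slots) - 1
        self.locations[self.slots[i]].discard(i)
        if i != last:
            moved = self.slots[last]
            self.locations[moved].discard(last)
            self.slots[i] = moved
            self.locations[moved].add(i)
        self.slots.pop()

    def delete_random(self):
        # Remove every slot of the chosen element, highest index first,
        # so earlier removals don't shift the later ones.
        eid = self.slots[random.randrange(len(self.slots))]
        for i in sorted(self.locations[eid], reverse=True):
            self._remove_slot(i)
        del self.locations[eid]
        return eid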

How do I get one row for every Min or Max on every column of a dataframe in Pyspark efficiently?

I'm trying to reduce a big dataset to rows having minimum and maximum values for each column. In other words, I would like, for every column of this dataset to get one row that has the minimum value on that column, as well as another that has the maximum value on the same column. I should mention that I do not know in advance what columns this dataset will have. Here's an example:
+----+----+----+      +----+----+----+
|Col1|Col2|Col3|  ==> |Col1|Col2|Col3|
+----+----+----+      +----+----+----+
| F  | 99 | 17 |      | A  | 34 | 25 |
| M  | 32 | 20 |      | Z  | 51 | 49 |
| D  | 2  | 84 |      | D  | 2  | 84 |
| H  | 67 | 90 |      | F  | 99 | 17 |
| P  | 54 | 75 |      | C  | 18 | 9  |
| C  | 18 | 9  |      | H  | 67 | 90 |
| Z  | 51 | 49 |      +----+----+----+
| A  | 34 | 25 |
+----+----+----+
The first row is selected because A is the smallest value on Col1. The second because Z is the largest value on Col1. The third because 2 is the smallest on Col2, and so on. The code below seems to do the right thing (correct me if I'm wrong), but performance is sloooow. I start with getting a dataframe from a random .csv file:
input_file = (sqlContext.read
    .format("csv")
    .options(header="true", inferSchema="true", delimiter=";", charset="UTF-8")
    .load("/FileStore/tables/random.csv")
)
Then I create two other dataframes that each have one row with the min and respectively, max values of each column:
from pyspark.sql.functions import col, min, max

min_values = input_file.select(
    *[min(col(col_name)).name(col_name) for col_name in input_file.columns]
)
max_values = input_file.select(
    *[max(col(col_name)).name(col_name) for col_name in input_file.columns]
)
Finally, I repeatedly join the original input file to these two dataframes holding minimum and maximum values, using every column in turn, and do a union between all the results.
min_max_rows = (
    input_file
    .join(min_values, input_file[input_file.columns[0]] == min_values[input_file.columns[0]])
    .select(input_file["*"]).limit(1)
    .union(
        input_file
        .join(max_values, input_file[input_file.columns[0]] == max_values[input_file.columns[0]])
        .select(input_file["*"]).limit(1)
    )
)

for c in input_file.columns[1:]:
    min_max_rows = min_max_rows.union(
        input_file
        .join(min_values, input_file[c] == min_values[c])
        .select(input_file["*"]).limit(1)
        .union(
            input_file
            .join(max_values, input_file[c] == max_values[c])
            .select(input_file["*"]).limit(1)
        )
    )

min_max_rows.dropDuplicates()
For my test dataset of 500k rows, 40 columns, doing all this takes about 7-8 minutes on a standard Databricks cluster. I'm supposed to sift through more than 20 times this amount of data regularly. Is there any way to optimize this code? I'm quite afraid I've taken the naive approach to it, since I'm quite new to Spark.
Thanks!
Does not seem to be a popular question, but an interesting one (for me). And a lot of work for 15 pts. In fact I got it wrong the first time round.
Here is a scalable solution that you can partition accordingly to increase throughput.
Hard to explain: manipulation of the data and transposing it is the key issue here - and some lateral thinking.
I did not focus on variable columns with all sorts of data types. That you need to solve yourself; it can be done, but some if/else logic is required to check whether a value is alpha, double, or numeric. Mixing data types and applying operations to them gets problematic, but it can be solved. I gave a notion of num_string, but did not complete that.
I have focused on the scalability issue and the approach, with less procedural logic. Smaller sample size, with all numbers, but correct now as far as I can see. The general principle is there.
Try it. Success.
Code:
Code:
from pyspark.sql.functions import *
from pyspark.sql.types import *

def reshape(df, by):
    cols, dtypes = zip(*((c, t) for (c, t) in df.dtypes if c not in by))
    kvs = explode(array([
        struct(lit(c).alias("key"), col(c).alias("val")) for c in cols
    ])).alias("kvs")
    return df.select(by + [kvs]).select(by + ["kvs.key", "kvs.val"])

df1 = spark.createDataFrame(
    [(4, 15, 3), (200, 100, 25), (7, 16, 4)], ("c1", "c2", "c3"))
df1 = df1.withColumn("rowId", monotonically_increasing_id())
df1.cache()  # note the parentheses; df1.cache alone is a no-op in Python
df1.show()

df2 = reshape(df1, ["rowId"])
df2.show()

# In case you have other types like characters in the other column - not focusing on that aspect
df3 = df2.withColumn("num_string", format_string("%09d", col("val")))
# Avoid column name issues.
df3 = df3.withColumn("key1", col("key"))
df3.show()

df3 = df3.groupby('key1').agg(min(col("val")).alias("min_val"), max(col("val")).alias("max_val"))
df3.show()

df4 = df3.join(df2, df3.key1 == df2.key)
new_column_condition = expr(
    """IF(val = min_val, -1, IF(val = max_val, 1, 0))"""
)
df4 = df4.withColumn("col_class", new_column_condition)
df4.show()

df5 = df4.filter('(min_val = val or max_val = val) and col_class <> 0')
df5.show()

df6 = df5.join(df1, df5.rowId == df1.rowId)
df6.show()

df6.select([c for c in df6.columns if c in ['c1', 'c2', 'c3']]).distinct().show()
Returns:
+---+---+---+
| c1| c2| c3|
+---+---+---+
| 4| 15| 3|
|200|100| 25|
+---+---+---+
Data wrangling is the clue here.

What does +-# after percent of cache misses mean in perf stat?

I used the command perf stat --repeat 100 -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations ./avx2ADD, and the output follows. What does +- 8.93% for cache-misses mean, when the percentage of cache misses equals 4.010 %?
 32,425    cache-references                                    ( +-  0.54% )
  1,300    cache-misses          # 4.010 % of all cache refs   ( +-  8.93% )
538,839    cycles                                              ( +-  0.28% )
520,056    instructions          # 0.97 insns per cycle        ( +-  0.22% )
 98,720    branches                                            ( +-  0.20% )
     95    faults                                              ( +-  0.12% )
      0    migrations                                          ( +- 70.35% )
The +- 8.93% part is described in the manual page:
-r, --repeat=
    repeat command and print average + stddev (max: 100). 0 means forever.
If you are not sure what is meant by the abbreviation stddev, it is the standard deviation (yes, the manual page could be more verbose). In short, it tells you how much the results varied across the repeated measurements. A lower value is better, but with such a small workload (around 500k instructions) the deviation will be larger, because cache misses are fairly non-deterministic.
The 4.010 % is then the average described above: the mean ratio of cache-misses to cache-references over the repeated runs (1,300 / 32,425 ≈ 4.010 %).
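As a sketch of the arithmetic (assuming the ratio is simply the sample standard deviation divided by the mean; perf's exact estimator may differ slightly), with hypothetical per-run counts:

import statistics

# Hypothetical cache-miss counts from five of the repeated runs
runs = [1180, 1425, 1250, 1390, 1255]

mean = statistics.mean(runs)      # 1300.0
stddev = statistics.stdev(runs)   # sample standard deviation
print(f"{mean:,.0f} cache-misses ( +- {100 * stddev / mean:.2f}% )")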

Counting the ways to build a wall with two tile sizes [closed]

You are given a set of blocks to build a panel using 3"×1" and 4.5"×1" blocks.
For structural integrity, the spaces between the blocks must not line up in adjacent rows.
There are 2 ways in which to build a 7.5"×1" panel, 2 ways to build a 7.5"×2" panel, 4 ways to build a 12"×3" panel, and 7958 ways to build a 27"×5" panel. How many different ways are there to build a 48"×10" panel?
This is what I understand so far:
with the blocks 3 x 1 and 4.5 x 1
I've used combination formula to find all possible combinations that the 2 blocks can be arranged in a panel of this size
C = choose --> C(n, k) = n!/(k!(n-k)!), the number of combinations of n items taken k at a time
Panel: 7.5 x 1 = 2 ways -->
1 (3 x 1 block) and 1 (4.5 x 1 block) --> Only 2 blocks are used--> 2 C 1 = 2 ways
Panel: 7.5 x 2 = 2 ways
I used combination here as well
1(3 x 1 block) and 1 (4.5 x 1 block) --> 2 C 1 = 2 ways
Panel: 12 x 3 panel = 4 ways -->
2(4.5 x 1 block) and 1(3 x 1 block) --> 3 C 1 = 3 ways
0(4.5 x 1 block) and 4(3 x 1 block) --> 4 C 0 = 1 way
3 ways + 1 way = 4 ways
(This is where I get confused)
Panel 27 x 5 panel = 7958 ways
6(4.5 x 1 block) and 0(3 x 1) --> 6 C 0 = 1 way
4(4.5 x 1 block) and 3(3 x 1 block) --> 7 C 3 = 35 ways
2(4.5 x 1 block) and 6(3 x 1 block) --> 8 C 2 = 28 ways
0(4.5 x 1 block) and 9(3 x 1 block) --> 9 C 0 = 1 way
1 way + 35 ways + 28 ways + 1 way = 65 ways
As you can see here the number of ways is nowhere near 7958. What am I doing wrong here?
Also how would I find how many ways there are to construct a 48 x 10 panel?
Because it's a little difficult to do it by hand especially when trying to find 7958 ways.
How would I write a program to calculate the number of ways for a panel, such as the 7958 ways for the 27 x 5 one?
Would it be easier to construct a program to calculate the result? Any help would be greatly appreciated.
I don't think the "choose" function is directly applicable, given your "the spaces between the blocks must not line up in adjacent rows" requirement. I also think this is where your analysis starts breaking down:
Panel: 12 x 3 panel = 4 ways -->
2(4.5 x 1 block) and 1(3 x 1 block)
--> 3 C 1 = 3 ways
0(4.5 x 1 block) and 4(3 x 1 block)
--> 4 C 0 = 1 way
3 ways + 1 way = 4 ways
...let's build some panels (one line of |'s = 1 row, 2 -'s = 1 column):
+------------------------+
|     |     |     |     |
|     |     |     |     |
|     |     |     |     |
+------------------------+
+------------------------+
|        |        |     |
|        |        |     |
|        |        |     |
+------------------------+
+------------------------+
|        |     |        |
|        |     |        |
|        |     |        |
+------------------------+
+------------------------+
|     |        |        |
|     |        |        |
|     |        |        |
+------------------------+
Here we see that there are 4 different basic row types, but none of these are valid panels (they all violate the "blocks must not line up" rule). But we can use these row types to create several panels:
+------------------------+
|     |     |     |     |
|     |     |     |     |
|        |        |     |
+------------------------+
+------------------------+
|     |     |     |     |
|     |     |     |     |
|        |     |        |
+------------------------+
+------------------------+
|     |     |     |     |
|     |     |     |     |
|     |        |        |
+------------------------+
+------------------------+
|        |        |     |
|        |        |     |
|     |     |     |     |
+------------------------+
+------------------------+
|        |        |     |
|        |     |        |
|        |        |     |
+------------------------+
+------------------------+
|        |     |        |
|     |        |        |
|        |     |        |
+------------------------+
...
But again, none of these are valid. The valid 12x3 panels are:
+------------------------+
|     |     |     |     |
|        |     |        |
|     |     |     |     |
+------------------------+
+------------------------+
|        |     |        |
|     |     |     |     |
|        |     |        |
+------------------------+
+------------------------+
|        |        |     |
|     |        |        |
|        |        |     |
+------------------------+
+------------------------+
|     |        |        |
|        |        |     |
|     |        |        |
+------------------------+
So there are in fact 4 of them, but in this case it's just a coincidence that it matches what you got using the "choose" function. In terms of total row combinations, there are quite a few more than 4.
1. Find all ways to form a single row of the given width. I call this a "row type". Example 12x3: there are 4 row types of width 12: (3 3 3 3), (4.5 4.5 3), (4.5 3 4.5), (3 4.5 4.5). I would represent these as a list of the gaps. Example: (3 6 9), (4.5 9), (4.5 7.5), (3 7.5).
2. For each of these row types, find which other row types could fit on top of it. Example:
   a. On (3 6 9) fits (4.5 7.5).
   b. On (4.5 9) fits (3 7.5).
   c. On (4.5 7.5) fits (3 6 9).
   d. On (3 7.5) fits (4.5 9).
3. Enumerate the ways to build stacks of the given height from these rules. Dynamic programming is applicable to this, as at each level, you only need the last row type and the number of ways to get there.
Edit: I just tried this out on my coffee break, and it works. The solution for 48x10 has 15 decimal digits, by the way.
Edit: Here is more detail of the dynamic programming part:
Your rules from step 2 translate to an array of possible neighbours. Each element of the array corresponds to a row type, and holds that row type's possible neighbouring row types' indices.
0: (2)
1: (3)
2: (0)
3: (1)
In the case of 12×3, each row type has only a single possible neighbouring row type, but in general, it can be more.
The dynamic programming starts with a single row, where each row type has exactly one way of appearing:
1 1 1 1
Then, the next row is formed by adding for each row type the number of ways that possible neighbours could have formed on the previous row. In the case of a width of 12, the result is 1 1 1 1 again. At the end, just sum up the last row.
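Here is a runnable Python sketch of these steps (my code, not the answerer's): enumerate the row types as sets of interior gap positions, build the neighbour table, then run the row-by-row dynamic programming. Widths are doubled so the 3 and 4.5 blocks become the integers 6 and 9.

def row_types(width2, blocks=(6, 9)):
    # All ways to fill one row, as frozensets of interior gap positions.
    results = []
    def extend(gaps, pos):
        for b in blocks:
            end = pos + b
            if end == width2:
                results.append(gaps)          # the outer edge is not a gap
            elif end < width2:
                extend(gaps | {end}, end)
    extend(frozenset(), 0)
    return results

def count_walls(width, height):
    rows = row_types(int(width * 2))
    # neighbours[i]: indices of row types sharing no gap with row type i
    neighbours = [[j for j, other in enumerate(rows) if not (rows[i] & other)]
                  for i in range(len(rows))]
    ways = [1] * len(rows)   # one way for each possible first row
    for _ in range(height - 1):
        ways = [sum(ways[j] for j in neighbours[i]) for i in range(len(rows))]
    return sum(ways)

print(count_walls(12, 3))    # 4
print(count_walls(27, 5))    # 7958
print(count_walls(48, 10))   # the 15-digit answer (takes a while in pure Python)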
Complexity:
Finding the row types corresponds to enumerating the leaves of a tree; there are about w/3 levels in this tree, so this takes a time of O(2^(w/3)) = O(2^w).
Checking whether two row types fit takes time proportional to their length, O(w/3). Building the cross table is proportional to the square of the number of row types. This makes step 2 O(w/3 · 2^(2w/3)) = O(2^w).
The dynamic programming takes height times the number of row types times the average number of neighbours (which I estimate to be logarithmic in the number of row types), O(h · 2^(w/3) · w/3) = O(2^w).
As you see, this is all dominated by the number of row types, which grows exponentially with the width. Fortunately, the constant factors are rather low, so that 48×10 can be solved in a few seconds.
This looks like the type of problem you could solve recursively. Here's a brief outline of an algorithm you could use, with a recursive method that accepts the previous layer and the number of remaining layers as arguments:
Start with the initial number of layers (e.g. 27x5 starts with remainingLayers = 5) and an empty previous layer
Test all possible layouts of the current layer
Try adding a 3x1 in the next available slot in the layer we are building. Check that (a) it doesn't go past the target width (e.g. doesn't go past 27 width in a 27x5) and (b) it doesn't violate the spacing condition given the previous layer
Keep trying to add 3x1s to the current layer until we have built a valid layer that is exactly (e.g.) 27 units wide
If we cannot use a 3x1 in the current slot, remove it and replace it with a 4.5x1
Once we have a valid layer, decrement remainingLayers and pass it back into our recursive algorithm along with the layer we have just constructed
Once we reach remainingLayers = 0, we have constructed a valid panel, so increment our counter
The idea is that we build all possible combinations of valid layers. Once we have (in the 27x5 example) 5 valid layers on top of each other, we have constructed a complete valid panel. So the algorithm should find (and thus count) every possible valid panel exactly once.
This is a '2D bin packing' problem. Someone with decent mathematical knowledge will be able to help, or you could try a book on computational algorithms. It is known as a "combinatorial NP-hard problem". I don't know what that means, but the "hard" part grabs my attention :)
I have had a look at steel-cutting programs, and they mostly use a best guess. In this case though, two 4.5" blocks stacked vertically can accommodate three 3" blocks stacked horizontally, so you could possibly get away with no waste. It gets rather tricky when you have to figure out the best solution --- the one with minimal waste.
Here's a solution in Java, some of the array length checking etc is a little messy but I'm sure you can refine it pretty easily.
In any case, I hope this helps demonstrate how the algorithm works :-)
import java.util.Arrays;

public class Puzzle
{
    // Initial solve call
    public static int solve(int width, int height)
    {
        // Double the widths so we can use integers (6x1 and 9x1)
        int[] prev = {-1}; // Make sure we don't get any collisions on the first layer
        return solve(prev, new int[0], width * 2, height);
    }

    // Build the current layer recursively given the previous layer and the current layer
    private static int solve(int[] prev, int[] current, int width, int remaining)
    {
        // Check whether we have a valid frame
        if(remaining == 0)
            return 1;
        if(current.length > 0)
        {
            // Check for overflows
            if(current[current.length - 1] > width)
                return 0;
            // Check for aligned gaps
            for(int i = 0; i < prev.length; i++)
                if(prev[i] < width)
                    if(current[current.length - 1] == prev[i])
                        return 0;
            // If we have a complete valid layer
            if(current[current.length - 1] == width)
                return solve(current, new int[0], width, remaining - 1);
        }
        // Try adding a 6x1
        int total = 0;
        int[] newCurrent = Arrays.copyOf(current, current.length + 1);
        if(current.length > 0)
            newCurrent[newCurrent.length - 1] = current[current.length - 1] + 6;
        else
            newCurrent[0] = 6;
        total += solve(prev, newCurrent, width, remaining);
        // Try adding a 9x1
        if(current.length > 0)
            newCurrent[newCurrent.length - 1] = current[current.length - 1] + 9;
        else
            newCurrent[0] = 9;
        total += solve(prev, newCurrent, width, remaining);
        return total;
    }

    // Main method
    public static void main(String[] args)
    {
        // e.g. 27x5, outputs 7958
        System.out.println(Puzzle.solve(27, 5));
    }
}

insert, delete, max in O(1)

Can someone tell me which data structure supports insert/delete/maximum operation in O(1)?
This is a classical interview question, and is usually presented like this:
Devise a stack-like data structure that does push, pop and min (or max) operations in O(1) time. There are no space constraints.
The answer is, you use two stacks: the main stack, and a min (or max) stack.
So for example, after pushing 1,2,3,4,5 onto the stack, your stacks would look like this:
MAIN MIN
+---+ +---+
| 5 | | 1 |
| 4 | | 1 |
| 3 | | 1 |
| 2 | | 1 |
| 1 | | 1 |
+---+ +---+
However, if you were to push 5,4,3,2,1, the stacks would look like this:
MAIN MIN
+---+ +---+
| 1 | | 1 |
| 2 | | 2 |
| 3 | | 3 |
| 4 | | 4 |
| 5 | | 5 |
+---+ +---+
For 5,2,4,3,1 you would have:
MAIN MIN
+---+ +---+
| 1 | | 1 |
| 3 | | 2 |
| 4 | | 2 |
| 2 | | 2 |
| 5 | | 5 |
+---+ +---+
and so on.
You can also save some space by pushing to the min stack only when the minimum element changes, iff the items are known to be distinct.
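For reference, a small Python sketch of this two-stack scheme (tracking the minimum, as in the diagrams above):

class MinStack:
    # O(1) push, pop and min, using a parallel min stack whose top is
    # always the minimum of everything currently in the main stack.
    def __init__(self):
        self.main = []
        self.mins = []

    def push(self, x):
        self.main.append(x)
        self.mins.append(x if not self.mins or x < self.mins[-1] else self.mins[-1])

    def pop(self):
        self.mins.pop()
        return self.main.pop()

    def min(self):
        return self.mins[-1]

s = MinStack()
for x in [5, 2, 4, 3, 1]:
    s.push(x)
print(s.min())   # 1
s.pop()
print(s.min())   # 2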
The following solution uses O(1) extra memory and O(1) time for max, push and pop operations.
Keep a variable max which will keep track of the current max element at any particular time.
Let's utilize the fact that when max is updated, all the elements in the stack must be less than the new max element.
When a push operation occurs and the new element (newElement) is greater than the current max, we push max + newElement onto the stack and update max = newElement.
When we are doing a pop operation and find that the popped element is greater than the current max, we know that this is a place where we had updated our stack to hold max + elem. So the actual element to be returned is max, and the new max = poppedElem - max.
For example, if we push 1, 2, 3, 4, 5, the stack and the max variable will look like below:
MAIN Value of MAX
+---+ +---+
| 9 | max = | 5 |
| 7 | max = | 4 |
| 5 | max = | 3 |
| 3 | max = | 2 |
| 1 | max = | 1 |
+---+ +---+
Now let's say we pop an element: we will actually pop the max element (since top > max) and update max to (top - max):
MAIN Value of MAX
+---+ +---+
| 7 | max = | 4 | = (9-5)
| 5 | max = | 3 |
| 3 | max = | 2 |
| 1 | max = | 1 |
+---+ +---+
Now let's say we push the numbers 5, 4, 3, 2, 1; the stack will look like:
MAIN Value of MAX
+---+ +---+
| 1 | max = | 5 |
| 2 | max = | 5 |
| 3 | max = | 5 |
| 4 | max = | 5 |
| 5 | max = | 5 |
+---+ +---+
When we pop, the top of the stack is popped since top < max, and max remains unchanged.
Following is pseudocode for each of the operations, for better insight.
Elem max;

void Push(Elem x){
    if x < max:
        push(x);
    else {
        push(x + max);
        max = x;
    }
}

Elem Pop(){
    Elem p = pop();
    if |p| < |max|:
        return p;
    else {
        max = p - max;
        return max;
    }
}

Elem Max(){
    return max;
}
push and pop are normal stack operations. Hope this helps.
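For completeness, here is a runnable Python sketch of this encoding trick (my code; it assumes positive values, which the x + max encoding relies on, and omits resetting max when the stack empties):

class MaxStack:
    # O(1) push/pop/max with O(1) extra space, for positive values.
    def __init__(self):
        self.stack = []
        self.max_val = None

    def push(self, x):
        if self.max_val is None:
            self.stack.append(x)
            self.max_val = x
        elif x <= self.max_val:
            self.stack.append(x)
        else:
            # Encode: store x + old max and remember x as the new max.
            self.stack.append(x + self.max_val)
            self.max_val = x

    def pop(self):
        p = self.stack.pop()
        if p <= self.max_val:
            return p
        # Encoded entry: the real value is the current max, and the
        # previous max is recovered as p - max.
        real, self.max_val = self.max_val, p - self.max_val
        return real

    def max(self):
        return self.max_val

s = MaxStack()
for x in [1, 2, 3, 4, 5]:
    s.push(x)
print(s.stack, s.max())   # [1, 3, 5, 7, 9] 5
print(s.pop(), s.max())   # 5 4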
@KennyTM's comment points out an important missing detail - insert where, and delete from where? So I am going to assume that you always want to insert and delete only from one end, like a stack.
Insertion (push) and Delete (pop) are O(1).
To get Max in O(1), use an additional stack to record the current max which corresponds to the main stack.
If you are using only comparisons, you would be hard pressed to find such a data structure.
For instance, you could insert n elements, get the max, delete the max, and so on, and thereby sort numbers in O(n) time, while the theoretical lower bound is Omega(n log n).
The program below keeps track of the max elements in the stack in such a way that at any point in time the top pointer gives us the max in the stack.
So max is O(1), and we can find the max with max[N].
ITEM MAX
+---+ +---+
| 1 | | 1 |
| 10| | 10|
| 9 | | 10|
| 19| | 19| <--top
+---+ +---+
Java Program:
public class StackWithMax {
private int[] item;
private int N = 0;
private int[] max;
public StackWithMax(int capacity){
item = new int[capacity];//generic array creation not allowed
max = new int[capacity];
}
public void push(int item){
this.item[N++] = item;
if(max[N-1] > item) {
max[N] = max[N-1];
} else {
max[N] = item;
}
}
public void pop() {
this.item[N] = 0;
this.max[N] = 0;
N--;
}
public int findMax(){
return this.max[N];
}
public static void main(String[] args) {
StackWithMax max = new StackWithMax(10);
max.push(1);
max.push(10);
max.push(9);
max.push(19);
System.out.println(max.findMax());
max.pop();
System.out.println(max.findMax());
}
}
Like some have already pointed out, the question lacks some information. You don't specify where to insert/delete, nor the nature of the data we are dealing with.
Some ideas that could be useful: you say,
insert/delete/maximum operation in O(1)
Note that if we can insert, delete, and find the maximum in O(1), then we can use this hypothetical technique to sort in O(n), because we can insert the n elements and then repeatedly take the max and delete it, getting them all out in sorted order. It's proven that no sorting algorithm based on comparisons can sort in less than O(n log n), so we know that no comparison-based approach will work. In fact, one of the fastest known structures of this kind is the Brodal queue, but its deletion time exceeds O(1).
Maybe the solution is something like a radix tree, where the complexity of all these operations is related to the key length as opposed to the number of keys. This is valid only if you are allowed to bound the key length by some other number, so you can consider it constant.
But maybe it wasn't something that generic. Another interpretation is that the insert/delete are the ones of a classic stack. In that restricted case, you can use the double-stack solution that Can Berk Güder gave you.
The best that exists is:
Insert in O(1)
Delete in O(log n)
Max/Min in O(1)
But to do that, the insert function must create a link chain, and you will also need an extra thread. The good news is that this link-chain function also works in O(1), so it does not change the O(1) of insert.
The delete function doesn't break the link chain.
If the target of your delete is the max or the min, then the delete will be executed in O(1).
The data structure is a mix of an AVL tree and a linked list.
The nature of a true delete is such that you cannot make it work in O(1). Hash tables, which offer O(1) delete, don't have the capability to hold all the inputs.
A hash table might support insert/delete in O(1); no clue about maximum, though. You'd probably need to keep track of it yourself somehow.
