Parallel Stream toArray maintains order - java-8

I read about concurrent collectors maintaining order of input list. So if i use a Collectors to ArrayList, it can guarantee ordered collection.
Also map functions on ordered list maintain the order.
I could not find any documentation around order preservation in toArray
Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
So will
Stream.map(x->x).toArray()
produce ordered results? Or should I use collectors.

The cited part of the documentation already states by example that both, map and toArray will maintain the encounter order.
When you go through the Stream API documentation you’ll see that it never makes an explicit statement about operations which maintain the encounter order, but does it the other way round, it explicitly states when an operation is unordered or has special policies depending on the ordered state.
obviously unordered() retracts the encounter order explicitly
forEach and findAny do not respect the encounter order
Stream.concat returns an unordered stream if at least one of the two input streams is unordered (a debatable behavior, but that’s how it is)
Stream.generate() generates an unordered stream
skip, limit, takeWhile, and dropWhile respect the encounter order, which may cause significant performance penalties in parallel executions
distinct() and sorted() are stable for ordered streams, distinct() may have significantly better parallel performance when the stream is unordered
collect(Collector) may behave as unordered if the collector is unordered, which is only hinted by the statement that the operation will be concurrent if the collector is concurrent and either the stream is unordered or the collector is unordered. For more details, we have to refer the Collector documentation and the builtin collectors.
Note that while the operations count(), allMatch, anyMatch, and noneMatch have no statement about the encounter order, these operations have a semantic that implies that the result should not depend on the encounter order at all.

Related

Are add/remove set CRDT's monotonic?

The internals of an add/remove set CRDT is monotonic, because we only ever add to the internal sets, so the internal state of the CRDT cannot ever go backwards in logical time.
However, the observed state of the CRDT is that we're adding and removing elements, so the observed state doesn't have to be monotonic.
If we chain these systems together and take actions based on the presense or non-presense of an element, it doesn't look very monotonic anymore. The final state will still converge eventually, but we may or may not see some elements for a while before it stabilizes. It's not unlikely that some side-effect happens because of that intermediate state, such as a user reading the state of the system and reacting before it converges.
What does it mean for a CRDT to be monotonic?
Just to add a TL;DR on top of alekibango's awesome answer:
Monotonicity refers to the fact that once operations are observed and applied by a replica, the object's state will always take into consideration that operation.
Once an operation is applied, it will never be un-applied.
The observed non-monotonicity of (most) CRDT Sets does not invalidate the CRDT monotonic property.
CRDT Sets that support the remove operation are at its core two G-Sets:
One of the G-Sets is the set of elements that were added.
The other G-set is the set of elements that were removed.
The observed state is the set of added elements minus the set of removed elements. Although each of the internal sets is clearly monotonic, their difference can appear to be non-monotonic.
CRDT means Conflict-Free Replicated Data Type
This means (diverging) instances of CRDT might merge together (in whatever order and repetitions) to finally get into correct, consistent state.
Monotonicity might help with implementing this (see CALM -- Consistency as Logical Monotonicity). But it is not a requirement for your set instance.
Read those notes on crdt: https://github.com/pfrazee/crdt_notes
Some Examples of CRDT sets are:
G-set (grow only set, only adding items)
2P-set (keeps tombstones, element can be inserted once only)
LWW-set uses timestamps for marking 'time' of adding/removing item, allows adding/removing an item multiple times. Concurrent add
and remove is decided using bias.
OR-set - similar to lww set, but uses unique tags to be sure which element we are removing.
Optimized OR-Sets - can be able to be useable without having many tombstones around, see those if your sets are big (and have many
changes).
some links to read more:
https://hal.inria.fr/inria-00609399v1/document
https://github.com/CBaquero/delta-enabled-crdts
https://doc.akka.io/docs/akka/2.5/distributed-data.html
https://www.youtube.com/watch?v=ebWVLVhiaiY
https://www.youtube.com/watch?v=veeWamWy8dk
https://www.youtube.com/watch?v=xxjHC3yLDqw
https://www.youtube.com/watch?v=PQzNW8uQ_Y4

Assembly sorting

I am working on a Task Schedule Simulator which needs to be programmed in Assembly language.
I've been struggling about Task sorting:
I am allocating new memory for each Task (user can insert the task and by using the sbrk instruction i allocate 20 byte that contain a word for Task's numeric ID, another word for it's priority expressed as an int, another word for the number of cycles to finish the task) and I'm storing the address of each new Task in the stack.
My problem is: i need to sort this tasks and the sorting can either be based on priority or number of cycles. When I pop these Tasks i can easily access the right field (since the structure is very rigid, i just need to type the right offset in the lw instruction and voilat), but then comparing and sorting gets complicated.
I am working on the pseudocode for this part of the program and can't find any way to untie the knot.
Let me first try and paraphrase what you have indicate as the problem.
You have a stack, that has "records" of the structure
{ word : id, word : priority, word : cycle_count, dword : address}
Since the end objective is to "pop" these in desired order, we have to execute an in-place sort. There are many choices of algorithms, but to keep matters simple (also taking a cue from an underlying assumption that the count of tasks is not that many), I am explaining using bubble-sort. There exist a vast cornucopia of literature comparing each probable sort algorithm to their finest details, and if relevant, you may consider wikipedia as the perfect starting point.
Step 1 : Make the data pointer = stack pointer+count_of_records*20 - effectively, for next few steps, the data pointer points to the top of the "table of records" which happens to be located at the stack. An advanced consideration but not required in MIPS, is to assert DS=SS.
Step 2 : Next, identify which record pair needs to be swapped, and use the appropriate index within a record to identify the field that defines the swapping order.
Step 3: Allocate a 20 byte space as temporary, and use that space to temporarily hold the swapped record. An advanced consideration here is whether the environment can do an interrupt while the swap is going on. MIPS does not seem to have an atomic lock so this memory move needs to be carefully done.
Once the requisite number of passes are completed, the table will appear sorted, and will remain in place. The temporary buffer to store a record may be released.
The vital statistics for Bubble-Sort is that its O(n^2), responds well to almost sorted situations (not very likely in your example), and will handle the fact well that in midst of sorting, some records may find the processor free to start running, and therefore, will have to be removed from the queue by a POP and the sort needs to be restarted. This restart, however, will find the table almost sorted, and therefore, on a continuous basis, the table will display fairly strong pre-sorted behavior. Most importantly, has the perhaps most efficient code footprint among all in-situ algorithms.
Trust this helps
You might want to introduce a level of indirection, and sort pointers to your structs based on comparing the pointed-to data.
If your sort keys are all integers of the same size at different offsets within the structs, your sort function could take an offset as a parameter. e.g. lw from base + off to get the integer that you're going to compare.
Insertion sort is probably easiest to code, and much better that BubbleSort in the not-almost-sorted case. If you care about having a result ready to pop ASAP, before the whole array is sorted, then use Selection Sort.
It wasn't clear if your code is itself going to be multi-threaded, or if you can just write a normal sort function. #qasar66's answer seems to be suggesting a BubbleSort with atomic swaps, so other threads can safely look at the partially-sorted array while its being sorted.
If you only ever need to pop the min element, one of the best data structures is a Heap. It takes more code to implement, so if ease of implementation is your top goal, use a simple sort. Heapifying an un-sorted array is cheaper than doing a full sort: the full O(n log n) cost of extracting all elements in order is amortized over the extracts. So it's great if you want to be able to change the sort key, since you don't have to do all the work of fully sorting.

What's the technical difference between a linked list and a stream?

It seems like they both do the same thing in the same way: Lazy operations in a specific but not-necessarily-indexed order, and cannot necessarily backtrack.
Linked list is a specific way of representing sequences of data elements in memory, where each element is paired with a pointer of sorts to the next element in the sequence. Linked lists let you perform a range of operations on their subsequences: you can cut out or insert entire chains of elements, or delete elements from the middle at a very low cost.
Streams, on the other hand, are abstractions for accessing data in sequential order, without any specific requirements to their representation in memory. You can use a linked list to implement a stream, but you could also use another data structure for that, such as a plain array or a circular array buffer.
I think this is a little like trying to compare apples and oranges.
A linked list is a data structure where each node points to another node. They're a useful structure as inserting and removing items from a linked list is just a matter of re-pointing a node, rather than with an array where you need to do allot more shuffling. See http://en.wikipedia.org/wiki/Linked_list for more information.
A stream is an abstracted object used to represent a series of bytes. In most frameworks (Java, .NET etc) there are several concrete implementations of steams (memory stream, file stream etc) used to read the byte array from the relevant source (memory, file etc).
A linked list is a data structure where each element has a pointer to the next one, and possibly the same in the other direction. Circular linked lists even have a pointer from the last to the first element and vice versa. Those pointers (or references in languages that don't have pointers) are what defines the data structure. They imply a certain mode of operation, they don't force that, though. The LinkedList class in Java, for example, can be used like an array, although it won't be very effective then. It also can be used as (double-ended) queue or stack, depending on which functions you call.
A stream, on the other hand, is not defined as a data structure, but as a source or sink of elements. These elements could be bytes or characters, if you think of file streams, socket streams or reader/writer classes that wrap streams. The elements provided by a stream could also be more complex, e.g. tokens for a parser. In that case the stream likely uses some kind of queue internally, which could be implemented using a linked list or some array construction.
Just make sure to understand these two things are defined on different layers of abstraction. The linked list is defined on how it works internally, while the stream is defined on how it works externally.
There is a common abstraction between a read-only singly-linked list and an input stream, which C++ formalises as InputIterator: you can read a value and you can move forwards. In many stream APIs you have to do both simultaneously, but given that API it's fairly easy to see how to separate them out with a wrapper that caches one value: C++ calls this class istream_iterator.
However, a singly-linked list has a property that a stream does not always have, which C++ formalises as ForwardIterator: you can copy the current position, move the copy forward, but still read the value at the location of the original. A generalized stream cannot do this, because the underlying I/O only has one "current position". With a linked list you can have multiple pointers to different nodes in the list, with no troubles.
Some streams can be marked and reset, rewound, seeked (sought?) etc, adding facilities that are somewhat like a C++ ForwardIterator, or even RandomAccessIterator.
I use C++ as an example not because it's particularly important, but because the C++ concept of an iterator is designed in part to provide an abstraction common to data structures and streams. Not all languages have such a common abstraction, but for another example in Python you can write for x in y: if y is a container data structure or if y is a file-like object, or in general if y is "iterable".

What are the underlying data structures used for Redis?

I'm trying to answer two questions in a definitive list:
What are the underlying data structures used for Redis?
And what are the main advantages/disadvantages/use cases for each type?
So, I've read the Redis lists are actually implemented with linked lists. But for other types, I'm not able to dig up any information. Also, if someone were to stumble upon this question and not have a high level summary of the pros and cons of modifying or accessing different data structures, they'd have a complete list of when to best use specific types to reference as well.
Specifically, I'm looking to outline all types: string, list, set, zset and hash.
Oh, I've looked at these article, among others, so far:
http://redis.io/topics/data-types
http://redis.io/topics/data-types-intro
http://redis.io/topics/faq
I'll try to answer your question, but I'll start with something that may look strange at first: if you are not interested in Redis internals you should not care about how data types are implemented internally. This is for a simple reason: for every Redis operation you'll find the time complexity in the documentation and, if you have the set of operations and the time complexity, the only other thing you need is some clue about memory usage (and because we do many optimizations that may vary depending on data, the best way to get these latter figures are doing a few trivial real world tests).
But since you asked, here is the underlying implementation of every Redis data type.
Strings are implemented using a C dynamic string library so that we don't pay (asymptotically speaking) for allocations in append operations. This way we have O(N) appends, for instance, instead of having quadratic behavior.
Lists are implemented with linked lists.
Sets and Hashes are implemented with hash tables.
Sorted sets are implemented with skip lists (a peculiar type of balanced trees).
But when lists, sets, and sorted sets are small in number of items and size of the largest values, a different, much more compact encoding is used. This encoding differs for different types, but has the feature that it is a compact blob of data that often forces an O(N) scan for every operation. Since we use this format only for small objects this is not an issue; scanning a small O(N) blob is cache oblivious so practically speaking it is very fast, and when there are too many elements the encoding is automatically switched to the native encoding (linked list, hash, and so forth).
But your question was not really just about internals, your point was What type to use to accomplish what?.
Strings
This is the base type of all the types. It's one of the four types but is also the base type of the complex types, because a List is a list of strings, a Set is a set of strings, and so forth.
A Redis string is a good idea in all the obvious scenarios where you want to store an HTML page, but also when you want to avoid converting your already encoded data. So for instance, if you have JSON or MessagePack you may just store objects as strings. In Redis 2.6 you can even manipulate this kind of object server side using Lua scripts.
Another interesting usage of strings is bitmaps, and in general random access arrays of bytes, since Redis exports commands to access random ranges of bytes, or even single bits. For instance check this good blog post: Fast Easy real time metrics using Redis.
Lists
Lists are good when you are likely to touch only the extremes of the list: near tail, or near head. Lists are not very good to paginate stuff, because random access is slow, O(N).
So good uses of lists are plain queues and stacks, or processing items in a loop using RPOPLPUSH with same source and destination to "rotate" a ring of items.
Lists are also good when we want just to create a capped collection of N items where usually we access just the top or bottom items, or when N is small.
Sets
Sets are an unordered data collection, so they are good every time you have a collection of items and it is very important to check for existence or size of the collection in a very fast way. Another cool thing about sets is support for peeking or popping random elements (SRANDMEMBER and SPOP commands).
Sets are also good to represent relations, e.g., "What are friends of user X?" and so forth. But other good data structures for this kind of stuff are sorted sets as we'll see.
Sets support complex operations like intersections, unions, and so forth, so this is a good data structure for using Redis in a "computational" manner, when you have data and you want to perform transformations on that data to obtain some output.
Small sets are encoded in a very efficient way.
Hashes
Hashes are the perfect data structure to represent objects, composed of fields and values. Fields of hashes can also be atomically incremented using HINCRBY. When you have objects such as users, blog posts, or some other kind of item, hashes are likely the way to go if you don't want to use your own encoding like JSON or similar.
However, keep in mind that small hashes are encoded very efficiently by Redis, and you can ask Redis to atomically GET, SET or increment individual fields in a very fast fashion.
Hashes can also be used to represent linked data structures, using references. For instance check the lamernews.com implementation of comments.
Sorted Sets
Sorted sets are the only other data structures, besides lists, to maintain ordered elements. You can do a number of cool stuff with sorted sets. For instance, you can have all kinds of Top Something lists in your web application. Top users by score, top posts by pageviews, top whatever, but a single Redis instance will support tons of insertion and get-top-elements operations per second.
Sorted sets, like regular sets, can be used to describe relations, but they also allow you to paginate the list of items and to remember the ordering. For instance, if I remember friends of user X with a sorted set I can easily remember them in order of accepted friendship.
Sorted sets are good for priority queues.
Sorted sets are like more powerful lists where inserting, removing, or getting ranges from the the middle of the list is always fast. But they use more memory, and are O(log(N)) data structures.
Conclusion
I hope that I provided some info in this post, but it is far better to download the source code of lamernews from http://github.com/antirez/lamernews and understand how it works. Many data structures from Redis are used inside Lamer News, and there are many clues about what to use to solve a given task.
Sorry for grammar typos, it's midnight here and too tired to review the post ;)
Most of the time, you don't need to understand the underlying data structures used by Redis. But a bit of knowledge helps you make CPU v/s Memory trade offs. It also helps you model your data in an efficient manner.
Internally, Redis uses the following data structures :
String
Dictionary
Doubly Linked List
Skip List
Zip List
Int Sets
Zip Maps (deprecated in favour of zip list since Redis 2.6)
To find the encoding used by a particular key, use the command object encoding <key>.
1. Strings
In Redis, Strings are called Simple Dynamic Strings, or SDS. It's a smallish wrapper over a char * that allows you to store the length of the string and number of free bytes as a prefix.
Because the length of the string is stored, strlen is an O(1) operation. Also, because the length is known, Redis strings are binary safe. It is perfectly legal for a string to contain the null character.
Strings are the most versatile data structure available in Redis. A String is all of the following:
A string of characters that can store text. See SET and GET commands.
A byte array that can store binary data.
A long that can store numbers. See INCR, DECR, INCRBY and DECRBY commands.
An Array (of chars, ints, longs or any other data type) that can allow efficient random access. See SETRANGE and GETRANGE commands.
A bit array that allows you to set or get individual bits. See SETBIT and GETBIT commands.
A block of memory that you can use to build other data structures. This is used internally to build ziplists and intsets, which are compact, memory-efficient data structures for small number of elements. More on this below.
2. Dictionary
Redis uses a Dictionary for the following:
To map a key to its associated value, where value can be a string, hash, set, sorted set or list.
To map a key to its expiry timestamp.
To implement Hash, Set and Sorted Set data types.
To map Redis commands to the functions that handle those commands.
To map a Redis key to a list of clients that are blocked on that key. See BLPOP.
Redis Dictionaries are implemented using Hash Tables. Instead of explaining the implementation, I will just explain the Redis specific things :
Dictionaries use a structure called dictType to extend the behaviour of a hash table. This structure has function pointers, and so the following operations are extendable: a) hash function, b) key comparison, c) key destructor, and d) value destructor.
Dictionaries use the murmurhash2. (Previously they used the djb2 hash function, with seed=5381, but then the hash function was switched to murmur2. See this question for an explanation of the djb2 hash algorithm.)
Redis uses Incremental Hashing, also known as Incremental Resizing. The dictionary has two hash tables. Every time the dictionary is touched, one bucket is migrated from the first (smaller) hash table to the second. This way, Redis prevents an expensive resize operation.
The Set data structure uses a Dictionary to guarantee there are no duplicates. The Sorted Set uses a dictionary to map an element to its score, which is why ZSCORE is an O(1) operation.
3. Doubly Linked Lists
The list data type is implemented using Doubly Linked Lists. Redis' implementation is straight-from-the-algorithm-textbook. The only change is that Redis stores the length in the list data structure. This ensures that LLEN has O(1) complexity.
4. Skip Lists
Redis uses Skip Lists as the underlying data structure for Sorted Sets. Wikipedia has a good introduction. William Pugh's paper Skip Lists: A Probabilistic Alternative to Balanced Trees has more details.
Sorted Sets use both a Skip List and a Dictionary. The dictionary stores the score of each element.
Redis' Skip List implementation is different from the standard implementation in the following ways:
Redis allows duplicate scores. If two nodes have the same score, they are sorted by the lexicographical order.
Each node has a back pointer at level 0. This allows you to traverse elements in reverse order of the score.
5. Zip List
A Zip List is like a doubly linked list, except it does not use pointers and stores the data inline.
Each node in a doubly linked list has at 3 pointers - one forward pointer, one backward pointer and one pointer to reference the data stored at that node. Pointers require memory (8 bytes on a 64 bit system), and so for small lists, a doubly linked list is very inefficient.
A Zip List stores elements sequentially in a Redis String. Each element has a small header that stores the length and data type of the element, the offset to the next element and the offset to the previous element. These offsets replace the forward and backward pointers. Since the data is stored inline, we don't need a data pointer.
The Zip list is used to store small lists, sorted sets and hashes. Sorted sets are flattened into a list like [element1, score1, element2, score2, element3, score3] and stored in the Zip List. Hashes are flattened into a list like [key1, value1, key2, value2] etc.
With Zip Lists you have the power to make a tradeoff between CPU and Memory. Zip Lists are memory-efficient, but they use more CPU than a linked list (or Hash table/Skip List). Finding an element in the zip list is O(n). Inserting a new element requires reallocating memory. Because of this, Redis uses this encoding only for small lists, hashes and sorted sets. You can tweak this behaviour by altering the values of <datatype>-max-ziplist-entries and <datatype>-max-ziplist-value> in redis.conf. See Redis Memory Optimization, section "Special encoding of small aggregate data types" for more information.
The comments on ziplist.c are excellent, and you can understand this data structure completely without having to read the code.
6. Int Sets
Int Sets are a fancy name for "Sorted Integer Arrays".
In Redis, sets are usually implemented using hash tables. For small sets, a hash table is inefficient memory wise. When the set is composed of integers only, an array is often more efficient.
An Int Set is a sorted array of integers. To find an element a binary search algorithm is used. This has a complexity of O(log N). Adding new integers to this array may require a memory reallocation, which can become expensive for large integer arrays.
As a further memory optimization, Int Sets come in 3 variants with different integer sizes: 16 bits, 32 bits and 64 bits. Redis is smart enough to use the right variant depending on the size of the elements. When a new element is added and it exceeds the current size, Redis automatically migrates it to the next size. If a string is added, Redis automatically converts the Int Set to a regular Hash Table based set.
Int Sets are a tradeoff between CPU and Memory. Int Sets are extremely memory efficient, and for small sets they are faster than a hash table. But after a certain number of elements, the O(log N) retrieval time and the cost of reallocating memory become too much. Based on experiments, the optimal threshold to switch over to a regular hash table was found to be 512. However, you can increase this threshold (decreasing it doesn't make sense) based on your application's needs. See set-max-intset-entries in redis.conf.
7. Zip Maps
Zip Maps are dictionaries flattened and stored in a list. They are very similar to Zip Lists.
Zip Maps have been deprecated since Redis 2.6, and small hashes are stored in Zip Lists. To learn more about this encoding, refer to the comments in zipmap.c.
Redis stores keys pointing to values. Keys can be any binary value up to a reasonable size (using short ASCII strings is recommended for readability and debugging purposes). Values are one of five native Redis data types.
1.strings — a sequence of binary safe bytes up to 512 MB
2.hashes — a collection of key value pairs
3.lists — an in-insertion-order collection of strings
4.sets — a collection of unique strings with no ordering
5.sorted sets — a collection of unique strings ordered by user defined scoring
Strings
A Redis string is a sequence of bytes.
Strings in Redis are binary safe (meaning they have a known length not determined by any special terminating characters), so you can store anything up to 512 megabytes in one string.
Strings are the cannonical "key value store" concept. You have a key pointing to a value, where both key and value are text or binary strings.
For all possible operations on strings, see the
http://redis.io/commands/#string
Hashes
A Redis hash is a collection of key value pairs.
A Redis hash holds many key value pairs, where each key and value is a string. Redis hashes do not support complex values directly (meaning, you can't have a hash field have a value of a list or set or another hash), but you can use hash fields to point to other top level complex values. The only special operation you can perform on hash field values is atomic increment/decrement of numeric contents.
You can think of a Redis hashes in two ways: as a direct object representation and as a way to store many small values compactly.
Direct object representations are simple to understand. Objects have a name (the key of the hash) and a collection of internal keys with values. See the example below for, well, an example.
Storing many small values using a hash is a clever Redis massive data storage technique. When a hash has a small number of fields (~100), Redis optimizes the storage and access efficency of the entire hash. Redis's small hash storage optimization raises an interesting behavior: it's more efficient to have 100 hashes each with 100 internal keys and values rather than having 10,000 top level keys pointing to string values. Using Redis hashes to optimize your data storage this way does require additional programming overhead for tracking where data ends up, but if your data storage is primarly string based, you can save a lot of memory overhead using this one weird trick.
For all possible operations on hashes, see the hash docs
Lists
Redis lists act like linked lists.
You can insert to, delete from, and traverse lists from either the head or tail of a list.
Use lists when you need to maintain values in the order they were inserted. (Redis does give you the option to insert into any arbitrary list position if you need to, but your insertion performance will degrade if you insert far from your start position.)
Redis lists are often used as producer/consumer queues. Insert items into a list then pop items from the list. What happens if your consumers try to pop from a list with no elements? You can ask Redis to wait for an element to appear and return it to you immediately when it gets added. This turns Redis into a real time message queue/event/job/task/notification system.
You can atomically remove elements off either end of a list, enabling any list to be treated as a stack or a queue.
You can also maintain fixed-length lists (capped collections) by trimming your list to a specific size after every insertion.
For all possible operations on lists, see the lists docs
Sets
Redis sets are, well, sets.
A Redis set contains unique unordered Redis strings where each string only exists once per set. If you add the same element ten times to a set, it will only show up once. Sets are great for lazily ensuring something exists at least once without worrying about duplicate elements accumulating and wasting space. You can add the same string as many times as you like without needing to check if it already exists.
Sets are fast for membership checking, insertion, and deletion of members in the set.
Sets have efficient set operations, as you would expect. You can take the union, intersection, and difference of multiple sets at once. Results can either be returned to the caller or results can be stored in a new set for later usage.
Sets have constant time access for membership checks (unlike lists), and Redis even has convenient random member removal and returning ("pop a random element from the set") or random member returning without replacement ("give me 30 random-ish unique users") or with replacement ("give me 7 cards, but after each selection, put the card back so it can potentially be sampled again").
For all possible operations on sets, see the sets docs.
Sorted Sets
Redis sorted sets are sets with a user-defined ordering.
For simplicity, you can think of a sorted set as a binary tree with unique elements. (Redis sorted sets are actually skip lists.) The sort order of elements is defined by each element's score.
Sorted sets are still sets. Elements may only appear once in a set. An element, for uniqueness purposes, is defined by its string contents. Inserting element "apple" with sorting score 3, then inserting element "apple" with sorting score 500 results in one element "apple" with sorting score 500 in your sorted set. Sets are only unique based on Data, not based on (Score, Data) pairs.
Make sure your data model relies on the string contents and not the element's score for uniqueness. Scores are allowed to be repeated (or even zero), but, one last time, set elements can only exist once per sorted set. For example, if you try to store the history of every user login as a sorted set by making the score the epoch of the login and the value the user id, you will end up storing only the last login epoch for all your users. Your set would grow to size of your userbase and not your desired size of userbase * logins.
Elements are added to your set with scores. You can update the score of any element at any time, just add the element again with a new score. Scores are represented by floating point doubles, so you can specify granularity of high precision timestamps if needed. Multiple elements may have the same score.
You can retrieve elements in a few different ways. Since everything is sorted, you can ask for elements starting at the lowest scores. You can ask for elements starting at the highest scores ("in reverse"). You can ask for elements by their sort score either in natural or reverse order.
For all possible operations on sorted sets, see the sorted sets docs.

Is there any performance difference between myCollection.Where(...).FirstOrDefault() and myCollection.FirstOrDefault(...)

Is there any performance difference between myCollection.Where(...).FirstOrDefault() and myCollection.FirstOrDefault(...)
Filling in the dots with the predicate you are using.
Assuming we're talking LinqToObjects (obviously LinqToSql, LinqToWhatever have their own rules), the first will be ever so slightly slower since a new iterator must be created, but it's incredibly unlikely that you would ever notice the difference. In terms of number of comparisons and number of items examined, the time the two take to run will be virtually identical.
In case you're worried, what will not happen is that the .Where operator filters the list to n items and the .FirstOfDefault takes the first out of the filtered list. Both sequences will short-circuit correctly
If we assume that in both cases you're using the Extension methods provided by the Enumerable static class, then you'll be hard pressed to measure any difference between the two.
The longer form ...
myCollection.Where(...).FirstOrDefault()
... will (technically) produce more memory activity (creating an intermediary iterator to handle the Where() clause) and involve a few more cycles of processing.
The thing is that these iterators are lazy - the Where() clause won't go merrily through the entire list evaluating the predicate, it will only check as many items as necessary to find one to pass through.

Resources