How can I get a random number in Kotlin? - random

A generic method that can return a random integer between 2 parameters like ruby does with rand(0..n).
Any suggestion?

My suggestion would be an extension function on IntRange to create randoms like this: (0..10).random()
TL;DR Kotlin >= 1.3, one Random for all platforms
As of 1.3, Kotlin comes with its own multi-platform Random generator. It is described in this KEEP. The extension described below is now part of the Kotlin standard library, simply use it like this:
val rnds = (0..10).random() // generated random from 0 to 10 included
Kotlin < 1.3
Before 1.3, on the JVM we use Random or even ThreadLocalRandom if we're on JDK > 1.6.
fun IntRange.random() =
Random().nextInt((endInclusive + 1) - start) + start
Used like this:
// will return an `Int` between 0 and 10 (incl.)
(0..10).random()
If you wanted the function only to return 1, 2, ..., 9 (10 not included), use a range constructed with until:
(0 until 10).random()
If you're working with JDK > 1.6, use ThreadLocalRandom.current() instead of Random().
KotlinJs and other variations
For kotlinjs and other use cases which don't allow the usage of java.util.Random, see this alternative.
Also, see this answer for variations of my suggestion. It also includes an extension function for random Chars.

Generate a random integer between from(inclusive) and to(exclusive)
import java.util.Random
val random = Random()
fun rand(from: Int, to: Int) : Int {
return random.nextInt(to - from) + from
}

As of kotlin 1.2, you could write:
(1..3).shuffled().last()
Just be aware it's big O(n), but for a small list (especially of unique values) it's alright :D

In Kotlin SDK >=1.3 you can do it like
import kotlin.random.Random
val number = Random.nextInt(limit)

You can create an extension function similar to java.util.Random.nextInt(int) but one that takes an IntRange instead of an Int for its bound:
fun Random.nextInt(range: IntRange): Int {
return range.start + nextInt(range.last - range.start)
}
You can now use this with any Random instance:
val random = Random()
println(random.nextInt(5..9)) // prints 5, 6, 7, 8, or 9
If you don't want to have to manage your own Random instance then you can define a convenience method using, for example, ThreadLocalRandom.current():
fun rand(range: IntRange): Int {
return ThreadLocalRandom.current().nextInt(range)
}
Now you can get a random integer as you would in Ruby without having to first declare a Random instance yourself:
rand(5..9) // returns 5, 6, 7, 8, or 9

Possible Variation to my other answer for random chars
In order to get random Chars, you can define an extension function like this
fun ClosedRange<Char>.random(): Char =
(Random().nextInt(endInclusive.toInt() + 1 - start.toInt()) + start.toInt()).toChar()
// will return a `Char` between A and Z (incl.)
('A'..'Z').random()
If you're working with JDK > 1.6, use ThreadLocalRandom.current() instead of Random().
For kotlinjs and other use cases which don't allow the usage of java.util.Random, this answer will help.
Kotlin >= 1.3 multiplatform support for Random
As of 1.3, Kotlin comes with its own multiplatform Random generator. It is described in this KEEP. You can now directly use the extension as part of the Kotlin standard library without defining it:
('a'..'b').random()

Building off of #s1m0nw1 excellent answer I made the following changes.
(0..n) implies inclusive in Kotlin
(0 until n) implies exclusive in Kotlin
Use a singleton object for the Random instance (optional)
Code:
private object RandomRangeSingleton : Random()
fun ClosedRange<Int>.random() = RandomRangeSingleton.nextInt((endInclusive + 1) - start) + start
Test Case:
fun testRandom() {
Assert.assertTrue(
(0..1000).all {
(0..5).contains((0..5).random())
}
)
Assert.assertTrue(
(0..1000).all {
(0..4).contains((0 until 5).random())
}
)
Assert.assertFalse(
(0..1000).all {
(0..4).contains((0..5).random())
}
)
}

Examples random in the range [1, 10]
val random1 = (0..10).shuffled().last()
or utilizing Java Random
val random2 = Random().nextInt(10) + 1

No need to use custom extension functions anymore. IntRange has a random() extension function out-of-the-box now.
val randomNumber = (0..10).random()

Kotlin >= 1.3, multiplatform support for Random
As of 1.3, the standard library provided multi-platform support for randoms, see this answer.
Kotlin < 1.3 on JavaScript
If you are working with Kotlin JavaScript and don't have access to java.util.Random, the following will work:
fun IntRange.random() = (Math.random() * ((endInclusive + 1) - start) + start).toInt()
Used like this:
// will return an `Int` between 0 and 10 (incl.)
(0..10).random()

Another way of implementing s1m0nw1's answer would be to access it through a variable. Not that its any more efficient but it saves you from having to type ().
val ClosedRange<Int>.random: Int
get() = Random().nextInt((endInclusive + 1) - start) + start
And now it can be accessed as such
(1..10).random

If the numbers you want to choose from are not consecutive, you can use random().
Usage:
val list = listOf(3, 1, 4, 5)
val number = list.random()
Returns one of the numbers which are in the list.

Using a top-level function, you can achieve exactly the same call syntax as in Ruby (as you wish):
fun rand(s: Int, e: Int) = Random.nextInt(s, e + 1)
Usage:
rand(1, 3) // returns either 1, 2 or 3

There is no standard method that does this but you can easily create your own using either Math.random() or the class java.util.Random. Here is an example using the Math.random() method:
fun random(n: Int) = (Math.random() * n).toInt()
fun random(from: Int, to: Int) = (Math.random() * (to - from) + from).toInt()
fun random(pair: Pair<Int, Int>) = random(pair.first, pair.second)
fun main(args: Array<String>) {
val n = 10
val rand1 = random(n)
val rand2 = random(5, n)
val rand3 = random(5 to n)
println(List(10) { random(n) })
println(List(10) { random(5 to n) })
}
This is a sample output:
[9, 8, 1, 7, 5, 6, 9, 8, 1, 9]
[5, 8, 9, 7, 6, 6, 8, 6, 7, 9]

Kotlin standard lib doesn't provide Random Number Generator API. If you aren't in a multiplatform project, it's better to use the platform api (all the others answers of the question talk about this solution).
But if you are in a multiplatform context, the best solution is to implement random by yourself in pure kotlin for share the same random number generator between platforms. It's more simple for dev and testing.
To answer to this problem in my personal project, i implement a pure Kotlin Linear Congruential Generator. LCG is the algorithm used by java.util.Random. Follow this link if you want to use it :
https://gist.github.com/11e5ddb567786af0ed1ae4d7f57441d4
My implementation purpose nextInt(range: IntRange) for you ;).
Take care about my purpose, LCG is good for most of the use cases (simulation, games, etc...) but is not suitable for cryptographically usage because of the predictability of this method.

First, you need a RNG. In Kotlin you currently need to use the platform specific ones (there isn't a Kotlin built in one). For the JVM it's java.util.Random. You'll need to create an instance of it and then call random.nextInt(n).

To get a random Int number in Kotlin use the following method:
import java.util.concurrent.ThreadLocalRandom
fun randomInt(rangeFirstNum:Int, rangeLastNum:Int) {
val randomInteger = ThreadLocalRandom.current().nextInt(rangeFirstNum,rangeLastNum)
println(randomInteger)
}
fun main() {
randomInt(1,10)
}
// Result – random Int numbers from 1 to 9
Hope this helps.

Below in Kotlin worked well for me:
(fromNumber.rangeTo(toNumber)).random()
Range of the numbers starts with variable fromNumber and ends with variable toNumber. fromNumber and toNumber will also be included in the random numbers generated out of this.

You could create an extension function:
infix fun ClosedRange<Float>.step(step: Float): Iterable<Float> {
require(start.isFinite())
require(endInclusive.isFinite())
require(step.round() > 0.0) { "Step must be positive, was: $step." }
require(start != endInclusive) { "Start and endInclusive must not be the same"}
if (endInclusive > start) {
return generateSequence(start) { previous ->
if (previous == Float.POSITIVE_INFINITY) return#generateSequence null
val next = previous + step.round()
if (next > endInclusive) null else next.round()
}.asIterable()
}
return generateSequence(start) { previous ->
if (previous == Float.NEGATIVE_INFINITY) return#generateSequence null
val next = previous - step.round()
if (next < endInclusive) null else next.round()
}.asIterable()
}
Round Float value:
fun Float.round(decimals: Int = DIGITS): Float {
var multiplier = 1.0f
repeat(decimals) { multiplier *= 10 }
return round(this * multiplier) / multiplier
}
Method's usage:
(0.0f .. 1.0f).step(.1f).forEach { System.out.println("value: $it") }
Output:
value: 0.0 value: 0.1 value: 0.2 value: 0.3 value: 0.4 value: 0.5
value: 0.6 value: 0.7 value: 0.8 value: 0.9 value: 1.0

Full source code. Can control whether duplicates are allowed.
import kotlin.math.min
abstract class Random {
companion object {
fun string(length: Int, isUnique: Boolean = false): String {
if (0 == length) return ""
val alphabet: List<Char> = ('a'..'z') + ('A'..'Z') + ('0'..'9') // Add your set here.
if (isUnique) {
val limit = min(length, alphabet.count())
val set = mutableSetOf<Char>()
do { set.add(alphabet.random()) } while (set.count() != limit)
return set.joinToString("")
}
return List(length) { alphabet.random() }.joinToString("")
}
fun alphabet(length: Int, isUnique: Boolean = false): String {
if (0 == length) return ""
val alphabet = ('A'..'Z')
if (isUnique) {
val limit = min(length, alphabet.count())
val set = mutableSetOf<Char>()
do { set.add(alphabet.random()) } while (set.count() != limit)
return set.joinToString("")
}
return List(length) { alphabet.random() }.joinToString("")
}
}
}

Whenever there is a situation where you want to generate key or mac address which is hexadecimal number having digits based on user demand, and that too using android and kotlin, then you my below code helps you:
private fun getRandomHexString(random: SecureRandom, numOfCharsToBePresentInTheHexString: Int): String {
val sb = StringBuilder()
while (sb.length < numOfCharsToBePresentInTheHexString) {
val randomNumber = random.nextInt()
val number = String.format("%08X", randomNumber)
sb.append(number)
}
return sb.toString()
}

Here is a straightforward solution in Kotlin, which also works on KMM:
fun IntRange.rand(): Int =
Random(Clock.System.now().toEpochMilliseconds()).nextInt(first, last)
Seed is needed for the different random number on each run. You can also do the same for the LongRange.

Gets the next random Int from the random number generator.
Random.nextInt()

to be super duper ))
fun rnd_int(min: Int, max: Int): Int {
var max = max
max -= min
return (Math.random() * ++max).toInt() + min
}

Related

How to ensure minimum occurrences of RNG outcomes?

I'm working on a basic password generator for This Reddit Programming Exercise. I have very simple password generation working in Kotlin, but I'm struggling to come up with a good way to guarantee a specific minimum number of each character type (i.e. at least 2 each of Upper, Lower, Digit, and special character).
I could easily just do two of each to start the password, but I want them to be randomly distributed, not all front-loaded. I also thought of inserting the characters randomly into a character array, rather than appending to a string, but I was wondering if maybe there is a better way.
Here is what I have currently:
private fun generatePassword() {
var password = ""
val passwordLength = (8..25).random()
for (i in 0..passwordLength) {
val charType = (1..5).random()
when (charType) {
1 -> password += lettersLower[(0..lettersLower.size).random()]
2 -> password += lettersUpper[(0..lettersUpper.size).random()]
3 -> password += numbers[(0..numbers.size).random()]
4 -> password += symbols[(0..symbols.size).random()]
}
}
println(password)
}
This works and generates a random password of a random length of 8-24 characters, and contains a random combination of uppercase letters, lowercase letters, digits, and special characters, but it doesn't guarantee the presence of any specific number of each type, like most websites require nowadays.
I thought of doing something like this:
private fun generatePassword() {
var password: ArrayList<Char> = arrayListOf()
var newChar:Char
var numLow = 0
var numUpp = 0
var numDig = 0
var numSpe = 0
while (numLow < 2){
newChar = lettersLower[(0..lettersLower.size).random()]
password.add((0..password.size).random(), newChar)
numLow++
}
while (numUpp < 2){
newChar = lettersUpper[(0..lettersUpper.size).random()]
password.add((0..password.size).random(), newChar)
numUpp++
}
while (numDig < 2){
newChar = numbers[(0..numbers.size).random()]
password.add((0..password.size).random(), newChar)
numDig++
}
while (numSpe < 2){
newChar = symbols[(0..symbols.size).random()]
password.add((0..password.size).random(), newChar)
numSpe++
}
val passwordLength = (8..25).random() - password.size
for (i in 0..passwordLength) {
val charType = (1..5).random()
when (charType) {
1 -> {
newChar = lettersLower[(0..lettersLower.size).random()]
password.add(newChar)
}
2 -> {
newChar = lettersUpper[(0..lettersUpper.size).random()]
password.add(newChar)
}
3 -> {
newChar = numbers[(0..numbers.size).random()]
password.add(newChar)
}
4 -> {
newChar = symbols[(0..symbols.size).random()]
password.add(newChar)
}
}
}
for (i in password.indices){
print(password[i])
}
print("\n")
}
But that doesn't seem particularly concise. The other option I thought of is to check if the password contains two of each type of character and generate a new one if it doesn't, but that seems awfully inefficient. Is there a better way to ensure that if I run my RNG 8+ times it will return at least, but not exactly, two ones, two twos, two threes, and two fours?
Here's my random function, if that helps:
// Random function via SO user #s1m0nw1 (https://stackoverflow.com/a/49507413)
private fun ClosedRange<Int>.random() = (Math.random() * (endInclusive - start) + start).toInt()
I also found this SO post in Java, but it's also not very concise and I'm wondering if there is a more general purpose, more concise solution to ensure a minimum number of occurrences for each possible outcome of an RNG over a number of trials greater than the sum of those minimum occurrences.
Sorry for the rambling post. Thanks in advance for your insight.
I think your second version is on the right track. It can be simplified to:
import java.util.Random
fun generatePassword() : String {
// this defines the classes of characters
// and their corresponding minimum occurrences
val charClasses = arrayOf(
('a'..'z').asIterable().toList() to 2,
('A'..'Z').asIterable().toList() to 2,
('0'..'9').asIterable().toList() to 2,
"!##$%^&*()_+".asIterable().toList() to 2
)
val maxLen = 24;
val minLen = 8;
var password = StringBuilder()
val random = Random()
charClasses.forEach {(chars, minOccur) ->
(0..minOccur).forEach {
password.insert(
if (password.isEmpty()) 0 else random.nextInt(password.length),
chars[random.nextInt(chars.size)])
}
}
((password.length + 1)..(random.nextInt(maxLen - minLen + 1) + minLen)).forEach {
val selected = charClasses[random.nextInt(charClasses.size)]
password.append(selected.first[random.nextInt(selected.first.size)])
}
return password.toString()
}

Swift Dictionary slow even with optimizations: doing uncessary retain/release?

The following code, which maps simple value holders to booleans, runs over 20x faster in Java than Swift 2 - XCode 7 beta3, "Fastest, Aggressive Optimizations [-Ofast]", and "Fast, Whole Module Optimizations" turned on. I can get over 280M lookups/sec in Java but only about 10M in Swift.
When I look at it in Instruments I see that most of the time is going into a pair of retain/release calls associated with the map lookup. Any suggestions on why this is happening or a workaround would be appreciated.
The structure of the code is a simplified version of my real code, which has a more complex key class and also stores other types (though Boolean is an actual case for me). Also, note that I am using a single mutable key instance for the retrieval to avoid allocating objects inside the loop and according to my tests this is faster in Swift than an immutable key.
EDIT: I have also tried switching to NSMutableDictionary but when used with Swift objects as keys it seems to be terribly slow.
EDIT2: I have tried implementing the test in objc (which wouldn't have the Optional unwrapping overhead) and it is faster but still over an order of magnitude slower than Java... I'm going to pose that example as another question to see if anyone has ideas.
EDIT3 - Answer. I have posted my conclusions and my workaround in an answer below.
public final class MyKey : Hashable {
var xi : Int = 0
init( _ xi : Int ) { set( xi ) }
final func set( xi : Int) { self.xi = xi }
public final var hashValue: Int { return xi }
}
public func == (lhs: MyKey, rhs: MyKey) -> Bool {
if ( lhs === rhs ) { return true }
return lhs.xi==rhs.xi
}
...
var map = Dictionary<MyKey,Bool>()
let range = 2500
for x in 0...range { map[ MyKey(x) ] = true }
let runs = 10
for _ in 0...runs
{
let time = Time()
let reps = 10000
let key = MyKey(0)
for _ in 0...reps {
for x in 0...range {
key.set(x)
if ( map[ key ] == nil ) { XCTAssertTrue(false) }
}
}
print("rate=\(time.rate( reps*range )) lookups/s")
}
and here is the corresponding Java code:
public class MyKey {
public int xi;
public MyKey( int xi ) { set( xi ); }
public void set( int xi) { this.xi = xi; }
#Override public int hashCode() { return xi; }
#Override
public boolean equals( Object o ) {
if ( o == this ) { return true; }
MyKey mk = (MyKey)o;
return mk.xi == this.xi;
}
}
...
Map<MyKey,Boolean> map = new HashMap<>();
int range = 2500;
for(int x=0; x<range; x++) { map.put( new MyKey(x), true ); }
int runs = 10;
for(int run=0; run<runs; run++)
{
Time time = new Time();
int reps = 10000;
MyKey buffer = new MyKey( 0 );
for (int it = 0; it < reps; it++) {
for (int x = 0; x < range; x++) {
buffer.set( x );
if ( map.get( buffer ) == null ) { Assert.assertTrue( false ); }
}
}
float rate = reps*range/time.s();
System.out.println( "rate = " + rate );
}
After much experimentation I have come to some conclusions and found a workaround (albeit somewhat extreme).
First let me say that I recognize that this kind of very fine grained data structure access within a tight loop is not representative of general performance, but it does affect my application and I'm imagining others like games and heavily numeric applications. Also let me say that I know that Swift is a moving target and I'm sure it will improve - perhaps my workaround (hacks) below will not be necessary by the time you read this. But if you are trying to do something like this today and you are looking at Instruments and seeing the majority of your application time spent in retain/release and you don't want to rewrite your entire app in objc please read on.
What I have found is that almost anything that one does in Swift that touches an object reference incurs an ARC retain/release penalty. Additionally Optional values - even optional primitives - also incur this cost. This pretty much rules out using Dictionary or NSDictionary.
Here are some things that are fast that you can include in a workaround:
a) Arrays of primitive types.
b) Arrays of final objects as long as long as the array is on the stack and not on the heap. e.g. Declare an array within the method body (but outside of your loop of course) and iteratively copy the values to it. Do not Array(array) copy it.
Putting this together you can construct a data structure based on arrays that stores e.g. Ints and then store array indexes to your objects in that data structure. Within your loop you can look up the objects by their index in the fast local array. Before you ask "couldn't the data structure store the array for me" - no, because that would incur two of the penalties I mentioned above :(
All things considered this workaround is not too bad - If you can enumerate the entities that you want to store in the Dictionary / data structure you should be able to host them in an array as described. Using the technique above I was able to exceed the Java performance by a factor of 2x in Swift in my case.
If anyone is still reading and interested at this point I will consider updating my example code and posting.
EDIT: I'd add an option: c) It is also possible to use UnsafeMutablePointer<> or Unmanaged<> in Swift to create a reference that will not be retained when passed around. I was not aware of this when I started and I would hesitate to recommend it in general because it's a hack, but I've used it in a few cases to wrap a heavily used array that was incurring a retain/release every time it was referenced.

Using par map to increase performance

Below code runs a comparison of users and writes to file. I've removed some code to make it as concise as possible but speed is an issue also in this code :
import scala.collection.JavaConversions._
object writedata {
def getDistance(str1: String, str2: String) = {
val zipped = str1.zip(str2)
val numberOfEqualSequences = zipped.count(_ == ('1', '1')) * 2
val p = zipped.count(_ == ('1', '1')).toFloat * 2
val q = zipped.count(_ == ('1', '0')).toFloat * 2
val r = zipped.count(_ == ('0', '1')).toFloat * 2
val s = zipped.count(_ == ('0', '0')).toFloat * 2
(q + r) / (p + q + r)
} //> getDistance: (str1: String, str2: String)Float
case class UserObj(id: String, nCoordinate: String)
val userList = new java.util.ArrayList[UserObj] //> userList : java.util.ArrayList[writedata.UserObj] = []
for (a <- 1 to 100) {
userList.add(new UserObj("2", "101010"))
}
def using[A <: { def close(): Unit }, B](param: A)(f: A => B): B =
try { f(param) } finally { param.close() } //> using: [A <: AnyRef{def close(): Unit}, B](param: A)(f: A => B)B
def appendToFile(fileName: String, textData: String) =
using(new java.io.FileWriter(fileName, true)) {
fileWriter =>
using(new java.io.PrintWriter(fileWriter)) {
printWriter => printWriter.println(textData)
}
} //> appendToFile: (fileName: String, textData: String)Unit
var counter = 0; //> counter : Int = 0
for (xUser <- userList.par) {
userList.par.map(yUser => {
if (!xUser.id.isEmpty && !yUser.id.isEmpty)
synchronized {
appendToFile("c:\\data-files\\test.txt", getDistance(xUser.nCoordinate , yUser.nCoordinate).toString)
}
})
}
}
The above code was previously an imperative solution, so the .par functionality was within an inner and outer loop. I'm attempting to convert it to a more functional implementation while also taking advantage of Scala's parallel collections framework.
In this example the data set size is 10 but in the code im working on
the size is 8000 which translates to 64'000'000 comparisons. I'm
using a synchronized block so that multiple threads are not writing
to same file at same time. A performance improvment im considering
is populating a separate collection within the inner loop ( userList.par.map(yUser => {)
and then writing that collection out to seperate file.
Are there other methods I can use to improve performance. So that I can
handle a List that contains 8000 items instead of above example of 100 ?
I'm not sure if you removed too much code for clarity, but from what I can see, there is absolutely nothing that can run in parallel since the only thing you are doing is writing to a file.
EDIT:
One thing that you should do is to move the getDistance(...) computation before the synchronized call to appendToFile, otherwise your parallelized code ends up being sequential.
Instead of calling a synchronized appendToFile, I would call appendToFile in a non-synchronized way, but have each call to that method add the new line to some synchronized queue. Then I would have another thread that flushes that queue to disk periodically. But then you would also need to add something to make sure that the queue is also flushed when all computations are done. So that could get complicated...
Alternatively, you could also keep your code and simply drop the synchronization around the call to appendToFile. It seems that println itself is synchronized. However, that would be risky since println is not officially synchronized and it could change in future versions.

Accessing position information in a scala combinatorparser kills performance

I wrote a new combinator for my parser in scala.
Its a variation of the ^^ combinator, which passes position information on.
But accessing the position information of the input element really cost performance.
In my case parsing a big example need around 3 seconds without position information, with it needs over 30 seconds.
I wrote a runnable example where the runtime is about 50% more when accessing the position.
Why is that? How can I get a better runtime?
Example:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.combinator.Parsers
import scala.util.matching.Regex
import scala.language.implicitConversions
object FooParser extends RegexParsers with Parsers {
var withPosInfo = false
def b: Parser[String] = regexB("""[a-z]+""".r) ^^# { case (b, x) => b + " ::" + x.toString }
def regexB(p: Regex): BParser[String] = new BParser(regex(p))
class BParser[T](p: Parser[T]) {
def ^^#[U](f: ((Int, Int), T) => U): Parser[U] = Parser { in =>
val source = in.source
val offset = in.offset
val start = handleWhiteSpace(source, offset)
val inwo = in.drop(start - offset)
p(inwo) match {
case Success(t, in1) =>
{
var a = 3
var b = 4
if(withPosInfo)
{ // takes a lot of time
a = inwo.pos.line
b = inwo.pos.column
}
Success(f((a, b), t), in1)
}
case ns: NoSuccess => ns
}
}
}
def main(args: Array[String]) = {
val r = "foo"*50000000
var now = System.nanoTime
parseAll(b, r)
var us = (System.nanoTime - now) / 1000
println("without: %d us".format(us))
withPosInfo = true
now = System.nanoTime
parseAll(b, r)
us = (System.nanoTime - now) / 1000
println("with : %d us".format(us))
}
}
Output:
without: 2952496 us
with : 4591070 us
Unfortunately, I don't think you can use the same approach. The problem is that line numbers end up implemented by scala.util.parsing.input.OffsetPosition which builds a list of every line break every time it is created. So if it ends up with string input it will parse the entire thing on every call to pos (twice in your example). See the code for CharSequenceReader and OffsetPosition for more details.
There is one quick thing you can do to speed this up:
val ip = inwo.pos
a = ip.line
b = ip.column
to at least avoid creating pos twice. But that still leaves you with a lot of redundant work. I'm afraid to really solve the problem you'll have to build the index as in OffsetPosition yourself, just once, and then keep referring to it.
You could also file a bug report / make an enhancement request. This is not a very good way to implement the feature.

How do I create a URL shortener? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef".
Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.
My approach:
I have a database table with three columns:
id, integer, auto-increment
long, string, the long URL the user entered
short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.
My idea:
For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a Bijective Function f. This is necessary so that you can find a inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).
For this example, I will use 12510 (125 with a base of 10).
Now you have to convert 12510 to X62 (base 62).
12510 = 2×621 + 1×620 = [2,1]
This requires the use of integer division and modulo. A pseudo-code example:
digits = []
while num > 0
remainder = modulo(num, 62)
digits.push(remainder)
num = divide(num, 62)
digits = digits.reverse
Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:
0 → a
1 → b
...
25 → z
...
52 → 0
61 → 9
With 2 → c and 1 → b, you will receive cb62 as the shortened URL.
http://shor.ty/cb
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
e9a62 will be resolved to "4th, 61st, and 0th letter in the alphabet".
e9a62 = [4,61,0] = 4×622 + 61×621 + 0×620 = 1915810
Now find your database-record with WHERE id = 19158 and do the redirect.
Example implementations (provided by commenters)
C++
Python
Ruby
Haskell
C#
CoffeeScript
Perl
Why would you want to use a hash?
You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say you character space (A-Z, a-z, 0-9, etc.) has 62 characters, convert the id to a base-40 number and use the characters as the digits.
public class UrlShortener {
private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static final int BASE = ALPHABET.length();
public static String encode(int num) {
StringBuilder sb = new StringBuilder();
while ( num > 0 ) {
sb.append( ALPHABET.charAt( num % BASE ) );
num /= BASE;
}
return sb.reverse().toString();
}
public static int decode(String str) {
int num = 0;
for ( int i = 0; i < str.length(); i++ )
num = num * BASE + ALPHABET.indexOf(str.charAt(i));
return num;
}
}
Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters very very similar that they are near impossible to tell the difference) and downright error prone. Try to use lower or upper case only.
Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.
My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
Here is my PHP 5 class.
<?php
class Bijective
{
public $dictionary = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
public function __construct()
{
$this->dictionary = str_split($this->dictionary);
}
public function encode($i)
{
if ($i == 0)
return $this->dictionary[0];
$result = '';
$base = count($this->dictionary);
while ($i > 0)
{
$result[] = $this->dictionary[($i % $base)];
$i = floor($i / $base);
}
$result = array_reverse($result);
return join("", $result);
}
public function decode($input)
{
$i = 0;
$base = count($this->dictionary);
$input = str_split($input);
foreach($input as $char)
{
$pos = array_search($char, $this->dictionary);
$i = $i * $base + $pos;
}
return $i;
}
}
A Node.js and MongoDB solution
Since we know the format that MongoDB uses to create a new ObjectId with 12 bytes.
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id
a 3-byte counter (in your machine), starting with a random value.
Example (I choose a random sequence)
a1b2c3d4e5f6g7h8i9j1k2l3
a1b2c3d4 represents the seconds since the Unix epoch,
4e5f6g7 represents machine identifier,
h8i9 represents process id
j1k2l3 represents the counter, starting with a random value.
Since the counter will be unique if we are storing the data in the same machine we can get it with no doubts that it will be duplicate.
So the short URL will be the counter and here is a code snippet assuming that your server is running properly.
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
// Create a schema
const shortUrl = new Schema({
long_url: { type: String, required: true },
short_url: { type: String, required: true, unique: true },
});
const ShortUrl = mongoose.model('ShortUrl', shortUrl);
// The user can request to get a short URL by providing a long URL using a form
app.post('/shorten', function(req ,res){
// Create a new shortUrl */
// The submit form has an input with longURL as its name attribute.
const longUrl = req.body["longURL"];
const newUrl = ShortUrl({
long_url : longUrl,
short_url : "",
});
const shortUrl = newUrl._id.toString().slice(-6);
newUrl.short_url = shortUrl;
console.log(newUrl);
newUrl.save(function(err){
console.log("the new URL is added");
})
});
I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.
static hashids = Hashids(salt = "my app rocks", minSize = 6)
I ran a script to see how long it takes until it exhausts the character length. For six characters it can do 164,916,224 links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.
Hashids can decode the URL path back to a integer but a simpler solution is to use the entire short link sho.rt/ka8ds3 as a primary key.
Here is the full concept:
function addDomain(domain) {
table("domains").insert("domain", domain, "seq", 0)
}
function addURL(domain, longURL) {
seq = table("domains").where("domain = ?", domain).increment("seq")
shortURL = domain + "/" + hashids.encode(seq)
table("links").insert("short", shortURL, "long", longURL)
return shortURL
}
// GET /:hashcode
function handleRequest(req, res) {
shortURL = req.host + "/" + req.param("hashcode")
longURL = table("links").where("short = ?", shortURL).get("long")
res.redirect(301, longURL)
}
C# version:
public class UrlShortener
{
private static String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static int BASE = 62;
public static String encode(int num)
{
StringBuilder sb = new StringBuilder();
while ( num > 0 )
{
sb.Append( ALPHABET[( num % BASE )] );
num /= BASE;
}
StringBuilder builder = new StringBuilder();
for (int i = sb.Length - 1; i >= 0; i--)
{
builder.Append(sb[i]);
}
return builder.ToString();
}
public static int decode(String str)
{
int num = 0;
for ( int i = 0, len = str.Length; i < len; i++ )
{
num = num * BASE + ALPHABET.IndexOf( str[(i)] );
}
return num;
}
}
You could hash the entire URL, but if you just want to shorten the id, do as marcel suggested. I wrote this Python implementation:
https://gist.github.com/778542
Take a look at https://hashids.org/ it is open source and in many languages.
Their page outlines some of the pitfalls of other approaches.
If you don't want re-invent the wheel ... http://lilurl.sourceforge.net/
// simple approach
$original_id = 56789;
$shortened_id = base_convert($original_id, 10, 36);
$un_shortened_id = base_convert($shortened_id, 36, 10);
alphabet = map(chr, range(97,123)+range(65,91)) + map(str,range(0,10))
def lookup(k, a=alphabet):
if type(k) == int:
return a[k]
elif type(k) == str:
return a.index(k)
def encode(i, a=alphabet):
'''Takes an integer and returns it in the given base with mappings for upper/lower case letters and numbers 0-9.'''
try:
i = int(i)
except Exception:
raise TypeError("Input must be an integer.")
def incode(i=i, p=1, a=a):
# Here to protect p.
if i <= 61:
return lookup(i)
else:
pval = pow(62,p)
nval = i/pval
remainder = i % pval
if nval <= 61:
return lookup(nval) + incode(i % pval)
else:
return incode(i, p+1)
return incode()
def decode(s, a=alphabet):
'''Takes a base 62 string in our alphabet and returns it in base10.'''
try:
s = str(s)
except Exception:
raise TypeError("Input must be a string.")
return sum([lookup(i) * pow(62,p) for p,i in enumerate(list(reversed(s)))])a
Here's my version for whomever needs it.
Why not just translate your id to a string? You just need a function that maps a digit between, say, 0 and 61 to a single letter (upper/lower case) or digit. Then apply this to create, say, 4-letter codes, and you've got 14.7 million URLs covered.
Here is a decent URL encoding function for PHP...
// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
$str = '';
do {
$i = fmod($val, $base);
$str = $chars[$i] . $str;
$val = ($val - $i) / $base;
} while($val > 0);
return $str;
}
Don't know if anyone will find this useful - it is more of a 'hack n slash' method, yet is simple and works nicely if you want only specific chars.
$dictionary = "abcdfghjklmnpqrstvwxyz23456789";
$dictionary = str_split($dictionary);
// Encode
$str_id = '';
$base = count($dictionary);
while($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $dictionary[$rem];
}
// Decode
$id_ar = str_split($str_id);
$id = 0;
for($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i-1], $dictionary) * pow($base, $i - 1);
}
Did you omit O, 0, and i on purpose?
I just created a PHP class based on Ryan's solution.
<?php
$shorty = new App_Shorty();
echo 'ID: ' . 1000;
echo '<br/> Short link: ' . $shorty->encode(1000);
echo '<br/> Decoded Short Link: ' . $shorty->decode($shorty->encode(1000));
/**
* A nice shorting class based on Ryan Charmley's suggestion see the link on Stack Overflow below.
* #author Svetoslav Marinov (Slavi) | http://WebWeb.ca
* #see http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener/10386945#10386945
*/
class App_Shorty {
/**
* Explicitly omitted: i, o, 1, 0 because they are confusing. Also use only lowercase ... as
* dictating this over the phone might be tough.
* #var string
*/
private $dictionary = "abcdfghjklmnpqrstvwxyz23456789";
private $dictionary_array = array();
public function __construct() {
$this->dictionary_array = str_split($this->dictionary);
}
/**
* Gets ID and converts it into a string.
* #param int $id
*/
public function encode($id) {
$str_id = '';
$base = count($this->dictionary_array);
while ($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $this->dictionary_array[$rem];
}
return $str_id;
}
/**
* Converts /abc into an integer ID
* #param string
* #return int $id
*/
public function decode($str_id) {
$id = 0;
$id_ar = str_split($str_id);
$base = count($this->dictionary_array);
for ($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i - 1], $this->dictionary_array) * pow($base, $i - 1);
}
return $id;
}
}
?>
public class TinyUrl {
private final String characterMap = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private final int charBase = characterMap.length();
public String covertToCharacter(int num){
StringBuilder sb = new StringBuilder();
while (num > 0){
sb.append(characterMap.charAt(num % charBase));
num /= charBase;
}
return sb.reverse().toString();
}
public int covertToInteger(String str){
int num = 0;
for(int i = 0 ; i< str.length(); i++)
num += characterMap.indexOf(str.charAt(i)) * Math.pow(charBase , (str.length() - (i + 1)));
return num;
}
}
class TinyUrlTest{
public static void main(String[] args) {
TinyUrl tinyUrl = new TinyUrl();
int num = 122312215;
String url = tinyUrl.covertToCharacter(num);
System.out.println("Tiny url: " + url);
System.out.println("Id: " + tinyUrl.covertToInteger(url));
}
}
This is what I use:
# Generate a [0-9a-zA-Z] string
ALPHABET = map(str,range(0, 10)) + map(chr, range(97, 123) + range(65, 91))
def encode_id(id_number, alphabet=ALPHABET):
"""Convert an integer to a string."""
if id_number == 0:
return alphabet[0]
alphabet_len = len(alphabet) # Cache
result = ''
while id_number > 0:
id_number, mod = divmod(id_number, alphabet_len)
result = alphabet[mod] + result
return result
def decode_id(id_string, alphabet=ALPHABET):
"""Convert a string to an integer."""
alphabet_len = len(alphabet) # Cache
return sum([alphabet.index(char) * pow(alphabet_len, power) for power, char in enumerate(reversed(id_string))])
It's very fast and can take long integers.
For a similar project, to get a new key, I make a wrapper function around a random string generator that calls the generator until I get a string that hasn't already been used in my hashtable. This method will slow down once your name space starts to get full, but as you have said, even with only 6 characters, you have plenty of namespace to work with.
I have a variant of the problem, in that I store web pages from many different authors and need to prevent discovery of pages by guesswork. So my short URLs add a couple of extra digits to the Base-62 string for the page number. These extra digits are generated from information in the page record itself and they ensure that only 1 in 3844 URLs are valid (assuming 2-digit Base-62). You can see an outline description at http://mgscan.com/MBWL.
Very good answer, I have created a Golang implementation of the bjf:
package bjf
import (
"math"
"strings"
"strconv"
)
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
func Encode(num string) string {
n, _ := strconv.ParseUint(num, 10, 64)
t := make([]byte, 0)
/* Special case */
if n == 0 {
return string(alphabet[0])
}
/* Map */
for n > 0 {
r := n % uint64(len(alphabet))
t = append(t, alphabet[r])
n = n / uint64(len(alphabet))
}
/* Reverse */
for i, j := 0, len(t) - 1; i < j; i, j = i + 1, j - 1 {
t[i], t[j] = t[j], t[i]
}
return string(t)
}
func Decode(token string) int {
r := int(0)
p := float64(len(token)) - 1
for i := 0; i < len(token); i++ {
r += strings.Index(alphabet, string(token[i])) * int(math.Pow(float64(len(alphabet)), p))
p--
}
return r
}
Hosted at github: https://github.com/xor-gate/go-bjf
Implementation in Scala:
class Encoder(alphabet: String) extends (Long => String) {
val Base = alphabet.size
override def apply(number: Long) = {
def encode(current: Long): List[Int] = {
if (current == 0) Nil
else (current % Base).toInt :: encode(current / Base)
}
encode(number).reverse
.map(current => alphabet.charAt(current)).mkString
}
}
class Decoder(alphabet: String) extends (String => Long) {
val Base = alphabet.size
override def apply(string: String) = {
def decode(current: Long, encodedPart: String): Long = {
if (encodedPart.size == 0) current
else decode(current * Base + alphabet.indexOf(encodedPart.head),encodedPart.tail)
}
decode(0,string)
}
}
Test example with Scala test:
import org.scalatest.{FlatSpec, Matchers}
class DecoderAndEncoderTest extends FlatSpec with Matchers {
val Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
"A number with base 10" should "be correctly encoded into base 62 string" in {
val encoder = new Encoder(Alphabet)
encoder(127) should be ("cd")
encoder(543513414) should be ("KWGPy")
}
"A base 62 string" should "be correctly decoded into a number with base 10" in {
val decoder = new Decoder(Alphabet)
decoder("cd") should be (127)
decoder("KWGPy") should be (543513414)
}
}
Function based in Xeoncross Class
function shortly($input){
$dictionary = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','0','1','2','3','4','5','6','7','8','9'];
if($input===0)
return $dictionary[0];
$base = count($dictionary);
if(is_numeric($input)){
$result = [];
while($input > 0){
$result[] = $dictionary[($input % $base)];
$input = floor($input / $base);
}
return join("", array_reverse($result));
}
$i = 0;
$input = str_split($input);
foreach($input as $char){
$pos = array_search($char, $dictionary);
$i = $i * $base + $pos;
}
return $i;
}
Here is a Node.js implementation that is likely to bit.ly. generate a highly random seven-character string.
It uses Node.js crypto to generate a highly random 25 charset rather than randomly selecting seven characters.
var crypto = require("crypto");
exports.shortURL = new function () {
this.getShortURL = function () {
var sURL = '',
_rand = crypto.randomBytes(25).toString('hex'),
_base = _rand.length;
for (var i = 0; i < 7; i++)
sURL += _rand.charAt(Math.floor(Math.random() * _rand.length));
return sURL;
};
}
My Python 3 version
base_list = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
base = len(base_list)
def encode(num: int):
result = []
if num == 0:
result.append(base_list[0])
while num > 0:
result.append(base_list[num % base])
num //= base
print("".join(reversed(result)))
def decode(code: str):
num = 0
code_list = list(code)
for index, code in enumerate(reversed(code_list)):
num += base_list.index(code) * base ** index
print(num)
if __name__ == '__main__':
encode(341413134141)
decode("60FoItT")
For a quality Node.js / JavaScript solution, see the id-shortener module, which is thoroughly tested and has been used in production for months.
It provides an efficient id / URL shortener backed by pluggable storage defaulting to Redis, and you can even customize your short id character set and whether or not shortening is idempotent. This is an important distinction that not all URL shorteners take into account.
In relation to other answers here, this module implements the Marcel Jackwerth's excellent accepted answer above.
The core of the solution is provided by the following Redis Lua snippet:
local sequence = redis.call('incr', KEYS[1])
local chars = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz'
local remaining = sequence
local slug = ''
while (remaining > 0) do
local d = (remaining % 60)
local character = string.sub(chars, d + 1, d + 1)
slug = character .. slug
remaining = (remaining - d) / 60
end
redis.call('hset', KEYS[2], slug, ARGV[1])
return slug
Why not just generate a random string and append it to the base URL? This is a very simplified version of doing this in C#.
static string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
static string baseUrl = "https://google.com/";
private static string RandomString(int length)
{
char[] s = new char[length];
Random rnd = new Random();
for (int x = 0; x < length; x++)
{
s[x] = chars[rnd.Next(chars.Length)];
}
Thread.Sleep(10);
return new String(s);
}
Then just add the append the random string to the baseURL:
string tinyURL = baseUrl + RandomString(5);
Remember this is a very simplified version of doing this and it's possible the RandomString method could create duplicate strings. In production you would want to take in account for duplicate strings to ensure you will always have a unique URL. I have some code that takes account for duplicate strings by querying a database table I could share if anyone is interested.
This is my initial thoughts, and more thinking can be done, or some simulation can be made to see if it works well or any improvement is needed:
My answer is to remember the long URL in the database, and use the ID 0 to 9999999999999999 (or however large the number is needed).
But the ID 0 to 9999999999999999 can be an issue, because
it can be shorter if we use hexadecimal, or even base62 or base64. (base64 just like YouTube using A-Z a-z 0-9 _ and -)
if it increases from 0 to 9999999999999999 uniformly, then hackers can visit them in that order and know what URLs people are sending each other, so it can be a privacy issue
We can do this:
have one server allocate 0 to 999 to one server, Server A, so now Server A has 1000 of such IDs. So if there are 20 or 200 servers constantly wanting new IDs, it doesn't have to keep asking for each new ID, but rather asking once for 1000 IDs
for the ID 1, for example, reverse the bits. So 000...00000001 becomes 10000...000, so that when converted to base64, it will be non-uniformly increasing IDs each time.
use XOR to flip the bits for the final IDs. For example, XOR with 0xD5AA96...2373 (like a secret key), and the some bits will be flipped. (whenever the secret key has the 1 bit on, it will flip the bit of the ID). This will make the IDs even harder to guess and appear more random
Following this scheme, the single server that allocates the IDs can form the IDs, and so can the 20 or 200 servers requesting the allocation of IDs. The allocating server has to use a lock / semaphore to prevent two requesting servers from getting the same batch (or if it is accepting one connection at a time, this already solves the problem). So we don't want the line (queue) to be too long for waiting to get an allocation. So that's why allocating 1000 or 10000 at a time can solve the issue.

Resources