Генераторы случайных чисел с неравномерным распределением

Автор: Agner Fog

В файле stocc.zip имеется библиотека C++ классов генераторов случайных чисел с неравномерным распределением, обеспечивающие следующие виды распределений: нормальное, Бернулли, Пуассона, биномиальльное, гипергеометрическое, расширенное гипергеометрическое, нецентральное гипергеометрическое, мультивариационное биномиальное (мультиномиальное), мультивариационное гипергеометрическое, мультивариационное расширенное гипергеометрическое и мультивариационное расширенное нецентральное гипергеометрическое. Также имеется функция для перетасовки чисел.

Большинство этих функций достаточно быстры и точны даже при крайних значениях параметров

Функции базируются на генераторах случайных чисел с равномерным распределением, находящимся в файлах randomc.zip и randoma.zip.

Список файлов

Архив stocc.zip содержит следующие файлы:

stocc.htm
Инструкции, что вы читаете
stocc.h
C++ header содержащий определения классов.
stoc1.cpp
Исходники генераторов.
stoc2.cpp
Альтернативные исходники для Пуассонова, биномиального и гипергеометрического распределений, оптимизираванные для случаев, когда эти функции многократно вызываются с одними и теми же параметрами.
ex-stoc.cpp
Примеры использования библиотеки для генерации случайных чисел с различными распределениями.
ex-lib.cpp
Пример использования комбинированного генератора случайных чисел на ассемблере как основы для функций библиотеки.
ex-cards.cpp
Пример перетасовки списка предметов в случаном порядке. Тасует колоду карт.
ex-lotto.cpp
Пример генерации последовательности неповторяющихся случайных целых чисел.
testpois.cpp,  testbino.cpp,  testhype.cpp, testmxh.cpp, testmnch.cpp
Тестовые программы для Пуассонова, биномиального, гипергеометрического, расширенное гипергеометрического и нецентрального гипергеометрического генераторов.
distrib.doc
Определения статистических рапределений, которые можно генерировать используя библиотеку.
sampmet.doc
Sampling methods. Теоретическое обоснование методов генерации variates с заданным распределением.

Instructions

Unpack the files randomc.zip and stocc.zip into the same directory.

Класс StochasticLib пожет быть порожден от любого из классов в randomc.zip. Выбирайте любой. Примеры используют "Mersenne Twister" (TRandomMersenne)  потому, что этот генератор имеет меньше всего проблем с портабельностью. Чтобы использовать другой генератор в качестве родительского класса, измените значение в #define RANDOM_GENERATOR TRandomMersenne 
в файле stocc.h или соорудите похожий #define перед строкой #include "stocc.h" во всех ваших модулях.

Класс StochasticLib2 порожден от класса StochasticLib и содержит альтернативные реализации Пуассонова, биномиального и гипергеометрического распределений, оптимизированные для случаев, когда эти функции многократно вызываются с одними и теми же параметрами. StochasticLib is the best choice if the parameters to these functions often wary in your program,  StochasticLib2  is faster when these functions are called many times with the same parameters.

The header file stocc.h must be included in all modules that use this class library. The source code file stoc1.cpp can be included in your project, either as an #include, or as a separate module. If you are using  StochasticLib2,  then you need both stoc1.cpp and stoc2.cpp.

It is recommended that you make only one instance of  StochasticLib or  StochasticLib2. These stochastic library classes inherit the uniform random number functions from their base class. Therefore, the same object can give you both uniform and non-uniform random numbers. Example:

#include "stocc.h"                // header file for class library
#include <time.h>                 // define time function

void main () {
  int seed = time(0);             // seed for random number generator
  StochasticLib sto(seed);        // make instance of class
  int i = sto.IRandom(0, 99);     // make random integer with uniform distribution
  int b = sto.Binomial(0.2, 100); // make random integer with binomial distribution
  ...
  }

See the file ex-stoc.cpp for a more detailed example.

See the file ex-lib.cpp for an example of how to use an assembly language uniform random number generator as base for the non-uniform random number generators.

If you make more than one instance of these classes then make sure they don't have the same seed.

Portability

The C++ class library is supposed to work with all C++ compilers and all operating systems. It has been tested on several different systems.

There are, however, a few system differences that you need to be aware of:

  1. Error messages. There is no portable way of writing error messages. Systems with a graphical user interface (e.g. Windows) need a pop-up message box, while console mode programs and other line oriented systems need output to the standard output or error output. Therefore, you may have to modify the function FatalError  in stoc1.cpp to fit your system. This function is called by the library functions in case of illegal parameter values. Experience shows that these error messages are very useful when debugging a program that uses the non-uniform random variate generators. You may even enhance the FatalError function to output additional information about the state of your program. 
  2. Multithreaded programs. The functions are not re-entrant. This is because they rely on static variables to remember the last parameters, so that they don't have to re-do the set-up calculations that are the same as last time the function was called.

    If you want to call the same stochastic functions from more than one thread in a multi-threaded program then you have to make these functions re-entrant. There are two ways to do this:

    Method 1. Find all static variables in the non-uniform random variate generating functions in stoc1.cpp and stoc2.cpp. Make these variables non-static and make the setup in these functions non-conditional.

    Method 2. Make all static variables in the abovementioned functions non-static members of the respective class. Those variables that need to be initialized to -1 (names ending in _last) must be so in the constructor.

    Whether you use method 1 or 2, you have to give each thread its own instance of the class and make sure they don't have the same seed. A difference of 1 in the seed is sufficient.
  3. See randomc.htm for more portability issues.

Описание функций

int Bernoulli(double p)
Распределение Бернулли с вероятностным параметром p.
Возвращает 1 с вероятностью p, или 0 с вероятностью 1-p.
Ошибки:
Выдает сообщение об ошибке, если p < 0 или p > 1.
double Normal(double m, double s)
Нормальное распределение со средним значением m и стандартным отклонением s.
Это распределение моделирует сумму множества случайных факторов.
Определение:
f(x) = (2*pi*s)-0.5*exp(-0.5*s-2*(x-m)2)
Ошибки:
Нет.
long int Poisson (double L)
Распределение Пуассона со средним значением L.
Распределение некоторого количества событий на данном промежутке времени, либо в данной географической области, когда эти события случайным образом распределены во времени или пространстве.
Определение:
f(x) = Lx/x! * exp(-L)
Ошибки:
Выдает сообщение об ошибке, если L < 0 или L > 2*109.
long int Binomial (long int n, double p)
Binomial distribution with parameters n and p.
This is the distribution of the number of red balls you get when drawing n balls with replacement from an urn where p is the fraction of red balls in the urn.
Definition:
f(x) = B(n,x) * px * (1-p)n-x,
where the binomial coefficient B(a,b) = a! / (b! * (a-b)!).
Error conditions:
Gives error message if n < 0 or p < 0 or p > 1.
long int Hypergeometric (long int n, long int m, long int t)
Hypergeometric distribution with parameters n, m, t. (Note the order and names of the parameters).
This is the distribution of the number of red balls you get when drawing n balls without replacement from an urn where the urn contains t balls, where m balls are red and t-m balls are white.
Definition:
f(x) = B(m,x) * B(t-m,n-x) / B(t,n),
where the binomial coefficient B(a,b) = a! / (b! * (a-b)!).
Error conditions:
Gives error message if any parameter is negative or n > t or m > t.
long int NonCentralHypergeometric(long int n, long int m, long int t, double bias)
The noncentral hypergeometric distribution is the same as the hypergeometric distribution, but with bias. The bias can be seen as an odds ratio. A bias > 1 will favor the red balls, a bias < 1 will favor the white balls.
When bias = 1 we have the hypergeometric distribution. Execution may be slow and inexact when t is high and bias is far from 1.
Error conditions:
Gives error message if any parameter is negative or n > t or m > t.
long int ExtendedHypergeometric (long int n, long int m, long int t, double bias)
The extended hypergeometric distribution is a conditional binomial distribution resembling the noncentral hypergeometric distribution. See the file distrib.doc for a definition. Execution may be slow and inexact when t is high and bias is far from 1.
Error conditions:
Gives error message if any parameter is negative or n > t or m > t.
void Multinomial (long int * destination, double * source, long int n, int colors)
void Multinomial (long int * destination, long int * source, long int n, int colors)
Multivariate binomial distribution.
This is the distribution you get when drawing n balls from an urn with replacement, where there can be any number of colors. This is the same as the binomial distribution when colors = 2.
The number of balls of each color is returned in destination, which must be an array with colors places. source contains the number or fraction of balls of each color in the urn. source must be a double or long int array with colors places.
The sum of the values in source does not have to be 1, but it must be positive. The probability that a ball has color i is source[i] divided by the sum of all values in source.
Error conditions:
Gives an error message if any parameter is negative or if the sum of the values in source is zero. The behavior is unpredictable if source or destination have less than colors places.
void MultiHypergeo (long int * destination, long int * source, long int n, int colors)
Multivariate hypergeometric distribution.
This is the distribution you get when drawing n balls from an urn without replacement, where there can be any number of colors. This is the same as the hypergeometric distribution when colors = 2.
The number of balls of each color is returned in destination, which must be an array with colors places. source contains the number of balls of each color in the urn. source must be an array with colors places.
Error conditions:
Gives an error message if any parameter is negative or if the sum of the values in source is less than n. The behavior is unpredictable if source or destination have less than colors places.
void MultiNonCentralHypergeo(long int * destination, long int * source, double * weights, long int n, int colors)
Multivariate noncentral hypergeometric distribution. This is the distribution you get when drawing colored balls from un urn without replacement, with bias. weights is an array with colors places containing the odds for each color. The probability of drawing a particular ball is proportional to its weight. The other parameters are the same as above. Execution may be slow and inexact when n is high and weight ratios are far from 1.
Error conditions:
Gives an error message if any parameter is negative or if the total number of balls with nonzero weight is less than n. The behavior is unpredictable if any of the arrays have less than colors places.
void MultiExtendedHypergeo (long int * destination, long int * source, double * weights, long int n, int colors)
The multivariate extended hypergeometric distribution is a conditional binomial distribution resembling the multivariate noncentral hypergeometric distribution. See the file distrib.doc for a definition.
This function is not exact, but uses an approximation with an accuracy that is better than 1% in most cases. The precision can be tuned at the expense of higher calculation times.
Error conditions:
Gives an error message if any parameter is negative or if the total number of balls with nonzero weight is less than n. The behavior is unpredictable if any of the arrays have less than colors places.
void Shuffle(int * list, int min, int n)
This function makes a list of the n numbers from  min  to  min+n-1  in random order.
The result is returned in list, which must be an array with n elements.
The array index goes from 0 to n-1.
If you want to shuffle something else than integers then use the integers in list as an index into a table of the items you want to shuffle.
Error conditions: none. The behavior is unpredictable if the size of the array list is less than n.
static double LnFac(int n)
Log factorial. Mainly used internally.
Definition:
LnFac(n) = loge(n!).
This function uses a table when n < 1024 and Stirling's approximation for larger n. The table is generated by the constructor.
Error conditions:
Gives an error message if n < 0.

Theory

These distributions are defined in the file distrib.doc. The methods used for generating variates with these distributions are described in the file sampmet.doc.

Examples showing how to simulate biological evolution using this class library can be found in evolc.zip.

 

Back to random number generators page.