Module containing functions for generating synthetic datasets with known properties, for model testing and experimentation.
generate_sequential(num_users=100, num_items=1000, num_interactions=10000, concentration_parameter=0.1, order=3, random_state=None)
Generate a dataset of user-item interactions where sequential information matters.
The interactions are generated by an n-th order Markov chain with a uniform stationary distribution, where transition probabilities are given by a doubly stochastic transition matrix. For n-th order chains, the transition probabilities are a convex combination of the transition probabilities of the last n states in the chain.
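The convex-combination step described above can be illustrated with a small NumPy sketch. This is not the module's implementation: the helper names (`next_state_probabilities`, `sample_chain`), the equal weighting of the last n states, the uniform seeding of the chain, and the toy transition matrix are all assumptions made for illustration.

```python
import numpy as np


def next_state_probabilities(transition_matrix, history, order):
    """Mix the transition rows of the last `order` states.

    Assumption: every one of the last `order` states gets the same weight.
    """
    recent = history[-order:]
    # Each row is a distribution over next states, so the mean of several
    # rows is a convex combination of them and still sums to 1.
    return transition_matrix[recent].mean(axis=0)


def sample_chain(transition_matrix, num_steps, order, random_state):
    """Sample `num_steps` states from the n-th order chain (hypothetical helper)."""
    num_states = transition_matrix.shape[0]
    # Assumption: seed the chain with `order` uniformly drawn states.
    history = list(random_state.randint(num_states, size=order))
    for _ in range(num_steps):
        probs = next_state_probabilities(transition_matrix, history, order)
        history.append(random_state.choice(num_states, p=probs))
    return np.array(history[order:])


# Toy doubly stochastic matrix: mostly "move to the next state", plus a
# little uniform noise. Rows and columns both sum to 1.
num_states = 5
shift = np.roll(np.eye(num_states), 1, axis=1)
transition_matrix = 0.9 * shift + 0.1 / num_states

sequence = sample_chain(
    transition_matrix,
    num_steps=20,
    order=3,
    random_state=np.random.RandomState(0),
)
print(sequence)
```

Because both the rows and the columns of the toy matrix sum to one, the uniform distribution is stationary for the first-order chain it defines, which is the property the description relies on.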
The transition matrix is sampled from a Dirichlet distribution described by a constant concentration parameter. Concentration parameters closer to zero generate more predictable sequences.
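The effect of the concentration parameter can be seen in a short sketch, assuming one Dirichlet draw per row of the matrix with the same concentration for every state. The helper names `sample_rows` and `mean_peak_probability` are hypothetical, and the sketch leaves out whatever normalisation the function applies to make the matrix doubly stochastic.

```python
import numpy as np


def sample_rows(num_states, concentration, random_state):
    """Draw one Dirichlet-distributed row per state (hypothetical helper)."""
    alpha = np.full(num_states, concentration)
    return random_state.dirichlet(alpha, size=num_states)


def mean_peak_probability(matrix):
    """Average probability of the most likely next state.

    Higher values mean the next state is easier to predict.
    """
    return matrix.max(axis=1).mean()


random_state = np.random.RandomState(42)
peaked = sample_rows(20, concentration=0.1, random_state=random_state)
flat = sample_rows(20, concentration=10.0, random_state=random_state)

# A small concentration puts most of each row's mass on a few states;
# a large one spreads it almost evenly across all states.
print(mean_peak_probability(peaked))
print(mean_peak_probability(flat))
```

With a small concentration, each state has only a handful of likely successors, so a sequence model has a strong signal to learn from. With a large concentration the rows approach the uniform distribution and the sequence carries little usable order information.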
Parameters:
- num_users (int, optional) – number of users in the dataset
- num_items (int, optional) – number of items (Markov states) in the dataset
- num_interactions (int, optional) – number of interactions to generate
- concentration_parameter (float, optional) – Controls how predictable the sequence is. Values closer to zero give more predictable sequences.
- order (int, optional) – order of the Markov chain
- random_state (numpy.random.RandomState, optional) – random state used to generate the data
Returns: Interactions – instance of the Interactions class