Synthetic
Module containing functions for generating synthetic datasets with known properties, for model testing and experimentation.
spotlight.datasets.synthetic.generate_sequential(num_users=100, num_items=1000, num_interactions=10000, concentration_parameter=0.1, order=3, random_state=None)
Generate a dataset of user-item interactions where sequential information matters.
The interactions are generated by an n-th order Markov chain with a uniform stationary distribution, where transition probabilities are given by a doubly stochastic transition matrix. For n-th order chains, the distribution over the next state is a convex combination of the transition probabilities of the last n states in the chain.
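As a rough illustration of the convex combination described above, the following standard-library sketch mixes the transition rows of the last n states into a single next-state distribution and samples from it. It is not Spotlight's implementation: the names `next_state_distribution`, `sample_chain` and `toy_matrix` are hypothetical, and the equal mixing weights are an assumption (any non-negative weights summing to one would also be a convex combination).

```python
import random


def next_state_distribution(transition_matrix, history, weights=None):
    """Mix the transition rows of the states in `history` into one distribution."""
    order = len(history)
    if weights is None:
        # Assumption: equal weights for each of the last n states.
        weights = [1.0 / order] * order
    num_states = len(transition_matrix)
    mixed = [0.0] * num_states
    for weight, state in zip(weights, history):
        for j, prob in enumerate(transition_matrix[state]):
            mixed[j] += weight * prob
    return mixed


def sample_chain(transition_matrix, order, num_steps, rng):
    """Sample `num_steps` states from an order-`order` chain."""
    num_states = len(transition_matrix)
    # Start from a random history of `order` states.
    history = [rng.randrange(num_states) for _ in range(order)]
    sequence = []
    for _ in range(num_steps):
        probs = next_state_distribution(transition_matrix, history)
        new_state = rng.choices(range(num_states), weights=probs, k=1)[0]
        # Slide the window: drop the oldest state, append the newest.
        history = history[1:] + [new_state]
        sequence.append(new_state)
    return sequence


# Toy doubly stochastic matrix: every row and every column sums to one.
toy_matrix = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

mixed = next_state_distribution(toy_matrix, history=[0, 0, 1])
sequence = sample_chain(toy_matrix, order=3, num_steps=20, rng=random.Random(0))
```

Because each row of the matrix is a probability distribution, any convex combination of rows is one as well, so `mixed` can be sampled from directly.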
The transition matrix is sampled from a Dirichlet distribution described by a constant concentration parameter. Concentration parameters closer to zero generate more predictable sequences.
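The effect of the concentration parameter can be seen in a small sketch that uses only the Python standard library. It draws rows from a symmetric Dirichlet distribution (via normalised gamma variates) and compares how peaked they are for a small and a large concentration value. The helper names `sample_dirichlet_row` and `mean_peak_probability` are hypothetical, and the sketch covers only the row sampling; it does not reproduce how the library makes the matrix doubly stochastic.

```python
import random


def sample_dirichlet_row(num_items, concentration, rng):
    """Draw one row from a symmetric Dirichlet via normalised gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(num_items)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_peak_probability(num_items, concentration, rng, num_rows=200):
    """Average, over sampled rows, of the largest transition probability."""
    peaks = [
        max(sample_dirichlet_row(num_items, concentration, rng))
        for _ in range(num_rows)
    ]
    return sum(peaks) / num_rows


rng = random.Random(42)
row = sample_dirichlet_row(20, 0.1, rng)

# Small concentration: most of each row's mass sits on a few items.
peaked = mean_peak_probability(20, 0.1, rng)
# Large concentration: each row is close to uniform over the items.
flat = mean_peak_probability(20, 10.0, rng)
```

A higher average peak probability means the next item is easier to guess from the current one, which is the sense in which concentration parameters closer to zero give more predictable sequences.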
- Parameters
num_users (int, optional) – number of users in the dataset
num_items (int, optional) – number of items (Markov states) in the dataset
num_interactions (int, optional) – number of interactions to generate
concentration_parameter (float, optional) – controls how predictable the sequence is; values closer to zero give more predictable sequences
order (int, optional) – order of the Markov chain
random_state (numpy.random.RandomState, optional) – random state used to generate the data
- Returns
Interactions – instance of the interactions class
- Return type