Module containing functions for generating synthetic datasets with known properties, for model testing and experimentation.

spotlight.datasets.synthetic.generate_sequential(num_users=100, num_items=1000, num_interactions=10000, concentration_parameter=0.1, order=3, random_state=None)

Generate a dataset of user-item interactions where sequential information matters.

The interactions are generated by an n-th order Markov chain with a uniform stationary distribution, where transition probabilities are given by a doubly stochastic transition matrix. For n-th order chains, the transition probabilities are a convex combination of the transition probabilities of the last n states in the chain.

The transition matrix is sampled from a Dirichlet distribution described by a constant concentration parameter. Concentration parameters closer to zero generate more predictable sequences.
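The generative process described above can be sketched in NumPy. This is a minimal illustration, not the library's implementation: the helper names `sample_transition_matrix` and `generate_chain` are hypothetical, the alternating column/row normalization used to approach a doubly stochastic matrix is an assumption, and so is the equal weighting of the last `order` states (the text only says "a convex combination").

```python
import numpy as np


def sample_transition_matrix(num_items, concentration, rng, max_iter=1000, atol=1e-8):
    """Sample an (approximately) doubly stochastic transition matrix.

    Hypothetical helper: each row is a draw from a symmetric Dirichlet.
    """
    matrix = rng.dirichlet(np.full(num_items, concentration), size=num_items)
    # Assumption: a tiny offset keeps every entry positive so that no
    # column can sum to zero during normalization.
    matrix = matrix + 1e-12
    for _ in range(max_iter):
        matrix = matrix / matrix.sum(axis=0, keepdims=True)  # columns sum to 1
        matrix = matrix / matrix.sum(axis=1, keepdims=True)  # rows sum to 1
        if np.allclose(matrix.sum(axis=0), 1.0, atol=atol):
            break
    return matrix


def generate_chain(transition_matrix, length, order, rng):
    """Generate a sequence of states from an order-`order` Markov chain."""
    num_items = transition_matrix.shape[0]
    # Seed the chain with `order` states drawn uniformly at random.
    states = [int(s) for s in rng.randint(num_items, size=order)]
    while len(states) < length:
        # Assumption: equal-weight convex combination of the rows that
        # belong to the last `order` states.
        probs = transition_matrix[states[-order:]].mean(axis=0)
        states.append(int(rng.choice(num_items, p=probs)))
    return np.array(states[:length])


rng = np.random.RandomState(42)
peaked = sample_transition_matrix(10, 0.1, rng)   # small concentration
flat = sample_transition_matrix(10, 10.0, rng)    # large concentration
sequence = generate_chain(peaked, length=20, order=3, rng=rng)
```

With a small concentration, each row of the matrix tends to put most of its mass on a few items, so the next state is easier to predict; comparing `peaked.max(axis=1)` with `flat.max(axis=1)` is one way to inspect this.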

Parameters:

  • num_users (int, optional) – number of users in the dataset
  • num_items (int, optional) – number of items (Markov states) in the dataset
  • num_interactions (int, optional) – number of interactions to generate
  • concentration_parameter (float, optional) – controls how predictable the sequence is; values closer to zero give more predictable sequences
  • order (int, optional) – order of the Markov chain
  • random_state (numpy.random.RandomState, optional) – random state used to generate the data

Returns:

Interactions – instance of the interactions class

Return type: