Sequence representations

This module contains prototypes of various ways of representing users as functions of the items they have interacted with in the past.

class spotlight.sequence.representations.CNNNet(num_items, embedding_dim=32, kernel_width=3, dilation=1, num_layers=1, nonlinearity='tanh', residual_connections=True, sparse=False, benchmark=True, item_embedding_layer=None)

Module representing users through stacked causal atrous convolutions ([3], [4]).

To represent a sequence, it runs a 1D convolution over the input sequence, from left to right. At each timestep, the output of the convolution is the representation of the sequence up to that point. The convolution is causal because future states are never part of the convolution’s receptive field; this is achieved by left-padding the sequence.
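
The left-padding idea can be sketched in plain Python. This is a generic illustration, not Spotlight's implementation: the function name `causal_conv1d`, the scalar inputs, and the toy kernel are all assumptions made for the example.

```python
def causal_conv1d(sequence, kernel):
    """Causal 1D convolution over a list of floats.

    output[t] depends only on sequence[0..t]. This is achieved by
    left-padding the input with len(kernel) - 1 zeros, so the kernel
    never reaches past the current timestep.
    """
    width = len(kernel)
    padded = [0.0] * (width - 1) + list(sequence)
    return [
        sum(kernel[j] * padded[t + j] for j in range(width))
        for t in range(len(sequence))
    ]


seq = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.25]  # the last weight multiplies the current timestep
out = causal_conv1d(seq, kernel)

# Changing a future element leaves all earlier outputs untouched.
changed = causal_conv1d([1.0, 2.0, 3.0, 99.0], kernel)
```

The output has the same length as the input, and `out[:3]` equals `changed[:3]` because the altered final element never enters the receptive field of earlier timesteps.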

In order to increase the receptive field (and the capacity to encode states further back in the sequence), one can increase the kernel width, stack more layers, or increase the dilation factor. Input dimensionality is preserved from layer to layer.
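
To make the trade-off concrete, the receptive field of a stack of stride-1 dilated convolutions can be computed with the standard formula 1 + sum((kernel_width - 1) * dilation) over the layers. The helper name `receptive_field` below is hypothetical and not part of Spotlight.

```python
def receptive_field(kernel_widths, dilations):
    """Number of input timesteps visible to one output timestep.

    Assumes stride-1 convolutions stacked one on top of another;
    each layer adds (kernel_width - 1) * dilation timesteps.
    """
    field = 1
    for width, dilation in zip(kernel_widths, dilations):
        field += (width - 1) * dilation
    return field


single_layer = receptive_field((3,), (1,))
wider_kernel = receptive_field((5,), (1,))
two_layers = receptive_field((3, 3), (1, 1))
dilated_stack = receptive_field((3, 3, 3), (1, 2, 4))
```

With kernel width 3, doubling the dilation at each of three layers gives a receptive field of 15 timesteps, compared with 7 for three undilated layers, at the same parameter count.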

Residual connections can be added between all layers.

During training, representations for all timesteps of the sequence are computed in one go. Loss functions using the outputs will therefore be aggregating both across the minibatch and across time in the sequence.
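
As a rough sketch of what aggregating across the minibatch and across time means, the function below averages a per-timestep loss over both axes at once. The function name and the nested-list layout are assumptions for illustration, and any masking of padded positions is deliberately left out.

```python
def mean_loss_over_batch_and_time(losses):
    """Average losses laid out as losses[sequence_index][timestep]."""
    total = 0.0
    count = 0
    for sequence_losses in losses:
        for loss in sequence_losses:
            total += loss
            count += 1
    return total / count


# Two sequences in the minibatch, three timesteps each.
per_step_losses = [
    [0.5, 1.5, 1.0],
    [2.0, 0.0, 1.0],
]
aggregate = mean_loss_over_batch_and_time(per_step_losses)
```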

Parameters:
  • num_items (int) – Number of items to be represented.
  • embedding_dim (int, optional) – Embedding dimension of the embedding layer, and the number of filters in each convolutional layer.
  • kernel_width (tuple or int, optional) – The kernel width of the convolutional layers. If tuple, should contain the kernel widths for all convolutional layers. If int, it will be expanded into a tuple to match the number of layers.
  • dilation (tuple or int, optional) – The dilation factor for atrous convolutions. Setting this to a number greater than 1 inserts gaps into the convolutional layers, increasing their receptive field without increasing the number of parameters. If tuple, should contain the dilation factors for all convolutional layers. If int, it will be expanded into a tuple to match the number of layers.
  • num_layers (int, optional) – Number of stacked convolutional layers.
  • nonlinearity (string, optional) – One of (‘tanh’, ‘relu’). Denotes the type of non-linearity to apply after each convolutional layer.
  • residual_connections (boolean, optional) – Whether to use residual connections between convolutional layers.
  • item_embedding_layer (an embedding layer, optional) – If supplied, will be used as the item embedding layer of the network.
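
The int-to-tuple expansion described for kernel_width and dilation might look like the sketch below. The helper `expand_to_tuple` is hypothetical, and the length check is an added assumption rather than documented behaviour.

```python
def expand_to_tuple(value, num_layers):
    """Return one setting per layer.

    An int is repeated num_layers times; a tuple (or other iterable)
    is used as given and must already have one entry per layer.
    """
    if isinstance(value, int):
        return (value,) * num_layers
    value = tuple(value)
    if len(value) != num_layers:
        # Assumption: a mismatched length is treated as an error here.
        raise ValueError(
            "expected {} values, got {}".format(num_layers, len(value))
        )
    return value


kernel_widths = expand_to_tuple(3, num_layers=2)
dilations = expand_to_tuple((1, 2, 4), num_layers=3)
```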

References

[3] Oord, Aaron van den, et al. “WaveNet: A generative model for raw audio.” arXiv preprint arXiv:1609.03499 (2016).
[4] Kalchbrenner, Nal, et al. “Neural machine translation in linear time.” arXiv preprint arXiv:1610.10099 (2016).

forward(user_representations, targets)

Compute predictions for target items given user representations.

Parameters:
  • user_representations (tensor) – Result of the user_representation method.
  • targets (tensor) – Minibatch of item sequences of shape (minibatch_size, sequence_length).
Returns:

predictions – Of shape (minibatch_size, sequence_length).

Return type:

tensor
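
The text above does not spell out how predictions are derived from the representations. One common choice, assumed here purely for illustration, is a dot product between each timestep's representation and the embedding of the target item, plus a per-item bias. The function `predict` and the nested-list layout (minibatch, sequence, embedding) are hypothetical.

```python
def predict(user_representations, target_embeddings, target_biases):
    """Score each target item against the representation at its timestep.

    user_representations[b][t] and target_embeddings[b][t] are lists of
    floats of length embedding_dim; target_biases[b][t] is a float.
    Returns scores of shape (minibatch_size, sequence_length).
    """
    predictions = []
    for reps, embs, biases in zip(
        user_representations, target_embeddings, target_biases
    ):
        row = []
        for rep, emb, bias in zip(reps, embs, biases):
            row.append(sum(r * e for r, e in zip(rep, emb)) + bias)
        predictions.append(row)
    return predictions


# One sequence of two timesteps with embedding_dim = 2.
toy_representations = [[[1.0, 0.0], [0.5, 0.5]]]
toy_target_embeddings = [[[2.0, 3.0], [4.0, 2.0]]]
toy_target_biases = [[0.5, -1.0]]
scores = predict(toy_representations, toy_target_embeddings, toy_target_biases)
```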

user_representation(item_sequences)

Compute user representation from a given sequence.

Returns: The first element contains all representations from step -1 (no items seen) to t - 1 (all but the last item seen). The second element contains the final representation at step t (all items seen). This final state can be used for prediction or evaluation.
Return type: tuple (all_representations, final_representation)
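
The indexing of the returned tuple can be illustrated with a toy example. Here a "representation" is simply the tuple of items seen so far, and `split_representations` is a hypothetical helper; the pairing of each representation with the next item as its training target is an inference from the shapes documented above, not something the source states outright.

```python
def split_representations(states):
    """Split per-step states into (all_representations, final_representation).

    states[i] is the representation after seeing the first i items, so a
    sequence of n items yields n + 1 states.
    """
    return states[:-1], states[-1]


items = ["a", "b", "c"]
states = [tuple(items[:i]) for i in range(len(items) + 1)]
all_representations, final_representation = split_representations(states)

# Each representation lines up with the item that follows it.
training_pairs = list(zip(all_representations, items))
```

In this sketch `all_representations` has one entry per item in the sequence, while `final_representation` has seen every item and is the state that would be used at prediction time.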

class spotlight.sequence.representations.LSTMNet(num_items, embedding_dim=32, item_embedding_layer=None, sparse=False)

Module representing users by running a recurrent neural network over the sequence and using the hidden state at each timestep as the sequence representation, à la [2].
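
A toy scalar LSTM cell shows what using the hidden state at each timestep means in practice. This is a simplified sketch under stated assumptions: inputs and states are single floats, biases are omitted, and the weight names and values are invented for the example. It is not Spotlight's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_hidden_states(inputs, weights):
    """Run a scalar LSTM cell and return the hidden state after each step."""
    h, c = 0.0, 0.0
    hidden_states = []
    for x in inputs:
        i = sigmoid(weights["wi"] * x + weights["ui"] * h)    # input gate
        f = sigmoid(weights["wf"] * x + weights["uf"] * h)    # forget gate
        o = sigmoid(weights["wo"] * x + weights["uo"] * h)    # output gate
        g = math.tanh(weights["wg"] * x + weights["ug"] * h)  # candidate
        c = f * c + i * g
        h = o * math.tanh(c)
        hidden_states.append(h)
    return hidden_states


toy_weights = {
    "wi": 0.5, "ui": 0.1,
    "wf": 0.3, "uf": 0.2,
    "wo": 0.7, "uo": -0.1,
    "wg": 1.0, "ug": 0.4,
}
lstm_states = lstm_hidden_states([1.0, -0.5, 2.0], toy_weights)
lstm_states_changed = lstm_hidden_states([1.0, -0.5, -3.0], toy_weights)
```

Each entry of `lstm_states` summarises the inputs up to and including that step, so changing the last input alters only the last hidden state.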

During training, representations for all timesteps of the sequence are computed in one go. Loss functions using the outputs will therefore be aggregating both across the minibatch and across time in the sequence.

Parameters:
  • num_items (int) – Number of items to be represented.
  • embedding_dim (int, optional) – Embedding dimension of the embedding layer, and the number of hidden units in the LSTM layer.
  • item_embedding_layer (an embedding layer, optional) – If supplied, will be used as the item embedding layer of the network.

References

[2] Hidasi, Balazs, et al. “Session-based recommendations with recurrent neural networks.” arXiv preprint arXiv:1511.06939 (2015).

forward(user_representations, targets)

Compute predictions for target items given user representations.

Parameters:
  • user_representations (tensor) – Result of the user_representation method.
  • targets (tensor) – A minibatch of item sequences of shape (minibatch_size, sequence_length).
Returns:

predictions – of shape (minibatch_size, sequence_length)

Return type:

tensor

user_representation(item_sequences)

Compute user representation from a given sequence.

Returns: The first element contains all representations from step -1 (no items seen) to t - 1 (all but the last item seen). The second element contains the final representation at step t (all items seen). This final state can be used for prediction or evaluation.
Return type: tuple (all_representations, final_representation)

class spotlight.sequence.representations.PoolNet(num_items, embedding_dim=32, item_embedding_layer=None, sparse=False)

Module representing users by averaging the representations of the items they have interacted with, à la [1].

To represent a sequence, it simply averages the representations of all the items that occur in the sequence up to that point.
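
The averaging step can be sketched as a running mean over item embeddings. The function name `running_average` and the toy two-dimensional embeddings are assumptions for illustration; how the library handles padding or the empty prefix is not shown here.

```python
def running_average(embeddings):
    """Return the mean of embeddings[0..t] for every timestep t."""
    dim = len(embeddings[0])
    totals = [0.0] * dim
    averages = []
    for count, embedding in enumerate(embeddings, start=1):
        totals = [t + e for t, e in zip(totals, embedding)]
        averages.append([t / count for t in totals])
    return averages


item_embeddings = [
    [1.0, 0.0],
    [3.0, 2.0],
    [2.0, 4.0],
]
pooled = running_average(item_embeddings)
```

Because the mean ignores order, any permutation of the same items yields the same final representation, which is the main way this approach differs from the convolutional and recurrent ones.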

During training, representations for all timesteps of the sequence are computed in one go. Loss functions using the outputs will therefore be aggregating both across the minibatch and across time in the sequence.

Parameters:
  • num_items (int) – Number of items to be represented.
  • embedding_dim (int, optional) – Embedding dimension of the embedding layer.
  • item_embedding_layer (an embedding layer, optional) – If supplied, will be used as the item embedding layer of the network.

References

[1] Covington, Paul, Jay Adams, and Emre Sargin. “Deep neural networks for YouTube recommendations.” Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 2016.

forward(user_representations, targets)

Compute predictions for target items given user representations.

Parameters:
  • user_representations (tensor) – Result of the user_representation method.
  • targets (tensor) – Minibatch of item sequences of shape (minibatch_size, sequence_length).
Returns:

predictions – of shape (minibatch_size, sequence_length)

Return type:

tensor

user_representation(item_sequences)

Compute user representation from a given sequence.

Returns: The first element contains all representations from step -1 (no items seen) to t - 1 (all but the last item seen). The second element contains the final representation at step t (all items seen). This final state can be used for prediction or evaluation.
Return type: tuple (all_representations, final_representation)