Layers
Embedding layers useful for recommender models.

class spotlight.layers.BloomEmbedding(num_embeddings, embedding_dim, compression_ratio=0.2, num_hash_functions=4, bag=False, padding_idx=0)
An embedding layer that reduces the number of embedding parameters required by using Bloom-filter-like hashing.
Parameters:  num_embeddings (int) – Number of entities to be represented.
 embedding_dim (int) – Latent dimension of the embedding.
 compression_ratio (float, optional) – Ratio of the number of rows in the compressed embedding layer to num_embeddings. Values below 1.0 compress the layer; the smaller the value, the fewer parameters the layer has.
 num_hash_functions (int, optional) – Number of hash functions used to compute the bloom filter indices.
 bag (bool, optional) – Whether to use the EmbeddingBag layer for the underlying embedding. This should be faster in principle, but currently seems to perform very poorly.
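As a rough illustration of how compression_ratio affects the size of the layer, the sketch below computes the shape of the compressed embedding matrix. It assumes the compressed row count is compression_ratio * num_embeddings truncated to an integer; that rounding rule and the helper name compressed_shape are assumptions made for this example, not part of the documented API.

```python
def compressed_shape(num_embeddings, embedding_dim, compression_ratio=0.2):
    """Return (rows, cols) of the underlying compressed embedding matrix.

    Assumption: rows = int(compression_ratio * num_embeddings).
    """
    rows = int(compression_ratio * num_embeddings)
    return rows, embedding_dim


# Compare parameter counts for 100,000 entities with 32 latent dimensions.
full_params = 100_000 * 32
rows, cols = compressed_shape(100_000, 32, compression_ratio=0.2)
compressed_params = rows * cols
print(rows, cols, compressed_params / full_params)
```

With the default ratio, the compressed layer in this example holds one fifth of the parameters of the uncompressed one.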
Notes
Large embedding layers are a performance problem for fitting models: even though the gradients are sparse (only a handful of user and item vectors need parameter updates in every minibatch), PyTorch updates the entire embedding layer at every backward pass. Computation time is then wasted on applying zero gradient steps to the whole embedding matrix.
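To get a feel for how much of a dense update is wasted, the following sketch counts the share of embedding rows that receive an exactly zero gradient for a single minibatch. The helper name zero_gradient_fraction and the sizes used are illustrative assumptions; the code uses plain Python rather than PyTorch tensors.

```python
def zero_gradient_fraction(num_embeddings, batch_indices):
    """Fraction of embedding rows whose gradient is zero for one minibatch."""
    # Only the distinct rows looked up in the batch get a non-zero gradient.
    touched = len(set(batch_indices))
    return 1.0 - touched / num_embeddings


# A minibatch of 8 lookups (6 distinct rows) against one million rows.
batch = [3, 17, 17, 42, 99, 3, 256, 1024]
fraction = zero_gradient_fraction(1_000_000, batch)
print(fraction)
```

In this toy setting almost every row visited by a dense update would be left unchanged.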
To alleviate this problem, we can use a smaller underlying embedding layer, and probabilistically hash users and items into that smaller space. With good hash functions, collisions should be rare, and we should observe fitting speedups without a decrease in accuracy.
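The claim that collisions should be rare can be checked with a back-of-the-envelope bound. The sketch below assumes idealized hash functions that are uniform and independent (an assumption, since real hash functions only approximate this) and bounds the expected number of pairs of entities that hash to exactly the same set of rows. The function name expected_identical_pairs is hypothetical.

```python
from math import comb, factorial


def expected_identical_pairs(num_items, num_rows, num_hashes):
    """Upper bound on the expected number of item pairs with identical rows.

    Assumes uniform, independent hash functions. Two items agree on an
    ordered tuple of rows with probability num_rows ** -num_hashes; allowing
    the rows to match in any order multiplies that by at most num_hashes!.
    """
    pair_probability = min(1.0, factorial(num_hashes) / num_rows ** num_hashes)
    return comb(num_items, 2) * pair_probability


# 100,000 entities hashed into 20,000 rows (a compression ratio of 0.2).
single_hash = expected_identical_pairs(100_000, 20_000, num_hashes=1)
four_hashes = expected_identical_pairs(100_000, 20_000, num_hashes=4)
print(single_hash, four_hashes)
```

Under these assumptions, a single hash function leaves many entities sharing a representation, while four hash functions make a fully identical pair very unlikely. Partial overlaps between entities remain common and are expected.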
The idea follows the RecSys 2017 “Getting deep recommenders fit”[1] paper. The authors use a Bloom-filter-like approach to hashing. Their approach uses one-hot encoded inputs followed by fully connected layers, as well as softmax layers for the output, and their hashing reduces the size of the fully connected layers rather than embedding layers as implemented here; mathematically, however, the two formulations are identical.
The hash function used is murmurhash3, hashing the indices with a different seed for every hash function, modulo the size of the compressed embedding layer. The hash mapping is computed once at the start of training, and indexed into for every minibatch.
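The hashing scheme described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it substitutes a salted BLAKE2b hash from the standard library for murmurhash3, represents the weights as nested lists rather than a PyTorch embedding, and combines the hashed rows by averaging, which is an assumption. All function names are hypothetical.

```python
import hashlib


def seeded_hash(index, seed):
    """Seeded 64-bit hash of an integer index (BLAKE2b stand-in for murmurhash3)."""
    digest = hashlib.blake2b(
        str(index).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def build_hash_table(num_embeddings, compressed_rows, num_hash_functions):
    """Precompute, once, the compressed row indices for every entity."""
    return [
        [seeded_hash(index, seed) % compressed_rows
         for seed in range(num_hash_functions)]
        for index in range(num_embeddings)
    ]


def lookup(weights, hash_table, index):
    """Average the compressed rows an entity hashes to (averaging is an assumption)."""
    rows = [weights[row] for row in hash_table[index]]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Toy example: 1000 entities hashed into 200 rows of dimension 4.
compressed_rows, dim = 200, 4
weights = [[float(r + d) for d in range(dim)] for r in range(compressed_rows)]
table = build_hash_table(1000, compressed_rows, num_hash_functions=4)
vector = lookup(weights, table, index=123)
print(table[123], vector)
```

Because the table is built once, each minibatch only needs to index into it rather than recompute any hashes.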
References
[1] Serra, Joan, and Alexandros Karatzoglou. “Getting deep recommenders fit: Bloom embeddings for sparse binary input/output networks.” arXiv preprint arXiv:1706.03993 (2017).

class spotlight.layers.ScaledEmbedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, _weight=None)
Embedding layer that initialises its values using a normal variable scaled by the inverse of the embedding dimension.
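One way to read this initialisation scheme is sketched below in plain Python. It assumes that "scaled by the inverse of the embedding dimension" means a zero-mean normal distribution with standard deviation 1 / embedding_dim; that reading, the helper name scaled_normal_init, and the use of nested lists instead of a PyTorch tensor are all assumptions.

```python
import random


def scaled_normal_init(num_embeddings, embedding_dim, seed=0):
    """Draw weights from a normal distribution with std 1 / embedding_dim.

    Assumption: the scaling applies to the standard deviation.
    """
    rng = random.Random(seed)
    std = 1.0 / embedding_dim
    return [
        [rng.gauss(0.0, std) for _ in range(embedding_dim)]
        for _ in range(num_embeddings)
    ]


weights = scaled_normal_init(num_embeddings=1000, embedding_dim=32)
print(len(weights), len(weights[0]))
```

Under this reading, larger embedding dimensions start with smaller individual weights, which keeps initial dot products between embeddings small.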

class spotlight.layers.ScaledEmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False)
EmbeddingBag layer that initialises its values using a normal variable scaled by the inverse of the embedding dimension.

class spotlight.layers.ZeroEmbedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, _weight=None)
Embedding layer that initialises its values to zero.
Used for biases.