Factorization Machine

Difacto is a refined factorization machine (FM) with sparse memory-adaptive constraints.

Given an example x \in \mathbb{R}^d and an embedding dimension k, FM models the example by

f(x) = \langle w, x \rangle + \frac{1}{2} \left[ \|V^\top x\|_2^2 - \sum_{i=1}^d x_i^2 \|V_i\|_2^2 \right]

where w \in \mathbb{R}^d and V \in \mathbb{R}^{d \times k} are the parameters to learn, and V_i \in \mathbb{R}^k denotes the i-th row of V.
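The bracketed second-order term is the standard FM rewriting of the sum of all pairwise feature interactions, which lets f(x) be evaluated in O(dk) time instead of O(d^2 k):

\sum_{i < j} x_i x_j \langle V_i, V_j \rangle = \frac{1}{2} \left[ \|V^\top x\|_2^2 - \sum_{i=1}^d x_i^2 \|V_i\|_2^2 \right]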

The learning objective function is

\frac{1}{|X|} \sum_{(x,y)} \ell(f(x), y) + \lambda_1 \|w\|_1 + \frac{1}{2} \sum_{i=1}^d \left[ \lambda_i w_i^2 + \mu_i \|V_i\|_2^2 \right]

where the first regularizer \lambda_1 \|w\|_1 induces a sparse w, while the second term is a frequency-adaptive regularization that places larger penalties on more frequent features.

Furthermore, Difacto adds two heuristic constraints:

  • V_i = 0 if w_i = 0; that is, the embedding of feature i is marked inactive once the corresponding linear term has been filtered out by the sparse regularizer. (Disable this via l1_shrk = false.)
  • V_i = 0 if the number of occurrences of feature i is below a threshold. In other words, Difacto does not learn embeddings for tail features. (Set the threshold via threshold = 10.)

Training uses asynchronous SGD: w is updated via FTRL, while V is updated via AdaGrad.
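For reference, here is a sketch of the standard per-coordinate updates behind these choices; Difacto's exact implementation may differ in details such as the l2 terms. FTRL (FTRL-Proximal, McMahan et al.) maintains state (z_i, n_i) for each weight and, given a gradient g_i, with \eta and \beta as in lr_eta and lr_beta, updates

z_i \leftarrow z_i + g_i - \frac{\sqrt{n_i + g_i^2} - \sqrt{n_i}}{\eta} \, w_i, \qquad n_i \leftarrow n_i + g_i^2

w_i \leftarrow 0 \ \text{if}\ |z_i| \le \lambda_1, \qquad \text{otherwise}\quad w_i \leftarrow -\eta \, \frac{z_i - \mathrm{sign}(z_i)\,\lambda_1}{\beta + \sqrt{n_i}}

Note how the l1 regularizer produces exact zeros in w, which is what the two heuristic constraints above exploit. AdaGrad keeps the accumulated squared gradient n for each coordinate of V and updates elementwise:

n \leftarrow n + g^2, \qquad V \leftarrow V - \frac{\eta \, g}{\beta + \sqrt{n}}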

Configuration

The configuration is defined in the protobuf file config.proto

Input & Output

Type Field Description
string train_data The training data, can be either a directory or a wildcard filename
string val_data The validation or test data, can be either a directory or a wildcard filename
string data_format data format. supports libsvm, crb, criteo, adfea, ...
string model_out model output filename
string model_in model input filename
string predict_out the filename for prediction output. if specified, then run prediction; otherwise run training

Model and Optimization

Type Field Description
float lambda_l1 l1 regularizer for w: \lambda_1 |w|_1
float lambda_l2 l2 regularizer for w: \lambda_2 \|w\|_2^2
float lr_eta learning rate \eta (or \alpha) for w
Config.Embedding embedding the embedding V
int32 minibatch the size of the minibatch. the smaller, the faster the convergence, but the slower the system performance
int32 max_data_pass the maximal number of data passes
bool early_stop stop earlier if the validation objective is less than prev_obj - min_objv_decr

Config.Embedding

Configuration for the embedding V. Basic fields:

Type Field Description
int32 dim the embedding dimension k
int32 threshold features with occurrence < threshold get no embedding (k=0)
float lambda_l2 l2 regularizer for V: \lambda_2 \|V_i\|_2^2

advanced:

Type Field Description
float init_scale V is initialized with uniformly random weights in [-init_scale, +init_scale]
float dropout apply dropout to the gradient of V. disabled by default
float grad_clipping project the gradient of V into [-c, c]. disabled by default
float grad_normalization normalize the gradient of V by its l2-norm. disabled by default
float lr_eta learning rate \eta for V. if not specified, shares the value used for w
float lr_beta learning rate \beta for V.
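Putting the fields above together, here is a minimal illustrative training configuration, assuming the standard protobuf text format; the paths and numeric values are placeholders, not recommended settings:

  # hypothetical paths; point these at your own data
  train_data: "data/train/part-.*"
  val_data: "data/val/part-.*"
  data_format: "libsvm"
  model_out: "model/difacto"
  lambda_l1: 1
  lr_eta: 0.01
  minibatch: 10000
  max_data_pass: 5
  embedding {
    dim: 16
    threshold: 10
    lambda_l2: 0.0001
  }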

Advanced Configurations

Type Field Description
int32 save_iter save the model every k data passes. default is -1, which saves only the last iteration
int32 load_iter load the model from the k-th iteration. default is -1, which loads the model from the last iteration
bool local_data give a worker only the data it can access. often used when the data has been dispatched to the workers' local filesystems
int32 num_parts_per_file virtually partition a file into n parts for better load balancing. default is 10
int32 rand_shuffle randomly shuffle data for minibatch SGD. a minibatch is randomly picked from rand_shuffle * minibatch examples. default is 10
float neg_sampling down-sample negative examples in the training data. disabled by default
bool prob_predict if true, output a probability prediction; otherwise output the raw score f(x)
float print_sec print the progress every n seconds during training. default is 1 second
float lr_beta learning rate \beta. default is 1
float min_objv_decr the minimal objective decrease for early stopping
bool l1_shrk whether to use the constraint V_i = 0 if w_i = 0. enabled by default
int32 num_threads the number of threads used within a worker and a server
int32 max_concurrency the maximal number of minibatches processed concurrently for SGD, and the maximal number of concurrent blocks for block CD. default is 2
bool key_cache cache the key list on both sender and receiver to reduce communication cost. may increase memory usage
bool msg_compression compress messages to reduce communication cost. may increase computation cost
int32 fixed_bytes convert floating-point values into fixed-point integers with n bytes. n can be 1, 2, or 3. 0 means no compression
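For completeness, a sketch of a prediction-run configuration, using the rule from the Input & Output table that specifying predict_out switches from training to prediction. The paths are placeholders, and it is assumed here that the data to predict on is supplied via val_data:

  # hypothetical paths
  val_data: "data/test/part-.*"
  data_format: "libsvm"
  model_in: "model/difacto"
  predict_out: "pred/difacto"
  prob_predict: true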

Performance