Remove discussion comments from paper for release.
(If we need them back for future revisions, we can just check out the previous
revision and work on a branch or something.)
rcurtin committed Oct 20, 2018
1 parent c4f843c · commit 5515bca
Showing 1 changed file with 3 additions and 144 deletions.
doc/mlsys_paper/paper.tex: 147 changes (3 additions & 144 deletions)
@@ -48,21 +48,8 @@

\begin{document}

% TODO: title could possibly be improved
%\title{\texttt{ensmallen}: a generic C++ library for fast optimization} %% it's not clear what "optimization" refers to here
%% other possibilities:
%\title{\texttt{ensmallen}: a flexible C++ library for function optimization in machine learning}
%\title{\texttt{ensmallen}: a fast C++ library for function optimization in machine learning}
%\title{\texttt{ensmallen}: a C++ library for fast function optimization in machine learning}
%\title{\texttt{ensmallen}: a C++ library for fast and flexible function optimization}
%\title{\texttt{ensmallen}: a C++ library of fast and flexible function optimizers}
%\title{\texttt{ensmallen}: a library of flexible function optimizers in C++}
%\title{\texttt{ensmallen}: a library of fast and flexible function optimizers in C++}
%\title{\texttt{ensmallen}: a flexible C++ library for function optimization}
\title{\texttt{ensmallen}: a flexible C++ library for efficient function optimization}

% Alphabetical ordering?
% TODO: check affiliations
\author{Shikhar Bhardwaj \\
Delhi Technological University \\
Delhi, India 110042 \\
@@ -78,16 +65,10 @@
Arnimallee 7, 14195 Berlin \\
\texttt{[email protected]}
\And
%% CS: I've added "Independent Researcher" below for now,
%% CS: as a blank affiliation looks weird and incomplete
Yannis Mentekidis \\
Independent Researcher \\
\texttt{[email protected]}
% any affiliation/email?
%% CS: googling suggests that Yannis is/was affiliated with Aristotle University of Thessaloniki
%% CS: and is perhaps now with Amazon
\And
%% CS: I have two affiliations, so I've listed them on two lines
Conrad Sanderson \\
Data61, CSIRO, Australia \\
University of Queensland, Australia\\
@@ -98,7 +79,6 @@

\begin{abstract}
\vspace*{-0.3em}
%% the abstract below still needs more meat and sharpening
We present \texttt{\small ensmallen}, a fast and flexible C++ library for mathematical optimization of
arbitrary user-supplied functions,
which can be applied to many machine learning problems.
@@ -115,8 +95,6 @@
Empirical comparisons show that \texttt{\small ensmallen} is able to outperform other
optimization frameworks (such as those available in Julia and SciPy), sometimes by large margins.
The library is distributed under the
% save words
% 3-clause
BSD license and is ready for use
in production environments.
\end{abstract}
@@ -128,8 +106,7 @@ \section{Introduction}
(which may have a special structure or constraints),
almost all machine learning problems can be boiled down
to the following optimization form:
%
%\vspace*{-0.2em}

\begin{equation}
\argmindown_x f(x).
\end{equation}
@@ -141,15 +118,6 @@ \section{Introduction}
parameters on the data~\cite{schmidhuber2015deep}.
Even popular machine learning models such as logistic regression
have training times mostly dominated by an optimization procedure~\cite{kingma2015adam}.
% TODO: might be nice to have something kind of anecdotal like 'even new
% students to the field of machine learning quickly encounter optimization' and
% cite, e.g., Andrew Ng's coursera course or some ML textbook or similar
%% CS: i think we don't need to explore this too much; better to cut out all the
%% CS: fat and stick with concrete examples, instead of veering off on tangents
%
% or maybe just a note about how many optimization techniques get published at
% NIPS every year?
%% CS: NIPS is too self-referential here

The ubiquity of optimization in machine learning highlights the need
for robust and flexible implementations of optimization algorithms.
@@ -269,7 +237,6 @@ \section{Types of Objective Functions}
\cmidrule[1pt]{2-9}
\end{tabular}
\end{adjustbox}
% \begin{tablenotes}\footnotesize
\caption{\footnotesize{
Feature comparison: \CIRCLE = provides feature,
\LEFTcircle = partially provides feature, - = does not provide feature.
@@ -289,12 +256,7 @@ \section{Types of Objective Functions}
optimizing {\bf user-defined objective functions}. It is also easy to implement a
new optimizer in the \texttt{\small ensmallen} framework. Overall, our goal is to provide
an easy-to-use library that can solve the problem
%\vspace*{-0.4em}
%\begin{equation}
$\argminright_{x} f(x)$
%\end{equation}
%\vspace*{-0.4em}
%\noindent
for any function $f(x)$ that takes a vector or matrix input $x$.
In most cases, $f(x)$ will have special structure; one example might be that
$f(x)$ is differentiable. Therefore, the abstraction we have designed for \texttt{\small
@@ -314,7 +276,6 @@ \section{Types of Objective Functions}
\sum_{i} f_i(x)$
\item {\bf categorical}: $x$ contains elements that can only take discrete
values
%\item {\bf numeric}: all elements of $x$ take values in $\mathcal{R}$
\item {\bf sparse}: the gradient $f'(x)$ or $f'_i(x)$ (for a separable
function) is sparse
\item {\bf partially differentiable}: the separable gradient $f_i'(x)$ is also
@@ -327,15 +288,6 @@ \section{Types of Objective Functions}
provide a large set of diverse optimization algorithms for objective functions
with these properties. Below is a list of currently available optimizers:

%% CS: WARNING !!!!
%% CS: can't add more citations without causing the item with SGD variants
%% CS: to overflow into 3 lines.
%% CS: this causes a cascade effect of mucking up the entire layout of
%% CS: the paper, causing the main text to spill over to 7 pages.
%%
%% CS: the citations below should be sufficient;
%% CS: this is a workshop paper, not a journal article

\vspace*{-0.4em}
\begin{enumerate}[{~~~$\bullet$}]
\small
@@ -364,12 +316,6 @@ \section{Types of Objective Functions}
Conditional Gradient Descent,
Frank-Wolfe algorithm~\cite{Frank_1956},
Simulated Annealing~\cite{kirkpatrick1983optimization}

% These were a part of mlpack but not ensmallen.
%\item {\bf Objective functions:} Neural Networks, Logistic regression,
% Matrix completion, Neighborhood Components Analysis, Regularized SVD,
% Reinforcement learning, Softmax regression, Sparse autoencoders,
% Sparse SVM
\end{enumerate}

In \texttt{\small ensmallen}'s framework, if a user wants to optimize a differentiable objective
@@ -434,10 +380,9 @@ \section{Example: Learning Linear Regression Models}
point and response $(x_i, y_i)$. To fit this model $\theta \in \mathcal{R}^d$
to the data, we must find

%% CS: i've added \nolimits to save a bit of space
\vspace*{-0.5em}
\begin{equation}
-\argmindown_\theta f(\theta) = %% CS: for clarity
+\argmindown_\theta f(\theta) =
\argmindown_\theta \sum\nolimits_{i = 1}^n (y_i - x_i \theta)^2 =
\argmindown_\theta \| y - X \theta \|_F^2.
\end{equation}
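
A minimal objective class for this problem could be written as follows
(a rough sketch only: it uses Armadillo types and the
\texttt{\small Evaluate()} / \texttt{\small Gradient()} method names discussed
elsewhere in the paper; the class and member names here are illustrative rather
than the paper's exact listing):

\begin{adjustbox}{scale={0.95}{0.95}}
\begin{minipage}{1\textwidth}
\begin{minted}[fontsize=\small]{c++}
// Illustrative sketch of a differentiable linear regression objective.
class LinearRegressionFunctionSketch
{
 public:
  LinearRegressionFunctionSketch(const arma::mat& X, const arma::vec& y) :
      X(X), y(y) { }

  // Objective: f(theta) = || y - X * theta ||^2.
  double Evaluate(const arma::mat& theta)
  {
    const arma::vec residual = y - X * theta;
    return arma::dot(residual, residual);
  }

  // Gradient: f'(theta) = -2 * X^T * (y - X * theta).
  void Gradient(const arma::mat& theta, arma::mat& gradient)
  {
    gradient = -2.0 * X.t() * (y - X * theta);
  }

 private:
  const arma::mat& X;
  const arma::vec& y;
};
\end{minted}
\end{minipage}
\end{adjustbox}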
@@ -539,7 +484,6 @@ \section{Automatic Metaprogramming for Ease of Use and Efficiency}
with an implementation of \texttt{\small EvaluateWithGradient()}
that computes {\small $(y - X \theta)$} only once:

%\vspace*{-0.5em}
\begin{adjustbox}{scale={0.95}{0.95}}
\begin{minipage}{1\textwidth}
\begin{minted}[fontsize=\small]{c++}
@@ -551,7 +495,6 @@ \section{Automatic Metaprogramming for Ease of Use and Efficiency}
\end{minted}
\end{minipage}
\end{adjustbox}
%\vspace*{-0.5em}
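
As a rough illustration (not necessarily the paper's exact listing, which is
not shown in this diff), such a combined method could reuse one residual
computation for both the objective value and the gradient, following the
earlier sketch:

\begin{adjustbox}{scale={0.95}{0.95}}
\begin{minipage}{1\textwidth}
\begin{minted}[fontsize=\small]{c++}
// Illustrative sketch: objective and gradient share one residual computation.
double EvaluateWithGradient(const arma::mat& theta, arma::mat& gradient)
{
  const arma::vec residual = y - X * theta;   // (y - X * theta) computed once
  gradient = -2.0 * X.t() * residual;
  return arma::dot(residual, residual);
}
\end{minted}
\end{minipage}
\end{adjustbox}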

Template metaprogramming techniques are automatically used to
detect which methods exist, and a wrapper class will use suitable mix-ins in
@@ -589,8 +532,6 @@ \section{Automatic Metaprogramming for Ease of Use and Efficiency}
and \texttt{\small EvaluateWithGradient()}. We aim to expand this support to other
sets of methods for other types of objective functions.
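
As a rough sketch of the underlying idea (a generic C++ detection idiom, shown
with C++17 features for brevity; this is not \texttt{\small ensmallen}'s
internal implementation), the presence of a method such as
\texttt{\small EvaluateWithGradient()} can be detected at compile time:

\begin{adjustbox}{scale={0.95}{0.95}}
\begin{minipage}{1\textwidth}
\begin{minted}[fontsize=\small]{c++}
// Generic compile-time detection of EvaluateWithGradient() (illustrative only).
#include <armadillo>
#include <type_traits>
#include <utility>

template<typename T, typename = void>
struct HasEvaluateWithGradient : std::false_type { };

template<typename T>
struct HasEvaluateWithGradient<T, std::void_t<decltype(
    std::declval<T&>().EvaluateWithGradient(std::declval<const arma::mat&>(),
                                            std::declval<arma::mat&>()))>>
    : std::true_type { };

// A wrapper can then branch at compile time, e.g.:
//   if constexpr (HasEvaluateWithGradient<FunctionType>::value) { ... }
//   else { /* synthesize it from Evaluate() and Gradient() */ }
\end{minted}
\end{minipage}
\end{adjustbox}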

% TODO: anything to write about the visualization page that we had set up?

\vspace*{-0.3em}
\section{Experiments}
\vspace*{-0.5em}
@@ -601,10 +542,8 @@ \section{Experiments}
\toprule
& \texttt{\small ensmallen} & \texttt{\small scipy} & \texttt{\small Optim.jl} & \texttt{\small samin} \\
\midrule
% TODO: these are just single-run results from Marcus' laptop! We need to do
% 10 and average.
default & {\bf 0.004s} & 1.069s & 0.021s & 3.173s \\
-tuned & & 0.574s & & 3.122s \\ % TODO
+tuned & & 0.574s & & 3.122s \\
\bottomrule
\end{tabular}
\end{center}
@@ -637,29 +576,8 @@ \section{Experiments}
Another option here might be \texttt{\small simulannealbnd()}
in the Global Optimization Toolbox for MATLAB;
however, no license was available for it.
% TODO: get Marcus' system specs.
We ran our code on a 2018 MacBook Pro (i7, 16GB RAM) running macOS 10.14, using clang 1000.10.44.2, Julia 1.0.1, Python 2.7.15, and Octave 4.4.1.

% We compare four frameworks%
% %
% \footnote{Another option here might be \texttt{\small simulannealbnd()}
% in the Global Optimization Toolkit for MATLAB.
% However, no license was available for these simulations.}
% %
% for this task:
%
% \vspace*{-0.3em}
% \begin{itemize}
% \renewcommand{\itemsep}{-0.5ex}
% \item \texttt{\small ensmallen}
% \item \texttt{\small scipy.optimize.anneal}, from scipy 0.14.1~\cite{jones2014scipy}
% \item simulated annealing implementation in \texttt{\small Optim.jl} with Julia
% 1.0.1~\cite{mogensen2018optim}
% \item \texttt{\small samin} in the \texttt{\small optim} package for GNU Octave~\cite{octave}
% \end{itemize}
% \vspace*{-0.3em}


Initially, we implemented these functions as simply as possible and ran them
without any tuning. This reflects how a typical user might interact with a
given framework.
@@ -694,18 +612,6 @@ \section{Experiments}
\texttt{\small Autograd}~\cite{maclaurin2015autograd}
package. For GNU Octave we use the \texttt{\small bfgsmin()} function.

% For \texttt{\small ensmallen} we have 2 versions:
% (i)~with only \texttt{\small EvaluateWithGradient()},
% and
% (ii)~with \texttt{\small Evaluate()} and \texttt{\small Gradient()}.
% The code for these functions is as shown earlier.
% For Julia we have the options of using manually defined objective and gradient functions,
% or the gradient function can be automatically computed by
% \texttt{\small Calculus.jl}
% (\href{https://github.com/JuliaMath/Calculus.jl}{\footnotesize github.com/JuliaMath/Calculus.jl})
% or \texttt{\small ForwardDiff.jl}~\cite{RevelsLubinPapamarkou2016}.


Results for various data sizes are shown in Table~\ref{tab:lbfgs}. For each
implementation, L-BFGS was allowed to run for only $10$ iterations and never
converged in fewer iterations. The datasets used for training are highly noisy random
@@ -721,7 +627,6 @@ \section{Experiments}
{\em algorithm} & $d$: 100, $n$: 1k & $d$: 100, $n$: 10k & $d$: 100, $n$:
100k & $d$: 1k, $n$: 100k \\
\midrule
% TODO: this was only one trial on Ryan's desktop!
\texttt{\small ensmallen}-1 & {\bf 0.001s} & {\bf 0.009s} & {\bf 0.154s} & {\bf 2.215s} \\
\texttt{\small ensmallen}-2 & 0.002s & 0.016s & 0.182s & 2.522s \\
% Dropped for space and awful performance
@@ -743,7 +648,6 @@ \section{Experiments}
and $d$ indicating the dimensionality of each sample.
Compilation time is not counted for the Julia runs.
\label{tab:lbfgs}
%\vspace*{-1ex}
\end{table}

The results indicate that \texttt{\small ensmallen} with \texttt{\small
@@ -756,17 +660,6 @@ \section{Experiments}
efficient, especially with \texttt{\small ForwardDiff.jl}. We expect this
effect to be more pronounced with increasingly complex objective functions.

% TODO: show flexibility of optimization with learning curves:
% - use LinearRegressionFunction modified for small batches
% - make sure Info or Debug output is on
% - run with a whole boatload of SGD variants
% - parse the output with awk/sed into a csv of objectives per epoch
% - plot it
% - profit!
%
% Probably a snippet showing the actual code to run with a bunch of different
% optimizers is good too. Other things can be cut to make space.

Lastly, we demonstrate how easily different optimizers can be plugged into
\texttt{\small ensmallen} for the same task.
Using a version of \texttt{\small LinearRegressionFunction} from Sec.~\ref{sec:linreg_example}
@@ -778,10 +671,6 @@ \section{Experiments}
yields the learning curves shown in Fig.~\ref{fig:learning_curve}(b).
Any other optimizer for separable differentiable objective
functions can be dropped into place in the same manner.
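
In code, swapping optimizers amounts to changing a single object; the sketch
below is illustrative (the \texttt{\small ens} namespace, class names, and
default-constructed hyperparameters follow current \texttt{\small ensmallen}
releases and are assumptions here, and \texttt{\small f} is assumed to
implement the separable differentiable interface):

\begin{adjustbox}{scale={0.95}{0.95}}
\begin{minipage}{1\textwidth}
\begin{minted}[fontsize=\small]{c++}
// Illustrative sketch: the same objective optimized with two optimizers.
LinearRegressionFunction f(X, y);           // separable differentiable objective

arma::mat theta1(X.n_cols, 1, arma::fill::zeros);
ens::StandardSGD sgd;                       // default hyperparameters
sgd.Optimize(f, theta1);

arma::mat theta2(X.n_cols, 1, arma::fill::zeros);
ens::Adam adam;                             // drop in a different optimizer
adam.Optimize(f, theta2);
\end{minted}
\end{minipage}
\end{adjustbox}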
%% Just because we have some extra space...
%% CS: we need space for the acknowledgement section
% This facilitates the seamless evaluation of various optimizers
% for user-defined objective functions.

\begin{figure}[b!]
\centering
@@ -842,39 +731,9 @@ \section{Conclusion}
The library is already in use for function optimization in the
\texttt{\small mlpack} machine learning toolkit~\cite{mlpack2018}.

% RC: I think it's really important to highlight ensmallen's usage (and
% genesis), although I can't find the right words to concisely and non-awkwardly
% say that we wrote ensmallen as part of mlpack originally.
%
%% CS: good point, though for our purposes i think it's sufficient
%% CS: to simply state that mlpack uses ensmallen.
%% CS: getting into a tangent on the genesis can negatively distract
%% CS: from the central message. besides, there is no room for a proper
%% CS: explanation.
%% CS:
%% CS: I recommend to avoid interchangeably mixing around the words
%% CS: "library", "toolkit", "package" when referring to ensmallen.
%% CS: it's better to consistently stick to "library", and use
%% CS: the other words to refer to other software, such as mlpack.
%% CS: the point is to avoid potential concept clashes
%% CS: (ie. too much overloading on a word), which can lead
%% CS: to confusion as to what exact software we're referring to.

%\begin{small}
{\bf Acknowledgements.}
We would like to thank the many contributors to \texttt{\small ensmallen},
who are listed on the associated website.
%\end{small}

% \subsubsection*{Acknowledgements}
% \vspace*{-0.5em}
%
% The development team of \texttt{\small ensmallen} does not include just the authors named
% here but also a long list of other contributors. See
% \url{https://www.ensmallen.org/about.html} for more information.
% % TODO: that URL may change



\bibliographystyle{plain}
\bibliography{paper}
