Commit 09e7324: Complete draft of README.
bamos committed Oct 11, 2015 (1 parent: 000f33c)

Showing 5 changed files with 62 additions and 30 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -9,5 +9,10 @@ models/dlib/shape_predictor_68_face_landmarks.dat
*.pyc
*.mp4

+evaluation/*/*.csv
+evaluation/*/*.pdf

+demos/web/bower_components
+demos/web/unknown*.npy

+models/openface/*.t7
74 changes: 44 additions & 30 deletions README.md
@@ -1,4 +1,4 @@
-# FaceNet
+# OpenFace

This is a Python and Torch implementation of the CVPR 2015 paper
[FaceNet: A Unified Embedding for Face Recognition and Clustering](http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf)
@@ -21,7 +21,18 @@ See our accuracy comparisons on the famous LFW benchmark below.

---

-The following example shows the workflow for a single input
+### Please use responsibly!

+We do not support the use of this project in applications
+that violate privacy and security.
+We are using this to help cognitively impaired users
+sense and understand the world around them.

+---

+# Overview

+The following overview shows the workflow for a single input
image of Sylvester Stallone from the publicly available
[LFW dataset](http://vis-www.cs.umass.edu/lfw/person/Sylvester_Stallone.html).

@@ -54,7 +65,7 @@ is driven by large private datasets.
In face recognition, there are no open source implementations or
models trained on these datasets.
If you have access to a large dataset, we are very interested
-in training a new FaceNet model with it.
+in training a new OpenFace model with it.
Please contact Brandon Amos at [[email protected]](mailto:[email protected]).

| Dataset | Public | #Photos | #People |
@@ -71,7 +82,7 @@ Please contact Brandon Amos at [[email protected]](mailto:[email protected]).
+ [demos/web](/demos/web): Real-time web demo.
+ [demos/compare.py](/demos/compare.py): Compare two images.
+ [evaluation](/evaluation): LFW accuracy evaluation scripts.
-+ [facenet](/facenet): Python library code.
++ [openface](/openface): Python library code.
+ [images](/images): Images used in the README.
+ [models](/models): Location of binary models.
+ [training](/training): Scripts to train new models.
@@ -88,6 +99,9 @@ The source is available in [demos/web](/demos/web).
From the `demos/web` directory, install requirements
with `./install-deps.sh` and `sudo pip install -r requirements.txt`.

+In practice, object tracking should be used once the face recognizer
+has predicted a face.
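As an illustration of this idea, here is a minimal sketch using dlib's correlation tracker; this is a sketch under stated assumptions, not the web demo's actual code, and the frames and bounding box below are stand-ins.

```Python
import dlib
import numpy as np

# Stand-in video frames; in the web demo these would come from the camera.
frames = [np.zeros((240, 320, 3), dtype=np.uint8) for _ in range(5)]

# Suppose the face recognizer just predicted a face at this bounding box.
tracker = dlib.correlation_tracker()
tracker.start_track(frames[0], dlib.rectangle(80, 60, 176, 156))

for frame in frames[1:]:
    tracker.update(frame)              # follow the face between recognitions
    position = tracker.get_position()  # dlib.drectangle for the tracked face
```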

# Comparing two images
The [comparison demo](demos/compare.py) outputs the predicted similarity
score of two faces by computing the squared L2 distance between
@@ -104,10 +118,10 @@ Eric Clapton were generated with
| <img src='images/examples/lennon-1.jpg' width='200px'></img> | <img src='images/examples/lennon-2.jpg' width='200px'></img> | <img src='images/examples/clapton-1.jpg' width='200px'></img> | <img src='images/examples/clapton-2.jpg' width='200px'></img> |

The following table shows that a distance threshold of `0.3` would
-distinguish these two images.
+distinguish these two people.
In practice, further experimentation should be done on the distance threshold.
-On our LFW experiments, a threshold of `0.70` (TODO: Update when used on the final model)
-gave the best accuracy on 8 out of 10 experiments.
+On our LFW experiments, the best-performing thresholds average 0.71 &plusmn; 0.027;
+see [accuracies.txt](evaluation/lfw.nn4.v1.epoch-177/accuracies.txt).

| Image 1 | Image 2 | Distance |
|---|---|---|
@@ -118,7 +132,6 @@ gave the best accuracy on 8 out of 10 experiments.
| Lennon 2 | Clapton 2 | 1.322 |
| Clapton 1 | Clapton 2 | 0.174 |
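To make the distance column concrete, here is a minimal sketch of the squared L2 computation; the vectors are random stand-ins for the 128-dimensional representations the network produces.

```Python
import numpy as np

# Stand-ins for two 128-dimensional representations from the network.
rep1 = np.random.randn(128)
rep2 = np.random.randn(128)

diff = rep1 - rep2
squared_l2 = np.dot(diff, diff)  # same as np.sum((rep1 - rep2) ** 2)

print("Squared L2 distance: {:.3f}".format(squared_l2))
print("Same person at threshold 0.3?", squared_l2 < 0.3)
```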


# Cool demos, but I want numbers. What's the accuracy?
Even though the public datasets we trained on have orders of magnitude less data
than private industry datasets, the accuracy is remarkably high and
@@ -129,9 +142,10 @@ benchmark.
We had to fall back to using the deep funneled versions for
152 of 13233 images because dlib failed to detect a face or landmarks.

-[TODO: Add final ROC Curve]
+![](images/nn4.v1.lfw.roc.png)

-This can be generated with the following commands from the root `facenet`
+This can be generated with the following commands from the root
+`openface`
directory, assuming you have downloaded and placed the raw and
deep funneled LFW data from [here](http://vis-www.cs.umass.edu/lfw/)
in `./data/lfw/raw` and `./data/lfw/deepfunneled`.
@@ -143,14 +157,14 @@ in `./data/lfw/raw` and `./data/lfw/deepfunneled`.
Fall back to deep funneled versions for images that dlib failed
to align:
`./util/align-dlib.py data/lfw/raw align affine data/lfw/dlib-affine-sz:96 --size 96 --fallbackLfw data/lfw/deepfunneled`
-3. Generate representations with `./batch-represent/main.lua -outDir evaluation/lfw.nn4.v1.reps -model models/facenet/nn4.v1.t7 -data data/lfw/dlib-affine-sz:96`
+3. Generate representations with `./batch-represent/main.lua -outDir evaluation/lfw.nn4.v1.reps -model models/openface/nn4.v1.t7 -data data/lfw/dlib-affine-sz:96`
4. Generate the ROC curve from the `evaluation` directory with `./lfw-roc.py --workDir lfw.nn4.v1.reps`.
This creates `roc.pdf` in the `lfw.nn4.v1.reps` directory.
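Each row of [accuracies.txt](evaluation/lfw.nn4.v1.epoch-177/accuracies.txt) pairs a fold with a threshold and an accuracy. Here is a hedged sketch of how such numbers can be computed, assuming the usual LFW cross-validation protocol rather than the evaluation script's exact logic: pick the best threshold on nine folds, then score the held-out fold.

```Python
import numpy as np

def accuracy(threshold, dists, same):
    # Predict "same person" when the squared L2 distance is under the threshold.
    return np.mean((dists < threshold) == same)

def lfw_fold_accuracies(dists, same, folds, thresholds=np.arange(0.0, 4.0, 0.01)):
    rows = []
    for k in np.unique(folds):
        train, test = folds != k, folds == k
        best = max(thresholds, key=lambda t: accuracy(t, dists[train], same[train]))
        rows.append((k, best, accuracy(best, dists[test], same[test])))
    return rows  # (fold, threshold, accuracy) rows, as in accuracies.txt
```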

# Visualizing representations with t-SNE
[t-SNE](http://lvdmaaten.github.io/tsne/) is a dimensionality
reduction technique that can be used to visualize the
-128-dimensional features FaceNet produces.
+128-dimensional features OpenFace produces.
The following shows the visualization of the three people
in the training and testing dataset with the most images.

@@ -163,20 +177,20 @@ in the training and testing dataset with the most images.
![](images/val-tsne.png)

These can be generated with the following commands from the root
-`facenet` directory.
+`openface` directory.

1. Install prerequisites as below.
2. Preprocess the raw `lfw` images; change `8` to however many
separate processes you want to run:
`for N in {1..8}; do ./util/align-dlib.py <path-to-raw-data> align affine <path-to-aligned-data> --size 96 & done`.
-3. Generate representations with `./batch-represent/main.lua -outDir <feature-directory (to be created)> -model models/facenet/nn4.v1.t7 -data <path-to-aligned-data>`
+3. Generate representations with `./batch-represent/main.lua -outDir <feature-directory (to be created)> -model models/openface/nn4.v1.t7 -data <path-to-aligned-data>`
4. Generate t-SNE visualization with `./util/tsne.py <feature-directory> --names <name 1> ... <name n>`.
This creates `tsne.pdf` in `<feature-directory>`.
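For reference, a small sketch of the kind of projection `./util/tsne.py` performs, written here with scikit-learn and random stand-in representations rather than the script's actual code:

```Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

reps = np.random.randn(300, 128)         # stand-in for the generated representations
names = np.repeat(["a", "b", "c"], 100)  # three people, 100 images each

embedded = TSNE(n_components=2).fit_transform(reps)
for name in np.unique(names):
    pts = embedded[names == name]
    plt.scatter(pts[:, 0], pts[:, 1], label=name)
plt.legend()
plt.savefig("tsne.pdf")
```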

# Model Definitions
-Model definitions should be kept in [models/facenet](models/facenet),
-where we have provided definitions of the [nn1](models/facenet/nn1.def.lua)
-and [nn4](models/facenet/nn4.def.lua) as described in the paper,
+Model definitions should be kept in [models/openface](models/openface),
+where we have provided definitions of the [nn1](models/openface/nn1.def.lua)
+and [nn4](models/openface/nn4.def.lua) as described in the paper,
but with batch normalization and no normalization in the lower layers.

# Pre-trained Models
@@ -191,17 +205,17 @@ This model has been trained by combining the two largest (as of August 2015)
publicly-available face recognition datasets based on names:
[FaceScrub](http://vintage.winklerbros.net/facescrub.html)
and [CASIA-WebFace](http://arxiv.org/abs/1411.7923).
-This model was trained for TODO hours on a Tesla K40 GPU.
+This model was trained for about 300 hours on a Tesla K40 GPU.

The following plot shows the triplet loss on the training
and test set.
Semi-hard triplets are used on the training set, and
random triplets are used on the testing set.

-[TODO: Loss plot]
+![](images/nn4.v1.loss.png)
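For reference, a hedged numpy sketch of the triplet loss and the semi-hard condition as defined in the FaceNet paper; the training code's margin and sampling details may differ.

```Python
import numpy as np

def dist2(x, y):
    d = x - y
    return np.dot(d, d)  # squared L2 distance between embeddings

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # Push the positive closer than the negative by at least the margin alpha.
    return max(0.0, dist2(anchor, positive) - dist2(anchor, negative) + alpha)

def is_semi_hard(anchor, positive, negative, alpha=0.2):
    # Semi-hard: the negative is farther than the positive but still within
    # the margin, so the loss is positive without being dominated by outliers.
    return dist2(anchor, positive) < dist2(anchor, negative) < dist2(anchor, positive) + alpha
```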

The LFW section above shows that this model obtains a mean
-accuracy of TODO with an AUC of TODO.
+accuracy of 0.8483 &plusmn; 0.0172 with an AUC of 0.92.

# How long does processing a face take?
The processing time depends on the size of your image for
face detection and alignment.
@@ -210,21 +224,21 @@ These only run on the CPU and take from 100-200ms to over
a second.
The neural network uses a fixed-size input and has
a more consistent runtime, almost 400ms on our 3.70 GHz CPU
-and TODO ms on our Tesla K40 GPU.
+and 20-40 ms on our Tesla K40 GPU.

# Usage
## Existing Models
See the image comparison demo, [demos/compare.py](demos/compare.py), for a complete example
written in Python using a naive Torch subprocess to process the faces.

```Python
-import facenet
-from facenet.alignment import NaiveDlib # Depends on dlib.
+import openface
+from openface.alignment import NaiveDlib # Depends on dlib.

# `args` are parsed command-line arguments.

align = NaiveDlib(args.dlibFaceMean, args.dlibFacePredictor)
-net = facenet.TorchWrap(args.networkModel, imgDim=args.imgDim, cuda=args.cuda)
+net = openface.TorchWrap(args.networkModel, imgDim=args.imgDim, cuda=args.cuda)

# `img` is a numpy matrix containing the RGB pixels of the image.
bb = align.getLargestFaceBoundingBox(img)
@@ -263,7 +277,7 @@ Clone with `--recursive` or run `git submodule init && git submodule update`
after checking out.

## Download the models
-Run `./models/get-models.sh` to download pre-trained FaceNet
+Run `./models/get-models.sh` to download pre-trained OpenFace
models on the combined CASIA-WebFace and FaceScrub database.
This also downloads dlib's pre-trained model for face landmark detection.

@@ -275,13 +289,13 @@ This repo can be deployed as a container with [Docker](https://www.docker.com/)
for CPU mode:

```
-sudo docker build -t facenet .
-sudo docker run -t -i -v $PWD:/facenet facenet /bin/bash
-cd /facenet
+sudo docker build -t openface .
+sudo docker run -t -i -v $PWD:/openface openface /bin/bash
+cd /openface
./demos/compare.py images/examples/{lennon*,clapton*}
```

-To use, place your images in `facenet` on your host and
+To use, place your images in `openface` on your host and
access them from the shared Docker directory.

## By hand
@@ -344,7 +358,7 @@ If you want CUDA support, also install
library for face detection and alignment.

# Licensing
-This source is copyright Carnegie Mellon University
+The source code is copyright Carnegie Mellon University
and licensed under the [Apache 2.0 License](./LICENSE).
Portions from the following third party sources have
been modified and are included in this repository.
13 changes: 13 additions & 0 deletions evaluation/lfw.nn4.v1.epoch-177/accuracies.txt
@@ -0,0 +1,13 @@
+fold, threshold, accuracy
+0, 0.71, 0.86
+1, 0.71, 0.83
+2, 0.77, 0.82
+3, 0.70, 0.84
+4, 0.68, 0.82
+5, 0.71, 0.85
+6, 0.69, 0.85
+7, 0.71, 0.85
+8, 0.71, 0.86
+9, 0.67, 0.88
+
+avg, 0.8483 +/- 0.0172
Binary file added images/nn4.v1.lfw.roc.png
Binary file added images/nn4.v1.loss.png
