diff --git a/README.md b/README.md
index 7a98a446..4955e8eb 100644
--- a/README.md
+++ b/README.md
@@ -223,6 +223,30 @@ model = TransformerWrapper(
 )
 ```
 
+### Transformers Without Tears
+
+<img src="./images/scalenorm.png"></img>
+
+https://arxiv.org/abs/1910.05895
+
+They experiment with alternatives to layer normalization and find one, ScaleNorm, that is both simpler and just as effective. Researchers have also shared with me that it leads to faster convergence.
+
+```python
+import torch
+from x_transformers import TransformerWrapper, Decoder
+
+model = TransformerWrapper(
+    num_tokens = 20000,
+    max_seq_len = 1024,
+    attn_layers = Decoder(
+        dim = 512,
+        depth = 6,
+        heads = 8,
+        use_scalenorm = True # set to True to use ScaleNorm in place of LayerNorm for all layers
+    )
+)
+```
+
 ## Todo
 
 To be explained and documented
diff --git a/images/scalenorm.png b/images/scalenorm.png
new file mode 100644
index 00000000..907f3a3c
Binary files /dev/null and b/images/scalenorm.png differ
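
For readers curious what the `use_scalenorm` flag swaps in conceptually, below is a minimal sketch of ScaleNorm following the formulation in the paper: a single learned scalar `g`, initialized to `sqrt(dim)`, applied to the l2-normalized activation. This is an illustration of the idea only, not the library's internal implementation, which may differ in details such as initialization and where the epsilon is applied.

```python
import torch
from torch import nn

class ScaleNorm(nn.Module):
    """Minimal ScaleNorm sketch (per the paper): g * x / ||x||_2 along the feature dim."""
    def __init__(self, dim, eps = 1e-5):
        super().__init__()
        # single learned scalar, initialized to sqrt(dim) as suggested in the paper
        self.g = nn.Parameter(torch.tensor(dim ** 0.5))
        self.eps = eps

    def forward(self, x):
        # l2-normalize each feature vector, then rescale by the learned scalar
        norm = x.norm(dim = -1, keepdim = True).clamp(min = self.eps)
        return x / norm * self.g

# usage sketch: (batch, sequence, features)
x = torch.randn(2, 1024, 512)
out = ScaleNorm(512)(x)
```

Compared to LayerNorm, this drops the per-feature mean/variance statistics and the per-feature gain and bias, leaving only one learnable parameter per normalization layer.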