DeepMind: Teaching Machines to Read and Comprehend
==================================================

This repository contains an implementation of the two models (the Deep LSTM and the Attentive Reader) described in *Teaching Machines to Read and Comprehend* by Karl Moritz Hermann et al., NIPS, 2015. It also contains an implementation of a Deep Bidirectional LSTM.

The three models implemented in this repository are:

- `deepmind_deep_lstm` reproduces the experimental settings of the DeepMind paper for the LSTM reader
- `deepmind_attentive_reader` reproduces the experimental settings of the DeepMind paper for the Attentive reader
- `deep_bidir_lstm_2x128` implements a two-layer bidirectional LSTM reader

## Our results

We trained each of the three models for 2 to 4 days on a Titan Black GPU. The following results were obtained on the CNN dataset:

| Model            | Valid (DeepMind) | Test (DeepMind) | Valid (us) | Test (us) |
|------------------|------------------|-----------------|------------|-----------|
| Attentive Reader | **61.6**         | **63.0**        | 59.37      | 61.07     |
| Deep Bidir LSTM  | -                | -               | **59.76**  | **61.62** |
| Deep LSTM Reader | 55.0             | 57.0            | 46         | 47        |

Here is an example of the attention weights used by the attentive reader model:

<img src="https://raw.githubusercontent.com/thomasmesnard/DeepMind-Teaching-Machines-to-Read-and-Comprehend/master/doc/attention_weights_example.png" width="816px" height="652px" />

## Requirements

Software dependencies:

* [Theano](https://github.com/Theano/Theano) GPU computing library
* [Blocks](https://github.com/mila-udem/blocks) deep learning framework
* [Fuel](https://github.com/mila-udem/fuel) data pipeline for Blocks

Optional dependencies:

* Blocks Extras and a Bokeh server for the plot

We recommend using [Anaconda 2](https://www.continuum.io/downloads) and installing the dependencies with the following commands (where `pip` refers to the `pip` command from Anaconda):

    pip install git+git://github.com/Theano/Theano.git
    pip install git+git://github.com/mila-udem/fuel.git
    pip install git+git://github.com/mila-udem/blocks.git -r https://raw.githubusercontent.com/mila-udem/blocks/master/requirements.txt

Anaconda also includes a Bokeh server, but you still need to install `blocks-extras` if you want the plot:

    pip install git+git://github.com/mila-udem/blocks-extras.git

The corresponding dataset is provided by [DeepMind](https://github.com/deepmind/rc-data), but if the script does not work (or you are tired of waiting) you can use [this preprocessed version of the dataset](http://cs.nyu.edu/~kcho/DMQA/) by [Kyunghyun Cho](http://www.kyunghyuncho.me/).
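Each example in the dataset is a single `*.question` file. As a quick sanity check on downloaded data, here is a minimal Python sketch of a parser; it assumes the usual layout of these files (blank-line-separated blocks: article URL, passage, cloze query containing `@placeholder`, answer entity, then one `@entityN:string` mapping per line). The function name and the layout assumption are ours, not part of this repository:

```python
def parse_question_file(path):
    """Parse one DeepMind QA *.question file (assumed layout, see above)."""
    with open(path, encoding="utf-8") as f:
        blocks = f.read().strip().split("\n\n")
    url, passage, query, answer = blocks[0], blocks[1], blocks[2], blocks[3]
    # Entity map lines look like "@entity0:Some Name"
    entities = dict(line.split(":", 1) for line in blocks[4].splitlines())
    return {"url": url, "passage": passage, "query": query,
            "answer": answer, "entities": entities}
```

Anonymised `@entityN` markers are the prediction targets: the models choose which entity fills `@placeholder` in the query.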
## Running

Set the environment variable `DATAPATH` to the folder containing the DeepMind QA dataset. The training questions are expected to be in `$DATAPATH/deepmind-qa/cnn/questions/training`.

Run:

    cp deepmind-qa/* $DATAPATH/deepmind-qa/

This will copy our vocabulary list `vocab.txt`, which contains a subset of all the words appearing in the dataset.

To train a model (see the list of models at the beginning of this file), run:

    ./train.py model_name

Be careful to set your `THEANO_FLAGS` correctly! For instance, you might want to use `THEANO_FLAGS=device=gpu0` if you have a GPU (highly recommended!).

## Reference

[Teaching Machines to Read and Comprehend](https://papers.nips.cc/paper/5945-teaching-machines-to-read-and-comprehend.pdf), by Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman and Phil Blunsom, Neural Information Processing Systems, 2015.

## Credits

[Thomas Mesnard](https://github.com/thomasmesnard)

[Alex Auvolat](https://github.com/Alexis211)

[Étienne Simon](https://github.com/ejls)

## Acknowledgments

We would like to thank the developers of Theano, Blocks and Fuel at MILA for their excellent work.

We thank Simon Lacoste-Julien from the SIERRA team at INRIA for providing us access to two Titan Black GPUs.