## Reasoning on representations learnt by neural networks

### Problem

Given a set of entities and a set of relations holding between some of them, learn representations of both from which missing relations can be inferred.

In [1], Mikolov et al. exhibit a figure (figure 1) which nicely shows what Bordes et al. exploit in [2]: when embeddings are learnt for words, relations between words appear as translations in the embedding space.

In [2], Bordes et al. propose the following model: entities are represented as vectors in a common embedding space and each relation as a translation in that space, so that for a true triplet $$(h, r, t)$$ the embeddings satisfy $$h + r \approx t$$.

Freebase is a nice example of such a dataset; it suggests that a solution to this problem could be used as an inference engine for general question answering.
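The translation model of [2] can be sketched in a few lines of NumPy. This is a minimal sketch with toy random embeddings (in practice they are learnt by SGD); the function names are illustrative, not from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a small embedding dimension and randomly
# initialised entity/relation embeddings standing in for learnt ones.
dim, n_entities, n_relations = 20, 50, 5
entity_emb = rng.normal(size=(n_entities, dim))
relation_emb = rng.normal(size=(n_relations, dim))

def score(h, r, t):
    """Dissimilarity d(h + r, t); lower means the triplet is more plausible."""
    return np.linalg.norm(entity_emb[h] + relation_emb[r] - entity_emb[t], ord=1)

def rank_right_entity(h, r, t):
    """Rank of the correct tail t among all candidate entities, by score."""
    scores = np.array([score(h, r, c) for c in range(n_entities)])
    return int((scores < scores[t]).sum()) + 1
```

Ranking the correct entity against all candidates in this way is exactly the evaluation protocol used for the tables below.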

### Solutions

Reimplementation: link to the repository.

#### Other transformations

The first attempt was to replace the translation with other classes of transformations, compared below.

Results on FB15k (mean rank and top10, micro- and macro-averaged):

| Name | Equation | Similarity | Micro mean | Micro top10 | Macro mean | Macro top10 |
|---|---|---|---|---|---|---|
| Translation | | L1 | 233.61 | 36.74 | 361.27 | 32.98 |
| | | L2 | 309.67 | 29.85 | 257.22 | 46.79 |
| | | Cosine | 690.36 | 14.37 | 690.95 | 19.64 |
| Point reflection | | L1 | 298.27 | 31.91 | 626.19 | 32.00 |
| | | L2 | 419.04 | 23.75 | 582.00 | 39.52 |
| | | Cosine | 1564.66 | 16.43 | 2244.51 | 21.48 |
| Reflection | $$x - 2\,(x \cdot r)/(r \cdot r)\; r$$ | L1 | 249.16 | 29.27 | 269.25 | 29.15 |
| | | L2 | 378.53 | 21.45 | 456.99 | 27.92 |
| | | Cosine | 401.18 | 20.24 | 383.67 | 25.43 |
| Offsetted reflection | $$x - 2\,(x \cdot r_0 - r_1)/(r_0 \cdot r_0)\; r_0$$ | L1 | 251.96 | 33.26 | 275.61 | 29.51 |
| | | L2 | 431.67 | 20.87 | 412.07 | 28.00 |
| | | Cosine | 408.90 | 20.89 | 328.66 | 30.33 |
| Anisotropic scaling | | L1 | 258.74 | 33.91 | 459.69 | 39.71 |
| | | L2 | 550.67 | 18.48 | 1150.81 | 18.89 |
| | | Cosine | 470.64 | 14.41 | 987.91 | 15.36 |
| Homotheties | $$r_0 + r_1 (x - r_0)$$ | L1 | 400.44 | 23.67 | 627.74 | 27.63 |
| | | L2 | 501.25 | 21.17 | 750.74 | 31.90 |
| | | Cosine | 403.52 | 26.89 | 938.44 | 36.51 |
| Anisotropic homotheties | $$r_0 + r_1 \odot (x - r_0)$$ | L1 | 268.54 | 32.53 | 472.43 | 39.48 |
| | | L2 | 457.00 | 21.51 | 760.54 | 30.35 |
| | | Cosine | 434.15 | 19.89 | 752.71 | 24.93 |
| Element-wise affine | $$r_0 \odot x + r_1$$ | L1 | 262.95 | 33.21 | 420.30 | 40.09 |
| | | L2 | 417.50 | 22.87 | 692.14 | 32.32 |
| | | Cosine | 401.14 | 21.20 | 738.38 | 26.19 |

Where $$x$$ is the embedding of the entity being transformed and $$r$$ (or $$r_0$$, $$r_1$$) are the parameters of the relation.
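The transformations above can be sketched in NumPy. This is a minimal sketch: the reflection, offsetted reflection, homothety and affine formulas follow the table's equations; the translation and point-reflection formulas are inferred from their names, and anisotropic scaling is omitted since its exact parameterisation is not given.

```python
import numpy as np

# Each function maps an entity embedding x to a predicted embedding,
# parameterised by the relation's vector r0 (plus r1 / a scalar where needed).

def translation(x, r0):
    # x + r, the model of [2] (formula inferred from the name)
    return x + r0

def point_reflection(x, r0):
    # reflection of x through the point r0 (formula inferred from the name)
    return 2 * r0 - x

def reflection(x, r0):
    # x - 2 (x.r)/(r.r) r: reflection across the hyperplane orthogonal to r0
    return x - 2 * (x @ r0) / (r0 @ r0) * r0

def offsetted_reflection(x, r0, b):
    # x - 2 (x.r0 - r1)/(r0.r0) r0, reading r1 as a scalar offset b
    return x - 2 * ((x @ r0) - b) / (r0 @ r0) * r0

def homothety(x, r0, s):
    # r0 + r1 (x - r0): scaling by the scalar ratio s around the centre r0
    return r0 + s * (x - r0)

def anisotropic_homothety(x, r0, r1):
    # r0 + r1 ⊙ (x - r0): element-wise ratios r1
    return r0 + r1 * (x - r0)

def elementwise_affine(x, r0, r1):
    # r0 ⊙ x + r1
    return r0 * x + r1
```

Note that both reflections are involutions (applying them twice returns the original point), which constrains the relations they can model.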

FB15k is a subset of Freebase with 14951 entities, 1345 relations and 483142 training triplets.

When presented with a test triplet, the left and right scores are computed for all 14951 entities and sorted; we report as micro mean the mean rank of the correct entity, and as micro top10 the percentage of correct entities ranked in the top 10. The macro mean is the mean of the per-relation mean ranks: the mean rank is computed for each relation, then averaged over all relations (the macro top10 is defined similarly).
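The micro/macro averaging can be made concrete with a small helper (a sketch; `micro_macro` is a hypothetical name, not from the original code):

```python
import numpy as np

def micro_macro(ranks, relations):
    """ranks[i] is the rank of the correct entity for test triplet i,
    relations[i] the id of that triplet's relation."""
    ranks = np.asarray(ranks, dtype=float)
    relations = np.asarray(relations)
    # Micro: average over all test triplets at once.
    micro_mean = float(ranks.mean())
    micro_top10 = float(100.0 * (ranks <= 10).mean())
    # Macro: average per relation first, then over relations.
    per_rel = [ranks[relations == rel] for rel in np.unique(relations)]
    macro_mean = float(np.mean([r.mean() for r in per_rel]))
    macro_top10 = float(np.mean([100.0 * (r <= 10).mean() for r in per_rel]))
    return micro_mean, micro_top10, macro_mean, macro_top10
```

The macro figures weight every relation equally, so rare relations count as much as frequent ones; this is why the micro and macro columns can disagree on which model is best.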

#### Combining models

The relations in the dataset are hard to analyse since we only have a subset of the graph defining them; however, it seems natural that the optimal class of transformation depends on the relation. A first approach was implemented as a product-of-experts / mixture-of-models kind of combination.
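One simple composition can be sketched as follows, assuming the per-model dissimilarities are summed: adding energies multiplies the experts' unnormalised probabilities exp(-E), i.e. a product of experts in the sense of [3]. The class and function names here are illustrative, not from the original code.

```python
import numpy as np

class TransformModel:
    """Wraps one relation-wise transformation of entity embeddings; the
    score of (h, r, t) is the L1 distance between the transformed head
    embedding and the tail embedding."""

    def __init__(self, transform, entity_emb, rel_params):
        self.transform = transform      # maps (x, r) -> predicted embedding
        self.entity_emb = entity_emb    # (n_entities, dim)
        self.rel_params = rel_params    # (n_relations, dim)

    def score(self, h, r, t):
        pred = self.transform(self.entity_emb[h], self.rel_params[r])
        return float(np.abs(pred - self.entity_emb[t]).sum())

def combined_score(models, h, r, t):
    # Summing the experts' energies multiplies their unnormalised
    # probabilities exp(-E): a product-of-experts composition [3].
    return sum(m.score(h, r, t) for m in models)
```

With all experts trained under the L1 similarity, this summed score is what the "All / L1" row below reports.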

Results on FB15k:

| Models | Composition | Micro mean | Micro top10 | Macro mean | Macro top10 |
|---|---|---|---|---|---|
| All | L1 | 185.82 | 47.20 | 111.36 | 60.88 |

Still working on it. See [3].

### References

- [1] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." *Advances in Neural Information Processing Systems*, 2013.
- [2] Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." *Advances in Neural Information Processing Systems*, 2013.
- [3] Hinton, Geoffrey E. "Training products of experts by minimizing contrastive divergence." *Neural Computation* 14.8 (2002): 1771-1800.