As we showed in the previous chapter, the relational semantics we seek to model is challenging to capture in an unsupervised fashion: the information available in each sentence is scarce. To alleviate this problem, we can take a holistic approach by explicitly modeling the relational information at the dataset level, similarly to the aggregate approaches discussed in Section~\ref{sec:relation extraction:aggregate}. The information encoded in the structure of the dataset can be modeled using a graph \parencitex{graphie}. In this chapter, we propose a graph-based unsupervised aggregate relation extraction method that explicitly exploits the signal present in the dataset structure.

Since we model dataset-level information, we place ourselves in the aggregate setup (Section~\ref{sec:relation extraction:definition}) as defined by Equation~\ref{eq:relation extraction:aggregate definition}. As a reminder, the aggregate setup stands in opposition to the sentential setup used in the previous chapter. In the sentential setup, sentences are processed independently; in the aggregate setup, by contrast, all samples \(\dataSet\subseteq\sentenceSet\times\entitySet\) are considered jointly to extract knowledge base facts \(\kbSet\subseteq\entitySet\times\relationSet\), without necessarily mapping each individual sample to a fact. We already introduced two aggregate supervised relation extraction approaches relying on graph modeling: label propagation (Section~\ref{sec:relation extraction:label propagation}) and \textsc{epgnn} (Section~\ref{sec:relation extraction:epgnn}). The latter uses a spectral graph convolutional network (\textsc{gcn}). \textsc{gcn}s are the main outcome of a recent resurgence of interest in graph-based approaches through the use of deep learning methods. These methods have been shown to share some similarities with the Weisfeiler--Leman isomorphism test \parencitex{gcn_spectral_semi}.
A graph isomorphism test attempts to decide whether two graphs are identical. To this end, it assigns a color to each vertex, classifying it according to its neighborhood. Coupled with the assumption that sentences conveying similar relations have similar neighborhoods, this closely relates the isomorphism problem to unsupervised relation extraction. However, unsupervised \textsc{gcn}s are usually trained under the assumption that neighboring samples have similar representations, completely discarding the characteristic of the Weisfeiler--Leman algorithm that makes it interesting from a relation extraction point of view. In this chapter, we propose alternative training objectives for unsupervised graph neural networks applied to relation extraction.

In Section~\ref{sec:graph:encoding}, we show how to extend the definition of a simple graph to model a relation extraction problem. We then provide some statistics on the \textsc{t-re}x dataset in Section~\ref{sec:graph:analysis}. These results support the claim that a large amount of information can be leveraged from topological features for the relation extraction problem. In Section~\ref{sec:graph:related work}, we take a quick tour of graph neural networks (\textsc{gnn}) and the Weisfeiler--Leman isomorphism test. Most \textsc{gnn}s apply to simple undirected graphs, whereas we need a more complex structure to encode the relation extraction task. While most models, such as \textsc{epgnn}, adapt the encoding of relation extraction to simple undirected graphs, in Section~\ref{sec:graph:approach} we propose instead to adapt existing \textsc{gnn} methods to the richer structure needed to fully capture the relation extraction problem. Finally, Section~\ref{sec:graph:experiments} presents the experimental results of the proposed approaches.
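To make the vertex-coloring idea behind the Weisfeiler--Leman test concrete, the one-dimensional color refinement can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (plain integer vertex labels, edges as pairs); it is not the encoding used later in the chapter.

```python
from collections import defaultdict

def wl_colors(vertices, edges, rounds=3):
    """One-dimensional Weisfeiler--Leman color refinement.

    Every vertex starts with the same color; at each round, a vertex's
    new color is derived from its current color together with the
    multiset of its neighbors' colors.  Vertices that end up with the
    same color have locally indistinguishable neighborhoods.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    colors = {u: 0 for u in vertices}
    for _ in range(rounds):
        # Signature: (own color, sorted multiset of neighbor colors).
        signatures = {
            u: (colors[u], tuple(sorted(colors[v] for v in neighbors[u])))
            for u in vertices
        }
        # Relabel each distinct signature with a fresh compact color.
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {u: palette[signatures[u]] for u in vertices}
    return colors
```

On a path with three vertices, for instance, the two endpoints receive the same color while the middle vertex receives a different one, since its neighborhood differs.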

\paragraph{Notations used in this chapter.}
A simple undirected graph is defined as a tuple \(G=(V, E)\) where \(V\) is a set of \(n\) vertices and \(E\) is a set of \(m\) edges.%
\sidenote{
	In a simple graph, we always have \(m\leq n (n-1)\), which tightens to \(m\leq n (n-1) \divslash 2\) for undirected ones.
}
An edge \(\{u, v\}\in E\) connects two vertices \(u, v\in V\), which are then said to be \emph{neighbors}.
We use \(\gfneighbors: V\to 2^V\) to denote the function which associates to each vertex the set of its neighbors \(\gfneighbors(u)=\{v\in V\mid \{u, v\}\in E\}\).
Alternatively, a graph \(G\) can be represented by its adjacency matrix \(\mtrx{M}\in\{0, 1\}^{n\times n}\), with \(m_{uv}=1\) if \(\{u, v\}\in E\) and \(m_{uv}=0\) otherwise.
A graph is said to encode an adjacency relation on its vertices, which foreshadows the remainder of this chapter.
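As a quick illustration of these notations, the neighbor function \(\gfneighbors\) and the adjacency matrix \(\mtrx{M}\) can be computed directly from the edge set. The toy graph below is purely illustrative (integer vertex labels of our own choosing).

```python
# Hypothetical toy graph with n = 4 vertices and m = 3 edges.
V = [0, 1, 2, 3]
E = [{0, 1}, {1, 2}, {1, 3}]

def neighbors(u):
    """N(u) = { v in V | {u, v} in E }."""
    return {v for v in V if {u, v} in E}

# Adjacency matrix M in {0, 1}^(n x n); symmetric since G is undirected.
M = [[0] * len(V) for _ in V]
for e in E:
    u, v = tuple(e)
    M[u][v] = M[v][u] = 1
```

Note that each undirected edge contributes two entries to \(\mtrx{M}\), one on each side of the diagonal, which is why the undirected bound on \(m\) carries the factor \(\divslash 2\).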