\section{Encoding Relation Extraction as a Graph Problem}
\label{sec:graph:encoding}
In this section, we describe how to frame the relation extraction problem as a problem on graphs.
In particular, we describe the structure of an attributed multigraph, which is a generalization of the simple undirected graph defined in the previous paragraph.
This structure is needed to model entities linked by multiple relations or sentences, since this cannot readily be done with a simple graph.

\begin{marginparagraph}
The distinction between \(E\) and \(\entitySet\) is important.
We decided to keep the usual \(G=(V, E)\) notation for undirected graphs.
However, the multigraph we describe in this section has the set of entities \(\entitySet\) as vertices.
This set \(\entitySet\) takes the place of \(V\); despite the similar notation, it has nothing to do with \(E\).
\end{marginparagraph}
Since a knowledge base relation can be formally defined as a set of entity pairs (Section~\ref{sec:context:relation algebra}), we can represent it using a single graph \(G=(V, E)\) where \(V\) is the set of entities (\(V=\entitySet\)) and \(E\) is the set of pairs linked by the relation (\(E\in\relationSet\)).
However, to encode the relation extraction task on a graph, we need different kinds of edges.
We therefore use the structure of an attributed%
\sidenote{
The term ``\emph{labeled}'' is usually reserved for graphs where the domain of attributes is discrete and finite.
However, the set of possible sentences \(\sentenceSet\) is not (theoretically) finite.
}
multigraph \(G=(\entitySet, \arcSet, \gfsource, \gftarget, \gfsentence, \gfrelation)\) where:%
\sidenote{
To be perfectly formal, \(G\) should also depend on \(\sentenceSet\) and \(\relationSet\), the co-domains of \(\gfsentence\) and \(\gfrelation\).
We omit them for conciseness.
}
\begin{itemize}[nosep]
\item \(\entitySet\) is the set of entities, which corresponds to the vertices of \(G\) (indeed \(\entitySet = V\)),
\item \(\arcSet\) is the set of arcs, each of which represents a directed%
\sidenote{
We use the word \emph{edge} to refer to a symmetric connection \(\{u, v\}\), while \emph{arc} refers to an asymmetric connection \((u, v)\).
Using this nomenclature, an undirected graph has \emph{edges}, while a directed graph has \emph{arcs}.
}
link (usually a sentence) between two entities (this approximately corresponds to the set of edges \(E\) in a simple graph, but can also be seen as equivalent to a supervised set of samples \(\dataSet_\relationSet\)),
\item \(\gfsource: \arcSet\to \entitySet\) assigns to each arc its source vertex (the entity \(e_1\)),
\item \(\gftarget: \arcSet\to \entitySet\) assigns to each arc its target vertex (the entity \(e_2\)),
\item \(\gfsentence: \arcSet\to \sentenceSet\) assigns to each arc \(a\in \arcSet\) the corresponding sentence containing \(\gfsource(a)\) and \(\gftarget(a)\),
\item \(\gfrelation: \arcSet\to \relationSet\) assigns to each arc \(a\in\arcSet\) the relation linking the two entities conveyed by \(\gfsentence(a)\).
\end{itemize}

In this graph, the vertices are entities, with an arc linking them for each sentence in which they both appear.
Figure~\ref{fig:graph:samples example} shows the graph corresponding to the sentences in Table~\ref{tab:relation extraction:supervised samples}.
Let us call \(a\in\arcSet\) the highlighted bottom-left arc in Figure~\ref{fig:graph:samples example} linking \textsc{smersh} to counterintelligence.
Applying the above definitions to this arc, we have:
\begin{itemize}[nosep]
\item \(\gfsource(a) = \textsc{smersh}\) (\wdent{158363})
\item \(\gftarget(a) = \text{counterintelligence}\) (\wdent{501700})
\item \(\rlap{\(\gfsentence(a)\)}\hphantom{\gfsource(a)} = \parbox[t]{90mm}{\hbadness=2000\relax% Can't do better :'(
In its \utail{counter-espionage} and counter-intelligence roles, \uhead{\textsc{smersh}} appears to have been extremely successful throughout World War II.}\)
\item \(\rlap{\(\gfrelation(a)\)}\hphantom{\gfsource(a)} = \textsl{field of work}\) (\wdrel{101})
\end{itemize}
\begin{marginparagraph}[-16mm]
Remember that \(\sentenceSet\) is not simply a set of regular sentences but a set of sentences with two tagged and ordered entities.
\end{marginparagraph}
\begin{figure*}
\centering
\input{mainmatter/graph/samples example.tex}
\scaption[Multigraph construction example.]{
Multigraph \(G\) corresponding to the four samples of Table~\ref{tab:relation extraction:supervised samples}.
For each arc \(a\), its relation \(\gfrelation(a)\) is written over the arc, and the beginning of the conveying sentence \(\gfsentence(a)\) is written under the arc.
For ease of reading, surface forms are given instead of numerical identifiers.
The highlighted arc corresponds to the example given above.
\label{fig:graph:samples example}
}
\end{figure*}

In the supervised relation extraction task, the set of relations \(\relationSet\) is fully known, and \(\gfrelation\) is partially known; the goal is to complete \(\gfrelation\).
In the unsupervised relation extraction task, \(\relationSet\) is unknown, and \(\gfrelation\) must be built from the ground up.
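The attributed multigraph and its four arc attributes can be sketched in a few lines of code. The following is a minimal illustration, not an actual implementation: the class and variable names are hypothetical, and the arc reproduces the worked example above using its Wikidata identifiers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Arc:
    """One arc a of the multigraph, bundling its four attribute values."""
    source: str              # gf_source(a): the head entity e1
    target: str              # gf_target(a): the tail entity e2
    sentence: str            # gf_sentence(a): sentence with both entities tagged
    relation: Optional[str]  # gf_relation(a): None when unlabeled (unsupervised)

# The highlighted arc of the worked example:
a = Arc(
    source="Q158363",   # SMERSH
    target="Q501700",   # counterintelligence
    sentence=("In its counter-espionage and counter-intelligence roles, "
              "SMERSH appears to have been extremely successful "
              "throughout World War II."),
    relation="P101",    # field of work
)

# A multigraph is then a set of entities plus a list of arcs; parallel arcs
# (several sentences linking the same entity pair) are naturally allowed.
entities = {"Q158363", "Q501700"}
arcs = [a]
```

Storing arcs in a list rather than a set of pairs is what makes this a multigraph: two sentences mentioning the same entity pair yield two distinct arcs.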
We can also encode a knowledge base using this structure by removing the associated sentences (i.e.~the \(\gfsentence\) attributes).%
\sidenote[][1cm]{
Indeed, in this case, the graph is simply a set of entities linked by relation arcs such as \inlineArc[\textsl{capital of}]{\text{Sanaa}}{\text{Yemen}}.
}

Note that the graph \(G\) is directed because most relations and sentences are asymmetric (inverting the two entities changes the meaning).
This is the only semantics associated with the orientation.%
\sidenote[][12mm]{
For example, while the notion of a sink---a vertex with no outgoing arcs---might be of interest to graph theorists, it bears no special meaning in our encoding.
}
In the unsupervised setting, when the graph is not labeled with relations, each arc \inlineArc[s]{u}{v} has a symmetric arc \inlineArc*[\breve{s}]{u}{v}, where \(\breve{s}\in\sentenceSet\) is the same sentence as \(s\in\sentenceSet\) with the tags \uhead{\null\kern3mm} and \utail{\null\kern3mm} inverted.

For ease of notation, let us define the incidence function \(\gfincidents\), associating to each vertex its set of incident arcs: \(\gfincidents(e) = \left\{ a\in\arcSet \middlerel{|} \gfsource(a)=e \lor \gftarget(a)=e \right\}\).
In other words, \(\gfincidents\) associates to each entity the set of samples in which it appears.
Furthermore, for each relation \(r\in \relationSet\), we define the relation graph \(G_\gfsr = (\entitySet, \arcSet_\gfsr, \gfsource, \gftarget, \gfrelation, \gfsentence)\), where \(\arcSet_\gfsr = \{ a\in \arcSet \mid \gfrelation(a) = r \}\) is the set of arcs labeled with relation \(r\).
We can then define the out-neighbors \(\gfneighborsrr\) and in-neighbors \(\gfneighborsrl\) functions on the relation graph \(G_\gfsr\) as follows:%
\sidenote{
Note that the functions we define here are for the open neighborhood.
This means that we do not consider a vertex to be its own neighbor.
}
\begin{align*}
\gfneighborsrr(e_1) & = \left\{\, e_2\in\entitySet \middlerel{|} \exists a\in \arcSet : \gfsource(a)=e_1 \land \gftarget(a)=e_2 \land \gfrelation(a)=r \,\right\}, \\
\gfneighborsrl(e_1) & = \left\{\, e_2\in\entitySet \middlerel{|} \exists a\in \arcSet : \gftarget(a)=e_1 \land \gfsource(a)=e_2 \land \gfrelation(a)=r \,\right\}.
\end{align*}
Using these definitions, we can write expressions for the generic neighbors function:
\begin{align*}
\gfneighbors_\gfsr(e) & = \gfneighborsrr(e) \cup \gfneighborsrl(e), \\
\gfneighbors(e) & = \bigcup_{r\in\relationSet} \gfneighbors_\gfsr(e).
\end{align*}
Finally, we can define the degree of a vertex as its number of neighbors:
\begin{equation*}
\gfdegree(e) = |\gfneighbors(e)|,
\end{equation*}
which can be broken down into in-degree and out-degree using in-neighbors and out-neighbors.

\leavevmode
\begin{marginparagraph}[-15mm]
Since we mention several hypotheses, we take this opportunity to remind the reader that all assumptions are detailed in Appendix~\ref{chap:assumptions}.
\end{marginparagraph}
Using these notations, we can reformulate modeling assumptions such as \hypothesis{biclique} (Section~\refAssumptionSection{biclique}), \hypothesis{1-adjacency} (Section~\refAssumptionSection{oneadjacency}) and \hypothesis{\(1\to1\)} (Section~\refAssumptionSection{onetoone}).
For example, the hypothesis \hypothesis{biclique} draws its name from the fact that for every relation \(r\in\relationSet\), the relation graph \(G_\gfsr\) is assumed to be a biclique.%
\sidenote[][-16mm]{
A biclique is a \emph{complete bipartite graph}.
Its vertices can be split into two sets \(A, B\subseteq\entitySet\) such that each vertex in \(A\) is linked to all vertices in \(B\).
For example:
\begin{center}
\input{mainmatter/graph/biclique.tex}
\end{center}
}
This is of particular interest when studying matching the blanks (\textsc{mtb}, Section~\ref{sec:relation extraction:mtb}).
It can be analyzed using the following graph:%
\begin{center}
\begin{tikzpicture}
\node (e1) {\(e_1\)};
\node[right=of e1] (e2) {\(e_2\)};
\node[left=of e1] (e3) {\(e_3\)};
\draw[arrow] (e1) to node[midway,above] {\(r_3\)} (e3);
\draw[arrow] (e1) to[bend left=30] node[midway,above] {\(r_1\)} (e2);
\draw[arrow] (e1) to[bend right=30] node[midway,below] {\(r_2\)} (e2);
\end{tikzpicture}
\end{center}
\textsc{mtb} makes two main assumptions: \hypothesis{1-adjacency} and \hypothesis{\(1\to1\)}.
In the above graph, \hypothesis{1-adjacency} implies that \(r_1\) and \(r_2\) should be the same, while \hypothesis{\(1\to1\)} implies that \(r_3\) should be different from \(r_1\) and \(r_2\).
From this simple example, we can also see that \textsc{mtb} training is 1-localized, meaning that it only exploits the fact that two samples are direct neighbors.%
\sidenote[][-16mm]{
Here we use \emph{neighbors} in the sense of ``arc-neighbors.''
This is a relation between two arcs sharing a common endpoint.
Arc-neighbors are simple neighbors in the line graph described in Section~\ref{sec:graph:topological features}.
}
In contrast, a sentential approach is 0-localized; it completely ignores other samples.
This is actually the case for \textsc{mtb} during evaluation.
The same problem plagues the fill-in-the-blank model of Chapter~\ref{chap:fitb}; while training is influenced by the direct neighbors (through the entity embeddings), when classifying an unknown sample, its neighbors are ignored.
The goal of this chapter is to consider larger neighborhoods, both for training unsupervised models and for making predictions with them.
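The neighborhood and degree definitions above, and the way the two \textsc{mtb} assumptions read on the three-entity toy graph, can be sketched as follows. This is an illustrative sketch under the section's definitions, not an actual implementation; arcs are represented as bare (source, target, relation) triples and all names are hypothetical.

```python
from collections import defaultdict

# Toy graph from the MTB discussion: e1 ->r3 e3, and two parallel
# arcs e1 ->r1 e2 and e1 ->r2 e2.
arcs = [("e1", "e3", "r3"), ("e1", "e2", "r1"), ("e1", "e2", "r2")]

def out_neighbors(e, r):
    """N_r->(e): open out-neighborhood in the relation graph G_r."""
    return {t for (s, t, rel) in arcs if s == e and rel == r and t != e}

def in_neighbors(e, r):
    """N_r<-(e): open in-neighborhood in the relation graph G_r."""
    return {s for (s, t, rel) in arcs if t == e and rel == r and s != e}

def neighbors(e):
    """N(e): union of N_r->(e) and N_r<-(e) over all relations r."""
    relations = {rel for (_, _, rel) in arcs}
    return set().union(*(out_neighbors(e, r) | in_neighbors(e, r)
                         for r in relations))

def degree(e):
    """deg(e) = |N(e)|; parallel arcs do not inflate the degree."""
    return len(neighbors(e))

# 1-adjacency: parallel arcs between the same ordered entity pair should
# all convey the same relation. Grouping arcs by endpoint pair exposes
# the pairs where the assumption bites: here (e1, e2) carries {r1, r2},
# so 1-adjacency would force r1 = r2.
parallel = defaultdict(set)
for s, t, rel in arcs:
    parallel[(s, t)].add(rel)
```

Note that \(\gfdegree(e_1) = 2\) even though \(e_1\) has three incident arcs: the degree counts distinct neighbors, as in the definition above, while \(\gfincidents(e_1)\) would count all three samples.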