PhD

The LaTeX sources of my Ph.D. thesis
git clone https://esimon.eu/repos/PhD.git

introduction.tex (3309B)


Language conveys meaning.
Thus, it should be possible to explicitly map a text to its semantic content.
The research reported in this thesis seeks to algorithmically extract meaning conveyed by language using deep learning techniques from the information extraction and natural language processing (\textsc{nlp}) fields.
We focus on the task of relation extraction, in which we seek to extract the semantic relation conveyed by a sentence.
For example, given the sentence ``Paris is the capital of France,'' we seek to extract the relation ``\textsl{capital of}.''
To build a formal representation of relations, we use knowledge bases.
In their simplest form, knowledge bases encode knowledge as a set of facts, which take the form \((\text{entity}, \text{relation}, \text{entity})\) such as \((\text{Paris}, \textsl{capital of}, \text{France})\).
Like natural languages, knowledge bases aim to convey meaning%
\sidenote{
	Knowledge bases usually focus on knowledge which can be seen as a subset of all possible meanings.
	For example, facts like \((\text{I}, \textsl{want}, \text{ice cream})\) are not usually encoded in knowledge bases.
	However, they theoretically could be.
	To be precise, throughout this thesis we use knowledge bases in two ways:
	\begin{itemize}[nosep]
		\item as a basic theoretical structured representation of meaning,
		\item as practical datasets to evaluate algorithms on.
	\end{itemize}
	This means that algorithms tested on existing knowledge bases are only tested on a subset of possible meanings.
	However, when we discuss the representation of knowledge base facts, note that this can be generalized to any meaningful fact expressible in the knowledge base framework.
	\label{note:context:knowledge vs meaning}
}
but in a structure that is readily manipulable by algorithms.
However, most knowledge---like this thesis---comes in the form of text.
Therein lies the usefulness of the relation extraction task on which we focus.
By ``translating'' natural language into knowledge bases, we seek to make more knowledge available to algorithms.

In this chapter, we focus on the two kinds of data we deal with in this thesis, namely text and knowledge bases.
Subsequent chapters will deal with the extraction of knowledge base facts from text.
In Section~\ref{sec:context:history}, we begin by positioning this task within the larger historical context by focusing on how the fields of machine learning, \textsc{nlp} and information extraction developed.
Before delving into the specific algorithms for relation extraction, we must first define how to process language and how to represent semantic information in a way that can be manipulated by machine learning algorithms.
In particular, we seek to obtain a \emph{distributed representation}---which we define in the next section---of both language and knowledge bases since deep learning algorithms cannot directly work with non-distributed representations.
We first inspect the representation of words in Section~\ref{sec:context:word} before exploring how to process whole sentences in Section~\ref{sec:context:sentence}.
Finally, Section~\ref{sec:context:knowledge base} focuses on knowledge bases, first giving a formal definition and then studying methods for extracting distributed representations from them.