| Info |
| --- |
| Richard Socher |
| Ph.D. Thesis |
| 2014 |
| Stanford University |
As the amount of unstructured text data that humanity produces, both overall and on the Internet, continues to grow, so does the need to intelligently process it and extract different types of knowledge from it. My research goal in this thesis is to develop learning models that can automatically induce representations of human language, in particular its structure and meaning, in order to solve multiple higher-level language tasks.
There has been great progress in delivering natural language processing technologies such as information extraction, sentiment analysis, and grammatical analysis. However, these solutions are often based on different machine learning models. My goal is the development of general and scalable algorithms that can jointly solve such tasks and learn the necessary intermediate representations of the linguistic units involved. Furthermore, most standard approaches make strong simplifying assumptions about language and require well-designed feature representations. The models in this thesis address these two shortcomings. They provide effective and general representations for sentences without assuming word order independence, and they achieve state-of-the-art performance with no, or few, manually designed features.
The new model family introduced in this thesis is summarized under the term Recursive Deep Learning. The models in this family are variations and extensions of unsupervised and supervised recursive neural networks (RNNs), which generalize deep and feature learning ideas to hierarchical structures. The RNN models of this thesis obtain state-of-the-art performance on paraphrase detection, sentiment analysis, relation classification, parsing, image-sentence mapping, and knowledge base completion, among other tasks.
Chapter 2 introduces general neural networks. The three main chapters of the thesis explore three recursive deep learning modeling choices. The first modeling choice I investigate is the overall objective function, which crucially guides what the RNNs need to capture. I explore unsupervised, supervised, and semi-supervised learning for structure prediction (parsing), structured sentiment prediction, and paraphrase detection.
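To make the unsupervised case concrete, the sketch below illustrates a reconstruction-style (autoencoder) objective of the kind used by recursive autoencoders, in which a parent vector is scored by how well it can reconstruct its two children. The dimensions, parameters, and function names (`encode`, `reconstruction_error`) are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

# A minimal sketch of a reconstruction-style (autoencoder) objective for the
# unsupervised setting: a parent vector is scored by how well it reconstructs
# its two children.  All dimensions and parameters are illustrative.
d = 4                                            # hypothetical vector size
rng = np.random.default_rng(0)
W_enc = rng.standard_normal((d, 2 * d)) * 0.1    # encoder weights (assumed)
W_dec = rng.standard_normal((2 * d, d)) * 0.1    # decoder weights (assumed)

def encode(a, b):
    """Compose two child vectors into a parent vector."""
    return np.tanh(W_enc @ np.concatenate([a, b]))

def reconstruction_error(a, b):
    """Squared error between the children and their reconstruction."""
    p = encode(a, b)
    reconstructed = W_dec @ p                    # approximate [a; b]
    return float(np.sum((reconstructed - np.concatenate([a, b])) ** 2))

# Lower summed reconstruction error over all nodes indicates a better tree.
x, y = rng.standard_normal(d) * 0.1, rng.standard_normal(d) * 0.1
print(reconstruction_error(x, y))
```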
The next chapter explores the recursive composition function, which computes vectors for longer phrases from the words in a phrase. The standard RNN composition function is a single neural network layer that takes as input two phrase or word vectors and uses the same set of weights at every node in the parse tree to compute higher-order phrase vectors. This is not expressive enough to capture all types of composition. Hence, I explore several variants of the composition function. The first variant represents every word and phrase in terms of both a meaning vector and an operator matrix. Afterwards, two alternatives are developed: the first conditions the composition function on the syntactic categories of the phrases being combined, which improves the widely used Stanford parser. The most recent and expressive composition function is based on a new type of neural network layer and is called a recursive neural tensor network.
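The following sketch contrasts the standard single-layer composition with the tensor-based one; the dimensions, parameters, and function names (`compose_rnn`, `compose_rntn`) are illustrative assumptions rather than the thesis code.

```python
import numpy as np

# A minimal sketch of two composition functions over a binary parse tree.
d = 4                                            # hypothetical vector size
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1        # one weight set, reused at every node
bias = np.zeros(d)

def compose_rnn(a, b):
    """Standard composition: one layer applied to the stacked child vectors."""
    return np.tanh(W @ np.concatenate([a, b]) + bias)

# The tensor variant adds a bilinear term per output dimension, so the two
# children interact multiplicatively rather than only additively.
V = rng.standard_normal((d, 2 * d, 2 * d)) * 0.1

def compose_rntn(a, b):
    c = np.concatenate([a, b])
    bilinear = np.array([c @ V[k] @ c for k in range(d)])
    return np.tanh(bilinear + W @ c + bias)

# Composing a tiny right-branching tree for "very good movie":
very, good, movie = (rng.standard_normal(d) * 0.1 for _ in range(3))
phrase = compose_rnn(very, compose_rnn(good, movie))
```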
The third major dimension of exploration is the tree structure itself. Variants of tree structures are explored and assumed to be given to the RNN model as input. This allows the RNN model to focus solely on the semantic content of a sentence and the prediction task. In particular, I explore dependency trees as the underlying structure, which allows the final representation to focus on the main action (verb) of a sentence. This has been particularly effective for grounding semantics by mapping sentences into a joint sentence-image vector space. The model in the last section assumes the tree structure is the same for every input. This proves effective on the task of 3D object classification.